0% found this document useful (0 votes)

12 views19 pages

Automating GDPR Checks for Android Apps

Uploaded by

ishitasinghal26

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

12 views19 pages

Automating GDPR Checks for Android Apps

Uploaded by

ishitasinghal26

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

You are on page 1/ 19

Automating the GDPR Compliance

Assessment for Cross-border Personal

Data Transfers in Android Applications**
Danny S. Guamán , Xavier Ferrer, Jose M. del Alamo , Jose Such

Abstract— The General Data Protection Regulation (GDPR) aims to ensure that all personal data processing activities are fair
and transparent for the European Union (EU) citizens, regardless of whether these are carried out within the EU or anywhere
else. To this end, it sets strict requirements to transfer personal data outside the EU. However, checking these requirements is a
daunting task for supervisory authorities, particularly in the mobile app domain due to the huge number of apps available and
their dynamic nature. In this paper, we propose a fully automated method to assess compliance of mobile apps with the GDPR
requirements for cross-border personal data transfers. We have applied the method to the top-free 10,080 apps from the
Google Play Store. The results reveal that there is still a very significant gap between what app providers and third-party
recipients do in practice and what is intended by the GDPR. A substantial 56% of analysed apps are potentially non-compliant
with the GDPR cross-border transfer requirements.

Index Terms—D.4.6 Security and Privacy Protection, J.9 Mobile Applications, K.4.1.f Privacy, K.4.1.g Regulation, k.4.1.h
Transborder data flow

—————————— ◆ ——————————

1 INTRODUCTION
THE distributed nature of today’s digital systems and ser- across the world [1], or shared between chains of third-
vices not only facilitates the collection of personal data party service providers [6], even without the app develop-
from individuals anywhere, but also their transfer to differ- er’s knowledge [7]. Second, apps are distributed through
ent countries around the world [1]. This raises potential global stores, enabling app providers to easily reach mar-
risks to the privacy of individuals, as the organizations kets and users beyond its country of residence. In this
sending and receiving personal data can be subject to dif- context, there is a need for constant vigilance by the vari-
ferent data protection laws and, therefore, may not offer an ous stakeholders, including app developers, supervisory
equivalent level of protection. In some regions, such as authorities, and app distribution platforms, to ensure that
China, privacy may be less valued, or valued differently, appropriate requirements have been met and to avoid
when compared to order and governance [2]. While in potential data protection compliance breaches.
other regions, particularly in the European Union (EU), pri- Nevertheless, testing or auditing mobile apps against
vacy is strenuously protected [3] and is conceived as a Hu- legal data protection requirements is challenging. First, it
man Right [4]. As a result, the General Data Protection Reg- is necessary to simplify the high-level legal requirements
ulation (GDPR) [3], constraints cross-border transfers (also into concrete step-by-step technical criteria and indicators
named international transfers) outside the European Eco- to be assessed in mobile apps. Second, parties responsi-
nomic Area (EEA)1 and recognises only twelve non-EU ble for checking compliance with data protection re-
countries as providing protection equivalent to the GDPR. quirements, such as supervisory authorities, require au-
Mobile applications, or just “apps”, exacerbate the data tomated assessment methods and tools to cope with the
protection compliance issues for organizations, notably vast and ever changing mobile ecosystem [5]. Even distri-
with requirements related to cross-border transfers. The bution platforms like Google Play Store can benefit from
particularities of the app development and distribution these automated assessment methods and thus extend
ecosystems are major factors underlying these issues [5]. their protection mechanisms, such as Google Play Protect,
First, apps collect a great amount of personal data, which which currently focuses on detecting potential harmful
may be transmitted from the device to data processors behaviour in the Android ecosystem only from the securi-
————————————————
ty perspective.
** Author's copy of the manuscript
Our main contributions to address this challenge are:
• Danny S. Guamán is with Universidad Politécnica de Madrid, Spain and Escuela 1) An automated method to assess compliance of An-
Politécnica Nacional, Ecuador. E-mail: danny.guaman@epn.edu.ec. droid apps with the cross-border transfer requirements
• Xavier Ferrer is with the Department of Informatics, King’s College London, established by the GDPR. It leverages our prior work on a
United Kingdom.
• Jose M. del Alamo is with Universidad Politécnica de Madrid, 28040 Madrid,
compliance assessment process [8], and further extends it
Spain. E-mail: jm.delalamo@upm.es with an automated approach to identify cross-border
• Jose Such is with the Department of Informatics, King’s College London, United transfer statements from natural language privacy poli-
Kingdom. E-mail:
cies. With an F-measure ranging from 85.7% to 100% in
1
The EEA includes all the EU Member States plus Norway, Iceland
and Liechtenstein. For the sake of clarity, we will use the term EU identifying the different cross-border transfer statements,
from now to refer to all of these countries.
our approach can be exploited to extract these privacy To fill this gap, in prior work [8] we defined an earlier
practices with a high degree of certainty. method for the compliance assessment of Android apps
2) A large-scale assessment of Google Play Store apps with GDPR cross-border personal data transfer require-
compliance with the cross-border transfer requirements of ments. This work supported the app behaviour analysis
GDPR. We leveraged our automated method to assess the through dynamic testing techniques. It also identified the
top-free 10,080 apps in Spain and the top 110 third-party specific requirements for the transparency elements to be
services they use. included in the privacy policies for the lawful disclosure of
the international transfer. However, the compliance as-
sessment process was not automated, as the interpreta-
2 RELATED WORK tion of the privacy policies required human analysis, and
The automated compliance assessment of Android apps thus did not scale. We have extended this prior work with
requires the analysis of both the privacy policy text and an automated approach to identify cross-border transfer
the app behaviour. statements from natural language policies. As a result, we
For the privacy policy analysis, automated approaches have been able to carry out an extensive assessment of
[9] [10] rely on the codification or annotation method [11], cross-border transfers in Google Play Store apps.
where one or multiple domain analysts generate struc-
tured annotations of privacy practices (i.e., a corpus) by
systematically assigning a label to the policy statements. 3 GDPR CROSS-BORDER TRANSFERS
Useful corpora have been released in the privacy domain As illustrated in Figure 1, there are specific criteria for
[10], [12]. These corpora ultimately are used as ground determining what must be considered a cross-border
truth for building automatic classification models. For transfer, to which the GDPR lays down further require-
example, Zimmeck et al. [10] automated the extraction of ments. These requirements include, in each case, the dis-
data collection practices from privacy policies, while closure of specific and meaningful information to data
Andow et al. [13] distinguished the entity (i.e. first-party subjects. Next, we briefly summarize the criteria, and refer
vs. third-party) to which personal data is sent. the interested reader to [8] for details.
Focusing on GDPR, Fan et al. [14] empirically assessed The criterion C1.1 determines whether personal data, in
transparency, data minimization, and confidentiality re- the meaning of GDPR [19], are sent to remote recipients
quirements in Android mHealth apps, checking whether (See DT enumeration in Fig. 1).
six different practices are informed through privacy poli- The criterion C1.2 determines whether an app targets
cies. Mangset [15] also checked GDPR requirements relat- EU citizen. We fairly assume that mobile apps available in
ed to transparency, data minimization (collection practic- the Google Play Store reachable from an EU country are
es), confidentiality (data at rest in transit), and some user indeed targeting EU users.
rights (particularly, consent and objection automatically The criterion C1.3 determines to which country person-
individual decision-making). Unfortunately, none of these al data are sent. Data transfers between EU countries do
works addressed cross-border transfer practices. not add further constraints or requirements.
As for the app behaviour analysis, researchers have lev- The criterion C1.4 distinguishes whether the servers lo-
eraged static, dynamic or hybrid techniques. Ferrara and cated outside the EU belong to the app provider itself ( i.e.
Spoto [16] relied on static code analysis to detect disclo- first-party recipient or data controller in GDPR terms), or
sures of personal data so that data protection officers another organization (i.e. third-party recipient). In the
could spot potential GDPR infringements. Jia et al. [17] former case, if the first-party recipient is also located out-
leveraged dynamic techniques to detect personal data side the EU then it must disclose the contact details of its
disclosures in network packets lacking user consent. While representative in the EU2 (T1 in Fig. 1). The latter is con-
our work focuses on different GDPR requirements, Jia’s sidered an international data transfer.
work could be seen as complementary to our work as the The criterion C1.5 seeks to determine the specific
app behavoiur analysis method could minimize the false- transparency requirements of the international transfer,
negative rate. namely adequacy or non-adequacy decision.
Finally, we consider Eskandari et al. [18] as the closest Twelve non-EU countries have been recognized as
related work with regard to the GDPR requirements cov- providing data protection equivalent to the GDPR and
ered. They propose PDTLoc, an analysis tool that employs therefore maintaining an adequacy decision (See ADC
static analysis to detect violations of article 25.1 of the EU enumeration in Fig. 1). International transfers to these
Data Protection Directive (European data protection law countries can take place without any further safeguards.
replaced by the GDPR). This Directive set requirements for However, to ensure transparency3 the app provider should
international transfers similar to those laid down in the disclose (1) the intention to transfer personal data to a
GDPR. However, this prior work presumes any transfer non-EU country, (2) the names of targeted countries, and
outside the EU to be a regulatory infringement, thus this (2) the existence of an adequacy decision by the Commis-
approach would have incorrectly identified potential sion [20] (T2 in Fig. 1).
compliance issues. The authors did not consider the priva-
cy policies as a means of disclosing the intention to per-
form cross-border transfers and the appropriate safe- 2
guards that do enable these transfers. GDPR Art. 27(1)
3
GDPR Art. 13(1)(f) and Art. 14(1)(f)
Extract data flow
<<enumeration>>
fA = (d, c, r) <<enumeration>> <<enumeration>>
transparencyNonEUFirstParty
adequacyDecisionCountry dataType
(T1)
Check criterion C1.1 (ADC) (DT)

Representative_Contact_Details d DT
Andorra Contact_Address_Book
[not meet]
<<enumeration>> Argentina Contact_E_Mail_Address
[meet]
transparencyAdequacyDecision Canada Contact_Phone_Number
(T2) Check criterion C1.2
A targets the EU Faroe Islands Identifier_Advertising_ID
Transfer_Intention [not meet] Guernsey Identifier_Cookie
Target_Countries [meet] Israel Identifier_Device_ID
Check criterion C1.3
Existence_Adequacy_Decision Isle of Man Identifier_IMEI
c EU
Japan Identifier_IMSI
<<enumeration>> [not meet]
transparencyNonAdequacyDecision Jersey Identifier_MAC
[meet]
(T3)
Check criterion C1.4 New Zealand Identifier_SIM_Serial
r is-a third-party recipient
Transfer_Intention Switzerland Identifier_SSID_BSSID

Target_Countries [first-party] Uruguay Location_Bluetooth

Disclose T1
Appropriate_Safeguards [third-party]
Location_GPS
Check criterion C1.5
Copy_Safeguards Location_WiFi
c ADC

[adequacy decision] [non-adequacy decision]

Disclose T2 Disclose T3

Fig. 1. Criteria for distinguishing the type of cross-border transfer performed by each app (A) targeting European Union (EU) data subjects.
Each flow of mobile application A (fA) is represented by the type of personal data (d), destination country (c), and type of recipient (r). In each
case (grey boxes), specific information (Ti) should be disclosed to data subjects, generally through privacy policies to ensure transparency.

International transfers not covered by an adequacy de- tain a copy of the safeguard(s) [20] (T3 in Fig. 1).
cision requires app providers to adopt “appropriate safe- In the absence of an adequacy decision or any appro-
guards” before carrying them out. GDPR defines assur- priate safeguards, some exceptions5 allow for internation-
ance mechanisms to enable international transfers in such al transfers in specific situations. We highlight the explicit
cases4, including Standard Data Protection Clauses, Bind- consent, which requires the consent through an affirma-
ing Corporate Rules, Approved Codes of Conduct and tive action of the data subjects, e.g., ticking a box, to be
Approved Certification Schemes. These assurance mecha- obtained after providing precise details of the internation-
nisms should be approved by the EU and, in general, al- al transfers. In this case, the data subject should also be
low the app provider to ensure that third-party recipients able to withdraw consent easily at any time.
have implemented appropriate safeguards to guarantee a
protection level equivalent to GDPR. The EU has adopted
four Standard Data Protection Clauses [21], which should
4 COMPLIANCE ASSESSMENT METHOD
be incorporated into contracts between the parties con- Assessing compliance of an app with cross-border trans-
cerned. Binding Corporate Rules should be established fers requires fundamentally three activities (privacy policy
when international transfers take place between compa- analysis, app behaviour analysis and compliance checking)
nies belonging to a corporate group and should be and two inputs (the privacy policy used for the privacy
signed with the Commission’s approval. To the best of our policy analysis and the Android application package APK
knowledge, the EU has not yet adopted any Code of Con- used for the App analysis) as shown in Fig. 2.
duct or Certification Scheme for GDPR. Finally, to ensure The privacy policy analysis parses the privacy policy of
transparency, the app provider should inform the data the app to extract the cross-border transfer practices
subjects about (1) the intention to transfer personal data disclosed by the app provider.
to a non-EU country, (2) the names of targeted countries,
(3) a reference to the appropriate safeguard(s) according
to the aforementioned options, and (4) the means to ob-

4 5
GDPR Chapter V GDPR Art. 49
App analysis Privacy policy analysis Compliance checking

Identify type of cross-

border transfer**

Download app Download

privacy policy
[non-EU data controller] [Non-adequacy decision]
Analyse app traffic Tag cross-border
and metadata transfer intention [Adequacy decision]

Check T1 Check T2 Check T3**

Identify personal Identify recipient Identify Tag target Tag appropriate Tag safeguard
data types country recipient type country safeguard copy reference

Report type of
consistency

Fig. 2. The automated pipeline that detects the app’s cross-border transfers and automatically checks them against the corresponding privacy
practices in its privacy policy. (**) These activities have been detailed in Fig. 1

In parallel, the app analysis part consists of installing TABLE 1. CROSS-BORDER TRANSFER ANNOTATION SCHEME.
and executing the application (APK) to observe its real Cross-border transfer type Required transparency elements
behaviour, particularly the type of personal data it leaks, T1. Transfer to non-EU data EU Representative contact infor-
the type of recipient who receives the data (i.e., first-party controller mation
or third-party recipient), the country in which the recipient Transfer intention
servers are located, and the information on the app digital T2. International transfer
Existence of EU adequacy decision
certificate that determines the app provider and its loca- (with adequacy decision)
Target country
tion.
Transfer intention
Finally, based on the practices disclosed through the
Target country
app privacy policy and the cross-border transfers it actual-
Appropriate safeguards:
ly performs, a compliance checking alerts of potential
T3. International transfer -Standard Data Protection Clauses
non-compliant behaviour.
(without adequacy deci- -Binding Corporate Rules
Section 4.1 details the privacy policy analysis, while
sion) -Approved Codes of Conduct
Sections 4.2 and 4.3 summarise the relevant aspects of the
-Approved Certification Schemes
app analysis and compliance checking, respectively. For
-Explicit consent
more details on the latter two, refer to [8].
Copy means
4.1 Automated Privacy Policy Analysis
4.1.1 Cross-border transfer intention classifier
In this section, we present our automated approach for
classifying cross-border transfer practices in privacy poli- This classifier tags each policy segment, roughly a para-
cies (Table 1). We relied on the IT-100 Corpus6 that con- graph, to indicate whether it discloses (1) or not (0) the
sists of one hundred privacy policies manually annotated intention to perform a cross-border transfer. We use the
by two privacy experts and contains 3,715 policy seg- IT-100 corpus to train a supervised machine learning (ML)
ments of which 281 segments contain transparency ele- algorithm and generate a binary classifier. We have fol-
ments of cross-border transfers. lowed the systematic process shown in Figure 4 to deter-
As illustrated in Fig. 3, individual transparency elements mine the best performance classification model.
are part of an entire policy segment disclosing a cross- Our business may require us to transfer your Personal Data to
border transfer practice. Therefore, we composed a two- countries outside of the European Economic Area (EEA), includ-
layer classification pipeline: a cross-border transfer inten- ing to countries such as the Peoples Republic of China or Singa-
tion classifier to identify entire policy segments disclosing pore. We take appropriate steps to ensure that recipients of
the intention to perform a cross-border transfer (Section your Personal Data are bound to duties of confidentiality and
4.1.1), followed by a transparency element classifier to we implement measures such as standard contractual clauses. A
identify the individual transparency elements disclosed copy of those clauses can be obtained by contacting our Help
(Section 4.1.2). The validation of the two-layer classifica- Center.
Fig. 3. Segments of the privacy policy of the
tion pipeline is presented in Section 4.1.3. net.manga.geek.mangamaster app. The segment discloses a typical
cross-border transfer practice, including the transfer intention (blue),
the target country (red), the appropriate safeguards (purple), and the
means to get a copy of such safeguards (green).
6
The corpus has been released at https://github.com/PrivApp/IT100-
Corpus
Experimental setting

Feature vector Importance weights Application of Stratified k-fold Experimental

composition assignment supervised ML validation setting selection

Fig. 4. The overall process to generate the cross-border transfer classification model.

a) Feature vector composition. We relied on the bag-of- b) Importance weight assignment. We experimented with
words model to define a set of candidate features based three different approaches to assigning weights to the
on all distinct terms in the IT-100 corpus policy segments. selected features: a binary counter (BC), the term frequen-
We tokenized the 3,715 policy segments and extracted a cy (TF) and the Term Frequency-Inverse Document Fre-
set of 9,239 candidate features. The rationale for this key quency (𝑇𝐹. 𝐼𝐷𝐹). The BC encodes the presence or ab-
assumption is that there is a distribution of individual sence of each feature. The 𝑇𝐹 encodes the number of
terms in cross-border transfer practices that is distinct times that each feature occurs in a policy segment. Third,
from the distribution of unrelated practices. Figure 5 de- the 𝑇𝐹. 𝐼𝐷𝐹 relies on the 𝑇𝐹 to encode the number of
picts the relative document frequency of the 30-top times that each feature occurs in a policy segment, but
terms7 that mostly appear in segments (1) disclosing the 𝐼𝐷𝐹 penalizes (decreases) it as the feature 𝑥𝑖 occurs
cross-border transfers practices and (2) disclosing other across many policy segments. Formally, the 𝑇𝐹. 𝐼𝐷𝐹 for
unrelated practices (e.g., collection or sharing practices).
𝑁
the feature 𝑥𝑖 is computed as 𝑇𝐹𝑖 . 𝑙𝑜𝑔 , where 𝑁 is the
𝑛𝑖
For illustrative purposes, we represent the 30-top terms of total number of IT-100 corpus policy segments, and 𝑛 is
each segment class in the same plot, which have then the number of policy segments that contain the feature 𝑥𝑖 .
been rearranged to distinguish the terms frequently used c) Application of an ML supervised algorithm. The above-
in both segment classes and the terms frequently used in mentioned feature vectors and their corresponding class
one of them. There is a subset of generic terms which are labels can be denoted as 𝑆 = {(𝑥1 , 𝑦1 ), … , (𝑥𝑛 , 𝑦𝑛 )}, where
frequently used in both segment classes. For example, the 𝑥𝑖 is the feature vector of the policy segment 𝑖, and 𝑦𝑖 ∈
term “information” appears in 64% of the cross-border {0,1} indicates the class label of the policy segment. In our
transfer segments and also in 58% of the unrelated seg- case, 1 implies the policy segment 𝑖 discloses the inten-
ments. The same applies to other terms such as “privacy”, tion to perform a cross-border transfer, and 0 its absence.
“data” or “protection”, which are generic and are not tied By using the training sample 𝑆, stemmed from the IT-100
to a specific privacy practice. On the other hand, a subset corpus, we used the Support Vector Machine (SVM) tech-
of terms is related to cross-border transfers (e.g., “trans- nique to find the optimal separation hyperplane that best
fer”, “country”, and “outside”). These terms appear mostly divides the dataset into the two classes mentioned. SVM
in the segments disclosing cross-border transfers and only has been empirically demonstrated better performance
marginally in unrelated segments. For example, the term over a variety of other ML techniques in high dimensional
“transfer” appears in 87% of the cross-border transfer spaces, being still effective in cases where the number of
statements but only in 4% of the other statements. dimensions is greater than the number of samples[22].
We pre-processed the 9,239 candidate features, remov- Also, prior work [9], [23] demonstrated that SVM can
ing number-related tokens, punctuation, non-ASCII char- reach higher performance than Logistic Regression and
acters, stop words, and duplicated tokens after normaliz- Convolutional Neural Network for privacy practices classi-
ing them to lowercase. This pre-processing removed up to fication.
32% of features worthless in distinguishing cross-border
transfers, leaving a total of 6,204 features.
Furthermore, while individual terms of policy segments
(e.g., “transfer”) are relevant for categorizing cross-border
transfer practices, other relevant term units built upon a
contiguous sequence of n terms can also be relevant (e.g.,
“approved contractual clauses”). Therefore, apart from the
individual terms, we relied on the n-gram model to parse
each policy segment into new composite features. We
experimented empirically to determine the value of n,
selecting the range of n-grams that provides the highest
performance metrics.

7
Before computing the relative document frequency of each term,
we removed morphological affixes from terms and took their roots Fig. 5. Relative document frequency of top-30 terms that appear in
by using the Porter Snowball Stemmer [25] in order to resolve the cross-border transfer segments (blue) and other unrelated segments
usage of inflected words. The full relative frequency distribution can (yellow).
be found in the replication package available at
http://dx.doi.org/10.17632/drx5nc3hr4
d) Stratified k-fold validation. The IT-100 corpus is an international transfers to the United States. Third, city
imbalanced dataset i.e. the number of samples tagged as names rather than country names are also used by a mi-
disclosing cross-border transfers is much lower than the nor number of privacy policies. Our approach, therefore,
number of samples not disclosing it. Accordingly, stratified involves a dictionary of countries, cities and aliases, whose
k-fold cross-validation was carried out, establishing k = 5. occurrences are sought in the pertinent policy segments.
The advantage of the stratified k-fold cross-validation is More specifically, we relied on the CountryInfo dataset8
that the entire dataset is used for both training and test- that provides details on all countries, including their ca-
ing while ensuring that policy segments tagged as dis- nonical names, country codes, as well as their states and
closing a cross-border transfer practice are represented provinces. We extended it by adding an ‘alias’ field to
consistently among all training and validation folds. register domain-specific terms implicating a particular
e) Results and experimental setting selection. We per- country. For example, “Privacy Shield” was added as an
formed an empirical evaluation of the effect of iteratively alias for the United States. Both policy segments disclos-
applying the different experimental settings explained ing a cross-border transfer and the country dictionary
above (Table 2). We relied on Scikit Learn [24] and the values were first normalised to lowercase. Then, if a non-
Natural Language Toolkit [25] to carry out all our experi- EU country, state, province, or alias from the dictionary
ments. In each case, we computed the standard perfor- occur in a policy segment, it is labelled with the identified
mance metrics to compare the models generated. Since country/s.
the IT-100 corpus is an imbalanced dataset, we primarily The results of applying this approach to a set of 117 IT-
used the F-measure instead of the accuracy metric. For 100-Corpus policy segments disclosing cross-border
models with comparable F-measures, we have favoured transfers are shown in Table 4. Overall, we observe that
the model with the highest recall for the sake of more the approach provides high performance in detecting
conservative analysis. That is, we seek to prevent a privacy target countries in policy segments. Admittedly, the num-
policy that does disclose the intention to perform a cross- ber of positive ground truth is minimum for certain coun-
border transfer from being tagged as not doing so. tries, but even in the case of the United States, which is
the most mentioned country, the method works strongly
TABLE 2. SETTING PARAMETERS USED TO BUILD THE CROSS- despite being disclosed in different ways. We observed
BORDER CLASSIFICATION MODELS. that a couple of policy segments were misclassified due to
Task Approaches typos in the country names or compound ways of refer-
Individual terms ring to countries (e.g. California-based), which scape the
Feature vector proposed approach. On the other hand, this approach
N-gram terms (1-4)
generation depends directly on the policy segments fed by the binary
Stemmed N-gram terms
classifier of cross-border transfers.
Binary counter
Importance weight Term frequency (TF)
TABLE 3. PERFORMANCE OF CROSS-BORDER CLASSIFICATION
assignment Term frequency-Inverse Document MODELS BY USING DIFFERENT WEIGHT ASSIGMENTS.
Frequency (TF.IDF)
Weighting N-gram Precision Recall F-measure
approach
Table 3 summarises the results. The experimental set-
ting that consistently provided better performance (F- BC 1 81.6% 80.8% 81.2%
measure of 90.9%) built on top of a feature vector of 1-2 83.4% 75.2% 79.1%
stemmed uni- and bigrams and TF as the weighting as- 1-3 87.0% 64.8% 74.3%
signment approach. Also, the best SVM parameters are 1-4 87.7% 57.6% 69.5%
the Modifier-Huber loss function and SVM alpha of 10-3. TF 1 90.1% 90.4% 90.2%
This setting has been used to build the definitive binary 1-2 89.9% 92.0% 90.9%
classification model to identify the entire policy segments 1-3 83.0% 94.4% 88.3%
disclosing the intention to perform a cross-border trans- 1-4 76.6% 94.4% 84.5%
fer. This is then placed at the entry to the transparency
TF-IDF 1 87.7% 88.0% 87.9%
element classifiers explained in the next section.
1-2 87.9% 88.8% 88.3%
4.1.2 Transparency elements classifiers 1-3 85.0% 89.6% 87.2%
The policy segments disclosing the intention to perform 1-4 84.7% 90.4% 87.5%
cross-border transfers are further processed to identify All these classification models were trained by considering stemmed
specific transparency elements, as per Table 1. features and different n-gram sizes. The experimental settings that
Target country classification. After analysing the IT-100 got the highest performance metric are distinguished in blue, while
Corpus, we observed that target countries are disclosed in the lowest performance metrics are distinguished in red.
three different ways. First, explicit country names or their
Appropriate safeguard and copy means. We built one
abbreviations (e.g. U.S.) are mostly disclosed in privacy
binary SVM classifier (Adequacy Decision) and four
policies. Second, some domain-specific terms were also
keyword-based rule classifiers (Standard Data Protec-
used to implicitly disclose the target countries. For exam-
ple, Privacy Shield is a certification framework that, until
16 July 2020, ensured an adequacy decision to perform 8
https://pypi.org/project/countryinfo
tion transfers are shown in Table 5. The binary adequacy deci-
TABLE 4. PERFORMANCE OF TARGET COUNTRY CLASSIFIER. sion classifier achieves high performance, only misclassify-
Country Precision Recall F-measure +/- ing policy segments ambiguously stated. The keyword-
Support based rule classifiers also allowed to correctly identify
Japan 100% 100% 100% 2/115 most of the disclosed transparency elements in the IT-100
Corpus policy segments. Admittedly, their generalization
South Korea 100% 100% 100% 1/116
may be hindered since their rules were built based on a
United Arab 100% 100% 100% 1/116
relatively small positive ground truth. We, therefore, per-
Emirates
formed a further evaluation on a subset of unseen privacy
Australia 100% 100% 100% 3/114
policies as explained next.
Singapore 100% 100% 100% 6/111
India 100% 100% 100% 1/116 TABLE 5. PERFORMANCE OF IDENTIFYING THE TRANSPARENCY
United 100% 97% 99% 71/46 ELEMENTS ON A SET OF POLICY SEGMENTS (𝒏 = 𝟏𝟏𝟕) THAT
States DISCLOSE THE INTENTION TO PERFORM A CROSS-BORDER
TRANSFER.
China 100% 100% 100% 5/112
Canada 100% 100% 100% 1/116 Transparency Precision Recall F-measure +/-
It builds upon a set of policy segments (𝒏 = 𝟏𝟏𝟕) that disclose the element Support
intention to perform a cross-border transfer. The Support column Adequacy 97% 90% 94% 41/76
shows the number of ground truths of (+) the segments that disclose Decision
the corresponding target country and (-) the segments that do not Standard Data 100% 100% 100% 12/105
disclose any target country. Protection
Clauses
Clauses, Binding Corporate Rules, Explicit Consent, and Binding Cor- 100% 100% 100% 4/113
Means to get a copy of safeguards) to identify the other porate Rules
individual transparency elements besides the target coun- Explicit Con- 60% 100% 75% 4/113
try. We have not generated classifiers for Approved Certi- sent
fication and Approved Code of Conduct as they are not Copy Refer- 100% 100% 100% 5/112
disclosed in any IT-100 Corpus privacy policy. This makes ence
sense since, so far, the Commission has not yet adopted The Support column shows the number of ground truths of (+) the
any Code of Conduct or Certification Scheme for GDPR. segments that disclose the corresponding transparency element and
The binary SVM classifier to identify cross-border (-) the segments that do not disclose it. Adequacy decision results
transfer statements covered by an Adequacy Decision was relied on 3-fold stratified cross-validation.
built by following the same procedure explained in Sec-
tion 3.1.1. The best performance was achieved by a binary
4.1.3 Validation
SVM classifier built on top a uni- and bigram-based fea- We validated the two-layer classification pipeline, i.e., the
tures, and using a TF.IDF weighting approach. The details cross-border transfer intention classifier and the individual
on the performance achieved by other classifiers we tried transparency element classifiers. To this end, we took
are available in the accompanying replication package. advantage of the large-scale compliance assessment
On the other hand, due to the limited number of posi- method presented in Section 5, which automatically
tive ground truths and the higher performance compared tagged the privacy policies of 10,080 apps using the
to binary classifiers, we generated keyword-based rule aforementioned classifiers. A cluster sampling was con-
classifiers to identify the remaining four transparency ducted on these tagged privacy policies to randomly se-
elements. Our approach involves developing a set of rules lect a subset of 30 privacy policies, while ensured a bal-
leveraging that a scoped domain-specific vocabulary is anced number of each transparency element. These 30
used to refer them. The set of 117 IT-100-Corpus policy policies were manually annotated and used as ground
segments disclosing a cross-border transfer were analysed truth for the evaluation9. Since the actual compliance
by a privacy expert, who selected minimum phrases (2-5 checking is based on an entire privacy policy, we say that
terms) that captured the key terms of each transparency a privacy policy discloses a cross-border transfer practice
element. These phrases were first normalized to lowercase or a transparency element if at least one policy segment
and their grammatical roots by using Porter Stemmer and contains them.
then turned into a set of rules. For example, the rule (‘con- Table 6 shows the performance of the classifiers on the
tract’|’standard’) w/4 (‘model’|’clause‘) implies that the nor- 30 unseen privacy policies. As can be observed, the cross-
malized form ‘contract’ or ‘standard’ must occur before or border transfer intention classifier achieves the highest
after the normalised form ‘model’ or ‘clause’ by no more performance in identifying when this practice is disclosed
than 4 terms in the same sentence. If that rule is satisfied, as well as when is not disclosed in a privacy policy. That
a policy segment is labelled as disclosing Standard Data consistency is particularly important as it minimizes the
Protection Clauses. The same procedure was followed for carry-over of misclassifications into the transparency ele-
the other transparency elements. ment classifiers that follow the pipeline. With F-measures
The results of applying these classifiers to a set of 117 9
These annotated privacy policies can be found in the sheet as-
IT-100-Corpus policy segments disclosing cross-border sembled_30_validation.csv available at the replication package.
ranging from 94.4% to 100% in 3 out of 4 appropriate checking process to be compared with the practices ex-
measure classifiers, 90.9% in the copy means classifier, tracted from the app privacy policy.
and from 85.7% to 100% in the target country classifiers, We rely on dynamic analysis to observe the app behav-
we believe that our approach can be exploited to extract iour and extract its personal data flows. Personal data
these privacy practices at the level of transparency ele- flows can also be inferred from the app’s representations
ments with a high degree of certainty. or models by using static analysis [26]–[29]. However, we
We examined the misclassifications of the target coun- favour dynamic analysis to prioritise soundness over com-
try (United Arab Emirates) and the explicit consent classi- pleteness, as our goal is to extract actual evidence of
fiers, which have the lowest performance. The false nega- cross-border transfers carried out by an app. Furthermore,
tive in the country classifier is because the privacy policy we rely on app network interfaces as sources of behaviour.
used a non-standard country code (UAE instead of EA), Previous studies [30] have shown the prevalent usage of
which escapes the proposed approach. The explicit con- network interfaces over SMSs or short-range interfaces
sent classifier did not distinguish between tacit and ex- such as Bluetooth or NFC by mobile apps to communicate
plicit consent. This is an area that could be improved, externally. Thus, it is fair to assume that most cross-border
perhaps through an effort to extract more positive ground transfers occur naturally through the network. Figure 6
truths and then build a robust ML-based classifier. Never- sets out the overall data flow extraction process, which is
theless, the classifier achieves the highest recall, thus summarized below. Details can be found in [8].
avoiding pointing out a wrong compliance issue. Configuration. Based on the Google Play API, we au-
tomatically crawl and download the target mobile app
4.2 App Behaviour Analysis (APK) from Google Play Store. Once downloaded, we ex-
This process aims to analyse the behaviour of an Android tract the metadata from the APK digital certificate.
app and then extract the personal data flows in terms of Stimulation. Automated stimulation is based on a ran-
(i) the type of personal data; (ii) the type of recipient who dom strategy provided by the U/I Exerciser Monkey [31],
receives the personal data (i.e., first-party or third-party which provides better performance in terms of code cov-
recipient); and, (iii) the country in which the recipient erage compared to other approaches [32].
servers are located. This information feeds the compliance

TABLE 6. PERFORMANCE OF IDENTIFYING THE CROSS-BORDER TRANSFER INTENTION AND THEIR TRANSPARENCY ELEMENTS ON A
SET OF UNSEEN PRIVACY POLICIES (𝒏 = 𝟑𝟎).

Class Precision Recall F-measure NVP Specificity F-measure- +/-

negative Support
Cross-border Transfer 100% 100% 100% 100% 100% 100% 24/6
Transfer & Standard Data Protection 93.8% 100% 96.8% 100% 93.3% 96.6% 15/15
Clauses
Transfer & Adequacy Decision 89.5% 100% 94.4% 100% 84.6% 91.7% 17/13
Transfer & Binding Corporate Rules 100% 100% 100% 100% 100% 100% 5/25
Transfer & Explicit Consent 37.5% 100% 54.5% 100% 81.5% 89.8% 3/27
Transfer & Copy Means 100% 83.3% 90.9% 96.0% 100% 98.0% 6/24
Transfer & United States 85.0% 94.4% 89.5% 90.0% 75.0% 81.8% 18/12
Transfer & Canada 100% 100% 100% 100% 100% 100% 3/27
Transfer & Singapore 75.0% 100% 85.7% 100% 96.3% 98.1% 3/27
Transfer & Russia 100% 100% 100% 100% 100% 100% 3/27
Transfer & Japan 100% 100% 100% 100% 100% 100% 3/27
Transfer & Mexico 100% 100% 100% 100% 100% 100% 1/29
Transfer & China 75.0% 100% 85.7% 100% 96.3% 98.1% 3/27
Transfer & Azerbaijan 100% 100% 100% 100% 100% 100% 1/29
Transfer & Brazil 100% 100% 100% 100% 100% 100% 3/27
Transfer & Argentina 100% 100% 100% 100% 100% 100% 1/29
Transfer & Israel 100% 100% 100% 100% 100% 100% 1/29
Transfer & United Arab Emirates 0.0% 0.0% 0.0% 96.7% 100% 98.3% 1/29
Transfer & South Korea 100% 100% 100% 100% 100% 100% 3/27
Transfer & Australia 75.0% 100% 85.7% 100% 96.3% 98.1% 3/27
Transfer & Belarus 100% 100% 100% 100% 100% 100% 1/29
The Support column shows the number of ground truths of (+) privacy policies that disclose the corresponding transparency element and (-)
the privacy policies that do not disclose it. The binary SVM classifier of cross-border transfer built upon 1-2 grams-based feature vectors after
applying stemming and the TF weighting approach.
Remote server
App play store Controller Mobile device MITM Proxy Log repository Analyzer
domain
1: Get APK
and metadata
2: Save app's metadata

3: Install APK
4: Start capturing traffic

5: Start Monkey

6: Instrument Frida

Capture loop
7. Generate
fullTestingTime =True random events
8: HTTP(S) request 9: HTTP(S) request

10. Check flow-

app mapping
11: Save HTTP(s) packet
13: HTTP(S) response 12: HTTP(S) response

14: Analyse app traffic

Analysis loop
15. Get app HTTP(S) packet
[for each HTTP(s)
packet] 16. Determine
data type

17. Determine
18. Get app's metadata recipient country

19. Determine
20. Save results recipient type

Fig. 6. Overall process to extract cross-border personal data flows from apps.

Interception. This component is responsible for captur- bag of tokens representing the app and a bag of tokens
ing the app network traffic and storing it for further analy- representing the target domain. The former consist of the
sis. Traffic capture from the device’s network interface is second-level domain (SLD) and subdomains from the APK
built around a man-in-the-middle (MITM) proxy10, which name, the organisation name extracted from the digital
requires installing a self-signed CA certificate to become certificate used to sign the app, and the app name re-
trusted. We leveraged Frida11 to further bypass the most trieved from the Play Store. The latter consists of the SLD
common countermeasures to HTTPS interception e.g. and subdomains of the domain targeted by the traffic. A
certificate pinning. After configuring the mobile device to token matching is then made between the two bags, clas-
connect to the Internet through the MITM proxy, it is pos- sifying the domain as a first-party recipient if there is at
sible to capture both HTTP and HTTPS traffic from any least one token match. Domains not classified as first-
app. We further implemented a flow-to-app mapping party recipients were searched in webXray13; if found, they
component to filter in the traffic belonging to the target were classified as third-party recipients. This dataset14 has
app since each app is analysed independently. been created in the specific context of disclosing personal
Analysis. This component analyses each app traffic to data to third parties in the web and mobile ecosystem. It
determine (i) the type of personal data transferred by an maps individual target domains to the owner company
app, (ii) the country where the personal data recipient is and even to parent companies, including the country in
located, and (iii) the type of recipient who receives the which the headquarters are located and service category.
personal data (i.e., first-party or third-party recipient). For Domains not classified as a first- or third-party recipient
(i), we use string searching in the packets payload. It also were classified as unknown and excluded from further
searches for data encoded in Base64, MD5, SHA1 and analysis.
SHA256. For (ii), we relied on ipstack API12 to determine
the location of the servers receiving the app’s connections.
For (iii), we first performed a token matching between a
13
https://webxray.org/
14
As a further contribution of this study, we added 234 new do-
10
https://mitmproxy.org/ mains to the dataset maintained by a research community, available
11
https://frida.re/ at https://github.com/PrivApp/webXray_Domain_Owner_List
12
https://ipstack.com/ .
4.3 Compliance Checking The privacy policy of the app includes the statement
The final process aims to check whether the apps per- shown in Figure 8, which discloses the intention to trans-
forming cross-border transfers properly disclose them fer personal data (green). However, it does not reveal the
through their privacy policies. To this end, we consider target countries neither appropriate safeguards and the
four consistency types between the app’s personal data means to get a copy of such safeguards. Despite this app
flows and privacy policy statements, as illustrated below. provider appeals to the consent, the AAID is transferred to
startup.mobile.yandex.net before the user interacts with the
4.3.1 Full cross-border transfer disclosure app for the first time, nullifying any attempt to underpin
It implies that a privacy policy discloses all transparency the transfer by explicit consent, as explained in Section 3.
elements according to the type of cross-border transfer
actually carried out by an app. For illustrative purposes, “Some countries implement the EU Data Protection Directive
consider the “Viber Messenger” com.viber.voip app, owned (95/46/EC) regarding the transfer of information. If Tapvpn col-
by Viber Media. It has been installed +50,000,000 times lects and processes PI disclosed by you to us, by clicking the I
from the Google Play Store. For marketing purpose, it Accept button or otherwise accepting our Services, you consent
transfers the AAID (Android Advertisement Identifier) to, to such transfer”
inter alia, app.adjust.com, whose servers are located in the Fig. 8. International transfer statement of the pm.tap.vpn app.
United States (US). The domain adjust.com is owned by the
third-party recipient Adjust, which is based in the US.
4.3.3 Inconsistent cross-border transfer disclosure
The privacy policy of the app includes the statement
shown in Figure 7, which fully discloses the cross-border It implies that a privacy policy includes statements that
transparency elements. That is, the transfer intention contradict the cross-border transfers actually carried out
(green); target country (yellow); appropriate safeguard - by an app. For illustrative purposes, consider the “Kids
Standard Data Protection Clause and Binding Corporate Learn Professions” (com.forqan.tech.Jobs) app, owned by
Rules (blue); and the means to get a copy of these safe- Jobs Match, that has been installed +10,000,000 times from
guards (grey). Therefore, in this specific case, we classify the Google Play Store. For analytics and advertisement
this app as a full cross-border transfer disclosure. purposes, it transfers the AAID and a Fingerprinting identifier
to, inter alia, the domain ads.api.vungle.com, whose servers
“International Transfer. We operate internationally and provide are located in the US. This domain is owned by the third-
our Services to Viber users worldwide allowing them to com- party recipient Vungle based in the US.
municate with each other across the globe. That means that your The privacy policy of the app includes the statement
personal information may need to be processed in countries shown in Figure 9. It properly discloses all three transpar-
where data protection and privacy regulations may not offer the ency elements, i.e., the intention to transfer personal data
same level of protection as in your home country. We store and (green) to a country (yellow) covered by an adequacy
process your personal information on our computers in the Unit- decision (blue). However, an international transfer is actu-
ed States, Asia, Europe (including Russia), Australia and Brazil, and ally made to the US. Therefore, in this specific case, we
use service providers that may be located in various locations classify this app as inconsistent cross-border transfer dis-
outside of the European Economic Area (EEA).We have put in closure.
place appropriate safeguards (such as contractual commitments)
in accordance with applicable legal requirements to ensure that “The Analytics Information we collect will be processed in Israel
your data is adequately protected. For more information on the which is recognized by the European Commission as having
appropriate safeguards in place, please contact us at the details adequate protection for personal data. If we transfer your Analyt-
below. As part of the Rakuten Group, Viber relies on the Rakuten ics Information from the EU to other jurisdictions, we will do so
Group Binding Corporate Rules to legitimize international data by using adequate safeguards determined by the EU Commis-
transfers within the Group. The Rakuten Group Binding Corporate sion.”
Rules can be found at Fig. 9. International transfer statement of the com.forqan.tech.Jobs
app.
https://corp.rakuten.co.jp/privacy/en/bcr.html
Fig. ”.
7. International transfer statement of the com.viber.voip app.

4.3.4 Omitted cross-border transfer disclosure

4.3.2 Ambiguous cross-border transfer disclosure
It implies that a privacy policy does not include any trans-
It implies that a privacy policy includes only a subset of
parency element when an app performs a cross-border
the transparency elements required by the GDPR accord-
transfer. For example, consider the “PrimalCraft”
ing to the type of cross-border transfer carried out by the
(com.tellurionmobile.primalcraft) app that has been installed
app. The missing transparency elements are either includ-
+10,000,000 times from the Google Play Store. It transfers
ed in an ambivalent manner or not included at all. For
personal data to 13 different third-party recipients, includ-
example, consider the “TapVPN Free VPN” (pm.tap.vpn)
ing transfers of the GPS Location and AAID to the domain
app, owned by Smart Media, which has been installed
sdk-android.ad.smaato.net, owned by the advertisement
+10,000,000 times from the Google Play Store. For analyt-
company Smaato and hosted in servers located in the USA.
ics purposes, it transfers the AAID to, inter alia,
Furthermore, it transfers GPS Location, SSID Wi-Fi information
startup.mobile.yandex.net, whose servers are located in Rus-
and MAC address information to the domain ad.mail.ru, a
sia. This domain is owned by the third-party recipient
social network owned by the company Mail.ru based in Rus-
Yandex LLC that is based in Russia.
sia. After searching for a policy statement describing at Since the classification models presented in section 4.1
least the intention to make a cross-border transfer, it was operate based on policy segments, each privacy policy
not found at all. We, therefore, classify this app as omitted was broken down accordingly, using the full stop as a
cross-border transfer disclosure. paragraph separator. Each policy segment was finally pre-
processed and then fed into the classification models for
extracting the cross-border transfer practices.
5 GOOGLE PLAY STORE APPS ASSESSMENT
Apps selection. To have a representative sample of the
To advance the fundamental understanding of GDPR apps mostly used by EU data subjects, the main criterion
compliance in the Android ecosystem, we carried out a guiding the selection of apps was their popularity within
compliance assessment of cross-border transfers of the an EU country. We relied on Google Play Store's categori-
free, most-popular 10,080 apps from the Google Play sation of the top free apps in Spain to download a set of
Store in Spain. In particular, after defining the experi- 10,080 apps17, which have been highly downloaded as
mental environment set-up (Section 5.1), we examined shown in Table 7. These set of apps are distributed across
how many apps conducted cross-border transfers and the different categories available on Google Play Store
how many of them disclosed properly such practices (Fig. 11a). Furthermore, based on the Issuer Locality field
through privacy policies according to the type of cross- of digital certificates used to sign the apps, we observe
border transfer (Section 5.2), and discuss our findings that apps have been supposedly signed by app providers
(Section 5.3). coming from 115 different countries, thus ensuring diver-
sity in the geographical location of app providers. Inter-
5.1 Experimental Environment
estingly, a vast majority of apps (65%) have been signed
We conducted a controlled experiment by using the as- by non-EU app providers, in particular by providers based
sessment method presented in Section 4. Both the An- in the United States, while only the 8% of apps were
droid mobile apps and their privacy policies were down- signed by EU app providers (Fig. 11b). The remaining 27%
loaded and tested between 20 July and 22 August 2020 of apps do not provide Locality Issuer information in digi-
from Spain. The apps were installed and tested on five tal certificates.
mobile devices: 3 Xiaomi Redmi 7a (API 28) and 2 Xiaomi
Redmi 5 (API 25). Each app was run for 10 minutes con- TABLE 7. DISTRIBUTION OF APPS PER DOWNLOADS
sidering two phases: idle stage (i.e., without user interac-
# downloads # apps (%)
tion) and active stage (i.e., with user interaction). As illus-
< 10.000 539 (5.35%)
trated in Figure 10, traffic was captured for 2 minutes
without any user interaction (idle stage). Then, a monkey 10.000+ 997 (9.89%)
was configured to interact with the application for 8 50,000+ 739 (7.33%)
minutes by generating 5000 events (active stage) in that 100.000+ 2,158 (21.41%)
time span. Before starting the active stage, all permissions 500.000+ 1,045 (10.37%)
requested by the app were automatically granted. 1.000.000+ 2351 (23.32%)
5.000.000+ 859 (8.52%)
Active
Idle stage 10.000.000+ 1,068 (10.60%)
stage
2 mins 50.000.000+ 324 (3.21%)
8 mins
App is Monkey starts App is
App is
Mobile installed
launched interacting removed App’s data flow dataset. A total of 339,447 data flows
device
with the app generated by the 10,080 apps have been logged. Each log
boots
includes the app name, app version, capture stage (idle or
Fig. 10. Testing timeline active), target domain, target country, and personal data
type disclosed (if any). From these flows, 262,253 flows
On the other hand, we relied on the privacy policy URLs (77.2%) have disclosed at least one of the personal data
from the console of Google Play Store to download types to 2,041 unique fully-qualified domain names with
plaintext of privacy policies, as the European Data Protec- 1,309 unique second-level domains (SLDs). These SLDs are
tion Board (EDPB) explicitly states that, for apps, the nec- hosted on servers located across 38 different countries, 17
essary should be made available from an online store EU countries and 21 non-EU countries hosting 165 (12%)
prior to download [20]. After retrieving the privacy policy and 1,183 (88%) SLDs, respectively. Note that the sum of
URLs using the Google Play API, we download the content both exceeds the aforementioned 1,309 unique SLDs
of each policy using the Selenium WebDriver15 into a because 39 of them are hosted on servers located in both
headless Chrome browser, allowing the execution of dy- EU and non-EU countries.
namic content such as JavaScript. Non-English privacy Privacy policy dataset. Of the 10,080 apps finally se-
policies16 and their apps were excluded from the analysis. lected, 254 (2.5%) have not published their privacy policy
URL, while 165 (1.6%), although published, were not
15
Selenium Chrome WebDriver available at reachable at harvest time. The remaining 9,661 (95.9%)
https://www.selenium.dev/documentation/en/webdriver [Accessed:
23-Mar-2020]
16
LangDetect was used to determine whether the majority of a 17
Actually, an initial set of 10,470 apps were considered, but 390
privacy policy was written in English. Available at were excluded as their privacy policies are not written in English,
https://pypi.org/project/langdetect/ [Accessed: 23-Mar-2020] leaving the 10,080 considered in this study.
privacy policies were fed into the classification models 5.2 Results
irrespective of whether cross-border transfers were de- The number of apps that fulfil the criteria leading to the
tected during the testing time of their corresponding four types of cross-border transfers is detailed in Figure
apps. A total of 355,009 policy segments have been ana- 12. It was found that three quarters (7,579) of the apps
lysed. Of these, 21,122 contain transparency elements of transferred some type of personal data during the testing
cross-border transfer practices as shown in Table 8. Some period. From them, a subset of 1,508 apps (15% of total)
transparency elements are redundant within a privacy transferred personal data solely to other EU countries. As
policy. For example, the cross-border transfer intention is explained in Section 3, these do not imply further re-
disclosed in 8,408 different policy segments from only quirements in terms of GDPR. The remaining 6,071 (60%)
3,808 apps' privacy policies. apps transferred personal data outside the EU. This subset
of apps branches out into three groups that imply differ-
TABLE 8. NUMBER OF APPS AND STATEMENTS DECLARING THE ent transparency requirements: a small subset of 758 apps
TRANSPARENCY ELEMENTS OF CROSS-BORDER TRANSFERS. transferred personal data to non-EU first-party recipients
Transparency element Number of Number of (Section 5.2.1), a reduced subset of 75 apps transferred
apps statements personal data to third-party recipients covered by an ade-
Data controller/representative 2,285 (22.6%) 5,937 quacy decision (Section 5.2.2), and, finally, a substantial
Transfer intention 3,808 (37.7%) 8,408 subset of 5,665 apps transferred personal data to third-
Specific target country 2,808 (27.9%) 5,159 party recipients not covered by an adequacy decision
Existence of EU adequacy decision 940 (9.32%) 1,952 (Section 5.2.3). Note that some apps have performed
Standard Data Protection Clauses 831 (8.24%) 978
more than one cross-border transfer type. For example,
out of 6,071 apps that transferred personal data to a non-
Binding Corporate Rules 46 (0.46%) 54
EU country (C1.3.2 in Figure 12), a subset of 365 apps
Explicit consent 138 (1.37%) 166
transferred personal data to both non-EU third-party
Means to get a copy of safe- 370 (3.67%) 420
recipients and non-EU first-party recipients. Thus, the sum
guards
up of both subsets of apps (5,678 and 758) exceeds its
input in 365.

(a)

(b)
Fig. 11. Distribution of (a) categories and (b) providers’ locality for the apps analysed. Note that (b) is represented in a log-scale axis and due
to space limitations only the top-45 countries are shown.
n=10,080

Extract data flow

fA = (d, c, r)

Check criterion C1.1

d DT

[not meet]
[meet] 7,579 (75%)
Check criterion C1.2
A targets the EU
[not meet]

[meet] 7,579 (75%)

Check criterion C1.3
c EU

[not meet] 1,508 (15%)

[meet] 6,071 (60%)

Check criterion C1.4
r is-a third-party recipient

FD: 248 [first-party] 758 (8%)

Disclose T1
OD: 510
[third-party] 5,678 (56%)
Check criterion C1.5
c ADC

[adequacy decison] 75 (0.7%) [non-adequacy decision] 5,665 (56%)

FD: 1 FD: 176
AD: 20 AD: 2,206
Disclose T2 Disclose T3
ID: 32 ID: 23
OD: 22 OD: 3,260

Fig. 12. Number of apps performing cross-border transfers. Each app transfer has been classified as full cross-border transfer disclosures
(FD), ambiguous cross-border transfer disclosures (AD), inconsistent cross-border transfer disclosures (ID) or omitted cross-border disclo-
sures (OD). The % is relative to the total number of apps (n=10.080).

To perform a more in-depth analysis, we relied on the

5.2.1 Transfers to non-EU data controllers Issuer Locality information of each app digital certificate
During the testing time, a subset of 758 apps performed to identify the app provider location and thus distinguish-
cross-border transfers to domains belonging to them but ing those that appear to be established within the EU
hosted on servers outside the EU 18. In particular, the Unit- from those established outside the EU. To be sure of the
ed States is the country that hosts the vast majority (89%) reliability of this assumption, we manually checked the
of first-party recipient domains. The remaining 11% of consistency between the supposed issuer country extract-
these apps host their domains in 13 different countries, ed from 40 randomly selected apps and the organisation's
including Canada, China, Singapore, Russia, Iran, Japan, country (if any) disclosed in their respective privacy poli-
India and Australia. cies, finding only 1 inconsistency20.
Nearly a third (248) of these apps inform accordingly As a result, a remarkable 76% of apps providers (579)
about an EU representative or data controller19 and there- would be established outside the EU, compared to a small
fore have been classified as full cross-border transfer dis- 7% of apps (129) within the EU, while the Issuer Locality of
closures (FD), while the remaining 67% ommitted cross- the remaining 17% (129) of apps is missing from their
border transfer disclosures (OD). digital certificates. Focusing on the 579 non-EU app pro-
viders, a substantial 70% (403) of apps omitted the disclo-
18
Details on non-EU data controller cross-border transfers can be
found the T1_results.csv available at the replication package.
19
Strictly speaking, it is only a representative, but we have not no- 20
Details can be found in the Checking_issuer_locality.csv sheet
ticed that some privacy policies refer to it as an EU data controller. available at the replication package.
sure on an EU representative/data controller in the EU in More specifically, first, only one app discloses all the
their privacy policies, thus raising a potential compliance three transparency elements and therefore have been
issue. Four key different app providers attitudes can be classified as full cross-border transfer disclosures (FD).
stemmed after analyzing in-depth some involved privacy Second, a subset of 20 apps (27%) has been classified as
policies. ambiguous cross-border transfer disclosures (AD), as de-
Several app providers do disclose the intention to per- spite they disclose the intention to perform a cross-border
form a transfer outside the EU to their premises in a spe- transfer, they fail to inform the target countries. The EC
cific country but appeal to tacit consent as a unique ena- has recognized twelve different countries as offering an
bling mechanism to perform a cross-border transfer, adequate level of data protection, so it is ambiguous to
which is not valid under the GDPR. For example, the flip- which countries personal data is transferred. Third, about
board.app, a News & Magazine app that has been down- 43% (32) of apps have been classified as inconsistent
loaded more than 50,000,000 times, resorts to the follow- cross-border transfer disclosures (ID), because despite
ing statement: “As a California-based company, we store disclosing the transparency elements required, there is a
and use personal data outside the EU. By using our web- disagreement between the countries to which the data are
sites or submitting your personal data, you consent to actually transferred and the countries disclosed in their
such transfer, storing and processing ”. privacy policies. Finally, about 30% (22) of apps omitted
Second, some app providers explicitly state that the disclosure of cross-border transfer practices at all and
underlying service targets a specific country market, other have been therefore classified as omitted cross-border
than EU, although disclaims liability for any cross-border transfer disclosures (OD).
transfers of personal data of those who still choose to use
them. For example, the com.hulu.plus, a popular video 5.2.3 Transfers not covered by adequacy decision
stream app that has also been downloaded more than The majority of apps (5,665) transferred personal data to
50,000,000 times, disclaims the following: “ Hulu is head- third-party recipients located in countries that are not
quartered in the U.S. and the Hulu Services are intended covered by an adequacy decision22. In particular, we found
for users in the U.S. By viewing any Content or otherwise that the vast majority of these apps (96%) performed
using the Hulu Services, you consent to the transfer of cross-border transfers to the United States. The remaining
information to the U.S. to the extent applicable, and the 4% of apps targeted third-party recipients located in nine
collection, storage, and processing of information under different countries: Russia (139), Iran (22), China (20), Sin-
U.S. laws.”. gapore (11), South Korea (6), India (4), Australia (1), South
Some app providers inform about the collection of per- Africa (1), and Ukraine (1).
sonal data but omit any applicable regulations at all, so As explained in Section 3, four transparency elements
they cannot be expected to designate an EU data repre- should be disclosed to data subjects when such transfers
sentative or controller. take place, namely transfer intention, names of targeted
Finally, a minor number of apps (19) still rely on the Pri- countries, appropriate safeguards, and a means to get a
vacy Shield framework as an enabling mechanism to per- copy of such implemented safeguards. However, a re-
form a cross-border transfer without further constraints, duced 3% (176) of these apps have disclosed the four
as it was covered by an EU adequacy decision. Yet, as aforementioned transparency elements through their
already mentioned, it was invalidated by the EU Court of privacy policies and therefore have been classified as full
Justice on July 16, 2020. cross-border transfer disclosures (FD).
Besides, less than 1% (23) of these apps do disclose the
5.2.2 Transfers covered by adequacy decision intention to perform a cross-border transfer but there is a
A small set of 75 apps transferred personal data to third- disagreement between the countries actually targeted by
party domains hosted on servers located in countries transfers and the countries disclosed in the apps’ privacy
covered by an EU adequacy decision21. In particular, Japan policies. Therefore, they have been classified as incon-
(43 apps), Canada (29), New Zealand (3), and Argentina (1) sistent cross-border transfer disclosures (ID).
were targeted by the apps involved. Since all these coun- Also, 36% (2,206) of these apps disclose the intention
tries maintain an adequacy decision, which is an assurance to perform a cross-border transfer but omit one or more
mechanism to ensure a protection level equivalent to of the other three transparency elements and therefore
GDPR, any transfer can be made without any further safe- have been classified as ambiguous cross-border transfer
guards. However, as explained in Section 3, three trans- disclosures (AD). More specifically, the majority of these
parency elements should be disclosed to data subjects apps’ privacy policies (1,698) fail to inform on the appro-
when such transfers take place, namely transfer intention, priate safeguards or the means to get a copy of them.
names of targeted countries, and the existence of an ade- Interestingly, 411 of these apps performed transfers to the
quacy decision itself. On balance, we found that 70% of United States and still rely on the Privacy Shield frame-
applications disclose to some extent the intention to per- work as an enabling mechanism to perform a cross-
form cross-border transfers to countries covered by an border transfer without further constraints. This frame-
adequacy decision, while the remaining 30% omitted it at work was invalidated by the EU Court of Justice on July 16,
all. 2020, as it does not provide an adequate level of protec-

21
Details on adequacy decision-based cross-border transfers can 22
Details on non-adequacy decision-based cross-border transfers
be found the T2_results.csv available at the replication package. can be found the T3_results.csv available at the replication package.
tion, and therefore these apps fall into non-compliant short banner of 144 words. Information about privacy
ones. Another remarkable aspect is that several apps (360) practices, including cross-border transfer practices, is
disclose the implementation of appropriate safeguards practically non-existent, but their underlying services have
(almost all of them through the establishment of Standard been bundled in some way into 26 different apps that
Data Protection Clauses) but fail to provide data subjects transferred personal data to servers outside the EU.
a means to obtain a copy of these safeguards ( e.g., an
email or download URL). Also, a concern that arises for all 5.3 Discussion
types of cross-border transfers is that the privacy policies A substantial 56%25 of analysed apps are potentially non-
of several applications use ambivalent statements, such as compliant with the GDPR cross-border transfer require-
“countries around the world”, “outside the EEA” or “any ments. Despite efforts to ensure cross-border transfers
country in which we do business”, to refer to the targeted through GDPR, the results clearly reveal that there is still a
countries. very significant gap between what app providers and
Finally, a significant 57% (3,260) of apps omitted cross- third-party recipients do in practice and what is intended
border transfer disclosure as neither the transfer intention, by GDPR. The results show that 56% of popular apps in an
the recipient countries, nor the appropriate safeguards EU country are potentially non-compliant with GDPR
were disclosed by their privacy policies. cross-border transfer requirements. In particular, 32% of
mobile apps do not disclose these practices at all, while
5.2.4 Who are the third-party recipients? the remaining 26% partially disclose them in an ambigu-
We further analyzed the third-party recipients targeted by ous and/or inconsistent way.
apps performing cross-border transfers (i.e., those 5,678 We argue that two main concerns inherent to mobile
apps that meet criterion C1.4.2 in Fig. 12). Out of 312 apps, consistent with previous work [1], impact on com-
different third-party recipients, as expected, a vast majori- pliance issues for cross-border transfer and leave a long
ty (94%) are indeed headquartered across 17 different way to go, namely app providers unawareness and lack of
non-EU countries. Figure 13 shows the top-40 third-party transparency from third-party services.
domain owners that have been targeted by apps perform- First, app providers may be unaware of the cross-
ing cross-border transfers during our testing. Taking ad- border transfers performed by third-party libraries they
vantage of the classification models presented in Section embed. The apps studied transferred personal data to
4, we examined the privacy policies of the top-110 third- third-party recipients, who provide a variety of services
party recipients23. predominantly related to advertising and analytics. On
Around 12% (13) of these third-party recipients, which average, each app performed international transfers to
collectively have been targeted by 1,551 apps, do not 2.35 different third-party services. These services often
even report the intention to perform a cross-border trans- require third-party libraries (TPL) to be embedded in the
fer. app code. While app developers may be expected to
A significant 52% (57) of them, which overall have been completely understand the practices of these TPLs before
targeted by 6,852 apps (note that certain apps may trans- including them in their apps, the evidence indicates that
fer data to more than one third-party recipient), are still this is not always the case [1].
appealing to the Privacy Shield framework24 as an ena- Second, some third-party services lack transparency on
bling mechanism to perform a cross-border which is not cross-border transfers. TPLs are expected to transparently
valid at all. More importantly, 30 of them use this frame- inform their cross-border transfer practices through priva-
work as a unique assurance mechanism becoming poten- cy policies or terms of service. Thus, app providers who
tially non-compliant. embed them could, in turn, include such practices in their
Unlike mobile apps, which ambiguously disclose the privacy policies. Some TPLs actually require developers to
target countries, we positively observe that a significant explicitly disclose their integration into the app’s privacy
78% (86) of third-party recipients do disclose this infor- policy and to obtain explicit consent as per GDPR [33].
mation explicitly. However, only half of them (40) inform However, as revealed, other TPLs are either ambiguous or
on the appropriate safeguards implementations that ena- do not even provide a privacy policy. Specifically, 12% of
ble a cross-border transfer. the top-110 third-party services do not provide infor-
We also emphasize that the aforementioned results mation on their international transfer intentions, and the
correspond to the available privacy policies. However, in 36% that do provide it fail to do it transparently.
some cases, finding the privacy policies of third-party Explicit consent is a fair enabler of cross-border trans-
recipients can be a challenging process, as they are not fers, but it is being misused. In the absence of an adequa-
centralised like mobile apps. Several of them are not cy decision or any appropriate safeguards, the explicit
available at all, which denotes the poor transparency prac- consent26 could also enable cross-border transfers. Never-
tices of the services that are being embedded by app theless, explicit consent requires a clear affirmative action
developers. Just to give an example, the globalcampaign- of the data subjects, e.g., ticking a box, to be obtained
tracker.com and aawrnstrk.com domains only show a
25
Note that that some apps have performed more than one type of
23
Details on the results of the top-110 third-party recipient privacy cross-border transfer. Therefore, an app has been classified as fully com-
policies can be found in the TPL-110.csv sheet available at the repli- pliant only if all individual transfers have been classified as full cross-
cation package, and the full dataset of third-party recipients at border transfer disclosures. Detailed results of each app can be found at
https://github.com/PrivApp/webXray_Domain_Owner_List the replication package.
26
24
These privacy policies were downloaded in December 2020. GDPR Art. 49
after providing precise details of the international trans- sent.
fers. As such, explicit consent removes the possibility of
using the dark pattern of pre-ticked boxes or tacit con-

Fig. 13.
Distribution of the top-40
third-party domain owners targeted by apps that perform cross-border transfers. Area indi-
cates the headquarters country.

To observe the prevalence in disclosing (explicit) con- framework. US-based app providers extensively used this
sent as an enabler of cross-border transfers, we further framework, as evidenced in this study, which allowed
analyzed the apps that meet criterion C1.4.2 (Fig. 12)27. In these providers to legally target EU data subjects and
particular, we observed 1,826 privacy policies that disclose transfer personal data between the EU and the US. How-
the intention to perform a cross-border transfer but not ever, this framework was invalidated by the Court of Jus-
the appropriate safeguard implementation. The result is tice of the EU on July 16, 2020 [34]. The automated ap-
that around 30% (555) of them resorts to the consent as proach presented in this paper has enabled a survey on
an enabler of cross-border transfers. We randomly select- compliance with GDPR cross-border transfers, finding the
ed a subset of 50 privacy policies and examined the con- still prevalent use of the EU-US Privacy Shield in the priva-
sent-labelled statement. Interestingly, in all cases, app cy policies of both apps (Section 5.2.3) and third-party
providers appealed to tacit consent stating that the usage recipients (Section 5.2.4) despite its invalidity. This shows
of the app by the data subject implies a consent to trans- the relevance of our approach in a landscape that is con-
fer personal data outside the EU. In this line, we observed stantly evolving.
that most of them (458) performed cross-border transfers In the same line, fully automated methods and tech-
during the idle testing stage, i.e., before the user interacts niques offer a viable alternative for large-scale assessment
with the app, thus nullifying any attempt to underpin the of apps compliance with different requirements. The open
transfer by explicit consent. As already mentioned, while model of the Android mobile ecosystem allows a vast
the consent is, in fact, a GDPR legal basis to perform a number of apps to be offered globally by developers eve-
cross-border transfer, it requires a clear affirmative action rywhere. Hence, manual review by control authorities or
of data subjects consenting it before it takes place [3]. distribution platform operators is impossible. The mean
Automated means for compliance assessment are key. time for downloading and tagging cross-border transfer
The need for automated methods and tools to evaluate practices in a privacy policy was 10 seconds, so with a
privacy requirements are essential in an evolving regulato- single instance up to 8,640 privacy policies can be tagged
ry landscape. During the course of this study, we coinci- in 24h. On the other hand, although in this study we have
dentally experienced a significant change in GDPR cross- combined the ML-based classification models with dy-
border transfer requirements. In 2016, the EU-US Privacy namic analysis techniques to obtain evidence of the actual
Shield was introduced as a limited adequacy decision to execution of apps, these can be combined with static
allow the transfer of personal data to US-based third party analysis techniques [10] for greater coverage, or with
recipients that were certified under the terms of this both static and dynamic, as needed. As such, we strongly
believe that our approach can be leveraged for large-scale
compliance scrutiny of cross-border transfers.
27
Details on the declaration of consent performed by these privacy
policies can be found in the T3_results.csv sheet available at the A supporting tool for developers. Whilst our proposed
replication package. method mainly aims to support audits by authorities or
distribution platforms, such as Google Play Store, it can unusual TLS certificate pinning implementations [36], and
also support app developers in ensuring the compliance sub-optimal coverage of app execution paths [37] are
of their apps or the third-party libraries they embed in some particular open orthogonal challenges to our pro-
their apps. It should be recognized that the mobile appli- posal, which can generate false negatives. Therefore, it
cation ecosystem is a mix of formal requirements estab- cannot ensure completeness and the results of fully com-
lished by prevailing regulations such as GDPR, along with pliant apps should not be misleading generalized. The fact
informal developer training. App developers can range that we have not observed a cross-border transfer during
from hobbyist to experienced professionals in large com- our testing period does not mean that an app will not
panies. As found in previous work [7], while large compa- definitely do so if its developers, e.g., use customized
nies may be able to form multidisciplinary teams to en- encoding mechanisms.
force legal requirements, small business developers may All in all, potential false negatives do not put at risk the
struggle to understand the privacy and data protection validity of the results of non-compliant apps, which is in
implications of their code. We consider that the necessary fact remarkably high (56%). The strength of dynamic anal-
multidisciplinary knowledge, including evaluation criteria ysis techniques is that evidence of non-compliant apps
supported by legal and not only technical interpretations, stem from real app behaviour and do not generate false
can be simplified into indicators that can be checked au- positives. Therefore, we consider that our proposal, as well
tomatically. This paper, as well as other related work [9], as the results, are valuable for app providers, app distribu-
[10], proved that (at least part of) such multidisciplinary tion platforms such as Google Play Store, and supervisory
knowledge can be embedded into GDPR automated as- authorities to detect the lower bound of non-compliance
sessment pipelines. While these automatic approaches do issues with GDPR cross-border transfers.
not act as infallible judges, they have the potential to alert
developers about possible non-compliance issues.
7 CONCLUSION AND FUTURE WORK
In this work, we presented a fully automated method to
6 THREATS TO VALIDITY assess compliance of mobile apps with the cross-border
Construct validity. The classification models build upon transfer requirements established by the GDPR. With an F-
the Corpus IT-100, which is an annotated dataset of legal measure ranging from 85.7% to 100% in identifying the
requirements laid down in the GDPR. Therefore, there is a different cross-border transfer transparency elements, our
risk that such a Corpus do not reflect the construct under approach can be exploited to extract these privacy prac-
study when moving legal requirements to the technical tices with a high degree of certainty and at scale.
domain. To mitigate this threat, the elaboration of the Also, we applied the automated compliance assess-
Corpus IT-100 was undertaken by privacy and data pro- ment method to determine the extent to which the apps
tection experts who comprehensively guided the building from the Google Play Store comply with the cross-border
of the annotation process, annotation scheme, and the transfer requirements of the GDPR. After evaluating the
simplified assumptions used during the annotation pro- top-free 10,080 apps from the Google Play Store in Spain,
cess of cross-border transfer practices in privacy policies. the results revealed that there is still a great gap between
Internal validity. If policies in languages other than Eng- what app providers and third-party services do in practice
lish were excluded, a bias towards the evaluation of non- and what is intended by GDPR. Notably, 5,646 (56%) apps
EU based applications could occur. Surprisingly, we ob- failed (completely or partially) to comply with the regula-
served that the providers of the applications mainly in- tions, either because their privacy policies include ambig-
clude privacy policies in English. Only 390 apps (3.7%) uous or inconsistent disclosures about cross-border trans-
published non-English policies exclusively, and only 80 fers, or they simply omit them. In addition, the results of
were in Spanish (0.76%). analysing the privacy policies of the top-110 third-party
Moreover, our automated privacy policy analysis ap- services, which were collectively targeted by 6,852 apps,
proach, like any approach based on statistical learning, revealed a significant 52% of them still relying on the
exhibits problems of misclassification that should be taken Privacy Shield Framework as an enabling mechanism to
into account. Thus, as pointed out in the different results perform cross-border transfers, almost six months after it
in section 4, although high F-measure values are achieved was invalidated.
(from 85.7% to 100%) there is the possibility of a small In a complex and evolving regulatory landscape auto-
number of other classifications. All in all, while it certainly mated methods and tools to evaluate privacy require-
does not act as infallible judges, we highlight the perfor- ments are essential to several stakeholders, including
mance of the cross-border transfer intention classifier supervisory authorities, distribution platforms and devel-
which does not exhibit any misclassification in a subset of opers. Our current efforts are aimed at extending the
randomly selected privacy policies, demonstrating its analysis of privacy policies disclosed in other languages.
potential to alert stakeholders of potential non- Likewise, we aim to extend our method to address other
compliance issues. requirements of the GDPR that have not yet been subject
External validity. Since the current implementation of of research efforts, in particular those related to automat-
the assessment method is built upon dynamic analysis ed decision-making.
techniques, it inherits the same limitation faced by them.
The use of non-standard encodings mechanisms [35],
ACKNOWLEDGMENT pp. 985–1002.
[14] M. Fan et al., “An empirical evaluation of GDPR compliance
This research has been partially supported by the CLIIP violations in android mHealth apps,” arXiv, pp. 253–264,
project (grant reference APOYO-JOVENES-QINIM8-72- 2020, doi: 10.1109/issre5003.2020.00032.
PKGQ0J) funded by the Comunidad de Madrid and Uni- [15] P. L. Mangset, “Analysis of Mobile Application’s Compliance
versidad Politécnica de Madrid under the V-PRICIT re- with the General Data Protection Regulation (GDPR),”
search programme ‘Apoyo a la realización de Proyectos de Norwegian University of Science and Technology, Norway,
I+D para jóvenes investigadores UPM-CAM’, and by the 2018.
Escuela Politécnica Nacional in Ecuador. [16] P. Ferrara and F. Spoto, “Static analysis for GDPR
Jose M. del Alamo (jm.delalamo@upm.es) is the corre- compliance,” in Proc. Italian Conference on Cybersecurity
sponding author. ITASEC2018, 2018, vol. 2058, pp. 1–10.
[17] Q. Jia, L. Zhou, H. Li, R. Yang, S. Du, and H. Zhu, “Who Leaks
My Privacy: Towards Automatic and Association Detection
REFERENCES with GDPR Compliance,” in Proc. 14th International
[1] A. Razaghpanah et al., “Apps, Trackers, Privacy, and Conference on Wireless Algorithms, Systems, and
Regulators: A Global Study of the Mobile Tracking Applications, 2019, pp. 137–148.
Ecosystem,” in Proceedings 2018 Network and Distributed [18] M. Eskandari, B. Kessler, M. Ahmad, A. S. de Oliveira, and B.
System Security Symposium, 2018, doi: Crispo, “Analyzing Remote Server Locations for Personal
10.14722/ndss.2018.23353. Data Transfers in Mobile Apps,” Priv. Enhancing Technol., pp.
[2] Tiffany Li and Zhou Zhou, “Do You Care About Chinese 118–131, Jan. 2017.
Privacy Law? Well, You Should,” IAPP, 2015. [Online]. [19] Article 29 Data Protection Working Party, “Opinion 02/2013
Available: https://iapp.org/news/a/do-you-care-about- on apps on smart devices,” European Comission, Brussels,
chinese-privacy-law-well-you-should/. [Accessed: 23-Feb- Belgium, 2013.
2021]. [20] Article 29 Data Protection Working Party, “Guidelines on
[3] European Parliament and the Council of the European transparency under Regulation 2016/679,” 2018. .
Union, “General Data Protection Regulation,” 2016. . [21] European Commission, Commission Decision of 27
[4] European Parliament and the Council of the European December 2004 amending Decision 2001/497/EC as regards
Union, Charter of Fundamental Rights of the European the introduction of an alternative set of standard
Union. 2012, pp. 391–407. contractual clauses for the transfer of personal data to third
[5] C. Castelluccia, S. Guerses, M. Hansen, J.-H. Hoepman, J. van countries (notified under document number C(200.
Hoboken, and B. Vieira, “A study on the app development European Comission, 2004.
ecosystem and the technical implementation of GDPR,” [22] J. M. Moguerza and A. Muñoz, “Support vector machines
Union Agency for Network and Information Security with applications,” Stat. Sci., vol. 21, no. 3, pp. 322–336,
(ENISA), European Union, 2017. 2006, doi: 10.1214/088342306000000493.
[6] J. Gamba, M. Rashed, A. Razaghpanah, J. Tapiador, and N. [23] S. Wilson et al., “Analyzing Privacy Policies at Scale,” ACM
Vallina-Rodriguez, “An Analysis of Pre-installed Android Trans. Web, vol. 13, no. 1, pp. 1–29, Feb. 2019, doi:
Software,” in Proc. IEEE Symposium on Security and Privacy 10.1145/3230665.
(SP), 2020, pp. 1039–1055. [24] F. Pedregosa et al., “Scikit-learn: Machine learning in
[7] R. Balebako, A. Marsh, J. Lin, J. Hong, and L. Faith Cranor, Python,” J. Mach. Learn. Res., vol. 12, pp. 2825–2830, 2011.
“The Privacy and Security Behaviors of Smartphone App [25] S. Bird, S. Bird, and E. Loper, “NLTK : The natural language
Developers,” in Proceedings 2014 Workshop on Usable toolkit NLTK : The Natural Language Toolkit,” Proc. ACL-02
Security, 2014, pp. 1–10, doi: 10.14722/usec.2014.23006. Work. Eff. tools Methodol. Teach. Nat. Lang. Process.
[8] D. S. Guaman, J. M. Del Alamo, and J. C. Caiza, “GDPR Comput. Linguist. 1, no. March, 2016.
Compliance Assessment for Cross-Border Personal Data [26] L. Batyuk, M. Herpich, S. A. Camtepe, K. Raddatz, A.-D.
Transfers in Android Apps,” IEEE Access, vol. 9, pp. 15961– Schmidt, and S. Albayrak, “Using static analysis for
15982, 2021, doi: 10.1109/ACCESS.2021.3053130. automatic assessment and mitigation of unwanted and
[9] S. Zimmeck et al., “Automated Analysis of Privacy malicious activities within Android applications,” in Proc.
Requirements for Mobile Apps,” in Proceedings 2017 International Conference on Malicious and Unwanted
Network and Distributed System Security Symposium, 2017, Software, 2011, pp. 66–72.
vol. 3066, no. 132, pp. 286–296, doi: [27] M. Junaid, D. Liu, and D. Kung, “Dexteroid: Detecting
10.14722/ndss.2017.23034. malicious behaviors in Android apps using reverse-
[10] S. Zimmeck et al., “MAPS: Scaling Privacy Compliance engineered life cycle models,” Comput. Secur., vol. 59, pp.
Analysis to a Million Apps,” Proc. Priv. Enhancing Technol., 92–117, 2016, doi: 10.1016/j.cose.2016.01.008.
vol. 2019, no. 3, pp. 66–86, 2019, doi: 10.2478/popets-2019- [28] G. Barbon, A. Cortesi, P. Ferrara, and E. Steffinlongo, “DAPA:
0037. Degradation-Aware Privacy Analysis of Android Apps,” in
[11] J. Saldana, The Coding Manual for Qualitative Researchers, Proc. 12th International Workshop on Security and Trust
3rd ed., vol. 1. CA, USA: Sage Publications, 2015. Management, 2016, pp. 32–46.
[12] S. Wilson et al., “The Creation and Analysis of a Website [29] S. Kelkar, T. Kraus, D. Morgan, J. Zhang, and R. Dai,
Privacy Policy Corpus,” in Proceedings of the 54th Annual “Analyzing HTTP-Based Information Exfiltration of Malicious
Meeting of the Association for Computational Linguistics Android Applications,” in Proc. 17th IEEE International
(Volume 1: Long Papers), 2016, pp. 1330–1340, doi: Conference On Trust, Security And Privacy In Computing
10.18653/v1/P16-1126. And Communications/ 12th IEEE International Conference
[13] B. Andow et al., “Actions speak louder than words: Entity- On Big Data Science And Engineering
sensitive privacy policy and data flow analysis with (TrustCom/BigDataSE), 2018, pp. 1642–1645.
policheck,” in Proc. 29th USENIX Security Symposium, 2020, [30] M. Lindorfer, M. Neugschwandtner, L. Weichselbaum, Y.
Fratantonio, V. Van Der Veen, and C. Platzer, “ANDRUBIS --
1,000,000 Apps Later: A View on Current Android Malware
Behaviors,” in Proc. International Workshop on Building
Analysis Datasets and Gathering Experience Returns for
Security (BADGERS), 2014, pp. 3–17.
[31] Android Developer’s Documentation, “UI/Application
Exerciser Monkey.” [Online]. Available:
https://developer.android.com/studio/test/monkey.
[Accessed: 23-Feb-2021].
[32] S. R. Choudhary, A. Gorla, and A. Orso, “Automated Test
Input Generation for Android: Are We There Yet? (E),” in
Proc. 30th IEEE/ACM International Conference on
Automated Software Engineering (ASE), 2015, pp. 429–440.
[33] Google Support, “Policy requirements for Google Analytics
Advertising,” 2019. [Online]. Available:
https://support.google.com/analytics/answer/2700409?hl=e
n. [Accessed: 23-Feb-2021].
[34] Court of Justice of the European Union, Judgment of the
Court (Grand Chamber) of 16 July 2020. 2020.
[35] I. Reyes et al., “‘Won’t Somebody Think of the Children?’
Examining COPPA Compliance at Scale,” Priv. Enhancing
Technol., no. 3, pp. 63–83, Jun. 2018.
[36] A. Razaghpanah, A. A. Niaki, N. Vallina-Rodriguez, S.
Sundaresan, J. Amann, and P. Gill, “Studying TLS Usage in
Android Apps,” in Proc. Applied Networking Research
Workshop, 2018, pp. 5–5.
[37] P. Patel, G. Srinivasan, S. Rahaman, and I. Neamtiu, “On the
effectiveness of random testing for Android: Or how i
learned to stop worrying and love the monkey,” Proc. - Int.
Conf. Softw. Eng., pp. 34–37, 2018, doi:
10.1145/3194733.3194742.

GDPR Is Coming:: How To Make Sure Your
100% (1)
GDPR Is Coming:: How To Make Sure Your
22 pages
Rews14relawmain Relawmainid6 P 24888 9f54173 21682 Preprint
No ratings yet
Rews14relawmain Relawmainid6 P 24888 9f54173 21682 Preprint
10 pages
WP2017 O-2-2-4 GDPR Mobile
No ratings yet
WP2017 O-2-2-4 GDPR Mobile
70 pages
MSC Presentation Sankar Harini - 10%
No ratings yet
MSC Presentation Sankar Harini - 10%
13 pages
Unit 4
No ratings yet
Unit 4
45 pages
MAPS - Scaling Privacy Compliance Analysis To A Million Apps
No ratings yet
MAPS - Scaling Privacy Compliance Analysis To A Million Apps
23 pages
Loki 2
No ratings yet
Loki 2
15 pages
GDPR Personal Data SOTICS VF Proceedings
No ratings yet
GDPR Personal Data SOTICS VF Proceedings
6 pages
Top X GDPR Assessment Tools
100% (1)
Top X GDPR Assessment Tools
17 pages
FTC Mobile Privacy Report
No ratings yet
FTC Mobile Privacy Report
36 pages
с обложкой wp202 - en PDF
No ratings yet
с обложкой wp202 - en PDF
30 pages
Complying With GDPR: An Agile Case Study
No ratings yet
Complying With GDPR: An Agile Case Study
7 pages
Scriptie Yildirim
No ratings yet
Scriptie Yildirim
105 pages
Barati 2020 Automating
No ratings yet
Barati 2020 Automating
6 pages
Dissertation Proposal
No ratings yet
Dissertation Proposal
6 pages
Privacy Between Regulation and Technology - GDPR and The Blockchain
No ratings yet
Privacy Between Regulation and Technology - GDPR and The Blockchain
13 pages
CTH Fyp 1
No ratings yet
CTH Fyp 1
29 pages
Guileak: Tracing Privacy Policy Claims On User Input Data For Android Applications
No ratings yet
Guileak: Tracing Privacy Policy Claims On User Input Data For Android Applications
11 pages
Data and GDPR Questions For Event App Providers Checklist
No ratings yet
Data and GDPR Questions For Event App Providers Checklist
1 page
Design Principles For The Genera Ldata Protection Regulation (GDPR)
No ratings yet
Design Principles For The Genera Ldata Protection Regulation (GDPR)
14 pages
Can Consumers Bank On Financial Services Being Secu 2018 Computer Fraud Se 3
No ratings yet
Can Consumers Bank On Financial Services Being Secu 2018 Computer Fraud Se 3
1 page
Scenarios and Quiz
No ratings yet
Scenarios and Quiz
7 pages
The Ultimate Guide To ECP-653 Ericsson Certified Professional - LTE RAN Solution
No ratings yet
The Ultimate Guide To ECP-653 Ericsson Certified Professional - LTE RAN Solution
3 pages
Cross
No ratings yet
Cross
16 pages
Principle Based Framework Towards Cross Border Data Flows The Dialogue
No ratings yet
Principle Based Framework Towards Cross Border Data Flows The Dialogue
62 pages
TIA and GDPR
No ratings yet
TIA and GDPR
20 pages
Neonpulse (R1 03)
No ratings yet
Neonpulse (R1 03)
7 pages
The Ultimate Guide To ECP-402 Ericsson Certified Associate - Ericsson 5G RAN
No ratings yet
The Ultimate Guide To ECP-402 Ericsson Certified Associate - Ericsson 5G RAN
3 pages
Secure Mobile Application Developments Critical Issues and Challenges IJERTCONV2IS02007
No ratings yet
Secure Mobile Application Developments Critical Issues and Challenges IJERTCONV2IS02007
4 pages
Modul Mobile Apps - Pdf10-Dikonversi
No ratings yet
Modul Mobile Apps - Pdf10-Dikonversi
27 pages
How To Think of Unique Project Ideas That Solve Real-Life Problems
No ratings yet
How To Think of Unique Project Ideas That Solve Real-Life Problems
4 pages
Mobile Terminals L2 Module 2
No ratings yet
Mobile Terminals L2 Module 2
18 pages
Understanding Data Transfers Under The GDPR (OneTrust One Trust) (Z-Library)
No ratings yet
Understanding Data Transfers Under The GDPR (OneTrust One Trust) (Z-Library)
17 pages
CNIL Recommendation Mobiles App
No ratings yet
CNIL Recommendation Mobiles App
93 pages
Product Marketer Assignment V2
No ratings yet
Product Marketer Assignment V2
10 pages
Using Artificial Intelligence To Support Compliance With The Genera
No ratings yet
Using Artificial Intelligence To Support Compliance With The Genera
13 pages
Mobile Data Privacy
No ratings yet
Mobile Data Privacy
3 pages
UNU TB - 3 2023 - Regulating Cross Border Data Flows
No ratings yet
UNU TB - 3 2023 - Regulating Cross Border Data Flows
7 pages
Core Regulations Overview
No ratings yet
Core Regulations Overview
4 pages
A Large Scale Privacy Assessment of Android Third-Party SDKs
No ratings yet
A Large Scale Privacy Assessment of Android Third-Party SDKs
16 pages
Third-Party Privacy Tools
No ratings yet
Third-Party Privacy Tools
5 pages
Mobile Device Security Essentials
No ratings yet
Mobile Device Security Essentials
10 pages
IAPP - Top10 Operational Responses To The GDPR
No ratings yet
IAPP - Top10 Operational Responses To The GDPR
55 pages
GDPR Future Banking Business
No ratings yet
GDPR Future Banking Business
4 pages
Dissertation Proposal New
No ratings yet
Dissertation Proposal New
6 pages
Assignement 4
No ratings yet
Assignement 4
6 pages
The Ultimate Guide To ECP-682 Ericsson Certified Professional - RAN Transport
No ratings yet
The Ultimate Guide To ECP-682 Ericsson Certified Professional - RAN Transport
3 pages
White Paper Mobile Data Privacy Pradeo2
No ratings yet
White Paper Mobile Data Privacy Pradeo2
17 pages
Class Activities - 2024 - GDPR International Transfer Quiz
No ratings yet
Class Activities - 2024 - GDPR International Transfer Quiz
5 pages
Manual On Cross-Border Industrial Data Management: January 27, 2025
No ratings yet
Manual On Cross-Border Industrial Data Management: January 27, 2025
33 pages
Analyzing The Impact of GDPR On Storage Systems
No ratings yet
Analyzing The Impact of GDPR On Storage Systems
7 pages
Module 2 App Testing Standards and Guidelines
No ratings yet
Module 2 App Testing Standards and Guidelines
24 pages
Cipp e
No ratings yet
Cipp e
5 pages
DroidData Tracking and Monitoring Data Transmission in The Android Operating System
No ratings yet
DroidData Tracking and Monitoring Data Transmission in The Android Operating System
15 pages
Playstore - 6010 (1) (27) - Pages
No ratings yet
Playstore - 6010 (1) (27) - Pages
62 pages
Deep Learning Using Rectified Linear Units (ReLU)
No ratings yet
Deep Learning Using Rectified Linear Units (ReLU)
7 pages
A Web Based Application For Automating Bank Loan Eligibility Using Machine Learning
No ratings yet
A Web Based Application For Automating Bank Loan Eligibility Using Machine Learning
43 pages
ML Unit 2
No ratings yet
ML Unit 2
22 pages
IR UNIT I - Notes
0% (1)
IR UNIT I - Notes
23 pages
CME Iceberg Order Detection Guide
No ratings yet
CME Iceberg Order Detection Guide
16 pages
Sentiment Analysis of Twitter Data Using TF-IDF and Machine Learning Techniques
No ratings yet
Sentiment Analysis of Twitter Data Using TF-IDF and Machine Learning Techniques
4 pages
IR Evaluation with TREC_EVAL
No ratings yet
IR Evaluation with TREC_EVAL
10 pages
Classification of Spam Emails Using Deep Learning
No ratings yet
Classification of Spam Emails Using Deep Learning
6 pages
Digital 02 00024 v2
No ratings yet
Digital 02 00024 v2
19 pages
Cross-Domain Recommendations Using Reviews
No ratings yet
Cross-Domain Recommendations Using Reviews
5 pages
Financial Risk Analysis Guide
No ratings yet
Financial Risk Analysis Guide
49 pages
Fundamentals of Machine Learning With QA
No ratings yet
Fundamentals of Machine Learning With QA
41 pages
BERT Model in Sentiment Analysis
No ratings yet
BERT Model in Sentiment Analysis
2 pages
Food Waste ML Thesis
No ratings yet
Food Waste ML Thesis
45 pages
Oral Questions LP-II: Star Schema
No ratings yet
Oral Questions LP-II: Star Schema
21 pages
Python Code For Loan Default Prediction
No ratings yet
Python Code For Loan Default Prediction
4 pages
ML MU Unit 2
100% (3)
ML MU Unit 2
84 pages
A New Architecture of Internet of Things and Big Data e - 2018 - Future Generati
No ratings yet
A New Architecture of Internet of Things and Big Data e - 2018 - Future Generati
13 pages
Face Mask Classification Using Convolutional Neural Networks With Facial Image Regions and Super Resolution
No ratings yet
Face Mask Classification Using Convolutional Neural Networks With Facial Image Regions and Super Resolution
10 pages
Early Wheat Disease Detection
No ratings yet
Early Wheat Disease Detection
15 pages
Information Extraction Using Context-Free Grammatical Inference From Positive Examples
No ratings yet
Information Extraction Using Context-Free Grammatical Inference From Positive Examples
4 pages
Developing Recommendation Systems Using Deep Learning Comparison of
No ratings yet
Developing Recommendation Systems Using Deep Learning Comparison of
10 pages
A Novel Approach For End-To-End Navigation For Real Mobile Robots Using A Deep Hybrid Model
No ratings yet
A Novel Approach For End-To-End Navigation For Real Mobile Robots Using A Deep Hybrid Model
21 pages
PupilHeart Heart Rate Variability Monitoring Via Pupillary Fluctuations On Mobile Devices
No ratings yet
PupilHeart Heart Rate Variability Monitoring Via Pupillary Fluctuations On Mobile Devices
12 pages
Clinithink CLiX ENRICH White Paper
No ratings yet
Clinithink CLiX ENRICH White Paper
16 pages
Report On Ectopic Beat Classification Using ML
No ratings yet
Report On Ectopic Beat Classification Using ML
50 pages
Human Activity Recognition via ML
100% (1)
Human Activity Recognition via ML
5 pages
Corn Disease Detection Using Multimodal Data Fusion For Enhanced Precision in Agricultural Decision-Making.
No ratings yet
Corn Disease Detection Using Multimodal Data Fusion For Enhanced Precision in Agricultural Decision-Making.
19 pages
Sonnleitner - Audio Identification Via FIngerprinting: Achieving Robustness To Severe Signal Modifications
No ratings yet
Sonnleitner - Audio Identification Via FIngerprinting: Achieving Robustness To Severe Signal Modifications
196 pages
Detecting Crypto Market Manipulations
No ratings yet
Detecting Crypto Market Manipulations
10 pages

Automating GDPR Checks for Android Apps

Uploaded by

Automating GDPR Checks for Android Apps

Uploaded by

Automating the GDPR Compliance

Assessment for Cross-border Personal

Target_Countries [first-party] Uruguay Location_Bluetooth

[adequacy decision] [non-adequacy decision]

Identify type of cross-

Download app Download

Check T1 ** Check T2** Check T3**

Feature vector Importance weights Application of Stratified k-fold Experimental

Class Precision Recall F-measure NVP Specificity F-measure- +/-

10. Check flow-

14: Analyse app traffic

4.3.4 Omitted cross-border transfer disclosure

Extract data flow

Check criterion C1.1

[meet] 7,579 (75%)

[not meet] 1,508 (15%)

[meet] 6,071 (60%)

FD: 248 [first-party] 758 (8%)

[adequacy decison] 75 (0.7%) [non-adequacy decision] 5,665 (56%)

To perform a more in-depth analysis, we relied on the

You might also like

Check T1 Check T2 Check T3**