1. Introduction
The careless use of social media and other online services can expose insights into personal affairs beyond what was intended to be shared. Posted data might reveal significantly more information than first thought, particularly when analyzed and combined with auxiliary databases. For instance, it has been shown that algorithms can effectively estimate a person’s personality from likes [1,2,3]. Given the rich multimedia data posted in large volumes on social media, one might suspect significant unintended disclosure of personal information among the many inconspicuous pictures and videos of family and friends. Such unintended disclosure can threaten someone’s privacy. It is therefore of interest to investigate how aware people are of what they reveal when posting content on social media. In this paper, we study the awareness of unintentionally published data on social media, commonly referred to as incidental data [4]. We conduct a survey that assesses participants’ awareness of incidental data implicitly, without asking about it directly, as the mere assessment of a privacy concern increases awareness of it [5]. The survey asked questions about postings that contained incidental data. In our previous research [6], we analyzed these postings using Open Source Intelligence (OSINT) methods, limited to two hours per target, and found that data not intended for publication could be detected within that time limit. Further, data found within those postings led to further data that most participants found privacy-compromising. The responses to the quantitative survey are analyzed using statistical methods. Our survey provides robust evidence of a significant lack of awareness of incidental data in social media postings, as more than one-fifth of participants would share content containing hidden data they find privacy-compromising. Our results indicate that insufficient awareness of hidden data can be a threat to privacy; thus, proper guidelines and education that help prevent the self-publication of incidental data may be a point of improvement. This paper makes the following contributions:
Presents a novel survey methodology that indirectly evaluates participants’ awareness of incidental data, avoiding the influence of the question itself;
Provides robust empirical evidence of the lack of awareness among social media users regarding the privacy implications of their online content;
Highlights the prevalence of incidental data in social media posts and its potential threat to user privacy.
In the first part of this paper, we discuss the current state of the art and summarize the views on the psychological impact of disclosing or not disclosing information. Then, we provide insights into our study design and list the questions of our survey, including answer choices. Next, we present and describe our results in Section 4 before we evaluate and discuss them in Section 5. We summarize our study with a final conclusion in Section 6.
2. Background
Privacy on social media is often discussed in terms of either privacy-jeopardizing settings or malicious actors [7]. However, Krämer and Schäwel [8] discuss people’s urge to self-disclose personal information on social media. Schneier [4] defined a taxonomy of the data connected with the usage of social media, namely: service, disclosed, entrusted, incidental, behavioral, and derived data. In particular, Schneier [4] argues that incidental data in this context are data posted by other people, over which one has no control.
Definition 1. “Incidental data is what other people post about you: a paragraph about you that someone else writes, a picture of you that someone else takes and posts. Again, it’s basically the same stuff as disclosed data, but the difference is that you don’t have control over it, and you didn’t create it in the first place.” Schneier [4].
In previous work [6], we argue that the term incidental is used when something unexpected is found that should not be there. For instance, during an X-ray examination meant to assess a potential bone fracture, the discovery of tumorous tissue is termed an incidental finding [9]. Considering this more general meaning, we argue that one’s unawareness of unintentionally publishing problematic data, alongside the primary reason for publishing content on social media, also leads to a loss of control over personal data. In keeping with the intent behind the definition by Schneier [4], we propose an extended Definition 2.
Definition 2. Incidental data is data that one has no control over, either due to another person disclosing it or due to one’s unawareness of its existence within data disclosed by oneself.
2.1. Privacy from a Psychological Perspective
As argued by Schlosser [10], self-disclosure can be defined as communicating personal information about oneself to another person that is a close representation of oneself, whereas self-presentation is defined as controlled and directed information that shapes the impressions others have of oneself [10]. Barasch [11] describes intrapersonal effects as emotions and processes within oneself, whereas interpersonal effects concern relationships between others and oneself.
Luo and Hancock [12] state that disclosure fulfills basic social needs and thus improves one’s well-being. Krämer and Schäwel [8] continue that privacy is an intrapersonal secondary need for people. Equally important is the view of the intrapersonal and interpersonal cost of not disclosing information. Sharing a problem (privileged information) can help to improve one’s situation by gaining new views on a personal topic or the view of others about oneself [13]. However, the consequences of sharing information are often overestimated [13]. Furthermore, the sharing of secrets can be used in a strategic manner to evade criticism and gain support [14]. Consequently, it can be said that disclosing information can have positive effects and be vital for oneself.
Social media seems to have filled a perfect spot that can fulfill the human motivation for self-disclosure [8]. However, this also entails dangers, as interactions with social media as simple and trivial as giving likes to certain posts can give away personal information [1,15].
Brough and Martin [5] claim that research on privacy is strongly focused on users’ motivation to protect their personal data from unauthorized usage, which corresponds to privacy concerns, whereas very little of this research addresses privacy knowledge. The authors further state that privacy concerns might be artificially increased when they are being assessed.
Automated data collection and the usage of specialized algorithms can reveal sensitive information about one’s life [16,17]. Fast and Jago [16] find that people underestimate the risks of sharing personal data; moreover, people seem unable to take strong actions even after severe privacy violations [18]. Such behavior comes from focusing on benefits and convenience combined with not being an explicitly identifiable victim [19]. Thus, while data collection and the usage of algorithms offer benefits and convenience, they also pose a massive threat to one’s privacy; this may have created a state of mind where people think it is not realistically possible to stop it.
2.2. Privacy from a Technical Perspective
When people interact with a social network of their choice, it can be assumed that their main goal is to share content and not to tackle a host of privacy settings. This can be problematic, as companies have discovered that user data, especially of a large group of people, can be a valuable asset. Even though there are good examples of user-based privacy, there are companies that take advantage of people’s behavior. As discussed by Bösch et al. [20], such methods are referred to as dark privacy strategies or dark privacy patterns. Research in human–computer interaction and user-experience design has found that people are more likely to press a button in a rush if it is green. This has led to situations where companies make the accept button for “allow cookies” or “share statistical data” slightly larger and green, whereas the decline button is slightly smaller and gray [21,22].
Al-Charchafchi et al. [23] found in their review that users are threatened in multiple ways. The threat vectors concern information privacy, social engineering, data leakage through unfit privacy settings, and Application Programming Interface (API) weaknesses. Along similar lines, Johansen et al. [24] provide insight into the problems and opportunities of lifelogging systems. In forensics and in court, the analysis of the Electric Network Frequency (ENF) is increasingly used to verify timestamps or the integrity of audio and video recordings [25,26,27].
2.3. Privacy from an Awareness Perspective
The quantitative study of Amon et al. [28] on interdependent privacy provides valuable insights into aspects of privacy awareness, especially the sharing of private information of other persons. The study analyzed 245 responses on 68 real-world pictures from 13 categories through a questionnaire about the likelihood of sharing the given pictures, their entertainment value, and their privacy rating. The study also assessed the personality traits known as the dark triad, which comprises narcissism, psychopathy, and a manipulative personality style. Even though the study gives valuable insights into privacy awareness on social media, it focuses on pictures shared by others. Based on the responses on the pictures and the personality traits, a cluster analysis yielded the following three interdependent privacy user categories: privacy preservers, privacy ignorers, and privacy violators. The study reveals that privacy ignorers have low dark triad scores and low levels of education but prefer personal privacy. Privacy violators have high dark triad scores and high levels of education, and they prefer openness as a key motivation for sharing potentially sensitive pictures of other persons.
Padyab et al. [29] conducted two sub-studies on privacy awareness on social media based on exploratory focus groups. The first tackles dedicated algorithms on social media; the second explores self-disclosure. These studies show that users were generally unaware of the extent to which published data can be used to extract private information. Further, it was shown that users’ awareness could be raised by letting them use an extraction tool on their own social media profile.
3. Implementation
We conducted the following four separate surveys: IDS2301, IDS2301U, IDS2302, and IDS2302U. It is important to mention that the presented study relies on only one survey, namely, IDS2301. Nonetheless, surveys IDS2301U, IDS2302, and IDS2302U were implemented in order to obtain reliable results, as explained in Section 5.4. An overview of each survey, including the motivation, can be found in Table A1. In accordance with the General Data Protection Regulation (EU) 2016/679 (GDPR), an online survey provider was chosen to conduct the survey and collect responses. For the social media postings, we decided to use two well-analyzed postings from our previous work [6]. While limited to two hours per target, Kutschera [6] found that unintentionally posted, privacy-compromising data could be found by using OSINT. For the IDS2301 survey, we ran a test phase in which 12 participants were asked to give feedback on consistency, subjective understandability, and potential typos. The feedback was used to improve the survey. At the end of the survey, a link to the dedicated follow-up survey was presented.
3.1. Recruitment
Participants were recruited from three different courses at Graz University of Technology, Austria: INH.04062UF Agile Software Development (170 students), INP.32600UF Mobile Applications (45 students), and INP.33172UF Software Technology (84 students). Of all 299 students, 198 voluntarily participated in survey IDS2301, as shown in Figure 1. Students who were signed up for two or more lectures were only allowed to take the IDS2301 survey once.
To motivate participation, students were offered bonus points counting toward their final grade for completing the survey. To claim points, a student had to submit a self-generated random Universally Unique Identifier (UUID) token as part of the follow-up survey and subsequently to the university’s e-learning platform. Submitting the token as part of a separate follow-up survey instead of the main survey allowed us to correctly identify students for the purpose of crediting points while at the same time preserving their anonymity in the main survey. Section 5.4 discusses this aspect in more detail. The instructions on how to claim points were only revealed at the end of the main survey, which reduced the risk of students going directly to the follow-up survey to claim credits and skipping the main survey. It was certainly possible to receive these instructions out-of-band, circumvent the protection mechanism, and claim points without accessing the main survey. However, this was by design, as we found it preferable to the case where students would fast-click through the survey and submit bogus data. Our scheme gives no incentive to complete the survey multiple times, as bonus points are only received once.
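The token handling described above can be illustrated with a minimal Python sketch. It is not the tooling used in the study: the function names are hypothetical, and it assumes that tokens are version 4 (random) UUIDs and that a token is credited only if it appears both in the follow-up survey and on the e-learning platform.

```python
import uuid


def generate_token() -> str:
    """Create a random (version 4) UUID token, as a student might do."""
    return str(uuid.uuid4())


def normalize(token: str):
    """Return the canonical lowercase form of a version 4 UUID, or None if invalid."""
    try:
        parsed = uuid.UUID(token.strip())
    except ValueError:
        return None
    return str(parsed) if parsed.version == 4 else None


def creditable_tokens(followup_tokens, platform_tokens):
    """Tokens present in both sources (assumption); using sets means each counts once."""
    followup = {normalize(t) for t in followup_tokens} - {None}
    platform = {normalize(t) for t in platform_tokens} - {None}
    return followup & platform
```

Because the token is random and never entered in the main survey, it carries no link to a participant's answers there, which is the property the scheme relies on.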
3.2. Assessment Design
In our survey, we employed 5-point and 6-point Likert scales [30] for distinct sets of questions, driven by the nature of the responses we sought to capture. For 23 of our questions, we used the 5-point Likert scale, which provides a balanced range of options from “Strongly Disagree” to “Strongly Agree” along with a neutral midpoint. Including a neutral option in these cases allows for a more accurate representation of respondent attitudes, particularly when respondents may lack a definitive opinion or hold moderate views [31]. This configuration was especially suitable for questions Q1 and Q2, as depicted in Table 1, and Q3.1 to Q3.21, as seen in Table 2, where neutrality or a middle-ground perspective was a plausible and informative response.
Conversely, for 14 of our sub-questions, we chose the 6-point Likert scale. By compelling respondents to lean towards agreement or disagreement, the 6-point scale yields clearer, more decisive insights into specific attitudes or opinions, which is particularly valuable where a neutral stance is less informative or relevant to our research objectives [32]. The absence of a neutral midpoint is instrumental where decisiveness in responses is critical or where neutrality could result in ambiguous data interpretation [31,33]. We intended to compel respondents to take a definitive stance on Q7.1–Q7.14, as shown in Table A4, thereby eliminating the central tendency bias, where participants might gravitate towards a neutral choice.
3.3. Questions
Our main survey consisted of 14 main questions and 72 sub-questions. For the initial questions Q1 and Q2, as seen in Table 1, participants were presented with two example scenarios, one for each question. Each scenario was made up of three pictures, and we asked participants whether they would share the content publicly if it was theirs. Each question consisted of a 5-point Likert scale entry in combination with an open-ended sub-question. Because such reflective questions can be influenced by later questions in the survey [5], we asked these questions first.
For Q1, participants were presented with Example 1, consisting of three images from a video in which someone shows the surroundings of a rural area that can be assumed to be their home, as seen in Figure 2. The video title, “Wild Oklahoma Weather”, indicates that the video is about an upcoming severe storm. The participants were asked to consider whether they would share the video if the depicted house was theirs. For Q2, participants were presented with Example 2, consisting of three social media postings, as seen in Figure 3, and were similarly asked whether they would share the images publicly if they depicted their own property and surroundings.
Question Q3, as seen in Table 2, asks the participants about their privacy perceptions of the various data types that are detectable from the social media postings in Example 2 according to the OSINT analysis method proposed by Kutschera [6]. Several other key sensitive data types, like date of birth (Q3.3), blood type (Q3.6), and social security number (Q3.7), are also included.
Questions Q4 and Q5, as seen in Table 3, ask participants which of various privacy guidelines they practice currently (Q4) and intend to practice in the future (Q5). The two questions differ only in this time frame. As the guidelines are the same, they are best represented in the joint Table 3. The purpose of Q5 is to see to what extent participants’ perceptions were influenced by participating in this survey.
Questions Q6 and Q7 are about social media usage (Table A3 and Table A4). Questions Q8–Q14, as seen in Table A5, collect demographic data. We use these demographic data to organize respondents into various subgroups via filters. The abbreviations of the filters used for each subgroup are listed and explained in Table 4. Besides these active responses, the survey provider also collected the start and end timestamps for each survey. These start and end times allow us to calculate the time spent on the survey.
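As a small illustration of deriving the time spent from the two timestamps, the following Python sketch assumes ISO 8601 formatted timestamps; the actual export format of the survey provider is not stated here, and the function name is hypothetical.

```python
from datetime import datetime


def minutes_spent(start_iso: str, end_iso: str) -> float:
    """Time spent on the survey in minutes, from ISO 8601 start and end timestamps."""
    start = datetime.fromisoformat(start_iso)
    end = datetime.fromisoformat(end_iso)
    return (end - start).total_seconds() / 60


# Hypothetical timestamps, for illustration only.
example_duration = minutes_spent("2023-01-16T10:00:00", "2023-01-16T10:12:30")
```

Such durations can help spot responses that were completed implausibly fast.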
4. Results
Table 5 shows the percentage of participants who responded with either agree or strongly agree to the questions about whether the various data types listed in questions Q3.1–Q3.20 compromise privacy. The percentage of those who disagreed or strongly disagreed is shown in Table A6. The background color in both tables is graded from green through yellow to red based on the cell value. Within both tables, rows E.1 and E.2 mark the data types that can be found in Example 1 and Example 2, respectively. The data types correspond to questions Q3.1–Q3.20.
The boxplot in Figure 4 depicts the statistical properties of the responses and supports the results presented in Table 5. The median is marked by an orange line at the tapered point of each box. The adjoining areas of the box cover the 25% of responses directly above and below the median, the whiskers cover the first and fourth quartiles of responses, and outliers are indicated by circles.
Row E.1 within Table 5 marks the data types extractable from Example 1, as shown in Figure 2, namely, the state or country (Q3.1), full name (Q3.2), full address (Q3.4), full name of previous owners (Q3.5), information on relatives (Q3.8), parcel number of the property (Q3.12), price of the property (Q3.13), date of purchase of property (Q3.14), size of the property (Q3.16), property tax (Q3.17), and security measures against burglars (Q3.19) as well as the absence thereof (Q3.20).
Row E.2 within Table 5 marks the data types extractable from Example 2, as shown in Figure 3, namely, the state or country (Q3.1), full name (Q3.2), full address (Q3.4), information on relatives (Q3.8), phone number (Q3.9), parcel number of the property (Q3.12), price of the property (Q3.13), date of purchase of property (Q3.14), size of the property (Q3.16), and security measures against burglars (Q3.19) as well as the absence thereof (Q3.20).
The percentage of positive answers to Q4 on the current usage of awareness guidelines is shown in Table 6, while the percentage of positive answers to Q5 on the future usage of awareness guidelines is shown in Table 7. The same filter groups are used as for Q3 in Table 5. Based on the cell value, the background color is graded from green through yellow to red.
In Q6 and Q7 of our survey, participants were asked to optionally answer questions about their social media usage, namely which social media platforms they use and to what extent, on a 6-point Likert scale with the following options: never (1), very rarely (2), rarely (3), occasionally (4), frequently (5), and very frequently (6), plus no answer (0) as the default value. The results are visualized in Table A2, whereas the questions are listed in Table A3 and Table A4. The background color in Table A2 is graded from green through yellow to red based on the cell value.
The privacy awareness guidelines proposed by Kutschera [6] are enumerated in Table 3. Table 8 shows how each guideline can prevent the exposure of a certain data type. For instance, enforcing guideline Q4.4 will help minimize the exposure risk of current state or country (Q3.1), date of birth (Q3.3), security measures against burglars (Q3.19), and absence of security measures against burglars (Q3.20). Naturally, OSINT has manifold ways of detecting data types, and some can be obtained by gaining knowledge of another data type first. For example, price of property (Q3.13) or date of property purchase (Q3.14) may become evident through the detection of full address (Q3.4). Those data types are listed in the Indirect column of Table 8.
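The relation between guidelines, directly protected data types, and indirectly derivable data types can be sketched as two lookup tables. The Python sketch below contains only the entries named in the text above; the full content of Table 8 is not reproduced, and the variable and function names are hypothetical.

```python
# Guideline -> data types whose exposure it helps to prevent (entries from the text only).
DIRECT = {"Q4.4": {"Q3.1", "Q3.3", "Q3.19", "Q3.20"}}

# Data type -> data types that may become evident once it is known (entries from the text only).
INDIRECT = {"Q3.4": {"Q3.13", "Q3.14"}}


def protected_by(guidelines):
    """Union of the data types covered by the given guidelines."""
    covered = set()
    for guideline in guidelines:
        covered |= DIRECT.get(guideline, set())
    return covered


def exposed_with_indirect(detected):
    """Expand detected data types by everything reachable through the indirect relation."""
    result = set(detected)
    frontier = list(detected)
    while frontier:
        current = frontier.pop()
        for derived in INDIRECT.get(current, ()):
            if derived not in result:
                result.add(derived)
                frontier.append(derived)
    return result
```

For example, `exposed_with_indirect({"Q3.4"})` also returns the price and purchase date of the property, mirroring the indirect detection path described above.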
5. Discussions
We found that a two-thirds supermajority of the participants have privacy concerns about data types Q3.2, Q3.4, Q3.7, Q3.8, Q3.9, Q3.12, Q3.15, Q3.19, and Q3.20. For each of these data types, Table 9 shows the percentage of respondents in various subgroups who agree or strongly agree to post either or both Example 1 and Example 2, matched with the privacy concerns of the group. Rows E.1 and E.2 indicate whether or not the data type can actually be found in the examples. The cell color for the filtered groups (ALP, E1P, and E2P) illustrates the distribution of majority levels in cases where the data type can be revealed. Cells that reach a simple majority (50%) are highlighted in yellow, while those reaching a two-thirds majority (66.67%) are highlighted in orange.
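The highlighting rule can be expressed as a small classification function. The Python sketch below assumes that both thresholds are inclusive ("reach") and works on respondent counts rather than rounded percentages to avoid rounding effects; the function name and the label strings are hypothetical.

```python
def majority_level(concerned: int, total: int) -> str:
    """Classify a cell as 'two-thirds', 'simple', or 'none' (thresholds assumed inclusive)."""
    if total <= 0:
        raise ValueError("total must be positive")
    if 3 * concerned >= 2 * total:  # concerned / total >= 2/3
        return "two-thirds"
    if 2 * concerned >= total:      # concerned / total >= 1/2
        return "simple"
    return "none"
```

For instance, 18 concerned respondents out of 27 correspond to 66.67% and fall into the two-thirds category under this inclusive reading.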
5.1. Evaluation and Interpretation of Survey Results
Our study aims to detect privacy awareness on social media implicitly. Accordingly, the survey never asked directly about awareness of incidental data, as this might have influenced the participants and thus rendered this study invalid. Moreover, we used well-analyzed postings from our previous research, for which we knew exactly what data types could be discovered within a strict time frame of up to two hours. In the first step, the participants had to answer whether they would have posted the content shown (see Table 1). In the second step, the participants had to answer which data types would compromise their privacy if shared (see Table 2). By combining the results of both questions and the data types found in each example, we gained implicit knowledge about whether the participants would have shared a certain data type while also having concerns about this data type, as shown in Table 5. Below, we split the evaluation and interpretation of the survey results into topical sections.
5.1.1. Implicit Incidental Data Awareness
Upon taking a closer look, it becomes evident that in certain cases, a supermajority of people concerned about their privacy with regard to a specific data type are willing to share content that can be used to reveal those specific data types.
For example, according to the subgroup definition, all 27 individuals in subgroup E1P would publish Example 1, as seen in Figure 2. This example includes, among others, the extractable data types full name (Q3.2), full address (Q3.4), information on relatives (Q3.8), and absence of security measures (Q3.20). However, at least two-thirds of subgroup E1P express concern about these same data types: Q3.2 (77.78%), Q3.4 (81.48%), Q3.8 (77.78%), and Q3.20 (66.67%).
Furthermore, more than two-thirds of subgroup E2P, who are likely to publish the corresponding posting shown in Figure 3, are concerned about the following data types: Q3.2 (73.81%), Q3.4 (85.71%), Q3.8 (83.33%), Q3.9 (85.71%), Q3.19 (69.05%), and Q3.20 (73.81%).
Data type parcel number (Q3.12) was also included, but it only reached a simple majority of 62.96% (E1P) and 64.29% (E2P), respectively. The discussed details are visible in Table 9, which is an excerpt of Table 5.
In summary, the participants in E1P and E2P are concerned about data types Q3.2, Q3.4, Q3.8, and Q3.20 but are also very likely to share a post containing those data types. This allows us to draw the implicit conclusion that these individuals are unaware of the incidental data contained in certain postings. Together, these results provide important insights into the awareness of sharing privacy-compromising incidental data.
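The implicit measure combines two answers per participant: willingness to post an example and concern about a data type contained in that example. The following Python sketch illustrates this under stated assumptions: answers are coded 1 to 5 on the 5-point scale with 4 and 5 meaning agree and strongly agree, and the response records are invented for illustration rather than taken from the survey data.

```python
AGREE = {4, 5}  # assumed coding: 4 = agree, 5 = strongly agree


def share_concerned_among_posters(responses, post_question, concern_question):
    """Percentage of would-be posters who also rate the data type as privacy-compromising."""
    posters = [r for r in responses if r[post_question] in AGREE]
    if not posters:
        return 0.0
    concerned = [r for r in posters if r[concern_question] in AGREE]
    return 100 * len(concerned) / len(posters)


# Invented responses: Q1 = would post Example 1, Q3.4 = full address compromises privacy.
sample = [
    {"Q1": 5, "Q3.4": 5},
    {"Q1": 4, "Q3.4": 4},
    {"Q1": 4, "Q3.4": 2},
    {"Q1": 1, "Q3.4": 5},
]
share = share_concerned_among_posters(sample, "Q1", "Q3.4")
```

In this invented sample, two of the three would-be posters are concerned about the full address, so the share is about 66.67%. A high share for a data type that is actually extractable from the example is what the text interprets as missing awareness of incidental data.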
5.1.2. Notable Results from Opposite Filter Groups
Overall, the privacy concern in group ALL with regard to data types Q3.2 (74.48%), Q3.4 (87.5%), Q3.8 (77.08%), and Q3.20 (73.96%) is high, as Table 5 shows. At the same time, Table A6 reveals that only between 6.25% and 18.75% disagree that the data types Q3.2 (18.75%), Q3.4 (6.25%), Q3.8 (14.58%), and Q3.20 (17.19%) are privacy-compromising, which further confirms our findings.
It is also interesting that 93.75% of those in subgroup LOB are concerned about the car license number (Q3.15), but only 76.04% of the overall group ALL and 66.28% of subgroup LRS are concerned about the same data type. The reason for this could either be that people who live on their own property are more aware of what can be revealed or what harm can be done through a car license number, or that the car license number was mistaken for something other than a license plate.
As for data type full name of previous owners (Q3.5), 62.5% of those in the LOB and LOS groups are concerned, whereas 47.92% of the ALL group are concerned. The concern is considerably lower in subgroups ALP (25.0%) and LRS (40.7%). A possible explanation for this difference is that property owners are more aware of the potential risks that the names of previous owners can pose than people who live in rented accommodation.
Another subgroup of interest is LRB, where 83.67% are concerned about Q3.19 (security measures against burglars), compared with only 62.5% of LOB and 71.88% of the overall group ALL. A possible explanation is that people who live on their own property have full control over the installation and can choose on their own to implement concealed and potentially strong measures against burglars, whereas people who live in rented accommodation need the approval of the landlord and will not be compensated if they move to different housing. This reasoning could lead to a decision for cheaper, movable, and thus non-concealed measures against burglars.
The result that surprised us the most was that subgroup SUC has fewer concerns about each of the data types of greatest concern. Moreover, compared with group ALL, the concern is lower for each data type except for current state or country (Q3.1) 47.62%, blood type (Q3.6) 23.81%, size of property (Q3.16) 42.86%, and property tax (Q3.17) 42.86%. A supermajority within the group SUC is concerned about full address (Q3.4) 66.67%, social security number (Q3.7) 66.67%, information on relatives (Q3.8) 71.43%, phone number (Q3.9) 71.43%, and the car license number (Q3.15) 71.43%.
5.1.3. Usage of Guidelines
From Table 6, we observe that a two-thirds supermajority of participants (i.e., group ALL) currently use guidelines Q4.1, Q4.6, Q4.7, and Q4.8, whereas Q4.5, Q4.10, Q4.11, and Q4.12 are currently used by only one-third of the same group. Regarding future usage, Table 7 shows that, among the individual guidelines, the highest response is for Q5.2 (Avoiding reflections on surfaces and mirrors) and the lowest is for Q5.10 (Close curtains or avoid windows). Overall, the most frequent response was Q5.0 (None other than those before) with 42.71%. Merely asking questions about privacy concerns can influence participants [5]. However, these results and the overall low rate of response on future usage of the mentioned guidelines suggest that the survey design did not greatly influence the participants.
From subgroup ALP’s responses to Q4.9–Q4.11 and Q5.9–Q5.11, as seen in Table 3, we see that few countermeasures are in place today and that this situation will likely be the same in the future. This is interesting, as Table 5 shows that 81.25% of the same subgroup ALP “Agree” or “Strongly Agree” that data type full address (Q3.4) can compromise their privacy. Moreover, it can be assumed that full address (Q3.4) can be discovered very quickly when map material (Q4.9 and Q5.9) is included in a post, which the subgroup ALP would publish as per the definition in Table 4.
As discussed in Section 5.1.1, subgroups E1P and E2P are willing to post data that a majority of the group has privacy concerns about. Furthermore, the usage of the guidelines, compared against Table 8, reveals that the measures stated in the guidelines, which may well have prevented the publication of incidental data, are not used. For example, in order to avoid incidental data of data type Q3.4, one can focus on guidelines Q4.1, Q4.2, Q4.5, Q4.6, and Q4.8–Q4.13. Only two to three guidelines are used by a supermajority in groups E1P and E2P, as shown in Table 6.
5.2. Similar Studies
There is a notable gap in the existing literature regarding the implicit analysis of individuals’ awareness when sharing postings on social media that potentially contain incidental data. To the best of our knowledge, there exists no study we can directly compare with. While not directly addressing this specific aspect, the research conducted by Padyab et al. [29] and Amon et al. [28] comes closest. The study by Amon et al. [28] focuses on the psychological motivations behind users posting pictures of others, whereas the research by Padyab et al. [29] confronts participants with data extraction tools on their own social media profiles.
5.3. Statistical Significance
This study uses a confidence level of 95%. The confidence interval reflects an estimated range of values and indicates the accuracy of the estimate. The margin of error is also used for statistical evaluation. In our study, the margin of error is 7.07%. This indicates the accuracy of the estimate in relation to the entire group. Altogether, the results of the study of 192 students reflect the opinions and awareness of Austrian students. Equation (1) represents the formula used for the margin of error E with the Standard Error of the Proportion (SEP) and the Finite Population Correction (FPC).
E = Z \cdot \underbrace{\sqrt{\frac{p(1-p)}{n}}}_{\text{SEP}} \cdot \underbrace{\sqrt{\frac{N-n}{N-1}}}_{\text{FPC}} \qquad (1)
Here, E is the margin of error, Z is the Z-score associated with the desired confidence level, p is the estimated proportion of the population and is set to 50%, n is the sample size, and N is the population size. The Z-score was set at 1.96. The target population is Austrian students, with a population of 288,381 as of February 2022 [38,39]. The influence of the population size is negligible because the FPC is close to 1 (0.9999).
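The reported margin of error can be reproduced with a few lines of Python. The sketch assumes the standard form E = Z · sqrt(p(1 − p)/n) · sqrt((N − n)/(N − 1)), which matches the symbols defined above; the function name is hypothetical.

```python
import math


def margin_of_error(n: int, N: int, p: float = 0.5, z: float = 1.96) -> float:
    """Margin of error for a proportion with finite population correction (assumed standard form)."""
    sep = math.sqrt(p * (1 - p) / n)    # standard error of the proportion
    fpc = math.sqrt((N - n) / (N - 1))  # finite population correction, close to 1 here
    return z * sep * fpc


# Values from the text: 192 responses, population of 288,381 students.
error_percent = round(margin_of_error(192, 288_381) * 100, 2)  # 7.07
```

Because the population is far larger than the sample, the correction factor barely changes the result, which is consistent with the statement that the population size is negligible.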
5.4. Trustability of Survey Results
All responses were received from students who received bonus points towards their grades. For reasons of data protection and ethics, the survey was designed to be fully anonymous. Within the course, students had to enter one or multiple UUID tokens into the university submission system in order to receive the offered bonus points. This token also had to be entered into the token collection surveys IDS2301U or IDS2302U. Students were not allowed to submit multiple tokens in the IDS2301U survey. However, IDS2302U received multiple entries since this was also the token submission survey for students who attended three courses. Further, we are able to analyze the data of the token collection surveys IDS2301U and IDS2302U and of the university submission system.
In order to explain why we are highly certain that students did not take a survey twice, we need to describe the process in more detail. The survey and bonus point granting workflow is visualized in Figure 5. Emails with the link to the survey IDS2301 were sent out to students who attended one or more classes. Students had to go through the survey to find the link to the token collection survey IDS2301U. Students had to generate the UUID token themselves and enter it into IDS2301U. Since IDS2301U was a two-question survey (email for updates and token), it would have been easier for students to ask their peers and simply enter their UUID, thereby claiming that they had completed the survey (IDS2301), rather than actually going through it. An analysis of the university data and the token collection survey showed that zero students claimed bonus points for multiple courses in IDS2301U. It is important to mention that not all enrolled students were graded and that not all graded students needed bonus points, since the link to the survey was sent out close to the end of the course. Hence, students could estimate whether an excellent grade had already been reached, which would render bonus points useless.
Students who were enrolled in two or more courses received, alongside the IDS2301 survey link, the information about and link to the secondary IDS2302 survey. The IDS2302 survey shows the same postings but asks, in an open question, what data and information participants find in them. At the end, a link to the dedicated token collection survey IDS2302U was revealed.
6. Conclusions
Disclosure of private information is crucial to social interactions, yet the awareness of privacy-compromising data hidden within a self-disclosed posting needs more attention. This study extends the previous study of Kutschera [6] and uses the comprehensively analyzed postings seen in Figure 2 and Figure 3. The implicit design of the survey questions allowed us to gain insight into people’s awareness of private data. Furthermore, our study shows that awareness of incidental data is very low, and this constitutes a privacy and security concern. Our survey shows clear privacy concerns about the data types full name (Q3.2), full address (Q3.4), information on relatives (Q3.8), phone number (Q3.9), parcel number of the property (Q3.12), security measures against burglars (Q3.19), as well as the absence thereof (Q3.20). Although participants could respond neutrally on the 5-point Likert scale and were thus not forced to a decision, 14.06% to 21.88% responded that they were willing to publish a posting that, knowingly or not, contains information they consider privacy-compromising, and thus incidental data.
Even though our survey uses a confidence level of 95%, the margin of error of 7.07% with 192 responses is still above the common standard of 5%. Further, the results of our survey are limited with regard to interpretation, as the survey only asked Austrian students. With that in mind, we recommend a more widespread survey on the privacy and security issues of incidental data. Policymakers should also be made aware of these issues so that they can implement guidelines or other mechanisms that either raise awareness among the general public or alert persons before they publish potentially harmful postings.