Classification of Datasets Used in Data Anonymization for IoT Environment

  • Conference paper
Advances and Trends in Artificial Intelligence. Theory and Applications (IEA/AIE 2024)

Abstract

The integration of Internet of Things (IoT) devices in smart cities facilitates the collection, sharing, and analysis of data to optimize city services, enhance residents’ well-being, and enable more efficient decision-making. Balancing the benefits of IoT data collection and sharing against the need to preserve user privacy requires robust security measures and anonymization techniques that keep IoT technologies safe, secure, and trustworthy. Anonymization techniques let service providers collect, store, and share data while preserving user privacy, and different methods apply to different data types. This paper presents a literature review of data anonymization in the IoT environment. It classifies the datasets used to evaluate newly proposed anonymization methods and examines their types, availability, size, and applicability in research. The study aims to provide insight into the datasets employed in data anonymization research for IoT environments and to determine the extent to which relational datasets are used in such studies.

Acknowledgements

This study is supported by the SPEV project 2024 run at the Faculty of Informatics and Management, University of Hradec Králové, Czech Republic. The assistance provided by Dominik Palla with data collection is greatly appreciated.

Author information

Corresponding author

Correspondence to Jana Medková.

Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Medková, J. (2024). Classification of Datasets Used in Data Anonymization for IoT Environment. In: Fujita, H., Cimler, R., Hernandez-Matamoros, A., Ali, M. (eds) Advances and Trends in Artificial Intelligence. Theory and Applications. IEA/AIE 2024. Lecture Notes in Computer Science, vol 14748. Springer, Singapore. https://doi.org/10.1007/978-981-97-4677-4_8

  • DOI: https://doi.org/10.1007/978-981-97-4677-4_8

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-97-4676-7

  • Online ISBN: 978-981-97-4677-4

  • eBook Packages: Computer Science, Computer Science (R0)
