Abstract
The integration of Internet of Things (IoT) devices in smart cities enables the collection, sharing, and analysis of data that can optimize city services, improve residents' well-being, and support more efficient decision-making. Balancing these benefits against the need to preserve user privacy requires robust security measures and anonymization techniques, which help keep IoT technologies safe, secure, and trustworthy. Anonymization allows service providers to collect, store, and share data while protecting user privacy, and different methods suit different types of data. This paper presents a literature review of data anonymization in the IoT environment. It classifies the datasets used to evaluate newly proposed anonymization methods and examines their type, availability, size, and applicability in research. The study aims to provide insight into the datasets employed in IoT data anonymization research and to determine the extent to which relational datasets are used in such studies.
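To make "anonymizing a relational dataset" concrete, here is a minimal sketch of one common technique: generalizing quasi-identifiers and then checking the result against k-anonymity. The toy records, the choice of age and ZIP code as quasi-identifiers, the generalization rules, and the function names are all hypothetical illustrations. They are not the method of this paper or of any study it reviews.

```python
from collections import Counter

# Hypothetical toy records in the style of a relational (tabular) dataset.
# (age, zip_code) are treated as quasi-identifiers; the last field is sensitive.
records = [
    (34, "50003", "flu"),
    (36, "50008", "asthma"),
    (38, "50002", "flu"),
    (52, "50011", "diabetes"),
    (57, "50017", "flu"),
    (55, "50013", "asthma"),
]

def generalize(age: int, zip_code: str) -> tuple[str, str]:
    """Coarsen quasi-identifiers: age to a 10-year band, ZIP to its first three digits."""
    low = (age // 10) * 10
    return f"{low}-{low + 9}", zip_code[:3] + "**"

def is_k_anonymous(rows, k: int) -> bool:
    """True if every combination of generalized quasi-identifiers occurs at least k times."""
    counts = Counter(generalize(age, zip_code) for age, zip_code, _ in rows)
    return min(counts.values()) >= k

def anonymize(rows):
    """Release generalized quasi-identifiers with the sensitive value left unchanged."""
    return [(*generalize(age, zip_code), sensitive) for age, zip_code, sensitive in rows]

print(is_k_anonymous(records, k=3))  # → True
for row in anonymize(records):
    print(row)
```

After generalization, each (age band, ZIP prefix) group contains three records, so the table satisfies 3-anonymity but not 4-anonymity. Real evaluations, such as those the review surveys, would use full benchmark datasets and measure information loss as well.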
Acknowledgements
This study is supported by the SPEV project 2024 run at the Faculty of Informatics and Management, University of Hradec Králové, Czech Republic. The assistance provided by Dominik Palla with data collection is greatly appreciated.
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Medková, J. (2024). Classification of Datasets Used in Data Anonymization for IoT Environment. In: Fujita, H., Cimler, R., Hernandez-Matamoros, A., Ali, M. (eds) Advances and Trends in Artificial Intelligence. Theory and Applications. IEA/AIE 2024. Lecture Notes in Computer Science, vol 14748. Springer, Singapore. https://doi.org/10.1007/978-981-97-4677-4_8
DOI: https://doi.org/10.1007/978-981-97-4677-4_8
Published:
Publisher Name: Springer, Singapore
Print ISBN: 978-981-97-4676-7
Online ISBN: 978-981-97-4677-4
eBook Packages: Computer Science, Computer Science (R0)