Abstract
Data mining techniques are highly efficient in sifting through big data to extract hidden knowledge and assist evidence-based decisions. However, they pose severe threats to individuals’ privacy because they can be exploited to infer sensitive data. Researchers have proposed several privacy-preserving data mining techniques to address this challenge. One approach extends anonymisation privacy models into data mining processes to enhance both privacy and utility. Several published works in this area have used clustering techniques to enforce anonymisation models on private data: they group the data into clusters using a quality measure and then generalise the data in each group separately to achieve an anonymisation threshold. Although these techniques are highly efficient and practical, guaranteeing an adequate balance between data utility and privacy protection remains a challenge. In addition, existing approaches do not work well with high-dimensional data, since it is difficult to develop good groupings without incurring excessive information loss. Our work aims to overcome these challenges by proposing a hybrid approach that combines self-organising maps with conventional privacy-based clustering algorithms. The main contribution of this paper is to show that dimensionality reduction techniques can improve the anonymisation process by incurring less information loss, thus producing a more desirable balance between privacy and utility.
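The cluster-then-generalise idea described in the abstract can be illustrated with a minimal sketch. It is not the algorithm evaluated in the paper: the greedy grouping, the squared Euclidean quality measure, the interval generalisation and the `information_loss` measure are all assumptions chosen for brevity, and every function name is hypothetical.

```python
from collections import Counter
from typing import List, Sequence, Tuple

Record = Tuple[float, ...]
Box = Tuple[Tuple[float, float], ...]


def greedy_k_clusters(records: Sequence[Record], k: int) -> List[List[Record]]:
    """Group records into clusters of at least k members (assumed greedy scheme)."""
    if k < 1 or len(records) < k:
        raise ValueError("need k >= 1 and at least k records")
    remaining = sorted(records)  # deterministic starting order
    clusters = []
    while len(remaining) >= 2 * k:
        seed = remaining.pop(0)
        # Assumed quality measure: squared Euclidean distance to the seed.
        remaining.sort(key=lambda r: sum((a - b) ** 2 for a, b in zip(r, seed)))
        clusters.append([seed] + remaining[:k - 1])
        remaining = remaining[k - 1:]
    clusters.append(remaining)  # the leftover holds between k and 2k-1 records
    return clusters


def generalise(cluster: Sequence[Record]) -> Box:
    """Replace each attribute by the [min, max] interval over the cluster."""
    return tuple((min(d), max(d)) for d in zip(*cluster))


def anonymise(records: Sequence[Record], k: int) -> List[Box]:
    """Return one generalised record per input record (in cluster order)."""
    out: List[Box] = []
    for cluster in greedy_k_clusters(records, k):
        out.extend([generalise(cluster)] * len(cluster))
    return out


def is_k_anonymous(rows: Sequence, k: int) -> bool:
    """Every distinct row must occur at least k times."""
    return min(Counter(rows).values()) >= k


def information_loss(boxes: Sequence[Box], records: Sequence[Record]) -> float:
    """Assumed loss: mean interval width, normalised by each attribute's range."""
    spans = [(max(d) - min(d)) or 1 for d in zip(*records)]
    total = sum((hi - lo) / s for box in boxes for (lo, hi), s in zip(box, spans))
    return total / (len(boxes) * len(spans))


if __name__ == "__main__":
    # Hypothetical (age, income) quasi-identifiers.
    data = [(25, 50000), (27, 52000), (26, 51000),
            (60, 90000), (62, 91000), (61, 95000)]
    released = anonymise(data, k=3)
    print(released)
    print(is_k_anonymous(released, 3), information_loss(released, data))
```

Raising `k` forces wider intervals and therefore a higher loss under this measure, which is the privacy and utility trade-off the abstract refers to.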
Notes
1. \(D^k\) denotes a k-anonymised version of the original table D.
2. where \(\|\cdot\|\) is the measure of distance.
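The distance measure in note 2 can be made concrete with a small self-organising map sketch. The assumptions here are not taken from the source: that the norm is Euclidean, that it is used to find the best-matching unit as in a standard Kohonen map, and that records are then represented by their 2-D grid coordinates before clustering. All names and parameter values are hypothetical.

```python
import math
import random
from typing import Dict, List, Sequence, Tuple

Node = Tuple[int, int]


def distance(x: Sequence[float], w: Sequence[float]) -> float:
    """Assumed Euclidean norm ||x - w||."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, w)))


def best_matching_unit(x: Sequence[float], weights: Dict[Node, List[float]]) -> Node:
    """Grid node whose weight vector is closest to x."""
    return min(weights, key=lambda node: distance(x, weights[node]))


def train_som(data: Sequence[Sequence[float]], rows: int, cols: int,
              epochs: int = 20, lr0: float = 0.5, seed: int = 0) -> Dict[Node, List[float]]:
    """Train a small rectangular SOM; inputs are assumed scaled to [0, 1]."""
    rng = random.Random(seed)
    dim = len(data[0])
    weights = {(r, c): [rng.random() for _ in range(dim)]
               for r in range(rows) for c in range(cols)}
    sigma0 = max(rows, cols) / 2
    for epoch in range(epochs):
        frac = epoch / epochs
        lr = lr0 * (1 - frac)                    # linearly decaying learning rate
        sigma = max(sigma0 * (1 - frac), 0.5)    # shrinking neighbourhood radius
        for x in data:
            br, bc = best_matching_unit(x, weights)
            for (r, c), w in weights.items():
                # Gaussian neighbourhood on the grid around the winning node.
                h = math.exp(-((r - br) ** 2 + (c - bc) ** 2) / (2 * sigma ** 2))
                for i in range(dim):
                    w[i] += lr * h * (x[i] - w[i])
    return weights


def project(data: Sequence[Sequence[float]], weights: Dict[Node, List[float]]) -> List[Node]:
    """Map each high-dimensional record to its 2-D grid coordinate."""
    return [best_matching_unit(x, weights) for x in data]


if __name__ == "__main__":
    rng = random.Random(1)
    records = [[rng.random() for _ in range(8)] for _ in range(30)]  # synthetic data
    som = train_som(records, rows=3, cols=3)
    print(project(records, som)[:5])
```

The grid coordinates returned by `project` could then be passed to a k-anonymity clustering step in place of the original attributes; whether that matches the paper's exact pipeline is not stated in this excerpt.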
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Mohammed, K., Ayesh, A., Boiten, E. (2020). Utility Promises of Self-Organising Maps in Privacy Preserving Data Mining. In: Garcia-Alfaro, J., Navarro-Arribas, G., Herrera-Joancomarti, J. (eds) Data Privacy Management, Cryptocurrencies and Blockchain Technology. DPM CBT 2020. Lecture Notes in Computer Science, vol 12484. Springer, Cham. https://doi.org/10.1007/978-3-030-66172-4_4
DOI: https://doi.org/10.1007/978-3-030-66172-4_4
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-66171-7
Online ISBN: 978-3-030-66172-4
eBook Packages: Computer Science (R0)