Utility Promises of Self-Organising Maps in Privacy Preserving Data Mining

  • Conference paper
Data Privacy Management, Cryptocurrencies and Blockchain Technology (DPM 2020, CBT 2020)

Abstract

Data mining techniques are highly efficient at sifting through big data to extract hidden knowledge and support evidence-based decisions. However, they pose severe threats to individuals’ privacy because they can be exploited to draw inferences about sensitive data. Researchers have proposed several privacy-preserving data mining techniques to address this challenge. One such method extends anonymisation privacy models into data mining processes to enhance both privacy and utility. Several published works in this area have used clustering techniques to enforce anonymisation models on private data: they group the data into clusters using a quality measure and then generalise the data in each group separately to achieve an anonymisation threshold. Although these techniques are highly efficient and practical, guaranteeing an adequate balance between data utility and privacy protection remains a challenge. In addition, existing approaches do not work well with high-dimensional data, since it is difficult to develop good groupings without incurring excessive information loss. Our work aims to overcome these challenges by proposing a hybrid approach that combines self-organising maps with conventional privacy-based clustering algorithms. The main contribution of this paper is to show that dimensionality reduction techniques can improve the anonymisation process by incurring less information loss, thus producing a more desirable balance between privacy and utility.
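
The clustering-then-generalisation idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' algorithm and contains no self-organising map step: the greedy grouping rule, the range-based generalisation, the normalised-range loss measure, and all function names (`k_member_clusters`, `generalise`, `information_loss`) are assumptions chosen for illustration, and the records are assumed to be tuples of numeric quasi-identifiers.

```python
from math import dist


def centroid(cluster):
    """Mean of each attribute over the records in a cluster."""
    return tuple(sum(col) / len(cluster) for col in zip(*cluster))


def k_member_clusters(records, k):
    """Greedily partition records into clusters of at least k members.

    Hypothetical grouping rule: take the first unassigned record as a seed
    and attach its k-1 nearest unassigned neighbours (Euclidean distance).
    """
    if len(records) < k:
        raise ValueError("need at least k records")
    remaining = list(records)
    clusters = []
    while len(remaining) >= k:
        seed = remaining.pop(0)
        remaining.sort(key=lambda r: dist(r, seed))
        clusters.append([seed] + remaining[:k - 1])
        remaining = remaining[k - 1:]
    # Fewer than k records are left: attach each to its nearest cluster.
    for record in remaining:
        nearest = min(clusters, key=lambda c: dist(record, centroid(c)))
        nearest.append(record)
    return clusters


def generalise(cluster):
    """Replace every record in a cluster with the same (min, max) ranges."""
    ranges = tuple((min(col), max(col)) for col in zip(*cluster))
    return [ranges] * len(cluster)


def information_loss(clusters, records):
    """Average normalised range width per record (0 = no loss, 1 = maximal)."""
    # Guard against zero-width attributes to avoid division by zero.
    spans = [(max(col) - min(col)) or 1 for col in zip(*records)]
    total = 0.0
    for cluster in clusters:
        widths = [(max(col) - min(col)) / span
                  for col, span in zip(zip(*cluster), spans)]
        total += len(cluster) * sum(widths) / len(widths)
    return total / len(records)


if __name__ == "__main__":
    # Made-up (age, hours-per-week) records, for illustration only.
    data = [(25, 40), (27, 42), (52, 38), (55, 35), (26, 41)]
    groups = k_member_clusters(data, k=2)
    released = [row for group in groups for row in generalise(group)]
    print(released)
    print(information_loss(groups, data))
```

Every released record shares its generalised values with at least k-1 others, which is the anonymisation threshold the abstract refers to. The loss measure makes the trade-off visible: tighter clusters give narrower ranges and lower loss, which is the quantity a better grouping would aim to reduce.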

Notes

  1. \(D^k\) denotes a k-anonymised version of the original table D.

  2. where \(\|\cdot\|\) is the measure of distance.

Author information

Corresponding author

Correspondence to Kabiru Mohammed.

Copyright information

© 2020 Springer Nature Switzerland AG

About this paper

Cite this paper

Mohammed, K., Ayesh, A., Boiten, E. (2020). Utility Promises of Self-Organising Maps in Privacy Preserving Data Mining. In: Garcia-Alfaro, J., Navarro-Arribas, G., Herrera-Joancomarti, J. (eds) Data Privacy Management, Cryptocurrencies and Blockchain Technology. DPM 2020, CBT 2020. Lecture Notes in Computer Science, vol 12484. Springer, Cham. https://doi.org/10.1007/978-3-030-66172-4_4

  • DOI: https://doi.org/10.1007/978-3-030-66172-4_4

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-66171-7

  • Online ISBN: 978-3-030-66172-4

  • eBook Packages: Computer Science (R0)