Enhancing aggregation phase of microaggregation methods for interval disclosure risk minimization

Reza Mortazavi² &
Saeed Jalili¹

358 Accesses
5 Citations
Explore all metrics

Abstract

Microaggregation is a masking mechanism to protect confidential data in a public release. This technique can produce a k-anonymous dataset where data records are partitioned into groups of at least k members. In each group, a representative centroid is computed by aggregating the group members and is published instead of the original records. In a conventional microaggregation algorithm, the centroids are computed based on simple arithmetic mean of group members. This naïve formulation does not consider the proximity of the published values to the original ones, so an intruder may be able to guess the original values. This paper proposes a disclosure-aware aggregation model, where published values are computed in a given distance from the original ones to attain a more protected and useful published dataset. Empirical results show the superiority of the proposed method in achieving a better trade-off point between disclosure risk and information loss in comparison with other similar anonymization techniques.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Subscribe and save

Springer+ Basic

$34.99 /Month

Get 10 units per month
Download Article/Chapter or eBook
1 Unit = 1 Article or 1 Chapter
Cancel anytime

Buy Now

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

Novel Iterative Min-Max Clustering to Minimize Information Loss in Statistical Disclosure Control

New Multi-dimensional Sorting Based K-Anonymity Microaggregation for Statistical Disclosure Control

K−Means Clustering Microaggregation for Statistical Disclosure Control

Discover the latest articles, news and stories from top researchers in related subjects.

Artificial Intelligence

Notes

The measures are discussed in Sect. 2.3 with more details.
For simplicity, we define $Var(X)=\sigma ^2_X=1/n \sum _{i=1}^{n}(x_i-\mu _{X})^2$ where X is a set of n equally likely values $x_i$ with $\mu _{X}=Mean(X)$.
We review some general purpose $\textit{DR}$ and $\textit{IL}$ measures only for continuous data type, which is addressed in this paper. The variants of the measures for other data types can be found in Hundepool et al. (2012).
It is also known as identity disclosure or re-identification risk.
Interval disclosure is a special case of attribute disclosure for continuous datasets.
The heuristic can be simply extended to consider each attribute separately, however, our experiments show that there is no a significant improvement that justifies this additional cost.
These methods are described in Sect. 3.
Please note that in Table 2, MDAV-DA usually performs better for $k=5$ than other aggregation levels for MDAV-DA.
In fact, we select the trade-off point with closest but greater $\textit{DR}$ than the value of MDAV-DA, to allow a more (potential) decrease of $\textit{IL}$ for the methods.
An illustrative example is presented in Fig. 1.

References

Askari M, Safavi-Naini R, Barker K (2012) An information theoretic privacy and utility measure for data sanitization mechanisms. In: Proceedings of the second ACM conference on data and application security and privacy, ACM, New York, NY CODASPY, pp 283–294
Batet M, Erola A, Sánchez D, Castellà-Roca J (2013) Utility preserving query log anonymization via semantic microaggregation. Inf Sci 242:49–63
Article Google Scholar
Bentley JL (1975) Multidimensional binary search trees used for associative searching. Commun ACM 18(9):509–517
Article MathSciNet MATH Google Scholar
Brand R (2003) Microdata protection through noise addition. In: Domingo-Ferrer J (ed) Inference control in statistical databases., Lecture notes in computer scienceSpringer, Berlin, pp 97–116
Google Scholar
Brand R, Domingo-Ferrer J, Mateo-Sanz J (2002) Reference data sets to test and compare SDC methods for protection of numerical microdata. European Project IST-2000-25069 CASC, http://neon.vb.cbs.nl/casc
Burridge J (2003) Information preserving statistical obfuscation. Stat Comput 13(4):321–327
Article MathSciNet Google Scholar
Charu A, Philip S (2008) Privacy-preserving data mining: models and algorithms. ASPVU, Boston
Google Scholar
Defays D, Nanopoulos P (1993) Panels of enterprises and confidentiality: the small aggregates method. In: Proceedings of the 1992 symposium on design and analysis of longitudinal surveys, pp 195–204
Domingo-Ferrer J, Torra V (2001a) Disclosure protection methods and information loss for microdata. Confidentiality, disclosure and data access: theory and practical applications for statistical agencies, pp 91–110
Domingo-Ferrer J, Torra V (2001b) A quantitative comparison of disclosure control methods for microdata. Confidentiality, disclosure and data access: theory and practical applications for statistical agencies, pp 111–134
Domingo-Ferrer J, Torra V (2005) Ordinal, continuous and heterogeneous k-anonymity through microaggregation. Data Min Knowl Discov 11(2):195–212
Article MathSciNet Google Scholar
Domingo-Ferrer J, Rebollo-Monedero D (2009) Measuring risk and utility of anonymized data using information theory. In: Proceedings of the EDBT/ICDT Workshops, ACM, New York, NY, EDBT/ICDT, pp 126–130
Domingo-Ferrer J, Mateo-Sanz JM, Torra V (2001) Comparing SDC methods for microdata on the basis of information loss and disclosure risk. In: Pre-proceedings of ETK-NTTS, vol 2, pp 807–826
Domingo-Ferrer J, Martínez-Ballesté A, Mateo-Sanz JM, Sebé F (2006a) Efficient multivariate data-oriented microaggregation. VLDB J 15(4):355–369
Article Google Scholar
Domingo-Ferrer J, Solanas A, Martinez-Balleste A (2006b) Privacy in statistical databases: k-anonymity through microaggregation. In: Proceedings of international conference on granular computing, IEEE, pp 774–777
Domingo-Ferrer J, Sebé F, Solanas A (2008) An anonymity model achievable via microaggregation. In: Secure data management, Springer, Heidelberg, pp 209–218
Drud AS (1994) CONOPT a large-scale GRG code. ORSA J Comput 6(2):207–216
Article MATH Google Scholar
Fayyoumi E, Oommen BJ (2010) A survey on statistical disclosure control and micro-aggregation techniques for secure statistical databases. Softw Pract Exp 40(12):1161–1188
Article Google Scholar
Hansen S, Mukherjee S (2003) A polynomial algorithm for optimal univariate microaggregation. IEEE Trans Knowl Data Eng 15(4):1043–1044
Article Google Scholar
Heaton B (2012) New record ordering heuristics for multivariate microaggregation. PhD thesis, Nova Southeastern University
Herranz J, Matwin S, Nin J, Torra V (2010) Classifying data from protected statistical datasets. Comput Secur 29(8):875–890
Article Google Scholar
Herranz J, Nin J, Solé M (2012a) Kd-trees and the real disclosure risks of large statistical databases. Inf Fusion 13(4):260–273
Article Google Scholar
Herranz J, Nin J, Solé M (2012b) More hybrid and secure protection of statistical data sets. IEEE Trans Dependable Secur Comput 9(5):727–740
Google Scholar
Hundepool A, Domingo-Ferrer J, Franconi L, Giessing S, Nordholt ES, Spicer K, De Wolf PP (2006) Cenex SDC handbook on statistical disclosure control, version 1.01
Hundepool A, Domingo-Ferrer J, Franconi L, Giessing S, Nordholt ES, Spicer K, De Wolf PP (2012) Statistical disclosure control. Wiley, Chichester
Book Google Scholar
Kim JJ (1986) A method for limiting disclosure in microdata based on random noise and transformation. In: Proceedings of the ASA section on survey research methodology, pp 303–308
Laszlo M, Mukherjee S (2005) Minimum spanning tree partitioning algorithm for microaggregation. IEEE Trans Knowl Data Eng 17(7):902–911
Article Google Scholar
Li Y, Zhu S, Wang L, Jajodia S (2002) A privacy-enhanced microaggregation method. In: Eiter T, Schewe KD (eds) Foundations of Information and Knowledge Systems., Lecture notes in computer scienceSpringer, Berlin, pp 148–159
Chapter Google Scholar
Lin JL, Chang PC, Liu JYC, Wen TH (2010) Comparison of microaggregation approaches on anonymized data quality. Expert Syst Appl 37(12):8161–8165
Article Google Scholar
López A (2011) Effect of microaggregation on regression results: an application to Spanish innovation data. Emprical Econ Lett 10(12):1265–1272
Machanavajjhala A, Kifer D, Gehrke J, Venkitasubramaniam M (2007) L-diversity: privacy beyond k-anonymity. ACM Trans Knowl Discov From Data (TKDD) 1(1):1–52
Article Google Scholar
Mateo-Sanz J, Sebé F, Domingo-Ferrer J (2004) Outlier protection in continuous microdata masking. In: Domingo-Ferrer J, Torra V (eds) Privacy in statistical databases., Lecture notes in computer scienceSpringer, Berlin, pp 201–215
Chapter Google Scholar
Mateo-Sanz J, Domingo-Ferrer J, Sebé F (2005) Probabilistic information loss measures in confidentiality protection of continuous microdata. Data Min Knowl Discov 11(2):181–193
Article MathSciNet Google Scholar
Moore Jr RA (1996) Controlled data-swapping techniques for masking public use microdata sets. Tech. Rep. 96-04, Statistical Research Division Report Series, US Bureau of the Census, Washington D.C
Mortazavi R, Jalili S (2014) Fast data-oriented microaggregation algorithm for large numerical datasets. Knowl Based Syst 67:195–205
Article Google Scholar
Mortazavi R, Jalili S (2015) Preference-based anonymization of numerical datasets by multi-objective microaggregation. Inf Fusion 25:85–104
Article Google Scholar
Mortazavi R, Jalili S, Gohargazi H (2013) Multivariate microaggregation by iterative optimization. Appl Intell 39(3):529–544
Article Google Scholar
Navarro-Arribas G, Torra V (2009) Towards microaggregation of log files for Web usage mining in B2C e-commerce. In: Fuzzy information processing society (NAFIPS), IEEE, pp 1–6
Navarro-Arribas G, Torra V (2012) Information fusion in data privacy: a survey. Inf Fusion 13(4):235–244
Article Google Scholar
Nin J, Herranz J, Torra V (2008) On the disclosure risk of multivariate microaggregation. Data Knowl Eng 67(3):399–412
Article Google Scholar
Oganian A, Domingo-Ferrer J (2001) On the complexity of optimal microaggregation for statistical disclosure control. Stat J U N Econ Com Eur 18(4):345–354
Google Scholar
Oganian A, Karr AF (2006) Combinations of SDC methods for microdata protection. In: Privacy in Statistical Databases, Springer, Heidelberg, pp 102–113
Pagliuca D, Seri G (1999) Some results of individual ranking method on the system of enterprise accounts annual survey. Report, Esprit SDC Project, Deliverable MI-3 D
Schmid M, Schneeweiss H, Küchenhoff H (2007) Estimation of a linear regression under microaggregation with the response variable as a sorting variable. Statis Neerl 61(4):407–431
Article MathSciNet MATH Google Scholar
Solanas A (2008) Privacy protection with genetic algorithms. In: Yang A, Shan Y, Bui L (eds) Success in evolutionary computation, studies in computational intelligence. Springer, Berlin, pp 215–237
Chapter Google Scholar
Solanas A, Sebé F, Domingo-Ferrer J (2008) Micro-aggregation-based heuristics for p-sensitive k-anonymity: one step beyond. In: Proceedings of the 2008 international workshop on privacy and anonymity in information society, ACM, pp 61–69
Solé M, Muntés-Mulero V, Nin J (2012) Efficient microaggregation techniques for large numerical data volumes. Int J Inf Secur 11(4):253–267
Article Google Scholar
Sweeney L (2002) k-Anonymity: a model for protecting privacy. Int J Uncertain Fuzziness Knowl Based Syst 10(5):557–570
Article MathSciNet MATH Google Scholar
Torra V (2005) Fuzzy c-means for fuzzy hierarchical clustering. In: The 14th IEEE international conference on fuzzy systems, IEEE, pp 646–651
Truta TM, Vinay B (2006) Privacy protection: p-sensitive k-anonymity property. In: Proceedings 22nd international conference on data engineering workshops, IEEE, pp 94–94
Willenborg LC, De Waal T (2001) Elements of statistical disclosure control, vol 155. Springer, New York
MATH Google Scholar
Winkler WE (2004) Re-identification methods for masked microdata. In: Privacy in statistical databases, Springer, Berlin, pp 216–230
Yancey W, Winkler W, Creecy R (2002) Disclosure risk assessment in perturbative microdata protection. In: Domingo-Ferrer J (ed) Inference control in statistical databases., Lecture notes in computer scienceSpringer, Berlin, pp 135–152
Chapter Google Scholar

Download references

Acknowledgments

This research is partially supported by ITRC (Iran Telecommunication Research Center) under Contract No. 12200/500.

Author information

Authors and Affiliations

Computer Engineering Department, Tarbiat Modares University, Tehran, Iran
Saeed Jalili
School of Engineering, Damghan University, Damghan, Iran
Reza Mortazavi

Authors

Reza Mortazavi
View author publications
You can also search for this author in PubMed Google Scholar
Saeed Jalili
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Saeed Jalili.

Additional information

Responsible editor: Kristian Kersting.

Rights and permissions

Reprints and permissions

About this article

Cite this article

Mortazavi, R., Jalili, S. Enhancing aggregation phase of microaggregation methods for interval disclosure risk minimization. Data Min Knowl Disc 30, 605–639 (2016). https://doi.org/10.1007/s10618-015-0432-z

Download citation

Received: 22 July 2014
Accepted: 14 July 2015
Published: 29 July 2015
Issue Date: May 2016
DOI: https://doi.org/10.1007/s10618-015-0432-z

Enhancing aggregation phase of microaggregation methods for interval disclosure risk minimization

Abstract

Access this article

Subscribe and save

Buy Now

Similar content being viewed by others

Novel Iterative Min-Max Clustering to Minimize Information Loss in Statistical Disclosure Control

New Multi-dimensional Sorting Based K-Anonymity Microaggregation for Statistical Disclosure Control

K−Means Clustering Microaggregation for Statistical Disclosure Control

Notes

References

Acknowledgments

Author information

Authors and Affiliations

Corresponding author

Additional information

Rights and permissions

About this article

Cite this article

Keywords

Subscribe and save

Buy Now

Navigation

Enhancing aggregation phase of microaggregation methods for interval disclosure risk minimization

Abstract

Access this article

Subscribe and save

Buy Now

Similar content being viewed by others

Novel Iterative Min-Max Clustering to Minimize Information Loss in Statistical Disclosure Control

New Multi-dimensional Sorting Based K-Anonymity Microaggregation for Statistical Disclosure Control

K−Means Clustering Microaggregation for Statistical Disclosure Control

Explore related subjects

Notes

References

Acknowledgments

Author information

Authors and Affiliations

Corresponding author

Additional information

Rights and permissions

About this article

Cite this article

Share this article

Keywords

Subscribe and save

Buy Now

Search

Navigation