Abstract
This work addresses the problem of Statistical Disclosure Control (SDC) in an electronic voting scenario. Electoral datasets that link voting choices to voters' demographic profiles can be used to perform fine-grained analysis of citizen opinion. However, voters' privacy must be protected. Traditional SDC techniques study methods to meet predefined privacy criteria, assuming a trustworthy owner who knows the values of the confidential attributes. Unfortunately, this assumption cannot be made in our scenario, since the dataset contains secret voting choices, which remain unknown until they are properly anonymized and decrypted. We propose a protocol and a system architecture to perform SDC on datasets with encrypted attributes while minimizing the amount of information an attacker can learn about the secret data. The protocol enables the release of electoral datasets, which allow governments and third parties to gain more insight into citizen opinion and to improve decision-making processes and public services.
References
Bernhard, D., Cortier, V., Galindo, D., Pereira, O., Warinschi, B.: SoK: a comprehensive analysis of game-based ballot privacy definitions. In: Proceedings of the IEEE Symposium on Security and Privacy (S&P), San Jose, CA, pp. 499–516, May 2015
Bernhard, D., Warinschi, B.: Cryptographic voting — a gentle introduction. In: Aldini, A., Lopez, J., Martinelli, F. (eds.) FOSAD 2012-2013. LNCS, vol. 8604, pp. 167–211. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-10082-1_7
Chaum, D.: Untraceable electronic mail, return addresses, and digital pseudonyms. Commun. ACM 24(2), 84–88 (1981)
Cramer, R., Gennaro, R., Schoenmakers, B.: A secure and optimally efficient multi-authority election scheme. In: Fumy, W. (ed.) EUROCRYPT 1997. LNCS, vol. 1233, pp. 103–118. Springer, Heidelberg (1997). https://doi.org/10.1007/3-540-69053-0_9
Delaune, S., Kremer, S., Ryan, M.D.: Verifying privacy-type properties of electronic voting protocols. J. Comput. Secur. 17(4), 435–487 (2009)
Domingo-Ferrer, J., Mateo-Sanz, J.M.: Practical data-oriented microaggregation for statistical disclosure control. IEEE Trans. Knowl. Data Eng. 14(1), 189–201 (2002)
Domingo-Ferrer, J., Torra, V.: Ordinal, continuous and heterogeneous \(k\)-anonymity through microaggregation. Data Min. Knowl. Discov. 11(2), 195–212 (2005)
Fujioka, A., Okamoto, T., Ohta, K.: A practical secret voting scheme for large scale elections. In: Seberry, J., Zheng, Y. (eds.) AUSCRYPT 1992. LNCS, vol. 718, pp. 244–251. Springer, Heidelberg (1993). https://doi.org/10.1007/3-540-57220-1_66
Hoeffding, W.: Probability inequalities for sums of bounded random variables. J. Am. Stat. Assoc. 58(301), 13–30 (1963)
Hundepool, A., et al.: Statistical Disclosure Control. Surv. Method. Wiley, Chichester (2012)
Jonker, H., Mauw, S., Pang, J.: Privacy and verifiability in voting systems: methods, developments and trends. Comput. Sci. Rev. 10, 1–30 (2013)
Li, N., Li, T., Venkatasubramanian, S.: \(t\)-closeness: privacy beyond \(k\)-anonymity and \(l\)-diversity. In: Proceedings of the IEEE International Conference on Data Engineering (ICDE), Istanbul, Turkey, pp. 106–115, April 2007
Machanavajjhala, A., Gehrke, J., Kifer, D., Venkitasubramaniam, M.: \(l\)-diversity: privacy beyond \(k\)-anonymity. In: Proceedings of the IEEE International Conference on Data Engineering (ICDE), p. 24, April 2006
Neff, C.A.: A verifiable secret shuffle and its application to E-voting. In: Proceedings of the ACM Conference on Computer and Communications Security (CCS), Philadelphia, PA, USA, pp. 116–125 (2001)
Oganian, A., Domingo-Ferrer, J.: On the complexity of optimal microaggregation for statistical disclosure control. UNECE Stat. J. 18(4), 345–354 (2001)
Rebollo-Monedero, D., Forné, J., Soriano, M.: \(p\)-probabilistic \(k\)-anonymous microaggregation for the anonymization of surveys with uncertain participation (2016, submitted)
Rebollo-Monedero, D., Forné, J., Soriano, M., Puiggalí-Allepuz, J.: \(k\)-anonymous microaggregation with preservation of statistical dependence. Inf. Sci. 342(1), 1–23 (2016)
Samarati, P., Sweeney, L.: Protecting privacy when disclosing information: k-anonymity and its enforcement through generalization and suppression. Technical report, Computer Science Laboratory, SRI International (1998)
Truta, T.M., Vinay, B.: Privacy protection: \(p\)-sensitive \(k\)-anonymity property. In: Proceedings of the International Workshop on Privacy Data Management (PDM), p. 94. IEEE Computer Society (2006)
Acknowledgments
The authors would like to thank Xavier Alsina and Alexey Akimov for their collaboration and helpful comments. This work has been partly supported by the Spanish Ministry of Industry, Energy and Tourism (MINETUR) through the “Acción Estratégica Economía y Sociedad Digital (AEESD)” funding plan, through Project ref. TSI-100202-2013-23 “Data-Distortion Framework (DDF).”
Appendix: Estimation of the Cluster Size
In order to execute the microaggregation algorithm in Step 3, an appropriate value of the cluster size must be estimated using the SDC_KEst algorithm.
Our procedure assumes that the votes are cast under a common voting profile \(\pi =(\pi _1,\dots ,\pi _m)\). That is, each voter chooses option i with probability \(\pi _i\). This information can be provided, e.g., by the Statistical Data Provider in the form of surveys or exit polls. We acknowledge that this is a quite strong assumption, but it is a necessary first step toward the problem nonetheless.
Let \(X_k\) be the m-ary vector r.v. that counts the number of votes cast for each of the m voting options in a cluster of size k. Therefore, \(X_k\) follows a multinomial distribution with parameters k and \(\pi \). Additionally, let \(W_k={{\mathrm{wt}}}(X_k)\) denote the Hamming weight of \(X_k\), i.e., the number of nonzero positions in \(X_k\). Then, a cluster fails to satisfy the p-sensitive privacy property whenever \(W_k< p\). This event occurs with probability
\[\rho (k)=\Pr \{W_k<p\}=\sum _{x}\mathbb {1}\{{{\mathrm{wt}}}(x)<p\}\,\frac{k!}{x_1!\cdots x_m!}\,\pi _1^{x_1}\cdots \pi _m^{x_m},\]
where the sum runs over all m-ary vectors x of nonnegative integers whose components sum to k. It is easy to see that \(\Pr \{W_k \ge p\}\ge \Pr \{W_{k-1} \ge p\}\), which means that \(\rho (k)\) is a nonincreasing function of k, as intuition suggests.
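As a quick sanity check, consider the special case \(p=2\), an illustrative choice that is not taken from the protocol description. Reading \(\rho (k)\) as the failure probability \(\Pr \{W_k<p\}\), a cluster then fails only when all k votes fall on the same option, so
\[\rho (k)=\sum _{i=1}^{m}\pi _i^{k},\]
which cannot increase with k. For a hypothetical profile \(\pi =(0.5,0.3,0.2)\), this gives \(\rho (2)=0.25+0.09+0.04=0.38\) and \(\rho (5)=0.5^5+0.3^5+0.2^5\approx 0.034\).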
Our aim is to estimate a cluster size \(k_\text {est}\) so that, with probability at least \(1-\epsilon \), a proportion of at least \(1-\delta \) of the clusters satisfies the p-sensitive k-anonymity privacy criterion, for sufficiently small values of \(\epsilon \) and \(\delta >\rho (k)\). For a total of \(N=n/k\) clusters, let \(N^*\) be the number of clusters satisfying the privacy criterion. Hence, using the Chernoff bound [9], we see that
\[\Pr \{N^*\ge (1-\delta )N\}\ge 1-e^{-N\,{{\mathrm{D}}}(\delta \Vert \rho (k))},\]
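where \({{\mathrm{D}}}(\delta \Vert \rho (k))\) denotes the Kullback-Leibler divergence, computed with natural logarithms, between two Bernoulli distributed r.v.’s with parameters \(\delta \) and \(\rho (k)\), respectively. Therefore, the SDC_KEst algorithm obtains an estimate of the cluster size as
\[k_\text {est}=\min \left\{ k:\ \rho (k)<\delta ,\ e^{-(n/k)\,{{\mathrm{D}}}(\delta \Vert \rho (k))}\le \epsilon \right\} .\]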
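The estimation can be sketched in a few lines of Python. This is a minimal illustration and not the SDC_KEst implementation itself: it assumes that the estimate is taken as the smallest k with \(\rho (k)<\delta \) whose Chernoff bound \(e^{-(n/k)\,{{\mathrm{D}}}(\delta \Vert \rho (k))}\) does not exceed \(\epsilon \), that the divergence uses natural logarithms, and that m is small enough for inclusion-exclusion over subsets of options to be affordable. The function names (rho, kl_bernoulli, estimate_cluster_size) and the k_max parameter are hypothetical.

```python
from itertools import combinations
from math import exp, log


def rho(k, pi, p):
    """Pr{W_k < p}: a cluster of k votes contains fewer than p distinct options.

    Uses Pr{support = S} = sum over nonempty T in S of (-1)^(|S|-|T|) * pi(T)^k,
    summed over all supports S with 1 <= |S| < p (inclusion-exclusion).
    """
    m = len(pi)
    total = 0.0
    for size in range(1, min(p, m + 1)):
        for support in combinations(range(m), size):
            for t in range(1, size + 1):
                sign = (-1) ** (size - t)
                for subset in combinations(support, t):
                    total += sign * sum(pi[i] for i in subset) ** k
    # Clamp tiny negative or >1 values caused by floating-point cancellation.
    return min(1.0, max(0.0, total))


def kl_bernoulli(a, b):
    """Kullback-Leibler divergence D(a || b) between Bernoulli(a) and Bernoulli(b), in nats."""
    if b <= 0.0:
        return float("inf")  # rho underflowed to 0: the bound becomes 0
    return a * log(a / b) + (1.0 - a) * log((1.0 - a) / (1.0 - b))


def estimate_cluster_size(n, pi, p, eps, delta, k_max=None):
    """Smallest k with rho(k) < delta and exp(-(n/k) * D(delta || rho(k))) <= eps.

    Returns None if no k up to k_max (default n) qualifies.
    """
    k_max = n if k_max is None else k_max
    for k in range(p, k_max + 1):  # fewer than p votes can never show p options
        r = rho(k, pi, p)
        if r >= delta:
            continue  # the Chernoff bound requires delta > rho(k)
        if exp(-(n / k) * kl_bernoulli(delta, r)) <= eps:
            return k
    return None


# Illustrative values only; none of them come from the paper.
profile = (0.5, 0.3, 0.2)
k_est = estimate_cluster_size(n=10000, pi=profile, p=2, eps=0.01, delta=0.05)
```

With these illustrative inputs, sizes 2 to 4 are rejected because \(\rho (k)\ge \delta \), and the search stops at k = 5. In practice the voting profile would come from the surveys or exit polls mentioned above, and the inclusion-exclusion step would need replacing for elections with many options.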
Copyright information
© 2018 Springer Nature Switzerland AG
About this paper
Cite this paper
Blasco, P., Moreira, J., Puiggalí, J., Cucurull, J., Rebollo-Monedero, D. (2018). Improving Opinion Analysis Through Statistical Disclosure Control in eVoting Scenarios. In: Kő, A., Francesconi, E. (eds) Electronic Government and the Information Systems Perspective. EGOVIS 2018. Lecture Notes in Computer Science(), vol 11032. Springer, Cham. https://doi.org/10.1007/978-3-319-98349-3_4
DOI: https://doi.org/10.1007/978-3-319-98349-3_4
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-98348-6
Online ISBN: 978-3-319-98349-3
eBook Packages: Computer Science (R0)