Abstract
The support vector machine (SVM) is a popular classification algorithm. Since training the SVM is very time-consuming, outsourcing the computations of solving the SVM to external service providers benefits the data owner who is not familiar with the techniques of the SVM or has only limited computing resources. In outsourcing, the data privacy is a critical concern for some legal or commercial reasons since there may be sensitive information contained in the data. Existing privacy-preserving data mining works are either not appropriate to outsourcing the SVM or weak in security. In this paper, we propose a scheme for privacy-preserving outsourcing of the SVM, which perturbs the data by random linear transformation. The service provider solves the SVM from the perturbed data without knowing the actual content of the data, and the generated SVM classifier is also perturbed, which can only be recovered by the data owner. Both the privacy of data and generated classifiers are protected. The proposed scheme is stronger in security than existing techniques, and incurs very little additional communication and computation cost. The experimental results show that the proposed scheme imposes very little overhead on the data owner, and the classification accuracy is similar to a normal SVM classifier.






Similar content being viewed by others
Notes
It is not necessary to put the whole matrix \(M\) in the main memory. The computation can be decomposed to \(M \mathbf {x}= x_1 M_{:,1} + \cdots + x_n M_{:,n} \), where \(M_{:,i}\) is the \(i\)-th column of \(M\).
References
Standard for privacy of individually identifiable health information, 2001. [Online]. http://www.hhs.gov/ocr/privacy/index.html
Aggarwal CC, Yu PS (2004) A condensation approach to privacy preserving data mining. In: Proceedings of the 9th international conference on extending database technology (EDBT)
Agrawal D, Aggarwal CC (2001) On the design and quantification of privacy preserving data mining algorithms. In: Proceedings of the 20th ACM SIGMOD-SIGACT-SIGART symposium on principles of database systems (PODS)
Agrawal R, Kiernan J, Srikant R, Xu Y (2004) Order preserving encryption for numeric data. In: Proceedings of the 2004 ACM SIGMOD international conference on management of data (SIGMOD)
Agrawal R, Srikant R (2000) Privacy preserving data mining. In: Proceedings of the 2000 ACM SIGMOD international conference on management of data (SIGMOD)
Bache K, Lichman M (2013) UCI machine learning repository. http://archive.ics.uci.edu/ml
Chang C-C, Lin C-J (2001) LIBSVM: a library for support vector machines. Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm
Chen K, Liu L (2005) Privacy preserving data classification with rotation perturbation. In: Proceedings of the 5th IEEE international conference on data mining (ICDM)
Chen K, Sun G, Liu L (2007) Towards attack-resilient geometric data perturbation. In: Proceedings of the 7th SIAM international conference on data mining (SDM)
Chen M-S, Han J, Yu PS (1996) Data mining: an overview from database perspective. IEEE Trans Knowl Data Eng 8(6):866–883
Evfimievski A, Srikant R, Agrawal R, Gehrke J (2002) Privacy preserving mining of association rules. In: Proceedings of the 8th ACM SIGKDD international conference on knowledge discovery and data mining (KDD)
Gentry C (2009) Fully homomorphic encryption using ideal lattices. In: Proceedings of the 41st annual ACM symposium on theory of computing (STOC)
Gentry C (2010) Computing arbitrary functions of encrypted data. Commun ACM 53(3):97–105
Goldwasser S, Micali S (1984) Probabilistic encryption. J Comput Syst Sci 28(2):270–299
Hacıgümüş H, Iyer B, Li C, Mehrotra S (2002) Executing SQL over encrypted data in the database-service-provider model. In: Proceedings of the 2002 ACM SIGMOD international conference on management of data (SIGMOD)
Hsu C-W, Chang C-C, Lin C-J (2003) A practical guide to support vector classification. Department of Computer Science, National Taiwan University. Technical report. http://www.csie.ntu.edu.tw/~cjlin/papers/guide/guide.pdf
Inan A, Kantarcioglu M, Bertino E (2009) Using anonymized data for classification. In: Proceedings of the 25th IEEE international conference on data engineering (ICDE)
Kantarcioglu M, Clifton C (2004) Privacy-preserving distributed mining of association rules on horizontally partitioned data. IEEE Trans Knowl Data Eng 16(9):1026–1037
Laur S, Lipmaa H, Mielikäinen T (2006) Cryptographically private support vector machines. In: Proceedings of the 12th ACM SIGKDD international conference on knowledge discovery and data mining (KDD)
Lee Y-J, Huang S-Y (2007) Reduced support vector machines: a statistical theory. IEEE Trans Neural Netw 18(1):1–13
Lee Y-J, Mangasarian OL (2001) RSVM: reduced support vector machines. In: Proceedings of the 1st SIAM international conference on data mining (SDM)
Lee Y-J, Mangasarian OL (2001) SSVM: a smooth support vector machine for classification. Comput Optim Appl 20(1):5–22
Lin K-M, Lin C-J (2003) A study on reduced support vector machines. IEEE Trans Neural Netw 14(6):1449–1459
Lin K-P, Chen M-S (2008) Releasing the SVM classifier with privacy-preservation. In: Proceedings of the 8th IEEE international conference on data mining (ICDM)
Lindell Y, Pinkas B (2002) Privacy preserving data mining. J Cryptol 15:177–206
Machanavajjhala A, Gehrke J, Kifer D, Venkitasubramaniam M (2006) \(l\)-Diversity: privacy beyond \(k\)-anonymity. In: Proceedings of the 22nd IEEE international conference on data engineering (ICDE), pp 24–24
Mangasarian OL, Wild EW, Fung GM (2008) Privacy-preserving classification of vertically partitioned data via random kernels. ACM Trans Knowl Discov Data 2(3):12:1–12:16
Mangasarian OL, Wild T (2007) Privacy-preserving classification of horizontally partitioned data via random kernels. 07-03, Data Mining Institute, Computer Sciences Department, University of Wisconsin–Madison, technical report
Paillier P (1999) Public-key cryptosystems based on composite degree residuosity classes. In: Advances in cryptography—EUROCRYPT’99, vol 1592 of lecture notes in computer science. Springer, Berlin, pp 223–238
Pinkas B (2002) Cryptographic techniques for privacy-preserving data mining. ACM SIGKDD Explor Newsl 4(2):12–19
Platt JC (1998) Fast training of support vector machines using sequential minimal optimization. In: Schölkopf B, Burges CJC, Mika S (eds) Advances in kernel methods: support vector learning. MIT Press, Cambridge, pp 185–208
Prokhorov D (2001) IJCNN 2001 neural network competition. Slide presentation in IJCNN’01, Ford Research Laboratory, technical report
Samarati P (2001) Protecting respondents’ identities in microdata release. IEEE Trans Knowl Data Eng 13(6):1010–1027
Schölkopf B, Smola AJ (2002) Learning with kernels: support vector machines, regularization, optimization, and beyond. MIT Press, Cambridge
Sweeney L (2002) Achieving \(k\)-anonymity privacy protection using generalization and suppression. Int J Uncertain Fuzziness Knowl Based Syst 10(5):571–588
Sweeney L (2002) \(k\)-Anonymity: a model for protecting privacy. Int J Uncertain Fuzziness Knowl Based Syst 10(5):557–570
Vaidya J, Clifton C (2002) Privacy-preserving association rule mining in vertically partitioned data. In: Proceedings of the 8th ACM SIGKDD international conference on knowledge discovery and data mining (KDD)
Vaidya J, Yu H, Jiang X (2008) Privacy-preserving SVM classification. Knowl Inf Syst 14:161–178
Vapnik VN (1998) Statistical learning theory. Wiley, Hoboken
Wong WK, Cheung DW, Hung E, Kao B, Mamoulis N (2007) Security in outsourcing of association rule mining. In: Proceedings of the 33rd international conference on very large data bases (VLDB)
Wong WK, Cheung DW, Kao B, Mamoulis N (2009) Secure kNN computation on encrypted databases. In: Proceedings of the 35th SIGMOD international conference on management of data (SIGMOD)
Yu H, Jiang X, Vaidya J (2006) Privacy-preserving SVM using nonlinear kernels on horizontally partitioned data. In: Proceedings of the 2006 ACM symposium on applied computing (SAC)
Yu H, Vaidya J, Jiang X (2006) Privacy-preserving SVM classification on vertically partitioned data. In: Proceedings of the 10th Pacific-Asia conference on knowledge discovery and data mining (PAKDD)
Acknowledgments
This work was supported in part by the National Science Council, Taiwan, under NSC 101-2410-H-110-081-MY2.
Author information
Authors and Affiliations
Corresponding author
Rights and permissions
About this article
Cite this article
Lin, KP., Chang, YW. & Chen, MS. Secure support vector machines outsourcing with random linear transformation. Knowl Inf Syst 44, 147–176 (2015). https://doi.org/10.1007/s10115-014-0751-1
Received:
Revised:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s10115-014-0751-1