
CN114626086B - Data privacy protection method based on negative database and deep learning based on ciphertext - Google Patents


Info

Publication number
CN114626086B
CN114626086B (application CN202210159783.2A)
Authority
CN
China
Prior art keywords
data
negative database
ndb
privacy protection
deep learning
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202210159783.2A
Other languages
Chinese (zh)
Other versions
CN114626086A (en)
Inventor
赵冬冬
张平川
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Wuhan University of Technology WUT
Original Assignee
Wuhan University of Technology WUT
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Wuhan University of Technology WUT filed Critical Wuhan University of Technology WUT
Priority to CN202210159783.2A priority Critical patent/CN114626086B/en
Publication of CN114626086A publication Critical patent/CN114626086A/en
Application granted granted Critical
Publication of CN114626086B publication Critical patent/CN114626086B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60Protecting data
    • G06F21/62Protecting access to data via a platform, e.g. using keys or access control rules
    • G06F21/6218Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database
    • G06F21/6245Protecting personal data, e.g. for financial or medical purposes
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60Protecting data
    • G06F21/602Providing cryptographic facilities or services
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/084Backpropagation, e.g. using gradient descent
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L9/00Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols
    • H04L9/08Key distribution or management, e.g. generation, sharing or updating, of cryptographic keys or passwords
    • H04L9/0861Generation of secret information including derivation or calculation of cryptographic keys or passwords
    • H04L9/0869Generation of secret information including derivation or calculation of cryptographic keys or passwords involving random numbers or seeds
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L9/00Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols
    • H04L9/14Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols using a plurality of keys or algorithms
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D30/00Reducing energy consumption in communication networks
    • Y02D30/50Reducing energy consumption in communication networks in wire-line communication networks, e.g. low power modes or reduced link rate

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Security & Cryptography (AREA)
  • General Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Bioethics (AREA)
  • Software Systems (AREA)
  • Signal Processing (AREA)
  • Computer Hardware Design (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Evolutionary Computation (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Computational Linguistics (AREA)
  • Mathematical Physics (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Medical Informatics (AREA)
  • Databases & Information Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract


The present invention discloses a data privacy protection method based on a ciphertext-based negative database and deep learning, which adopts a data privacy protection model to protect data privacy. First, the original data is converted into a binary string and XOR-encrypted with a randomly generated key; next, a negative database generation algorithm is selected to generate a corresponding negative database for the confidential data, and a sketch of the negative database is extracted; then an activation-function estimation formula based on the negative database is used to carry out the forward-propagation and back-propagation of the neural network until it converges, yielding a trained data privacy protection model. The invention solves the problem that differential privacy struggles to balance privacy against data utility, and the problem that methods such as homomorphic encryption incur excessive computational cost; it has strong robustness and improves both the efficiency and the accuracy of the privacy-preserving deep learning process.

Description

Data privacy protection method based on a ciphertext-based negative database and deep learning
Technical Field
The invention belongs to the technical field of data privacy protection and relates to a novel data privacy protection method, in particular to a data privacy protection method based on a ciphertext-based negative database and deep learning.
Background
In recent years, deep learning has received extensive attention from academia and industry because of its ability to mine useful knowledge from large-scale data. It has been applied in many fields and has produced remarkable breakthroughs. However, because large amounts of data are involved, privacy has become more important than ever, and comparatively few studies have focused on it. For example, medical data may include private patient information such as diseases, family history, and DNA sequences, while financial institutions such as banks store sensitive information about many customers; once such data is leaked and analyzed, the losses can be immeasurable and may even threaten personal safety. Therefore, while these technologies benefit mankind and accelerate social development, more attention should be paid to the potential problem of privacy disclosure.
In recent years, scholars have proposed several privacy-preserving deep learning methods; however, most of them suffer from reduced efficiency or reduced accuracy. In particular, methods based on differential privacy protect data privacy by adding noise, which affects the accuracy and practicality of the data. Methods based on homomorphic encryption generally incur high computational costs, and their efficiency can become intolerable in scenarios with large-scale data.
A negative database (NDB) is a new form of information representation inspired by the negative selection mechanism of the artificial immune system. An NDB stores information in the complement of the database (DB) to achieve privacy protection, and, like a conventional database, it supports insertion, deletion, update, and selection operations. Reversing a negative database to recover the raw data has been proven to be an NP-hard problem. Furthermore, an NDB supports coarse-grained distance estimation. These characteristics make it suitable for many privacy-related areas, such as password authentication, information hiding, biometric authentication, and data mining.
Although effective privacy protection can be achieved by combining negative databases with deep learning, the degree of privacy protection can still be improved when parameter settings are extreme. The exclusive-or (XOR) operation is applied in various scenarios as a simple and efficient operation; if the data is XORed with a randomly generated binary string before being converted into a negative database, its privacy is further protected.
Disclosure of Invention
To solve the above technical problems, the invention provides a data privacy protection method and system based on a ciphertext-based negative database combined with a deep learning algorithm.
The technical scheme adopted by the invention is as follows: a data privacy protection method based on a ciphertext-based negative database and deep learning, which adopts a data privacy protection model to protect data privacy;
The data privacy protection model is obtained through the following steps:
Step 1: preprocess the original data and convert it into binary strings X = {x1, …, xn};
Step 2: XOR-encrypt the data processed in step 1 with a key K of specified length to obtain the encrypted data X' = {x1', …, xn'};
Step 3: select a negative database generation algorithm and generate the corresponding negative databases NDB = {NDB1, …, NDBn} from the data X' = {x1', …, xn'} encrypted in step 2;
Step 4: extract the sketch S = {S1, …, Sn} of the negative databases from step 3, where Si is the sketch of NDBi;
Step 5: based on the sketch S, complete the activation-function estimation based on the negative database, and train the deep learning network until it converges, obtaining a trained data privacy protection model.
Aiming at the characteristics of negative databases and the XOR operation, the invention provides a ciphertext-based negative database privacy protection method, which solves the problem that differential privacy struggles to balance privacy against data utility and the problem that methods such as homomorphic encryption incur excessive computational cost; it has strong robustness and improves both the efficiency and the accuracy of the privacy-preserving deep learning process.
Drawings
FIG. 1 is a flow chart of a method according to an embodiment of the present invention;
FIG. 2 is a method application scenario diagram of an embodiment of the present invention;
FIG. 3 is a diagram illustrating an example of XOR encryption in accordance with an embodiment of the present invention;
FIG. 4 is a graph comparing the original activation function estimation with the embodiment of the present invention.
Detailed Description
To facilitate the understanding and practice of the invention, it is described in further detail below with reference to the drawings and embodiments, it being understood that the embodiments described herein are for illustration and explanation only and are not intended to limit the invention.
Referring to fig. 1 and fig. 2, the invention provides a data privacy protection method based on a ciphertext-based negative database and deep learning, which adopts a data privacy protection model to protect data privacy;
The data privacy protection model of this embodiment is obtained through the following steps:
Step 1: preprocessing the original data and converting the original data into a binary string X= { X 1…Xn };
Step 2: performing exclusive or encryption on the secret key K with the specified length and the data processed in the step 1 to obtain encrypted data X '= { X 1'…Xn' };
In this embodiment, the original data x= { X 1…Xn } is encrypted as X '= { X 1'…Xn' } using a randomly generated key K;
The key K is a binary string of len xor, and if x=x k, the j-th bit of the i-th attribute of X will be xored with the (i×l+j)% len xor bit of the key K Where L represents the length of the attribute. Fig. 3 shows an exclusive-or encryption case.
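The XOR pre-encryption step described above can be sketched as follows. This is an illustrative reading of the description, not the patent's code: the function name is an assumption, and the (i×L + j) % len(key) indexing follows the rule stated above.

```python
def xor_encrypt(x_bits: str, key: str, L: int) -> str:
    """XOR each bit of the binary record x_bits with the key.

    x_bits concatenates the attributes, each L bits long; bit j of
    attribute i is XORed with key bit (i*L + j) % len(key).
    Applying the function twice with the same key recovers the input.
    """
    out = []
    for pos, b in enumerate(x_bits):
        i, j = divmod(pos, L)            # attribute index, bit index
        k = key[(i * L + j) % len(key)]  # key bit per the indexing rule
        out.append('1' if b != k else '0')
    return ''.join(out)
```

Because XOR is an involution, the same routine decrypts: `xor_encrypt(xor_encrypt(x, K, L), K, L) == x`.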
In this embodiment, keys of different lengths can be generated according to the characteristics of the original data so as to achieve different degrees of privacy protection. This embodiment employs the QK-hidden negative database generation algorithm, which can control the distribution of negative database records at a finer granularity through a set of parameters Q, thereby making the computation more accurate.
Step 3: select a negative database generation algorithm and generate the corresponding negative databases NDB = {NDB1, …, NDBn} from the data X' = {x1', …, xn'} encrypted in step 2;
This embodiment generates the corresponding negative databases NDB = {NDB1, …, NDBn} for the encrypted data X' = {x1', …, xn'} using the QK-hidden algorithm, where NDBi (i = 1, …, n) is the negative database generated from the ciphertext xi'.
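The patent relies on the QK-hidden algorithm, whose details are not reproduced here. As a minimal illustration of the negative-database idea only, the following sketch implements the classic prefix algorithm (a different, simpler generator than QK-hidden): for a hidden binary string s it produces records that together match every string except s, with '*' as a wildcard bit.

```python
def prefix_ndb(s: str):
    """Classic prefix algorithm: record i keeps the first i bits of s,
    flips bit i, and leaves the remaining bits as '*' wildcards."""
    flip = {'0': '1', '1': '0'}
    return [s[:i] + flip[s[i]] + '*' * (len(s) - i - 1)
            for i in range(len(s))]

def matches(record: str, x: str) -> bool:
    """A string matches a record if it agrees on every non-wildcard bit."""
    return all(r == '*' or r == b for r, b in zip(record, x))
```

For s = '101' this yields the records '0**', '11*', '100': the hidden string matches none of them, while every other 3-bit string matches at least one.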
Step 4: extracting the outline s= { S 1...Sn } of the negative database from the step 3, wherein S i is the outline of NDB i, and uploading the outline to a high-performance server;
The present embodiment extracts the summary s= { S 1...Sn } from the negative database in NDB, where S i is the summary of NDB i, and uploads S and tag data y= { y 1…yn } to the server.
The present embodiment improves efficiency by extracting the tabs of the negative database. The sketch is used as a two-dimensional array, the number of the negative databases recorded as '0' and '1' in each bit is stored, and the security is improved while the negative databases are compressed.
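The sketch extraction described above can be illustrated as follows. The function name is an assumption, and treating '*' wildcard bits as counting for neither value is an illustrative assumption for records with unspecified positions; only these per-position counts, not the records themselves, would be uploaded.

```python
def extract_sketch(ndb_records):
    """Return S, where S[pos][0] and S[pos][1] count how many records
    hold '0' and '1' at bit position pos.

    ndb_records: list of equal-length strings over '0', '1', '*';
    '*' (an unspecified bit) is counted for neither value.
    """
    m = len(ndb_records[0])
    S = [[0, 0] for _ in range(m)]
    for rec in ndb_records:
        for pos, b in enumerate(rec):
            if b == '0':
                S[pos][0] += 1
            elif b == '1':
                S[pos][1] += 1
    return S
```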
Step 5: the server receives the sketch uploaded in step 4, completes the activation-function estimation based on the negative database, and trains the deep learning network until it converges, obtaining a trained data privacy protection model.
Referring to fig. 4, in this embodiment, since the sketch received by the server, rather than the original private data, is input to the neural network, the activation function can no longer be calculated from the original data. Therefore, activation functions such as Sigmoid, ReLU, and tanh, originally calculated in the neural network by formulas (1), (2), and (3), must be estimated;
where z = w1·x1 + … + wM·xM represents the linear computation in the neuron, x = x1…xM (x ∈ X) is the original private data, M is the number of input attributes of x, and w1, …, wM are the weight parameters of the neural network;
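The three activation functions named above, and the linear computation z in the neuron, have standard definitions (the original formulas (1)-(3) are rendered as images in the patent; these are their conventional forms):

```python
import math

def sigmoid(z: float) -> float:
    """Sigmoid: f1(z) = 1 / (1 + e^(-z))."""
    return 1.0 / (1.0 + math.exp(-z))

def relu(z: float) -> float:
    """ReLU: f2(z) = max(0, z)."""
    return max(0.0, z)

def tanh(z: float) -> float:
    """Hyperbolic tangent: f3(z) = tanh(z)."""
    return math.tanh(z)

def linear(x, w) -> float:
    """z = w1*x1 + ... + wM*xM, the linear computation in the neuron."""
    return sum(wi * xi for wi, xi in zip(w, x))
```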
If the input is x and the negative database generated from it is NDBx, the probability Pdiff[i] that the i-th bit of a record in NDBx differs from x is calculated according to formula (4);
In formula (4), K indicates that there are K types of negative database records, each record having K determined bits; a type-j record has j determined bits opposite to the corresponding positions of the hidden string and the remaining K − j determined bits identical to it, and pj denotes the probability of generating a type-j record; qi denotes the probability that the i-th bit of the selected attribute differs from the hidden string at the corresponding position; L denotes the number of bits of an attribute;
Let p(i,j,0) denote the probability that the j-th bit of the i-th attribute of the hidden string x corresponding to NDBx is '0'; it is calculated through formula (5);
In formula (5), Psame[j] denotes the probability that the j-th bit of a record in NDBx is the same as the hidden string at the corresponding position; n0 and n1 are the numbers of records in NDBx whose j-th bit of the i-th attribute is '0' and '1', respectively, obtained from the result of step 3; if x = xk, then n0 = Sk[i×L + j][0] and n1 = Sk[i×L + j][1];
The probability that the i-th attribute of x equals d is then calculated by formula (6);
where 0 ≤ d ≤ 2^L − 1, xi_bin is the binary representation of xi, and the binary representation of d is d_bin = b1…bL;
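Formula (6) is rendered as an image in the patent; under the hedged reading that it combines the independent per-bit probabilities of formula (5) into a per-attribute distribution, it can be sketched as follows. The function names and the per-bit independence assumption are illustrative, not taken from the patent text; p0[j] stands for the estimated probability that bit j is '0'.

```python
def prob_attribute_equals(d: int, p0, L: int) -> float:
    """P(attribute = d) = product over bits j of P(bit_j = b_j),
    where d_bin = b1...bL (assumes per-bit independence)."""
    bits = format(d, f'0{L}b')
    prob = 1.0
    for j, b in enumerate(bits):
        prob *= p0[j] if b == '0' else (1.0 - p0[j])
    return prob

def expected_attribute(p0, L: int) -> float:
    """E[attribute] = sum over d of d * P(attribute = d), 0 <= d <= 2^L - 1."""
    return sum(d * prob_attribute_equals(d, p0, L) for d in range(2 ** L))
```

Under this reading, the per-value probabilities sum to 1 over all 2^L values of d, and the expectation is what would be substituted into the linear computation z.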
Formula (7) is then used to substitute the estimated value for z in formulas (1), (2), and (3), completing the estimation of the activation function;
To train the deep learning network, data of batch size t is selected each time and the forward-propagation calculation is completed through formula (7); the gradient ∂Loss/∂W is then calculated by back-propagation through formula (8); finally, the weights W = {w1, …, wn} are updated through formula (9) until the parameters are optimal or the maximum number of iterations is reached;
where Y = {y1, …, yn} is the label data corresponding to the input X = {x1, …, xn}, Loss denotes the loss function, and η is the learning rate.
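The weight update of formula (9) is ordinary gradient descent, w ← w − η·∂Loss/∂w. A minimal illustration follows; the quadratic loss is a stand-in example for demonstration, not the patent's loss function.

```python
def gradient_step(w, grad, eta):
    """One update of formula (9): w = w - eta * grad."""
    return [wi - eta * gi for wi, gi in zip(w, grad)]

# Example: minimize Loss(w) = (w0 - 3)^2, whose gradient is 2*(w0 - 3).
# Repeated updates drive w0 toward the minimizer at 3.
w = [0.0]
for _ in range(200):
    w = gradient_step(w, [2.0 * (w[0] - 3.0)], eta=0.1)
```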
In the test stage, the user transforms the test data through steps 1 and 2 and uploads it to the server; the server makes predictions on the test data using the trained model and returns the results to the client.
It should be understood that the foregoing description of the preferred embodiments, although relatively detailed, should not be construed as limiting the scope of patent protection of the invention; those of ordinary skill in the art may make substitutions or modifications under the teaching of the invention without departing from the scope of the appended claims.

Claims (4)

1. A data privacy protection method based on a ciphertext-based negative database and deep learning, characterized in that a data privacy protection model is adopted to protect data privacy;
The data privacy protection model is obtained through the following steps:
Step 1: preprocess the original data and convert it into binary strings X = {x1, …, xn};
Step 2: XOR-encrypt the data processed in step 1 with a key K of specified length to obtain the encrypted data X' = {x1', …, xn'};
Step 3: select a negative database generation algorithm and generate the corresponding negative databases NDB = {NDB1, …, NDBn} from the data X' = {x1', …, xn'} encrypted in step 2;
Step 4: extract the sketch S = {S1, …, Sn} of the negative databases from step 3, where Si is the sketch of NDBi;
Step 5: based on the sketch S, complete the activation-function estimation based on the negative database, and train the deep learning network until it converges, obtaining a trained data privacy protection model.
2. The data privacy protection method based on a ciphertext-based negative database and deep learning of claim 1, wherein in step 2, the original data X = {x1, …, xn} is encrypted as X' = {x1', …, xn'} using a randomly generated key K;
The key K is a binary string of length len_xor; if x = xk, the j-th bit of the i-th attribute of x is XORed with bit (i×L + j) % len_xor of the key K, where L represents the length of an attribute.
3. The data privacy protection method based on a ciphertext-based negative database and deep learning of claim 1, wherein in step 5, the Sigmoid, ReLU, and tanh activation functions f1(z), f2(z), and f3(z) are calculated in the neural network by formulas (1), (2), and (3);
where z = w1·x1 + … + wM·xM represents the result of the linear computation in the neuron, x = x1…xM (x ∈ X) is the original private data, M is the number of input attributes of x, and w1, …, wM are the weight parameters of the neural network;
If the input is x and the negative database generated from it is NDBx, the probability Pdiff[i] that the i-th bit of a record in NDBx differs from the hidden string at the corresponding position is calculated according to formula (4);
In formula (4), K indicates that there are K types of negative database records, each record having K determined bits; a type-j record has j determined bits opposite to the corresponding positions of the hidden string and the remaining K − j determined bits identical to it, and pj denotes the probability of generating a type-j record; qi denotes the probability that the i-th bit of the selected attribute differs from the hidden string at the corresponding position; L denotes the length of an attribute;
Let p(i,j,0) denote the probability that the j-th bit of the i-th attribute of the hidden string x corresponding to NDBx is '0'; it is calculated through formula (5);
In formula (5), Psame[j] denotes the probability that the j-th bit of a record in NDBx is the same as the hidden string at the corresponding position; n0 and n1 are the numbers of records in NDBx whose j-th bit of the i-th attribute is '0' and '1', respectively; if x = xk, then n0 = Sk[i×L + j][0] and n1 = Sk[i×L + j][1];
The probability that the i-th attribute of x equals d is then calculated by formula (6);
where 0 ≤ d ≤ 2^L − 1, xi_bin is the binary representation of xi, and the binary representation of d is d_bin = b1…bL;
Formula (7) is then used to substitute the estimated value for z in formulas (1), (2), and (3), completing the estimation of the activation function.
4. The data privacy protection method based on a ciphertext-based negative database and deep learning of claim 3, wherein in step 5, to train the deep learning network, data of batch size t is selected each time and the forward-propagation calculation is completed through formula (7); the gradient ∂Loss/∂W is then calculated by back-propagation through formula (8); finally, the weight W is updated through formula (9) until the parameters are optimal or the maximum number of iterations is reached;
where Y = {y1, …, yn} is the label data corresponding to the input X = {x1, …, xn}, Loss denotes the loss function, and η is the learning rate.
CN202210159783.2A 2022-02-22 2022-02-22 Data privacy protection method based on negative database and deep learning based on ciphertext Active CN114626086B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210159783.2A CN114626086B (en) 2022-02-22 2022-02-22 Data privacy protection method based on negative database and deep learning based on ciphertext

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210159783.2A CN114626086B (en) 2022-02-22 2022-02-22 Data privacy protection method based on negative database and deep learning based on ciphertext

Publications (2)

Publication Number Publication Date
CN114626086A CN114626086A (en) 2022-06-14
CN114626086B true CN114626086B (en) 2024-10-18

Family

ID=81900033

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210159783.2A Active CN114626086B (en) 2022-02-22 2022-02-22 Data privacy protection method based on negative database and deep learning based on ciphertext

Country Status (1)

Country Link
CN (1) CN114626086B (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111260081A (en) * 2020-02-14 2020-06-09 广州大学 Non-interactive privacy protection multi-party machine learning method
CN113434873A (en) * 2021-06-01 2021-09-24 内蒙古大学 Federal learning privacy protection method based on homomorphic encryption

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP4220464A1 (en) * 2017-03-22 2023-08-02 Visa International Service Association Privacy-preserving machine learning
CN110598438B (en) * 2019-07-19 2023-05-30 福州大学 Privacy protection system for protecting outsourced data in the cloud based on deep convolutional neural network

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111260081A (en) * 2020-02-14 2020-06-09 广州大学 Non-interactive privacy protection multi-party machine learning method
CN113434873A (en) * 2021-06-01 2021-09-24 内蒙古大学 Federal learning privacy protection method based on homomorphic encryption

Also Published As

Publication number Publication date
CN114626086A (en) 2022-06-14

Similar Documents

Publication Publication Date Title
Rao et al. Privacy inference attack and defense in centralized and federated learning: A comprehensive survey
CN108898028B (en) Neural network model encryption protection system and method related to iteration and random encryption
CN110490128A (en) A kind of hand-written recognition method based on encryption neural network
CN119383014B (en) Multi-private-domain visitor portrait sharing and privacy protection routing method based on federal learning
Devaraj et al. An efficient framework for secure image archival and retrieval system using multiple secret share creation scheme
CN108920981B (en) Neural network model encryption protection system and method related to data iterative encryption
WO2011052056A1 (en) Data processing device
CN113240129A (en) Multi-type task image analysis-oriented federal learning system
CN112860932B (en) Image retrieval method, device, device and storage medium for resisting malicious sample attack
CN119180056A (en) Data privacy protection method based on multiparty security calculation
Dou et al. Towards privacy-preserving person re-identification via person identify shift
Li et al. Verifiable encrypted image retrieval with reversible data hiding in cloud environment
CN114626086B (en) Data privacy protection method based on negative database and deep learning based on ciphertext
Zhao et al. NegDL: Privacy-preserving deep learning based on negative database
CN116383470B (en) Image searching method with privacy protection function
CN118656733A (en) A user intention recognition method based on multimodal cross-improvement
CN117744146A (en) Face identity transformation method based on reversible neural network
CN117668895A (en) Secure image retrieval method based on fully homomorphic encryption and convolutional neural network
CN112256839B (en) A ciphertext search method, device, system and computer-readable storage medium
CN108460023A (en) A kind of carrier-free examination question camouflage method that the double full keys of certification in legal section rely on
Li et al. SeSMR: Secure and efficient session-based multimedia recommendation in edge computing
CN115659366A (en) Data reconstruction attack method, device and equipment for gradient compression federal learning
CN115496147A (en) A Naive Bayesian Classification Method with Privacy Preservation
CN114611134A (en) Multivariate time series data privacy protection method based on negative database and deep learning
Gu et al. Research on Face Attribute Privacy Protection Method Based on Multi task Learning

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant