Background
In recent years, deep learning has received extensive attention from academia and industry because of its ability to mine useful knowledge from large-scale data. It has been applied in a wide range of fields and has produced a number of remarkable breakthroughs. However, few studies have addressed the accompanying privacy concerns, which grow more pressing as ever larger amounts of data are involved. For example, medical data may include private patient information such as diseases, family history, and DNA sequences, while financial institutions such as banks store sensitive information for many customers; once such data is leaked or analyzed by an adversary, the resulting losses can be immeasurable and may even threaten personal safety. Therefore, while these technologies benefit mankind and accelerate social development, more attention should be paid to the potential problem of privacy disclosure.
In recent years, scholars have proposed several privacy-preserving deep learning methods; however, most of them suffer from reduced efficiency or reduced accuracy. In particular, methods based on differential privacy protect data by adding noise, which degrades the accuracy and utility of the data, while methods based on homomorphic encryption generally incur high computational costs, and their efficiency can become intolerable in scenarios with large-scale data.
The Negative Database (NDB) is a new form of information representation inspired by the negative selection mechanism of the artificial immune system. An NDB stores the complement set of a database DB to achieve privacy protection, and, like a conventional database, it supports insertion, deletion, update, and selection operations. Reversing a negative database to recover the raw data has been proven to be an NP-hard problem. Furthermore, an NDB supports coarse-grained distance estimation. These characteristics make it suitable for many privacy-related areas, for example password authentication, information hiding, biometric authentication, and data mining.
Although effective privacy protection can be achieved by combining negative databases with deep learning, the degree of protection can still be improved, particularly under extreme parameter settings. The exclusive-or (XOR) operation is simple and efficient and is applied in many scenarios; if the data is XORed with a randomly generated binary string before being converted into a negative database, its privacy is protected further.
Disclosure of Invention
In order to solve the above technical problems, the invention provides a data privacy protection method and system based on a ciphertext negative database combined with a deep learning algorithm.
The technical scheme adopted by the invention is as follows: a data privacy protection method based on ciphertext negative databases and deep learning, which adopts a data privacy protection model to protect data privacy.
The data privacy protection model is obtained through the following steps:
Step 1: preprocessing the original data and converting it into binary strings X = {x1, …, xn};
Step 2: performing exclusive-or encryption between a secret key K of specified length and the data processed in step 1 to obtain the encrypted data X' = {x1', …, xn'};
Step 3: selecting a negative database generation algorithm and generating the corresponding negative databases NDB = {NDB1, …, NDBn} from the data X' = {x1', …, xn'} encrypted in step 2;
Step 4: extracting the sketch S = {S1, …, Sn} of the negative databases from step 3, wherein Si is the sketch of NDBi;
Step 5: based on the sketch S, completing the negative-database-based activation function estimation and training the deep learning network until it converges, thereby obtaining the trained data privacy protection model.
In view of the characteristics of current negative databases and of the XOR operation, the invention provides a ciphertext-based negative database privacy protection method. It addresses the difficulty of balancing privacy and utility in differential privacy and the excessive computational cost of methods such as homomorphic encryption, offers stronger robustness, and comprehensively improves both efficiency and accuracy in privacy-preserving deep learning.
Detailed Description
To facilitate understanding and practice of the invention by those of ordinary skill in the art, the invention is described in further detail below with reference to the drawings and embodiments. It should be understood that the embodiments described herein are for illustration and explanation only and are not intended to limit the invention.
Referring to fig. 1 and fig. 2, the invention provides a data privacy protection method based on ciphertext negative databases and deep learning, which adopts a data privacy protection model to protect data privacy;
In this embodiment, the data privacy protection model is obtained through the following steps:
Step 1: preprocessing the original data and converting it into binary strings X = {x1, …, xn};
Step 2: performing exclusive-or encryption between a secret key K of specified length and the data processed in step 1 to obtain the encrypted data X' = {x1', …, xn'};
In this embodiment, the original data X = {x1, …, xn} is encrypted into X' = {x1', …, xn'} using a randomly generated key K;
The key K is a binary string of length len_xor. If x = xk, the j-th bit of the i-th attribute of x is XORed with the ((i×L+j) % len_xor)-th bit of the key K, where L denotes the length of each attribute. Fig. 3 shows an example of the exclusive-or encryption.
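The XOR step above can be sketched in Python as follows. The function name and the fixed-length attribute layout are illustrative assumptions; the key-indexing rule follows the (i×L+j) % len_xor scheme described above.

```python
def xor_encrypt(record_bits, key_bits, L):
    """XOR-encrypt one record whose attributes are concatenated
    fixed-length binary strings of L bits each.

    record_bits, key_bits: strings over '0'/'1'.
    Bit j of attribute i is XORed with key bit (i*L + j) % len(key_bits).
    """
    len_xor = len(key_bits)
    out = []
    for pos, b in enumerate(record_bits):
        i, j = divmod(pos, L)                      # attribute index, bit offset
        k = key_bits[(i * L + j) % len_xor]
        out.append('1' if b != k else '0')         # bitwise XOR
    return ''.join(out)

# XOR with the same key decrypts (XOR is an involution)
cipher = xor_encrypt('10110100', '1101', L=4)
plain = xor_encrypt(cipher, '1101', L=4)           # recovers '10110100'
```

Because XOR is its own inverse, the data owner can recover the plaintext with the same key, while a party without K sees only the ciphertext.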
In this embodiment, keys of different lengths can be generated according to the characteristics of the original data so as to achieve different degrees of privacy protection. The embodiment employs the QK-hidden negative database generation algorithm, which controls the distribution of negative database records at a finer granularity through a set of parameters Q, thereby making the subsequent computation more accurate.
Step 3: selecting a negative database generation algorithm and generating the corresponding negative databases NDB = {NDB1, …, NDBn} from the data X' = {x1', …, xn'} encrypted in step 2;
In this embodiment, the corresponding negative databases NDB = {NDB1, …, NDBn} are generated from the encrypted data X' = {x1', …, xn'} using the QK-hidden algorithm, where NDBi (i = 1, …, n) is the negative database generated from the ciphertext xi'.
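The full QK-hidden algorithm is not reproduced in this description; the following simplified sketch only illustrates the general shape of K-hidden-style generation, in which each record specifies K bits (the rest are wildcards '*') and a record of type j has exactly j of its specified bits flipped relative to the hidden string, type j being drawn with probability Q[j-1]. All names and the sampling structure are illustrative assumptions, not the patented algorithm.

```python
import random

def generate_ndb(hidden, K, Q, num_records, rng=random):
    """Illustrative K-hidden-style negative database generator.

    hidden: binary string to hide; K: number of specified bits per record;
    Q: probabilities [p1, ..., pK] of generating a type-j record
       (a record whose j specified bits differ from `hidden`).
    Each returned record is a string over {'0', '1', '*'} that does
    NOT match `hidden`, because at least one specified bit is flipped.
    """
    m = len(hidden)
    ndb = []
    for _ in range(num_records):
        j = rng.choices(range(1, K + 1), weights=Q)[0]   # record type
        positions = rng.sample(range(m), K)              # specified positions
        flipped = set(rng.sample(positions, j))          # j of them flipped
        record = ['*'] * m
        for p in positions:
            bit = hidden[p]
            record[p] = ('1' if bit == '0' else '0') if p in flipped else bit
        ndb.append(''.join(record))
    return ndb
```

Since every record disagrees with the hidden string in at least one specified bit, the hidden string itself is never covered by the negative database, which is the defining property exploited here.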
Step 4: extracting the sketch S = {S1, …, Sn} of the negative databases from step 3, wherein Si is the sketch of NDBi, and uploading the sketch to a high-performance server;
In this embodiment, the sketch S = {S1, …, Sn} is extracted from the negative databases in NDB, where Si is the sketch of NDBi, and S together with the label data Y = {y1, …, yn} is uploaded to the server.
This embodiment improves efficiency by extracting the sketches of the negative databases. A sketch is a two-dimensional array that stores, for each bit position, the number of negative database records in which that bit is '0' and the number in which it is '1'; it compresses the negative database while improving security.
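The sketch described above amounts to a per-position count over the NDB records; a minimal sketch of that extraction (function name assumed) is:

```python
def extract_sketch(ndb_records, m):
    """Build the sketch of a negative database: for every bit position,
    count how many records specify '0' and how many specify '1'.
    Wildcard positions ('*') contribute to neither count."""
    sketch = [[0, 0] for _ in range(m)]   # sketch[pos] = [count of '0', count of '1']
    for rec in ndb_records:
        for pos, c in enumerate(rec):
            if c == '0':
                sketch[pos][0] += 1
            elif c == '1':
                sketch[pos][1] += 1
    return sketch

sk = extract_sketch(['0*1', '*11', '10*'], m=3)
# position 0: one '0' and one '1'; position 1: one '0' and one '1'; position 2: two '1'
```

Only these counts, not the records themselves, need to leave the client, which is why uploading the sketch both compresses the data and reduces exposure.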
Step 5: the server receives the sketch uploaded in step 4, completes the negative-database-based activation function estimation, and trains the deep learning network until it converges, obtaining the trained data privacy protection model.
Referring to fig. 4, in this embodiment, since the sketch received by the server, rather than the original private data, is input to the neural network, the activation functions can no longer be computed from the original data. Activation functions such as Sigmoid, ReLU, and tanh are originally computed in the neural network as follows:
Sigmoid(z) = 1 / (1 + e^(-z)) (1)
ReLU(z) = max(0, z) (2)
tanh(z) = (e^z - e^(-z)) / (e^z + e^(-z)) (3)
where z denotes the linear computation in the neuron, z = w1·x1 + … + wM·xM; X = {x1, …, xM} represents the original private data, x ∈ X, M represents the number of input attributes of x, and w1, …, wM represent the weight parameters of the neural network;
If the input is x and the negative database generated from it is NDBx, the probability Pdiff[i] that the i-th bit of a record in NDBx differs from x is calculated according to formula (4);
wherein K indicates that there are K types of negative database records; a record of the j-th type has j of its determined bits opposite to the corresponding bits of the hidden string, while its remaining K−j determined bits are identical to the hidden string; pj represents the probability of generating a record of the j-th type; qi represents the probability that the i-th bit of the selected attribute differs from the hidden string at the corresponding position; and L represents the number of bits of an attribute;
The probability that the j-th bit of the i-th attribute of the hidden string x corresponding to NDBx is '0' is calculated through formula (5);
wherein Psame[j] represents the probability that the j-th bit of a record in NDBx is the same as the corresponding bit of the hidden string; n0 and n1 are, respectively, the numbers of records in NDBx whose j-th bit of the i-th attribute is '0' or '1', and are obtained from the sketch; if x = xk, then n0 = Sk[i×L+j][0] and n1 = Sk[i×L+j][1];
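Formula (5) itself is not reproduced above. One plausible reconstruction, under the assumption that each specified bit of a record independently differs from the hidden string with probability Pdiff, is a Bayesian estimate with a uniform prior over the two bit values; the invention's actual formula (5) may differ, so the following is only an illustrative sketch.

```python
import math

def prob_bit_zero(n0, n1, p_diff):
    """Estimate P(hidden bit = '0') from sketch counts.

    n0, n1: numbers of NDB records specifying '0' / '1' at this position.
    p_diff: probability that a specified record bit differs from the
    hidden string (0 < p_diff < 1). The independence model and uniform
    prior are assumptions, not the patent's formula (5).
    Log-likelihoods are used for numerical stability.
    """
    p_same = 1.0 - p_diff
    # If the hidden bit is '0': records show '0' w.p. p_same, '1' w.p. p_diff.
    ll0 = n0 * math.log(p_same) + n1 * math.log(p_diff)
    # If the hidden bit is '1': the roles are swapped.
    ll1 = n0 * math.log(p_diff) + n1 * math.log(p_same)
    mx = max(ll0, ll1)
    w0, w1 = math.exp(ll0 - mx), math.exp(ll1 - mx)
    return w0 / (w0 + w1)
```

With p_diff below 0.5, a majority of '0' counts pushes the estimate toward '0', matching the intuition that most record bits agree with the hidden string.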
The probability that the i-th attribute of x equals d is calculated through formula (6), wherein 0 ≤ d ≤ 2^L − 1, and d_bin = b1…bL denotes the binary representation of d, in the same form as the binary representation of xi;
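Formula (6) is likewise not reproduced above. A natural reconstruction, assuming the bits of an attribute are estimated independently, multiplies the per-bit probabilities; p0[j] here is a hypothetical name for the estimated probability that bit j of the attribute is '0'.

```python
def prob_attribute_equals(p0, d, L):
    """Estimate P(attribute = d) for 0 <= d <= 2**L - 1 from per-bit
    probabilities p0[j] = P(bit j = '0'), assuming bit independence
    (an illustrative assumption; the patent's formula (6) may differ)."""
    bits = format(d, f'0{L}b')            # binary representation b1...bL of d
    prob = 1.0
    for j, b in enumerate(bits):
        prob *= p0[j] if b == '0' else 1.0 - p0[j]
    return prob

# Under this model the distribution over all 2**L values sums to 1:
p0 = [0.9, 0.2, 0.6]
dist = [prob_attribute_equals(p0, d, L=3) for d in range(8)]
```

The resulting distribution over attribute values is what allows an expected value of each input, and hence of z, to be formed without ever seeing the plaintext.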
The estimate of z obtained through formula (7) is then substituted for z in formulas (1), (2), and (3) to complete the estimation of the activation functions;
The deep learning network is trained as follows: in each iteration, a batch of data of size t is selected and forward propagation is completed through formula (7); back propagation is then performed through formula (8) to calculate the gradient; finally, the weights W = {w1, …, wn} are updated through formula (9) until the parameters reach the optimum or the maximum number of iterations is reached;
where Y = {y1, …, yn} is the label data corresponding to the input X = {x1, …, xn}, Loss represents the loss function, and η is the learning rate.
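The estimation and training steps above can be combined into a minimal sketch: expected attribute values stand in for the private inputs, z is formed as in formula (7), the sigmoid of formula (1) is applied, and gradient descent proceeds as usual. The single logistic neuron and all names here are illustrative assumptions, not the invention's full network or its formulas (8) and (9).

```python
import math

def sigmoid(z):
    """Sigmoid activation, as in formula (1)."""
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic(expected_x, y, lr=0.1, epochs=200):
    """Gradient-descent training of one logistic neuron on expected
    attribute values (stand-ins for the private inputs).

    expected_x: samples, each a list of E[x_i] estimates; y: 0/1 labels.
    Returns learned weights, with the bias folded in as w[-1]."""
    M = len(expected_x[0])
    w = [0.0] * (M + 1)
    for _ in range(epochs):
        for xs, label in zip(expected_x, y):
            z = sum(wi * xi for wi, xi in zip(w, xs)) + w[-1]  # formula-(7)-style estimate of z
            a = sigmoid(z)                                     # activation
            err = a - label                                    # dLoss/dz for cross-entropy loss
            for i in range(M):
                w[i] -= lr * err * xs[i]                       # weight update, as in formula (9)
            w[-1] -= lr * err                                  # bias update
    return w
```

On separable expected inputs this converges to a separating weight vector, so the server can fit the model while only ever handling sketch-derived estimates.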
In the testing stage, the user transforms the test data through steps 1 and 2 and uploads it to the server; the server makes predictions on the test data using the trained model and returns the results to the client.
It should be understood that the foregoing description of the preferred embodiments is not intended to limit the scope of the invention, which is defined by the appended claims; those skilled in the art may make substitutions or modifications without departing from the scope of the invention as set forth in the appended claims, and such substitutions and modifications shall fall within the protection scope of the invention.