CN118118164A - Data protection method, prediction method and device based on longitudinal federal learning model - Google Patents
- Publication number
- CN118118164A (application CN202211527282.1A)
- Authority
- CN
- China
- Prior art keywords
- data
- exchanged
- encryption
- training sample
- function
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L9/00—Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols
- H04L9/08—Key distribution or management, e.g. generation, sharing or updating, of cryptographic keys or passwords
- H04L9/0816—Key establishment, i.e. cryptographic processes or cryptographic protocols whereby a shared secret becomes available to two or more parties, for subsequent use
- H04L9/0819—Key transport or distribution, i.e. key establishment techniques where one party creates or otherwise obtains a secret value, and securely transfers it to the other(s)
- H04L9/083—Key transport or distribution, i.e. key establishment techniques where one party creates or otherwise obtains a secret value, and securely transfers it to the other(s) involving central third party, e.g. key distribution center [KDC] or trusted third party [TTP]
- H04L9/0833—Key transport or distribution, i.e. key establishment techniques where one party creates or otherwise obtains a secret value, and securely transfers it to the other(s) involving central third party, e.g. key distribution center [KDC] or trusted third party [TTP] involving conference or group key
- H04L9/0836—Key transport or distribution, i.e. key establishment techniques where one party creates or otherwise obtains a secret value, and securely transfers it to the other(s) involving central third party, e.g. key distribution center [KDC] or trusted third party [TTP] involving conference or group key using tree structure or hierarchical structure
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N20/00—Machine learning
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L63/00—Network architectures or network communication protocols for network security
- H04L63/04—Network architectures or network communication protocols for network security for providing a confidential data exchange among entities communicating through data packet networks
- H04L63/0428—Network architectures or network communication protocols for network security for providing a confidential data exchange among entities communicating through data packet networks wherein the data content is protected, e.g. by encrypting or encapsulating the payload
- H04L63/0478—Network architectures or network communication protocols for network security for providing a confidential data exchange among entities communicating through data packet networks wherein the data content is protected, e.g. by encrypting or encapsulating the payload applying multiple layers of encryption, e.g. nested tunnels or encrypting the content with a first key and then with at least a second key
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L9/00—Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols
- H04L9/008—Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols involving homomorphic encryption
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L9/00—Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols
- H04L9/08—Key distribution or management, e.g. generation, sharing or updating, of cryptographic keys or passwords
- H04L9/0861—Generation of secret information including derivation or calculation of cryptographic keys or passwords
- H04L9/0869—Generation of secret information including derivation or calculation of cryptographic keys or passwords involving random numbers or seeds
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L2209/00—Additional information or applications relating to cryptographic mechanisms or cryptographic arrangements for secret or secure communication H04L9/00
- H04L2209/08—Randomization, e.g. dummy operations or using noise
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L2209/00—Additional information or applications relating to cryptographic mechanisms or cryptographic arrangements for secret or secure communication H04L9/00
- H04L2209/46—Secure multiparty computation, e.g. millionaire problem
Abstract
The present disclosure provides a data protection method, a prediction method, and an apparatus based on a vertical federated learning model. The data protection method comprises the following steps: acquiring the data to be exchanged for each training sample; determining, based on the data to be exchanged for each training sample, an encryption function for homomorphically encrypting that data, wherein the key of the encryption function is determined from the encryption precision and the data to be exchanged for each training sample; homomorphically encrypting the data to be exchanged of each training sample with the encryption function to obtain a homomorphic encryption result corresponding to each training sample, and sending the homomorphic encryption results corresponding to the training samples to at least one participant in the vertical federated learning system; and receiving from the at least one participant an aggregate result of the homomorphic encryption results corresponding to the training samples, and decrypting the received aggregate result with the decryption function corresponding to the encryption function to obtain a decryption result.
Description
Technical Field
The present disclosure relates generally to the field of information security and, more particularly, to a data protection method, a prediction method, and an apparatus based on a vertical federated learning model.
Background
Vertical federated learning applies to federated learning scenarios in which the participants' datasets share the same sample space but have different feature spaces. While the participants jointly train a federated learning model, each participant must transmit a large amount of data. To guarantee data privacy, each participant encrypts its data before transmission, so that no participant's raw data ever leaves its local environment.
The gradient boosting decision tree (GBDT, Gradient Boosting Decision Tree) model is widely used in vertical federated learning tasks because of its strong interpretability and stable performance. However, current GBDT models for vertical federated learning encrypt and decrypt data inefficiently, which lowers the efficiency of encryption-based data protection and, in turn, the training efficiency of the whole model, failing to meet the time-overhead requirements of large-scale training.
Taking the currently popular SecureBoost scheme as an example: it encrypts data with the Paillier homomorphic encryption algorithm, and the encrypted data must then be operated on and decrypted. Because the Paillier algorithm is computationally slow, the training efficiency of the whole federated learning model is low.
Disclosure of Invention
Exemplary embodiments of the present disclosure provide a data protection method, a prediction method, and an apparatus based on a vertical federated learning model, which improve the efficiency of data encryption, decryption, and ciphertext operations while protecting data privacy.
According to an exemplary embodiment of the present disclosure, there is provided a data protection method applied to a vertical federated learning system, the data protection method comprising: acquiring the data to be exchanged for each training sample; determining, based on the data to be exchanged for each training sample, an encryption function for homomorphically encrypting that data, wherein the encryption function involves an encryption precision and a key, the key being determined based on the encryption precision and the data to be exchanged for each training sample; homomorphically encrypting the data to be exchanged of each training sample with the encryption function to obtain a homomorphic encryption result corresponding to each training sample, and sending the homomorphic encryption results corresponding to the training samples to at least one participant in the vertical federated learning system; and receiving from the at least one participant an aggregate result of the homomorphic encryption results corresponding to the training samples, and decrypting the received aggregate result with a decryption function corresponding to the encryption function to obtain a decryption result.
Optionally, the key ensures that the sum of at least one piece of data to be exchanged can be recovered from the sum of the homomorphic encryption results of that data by taking the sum modulo the key.
Optionally, the step of determining, based on the data to be exchanged for each training sample, an encryption function for homomorphically encrypting that data comprises: randomizing the key for the data to be exchanged of each training sample to obtain an encryption function corresponding to the data to be exchanged of each training sample.
Optionally, the step of randomizing the key for the data to be exchanged of each training sample to obtain the corresponding encryption function comprises performing the following for the data to be exchanged of each training sample: generating a random number for the data to be exchanged of the training sample; and computing the product of that random number and the key, and determining the encryption function corresponding to the data to be exchanged of the training sample based on the product and the data to be exchanged of the training sample.
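This randomized encryption can be sketched as follows. This is a minimal Python illustration of the claimed construction, not the patent's exact algorithm; the `KEY` and `PRECISION` values and the 32-bit random-number range are assumptions:

```python
import secrets

PRECISION = 10**6       # encryption precision: fixed-point scaling factor (assumed value)
KEY = 7_500_008         # hypothetical key produced by the key generation step

def encrypt(value):
    # Scale the plaintext to an integer, then mask it with a random
    # multiple of the key. The mask vanishes under "mod KEY" during
    # decryption, and sums of ciphertexts correspond to sums of plaintexts.
    r = secrets.randbelow(2**32) + 1
    return round(value * PRECISION) + r * KEY

ciphertext = encrypt(0.5)
```

Because the random term is a multiple of the key, adding ciphertexts from different samples aggregates their scaled plaintexts without revealing any individual value.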
Optionally, the method further comprises a key generation process, the key generation process comprising: computing the product of the data to be exchanged of each training sample and the encryption precision, rounding each obtained product, and summing all rounded results; the key is generated based on the result of the summation.
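A minimal sketch of this key generation step in Python. The factor of two and the `margin` constant are assumptions (chosen so the later modulo step uniquely recovers both positive and negative sums); the claim only specifies that the key is generated from the sum of the rounded products:

```python
def generate_key(values, precision, margin=8):
    # Sum of |round(x * precision)| bounds the magnitude of any aggregated
    # plaintext sum; making the key larger than twice that bound lets the
    # modulo operation in decryption recover the sum exactly.
    bound = sum(abs(round(v * precision)) for v in values)
    return 2 * bound + margin

# Hypothetical per-sample gradients serving as the data to be exchanged.
key = generate_key([0.5, -1.25, 2.0], precision=10**6)
```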
Optionally, the step of decrypting the received aggregate result with the decryption function corresponding to the encryption function to obtain a decryption result comprises performing the following for each received aggregate result: determining a first function based on the aggregate result and the key; and determining the decryption function corresponding to the aggregate result based on the first function and a decryption precision corresponding to the aggregate result, and decrypting the aggregate result with that decryption function to obtain its decryption result, wherein the decryption precision is determined based on the encryption precision.
Optionally, the step of determining the first function based on the aggregate result and the key comprises: performing a modulo operation on the aggregate result with the key; when the modulo result falls within a first preset range, taking the modulo result as the first function corresponding to the aggregate result; and when the modulo result falls within a second preset range, computing the difference between the modulo result and the key and taking that difference as the first function corresponding to the aggregate result.
Optionally, the step of determining the decryption function corresponding to the aggregate result based on the first function corresponding to the aggregate result and the decryption precision comprises: computing the ratio of the first function corresponding to the aggregate result to the decryption precision, and taking the ratio as the decryption function corresponding to the aggregate result.
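Putting the decryption steps together, one plausible reading in Python. The concrete preset ranges (at most `key // 2` for non-negative sums, above it for negative sums) and the `KEY` and `PRECISION` values are assumptions, not stated in the claims:

```python
PRECISION = 10**6       # decryption precision (same as the encryption precision)
KEY = 7_500_008         # hypothetical key shared with the encryption step

def decrypt(aggregate):
    m = aggregate % KEY          # modulo operation on the aggregate result and the key
    if m > KEY // 2:             # second preset range: the plaintext sum is negative
        m -= KEY                 # difference between the modulo result and the key
    return m / PRECISION         # ratio of the first function to the decryption precision
```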
Optionally, the encryption precision is the same as the decryption precision.
Optionally, the step of acquiring the data to be exchanged for each training sample comprises: determining the loss function of each training sample from the label value and the current predicted value of that training sample, and taking gradient information of the loss function of each training sample as the data to be exchanged of that training sample.
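For a binary classification task with log loss, for example, the exchanged gradient information could be computed as follows. The choice of log loss is illustrative; the claims do not fix a particular loss function:

```python
import math

def logloss_gradients(label, raw_score):
    # Gradient information of the log loss with respect to the current raw
    # prediction: these per-sample values are what the label holder would
    # encrypt and exchange in GBDT-style training.
    p = 1.0 / (1.0 + math.exp(-raw_score))   # sigmoid of the current prediction
    g = p - label                            # first-order gradient
    h = p * (1.0 - p)                        # second-order gradient
    return g, h
```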
According to an exemplary embodiment of the present disclosure, there is provided a prediction method based on a vertical federated learning model, the prediction method comprising: acquiring the data to be exchanged for each training sample; determining, based on the data to be exchanged for each training sample, an encryption function for homomorphically encrypting that data, wherein the encryption function involves an encryption precision and a key, the key being determined based on the encryption precision and the data to be exchanged for each training sample; homomorphically encrypting the data to be exchanged of each training sample with the encryption function to obtain homomorphic encryption results corresponding to the training samples, and sending those results to at least one participant in a vertical federated learning system; receiving from the at least one participant an aggregate result of the homomorphic encryption results corresponding to the training samples, and decrypting the received aggregate result with a decryption function corresponding to the encryption function to obtain a decryption result; constructing, based on the decryption result, a gradient boosting tree model cooperatively trained by the vertical federated learning system; and predicting a sample to be predicted with the gradient boosting tree model.
Optionally, the step of constructing, based on the decryption result, the gradient boosting tree model cooperatively trained by the vertical federated learning system comprises: for the current tree node to be split in the tree currently under construction, computing split gain values of the candidate split points of the current tree node based on the decryption result; selecting the candidate split point with the largest split gain value as the split point of the current tree node, partitioning the training samples at the current tree node into the left and right child nodes of the current tree node based on the split point, notifying the at least one participant of the current partition result so that the at least one participant obtains new aggregate results of the homomorphic encryption results corresponding to the training samples based on the partition result, and returning to the step of receiving the aggregate results from the at least one participant until construction of the current tree is complete; and when the current tree is complete, updating the predicted value of each training sample based on the current tree and returning to the step of acquiring the data to be exchanged for each training sample, until all trees of the gradient boosting tree model have been constructed.
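The claims do not specify the split gain formula; an XGBoost-style gain, as used by SecureBoost, is one natural instantiation, computed from the decrypted left/right gradient aggregates (`lam` and `gamma` are the usual regularization hyperparameters, with illustrative defaults):

```python
def split_gain(g_left, h_left, g_right, h_right, lam=1.0, gamma=0.0):
    # Gain of one candidate split point, evaluated from the decrypted
    # first-order (g) and second-order (h) gradient sums of the samples
    # that the split sends left and right.
    def score(g, h):
        return g * g / (h + lam)
    return 0.5 * (score(g_left, h_left) + score(g_right, h_right)
                  - score(g_left + g_right, h_left + h_right)) - gamma
```

The label holder evaluates this gain for every candidate split point and selects the maximum, as in the step above.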
Optionally, the aggregate result received from each participant is obtained by that participant aggregating the homomorphic encryption results corresponding to the training samples based on the feature data of the training samples and the partition result.
Optionally, the step of receiving, from the at least one participant, the aggregate results of the homomorphic encryption results corresponding to the training samples comprises: receiving a gradient histogram and an identifier of the gradient histogram from the at least one participant, wherein each bucket in the gradient histogram corresponds to one aggregate result. The prediction method further comprises: recording, in a lookup table storing the structure of the gradient boosting tree model, the identifier of the gradient histogram corresponding to the split point of the current tree node and the identifier of the corresponding bucket in that gradient histogram. The step of predicting a sample to be predicted with the gradient boosting tree model comprises: predicting the sample to be predicted based on the lookup table.
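The per-bucket aggregation performed by a feature-holding participant can be sketched as follows. This is a simplified illustration: `bucket_ids` stands for the participant's private feature binning, and the integer ciphertexts are assumed to add homomorphically as in the encryption step:

```python
from collections import defaultdict

def gradient_histogram(bucket_ids, ciphertexts):
    # A participant sums the still-encrypted per-sample gradients of the
    # samples falling into each feature-value bucket; only these per-bucket
    # aggregates (one aggregate result per bucket) are sent back.
    hist = defaultdict(int)
    for bucket, c in zip(bucket_ids, ciphertexts):
        hist[bucket] += c      # ciphertext addition aggregates the bucket
    return dict(hist)
```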
Optionally, the step of acquiring the data to be exchanged for each training sample comprises: determining the loss function of each training sample from the label value and the current predicted value of that training sample, and taking gradient information of the loss function of each training sample as the data to be exchanged of that training sample.
Optionally, the step of taking gradient information of the loss function of each training sample as the data to be exchanged comprises: taking the first-order and second-order gradient values of the loss function of each training sample as the data to be exchanged of that training sample. The step of determining the encryption function then comprises: determining a first encryption function for encrypting the first-order gradient values based on the first-order gradient values corresponding to the training samples, wherein the key of the first encryption function is determined based on the encryption precision and those first-order gradient values; and determining a second encryption function for encrypting the second-order gradient values based on the second-order gradient values corresponding to the training samples, wherein the key of the second encryption function is determined based on the encryption precision and those second-order gradient values.
Optionally, the step of updating the predicted value of each training sample based on the current tree comprises: for each training sample, updating its predicted value according to the weight of the leaf node of the current tree at which the sample is located, wherein the weight of each leaf node of the current tree is determined based on the gradient information corresponding to all training samples at that leaf node.
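With first- and second-order gradients as the exchanged data, the leaf weight takes the standard GBDT form. This is a sketch; `lam` is the usual L2 regularization term, and its value here is illustrative:

```python
def leaf_weight(g_sum, h_sum, lam=1.0):
    # Optimal leaf weight w* = -G / (H + lambda), computed from the summed
    # first-order (G) and second-order (H) gradients of the training
    # samples that land on this leaf.
    return -g_sum / (h_sum + lam)
```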
Optionally, the key ensures that the sum of at least one piece of data to be exchanged can be recovered from the sum of the homomorphic encryption results of that data by taking the sum modulo the key.
Optionally, the step of determining, based on the data to be exchanged for each training sample, an encryption function for homomorphically encrypting that data comprises: randomizing the key for the data to be exchanged of each training sample to obtain an encryption function corresponding to the data to be exchanged of each training sample.
Optionally, the step of randomizing the key for the data to be exchanged of each training sample to obtain the corresponding encryption function comprises performing the following for the data to be exchanged of each training sample: generating a random number for the data to be exchanged of the training sample; and computing the product of that random number and the key, and determining the encryption function corresponding to the data to be exchanged of the training sample based on the product and the data to be exchanged of the training sample.
Optionally, the method further comprises a key generation process, the key generation process comprising: computing the product of the data to be exchanged of each training sample and the encryption precision, rounding each obtained product, and summing all rounded results; the key is generated based on the result of the summation.
Optionally, the step of decrypting the received aggregate result with the decryption function corresponding to the encryption function to obtain a decryption result comprises performing the following for each received aggregate result: determining a first function based on the aggregate result and the key; and determining the decryption function corresponding to the aggregate result based on the first function and a decryption precision corresponding to the aggregate result, and decrypting the aggregate result with that decryption function to obtain its decryption result, wherein the decryption precision is determined based on the encryption precision.
Optionally, the step of determining the first function based on the aggregate result and the key comprises: performing a modulo operation on the aggregate result with the key; when the modulo result falls within a first preset range, taking the modulo result as the first function corresponding to the aggregate result; and when the modulo result falls within a second preset range, computing the difference between the modulo result and the key and taking that difference as the first function corresponding to the aggregate result.
Optionally, the step of determining the decryption function corresponding to the aggregate result based on the first function corresponding to the aggregate result and the decryption precision comprises: computing the ratio of the first function corresponding to the aggregate result to the decryption precision, and taking the ratio as the decryption function corresponding to the aggregate result.
Optionally, the encryption precision is the same as the decryption precision.
According to an exemplary embodiment of the present disclosure, there is provided a data protection apparatus applied to a vertical federated learning system, the data protection apparatus comprising: an acquisition unit configured to acquire the data to be exchanged for each training sample; an encryption function determining unit configured to determine, based on the data to be exchanged for each training sample, an encryption function for homomorphically encrypting that data, wherein the encryption function involves an encryption precision and a key, the key being determined based on the encryption precision and the data to be exchanged for each training sample; a homomorphic encryption unit configured to homomorphically encrypt the data to be exchanged of each training sample with the encryption function to obtain a homomorphic encryption result corresponding to each training sample; a transmitting unit configured to transmit the homomorphic encryption results corresponding to the training samples to at least one participant in the vertical federated learning system; a receiving unit configured to receive, from the at least one participant, an aggregate result of the homomorphic encryption results corresponding to the training samples; and a decryption unit configured to decrypt the received aggregate result with a decryption function corresponding to the encryption function to obtain a decryption result.
Optionally, the key ensures that the sum of at least one piece of data to be exchanged can be recovered from the sum of the homomorphic encryption results of that data by taking the sum modulo the key.
Optionally, the encryption function determining unit is configured to randomize the key for the data to be exchanged of each training sample to obtain an encryption function corresponding to the data to be exchanged of each training sample.
Optionally, the encryption function determining unit is configured to perform the following for the data to be exchanged of each training sample: generating a random number for the data to be exchanged of the training sample; and computing the product of that random number and the key, and determining the encryption function corresponding to the data to be exchanged of the training sample based on the product and the data to be exchanged of the training sample.
Optionally, the apparatus further comprises a key generation unit configured to compute the product of the data to be exchanged of each training sample and the encryption precision, round each obtained product, and sum all rounded results; the key is generated based on the result of the summation.
Optionally, the decryption unit is configured to perform the following for each received aggregate result: determining a first function based on the aggregate result and the key; and determining the decryption function corresponding to the aggregate result based on the first function and a decryption precision corresponding to the aggregate result, and decrypting the aggregate result with that decryption function to obtain its decryption result, wherein the decryption precision is determined based on the encryption precision.
Optionally, the decryption unit is configured to: perform a modulo operation on the aggregate result with the key; when the modulo result falls within a first preset range, take the modulo result as the first function corresponding to the aggregate result; and when the modulo result falls within a second preset range, compute the difference between the modulo result and the key and take that difference as the first function corresponding to the aggregate result.
Optionally, the decryption unit is configured to compute the ratio of the first function corresponding to the aggregate result to the decryption precision and take the ratio as the decryption function corresponding to the aggregate result.
Optionally, the encryption precision is the same as the decryption precision.
Optionally, the acquisition unit is configured to determine the loss function of each training sample from the label value and the current predicted value of that training sample, and to take gradient information of the loss function of each training sample as the data to be exchanged of that training sample.
According to an exemplary embodiment of the present disclosure, there is provided a prediction apparatus based on a vertical federated learning model, the prediction apparatus comprising: an acquisition unit configured to acquire the data to be exchanged for each training sample; an encryption function determining unit configured to determine, based on the data to be exchanged for each training sample, an encryption function for homomorphically encrypting that data, wherein the encryption function involves an encryption precision and a key, the key being determined based on the encryption precision and the data to be exchanged for each training sample; a homomorphic encryption unit configured to homomorphically encrypt the data to be exchanged of each training sample with the encryption function to obtain a homomorphic encryption result corresponding to each training sample; a transmitting unit configured to transmit the homomorphic encryption results corresponding to the training samples to at least one participant in a vertical federated learning system; a receiving unit configured to receive, from the at least one participant, an aggregate result of the homomorphic encryption results corresponding to the training samples; a decryption unit configured to decrypt the received aggregate result with a decryption function corresponding to the encryption function to obtain a decryption result; a model construction unit configured to construct, based on the decryption result, a gradient boosting tree model cooperatively trained by the vertical federated learning system; and a prediction unit configured to predict a sample to be predicted with the gradient boosting tree model.
Optionally, the model construction unit is configured to: aiming at the current tree node to be segmented of the constructed current tree, calculating segmentation gain values of candidate segmentation points of the current tree node based on the decryption result; selecting a candidate segmentation point with the maximum segmentation gain value as a segmentation point of the current tree node, dividing training samples on the current tree node into a left child node and a right child node of the current tree node based on the segmentation point, notifying the at least one participant of the current division result so that the at least one participant obtains a new aggregation result of homomorphic encryption results corresponding to each training sample based on the division result, and receiving the aggregation result of homomorphic encryption results corresponding to each training sample from the at least one participant by a receiving unit until the current tree construction is completed; and when the current tree construction is completed, updating the predicted value of each training sample based on the current tree by an acquisition unit, and acquiring data to be exchanged of each training sample until all tree constructions of the gradient lifting tree model are completed.
Optionally, the aggregate result received from each participant is: a result obtained by the participant aggregating the homomorphic encryption results corresponding to the training samples based on the feature data of the training samples and the division result.
Optionally, the receiving unit is configured to: receiving a gradient histogram and an identification of the gradient histogram from the at least one participant, wherein each bucket in the gradient histogram corresponds to an aggregate result; wherein the prediction apparatus further comprises: a lookup table construction unit configured to record, on a lookup table for storing the gradient lifting tree model structure, an identification of a gradient histogram corresponding to a cut point of the current tree node and an identification of a bucket in the corresponding gradient histogram; wherein the prediction unit is configured to: and predicting the sample to be predicted based on the lookup table.
Optionally, the acquisition unit is configured to: and determining the loss function of each training sample according to the label value and the current predicted value of each training sample, and taking gradient information of the loss function of each training sample as data to be exchanged of each training sample.
Optionally, the acquisition unit is configured to: taking the first-order gradient value and the second-order gradient value of the loss function of each training sample as data to be exchanged of each training sample; wherein the encryption function determination unit is configured to: determining a first encryption function for encrypting the first-order gradient values corresponding to the training samples based on the first-order gradient values corresponding to the training samples, wherein a key of the first encryption function is determined based on the encryption precision and the first-order gradient values corresponding to the training samples; and determining a second encryption function for encrypting the second-order gradient values corresponding to the training samples based on the second-order gradient values corresponding to the training samples, wherein a key of the second encryption function is determined based on the encryption precision and the second-order gradient values corresponding to the training samples.
Optionally, the acquisition unit is configured to: for each training sample, updating the predicted value of the training sample according to the weight of the leaf node of the current tree where the training sample is located; wherein the weight of each leaf node of the current tree is determined based on gradient information corresponding to all training samples on the leaf node.
Optionally, the key is used to ensure that the sum of the at least one piece of data to be exchanged can be recovered by taking the sum of the homomorphic encryption results of the at least one piece of data to be exchanged modulo the key.
Optionally, the encryption function determining unit is configured to: and carrying out randomization processing on the secret key aiming at the data to be exchanged of each training sample to obtain an encryption function corresponding to the data to be exchanged of each training sample.
Optionally, the encryption function determining unit is configured to: the following procedure is executed for the data to be exchanged of each training sample respectively: generating a random number for the data to be exchanged of the training sample; and calculating the product of the random number corresponding to the data to be exchanged of the training sample and the secret key, and determining an encryption function corresponding to the data to be exchanged of the training sample based on the product and the data to be exchanged of the training sample.
Optionally, the prediction apparatus further comprises: a key generation unit configured to calculate, for each training sample, the product of its data to be exchanged and the encryption precision, perform a rounding operation on each obtained product, and sum the results of all rounding operations, the key being generated based on the result of the summation operation.
Optionally, the decryption unit is configured to: the following procedure is performed for each received aggregation result, respectively: determining a first function based on the aggregation result and the key; determining a decryption function corresponding to the aggregation result based on a first function and decryption precision corresponding to the aggregation result, and decrypting the aggregation result by using the decryption function corresponding to the aggregation result to obtain a decryption result of the aggregation result, wherein the decryption precision is determined based on the encryption precision.
Optionally, the decryption unit is configured to: performing modular operation on the aggregation result and the secret key; when the obtained modular operation result is in a first preset range, taking the modular operation result as a first function corresponding to the aggregation result; and when the obtained modular operation result is in a second preset range, calculating a difference value between the modular operation result and the secret key, and taking the difference value as a first function corresponding to the aggregation result.
Optionally, the decryption unit is configured to: and calculating the ratio of the first function corresponding to the aggregation result to the decryption precision, and taking the ratio as a decryption function corresponding to the aggregation result.
Optionally, the encryption precision is the same as the decryption precision.
According to an exemplary embodiment of the present disclosure, a system is provided comprising at least one computing device and at least one storage device storing instructions, wherein the instructions, when executed by the at least one computing device, cause the at least one computing device to perform a data protection method as described above or a prediction method based on a longitudinal federal learning model as described above.
According to an exemplary embodiment of the present disclosure, a computer-readable storage medium storing instructions is provided, wherein the instructions, when executed by at least one computing device, cause the at least one computing device to perform a data protection method as described above or a prediction method based on a longitudinal federal learning model as described above.
According to the data protection method, the prediction method, and the prediction apparatus based on the longitudinal federal learning model of the embodiments of the present disclosure, the efficiency of data encryption, decryption, and ciphertext operations can be improved while data privacy is protected, so that the training efficiency of the longitudinal federal learning tree model can be improved without affecting model precision.
Additional aspects and/or advantages of the present general inventive concept will be set forth in part in the description which follows and, in part, will be obvious from the description, or may be learned by practice of the general inventive concept.
Drawings
The foregoing and other objects and features of exemplary embodiments of the present disclosure will become more apparent from the following description taken in conjunction with the accompanying drawings which illustrate the embodiments by way of example, in which:
FIG. 1 illustrates a flow chart of a data protection method according to an exemplary embodiment of the present disclosure;
FIG. 2 illustrates a flowchart of a training method of a longitudinal federal learning model according to an exemplary embodiment of the present disclosure;
FIG. 3 illustrates a flowchart of a method of constructing a gradient-lifting tree model based on decryption results, according to an exemplary embodiment of the present disclosure;
FIG. 4 illustrates a flowchart of a method of prediction based on a longitudinal federal learning model according to an exemplary embodiment of the present disclosure;
FIG. 5 illustrates a block diagram of a data protection apparatus according to an exemplary embodiment of the present disclosure;
FIG. 6 illustrates a block diagram of a training apparatus of a longitudinal federal learning model according to an exemplary embodiment of the present disclosure;
fig. 7 illustrates a block diagram of a prediction apparatus based on a longitudinal federal learning model according to an exemplary embodiment of the present disclosure.
Detailed Description
Reference will now be made in detail to embodiments of the present disclosure, examples of which are illustrated in the accompanying drawings, wherein like reference numerals refer to like elements throughout. The embodiments are described below with reference to the figures in order to explain the present disclosure.
Fig. 1 illustrates a flowchart of a data protection method according to an exemplary embodiment of the present disclosure. The data protection method is applied to a longitudinal federal learning system.
As an example, the longitudinal federal learning system may include one initiator and at least one participant. The initiator possesses part of the feature information of the training samples as well as their label information; each participant possesses part of the feature information of the training samples but not their label information. As an example, the data protection method may be applied to the initiator in the longitudinal federal learning system.
Referring to fig. 1, in step S101, data to be exchanged for each training sample is acquired.
As an example, the data to be exchanged for each training sample may be gradient information of a loss function corresponding to each training sample. It should be appreciated that the data to be exchanged for each training sample may also be other types of data that need to be exchanged with the participant and that need to be encrypted, as this disclosure is not limited in this regard.
As an example, the loss function of each training sample may be determined according to the label value and the current predicted value (i.e., the predicted label value) of each training sample, and gradient information of the loss function of each training sample may be used as data to be exchanged of each training sample. An exemplary embodiment of obtaining predicted values for training samples will be described in detail below in conjunction with fig. 3.
As an example, the first-order gradient value and the second-order gradient value of the loss function of the respective training samples may be used as data to be exchanged of the respective training samples.
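As one concrete illustration (an assumption on our part; the patent does not fix the loss function), for a squared-error loss the first-order gradient of each sample is simply the residual and the second-order gradient is constant:

```python
# Sketch: first- and second-order gradients of a per-sample loss as the
# "data to be exchanged". Assumes a squared-error loss l = (y_pred - y)^2 / 2,
# so g_k = y_pred_k - y_k and h_k = 1; other losses yield other formulas.
def gradients(y_true, y_pred):
    first = [yp - y for y, yp in zip(y_true, y_pred)]   # first-order gradient values
    second = [1.0] * len(y_true)                        # constant for squared error
    return first, second

g, h = gradients([1.0, 0.0, 1.0], [0.6, 0.2, 0.9])
```

For a different loss (e.g., the logistic loss used for classification), only the two formulas inside `gradients` would change; the rest of the protocol is unaffected.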
In step S102, an encryption function for homomorphically encrypting the data to be exchanged of the respective training samples is determined based on the data to be exchanged of the respective training samples.
The encryption function involves an encryption precision, which is specified by a user, and a key. The key is determined based on the encryption accuracy and the data to be exchanged for the respective training samples.
As an example, the encryption accuracy is determined based on the number of training samples and the requirement for encryption efficiency.
As an example, the key is used to ensure that the sum of the at least one piece of data to be exchanged can be recovered by taking the sum of the homomorphic encryption results of the at least one piece of data to be exchanged modulo the key. That is, the key is used to ensure that additive homomorphic encryption can be achieved by the encryption function.
In other words, the encryption function and its corresponding decryption function are designed so that, for a set $p = \{p_1, p_2, \ldots, p_k\}$ containing $k$ real numbers and any subset $p^j$ of the set, $D\big(\sum_i E(p_i^j)\big) = \sum_i p_i^j$, where $p_i^j$ represents the $i$-th element of the set $p^j$, and $E$ and $D$ represent the encryption function and the decryption function, respectively.
According to the exemplary embodiments of the present disclosure, the encryption function only needs to support semi-homomorphic encryption (i.e., additive homomorphic encryption) rather than fully homomorphic encryption, thereby keeping the encryption and decryption process and the ciphertext data lightweight.
As an example, the key may be generated by: calculating the product of the data to be exchanged of each training sample and the encryption precision, performing rounding operation on each obtained product, and summing the results obtained by all rounding operations; the key is then generated based on the result of the summation operation.
As an example, to ensure the robustness of encryption, the key is not smaller than $2\sum_{k=1}^{L}\big|[m \cdot p_k]\big|$. For example, the key may be $n = 2\sum_{k=1}^{L}\big|[m \cdot p_k]\big| + 1$, where $p_k$ represents the data to be exchanged of the $k$-th training sample, $m$ represents the encryption precision, $[\cdot]$ represents the rounding operator, $L$ represents the number of training samples, and $|\cdot|$ represents the absolute value operator.
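Under this reading, key generation can be sketched as follows (a minimal sketch; the exact margin above the lower bound is our choice, since the text only requires the key to be large enough for unambiguous decryption):

```python
# Sketch of key generation: round each product m * p_k, sum the absolute
# values, and pick a key larger than twice that sum so that every subset sum
# of encoded values decodes uniquely after the modulo operation.
def generate_key(data_to_exchange, m):
    s = sum(abs(round(m * pk)) for pk in data_to_exchange)
    return 2 * s + 1  # any key > 2 * s would do; "+ 1" is our choice

n = generate_key([0.5, -1.25, 2.0], m=1000)  # |500| + |1250| + |2000| = 3750
```

Because the key depends only on the rounded, scaled data, no large prime numbers (and hence no expensive primality testing) are needed, which is the efficiency point made below.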
According to the exemplary embodiments of the present disclosure, the key is determined based on the encryption precision and the data to be exchanged of each training sample, and no large prime numbers are involved; therefore, compared with mainstream homomorphic encryption algorithms such as Paillier homomorphic encryption, this encryption method can effectively reduce the ciphertext length and improve the efficiency of data encryption, decryption, and ciphertext operations.
As an example, for the data to be exchanged of each training sample, an encryption function for homomorphically encrypting the data to be exchanged of the training sample may be determined based on the data to be exchanged of the training sample, the key, and the encryption precision.
As an example, the key may be randomized for the data to be exchanged of each training sample, to obtain an encryption function corresponding to the data to be exchanged of the training sample.
As an example, for each training sample of data to be exchanged, the following procedure may be performed: generating a random number for the data to be exchanged of the training sample; and calculating the product of the random number corresponding to the data to be exchanged of the training sample and the secret key, and determining an encryption function corresponding to the data to be exchanged of the training sample based on the product and the data to be exchanged of the training sample.
According to the embodiment of the disclosure, the distribution of homomorphic encryption results can be changed by introducing random numbers, so that the data encryption effect is further improved, and the data security is ensured.
As an example, the encryption function corresponding to the data to be exchanged of each training sample may be obtained based on the product of the random number corresponding to the data to be exchanged of the training sample and the key, and the second function corresponding to the data to be exchanged of the training sample. The second function is determined based on the data to be exchanged of the training sample and the encryption accuracy. For example, when the result of the rounding operation of the product of the data to be exchanged of the training sample and the encryption precision is smaller than 0, the second function corresponding to the data to be exchanged of the training sample may be obtained based on the result of the rounding operation of the product of the data to be exchanged of the training sample and the encryption precision and the secret key. For example, when the result of the rounding operation of the product of the data to be exchanged of the training sample and the encryption precision is greater than or equal to 0, the second function corresponding to the data to be exchanged of the training sample may be obtained based on the result of the rounding operation of the product of the data to be exchanged of the training sample and the encryption precision.
As an example, the encryption function for the data to be exchanged $p_k$ of the $k$-th training sample may be $E(p_k) = f_n([m \cdot p_k]) + r_k \cdot n$, where $m$ represents the encryption precision, $n$ represents the key, $f_n([m \cdot p_k])$ represents a function of $[m \cdot p_k]$ and $n$, and $r_k$ represents a random number generated for $p_k$.
As an example, $f_n([m \cdot p_k]) = [m \cdot p_k] + n \cdot \mathbb{1}\{[m \cdot p_k] < 0\}$, where $\mathbb{1}\{\cdot\}$ represents the indicator function; $f_n([m \cdot p_k])$ is the second function described above.
As an example, $r_k$ is a random positive integer greater than or equal to 1.
As an example, if m = 1000, three digits after the decimal point are retained. Preferably, m is a positive integer power of two.
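Combining the definitions above, the encryption step can be sketched as follows (function names, the random-number range, and the concrete key value are our assumptions; `encode` plays the role of the second function $f_n$):

```python
import random

# Sketch of E(p_k) = f_n([m * p_k]) + r_k * n as described above.
def encode(pk, m, n):
    x = round(m * pk)              # [m * p_k]
    return x + n if x < 0 else x   # f_n: shift negative values into [0, n)

def encrypt(pk, m, n, rng=random):
    r = rng.randint(1, 100)        # r_k: random positive integer >= 1
    return encode(pk, m, n) + r * n

ct = encrypt(-1.25, m=1000, n=7501)
# ct mod n recovers the encoded value 6251 regardless of r_k
```

The random multiple of the key changes the distribution of ciphertexts between samples without affecting the value modulo $n$, which is exactly the randomization effect described in the preceding paragraphs.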
As an example, the first-order gradient value and the second-order gradient value of the loss function of the respective training samples may be used as data to be exchanged of the respective training samples. In other words, the data to be exchanged for each training sample may include: the first order gradient value and the second order gradient value of the loss function of the training sample. In this case, the first-order gradient value and the second-order gradient value need to be encrypted separately.
As an example, a first encryption function for encrypting the first order gradient values corresponding to the respective training samples may be determined based on the first order gradient values corresponding to the respective training samples, wherein a key of the first encryption function is determined based on the encryption accuracy and the first order gradient values corresponding to the respective training samples.
As an example, a second encryption function for encrypting the second order gradient values corresponding to the respective training samples may be determined based on the second order gradient values corresponding to the respective training samples, wherein a key of the second encryption function is determined based on the encryption accuracy and the second order gradient values corresponding to the respective training samples.
In step S103, homomorphic encryption is performed on the data to be exchanged of each training sample by using the encryption function, so as to obtain homomorphic encryption results corresponding to each training sample, and the homomorphic encryption results corresponding to each training sample are sent to at least one participant in the longitudinal federal learning system.
As an example, the resulting encryption function of the data to be exchanged for each training sample may be used to homomorphically encrypt the data to be exchanged for that training sample.
In step S104, an aggregate result of homomorphic encryption results corresponding to the training samples is received from the at least one participant, and the received aggregate result is decrypted using a decryption function corresponding to the encryption function, so as to obtain a decrypted result.
As an example, the aggregate result may be an accumulated sum of at least one homomorphic encryption result.
As an example, for each received aggregation result, the following procedure may be performed: determining a first function based on the aggregation result and the key; and determining a decryption function corresponding to the aggregation result based on the first function and the decryption precision corresponding to the aggregation result, and decrypting the aggregation result by using the decryption function corresponding to the aggregation result to obtain a decryption result of the aggregation result. The decryption accuracy is determined based on the encryption accuracy. Preferably, the encryption precision may be the same as the decryption precision.
As an example, the aggregation result may be modulo-computed with the key; when the obtained modular operation result is in a first preset range, taking the modular operation result as a first function corresponding to the aggregation result; and when the obtained modular operation result is in a second preset range, calculating a difference value between the modular operation result and the secret key, and taking the difference value as a first function corresponding to the aggregation result. For example, the first preset range and the second preset range may be determined based on the encryption key.
As an example, a ratio of the first function corresponding to the aggregation result to the decryption accuracy may be calculated, and the ratio may be used as the decryption function corresponding to the aggregation result.
As an example, the decryption function corresponding to the encryption function may be $D(\tilde{c}) = f(\tilde{c}) / m$, where $\tilde{c}$ represents the ciphertext to be decrypted (i.e., the aggregate result), $m$ represents the decryption precision, and the first function is $f(\tilde{c}) = \tilde{c} \bmod n$ when $\tilde{c} \bmod n$ falls within the first preset range and $f(\tilde{c}) = (\tilde{c} \bmod n) - n$ when $\tilde{c} \bmod n$ falls within the second preset range, with mod representing the modulo operator.
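Putting the pieces together, the additive homomorphism $D\big(\sum_i E(p_i)\big) = \sum_i p_i$ can be checked end to end with a minimal sketch (all names and concrete values are ours; the two branches of `decrypt_sum` correspond to the first and second preset ranges described above):

```python
import random

# Decrypt an aggregate (sum of ciphertexts) back to the sum of plaintexts.
def decrypt_sum(ct, m, n):
    x = ct % n
    if x > n // 2:      # second preset range: interpret as a negative value
        x -= n
    return x / m        # ratio of the first function to the decryption precision

m = 1000
p = [0.5, -1.25, 2.0]
n = 2 * sum(abs(round(m * pk)) for pk in p) + 1              # n = 7501
cts = [round(m * pk) % n + random.randint(1, 5) * n for pk in p]
total = decrypt_sum(sum(cts), m, n)
assert total == sum(p)  # the random multiples of n vanish under mod n
```

Note that the random terms $r_k \cdot n$ cancel under the modulo operation, so decryption is exact up to the precision $m$ even though each ciphertext is randomized.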
FIG. 2 illustrates a flowchart of a training method of a longitudinal federal learning model according to an exemplary embodiment of the present disclosure. The training method is applied to a longitudinal federal learning system.
As an example, the longitudinal federal learning system may include one initiator and at least one participant. The initiator possesses part of the feature information of the training samples as well as their label information; each participant possesses part of the feature information of the training samples but not their label information. As an example, the training method may be applied to the initiator in the longitudinal federal learning system.
Referring to fig. 2, in step S201, data to be exchanged for each training sample is acquired.
As an example, the loss function of each training sample may be determined according to the label value and the current predicted value of each training sample, and gradient information of the loss function of each training sample may be used as data to be exchanged of each training sample.
As an example, the first-order gradient value and the second-order gradient value of the loss function of the respective training samples may be used as data to be exchanged of the respective training samples.
As an example, the predicted value of each training sample may be initialized before the first tree is constructed.
In step S202, an encryption function for homomorphically encrypting the data to be exchanged of the respective training samples is determined based on the data to be exchanged of the respective training samples.
The encryption function involves an encryption precision, which is specified by a user, and a key. The key is determined based on the encryption accuracy and the data to be exchanged for the respective training samples.
In step S203, homomorphic encryption is performed on the data to be exchanged of each training sample by using the encryption function, so as to obtain homomorphic encryption results corresponding to each training sample, and the homomorphic encryption results corresponding to each training sample are sent to at least one participant in the longitudinal federal learning system.
In step S204, an aggregate result of homomorphic encryption results corresponding to the training samples is received from the at least one participant, and the received aggregate result is decrypted using a decryption function corresponding to the encryption function, so as to obtain a decrypted result.
The specific implementation of steps S201 to S204 may refer to the above-mentioned exemplary embodiments of steps S101 to S104, and will not be described herein.
In step S205, a gradient lifting tree model cooperatively trained by the vertical federal learning system is constructed based on the decryption result.
An example embodiment of step S205 will be described below in conjunction with fig. 3.
FIG. 3 illustrates a flowchart of a method of constructing a gradient-lifting tree model based on decryption results, according to an exemplary embodiment of the present disclosure.
Referring to fig. 3, in step S301, for a current tree node to be segmented of a current tree constructed, segmentation gain values of respective candidate segmentation points of the current tree node are calculated based on the decryption result.
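The patent does not pin down the gain formula; one plausible instance (an assumption on our part) is the XGBoost-style gain computed from the decrypted first- and second-order gradient sums on each side of the candidate segmentation point:

```python
# Hypothetical XGBoost-style segmentation gain for a candidate cut point.
# GL/HL (GR/HR) are the decrypted first-/second-order gradient sums of the
# samples falling to the left (right) of the candidate segmentation point.
def split_gain(GL, HL, GR, HR, lam=1.0, gamma=0.0):
    def score(G, H):
        return G * G / (H + lam)   # structure score of one child
    return 0.5 * (score(GL, HL) + score(GR, HR) - score(GL + GR, HL + HR)) - gamma
```

The initiator can evaluate this for every candidate segmentation point using only decrypted aggregate sums, never the participants' raw feature data, which is the privacy point of the protocol.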
In step S302, the candidate segmentation point with the largest segmentation gain value is selected as the segmentation point of the current tree node, the training samples on the current tree node are divided into the left child node and the right child node of the current tree node based on the segmentation point, and the at least one participant is notified of the current division result, so that the at least one participant obtains a new aggregation result of the homomorphic encryption results corresponding to the training samples based on the division result; step S204 is then executed again, until the construction of the current tree is completed.
As an example, when the segmentation stop condition is reached, the segmentation of tree nodes is stopped, and the construction of the current tree is completed.
As an example, the aggregate result received from each participant may be: the party carries out aggregation processing on homomorphic encryption results corresponding to all the training samples based on the feature data of all the training samples and the division results.
As an example, a gradient histogram and an identification of the gradient histogram may be received from the at least one participant, where each bucket in the gradient histogram corresponds to one aggregate result. For example, a first-order gradient histogram and a second-order gradient histogram may be received from the at least one participant.
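As a rough sketch of how a participant might form such per-bucket aggregate results (the equal-width bucketing and the data layout are our assumptions; adding integer ciphertexts is the homomorphic aggregation):

```python
# Each bucket accumulates the (integer) ciphertexts of the samples whose
# feature value falls into it; the initiator later decrypts per-bucket sums.
def gradient_histogram(feature_values, ciphertexts, n_buckets, lo, hi):
    width = (hi - lo) / n_buckets
    hist = [0] * n_buckets
    for v, ct in zip(feature_values, ciphertexts):
        b = min(int((v - lo) / width), n_buckets - 1)  # clamp v == hi into last bucket
        hist[b] += ct
    return hist
```

The participant sends only the histogram and its identification; it never learns the plaintext gradients, and the initiator never learns the feature values that defined the buckets.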
As an example, the method may further include: recording, in a lookup table for storing the structure of the gradient lifting tree model, the identification of the gradient histogram corresponding to the segmentation point of the current tree node and the identification of the bucket in that gradient histogram.
The lookup table according to the exemplary embodiments of the present disclosure involves only identification information (e.g., numbers) of the data, not the content of the data, so the privacy of the data is further protected.
In step S303, when the current tree construction is completed, the predicted values of the respective training samples are updated based on the current tree, and step S201 is performed back until all tree constructions of the gradient-lifted tree model are completed.
As an example, when the gradient lifting tree model converges or the maximum number of iterations is reached, the construction of new trees is stopped, and the construction of the gradient lifting tree model is completed.
As an example, the step of updating the predicted values of the respective training samples based on the current tree may include: for each training sample, updating the predicted value of the training sample according to the weight of the leaf node of the current tree where the training sample is located; wherein the weight of each leaf node of the current tree is determined based on gradient information corresponding to all training samples on that leaf node.
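The patent leaves the concrete weight formula open; a common choice in gradient boosting frameworks, shown here as an assumption, derives the leaf weight from the first- and second-order gradient sums of the samples on the leaf and then shrinks the update by a learning rate:

```python
# Hypothetical leaf weight: w* = -G / (H + lambda), where G and H are the
# first- and second-order gradient sums over the samples on the leaf.
def leaf_weight(G, H, reg_lambda=1.0):
    return -G / (H + reg_lambda)

# Updating a sample's predicted value with the weight of its leaf.
def update_prediction(y_pred, weight, learning_rate=0.1):
    return y_pred + learning_rate * weight
```

Both `reg_lambda` and `learning_rate` are illustrative hyperparameters, not values given in the text.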
FIG. 4 illustrates a flowchart of a method of prediction based on a longitudinal federal learning model according to an exemplary embodiment of the present disclosure. The prediction method is applied to a longitudinal federal learning system.
As an example, the longitudinal federal learning system may include one initiator and at least one participant. The initiator possesses part of the feature information of the samples as well as their label information; each participant possesses part of the feature information of the samples but not their label information. As an example, the prediction method may be applied to the initiator in the longitudinal federal learning system.
In step S401, data to be exchanged of each training sample is acquired.
In step S402, an encryption function for homomorphically encrypting the data to be exchanged of the respective training samples is determined based on the data to be exchanged of the respective training samples.
The encryption function relates to an encryption precision and a key, the key being determined based on the encryption precision and the data to be exchanged for the respective training samples.
In step S403, homomorphic encryption is performed on the data to be exchanged of each training sample by using the encryption function, so as to obtain homomorphic encryption results corresponding to each training sample, and the homomorphic encryption results corresponding to each training sample are sent to at least one participant in the longitudinal federal learning system.
In step S404, an aggregate result of homomorphic encryption results corresponding to the training samples is received from the at least one participant, and the received aggregate result is decrypted using a decryption function corresponding to the encryption function, so as to obtain a decrypted result.
In step S405, a gradient lifting tree model cooperatively trained by the vertical federal learning system is constructed based on the decryption result.
The specific implementation of steps S401 to S405 may refer to the above-described exemplary embodiments of steps S201 to S205, and will not be described herein.
In step S406, the sample to be predicted is predicted using the gradient-lifting tree model.
As an example, the samples to be predicted may be predicted based on a lookup table for storing the gradient-lifting tree model structure.
As an example, step S406 may include: acquiring an identification (e.g., a sample ID) of a sample to be predicted; for a current node to be searched of each tree in the gradient lifting tree model, searching a gradient histogram identifier and a bucket identifier corresponding to a segmentation point of the current node from a lookup table for storing the gradient lifting tree model structure; transmitting the identification of the sample to be predicted, the gradient histogram identification and the bucket identification to a participant in the longitudinal federal learning system corresponding to the gradient histogram identification so as to inquire child nodes of the current node into which the sample to be predicted will enter; receiving a query result from the participant, determining a child node which the sample to be predicted enters according to the query result, taking the entered child node as a current node to be searched, and returning to execute the steps of searching the gradient histogram identification and the bucket identification corresponding to the segmentation point of the current node until the sample to be predicted enters the leaf node of the tree; and determining the predicted value of the sample to be predicted based on the weight of the leaf node where the sample to be predicted is located in each tree.
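A minimal sketch of this traversal (the data structures and the `query_participant` callback are illustrative stand-ins for the lookup table and the messages exchanged with participants, not the patent's actual interfaces):

```python
# The initiator stores only (histogram id, bucket id) per internal node in the
# lookup table and asks the owning participant which child the sample enters.
def predict(sample_id, trees, lookup, query_participant, learning_rate=1.0):
    total = 0.0
    for tree in trees:
        node = tree
        while "weight" not in node:                 # internal node: ask the owner
            hist_id, bucket_id = lookup[node["id"]]
            side = query_participant(sample_id, hist_id, bucket_id)  # "L" or "R"
            node = node["left"] if side == "L" else node["right"]
        total += learning_rate * node["weight"]     # leaf reached: add its weight
    return total

# One-node toy tree: the split of node 0 lives in histogram "hist-7", bucket 2.
toy_tree = {"id": 0, "left": {"weight": 0.3}, "right": {"weight": -0.2}}
toy_lookup = {0: ("hist-7", 2)}
pred = predict("sample-1", [toy_tree], toy_lookup, lambda s, h, b: "L")  # returns 0.3
```

Because the initiator sends only identifiers and the participant answers only "left" or "right", neither side reveals feature values or thresholds during prediction.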
Fig. 5 shows a block diagram of a data protection apparatus 10 according to an exemplary embodiment of the present disclosure.
The data protection apparatus 10 is applied to a longitudinal federal learning system, and the data protection apparatus 10 includes: an acquisition unit 101, an encryption function determination unit 102, a homomorphic encryption unit 103, a transmission unit 104, a reception unit 105, and a decryption unit 106.
Specifically, the acquisition unit 101 is configured to acquire data to be exchanged for each training sample.
The encryption function determining unit 102 is configured to determine an encryption function for homomorphically encrypting the data to be exchanged of the respective training samples based on the data to be exchanged of the respective training samples.
The encryption function relates to an encryption precision and a key, the key being determined based on the encryption precision and the data to be exchanged for the respective training samples.
The homomorphic encryption unit 103 is configured to obtain homomorphic encryption results corresponding to the training samples by homomorphic encrypting the data to be exchanged of the training samples by using the encryption function.
The sending unit 104 is configured to send homomorphic encryption results corresponding to the respective training samples to at least one participant in the longitudinal federal learning system.
The receiving unit 105 is configured to receive, from the at least one participant, an aggregate result of homomorphic encryption results corresponding to the respective training samples.
The decryption unit 106 is configured to decrypt the received aggregate result using a decryption function corresponding to the encryption function, resulting in a decrypted result.
As an example, the key may be used to ensure that the sum of at least one piece of data to be exchanged can be recovered from the result of a modulo operation performed, with respect to the key, on the sum of the homomorphic encryption results of the at least one piece of data to be exchanged.
As an example, the encryption function determination unit 102 may be configured to: randomize the key for the data to be exchanged of each training sample, so as to obtain the encryption function corresponding to the data to be exchanged of that training sample.
As an example, the encryption function determination unit 102 may be configured to: the following procedure is executed for the data to be exchanged of each training sample respectively: generating a random number for the data to be exchanged of the training sample; and calculating the product of the random number corresponding to the data to be exchanged of the training sample and the secret key, and determining an encryption function corresponding to the data to be exchanged of the training sample based on the product and the data to be exchanged of the training sample.
As an example, the data protection apparatus 10 may further include: a key generation unit (not shown) configured to calculate, for each training sample, the product of the data to be exchanged and the encryption precision, perform a rounding operation on each of the obtained products, sum the results of all the rounding operations, and generate the key based on the result of the summation.
As an example, the decryption unit 106 may be configured to: the following procedure is performed for each received aggregation result, respectively: determining a first function based on the aggregation result and the key; and determining a decryption function corresponding to the aggregation result based on the first function and the decryption precision corresponding to the aggregation result, and decrypting the aggregation result by using the decryption function corresponding to the aggregation result to obtain a decryption result of the aggregation result, wherein the decryption precision is determined based on the encryption precision.
As an example, the decryption unit 106 may be configured to: perform a modulo operation on the aggregation result with respect to the key; when the obtained modulo result falls within a first preset range, take the modulo result as the first function corresponding to the aggregation result; and when the obtained modulo result falls within a second preset range, calculate the difference between the modulo result and the key, and take the difference as the first function corresponding to the aggregation result.
As an example, the decryption unit 106 may be configured to: and calculating the ratio of the first function corresponding to the aggregation result to the decryption precision, and taking the ratio as a decryption function corresponding to the aggregation result.
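A minimal numeric sketch of the scheme that units 101 to 106 describe. The patent does not fix the exact constructions, so the following are assumptions for illustration: the key is twice the sum of the rounded scaled magnitudes plus one, encryption adds a random multiple of the key to the rounded scaled value, and the "first preset range" covers modulo results up to half the key:

```python
import random

def make_key(data, precision):
    # Assumed rule: the key strictly exceeds twice the largest possible
    # aggregate of the rounded scaled values, so the modulo is reversible.
    s = sum(abs(round(x * precision)) for x in data)
    return 2 * s + 1

def encrypt(x, precision, key):
    # Scale and round the value, then mask it with a random multiple
    # of the key (the per-sample randomization of the key).
    r = random.randint(1, 2 ** 32)
    return round(x * precision) + r * key

def decrypt(aggregate, precision, key):
    # First function: the modulo result, shifted into a signed range.
    m = aggregate % key
    if m > key // 2:       # "second preset range": negative sums
        m -= key
    return m / precision   # ratio of the first function to the precision
```

Summing the ciphertexts of 0.5, -1.25, and 2.0 at precision 1000 and decrypting the aggregate recovers 1.25, regardless of the random factors, because every random mask vanishes under the modulo operation.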
As an example, the encryption precision may be the same as the decryption precision.
As an example, the acquisition unit 101 may be configured to: determine the loss function of each training sample according to the label value and the current predicted value of that training sample, and take the gradient information of the loss function of each training sample as the data to be exchanged of that training sample.
Fig. 6 illustrates a block diagram of a training apparatus 20 of a longitudinal federal learning model according to an exemplary embodiment of the present disclosure.
The training device 20 includes: an acquisition unit 201, an encryption function determination unit 202, a homomorphic encryption unit 203, a transmission unit 204, a reception unit 205, a decryption unit 206, and a model construction unit 207.
Specifically, the acquisition unit 201 is configured to acquire data to be exchanged for each training sample.
The encryption function determining unit 202 is configured to determine an encryption function for homomorphically encrypting the data to be exchanged of the respective training samples based on the data to be exchanged of the respective training samples.
The encryption function relates to an encryption precision and a key, the key being determined based on the encryption precision and the data to be exchanged for the respective training samples.
The homomorphic encryption unit 203 is configured to obtain homomorphic encryption results corresponding to the training samples by homomorphic encrypting the data to be exchanged of the training samples by using the encryption function.
The sending unit 204 is configured to send homomorphic encryption results corresponding to the respective training samples to at least one participant in the longitudinal federal learning system.
The receiving unit 205 is configured to receive, from the at least one participant, an aggregate result of homomorphic encryption results corresponding to the respective training samples.
The decryption unit 206 is configured to decrypt the received aggregation result using a decryption function corresponding to the encryption function, resulting in a decrypted result.
The model construction unit 207 is configured to construct a gradient-lifting tree model co-trained by the longitudinal federal learning system based on the decryption result.
As an example, the model construction unit 207 may be configured to: for the current tree node to be segmented of the tree currently being constructed, calculate the segmentation gain value of each candidate segmentation point of the current tree node based on the decryption result; select the candidate segmentation point with the largest segmentation gain value as the segmentation point of the current tree node; divide the training samples on the current tree node into the left child node and the right child node of the current tree node based on the segmentation point; and notify the at least one participant of the current division result, so that the at least one participant obtains a new aggregation result of the homomorphic encryption results corresponding to the training samples based on the division result, the receiving unit 205 receiving the aggregation results of the homomorphic encryption results corresponding to the training samples from the at least one participant, until construction of the current tree is completed. When construction of the current tree is completed, the acquisition unit 201 updates the predicted value of each training sample based on the current tree and acquires new data to be exchanged for each training sample, until all trees of the gradient lifting tree model have been constructed.
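The segmentation-gain step can be illustrated with the standard second-order gain formula of XGBoost-style gradient lifting trees; the formula and the regularization parameter `lam` are assumptions for illustration, since the patent does not spell out the gain computation. `g_*` and `h_*` denote the decrypted aggregates of first- and second-order gradients:

```python
def split_gain(g_left: float, h_left: float,
               g_total: float, h_total: float,
               lam: float = 1.0) -> float:
    # Gain of splitting a node with aggregate gradients (g_total, h_total)
    # into a left part (g_left, h_left) and the complementary right part.
    g_right, h_right = g_total - g_left, h_total - h_left
    return 0.5 * (g_left ** 2 / (h_left + lam)
                  + g_right ** 2 / (h_right + lam)
                  - g_total ** 2 / (h_total + lam))

def best_split(candidates, g_total, h_total, lam=1.0):
    # candidates: iterable of (split_point, g_left, h_left) triples;
    # returns the candidate with the largest segmentation gain value.
    return max(candidates,
               key=lambda c: split_gain(c[1], c[2], g_total, h_total, lam))
```

Note that the participants only ever see encrypted gradients; the gains themselves are computed on the label-holding side after decryption.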
As an example, the aggregation result received from each participant may be a result obtained by that participant aggregating the homomorphic encryption results corresponding to the training samples based on the feature data of the training samples and the division result.
As an example, the receiving unit 205 may be configured to: receive, from the at least one participant, a gradient histogram and an identification of the gradient histogram, wherein each bucket in the gradient histogram corresponds to one aggregation result.
As an example, the training apparatus 20 may further include: a lookup table construction unit (not shown) configured to record, in a lookup table storing the structure of the gradient lifting tree model, the identification of the gradient histogram corresponding to the segmentation point of the current tree node and the identification of the corresponding bucket in that gradient histogram.
As an example, the acquisition unit 201 may be configured to: determine the loss function of each training sample according to the label value and the current predicted value of that training sample, and take the gradient information of the loss function of each training sample as the data to be exchanged of that training sample.
As an example, the acquisition unit 201 may be configured to: take the first-order gradient value and the second-order gradient value of the loss function of each training sample as the data to be exchanged of that training sample.
As an example, the encryption function determination unit 202 may be configured to: determining a first encryption function for encrypting the first-order gradient values corresponding to the training samples based on the first-order gradient values corresponding to the training samples, wherein a key of the first encryption function is determined based on the encryption precision and the first-order gradient values corresponding to the training samples; and determining a second encryption function for encrypting the second-order gradient values corresponding to the training samples based on the second-order gradient values corresponding to the training samples, wherein a key of the second encryption function is determined based on the encryption precision and the second-order gradient values corresponding to the training samples.
As an example, the acquisition unit 201 may be configured to: for each training sample, updating the predicted value of the training sample according to the weight of the leaf node of the current tree where the training sample is located; wherein the weight of each leaf node of the current tree is determined based on gradient information corresponding to all training samples on that leaf node.
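The leaf-weight and predicted-value update can be sketched as follows. The `-G/(H + lam)` weight and the shrinkage factor `lr` follow XGBoost-style conventions and are assumptions here, since the patent states only that the weight is determined from the gradient information of the samples on the leaf:

```python
def leaf_weight(g_sum: float, h_sum: float, lam: float = 1.0) -> float:
    # Weight computed from the aggregated first-/second-order gradients
    # of all training samples falling on this leaf node.
    return -g_sum / (h_sum + lam)

def update_prediction(pred: float, weight: float, lr: float = 0.1) -> float:
    # Predicted value of a training sample after adding the current
    # tree's contribution, scaled by a learning rate.
    return pred + lr * weight
```

For instance, a leaf whose samples contribute gradient sums G = 2.0 and H = 3.0 receives weight -0.5, and each sample on it has its predicted value shifted by lr times that weight.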
As an example, the key may be used to ensure that the sum of at least one piece of data to be exchanged can be recovered from the result of a modulo operation performed, with respect to the key, on the sum of the homomorphic encryption results of the at least one piece of data to be exchanged.
As an example, the encryption function determination unit 202 may be configured to: randomize the key for the data to be exchanged of each training sample, so as to obtain the encryption function corresponding to the data to be exchanged of that training sample.
As an example, the encryption function determination unit 202 may be configured to: the following procedure is executed for the data to be exchanged of each training sample respectively: generating a random number for the data to be exchanged of the training sample; and calculating the product of the random number corresponding to the data to be exchanged of the training sample and the secret key, and determining an encryption function corresponding to the data to be exchanged of the training sample based on the product and the data to be exchanged of the training sample.
As an example, the training apparatus 20 may further include: a key generation unit (not shown) configured to calculate, for each training sample, the product of the data to be exchanged and the encryption precision, perform a rounding operation on each of the obtained products, sum the results of all the rounding operations, and generate the key based on the result of the summation.
As an example, decryption unit 206 may be configured to: the following procedure is performed for each received aggregation result, respectively: determining a first function based on the aggregation result and the key; and determining a decryption function corresponding to the aggregation result based on the first function and the decryption precision corresponding to the aggregation result, and decrypting the aggregation result by using the decryption function corresponding to the aggregation result to obtain a decryption result of the aggregation result, wherein the decryption precision is determined based on the encryption precision.
As an example, decryption unit 206 may be configured to: perform a modulo operation on the aggregation result with respect to the key; when the obtained modulo result falls within a first preset range, take the modulo result as the first function corresponding to the aggregation result; and when the obtained modulo result falls within a second preset range, calculate the difference between the modulo result and the key, and take the difference as the first function corresponding to the aggregation result.
As an example, decryption unit 206 may be configured to: and calculating the ratio of the first function corresponding to the aggregation result to the decryption precision, and taking the ratio as a decryption function corresponding to the aggregation result.
As an example, the encryption precision may be the same as the decryption precision.
Fig. 7 illustrates a block diagram of a prediction apparatus 30 based on a longitudinal federal learning model according to an exemplary embodiment of the present disclosure.
The prediction apparatus 30 includes: an acquisition unit 301, an encryption function determination unit 302, a homomorphic encryption unit 303, a transmission unit 304, a reception unit 305, a decryption unit 306, a model construction unit 307, and a prediction unit 308.
Specifically, the acquisition unit 301 is configured to acquire data to be exchanged for each training sample.
The encryption function determining unit 302 is configured to determine an encryption function for homomorphically encrypting the data to be exchanged of the respective training samples based on the data to be exchanged of the respective training samples.
The encryption function relates to an encryption precision and a key, the key being determined based on the encryption precision and the data to be exchanged for the respective training samples.
The homomorphic encryption unit 303 is configured to obtain homomorphic encryption results corresponding to the training samples by homomorphic encrypting the data to be exchanged of the training samples by using the encryption function.
The sending unit 304 is configured to send homomorphic encryption results corresponding to the respective training samples to at least one participant in the longitudinal federal learning system.
The receiving unit 305 is configured to receive, from the at least one participant, an aggregate result of homomorphic encryption results corresponding to the respective training samples.
The decryption unit 306 is configured to decrypt the received aggregate result using a decryption function corresponding to the encryption function, resulting in a decrypted result.
The model construction unit 307 is configured to construct a gradient-lifting tree model co-trained by the longitudinal federal learning system based on the decryption result.
The prediction unit 308 is configured to predict a sample to be predicted using the gradient-lifting tree model.
As an example, the prediction unit 308 may be configured to: predict the sample to be predicted based on the lookup table.
As an example, for specific implementations of the units 301 to 307, reference may be made to the above-described exemplary embodiments of the units 201 to 207, which will not be repeated here.
It should be understood that the specific implementations of the data protection apparatus 10, the training apparatus 20, and the prediction apparatus 30 according to the exemplary embodiments of the present disclosure may be realized with reference to the related specific implementations described in connection with fig. 1 to 4, and will not be described again here.
The units comprised by the data protection apparatus 10, the training apparatus 20, and the prediction apparatus 30 according to the exemplary embodiments of the present disclosure may each be configured as software, hardware, firmware, or any combination thereof performing specific functions. For example, these units may correspond to application-specific integrated circuits, to pure software code, or to modules combining software with hardware. Furthermore, one or more functions implemented by these units may also be uniformly performed by a component in a physical entity device (e.g., a processor, a client, or a server).
It should be understood that the data protection method according to the exemplary embodiments of the present disclosure may be implemented by a program recorded on a computer-readable medium, for example, according to the exemplary embodiments of the present disclosure, a computer-readable medium may be provided, on which a program for executing the data protection method as described in the above exemplary embodiments is recorded.
It should be understood that the training method of the longitudinal federal learning model or the prediction method based on the longitudinal federal learning model according to the exemplary embodiments of the present disclosure may be implemented by a program recorded on a computer-readable medium. For example, according to the exemplary embodiments of the present disclosure, a computer-readable medium may be provided, on which a program for executing the training method of the longitudinal federal learning model or the prediction method based on the longitudinal federal learning model as described in the above exemplary embodiments is recorded.
The computer program in the above-described computer-readable medium may be run in an environment deployed in a computer device such as a client, a host, a proxy device, or a server. It should be noted that the computer program may also be used to perform additional steps beyond the steps described above, or to perform more specific processing when the above steps are performed; the contents of these additional steps and further processing have been described with reference to fig. 1 to 4 and will not be repeated here.
It should be noted that the data protection apparatus 10, the training apparatus 20, and the prediction apparatus 30 according to the exemplary embodiments of the present disclosure may rely entirely on the execution of a computer program to implement the corresponding functions, i.e., each unit corresponds to a step in the functional architecture of the computer program, so that the entire system is invoked through a dedicated software package (e.g., a lib library) to implement the corresponding functions.
On the other hand, the respective units included in the data protection apparatus 10, the training apparatus 20, the prediction apparatus 30 according to the exemplary embodiments of the present disclosure may also be implemented by hardware, software, firmware, middleware, microcode, or any combination thereof. When implemented in software, firmware, middleware or microcode, the program code or code segments to perform the corresponding operations may be stored in a computer-readable medium, such as a storage medium, so that the processor can perform the corresponding operations by reading and executing the corresponding program code or code segments.
For example, exemplary embodiments of the present disclosure may also be implemented as a computing device including a storage component and a processor, the storage component having stored therein a set of computer-executable instructions that, when executed by the processor, perform a data protection method as described in the above exemplary embodiments.
For example, exemplary embodiments of the present disclosure may also be implemented as a computing device including a storage component and a processor, the storage component having stored therein a set of computer-executable instructions that, when executed by the processor, perform a training method of a longitudinal federal learning model or a prediction method based on a longitudinal federal learning model as described in the above exemplary embodiments.
In particular, the computing device may be deployed in a server or a client, or on a node device in a distributed network environment. Further, the computing device may be a PC, a tablet device, a personal digital assistant, a smart phone, a web application, or any other device capable of executing the above-described set of instructions.
Here, the computing device need not be a single computing device, but may be any device or collection of circuits capable of executing the above-described instructions (or instruction set), alone or in combination. The computing device may also be part of an integrated control system or system manager, or may be configured as a portable electronic device that interfaces locally or remotely (e.g., via wireless transmission).
In the computing device, the processor may include a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), a programmable logic device, a special-purpose processor system, a microcontroller, or a microprocessor. By way of example, and not limitation, the processor may also include an analog processor, a digital processor, a microprocessor, a multi-core processor, a processor array, a network processor, and the like.
Some operations described in the data protection method according to the exemplary embodiment of the present disclosure may be implemented in software, some operations may be implemented in hardware, and furthermore, the operations may be implemented in a combination of software and hardware.
Some of the operations described in the training method of the longitudinal federal learning model or the prediction method based on the longitudinal federal learning model according to the exemplary embodiments of the present disclosure may be implemented in software, some of the operations may be implemented in hardware, and furthermore, the operations may be implemented in a combination of software and hardware.
The processor may execute instructions or code stored in one of the storage components, wherein the storage component may also store data. Instructions and data may also be transmitted and received over a network via a network interface device, which may employ any known transmission protocol.
The memory component may be integrated with the processor, for example, RAM or flash memory disposed within an integrated circuit microprocessor or the like. Further, the storage component may comprise a stand-alone device, such as an external disk drive, a storage array, or any other storage device usable by a database system. The storage component and the processor may be operatively coupled or may communicate with each other, such as through an I/O port, network connection, etc., such that the processor is able to read files stored in the storage component.
In addition, the computing device may also include a video display (such as a liquid crystal display) and a user interaction interface (such as a keyboard, mouse, touch input device, etc.). All components of the computing device may be connected to each other via buses and/or networks.
Operations involved in a data protection method according to exemplary embodiments of the present disclosure may be described as various interconnected or coupled functional blocks or functional diagrams; however, these functional blocks or functional diagrams may equally be integrated into a single logic device or operated according to boundaries that are not exactly as depicted.
Operations involved in a training method of the longitudinal federal learning model or a prediction method based on the longitudinal federal learning model according to exemplary embodiments of the present disclosure may likewise be described as various interconnected or coupled functional blocks or functional diagrams, which may equally be integrated into a single logic device or operated according to boundaries that are not exactly as depicted.
For example, as described above, a computing device according to an exemplary embodiment of the present disclosure may include a storage unit and a processor, wherein the storage unit stores a set of computer-executable instructions that, when executed by the processor, perform a data protection method as described in the above exemplary embodiment.
For example, as described above, a computing device according to an exemplary embodiment of the present disclosure may include a storage unit and a processor, wherein the storage unit stores a set of computer-executable instructions that, when executed by the processor, perform a training method of a longitudinal federal learning model or a prediction method based on the longitudinal federal learning model as described in the above exemplary embodiment.
The foregoing description of exemplary embodiments of the present disclosure is to be understood as illustrative only and not exhaustive; the present disclosure is not limited to the exemplary embodiments disclosed. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the disclosure. Accordingly, the scope of the present disclosure should be determined by the scope of the claims.
Claims (10)
1. A data protection method, wherein the data protection method is applied to a longitudinal federal learning system, the data protection method comprising:
Acquiring data to be exchanged of each training sample;
Determining an encryption function for homomorphic encryption of the data to be exchanged of each training sample based on the data to be exchanged of each training sample, wherein the encryption function relates to encryption precision and a key, and the key is determined based on the encryption precision and the data to be exchanged of each training sample;
Homomorphic encryption is carried out on data to be exchanged of each training sample by using the encryption function, homomorphic encryption results corresponding to each training sample are obtained, and the homomorphic encryption results corresponding to each training sample are sent to at least one participant in the longitudinal federal learning system;
And receiving an aggregation result of homomorphic encryption results corresponding to the training samples from the at least one participant, and decrypting the received aggregation result by using a decryption function corresponding to the encryption function to obtain a decryption result.
2. The data protection method according to claim 1, wherein the key is used to ensure that the sum of at least one piece of data to be exchanged can be recovered from the result of a modulo operation performed, with respect to the key, on the sum of the homomorphic encryption results of the at least one piece of data to be exchanged.
3. The data protection method according to claim 1, wherein the step of determining an encryption function for homomorphically encrypting the data to be exchanged of the respective training samples based on the data to be exchanged of the respective training samples includes:
Randomizing the key for the data to be exchanged of each training sample to obtain the encryption function corresponding to the data to be exchanged of each training sample.
4. The data protection method according to claim 3, wherein the step of randomizing the key for the data to be exchanged of each training sample to obtain the encryption function corresponding to the data to be exchanged of each training sample comprises:
The following procedure is executed for the data to be exchanged of each training sample respectively:
Generating a random number for the data to be exchanged of the training sample;
and calculating the product of the random number corresponding to the data to be exchanged of the training sample and the secret key, and determining an encryption function corresponding to the data to be exchanged of the training sample based on the product and the data to be exchanged of the training sample.
5. The data protection method of any one of claims 1 to 4, further comprising: a key generation process;
The key generation process comprises the following steps:
Calculating the product of the data to be exchanged of each training sample and the encryption precision, performing rounding operation on each obtained product, and summing the results obtained by all rounding operations;
the key is generated based on the result of the summation operation.
6. The data protection method according to any one of claims 1 to 4, wherein the step of decrypting the received aggregation result using a decryption function corresponding to the encryption function to obtain a decryption result includes:
The following procedure is performed for each received aggregation result, respectively:
Determining a first function based on the aggregation result and the key;
determining a decryption function corresponding to the aggregation result based on a first function and decryption precision corresponding to the aggregation result, decrypting the aggregation result by using the decryption function corresponding to the aggregation result to obtain a decryption result of the aggregation result,
Wherein the decryption accuracy is determined based on the encryption accuracy.
7. A prediction method based on a longitudinal federal learning model, the prediction method comprising:
Acquiring data to be exchanged of each training sample;
Determining an encryption function for homomorphic encryption of the data to be exchanged of each training sample based on the data to be exchanged of each training sample, wherein the encryption function relates to encryption precision and a key, and the key is determined based on the encryption precision and the data to be exchanged of each training sample;
Homomorphic encryption is carried out on data to be exchanged of each training sample by using the encryption function, homomorphic encryption results corresponding to each training sample are obtained, and the homomorphic encryption results corresponding to each training sample are sent to at least one participant in a longitudinal federal learning system;
Receiving an aggregation result of homomorphic encryption results corresponding to each training sample from the at least one participant, and decrypting the received aggregation result by using a decryption function corresponding to the encryption function to obtain a decryption result;
Constructing a gradient lifting tree model cooperatively trained by the longitudinal federal learning system based on the decryption result;
And predicting the sample to be predicted by using the gradient lifting tree model.
8. A data protection device for use in a longitudinal federal learning system, the data protection device comprising:
an acquisition unit configured to acquire data to be exchanged of each training sample;
An encryption function determining unit configured to determine an encryption function for homomorphic encryption of data to be exchanged of the respective training samples based on the data to be exchanged of the respective training samples, wherein the encryption function relates to an encryption precision and a key determined based on the encryption precision and the data to be exchanged of the respective training samples;
The homomorphic encryption unit is configured to obtain homomorphic encryption results corresponding to the training samples by homomorphic encrypting the data to be exchanged of the training samples by using the encryption function;
A transmitting unit configured to transmit homomorphic encryption results corresponding to the respective training samples to at least one participant in the longitudinal federal learning system;
a receiving unit configured to receive, from the at least one participant, an aggregate result of homomorphic encryption results corresponding to the respective training samples;
And the decryption unit is configured to decrypt the received aggregation result by using a decryption function corresponding to the encryption function to obtain a decryption result.
9. A system comprising at least one computing device and at least one storage device storing instructions which, when executed by the at least one computing device, cause the at least one computing device to perform the data protection method of any one of claims 1 to 6 or the prediction method based on a vertical federated learning model of claim 7.
10. A computer-readable storage medium storing instructions which, when executed by at least one computing device, cause the at least one computing device to perform the data protection method of any one of claims 1 to 6 or the prediction method based on a vertical federated learning model of claim 7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211527282.1A CN118118164A (en) | 2022-11-30 | 2022-11-30 | Data protection method, prediction method and device based on longitudinal federal learning model |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211527282.1A CN118118164A (en) | 2022-11-30 | 2022-11-30 | Data protection method, prediction method and device based on longitudinal federal learning model |
Publications (1)
Publication Number | Publication Date |
---|---|
CN118118164A (en) | 2024-05-31 |
Family
ID=91211191
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202211527282.1A Pending CN118118164A (en) | 2022-11-30 | 2022-11-30 | Data protection method, prediction method and device based on longitudinal federal learning model |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN118118164A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN118573471A (en) * | 2024-07-30 | 2024-08-30 | Ant Technology Group Co., Ltd. | Method, apparatus, storage medium, device and program product for joint prediction |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US11451370B2 | | Secure probabilistic analytics using an encrypted analytics matrix |
US9900147B2 | | Homomorphic encryption with optimized homomorphic operations |
US10608811B2 | | Private set intersection encryption techniques |
CN115485700A | | Method and system for federated learning |
CN110214325B | | Method and system for data shielding |
CN111191255B | | Information encryption processing method, server, terminal, device and storage medium |
CN112073444B | | Data set processing method and device and server |
CN109687952A | | Data processing method and its device, electronic device and storage medium |
KR102308066B1 | | Private retrieval using low-linear public-key operations |
CN114329644B | | Method, device and storage medium for carrying out encryption simulation on logic system design |
CN116488814A | | FPGA-based data encryption secure computing method |
EP3975471B1 | | Homomorphic encryption device and operating method thereof |
KR20210152030A | | Compression and indeterminate expansion of RLWE ciphertexts |
CN112149706A | | Model training method, device, equipment and medium |
CN118118164A | | Data protection method, prediction method and device based on longitudinal federal learning model |
CN111931204A | | Encryption and de-duplication storage method and terminal equipment for distributed system |
EP4072062A1 | | Apparatus for processing non-polynomial operation on homomorphic encrypted messages and methods thereof |
US10078492B2 | | Generating pseudo-random numbers using cellular automata |
CN1938740A | | Computer system, computer program, and addition method |
US12169572B2 | | Method and apparatus for managing LWE instance |
US12206757B2 | | Efficient random masking of values while maintaining their sign under fully homomorphic encryption (FHE) |
CN117786751A | | Symmetric searchable encryption method, device, equipment and medium |
CN113452783B | | Digital PaaS open platform system of blockchain cloud architecture and implementation method |
CN117171202A | | Data query method and device |
Gbashi et al. | | Privacy Security System for Video Data Transmission in Edge-Fog-Cloud Environment |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||