CN116822620B

CN116822620B - Method and device for multi-party joint training model

Info

Publication number: CN116822620B
Application number: CN202310790640.6A
Authority: CN
Inventors: 郑龙飞; 王磊
Original assignee: Alipay Hangzhou Information Technology Co Ltd
Current assignee: Alipay Hangzhou Digital Service Technology Co ltd
Priority date: 2023-06-29
Filing date: 2023-06-29
Publication date: 2025-08-08
Anticipated expiration: 2043-06-29
Also published as: CN116822620A

Abstract

The embodiment of the specification provides a method and a device for jointly updating a model, which are suitable for a longitudinal federal learning architecture and are used for carrying out proper data interaction based on the technical conception of disorder processing of characteristic members and hiding processing of tag data and predicted data by tag members. Specifically, the fusion tensor of the intermediate results of each feature member is disordered in the sample dimension by a single feature member, and the label member predicts using the disordered fusion tensor. The label member conceals the disordered fusion tensor and the label data in the normal sequence in the disordered data and provides the disordered fusion tensor and the label data in the normal sequence for the single feature member, and the gradient data of the prediction result under the disordered condition is fed back by the label member, so that the label member updates the global model, and gradient information of the fusion data under the disordered condition is determined through gradient reverse transmission and is transmitted to the feature member, and the local model is updated by the feature member. Therefore, the label plaintext is transmitted, the malicious marking of data by label members is avoided, and the data privacy is more effectively protected.

Description

Method and device for multi-party joint training model

Technical Field

One or more embodiments of the present disclosure relate to the field of secure computing technology, and in particular, to a method and apparatus for multi-party joint training models.

Background

With the rapid development of deep learning, artificial intelligence technology is exhibiting its advantages in almost every industry. However, big data driven artificial intelligence presents many difficulties in real world situations. For example, the data islanding phenomenon is serious, the utilization rate is low, and the cost is always high. Single training members of some industries may also have problems with limited data or poor data quality. In addition, due to industry competition, privacy security and complex management procedures, even data integration between different departments of the same company may face huge resistance, and data integration cost is high.

Federal learning is proposed in such a context. Federal learning is a framework based on distributed machine learning, the main idea being to build a machine learning model based on data sets distributed over multiple devices while preventing data leakage. Under this framework, clients (e.g., mobile devices) cooperatively train the model under the coordination of the server, while training data can remain local to the client without uploading the data to a data center as in traditional machine learning methods. In the federal learning process, in order to ensure data privacy, a data confidentiality mode is generally required to be introduced, and reasonable data processing and communication are performed based on the corresponding data confidentiality mode, so that how to process and interact data among the training members, and balance between protecting the data privacy and reducing the communication traffic is an important problem in the federal learning process.

Disclosure of Invention

One or more embodiments of the present specification describe a method, apparatus, and system for multi-party joint training models to address one or more of the problems mentioned in the background.

According to a first aspect, a method for multipartite joint training models is provided, the method is suitable for a scene of longitudinal federal learning by a plurality of training members by using respective local privacy data, the plurality of training members comprise a first member holding tag data and at least one feature member, the models comprise local models corresponding to the feature members respectively, and global models corresponding to the first member, the at least one feature member comprises a second member, the method is executed by the second member, in a current updating period of the models, samples of a current batch are subjected to fusion to obtain fusion tensor ciphertext, wherein a single intermediate result ciphertext is obtained by encrypting corresponding intermediate results by a first key, the single intermediate result ciphertext is obtained by processing local feature data by the local model, the first key is provided by the first member, the fusion tensor ciphertext is obtained by conducting disorder fusion on sample dimensions by the first member through a disorder rule f, the fusion tensor ciphertext is provided for the first member, the first member is used for predicting the first member to obtain a disturbance tensor in a disorder dimension by using a second pair corresponding to the first disorder key, the fusion tensor is obtained by conducting a disturbance label prediction on the first disturbance label in a disorder dimension, the disturbance label is obtained by the fusion tensor, and the disturbance label is obtained by the fusion tensor is respectively based on the fusion tensor, in a disorder label prediction in a disorder dimension, and the disturbance label is obtained by the fusion label prediction in a disorder dimension, the method comprises the steps of predicting a label data, determining a global disorder gradient corresponding to the disorder predicted tensor according to a disorder rule f, comparing the disorder predicted tensor with the disorder predicted tensor, executing an inadvertent transmission protocol with a first member, safely selecting a first sub-disorder gradient at the position k from the global disorder gradient by the first member, enabling the first member to update the global model by utilizing reverse transmission of the first sub-disorder gradient, feeding back a second sub-disorder gradient corresponding to the disorder fusion ciphertext, and updating a local second local model by utilizing a second sub-gradient obtained by sequentially recovering the second sub-disorder gradient in a sample dimension according to the disorder rule f.

In one embodiment, the at least one feature member includes a third member corresponding to a third local model, and the method further includes determining a third sub-gradient of an intermediate tensor corresponding to the third member based on the second sub-gradient and a fusion of the respective intermediate result ciphertext, and providing the third sub-gradient to the third member for the third member to update the third local model with the third sub-gradient.

In one embodiment, the fusion mode for fusing the intermediate result ciphertext corresponding to each feature member comprises one of addition and weighted average.

In one embodiment, the determining the global disorder gradient corresponding to the disorder disturbance prediction tensor based on the disorder rule f, the disturbance label tensor and the disorder disturbance prediction tensor in a comparison mode comprises the steps of carrying out disorder on the disturbance label tensor based on the disorder rule f to obtain a disorder disturbance label tensor, comparing the disorder disturbance label tensor with the disturbance label tensor to obtain model loss, and determining the global disorder gradient according to the model loss and aiming at partial derivatives of each element in the disorder disturbance prediction tensor.

In one embodiment, the order restoration of the second sub-disordered gradient in the sample dimension based on the disordered rule f comprises determining an inverse disordered rule f ^-1 of the disordered rule f and sequentially adjusting the second sub-disordered gradient in the sample dimension by using the inverse disordered rule f ^-1.

In one embodiment, the performing an inadvertent transmission protocol with the first member, the safely selecting by the first member a first sub-out-of-order gradient at position k from the global out-of-order gradient includes sequentially partitioning the global out-of-order gradient into P messages according to the dimension occupied by the predicted tensor in the predicted dimension, and performing an inadvertent transmission protocol with the first member, the selecting by the first member a kth message from the P messages.

According to a second aspect, a method for multi-party joint training models is provided, the method is suitable for a scenario that a plurality of training members utilize respective local privacy data to perform longitudinal federal learning, the plurality of training members comprise first members with tag data and at least one feature member, the models comprise respective local models corresponding to the respective feature members, and global models corresponding to the first members, the at least one feature member comprises second members, the method is executed by the first members, samples of a current batch are subjected to a current update period of the models, the method comprises the steps of obtaining disorder fusion ciphertext from the second members, wherein the disorder fusion ciphertext is obtained by a disorder rule f after fusion of respective intermediate result ciphertext corresponding to the respective feature members through the second members, the single intermediate result ciphertext is obtained by encrypting the respective intermediate result of the local feature data through the local model, the first key is provided by the first member, the disorder fusion ciphertext is decrypted into a disorder fusion tensor by the second key, the second member is used for a current update period of the models, the first member is used for a current batch of samples, the first member is used for a disorder tensor, the disorder tensor is predicted in a disorder label, the disorder label is provided in a position of the first member, the disorder tensor is predicted by the disorder label, and the disorder label is provided for the first member is predicted in a disorder label, and the disorder label is provided for the first tensor, and a disorder label is predicted in a corresponding position, and a disorder label is obtained by the disorder label, the method comprises the steps of predicting tensors by disorder disturbance, feeding back global disorder gradients aiming at the disturbance label tensors by the disturbance label tensors, executing an inadvertent transmission protocol with a second member, safely selecting a first sub disorder gradient at a position k from the global disorder gradients, updating the global model by reverse transmission of the first sub disorder gradient, determining a second sub disorder gradient corresponding to the disorder fusion ciphertext, and providing the second sub disorder gradient for the second member to update a local second local model by the second member based on the disorder rule f and the second sub disorder gradient.

In one embodiment, the first member further corresponds to first feature data and a first local model, and the method further comprises sequentially processing the first feature data of a first batch of samples by using the first local model and an updated global model to obtain a first prediction tensor, determining local model loss based on comparison of first label data corresponding to the first batch of samples and the first prediction tensor, and adjusting undetermined parameters in the first local model towards the direction of reducing the local model loss so as to update the first local model.

In a further embodiment, the first batch sample is consistent with the current batch sample, the intermediate result obtained by processing the first feature data of the current batch by the first local model is a first intermediate result, and the processing the first feature data of the first batch sample sequentially by using the first local model and the updated global model obtains a first prediction tensor by processing the first intermediate result by using the updated global model.

In one embodiment, the second member fuses each intermediate result ciphertext corresponding to each feature member separately in one of a summation and a weighted average.

In one embodiment, the out-of-order disturbance predicted tensor/disturbance label tensor is expanded by randomly generating disturbance data consistent with the P-1 predicted tensor/label data volumes within a range of values corresponding to the predicted values/label values, and arranging the disturbance data and the predicted tensor/label data along a predicted dimension/label dimension to form the out-of-order disturbance predicted tensor/disturbance label tensor, wherein the disturbance data and the predicted tensor/label data are arranged at a position k.

According to a third aspect, a method for multi-party joint training model is provided, and the method is applicable to a scene of longitudinal federal learning by a first member with tag data and a second member with characteristic members, wherein the model comprises a local model corresponding to the second member and a global model corresponding to the first member; the method comprises the steps of processing local characteristic data by a second member to obtain an intermediate result by using the local model in a current updating period of the model, obtaining an disordered intermediate result by using the intermediate result in a sample dimension by using a disordered rule f, providing the disordered intermediate result to a first member, obtaining a disordered predicted tensor by processing the disordered intermediate result by the first member on the basis of the global model, expanding label data in a label dimension by the same multiple of the disordered predicted tensor by the first member to obtain a disordered disturbance predicted tensor and a disturbed label tensor, providing the disordered predicted tensor to the second member, wherein the position k of the disordered predicted tensor in the predicted dimension is consistent with the position k of the label data in the label dimension by using the local model, determining a global disordered gradient corresponding to the disordered predicted tensor by using the disordered disturbance gradient by using a comparison rule f, performing an inadvertent transmission protocol by the first member on the disordered intermediate result, recovering the sample by using the first member from the global gradient by using the first member, recovering the disordered gradient by using the second gradient by using the first member, updating the local model.

According to a fourth aspect, a device for multi-party joint training models is provided, and is suitable for a scenario that a plurality of training members perform longitudinal federal learning by using respective local privacy data, wherein the plurality of training members comprise a first member with tag data and at least one feature member, the models comprise local models respectively corresponding to the feature members and global models corresponding to the first member, the at least one feature member comprises a second member, the device is arranged on the second member and comprises a fusion unit, an out-of-order unit, an acquisition unit, a gradient determination unit, a safety transfer unit and an update unit, and in the current update period of the models, the device aims at samples of a current batch:

the fusion unit is configured to fuse each intermediate result ciphertext corresponding to each characteristic member respectively to obtain a fusion tensor ciphertext, wherein a single intermediate result ciphertext is obtained by encrypting a corresponding intermediate result by a first key, the single intermediate result is a processing result of the single characteristic member processing local characteristic data by a local model, and the first key is provided by the first member;

The disorder unit is configured to obtain a disorder fusion ciphertext through disorder of the fusion tensor ciphertext in a sample dimension through a disorder rule f, and provide the disorder fusion ciphertext to a first member so that the first member can decrypt the disorder fusion ciphertext into a disorder fusion tensor by utilizing a second key corresponding to the first key, and then obtain a disorder prediction tensor based on the processing of the disorder fusion tensor by the global model;

The acquisition unit is configured to acquire an out-of-order disturbance prediction tensor and a disturbance label tensor from a first member, wherein the out-of-order disturbance prediction tensor and the disturbance label tensor are obtained by expanding the out-of-order prediction tensor in a prediction dimension and the label data in a label dimension respectively, and the position k of the out-of-order prediction tensor in the prediction dimension is corresponding to the position k of the label data in the label dimension;

The gradient determining unit is configured to comparatively determine a global disorder gradient corresponding to the disorder disturbance prediction tensor based on the disorder rule f, the disturbance label tensor and the disorder disturbance prediction tensor;

The safety transfer unit is configured to execute an inadvertent transmission protocol with a first member, and the first member safely selects a first sub-disordered gradient at a position k from the global disordered gradients so as to update the global model by using reverse transfer of the first sub-disordered gradient and feed back a second sub-disordered gradient corresponding to the disordered fusion ciphertext;

The updating unit is configured to update the local second local model by using a second sub-gradient obtained by sequentially recovering the second sub-disordered gradient in the sample dimension based on the disordered rule f.

According to a fifth aspect, a device for multi-party joint training models is provided, and is suitable for a scenario that a plurality of training members perform longitudinal federal learning by using respective local privacy data, wherein the plurality of training members comprise a first member with tag data and at least one feature member, the models comprise local models respectively corresponding to the feature members and global models corresponding to the first member, the at least one feature member comprises a second member, the device is arranged on the first member and comprises a communication unit, a decryption unit, a prediction unit, an expansion unit, a security acquisition unit and a gradient transfer unit, and in a current update period of the models, for samples of a current batch:

The communication unit is configured to acquire an out-of-order fusion ciphertext from a second member, wherein the out-of-order fusion ciphertext is obtained by an out-of-order rule f after fusing each intermediate result ciphertext corresponding to each characteristic member respectively by the second member, a single intermediate result is a processing result of the single characteristic member processing local characteristic data by a local model, the single intermediate result ciphertext is obtained by encrypting a corresponding intermediate result by a first key, and the first key is provided by the first member;

The decryption unit is configured to decrypt the out-of-order fusion ciphertext into an out-of-order fusion tensor by using a second key, wherein the second key is used for decrypting the data encrypted by the first key;

the prediction unit is configured to obtain an out-of-order prediction tensor based on the processing of the out-of-order fusion tensor by the global model;

The expansion unit is configured to respectively expand the disorder prediction tensor in a prediction dimension and the label data in a label dimension to obtain a disorder disturbance prediction tensor and a disturbance label tensor to provide the disorder disturbance predicted tensor and the disturbance label tensor to a second member, wherein the position k of the disorder prediction tensor in the disorder disturbance predicted tensor is corresponding to the position k of the label data in the disturbance label tensor, so that the second member feeds back a global disorder gradient aiming at the disorder disturbance label tensor based on a disorder rule f, the disorder disturbance prediction tensor and the disturbance label tensor;

The secure acquisition unit is configured to perform an inadvertent transmission protocol with a second member, securely selecting a first sub-out-of-order gradient at position k from the global out-of-order gradients;

The gradient transfer unit is configured to update the global model and determine a second sub-disordered gradient corresponding to the disordered fusion ciphertext by using reverse transfer of the first sub-disordered gradient;

The communication unit is further configured to provide the second sub-unordered gradient to a second member for the second member to update a local second local model based on the unordered rule f and the second sub-unordered gradient.

According to a sixth aspect, there is provided a computer readable storage medium having stored thereon a computer program which, when executed in a computer, causes the computer to perform the method of the first or second aspect.

According to a seventh aspect, there is provided a computing device comprising a memory and a processor, wherein the memory has executable code stored therein, the processor, when executing the executable code, implementing the method of the first or second aspect.

According to the method and the device provided by the embodiment of the specification, under the longitudinal federal learning architecture, a single feature member provides the fusion tensor of the intermediate results of each feature member to a label member after the sample dimension is out of order. And the label member predicts under the disordered fusion tensor to obtain the disordered prediction tensor. And then, the label member conceals the disorder prediction tensor and the label tensor in the disorder data at the corresponding consistent positions and provides the disorder data to the characteristic member for disorder. And comparing model loss, namely gradient data under the condition of disorder by the out-of-order characteristic members, and safely transmitting the out-of-order gradient data to the first member through the OT protocol. The first member updates the global model using the out-of-order gradient data and determines out-of-order fusion tensor gradient data via reverse transfer of gradients to transfer to the feature member undergoing out-of-order. Thus, the feature members can obtain gradient data of each intermediate result through inverse transfer of the fusion tensor according to the gradient data of the fusion mode, and further, the local model can be updated, and gradient information of the intermediate result is transferred to other feature members except the first member.

Therefore, based on disorder processing of the feature members and hiding processing of the tag members on tag data and predicted data, tag plaintext can be transmitted through proper data interaction, data privacy disclosure caused by malicious marking of the data by the tag members is avoided, and data traffic is reduced while the data privacy is protected.

Drawings

In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings required for the description of the embodiments will be briefly described below, it being obvious that the drawings in the following description are only some embodiments of the present invention, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.

FIG. 1 is a schematic diagram of a system architecture for training members in a vertical federal learning architecture;

FIG. 2 is a schematic diagram of a model architecture for training members in a vertical federal learning architecture;

FIG. 3 illustrates an interactive flow diagram of a single model update cycle of a multi-party joint training model according to one embodiment of the present description;

FIG. 4 illustrates a flow diagram of a multi-party joint training model performed by tag members according to one embodiment of the present disclosure;

FIG. 5 illustrates a flow diagram of a multi-party joint training model performed by out-of-order feature members of one embodiment of the present description;

FIG. 6 illustrates a schematic block diagram of an apparatus for a multi-party joint training model provided to tag members in accordance with one embodiment of the present disclosure;

FIG. 7 illustrates a schematic block diagram of an apparatus of a multi-party joint training model performed by out-of-order feature members in accordance with one embodiment of the present specification.

Detailed Description

The following describes the scheme provided in the present specification with reference to the drawings.

Federal learning (FEDERATED LEARNING), which may also be referred to as federal machine learning, joint learning, federation learning, or the like. Federal machine learning is a machine learning framework that can effectively help multiple institutions perform data usage and machine learning modeling while meeting the requirements of user privacy protection, data security and regulations.

Specifically, assuming enterprise A, enterprise B each builds a task model, a single task may be classified or predicted, and these tasks have also been approved by the respective users when obtaining the data. However, because the data is incomplete, such as enterprise a lacks label data, enterprise B lacks user feature data, or the data is insufficient, the sample size is insufficient to build a good model, and the model at each end may not be built or may not be ideal. The problem to be solved by federal learning is how to build a high-quality model at each end of a and B, the training of the model uses the data of each enterprise of a and B, and the own data of each enterprise is not known by other parties, i.e. a common model is built under the condition of not violating the data privacy regulations. This common model is as if the parties were a better model to aggregate the data together. Thus, the built model serves only the own targets in the area of each party.

Various institutions of federal learning may be referred to as training members (also referred to as parties to data, etc.). Each training member can hold different business data respectively, and can participate in the joint training of the model through equipment, a computer, a server and the like. The service data may be various data such as characters, pictures, voice, animation, video, and the like. Typically, the service data held by each training member has a correlation, and the service party corresponding to each training member may also have a correlation. For example, among a plurality of business parties involved in financial business, the business party 1 is a bank, provides business such as deposit and loan for a user, and can hold data such as balance, loan amount, deposit amount, etc. of the user, and the business party 2 is a shopping website, and holds data such as shopping habits, payment accounts, etc. of the user. As another example, among the plurality of business parties involved in the medical business, each business party may be each hospital, physical examination institution, etc., for example, business party 1 is hospital a, diagnosis results, treatment plan, treatment results, etc., diagnosis records corresponding to user symptoms, diagnosis results, etc., are taken as local business data, and business party 2 may be physical examination institution B, physical examination record data corresponding to user symptoms, physical examination conclusions, etc., etc. A single training member of federal learning may hold business data for one business party or may hold business data for multiple business parties.

Under the federal learning architecture, the model may be trained jointly by two or more data parties. The model herein may be various models for processing the business data to obtain corresponding business processing results, and may also be referred to as a business model. For example, the business data may be data related to finance of the user, the obtained business processing result is a financial credit evaluation result of the user, the business data may be customer service dialogue data of the user, the obtained business processing result is a recommendation result of a customer service answer, and the like. Each training member can respectively utilize the local business model to carry out local data processing on the local business data. The goal of federal learning is to train a model that can better handle these business data, so the federal learning model can also be referred to as a business model.

Federal learning is classified into horizontal federal learning and vertical federal learning. Under the transverse federal learning architecture, the feature coincidence of samples in the sample sets of different training members is high, but the sample sources are different. For example, multiple sample sets correspond to customers of different banks. The data features managed by different banks may be similar, but the customers may be different, so that in the case that each training member holds business data of different banks, the model can be trained by adopting a transverse federal learning mode. Different data sets in longitudinal federal learning have higher id registration (e.g., consistent records such as telephone numbers) but are not characterized as identical. For example, a bank and a hospital facing a user group (such as a resident in a county, a small county, etc.), where a large number of people overlap in a sample of the bank and the hospital, but the characteristics are different, the bank data may correspond to the characteristic information such as deposit, loan, etc., and the hospital data may correspond to the characteristic information such as physiological index, health condition, visit record, etc. The training members respectively hold the business data of banks and hospitals can adopt a longitudinal federal learning mode to jointly train the model in multiple ways.

The technical scheme provided by the specification aims at improving a model training method under a longitudinal federal learning architecture.

Longitudinal federal learning may also be referred to as vertical segmentation learning, split learning, and the like. FIG. 1 illustrates one implementation architecture of longitudinal federal learning. Under this implementation architecture, the feature data of the training samples are distributed among some (usually most) or all training members, and the tag data is usually held by a few (e.g., one) training member. For convenience of description, the present specification may refer to a training member holding feature data as a feature member (such as training member 1, training member 2 in figure 1. The training member holding the tag data is referred to as a tag member (e.g., training member X in fig. 1). Specifically, each feature member may hold part of the feature data of the training sample, such as the first feature data, the second feature data. For a training sample, it may have first feature data corresponding to training member 1, second feature data corresponding to training member 2. The training member X corresponds to the tag data. In particular, training member X may be any one of training member 1, training member 2. That is, in some examples, training member X may hold both partial feature data and tag data.

Further, fig. 2 shows a model architecture diagram of longitudinal federal learning. As shown in fig. 2, for a vertical federal learning scenario, the business model is generally divided into two parts, one part is a local model held by each feature member and used for processing local feature data to obtain an intermediate result, and the other part is a global model held by a tag member and used for processing the intermediate result of each training member to obtain a final global output result (prediction result). Each feature member can provide the intermediate result data obtained after the local model processes the local partial feature data to the tag member, and the tag member can process each intermediate result by using the global model, so as to obtain a global prediction result.

On the other hand, the dashed arrows of fig. 2 also show the back propagation path of the parameter gradients in the model. After the label member obtains the global prediction result (namely global output), the global prediction result can be compared with local label data, so that the current model loss is obtained. The tag members then determine the gradient of each pending parameter in the global model (the partial derivative of the model loss to the pending parameter) in terms of the model loss to update the global model. And, a global model may be utilized to determine gradients of the respective intermediate results. And the tag members feed back the gradients of the intermediate results to the corresponding feature members respectively, and the feature members determine the gradients of the undetermined parameters in the corresponding local model by utilizing the counter-propagation of the gradients of the intermediate results so as to update the undetermined parameters in the local model.

In the case where a training member holds both tag data and partial feature data, the training member can hold both the global model and the local model that processes the local feature data. At this time, the part where the feature data is processed by the local model may be a feature member part, and the part where the global processing is performed by the comparison of the prediction result of the global model and the tag data may be a tag member part. That is, the training member is both a feature member and a tab member. And, the intermediate data obtained by the feature member part can be directly used by the tag member part without communication.

Because of forward intermediate result and backward gradient transmission between the characteristic member and the tag member under the longitudinal federal learning architecture, the transmission data volume is in direct proportion to the sample volume, and the larger the data volume is, the larger the communication transmission volume is. In addition, the calculation in the dense space needs to be carried out a large amount of encryption and decryption calculation, the training speed is low, and the calculation force requirement is high. In addition, in some embodiments, in order to reduce the data traffic, the data may be confused in an out-of-order manner, however, since the plaintext cannot be checked when the secret state calculation is performed, if the training member uses means such as marking the data (e.g. the relationship between the training member with the tag and the sample to mark the tag) to maliciously obtain the private data of the other party, the confusion map (e.g. the out-of-order map) may be leaked because the private data cannot be guaranteed.

In view of this, the present specification provides a new multi-party joint training model technical concept, which is applicable to multi-party security computing architecture for longitudinal federal learning.

Under the technical conception provided by the specification, a feature data holder (a non-label holder, such as training member a) fuses the intermediate result data processed by the features of each party and in disorder of sample dimension, the label holder processes the fused data after disorder to obtain disorder global output, then the training member a is provided with the label data and the global output by hiding the label data and the global output in the corresponding disorder data respectively, and the training member a maintains the sequence consistency of the global output and the label data (such as disorder data) based on disorder rules, so that model loss and disorder gradient of the model loss to the disorder global output are determined. Further, the tag holder can update the global model by out-of-order gradients and determine the gradient (partial derivative) of the model loss to out-of-order fusion data via reverse transfer of the gradient. In this way, the training member a obtains the gradient of the disordered fusion data, and the gradient of the disordered fusion data can be obtained according to the gradient of the disordered fusion data through the disordered reverse operation, so that on one hand, the training member a updates the local model through the reverse transfer of the gradient of the disordered fusion data, and on the other hand, the training member a can provide the gradient data of the corresponding intermediate result to other feature holders (excluding the label holder) according to the gradient of the disordered fusion data, and the other feature holders update the local model respectively.

In particular, in the case where the tag holder is also part of the feature holder at the same time, the tag holder can process the local training samples via a local model, an updated global model (information carrying other party data), determine model loss locally, and consider the global model as a whole (temporarily considering that it is not subject to pending parameters). And determining the gradient of each undetermined parameter in the local model through the gradient reverse transfer of the model loss, and updating each undetermined parameter in the local model so as to update the local model.

Because the tag holder conceals real tag data in disturbance data, multiple data transmission is not needed in the multiparty security calculation process, communication efficiency is improved, tag data received by disordered training members are all clear text, whether the data are marked or not is convenient to check, and the tag members are prevented from being marked maliciously (aiming at acquiring privacy of other members).

It should be clear that reference to a model update in this specification generally refers to an update of the value of a parameter to be determined in the model.

The technical idea of the present specification is described in detail below with reference to fig. 3.

Referring to FIG. 3, an interaction flow of a joint update model in one embodiment is presented. Prior to the description in connection with FIG. 3, the multiparty security computation is described as follows, in which the parties to the joint update model include at least a first member, which is a member holding tag data (denoted as Y _A), and a second member, which is a member holding part of the feature data (denoted as X _B). And, the first member corresponds to the global model M _L, the second member holds the local model M _B, and the second member is used as an out-of-order party, and an out-of-order rule f can be determined at any suitable time.

The above is a basic architecture used by the technical idea of the present specification. In view of the fact that there may be some other situations that may occur, in fig. 3, it is also illustrated that the first member contains the feature data X _A, the local model M _A, and that there are other feature members, such as the third member (corresponding feature data X _C, local model M _C). Where the federal learning architecture also includes other feature members, the first member may also correspond to a public-private key pair (P _k,S_k) of the first key P _k, the second key S _k. The second member may obtain one of the public and private keys from the first member at any suitable time, such as first key P _k (first key P _k may be considered as a public key), and the other feature members may obtain first key P _k from the first member or the second member at any suitable time.

The above possible scenarios and related steps are illustrated in fig. 3 for a comprehensive description of the interactive flow that the architecture of the present specification may implement, and the resulting effects. It should be noted that, based on the above most basic implementation architecture, in the case of only the first member holding the tag data and the second member holding the feature data, in a single model update period, the corresponding technical effects are achieved by at least the related operations described in steps 312, 330, 304 and 306 to 3130, which are shown by solid lines in fig. 3. In other cases, steps shown by corresponding other broken lines are appended. Accordingly, the following description of FIG. 3 is based on the interaction flow illustrated by steps 312, 330, 304, and 306-3130. In addition, the feature data and the label data described in the following steps may be aligned by each training member in advance through a privacy transaction or a trusted third party assisted sorting method (i.e. the samples corresponding to the sample dimension and the arrangement sequence are consistent), and in one training period, one batch of samples, for example, n samples may be processed.

First, in step 312, the second member processes the feature data X _B with the local model M _B to obtain an intermediate result H _B.

The local model M _B may be any machine learning model capable of processing feature data, such as a multi-layer full neural network as shown in fig. 2. Typically, the feature data for a sample is processed into a multidimensional (e.g., m-dimensional) vector, and then the processing result of n pieces of sample data may be expressed as a tensor, e.g., n×m-dimensional tensor, where the processing result may be an intermediate result. It will be appreciated that in the case where the processing result for n pieces of sample data is an n×m-dimensional tensor, each row in the tensor corresponds to one sample, the direction along the tensor row (longitudinal direction) may be referred to as a sample dimension. Similarly, if the intermediate result is described by an mxn-dimensional tensor, the direction (transverse direction) of the tensor column may be referred to as a sample dimension, which is not limited in this specification.

It can be appreciated that in the case of including other feature members (e.g., the first member, the third member, etc.), each feature member may process local feature data through a local model, respectively, to obtain corresponding intermediate results. At this time, since the disorder rule f is held by the second member under the technical idea of the present specification, each feature member can provide the local intermediate result to the second member for fusion. In order not to reveal the data information of each feature member, each feature member may encrypt the local intermediate result, thereby providing an intermediate result ciphertext.

The first key P _k may be provided by the first member. The first member may provide the first key P _k alone or with other data during execution of the flow shown in fig. 3. In the case of being provided separately, it may be provided in advance. The first key P _k may be used in a unified manner for each model update period, or may be updated every predetermined number (e.g., one) of model update periods. In the asymmetric encryption mode of the public-private key pair, the data ciphertext encrypted by the first key P _k can be decrypted by the second key S _k, but the data encrypted by the first key P _k or the second key S _k cannot be decrypted by the first key P _k.

Thus, as shown in fig. 3, in the case that the first member also holds the feature data, the process may further include step 311 and step 321 performed by the first member. In step 311, the first member processes the feature data X _A with the local model M _A to obtain an intermediate result H _A, and encrypts the intermediate result H _A with the first key P _k to obtain a corresponding intermediate result ciphertext < H _A >. The first member then provides an intermediate result ciphertext < H _A > to the second member, via step 321.

On the other hand, in the case where the training member participating in the federal learning further includes a third member other than the first member and the second member as the characteristic member, the flow may further include step 313 and step 323 performed by the third member. In step 313, the third member processes the feature data X _C with the local model M _C to obtain an intermediate result H _C, and encrypts the intermediate result H _C with the first key P _k to obtain a corresponding intermediate result ciphertext < H _C >. The third member then provides an intermediate result ciphertext < H _C > to the second member, via step 323.

The process of the first member or the third member processing the local feature data by using the local model to obtain the intermediate result is similar to that of the second member, and will not be described herein.

Next, in step 330, the second member may disorder the fusion tensor H obtained by fusing the intermediate results corresponding to the feature holders respectively through the disorder rule f, to obtain a disorder fusion tensor.

Where the feature member includes only the second member, the fusion tensor H is the intermediate result H _B, and the intermediate result H _B at this time may be plaintext data. In the case where the feature member also includes other members such as the first member or the third member, the other feature members need to provide the corresponding intermediate result ciphertext to the second member, the fusion result H exists in the form of ciphertext < H >, such as a fusion tensor ciphertext. At this time, before proceeding to step 330, the second member may further encrypt the intermediate result H _B with the first key P _k to obtain the corresponding intermediate result ciphertext < H _B >, and perform homomorphic fusion on each intermediate result ciphertext in a secret state. In the homomorphic encryption scheme, the result of the fusion of each intermediate result ciphertext corresponds to the ciphertext of the result of the fusion of each intermediate result, referred to herein, for example, as fusion tensor ciphertext < H >.

In the process of fusing the multiple intermediate result ciphertexts, the second member may use predetermined fusion rules such as addition, weighted average (including averaging), etc., or may use a conductive neural network such as a linear neural network to perform fusion, which is not limited herein. Taking addition as an example, the fusion tensor is H=H _A+H_B+H_C, and the fusion tensor ciphertext is recorded as < H > = < H _A>+<H_B>+<H_C >.

The disorder rule f is, as the name implies, a mapping rule that disorder the data sequence, for example, the first row as the third row (1→3), the second row as the fifth row (2→5), the third row as the second row (3→2). In one embodiment, the mapping rule may describe a row-out-of-order rule for one tensor by vector description, such as (3,1,4,2) ^T, such that the first row of the new tensor is the third row of the original tensor, the second row of the new tensor is the first row of the original tensor, and so on. In the case where each intermediate result is in the form of a two-dimensional tensor, a single row/column corresponds to a single sample, then the direction perpendicular to the single row, i.e., longitudinal/transverse, is the sample dimension.

The second member may perform an out-of-order operation in the sample dimension according to the out-of-order rule f, thereby out-of-order the fusion tensor H. Wherein the disorder of the fusion tensor H in the presence of other feature members than the second member may be achieved via the disorder of the fusion tensor ciphertext < H >. In this way, the vector corresponding to a single sample can be kept unchanged, and other parties cannot correspond fusion data (such as the fusion tensor after disorder or ciphertext thereof) to the sample according to the sample sequence, so that fusion information of the single sample cannot be revealed after disorder. The out-of-order fusion tensor may be denoted as H _f, for example, and the out-of-order fusion tensor ciphertext may be denoted as < H _f >.

Next, at step 340, the second member may provide the first member with the out-of-order fusion tensor H _f, or the out-of-order fusion tensor ciphertext < H _f >.

In the case where the second member provided the out-of-order fusion tensor ciphertext < H _f >, the first member may decrypt < H _f > with the second key S _k to obtain the out-of-order fusion tensor H _f, via step 350.

Further, the first member may process the out-of-order fusion tensor H _f with the global model M _L, resulting in an out-of-order prediction tensor y _f, via step 360. The out-of-order prediction tensor y _f is referred to herein because the order of the predictions in the sample dimension is out of order relative to the initial sample order. I.e. the predicted result y _f is out of order in the sample dimension according to the out-of-order rule f, which is described herein by the subscript f.

It will be appreciated that in a binary scenario, the predicted tensor may be a vector having the same dimension as the number of samples in the current batch (e.g., n), where a single element in the vector corresponds to a single sample, and in a multitasking scenario, the predicted tensor may be a two-dimensional tensor (i.e., tensor), corresponding to a sample dimension and a predicted dimension, where on a single row/column of the sample dimension, it is a multitasking predicted value for a single sample. Since the prediction result corresponds to the tag data, the tag data may also have a similar structure to the prediction result, corresponding to the sample dimension and the tag dimension. Here, in the case where the prediction result is denoted as the out-of-order prediction tensor Y _f, the tag data may be denoted as the tag tensor Y _A.

Because the label tensors Y _A are arranged in the initial order of the current lot samples, and the out-of-order predicted tensor Y _f is out-of-order in the sample dimension, and the first member does not know the out-of-order rule, the first party cannot directly compare the label tensors Y _A with the out-of-order predicted tensor Y _f to determine the model loss during the model update process.

To this end, under the technical concept of the present specification, the first member generates scrambling data in the other predicted dimension or label dimension by keeping the sample dimension unchanged for both the label tensor Y _A and the disorder predicted tensor Y _f, and provides the label tensor Y _A and the disorder predicted tensor Y _f hidden in the corresponding scrambling data to the holder of the scrambling rule, such as the second member, and the second member performs the comparison of the label tensor and the predicted tensor based on the disorder rule f.

Under this concept, in step 370 and step 380, the first member expands the out-of-order predicted tensor Y _f and the label tensor Y _A in the predicted dimension or the label dimension by the disturbance data, and expands the out-of-order predicted tensor Y _fp and the disturbance label tensor Y _Ap to the second member, respectively.

Here, the subscript P indicates that the volume is expanded by P times (P may be an integer greater than 1), or the dimension of the tag dimension/prediction dimension (P is greater than the dimension of the tag tensor in the tag dimension). For example, the original label tensor Y _A is a column vector, which is expanded by P column vectors, and the P-1 column vectors can be randomly sampled over an optional range of values (e.g., 0 or 1). The two-dimensional tensors expanded by the out-of-order predicted tensor Y _f and the label tensor Y _A are respectively denoted as an out-of-order disturbance predicted tensor Y _fp and a disturbance label tensor Y _Ap. The out-of-order prediction tensor Y _f is extended in the prediction dimension, the label tensor Y _A is extended in the label dimension, and the dimension of the sample dimension after extension remains unchanged.

The relative position of the out-of-order predicted tensor Y _f in the predicted dimension of the out-of-order disturbance predicted tensor Y _fp should remain correspondingly consistent with the relative position of the tag data Y _A in the tag dimension of the disturbance tag tensor Y _Ap, as noted as position k. In one embodiment, the position k may be a determined value, e.g., describing a relative position, and the out-of-order disturbance prediction tensor Y _fp and the disturbance label tensor Y _Ap may each be considered P units according to the out-of-order prediction tensor Y _f or the label tensor Y _A volume size, each unit corresponding to a relative position, where the relative position k represents the kth unit of the P units. For example, if the label tensor Y _A is an n×3 two-dimensional tensor, the expanded disturbance label tensor Y _Ap is an n×3P two-dimensional tensor, and the label tensor Y _A occupies the positions from 3 (k-1) +1=3k-2 to 3k (total 3 columns, volume n×3). in another embodiment, the out-of-order perturbation predicted tensor Y _fp and the perturbation tag tensor Y _Ap are not integer multiples of the out-of-order predicted tensor Y _f, the tag tensor Y _A, then k may represent a range or a plurality of discrete positions of the predicted dimension/tag dimension, such as column 3 to column 5, or column 3, the, column 6 two discrete positions.

The first member expands the predicted tensor Y _f, the label tensor Y _A in the predicted dimension or the label dimension by the disturbance data, and provides the out-of-order disturbance predicted tensor Y _fp and the disturbance label tensor Y _Ap to the second member, which may be sequentially performed (e.g., expanding and transmitting the predicted data first and then expanding and transmitting the label data), concurrently performed, and obfuscated (e.g., expanding the out-of-order predicted tensor Y _f and the label tensor Y _A sequentially and then transmitting the expansion result together, etc.), which is not limited in this specification. Writing step 370, step 380 together also indicates that the two do not explicitly define timing. In fact, the operations of step 370, step 380 regarding the first member expanding the label tensor and providing the expanded disturbing data to the second member may be performed at any suitable time before step 390, such as, for example, performed in advance in sample batches, etc., which are not limited herein.

Next, at step 390, the second member compares the gap of the out-of-order disturbance prediction tensor Y _fp and the disturbance label tensor Y _Ap in the out-of-order situation based on the out-of-order rule f, determining the model loss and the global out-of-order gradient G _f of the out-of-order disturbance label tensor at the current model loss.

It will be appreciated by those skilled in the art that gradient-dependent methods (e.g., gradient descent, newton, etc.) can generally be employed during model training to adjust the undetermined parameters in the model in a direction that reduces model loss. Therefore, in order to obtain the frontal gradient data of the undetermined parameters, the model loss can be determined by comparing the label data with the prediction result, and the gradient information of each undetermined parameter is mined in a determination mode of the model loss, so that the undetermined parameters are adjusted according to the gradient. Under the technical conception of the present specification, the data according to which the first member obtains the prediction result via the global model is a fusion result after disorder, and therefore, when the pending parameters in the global model are adjusted, gradient data in the disorder situation should be relied on.

In some alternative implementations, the second member may process the disturbance label tensor Y _Ap by the disorder rule f to obtain the disorder disturbance label tensor Y _Afp. At this time, the out-of-order perturbation tag tensor Y _Afp and the out-of-order perturbation predicted tensor Y _fp are correspondingly consistent in the sample dimension, and therefore, the second member may determine the global out-of-order gradient G _fp in the out-of-order condition based on the comparison of the out-of-order perturbation tag tensor Y _Afp and the out-of-order perturbation predicted tensor Y _fp.

In other alternative implementations, the second member may further process the disorder disturbance predicted tensor Y _fp through the inverse operation of the disorder rule f (e.g. denoted as f ^-1) to obtain a disturbance predicted tensor Y _p, where the disturbance predicted tensor Y _p is consistent with each element in the disturbance label tensor Y _Ap in a one-to-one correspondence according to the sample order, so that the second member may determine the global gradient G _P based on the comparison between the disturbance label tensor Y _Ap and the disturbance predicted tensor Y _p, and then process the global gradient G _P through the disorder rule f to obtain the global disorder gradient G _fp under the disorder condition.

The global disorder gradient G _fp here contains not only the gradient information of each element of the disturbance prediction tensor y _p under the disorder condition, but also the gradient information of other disturbance elements which are expanded. Since the manner in which model loss is determined (e.g., the loss function) may be negotiated in advance by each training member and may be disclosed, it is highly likely that the disorder rule f will be revealed to the first member if the second member directly provides the global disorder gradient G _f to the first member. To preserve data privacy, in step 3100, the first member and the second member may perform an inadvertent transmission (oblivious transfer, OT for short) protocol, the first sub-out-of-order gradient G _f at position k being securely obtained by the first member from the global out-of-order gradient G _fp of the second member. Wherein the first sub-unordered gradient G _f contains elements in the local unordered gradient G _fp at positions corresponding to the predicted tensor (e.g., the kth position of the predicted dimension).

The OT protocol can solve the problem of one party getting any one of the N messages from the other party without letting the other party perceive which message it gets, i.e. 1-out-of-N. In this step 3100, the number of messages N may be P. If k represents a position or a position unit, that is, the second member may divide the global disordered gradient G _f into P messages according to the dimension of the predicted tensor, and execute the OT protocol with the first member, so that the first party selects the kth from the P messages, that is, the first sub disordered gradient G _f. If k represents a location range or a plurality of scattered locations, the first member and the second member may sequentially acquire gradient vectors of the respective locations by executing the OT protocol multiple times, to form a first sub-out-of-order gradient G _f.

Thus, through step 3110, the first member may determine, on the one hand, gradients of respective undetermined parameters in the global model M _L by inverse transfer of the first sub-out-of-order gradient G _f in the global model M _L, thereby updating the global model M _L, and on the other hand, determine that the out-of-order fusion result H _f or the out-of-order fusion ciphertext < H _f > corresponds to the second sub-out-of-order gradient G _s.

It will be appreciated that the gradient of the parameter, i.e. the model loss, is relative to the partial derivative of the parameter. The input value may be considered a fixed value during the determination of the gradient of the parameter to be determined, such as a model y=wx, and the input value x may be considered a fixed value during the determination of the gradient of the parameter to be determined, such as the product of the model Loss to the partial derivative of y and the partial derivative of y to w (e.g. x). While gradient data between the multi-layer neural networks is reverse transitive. If the first layer is y ₁＝w_a x and the second layer is y ₂＝w_by₁, then the gradient of w _a is the product of the partial derivative of model Loss to y ₁ and the partial derivative x of y ₁ to w _a. Whereas the partial derivative of model Loss to y ₁ may be determined by the product of the partial derivative of model Loss to y ₂ and the partial derivative of y ₂ to y ₁, w _b.

Thus, in one aspect, the first member may determine each pending parameter in the global model M _L to determine a corresponding gradient, respectively, based on the inverse transfer of the first sub-out-of-order gradient G _f in the local global model M _L. These gradient data may be updated by the first member based on gradient descent methods or the like for each of the pending parameters in the global model M _L, e.g., the gradient of the pending parameter w is δ, at step λ to w=w- λδ. On the other hand, in the reverse pass to the last layer (i.e., the first layer of the global model M _L) model, the undetermined parameters may be considered as fixed values, thereby determining the gradient corresponding to the out-of-order fusion result H _f or the out-of-order fusion ciphertext < H _f >, e.g., denoted as the second sub out-of-order gradient G _s.

The first member may provide a second sub-out-of-order gradient G _s to the second member, via step 3120.

Further, at step 3130, the second member may update the local model M _B based on the disorder rule f and the second sub-disorder gradient G _s.

It will be appreciated that the second sub-out-of-order gradient G _s is in the order of the sample dimension with respect to the respective intermediate result, and thus, in the case of processing the second sub-out-of-order gradient G _s by the inverse of the out-of-order rule f (e.g., f ^-1), gradient data of the fusion tensor of the current period may be determined, as noted for the second sub-gradient G _H.

In the case where no other member than the second member (e.g., the first member, the third member) is present as the feature member, the fusion tensor H is the intermediate result H _B obtained by processing the local feature data X _B by the local model M _B of the second member, and the second sub-gradient G _H is the gradient of the intermediate result H _B. In this manner, the second member may determine the gradient G _B of each pending parameter in the local model M _B using the inverse transfer of the second sub-gradient G _H in the local model M _B, thereby updating the local model M _B using methods such as the gradient descent method described previously.

In the case that members other than the second member (e.g., the first member and the third member) are present as feature members, the fusion tensor H is a fusion result of each intermediate result (e.g., H _A、H_B、H_C) corresponding to each feature member, and the second member may determine gradient data corresponding to each intermediate result ciphertext (i.e., gradient data corresponding to each intermediate result, such as G _B、G_C) according to a fusion manner (e.g., addition, weighted average, etc.) of the second sub-gradient G _H and each intermediate result ciphertext. For example, in a simple additive fusion approach, the second sub-gradient G _H is equal to the gradient of each intermediate result ciphertext. Thus, the second member may update the local model M _B with the gradient G _B of the intermediate result H _B on the one hand, and may provide the third sub-gradient G _C to the third member through step 3140 after the second member obtains the third sub-gradient G _C of the intermediate result ciphertext < H _C > for other feature members, such as the third member. Then, the third member updates the local model M _C with the third sub-gradient G _C, via step 3150.

It is noted that, for the first member, if it holds the feature data X _A as a feature member at the same time, the second party passes the gradient data G _A of the intermediate result H _A to the first member, it is highly likely that the out-of-order rule f is leaked. Therefore, the local model M _A cannot be updated by the second member directly by passing the gradient data G _A to the first member. For this reason, the present description provides a solution considering that the updated global model M _L contains information of other feature members, and updates the local model M _A by using the updated global model M _L as known information.

Specifically, as shown in step 3160, the first member may sequentially process the feature data X _A 'of the local batch of samples using the local model M _A and the updated global model M _L to obtain a predicted tensor Y _A', determine model loss based on the comparison between the predicted tensor Y _A 'and the label tensor Y _A', and reversely determine the gradient of the model loss with respect to each of the undetermined parameters in the local model M _A by using the model parameters in the global model M _L as known parameters, thereby updating the local model M _A.

It should be noted that, here, the feature data X _A 'and the tag data Y _A' are applied to samples used when updating the local model M _A, and these samples may or may not be identical to samples used by the current model period in combination with other training members, and the number of samples may or may not be as large as the number of samples, which is not limited herein. In the case that the samples used are consistent, in step 3160, the intermediate result H _A may be directly processed using the updated global model M _L to obtain the predicted tensor Y _A ', and the model loss is determined by comparing the predicted tensor Y _A' with the label tensor Y _A, which is not described herein.

In this manner, each training member may complete model updating if a predetermined condition is satisfied via iteration of the model update flow of the plurality of model update cycles. The predetermined condition herein may include, for example, at least one of loss function convergence, gradient convergence, convergence of undetermined parameters, model accuracy reaching a predetermined value, number of model update cycles reaching a predetermined number of cycles, and so forth.

FIG. 3 depicts operations performed by individual training members from the perspective of multiple training member interactions. As can be seen from fig. 3, under the improved implementation architecture of the present specification, the intermediate results are fused and disordered by the second member, and the plaintext data is transferred by the first member through expanding and disturbing the prediction result and the tag data, so that on the basis of protecting the data privacy, the tag holder is prevented from marking the tag data, and the privacy of other members is maliciously obtained.

Fig. 4 and 5 depict the flow of a single training member from the perspective of the first member and the second member, respectively, performed during one update period of the model (e.g., referred to as the current update period) for a current batch of samples.

As shown in fig. 4, the first member may perform the following procedure:

Step 401, obtaining out-of-order fusion ciphertext from a second member;

The method comprises the steps that a disorder fusion ciphertext is obtained by a disorder rule f after fusion of intermediate result ciphertexts corresponding to each characteristic member respectively through a second member, wherein a single intermediate result is a processing result of the single characteristic member processing local characteristic data through a local model, the single intermediate result ciphertext is obtained by encrypting corresponding intermediate results through a first key, and the first key is provided by the first member;

In the case where there are a plurality of feature members, the second member may be any one of the plurality of feature members other than the first member;

Step 402, decrypting the out-of-order fusion ciphertext into an out-of-order fusion tensor by using the second key;

the second key is used for decrypting the data encrypted by the first key;

step 403, obtaining an out-of-order prediction tensor based on the processing of the out-of-order fusion tensor by the global model;

Step 404, respectively expanding the disorder prediction tensor in the prediction dimension and the label data in the label dimension to obtain a disorder disturbance prediction tensor and a disturbance label tensor, and providing the disorder disturbance prediction tensor and the disturbance label tensor for a second member;

The position k of the disorder predicted tensor in the disorder disturbance predicted tensor is correspondingly consistent with the position k of the tag data in the disturbance tag tensor, so that the second member feeds back the global disorder gradient aiming at the disorder disturbance tag tensor based on the disorder rule f, the disorder disturbance predicted tensor and the disturbance tag tensor;

Step 405, performing an inadvertent transmission protocol with a second member, safely selecting a first sub-out-of-order gradient at position k from the global out-of-order gradients;

step 406, updating the global model and determining a second sub-disordered gradient corresponding to the disordered fusion ciphertext by using reverse transfer of the first sub-disordered gradient;

step 407 provides the second sub-out-of-order gradient to the second member for the second member to update the local second local model based on the out-of-order rule f and the second sub-out-of-order gradient.

In some possible architectures, the first member further corresponds to first feature data and a first local model, and the process executed by the first member further comprises the steps of sequentially processing the first feature data of the first batch of samples by using the first local model and the updated global model to obtain a first prediction tensor, determining local model loss based on comparison between first tag data corresponding to the first batch of samples and the first prediction tensor, and adjusting undetermined parameters in the first local model towards the direction of reducing the local model loss so as to update the first local model. The first batch sample may or may not be consistent with the current batch sample, and in case of consistency, the first member may directly process an intermediate result (e.g. H _A in fig. 3) corresponding to the current batch sample by using the updated global model to obtain a first predicted tensor, and determine a model loss based on comparing the first predicted tensor with the tag data Y _A.

As shown in fig. 5, the second member may perform the following procedure:

step 501, fusing each intermediate result ciphertext corresponding to each characteristic member to obtain a fused tensor ciphertext;

the single intermediate result ciphertext is obtained by encrypting a corresponding intermediate result by a first key, the single intermediate result is a processing result of processing local characteristic data by a single characteristic member through a local model, and the first key is provided by the first member;

Step 502, the fusion tensor ciphertext is disordered in the sample dimension through a disorder rule f to obtain a disorder fusion ciphertext, and the disorder fusion ciphertext is provided for a first member;

In this way, after decrypting the disordered fusion ciphertext into the disordered fusion tensor by the first member by using the second key corresponding to the first key, obtaining the disordered prediction tensor based on the processing of the disordered fusion tensor by the global model;

Step 503, obtaining an out-of-order disturbance prediction tensor and a disturbance label tensor from the first member;

The disorder disturbance prediction tensor and the disturbance label tensor are obtained by expanding the disorder prediction tensor in a prediction dimension and the label data in a label dimension respectively, and the position k of the disorder prediction tensor in the prediction dimension is correspondingly consistent with the position k of the label data in the label dimension;

Step 504, determining a global disorder gradient corresponding to the disorder disturbance prediction tensor by comparison based on the disorder rule f, the disturbance label tensor and the disorder disturbance prediction tensor;

step 505, performing an inadvertent transmission protocol with the first member, the first member safely selecting a first sub-out-of-order gradient at position k from the global out-of-order gradients;

the first member can update the global model by utilizing the reverse transfer of the first sub-disordered gradient and feed back the second sub-disordered gradient corresponding to the disordered fusion ciphertext;

step 506, updating the local second local model by using the second sub-gradient obtained by sequentially recovering the second sub-disordered gradient in the sample dimension based on the disordered rule f.

In some possible embodiments, the feature members further include a third member corresponding to a third local model, and the above process may further include:

determining a third sub-gradient of the intermediate tensor corresponding to the third member based on the second sub-gradient and the fusion mode of each intermediate result ciphertext;

The third sub-gradient is provided to the third member for the third member to update the third local model with the third sub-gradient.

Reviewing the above, in longitudinal federal learning with multiple training members in combination, a single feature member provides a fused tensor of intermediate results for each feature member to a tag member after sample dimension disorder. And the label member predicts under the disordered fusion tensor to obtain the disordered prediction tensor. And then, the label member conceals the disorder prediction tensor and the label tensor in the disorder data at the corresponding consistent positions and provides the disorder data to the characteristic member for disorder. And comparing model loss, namely gradient data under the condition of disorder by the out-of-order characteristic members, and safely transmitting the out-of-order gradient data to the first member through the OT protocol. The first member updates the global model using the out-of-order gradient data and determines out-of-order fusion tensor gradient data via reverse transfer of gradients to transfer to the feature member undergoing out-of-order. Thus, the feature members can obtain gradient data of each intermediate result through inverse transfer of the fusion tensor according to the gradient data of the fusion mode, and further, the local model can be updated, and gradient information of the intermediate result is transferred to other feature members except the first member.

In the case that the tag member still holds the feature data, due to the privacy protection requirement of the disorder rule, it cannot directly acquire gradient information of the intermediate result from the feature member, but uses the local model and the connection of the global model by taking the undetermined parameters in the local global model as known data, and uses the local sample to update the local model.

Therefore, based on disorder processing of the feature members and hiding processing of the tag members on tag data and predicted data, tag plaintext can be transmitted through proper data interaction, the tag members are prevented from maliciously marking the data, and data privacy is effectively protected.

According to an embodiment of another aspect, a device for jointly updating a model is further provided, and the device is suitable for a scenario that a plurality of training members perform longitudinal federal learning by using respective local privacy data. Specifically, the first member and the second member shown in fig. 3 may be provided, respectively. Fig. 6 and 7 illustrate an apparatus 600, 700, respectively, for locating a joint update model of a tag member, a feature member, according to one embodiment.

As shown in fig. 6, the apparatus 600 provided for the joint update model of tag members may include a communication unit 601, a decryption unit 602, a prediction unit 603, an expansion unit 604, a security acquisition unit 605, and a gradient transfer unit 606.

At the current update period of the model, for the samples of the current batch:

The communication unit 601 is configured to obtain an out-of-order fusion ciphertext from a second member, where the out-of-order fusion ciphertext is obtained by an out-of-order rule f after fusing each intermediate result ciphertext corresponding to each feature member respectively by the second member, where a single intermediate result is a processing result of processing local feature data by a single feature member through a local model, the single intermediate result ciphertext is obtained by encrypting a corresponding intermediate result by a first key, and the first key is provided by the first member;

the decryption unit 602 is configured to decrypt the out-of-order fusion ciphertext into an out-of-order fusion tensor using a second key, the second key being used to decrypt the data encrypted by the first key;

the prediction unit 603 is configured to obtain an out-of-order prediction tensor based on the processing of the out-of-order fusion tensor by the global model;

The expansion unit 604 is configured to respectively expand the disorder prediction tensor in the prediction dimension and the label data in the label dimension to obtain a disorder disturbance prediction tensor and a disturbance label tensor, and provide the disorder prediction tensor and the disturbance label tensor to the second member, wherein the position k of the disorder prediction tensor in the disorder disturbance prediction tensor is consistent with the position k of the label data in the disturbance label tensor correspondingly, so that the second member feeds back the global disorder gradient aiming at the disorder disturbance label tensor based on the disorder rule f, the disorder disturbance prediction tensor and the disturbance label tensor;

the secure acquisition unit 605 is configured to perform an inadvertent transmission protocol with the second member, securely selecting a first sub-out-of-order gradient at position k from the global out-of-order gradients;

the gradient transfer unit 606 is configured to update the global model and determine a second sub-disordered gradient corresponding to the disordered fusion ciphertext using reverse transfer of the first sub-disordered gradient;

the communication unit 601 is further configured to provide a second sub-out-of-order gradient to the second member for the second member to update the local second local model based on the out-of-order rule f and the second sub-out-of-order gradient.

In the case that the first member further corresponds to the first feature data and the first local model, the apparatus 600 may further include an updating unit (not shown) configured to sequentially process the first feature data of the first batch sample using the first local model and the updated global model to obtain a first predicted tensor, determine a local model loss based on a comparison between the first tag data corresponding to the first batch sample and the first predicted tensor, and adjust the undetermined parameter in the first local model in a direction in which the local model loss is reduced, thereby updating the first local model. The first batch sample and the current batch sample may or may not be consistent, and in case of the consistency, the updating unit may directly process an intermediate result (such as H _A in fig. 3) corresponding to the current batch sample by using the updated global model, obtain a first predicted tensor, and determine a model loss based on comparing the first predicted tensor with the tag data Y _A.

As shown in fig. 7, the apparatus 700 provided in the joint update model of tag members may include a fusion unit 701, an out-of-order unit 702, an acquisition unit 703, a gradient determination unit 704, a secure transfer unit 705, and an update unit 706.

The fusion unit 701 is configured to fuse each intermediate result ciphertext corresponding to each feature member respectively to obtain a fusion tensor ciphertext, wherein a single intermediate result ciphertext is obtained by encrypting a corresponding intermediate result by a first key, the single intermediate result is a processing result of the single feature member processing local feature data by a local model, and the first key is provided by the first member;

The disorder unit 702 is configured to disorder the fused tensor ciphertext in a sample dimension via a disorder rule f to obtain a disorder fused ciphertext, and provide the disorder fused ciphertext to the first member, so that the first member decrypts the disorder fused ciphertext into a disorder fused tensor by using a second key corresponding to the first key, and then obtains a disorder predicted tensor based on the processing of the disorder fused tensor by the global model;

The obtaining unit 703 is configured to obtain an out-of-order disturbance prediction tensor and a disturbance label tensor from the first member, wherein the out-of-order disturbance prediction tensor and the disturbance label tensor are obtained by expanding the out-of-order prediction tensor in a prediction dimension and the label data in a label dimension respectively, and the position k of the out-of-order prediction tensor in the prediction dimension is corresponding to the position k of the label data in the label dimension;

The gradient determining unit 704 is configured to comparatively determine a global disorder gradient corresponding to the disorder disturbance prediction tensor based on the disorder rule f, the disturbance label tensor, and the disorder disturbance prediction tensor;

The secure transfer unit 705 is configured to execute an inadvertent transmission protocol with the first member, and the first member securely selects a first sub-disordered gradient at the position k from the global disordered gradients, so that the first member can update the global model by using the reverse transfer of the first sub-disordered gradient, and feed back a second sub-disordered gradient corresponding to the disordered fusion ciphertext;

the updating unit 706 is configured to update the local second local model with a second sub-gradient obtained by sequentially recovering the second sub-disordered gradient in the sample dimension based on the disordered rule f.

In some possible embodiments, the feature members further comprise a third member, corresponding to a third local model, and the apparatus 700 may further comprise a providing unit (not shown) configured to determine a third sub-gradient of the intermediate tensor corresponding to the third member based on the second sub-gradient and the fusion of the respective intermediate result ciphertext, and provide the third sub-gradient to the third member for the third member to update the third local model with the third sub-gradient.

It should be noted that, the apparatuses 600 and 700 shown in fig. 6 and fig. 7 correspond to the method embodiments shown in fig. 4 and fig. 5, and may be applied to the first member and the second member in the interaction flow shown in fig. 3, so as to complete the model updating flow shown in fig. 3 in cooperation with each other. Accordingly, the descriptions related to the first member and the second member in fig. 3, and the descriptions corresponding to the method embodiments shown in fig. 4 and fig. 5 may be adapted to the apparatuses 600 and 700 shown in fig. 6 and fig. 7, respectively, and are not repeated herein.

According to an embodiment of another aspect, there is also provided a computer-readable storage medium having stored thereon a computer program which, when executed in a computer, causes the computer to perform the method described in connection with fig. 4,5, etc.

According to an embodiment of yet another aspect, there is also provided a computing device including a memory having executable code stored therein and a processor that, when executing the executable code, implements the method described in connection with fig. 4, 5, etc.

Those skilled in the art will appreciate that in one or more of the examples described above, the functions described in the embodiments of the present disclosure may be implemented in hardware, software, firmware, or any combination thereof. When implemented in software, these functions may be stored on or transmitted over as one or more instructions or code on a computer-readable medium.

The above-described specific embodiments are used for further describing the technical concept of the present disclosure in detail, and it should be understood that the above description is only specific embodiments of the technical concept of the present disclosure, and is not intended to limit the scope of the technical concept of the present disclosure, and any modifications, equivalent substitutions, improvements, etc. made on the basis of the technical scheme of the embodiment of the present disclosure should be included in the scope of the technical concept of the present disclosure.

Claims

1. A method of multi-party joint training model, suitable for a scenario in which a plurality of training members perform longitudinal federal learning by using respective local privacy data, the plurality of training members including a first member holding tag data and at least one feature member, the model including respective local models to which respective feature members correspond, and a global model to which the first member corresponds, the at least one feature member including a second member, the method being performed by the second member for a current batch of samples during a current update period of the model, the method comprising:

Fusing each intermediate result ciphertext corresponding to each characteristic member respectively to obtain a fused tensor ciphertext, wherein a single intermediate result ciphertext is obtained by encrypting a corresponding intermediate result by a first key, the single intermediate result is a processing result of the single characteristic member for processing local characteristic data by a local model, and the first key is provided by the first member;

the fusion tensor ciphertext is disordered in a sample dimension through a disorder rule f to obtain a disorder fusion ciphertext, and the disorder fusion ciphertext is provided for a first member to obtain a disorder prediction tensor based on the processing of the global model after the first member decrypts the disorder fusion ciphertext into the disorder fusion tensor by using a second key corresponding to the first key;

Obtaining an out-of-order disturbance prediction tensor and a disturbance label tensor from a first member, wherein the out-of-order disturbance prediction tensor and the disturbance label tensor are obtained by expanding the out-of-order prediction tensor in a prediction dimension and the label data in a label dimension respectively, and the position k of the out-of-order prediction tensor in the prediction dimension is correspondingly consistent with the position k of the label data in the label dimension;

Based on the disorder rule f, the disturbance label tensor and the disorder disturbance prediction tensor, comparing and determining a global disorder gradient corresponding to the disorder disturbance prediction tensor;

Executing an inadvertent transmission protocol with a first member, safely selecting a first sub-disordered gradient at a position k from the global disordered gradient by the first member so as to enable the first member to update the global model by utilizing reverse transmission of the first sub-disordered gradient, and feeding back a second sub-disordered gradient corresponding to the disordered fusion ciphertext;

and updating a local second local model by using a second sub-gradient obtained by sequentially recovering the second sub-disordered gradient in the sample dimension based on the disordered rule f.

2. The method of claim 1, wherein the at least one feature member comprises a third member corresponding to a third local model, the method further comprising:

Determining a third sub-gradient of the intermediate tensor corresponding to a third member based on the second sub-gradient and the fusion mode of each intermediate result ciphertext;

Providing the third sub-gradient to a third member for the third member to update the third local model with the third sub-gradient.

3. The method of claim 1, wherein the fusing means for fusing each intermediate result ciphertext for each feature member comprises one of addition and weighted averaging.

4. The method of claim 1, wherein the comparatively determining a global disorder gradient corresponding to the disorder perturbation prediction tensor based on the disorder rule f, the perturbation tag tensor, the disorder perturbation prediction tensor comprises:

the disturbance label tensor is disordered based on the disorder rule f, so that the disorder disturbance label tensor is obtained;

comparing the disorder disturbance label tensor with the disturbance label tensor to obtain model loss;

And determining the global out-of-order gradient according to the model loss for partial derivatives of each element in the out-of-order disturbance prediction tensor.

5. The method of claim 1, wherein the sequentially recovering the second sub-unordered gradient in a sample dimension based on the unordered rule f comprises:

Determining an inverse disorder rule f ^-1 of the disorder rule f;

And sequentially adjusting the second sub-disordered gradient in the sample dimension by using the reverse disordered rule f ^-1.

6. The method of claim 1, wherein the performing an inadvertent transmission protocol with the first member, the safely selecting, by the first member, a first sub-out-of-order gradient at position k from the global out-of-order gradient comprises:

Dividing the global disordered gradient into P messages in sequence according to the dimension occupied by the prediction tensor in the prediction dimension;

an unintentional transport protocol is performed with the first member, and a kth message is selected from the P messages by the first member.

7. A method of multi-party joint training model, suitable for a scenario in which a plurality of training members perform longitudinal federal learning by using respective local privacy data, the plurality of training members including a first member holding tag data and at least one feature member, the model including respective local models to which respective feature members correspond respectively, and a global model to which the first member corresponds, the at least one feature member including a second member, the method being performed by the first member for a current batch of samples during a current update period of the model, the method comprising:

Obtaining out-of-order fused ciphertext from a second member, wherein, the disorder fused ciphertext is obtained by fusing each intermediate result ciphertext corresponding to each characteristic member respectively through a second member and then disorder by a disorder rule f, the single intermediate result is a processing result of processing local characteristic data by a single characteristic member through a local model, a single intermediate result ciphertext is obtained by encrypting a corresponding intermediate result by a first key, and the first key is provided by the first member;

Decrypting the out-of-order fusion ciphertext into an out-of-order fusion tensor by using a second key, wherein the second key is used for decrypting the data encrypted by the first key;

obtaining an out-of-order prediction tensor based on the processing of the out-of-order fusion tensor by the global model;

Respectively expanding the disorder predicted tensor in a predicted dimension and the label data in a label dimension to obtain a disorder disturbance predicted tensor and a disturbance label tensor, and providing the disorder predicted tensor and the disturbance label tensor to a second member, wherein the position k of the disorder predicted tensor in the disorder disturbance predicted tensor is correspondingly consistent with the position k of the label data in the disturbance label tensor, so that the second member feeds back a global disorder gradient aiming at the disorder disturbance label tensor based on a disorder rule f, the disorder disturbance predicted tensor and the disturbance label tensor;

performing an inadvertent transmission protocol with a second member, safely selecting a first sub-out-of-order gradient at position k from the global out-of-order gradients;

updating the global model and determining a second sub-disordered gradient corresponding to the disordered fusion ciphertext by using reverse transfer of the first sub-disordered gradient;

The second sub-unordered gradient is provided to a second member for the second member to update a local second local model based on the unordered rule f and the second sub-unordered gradient.

8. The method of claim 7, wherein the first member further corresponds to first feature data and a first local model, the method further comprising:

Sequentially processing the first characteristic data of the first batch of samples by using the first local model and the updated global model to obtain a first prediction tensor;

determining a local model loss based on a comparison of first tag data corresponding to the first batch of samples and the first predicted tensor;

and adjusting undetermined parameters in the first local model towards the direction of reduced local model loss so as to update the first local model.

9. The method of claim 8, wherein the first batch of samples is consistent with the current batch of samples, the intermediate result obtained by processing the first feature data of the current batch by the first local model is a first intermediate result, and the sequentially processing the first feature data of the first batch of samples by using the first local model and the updated global model is to obtain a first predicted tensor:

and processing the first intermediate result by using the updated global model to obtain the first prediction tensor.

10. The method of claim 7, wherein the second member's fusion of the respective intermediate result ciphertexts corresponding to the respective characteristic members includes one of addition and weighted averaging.

11. The method of claim 7, wherein the out-of-order perturbation prediction tensor/perturbation tag tensor is extended by:

randomly generating disturbance data consistent with the P-1 predicted tensor/label data volume in a value range corresponding to the predicted value/label value;

And arranging the disturbance data and the predicted tensor/label data along a predicted dimension/label dimension to form the out-of-order disturbance predicted tensor/disturbance label tensor, wherein the disturbance data and the predicted tensor/label data are arranged at a position k.

12. A method for multi-party joint training model is suitable for a scene of longitudinal federal learning by a first member with tag data and a second member with characteristic members, the model comprises a local model corresponding to the second member and a global model corresponding to the first member, and in the current updating period of the model, aiming at samples of a current batch, the method comprises the following steps:

the second member processes the local feature data by using the local model to obtain an intermediate result;

The intermediate result is disordered in a sample dimension through a disorder rule f to obtain a disorder intermediate result, and the disorder intermediate result is provided for a first member;

the first member obtains an out-of-order prediction tensor based on the processing of the out-of-order intermediate result by the global model;

The first member expands the disorder prediction tensor in the prediction dimension and the label data in the label dimension by the same multiple to respectively obtain a disorder disturbance prediction tensor and a disturbance label tensor to be provided for the second member, wherein the position k of the disorder prediction tensor in the prediction dimension is correspondingly consistent with the position k of the label data in the label dimension;

A second member determines a global disorder gradient corresponding to the disorder disturbance predicted tensor in a comparison mode based on the disorder rule f, the disturbance label tensor and the disorder disturbance predicted tensor;

The first member and the second member execute an inadvertent transmission protocol, so that the first member safely selects a first sub-disordered gradient at a position k from the global disordered gradient, updates the global model by utilizing reverse transmission of the first sub-disordered gradient, and feeds back a second sub-disordered gradient corresponding to the disordered intermediate result;

and updating the local model by using a second sub-gradient obtained by sequentially recovering the second sub-disordered gradient in the sample dimension based on the disordered rule f.

13. The device is suitable for a scene of longitudinal federal learning by a plurality of training members by utilizing respective local privacy data, wherein the plurality of training members comprise a first member with tag data and at least one characteristic member, the model comprises local models corresponding to the characteristic members and global models corresponding to the first member, the at least one characteristic member comprises a second member, and the device is arranged on the second member and comprises a fusion unit, an out-of-order unit, an acquisition unit, a gradient determination unit, a safety transmission unit and an updating unit, and in the current updating period of the model, aiming at samples of the current batch:

14. The device is suitable for a scene of longitudinal federal learning by a plurality of training members by utilizing respective local privacy data, wherein the plurality of training members comprise a first member with tag data and at least one characteristic member, the model comprises local models corresponding to the characteristic members and global models corresponding to the first member, the at least one characteristic member comprises a second member, the device is arranged on the first member and comprises a communication unit, a decryption unit, a prediction unit, an expansion unit, a safety acquisition unit and a gradient transfer unit, and in the current update period of the model, the device aims at samples of the current batch:

15. A computer readable storage medium having stored thereon a computer program which, when executed in a computer, causes the computer to perform the method of any of claims 1-11.

16. A computing device comprising a memory and a processor, wherein the memory has executable code stored therein, which when executed by the processor, implements the method of any of claims 1-11.