Background
Deep Neural Networks (DNNs) have demonstrated powerful capabilities in computer vision, speech recognition, natural language processing, and related areas, and attempts are now being made to run neural networks on mobile devices. However, the success of DNNs depends heavily on complex model structures with high computational requirements, whereas mobile devices have limited computational power and are therefore ill suited to energy-intensive DNN operations. There are two possible solutions. The first is to reduce the complexity and runtime of the DNN using compression techniques and then deploy the compressed DNN directly on the mobile device. However, each compression technique may be applicable only to a specific type of neural network layer and is typically selected only for high inference accuracy, without considering the varying resource constraints of different platforms. The second is to upload the raw data (images, video, etc.) to a cloud server and perform DNN inference in the cloud. However, collecting users' private data in this way violates their privacy.
Given the resource limitations and user-privacy requirements of mobile devices, a mainstream approach is to cut a neural network into two parts: one part runs on the mobile device and processes the private data to obtain intermediate-layer features, which are then uploaded to the cloud for the subsequent operations. This is called the end-cloud framework. However, prior work has demonstrated that intermediate representations may leak sensitive information. For example, an attacker can infer private attributes from the user's data and, worse still, can even steal the user's private data by training a neural network to invert the intermediate-layer features.
To address the privacy leakage of intermediate-layer features, there are three main solutions: differential privacy, homomorphic encryption, and adversarial training. Differential privacy guarantees privacy by adding noise to model parameters, intermediate-layer features, model predictions, or objective functions. Although differential privacy comes with strong theoretical privacy guarantees, its effectiveness against practical attacks is unknown and it causes serious accuracy loss. Homomorphic encryption protects users' private data through an encryption algorithm; however, it is poorly suited to non-linear operations, so current work either approximates non-linear activation functions such as Sigmoid using Taylor expansion, or splits each neuron into linear and non-linear parts implemented separately on non-colluding parties. For DNNs with many non-linear computations, homomorphic encryption has excessive computational complexity and resource consumption, and is therefore unsuitable for resource-constrained scenarios such as the end-cloud framework. Adversarial training can generally be expressed as a game between the DNN and an attacking neural network; these studies usually simulate the attacker by solving a min-max problem to reach a balance between the privacy and the accuracy of the intermediate-layer features.
Patent document CN111445005A (application number CN202010115498.1) discloses a neural network control method and a reinforcement learning system based on reinforcement learning. In that invention, the action network determines the state control quantity according to the order and delay of the controlled object or its mechanism model, and the controlled object receives the action value output by the action network; the estimation network evaluates the current control effect against a preset target based on the output action value, random disturbances and model changes are added during exploration of the controlled object or its mechanism model, and the action network and the estimation network are updated simultaneously to obtain a control law.
In addition to the above disadvantages, most existing privacy-protection methods consider only two metrics, task accuracy and privacy, and therefore cannot be directly applied to an end-cloud framework with limited mobile-side resources.
Disclosure of Invention
In view of the defects in the prior art, the object of the invention is to provide a reinforcement-learning-based end-cloud framework neural network generation method and system.
The reinforcement-learning-based end-cloud framework neural network generation method provided by the invention comprises the following steps:
step 1: represent the initial neural network structure as a vector, which serves as the state space for reinforcement learning;
step 2: input the vector into bidirectional long short-term memory (LSTM) networks to obtain a cutting action and a compression action;
step 3: update the structure of the initial neural network according to the cutting action and the compression action, train the new neural network, and calculate its accuracy, privacy, and resource consumption;
step 4: calculate the reward from the accuracy, privacy, and resource consumption, and update the LSTMs until convergence;
step 5: deploy the neural network obtained after convergence to the mobile phone and the cloud for the user to perform neural network inference.
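The five steps above amount to a single search loop. The sketch below is a hypothetical stand-in (dummy metrics and a random controller in place of the bidirectional LSTMs) meant only to show the control flow, not the patent's actual networks or reward formula:

```python
import random

def encode_network(layers):
    """Step 1: represent each layer as a five-dimensional vector."""
    return [list(layer) for layer in layers]

def sample_actions(state, rng):
    """Step 2: a placeholder controller picks a cut point and per-layer compression."""
    cut = rng.randrange(1, len(state))                   # layer index at which to split
    compress = [rng.randrange(6) for _ in state[:cut]]   # 6 compression choices per layer
    return cut, compress

def evaluate(cut, compress):
    """Step 3: stand-in for training the new network; returns dummy A, P, S values."""
    accuracy = 0.9 - 0.01 * len(compress)
    privacy_leak = 0.5 - 0.02 * cut
    resource = 0.1 * cut
    return accuracy, privacy_leak, resource

def reward(accuracy, privacy_leak, resource, a_base=0.9):
    """Step 4: combine the three metrics (the patent's exact formula is not shown here)."""
    return accuracy / a_base - privacy_leak - resource

rng = random.Random(0)
layers = [[1, 3, 1, 64, 1], [2, 2, 2, 0, 0], [8, 1, 1, 0, 0]]
state = encode_network(layers)
best = max(reward(*evaluate(*sample_actions(state, rng))) for _ in range(20))
```

In the real method, `sample_actions` would be replaced by the two LSTM heads of step 2 and `evaluate` by actual training and attack runs.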
Preferably, the step 1 comprises:
the initial neural network structure is represented as a vector in which each layer of the neural network is a five-dimensional vector <l, k, s, p, n>, where l is the layer type, k is the convolution kernel size, s is the stride of the convolutional layer, p is the padding size, and n is the output dimension of the layer.
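As a minimal illustration, the five-tuple for one layer might be built as follows (the type codes follow the embodiment given later; the helper names themselves are hypothetical):

```python
from collections import namedtuple

# <l, k, s, p, n>: layer type, kernel size, stride, padding, output dimension
LayerVec = namedtuple("LayerVec", ["l", "k", "s", "p", "n"])

LAYER_TYPES = {"conv": 1, "pool": 2, "fc": 8}  # type codes used in the embodiment

def encode_conv(kernel, stride, padding, out_dim):
    """Encode a convolutional layer as its five-dimensional vector."""
    return LayerVec(LAYER_TYPES["conv"], kernel, stride, padding, out_dim)

vec = encode_conv(kernel=3, stride=1, padding=1, out_dim=64)
```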
Preferably, the step 2 comprises:
step 2.1: input the vector into a first bidirectional LSTM network to obtain the cutting action, i.e., the layer at which the neural network is cut;
step 2.2: input the vector into a second bidirectional LSTM network to obtain the compression actions, i.e., the compression method to be applied to each layer.
Preferably, the step 3 comprises:
step 3.1: divide the neural network into two parts according to the cutting action: the first part runs on the mobile phone and processes the raw data to obtain the intermediate-layer features, which are uploaded to the cloud; the second part runs in the cloud and takes the intermediate-layer features as input for the subsequent operations;
step 3.2: compress the part of the neural network that runs on the mobile phone according to the compression action to obtain a new neural network structure;
step 3.3: train the newly generated neural network structure and calculate its accuracy on the original task;
step 3.4: measure the privacy of the intermediate-layer features using a neural network as the attacker, whose input is the intermediate-layer features and whose labels are the raw data or the private attributes in the data. For attribute-inference attacks, privacy is measured by the accuracy of inferring the private attribute: the higher the accuracy, the lower the privacy. For data-reconstruction attacks, privacy is measured by the similarity between the reconstructed data and the original data: the more similar the data, the lower the privacy;
step 3.5: calculate the resource consumption of the part of the neural network running on the mobile phone, measured as its parameter count, its multiply-accumulate operations (MACs), or its time delay, where the time delay comprises the time consumed running on the mobile phone, the time required to upload the intermediate-layer features to the cloud, and the time consumed running in the cloud.
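The three resource measures of step 3.5 can be computed with standard per-layer accounting. The formulas below are the conventional conv-layer cost formulas, not expressions taken from the patent itself:

```python
def conv_params(c_in, c_out, k):
    """Parameter count of a k x k convolution: weights plus biases."""
    return c_out * (c_in * k * k + 1)

def conv_macs(c_in, c_out, k, h_out, w_out):
    """Multiply-accumulate operations for one conv layer on an h_out x w_out output map."""
    return c_out * c_in * k * k * h_out * w_out

def total_latency(t_device, t_upload, t_cloud):
    """End-to-end delay: on-phone time + feature upload time + cloud time."""
    return t_device + t_upload + t_cloud

p = conv_params(3, 64, 3)           # first VGG11 conv layer
m = conv_macs(3, 64, 3, 32, 32)     # assuming a 32 x 32 output feature map
t = total_latency(0.012, 0.005, 0.003)
```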
Preferably, the step 4 comprises:
step 4.1: calculate a reward R from the accuracy A, the privacy P, and the resource consumption S, where A_base is the accuracy of the initial neural network. For attribute-inference attacks, P is the accuracy of inferring the private attribute; for data-reconstruction attacks, P is the similarity between the reconstructed data and the original data. For the parameter count, the parameter count of the initial model is S_base and the parameter count of the on-phone part of the new neural network generated by the cutting and compression actions is S_1. For multiply-accumulate operations (MACs), the MAC count of the initial model is M_base and the MAC count of the neural network running on the mobile phone is M_1. For the time delay, the time for the initial model to run entirely on the mobile phone is T_base, the time consumed on the mobile phone by the new neural network generated by the cutting and compression actions is T_e, the time required to upload the intermediate-layer features to the cloud is T_t, and the time consumed running in the cloud is T_c.
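The patent's exact reward formula appears only in its figures. A hedged sketch of one plausible form, normalizing each term by the initial model's value as the definitions above suggest (the trade-off weights `lam` and `mu` are hypothetical), is:

```python
def reward(a, p, s, a_base, s_base, lam=1.0, mu=1.0):
    """One plausible R(A, P, S): reward relative accuracy, penalize attack
    success and relative resource cost. NOT the patent's verbatim formula."""
    acc_term = a / a_base       # relative task accuracy
    priv_term = p               # attack accuracy or reconstruction similarity
    res_term = s / s_base       # relative params, MACs, or latency
    return acc_term - lam * priv_term - mu * res_term

# illustrative numbers: a compressed model with ~44% of the original parameters
r = reward(a=0.88, p=0.55, s=4.0e6, a_base=0.90, s_base=9.0e6)
```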
Step 4.2: the LSTM parameters are updated using a policy gradient algorithm until LSTM converges.
The invention also provides a reinforcement-learning-based end-cloud framework neural network generation system, comprising:
module M1: represent the initial neural network structure as a vector, which serves as the state space for reinforcement learning;
module M2: input the vector into bidirectional long short-term memory (LSTM) networks to obtain a cutting action and a compression action;
module M3: update the structure of the initial neural network according to the cutting action and the compression action, train the new neural network, and calculate its accuracy, privacy, and resource consumption;
module M4: calculate the reward from the accuracy, privacy, and resource consumption, and update the LSTMs until convergence;
module M5: deploy the neural network obtained after convergence to the mobile phone and the cloud for the user to perform neural network inference.
Preferably, the module M1 includes:
the initial neural network structure is represented as a vector in which each layer of the neural network is a five-dimensional vector <l, k, s, p, n>, where l is the layer type, k is the convolution kernel size, s is the stride of the convolutional layer, p is the padding size, and n is the output dimension of the layer.
Preferably, the module M2 includes:
module M2.1: input the vector into a first bidirectional LSTM network to obtain the cutting action, i.e., the layer at which the neural network is cut;
module M2.2: input the vector into a second bidirectional LSTM network to obtain the compression actions, i.e., the compression method to be applied to each layer.
Preferably, the module M3 includes:
module M3.1: divide the neural network into two parts according to the cutting action: the first part runs on the mobile phone and processes the raw data to obtain the intermediate-layer features, which are uploaded to the cloud; the second part runs in the cloud and takes the intermediate-layer features as input for the subsequent operations;
module M3.2: compress the part of the neural network that runs on the mobile phone according to the compression action to obtain a new neural network structure;
module M3.3: train the newly generated neural network structure and calculate its accuracy on the original task;
module M3.4: measure the privacy of the intermediate-layer features using a neural network as the attacker, whose input is the intermediate-layer features and whose labels are the raw data or the private attributes in the data. For attribute-inference attacks, privacy is measured by the accuracy of inferring the private attribute: the higher the accuracy, the lower the privacy. For data-reconstruction attacks, privacy is measured by the similarity between the reconstructed data and the original data: the more similar the data, the lower the privacy;
module M3.5: calculate the resource consumption of the part of the neural network running on the mobile phone, measured as its parameter count, its multiply-accumulate operations (MACs), or its time delay, where the time delay comprises the time consumed running on the mobile phone, the time required to upload the intermediate-layer features to the cloud, and the time consumed running in the cloud.
Preferably, the module M4 includes:
module M4.1: calculate a reward R from the accuracy A, the privacy P, and the resource consumption S, where A_base is the accuracy of the initial neural network. For attribute-inference attacks, P is the accuracy of inferring the private attribute; for data-reconstruction attacks, P is the similarity between the reconstructed data and the original data. For the parameter count, the parameter count of the initial model is S_base and the parameter count of the on-phone part of the new neural network generated by the cutting and compression actions is S_1. For multiply-accumulate operations (MACs), the MAC count of the initial model is M_base and the MAC count of the neural network running on the mobile phone is M_1. For the time delay, the time for the initial model to run entirely on the mobile phone is T_base, the time consumed on the mobile phone by the new neural network generated by the cutting and compression actions is T_e, the time required to upload the intermediate-layer features to the cloud is T_t, and the time consumed running in the cloud is T_c.
Module M4.2: the LSTM parameters are updated using a policy gradient algorithm until LSTM converges.
Compared with the prior art, the invention has the following beneficial effects:
(1) the neural network structure generated by the method simultaneously achieves high task accuracy, high privacy of the intermediate-layer features, and low consumption of mobile-side resources;
(2) the algorithm of the invention needs only the original neural network structure as input to obtain the optimal neural network structure for deployment under the end-cloud framework, without manual design of the network structure;
(3) the algorithm of the invention has good transferability: it can transfer between different datasets and different initial neural networks without training from scratch, with low energy consumption.
Detailed Description
The present invention will be described in detail with reference to specific examples. The following examples will assist those skilled in the art in further understanding the invention, but are not intended to limit the invention in any way. It should be noted that various changes and modifications can be made by those skilled in the art without departing from the spirit of the invention; all such variants fall within the scope of the present invention.
Example:
Taking the convolutional neural network VGG11 as an example, the invention provides a reinforcement-learning-based end-cloud framework neural network generation method, which involves obtaining the state representation for reinforcement learning from the original neural network structure, obtaining the actions for the corresponding state, calculating the reward according to the actions, and then updating the reinforcement-learning controller until convergence.
specifically, as shown in fig. 1, the method comprises the following steps:
step S1: represent the initial neural network structure as a vector serving as the state space for reinforcement learning;
step S2: input the vector into the bidirectional LSTMs to obtain a cutting action and a compression action;
step S3: change the structure of the initial neural network according to the cutting action and the compression action, train the new neural network, and calculate its accuracy, privacy, and resource consumption;
step S4: calculate the reward from the accuracy, privacy, and resource consumption, and update the LSTMs until convergence;
step S5: deploy the neural network obtained after convergence to the mobile phone and the cloud for the user to perform neural network inference.
The step S1 includes:
step S101: the initial neural network structure is represented as a vector in which each layer is a five-dimensional vector <l, k, s, p, n>, where l is the layer type, k is the convolution kernel size, s is the stride of the convolutional layer, p is the padding size, and n is the output dimension of the layer. For the VGG11 structure shown in fig. 2, the activation layers are removed, and the type codes of the different layers are set as [convolutional layer: 1, pooling layer: 2, fully connected layer: 8]. The vector corresponding to VGG11 is then represented as:
[[1,3,1,64,1],
[2,2,2,0,0],
[1,3,1,128,1],
[2,2,2,0,0],
[1,3,1,256,1],
[1,3,1,256,1],
[2,2,2,0,0],
[1,3,1,512,1],
[1,3,1,512,1],
[2,2,2,0,0],
[1,3,1,512,1],
[1,3,1,512,1],
[2,2,2,0,0],
[8,1,1,0,0]]
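The table above can be rebuilt programmatically. This sketch simply reproduces the printed rows (note that the printed conv rows appear to place the channel count in the fourth position):

```python
CONV, POOL, FC = 1, 2, 8  # type codes from the embodiment

def conv(channels):
    """A 3x3 conv row as printed above: [type, kernel, stride, channels, padding]."""
    return [CONV, 3, 1, channels, 1]

POOL_VEC = [POOL, 2, 2, 0, 0]

vgg11 = [
    conv(64), POOL_VEC,
    conv(128), POOL_VEC,
    conv(256), conv(256), POOL_VEC,
    conv(512), conv(512), POOL_VEC,
    conv(512), conv(512), POOL_VEC,
    [FC, 1, 1, 0, 0],
]
```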
the step S2 includes:
step S201: input the vector into the first bidirectional LSTM network to obtain the cutting action, i.e., the layer at which the neural network is cut;
step S202: input the vector into the second bidirectional LSTM network to obtain the compression actions, i.e., the compression method to be applied to each layer;
the cut LSTM configuration is shown in fig. 3 and the compressed LSTM configuration is shown in fig. 4. HiIs a hidden state corresponding to the ith layer, apRepresents a cutting layer, ac;iRepresents the compression algorithm corresponding to the ith layer, and the compression algorithm has 6 possible choices including MobileNet, MobileNet V2, SqueezeNet,Prung, FilterPrung and uncompressed.
The step S3 includes:
step S301: the neural network is divided into two parts according to the cutting action, as shown in fig. 5. The first part runs on the mobile phone and processes the raw data to obtain the intermediate-layer features, which are then uploaded to the cloud. The second part runs in the cloud and takes the intermediate-layer features as input for the subsequent operations.
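The split itself is just a slice of the layer sequence at the cut point (layer names here are illustrative):

```python
def split_network(layers, cut):
    """Return (device_part, cloud_part); layers[:cut] stays on the phone."""
    if not 1 <= cut < len(layers):
        raise ValueError("cut must leave at least one layer on each side")
    return layers[:cut], layers[cut:]

layers = ["conv1", "pool1", "conv2", "pool2", "fc"]
device_part, cloud_part = split_network(layers, cut=2)
```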
Step S302: according to the compression action, the corresponding compression algorithm is applied to the model running on the mobile phone to obtain a new neural network structure.
Step S303: the newly generated neural network is trained, and its accuracy on the original task is calculated.
Step S304: a neural network is designed as the attacker to measure the privacy of the intermediate-layer features; its input is the intermediate-layer features and its labels are the raw data or the private attributes in the data. For attribute-inference attacks, privacy is measured by the accuracy of inferring the private attribute: the higher the accuracy, the lower the privacy. For data-reconstruction attacks, privacy is measured by the similarity between the reconstructed data and the original data: the more similar the data, the lower the privacy.
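For the attribute-inference case, the privacy measurement reduces to the attacker's classification accuracy. A minimal sketch with made-up labels:

```python
def attack_accuracy(predicted, actual):
    """Fraction of private attributes the attacker inferred correctly;
    higher accuracy means lower privacy of the intermediate features."""
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)

# illustrative binary private attribute for 8 samples (not real data)
actual    = [1, 0, 1, 1, 0, 0, 1, 0]
predicted = [1, 0, 0, 1, 0, 1, 1, 0]
P = attack_accuracy(predicted, actual)   # 6 of 8 correct
```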
Step S305: the resource consumption of the model running on the mobile phone is calculated; it has three choices, namely the parameter count of the mobile-side neural network, the multiply-accumulate operations (MACs), and the time delay, where the time delay comprises the time consumed running on the mobile phone, the time required to upload the intermediate-layer features to the cloud, and the time consumed running in the cloud.
The step S4 includes:
step S401: a reward R is calculated from the accuracy A, the privacy P, and the resource consumption S, where A_base is the accuracy of the initial neural network. For attribute-inference attacks, P is the accuracy of inferring the private attribute; for data-reconstruction attacks, P is the similarity between the reconstructed data and the original data. S has three choices: for the parameter count, assume the parameter count of the initial model is S_base and the parameter count of the on-phone part of the new neural network generated by the cutting and compression actions is S_1; for multiply-accumulate operations (MACs), assume the MAC count of the initial model is M_base and the MAC count of the neural network running on the mobile phone is M_1; for the time delay, assume the time for the initial model to run entirely on the mobile phone is T_base, the time consumed on the mobile phone by the new neural network generated by the cutting and compression actions is T_e, the time required to upload the intermediate-layer features to the cloud is T_t, and the time consumed running in the cloud is T_c.
Step S402: the LSTM parameters are updated using a policy gradient algorithm, and steps S1 to S4 are repeated until the LSTMs converge.
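Step S402's update direction can be illustrated with a minimal REINFORCE-style step on a single categorical head, with bare logits standing in for the bidirectional LSTM controller (a sketch, not the patent's full algorithm):

```python
import math

def softmax(logits):
    """Numerically stable softmax."""
    m = max(logits)
    e = [math.exp(x - m) for x in logits]
    z = sum(e)
    return [x / z for x in e]

def reinforce_step(logits, action, reward, baseline, lr=0.1):
    """One policy-gradient ascent step: the gradient of log pi(action)
    with respect to the logits is (one_hot(action) - probs)."""
    probs = softmax(logits)
    adv = reward - baseline
    return [
        logit + lr * adv * ((1.0 if i == action else 0.0) - p)
        for i, (logit, p) in enumerate(zip(logits, probs))
    ]

logits = [0.0, 0.0, 0.0]                 # uniform initial policy over 3 actions
new_logits = reinforce_step(logits, action=1, reward=1.0, baseline=0.0)
new_probs = softmax(new_logits)          # probability of action 1 increases
```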
The invention also provides a reinforcement-learning-based end-cloud framework neural network generation system, comprising: module M1: represent the initial neural network structure as a vector, which serves as the state space for reinforcement learning; module M2: input the vector into bidirectional long short-term memory (LSTM) networks to obtain a cutting action and a compression action; module M3: update the structure of the initial neural network according to the cutting action and the compression action, train the new neural network, and calculate its accuracy, privacy, and resource consumption; module M4: calculate the reward from the accuracy, privacy, and resource consumption, and update the LSTMs until convergence; module M5: deploy the neural network obtained after convergence to the mobile phone and the cloud for the user to perform neural network inference.
The module M1 includes: the initial neural network structure is represented as a vector in which each layer of the neural network is a five-dimensional vector <l, k, s, p, n>, where l is the layer type, k is the convolution kernel size, s is the stride of the convolutional layer, p is the padding size, and n is the output dimension of the layer.
The module M2 includes: module M2.1: input the vector into a first bidirectional LSTM network to obtain the cutting action, i.e., the layer at which the neural network is cut; module M2.2: input the vector into a second bidirectional LSTM network to obtain the compression actions, i.e., the compression method to be applied to each layer.
The module M3 includes: module M3.1: divide the neural network into two parts according to the cutting action: the first part runs on the mobile phone and processes the raw data to obtain the intermediate-layer features, which are uploaded to the cloud; the second part runs in the cloud and takes the intermediate-layer features as input for the subsequent operations; module M3.2: compress the part of the neural network that runs on the mobile phone according to the compression action to obtain a new neural network structure; module M3.3: train the newly generated neural network structure and calculate its accuracy on the original task; module M3.4: measure the privacy of the intermediate-layer features using a neural network as the attacker, whose input is the intermediate-layer features and whose labels are the raw data or the private attributes in the data. For attribute-inference attacks, privacy is measured by the accuracy of inferring the private attribute: the higher the accuracy, the lower the privacy. For data-reconstruction attacks, privacy is measured by the similarity between the reconstructed data and the original data: the more similar the data, the lower the privacy; module M3.5: calculate the resource consumption of the part of the neural network running on the mobile phone, measured as its parameter count, its multiply-accumulate operations (MACs), or its time delay, where the time delay comprises the time consumed running on the mobile phone, the time required to upload the intermediate-layer features to the cloud, and the time consumed running in the cloud.
The module M4 includes: module M4.1: calculate a reward R from the accuracy A, the privacy P, and the resource consumption S, where A_base is the accuracy of the initial neural network. For attribute-inference attacks, P is the accuracy of inferring the private attribute; for data-reconstruction attacks, P is the similarity between the reconstructed data and the original data. For the parameter count, the parameter count of the initial model is S_base and the parameter count of the on-phone part of the new neural network generated by the cutting and compression actions is S_1. For multiply-accumulate operations (MACs), the MAC count of the initial model is M_base and the MAC count of the neural network running on the mobile phone is M_1. For the time delay, the time for the initial model to run entirely on the mobile phone is T_base, the time consumed on the mobile phone by the new neural network generated by the cutting and compression actions is T_e, the time required to upload the intermediate-layer features to the cloud is T_t, and the time consumed running in the cloud is T_c.
Module M4.2: the LSTM parameters are updated using a policy gradient algorithm until LSTM converges.
Those skilled in the art will appreciate that, in addition to implementing the systems, apparatus, and various modules thereof provided by the present invention in purely computer readable program code, the same procedures can be implemented entirely by logically programming method steps such that the systems, apparatus, and various modules thereof are provided in the form of logic gates, switches, application specific integrated circuits, programmable logic controllers, embedded microcontrollers and the like. Therefore, the system, the device and the modules thereof provided by the present invention can be considered as a hardware component, and the modules included in the system, the device and the modules thereof for implementing various programs can also be considered as structures in the hardware component; modules for performing various functions may also be considered to be both software programs for performing the methods and structures within hardware components.
The foregoing description of specific embodiments of the present invention has been presented. It is to be understood that the present invention is not limited to the specific embodiments described above, and that various changes or modifications may be made by one skilled in the art within the scope of the appended claims without departing from the spirit of the invention. The embodiments and features of the embodiments of the present application may be combined with each other arbitrarily without conflict.