WO2022104799A1 - Training method, training apparatus, and storage medium - Google Patents
- Publication number: WO2022104799A1 (application PCT/CN2020/130896)
- Authority: WO (WIPO, PCT)
- Prior art keywords: model, training, compression, node, mode
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
Definitions
- the present disclosure relates to the field of wireless communication technologies, and in particular, to a training method, a training device and a storage medium.
- the communication network has the characteristics of ultra-high speed, ultra-low latency, ultra-high reliability, and ultra-multiple connections.
- artificial intelligence is introduced to improve the resource utilization of communication networks, the terminal service experience, and the automation and intelligent control and management of communication networks, and models obtained through artificial-intelligence deep learning can have better performance.
- however, the high storage space and computing resource consumption of such models makes it difficult to apply them effectively on various hardware platforms; moreover, the communication overhead is large, the precision is limited, and the security is low.
- the present disclosure provides a training method, a training device and a storage medium.
- a training method applied to a first node, the method includes:
- in response to receiving a model training request, a first training model is trained, wherein the model training request includes model compression parameters; based on the first training model and the model compression parameters, a first compression model of the first training model is obtained.
- the model compression parameters include a plurality of model compression options
- the obtaining the first compression model of the first training model based on the first training model and the model compression parameters includes:
- the determining the first loss function according to the output of the first training model, the output of the second compression model, and the sample parameter set used to train the first training model includes:
- a first cross entropy between the output of the second compression model and the sample parameter set is determined, and a first relative entropy divergence between the output of the second compression model and the output of the first training model is determined; the first loss function is determined using the first cross entropy and the first relative entropy divergence.
- the method further includes:
- a second loss function for updating parameters of the first training model is determined according to the output of the first training model, the output of the second compression model, and the sample parameter set used to train the first training model.
- the determining, according to the output of the first training model, the output of the second compression model, and the sample parameter set used for training the first training model, of the second loss function for updating the parameters of the first training model includes:
- a second cross entropy between the output of the first training model and the sample parameter set is determined, and a second relative entropy divergence between the output of the first training model and the output of the second compression model is determined; the second loss function is determined using the second cross entropy and the second relative entropy divergence.
- the model compression parameters include a model training mode, and the model training mode includes a single training node mode for training a single first training model and a multi-training node mode for training a plurality of first training models;
- the number of the first training models is determined based on the model training mode.
- the method further includes:
- a second indication message is sent, where the second indication message includes the number of first compression models corresponding to the model training mode.
- the method further includes:
- a third indication message is received, where the third indication message includes an indication of determining the training model.
- the model training mode includes a multi-training node mode
- the method further includes:
- a fourth indication message is received, where the fourth indication message is used to indicate a third compression model, and the third compression model is a compression model obtained by performing federated averaging on the first compression models based on the number of the first compression models; based on the third compression model, the model compression parameters are re-determined, and the first compression model is updated based on the re-determined model compression parameters.
- the method further includes:
- a fifth indication message is received, where the fifth indication message is used to indicate the end of training the first compression model.
- a training method applied to a second node comprising:
- the model training request includes model compression parameters, and the model compression parameters are used to compress a first training model to obtain a first compression model, and the first training model is obtained by training based on the model training request.
- the model compression parameters include a model training mode, and the model training mode includes a single training node mode for training a single first training model and a multi-training node mode for training a plurality of first training models;
- the number of the first training models is determined based on the model training mode.
- the method further includes:
- a second indication message is received, where the second indication message includes the number of first compression models corresponding to the model training mode.
- the method further includes:
- a third indication message is sent, where the third indication message includes an indication of determining the training model.
- the model training mode includes a multi-training node mode
- the method further includes:
- a fourth indication message is sent; the fourth indication message is used to indicate a third compression model, where the third compression model is a compression model obtained by performing federated averaging on the first compression models based on the number of the first compression models.
- the method further includes:
- a fifth indication message is sent, where the fifth indication message is used to indicate the end of training the first compression model.
- the method further includes:
- a subscription requirement is received, and a model training request is sent based on the subscription requirement.
- a training apparatus applied to a first node comprising:
- a model training and compression module configured to train a first training model in response to receiving a model training request, wherein the model training request includes model compression parameters; and to obtain a first compression model of the first training model based on the first training model and the model compression parameters.
- the model compression parameters include a plurality of model compression options
- the model training and compression module is configured to determine a first model compression option from among the multiple model compression options, and compress the first training model based on the first model compression option to obtain a second compression model; determine a first loss function according to the output of the first training model, the output of the second compression model, and the sample parameter set used to train the first training model; and update the parameters of the second compression model based on the first loss function to obtain the first compression model.
- the apparatus further includes a data processing and storage module
- the data processing and storage module is used to determine the first cross entropy between the output of the second compression model and the sample parameter set, to determine the first relative entropy divergence between the output of the second compression model and the output of the first training model, and to determine the first loss function based on the first cross entropy and the first relative entropy divergence.
- the data processing and storage module is further configured to determine, according to the output of the first training model, the output of the second compression model, and the sample parameter set used for training the first training model, a second loss function for updating the parameters of the first training model.
- the data processing and storage module is further configured to determine a second cross entropy between the output of the first training model and the sample parameter set, and to determine a second relative entropy divergence between the output of the first training model and the output of the second compression model; the second loss function is determined based on the second cross entropy and the second relative entropy divergence.
- the model compression parameters include a model training mode, the model training mode includes a single training node mode for training a single first training model and a multi-training node mode for training a plurality of first training models, and the number of the first training models is determined based on the model training mode.
- the apparatus further includes a first network communication module
- the first network communication module is configured to send a second indication message, where the second indication message includes the number of first compression models corresponding to the model training mode.
- the first network communication module is further configured to receive a third indication message, where the third indication message includes an indication of determining a training model.
- the first network communication module is further configured to receive a fourth indication message; the fourth indication message is used to indicate a third compression model, and the third compression model is a compression model obtained by performing federated averaging on the first compression models based on the number of the first compression models; based on the third compression model, the model compression parameters are re-determined, and the first compression model is updated based on the re-determined model compression parameters.
- the first network communication module is further configured to receive a fifth indication message, where the fifth indication message is used to indicate the end of training the first compression model.
- a training apparatus applied to a second node comprising:
- the second network communication module is used to send a model training request, wherein the model training request includes model compression parameters, the model compression parameters are used to compress a first training model to obtain a first compression model, and the first training model is obtained by training based on the model training request.
- the model compression parameters include a model training mode, and the model training mode includes a single training node mode for training a single first training model and a multi-training node mode for training a plurality of first training models;
- the number of the first training models is determined based on the model training mode.
- the second network communication module is further configured to receive a second indication message, where the second indication message includes the number of first compression models corresponding to the model training mode.
- the second network communication module is further configured to send a third indication message, where the third indication message includes an indication of determining the training model.
- the model training mode includes a multi-training node mode
- the second network communication module is further configured to send a fourth indication message; the fourth indication message is used to indicate the third compression model, and the third compression model is a compression model obtained by performing federated averaging on the first compression models based on the number of the first compression models.
- the second network communication module is further configured to send a fifth indication message, where the fifth indication message is used to indicate the end of training the first compression model.
- the apparatus further includes a service management module
- the service management module is configured to receive subscription requirements and send a model training request based on the subscription requirements.
- a training device comprising:
- a processor configured to execute the training method described in the first aspect or any implementation manner of the first aspect, or to execute the training method described in the second aspect or any implementation manner of the second aspect.
- a non-transitory computer-readable storage medium; when instructions in the storage medium are executed by a processor of a mobile terminal, the mobile terminal is enabled to perform the training method described in the first aspect or any implementation manner of the first aspect, or the training method described in the second aspect or any implementation manner of the second aspect.
- the technical solutions provided by the embodiments of the present disclosure may include the following beneficial effects: in the present disclosure, the trained model is compressed, and the parameters of the compressed model are updated, so that the compressed model can achieve the same effect as the training model, thereby reducing the signaling overhead of transmitting the model, ensuring the accuracy and reliability of the model, and further ensuring the security of user information.
- FIG. 1 is a schematic diagram of a system architecture of a training method provided by the present disclosure.
- Fig. 2 is a flowchart of a training method according to an exemplary embodiment.
- Fig. 3 is a flowchart of another training method according to an exemplary embodiment.
- Fig. 4 is a flowchart of yet another training method according to an exemplary embodiment.
- Fig. 5 is a flowchart of yet another training method according to an exemplary embodiment.
- FIG. 6 is a flowchart of an implementation manner of determining a first compression model in a single training node mode in a training method provided by the present disclosure.
- FIG. 7 is a flowchart of an implementation manner of determining a first compression model in a multi-training node mode in a training method provided by the present disclosure.
- FIG. 8 is a schematic diagram of the protocol and interface of the model training and compression decision part in a training method provided by the present disclosure.
- FIG. 9 is a schematic diagram of the protocol and interface of the model training and compression part in a single training node mode in a training method provided by the present disclosure.
- FIG. 10 is a schematic diagram of a protocol and an interface of a model training and compression part in a multi-training node mode in a training method provided by the present disclosure.
- FIG. 11 is a schematic diagram of a protocol and interface of a wireless data transmission part in a training method provided by the present disclosure.
- Fig. 12 is a block diagram of a training apparatus according to an exemplary embodiment.
- Fig. 13 is a block diagram of another training apparatus according to an exemplary embodiment.
- Fig. 14 is a block diagram of an apparatus for training according to an exemplary embodiment.
- Fig. 15 is a block diagram of another apparatus for training according to an exemplary embodiment.
- the communication network has the characteristics of ultra-high speed, ultra-low latency, ultra-high reliability, and ultra-multiple connections.
- the implementation process of using a deep learning algorithm when training the model includes: the model request node determines the model structure and the model training mode according to the model/analysis subscription requirements, wherein the model training mode includes a single training node mode and a multi-training node mode.
- the model request node sends the model structure and model training mode to the model training node, and the model training node independently conducts model training according to the model training mode or participates in the collaborative model training of multiple training nodes.
- the model training node sends the model to the model request node, and in the multi-training node mode the model request node performs federated averaging of the models sent by the model training nodes to obtain a global model.
- the model request node checks whether the obtained model meets the model/analysis subscription requirements, and if so, the model request node sends the obtained model to the model/analysis party. If not, repeat the above model training process until the model obtained by the model request node meets the model/analysis subscription requirements.
- the data volume of the model is relatively large, especially in the multi-training node mode, the model needs to perform multiple transmissions between the model training node and the model requesting node, which greatly increases the communication overhead.
- the present disclosure provides a training method to solve the problems of high communication overhead, insufficient model accuracy, and the security of terminal private data.
- the training method provided by the present disclosure determines the model structure and model training mode according to network service requirements (such as model subscription requirements), fully considers factors such as the locally available computing power, communication conditions, and training sample characteristics of the model training node, and formulates multiple model compression options to reduce unnecessary communication overhead, improve wireless network resource utilization, and apply deep learning to network intelligence in a more efficient and secure way.
- FIG. 1 is a schematic diagram of a system architecture of a training method provided by the present disclosure.
- the system includes a core network part and a radio access network part.
- the terminal (user) accesses the base station through a wireless channel; base stations are connected through the Xn interface; the base station accesses the User Plane Function (UPF) network element of the core network through the N3 interface; and the UPF network element accesses the Session Management Function (SMF) network element through the N4 interface.
- the SMF network element is connected to the bus structure of the core network and communicates with other Network Functions (NFs) of the core network.
- the communication system between the network device and the terminal shown in FIG. 1 is only a schematic illustration; the wireless communication system may also include other network devices, for example, a wireless relay device and a wireless backhaul device, which are not shown in FIG. 1.
- the embodiments of the present disclosure do not limit the number of network devices and the number of terminals included in the wireless communication system.
- the wireless communication system is a network that provides a wireless communication function.
- Wireless communication systems can use different communication technologies, such as code division multiple access (CDMA), wideband code division multiple access (WCDMA), time division multiple access (TDMA), frequency division multiple access (FDMA), orthogonal frequency-division multiple access (OFDMA), single-carrier frequency division multiple access (SC-FDMA), and carrier sense multiple access with collision avoidance (CSMA/CA).
- Networks can be divided into 2G (generation) networks, 3G networks, 4G networks, and future evolved networks such as 5G networks; a 5G network may also be called a New Radio (NR) network.
- the present disclosure will sometimes refer to a wireless communication network simply as a network.
- the wireless access network equipment may be: a base station, an evolved NodeB (eNB), a home base station, an access point (AP) in a wireless fidelity (WiFi) system, a wireless relay node, a wireless backhaul node, a transmission point (TP), or a transmission and reception point (TRP), etc.; it may also be a gNB in an NR system, or a component or part of a device that constitutes a base station.
- the network device may also be an in-vehicle device. It should be understood that, in the embodiments of the present disclosure, the specific technology and specific device form adopted by the network device are not limited.
- the terminal involved in the present disclosure may also be referred to as terminal equipment, user equipment (User Equipment, UE), mobile station (Mobile Station, MS), mobile terminal (Mobile Terminal, MT), etc.
- a device that provides voice and/or data connectivity; for example, a terminal may be a handheld device with wireless connectivity, a vehicle-mounted device, or the like.
- some examples of terminals are: a smartphone (mobile phone), a pocket personal computer (PPC), a personal digital assistant (PDA), a notebook computer, a tablet computer, a wearable device, or a vehicle-mounted device, etc.
- the terminal device may also be an in-vehicle device. It should be understood that the embodiments of the present disclosure do not limit the specific technology and specific device form adopted by the terminal.
- Fig. 2 is a flow chart of a training method according to an exemplary embodiment. As shown in Figure 2, the training method is used in the first node and includes the following steps.
- in step S11, in response to receiving a model training request, a first training model is trained.
- the first node is a model training node; for convenience of description, the model training node is referred to as the first node in this disclosure. The second node is a model request node; likewise, the model request node is referred to as the second node.
- the model training request includes model compression parameters.
- the model compression parameters include at least one of the following:
- a model training structure, multiple model compression options, and a model training mode.
- the model compression option is determined based on the model subscription requirement received by the second node (eg, the model requesting node).
- the second node determines to send the model training request according to the received model subscription requirement.
- the first node (for example, the model training node) sends third indication information in response to the model training request, indicating that it determines to train the model.
- the first training model is trained based on the local sample parameter set and the model training structure, and the relevant parameters required for model compression are determined.
- the response information to the model training request sent by the first node further includes one or more of the local computing capability of the first node, communication conditions, and characteristics of the training sample parameter set.
- in step S12, a first compression model of the first training model is obtained based on the first training model and the model compression parameters.
- the first node compresses the first training model based on the model compression option in the model compression parameters and the relevant parameters required for model compression.
- the relevant parameters required for model compression are determined by the first node based on the model compression parameters sent by the second node and parameters such as the first node's local computing capability; the model compression options include the model accuracy and the model parameter data volume.
- the model compression parameters include multiple model compression options, and the multiple model compression options are determined by the second node based on one or more of local computing capabilities, communication conditions, and training sample parameter set characteristics reported by multiple first nodes.
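- as a rough illustration only, the following hypothetical Python structure sketches what a model training request carrying such compression parameters might look like; all field names are assumptions, since the present disclosure does not define a concrete encoding:

```python
# Hypothetical sketch of a model training request; field names are assumptions.
from dataclasses import dataclass, field
from enum import Enum
from typing import List

class TrainingMode(Enum):
    SINGLE_NODE = "single"  # one first node trains a single first training model
    MULTI_NODE = "multi"    # several first nodes each train a first training model

@dataclass
class CompressionOption:
    model_accuracy: float       # accuracy to be retained after compression
    parameter_data_volume: int  # target size of the compressed model parameters

@dataclass
class ModelTrainingRequest:
    model_structure: str        # description of the model training structure
    training_mode: TrainingMode
    compression_options: List[CompressionOption] = field(default_factory=list)
```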
- Fig. 3 is a flowchart of a training method according to an exemplary embodiment. As shown in FIG. 3 , based on the first training model and the model compression parameters, a compression model of the first training model is obtained, including the following steps.
- in step S21, a first model compression option is determined among the multiple model compression options, and the first training model is compressed based on the first model compression option to obtain a second compression model.
- the first node determines, from among the multiple model compression options, a first model compression option for model compression according to one or more of its local computing capability, communication conditions, and training sample characteristics.
- the first training model is compressed according to the requirement on the model parameter data volume in the model compression option to obtain the second compression model, and the symbol θ_S is used to identify the second compression model.
- the following implementations may be used to compress the first training model by using the matrix g and the first model compression option:
- the first node takes the amount of model parameter data as a constraint, and designs a pruning matrix X to retain the channel that contributes more to the accuracy in the model.
- the first node takes the sum of the elements of each column of the pruning matrix X as the unknown item, and according to the size of the elements in each column of the matrix g, retains the channel corresponding to the item with the largest column element in the matrix, and prunes other channels.
- after the pruning matrix X is obtained, X is used to prune θ to obtain the second compression model θ_S.
- the first node selects an appropriate model compression option, compresses the training model according to the model compression option, and then transmits it to the second node; while retaining most of the accuracy of the deep learning model, the data volume of the training model is compressed as much as possible.
- This method realizes model compression according to the communication rate requirements of the model training node, which greatly reduces the communication overhead of the model uplink transmission.
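- as a minimal numerical sketch of this pruning step (assuming g is a per-channel importance matrix aligned with the columns of the weight matrix θ, and that the data-volume requirement has been translated into a channel budget `keep`; these assumptions go beyond what the disclosure specifies):

```python
import numpy as np

def prune_channels(theta: np.ndarray, g: np.ndarray, keep: int) -> np.ndarray:
    """Channel pruning sketch: retain the channels whose columns of g
    contribute most, and drop the rest via a 0/1 pruning matrix X."""
    # Rank channels by the sum of the elements in each column of g.
    importance = np.abs(g).sum(axis=0)
    kept = np.sort(np.argsort(importance)[-keep:])

    # Build the pruning matrix X as a column selector.
    X = np.zeros((theta.shape[1], keep))
    for j, channel in enumerate(kept):
        X[channel, j] = 1.0

    # theta @ X keeps only the selected channels of the first training model.
    return theta @ X
```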
- in step S22, a first loss function is determined according to the output of the first training model, the output of the second compression model, and the sample parameter set used for training the first training model.
- the sample training parameter set further includes a sample verification parameter set, and at least one data pair of input and output of the sample verification parameter set is determined.
- the first node inputs the input-output data pairs of the sample verification parameter set into the first training model and the second compression model, and determines the output of the first training model, the output of the second compression model, and the corresponding output in the sample verification parameter set, where this output is the true value corresponding to the model input.
- the first cross entropy between the output of the second compression model and the sample parameter set is determined, and the first relative entropy divergence between the output of the second compression model and the output of the first training model is determined; the sum of the first cross entropy and the first relative entropy divergence is determined as the loss function of the second compression model.
- the present disclosure determines the loss function of the second compression model as the first loss function for the convenience of distinction.
- a plurality of first loss functions are determined based on a plurality of input-output data pairs in the sample parameter set, the average value of the plurality of first loss functions is determined, and the parameters of the second compression model are updated by the gradient descent method according to the average value of the plurality of first loss functions, to obtain the first compression model.
- the first loss function (that is, the loss function of the second compression model) is expressed by the following formula:
- L_θ_S = L_C(p_S, y) + D_KL(p_S ∥ p_1)
- where L_θ_S is the loss function of the second compression model; L_C(p_S, y) is the first cross entropy between the output value p_S of the second compression model and the true value y of the input-output data pairs of the sample validation parameter set; and D_KL(p_S ∥ p_1) is the first relative entropy divergence between the output value of the second compression model and the output value p_1 of the first training model.
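- for a single input-output data pair, a plain-Python sketch of this first loss function (assuming p_S and p_1 are probability vectors and y is a one-hot true value; the averaging over multiple pairs described above is omitted here):

```python
import numpy as np

def first_loss(p_s: np.ndarray, p_1: np.ndarray, y: np.ndarray, eps: float = 1e-12) -> float:
    """L_theta_S = L_C(p_S, y) + D_KL(p_S || p_1) for one sample."""
    cross_entropy = -float(np.sum(y * np.log(p_s + eps)))                   # first cross entropy
    kl_divergence = float(np.sum(p_s * np.log((p_s + eps) / (p_1 + eps))))  # first relative entropy
    return cross_entropy + kl_divergence
```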
- in an implementation manner, the loss function of the first training model is determined first; after the parameters of the first training model are updated based on the loss function of the first training model, the loss function of the second compression model (i.e., the first loss function) is determined.
- the present disclosure determines the loss function of the first training model as the second loss function for the convenience of distinction.
- the sample training parameter set further includes a sample verification parameter set, and the first node determines at least one data pair of input and output of the sample verification parameter set.
- the first node inputs the input-output data pairs of the sample verification parameter set into the first training model and the second compression model, and determines the output of the first training model, the output of the second compression model, and the corresponding output in the sample verification parameter set, where this output is the true value corresponding to the model input.
- the second cross entropy between the output of the first training model and the sample parameter set is determined, and the second relative entropy divergence between the output of the first training model and the output of the second compression model is determined; the sum of the second cross entropy and the second relative entropy divergence is determined as the second loss function.
- a plurality of second loss functions are determined based on a plurality of input-output data pairs in the sample parameter set, the average value of the plurality of second loss functions is determined, and the parameters of the first training model are updated by the gradient descent method according to the average value of the plurality of second loss functions, to obtain an updated first training model.
- the second loss function (that is, the loss function of the first training model) is expressed by the following formula:
- L_θ = L_C(p_1, y) + D_KL(p_1 ∥ p_S)
- where L_θ is the loss function of the first training model; L_C(p_1, y) is the second cross entropy between the output value p_1 of the first training model and the true value y of the input-output data pairs of the sample verification parameter set; D_KL(p_1 ∥ p_S) is the second relative entropy divergence between the output value of the first training model and the output value of the second compression model; p_S is the output value of the second compression model; and p_1 is the output value of the first training model.
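- a hedged PyTorch-style sketch of how the two losses might drive the alternating updates described above (the teacher stands for the first training model and the student for the second compression model; the models, optimizers, and data are placeholders, and batch means stand in for averaging the per-sample losses before gradient descent):

```python
import torch
import torch.nn.functional as F

def kl(p: torch.Tensor, q: torch.Tensor, eps: float = 1e-12) -> torch.Tensor:
    """Batch-mean D_KL(p || q) over probability vectors on the last dimension."""
    return (p * ((p + eps).log() - (q + eps).log())).sum(dim=-1).mean()

def alternating_step(teacher, student, opt_t, opt_s, x, y):
    # Second loss: L_theta = L_C(p_1, y) + D_KL(p_1 || p_S); update the teacher first.
    logits_t = teacher(x)
    p1 = F.softmax(logits_t, dim=-1)
    with torch.no_grad():
        ps = F.softmax(student(x), dim=-1)
    loss_t = F.cross_entropy(logits_t, y) + kl(p1, ps)
    opt_t.zero_grad(); loss_t.backward(); opt_t.step()

    # First loss: L_theta_S = L_C(p_S, y) + D_KL(p_S || p_1); then update the student.
    with torch.no_grad():
        p1 = F.softmax(teacher(x), dim=-1)
    logits_s = student(x)
    ps = F.softmax(logits_s, dim=-1)
    loss_s = F.cross_entropy(logits_s, y) + kl(ps, p1)
    opt_s.zero_grad(); loss_s.backward(); opt_s.step()
    return float(loss_t), float(loss_s)
```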
- the model training modes in the model compression parameters include a single training node mode for training a single first training model and a multi-training node mode for training multiple first training models.
- the first node determines the number of first training models to train according to the model training mode included in the model training parameters. If the model training mode is the single training node mode, it is determined to train one first training model based on a single first node; the training method is as described above. If the model training mode is the multi-training node mode, it is determined to train a plurality of first training models based on a plurality of first nodes, and different sequence marks are set for the plurality of first nodes that train the plurality of first training models.
- the following takes the mth model training node (i.e., the mth first node) as an example to describe the multi-training node mode.
- the first node determines, from among the multiple model compression options, a first model compression option for model compression according to one or more of its local computing capability, communication conditions, and training sample characteristics.
- the first training model is compressed according to the requirement on the model parameter data volume in the model compression option to obtain the second compression model, and the symbol θ_S is used to identify the second compression model.
- the following implementations may be used to compress the first training model by using the matrix g and the first model compression option:
- the first node takes the amount of model parameter data as a constraint, and designs a pruning matrix X to retain the channel that contributes more to the accuracy in the model.
- the first node takes the sum of the elements of each column of the pruning matrix X as the unknown item, and according to the size of the elements in each column of the matrix g, retains the channel corresponding to the item with the largest column element in the matrix g, and prunes other channels.
- after the pruning matrix X is obtained, X is used to prune θ_m to obtain the mth second compression model.
- the sample training parameter set further includes a sample verification parameter set, and at least one data pair of input and output of the sample verification parameter set is determined.
- the first node inputs the input-output data pairs of the sample verification parameter set into the mth first training model and the mth second compression model, and determines the output of the mth first training model, the output of the mth second compression model, and the corresponding output in the sample validation parameter set, where this output is the real value corresponding to the model input.
- the present disclosure determines the loss function of the mth second compression model as the mth first loss function for the convenience of distinction.
- a plurality of mth first loss functions are determined based on a plurality of input-output data pairs in the sample parameter set, the average value of the plurality of mth first loss functions is determined, and the parameters of the mth second compression model are updated by the gradient descent method according to the average value of the plurality of mth first loss functions, to obtain the mth first compression model.
- the mth first loss function (that is, the loss function of the mth second compression model) is expressed analogously to the formula above, with the outputs of the mth second compression model and the mth first training model in place of p_S and p_1.
- in an implementation manner, the loss function of the mth first training model is determined first; after the parameters of the mth first training model are updated based on the loss function of the mth first training model, the loss function of the mth second compression model (i.e., the mth first loss function) is determined.
- the loss function of the mth first training model is determined as the mth second loss function.
- the sample training parameter set further includes a sample verification parameter set
- the first node determines at least one data pair of input and output of the sample verification parameter set.
- the first node inputs the input-output data pairs of the sample verification parameter set into the mth first training model and the mth second compression model, and determines the output of the mth first training model, the output of the mth second compression model, and the corresponding output in the sample validation parameter set, where this output is the real value corresponding to the model input.
- the mth second cross entropy between the output of the mth first training model and the sample parameter set (that is, the true value of the input-output data pairs of the sample validation parameter set) is determined, and the mth second relative entropy divergence between the output of the mth first training model and the output of the mth second compression model is determined; the sum of the mth second cross entropy and the mth second relative entropy divergence is determined as the mth second loss function.
- a plurality of mth second loss functions are determined based on a plurality of input-output data pairs in the sample parameter set, the average value of the plurality of mth second loss functions is determined, and the parameters of the mth first training model are updated by the gradient descent method according to the average value of the plurality of mth second loss functions, to obtain the updated mth first training model.
- the mth second loss function (that is, the loss function of the mth first training model) is expressed analogously to the formula above, where L_C is the second cross entropy between the output value p_m of the mth first training model and the true value of the input-output data pairs of the sample verification parameter set, and p_m is the output value of the mth first training model.
- model compression methods such as model sparsification and parameter quantization may also be selected to determine the first compression model, which is not specifically limited in the present disclosure.
- Fig. 4 is a flowchart showing a training method according to an exemplary embodiment. As shown in FIG. 4 , based on the first training model and the model compression parameters, a compression model of the first training model is obtained, including the following steps.
- in step S31, a fourth indication message is received.
- the second node determines the first compression model according to the received second indication message. If there is one first compression model, it is determined whether the first compression model meets the model subscription requirement or the analysis subscription requirement. If there are multiple first compression models, federated averaging is performed on the multiple first compression models to obtain a third compression model (also called a global model), and it is determined whether the third compression model meets the model subscription requirement or the analysis subscription requirement. In an implementation manner, if a first compression model, or the third compression model obtained by federated averaging of multiple first compression models, does not meet the subscription requirement, a fourth indication message is sent, where the fourth indication message is used to indicate the third compression model. The first node receives the fourth indication message.
- in step S32, based on the third compression model, model compression parameters are re-determined, and the first compression model is updated based on the re-determined model compression parameters.
- the first node re-determines the model compression parameters according to the third compression model indicated by the fourth indication message, updates the first compression model based on the re-determined model compression parameters, determines the loss function of the first compression model, and re-updates the parameters of the first compression model, until the second node determines a compression model that satisfies the model subscription requirement.
- in another implementation manner, when the second node determines that the first compression model meets the model subscription requirements, it determines to send a fifth indication message, where the fifth indication message is used to indicate the end of training the first compression model. After receiving the fifth indication message, the first node determines that the training of the first compression model is over, and the second node sends the determined compression model to the model subscriber.
- after obtaining one or more first compression models corresponding to the model training mode, the first node sends a second indication message to the second node through a wireless channel.
- the second indication message includes the number of first compression models corresponding to the model training mode.
- the embodiment of the present disclosure solves the problem that the data volume of the deep learning model is too large, effectively alleviates the shortage of wireless resources, reduces data transmission errors in the case of network congestion, improves the reliability of model transmission in the wireless network, and ensures the accuracy of the model.
- the model obtained by the first node through training with the local training parameter set is compressed and then uploaded to the second node. This method not only keeps the user's private data local, but also greatly increases the difficulty of reverse inference on the model by the network, which further ensures the security of user information.
- the embodiments of the present disclosure also provide a training method.
- Fig. 5 is a flowchart of a training method according to an exemplary embodiment. As shown in Figure 5, the training method is used in the second node and includes the following steps.
- in step S41, a model training request is sent.
- the model training request includes model compression parameters, and the model compression parameters are used to compress the first training model to obtain the first compression model, and the first training model is obtained by training based on the model training request.
- the first node is a model training node; for convenience of description, the model training node is referred to as the first node in this disclosure. The second node is a model request node; likewise, the model request node is referred to as the second node.
- the model training request includes model compression parameters.
- the model compression parameters include at least one of the following:
- a model training structure, multiple model compression options, and a model training mode.
- the model compression option is determined based on the model subscription requirement received by the second node (eg, the model requesting node).
- the second node determines to send the model training request according to the received model subscription requirement.
- the first node (for example, the model training node) sends third indication information in response to the model training request, indicating that it determines to train the model.
- the first training model is trained based on the local sample parameter set and the model training structure, and the relevant parameters required for model compression are determined.
- the information sent by the first node to respond to the model training request further includes one or more of the local computing capability of the first node, communication conditions, and characteristics of the training sample parameter set.
- the first node compresses the first training model based on the model compression option in the model compression parameters and the relevant parameters required for model compression.
- the model compression options include model accuracy and model parameter data volume.
- the model compression parameters include multiple model compression options, and the multiple model compression options are determined by the second node based on one or more of local computing capabilities reported by multiple first nodes, communication conditions, and characteristics of the training sample parameter set.
- the model training modes in the model compression parameters include a single training node mode for training a single first training model and a multi-training node mode for training multiple first training models.
- the first node determines the number of first training models to train according to the model training mode included in the model training parameters. If the model training mode is the single training node mode, it is determined to train one first training model based on a single first node; the training method is as described above. If the model training mode is the multi-training node mode, it is determined to train a plurality of first training models based on a plurality of first nodes, and different sequence marks are set for the plurality of first nodes that train the plurality of first training models.
- after obtaining one or more first compression models corresponding to the model training mode, the first node sends a second indication message to the second node through a wireless channel.
- the second indication message includes the number of first compression models corresponding to the model training mode.
- the second node receives the second indication message to determine the number of first compression models corresponding to the model training mode, performs federated averaging on the received one or more first compression models to obtain a third compression model, and determines whether the third compression model meets the model subscription requirement or the analysis subscription requirement.
- the subscription requirement may be issued by Operation Administration and Maintenance (OAM), or issued by the core network.
- Subscription requirements include: an analysis ID, used to identify the analysis type of the model training request; a notification target (model training node address), used to associate notifications received by the requested party with this subscription; analysis report information, including parameters such as the preferred analysis accuracy level and the analysis time interval; and analysis filter information (optional), indicating the conditions to be met by the reported analysis information. An illustrative structure is sketched below.
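- purely as an illustration, such a subscription might be represented as follows; the field names are hypothetical and not defined by the disclosure:

```python
# Hypothetical representation of a model/analysis subscription requirement.
from dataclasses import dataclass
from typing import Optional

@dataclass
class SubscriptionRequirement:
    analysis_id: str                       # identifies the analysis type of the request
    notification_target: str               # model training node address for notifications
    preferred_accuracy_level: float        # part of the analysis report information
    analysis_time_interval_s: int          # reporting interval, assumed in seconds
    analysis_filter: Optional[str] = None  # optional conditions the report must satisfy
```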
- a fourth indication message is sent, where the fourth indication message is used to indicate the third compression model.
- the first node receives the fourth indication message.
- in another implementation manner, when the second node determines that the first compression model meets the model subscription requirements, it determines to send a fifth indication message, where the fifth indication message is used to indicate the end of training the first compression model. After receiving the fifth indication message, the first node determines that the training of the first compression model is finished, and the second node sends the determined compression model to the model subscriber.
- the second node receives the subscription requirement sent by the OAM or the core network, and determines to send the model training request based on the received subscription requirement.
- the first node and the second node may be applied between a base station and a terminal, between a base station and a base station, or between a base station and the core network.
- for example, in one application environment the first node is a terminal and the second node is a base station; in another, the first node is a base station and the second node is also a base station; and in yet another, the first node is a base station and the second node is a core network node.
- after obtaining one or more first compression models corresponding to the model training mode, the first node sends a second indication message to the second node through a wireless channel.
- the second indication message includes the number of first compression models corresponding to the model training mode.
- the embodiment of the present disclosure solves the problem that the data volume of the deep learning model is too large, effectively alleviates the shortage of wireless resources, reduces data transmission errors in the case of network congestion, improves the reliability of model transmission in the wireless network, and ensures the accuracy of the model.
- the model obtained by the first node through training with the local training parameter set is compressed and then uploaded to the second node. This method not only keeps the user's private data local, but also greatly increases the difficulty of reverse inference on the model by the network, which further ensures the security of user information.
- in the following description, the first node is referred to as a model training node and the second node is referred to as a model request node, and the present disclosure is further described in terms of the interaction between model training nodes and model request nodes.
- FIG. 6 is a flowchart of an implementation manner of determining a first compression model in a single training node mode in a training method provided by the present disclosure.
- the model request node initiates a model training request to the model training node.
- the model training node sends the local computing power, communication conditions and training sample parameter set characteristics to the model requesting node.
- the model request node determines the model structure and model training mode according to the model/analysis subscription requirements, and proposes a variety of model compression options based on the information reported by the model training node, including model accuracy and model parameter data volume.
- the model request node sends the model structure, model training mode, and model compression options to the model training node, and the model training node selects an appropriate model compression option.
- the model training node uses the local sample parameter set for model training to obtain the first training model and related parameters required for model compression.
- the model training node compresses the first training model according to the selected model compression option and relevant parameters required for model compression to obtain a first compressed model, and transmits the first compressed model to the model requesting node through a wireless channel.
- if the first compression model meets the model/analysis subscription requirements, the model training process ends, and the model request node reports the model to the model/analysis subscriber; the overall flow is sketched below.
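- as a rough end-to-end sketch of this single training node flow, under the assumption that each signaling exchange above maps to one call (all function names are hypothetical stand-ins, not defined by the disclosure):

```python
def single_node_flow(request_node, training_node):
    # Hypothetical orchestration of the FIG. 6 interaction; names are illustrative.
    request = request_node.build_training_request()           # model training request
    caps = training_node.report_capabilities()                # computing power, channel, samples
    structure, mode, options = request_node.decide(request, caps)
    option = training_node.select_option(options)             # appropriate compression option
    model, aux = training_node.train(structure)               # first training model + parameters
    compressed = training_node.compress(model, option, aux)   # first compression model
    if request_node.meets_subscription(compressed):
        request_node.report_to_subscriber(compressed)
    return compressed
```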
- FIG. 7 is a flowchart of an implementation manner of determining a first compression model in a multi-training node mode in a training method provided by the present disclosure.
- the model request node initiates a model training request to the model training node.
- the model training node sends the local computing power, communication conditions and training sample parameter set characteristics to the model requesting node.
- the model request node determines the model structure and model training mode according to the model/analysis subscription requirements, and proposes a variety of model compression options based on the information reported by the model training node, including model accuracy and model parameter data volume.
- the model request node sends the model structure, model training mode, and model compression options to the model training node, and the model training node selects an appropriate model compression option.
- the model training node uses the local sample parameter set for model training to obtain the first training model and related parameters required for model compression.
- the model training node selects an appropriate model compression option, and compresses the first training model according to the selected model compression option and relevant parameters required for model compression to obtain a first compression model, and transmits the first compression model through the wireless channel Passed to the model request node.
- the model request node performs federated averaging on the first compression models sent by the model training nodes to obtain a global model, as sketched below.
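- a minimal sketch of this federated averaging step, assuming each first compression model is delivered as a PyTorch-style state dict with an identical structure:

```python
import torch

def federated_average(state_dicts):
    """Element-wise mean of the first compression models' parameters,
    yielding the global (third) compression model."""
    avg = {}
    for name in state_dicts[0]:
        avg[name] = torch.stack([sd[name].float() for sd in state_dicts]).mean(dim=0)
    return avg
```

- the model request node would then load the averaged parameters into a model with the agreed structure before checking it against the model/analysis subscription requirements.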
- if the global model meets the model/analysis subscription requirements, the model training process ends, and the model request node reports the global model to the model/analysis subscriber. If the global model does not meet the model/analysis subscription requirements, the model training nodes reselect appropriate model compression options and update the first compression models according to the re-determined model compression options.
- FIG. 8 is a schematic diagram of the protocol and interface of the model training and compression decision part in a training method provided by the present disclosure. As shown in FIG. 8 , it includes a service management module and a network communication module in the model request node, and a network communication module, model training and compression module, data processing and storage module in the model training node device.
- the service management module in the model request node, the network communication module and the network communication module, model training and compression module, data processing and storage module in the model training node device perform the following steps for information exchange.
- step 1 includes steps 1a-1c, wherein in step 1a, the model request node service management module sends model training request signaling to the model request node network communication module; the signaling indication content is to initiate a model training request to the model training node.
- step 1b the model requesting node network communication module sends the model training request signaling to the model training node network communication module.
- step 1c the model training node network communication module sends the model training request response signaling to the model request node network communication module, and the content of the signaling instruction is to notify the acceptance of the model training request.
- Step 2 includes steps 2a-2c, wherein in step 2a, the model training node model training and compression module sends computing capability information reporting signaling to the model training node network communication module; the signaling indication content is to report the computing capability information of the model training node device to the receiver.
- in step 2b, the model training node data processing and storage module sends training sample feature information reporting signaling to the model training node network communication module; the signaling indication content is to report the local data training sample feature information of the model training node to the receiver.
- in step 2c, the model training node network communication module sends computing capability and training sample feature information reporting signaling to the model request node network communication module; the signaling indication content is to report the computing capability and local data training sample feature information of the model training node to the receiver.
- step 3: if the model training node is a terminal and the model requesting node is a base station, the network communication module of the model training node needs to measure the Channel Quality Indication (CQI) and send CQI reporting signaling to the model requesting node network communication module; the signaling indication content is to perform the CQI measurement and report the CQI information to the receiver.
- step 4 the model request node network communication module sends the model training node computing capability, training sample characteristics, and CQI information (optional) signaling to the model request node service management module; the signaling indication content is to aggregate the received computing capability of the model training node, training sample characteristics, and CQI information (optional) and send them to the receiver.
- step 5 the model request node service management module determines the model structure and model training mode according to the model/analysis subscription requirements.
- step 6 the model request node service management module proposes various model compression options according to the information reported by the model training node.
- Step 7 includes steps 7a-7b, wherein in step 7a, the model request node service management module sends model structure and model training mode signaling to the model request node network communication module; the signaling indication content is to send the model structure and model training mode to the receiver.
- in step 7b, the model requesting node service management module sends model compression option signaling to the model requesting node network communication module; the signaling indication content is to send multiple model compression options to the receiver.
- Step 8 includes steps 8a-8b, wherein in step 8a, the model request node network communication module sends the model structure and model training mode signaling to the model training node network communication module. In step 8b, the model requesting node network communication module sends the model compression option signaling to the model training node network communication module.
- Step 9 includes steps 9a-9b, wherein in step 9a, the model training node network communication module sends the model structure and model training mode signaling to the model training node model training and compression module.
- step 9b the model training node network communication module sends the model compression option signaling to the model training node model training and compression module.
- step 10 the model training node selects an appropriate model compression option according to the locally available computing power, real-time communication conditions, and characteristics of the training samples; one possible selection rule is sketched below.
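- Step 10 does not prescribe a selection rule. A hedged sketch of one plausible rule follows: among the advertised options, pick the most accurate one whose compute cost fits the local budget and whose parameter volume fits what the current channel can carry. The `CompressionOption` fields, the CQI-to-link-budget mapping, and all numeric thresholds are assumptions for illustration only.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class CompressionOption:
    accuracy: float     # advertised model accuracy for this option
    param_bytes: int    # volume of model parameter data after compression
    flops: int          # compute needed to train/compress under this option

def select_option(options: List[CompressionOption],
                  flops_budget: int,
                  cqi: int) -> Optional[CompressionOption]:
    # Crude, illustrative mapping from CQI to the parameter volume the radio
    # link can carry within the reporting deadline.
    link_budget_bytes = 50_000 * max(cqi, 1)
    feasible = [o for o in options
                if o.flops <= flops_budget and o.param_bytes <= link_budget_bytes]
    if not feasible:
        return None  # e.g. fall back to requesting new options from the model request node
    return max(feasible, key=lambda o: o.accuracy)
```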
- FIG. 9 is a schematic diagram of the protocol and interface of the model training and compression part in a single training node mode in a training method provided by the present disclosure. As shown in FIG. 9 , it includes a data processing and storage module, a model training and compression module, and a network communication module in the model training node, as well as a network communication module and a service management module in the model requesting node device.
- the data processing and storage module, model training and compression module, network communication module in the model training node, and the network communication module and service management module in the model request node device perform the following steps for message interaction.
- step 1 includes steps 1a-1b, wherein in step 1a, the model training node model training and compression module sends request-local-training-data-set signaling to the model training node data processing and storage module; the signaling indication content is to request that a training data set be collected from local data.
- in step 1b, the data processing and storage module of the model training node sends local-training-data-set signaling to the model training and compression module of the model training node; the signaling indication content is to collect data from local data to generate a training data set and send it to the receiver.
- step 2 the model training and compression module of the model training node uses the local training data set for model training, and obtains the training model and relevant parameters required for model compression.
- step 3 the model training node compresses the original training model according to the selected model compression option and relevant parameters required for model compression to obtain a compressed model.
- Step 4 includes steps 4a-4c, wherein in step 4a, the model training node model training and compression module sends the compressed model to the model training node network communication module.
- in step 4b, the model training node network communication module sends the compressed model to the model request node network communication module.
- in step 4c, the model request node network communication module sends the compressed model to the model request node service management module.
- step 5 the model request node service management module judges whether the obtained model satisfies the model/analysis subscription requirement. If satisfied, go to step 6; otherwise, go to steps 6a-6b.
- step 6 the model request node service management module sends a signaling of notifying the model training end to the model training node network communication module via the model request node network communication module.
- This process and the corresponding signaling are newly added in the present disclosure; the signaling indication content is to notify the model training node to end the model training process.
- step 6a the model request node service management module sends notify-model-training-continuation signaling to the model training node network communication module via the model request node network communication module.
- This process and the corresponding signaling are newly added in the present disclosure; the signaling indication content is to notify the model training node to continue the model training process.
- step 6b the network communication module of the model training node forwards the notify-model-training-continuation signaling to the model training node model training and compression module.
- step 7 the model training and compression module of the model training node uses the local training data set to train the compressed model, and repeats steps 4a-7 until the model obtained by the model request node meets the model/analysis subscription requirements.
- FIG. 10 is a schematic diagram of a protocol and an interface of a model training and compression part in a multi-training node mode in a training method provided by the present disclosure. As shown in FIG. 10, it includes a data processing and storage module, a model training and compression module, and a network communication module in the model training node, and a network communication module, a model calculation and update module, and a service management module in the model request node device. The following steps are performed for the information exchange among these modules.
- Step 1 includes steps 1a-1b, wherein in step 1a, the model training and compression module of the model training node sends a request for local training data set signaling to the model training node data processing and storage module. In step 1b, the data processing and storage module of the model training node sends the signaling of sending the local training data set to the model training and compression module of the model training node.
- step 2 the model training and compression module of the model training node uses the local training data set to perform model training to obtain a first training model and relevant parameters required for model compression.
- step 3 the model training node compresses the first training model according to the selected model compression option and the relevant parameters required for model compression to obtain the first compression model.
- Step 4 includes steps 4a-4c, wherein in step 4a, the model training node model training and compression module sends the first compressed model to the model training node network communication module. In step 4b, the model training node network communication module sends the first compressed model to the model request node network communication module. In step 4c, the model request node network communication module sends the first compressed model to the model request node model calculation and update module.
- step 5 the model request node model calculation and update module aggregates the first compressed models sent from each model training node and performs federated averaging to obtain a global model.
- step 6 the model request node model calculation and update module sends the global model to the model request node service management module.
- step 7 the model request node service management module judges whether the obtained model meets the model/analysis subscription requirements. If so, execute step 8; otherwise, execute steps 8a-8b.
- step 8 the model request node service management module sends a signaling of notifying the model training end to the model training node network communication module via the model request node network communication module.
- step 8a the model request node service management module sends notify-model-training-continuation signaling to the model training node network communication module via the model request node network communication module, and distributes the global model to the model training node network communication module via the model request node network communication module.
- step 8b the model training node network communication module forwards the notify-model-training-continuation signaling to the model training node model training and compression module, and sends the global model to the model training node model training and compression module.
- step 9 the model training and compression module of the model training node uses the local training data set to perform model training and compression on the global model sent by the model requesting node, and repeats steps 4a-9 until the model obtained by the model requesting node satisfies the model/analysis subscription requirements.
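- Read together, steps 1-9 amount to the following orchestration loop. This is a sketch under stated assumptions, not the disclosed implementation: `node.local_train_and_compress` stands in for the node-side steps 1-4, `federated_average` for step 5 (as sketched earlier), and `meets_subscription` for the service management check of step 7.

```python
def run_multi_node_training(nodes, initial_model, federated_average,
                            meets_subscription, max_rounds=50):
    """Iterate the multi-training-node round until the global model meets the
    model/analysis subscription requirements (or a round cap is reached)."""
    global_model = initial_model
    for _ in range(max_rounds):
        reports = []
        for node in nodes:
            # Steps 1-4: local training and compression at each node, then
            # upload of the first compressed model over the radio link.
            compressed, n_samples = node.local_train_and_compress(global_model)
            reports.append((compressed, n_samples))
        # Step 5: federated averaging at the model request node.
        global_model = federated_average([w for w, _ in reports],
                                         [n for _, n in reports])
        # Steps 7-8: service management check; end or continue and redistribute.
        if meets_subscription(global_model):
            break
    return global_model
```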
- FIG. 11 is a schematic diagram of a protocol and interface of a wireless data transmission part in a training method provided by the present disclosure. As shown in FIG. 11, it includes a model training and compression module, a transmission control module, and a network communication module in the model training node, and a network communication module, a transmission control module, and a service management module in the model request node. It can be applied to the application scenario where the model requesting node is the base station and the model training node is the terminal. The following steps are performed for the information interaction between the modules.
- step 1 the model training node model training and compression module sends the compressed model to the model training node transmission control module.
- step 2 the model training node network communication module sends the measured CQI reporting signaling to the model training node transmission control module.
- step 3 the model training node transmission control module formulates a data transmission scheme according to compression characteristics and wireless communication conditions.
- step 4 the model training node transmission control module sends data transmission scheme information signaling to the model training node network communication module. This process and the corresponding signaling are newly added in the present disclosure; the signaling indication content is to send the data transmission scheme information, including modulation mode, code rate, and other information, to the receiver.
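- The disclosure does not specify how the transmission control module maps channel quality to a modulation mode and code rate. As a hedged illustration, the sketch below picks them from a CQI-indexed table loosely inspired by LTE/NR CQI tables; the table entries and the airtime estimate are simplified stand-ins, not the 3GPP-specified values.

```python
# Illustrative CQI -> (modulation, bits per symbol, code rate) table.
MCS_TABLE = {
    range(1, 7):   ("QPSK",  2, 0.30),
    range(7, 10):  ("16QAM", 4, 0.50),
    range(10, 16): ("64QAM", 6, 0.75),
}

def make_transmission_scheme(cqi: int, payload_bytes: int, symbol_rate: float):
    """Pick a modulation/code rate from the CQI and estimate the airtime for
    the compressed-model payload (cf. steps 3-4 of the FIG. 11 flow)."""
    for cqi_range, (modulation, bits_per_symbol, code_rate) in MCS_TABLE.items():
        if cqi in cqi_range:
            break
    else:
        raise ValueError("CQI out of range")
    goodput_bps = symbol_rate * bits_per_symbol * code_rate
    return {
        "modulation": modulation,
        "code_rate": code_rate,
        "estimated_airtime_s": payload_bytes * 8 / goodput_bps,
    }

# Example: a 200 kB compressed model on a 1 Msym/s link reported at CQI 8.
scheme = make_transmission_scheme(cqi=8, payload_bytes=200_000, symbol_rate=1e6)
```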
- step 5 the model training and compression module of the model training node sends the compressed model to the network communication module of the model training node.
- step 6 the model training node network communication module encapsulates the compressed model according to the data transmission scheme.
- Step 7 includes steps 7a-7d, wherein in step 7a, the model training node network communication module transmits the compressed model data packet to the model request node network communication module.
- in step 7b, the network communication module of the model request node sends the compressed model to the transmission control module of the model request node; at this point the decapsulated data is transmitted.
- in step 7c, the transmission control module of the model requesting node sends acknowledge-receipt-of-correct-data signaling to the network communication module of the model requesting node; the signaling indication content is to notify that correct data has been received.
- in step 7d, the model requesting node network communication module sends the acknowledge-receipt-of-correct-data signaling to the model training node network communication module.
- step 8 the model request node transmission control module sends the compressed model to the model request node service management module. In the single training node mode, the compressed model can be sent directly to the model request node service management module; in the multi-training node mode, the global model first needs to be obtained through the model request node model calculation and update module and is then sent to the model request node service management module.
- step 9 the model request node service management module judges whether the model meets the model/analysis subscription requirement. If so, go to steps 10a1-10b1.
- step 10a1 the model requesting node service management module sends a signaling informing the model training end to the model requesting node transmission control module.
- step 10b1 the model requesting node network communication module sends a signaling informing the model training end to the model training node network communication module.
- if the model does not meet the model/analysis subscription requirement, steps 10a2-10b2 are performed.
- step 10a2 the model request node service management module sends notify-model-training-continuation signaling to the model request node transmission control module; the signaling indication content is to notify the model training node to continue the model training process.
- step 10b2 the model requesting node network communication module sends the model training continuation signaling to the model training node network communication module.
- in the single training node mode, only the signaling notifying the model training to continue needs to be sent; in the multi-training node mode, the global model also needs to be distributed to the model training nodes.
- the protocol and interface principle of global model distribution are similar to the above steps 1-7, except that the sending module is in the model request node, the receiving module is in the model training node, and the compressed model is replaced by the global model.
- in addition, the model requesting node should initiate a CQI measurement request to the model training node, and the model training node performs the CQI measurement and then feeds the result back to the model requesting node.
- an embodiment of the present disclosure also provides a training device.
- the training apparatus includes corresponding hardware structures and/or software modules for executing each function.
- the embodiments of the present disclosure can be implemented in hardware or a combination of hardware and computer software. Whether a function is performed by hardware or computer software driving hardware depends on the specific application and design constraints of the technical solution. Those skilled in the art may use different methods to implement the described functions for each specific application, but such implementation should not be considered beyond the scope of the technical solutions of the embodiments of the present disclosure.
- FIG. 12 is a block diagram of a training apparatus 100 according to an exemplary embodiment. Referring to FIG. 12 , the apparatus is applied to a first node, including a model training and compression module 110 , a first network communication module 120 , a first transmission control module 130 and a data processing and storage module 140 .
- the model training and compression module 110 is configured to train a first training model in response to receiving a model training request, wherein the model training request includes model compression parameters. Based on the first training model and the model compression parameters, a first compression model of the first training model is obtained.
- the model compression parameter includes a plurality of model compression options.
- the model training and compression module 110 is configured to determine a first model compression option among the multiple model compression options, and compress the first training model based on the first model compression option to obtain a second compression model.
- the first loss function is determined according to the output of the first training model, the output of the second compression model, and the sample parameter set used to train the first training model.
- the parameters of the second compression model are updated based on the first loss function to obtain the first compression model.
- the apparatus further includes a data processing and storage module 140 .
- the data processing and storage module 140 is configured to determine the first cross entropy between the output of the second compression model and the sample parameter set, and determine the first relative entropy divergence between the output of the second compression model and the output of the first training model. Based on the first cross entropy and the first relative entropy divergence, a first loss function is determined.
- the data processing and storage module 140 is further configured to determine, according to the output of the first training model, the output of the second compression model, and the sample parameter set used for training the first training model, a second loss function for updating the parameters of the first training model.
- the data processing and storage module 140 is further configured to determine the second cross entropy between the output of the first training model and the sample parameter set, and determine the second relative entropy divergence between the output of the first training model and the output of the second compression model. Based on the second cross entropy and the second relative entropy divergence, a second loss function is determined. A sketch of both loss functions is given below.
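- In knowledge-distillation terms, the first loss function updates the compressed (student) model and the second updates the original (teacher) training model. Below is a minimal sketch of both, assuming softmax probability outputs, cross entropy against the sample labels, and KL divergence as the relative entropy term; the weighting factor `alpha` and the KL direction are assumptions, since the disclosure does not fix them.

```python
import numpy as np

def cross_entropy(probs, labels_onehot, eps=1e-12):
    return -np.mean(np.sum(labels_onehot * np.log(probs + eps), axis=1))

def kl_divergence(p, q, eps=1e-12):
    """Relative entropy divergence D(p || q), averaged over the batch."""
    return np.mean(np.sum(p * (np.log(p + eps) - np.log(q + eps)), axis=1))

def first_loss(student_probs, teacher_probs, labels_onehot, alpha=0.5):
    """Updates the second (compressed) model: its cross entropy with the
    sample parameter set plus the relative entropy divergence between its
    output and the first training model's output."""
    return (cross_entropy(student_probs, labels_onehot)
            + alpha * kl_divergence(teacher_probs, student_probs))

def second_loss(student_probs, teacher_probs, labels_onehot, alpha=0.5):
    """Updates the first training model: its cross entropy with the sample
    parameter set plus the relative entropy divergence between its output
    and the compressed model's output."""
    return (cross_entropy(teacher_probs, labels_onehot)
            + alpha * kl_divergence(teacher_probs, student_probs))
```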
- the model compression parameters include a model training mode, which includes a single training node mode for training a single first training model and a multi-training node mode for training multiple first training models.
- the number of first training models is determined based on the model training mode.
- the apparatus further includes a first network communication module 120 .
- the first network communication module 120 is configured to send a second indication message, where the second indication message includes a number of first compressed models corresponding to the model training mode.
- the first network communication module 120 is further configured to receive a third instruction message, where the third instruction message includes an instruction to determine the training model.
- the first network communication module 120 is further configured to receive a fourth indication message.
- the fourth indication message is used to indicate a third compression model, where the third compression model is a compression model obtained by federally averaging the first training model based on the number of the first compression models.
- based on the third compression model, the model compression parameters are re-determined, and the first compression model is updated based on the re-determined model compression parameters.
- the first network communication module 120 is further configured to receive a fifth instruction message, where the fifth instruction message is used to instruct to end training the first compression model.
- the first network communication module 120 is used for data transmission and control signaling interaction between the model requesting node and the model training node.
- the first transmission control module 130 is used to formulate a data transmission scheme according to the characteristics of the data to be transmitted and the wireless communication conditions, and to package the data to be transmitted according to the data transmission scheme. The transmission control module is required only in the embodiment in which the model requesting node is the base station and the model training node is the terminal.
- the data processing and storage module is used to manage local data, generate training sample characteristic information, collect data to generate a local training data set, and store the data set.
- the model training and compression module is used for model training using the local data set, and compressing the model according to the information required for model compression obtained in the training process.
- FIG. 13 is a block diagram of a training apparatus 200 according to an exemplary embodiment.
- the apparatus is applied to a second node, and includes a second network communication module 210 , a second transmission control module 220 , a service management module 230 and a model calculation and update module 240 .
- the second network communication module 210 is configured to send a model training request.
- the model training request includes model compression parameters, and the model compression parameters are used to compress the first training model to obtain the first compression model, and the first training model is obtained by training based on the model training request.
- the model compression parameters include a model training mode, which includes a single training node mode for training a single first training model and a multi-training node mode for training multiple first training models.
- the number of first training models is determined based on the model training mode.
- the second network communication module 210 is further configured to receive a second indication message, where the second indication message includes a number of first compressed models corresponding to the model training mode.
- the second network communication module 210 is further configured to send a third instruction message, where the third instruction message includes an instruction to determine the training model.
- in the case that the model training mode includes a multi-training node mode, the second network communication module 210 is further configured to send a fourth indication message.
- the fourth indication message is used to indicate a third compression model, where the third compression model is a compression model obtained by performing a federated average of the first compression models based on the number of the first training models.
- the second network communication module 210 is further configured to send a fifth instruction message, where the fifth instruction message is used to instruct the end of training the first compression model.
- the apparatus further includes a service management module 230 .
- the service management module 230 is configured to receive subscription requirements and send a model training request based on the subscription requirements.
- the second network communication module 210 is used for data transmission and control signaling interaction between the model requesting node and the model training node.
- the second transmission control module 220 is configured to formulate a data transmission scheme according to the characteristics of the data to be transmitted and the wireless communication conditions, and to package the data to be transmitted according to the data transmission scheme. The transmission control module is required only in the embodiment in which the model requesting node is the base station and the model training node is the terminal.
- the service management module 230 is used to process model/analysis subscription requests, initiate model training requests to model training nodes, formulate model structures, model training modes and model compression options, and check whether the obtained models meet model/analysis subscription requirements.
- the model calculation and update module 240 is used for performing federated averaging on the compressed models sent from multiple model training nodes in a multi-training node mode to obtain a global model, and distributing the global model to the model training nodes.
- a model training node device for deep learning model training and compression oriented to a wireless network is responsible for: responding to a model training request from a model requesting node and reporting local resource information; selecting an appropriate model compression option; and performing model training and compression according to the model training mode and the selected model compression option.
- FIG. 14 is a block diagram of an apparatus 300 for training according to an exemplary embodiment.
- apparatus 300 may be a mobile phone, computer, digital broadcast terminal, messaging device, game console, tablet device, medical device, fitness device, personal digital assistant, and the like.
- apparatus 300 may include one or more of the following components: processing component 302, memory 304, power component 306, multimedia component 308, audio component 310, input/output (I/O) interface 312, sensor component 314, and communication component 316.
- the processing component 302 generally controls the overall operation of the device 300, such as operations associated with display, phone calls, data communications, camera operations, and recording operations.
- the processing component 302 may include one or more processors 320 to execute instructions to perform all or some of the steps of the methods described above. Additionally, processing component 302 may include one or more modules that facilitate interaction between processing component 302 and other components. For example, processing component 302 may include a multimedia module to facilitate interaction between multimedia component 308 and processing component 302 .
- Memory 304 is configured to store various types of data to support operations at device 300. Examples of such data include instructions for any application or method operating on device 300, contact data, phonebook data, messages, pictures, videos, and the like. Memory 304 may be implemented by any type of volatile or non-volatile storage device or a combination thereof, such as static random access memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic disk, or optical disk.
- Power component 306 provides power to various components of device 300 .
- Power components 306 may include a power management system, one or more power sources, and other components associated with generating, managing, and distributing power to device 300 .
- Multimedia component 308 includes screens that provide an output interface between the device 300 and the user.
- the screen may include a liquid crystal display (LCD) and a touch panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive input signals from a user.
- the touch panel includes one or more touch sensors to sense touch, swipe, and gestures on the touch panel. The touch sensor may not only sense the boundaries of a touch or swipe action, but also detect the duration and pressure associated with the touch or swipe action.
- the multimedia component 308 includes a front-facing camera and/or a rear-facing camera. When the apparatus 300 is in an operation mode, such as a shooting mode or a video mode, the front camera and/or the rear camera may receive external multimedia data. Each of the front and rear cameras can be a fixed optical lens system or have focal length and optical zoom capability.
- Audio component 310 is configured to output and/or input audio signals.
- audio component 310 includes a microphone (MIC) that is configured to receive external audio signals when device 300 is in operating modes, such as call mode, recording mode, and voice recognition mode. The received audio signal may be further stored in memory 304 or transmitted via communication component 316 .
- audio component 310 also includes a speaker for outputting audio signals.
- the I/O interface 312 provides an interface between the processing component 302 and a peripheral interface module, which may be a keyboard, a click wheel, a button, or the like. These buttons may include, but are not limited to: home button, volume buttons, start button, and lock button.
- Sensor assembly 314 includes one or more sensors for providing status assessment of various aspects of device 300 .
- the sensor assembly 314 can detect the open/closed state of the device 300 and the relative positioning of components, such as the display and keypad of the device 300; the sensor assembly 314 can also detect a change in the position of the device 300 or a component of the device 300, the presence or absence of user contact with the device 300, the orientation or acceleration/deceleration of the device 300, and a change in the temperature of the device 300.
- Sensor assembly 314 may include a proximity sensor configured to detect the presence of nearby objects in the absence of any physical contact.
- Sensor assembly 314 may also include a light sensor, such as a CMOS or CCD image sensor, for use in imaging applications.
- the sensor assembly 314 may also include an acceleration sensor, a gyroscope sensor, a magnetic sensor, a pressure sensor, or a temperature sensor.
- Communication component 316 is configured to facilitate wired or wireless communication between apparatus 300 and other devices.
- Device 300 may access wireless networks based on communication standards, such as WiFi, 2G or 3G, or a combination thereof.
- the communication component 316 receives broadcast signals or broadcast related information from an external broadcast management system via a broadcast channel.
- the communication component 316 also includes a near field communication (NFC) module to facilitate short-range communication.
- the NFC module may be implemented based on radio frequency identification (RFID) technology, infrared data association (IrDA) technology, ultra-wideband (UWB) technology, Bluetooth (BT) technology and other technologies.
- apparatus 300 may be implemented by one or more application specific integrated circuits (ASICs), digital signal processors (DSPs), digital signal processing devices (DSPDs), programmable logic devices (PLDs), field programmable gate arrays (FPGAs), controllers, microcontrollers, microprocessors, or other electronic components, for performing the above method.
- In an exemplary embodiment, a non-transitory computer-readable storage medium including instructions is also provided, such as the memory 304 including instructions executable by the processor 320 of the apparatus 300 to perform the method described above.
- the non-transitory computer-readable storage medium may be ROM, random access memory (RAM), CD-ROM, magnetic tape, floppy disk, optical data storage device, and the like.
- FIG. 15 is a block diagram of an apparatus 400 for training according to an exemplary embodiment.
- the apparatus 400 may be provided as a server.
- apparatus 400 includes processing component 422, which further includes one or more processors, and a memory resource represented by memory 432 for storing instructions executable by processing component 422, such as an application program.
- An application program stored in memory 432 may include one or more modules, each corresponding to a set of instructions.
- the processing component 422 is configured to execute instructions to perform the training method described above.
- Device 400 may also include a power supply assembly 426 configured to perform power management of device 400, a wired or wireless network interface 450 configured to connect device 400 to a network, and an input/output (I/O) interface 458.
- Device 400 may operate based on an operating system stored in memory 432, such as Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™, or the like.
- The terms first, second, and the like are used to describe various types of information, but the information should not be limited by these terms. These terms are only used to distinguish information of the same type from one another and do not imply a particular order or level of importance; in fact, the expressions "first", "second", and so on are used entirely interchangeably.
- the first information may also be referred to as the second information, and similarly, the second information may also be referred to as the first information, without departing from the scope of the present disclosure.
Abstract
The present disclosure relates to a training method, a training apparatus, and a storage medium. The training method comprises: in response to receiving a model training request, training a first training model, wherein the model training request comprises model compression parameters; and on the basis of the first training model and the model compression parameters, obtaining a first compression model of the first training model. Thus, a compression model can have the same effect as a training model, so that signaling overheads during model transmission are reduced, the accuracy and reliability of the model can be guaranteed, and the security of user information is further ensured.
Description
The present disclosure relates to the field of wireless communication technologies, and in particular, to a training method, a training apparatus, and a storage medium.
To meet the needs of multi-service scenarios, communication networks must provide ultra-high data rates, ultra-low latency, ultra-high reliability, and a very large number of connections. These service scenarios, together with the corresponding requirements and the characteristics of the communication network, bring unprecedented challenges to the deployment, operation, and maintenance of communication networks.
In the related art, artificial intelligence is introduced to improve the resource utilization of communication networks, the terminal service experience, and the automated and intelligent control and management of communication networks, and models obtained through deep learning can achieve better performance. However, their high storage and computing resource consumption makes them difficult to apply effectively on various hardware platforms, and the communication overhead is large, the precision is low, and the security is poor.
SUMMARY OF THE INVENTION
In order to overcome the problems existing in the related art, the present disclosure provides a training method, a training apparatus, and a storage medium.
According to a first aspect of the embodiments of the present disclosure, a training method is provided, applied to a first node. The method includes:
in response to receiving a model training request, training a first training model, wherein the model training request includes model compression parameters; and obtaining a first compression model of the first training model based on the first training model and the model compression parameters.
In one embodiment, the model compression parameters include a plurality of model compression options.
Obtaining the first compression model of the first training model based on the first training model and the model compression parameters includes:
determining a first model compression option among the plurality of model compression options, and compressing the first training model based on the first model compression option to obtain a second compression model; determining a first loss function according to the output of the first training model, the output of the second compression model, and the sample parameter set used to train the first training model; and updating the parameters of the second compression model based on the first loss function to obtain the first compression model.
In one embodiment, determining the first loss function according to the output of the first training model, the output of the second compression model, and the sample parameter set used to train the first training model includes:
determining a first cross entropy between the output of the second compression model and the sample parameter set, and determining a first relative entropy divergence between the output of the second compression model and the output of the first training model; and determining the first loss function based on the first cross entropy and the first relative entropy divergence.
In one embodiment, the method further includes:
determining a second loss function for updating the parameters of the first training model according to the output of the first training model, the output of the second compression model, and the sample parameter set used to train the first training model.
In one embodiment, determining the second loss function for updating the parameters of the first training model according to the output of the first training model, the output of the second compression model, and the sample parameter set used for training the first training model includes:
determining a second cross entropy between the output of the first training model and the sample parameter set, and determining a second relative entropy divergence between the output of the first training model and the output of the second compression model; and determining the second loss function based on the second cross entropy and the second relative entropy divergence.
In one embodiment, the model compression parameters include a model training mode, and the model training mode includes a single training node mode for training a single first training model and a multi-training node mode for training a plurality of first training models.
The number of the first training models is determined based on the model training mode.
In one embodiment, the method further includes:
sending a second indication message, where the second indication message includes a number of first compressed models corresponding to the model training mode.
In one embodiment, the method further includes:
receiving a third indication message, where the third indication message includes an indication of determining the training model.
In one embodiment, the model training mode includes a multi-training node mode, and the method further includes:
receiving a fourth indication message, where the fourth indication message is used to indicate a third compression model, and the third compression model is a compression model obtained by performing federated averaging on the first training model based on the number of the first compression models; and based on the third compression model, re-determining the model compression parameters, and updating the first compression model based on the re-determined model compression parameters.
In one embodiment, the method further includes:
receiving a fifth indication message, where the fifth indication message is used to indicate ending the training of the first compression model.
According to a second aspect of the embodiments of the present disclosure, a training method is provided, applied to a second node. The method includes:
sending a model training request, where the model training request includes model compression parameters, the model compression parameters are used to compress a first training model to obtain a first compression model, and the first training model is obtained by training based on the model training request.
In one embodiment, the model compression parameters include a model training mode, and the model training mode includes a single training node mode for training a single first training model and a multi-training node mode for training a plurality of first training models.
The number of the first training models is determined based on the model training mode.
In one embodiment, the method further includes:
receiving a second indication message, where the second indication message includes a number of first compressed models corresponding to the model training mode.
In one embodiment, the method further includes:
sending a third indication message, where the third indication message includes an indication of determining the training model.
In one embodiment, the model training mode includes a multi-training node mode, and the method further includes:
sending a fourth indication message, where the fourth indication message is used to indicate a third compression model, and the third compression model is a compression model obtained by performing federated averaging on the first compression models based on the number of the first training models.
In one embodiment, the method further includes:
sending a fifth indication message, where the fifth indication message is used to indicate ending the training of the first compression model.
In one embodiment, the method further includes:
receiving a subscription requirement, and sending the model training request based on the subscription requirement.
According to a third aspect of the embodiments of the present disclosure, a training apparatus is provided, applied to a first node. The apparatus includes:
a model training and compression module, configured to train a first training model in response to receiving a model training request, where the model training request includes model compression parameters, and to obtain a first compression model of the first training model based on the first training model and the model compression parameters.
In one embodiment, the model compression parameters include a plurality of model compression options.
The model training and compression module is configured to determine a first model compression option among the plurality of model compression options, and compress the first training model based on the first model compression option to obtain a second compression model; determine a first loss function according to the output of the first training model, the output of the second compression model, and the sample parameter set used to train the first training model; and update the parameters of the second compression model based on the first loss function to obtain the first compression model.
In one embodiment, the apparatus further includes a data processing and storage module.
The data processing and storage module is configured to determine a first cross entropy between the output of the second compression model and the sample parameter set, and determine a first relative entropy divergence between the output of the second compression model and the output of the first training model; and determine the first loss function based on the first cross entropy and the first relative entropy divergence.
In one embodiment, the data processing and storage module is further configured to determine, according to the output of the first training model, the output of the second compression model, and the sample parameter set used for training the first training model, a second loss function for updating the parameters of the first training model.
In one embodiment, the data processing and storage module is further configured to determine a second cross entropy between the output of the first training model and the sample parameter set, and determine a second relative entropy divergence between the output of the first training model and the output of the second compression model; and determine the second loss function based on the second cross entropy and the second relative entropy divergence.
In one embodiment, the model compression parameters include a model training mode, and the model training mode includes a single training node mode for training a single first training model and a multi-training node mode for training a plurality of first training models; the number of the first training models is determined based on the model training mode.
In one embodiment, the apparatus further includes a first network communication module.
The first network communication module is configured to send a second indication message, where the second indication message includes a number of first compressed models corresponding to the model training mode.
In one embodiment, the first network communication module is further configured to receive a third indication message, where the third indication message includes an indication of determining the training model.
In one embodiment, the first network communication module is further configured to receive a fourth indication message, where the fourth indication message is used to indicate a third compression model, and the third compression model is a compression model obtained by performing federated averaging on the first training model based on the number of the first compression models; and based on the third compression model, re-determine the model compression parameters and update the first compression model based on the re-determined model compression parameters.
In one embodiment, the first network communication module is further configured to receive a fifth indication message, where the fifth indication message is used to indicate ending the training of the first compression model.
According to a fourth aspect of the embodiments of the present disclosure, a training apparatus is provided, applied to a second node. The apparatus includes:
a second network communication module, configured to send a model training request, where the model training request includes model compression parameters, the model compression parameters are used to compress a first training model to obtain a first compression model, and the first training model is obtained by training based on the model training request.
In one embodiment, the model compression parameters include a model training mode, and the model training mode includes a single training node mode for training a single first training model and a multi-training node mode for training a plurality of first training models.
The number of the first training models is determined based on the model training mode.
In one embodiment, the second network communication module is further configured to receive a second indication message, where the second indication message includes a number of first compressed models corresponding to the model training mode.
In one embodiment, the second network communication module is further configured to send a third indication message, where the third indication message includes an indication of determining the training model.
In one embodiment, the model training mode includes a multi-training node mode, and the second network communication module is further configured to send a fourth indication message, where the fourth indication message is used to indicate a third compression model, and the third compression model is a compression model obtained by performing federated averaging on the first compression models based on the number of the first training models.
In one embodiment, the second network communication module is further configured to send a fifth indication message, where the fifth indication message is used to indicate ending the training of the first compression model.
In one embodiment, the apparatus further includes a service management module.
The service management module is configured to receive a subscription requirement and send the model training request based on the subscription requirement.
根据本公开实施例的第五方面,提供一种训练装置,包括:According to a fifth aspect of the embodiments of the present disclosure, there is provided a training device, comprising:
处理器;用于存储处理器可执行指令的存储器;其中,所述处理器被配置为:执行第一方面或第一方面任意一种实施方式中所述的训练方法,或执行第二方面或第二方面任意一种实施方式中所述的训练方法。a processor; a memory for storing processor-executable instructions; wherein the processor is configured to: execute the first aspect or the training method described in any implementation manner of the first aspect, or execute the second aspect or The training method described in any one of the implementation manners of the second aspect.
根据本公开实施例的第六方面,提供一种非临时性计算机可读存储介质,当所述存储介质中的指令由移动终端的处理器执行时,使得移动终端能够执行第一方面或第一方面任意一种实施方式中所述的训练方法,或执行第二方面或第二方面任意一种实施方式中所述的训练方法。According to a sixth aspect of the embodiments of the present disclosure, there is provided a non-transitory computer-readable storage medium, which enables the mobile terminal to execute the first aspect or the first aspect when instructions in the storage medium are executed by a processor of a mobile terminal. The training method described in any one of the embodiments of the aspect, or the training method described in the second aspect or any one of the embodiments of the second aspect is performed.
The technical solutions provided by the embodiments of the present disclosure may have the following beneficial effects: in the present disclosure, the trained model is compressed and the parameters of the compressed model are updated, so that the compressed model can achieve the same effect as the training model. This reduces the signaling overhead when transmitting the model, ensures the accuracy and reliability of the model, and further ensures the security of user information.
应当理解的是,以上的一般描述和后文的细节描述仅是示例性和解释性的,并不能限制本公开。It is to be understood that the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the present disclosure.
此处的附图被并入说明书中并构成本说明书的一部分,示出了符合本公开的实施例,并与说明书一起用于解释本公开的原理。The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the disclosure and together with the description serve to explain the principles of the disclosure.
FIG. 1 is a schematic diagram of the system architecture of a training method provided by the present disclosure.
图2是根据一示例性实施例示出的一种训练方法的流程图。Fig. 2 is a flowchart of a training method according to an exemplary embodiment.
图3是根据一示例性实施例示出的另一种训练方法的流程图。Fig. 3 is a flowchart of another training method according to an exemplary embodiment.
图4是根据一示例性实施例示出的又一种训练方法的流程图。Fig. 4 is a flowchart of yet another training method according to an exemplary embodiment.
图5是根据一示例性实施例示出的又一种训练方法的流程图。Fig. 5 is a flowchart of yet another training method according to an exemplary embodiment.
图6为本公开提供的一种训练方法中单训练节点模式确定第一压缩模型的实施方式流程图。FIG. 6 is a flowchart of an implementation manner of determining a first compression model in a single training node mode in a training method provided by the present disclosure.
图7为本公开提供的一种训练方法中多训练节点模式确定第一压缩模型的实施方式流程图。FIG. 7 is a flowchart of an implementation manner of determining a first compression model in a multi-training node mode in a training method provided by the present disclosure.
图8为本公开提供的一种训练方法中模型训练和压缩决策部分的协议和接口原理图。FIG. 8 is a schematic diagram of the protocol and interface of the model training and compression decision part in a training method provided by the present disclosure.
图9为本公开提供的一种训练方法中单训练节点模式下模型训练和压缩部分的协议和接口原理图。FIG. 9 is a schematic diagram of the protocol and interface of the model training and compression part in a single training node mode in a training method provided by the present disclosure.
图10为本公开提供的一种训练方法中多训练节点模式下模型训练和压缩部分的协议和接口原理图。FIG. 10 is a schematic diagram of a protocol and an interface of a model training and compression part in a multi-training node mode in a training method provided by the present disclosure.
图11为本公开提供的一种训练方法中无线数据传输部分的协议和接口原理图。FIG. 11 is a schematic diagram of a protocol and interface of a wireless data transmission part in a training method provided by the present disclosure.
图12是根据一示例性实施例示出的一种训练装置的框图。Fig. 12 is a block diagram of a training apparatus according to an exemplary embodiment.
图13是根据一示例性实施例示出的另一种训练装置的框图。Fig. 13 is a block diagram of another training apparatus according to an exemplary embodiment.
图14是根据一示例性实施例示出的一种用于训练的装置的框图。Fig. 14 is a block diagram of an apparatus for training according to an exemplary embodiment.
图15是根据一示例性实施例示出的另一种用于训练的装置的框图。Fig. 15 is a block diagram of another apparatus for training according to an exemplary embodiment.
这里将详细地对示例性实施例进行说明,其示例表示在附图中。下面的描述涉及附图时,除非另有表示,不同附图中的相同数字表示相同或相似的要素。以下示例性实施例中所描述的实施方式并不代表与本公开相一致的所有实施方式。相反,它们仅是与如所附权利要求书中所详述的、本公开的一些方面相一致的装置和方法的例子。Exemplary embodiments will be described in detail herein, examples of which are illustrated in the accompanying drawings. Where the following description refers to the drawings, the same numerals in different drawings refer to the same or similar elements unless otherwise indicated. The implementations described in the illustrative examples below are not intended to represent all implementations consistent with this disclosure. Rather, they are merely examples of apparatus and methods consistent with some aspects of the present disclosure as recited in the appended claims.
To meet the needs of multi-service scenarios, communication networks feature ultra-high data rates, ultra-low latency, ultra-high reliability, and massive connectivity. These service scenarios, together with the corresponding requirements and network characteristics, pose unprecedented challenges to the deployment, operation, and maintenance of communication networks.
Breakthroughs in artificial intelligence technology, especially the enrichment of deep learning algorithms, the improvement of hardware computing capabilities, and the introduction of massive data into new-generation communication networks, provide strong support for new-generation network intelligence. Artificial intelligence can further improve the resource utilization of the communication network, improve the terminal service experience, and realize automated and intelligent control and management of the communication network.
In the related art, the process of training a model with a deep learning algorithm is as follows: the model requesting node determines the model structure and the model training mode according to the model/analysis subscription requirements, where the model training mode includes a single training node mode and a multi-training node mode. The model requesting node sends the model structure and the model training mode to the model training node, and the model training node either conducts model training independently or participates in collaborative training across multiple training nodes, depending on the training mode. After training is completed, each model training node sends its model to the model requesting node; in the multi-training node mode, the model requesting node performs federated averaging on the models sent by the model training nodes to obtain a global model. The model requesting node then checks whether the resulting model meets the model/analysis subscription requirements; if so, it sends the model to the model/analysis subscriber; if not, the above training process is repeated until the requirements are met. The related art therefore has the following shortcomings:
(1) The data volume of a model is relatively large; especially in the multi-training node mode, the model must be transmitted multiple times between the model training nodes and the model requesting node, which greatly increases the communication overhead.
(2) The large amount of model data transmitted between the model training node and the model requesting node aggravates the shortage of wireless resources, which increases the probability of data transmission errors and reduces the reliability of the model received by the model requesting node, so the model accuracy cannot be guaranteed.
(3) Sending the model trained on local data by the model training node to the model requesting node without any processing increases the risk that, if the model is maliciously intercepted in the network, information about the terminal and network data can be inferred from it in reverse, so the security of terminal private data cannot be guaranteed.
Based on the above shortcomings of the related art, the present disclosure provides a training method to address the problems of high communication overhead, insufficient model accuracy, and the security of terminal private data. The training method provided by the present disclosure determines the model structure and the model training mode according to network service requirements (for example, model subscription requirements), and, fully considering factors such as the computing power available locally at the model training node, its communication conditions, and the characteristics of its training samples, formulates multiple model compression options, so as to reduce unnecessary communication overhead, improve the utilization of wireless network resources, and apply deep learning to network intelligence in a more efficient and secure manner.
FIG. 1 is a schematic diagram of the system architecture of a training method provided by the present disclosure. As shown in FIG. 1, the system includes a core network part and a radio access network part. Terminals (users) access base stations through wireless channels; base stations are connected to each other through the Xn interface; a base station accesses the User Plane Function (UPF) network element of the core network through the N3 interface; the UPF network element accesses the Session Management Function (SMF) network element through the N4 interface; and the SMF network element accesses the bus structure of the core network and is connected with other Network Functions (NFs) of the core network.
It can be understood that the communication system of network devices and terminals shown in FIG. 1 is only a schematic illustration; the wireless communication system may also include other network devices, such as wireless relay devices and wireless backhaul devices, which are not shown in FIG. 1. The embodiments of the present disclosure do not limit the number of network devices and the number of terminals included in the wireless communication system.
It can be further understood that the wireless communication system of the embodiments of the present disclosure is a network that provides wireless communication functions. The wireless communication system may adopt different communication technologies, such as code division multiple access (CDMA), wideband code division multiple access (WCDMA), time division multiple access (TDMA), frequency division multiple access (FDMA), orthogonal frequency-division multiple access (OFDMA), single-carrier FDMA (SC-FDMA), and carrier sense multiple access with collision avoidance. According to factors such as capacity, rate, and latency, networks can be classified into 2G, 3G, 4G, or future evolved networks such as 5G networks; a 5G network may also be called a new radio (NR) network. For convenience of description, the present disclosure sometimes refers to the wireless communication network simply as the network.
Further, the network device involved in the present disclosure may also be referred to as a radio access network device. The radio access network device may be a base station, an evolved NodeB (eNB), a home base station, an access point (AP) in a wireless fidelity (WiFi) system, a wireless relay node, a wireless backhaul node, a transmission point (TP), or a transmission and reception point (TRP); it may also be a gNB in an NR system, or a component or part of the equipment constituting a base station, and so on. In a vehicle-to-everything (V2X) communication system, the network device may also be an in-vehicle device. It should be understood that the embodiments of the present disclosure do not limit the specific technology and specific device form adopted by the network device.
Further, the terminal involved in the present disclosure may also be referred to as a terminal device, user equipment (UE), a mobile station (MS), a mobile terminal (MT), and so on, and is a device that provides voice and/or data connectivity to a user; for example, the terminal may be a handheld device or an in-vehicle device with a wireless connection function. At present, examples of terminals include smartphones, pocket personal computers (PPCs), palmtop computers, personal digital assistants (PDAs), notebook computers, tablet computers, wearable devices, and in-vehicle devices. In addition, in a vehicle-to-everything (V2X) communication system, the terminal device may also be an in-vehicle device. It should be understood that the embodiments of the present disclosure do not limit the specific technology and specific device form adopted by the terminal.
图2是根据一示例性实施例示出的一种训练方法的流程图。如图2所示,训练方法用于第一节点中,包括以下步骤。Fig. 2 is a flow chart of a training method according to an exemplary embodiment. As shown in Figure 2, the training method is used in the first node and includes the following steps.
在步骤S11中,响应于接收到模型训练请求,训练第一训练模型。In step S11, in response to receiving a model training request, a first training model is trained.
In the embodiments of the present disclosure, the first node is a model training node, which for ease of description is referred to as the first node; similarly, the second node is a model requesting node, which for ease of description is referred to as the second node. The model training request includes model compression parameters.
该模型压缩参数中至少包括以下一种:The model compression parameters include at least one of the following:
模型训练结构,多个模型压缩选项,模型训练模式。Model training structure, multiple model compression options, model training modes.
In this embodiment of the present disclosure, the model compression options are determined based on the model subscription requirements received by the second node (for example, the model requesting node). The second node determines to send the model training request according to the received model subscription requirements. After receiving the model training request sent by the second node, the first node (for example, the model training node) sends third indication information in response to the model training request, determines to train the first training model based on the local sample parameter set and the model training structure, and determines the relevant parameters required for model compression. The response information sent by the first node further includes one or more of the first node's local computing capability, communication conditions, and characteristics of its training sample parameter set.
在步骤S12中,基于第一训练模型和模型压缩参数,得到第一训练模型的第一压缩模型。In step S12, a first compression model of the first training model is obtained based on the first training model and the model compression parameters.
在本公开实施例中,第一节点基于模型压缩参数中的模型压缩选项以及模型压缩所需的相关参数对第一训练模型进行压缩。其中,模型压缩所需的相关参数为第一节点基于第二节点发送的模型压缩参数以及本地计算能力等参数确定的,以及模型压缩选项中包括模型精度、模型参数数据量。模型压缩参数包括多个模型压缩选项,多个模型压缩选项是第二节点基于多个第一节点上报的本地计算能力、通信条件、训练样本参数集特性等其中的一种或几种确定的。In the embodiment of the present disclosure, the first node compresses the first training model based on the model compression option in the model compression parameters and the relevant parameters required for model compression. The relevant parameters required for model compression are determined by the first node based on parameters such as model compression parameters and local computing capabilities sent by the second node, and the model compression options include model accuracy and model parameter data volume. The model compression parameters include multiple model compression options, and the multiple model compression options are determined by the second node based on one or more of local computing capabilities, communication conditions, and training sample parameter set characteristics reported by multiple first nodes.
图3是根据一示例性实施例示出的一种训练方法的流程图。如图3所示,基于第一训练模型和模型压缩参数,得到第一训练模型的压缩模型,包括以下步骤。Fig. 3 is a flowchart of a training method according to an exemplary embodiment. As shown in FIG. 3 , based on the first training model and the model compression parameters, a compression model of the first training model is obtained, including the following steps.
在步骤S21中,在多个模型压缩选项中确定第一模型压缩选项,并基于第一模型压缩选项对第一训练模型进行压缩,得到第二压缩模型。In step S21, a first model compression option is determined among the multiple model compression options, and the first training model is compressed based on the first model compression option to obtain a second compression model.
In this embodiment of the present disclosure, the first node selects, from the multiple model compression options, a first model compression option according to one or more of its local computing capability, communication conditions, and training samples. Based on the model accuracy included in the first model compression option and the relevant parameters required for model compression determined during training, a matrix representing the contribution of each channel in the network to the model accuracy is determined and denoted by the symbol $g$. The first training model is compressed according to the model parameter data volume required by the compression option to obtain the second compression model, denoted by the symbol $\theta_S$. Compressing the first training model using the matrix $g$ and the first model compression option may be implemented as follows:
Taking the model parameter data volume as a constraint, the first node designs a pruning matrix $X$ to retain the channels that contribute most to the model accuracy. Taking the element sum of each column of the pruning matrix $X$ as the unknown, the first node retains, according to the magnitudes of the elements in each column of $g$, the channels corresponding to the largest column elements, and prunes the other channels. After the pruning matrix $X$ is obtained, $X$ is used to prune $\theta$ to obtain the second compression model $\theta_S$.
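To make the channel-selection step above concrete, here is a minimal sketch, assuming the column sums of $g$ have been collapsed into a per-channel contribution score and that a simple greedy selection satisfies the parameter-budget constraint; all function and variable names are illustrative, not defined by the present disclosure.

```python
import numpy as np

def build_pruning_mask(g: np.ndarray, param_budget: int,
                       params_per_channel: np.ndarray) -> np.ndarray:
    """Select channels with the largest contribution scores in g until the
    model parameter data volume constraint (param_budget) is reached.
    Returns a 0/1 mask X over channels: 1 = keep, 0 = prune."""
    order = np.argsort(g)[::-1]              # channels by descending contribution
    mask = np.zeros_like(g, dtype=np.int64)
    used = 0
    for ch in order:
        if used + params_per_channel[ch] <= param_budget:
            mask[ch] = 1                     # keep this channel
            used += params_per_channel[ch]
    return mask

def prune_weights(theta: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Apply the mask X to a weight matrix theta whose columns correspond
    to channels, yielding the compressed parameters theta_S."""
    return theta[:, mask.astype(bool)]

# Example: 6 channels, keep the highest-contribution channels within 3000 params.
g = np.array([0.9, 0.1, 0.5, 0.7, 0.2, 0.4])
params = np.array([1000, 1000, 1000, 1000, 1000, 1000])
X = build_pruning_mask(g, param_budget=3000, params_per_channel=params)
theta = np.random.randn(8, 6)
theta_S = prune_weights(theta, X)            # shape (8, 3)
```

In practice the constraint could be solved more exactly (for example, as a knapsack problem), but greedy selection by contribution score is a common approximation.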
In the embodiments of the present disclosure, the first node selects an appropriate model compression option, compresses the training model according to that option, and then transmits it to the second node. This reduces the data volume of the model as much as possible while retaining most of the accuracy of the deep learning model. Because model compression is performed according to the communication rate requirements of the model training node, this method greatly reduces the communication overhead of uplink model transmission.
在步骤S22中,根据第一训练模型的输出、第二压缩模型的输出、以及用于训练第一训练模型的样本参数集,确定第一损失函数。In step S22, a first loss function is determined according to the output of the first training model, the output of the second compression model, and the sample parameter set used for training the first training model.
In this embodiment of the present disclosure, the sample training parameter set further includes a sample verification parameter set, and at least one input-output data pair of the sample verification parameter set is determined. The first node feeds these input-output data pairs into the first training model and the second compression model, and obtains the output of the first training model, the output of the second compression model, and the corresponding output from the sample verification parameter set, where the latter is the true value corresponding to the model input.
Further, a first cross entropy between the output of the second compression model and the sample parameter set (that is, the true values of the input-output data pairs of the sample verification parameter set) is determined, a first relative entropy divergence between the output of the second compression model and the output of the first training model is determined, and the sum of the first cross entropy and the first relative entropy divergence is taken as the loss function of the second compression model. For ease of distinction, the present disclosure refers to the loss function of the second compression model as the first loss function. As in the above implementation, multiple first loss functions are determined based on multiple input-output data pairs in the sample parameter set, the average of the multiple first loss functions is computed, and the parameters of the second compression model are updated by gradient descent according to this average, yielding the first compression model.
The first loss function (that is, the loss function of the second compression model) is expressed by the following formula:

$$L_{\theta_S} = L_{C_S} + D_{KL}(p_S \,\|\, p_2)$$

where $L_{\theta_S}$ is the loss function of the second compression model; $L_{C_S}$ is the first cross entropy between the output value of the second compression model and the true values of the input-output data pairs of the sample verification parameter set; $D_{KL}(p_S \,\|\, p_2)$ is the first relative entropy divergence between the output value of the first training model and the output value of the second compression model; $p_S$ is the output value of the second compression model; and $p_2$ is the output value of the first training model.
In the embodiments of the present disclosure, it should be noted that the first training model here is the first training model whose parameters have been updated based on its own loss function. In other words, in the embodiments of the present disclosure, the loss function of the first training model is determined first, the parameters of the first training model are updated based on that loss function, and then the loss function of the second compression model (that is, the first loss function) is determined. For ease of distinction, the present disclosure refers to the loss function of the first training model as the second loss function. As described above, the sample training parameter set further includes a sample verification parameter set, and the first node determines at least one input-output data pair of the sample verification parameter set. The first node feeds the input of each data pair into the first training model and the second compression model, and obtains the output of the first training model, the output of the second compression model, and the corresponding output from the sample verification parameter set, where the latter is the true value corresponding to the model input.
Further, a second cross entropy between the output of the first training model and the sample parameter set (that is, the true values of the input-output data pairs of the sample verification parameter set) is determined, a second relative entropy divergence between the output of the first training model and the output of the second compression model is determined, and the sum of the second cross entropy and the second relative entropy divergence is taken as the second loss function. As in the above implementation, multiple second loss functions are determined based on multiple input-output data pairs in the sample parameter set, their average is computed, and the parameters of the first training model are updated by gradient descent according to this average, yielding the updated first training model.
其中,第二损失函数(即第一训练模型的损失函数)采用如下公式表示:Wherein, the second loss function (that is, the loss function of the first training model) is expressed by the following formula:
$$L_{\theta} = L_C + D_{KL}(p_1 \,\|\, p_S)$$

where $L_{\theta}$ is the loss function of the first training model; $L_C$ is the second cross entropy between the output value of the first training model and the true values of the input-output data pairs of the sample verification parameter set; $D_{KL}(p_1 \,\|\, p_S)$ is the relative entropy divergence between the output value of the first training model and the output value of the second compression model; $p_S$ is the output value of the second compression model; and $p_1$ is the output value of the first training model.
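The alternating update described above (first the training model with $L_{\theta}$, then the compressed model with $L_{\theta_S}$) resembles mutual distillation. The following PyTorch-style sketch shows one such step for a classification task; the module names, optimizers, and the use of softmax outputs are assumptions for illustration, not the normative procedure of the present disclosure.

```python
import torch
import torch.nn.functional as F

def mutual_update(model, compressed, batch_x, batch_y, opt_t, opt_s):
    """One alternating update: first the training model (teacher) with
    L_theta = CE + D_KL(p1 || pS), then the compressed model (student)
    with L_theta_S = CE + D_KL(pS || p2), as described in the text."""
    # --- update the first training model ---
    logits_t = model(batch_x)
    with torch.no_grad():
        p_s = F.softmax(compressed(batch_x), dim=-1)   # peer output held fixed
    p_t = F.softmax(logits_t, dim=-1)
    # F.kl_div(log_q, p) computes D_KL(p || q), hence the argument order below.
    loss_t = F.cross_entropy(logits_t, batch_y) \
           + F.kl_div(p_s.log(), p_t, reduction="batchmean")   # D_KL(p1 || pS)
    opt_t.zero_grad()
    loss_t.backward()
    opt_t.step()

    # --- update the second compression model with the refreshed teacher ---
    logits_s = compressed(batch_x)
    with torch.no_grad():
        p_t = F.softmax(model(batch_x), dim=-1)        # peer output held fixed
    p_s = F.softmax(logits_s, dim=-1)
    loss_s = F.cross_entropy(logits_s, batch_y) \
           + F.kl_div(p_t.log(), p_s, reduction="batchmean")   # D_KL(pS || p2)
    opt_s.zero_grad()
    loss_s.backward()
    opt_s.step()
    return loss_t.item(), loss_s.item()
```

Detaching the peer model's output when computing each KL term ensures that each step only updates the model whose loss is being minimized.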
在本公开实施例中,模型压缩参数中的模型训练模式包括用于训练单个第一训练模型的单训练节点模式和用于训练多个第一训练模型的多训练节点模式。In an embodiment of the present disclosure, the model training modes in the model compression parameters include a single training node mode for training a single first training model and a multi-training node mode for training multiple first training models.
The first node determines the number of first training models to train according to the model training mode included in the model training parameters. If the model training mode is the single training node mode, one first training model is trained based on a single first node, in the manner described above. If the model training mode is the multi-training node mode, multiple first training models are trained based on multiple first nodes, and different sequence labels are set for the multiple first nodes to train the multiple first training models. The multi-training node mode is described below, taking the m-th model training node (that is, the m-th first node) as an example.
In this embodiment of the present disclosure, the first node selects, from the multiple model compression options, a first model compression option according to one or more of its local computing capability, communication conditions, and training samples. Based on the model accuracy included in the first model compression option and the relevant parameters required for model compression determined during training, a matrix representing the contribution of each channel in the network to the model accuracy is determined and denoted by the symbol $g$. The first training model is compressed according to the model parameter data volume required by the compression option to obtain the second compression model, denoted by the symbol $\theta_S$. Compressing the first training model using the matrix $g$ and the first model compression option may be implemented as follows:
Taking the model parameter data volume as a constraint, the first node designs a pruning matrix $X$ to retain the channels that contribute most to the model accuracy. Taking the element sum of each column of the pruning matrix $X$ as the unknown, the first node retains, according to the magnitudes of the elements in each column of $g$, the channels corresponding to the largest column elements in $g$, and prunes the other channels. After the pruning matrix $X$ is obtained, $X$ is used to prune $\theta_m$ to obtain the m-th second compression model $\theta_S^m$.
In this embodiment of the present disclosure, the sample training parameter set further includes a sample verification parameter set, and at least one input-output data pair of the sample verification parameter set is determined. The first node feeds these input-output data pairs into the m-th first training model and the m-th second compression model, and obtains the output of the m-th first training model, the output of the m-th second compression model, and the corresponding output from the sample verification parameter set, where the latter is the true value corresponding to the model input.
Further, the m-th first cross entropy between the output of the m-th second compression model and the sample parameter set (that is, the true values of the input-output data pairs of the sample verification parameter set) is determined, the m-th first relative entropy divergence between the output of the m-th second compression model and the output of the m-th first training model is determined, and the sum of the m-th first cross entropy and the m-th first relative entropy divergence is taken as the loss function of the m-th second compression model. For ease of distinction, the present disclosure refers to the loss function of the m-th second compression model as the m-th first loss function. As in the above implementation, multiple m-th first loss functions are determined based on multiple input-output data pairs in the sample parameter set, their average is computed, and the parameters of the m-th second compression model are updated by gradient descent according to this average, yielding the m-th first compression model.
The m-th first loss function (that is, the loss function of the m-th second compression model) is expressed by the following formula:

$$L_{\theta_S^m} = L_{C_S^m} + D_{KL}(p_S^m \,\|\, p_m)$$

where $L_{\theta_S^m}$ is the loss function of the m-th second compression model; $L_{C_S^m}$ is the first cross entropy between the output value of the m-th second compression model and the true values of the input-output data pairs of the sample verification parameter set; $D_{KL}(p_S^m \,\|\, p_m)$ is the first relative entropy divergence between the output value of the m-th first training model and the output value of the m-th second compression model; $p_S^m$ is the output value of the m-th second compression model; and $p_m$ is the output value of the m-th first training model.
In the embodiments of the present disclosure, it should be noted that the m-th first training model here is the m-th first training model whose parameters have been updated based on its own loss function. In other words, in the embodiments of the present disclosure, the loss function of the m-th first training model is determined first, the parameters of the m-th first training model are updated based on that loss function, and then the loss function of the m-th second compression model (that is, the m-th first loss function) is determined. For ease of distinction, the present disclosure refers to the loss function of the m-th first training model as the m-th second loss function. As described above, the sample training parameter set further includes a sample verification parameter set, and the first node determines at least one input-output data pair of the sample verification parameter set. The first node feeds the input of each data pair into the m-th first training model and the m-th second compression model, and obtains the output of the m-th first training model, the output of the m-th second compression model, and the corresponding output from the sample verification parameter set, where the latter is the true value corresponding to the model input.
Further, the m-th second cross entropy between the output of the m-th first training model and the sample parameter set (that is, the true values of the input-output data pairs of the sample verification parameter set) is determined, the m-th second relative entropy divergence between the output of the m-th first training model and the output of the m-th second compression model is determined, and the sum of the m-th second cross entropy and the m-th second relative entropy divergence is taken as the m-th second loss function. As in the above implementation, multiple m-th second loss functions are determined based on multiple input-output data pairs in the sample parameter set, their average is computed, and the parameters of the m-th first training model are updated by gradient descent according to this average, yielding the updated m-th first training model.
The m-th second loss function (that is, the loss function of the m-th first training model) is expressed by the following formula:

$$L_{\theta_m} = L_C + D_{KL}(p_m \,\|\, p_S^m)$$

where $L_{\theta_m}$ is the loss function of the m-th first training model; $L_C$ is the second cross entropy between the output value of the m-th first training model and the true values of the input-output data pairs of the sample verification parameter set; $D_{KL}(p_m \,\|\, p_S^m)$ is the relative entropy divergence between the output value of the m-th first training model and the output value of the m-th second compression model; $p_S^m$ is the output value of the m-th second compression model; and $p_m$ is the output value of the m-th first training model.
Of course, in the embodiments of the present disclosure, other model compression methods, such as model sparsification and parameter quantization, may also be used to determine the first compression model, which is not specifically limited in the present disclosure.
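As a minimal sketch of one such alternative, the following illustrates uniform parameter quantization; the bit width and helper names are assumptions for illustration, not part of the present disclosure.

```python
import numpy as np

def quantize_uniform(theta: np.ndarray, num_bits: int = 8):
    """Uniformly quantize parameters to num_bits integers plus a scale,
    a simple alternative compression method to channel pruning."""
    qmax = 2 ** (num_bits - 1) - 1
    scale = max(np.max(np.abs(theta)) / qmax, 1e-12)   # guard against all-zero
    q = np.clip(np.round(theta / scale), -qmax - 1, qmax).astype(np.int8)
    return q, scale                      # transmit q (int8) plus one float

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    return q.astype(np.float32) * scale

theta = np.random.randn(4, 4).astype(np.float32)
q, s = quantize_uniform(theta)           # roughly 4x smaller than float32
theta_hat = dequantize(q, s)             # approximate reconstruction
```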
图4是根据一示例性实施例示出的一种训练方法的流程图。如图4所示,基于第一训练模型和模型压缩参数,得到第一训练模型的压缩模型,包括以下步骤。Fig. 4 is a flowchart showing a training method according to an exemplary embodiment. As shown in FIG. 4 , based on the first training model and the model compression parameters, a compression model of the first training model is obtained, including the following steps.
在步骤S31中,接收第四指示消息。In step S31, a fourth indication message is received.
In this embodiment of the present disclosure, the second node determines the first compression model according to the received second indication message. If there is a single first compression model, the second node determines whether this first compression model meets the model subscription requirements or the analysis subscription requirements. If there are multiple first compression models, the second node performs federated averaging on them to obtain a third compression model, also called the global model, and determines whether the third compression model meets the model subscription requirements or the analysis subscription requirements. In one implementation, if the single first compression model, or the third compression model obtained by federated averaging of multiple first compression models, does not meet the subscription requirements, a fourth indication message is sent, where the fourth indication message is used to indicate the third compression model. The first node receives the fourth indication message.
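To illustrate the federated averaging step, here is a minimal sketch, assuming the first compression models share an identical parameter layout (the same parameter names and shapes); the function name and the optional sample-count weighting are illustrative assumptions.

```python
import numpy as np

def federated_average(models: list[dict], weights: list[float] | None = None) -> dict:
    """Average a list of model state dicts (parameter name -> ndarray)
    to obtain the global (third compression) model."""
    n = len(models)
    w = np.full(n, 1.0 / n) if weights is None else np.asarray(weights) / np.sum(weights)
    return {
        name: sum(w[i] * models[i][name] for i in range(n))
        for name in models[0]
    }

# Example: three first compression models with identical structure.
m1 = {"layer.w": np.ones((2, 2)), "layer.b": np.zeros(2)}
m2 = {"layer.w": 3 * np.ones((2, 2)), "layer.b": np.ones(2)}
m3 = {"layer.w": 2 * np.ones((2, 2)), "layer.b": np.ones(2)}
global_model = federated_average([m1, m2, m3])   # third compression model
```

With uniform weights this reduces to a plain element-wise average; weighting by local sample counts is the common FedAvg variant.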
在步骤S32中,基于第三压缩模型,重新确定模型压缩参数,并基于重新确定的模型压缩参数更新第一压缩模型。In step S32, based on the third compression model, model compression parameters are re-determined, and the first compression model is updated based on the re-determined model compression parameters.
在本公开实施例中,第一节点根据第四指示消息指示的第三压缩模型,重新确定模型压缩参数,并基于重新确定的模型压缩参数更新第一压缩模型,确定第一压缩模型的损失函数,重新更新第一压缩模型的参数,直到第二节点确定满足模型订阅需求的压缩模型。In this embodiment of the present disclosure, the first node re-determines model compression parameters according to the third compression model indicated by the fourth indication message, updates the first compression model based on the re-determined model compression parameters, and determines the loss function of the first compression model , and re-update the parameters of the first compression model until the second node determines a compression model that satisfies the model subscription requirement.
In an exemplary embodiment of the present disclosure, in another implementation, when the second node determines that the first compression model meets the model subscription requirements, it determines to send a fifth indication message, where the fifth indication message is used to indicate the end of training of the first compression model. After receiving the fifth indication message, the first node determines that the training of the first compression model has ended, and the second node sends the determined compression model to the model subscriber.
在本公开实施例中,第一节点得到与模型训练模式对应的一个或多个第一压缩模型后,通过无线信道发送第二指示消息至第二节点。其中,第二指示消息包括与所述模型训练模式对应数量的第一压缩模型。In this embodiment of the present disclosure, after obtaining one or more first compressed models corresponding to the model training modes, the first node sends a second indication message to the second node through a wireless channel. Wherein, the second indication message includes the number of first compressed models corresponding to the model training mode.
The embodiments of the present disclosure solve the problem of the excessively large data volume of deep learning models, effectively alleviate the shortage of wireless resources, and reduce data transmission errors under network congestion, thereby improving the reliability of model transmission in the wireless network and guaranteeing the accuracy of the model. In the embodiments of the present disclosure, the model trained by the first node on its local training parameter set is compressed before being uploaded to the second node. This method not only keeps user private data local, but also greatly increases the difficulty of reverse inference on the model within the network, further ensuring the security of user information.
基于相同的/相似的构思,本公开实施例还提供一种训练方法。Based on the same/similar concept, the embodiments of the present disclosure also provide a training method.
图5是根据一示例性实施例示出的一种训练方法的流程图。如图5所示,训练方法用于第二节点中,包括以下步骤。Fig. 5 is a flowchart of a training method according to an exemplary embodiment. As shown in Figure 5, the training method is used in the second node and includes the following steps.
在步骤S41中,发送模型训练请求。In step S41, a model training request is sent.
在本公开实施例中,模型训练请求中包括模型压缩参数,模型压缩参数用于压缩第一训练模型得到第一压缩模型,第一训练模型基于模型训练请求训练得到。In the embodiment of the present disclosure, the model training request includes model compression parameters, and the model compression parameters are used to compress the first training model to obtain the first compression model, and the first training model is obtained by training based on the model training request.
In the embodiments of the present disclosure, the first node is a model training node, which for ease of description is referred to as the first node; similarly, the second node is a model requesting node, which for ease of description is referred to as the second node. The model training request includes model compression parameters.
该模型压缩参数中至少包括以下一种:The model compression parameters include at least one of the following:
模型训练结构,多个模型压缩选项,模型训练模式。Model training structure, multiple model compression options, model training modes.
In this embodiment of the present disclosure, the model compression options are determined based on the model subscription requirements received by the second node (for example, the model requesting node). The second node determines to send the model training request according to the received model subscription requirements. After receiving the model training request sent by the second node, the first node (for example, the model training node) sends third indication information in response to the model training request, determines to train the first training model based on the local sample parameter set and the model training structure, and determines the relevant parameters required for model compression. The response information sent by the first node further includes one or more of the first node's local computing capability, communication conditions, and characteristics of its training sample parameter set.
在本公开实施例中,第一节点基于模型压缩参数中的模型压缩选项以及模型压缩所需的相关参数对第一训练模型进行压缩。其中,模型压缩选项中包括模型精度、模型参数数据量。模型压缩参数包括多个模型压缩选项,多个模型压缩选项是第二节点基于多个第一节点上报的本地计算能力,通信条件、训练样本参数集特性等其中的一种或几种确定的。In the embodiment of the present disclosure, the first node compresses the first training model based on the model compression option in the model compression parameters and the relevant parameters required for model compression. Among them, the model compression options include model accuracy and model parameter data volume. The model compression parameters include multiple model compression options, and the multiple model compression options are determined by the second node based on one or more of local computing capabilities reported by multiple first nodes, communication conditions, and characteristics of the training sample parameter set.
在本公开实施例中,模型压缩参数中的模型训练模式包括用于训练单个第一训练模型的单训练节点模式和用于训练多个第一训练模型的多训练节点模式。In an embodiment of the present disclosure, the model training modes in the model compression parameters include a single training node mode for training a single first training model and a multi-training node mode for training multiple first training models.
The first node determines the number of first training models to train according to the model training mode included in the model training parameters. If the model training mode is the single training node mode, one first training model is trained based on a single first node, in the manner described above. If the model training mode is the multi-training node mode, multiple first training models are trained based on multiple first nodes, and different sequence labels are set for the multiple first nodes to train the multiple first training models.
In this embodiment of the present disclosure, after obtaining the one or more first compression models corresponding to the model training mode, the first node sends a second indication message to the second node through a wireless channel, where the second indication message includes the number of first compression models corresponding to the model training mode. The second node receives the second indication message, determines the number of first compression models corresponding to the model training mode, performs federated averaging on the received one or more first compression models to obtain the third compression model, and determines whether the third compression model meets the model subscription requirements or the analysis subscription requirements.
In the embodiments of the present disclosure, the subscription requirements may be issued by Operation, Administration and Maintenance (OAM) or by the core network. The subscription requirements include: an analysis ID, used to identify the analysis type of the model training request; a notification target address, used to associate the notifications received by the requested party with this subscription; analysis reporting information, including parameters such as the preferred analysis accuracy level and the analysis time interval; and, optionally, analysis filter information, indicating the conditions that the reported analysis information must satisfy.
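Purely for illustration, the subscription fields listed above might be carried in a structure like the following; every field name here is hypothetical rather than a normative signaling definition.

```python
from dataclasses import dataclass

@dataclass
class AnalyticsSubscription:
    """Illustrative container for the subscription fields described above."""
    analysis_id: str                      # identifies the analysis type of the request
    notification_target: str              # address used to correlate notifications
    preferred_accuracy_level: str         # e.g. "high" / "medium" (assumed values)
    analysis_interval_s: int              # analysis time interval in seconds
    analysis_filter: dict | None = None   # optional reporting conditions

sub = AnalyticsSubscription(
    analysis_id="load-prediction",
    notification_target="oam://node-42",
    preferred_accuracy_level="high",
    analysis_interval_s=600,
    analysis_filter={"cell_id": [1001, 1002]},
)
```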
In one implementation of the embodiments of the present disclosure, if a single first compression model, or the third compression model obtained by federated averaging of multiple first compression models, does not meet the subscription requirements, a fourth indication message is sent, where the fourth indication message is used to indicate the third compression model. The first node receives the fourth indication message.
In an exemplary embodiment of the present disclosure, in another implementation, when the second node determines that the first compression model meets the model subscription requirements, it determines to send a fifth indication message, where the fifth indication message is used to indicate the end of training of the first compression model. After receiving the fifth indication message, the first node determines that the training of the first compression model has ended, and the second node sends the determined compression model to the model subscriber.
在本公开实施例中,第二节点接收OAM或核心网发送的订阅需求,基于接收的订阅需求确定发送模型训练请求。In the embodiment of the present disclosure, the second node receives the subscription requirement sent by the OAM or the core network, and determines to send the model training request based on the received subscription requirement.
In the embodiments of the present disclosure, the first node and the second node may be deployed between base stations, between a base station and a terminal, or between a base station and the core network. For example, in one application environment the first node is a terminal and the second node is a base station; in another, the first node is a base station and the second node is also a base station; in yet another, the first node is a base station and the second node is a core network node. Of course, these are merely examples of application environments of the first node and the second node of the present disclosure, and the application environment of the specific implementation is not specifically limited in the present disclosure.
在本公开实施例中,第一节点得到与模型训练模式对应的一个或多个第一压缩模型后,通过无线信道发送第二指示消息至第二节点。其中,第二指示消息包括与所述模型训练模式对应数量的第一压缩模型。In this embodiment of the present disclosure, after obtaining one or more first compressed models corresponding to the model training modes, the first node sends a second indication message to the second node through a wireless channel. Wherein, the second indication message includes the number of first compressed models corresponding to the model training mode.
The embodiments of the present disclosure solve the problem of the excessively large data volume of deep learning models, effectively alleviate the shortage of wireless resources, and reduce data transmission errors under network congestion, thereby improving the reliability of model transmission in the wireless network and guaranteeing the accuracy of the model. In the embodiments of the present disclosure, the model trained by the first node on its local training parameter set is compressed before being uploaded to the second node. This method not only keeps user private data local, but also greatly increases the difficulty of reverse inference on the model within the network, further ensuring the security of user information.
在本公开一些实施例中,将第一节点称为模型训练节点,第二节点称为模型请求节点。以模型训练节点和模型请求节点交互的方式对本公开进行进一步说明。In some embodiments of the present disclosure, the first node is referred to as a model training node, and the second node is referred to as a model request node. The present disclosure is further described in terms of interaction between model training nodes and model request nodes.
图6为本公开提供的一种训练方法中单训练节点模式确定第一压缩模型的实施方式流程图。如图6所示,模型请求节点向模型训练节点发起模型训练请求。FIG. 6 is a flowchart of an implementation manner of determining a first compression model in a single training node mode in a training method provided by the present disclosure. As shown in Figure 6, the model request node initiates a model training request to the model training node.
模型训练节点将本地计算能力,通信条件和训练样本参数集特性发送给模型请求节点。The model training node sends the local computing power, communication conditions and training sample parameter set characteristics to the model requesting node.
模型请求节点依据模型/分析订阅需求确定模型结构和模型训练模式,依据模型训练节 点上报的信息提出多种模型压缩选项,包含模型精度、模型参数数据量。The model request node determines the model structure and model training mode according to the model/analysis subscription requirements, and proposes a variety of model compression options based on the information reported by the model training node, including model accuracy and model parameter data volume.
模型请求节点将模型结构、模型训练模式以及模型压缩选项发送给模型训练节点,模型训练节点选择合适的模型压缩选项。The model request node sends the model structure, model training mode, and model compression options to the model training node, and the model training node selects an appropriate model compression option.
模型训练节点采用本地样本参数集进行模型训练,得到第一训练模型,以及模型压缩所需的相关参数。The model training node uses the local sample parameter set for model training to obtain the first training model and related parameters required for model compression.
模型训练节点根据所选模型压缩选项以及模型压缩所需的相关参数,对第一训练模型进行压缩,得到第一压缩模型,并将第一压缩模型通过无线信道传输给模型请求节点。The model training node compresses the first training model according to the selected model compression option and relevant parameters required for model compression to obtain a first compressed model, and transmits the first compressed model to the model requesting node through a wireless channel.
当模型请求节点所得第一压缩模型满足模型/分析订阅需求时,模型训练过程结束,模型请求节点将模型上报给模型/分析订阅方。When the first compressed model obtained by the model request node meets the model/analysis subscription requirements, the model training process ends, and the model request node reports the model to the model/analysis subscriber.
FIG. 7 is a flowchart of an implementation of determining the first compression model in the multi-training-node mode in a training method provided by the present disclosure. As shown in FIG. 7, the model requesting node initiates a model training request to the model training nodes.
Each model training node sends its local computing capability, communication conditions, and training sample parameter set characteristics to the model requesting node.
The model requesting node determines the model structure and the model training mode according to the model/analytics subscription requirements, and proposes multiple model compression options, including model accuracy and model parameter data volume, based on the information reported by the model training nodes.
The model requesting node sends the model structure, the model training mode, and the model compression options to the model training nodes, and each model training node selects a suitable model compression option.
Each model training node performs model training with its local sample parameter set to obtain a first training model and the relevant parameters required for model compression.
Each model training node compresses its first training model according to the selected model compression option and the relevant parameters required for model compression to obtain a first compression model, and transmits the first compression model to the model requesting node over a wireless channel.
The model requesting node performs federated averaging on the first compression models sent by the model training nodes to obtain a global model.
The model requesting node determines whether the global model meets the model/analytics subscription requirements.
If the global model meets the model/analytics subscription requirements, the model training process ends and the model requesting node reports the global model to the model/analytics subscriber. If the global model does not meet the model/analytics subscription requirements, the model training nodes reselect a suitable model compression option and update the first compression model according to the re-determined model compression option.
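To make the federated averaging step concrete, the following is a minimal sketch in Python. The per-node sample counts used as averaging weights, and all function and variable names, are illustrative assumptions; the disclosure only states that the first compression models are federated-averaged into a global model.

```python
# Minimal sketch of federated averaging (FIG. 7, multi-training-node mode).
# Weighting by local sample count is an assumption, not stated in the disclosure.
from typing import Dict, List
import numpy as np

def federated_average(models: List[Dict[str, np.ndarray]],
                      sample_counts: List[int]) -> Dict[str, np.ndarray]:
    """Combine the first compression models uploaded by the training nodes."""
    total = float(sum(sample_counts))
    return {
        name: sum((n / total) * m[name] for m, n in zip(models, sample_counts))
        for name in models[0]
    }

# Example: two training nodes upload compressed model parameters.
node_a = {"w": np.array([1.0, 2.0]), "b": np.array([0.5])}
node_b = {"w": np.array([3.0, 4.0]), "b": np.array([1.5])}
global_model = federated_average([node_a, node_b], sample_counts=[100, 300])
```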
FIG. 8 is a schematic diagram of the protocol and interfaces of the model training and compression decision part in a training method provided by the present disclosure. As shown in FIG. 8, it involves the service management module and the network communication module in the model requesting node, as well as the network communication module, the model training and compression module, and the data processing and storage module in the model training node apparatus. These modules exchange information through the following steps.
In the embodiments of the present disclosure, step 1 includes steps 1a-1c. In step 1a, the service management module of the model requesting node sends model training request signaling to the network communication module of the model requesting node; the signaling indicates that a model training request is to be initiated to the model training node. In step 1b, the network communication module of the model requesting node sends the model training request signaling to the network communication module of the model training node. In step 1c, the network communication module of the model training node sends model training request response signaling to the network communication module of the model requesting node; the signaling indicates acceptance of the model training request.
Step 2 includes steps 2a-2c. In step 2a, the model training and compression module of the model training node sends computing capability information reporting signaling to the network communication module of the model training node; the signaling indicates that the computing capability information of the model training node device is to be reported to the receiver. In step 2b, the data processing and storage module of the model training node sends training sample characteristic information reporting signaling to the network communication module of the model training node; the signaling indicates that the characteristic information of the local training samples of the model training node is to be reported to the receiver. In step 2c, the network communication module of the model training node sends computing capability and training sample characteristic information reporting signaling to the network communication module of the model requesting node; the signaling indicates that the computing capability of the model training node and the characteristic information of its local training samples are to be reported to the receiver.
In step 3, if the model training node is a terminal and the model requesting node is a base station, the network communication module of the model training node measures the channel quality indication (Channel Quality Indication, CQI) and sends CQI reporting signaling to the network communication module of the model requesting node; the signaling indicates that CQI measurement is performed and the CQI information is reported to the receiver.
In step 4, the network communication module of the model requesting node sends the model training node's computing capability, training sample characteristics, and (optionally) CQI information to the service management module of the model requesting node; the signaling indicates that the received computing capability, training sample characteristics, and optional CQI information are aggregated and sent to the receiver.
In step 5, the service management module of the model requesting node determines the model structure and the model training mode according to the model/analytics subscription requirements.
In step 6, the service management module of the model requesting node proposes multiple model compression options based on the information reported by the model training node.
Step 7 includes steps 7a-7b. In step 7a, the service management module of the model requesting node sends model structure and model training mode signaling to the network communication module of the model requesting node; the signaling indicates that the model structure and the model training mode are to be sent to the receiver. In step 7b, the service management module of the model requesting node sends model compression option signaling to the network communication module of the model requesting node; the signaling indicates that the multiple model compression options are to be sent to the receiver.
Step 8 includes steps 8a-8b. In step 8a, the network communication module of the model requesting node sends the model structure and model training mode signaling to the network communication module of the model training node. In step 8b, the network communication module of the model requesting node sends the model compression option signaling to the network communication module of the model training node.
Step 9 includes steps 9a-9b. In step 9a, the network communication module of the model training node sends the model structure and model training mode signaling to the model training and compression module of the model training node. In step 9b, the network communication module of the model training node sends the model compression option signaling to the model training and compression module of the model training node.
In step 10, the model training node selects a suitable model compression option according to its locally available computing power, real-time communication conditions, and training sample characteristics.
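A possible realization of this selection step is sketched below. The option fields, the CQI-based payload limit, and the cost heuristic are all assumptions introduced for illustration; the disclosure only specifies that the choice depends on locally available computing power, real-time communication conditions, and training sample characteristics.

```python
# Minimal sketch of step 10: selecting a model compression option.
# Fields and heuristics are illustrative assumptions.
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class CompressionOption:
    accuracy: float      # expected model accuracy after compression
    param_bytes: int     # model parameter data volume

def payload_limit(cqi: int) -> int:
    # Assumed mapping: a better channel supports a larger model upload.
    return 50_000 * max(cqi, 1)

def select_option(options: List[CompressionOption], flops_budget: float,
                  cqi: int, n_samples: int) -> Optional[CompressionOption]:
    feasible = [o for o in options
                if o.param_bytes <= payload_limit(cqi)
                and o.param_bytes * n_samples <= flops_budget]
    if feasible:
        # Prefer the most accurate option that fits compute and channel budgets.
        return max(feasible, key=lambda o: o.accuracy)
    # Otherwise fall back to the smallest model, if any option exists at all.
    return min(options, key=lambda o: o.param_bytes) if options else None
```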
FIG. 9 is a schematic diagram of the protocol and interfaces of the model training and compression part in the single-training-node mode in a training method provided by the present disclosure. As shown in FIG. 9, it involves the data processing and storage module, the model training and compression module, and the network communication module in the model training node, as well as the network communication module and the service management module in the model requesting node apparatus. These modules exchange messages through the following steps.
Step 1 includes steps 1a-1b. In step 1a, the model training and compression module of the model training node sends local training data set request signaling to the data processing and storage module of the model training node; the signaling indicates a request to collect a training data set from local data. In step 1b, the data processing and storage module of the model training node sends local training data set signaling to the model training and compression module of the model training node; the signaling indicates that data is collected from local data to generate a training data set, which is sent to the receiver.
In step 2, the model training and compression module of the model training node performs model training with the local training data set to obtain the training model and the relevant parameters required for model compression.
In step 3, the model training node compresses the original training model according to the selected model compression option and the relevant parameters required for model compression to obtain a compressed model.
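As one concrete example of this compression step, the sketch below applies magnitude pruning, one compression technique contemplated for deep models, to a weight matrix. Using a keep ratio to stand in for the parameter data volume target of the selected compression option is an assumption for illustration:

```python
# Minimal sketch of step 3: compressing a trained model by magnitude pruning.
# The keep ratio as the compression target is an illustrative assumption.
import numpy as np

def prune_by_magnitude(weights: np.ndarray, keep_ratio: float) -> np.ndarray:
    """Zero out the smallest-magnitude weights, keeping `keep_ratio` of them."""
    k = max(1, int(weights.size * keep_ratio))
    threshold = np.sort(np.abs(weights).ravel())[-k]  # k-th largest magnitude
    return np.where(np.abs(weights) >= threshold, weights, 0.0)

w = np.random.randn(8, 8)
w_pruned = prune_by_magnitude(w, keep_ratio=0.25)  # keep the largest 25% of weights
```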
Step 4 includes steps 4a-4c. In step 4a, the model training and compression module of the model training node sends the compressed model to the network communication module of the model training node. In step 4b, the network communication module of the model training node sends the compressed model to the network communication module of the model requesting node. In step 4c, the network communication module of the model requesting node sends the compressed model to the service management module of the model requesting node.
In step 5, the service management module of the model requesting node determines whether the obtained model meets the model/analytics subscription requirements. If so, step 6 is performed.
In step 6, the service management module of the model requesting node sends model training end notification signaling to the network communication module of the model training node via the network communication module of the model requesting node. This procedure and the corresponding signaling are newly added by the present invention; the signaling indicates that the model training node is notified to end the model training process.
Otherwise, step 6a is performed.
In step 6a, the service management module of the model requesting node sends model training continuation notification signaling to the network communication module of the model training node via the network communication module of the model requesting node. This procedure and the corresponding signaling are newly added by the present invention; the signaling indicates that the model training process is to continue. In step 6b, the network communication module of the model training node sends the model training continuation notification signaling to the model training and compression module of the model training node.
In step 7, the model training and compression module of the model training node trains the compressed model with the local training data set, and steps 4a-7 are repeated until the model obtained by the model requesting node meets the model/analytics subscription requirements.
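The iteration of steps 4a-7 can be summarized as the loop below. The helper callables and the round cap are illustrative stubs, not interfaces defined by the disclosure:

```python
# Minimal sketch of the FIG. 9 iteration: train, compress, upload, and repeat
# until the requesting node's subscription check passes.
from typing import Any, Callable

def run_single_node(train_step: Callable[[Any], Any],
                    compress: Callable[[Any], Any],
                    upload: Callable[[Any], None],
                    subscription_satisfied: Callable[[Any], bool],
                    max_rounds: int = 100) -> Any:
    model = train_step(None)                    # step 2: initial local training
    compressed = compress(model)                # step 3
    for _ in range(max_rounds):
        upload(compressed)                      # steps 4a-4c
        if subscription_satisfied(compressed):  # step 5
            return compressed                   # step 6: training ends
        model = train_step(compressed)          # steps 6a-7: continue training
        compressed = compress(model)
    return compressed
```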
FIG. 10 is a schematic diagram of the protocol and interfaces of the model training and compression part in the multi-training-node mode in a training method provided by the present disclosure. As shown in FIG. 10, it involves the data processing and storage module, the model training and compression module, and the network communication module in each model training node, as well as the network communication module, the model calculation and update module, and the service management module in the model requesting node apparatus. These modules exchange information through the following steps.
Step 1 includes steps 1a-1b. In step 1a, the model training and compression module of the model training node sends local training data set request signaling to the data processing and storage module of the model training node. In step 1b, the data processing and storage module of the model training node sends local training data set signaling to the model training and compression module of the model training node.
In step 2, the model training and compression module of the model training node performs model training with the local training data set to obtain the first training model and the relevant parameters required for model compression.
In step 3, the model training node compresses the first training model according to the selected model compression option and the relevant parameters required for model compression to obtain the first compression model.
Step 4 includes steps 4a-4c. In step 4a, the model training and compression module of the model training node sends the first compression model to the network communication module of the model training node. In step 4b, the network communication module of the model training node sends the first compression model to the network communication module of the model requesting node. In step 4c, the network communication module of the model requesting node sends the first compression model to the model calculation and update module of the model requesting node.
In step 5, the model calculation and update module of the model requesting node aggregates the first compression models sent by the model training nodes and performs federated averaging to obtain a global model.
In step 6, the model calculation and update module of the model requesting node sends the global model to the service management module of the model requesting node.
In step 7, the service management module of the model requesting node determines whether the obtained model meets the model/analytics subscription requirements. If so, the following is performed:
In step 8, the service management module of the model requesting node sends model training end notification signaling to the network communication module of the model training node via the network communication module of the model requesting node.
Otherwise, steps 8a-8b are performed.
In step 8a, the service management module of the model requesting node sends model training continuation notification signaling to the network communication module of the model training node via the network communication module of the model requesting node, and distributes the global model to the network communication module of the model training node via the network communication module of the model requesting node. In step 8b, the network communication module of the model training node sends the model training continuation notification signaling to the model training and compression module of the model training node, and sends the global model to the model training and compression module of the model training node.
In step 9, the model training and compression module of the model training node performs model training and compression on the global model sent by the model requesting node with the local training data set, and steps 4a-9 are repeated until the model obtained by the model requesting node meets the model/analytics subscription requirements.
FIG. 11 is a schematic diagram of the protocol and interfaces of the wireless data transmission part in a training method provided by the present disclosure. As shown in FIG. 11, it involves the model training and compression module, the transmission control module, and the network communication module in the model training node, as well as the network communication module, the transmission control module, and the service management module in the model requesting node. This part applies to the scenario in which the model requesting node is a base station and the model training node is a terminal. The modules exchange information through the following steps.
In step 1, the model training and compression module of the model training node sends the compressed model to the transmission control module of the model training node.
In step 2, the network communication module of the model training node measures the CQI and sends CQI reporting signaling to the transmission control module of the model training node.
In step 3, the transmission control module of the model training node formulates a data transmission scheme according to the compression characteristics and the wireless communication conditions.
In step 4, the transmission control module of the model training node sends data transmission scheme information signaling to the network communication module of the model training node. This procedure and the corresponding signaling are newly added by the present invention; the signaling indicates that the data transmission scheme information, including the modulation scheme, code rate, and other information, is sent to the receiver.
In step 5, the model training and compression module of the model training node sends the compressed model to the network communication module of the model training node.
In step 6, the network communication module of the model training node encapsulates and packs the compressed model according to the data transmission scheme.
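Steps 3-6 can be pictured with the toy mapping below from a reported CQI to a modulation scheme and code rate, followed by packetization of the compressed model. The CQI thresholds, the modulation/code-rate table, and the packet size are simplified illustrations and are not taken from the disclosure:

```python
# Minimal sketch of steps 3-6: derive a transmission scheme from the CQI and
# pack the compressed model accordingly. The table below is illustrative only.
from typing import Dict, List

def make_scheme(cqi: int) -> Dict[str, object]:
    # Higher CQI -> denser modulation and higher code rate (assumed thresholds).
    if cqi >= 10:
        return {"modulation": "64QAM", "code_rate": 0.75}
    if cqi >= 7:
        return {"modulation": "16QAM", "code_rate": 0.50}
    return {"modulation": "QPSK", "code_rate": 0.33}

def pack(model_bytes: bytes, packet_size: int = 1500) -> List[bytes]:
    """Split the compressed model into packets sized for the radio link."""
    return [model_bytes[i:i + packet_size]
            for i in range(0, len(model_bytes), packet_size)]

scheme = make_scheme(cqi=9)        # -> 16QAM at code rate 0.50
packets = pack(b"\x00" * 4000)     # -> three packets of up to 1500 bytes
```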
Step 7 includes steps 7a-7d. In step 7a, the network communication module of the model training node transmits the compressed model data packets to the network communication module of the model requesting node. In step 7b, the network communication module of the model requesting node sends the compressed model to the transmission control module of the model requesting node; at this point the decapsulated data is transmitted. In step 7c, the transmission control module of the model requesting node sends correct-data acknowledgment signaling to the network communication module of the model requesting node; the signaling indicates that the receiver is notified that the correct data has been received. In step 7d, the network communication module of the model requesting node sends the correct-data acknowledgment signaling to the network communication module of the model training node.
In step 8, the transmission control module of the model requesting node sends the compressed model to the service management module of the model requesting node. In the single-training-node mode, the compressed model is sent directly to the service management module of the model requesting node; in the multi-training-node mode, the global model is first obtained via the model calculation and update module of the model requesting node and then sent to the service management module of the model requesting node.
In step 9, the service management module of the model requesting node determines whether the obtained model meets the model/analytics subscription requirements. If so, steps 10a1-10b1 are performed.
In step 10a1, the service management module of the model requesting node sends model training end notification signaling to the transmission control module of the model requesting node. In step 10b1, the network communication module of the model requesting node sends the model training end notification signaling to the network communication module of the model training node.
Otherwise, steps 10a2-10b2 are performed.
In step 10a2, the service management module of the model requesting node sends model training continuation notification signaling to the transmission control module of the model requesting node; the signaling indicates that the model training node is notified to continue the model training process. In step 10b2, the network communication module of the model requesting node sends the model training continuation notification signaling to the network communication module of the model training node.
In the single-training-node mode, only the model training continuation notification signaling needs to be sent; in the multi-training-node mode, the global model also needs to be distributed to the model training nodes.
The protocol and interface principles for distributing the global model are similar to steps 1-7 above, with the sending side replaced by the model requesting node, the receiving side replaced by the model training node, and the compressed model replaced by the global model. In addition, for the CQI measurement and reporting process of step 2, the model requesting node should initiate a CQI measurement request to the model training node, and the model training node performs the CQI measurement and feeds the result back to the model requesting node.
Based on the same concept, an embodiment of the present disclosure further provides a training apparatus.
It can be understood that, in order to realize the above functions, the training apparatus provided by the embodiments of the present disclosure includes corresponding hardware structures and/or software modules for performing the respective functions. In combination with the units and algorithm steps of the examples disclosed in the embodiments of the present disclosure, the embodiments of the present disclosure can be implemented in hardware or in a combination of hardware and computer software. Whether a function is performed by hardware or by computer software driving hardware depends on the specific application and design constraints of the technical solution. Those skilled in the art may use different methods to implement the described functions for each specific application, but such implementation should not be considered to go beyond the scope of the technical solutions of the embodiments of the present disclosure.
FIG. 12 is a block diagram of a training apparatus 100 according to an exemplary embodiment. Referring to FIG. 12, the apparatus is applied to a first node and includes a model training and compression module 110, a first network communication module 120, a first transmission control module 130, and a data processing and storage module 140.
The model training and compression module 110 is configured to train a first training model in response to receiving a model training request, where the model training request includes model compression parameters, and to obtain a first compression model of the first training model based on the first training model and the model compression parameters.
In the embodiments of the present disclosure, the model compression parameters include multiple model compression options.
The model training and compression module 110 is configured to determine a first model compression option among the multiple model compression options, compress the first training model based on the first model compression option to obtain a second compression model, determine a first loss function according to the output of the first training model, the output of the second compression model, and the sample parameter set used to train the first training model, and update the parameters of the second compression model based on the first loss function to obtain the first compression model.
In the embodiments of the present disclosure, the apparatus further includes a data processing and storage module 140.
The data processing and storage module 140 is configured to determine a first cross-entropy between the output of the second compression model and the sample parameter set, determine a first relative entropy divergence between the output of the second compression model and the output of the first training model, and determine the first loss function based on the first cross-entropy and the first relative entropy divergence.
In the embodiments of the present disclosure, the data processing and storage module 140 is further configured to determine, according to the output of the first training model, the output of the second compression model, and the sample parameter set used to train the first training model, a second loss function for updating the parameters of the first training model.
In the embodiments of the present disclosure, the data processing and storage module 140 is further configured to determine a second cross-entropy between the output of the first training model and the sample parameter set, determine a second relative entropy divergence between the output of the first training model and the output of the second compression model, and determine the second loss function based on the second cross-entropy and the second relative entropy divergence.
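Written out, the two loss functions described above take the following form, where $y$ denotes the labels in the sample parameter set, $p_t$ and $p_s$ denote the outputs of the first training model and the second compression model respectively, $H$ is the cross-entropy, and $D_{\mathrm{KL}}$ is the relative entropy (KL divergence). The weighting coefficients $\alpha$ and $\beta$ are assumptions introduced here; the disclosure does not specify how the two terms are combined:

```latex
\mathcal{L}_1 = H\left(y,\, p_s\right) + \alpha \, D_{\mathrm{KL}}\left(p_s \,\|\, p_t\right),
\qquad
\mathcal{L}_2 = H\left(y,\, p_t\right) + \beta \, D_{\mathrm{KL}}\left(p_t \,\|\, p_s\right).
```

Here $\mathcal{L}_1$ updates the second compression model (yielding the first compression model) and $\mathcal{L}_2$ updates the first training model, so that the compressed model and the original training model regularize each other during training.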
In the embodiments of the present disclosure, the model compression parameters include a model training mode, and the model training mode includes a single-training-node mode for training a single first training model and a multi-training-node mode for training multiple first training models. The number of first training models is determined based on the model training mode.
In the embodiments of the present disclosure, the apparatus further includes a first network communication module 120.
The first network communication module 120 is configured to send a second indication message, where the second indication message includes a number of first compression models corresponding to the model training mode.
In the embodiments of the present disclosure, the first network communication module 120 is further configured to receive a third indication message, where the third indication message includes a training model determination indication.
In the embodiments of the present disclosure, the first network communication module 120 is further configured to receive a fourth indication message. The fourth indication message is used to indicate a third compression model, where the third compression model is a compression model obtained by performing federated averaging on the first training models based on the number of first compression models. Based on the third compression model, the model compression parameters are re-determined, and the first compression model is updated based on the re-determined model compression parameters.
In the embodiments of the present disclosure, the first network communication module 120 is further configured to receive a fifth indication message, where the fifth indication message is used to indicate the end of training the first compression model.
The first network communication module 120 is configured to perform data transmission and control signaling interaction between the model requesting node and the model training node.
The first transmission control module 130 is configured to formulate a data transmission scheme according to the characteristics of the data to be transmitted and the wireless communication conditions, and to pack the data to be transmitted according to the data transmission scheme. The transmission control module is needed only in embodiments in which the model requesting node is a base station and the model training node is a user terminal.
The data processing and storage module 140 is configured to manage local data, generate training sample characteristic information, collect data to generate a local training data set, and store the data set.
The model training and compression module 110 is configured to perform model training with the local data set, and to compress the model according to the information required for model compression obtained during the training process.
FIG. 13 is a block diagram of a training apparatus 200 according to an exemplary embodiment. Referring to FIG. 13, the apparatus is applied to a second node and includes a second network communication module 210, a second transmission control module 220, a service management module 230, and a model calculation and update module 240.
The second network communication module 210 is configured to send a model training request, where the model training request includes model compression parameters, the model compression parameters are used to compress a first training model to obtain a first compression model, and the first training model is obtained by training based on the model training request.
In the embodiments of the present disclosure, the model compression parameters include a model training mode, and the model training mode includes a single-training-node mode for training a single first training model and a multi-training-node mode for training multiple first training models.
The number of first training models is determined based on the model training mode.
In the embodiments of the present disclosure, the second network communication module 210 is further configured to receive a second indication message, where the second indication message includes a number of first compression models corresponding to the model training mode.
In the embodiments of the present disclosure, the second network communication module 210 is further configured to send a third indication message, where the third indication message includes a training model determination indication.
In the embodiments of the present disclosure, the model training mode includes the multi-training-node mode, and the second network communication module 210 is further configured to send a fourth indication message. The fourth indication message is used to indicate a third compression model, where the third compression model is a compression model obtained by performing federated averaging on the first compression models based on the number of first training models.
In the embodiments of the present disclosure, the second network communication module 210 is further configured to send a fifth indication message, where the fifth indication message is used to indicate the end of training the first compression model.
In the embodiments of the present disclosure, the apparatus further includes a service management module 230.
The service management module 230 is configured to receive subscription requirements and send the model training request based on the subscription requirements.
The second network communication module 210 is configured to perform data transmission and control signaling interaction between the model requesting node and the model training node.
The second transmission control module 220 is configured to formulate a data transmission scheme according to the characteristics of the data to be transmitted and the wireless communication conditions, and to pack the data to be transmitted according to the data transmission scheme. The transmission control module is needed only in embodiments in which the model requesting node is a base station and the model training node is a user terminal.
The service management module 230 is configured to process model/analytics subscription requests, initiate model training requests to the model training nodes, formulate the model structure, model training mode, and model compression options, and check whether the obtained model meets the model/analytics subscription requirements.
The model calculation and update module 240 is configured to, in the multi-training-node mode, perform federated averaging on the compressed models sent by the multiple model training nodes to obtain a global model, and to distribute the global model to the model training nodes.
In an embodiment of the present invention, a model training node apparatus for wireless-network-oriented deep learning model training and compression is responsible for responding to the model training request of the model requesting node, reporting local resource information, selecting a suitable model compression option, and performing model training and compression according to the model training mode and the selected model compression option.
With regard to the apparatus in the above embodiments, the specific manner in which each module performs its operations has been described in detail in the embodiments of the method, and is not elaborated here.
FIG. 14 is a block diagram of an apparatus 300 for training according to an exemplary embodiment. For example, the apparatus 300 may be a mobile phone, a computer, a digital broadcast terminal, a messaging device, a game console, a tablet device, a medical device, a fitness device, a personal digital assistant, or the like.
Referring to FIG. 14, the apparatus 300 may include one or more of the following components: a processing component 302, a memory 304, a power component 306, a multimedia component 308, an audio component 310, an input/output (I/O) interface 312, a sensor component 314, and a communication component 316.
The processing component 302 generally controls the overall operation of the apparatus 300, such as operations associated with display, telephone calls, data communication, camera operation, and recording operation. The processing component 302 may include one or more processors 320 to execute instructions to complete all or some of the steps of the above method. In addition, the processing component 302 may include one or more modules to facilitate interaction between the processing component 302 and the other components. For example, the processing component 302 may include a multimedia module to facilitate interaction between the multimedia component 308 and the processing component 302.
The memory 304 is configured to store various types of data to support operation at the apparatus 300. Examples of such data include instructions for any application or method operated on the apparatus 300, contact data, phonebook data, messages, pictures, videos, and the like. The memory 304 may be implemented by any type of volatile or non-volatile storage device or a combination thereof, such as a static random access memory (SRAM), an electrically erasable programmable read-only memory (EEPROM), an erasable programmable read-only memory (EPROM), a programmable read-only memory (PROM), a read-only memory (ROM), a magnetic memory, a flash memory, a magnetic disk, or an optical disc.
The power component 306 provides power to the various components of the apparatus 300. The power component 306 may include a power management system, one or more power supplies, and other components associated with generating, managing, and distributing power for the apparatus 300.
The multimedia component 308 includes a screen providing an output interface between the apparatus 300 and the user. In some embodiments, the screen may include a liquid crystal display (LCD) and a touch panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive input signals from the user. The touch panel includes one or more touch sensors to sense touches, swipes, and gestures on the touch panel. The touch sensor may not only sense the boundary of a touch or swipe action, but also detect the duration and pressure associated with the touch or swipe operation. In some embodiments, the multimedia component 308 includes a front camera and/or a rear camera. When the apparatus 300 is in an operation mode, such as a shooting mode or a video mode, the front camera and/or the rear camera may receive external multimedia data. Each front camera and rear camera may be a fixed optical lens system or have focal length and optical zoom capability.
The audio component 310 is configured to output and/or input audio signals. For example, the audio component 310 includes a microphone (MIC) configured to receive external audio signals when the apparatus 300 is in an operation mode, such as a call mode, a recording mode, or a voice recognition mode. The received audio signal may be further stored in the memory 304 or sent via the communication component 316. In some embodiments, the audio component 310 further includes a speaker for outputting audio signals.
The I/O interface 312 provides an interface between the processing component 302 and peripheral interface modules, which may be a keyboard, a click wheel, buttons, or the like. These buttons may include, but are not limited to, a home button, volume buttons, a start button, and a lock button.
The sensor component 314 includes one or more sensors for providing state assessments of various aspects of the apparatus 300. For example, the sensor component 314 may detect the on/off state of the apparatus 300 and the relative positioning of components, such as the display and keypad of the apparatus 300. The sensor component 314 may also detect a change in position of the apparatus 300 or a component of the apparatus 300, the presence or absence of user contact with the apparatus 300, the orientation or acceleration/deceleration of the apparatus 300, and a temperature change of the apparatus 300. The sensor component 314 may include a proximity sensor configured to detect the presence of nearby objects without any physical contact. The sensor component 314 may also include a light sensor, such as a CMOS or CCD image sensor, for use in imaging applications. In some embodiments, the sensor component 314 may also include an acceleration sensor, a gyroscope sensor, a magnetic sensor, a pressure sensor, or a temperature sensor.
The communication component 316 is configured to facilitate wired or wireless communication between the apparatus 300 and other devices. The apparatus 300 may access a wireless network based on a communication standard, such as WiFi, 2G, or 3G, or a combination thereof. In an exemplary embodiment, the communication component 316 receives broadcast signals or broadcast-related information from an external broadcast management system via a broadcast channel. In an exemplary embodiment, the communication component 316 further includes a near field communication (NFC) module to facilitate short-range communication. For example, the NFC module may be implemented based on radio frequency identification (RFID) technology, Infrared Data Association (IrDA) technology, ultra-wideband (UWB) technology, Bluetooth (BT) technology, and other technologies.
In an exemplary embodiment, the apparatus 300 may be implemented by one or more application-specific integrated circuits (ASICs), digital signal processors (DSPs), digital signal processing devices (DSPDs), programmable logic devices (PLDs), field-programmable gate arrays (FPGAs), controllers, microcontrollers, microprocessors, or other electronic elements for performing the above method.
In an exemplary embodiment, there is also provided a non-transitory computer-readable storage medium including instructions, such as the memory 304 including instructions, executable by the processor 320 of the apparatus 300 to complete the above method. For example, the non-transitory computer-readable storage medium may be a ROM, a random access memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, or the like.
FIG. 15 is a block diagram of an apparatus 400 for training according to an exemplary embodiment. For example, the apparatus 400 may be provided as a server. Referring to FIG. 15, the apparatus 400 includes a processing component 422, which further includes one or more processors, and memory resources represented by a memory 432 for storing instructions, such as an application program, executable by the processing component 422. The application program stored in the memory 432 may include one or more modules, each corresponding to a set of instructions. In addition, the processing component 422 is configured to execute the instructions to perform the above training method.
The apparatus 400 may also include a power component 426 configured to perform power management of the apparatus 400, a wired or wireless network interface 450 configured to connect the apparatus 400 to a network, and an input/output (I/O) interface 458. The apparatus 400 may operate based on an operating system stored in the memory 432, such as Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™, or the like.
It can be further understood that in the present disclosure, "multiple" refers to two or more, and other quantifiers are similar. "And/or" describes an association relationship of associated objects, indicating that three relationships may exist; for example, "A and/or B" may indicate the three cases of A existing alone, A and B existing simultaneously, and B existing alone. The character "/" generally indicates an "or" relationship between the preceding and following associated objects. The singular forms "a", "said", and "the" are also intended to include the plural forms, unless the context clearly indicates otherwise.
It can be further understood that the terms "first", "second", and the like are used to describe various kinds of information, but such information should not be limited to these terms. These terms are only used to distinguish information of the same type from one another and do not indicate a particular order or degree of importance. In fact, expressions such as "first" and "second" are fully interchangeable. For example, without departing from the scope of the present disclosure, the first information may also be referred to as the second information, and similarly, the second information may also be referred to as the first information.
It can be further understood that although the operations in the embodiments of the present disclosure are described in a particular order in the drawings, this should not be understood as requiring that the operations be performed in the particular order shown or in serial order, or that all the operations shown be performed, to obtain the desired result. In certain circumstances, multitasking and parallel processing may be advantageous.
Other embodiments of the present disclosure will readily occur to those skilled in the art upon consideration of the specification and practice of the invention disclosed herein. The present application is intended to cover any variations, uses, or adaptations of the present disclosure that follow the general principles of the present disclosure and include common general knowledge or customary technical means in the technical field not disclosed by the present disclosure. The specification and the examples are to be regarded as exemplary only, with the true scope and spirit of the present disclosure being indicated by the following claims.
It should be understood that the present disclosure is not limited to the precise structures described above and illustrated in the accompanying drawings, and that various modifications and changes may be made without departing from its scope. The scope of the present disclosure is limited only by the appended claims.
Claims (21)
- A training method, applied to a first node, the method comprising:
training a first training model in response to receiving a model training request, wherein the model training request includes model compression parameters; and
obtaining a first compression model of the first training model based on the first training model and the model compression parameters.
- The training method according to claim 1, wherein the model compression parameters include multiple model compression options; and
the obtaining a first compression model of the first training model based on the first training model and the model compression parameters includes:
determining a first model compression option among the multiple model compression options, and compressing the first training model based on the first model compression option to obtain a second compression model;
determining a first loss function according to the output of the first training model, the output of the second compression model, and the sample parameter set used to train the first training model; and
updating the parameters of the second compression model based on the first loss function to obtain the first compression model.
- The training method according to claim 2, wherein the determining a first loss function according to the output of the first training model, the output of the second compression model, and the sample parameter set used to train the first training model includes:
determining a first cross-entropy between the output of the second compression model and the sample parameter set, and determining a first relative entropy divergence between the output of the second compression model and the output of the first training model; and
determining the first loss function based on the first cross-entropy and the first relative entropy divergence.
- The training method according to claim 2 or 3, wherein the method further includes:
determining, according to the output of the first training model, the output of the second compression model, and the sample parameter set used to train the first training model, a second loss function for updating the parameters of the first training model.
- The training method according to claim 4, wherein the determining, according to the output of the first training model, the output of the second compression model, and the sample parameter set used to train the first training model, a second loss function for updating the parameters of the first training model includes:
determining a second cross-entropy between the output of the first training model and the sample parameter set, and determining a second relative entropy divergence between the output of the first training model and the output of the second compression model; and
determining the second loss function based on the second cross-entropy and the second relative entropy divergence.
- The training method according to claim 1, wherein the model compression parameters include a model training mode, and the model training mode includes a single-training-node mode for training a single first training model and a multi-training-node mode for training multiple first training models; and
the number of first training models is determined based on the model training mode.
- The training method according to claim 6, wherein the method further includes:
sending a second indication message, where the second indication message includes a number of first compression models corresponding to the model training mode.
- The training method according to claim 1, wherein the method further includes:
receiving a third indication message, where the third indication message includes a training model determination indication.
- The training method according to claim 6, wherein the model training mode includes the multi-training-node mode, and the method further includes:
receiving a fourth indication message, where the fourth indication message is used to indicate a third compression model, and the third compression model is a compression model obtained by performing federated averaging on the first training models based on the number of first compression models; and
re-determining the model compression parameters based on the third compression model, and updating the first compression model based on the re-determined model compression parameters.
- The training method according to claim 1, wherein the method further includes:
receiving a fifth indication message, where the fifth indication message is used to indicate the end of training the first compression model.
- A training method, applied to a second node, the method comprising:
sending a model training request,
wherein the model training request includes model compression parameters, the model compression parameters are used to compress a first training model to obtain a first compression model, and the first training model is obtained by training based on the model training request.
- The training method according to claim 11, wherein the model compression parameters include a model training mode, and the model training mode includes a single-training-node mode for training a single first training model and a multi-training-node mode for training multiple first training models; and
the number of first training models is determined based on the model training mode.
- The training method according to claim 12, wherein the method further includes:
receiving a second indication message, where the second indication message includes a number of first compression models corresponding to the model training mode.
- The training method according to claim 11, wherein the method further includes:
sending a third indication message, where the third indication message includes a training model determination indication.
- The training method according to claim 12, wherein the model training mode includes the multi-training-node mode, and the method further includes:
sending a fourth indication message, where the fourth indication message is used to indicate a third compression model, and the third compression model is a compression model obtained by performing federated averaging on the first compression models based on the number of first training models.
- The training method according to claim 11, wherein the method further includes:
sending a fifth indication message, where the fifth indication message is used to indicate the end of training the first compression model.
- 根据权利要求11所述的训练方法,其特征在于,所述方法还包括:The training method according to claim 11, wherein the method further comprises:接收订阅需求,并基于所述订阅需求发送模型训练请求。A subscription requirement is received, and a model training request is sent based on the subscription requirement.
- 一种训练装置,其特征在于,应用于第一节点,所述装置包括:A training device, characterized in that, applied to a first node, the device comprises:模型训练模块,用于响应于接收到模型训练请求,训练第一训练模型,其中,所述模型训练请求中包括模型压缩参数;a model training module, configured to train a first training model in response to receiving a model training request, wherein the model training request includes model compression parameters;模型压缩模块,用于基于所述第一训练模型和所述模型压缩参数,得到所述第一训练模型的第一压缩模型。A model compression module, configured to obtain a first compression model of the first training model based on the first training model and the model compression parameters.
- 一种训练装置,其特征在于,应用于第二节点,所述装置包括:A training device, characterized in that, applied to a second node, the device comprising:网络通信模块,用于发送模型训练请求;Network communication module, used to send model training request;其中,所述模型训练请求中包括模型压缩参数,所述模型压缩参数用于压缩第一训练模型得到第一压缩模型,所述第一训练模型基于所述模型训练请求训练得到。Wherein, the model training request includes model compression parameters, and the model compression parameters are used to compress a first training model to obtain a first compression model, and the first training model is obtained by training based on the model training request.
- 一种训练装置,其特征在于,包括:A training device, comprising:处理器;processor;用于存储处理器可执行指令的存储器;memory for storing processor-executable instructions;其中,所述处理器被配置为:执行权利要求1-10中任意一项所述的训练方法,或执行权利要求11-17中任意一项所述的训练方法。Wherein, the processor is configured to: execute the training method described in any one of claims 1-10, or execute the training method described in any one of claims 11-17.
- 一种非临时性计算机可读存储介质,当所述存储介质中的指令由移动终端的处理器执行时,使得移动终端能够执行权利要求1-10中任意一项所述的训练方法,或执行权利要求11-17中任意一项所述的训练方法。A non-transitory computer-readable storage medium, when the instructions in the storage medium are executed by the processor of the mobile terminal, the mobile terminal can execute the training method described in any one of claims 1-10, or execute The training method of any one of claims 11-17.
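The first and second loss functions in the claims above mirror each other, in the style of mutual distillation between the first training model (the teacher) and its compressed copy (the student). One plausible formulation is sketched below; the weighting coefficient α and the divergence directions are assumptions, since the claims only require that each loss combine a cross-entropy with a relative entropy:

```latex
\mathcal{L}_1 = \alpha\,\mathrm{CE}(p_s, y) + (1-\alpha)\,D_{\mathrm{KL}}(p_t \,\|\, p_s), \qquad
\mathcal{L}_2 = \alpha\,\mathrm{CE}(p_t, y) + (1-\alpha)\,D_{\mathrm{KL}}(p_s \,\|\, p_t)
```

Here p_s is the output of the second compression model, p_t is the output of the first training model, and y denotes the labels from the sample parameter set.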
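A minimal PyTorch sketch of both losses under that assumed formulation follows; the function and argument names are illustrative and do not come from the patent:

```python
import torch.nn.functional as F

def distillation_losses(student_logits, teacher_logits, labels, alpha=0.5):
    """Student = second compression model, teacher = first training model."""
    # First loss: first cross-entropy against the sample labels, plus the
    # first relative entropy against the teacher's (detached) output.
    ce_student = F.cross_entropy(student_logits, labels)
    kl_student = F.kl_div(F.log_softmax(student_logits, dim=-1),
                          F.softmax(teacher_logits, dim=-1).detach(),
                          reduction="batchmean")
    first_loss = alpha * ce_student + (1.0 - alpha) * kl_student

    # Second loss: second cross-entropy plus the second relative entropy,
    # with the roles of the two models swapped.
    ce_teacher = F.cross_entropy(teacher_logits, labels)
    kl_teacher = F.kl_div(F.log_softmax(teacher_logits, dim=-1),
                          F.softmax(student_logits, dim=-1).detach(),
                          reduction="batchmean")
    second_loss = alpha * ce_teacher + (1.0 - alpha) * kl_teacher
    return first_loss, second_loss
```

Detaching the opposite model's probabilities keeps each loss updating only its own model, matching the separation between the first and second loss in the claims.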
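For the multi-training-node mode, the third compression model is obtained by federated averaging over the models uploaded by the training nodes. A minimal sketch, assuming each uploaded model is a dict of NumPy weight arrays and that plain unweighted averaging is used (the claims do not specify the representation or any weighting by sample count):

```python
import numpy as np

def federated_average(models):
    """Element-wise average across uploaded models; len(models) plays the
    role of the number of models referred to in the claims."""
    n = len(models)
    return {name: sum(m[name] for m in models) / n for name in models[0]}

# The averaged result would be carried back in the fourth indication message,
# after which each node re-determines its compression parameters.
node_models = [
    {"w": np.array([1.0, 2.0]), "b": np.array([0.5])},
    {"w": np.array([3.0, 4.0]), "b": np.array([1.5])},
]
third_compression_model = federated_average(node_models)
```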
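Lastly, the claims require only that the model training request carry model compression parameters. The structure below is a hypothetical illustration of what a second node might send, not the patent's wire format; every field name is invented:

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class ModelCompressionParams:
    # The "plurality of model compression options"; option names are invented.
    compression_options: List[str] = field(
        default_factory=lambda: ["prune_50pct", "quantize_int8"])
    # Single-training-node or multi-training-node mode.
    training_mode: str = "multi_node"

@dataclass
class ModelTrainingRequest:
    model_id: str
    compression: ModelCompressionParams
    subscriber: Optional[str] = None  # set when driven by a subscription requirement

request = ModelTrainingRequest(model_id="example-model",
                               compression=ModelCompressionParams())
```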
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202080003605.XA CN114793453A (en) | 2020-11-23 | 2020-11-23 | Training method, training device and storage medium |
PCT/CN2020/130896 WO2022104799A1 (en) | 2020-11-23 | 2020-11-23 | Training method, training apparatus, and storage medium |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/CN2020/130896 WO2022104799A1 (en) | 2020-11-23 | 2020-11-23 | Training method, training apparatus, and storage medium |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2022104799A1 (en) | 2022-05-27 |
Family
ID=81708237
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2020/130896 WO2022104799A1 (en) | 2020-11-23 | 2020-11-23 | Training method, training apparatus, and storage medium |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN114793453A (en) |
WO (1) | WO2022104799A1 (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2024152290A1 (en) * | 2023-01-19 | 2024-07-25 | 华为技术有限公司 | Network quantization method and apparatus, and related device |
WO2024207182A1 (en) * | 2023-04-04 | 2024-10-10 | Qualcomm Incorporated | Training dataset mixture for user equipment-based model training in predictive beam management |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116233857A (en) * | 2021-12-02 | 2023-06-06 | 华为技术有限公司 | Communication method and communication device |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109784474A (en) * | 2018-12-24 | 2019-05-21 | 宜通世纪物联网研究院(广州)有限公司 | A kind of deep learning model compression method, apparatus, storage medium and terminal device |
CN109978144A (en) * | 2019-03-29 | 2019-07-05 | 联想(北京)有限公司 | A kind of model compression method and system |
WO2020131968A1 (en) * | 2018-12-18 | 2020-06-25 | Movidius Ltd. | Neural network compression |
CN111898484A (en) * | 2020-07-14 | 2020-11-06 | 华中科技大学 | Method, apparatus, readable storage medium, and electronic device for generating a model |
Family Cites Families (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US11244242B2 (en) * | 2018-09-07 | 2022-02-08 | Intel Corporation | Technologies for distributing gradient descent computation in a heterogeneous multi-access edge computing (MEC) networks |
CN111488985B (en) * | 2020-04-08 | 2023-11-14 | 华南理工大学 | Deep neural network model compression training methods, devices, equipment and media |
2020
- 2020-11-23: CN application CN202080003605.XA filed; published as CN114793453A (status: active, pending)
- 2020-11-23: PCT application PCT/CN2020/130896 filed; published as WO2022104799A1 (status: active, application filing)
Patent Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2020131968A1 (en) * | 2018-12-18 | 2020-06-25 | Movidius Ltd. | Neural network compression |
CN109784474A (en) * | 2018-12-24 | 2019-05-21 | 宜通世纪物联网研究院(广州)有限公司 | A kind of deep learning model compression method, apparatus, storage medium and terminal device |
CN109978144A (en) * | 2019-03-29 | 2019-07-05 | 联想(北京)有限公司 | A kind of model compression method and system |
CN111898484A (en) * | 2020-07-14 | 2020-11-06 | 华中科技大学 | Method, apparatus, readable storage medium, and electronic device for generating a model |
Non-Patent Citations (1)
Title |
---|
WEI YUE; CHEN SHICHAO; ZHU FENGHUA; XIONG GANG: "Pruning Method for Convolutional Neural Network Models Based on Sparse Regularization", Computer Engineering, vol. 47, no. 10, 14 November 2021 (2021-11-14), CN, pages 61-66, XP055931887, ISSN: 1000-3428, DOI: 10.19678/j.issn.1000-3428.0059375 *
Also Published As
Publication number | Publication date |
---|---|
CN114793453A (en) | 2022-07-26 |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
WO2021258370A1 (en) | Communication processing method, communication processing apparatus and storage medium | |
WO2022104799A1 (en) | Training method, training apparatus, and storage medium | |
WO2022099512A1 (en) | Data processing method and apparatus, communication device, and storage medium | |
CN107926000A (en) | Information transceiving method, apparatus and system | |
US12309718B2 (en) | Methods and apparatuses for processing transmission power level information, and computer storage media | |
CN111466127A (en) | Processing method and device for enhancing uplink coverage and storage medium | |
CN113632571B (en) | Message configuration method, message configuration device and storage medium | |
WO2023000341A1 (en) | Information configuration method, information configuration apparatus, and storage medium | |
US11387923B2 (en) | Information configuration method and apparatus, method and apparatus for determining received power, and base station | |
WO2021007827A1 (en) | Information indication and determination methods and apparatuses, communication device and storage medium | |
WO2021142796A1 (en) | Communication processing methods and apparatuses, and computer storage medium | |
WO2022151490A1 (en) | Channel state information determination method and apparatus, and storage medium | |
CN111566985B (en) | Transmission processing method, device, user equipment, base station and storage medium | |
CN110945827B (en) | Method, device, communication equipment and storage medium for configuring downlink control information | |
WO2022082742A1 (en) | Model training method and device, server, terminal, and storage medium | |
WO2022141290A1 (en) | Parameter determination method, parameter determination apparatus, and storage medium | |
WO2022133689A1 (en) | Model transmission method, model transmission device, and storage medium | |
CN113169825B (en) | Data packet transmission method and device | |
WO2022126555A1 (en) | Transmission method, transmission apparatus, and storage medium | |
CN114080852A (en) | Method, device, communication device and storage medium for reporting capability information | |
WO2022151052A1 (en) | Method and apparatus for configuring random access parameter, and storage medium | |
WO2022204973A1 (en) | Policy determining method, policy determining apparatus, and storage medium | |
US20250048137A1 (en) | Information processing method and apparatus, and communication device and storage medium | |
WO2023039722A1 (en) | Information reporting method, information reporting apparatus and storage medium | |
WO2024130590A1 (en) | Transmission configuration indicator state determination method and apparatus, and storage medium |
Legal Events
Code | Title | Description |
---|---|---|
121 | EP: the EPO has been informed by WIPO that EP was designated in this application | Ref document number: 20962090; Country of ref document: EP; Kind code of ref document: A1 |
NENP | Non-entry into the national phase | Ref country code: DE |
122 | EP: PCT application non-entry in the European phase | Ref document number: 20962090; Country of ref document: EP; Kind code of ref document: A1 |