
CN113987004B - A model training method and device - Google Patents


Info

Publication number
CN113987004B
CN113987004B
Authority
CN
China
Prior art keywords
data
target
parameter
training
model parameters
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202111177135.1A
Other languages
Chinese (zh)
Other versions
CN113987004A (en)
Inventor
孙宇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing QIYI Century Science and Technology Co Ltd
Original Assignee
Beijing QIYI Century Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing QIYI Century Science and Technology Co Ltd filed Critical Beijing QIYI Century Science and Technology Co Ltd
Priority to CN202111177135.1A priority Critical patent/CN113987004B/en
Publication of CN113987004A publication Critical patent/CN113987004A/en
Application granted granted Critical
Publication of CN113987004B publication Critical patent/CN113987004B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G: PHYSICS
    • G06: COMPUTING OR CALCULATING; COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/20: Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F 16/24: Querying
    • G06F 16/245: Query processing
    • G06F 16/2458: Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F 16/2471: Distributed queries
    • G: PHYSICS
    • G06: COMPUTING OR CALCULATING; COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/20: Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F 16/23: Updating
    • G: PHYSICS
    • G06: COMPUTING OR CALCULATING; COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/20: Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F 16/27: Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
    • G: PHYSICS
    • G06: COMPUTING OR CALCULATING; COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 20/00: Machine learning

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Computational Linguistics (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Computation (AREA)
  • Medical Informatics (AREA)
  • Fuzzy Systems (AREA)
  • Mobile Radio Communication Systems (AREA)
  • Computer And Data Communications (AREA)

Abstract


An embodiment of the present invention provides a model training method and device, applied to a distributed system including a task node device and a parameter node device. The method includes: when the task node device adds new training data, sending a target data identifier to the parameter node device; when the parameter node device does not find model parameters corresponding to the target data identifier, generating corresponding target model parameters for the target data identifier, and storing the target data identifier and the target model parameters in the memory of the parameter node device in a non-contiguous storage manner; sending the target model parameters to the task node device through the parameter node device, and then calculating a target gradient value from the training data corresponding to the target data identifier and the target model parameters; and updating the target model parameters based on the target gradient value. When new training data is added, the present invention ensures that training proceeds normally, so training need not be terminated.

Description

Model training method and device
Technical Field
The invention relates to the field of deep learning model training, in particular to a model training method and device.
Background
Model training involves large amounts of data and computation, and the overall process is complex, so the training period is generally long. At present, distributed training is widely adopted for its high efficiency, with the PS (Parameter Server) architecture as a representative example. This architecture comprises parameter server nodes and task nodes. The parameter server nodes initialize and store the model parameters, accept the local gradients calculated by the task nodes, aggregate them into a global gradient, and update the model parameters. The task nodes each store part of the training data, initialize the model, pull the latest model parameters from the parameter server nodes, read the model parameters, calculate local gradients from the training data, and upload the local gradients to the parameter server nodes. Because the two kinds of nodes have distinct roles, neither the data nor the computation is concentrated in one place.
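To make the division of labor concrete, the following is a minimal, hypothetical sketch of the two node roles described above; the class names, the toy squared-error gradient, and the learning rate are illustrative assumptions, not details from the patent.

```python
import numpy as np

class ParameterServerNode:
    """Initializes and stores model parameters, aggregates local gradients
    into a global gradient, and updates the parameters."""
    def __init__(self, lr=0.5):
        self.params = {}          # data identifier -> parameter vector
        self.lr = lr

    def pull(self, key):
        return self.params[key]

    def push(self, key, local_grads):
        # Summarize local gradients from the task nodes into a global one.
        global_grad = np.mean(local_grads, axis=0)
        self.params[key] = self.params[key] - self.lr * global_grad

class TaskNode:
    """Stores part of the training data and computes a local gradient."""
    def __init__(self, data):
        self.data = data          # data identifier -> feature vector

    def local_gradient(self, key, params):
        # Toy gradient of the squared error 0.5 * ||params - x||^2.
        return params - self.data[key]
```

A single training step then consists of the task node pulling parameters, computing its local gradient, and pushing it back to the parameter server node.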
When model training uses a PS architecture, existing deep learning platforms need to prepare the training data in advance and set the model parameters accordingly based on that data. After training is started, the model is trained on the prepared training data.
However, if new training data is added during training, the training is terminated and can only be restarted, which delays the training process.
Disclosure of Invention
In view of the above problems, embodiments of the present invention provide a model training method and apparatus to solve the prior-art problem that training must be stopped when training data is newly added, which harms training efficiency.
In a first aspect of the present invention, a model training method is provided, applied to a distributed system including task node devices and parameter node devices, where the task node devices store the training data required for training the model, and the parameter node devices store the data identifier of each piece of training data and the model parameters corresponding to each data identifier. The method includes:
when the task node device adds new training data, sending a target data identifier to the parameter node device, where the target data identifier is a data identifier in the newly added training data;
when the parameter node device does not find model parameters corresponding to the target data identifier, generating corresponding target model parameters for the target data identifier, and storing the target data identifier and the target model parameters in the memory of the parameter node device in a non-contiguous storage manner;
sending, by the parameter node device, the target model parameters to the task node device, so that the task node device calculates a target gradient value from the training data corresponding to the target data identifier and the target model parameters;
and sending the target gradient value to the parameter node device through the task node device, so that the parameter node device updates the target model parameters based on the target gradient value.
Optionally, after the parameter node device updates the target model parameters based on the target gradient value, the method further comprises:
determining, by the parameter node device, an actual data range value based on the data identifiers and model parameters in the memory, where the actual data range value comprises the number of elements in each dimension of the data identifiers and model parameters in the memory;
and when the actual data range value differs from a preset data range value, storing the data identifiers and model parameters in the memory into the external memory of the parameter node device according to the actual data range value, where the preset data range value comprises the number of elements in each dimension of the data identifiers and model parameters determined from the training data before training starts.
Optionally, storing the data identifiers and model parameters in the memory into the external memory of the parameter node device according to the actual data range value includes:
generating an identification tensor and a parameter tensor from the data identifiers and the model parameters in the memory, respectively;
and storing the identification tensor and the parameter tensor into the external memory of the parameter node device according to the number of elements in each dimension of each tensor.
Optionally, the parameter node device updating the target model parameters based on the target gradient value includes:
updating the target model parameters with a target optimizer based on the target gradient value, where the target optimizer is an optimizer designed for variables stored in a non-contiguous manner.
Optionally, when the newly added training data comprises a plurality of pieces of training data and there are a plurality of parameter node devices, sending the target data identifier to the parameter node device includes:
sending the respective target data identifiers of the plurality of pieces of training data to different parameter node devices.
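The optional step above distributes the target data identifiers of multiple pieces of training data across several parameter node devices. One common way to realize such routing, offered here only as an assumption since the patent does not prescribe a scheme, is hash-based sharding:

```python
import hashlib

def shard_for(data_id: str, num_parameter_nodes: int) -> int:
    """Deterministically route a target data identifier to one of several
    parameter node devices (illustrative scheme, not from the patent)."""
    digest = hashlib.md5(data_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % num_parameter_nodes

# Route each newly added record's identifier to its parameter node device.
new_ids = ["user_101", "user_102", "user_103"]
routing = {data_id: shard_for(data_id, 4) for data_id in new_ids}
```

Because the routing is a pure function of the identifier, any task node device can compute it independently without coordination.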
In a second aspect of the present invention, there is also provided a model training apparatus, applied to a distributed system including a task node device and a parameter node device, where the task node device stores the training data required for training the model, and the parameter node device stores the data identifier of each piece of training data and the model parameters corresponding to each data identifier. The apparatus includes:
a receiving module, configured to receive target information input by a user, where the target information includes the information required by a plurality of different applications when querying data in their respective data sources;
a sending module, configured to send the target data identifier to the parameter node device when the task node device adds new training data, where the target data identifier is a data identifier in the newly added training data;
a parameter return module, configured to generate corresponding target model parameters for the target data identifier when the parameter node device does not find model parameters corresponding to the target data identifier, and to store the target data identifier and the target model parameters in the memory of the parameter node device in a non-contiguous storage manner;
a gradient module, configured to send the target model parameters to the task node device through the parameter node device, so that the task node device calculates a target gradient value from the training data corresponding to the target data identifier and the target model parameters;
and an updating module, configured to send the target gradient value to the parameter node device through the task node device, so that the parameter node device updates the target model parameters based on the target gradient value.
Optionally, the apparatus further comprises:
a judging module, configured to determine, through the parameter node device, an actual data range value based on the data identifiers and model parameters in the memory, where the actual data range value includes the number of elements in each dimension of the data identifiers and model parameters in the memory;
and a storage module, configured to store, when the actual data range value differs from the preset data range value, the data identifiers and model parameters in the memory into the external memory of the parameter node device according to the actual data range value, where the preset data range value includes the number of elements in each dimension of the data identifiers and model parameters determined from the training data before training starts.
Optionally, the storage module includes:
the generating unit is used for generating an identification tensor and a parameter tensor according to the data identification and the model parameters in the memory respectively;
and a storage unit, configured to store the identification tensor and the parameter tensor into the external memory of the parameter node device according to the number of elements in each dimension of each tensor.
Optionally, the updating module is specifically configured to update the target model parameters with a target optimizer based on the target gradient value, where the target optimizer is an optimizer designed for variables stored in a non-contiguous manner.
Optionally, when the newly added training data includes a plurality of pieces of training data and there are a plurality of parameter node devices, the sending module is specifically configured to send the respective target data identifiers of the plurality of pieces of training data to different parameter node devices.
In a third aspect of the present invention, there is also provided an electronic device, including a processor, a communication interface, a memory, and a communication bus, where the processor, the communication interface, and the memory communicate with each other through the communication bus;
the memory is configured to store a computer program;
and the processor is configured to implement the steps of the above model training method when executing the program stored in the memory.
In a fourth aspect of the present invention, there is also provided a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the model training method according to any implementation of the first aspect.
In a fifth aspect of the present invention, there is also provided a computer program product containing instructions which, when run on a computer, cause the computer to perform the above model training method.
Compared with the prior art, the invention has the following advantages:
The model training method is applied to a distributed system comprising a task node device and a parameter node device, where the task node device stores the training data required for training the model, and the parameter node device stores the data identifier of each piece of training data and the model parameters corresponding to each data identifier. The method includes sending a target data identifier to the parameter node device when the task node device adds new training data, where the target data identifier is a data identifier in the newly added training data; training data can thus be added directly without stopping training. When the parameter node device does not find model parameters corresponding to the target data identifier, corresponding target model parameters are generated for the target data identifier, and the target data identifier and the target model parameters are stored in the memory of the parameter node device in a non-contiguous storage manner. Even if the model parameters associated with the newly added training data are not found, corresponding target model parameters can be generated directly, which avoids an immediate error, and storing them non-contiguously avoids overflow of the feature space. The target model parameters are sent to the task node device by the parameter node device, so that the task node device calculates a target gradient value from the training data corresponding to the target data identifier and the target model parameters. The target gradient value obtained through backward calculation prepares for updating the target model parameters.
The target gradient value is then sent to the parameter node device through the task node device, so that the parameter node device updates the target model parameters based on the target gradient value. Updating the target model parameters with the target gradient value makes their values more accurate and effective. When training data is newly added, the embodiment of the present invention can process it, avoid errors caused by failing to find the model parameters associated with the newly added training data, avoid overflow of the feature space, and ensure that training proceeds normally, so training need not be stopped.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings that are needed in the description of the embodiments of the present invention will be briefly described below, it being obvious that the drawings in the following description are only some embodiments of the present invention, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flow chart of the steps of a training method for a model according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of a discontinuous storage architecture according to an embodiment of the present invention;
FIG. 3 is a schematic flow chart of storing data identifiers and model parameters into an external memory according to an embodiment of the present invention;
FIG. 4 is a schematic flow chart of recovering data identifiers and model parameters according to an embodiment of the present invention;
FIG. 5 is a schematic diagram of a practical application of a model training method according to an embodiment of the present invention;
FIG. 6 is a block diagram of a training device for a model according to an embodiment of the present invention;
FIG. 7 is a block diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and fully with reference to the accompanying drawings, in which it is evident that the embodiments described are some, but not all embodiments of the invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
It should be appreciated that reference throughout this specification to "one embodiment" or "an embodiment" means that a particular feature, structure or characteristic described in connection with the embodiment is included in at least one embodiment of the present invention. Thus, the appearances of the phrases "in one embodiment" or "in an embodiment" in various places throughout this specification are not necessarily all referring to the same embodiment. Furthermore, the particular features, structures, or characteristics may be combined in any suitable manner in one or more embodiments.
In various embodiments of the present invention, it should be understood that the sequence numbers of the following processes do not mean the order of execution, and the order of execution of the processes should be determined by the functions and internal logic, and should not constitute any limitation on the implementation process of the embodiments of the present invention.
Referring to FIG. 1, an embodiment of the present invention provides a model training method, applied to a distributed system including task node devices and parameter node devices, where the task node devices store the training data required for training the model, and the parameter node devices store the data identifier of each piece of training data and the model parameters corresponding to each data identifier. A task node device stores many pieces of training data, for example tens of thousands, though it is not limited to that number. To distinguish different training data, each piece of training data carries a unique data identifier; for example, if the training data is user feature data, the data identifier may be a user identifier that distinguishes different users. Each data identifier in the parameter node device corresponds to a portion of the model parameters, and the data identifiers and model parameters may be stored in a key-value pair structure.
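As a minimal sketch of the key-value layout described above, assuming Python dictionaries, hypothetical identifiers, and 10-dimensional parameter slices purely for illustration:

```python
import numpy as np

# Each piece of training data has a unique data identifier; the parameter
# node device keeps a key-value mapping from that identifier to its portion
# of the model parameters (here, a 10-dimensional vector per identifier).
rng = np.random.default_rng(42)
param_store = {
    "user_001": rng.standard_normal(10),
    "user_002": rng.standard_normal(10),
}

def lookup(store, data_id):
    """Return the model parameters for a data identifier, or None if the
    identifier has no stored parameters yet."""
    return store.get(data_id)
```

An identifier belonging to newly added training data, such as "user_999" here, simply has no entry yet, which is exactly the situation step 102 below handles.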
It can be understood that the model trained by this method may be, but is not limited to, a recommendation model in a recommendation scenario with large-scale sparse features. When the training object is a recommendation model, the training data consists of large-scale sparse features of user behavior. Sparse features of user behavior are input into the recommendation model, which can output the user's preferences or the data content the user prefers, thereby achieving the purpose of recommendation.
The method may include:
Step 101: when the task node device adds new training data, send the target data identifier to the parameter node device.
It should be noted that, during model training, the user may import newly added training data into the task node device, thereby adding training data to the model training and improving the training result. For example, while training a recommendation model, the user may preprocess recently collected behavior data of a large number of users so that its data format is consistent with that of the original training data, thereby generating new training data, which is imported into the task node device as the newly added training data.
The target data identifier is a data identifier in the newly added training data; through it, the model parameters associated with the newly added training data can be pulled from the parameter node device.
Step 102: when the parameter node device does not find model parameters corresponding to the target data identifier, generate corresponding target model parameters for the target data identifier, and store the target data identifier and the target model parameters in the memory of the parameter node device in a non-contiguous storage manner.
It should be noted that the model parameters associated with the training data are stored in the parameter node device. The model parameters are typically divided into a number of parts, each part corresponding to, and processed together with, its associated training data. Here, a part of the model parameters corresponds to its associated training data, that is, to the data identifier of that training data.
If no model parameters corresponding to the target data identifier are found in the parameter node device, the training data indicated by the target data identifier is newly added. To avoid an error, target model parameters corresponding to the target data identifier must be generated; that is, a set of model parameters is initialized in the parameter node device according to a preset algorithm and used as the target model parameters. Model parameters corresponding to the target data identifier can then be found in the parameter node device.
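A minimal sketch of this lazy initialization, assuming small random values as the preset algorithm (the patent does not specify one) and hypothetical names:

```python
import numpy as np

def get_or_init(store, data_id, dim=10, scale=0.01, seed=0):
    """If no parameters exist for the identifier, initialize a vector with
    a preset algorithm (small random values here, as an assumption) and
    store it, so the lookup succeeds instead of raising an error."""
    if data_id not in store:
        rng = np.random.default_rng(seed)
        store[data_id] = rng.normal(scale=scale, size=dim)
    return store[data_id]
```

After the first call for a new identifier, subsequent lookups find the same target model parameters, which is what lets training continue without a restart.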
It can be appreciated that, to increase training speed, the parameter node device generally backs up the pre-stored model parameters into memory and looks up model parameters directly in memory. Accordingly, after the target model parameters are generated, they are stored in memory in the same form as the original model parameters, that is, in the memory of the parameter node device in a non-contiguous storage manner. As shown in FIG. 2, with non-contiguous storage, data to be stored can be freely added in memory.
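The contrast between a contiguous table sized before training and the non-contiguous (hash-map style) storage can be sketched as follows; the dense-array baseline and its fixed capacity are assumptions for illustration:

```python
import numpy as np

dense = np.zeros((100, 10))   # contiguous table, capacity fixed at startup
sparse = {}                   # non-contiguous (hash-map) storage

def add_dense(table, row, vec):
    # A new identifier beyond the preset capacity overflows the feature space.
    if row >= table.shape[0]:
        raise IndexError("feature space overflow")
    table[row] = vec

def add_sparse(table, data_id, vec):
    # Grows freely: no preset capacity to exceed.
    table[data_id] = vec
```

The dict-based store accepts any number of newly added identifiers, which is the overflow-avoidance property the non-contiguous layout provides.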
Of course, when the parameter node device does find the model parameters corresponding to the target data identifier, the found model parameters are used as the target model parameters.
Step 103: send the target model parameters to the task node device through the parameter node device, so that the task node device calculates a target gradient value from the training data corresponding to the target data identifier and the target model parameters.
It should be noted that the task node device pulls the model parameters from the parameter node device and calculates a gradient value based on the training data, so that the parameter node device can update the model parameters. The target model parameters are thus the model parameters, pulled by the task node device, that are associated with the training data indicated by the target data identifier. Any gradient algorithm may be used to calculate the target gradient value, and the details are not repeated here.
Step 104: transmit the target gradient value to the parameter node device through the task node device, so that the parameter node device updates the target model parameters based on the target gradient value.
It should be noted that the parameter node device updates the model parameters based on the gradient value, that is, it updates the parameter values of the target model parameters.
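Steps 103 and 104 can be sketched as a pull-compute-update cycle; the squared-error gradient and the learning rate below are illustrative assumptions, since the patent permits any gradient algorithm:

```python
import numpy as np

def target_gradient(params, features):
    # Toy squared-error gradient of 0.5 * ||params - features||^2.
    return params - features

def update(params, grad, lr=0.1):
    # The parameter node device applies the target gradient value to the
    # target model parameters.
    return params - lr * grad

params = np.array([1.0, 1.0])      # target model parameters (pulled)
features = np.array([0.0, 2.0])    # training data for the target identifier
new_params = update(params, target_gradient(params, features))
```

Here the gradient is [1.0, -1.0], so one update moves the parameters to [0.9, 1.1], toward the training data.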
The model training method in the embodiment of the present invention is applied to a distributed system comprising a task node device and a parameter node device, where the task node device stores the training data required for training the model, and the parameter node device stores the data identifier of each piece of training data and the model parameters corresponding to each data identifier. The method includes sending a target data identifier to the parameter node device when the task node device adds new training data, where the target data identifier is a data identifier in the newly added training data; training data can thus be added directly without stopping training. When the parameter node device does not find model parameters corresponding to the target data identifier, corresponding target model parameters are generated for the target data identifier, and the target data identifier and the target model parameters are stored in the memory of the parameter node device in a non-contiguous storage manner. Even if the model parameters associated with the newly added training data are not found, corresponding target model parameters can be generated directly, which avoids an immediate error, and storing them non-contiguously avoids overflow of the feature space. The target model parameters are sent to the task node device by the parameter node device, so that the task node device calculates a target gradient value from the training data corresponding to the target data identifier and the target model parameters. The target gradient value obtained through backward calculation prepares for updating the target model parameters.
The target gradient value is then sent to the parameter node device through the task node device, so that the parameter node device updates the target model parameters based on the target gradient value. Updating the target model parameters with the target gradient value makes their values more accurate and effective. When training data is newly added, the embodiment of the present invention can process it, avoid errors caused by failing to find the model parameters associated with the newly added training data, avoid overflow of the feature space, and ensure that training proceeds normally, so training need not be stopped.
Optionally, during training, the data identifiers and model parameters stored in memory can be synchronized to external memory at time intervals or every preset number of training iterations, so that a power failure does not cause data loss. Specifically, after the parameter node device updates the target model parameters based on the target gradient value, the method further includes:
determining, by the parameter node device, an actual data range value based on the data identifiers and model parameters in memory, where the actual data range value includes the number of elements in each dimension of the data identifiers and model parameters in memory.
It should be noted that the data identifiers and model parameters in memory are usually multidimensional arrays, for example a 100×10 two-dimensional array, in which the first dimension has 100 elements and the second dimension has 10. Accordingly, the actual data range value may be (100, 10).
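In NumPy terms, the actual data range value corresponds to the array's shape, assuming for illustration that the parameters are held as a two-dimensional array:

```python
import numpy as np

# 100 data identifiers, each with 10 model-parameter elements.
model_params = np.zeros((100, 10))
actual_range = model_params.shape    # number of elements in each dimension

# Compare with the range preset before training started.
preset_range = (100, 10)
ranges_differ = actual_range != preset_range
```

When newly added training data has grown the first dimension, `ranges_differ` becomes true and the external-memory write described next is performed with the actual shape.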
And under the condition that the actual data range value is different from the preset data range value, storing the data identification and the model parameters in the memory into the external memory of the parameter node equipment according to the actual data range value.
It should be noted that the preset data range values include the data identification and the number of elements per dimension of the model parameters determined based on the training data before starting training. Before training is started, the number of elements in each dimension of the data identification and model parameters to be stored in the memory estimated based on all the training data at present is estimated. For example, there are 100 pieces of training data currently, and the number of data identifiers and model parameters corresponding to each piece of training data is 10, and then the preset data range value may be (100, 10). Here, the preset data range value is similar to the Shape parameter in the machine learning system Tensorflow, and a description thereof will be omitted.
In the embodiment of the invention, the data identifiers and model parameters in the memory are synchronized to the external memory, so that data loss after a power failure is avoided; moreover, the data identifiers and model parameters can still be stored even when the actual data range value differs from the preset data range value.
Optionally, storing the data identifier and the model parameter in the memory into the external memory of the parameter node device according to the actual data range value includes:
And generating an identification tensor and a parameter tensor according to the data identification and the model parameters in the memory respectively.
It should be noted that the data identifiers and the model parameters are stored together in the memory of the parameter node device, which facilitates direct data queries. The data identifiers and the model parameters correspond one-to-one in a key-value (K-V) structure: each data identifier is a key, and the corresponding model parameter is its value. Therefore, when a key tensor needs to be generated from all keys in the memory and a value tensor from all values, it suffices to generate an identification tensor from the data identifiers in the memory and a parameter tensor from the model parameters in the memory. The identification tensor is the key tensor, and the parameter tensor is the value tensor.
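A sketch of this key/value flattening, under the assumption (not stated in the source) that the discontinuous K-V store behaves like a dictionary from integer identifier to parameter vector; NumPy is used in place of the actual tensor format.

```python
import numpy as np

# Hypothetical K-V store: data identifier -> model parameter vector.
kv_store = {
    101: np.array([0.1, 0.2, 0.3]),
    205: np.array([0.4, 0.5, 0.6]),
}

# Identification tensor (the key tensor): all keys, one per row.
id_tensor = np.array(list(kv_store.keys())).reshape(-1, 1)

# Parameter tensor (the value tensor): all values stacked row-wise,
# in the same order as the keys.
param_tensor = np.stack(list(kv_store.values()))

print(id_tensor.shape, param_tensor.shape)  # (2, 1) (2, 3)
```

Keeping the two tensors row-aligned is what lets the K-V correspondence be rebuilt later from the stored tensors alone.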
And respectively storing the identification tensor and the parameter tensor into the external memory of the parameter node equipment according to the element number of each dimension of the identification tensor and the parameter tensor.
It should be noted that the identification tensor and the parameter tensor are stored according to their actual sizes, so that no data is lost. For example, if the preset data range value indicates a 100×10 two-dimensional tensor but only 90 entries actually exist in the memory, then after storage in the external memory the identification tensor is a 90×1 two-dimensional tensor and the parameter tensor is a 90×10 two-dimensional tensor.
As shown in fig. 3, a schematic diagram of storing data identifiers and model parameters into the external memory, the data tensor refers to the data identifiers and model parameters in the memory of the parameter node device, represented in a tensor data format. An identification tensor is generated from the data identifiers in the memory, and a parameter tensor is generated from the model parameters in the memory. Then, using the shape-parameter detection logic already present in TensorFlow, it is determined whether the predefined data range value and the actual data range value are the same, and the tensors are stored according to the actual data range value.
Of course, when the data identifiers and model parameters are recovered after training terminates, a method similar to the storing procedure can be used. As shown in fig. 4, the identification tensor and the parameter tensor are recovered separately, it is determined whether the predefined data range value and the actual data range value are the same, and when they differ, recovery proceeds according to the actual data range value to obtain the data identifiers and model parameters. This is not described in detail here.
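The store/restore round trip can be illustrated as below. This is an assumption-laden sketch: `.npy` files via NumPy stand in for the external memory of the parameter node device, and the file names are hypothetical.

```python
import os
import tempfile

import numpy as np

# Tensors to persist, sized by their actual data range values.
id_tensor = np.array([[101], [205]])                 # 2x1 identification tensor
param_tensor = np.array([[0.1, 0.2], [0.3, 0.4]])    # 2x2 parameter tensor

with tempfile.TemporaryDirectory() as d:
    # Store: each tensor is written with its actual shape.
    np.save(os.path.join(d, "ids.npy"), id_tensor)
    np.save(os.path.join(d, "params.npy"), param_tensor)

    # Restore: read both tensors back after a (simulated) restart.
    restored_ids = np.load(os.path.join(d, "ids.npy"))
    restored_params = np.load(os.path.join(d, "params.npy"))

# Rebuild the K-V map from the row-aligned tensors.
kv = {int(k): v for k, v in zip(restored_ids.ravel(), restored_params)}
print(sorted(kv))  # [101, 205]
```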
In the embodiment of the invention, the data identification and the model parameters can be stored respectively, so that the subsequent use is convenient.
Optionally, the parameter node device updates the target model parameter based on the target gradient value, including:
And updating the target model parameters by adopting a target optimizer based on the target gradient values, wherein the target optimizer is an optimizer designed for the variables stored discontinuously.
It should be noted that after the gradient values are calculated, the parameter matrix (the model parameters) needs to be updated according to certain strategies and algorithms; the components that perform this update are called optimizers. It will be appreciated that optimizers differ for variables with different storage forms. The target optimizer used by the present invention differs from existing optimizers designed for continuously stored variables. Before the optimizer processes the gradient values, an optimizer subgraph needs to be constructed: specifically, it is detected whether the storage form of the variables (data identifiers and model parameters) is a discontinuous storage form, and if so, an optimizer designed for discontinuously stored variables is added to the computation graph. Such optimizers include, but are not limited to, an FTRL (Follow-The-Regularized-Leader) optimizer and an SGD (stochastic gradient descent) optimizer. The computation graph here is equivalent to the computation graph in TensorFlow and is not described in detail.
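A minimal sketch of what an update on a discontinuously stored variable looks like, assuming a plain SGD rule (this is not the patented target optimizer): only the rows whose data identifiers actually received gradients are touched, while all other entries in the K-V store stay untouched.

```python
import numpy as np

# Discontinuously stored variable: data identifier -> parameter row.
params = {101: np.array([1.0, 2.0]), 205: np.array([3.0, 4.0])}

# Sparse gradients: only identifier 101 appeared in this batch.
grads = {101: np.array([0.5, 0.5])}
lr = 0.1  # learning rate (hypothetical value)

for key, g in grads.items():
    # Replace the pre-update row with the updated row in place.
    params[key] = params[key] - lr * g

print(params[101])  # [0.95 1.95]
```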
In the embodiment of the invention, the target gradient value is processed by an optimizer designed for discontinuously stored variables, so that the updated target model parameters yield a better training effect.
Optionally, in the case that the newly added training data includes a plurality of pieces of training data and the number of parameter node devices is a plurality of pieces, sending the target data identifier to the parameter node device includes:
And respectively sending the respective target data identifiers in the plurality of pieces of training data to different parameter node equipment.
It should be noted that by setting up a plurality of parameter node devices, a plurality of pieces of training data can be processed in parallel, with each parameter node device following the same procedure. Fig. 5 is a schematic diagram of a practical application of the training method of the model according to the embodiment of the present invention, illustrated with only one task node device and two parameter node devices. The newly added training data includes a plurality of pieces of training data. Typically, the data identifiers of the plurality of pieces of training data are concatenated into one string, i.e., a data identifier string. This string may be preprocessed to convert the string data into integer data, and the preprocessed data identifier string may then be divided into individual data identifiers by a hash algorithm. Fig. 5 shows only two data identifiers (a first data identifier and a second data identifier), but the method is not limited thereto. Since the subsequent processing of the first data identifier and the second data identifier is the same, only the first is described here. The task node device sends the first data identifier to parameter node device 1, which performs lookup processing: it searches whether the first data identifier exists among the data identifiers in its local memory. If so, the model parameter corresponding to the first data identifier is taken as the first model parameter; if not, a model parameter corresponding to the first data identifier is initialized with a preset algorithm and used as the first model parameter.
The first model parameter and the first data identifier are stored correspondingly in the memory in a discontinuous storage manner, and the first model parameter is simultaneously sent to the task node device. The task node device computes a first gradient value by back-propagation and sends it to parameter node device 1; parameter node device 1 updates the first model parameter based on the received first gradient value, replacing the pre-update parameter value with the updated parameter value, so that the updated first model parameter is stored.
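The partitioning step above can be sketched as follows. The delimiter, the modulo hash, and the two-node setup are all illustrative assumptions; the source only says a hash algorithm divides the preprocessed identifier string among the parameter node devices.

```python
# Hypothetical concatenated data identifier string from new training data.
id_string = "1001,1002,1003,1004"

# Preprocessing: split the string and convert string data to integer data.
ids = [int(s) for s in id_string.split(",")]

# Route each identifier to one of two parameter node devices (as in Fig. 5)
# via a simple modulo hash.
num_nodes = 2
routing = {node: [] for node in range(num_nodes)}
for ident in ids:
    routing[ident % num_nodes].append(ident)

print(routing)  # {0: [1002, 1004], 1: [1001, 1003]}
```

Because the hash is deterministic, the same data identifier always reaches the same parameter node device, so each node's K-V store stays consistent across iterations.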
In the embodiment of the invention, the different training parameters are respectively processed by utilizing the plurality of parameter node devices, so that the processing efficiency is improved.
Having described the training method of the model provided by the embodiment of the present invention, the training device of the model provided by the embodiment of the present invention will be described below with reference to the accompanying drawings.
Referring to fig. 6, an embodiment of the present invention further provides a training apparatus for a model, which is applied to a distributed system including a task node device and a parameter node device, where the task node device stores training data required for training the model, and the parameter node device stores a data identifier of each piece of training data and a model parameter corresponding to each data identifier, and the apparatus includes:
The receiving module is configured to receive target information input by a user, where the target information includes the information required by a plurality of different applications when querying data in their respective corresponding data sources;
The sending module 61 is configured to send a target data identifier to the parameter node device when the task node device adds training data, where the target data identifier is a data identifier in the newly added training data;
The parameter return module 62 is configured to generate a corresponding target model parameter for the target data identifier, and store the target data identifier and the target model parameter in a memory of the parameter node device in a discontinuous storage manner when the parameter node device does not find the model parameter corresponding to the target data identifier;
The gradient module 63 is configured to send the target model parameter to the task node device through the parameter node device, so that the task node device calculates a target gradient value according to the training data corresponding to the target data identifier and the target model parameter;
the updating module 64 is configured to send, by the task node device, the target gradient value to the parameter node device, so that the parameter node device updates the target model parameter based on the target gradient value.
Optionally, the apparatus further comprises:
The judging module is used for determining an actual data range value based on the data identification and the model parameters in the memory through the parameter node equipment, wherein the actual data range value comprises the number of elements of each dimension of the data identification and the model parameters in the memory;
and the storage module is used for storing the data identification and the model parameters in the memory into the external memory of the parameter node equipment according to the actual data range value under the condition that the actual data range value is different from the preset data range value, wherein the preset data range value comprises the data identification and the element number of each dimension of the model parameters, which are determined based on training data before training is started.
Optionally, the storage module includes:
the generating unit is used for generating an identification tensor and a parameter tensor according to the data identification and the model parameters in the memory respectively;
and the storage unit is used for respectively storing the identification tensor and the parameter tensor into the external memory of the parameter node equipment according to the element number of each dimension of the identification tensor and the parameter tensor.
Optionally, the updating module 64 is specifically configured to update the target model parameter with a target optimizer based on the target gradient value, where the target optimizer is an optimizer designed for the non-continuously stored variable.
Optionally, when the newly added training data includes a plurality of pieces of training data and the number of the parameter node devices is plural, the sending module 61 is specifically configured to send the respective target data identifiers in the plurality of pieces of training data to different parameter node devices respectively.
The training device for the model provided by the embodiment of the invention can realize each process of the model training method embodiments in figs. 1 to 5, and the description is omitted here to avoid repetition.
The embodiment of the invention can process the newly added training data under the condition of the newly added training data, avoid reporting errors due to the fact that the model parameters associated with the newly added training data cannot be found, avoid overflow of the characteristic space, and ensure that the training is normally carried out, so that the training is not required to be stopped.
The embodiment of the invention also provides an electronic device, as shown in fig. 7, which comprises a processor 701, a communication interface 702, a memory 703 and a communication bus 704, wherein the processor 701, the communication interface 702 and the memory 703 complete communication with each other through the communication bus 704;
a memory 703 for storing a computer program;
The processor 701 is configured to execute the program stored in the memory 703, and implement the following steps:
Under the condition that the task node equipment adds training data newly, a target data identifier is sent to the parameter node equipment, wherein the target data identifier is a data identifier in the newly added training data;
under the condition that the parameter node equipment does not find the model parameters corresponding to the target data identification, generating corresponding target model parameters aiming at the target data identification, and storing the target data identification and the target model parameters in a memory of the parameter node equipment in a discontinuous storage mode;
Transmitting the target model parameters to task node equipment through parameter node equipment so that the task node equipment calculates and obtains target gradient values according to training data corresponding to the target data identification and the target model parameters;
And sending the target gradient value to the parameter node equipment through the task node equipment so that the parameter node equipment updates the target model parameter based on the target gradient value.
The communication bus mentioned for the above terminal may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The communication bus may be divided into an address bus, a data bus, a control bus, and so on. For ease of illustration, only one bold line is shown in the figure, but this does not mean that there is only one bus or one type of bus.
The communication interface is used for communication between the terminal and other devices.
The memory may include random access memory (Random Access Memory, RAM) or may include non-volatile memory (non-volatile memory), such as at least one disk memory. Optionally, the memory may also be at least one memory device located remotely from the aforementioned processor.
The processor may be a general-purpose processor, including a central processing unit (CPU) or a network processor (NP); it may also be a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or discrete hardware components.
In yet another embodiment of the present invention, a computer readable storage medium is provided, in which instructions are stored, which when run on a computer, cause the computer to perform the training method of the model according to any of the above embodiments.
In a further embodiment of the present invention, a computer program product comprising instructions is also provided which, when run on a computer, causes the computer to perform the training method of the model described in the above embodiments.
In the above embodiments, the functions may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. When implemented in software, they may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer instructions are loaded and executed on a computer, the flows or functions according to the embodiments of the present invention are produced in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another, for example by wire (e.g., coaxial cable, optical fiber, digital subscriber line (DSL)) or wirelessly (e.g., infrared, radio, microwave). The computer-readable storage medium may be any available medium that can be accessed by a computer, or a data storage device such as a server or data center that integrates one or more available media. The available medium may be a magnetic medium (e.g., floppy disk, hard disk, magnetic tape), an optical medium (e.g., DVD), or a semiconductor medium (e.g., a solid state disk (SSD)), etc.
It is noted that relational terms such as first and second are used solely to distinguish one entity or action from another, and do not necessarily require or imply any actual such relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements, but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other like elements in the process, method, article, or apparatus that comprises the element.
In this specification, each embodiment is described in a related manner, and identical and similar parts of each embodiment are all referred to each other, and each embodiment mainly describes differences from other embodiments. In particular, for system embodiments, since they are substantially similar to method embodiments, the description is relatively simple, as relevant to see a section of the description of method embodiments.
The foregoing description is only of the preferred embodiments of the present invention and is not intended to limit the scope of the present invention. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present invention are included in the protection scope of the present invention.

Claims (8)

1. A training method of a model, which is applied to a distributed system including task node devices and parameter node devices, wherein the task node devices store training data required for training the model, and the parameter node devices store data identifiers of each piece of training data and model parameters corresponding to each data identifier, and the method comprises:
Under the condition that the task node equipment adds training data newly, a target data identifier is sent to the parameter node equipment, wherein the target data identifier is a data identifier in the training data newly added;
generating corresponding target model parameters aiming at the target data identifiers under the condition that the parameter node equipment does not find the model parameters corresponding to the target data identifiers, and storing the target data identifiers and the target model parameters in a memory of the parameter node equipment in a discontinuous storage mode;
The parameter node equipment sends the target model parameters to the task node equipment so that the task node equipment calculates and obtains target gradient values according to the training data corresponding to the target data identification and the target model parameters;
transmitting, by the task node device, the target gradient value to the parameter node device, so that the parameter node device updates the target model parameter based on the target gradient value;
after the parameter node device updates the target model parameters based on the target gradient values, the method further comprises:
Determining an actual data range value based on the data identification and the model parameter in the memory by the parameter node equipment, wherein the actual data range value comprises the element number of each dimension of the data identification and the model parameter in the memory;
And under the condition that the actual data range value is different from a preset data range value, storing the data identification and the model parameters in the memory into the external memory of the parameter node equipment according to the actual data range value, wherein the preset data range value comprises the element number of each dimension of the data identification and the model parameters determined based on training data before training is started.
2. The method according to claim 1, wherein storing the data identifier and the model parameter in the memory in the external memory of the parameter node device according to the actual data range value comprises:
Generating an identification tensor and a parameter tensor according to the data identification and the model parameters in the memory respectively;
and respectively storing the identification tensor and the parameter tensor into the external memory of the parameter node equipment according to the element quantity of each dimension of the identification tensor and the parameter tensor.
3. The method of claim 1, wherein the parameter node device updating the target model parameters based on the target gradient values comprises:
And updating the target model parameters by adopting a target optimizer based on the target gradient values, wherein the target optimizer is an optimizer designed for discontinuously stored variables.
4. The method according to claim 1, wherein in the case where the newly added training data includes a plurality of pieces of training data and the number of the parameter node devices is a plurality, the transmitting the target data identification to the parameter node device includes:
and respectively sending the respective target data identifiers in the plurality of pieces of training data to different parameter node equipment.
5. A training apparatus for a model, applied to a distributed system including a task node device and a parameter node device, wherein the task node device stores training data required for training the model, and the parameter node device stores a data identifier of each piece of training data and a model parameter corresponding to each data identifier, the apparatus comprising:
the receiving module is used for receiving target information input by a user, wherein the target information comprises information required by a plurality of different applications when inquiring data in respective corresponding data sources;
The sending module is used for sending the target data identifier to the parameter node equipment under the condition that the task node equipment adds the training data newly, wherein the target data identifier is the data identifier in the training data newly added;
The parameter return module is used for generating corresponding target model parameters aiming at the target data identifier under the condition that the parameter node equipment does not find the model parameters corresponding to the target data identifier, and storing the target data identifier and the target model parameters in a memory of the parameter node equipment in a discontinuous storage mode;
The gradient module is used for sending the target model parameters to the task node equipment through the parameter node equipment so that the task node equipment calculates a target gradient value according to the training data corresponding to the target data identification and the target model parameters;
the updating module is used for sending the target gradient value to the parameter node equipment through the task node equipment so that the parameter node equipment can update the target model parameter based on the target gradient value;
The judging module is used for determining an actual data range value based on the data identifier and the model parameter in the memory through the parameter node equipment, wherein the actual data range value comprises the element number of each dimension of the data identifier and the model parameter in the memory;
And the storage module is used for storing the data identification and the model parameters in the memory into the external memory of the parameter node equipment according to the actual data range value under the condition that the actual data range value is different from the preset data range value, wherein the preset data range value comprises the element number of each dimension of the data identification and the model parameters determined based on training data before training is started.
6. The apparatus of claim 5, wherein the memory module comprises:
the generating unit is used for generating an identification tensor and a parameter tensor according to the data identification and the model parameters in the memory respectively;
And the storage unit is used for respectively storing the identification tensor and the parameter tensor into the external memory of the parameter node equipment according to the element quantity of each dimension of the identification tensor and the parameter tensor.
7. The apparatus according to claim 5, wherein the updating module is configured to update the target model parameters with a target optimizer based on the target gradient values, wherein the target optimizer is an optimizer designed for non-continuously stored variables.
8. The apparatus according to claim 5, wherein, in a case where the newly added training data includes a plurality of pieces of training data and the number of the parameter node devices is plural, the sending module is specifically configured to send the respective target data identifiers in the plurality of pieces of training data to different parameter node devices respectively.
CN202111177135.1A 2021-10-09 2021-10-09 A model training method and device Active CN113987004B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111177135.1A CN113987004B (en) 2021-10-09 2021-10-09 A model training method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111177135.1A CN113987004B (en) 2021-10-09 2021-10-09 A model training method and device

Publications (2)

Publication Number Publication Date
CN113987004A CN113987004A (en) 2022-01-28
CN113987004B true CN113987004B (en) 2025-03-14

Family

ID=79737926

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111177135.1A Active CN113987004B (en) 2021-10-09 2021-10-09 A model training method and device

Country Status (1)

Country Link
CN (1) CN113987004B (en)

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109635948A (en) * 2018-12-19 2019-04-16 北京达佳互联信息技术有限公司 On-line training method, apparatus, system and computer readable storage medium

Family Cites Families (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108027889B (en) * 2016-01-25 2020-07-28 华为技术有限公司 A training and scheduling method and related equipment for incremental learning cloud system
CN107885716B (en) * 2016-09-29 2020-02-11 腾讯科技(深圳)有限公司 Text recognition method and device
CN109389412B (en) * 2017-08-02 2022-03-04 创新先进技术有限公司 Method and device for training model, service equipment and user equipment
CN107978311B (en) * 2017-11-24 2020-08-25 腾讯科技(深圳)有限公司 Voice data processing method and device and voice interaction equipment
JP7169369B2 (en) * 2018-01-22 2022-11-10 ジャック カッパー Method, system for generating data for machine learning algorithms
CN110706270A (en) * 2019-09-06 2020-01-17 中科院微电子研究所昆山分所 Self-adaptive scene binocular stereo matching method based on convolutional neural network
US11481691B2 (en) * 2020-01-16 2022-10-25 Hyper Labs, Inc. Machine learning-based text recognition system with fine-tuning model
CN111696663B (en) * 2020-05-26 2025-05-06 平安科技(深圳)有限公司 Disease risk analysis method, device, electronic device and computer storage medium
CN112395046B (en) * 2020-07-30 2021-06-04 上海有孚智数云创数字科技有限公司 Virtual machine migration planning and scheduling method, system and medium thereof
CN111709533B (en) * 2020-08-19 2021-03-30 腾讯科技(深圳)有限公司 Distributed training method, device and computer equipment for machine learning model
CN113392331A (en) * 2021-01-27 2021-09-14 腾讯科技(深圳)有限公司 Text processing method and equipment

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109635948A (en) * 2018-12-19 2019-04-16 北京达佳互联信息技术有限公司 On-line training method, apparatus, system and computer readable storage medium

Also Published As

Publication number Publication date
CN113987004A (en) 2022-01-28

Similar Documents

Publication Publication Date Title
US11416268B2 (en) Aggregate features for machine learning
CN111797402B (en) A method, device and storage medium for software vulnerability detection
US9940122B2 (en) Dynamic data difference generation and distribution
CN107239468B (en) Task node management method and device
CN107729227B (en) Application test scoping method, system, server and storage medium
CN113760971B (en) Method, computing device and storage medium for retrieving data of a graph database
CN105653689B (en) A method and device for determining the influence of a user's dissemination
CN111078773B (en) Data processing method and device
CN106204122B (en) Touch point value measurement method and device
CN111008873B (en) User determination method, device, electronic equipment and storage medium
CN112465105B (en) Computer-readable recording medium recording a learning program and a learning method
CN110266598B (en) A routing information processing method, apparatus, device and readable storage medium
CN112685474A (en) Application management method, device, equipment and storage medium
CN113987004B (en) A model training method and device
WO2023168856A1 (en) Associated scene recommendation method and device, storage medium, and electronic device
WO2018225314A1 (en) Database management system and database management method
CN114817197B (en) Industrial Internet platform data processing method and device
CN112861115B (en) Encryption strategy calling method based on block chain security authentication and cloud authentication server
CN116167458A (en) Model training method, object recommendation method, device and equipment
CN115203988A (en) Operation method, device, equipment and storage medium of numerical reservoir simulation example
US20160321262A1 (en) Scoring entries in a repository of business process models to facilitate searching
CN114546355B (en) A method and system for automatically generating an interface based on request monitoring
CN115936358B (en) Feature processing methods, generation methods, and apparatus based on feature engineering platforms
CN113760921B (en) A method and device for creating dictionary values
CN111367634A (en) Information processing method, information processing device and terminal equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant