Summary of the Invention
This application provides a method and apparatus for training a model, in which both a local run mode and a distributed run mode are implemented in TensorFlow without relying on a Spark environment. By introducing a configuration parameter, it is determined whether to run in the local run mode or the distributed run mode, which reduces the development difficulty for developers.
In a first aspect, a method of training a model is provided. The method is applied in a learning system, where the learning system includes a model building module and a control module. The method includes: the model building module establishes a training model and returns a model handle to the control module; the control module obtains the model handle; the control module obtains a configuration parameter, where the configuration parameter is used to indicate the run mode that the training model needs to use, the run mode includes a local run mode or a distributed run mode, and the configuration parameter may be input by a user; the control module determines, according to the configuration parameter, the run mode that the training model needs to use; the control module reads data based on the run mode; and the control module runs the training model according to the model handle and the run mode, to obtain a training result of the training model on the data. This provides a solution that integrates offline development with distributed operation: distributed operation can be realized without relying on a Spark distributed environment.
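The control flow recited in this aspect can be illustrated with a minimal sketch in plain Python. Every name below (ModelBuilder, Controller, choose_mode) is an illustrative assumption, not the claimed interface:

```python
# Minimal sketch of the claimed flow; all names here are illustrative.

class ModelBuilder:
    def build_model(self):
        # Establish the training model and return a handle to it.
        return {"name": "demo_model"}

class Controller:
    def choose_mode(self, ctx):
        # The configuration parameter ctx selects the run mode:
        # an empty value means local, a non-empty value means distributed.
        return "local" if ctx is None else "distributed"

    def run(self, handle, ctx, data):
        mode = self.choose_mode(ctx)
        # A real controller would read local or distributed data here
        # and run the model via the handle; this sketch only echoes
        # the mode decision and the amount of data read.
        return {"model": handle["name"], "mode": mode, "samples": len(data)}

handle = ModelBuilder().build_model()   # model handle returned to the controller
result = Controller().run(handle, None, [1, 2, 3])
print(result["mode"])  # "local" when ctx is empty
```

The point of the sketch is only the division of labor: the builder returns a handle, and the controller alone inspects the configuration parameter.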
Optionally, the configuration parameter may be an introduced variable ctx.
In an optional implementation, the determining, by the control module according to the configuration parameter, the run mode that the training model needs to use includes:
if the configuration parameter is a first variable value, determining, by the control module, the local run mode as the run mode that the training model needs to use.
Correspondingly, the running, by the control module, the training model according to the model handle and the run mode to obtain the training result of the training model on the data includes:
running, by the control module using the model handle, the training model with a local run function, to obtain the training result of the training model on the data, where the data is local data.
In this way, the control module can perform the operation of training the model in the local run mode based on the value of the configuration parameter.
In an optional implementation, the learning system further includes a debugging module, and the method further includes:
performing, by the debugging module, a debugging operation on the process of obtaining the training result based on the local run function.
Because debugging cannot be performed during distributed operation, and the factors that interfere with debugging in distributed operation (such as network delay and insufficient memory) are relatively numerous, introducing the debugging module reduces debugging difficulty: debug locally first, then run on the distributed platform.
In an optional implementation, the determining, by the control module according to the configuration parameter, the run mode that the training model needs to use includes:
if the configuration parameter is a second variable value, determining, by the control module, the distributed run mode as the run mode that the training model needs to use.
Correspondingly, the running, by the control module, the training model according to the model handle and the run mode to obtain the training result of the training model on the data includes:
running, by the control module using the model handle, the training model with a distributed run function, to obtain the model training result of the data.
In this way, the control module can perform the operation of training the model in the distributed run mode based on the value of the configuration parameter.
In an optional implementation, the reading, by the control module, data based on the run mode includes: reading, by the control module, distributed data.
Correspondingly, the running, by the control module using the model handle, the training model with the distributed run function to obtain the model training result of the data includes:
calling, by the control module based on the distributed data, a training model function to train on the distributed data, or calling a prediction function to predict on the distributed data; and
running, by the control module, the model run function corresponding to the distributed data, to obtain the training result of the training model.
In an optional implementation, the method further includes:
calling, by the control module, a save model function, and storing the training result of the training model based on the save model function. Here, the training result obtained above can be saved for subsequent use.
In a second aspect, an apparatus for training a model is provided. The apparatus is applied in a learning system, for example a TensorFlow system. The apparatus includes: a model building module, configured to establish a training model and return a model handle to a control module; and the control module, configured to obtain the model handle. The control module is further configured to obtain a configuration parameter, where the configuration parameter is used to indicate the run mode that the training model needs to use, and the run mode includes a local run mode or a distributed run mode. The control module is further configured to: determine, according to the configuration parameter, the run mode that the training model needs to use; read data based on the run mode; and run the training model according to the model handle and the run mode, to obtain a training result of the training model on the data. This provides a solution that integrates offline development with distributed operation: distributed operation can be realized without relying on a Spark distributed environment.
Optionally, the configuration parameter may be an introduced variable ctx.
In an optional implementation, that the control module is configured to determine, according to the configuration parameter, the run mode that the training model needs to use specifically includes:
if the configuration parameter is a first variable value, determining the local run mode as the run mode that the training model needs to use.
That the control module is configured to run the training model according to the model handle and the run mode to obtain the training result of the training model on the data specifically includes:
using the model handle to run the training model with a local run function, to obtain the training result of the training model on the data, where the data is local data.
In this way, the control module can perform the operation of training the model in the local run mode based on the value of the configuration parameter.
In an optional implementation, the apparatus further includes:
a debugging module, configured to perform a debugging operation on the process of obtaining the training result based on the local run function.
Because debugging cannot be performed during distributed operation, and the factors that interfere with debugging in distributed operation (such as network delay and insufficient memory) are relatively numerous, introducing the debugging module reduces debugging difficulty: debug locally first, then run on the distributed platform.
In an optional implementation, that the control module is configured to determine, according to the configuration parameter, the run mode that the training model needs to use specifically includes:
if the configuration parameter is a second variable value, determining the distributed run mode as the run mode that the training model needs to use.
That the control module is configured to run the training model according to the model handle and the run mode to obtain the training result of the training model on the data specifically includes:
using the model handle to run the training model with a distributed run function, to obtain the model training result of the data.
In this way, the control module can perform the operation of training the model in the distributed run mode based on the value of the configuration parameter.
In an optional implementation, that the control module is configured to read data based on the run mode specifically includes: reading distributed data.
That the control module is configured to use the model handle to run the training model with the distributed run function to obtain the model training result of the data specifically includes:
based on the distributed data, calling a training model function to train on the distributed data, or calling a prediction function to predict on the distributed data; and
running the model run function corresponding to the distributed data, to obtain the training result of the training model.
In an optional implementation, the control module is further configured to: call a save model function, and store the training result of the training model based on the save model function. Here, the training result obtained above can be saved for subsequent use.
In a third aspect, a system for training a model is provided, including: a model building module, configured to establish a training model and return a model handle to a control module; and the control module, configured to obtain the model handle. The control module is further configured to obtain a configuration parameter, where the configuration parameter is used to indicate the run mode that the training model needs to use, and the run mode includes a local run mode or a distributed run mode. The control module is further configured to: determine, according to the configuration parameter, the run mode that the training model needs to use; read data based on the run mode; and run the training model according to the model handle and the run mode, to obtain a training result of the training model on the data. This provides a solution that integrates offline development with distributed operation: distributed operation can be realized without relying on a Spark distributed environment.
Optionally, the configuration parameter may be an introduced variable ctx.
In an optional implementation, that the control module is configured to determine, according to the configuration parameter, the run mode that the training model needs to use specifically includes:
if the configuration parameter is a first variable value, determining the local run mode as the run mode that the training model needs to use.
That the control module is configured to run the training model according to the model handle and the run mode to obtain the training result of the training model on the data specifically includes:
using the model handle to run the training model with a local run function, to obtain the training result of the training model on the data, where the data is local data.
In this way, the control module can perform the operation of training the model in the local run mode based on the value of the configuration parameter.
In an optional implementation, the system further includes:
a debugging module, configured to perform a debugging operation on the process of obtaining the training result based on the local run function.
Because debugging cannot be performed during distributed operation, and the factors that interfere with debugging in distributed operation (such as network delay and insufficient memory) are relatively numerous, introducing the debugging module reduces debugging difficulty: debug locally first, then run on the distributed platform.
In an optional implementation, that the control module is configured to determine, according to the configuration parameter, the run mode that the training model needs to use specifically includes:
if the configuration parameter is a second variable value, determining the distributed run mode as the run mode that the training model needs to use.
That the control module is configured to run the training model according to the model handle and the run mode to obtain the training result of the training model on the data specifically includes:
using the model handle to run the training model with a distributed run function, to obtain the model training result of the data.
In this way, the control module can perform the operation of training the model in the distributed run mode based on the value of the configuration parameter.
In an optional implementation, that the control module is configured to read data based on the run mode specifically includes: reading distributed data.
That the control module is configured to use the model handle to run the training model with the distributed run function to obtain the model training result of the data specifically includes:
based on the distributed data, calling a training model function to train on the distributed data, or calling a prediction function to predict on the distributed data; and
running the model run function corresponding to the distributed data, to obtain the training result of the training model.
In an optional implementation, the control module is further configured to: call a save model function, and store the training result of the training model based on the save model function. Here, the training result obtained above can be saved for subsequent use.
In a fourth aspect, a computer readable storage medium is provided. The computer readable storage medium stores a program, and the program causes a computer or an intelligent learning system to execute the method of training a model in any of the above aspects and their various implementations.
In a fifth aspect, this application further provides a computer program product including instructions which, when run on a computer, cause the computer to execute the method of training a model in the above aspects.
In a sixth aspect, an apparatus for training a model is provided. The apparatus includes a processor, a memory, and a transceiver, and the processor is connected to the memory and the transceiver. The memory is configured to store instructions, the processor is configured to execute the instructions, and the transceiver is configured to communicate with other network elements under the control of the processor. When the processor executes the instructions stored in the memory, the execution causes the processor to perform the method of training a model in the above aspects.
Specific Embodiments
The technical solutions in this application are described below with reference to the accompanying drawings.
The technical solutions of the embodiments of this application can be applied to a learning system (for example, a machine learning system, an intelligent learning system, or a deep learning system), such as the TensorFlow learning system. TensorFlow analyzes and processes complex data structures by transmitting them through artificial neural networks. TensorFlow can be used in many fields of machine and deep learning, such as speech recognition or image recognition, and can run on a wide range of devices, from a single smartphone to thousands of data center servers. TensorFlow is fully open source, and anyone can use it. Distributed TensorFlow can make full use of the computing power of a computer cluster and greatly accelerate training.
To simplify the development process of distributed machine learning and effectively improve development efficiency, the embodiments of this application provide a complete solution that integrates offline development with distributed operation, allowing developers to focus on offline development of algorithm models. Compared with the prior art, which uses the distributed framework encapsulated by Yahoo to run TensorFlow on Spark (TensorFlow On Spark, TFoS), the technical solution of the embodiments of this application requires no Spark distributed environment and can be developed and debugged locally. Moreover, without any code changes, the code can run in a distributed manner on a cluster platform in the TensorFlow On Spark mode.
Fig. 1 is a schematic diagram of the prior-art scheme using the distributed framework TFoS encapsulated by Yahoo. As shown in Fig. 1, the existing scheme must be developed directly in a distributed manner: online data is read in through a specific format, the Resilient Distributed Dataset (RDD), and the code has to run in a Spark environment to obtain a training result. It can be seen that the existing scheme requires developers to have considerable distributed development experience (for example, Spark task creation and scheduling; understanding and use of RDDs; Hadoop Distributed File System (HDFS) file reading and writing, and so on). Moreover, the code has to run in an established distributed cluster environment (or on a single machine with multiple virtual machines installed), which places very high demands on deployment. In addition, single-step debugging cannot be performed in a distributed environment, problems are often hard to reproduce once encountered, and the instability of network communication and hardware further increases debugging difficulty.
In contrast to the existing scheme in Fig. 1, Fig. 2 provides a schematic diagram of virtual components of an embodiment of this application. As shown in Fig. 2, the components include: a framework controller; a local development Software Development Kit (local development SDK); a local debug module; local data loading (mock training data into memory); a local call to the model module; online data loading (in distributed RDD format); a remote call to the model module; and training result export. The virtual components in Fig. 2 provide a solution that integrates offline development with distributed operation. Compared with Fig. 1, the virtual components of the embodiments of this application require no Spark distributed environment: a standalone program can be developed, debugged, and run locally, and without any changes the code can run in a distributed manner on a cluster platform in the TensorFlow On Spark mode. The virtual components in Fig. 2 allow a user to run a model locally and also to switch seamlessly to online operation. Through the local debug module, a user can debug a model locally.
It should be noted that, in the embodiments of this application, each code module or virtual component can be encapsulated by the SDK, so that a developer only needs to fill in the actual logic code. For example, model_builder.py can be encapsulated by the SDK, and a user only needs to enter the requirements in the corresponding logic variables. Also, in the embodiments of this application, the local debugging method can be encapsulated by the SDK, allowing developers to develop and debug locally. In addition, in the embodiments of this application, the method of distributed operation can be encapsulated by the SDK, so that developers need not consider distributed processing and only need to focus on writing local code.
The method of training a model of the embodiments of this application is described below with reference to Fig. 3. Fig. 3 shows a schematic diagram of a method 300 of training a model according to an embodiment of this application. The method is applied in a learning system, and the learning system includes a model building module and a control module. Optionally, the learning system may be TensorFlow. As shown in Fig. 3, the method 300 includes:
S310: The model building module establishes a training model and returns a model handle to the control module.
In the embodiments of this application, the model building module can be encapsulated by a Software Development Kit (SDK). Optionally, the data may be local data or distributed data. For example, the model handle may be build_model(self, cluster, task_index, kwargs).
For example, the model building module may be a ModelBuilder component, which can be understood as an interface: as needed, a user can use ModelBuilder to customize the establishment of the model and the reading of data.
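Based on the build_model(self, cluster, task_index, kwargs) signature quoted above, the ModelBuilder interface might look like the following sketch; the body is an assumption standing in for the user's real model-building logic:

```python
class ModelBuilder:
    """Sketch of the ModelBuilder interface; the body of build_model is a
    placeholder that a user would replace with real model-building code."""

    def build_model(self, cluster=None, task_index=0, **kwargs):
        # cluster and task_index matter only in the distributed run mode;
        # in the local run mode both may stay at their defaults.
        return {"cluster": cluster, "task_index": task_index, "options": kwargs}

# The same call works locally (defaults) and distributed (cluster given).
handle = ModelBuilder().build_model(layers=3, learning_rate=0.01)
```

Because the interface carries the cluster information as optional arguments, the same ModelBuilder serves both run modes, which is the property the scheme relies on.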
S320: The control module obtains the model handle.
Here, the control module can obtain the model handle returned by the above model building module for subsequent use.
S330: The control module obtains a configuration parameter, where the configuration parameter is used to indicate the run mode that the training model needs to use, and the run mode includes a local run mode or a distributed run mode.
Here, the configuration parameter is input by a user. Optionally, the user inputs different configuration parameters through a run function to indicate the run mode used by the training model. For example, the configuration parameter may be a variable ctx.
S340: The control module determines, according to the configuration parameter, the run mode that the training model needs to use.
Optionally, the control module can judge, by the different values assigned to ctx, whether to use distributed operation or local operation. For example, if the value of ctx is empty (NULL), the local run mode is used; if the value of ctx is non-NULL, the distributed run mode is used. That is, to select a run mode, only a different ctx needs to be passed to the control module when starting the run. In the case of the distributed run mode, the control module can pass the variable to map_func(args, ctx), which finally calls the corresponding function of the model class Model among train_model (train the model), predict_model (predict with the model), and run_model (run the model). In the case of the local run mode, the control module can skip map_func, run directly to (__name__ == "__main__"), and directly call the establishment of the model and the reading of local data.
Optionally, if the value of ctx is non-NULL, that is, when distributed operation is used, train_model and run_model can be called with a series of environment settings required for running on the distributed platform:
worker_num = ctx.worker_num (the number of workers)
job_name = ctx.job_name (the name of the current job)
task_index = ctx.task_index (the index of the subtask; a distributed run has multiple subtasks, distinguished by index)
cluster_spec = ctx.cluster_spec (the standard configuration parameters of the Spark distributed cluster)
args = ctx.args (args are parameters that some projects may use, such as the number of layers of the neural network, the training step size, and the learning rate)
It should be understood that the environment settings required by the distributed platform above are merely examples and do not limit the protection scope of the embodiments of this application.
It should also be understood that the above values of ctx and the run modes are also examples; those skilled in the art may assign other values to ctx, as long as the different run modes can be distinguished, which is not specifically limited here.
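The dispatch on ctx and the environment fields listed above can be sketched as follows. The Ctx class is a stand-in for the context object a distributed platform would supply; only the field names come from the text:

```python
class Ctx:
    """Stand-in for the context object a distributed platform would supply."""

    def __init__(self, worker_num, job_name, task_index, cluster_spec, args):
        self.worker_num = worker_num      # number of workers
        self.job_name = job_name          # name of the current job
        self.task_index = task_index      # index of this subtask
        self.cluster_spec = cluster_spec  # cluster configuration parameters
        self.args = args                  # project parameters (layers, step size, ...)

def select_mode(ctx):
    # NULL ctx selects the local run mode; non-NULL selects distributed.
    if ctx is None:
        return "local", {}
    # Distributed mode: unpack the environment settings carried by ctx.
    env = {
        "worker_num": ctx.worker_num,
        "job_name": ctx.job_name,
        "task_index": ctx.task_index,
        "cluster_spec": ctx.cluster_spec,
        "args": ctx.args,
    }
    return "distributed", env
```

As the text notes, the specific sentinel values are examples; any encoding that lets the controller distinguish the two modes would serve.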
In the embodiments of this application, the above model building module is common to local operation and distributed operation and does not depend on the running environment; that is, the model establishment function can be called regardless of whether operation is local or distributed. In the embodiments of this application, the model building module is abstracted as a separate module, and encapsulation then handles the interaction between the local or distributed side and the model building module, so that both local and distributed operation can call the model building module.
S350: The control module reads data based on the run mode.
Specifically, for example, the control module can read local data based on the local run mode, or read distributed data based on the distributed run mode.
S360: The control module runs the training model according to the model handle and the run mode, to obtain a training result of the training model on the data.
Specifically, the control module can call the training model based on the local run mode to obtain a model training result on local data, or call the training model based on the distributed run mode to obtain a model training result on distributed data.
In the embodiments of this application, the model building module establishes a training model and returns a model handle to the control module. The control module obtains a configuration parameter, and the configuration parameter is used to indicate the run mode that the training model needs to use, so the control module can select the run mode according to the configuration parameter, determine whether to use local or distributed operation, and run the training model based on the corresponding run mode and the model handle, thereby obtaining the training result of the training model. This provides a solution that integrates offline development with distributed operation: distributed operation can be realized without relying on a Spark distributed environment. Compared with the prior-art scheme, the embodiments of this application require developers neither to learn the Spark function library nor to be skilled in distributed development.
The technical solution of the embodiments of this application simplifies the development process of distributed machine learning, effectively improves development efficiency, and removes the dependence on a complex distributed hardware environment during development.
Optionally, as an embodiment, S340 includes:
if the configuration parameter is a first variable value, determining, by the control module, the local run mode as the run mode that the training model needs to use.
Correspondingly, S360 includes:
running, by the control module using the model handle, the training model with a local run function, to obtain the training result of the training model on the data, where the data is local data.
Specifically, for example, the above first variable value is the NULL value, so the control module enters the local run mode. In addition, the control module reads local data. Using the above model handle, the control module calls local run functions, for example a training function and a prediction function, to run the training model, and can thereby obtain the training result of the training model on the local data.
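The local path just described can be sketched as below. The names local_run, train_fn, and predict_fn are assumptions, and the toy training and prediction functions only stand in for real ones:

```python
def local_run(model_handle, local_data, train_fn, predict_fn):
    """Sketch of the local run path: train on local data via the model
    handle, then predict; no cluster environment is needed."""
    model = {"handle": model_handle, "weights": None}
    model["weights"] = train_fn(model, local_data)   # training function
    predictions = predict_fn(model, local_data)      # prediction function
    return {"model": model, "predictions": predictions}

# Toy stand-ins: the "weight" is the mean of the data, and prediction
# just repeats that weight for each sample.
result = local_run(
    "demo_handle",
    [1.0, 2.0, 3.0],
    train_fn=lambda m, d: sum(d) / len(d),
    predict_fn=lambda m, d: [m["weights"]] * len(d),
)
```

Because the whole path runs in one local process, every step here is reachable by a single-step debugger, which is the property the next implementation exploits.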
Optionally, in the local run mode, the learning system further includes a debugging module, and the method 300 further includes:
performing, by the debugging module, a debugging operation on the process of outputting the training result of the model based on the local run function.
That is, a debugging module can be introduced to perform a debugging operation on the process by which the local run function outputs the training result of the model. Here, what the debugging module debugs is the code of the whole process in the local run mode. The purpose of debugging is to ensure that the establishment and running of the training model match expectations. In the local run mode, the single-step debugging function can be executed when a problem is encountered. Because debugging cannot be performed during distributed operation, and the factors that interfere with debugging in distributed operation (such as network delay and insufficient memory) are relatively numerous, introducing the debugging module reduces debugging difficulty: debug locally first, then run on the distributed platform.
Optionally, the debugging module in the embodiments of this application may implement one or more of the following local debugging functions:
1) print_log: print a variable directly;
2) deep_print_log: deep-print a variable (traverse and print complex data structures such as classes, arrays, and linked lists);
3) condition_debug: print a variable according to a judged condition;
4) customized_debug is additionally provided, which the user can customize as needed.
It should be understood that the above local debugging functions are described merely as examples, and are not specifically limited here.
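The four local debugging functions listed above might look like the following sketch; only the names come from the text, and the bodies are assumptions:

```python
def print_log(variable):
    # 1) Print a variable directly.
    print(variable)

def deep_print_log(variable, depth=0):
    # 2) Recursively traverse and print complex structures
    #    (dicts, lists/tuples, and plain values).
    pad = "  " * depth
    if isinstance(variable, dict):
        for key, value in variable.items():
            print(f"{pad}{key}:")
            deep_print_log(value, depth + 1)
    elif isinstance(variable, (list, tuple)):
        for item in variable:
            deep_print_log(item, depth + 1)
    else:
        print(f"{pad}{variable!r}")

def condition_debug(variable, condition):
    # 3) Print the variable only when the judged condition holds.
    if condition(variable):
        print_log(variable)
        return True
    return False

def customized_debug(variable, user_hook):
    # 4) Run an arbitrary user-supplied debugging hook.
    return user_hook(variable)
```

A user might, for instance, call condition_debug(loss, lambda v: v > 1.0) to log only abnormal losses during a local run.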
Optionally, as an embodiment, S340 includes:
if the configuration parameter is a second variable value, determining, by the control module, the distributed run mode as the run mode that the training model needs to use.
Correspondingly, S360 includes:
running, by the control module using the model handle, the training model with a distributed run function, to obtain the model training result of the data.
Specifically, for example, the above second variable value is a non-NULL value, so the control module enters the distributed run mode. During distributed operation, the program has configured computing resources such as a Distributed Controller (that is, the controller of the distributed subsystem, responsible for the communication and interaction of the distributed system itself) and Workers (a Worker can be understood as a computing unit; there are generally multiple Workers, and the specific number can be set in a configuration file), and the encapsulated model building module is called to establish the model. The control module runs and calls the model; each Worker performs its computation independently and returns its result to the Distributed Controller for aggregation, correction, and broadcast. Throughout the distributed run, encapsulation hides the details of distributed communication, so that developers need not worry about the difference between distributed and local operation.
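The controller/worker interaction described above can be illustrated with a toy in-process simulation; real distributed communication is hidden by the SDK, and every name and the aggregation rule here are assumptions:

```python
def distributed_run(model_handle, shards, train_fn):
    """Toy simulation of the distributed run: each Worker trains on its
    own data shard, and the controller aggregates the partial results
    (here by averaging) before the broadcast step."""
    partial_results = []
    for task_index, shard in enumerate(shards):   # one Worker per shard
        partial = train_fn(model_handle, shard, task_index)
        partial_results.append(partial)
    # Controller-side aggregation of the Workers' results.
    return sum(partial_results) / len(partial_results)

# Toy train_fn: each Worker's "result" is the mean of its shard.
result = distributed_run(
    "demo_handle",
    shards=[[1.0, 2.0], [3.0, 5.0]],
    train_fn=lambda handle, shard, idx: sum(shard) / len(shard),
)
```

The point is the division of labor: Workers compute independently, and only the controller sees all partial results, mirroring the aggregate/correct/broadcast loop in the text.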
Optionally, the control module reads distributed data, wherein and the control module uses the model handle,
Training pattern described in function operation is run in a distributed manner, obtains the model training result of the data, comprising:
The control module is based on the distributed data, and training pattern function is called to instruct the distributed data
Practice, or anticipation function is called to predict the distributed data;
The control module runs the corresponding model running function of the distributed data, obtains the instruction of the training pattern
Practice result.
That is, in the case where the distributed running mode is determined, the control module can call ModelBuilder to read the distributed data. It then trains on the distributed data based on the training-model function, or calls the prediction function to predict on the distributed data. The training-model function may be train_model(self, args, ctx, cluster, server), and the prediction function may be predict_model(self, args, ctx, server). It can be seen that, compared with the prior art, the embodiments of the present application introduce the variable ctx into the Train function and the Predict function to indicate which running mode to use. The model-running function may be a run_model function, which completes for the user the underlying logic of local calls and distributed calls; the encapsulation and isolation guarantee the independence of each module.
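This dispatch can be illustrated with a minimal sketch in plain Python. The names and signatures (train_model, predict_model, run_model, ctx) follow those quoted above, but the bodies are hypothetical placeholders, not the actual framework implementation:

```python
# Illustrative sketch only: signatures follow the text above; the real
# framework's bodies (cluster setup, session handling) are not shown.

class ModelRunner:
    def train_model(self, args, ctx, cluster, server):
        # Training branch: each worker trains on its shard of the data.
        return ("train", "distributed" if ctx is not None else "local")

    def predict_model(self, args, ctx, server):
        # Prediction branch: apply the trained model, no feedback update.
        return ("predict", "distributed" if ctx is not None else "local")

    def run_model(self, args, ctx, cluster=None, server=None):
        # run_model hides whether the call is local or distributed, so
        # the user's code is identical in both running modes.
        if getattr(args, "mode", "train") == "train":
            return self.train_model(args, ctx, cluster, server)
        return self.predict_model(args, ctx, server)
```

In this sketch the user calls only run_model; whether the call ends up local or distributed is decided solely by whether ctx is set, mirroring the encapsulation described above.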
It should be noted that the Train function carries out the training process on training data; its definition is consistent with industry machine-learning practice, and reference may be made to the description of the prior art. The Predict function performs prediction on data; its definition likewise follows industry standards, and reference may be made to the prior art. The two are similar in that both apply a trained model to data for prediction; the difference is that the training process of the Train function compares the obtained results with the correct results and feeds corrections back to the model, whereas the prediction process of the Predict function does not modify the model.
Optionally, the method 200 further includes:
the control module calling a save-model function, and storing the training result of the training model based on the save-model function.
That is, by calling the save-model function, the control module can store the obtained model for later use in prediction. The stored training result of the training model can be exported by binary serialization and imported by deserialization, and can be used for prediction after import. For example, the save-model function is a save_model function. In the embodiments of the present application, the save_model function needs to expose only a few necessary parameters; the user only needs to define, in save_model, the content and parameters to be stored, while the other functionality is realized through encapsulation, including the binary serialization of the model and the specific implementations of storing in a distributed file system and in a local file system (including the parts that communicate with the file system: linking, writing, and closing).
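A minimal sketch of this behavior, using Python's standard pickle module for the binary serialization (the actual save_model, its parameters, and its distributed-file-system support are not shown here and may differ):

```python
import pickle

# Illustrative sketch: the user specifies only what to store; the
# serialization and file-system details are encapsulated below.

def save_model(model, path):
    # Binary serialization of the model object.
    blob = pickle.dumps(model)
    # Link, write, and close are hidden inside this call.
    with open(path, "wb") as f:
        f.write(blob)
    return len(blob)

def load_model(path):
    # Deserializing the stored result makes it usable for prediction.
    with open(path, "rb") as f:
        return pickle.loads(f.read())
```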
To facilitate understanding by those skilled in the art, a description is given below with reference to the example in Fig. 4.
Fig. 4 provides a schematic block diagram of the modules included according to an embodiment of the present application. As shown in Fig. 4, these include: a local development module (which can call the ModelBuilder component), a control module (Controlframework), a local running module, a debugging module (ModelDebug), a user completion module (which can call the ModelBuilder component), a TFos distributed-platform running module, a training module, a prediction module, a running module, and a module for exporting the training result. Optionally, a module for saving the training result may also be included.
The local development module can be understood as including model building and data reading. The user can, as needed, customize the implementation of model building and of data reading. As a specific example, the function build_model(self, cluster, task_index, kwargs) under ModelBuilder (which can be understood as an interface) can be used to implement model building; and load_data(args, sc) under ModelBuilder can be used to customize data reading (code for reading local data / code for reading distributed RDD data).
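The customization interface just described can be sketched as follows. The method names and parameters follow the text above; the bodies are hypothetical placeholders that a user would replace with real model-building and data-reading code:

```python
# Illustrative sketch of the ModelBuilder interface quoted above;
# bodies are placeholders, not the real component.

class ModelBuilder:
    def build_model(self, cluster, task_index, **kwargs):
        # User-defined model construction; returns a model handle.
        return {"cluster": cluster, "task": task_index, **kwargs}

    def load_data(self, args, sc):
        # User-defined data reading: local files when no Spark
        # context sc is given, otherwise a distributed RDD via sc.
        if sc is None:
            return "local-data"
        return "rdd-data"
```

Because both running modes call the same user-customized ModelBuilder, the same model definition serves local development and distributed running without being rewritten.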
The control module (Controlframework) can determine, according to the configuration parameter, whether this run uses the local system or the distributed system, i.e., local running or distributed running. Optionally, the control module can determine distributed or local running according to the value assigned to ctx. That is, to select a running mode at startup, it suffices to pass a different ctx value to the control module. As a specific example, if (ctx != NULL), the current run is distributed: the control module can invoke the distributed request through map_func(args, ctx), judge the "ps" or "worker" state according to ctx.job_name (distributed TensorFlow includes a "ps" state and a "worker" state, where "ps" is responsible for aggregating the training and each of the workers trains independently), and call the respective tasks, including model building and distributed data reading. If (ctx == NULL), execution of the control module skips map_func and goes to the local running module (__name__ == "__main__"), directly calling model building and local data reading.
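The branching just described can be sketched as follows (a hypothetical simplification: map_func and the ctx.job_name field follow the names in the text, while the return values are placeholders standing in for the real tasks):

```python
# Illustrative sketch of the ctx-based mode selection described above.

def map_func(args, ctx):
    # Distributed branch: ctx carries the cluster role.
    if ctx.job_name == "ps":
        return "parameter-server"   # aggregation role
    return "worker"                 # independent training role

def local_run(args):
    # Local branch: build the model and read local data directly.
    return "local"

def control(args, ctx):
    # The control module selects the running mode purely from ctx.
    if ctx is not None:             # (ctx != NULL) -> distributed
        return map_func(args, ctx)
    return local_run(args)          # (ctx == NULL) -> local
```

Only the value passed as ctx changes between a local run and a distributed run; the user's model and data code stay the same.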
The local running module (__name__ == "__main__") can call the function local_load_data(args, sc) to read local data for training and prediction.
The TFos distributed-platform running module (map_func(args, ctx)) can enter the distributed-running branch and call load_data(args, sc) to read distributed data. Optionally, the control module can train on the distributed data (using the training function train_model(self, args, ctx, cluster, server)), or call the prediction module (using the prediction function predict_model(self, args, ctx, server)) to predict. For example, if args.mode == "train", the training function is called; otherwise, the prediction function is called. Optionally, based on the training function or the prediction function, the control module can call the running module, i.e., the running function run_model(self, args, ctx, server), or run the model-running function corresponding to the distributed data.
Optionally, the control module may further call the module for saving the training result, invoking SaveModelAtEndHook to store the model training result for later use in prediction.
The user completion module (ModelBuilder) is used to provide customization services for the user; additional functionality can be added based on the client's needs. The user can realize customized requirements through the functions build_model(args, cluster, task_index, kwargs) and load_data(args, sc).
The debugging module (ModelDebug) is used to debug the locally run model. In Fig. 4, customized_debug() can be customized as needed, for example to call a debugging function locally, so as to ensure that the code runs correctly in the local environment. Here, the user debugs with the debugging module just as with traditional local debugging. Optionally, some common functions can be provided in the debugging module in advance for the user to call.
Compared with the prior art, in which modules cannot be shared, in the embodiments of the present application both distributed running and local running can call the user-customized ModelBuilder, so the same model does not need to be developed again for online and offline use. In addition, the modules in Fig. 4 that do not depend on the user can be solidified, and the modules respectively required for distributed running and for local running can each be decomposed and refined, to be selected and used at runtime. The encapsulation and isolation in Fig. 4 ensure that the modules are independent, making generalization and seamless switching possible.
It should be understood that the arrows in Fig. 4 illustrate the function-call flow and are not intended to limit the flow of data.
It should also be understood that Fig. 4 may correspond to the schematic diagram of virtual components in Fig. 2 above; Fig. 4 can be understood as a specific implementation of Fig. 2. The modules in Fig. 4 can be understood as virtual components, implemented by code or by calling functions, which is not limited in the embodiments of the present application.
It should be understood that the examples in Fig. 2 and Fig. 4 are merely intended to help those skilled in the art understand the embodiments of the present application, not to limit the embodiments of the present application to the specific scenarios illustrated. Those skilled in the art can clearly make various equivalent modifications or variations according to the examples of Fig. 2 and Fig. 4, and such modifications or variations also fall within the scope of the embodiments of the present application.
The method of training a model according to the embodiments of the present application has been described in detail above with reference to Fig. 1 to Fig. 4. The apparatus according to the embodiments of the present application is described below with reference to Fig. 5 and Fig. 6. It should be understood that the technical features described in the method embodiments are equally applicable to the following apparatus embodiments.
Fig. 5 shows a schematic block diagram of an apparatus 500 for training a model according to an embodiment of the present application. As shown in Fig. 5, the apparatus 500 for training a model is applied in a learning system, and the apparatus 500 includes a model building module 510 and a control module 520, wherein the model building module 510 is configured to build a training model and return a model handle to the control module 520;
the control module 520 obtains the model handle;
the control module 520 obtains a configuration parameter, the configuration parameter being used to indicate the running mode that the training model needs to use, wherein the running mode includes a local running mode or a distributed running mode;
the control module 520 determines, according to the configuration parameter, the running mode that the training model needs to use;
the control module 520 reads data based on the running mode;
the control module 520 runs the training model according to the model handle and the running mode, and obtains the training result of the training model on the data.
In a possible implementation, as an embodiment, the control module 520 is configured to: if the configuration parameter is a first variable value, determine the local running mode as the running mode that the training model needs to use; and
run the training model with a local running function using the model handle, to obtain the training result of the training model on the data, wherein the data is local data.
Optionally, the apparatus 500 further includes:
a debugging module 530, configured to perform a debugging operation on the process of obtaining the training result based on the local running function.
In a possible implementation, as an embodiment, the control module 520 is configured to:
if the configuration parameter is a second variable value, determine the distributed running mode as the running mode that the training model needs to use; and
run the training model with a distributed running function using the model handle, to obtain the training result of the training model on the data.
Optionally, the control module 520 is specifically configured to: read distributed data;
based on the distributed data, call a training-model function to train on the distributed data, or call a prediction function to predict on the distributed data; and
run the model-running function corresponding to the distributed data to obtain the training result of the training model.
Optionally, the control module is further configured to: call a save-model function, and store the training result of the training model based on the save-model function.
It should be understood that the apparatus 500 according to the embodiments of the present application may correspond to the methods of the foregoing method embodiments (including the method in Fig. 3), and the above and other management operations and/or functions of the modules in the apparatus 500 serve to realize the corresponding steps of the foregoing methods; therefore, the beneficial effects of the foregoing method embodiments can also be realized. For brevity, details are not repeated here.
It should also be understood that the above control module, model building module, and debugging module may be implemented by software and/or hardware.
Fig. 6 shows a schematic block diagram of an apparatus 1000 for training a model according to an embodiment of the present application. As shown in Fig. 6, the apparatus 1000 for training a model includes a processor 1001, a memory 1002, and a transceiver 1003. The processor 1001, the memory 1002, and the transceiver 1003 communicate with one another through internal connecting paths, transferring control and/or data signals. In a possible design, the processor 1001, the memory 1002, and the transceiver 1003 may be implemented by a chip. The memory 1002 may store program code, and the processor 1001 calls the program code stored in the memory 1002 to realize the corresponding functions of the apparatus 1000.
The methods disclosed in the above embodiments of the present application can be applied in, or realized by, a processor. The processor may be an integrated circuit chip with signal-processing capability. In implementation, each step of the above method embodiments can be completed by an integrated logic circuit of hardware in the processor or by instructions in the form of software. The above processor may be a general-purpose processor, a digital signal processor (Digital Signal Processor, DSP), an application-specific integrated circuit (Application Specific Integrated Circuit, ASIC), a field-programmable gate array (Field Programmable Gate Array, FPGA) or another programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component; it may also be a system on chip (System on Chip, SoC), a central processing unit (Central Processing Unit, CPU), a network processor (Network Processor, NP), a digital signal processing circuit (DSP), a microcontroller (Micro Controller Unit, MCU), a programmable logic device (Programmable Logic Device, PLD), or another integrated chip. The processor can implement or execute the methods, steps, and logic diagrams disclosed in the embodiments of the present application. The general-purpose processor may be a microprocessor, or the processor may be any conventional processor or the like. The steps of the methods disclosed in connection with the embodiments of the present application can be directly embodied as being executed and completed by a hardware decoding processor, or executed and completed by a combination of hardware and software modules in the decoding processor. The software module may be located in a storage medium mature in the art, such as a random access memory, a flash memory, a read-only memory, a programmable read-only memory, an electrically erasable programmable memory, or a register. The storage medium is located in the memory, and the processor reads the information in the memory and completes the steps of the above methods in combination with its hardware.
It can be appreciated that the memory in the embodiments of the present application may be a volatile memory or a non-volatile memory, or may include both volatile and non-volatile memories. The non-volatile memory may be a read-only memory (Read-Only Memory, ROM), a programmable read-only memory (Programmable ROM, PROM), an erasable programmable read-only memory (Erasable PROM, EPROM), an electrically erasable programmable read-only memory (Electrically EPROM, EEPROM), or a flash memory. The volatile memory may be a random access memory (Random Access Memory, RAM), used as an external cache. By way of example and not limitation, many forms of RAM are available, such as static random access memory (Static RAM, SRAM), dynamic random access memory (Dynamic RAM, DRAM), synchronous dynamic random access memory (Synchronous DRAM, SDRAM), double data rate synchronous dynamic random access memory (Double Data Rate SDRAM, DDR SDRAM), enhanced synchronous dynamic random access memory (Enhanced SDRAM, ESDRAM), synchlink dynamic random access memory (Synchlink DRAM, SLDRAM), and direct rambus random access memory (Direct Rambus RAM, DR RAM). It should be noted that the memories of the systems and methods described herein are intended to include, but not be limited to, these and any other suitable types of memory.
It should be understood that the term "and/or" herein describes only an association relationship between associated objects, indicating that three relationships may exist; for example, A and/or B may indicate: A alone, both A and B, and B alone. In addition, the character "/" herein generally indicates an "or" relationship between the associated objects before and after it.
Those of ordinary skill in the art may realize that the units and algorithm steps described in connection with the embodiments disclosed herein can be implemented by electronic hardware or by a combination of computer software and electronic hardware. Whether these functions are executed in hardware or software depends on the specific application and design constraints of the technical solution. A person skilled in the art may use different methods to realize the described functions for each specific application, but such realization should not be considered to exceed the scope of the present application.
It can be clearly understood by those skilled in the art that, for convenience and brevity of description, the specific working processes of the systems, apparatuses, and units described above may refer to the corresponding processes in the foregoing method embodiments, and details are not described here again.
In the several embodiments provided in the present application, it should be understood that the disclosed systems, apparatuses, and methods may be realized in other ways. For example, the apparatus embodiments described above are merely illustrative; the division of the units is only a division by logical function, and there may be other division manners in actual implementation: multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed. In addition, the mutual couplings, direct couplings, or communication connections shown or discussed may be indirect couplings or communication connections through some interfaces, apparatuses, or units, and may be electrical, mechanical, or in other forms.
The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.
In addition, the functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units may be integrated into one unit.
If the functions are realized in the form of software functional units and sold or used as independent products, they may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present application in essence, or the part contributing to the prior art, or part of the technical solution, may be embodied in the form of a software product; the computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to execute all or part of the steps of the methods of the embodiments of the present application. The aforementioned storage medium includes various media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
The above are only specific embodiments of the present application, but the protection scope of the present application is not limited thereto. Any changes or substitutions readily conceivable by those familiar with the art within the technical scope disclosed in the present application shall be covered by the protection scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.