CN117196015A - Operator execution method, device, electronic equipment and storage medium - Google Patents
- Publication number: CN117196015A (application CN202311212785.4A)
- Authority: CN (China)
- Prior art keywords: operator, sub, execution, operators, target
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Abstract
The invention provides an operator execution method, an operator execution apparatus, an electronic device, and a storage medium, relating to the technical field of artificial intelligence. The operator execution method comprises the following steps: performing operator fusion processing on a plurality of single operators in a neural network model to generate a target operator corresponding to the neural network model, where the target operator comprises M fusion operators, or M fusion operators and N single operators; generating operator execution configuration information based on the target operator, where the operator execution configuration information indicates a parallel execution strategy for the target operator; and executing the target operator in parallel based on the operator execution configuration information. In this way, computation across operators is executed in parallel: parallelism among the fusion operators and among the sub-operators inside each fusion operator is improved, which in turn improves operator execution efficiency and operator development efficiency.
Description
Technical Field
The present invention relates to the field of artificial intelligence technologies, and in particular, to an operator execution method, an operator execution device, an electronic device, and a storage medium.
Background
With the rapid development of artificial intelligence (Artificial Intelligence, AI), neural networks are widely used in many fields, and running a neural network requires executing a large number of operators.
In the related art, neural networks are generally run in single-operator mode (eager mode): operators are usually executed serially, and synchronization among the operators is ensured by the hardware, i.e., determined by the serial launch of the kernel functions (kernels).
However, serial execution of operators is time-consuming, which lowers operator development efficiency. How to improve operator execution efficiency is therefore a problem to be solved.
Disclosure of Invention
Aiming at the problems existing in the prior art, the embodiment of the invention provides an operator execution method, an operator execution device, electronic equipment and a storage medium.
The invention provides an operator execution method, which comprises the following steps:
performing operator fusion processing on a plurality of single operators in a neural network model to generate a target operator corresponding to the neural network model; the target operator comprises M fusion operators, or M fusion operators and N single operators; M and N are each an integer greater than or equal to 1;
generating operator execution configuration information based on the target operator; the operator execution configuration information is used for indicating a parallel execution strategy of the target operator;
and executing the target operator based on the operator execution configuration information.
Optionally, the operator execution configuration information includes at least one of:
the first operator execution configuration information is used for indicating operator parallel execution strategies among the M fusion operators and the N single operators;
and the second operator execution configuration information is used for indicating an operator parallel execution strategy among all sub operators in each fusion operator.
Optionally, the executing the target operator based on the operator execution configuration information includes:
under the condition that the target operator comprises the M fusion operators and the N single operators, generating an operator execution instruction based on the first operator execution configuration information and the second operator execution configuration information;
and executing the target operator based on the operator execution instruction.
Optionally, the executing the target operator based on the operator execution configuration information includes:
under the condition that the target operator comprises the M fusion operators, executing configuration information based on the second operator to generate an operator execution instruction;
and executing the target operator based on the operator execution instruction.
Optionally, the executing the target operator based on the operator execution instruction includes:
for each fusion operator, responding to the operator execution instruction, executing configuration information based on the second operator, and executing the second sub operator under the condition that a first sub operator sends a first message to the second sub operator; the output of the first sub operator is the input of the second sub operator, and the first message is used for representing at least part of data associated with the first sub operator after the first sub operator is executed;
executing configuration information based on the first operator, and executing the N single operators under the condition that a target sub operator in each fusion operator sends a second message to the N single operators; the second message is used for representing at least part of data in the data associated with the target sub-operator after the execution of the target sub-operator is finished, and the output of the target sub-operator is the input of the N single operators.
Optionally, the executing the target operator based on the operator execution instruction includes:
for each fusion operator, responding to the operator execution instruction, executing configuration information based on the second operator, and executing a fourth sub operator under the condition that a third message is sent to the fourth sub operator by the third sub operator;
the output of the third sub-operator is the input of the fourth sub-operator, and the third message is used for representing at least part of data associated with the third sub-operator after the third sub-operator is executed.
Optionally, before the executing the target operator, the method further includes:
and aiming at each fusion operator, cutting the data associated with each sub operator in the fusion operators based on a preset data cutting strategy.
Optionally, the executing the target operator based on the operator execution instruction includes:
mapping the operator execution instruction from a logic space to a physical space of the target operator in hardware equipment;
executing an instruction based on the operator in the physical space, and executing the target operator.
Optionally, before the operator fusion processing is performed on the plurality of single operators in the neural network model, the method further includes:
based on a plurality of single operators in the neural network model, generating a computational graph corresponding to the neural network model, wherein the computational graph is used for representing the data dependency relationship among the single operators.
The invention also provides an operator executing device, which comprises:
the fusion module is used for carrying out operator fusion processing on a plurality of single operators in the neural network model to generate a target operator corresponding to the neural network model; the target operator comprises M fusion operators, or M fusion operators and N single operators; M and N are each an integer greater than or equal to 1;
the first generation module is used for generating operator execution configuration information based on the target operator; the operator execution configuration information is used for indicating a parallel execution strategy of the target operator;
and the execution module is used for executing the target operator based on the operator execution configuration information.
The invention also provides an electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing any one of the operator execution methods described above when executing the program.
The invention also provides a non-transitory computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements an operator execution method as described in any of the above.
The invention also provides a computer program product comprising a computer program which when executed by a processor implements an operator execution method as described in any one of the above.
The operator execution method, apparatus, electronic device, and storage medium provided by the invention perform operator fusion processing on a plurality of single operators in the neural network model to generate a target operator corresponding to the neural network model, where the target operator comprises M fusion operators, or M fusion operators and N single operators. Operator execution configuration information is then generated based on the target operator, and the M fusion operators, or the M fusion operators and the N single operators, are executed according to the parallel execution strategy indicated by that configuration information. Computation across operators is thereby executed in parallel, the parallelism among the fusion operators and among the sub-operators inside each fusion operator is improved, and operator execution efficiency and operator development efficiency are improved in turn.
Drawings
In order to more clearly illustrate the invention or the technical solutions of the prior art, the following description will briefly explain the drawings used in the embodiments or the description of the prior art, and it is obvious that the drawings in the following description are some embodiments of the invention, and other drawings can be obtained according to the drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flow diagram of an operator execution method provided by the present invention;
FIG. 2 is a first schematic diagram of the computational graph provided by the present invention;
FIG. 3 is a second schematic diagram of the computational graph provided by the present invention;
FIG. 4 is a third schematic diagram of the computational graph provided by the present invention;
FIG. 5 is a kernel schematic diagram corresponding to the operator graph provided by the present invention;
FIG. 6 is a second flow chart of the operator execution method according to the present invention;
FIG. 7 is a schematic diagram of an operator execution apparatus provided by the present invention;
fig. 8 is a schematic structural diagram of an electronic device provided by the present invention.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the present invention more apparent, the technical solutions of the present invention will be clearly and completely described below with reference to the accompanying drawings, and it is apparent that the described embodiments are some embodiments of the present invention, not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
The operator execution method provided by the present invention is specifically described below with reference to fig. 1 to 6. Fig. 1 is a schematic flow chart of an operator execution method provided in the present invention, referring to fig. 1, the method includes steps 101 to 103, where:
Step 101, performing operator fusion processing on a plurality of single operators in a neural network model to generate a target operator corresponding to the neural network model; the target operator comprises M fusion operators, or M fusion operators and N single operators; M and N are each an integer greater than or equal to 1.
First, it should be noted that the present invention is applied to computational-graph compilation scenarios in an artificial intelligence chip software stack, and the execution subject of the present invention may be any electronic device capable of executing operators, for example a smart phone, a smart watch, a desktop computer, or a portable computer.
In the embodiment of the invention, the neural network model can be applied to fields such as image recognition, speech processing, and natural language processing; it may be, for example, a convolutional neural network (Convolutional Neural Networks, CNN) or a recurrent neural network (Recurrent Neural Networks, RNN), and running such a network requires a plurality of operators as support.
Optionally, before the operator fusion processing is performed on the plurality of single operators in the neural network model, the following steps are further required to be performed:
based on a plurality of single operators in the neural network model, generating a computational graph corresponding to the neural network model, wherein the computational graph is used for representing the data dependency relationship among the single operators.
In the embodiment of the invention, before operator fusion processing is performed on the plurality of single operators in the neural network model, the single operators of the neural network model are first assembled into a graph according to their interfaces, generating the computational graph.
The computational graph corresponding to the neural network model is a directed acyclic graph used for describing operations, and it has two main elements: nodes and edges. Each node may correspond to a single operator's data, such as a vector, matrix, or tensor; edges represent operations such as addition, subtraction, multiplication, division, and convolution.
The computational graph reflects the data dependencies between the individual operators. Fig. 2 is one of the schematic diagrams of the computational graph provided by the present invention, in the computational graph shown in fig. 2, the outputs of operator 1 and operator 2 are the inputs of operator 3, the output of operator 4 is the input of operator 5, and the outputs of operator 3 and operator 5 are the inputs of operator 6.
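As an illustrative sketch (not part of the patent), the dependencies of the Fig. 2 graph can be encoded as edges of a directed acyclic graph, and Kahn's algorithm then yields one valid execution order in which every operator runs only after all of its producers:

```python
from collections import defaultdict, deque

def topo_order(edges, num_ops):
    """Kahn's algorithm: one valid execution order for a DAG of operators."""
    indeg = [0] * (num_ops + 1)          # 1-based operator ids
    succ = defaultdict(list)
    for src, dst in edges:
        succ[src].append(dst)
        indeg[dst] += 1
    ready = deque(op for op in range(1, num_ops + 1) if indeg[op] == 0)
    order = []
    while ready:
        op = ready.popleft()
        order.append(op)
        for nxt in succ[op]:
            indeg[nxt] -= 1
            if indeg[nxt] == 0:
                ready.append(nxt)
    return order

# Fig. 2: outputs of operators 1 and 2 feed operator 3, the output of
# operator 4 feeds operator 5, and the outputs of 3 and 5 feed operator 6.
edges = [(1, 3), (2, 3), (4, 5), (3, 6), (5, 6)]
order = topo_order(edges, 6)
print(order)  # -> [1, 2, 4, 3, 5, 6]
```

Serial eager execution follows exactly one such order; the parallel strategy of this method additionally lets independent operators (e.g. operators 1, 2, and 4) run concurrently.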
The present invention provides an operator synchronization mechanism under the system's computational-graph mode: different operator synchronization strategies (i.e., parallel execution strategies among operators) are configured at different graph-compilation stages, which improves the computational parallelism among the operators.
Step 102, generating operator execution configuration information based on the target operator; the operator execution configuration information is used for indicating a parallel execution strategy of the target operator.
Optionally, the operator execution configuration information includes at least one of:
a) The first operator execution configuration information is used for indicating operator parallel execution strategies among the M fusion operators and the N single operators.
b) And the second operator execution configuration information is used for indicating an operator parallel execution strategy among all sub operators in each fusion operator.
In the above embodiment, the parallel execution policy between the fusion operators and the single operators, and the parallel execution policy between the sub operators in each fusion operator can be determined by the first operator execution configuration information and the second operator execution configuration information, so that the parallelism of the computation between the operators is improved.
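As a hedged illustration, the two kinds of configuration information could be represented as plain data structures. The class and field names below are assumptions for the sketch, not the patent's actual format; the example instance matches the graph of Fig. 3 (fused operators a and b feeding single operator c).

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

@dataclass
class SecondOpExecConfig:
    """Parallel-execution policy among the sub-operators inside one fused
    operator: (producer, consumer) pairs along which a 'ready' message is
    sent once partial data is available."""
    intra_edges: List[Tuple[str, str]] = field(default_factory=list)

@dataclass
class FirstOpExecConfig:
    """Parallel-execution policy among the M fused operators and the
    N single operators."""
    inter_edges: List[Tuple[str, str]] = field(default_factory=list)

@dataclass
class OperatorExecConfig:
    first: FirstOpExecConfig
    second: Dict[str, SecondOpExecConfig]  # keyed by fused-operator name

cfg = OperatorExecConfig(
    first=FirstOpExecConfig(inter_edges=[("a.sub3", "c"), ("b.sub5", "c")]),
    second={
        "a": SecondOpExecConfig(intra_edges=[("sub1", "sub3"), ("sub2", "sub3")]),
        "b": SecondOpExecConfig(intra_edges=[("sub4", "sub5")]),
    },
)
print(len(cfg.first.inter_edges), len(cfg.second))  # -> 2 2
```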
Step 103, executing the target operator based on the operator execution configuration information.
According to the operator execution method provided by the invention, operator fusion processing is performed on a plurality of single operators in the neural network model to generate a target operator corresponding to the neural network model, where the target operator comprises M fusion operators, or M fusion operators and N single operators. Operator execution configuration information is then generated based on the target operator, and the M fusion operators, or the M fusion operators and the N single operators, are executed according to the parallel execution strategy indicated by that configuration information. Computation across operators is thereby executed in parallel, the parallelism among the fusion operators and among the sub-operators inside each fusion operator is improved, and operator execution efficiency and operator development efficiency are improved in turn.
Optionally, before the executing the target operator, the method further includes:
and aiming at each fusion operator, cutting the data associated with each sub operator in the fusion operators based on a preset data cutting strategy.
In the embodiment of the invention, the data segmentation strategy is a tiling strategy — a technique that uses shared memory on a graphics processing unit (Graphics Processing Unit, GPU) to reduce accesses to global memory and thereby improve kernel execution efficiency. The data associated with each sub-operator of each fusion operator can be segmented with the tiling strategy, reducing the amount of data fed into each sub-operator at a time and effectively improving the execution efficiency of the fusion operator as a whole.
Specifically, after the data associated with each sub-operator in the fusion operator is segmented with the tiling strategy, a plurality of data blocks associated with each sub-operator are obtained.
During the execution of each sub-operator, the data input to it is a segmented data block, which improves the sub-operator's data-processing throughput.
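A minimal sketch of the tiling step (function and names assumed for illustration): a sub-operator's associated buffer is cut into fixed-size data blocks, each of which can then be processed, and signalled downstream, independently.

```python
def tile(data, tile_size):
    """Cut a flat buffer into fixed-size data blocks; the last block may
    be shorter when the length is not a multiple of tile_size."""
    return [data[i:i + tile_size] for i in range(0, len(data), tile_size)]

blocks = tile(list(range(10)), 4)
print(blocks)  # -> [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9]]
```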
Optionally, the executing the target operator based on the operator execution configuration information specifically includes at least one of the following ways:
mode one, specifically including step 1) -step 2):
step 1), generating an operator execution instruction based on the first operator execution configuration information and the second operator execution configuration information under the condition that the target operator comprises the M fusion operators and the N single operators.
Step 2), executing the target operator based on the operator execution instruction.
Optionally, the executing the target operator based on the operator execution instruction may be specifically implemented by the following steps [1]-[2]:
step [1], for each fusion operator, responding to the operator execution instruction, executing configuration information based on the second operator, and executing the second sub operator under the condition that the first sub operator sends a first message to the second sub operator; the output of the first sub operator is the input of the second sub operator, and the first message is used for representing at least part of data associated with the first sub operator after the first sub operator is executed;
step [2] of executing configuration information based on the first operator, and executing the N single operators under the condition that the target sub operator in each fusion operator sends a second message to the N single operators; the second message is used for representing at least part of data in the data associated with the target sub-operator after the execution of the target sub-operator is finished, and the output of the target sub-operator is the input of the N single operators.
Fig. 3 is a schematic diagram of a second calculation diagram provided by the present invention, where the calculation diagram shown in fig. 3 includes a fusion operator a, a fusion operator b, and a single operator c. The fusion operator a comprises a sub operator 1, a sub operator 2 and a sub operator 3; the fusion operator b internally comprises a sub operator 4 and a sub operator 5.
a) For sub-operator 1 and sub-operator 3: the sub operator 1 is a first sub operator, and the sub operator 3 is a second sub operator.
Based on the second operator execution configuration information, sub-operator 1 first executes at least one data block associated with it (each data block is obtained by segmenting the data associated with sub-operator 1 through the tiling strategy). After sub-operator 1 has processed at least part of the data, it sends a first message to sub-operator 3. Upon receipt of the first message, sub-operator 3 starts executing at least one data block associated with it.
In the above embodiment, the sub-operator 1 and the sub-operator 3 may be executed in parallel.
b) For sub-operator 2 and sub-operator 3: the sub operator 2 is a first sub operator, and the sub operator 3 is a second sub operator.
Based on the second operator configuration information, the sub operator 2 first executes at least one data block associated therewith. After sub operator 2 has performed at least part of the data, a first message is sent to sub operator 3. The sub operator 3, upon receipt of the first message, starts executing at least one data block associated with the sub operator 3.
In the above embodiment, the sub-operator 2 and the sub-operator 3 may be executed in parallel.
c) For sub-operator 4 and sub-operator 5: the sub-operator 4 is a first sub-operator, and the sub-operator 5 is a second sub-operator.
Based on the second operator configuration information, the sub operator 4 first executes at least one data block associated therewith. After sub-operator 4 has performed at least part of the data, a first message is sent to sub-operator 5. The sub operator 5, upon receipt of the first message, starts executing at least one data block associated with the sub operator 5.
In the above embodiment, the sub-operator 4 and the sub-operator 5 may be executed in parallel.
d) For sub-operator 3, sub-operator 5 and single operator c: the sub-operators 3 and 5 are target sub-operators.
Based on the first operator configuration information, the sub-operators 3, 5 first execute at least one data block associated therewith. After sub-operator 3, sub-operator 5 has performed at least part of the data, a second message is sent to single operator c. The single operator c starts to execute at least one data block associated with the single operator c after receiving the second message sent by the sub operator 3 and the sub operator 5.
In the above embodiment, the sub-operator 3, the sub-operator 5 and the single operator c may be executed in parallel.
In the above embodiment, different operators belong to different computing units, and thus, parallel execution between different operators may be understood as parallel execution between different computing units or synchronization between different computing units.
For example: synchronization between the Tcore and vector Engine; synchronization between Vector and Vector engine.
Mode two, specifically including step 1) -step 2):
step 1), under the condition that the target operator comprises the M fusion operators, executing configuration information based on the second operator to generate an operator execution instruction;
step 2), executing the target operator based on the operator execution instruction.
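The branch between mode one and mode two can be sketched as a simple dispatch on the target operator's composition (the function and dictionary keys are illustrative assumptions, not the patent's interface):

```python
def gen_operator_instruction(fused_ops, single_ops, first_cfg, second_cfg):
    """Mode one: fused + single operators -> build the instruction from both
    configurations. Mode two: fused operators only -> the second operator
    execution configuration alone suffices."""
    if single_ops:
        return {"inter": first_cfg, "intra": second_cfg}
    return {"intra": second_cfg}

# Mode one: M fused operators and N single operators.
ins1 = gen_operator_instruction(["a", "b"], ["c"], "cfg1", "cfg2")
# Mode two: fused operators only.
ins2 = gen_operator_instruction(["a", "b"], [], "cfg1", "cfg2")
print(sorted(ins1), sorted(ins2))  # -> ['inter', 'intra'] ['intra']
```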
Optionally, the executing the target operator based on the operator executing instruction may specifically be implemented by:
for each fusion operator, responding to the operator execution instruction, executing configuration information based on the second operator, and executing a fourth sub operator under the condition that a third message is sent to the fourth sub operator by the third sub operator;
the output of the third sub-operator is the input of the fourth sub-operator, and the third message is used for representing at least part of data associated with the third sub-operator after the third sub-operator is executed.
FIG. 4 is a third schematic diagram of the computation graph provided by the present invention, in which the computation graph shown in FIG. 4 includes a fusion operator a and a fusion operator b. The fusion operator a comprises a sub operator 1, a sub operator 2 and a sub operator 3; the fusion operator b internally comprises a sub operator 4 and a sub operator 5.
a) For sub-operator 1 and sub-operator 3: the sub-operator 1 is a third sub-operator, and the sub-operator 3 is a fourth sub-operator.
Based on the second operator execution configuration information, sub-operator 1 first executes at least one data block associated with it (each data block is obtained by segmenting the data associated with sub-operator 1 through the tiling strategy). After sub-operator 1 has processed at least part of the data, it sends a third message to sub-operator 3. Upon receipt of the third message, sub-operator 3 starts executing at least one data block associated with it.
In the above embodiment, the sub-operator 1 and the sub-operator 3 may be executed in parallel.
b) For sub-operator 2 and sub-operator 3: the sub-operator 2 is a third sub-operator, and the sub-operator 3 is a fourth sub-operator.
Based on the second operator configuration information, the sub operator 2 first executes at least one data block associated therewith. After sub operator 2 has performed at least part of the data, a third message is sent to sub operator 3. The sub operator 3, after receiving the third message, starts executing at least one data block associated with the sub operator 3.
In the above embodiment, the sub-operator 2 and the sub-operator 3 may be executed in parallel.
c) For sub-operator 4 and sub-operator 5: the sub-operator 4 is a third sub-operator, and the sub-operator 5 is a fourth sub-operator.
Based on the second operator configuration information, the sub operator 4 first executes at least one data block associated therewith. After sub-operator 4 has performed at least part of the data, a third message is sent to sub-operator 5. The sub operator 5, after receiving the third message, starts executing at least one data block associated with the sub operator 5.
In the above embodiment, the sub-operator 4 and the sub-operator 5 may be executed in parallel.
Since there is no data dependency between the sub-operator 3 and the sub-operator 5, after the sub-operator 3 and the sub-operator 5 receive the third message, each may execute at least one data block associated with each sub-operator.
Optionally, in the foregoing embodiment, the executing the target operator based on the operator execution instruction may specifically be implemented by:
step 1), mapping the operator execution instruction from a logic space to a physical space of the target operator in hardware equipment;
step 2), executing instructions based on the operators in the physical space, and executing the target operators.
In the embodiment of the invention, after the operator execution instruction is generated, a kernel corresponding to the operator graph can be generated, as shown in FIG. 5, which is a schematic diagram of the kernel corresponding to the operator graph provided by the present invention.
After the operator execution instruction is mapped from the logical space to the physical space of the target operator in the hardware device, kernel is run to ensure the parallel execution of the target operator.
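Mapping the operator execution instruction from logical space to physical space can be sketched as rewriting each logical compute-unit reference to a concrete engine on the device. The placement table and engine names below are assumptions for illustration only:

```python
def map_to_physical(instructions, placement):
    """Rewrite each (logical_unit, op) pair to (physical_engine, op)
    using a placement table decided at compile time."""
    return [(placement[unit], op) for unit, op in instructions]

placement = {"sub1": "tensor_core_0", "sub3": "vector_engine_1"}
logical = [("sub1", "matmul"), ("sub3", "add")]
physical = map_to_physical(logical, placement)
print(physical)  # -> [('tensor_core_0', 'matmul'), ('vector_engine_1', 'add')]
```

Once every instruction carries a physical engine id, the kernel can be launched and the target operator executed in parallel on the hardware.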
Fig. 6 is a second flowchart of the operator execution method provided in the present invention, referring to fig. 6, the method includes steps 601-610, in which:
step 601, generating a computation graph corresponding to the neural network model based on a plurality of single operators in the neural network model, wherein the computation graph is used for representing data dependency relations among the single operators.
Step 602, performing operator fusion processing on a plurality of single operators in the neural network model to generate a target operator corresponding to the neural network model; the target operators comprise M fusion operators, or M fusion operators and N single operators; M and N are each an integer greater than or equal to 1.
Step 603, for each fusion operator, splitting data associated with each sub operator in the fusion operator based on a preset data splitting strategy.
Step 604, generating an operator execution instruction based on the first operator execution configuration information and the second operator execution configuration information in the case that the target operator includes M fusion operators and N single operators.
Step 605, for each fusion operator, responding to an operator execution instruction, executing configuration information based on a second operator, and executing the second sub operator under the condition that the first sub operator sends a first message to the second sub operator; the output of the first sub operator is the input of the second sub operator, and the first message is used for representing at least part of data in the data associated with the first sub operator after the first sub operator is executed; the second operator execution configuration information is used for indicating an operator parallel execution strategy among sub operators in each fusion operator.
Step 606, executing configuration information based on the first operator, and executing N single operators under the condition that the target sub operator in each fusion operator sends a second message to the N single operators; the second message is used for representing at least part of data in the data associated with the target sub operator after the execution of the target sub operator is finished, the output of the target sub operator is the input of N single operators, and the first operator execution configuration information is used for indicating an operator parallel execution strategy between M fusion operators and an operator parallel execution strategy between the M fusion operators and the N single operators.
In step 607, in the case that the target operator includes M fusion operators, the operator execution instruction is generated based on the second operator execution configuration information.
Step 608, for each fusion operator, responding to an operator execution instruction, executing configuration information based on the second operator, and executing the fourth sub operator under the condition that the third sub operator sends a third message to the fourth sub operator; the output of the third sub operator is the input of the fourth sub operator, and the third message is used for representing at least part of data in the data associated with the third sub operator after the third sub operator is executed.
It should be noted that the execution sequence of steps 604-606 and steps 607-608 is not sequential.
Step 609, mapping the operator execution instruction from the logical space to the physical space of the target operator in the hardware device.
Step 610, executing a target operator based on an operator execution instruction in the physical space.
The operator execution apparatus provided by the present invention is described below; the operator execution apparatus described below and the operator execution method described above may be referred to in correspondence with each other. Fig. 7 is a schematic structural diagram of an operator execution apparatus according to the present invention. As shown in Fig. 7, the operator execution apparatus 700 includes a fusion module 701, a first generation module 702 and an execution module 703, wherein:
the fusion module 701 is configured to perform operator fusion processing on a plurality of single operators in a neural network model to generate a target operator corresponding to the neural network model; the target operator comprises M fusion operators, or M fusion operators and N single operators; M and N are integers greater than or equal to 1;
a first generating module 702, configured to generate operator execution configuration information based on the target operator; the operator execution configuration information is used for indicating a parallel execution strategy of the target operator;
an execution module 703, configured to execute the target operator based on the operator execution configuration information.
The operator execution apparatus provided by the present invention generates a target operator corresponding to the neural network model by performing operator fusion processing on a plurality of single operators in the neural network model, wherein the target operator comprises M fusion operators, or M fusion operators and N single operators. Operator execution configuration information is then generated based on the target operator, and the M fusion operators, or the M fusion operators and the N single operators, are executed based on the parallel execution strategy indicated by the operator execution configuration information. Computation is thereby executed in parallel across operators, improving the computation parallelism among the fusion operators and among the sub-operators within each fusion operator, and in turn improving both operator execution efficiency and operator development efficiency.
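As a rough illustration of the fusion step, the sketch below greedily merges runs of consecutive elementwise operators into fusion operators and leaves the remainder as single operators. The `ELEMENTWISE` set and the list encoding are assumptions made for this example; the patent does not prescribe a particular fusion rule.

```python
# Minimal sketch of operator fusion: maximal runs of fusable (elementwise)
# operators become fusion operators; everything else stays a single operator.
# The fusability criterion below is a hypothetical simplification.

ELEMENTWISE = {"relu", "add", "mul", "sigmoid"}

def fuse(op_sequence):
    """Return (fused, singles) for a linear chain of operator names."""
    fused, singles, run = [], [], []
    for op in op_sequence:
        if op in ELEMENTWISE:
            run.append(op)          # extend the current fusable run
        else:
            if len(run) > 1:
                fused.append(tuple(run))   # a run of >=2 ops becomes one fusion op
            else:
                singles.extend(run)        # a lone fusable op stays single
            run = []
            singles.append(op)
    if len(run) > 1:
        fused.append(tuple(run))
    else:
        singles.extend(run)
    return fused, singles

print(fuse(["conv", "relu", "add", "pool", "sigmoid"]))
# ([('relu', 'add')], ['conv', 'pool', 'sigmoid'])
```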
Optionally, the operator execution configuration information includes at least one of:
the first operator execution configuration information is used for indicating operator parallel execution strategies among the M fusion operators and the N single operators;
and the second operator execution configuration information is used for indicating an operator parallel execution strategy among the sub-operators in each fusion operator.
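A hypothetical data layout for these two kinds of configuration information is sketched below. All field and class names are illustrative assumptions; the patent does not define a concrete data structure.

```python
# Sketch of the two configuration-information records described above.
# Field names are hypothetical placeholders.

from dataclasses import dataclass, field

@dataclass
class FirstOpExecConfig:
    """Parallel execution strategy among M fusion operators and N single operators."""
    fusion_op_ids: list
    single_op_ids: list
    parallel_groups: list = field(default_factory=list)  # ids that may run concurrently

@dataclass
class SecondOpExecConfig:
    """Parallel execution strategy among sub-operators inside one fusion operator."""
    fusion_op_id: int
    sub_op_pipeline: list = field(default_factory=list)  # producer-to-consumer order

cfg = SecondOpExecConfig(fusion_op_id=0, sub_op_pipeline=["matmul", "bias", "relu"])
print(cfg.sub_op_pipeline[0])  # matmul
```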
Optionally, the execution module 703 is further configured to:
under the condition that the target operator comprises the M fusion operators and the N single operators, generating an operator execution instruction based on the first operator execution configuration information and the second operator execution configuration information;
and executing the target operator based on the operator execution instruction.
Optionally, the execution module 703 is further configured to:
under the condition that the target operator comprises the M fusion operators, generating an operator execution instruction based on the second operator execution configuration information;
and executing the target operator based on the operator execution instruction.
Optionally, the execution module 703 is further configured to:
for each fusion operator, in response to the operator execution instruction and based on the second operator execution configuration information, executing the second sub-operator under the condition that a first sub-operator sends a first message to the second sub-operator; the output of the first sub-operator is the input of the second sub-operator, and the first message is used for representing at least part of the data associated with the first sub-operator after the first sub-operator finishes executing;
based on the first operator execution configuration information, executing the N single operators under the condition that a target sub-operator in each fusion operator sends a second message to the N single operators; the second message is used for representing at least part of the data associated with the target sub-operator after the target sub-operator finishes executing, and the output of the target sub-operator is the input of the N single operators.
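The message-driven start condition described above can be sketched as follows: a consumer sub-operator begins work as soon as its producer has sent a first message carrying part of the producer's output, rather than waiting for the producer to finish entirely. Representing tensors as lists of chunks is an assumption made purely for illustration.

```python
# Sketch of message-driven pipelined execution between a producer sub-operator
# and a consumer sub-operator. Each chunk plays the role of one "message"
# carrying part of the producer's output data.

def pipelined_run(producer_chunks, consumer_fn):
    """Consumer processes each chunk as soon as its message arrives."""
    outputs = []
    for chunk in producer_chunks:          # producer emits partial results
        outputs.append(consumer_fn(chunk)) # consumer starts on the first message
    return outputs

# First sub-operator doubles its data chunk by chunk; the second adds one.
produced = [[x * 2 for x in chunk] for chunk in [[1, 2], [3, 4]]]
result = pipelined_run(produced, lambda chunk: [x + 1 for x in chunk])
print(result)  # [[3, 5], [7, 9]]
```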
Optionally, the execution module 703 is further configured to:
for each fusion operator, in response to the operator execution instruction and based on the second operator execution configuration information, executing a fourth sub-operator under the condition that a third sub-operator sends a third message to the fourth sub-operator;
the output of the third sub-operator is the input of the fourth sub-operator, and the third message is used for representing at least part of the data associated with the third sub-operator after the third sub-operator finishes executing.
Optionally, the apparatus further comprises:
the splitting module is used for splitting, for each fusion operator, the data associated with each sub-operator in the fusion operator based on a preset data splitting strategy.
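One simple preset data splitting strategy can be sketched as cutting the buffer associated with a sub-operator into fixed-size tiles, so that tiles can be executed (and messaged) independently. The flat-list representation and tile size are illustrative assumptions.

```python
# Sketch of a preset data splitting strategy: cut a sub-operator's data
# buffer into tiles of at most tile_size elements.

def split_data(data, tile_size):
    """Split a flat buffer into tiles of at most tile_size elements."""
    return [data[i:i + tile_size] for i in range(0, len(data), tile_size)]

print(split_data([1, 2, 3, 4, 5], 2))  # [[1, 2], [3, 4], [5]]
```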
Optionally, the execution module 703 is further configured to:
mapping the operator execution instruction from a logical space to a physical space of the target operator in a hardware device;
executing the target operator based on the operator execution instruction in the physical space.
Optionally, the apparatus further comprises:
the second generation module is used for generating a computation graph corresponding to the neural network model based on a plurality of single operators in the neural network model, and the computation graph is used for representing the data dependency relations among the single operators.
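Building such a computation graph can be sketched as recording a directed edge from each operator to every operator that consumes its output. The `(name, inputs)` tuple format is assumed for illustration only; the patent does not specify a graph representation.

```python
# Sketch of computation-graph construction: nodes are single operators and a
# directed edge u -> v records that v consumes u's output (a data dependency).

def build_graph(operators):
    """operators: list of (name, input_op_names). Returns an adjacency dict."""
    edges = {name: [] for name, _ in operators}
    for name, inputs in operators:
        for src in inputs:
            edges[src].append(name)  # src's output feeds this operator
    return edges

ops = [("input", []), ("conv", ["input"]), ("relu", ["conv"]), ("add", ["conv", "relu"])]
print(build_graph(ops))
# {'input': ['conv'], 'conv': ['relu', 'add'], 'relu': ['add'], 'add': []}
```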
Fig. 8 is a schematic structural diagram of an electronic device according to the present invention. As shown in Fig. 8, the electronic device may include a processor 810, a communications interface 820, a memory 830 and a communication bus 840, wherein the processor 810, the communications interface 820 and the memory 830 communicate with one another through the communication bus 840. The processor 810 may call logic instructions in the memory 830 to perform an operator execution method comprising: performing operator fusion processing on a plurality of single operators in a neural network model to generate a target operator corresponding to the neural network model, wherein the target operator comprises M fusion operators, or M fusion operators and N single operators, and M and N are integers greater than or equal to 1; generating operator execution configuration information based on the target operator, wherein the operator execution configuration information is used for indicating a parallel execution strategy of the target operator; and executing the target operator based on the operator execution configuration information.
Furthermore, the logic instructions in the memory 830 may be implemented in the form of software functional units and may be stored in a computer-readable storage medium when sold or used as a stand-alone product. Based on this understanding, the technical solution of the present invention, in essence or in the part contributing to the prior art, may be embodied in the form of a software product stored in a storage medium and comprising several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to perform all or part of the steps of the method according to the embodiments of the present invention. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk or an optical disk.
In another aspect, the present invention also provides a computer program product comprising a computer program that can be stored on a non-transitory computer-readable storage medium. When the computer program is executed by a processor, the computer is capable of performing the operator execution method provided by the methods described above, the method comprising: performing operator fusion processing on a plurality of single operators in a neural network model to generate a target operator corresponding to the neural network model, wherein the target operator comprises M fusion operators, or M fusion operators and N single operators, and M and N are integers greater than or equal to 1; generating operator execution configuration information based on the target operator, wherein the operator execution configuration information is used for indicating a parallel execution strategy of the target operator; and executing the target operator based on the operator execution configuration information.
In yet another aspect, the present invention also provides a non-transitory computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the operator execution method provided by the above methods, the method comprising: performing operator fusion processing on a plurality of single operators in a neural network model to generate a target operator corresponding to the neural network model, wherein the target operator comprises M fusion operators, or M fusion operators and N single operators, and M and N are integers greater than or equal to 1; generating operator execution configuration information based on the target operator, wherein the operator execution configuration information is used for indicating a parallel execution strategy of the target operator; and executing the target operator based on the operator execution configuration information.
The apparatus embodiments described above are merely illustrative. The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; they may be located in one place or distributed over a plurality of network elements. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. Those of ordinary skill in the art can understand and implement the invention without creative effort.
From the above description of the embodiments, it will be apparent to those skilled in the art that the embodiments may be implemented by means of software plus a necessary general-purpose hardware platform, or by means of hardware. Based on this understanding, the foregoing technical solution, in essence or in the part contributing to the prior art, may be embodied in the form of a software product, which may be stored in a computer-readable storage medium such as a ROM/RAM, a magnetic disk or an optical disk, and which includes several instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the method described in the respective embodiments or in some parts of the embodiments.
Finally, it should be noted that the above embodiments are only intended to illustrate the technical solution of the present invention, not to limit it. Although the invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art will understand that the technical scheme described in the foregoing embodiments can still be modified, or some of its technical features can be replaced by equivalents, and such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present invention.
Claims (13)
1. An operator execution method, comprising:
performing operator fusion processing on a plurality of single operators in a neural network model to generate a target operator corresponding to the neural network model; the target operator comprises M fusion operators, or M fusion operators and N single operators; M and N are integers greater than or equal to 1;
generating operator execution configuration information based on the target operator; the operator execution configuration information is used for indicating a parallel execution strategy of the target operator;
and executing the target operator based on the operator execution configuration information.
2. The operator execution method according to claim 1, wherein the operator execution configuration information includes at least one of:
the first operator execution configuration information is used for indicating operator parallel execution strategies among the M fusion operators and the N single operators;
and the second operator execution configuration information is used for indicating an operator parallel execution strategy among all sub operators in each fusion operator.
3. The operator execution method according to claim 2, wherein the executing the target operator based on the operator execution configuration information includes:
under the condition that the target operator comprises the M fusion operators and the N single operators, generating an operator execution instruction based on the first operator execution configuration information and the second operator execution configuration information;
and executing the target operator based on the operator execution instruction.
4. The operator execution method according to claim 2, wherein the executing the target operator based on the operator execution configuration information includes:
under the condition that the target operator comprises the M fusion operators, generating an operator execution instruction based on the second operator execution configuration information;
and executing the target operator based on the operator execution instruction.
5. The operator execution method according to claim 3, wherein the executing the target operator based on the operator execution instruction includes:
for each fusion operator, in response to the operator execution instruction and based on the second operator execution configuration information, executing the second sub-operator under the condition that a first sub-operator sends a first message to the second sub-operator; the output of the first sub-operator is the input of the second sub-operator, and the first message is used for representing at least part of the data associated with the first sub-operator after the first sub-operator finishes executing;
based on the first operator execution configuration information, executing the N single operators under the condition that a target sub-operator in each fusion operator sends a second message to the N single operators; the second message is used for representing at least part of the data associated with the target sub-operator after the target sub-operator finishes executing, and the output of the target sub-operator is the input of the N single operators.
6. The operator execution method according to claim 4, wherein the executing the target operator based on the operator execution instruction includes:
for each fusion operator, in response to the operator execution instruction and based on the second operator execution configuration information, executing a fourth sub-operator under the condition that a third sub-operator sends a third message to the fourth sub-operator;
the output of the third sub-operator is the input of the fourth sub-operator, and the third message is used for representing at least part of the data associated with the third sub-operator after the third sub-operator finishes executing.
7. The operator execution method according to any one of claims 1 to 6, wherein before said executing the target operator, the method further comprises:
for each fusion operator, splitting the data associated with each sub-operator in the fusion operator based on a preset data splitting strategy.
8. The operator execution method according to any one of claims 3 to 6, wherein the executing the target operator based on the operator execution instruction includes:
mapping the operator execution instruction from a logical space to a physical space of the target operator in a hardware device;
executing the target operator based on the operator execution instruction in the physical space.
9. The operator execution method according to any one of claims 1 to 6, characterized in that before the operator fusion processing is performed on a plurality of single operators in a neural network model, the method further comprises:
generating, based on a plurality of single operators in the neural network model, a computation graph corresponding to the neural network model, wherein the computation graph is used for representing the data dependency relations among the single operators.
10. An operator execution apparatus, comprising:
the fusion module is used for performing operator fusion processing on a plurality of single operators in a neural network model to generate a target operator corresponding to the neural network model; the target operator comprises M fusion operators, or M fusion operators and N single operators; M and N are integers greater than or equal to 1;
the first generation module is used for generating operator execution configuration information based on the target operator; the operator execution configuration information is used for indicating a parallel execution strategy of the target operator;
and the execution module is used for executing the target operator based on the operator execution configuration information.
11. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the operator execution method of any one of claims 1 to 9 when the program is executed.
12. A non-transitory computer readable storage medium having stored thereon a computer program, which when executed by a processor implements the operator performing method of any of claims 1 to 9.
13. A computer program product comprising a computer program which, when executed by a processor, implements the operator execution method of any one of claims 1 to 9.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202311212785.4A CN117196015A (en) | 2023-09-19 | 2023-09-19 | Operator execution method, device, electronic equipment and storage medium |
Publications (1)
Publication Number | Publication Date |
---|---|
CN117196015A (en) | 2023-12-08
Family
ID=88999620
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN118446305A (en) * | 2023-12-29 | 2024-08-06 | 荣耀终端有限公司 | Reasoning method of model reasoning framework, electronic equipment and corresponding device |
Legal Events

Date | Code | Title | Description
---|---|---|---
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |