
CN117196015A - Operator execution method, device, electronic equipment and storage medium - Google Patents

Operator execution method, device, electronic equipment and storage medium

Info

Publication number
CN117196015A
CN117196015A
Authority
CN
China
Prior art keywords
operator
sub
execution
operators
target
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311212785.4A
Other languages
Chinese (zh)
Inventor
Name withheld at the inventor's request
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Bi Ren Technology Co ltd
Original Assignee
Shanghai Bi Ren Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Bi Ren Technology Co ltd filed Critical Shanghai Bi Ren Technology Co ltd
Priority to CN202311212785.4A priority Critical patent/CN117196015A/en
Publication of CN117196015A publication Critical patent/CN117196015A/en
Pending legal-status Critical Current

Landscapes

  • General Factory Administration (AREA)

Abstract

The invention provides an operator execution method, an operator execution device, electronic equipment and a storage medium, relating to the technical field of artificial intelligence. The operator execution method comprises the following steps: performing operator fusion processing on a plurality of single operators in a neural network model to generate a target operator corresponding to the neural network model, wherein the target operator comprises M fusion operators, or M fusion operators and N single operators; generating operator execution configuration information based on the target operator, wherein the operator execution configuration information is used for indicating a parallel execution strategy of the target operator; and executing the target operator in parallel based on the operator execution configuration information. By this method, parallel execution of computation among operators is realized, and both the computational parallelism among fusion operators and the computational parallelism among the sub operators inside each fusion operator are improved, thereby improving operator execution efficiency and operator development efficiency.

Description

Operator execution method, device, electronic equipment and storage medium
Technical Field
The present invention relates to the field of artificial intelligence technologies, and in particular, to an operator execution method, an operator execution device, an electronic device, and a storage medium.
Background
With the rapid development of artificial intelligence (Artificial Intelligence, AI), neural networks are widely used in various fields, and running a neural network requires a large number of operators.
In the related art, neural networks are basically run in single-operator mode (eager mode): operators are usually executed serially, and synchronization among operators is guaranteed by the hardware, i.e., determined by launching kernel functions (kernels) multiple times.
However, serial execution of operators is time-consuming, which lowers operator development efficiency. How to improve operator execution efficiency is therefore a problem to be solved.
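To make the cost of serial execution concrete, the following toy timing model (an illustrative sketch, not part of the patent) contrasts back-to-back kernel launches with an idealized block-level pipeline of the same kernels:

```python
def serial_time(kernel_times):
    """Eager mode: kernels run back-to-back, so total time is the sum."""
    return sum(kernel_times)

def pipelined_time(kernel_times, block_count):
    """Idealized block-level pipelining: each kernel processes its data in
    block_count tiles, and once the pipeline fills, all stages overlap."""
    per_block = [t / block_count for t in kernel_times]
    # Time to fill the pipeline with one block, plus one slowest-stage step
    # for each remaining block.
    return sum(per_block) + (block_count - 1) * max(per_block)

times = [4.0, 2.0, 2.0]  # hypothetical per-kernel runtimes for three operators
print(serial_time(times))        # 8.0
print(pipelined_time(times, 4))  # 5.0
```

Under these assumed runtimes the pipeline hides most of the serial cost, which is the effect the fusion-and-parallel-execution scheme below aims at.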
Disclosure of Invention
Aiming at the problems existing in the prior art, the embodiment of the invention provides an operator execution method, an operator execution device, electronic equipment and a storage medium.
The invention provides an operator execution method, which comprises the following steps:
performing operator fusion processing on a plurality of single operators in a neural network model to generate a target operator corresponding to the neural network model; the target operator comprises M fusion operators, or M fusion operators and N single operators; m, N is an integer greater than or equal to 1;
generating operator execution configuration information based on the target operator; the operator execution configuration information is used for indicating a parallel execution strategy of the target operator;
and executing the target operator based on the operator execution configuration information.
Optionally, the operator execution configuration information includes at least one of the following:
the first operator execution configuration information is used for indicating operator parallel execution strategies among the M fusion operators and the N single operators;
and the second operator execution configuration information is used for indicating an operator parallel execution strategy among all sub operators in each fusion operator.
Optionally, the executing the target operator based on the operator execution configuration information includes:
under the condition that the target operator comprises the M fusion operators and the N single operators, generating an operator execution instruction based on the first operator execution configuration information and the second operator execution configuration information;
and executing the target operator based on the operator execution instruction.
Optionally, the executing the target operator based on the operator execution configuration information includes:
under the condition that the target operator comprises the M fusion operators, generating an operator execution instruction based on the second operator execution configuration information;
and executing the target operator based on the operator execution instruction.
Optionally, the executing the target operator based on the operator execution instruction includes:
for each fusion operator, in response to the operator execution instruction and based on the second operator execution configuration information, executing a second sub operator under the condition that a first sub operator sends a first message to the second sub operator; the output of the first sub operator is the input of the second sub operator, and the first message is used for indicating that the first sub operator has finished executing at least part of its associated data;
based on the first operator execution configuration information, executing the N single operators under the condition that a target sub operator in each fusion operator sends a second message to the N single operators; the second message is used for indicating that the target sub operator has finished executing at least part of its associated data, and the output of the target sub operator is the input of the N single operators.
Optionally, the executing the target operator based on the operator execution instruction includes:
for each fusion operator, in response to the operator execution instruction and based on the second operator execution configuration information, executing a fourth sub operator under the condition that a third sub operator sends a third message to the fourth sub operator;
the output of the third sub operator is the input of the fourth sub operator, and the third message is used for indicating that the third sub operator has finished executing at least part of its associated data.
Optionally, before the executing the target operator, the method further includes:
and aiming at each fusion operator, cutting the data associated with each sub operator in the fusion operators based on a preset data cutting strategy.
Optionally, the executing the target operator based on the operator execution instruction includes:
mapping the operator execution instruction from a logic space to a physical space of the target operator in hardware equipment;
and executing the target operator based on the operator execution instruction in the physical space.
Optionally, before the operator fusion processing is performed on the plurality of single operators in the neural network model, the method further includes:
based on a plurality of single operators in the neural network model, generating a calculation map corresponding to the neural network model, wherein the calculation map is used for representing the data dependency relationship among the single operators.
The invention also provides an operator executing device, which comprises:
the fusion module is used for carrying out operator fusion processing on a plurality of single operators in the neural network model to generate a target operator corresponding to the neural network model; the target operator comprises M fusion operators, or M fusion operators and N single operators; m, N is an integer greater than or equal to 1;
the first generation module is used for generating operator execution configuration information based on the target operator; the operator execution configuration information is used for indicating a parallel execution strategy of the target operator;
and the execution module is used for executing the target operator based on the operator execution configuration information.
The invention also provides an electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing any one of the operator execution methods described above when executing the program.
The invention also provides a non-transitory computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements an operator execution method as described in any of the above.
The invention also provides a computer program product comprising a computer program which when executed by a processor implements an operator execution method as described in any one of the above.
The operator execution method, the device, the electronic equipment and the storage medium provided by the invention are used for generating a target operator corresponding to the neural network model by carrying out operator fusion processing on a plurality of single operators in the neural network model, wherein the target operator comprises M fusion operators or M fusion operators and N single operators; then, generating operator execution configuration information based on the target operators, and executing M fusion operators or M fusion operators and N single operators based on a parallel execution strategy indicated by the operator execution configuration information, so that the parallel execution of the computation among the operators is realized, the computation parallelism among the fusion operators and the computation parallelism among operators in each fusion operator are improved, and further the operator execution efficiency and the operator development efficiency are improved.
Drawings
In order to more clearly illustrate the invention or the technical solutions of the prior art, the following description will briefly explain the drawings used in the embodiments or the description of the prior art, and it is obvious that the drawings in the following description are some embodiments of the invention, and other drawings can be obtained according to the drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flow diagram of an operator execution method provided by the present invention;
FIG. 2 is a first schematic diagram of a computational graph provided by the present invention;
FIG. 3 is a second schematic diagram of a computational graph provided by the present invention;
FIG. 4 is a third schematic diagram of a computational graph provided by the present invention;
FIG. 5 is a kernel schematic diagram corresponding to the operator graph provided by the present invention;
FIG. 6 is a second flow chart of the operator execution method according to the present invention;
FIG. 7 is a schematic diagram of an operator execution apparatus provided by the present invention;
fig. 8 is a schematic structural diagram of an electronic device provided by the present invention.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the present invention more apparent, the technical solutions of the present invention will be clearly and completely described below with reference to the accompanying drawings, and it is apparent that the described embodiments are some embodiments of the present invention, not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
The operator execution method provided by the present invention is specifically described below with reference to fig. 1 to 6. Fig. 1 is a schematic flow chart of an operator execution method provided in the present invention, referring to fig. 1, the method includes steps 101 to 103, where:
step 101, performing operator fusion processing on a plurality of single operators in a neural network model to generate a target operator corresponding to the neural network model; the target operator comprises M fusion operators, or M fusion operators and N single operators; m, N is an integer greater than or equal to 1.
Firstly, it should be noted that the present invention is applied to computation-graph compilation scenarios in an artificial intelligence chip software stack, and the execution subject of the present invention may be any electronic device capable of executing operators, for example, a smart phone, a smart watch, a desktop computer, or a portable computer.
In the embodiment of the invention, the neural network model can be applied to fields such as image recognition, speech processing, and natural language processing, and can be, for example, a convolutional neural network model (Convolutional Neural Networks, CNN), a recurrent neural network model (Recurrent Neural Networks, RNN), or the like; running such a neural network requires the support of a plurality of operators.
Optionally, before the operator fusion processing is performed on the plurality of single operators in the neural network model, the following steps are further required to be performed:
based on a plurality of single operators in the neural network model, generating a calculation map corresponding to the neural network model, wherein the calculation map is used for representing the data dependency relationship among the single operators.
In the embodiment of the invention, before operator fusion processing is performed on a plurality of single operators in the neural network model, the single operators of the neural network model need to be assembled into a graph according to their interfaces to generate the computational graph.
The computational graph corresponding to the neural network model is a directed acyclic graph used for describing operations and has two main elements: nodes and edges. Each node may correspond to the data of a single operator, such as a vector, matrix, or tensor; edges represent operations such as addition, subtraction, multiplication, division, and convolution.
The computational graph reflects the data dependencies between the individual operators. Fig. 2 is one of the schematic diagrams of the computational graph provided by the present invention, in the computational graph shown in fig. 2, the outputs of operator 1 and operator 2 are the inputs of operator 3, the output of operator 4 is the input of operator 5, and the outputs of operator 3 and operator 5 are the inputs of operator 6.
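The dependency relationships of FIG. 2 form a directed acyclic graph. The following Python sketch (the representation is an illustrative assumption, not the patent's data structure) encodes those edges and derives one valid execution order:

```python
from collections import defaultdict, deque

# Edges of the FIG. 2 computational graph: operators 1 and 2 feed operator 3,
# operator 4 feeds operator 5, and operators 3 and 5 feed operator 6.
edges = [(1, 3), (2, 3), (4, 5), (3, 6), (5, 6)]

def topological_order(edges):
    """Return one valid execution order for a DAG given as (src, dst) pairs."""
    succ = defaultdict(list)
    indegree = defaultdict(int)
    nodes = set()
    for src, dst in edges:
        succ[src].append(dst)
        indegree[dst] += 1
        nodes.update((src, dst))
    # Operators with no inputs are ready immediately.
    ready = deque(sorted(n for n in nodes if indegree[n] == 0))
    order = []
    while ready:
        n = ready.popleft()
        order.append(n)
        for m in succ[n]:
            indegree[m] -= 1
            if indegree[m] == 0:
                ready.append(m)
    return order

print(topological_order(edges))  # [1, 2, 4, 3, 5, 6]
```

Operators 1, 2, and 4 have no data dependencies on one another, which is exactly the slack that a parallel execution strategy can exploit.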
The invention provides a systematic operator synchronization mechanism under the computation-graph mode, in which different operator synchronization strategies (i.e., parallel execution strategies among operators) are configured at different computation-graph compilation stages, thereby improving the computational parallelism among operators.
Step 102, generating operator execution configuration information based on the target operator; the operator execution configuration information is used for indicating a parallel execution strategy of the target operator.
Optionally, the operator execution configuration information includes at least one of the following:
a) The first operator execution configuration information is used for indicating operator parallel execution strategies among the M fusion operators and the N single operators.
b) And the second operator execution configuration information is used for indicating an operator parallel execution strategy among all sub operators in each fusion operator.
In the above embodiment, the parallel execution policy between the fusion operators and the single operators, and the parallel execution policy between the sub operators in each fusion operator can be determined by the first operator execution configuration information and the second operator execution configuration information, so that the parallelism of the computation between the operators is improved.
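As a rough illustration, the two levels of configuration described above could be modeled as follows (all class and field names are assumptions for exposition, not the patent's format):

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

@dataclass
class InterOperatorConfig:
    """First operator execution configuration information: parallel-execution
    policy among the M fusion operators and N single operators (illustrative)."""
    parallel_groups: List[List[str]] = field(default_factory=list)

@dataclass
class IntraOperatorConfig:
    """Second operator execution configuration information: parallel-execution
    policy among the sub operators inside one fusion operator (illustrative)."""
    fusion_operator: str = ""
    sub_operator_edges: List[Tuple[str, str]] = field(default_factory=list)

@dataclass
class OperatorExecutionConfig:
    first: Optional[InterOperatorConfig] = None   # absent when not needed
    second: List[IntraOperatorConfig] = field(default_factory=list)

# Example: fusion operators a and b may run concurrently; inside fusion
# operator a, sub operator 3 consumes the outputs of sub operators 1 and 2.
cfg = OperatorExecutionConfig(
    first=InterOperatorConfig(parallel_groups=[["fusion_a", "fusion_b"]]),
    second=[IntraOperatorConfig("fusion_a", [("sub1", "sub3"), ("sub2", "sub3")])],
)
print(cfg.second[0].sub_operator_edges)
```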
Step 103, executing the target operator based on the operator execution configuration information.
According to the operator execution method provided by the invention, operator fusion processing is performed on a plurality of single operators in the neural network model to generate a target operator corresponding to the neural network model, wherein the target operator comprises M fusion operators, or M fusion operators and N single operators; then, operator execution configuration information is generated based on the target operator, and the M fusion operators, or the M fusion operators and the N single operators, are executed based on the parallel execution strategy indicated by the operator execution configuration information. Parallel execution of computation among operators is thus realized, and both the computational parallelism among fusion operators and the computational parallelism among the sub operators inside each fusion operator are improved, thereby improving operator execution efficiency and operator development efficiency.
Optionally, before the executing the target operator, the method further includes:
and aiming at each fusion operator, cutting the data associated with each sub operator in the fusion operators based on a preset data cutting strategy.
In the embodiment of the invention, the data segmentation strategy is a tiling strategy: a technique that uses shared memory on a graphics processing unit (Graphics Processing Unit, GPU) to reduce accesses to global memory, thereby improving the execution efficiency of kernel functions. The data associated with each sub operator in each fusion operator can be segmented using the tiling strategy, reducing the amount of data input to each sub operator at a time, which can effectively improve the execution efficiency of the whole fusion operator.
Specifically, after the data associated with each sub operator in the fusion operator is segmented using the tiling strategy, a plurality of data blocks associated with each sub operator are obtained.
In the process of executing each sub operator, the data input to the sub operator is a segmented data block, which improves the data-processing efficiency of each sub operator.
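A minimal sketch of the segmentation idea (the block size and function name are assumptions; real tiling on a GPU partitions tensors into shared-memory-sized tiles):

```python
def tile(data, block_size):
    """Split a flat buffer into fixed-size blocks so that a sub operator
    consumes one block at a time instead of the whole tensor."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]

blocks = tile(list(range(10)), block_size=4)
print(blocks)  # [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9]]
```

Each element of `blocks` is one unit of work; a downstream sub operator can start as soon as the first block is finished, which is what the message mechanism below exploits.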
Optionally, the executing the target operator based on the operator execution configuration information specifically includes at least one of the following ways:
mode one, specifically including step 1) -step 2):
step 1), generating an operator execution instruction based on the first operator execution configuration information and the second operator execution configuration information under the condition that the target operator comprises the M fusion operators and the N single operators.
Step 2), executing the target operator based on the operator execution instruction.
Optionally, the executing the target operator based on the operator execution instruction may be specifically implemented by the following steps [1]-[2]:
Step [1], for each fusion operator, in response to the operator execution instruction and based on the second operator execution configuration information, executing the second sub operator under the condition that the first sub operator sends a first message to the second sub operator; the output of the first sub operator is the input of the second sub operator, and the first message is used for indicating that the first sub operator has finished executing at least part of its associated data;
Step [2], based on the first operator execution configuration information, executing the N single operators under the condition that the target sub operator in each fusion operator sends a second message to the N single operators; the second message is used for indicating that the target sub operator has finished executing at least part of its associated data, and the output of the target sub operator is the input of the N single operators.
Fig. 3 is a schematic diagram of a second calculation diagram provided by the present invention, where the calculation diagram shown in fig. 3 includes a fusion operator a, a fusion operator b, and a single operator c. The fusion operator a comprises a sub operator 1, a sub operator 2 and a sub operator 3; the fusion operator b internally comprises a sub operator 4 and a sub operator 5.
a) For sub-operator 1 and sub-operator 3: the sub operator 1 is a first sub operator, and the sub operator 3 is a second sub operator.
Based on the second operator execution configuration information, sub operator 1 first executes at least one data block associated with it (each data block is obtained by segmenting the data associated with sub operator 1 through the tiling strategy). After sub operator 1 has processed at least part of the data, it sends a first message to sub operator 3. Upon receipt of the first message, sub operator 3 starts executing at least one data block associated with sub operator 3.
In the above embodiment, the sub-operator 1 and the sub-operator 3 may be executed in parallel.
b) For sub-operator 2 and sub-operator 3: the sub operator 2 is a first sub operator, and the sub operator 3 is a second sub operator.
Based on the second operator configuration information, the sub operator 2 first executes at least one data block associated therewith. After sub operator 2 has performed at least part of the data, a first message is sent to sub operator 3. The sub operator 3, upon receipt of the first message, starts executing at least one data block associated with the sub operator 3.
In the above embodiment, the sub-operator 2 and the sub-operator 3 may be executed in parallel.
c) For sub-operator 4 and sub-operator 5: the sub-operator 4 is a first sub-operator, and the sub-operator 5 is a second sub-operator.
Based on the second operator configuration information, the sub operator 4 first executes at least one data block associated therewith. After sub-operator 4 has performed at least part of the data, a first message is sent to sub-operator 5. The sub operator 5, upon receipt of the first message, starts executing at least one data block associated with the sub operator 5.
In the above embodiment, the sub-operator 4 and the sub-operator 5 may be executed in parallel.
d) For sub-operator 3, sub-operator 5 and single operator c: the sub-operators 3 and 5 are target sub-operators.
Based on the first operator configuration information, the sub-operators 3, 5 first execute at least one data block associated therewith. After sub-operator 3, sub-operator 5 has performed at least part of the data, a second message is sent to single operator c. The single operator c starts to execute at least one data block associated with the single operator c after receiving the second message sent by the sub operator 3 and the sub operator 5.
In the above embodiment, the sub-operator 3, the sub-operator 5 and the single operator c may be executed in parallel.
In the above embodiment, different operators belong to different computing units, and thus, parallel execution between different operators may be understood as parallel execution between different computing units or synchronization between different computing units.
For example: synchronization between a Tcore and a Vector Engine, or synchronization between one Vector Engine and another Vector Engine.
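The first-message handshake described above behaves like a producer-consumer pipeline: the downstream sub operator starts as soon as the first data block is ready, instead of waiting for the upstream sub operator to finish entirely. The thread-and-queue sketch below is an illustrative model of that behavior (the operator bodies and data are invented), not the patent's hardware mechanism:

```python
import queue
import threading

SENTINEL = None  # marks the end of the producer's block stream

def sub_operator_1(out_q):
    """Producer: emits each processed block as soon as it is done
    (the 'first message' here is the block itself appearing on the queue)."""
    for block in ([1, 2], [3, 4], [5, 6]):
        out_q.put([x * 2 for x in block])  # hypothetical elementwise compute
    out_q.put(SENTINEL)

def sub_operator_3(in_q, results):
    """Consumer: begins work on block k while the producer computes block k+1."""
    while (block := in_q.get()) is not SENTINEL:
        results.append(sum(block))  # hypothetical reduction over the block

q = queue.Queue()
results = []
t1 = threading.Thread(target=sub_operator_1, args=(q,))
t3 = threading.Thread(target=sub_operator_3, args=(q, results))
t1.start(); t3.start()
t1.join(); t3.join()
print(results)  # [6, 14, 22]
```

Both threads are alive at the same time, which is the block-level overlap the second operator execution configuration information is meant to enable.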
Mode two, specifically including step 1) -step 2):
step 1), under the condition that the target operator comprises the M fusion operators, generating an operator execution instruction based on the second operator execution configuration information;
step 2), executing the target operator based on the operator execution instruction.
Optionally, the executing the target operator based on the operator executing instruction may specifically be implemented by:
for each fusion operator, in response to the operator execution instruction and based on the second operator execution configuration information, executing a fourth sub operator under the condition that a third sub operator sends a third message to the fourth sub operator;
the output of the third sub-operator is the input of the fourth sub-operator, and the third message is used for representing at least part of data associated with the third sub-operator after the third sub-operator is executed.
FIG. 4 is a third schematic diagram of the computation graph provided by the present invention, in which the computation graph shown in FIG. 4 includes a fusion operator a and a fusion operator b. The fusion operator a comprises a sub operator 1, a sub operator 2 and a sub operator 3; the fusion operator b internally comprises a sub operator 4 and a sub operator 5.
a) For sub-operator 1 and sub-operator 3: the sub-operator 1 is a third sub-operator, and the sub-operator 3 is a fourth sub-operator.
Based on the second operator execution configuration information, sub operator 1 first executes at least one data block associated with it (each data block is obtained by segmenting the data associated with sub operator 1 through the tiling strategy). After sub operator 1 has processed at least part of the data, it sends a third message to sub operator 3. Upon receipt of the third message, sub operator 3 starts executing at least one data block associated with sub operator 3.
In the above embodiment, the sub-operator 1 and the sub-operator 3 may be executed in parallel.
b) For sub-operator 2 and sub-operator 3: the sub-operator 2 is a third sub-operator, and the sub-operator 3 is a fourth sub-operator.
Based on the second operator configuration information, the sub operator 2 first executes at least one data block associated therewith. After sub operator 2 has performed at least part of the data, a third message is sent to sub operator 3. The sub operator 3, after receiving the third message, starts executing at least one data block associated with the sub operator 3.
In the above embodiment, the sub-operator 2 and the sub-operator 3 may be executed in parallel.
c) For sub-operator 4 and sub-operator 5: the sub-operator 4 is a third sub-operator, and the sub-operator 5 is a fourth sub-operator.
Based on the second operator configuration information, the sub operator 4 first executes at least one data block associated therewith. After sub-operator 4 has performed at least part of the data, a third message is sent to sub-operator 5. The sub operator 5, after receiving the third message, starts executing at least one data block associated with the sub operator 5.
In the above embodiment, the sub-operator 4 and the sub-operator 5 may be executed in parallel.
Since there is no data dependency between the sub-operator 3 and the sub-operator 5, after the sub-operator 3 and the sub-operator 5 receive the third message, each may execute at least one data block associated with each sub-operator.
Optionally, in the foregoing embodiment, the executing the target operator based on the operator execution instruction may specifically be implemented by:
step 1), mapping the operator execution instruction from a logic space to a physical space of the target operator in hardware equipment;
step 2), executing the target operator based on the operator execution instruction in the physical space.
In the embodiment of the invention, after the operator execution instruction is generated, a kernel corresponding to the operator graph can be generated, as shown in FIG. 5; FIG. 5 is a schematic diagram of the kernel corresponding to the operator graph provided by the present invention.
After the operator execution instruction is mapped from the logical space to the physical space of the target operator in the hardware device, kernel is run to ensure the parallel execution of the target operator.
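As a toy picture of the logical-to-physical mapping step, the sketch below assigns logical operator-execution instructions to physical compute units round-robin (the unit names and the round-robin policy are invented for illustration; a real mapper would consult the hardware topology):

```python
def map_to_physical(instructions, units):
    """Round-robin placement of logical instructions onto physical compute
    units, standing in for the logical-space-to-physical-space mapping."""
    placement = {}
    for i, instr in enumerate(instructions):
        placement[instr] = units[i % len(units)]
    return placement

placement = map_to_physical(
    ["exec_sub1", "exec_sub2", "exec_sub3"],
    units=["tcore_0", "vector_engine_0"],  # hypothetical unit names
)
print(placement)
```

Once every instruction has a physical placement, the kernel can be launched and the target operator executed in parallel on those units.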
Fig. 6 is a second flowchart of the operator execution method provided in the present invention, referring to fig. 6, the method includes steps 601-610, in which:
step 601, generating a computation graph corresponding to the neural network model based on a plurality of single operators in the neural network model, wherein the computation graph is used for representing data dependency relations among the single operators.
Step 602, performing operator fusion processing on a plurality of single operators in the neural network model to generate a target operator corresponding to the neural network model; the target operators comprise M fusion operators, or M fusion operators and N single operators; m, N is an integer greater than or equal to 1.
Step 603, for each fusion operator, splitting data associated with each sub operator in the fusion operator based on a preset data splitting strategy.
Step 604, generating an operator execution instruction based on the first operator execution configuration information and the second operator execution configuration information in the case that the target operator includes M fusion operators and N single operators.
Step 605, for each fusion operator, in response to the operator execution instruction and based on the second operator execution configuration information, executing the second sub operator under the condition that the first sub operator sends a first message to the second sub operator; the output of the first sub operator is the input of the second sub operator, and the first message is used for indicating that the first sub operator has finished executing at least part of its associated data; the second operator execution configuration information is used for indicating an operator parallel execution strategy among the sub operators in each fusion operator.
Step 606, based on the first operator execution configuration information, executing the N single operators under the condition that the target sub operator in each fusion operator sends a second message to the N single operators; the second message is used for indicating that the target sub operator has finished executing at least part of its associated data; the output of the target sub operator is the input of the N single operators, and the first operator execution configuration information is used for indicating an operator parallel execution strategy among the M fusion operators and between the M fusion operators and the N single operators.
In step 607, in the case that the target operator includes M fusion operators, the operator execution instruction is generated based on the second operator execution configuration information.
Step 608, for each fusion operator, in response to the operator execution instruction, executing the fourth sub-operator based on the second operator execution configuration information in the case that the third sub-operator sends a third message to the fourth sub-operator; the output of the third sub-operator is the input of the fourth sub-operator, and the third message is used for representing at least part of the data associated with the third sub-operator after the execution of the third sub-operator is finished.
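The message-driven dependency in steps 605 and 608 can be sketched as follows. This is an illustrative sketch, not the patent's implementation: the names (`run_pipeline`, `first_op`, `second_op`) and the use of a plain in-process queue are assumptions; on real hardware the "message" would be delivered by the device's synchronization mechanism.

```python
from queue import Queue

# Sketch: a downstream sub-operator runs only after the upstream sub-operator
# has sent it a message carrying (at least part of) the upstream output.

def run_pipeline(first_op, second_op, data_chunks):
    """Execute first_op chunk by chunk; each finished chunk is sent as a
    message, and second_op executes once per received message."""
    messages = Queue()
    results = []
    for chunk in data_chunks:
        messages.put(first_op(chunk))   # "first message": partial output sent
    while not messages.empty():
        results.append(second_op(messages.get()))  # consumer starts on message
    return results
```

Because each message carries only part of the first sub-operator's data, the second sub-operator need not wait for the first to finish entirely, which is the source of the intra-fusion-operator parallelism described here.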
It should be noted that steps 604-606 and steps 607-608 correspond to different cases of the target operator, and there is no fixed execution sequence between them.
Step 609, mapping the operator execution instruction from the logical space to the physical space of the target operator in the hardware device.
Step 610, executing the target operator based on the operator execution instruction in the physical space.
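Steps 609-610 can be sketched as follows. The round-robin mapping policy and the names (`map_logical_to_physical`, `execute`) are assumptions chosen for illustration; the patent does not specify how logical units are assigned to physical units.

```python
# Hypothetical sketch of steps 609-610: operator execution instructions are
# first expressed against a logical space and then mapped onto the physical
# space of the hardware device before being executed there.

def map_logical_to_physical(instructions, num_physical_units):
    """Assign each logical instruction to a physical unit (round-robin)."""
    return [(i % num_physical_units, inst) for i, inst in enumerate(instructions)]

def execute(mapped, handlers):
    """Run each instruction on the handler of its assigned physical unit."""
    return [handlers[unit](inst) for unit, inst in mapped]
```

Separating the logical instruction stream from its physical placement is what lets the same operator execution instructions target devices with different numbers of compute units.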
The operator execution device provided by the invention is described below; the operator execution device described below and the operator execution method described above may be referred to in correspondence with each other. Fig. 7 is a schematic structural diagram of an operator execution apparatus according to the present invention. As shown in Fig. 7, the operator execution apparatus 700 includes: a fusion module 701, a first generation module 702 and an execution module 703, wherein:
the fusion module 701 is configured to perform operator fusion processing on a plurality of single operators in a neural network model, and generate a target operator corresponding to the neural network model; the target operator comprises M fusion operators, or M fusion operators and N single operators; m, N is an integer greater than or equal to 1;
a first generating module 702, configured to generate operator execution configuration information based on the target operator; the operator execution configuration information is used for indicating a parallel execution strategy of the target operator;
an execution module 703, configured to execute the target operator based on the operator execution configuration information.
The operator execution device provided by the invention generates a target operator corresponding to the neural network model by carrying out operator fusion processing on a plurality of single operators in the neural network model, wherein the target operator comprises M fusion operators or M fusion operators and N single operators; then, generating operator execution configuration information based on the target operators, and executing M fusion operators or M fusion operators and N single operators based on a parallel execution strategy indicated by the operator execution configuration information, so that the parallel execution of the computation among the operators is realized, the computation parallelism among the fusion operators and the computation parallelism among operators in each fusion operator are improved, and further the operator execution efficiency and the operator development efficiency are improved.
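The fusion step performed by the fusion module can be sketched as follows. This is a minimal sketch under stated assumptions: the greedy grouping of consecutive operators and the `can_fuse` predicate are illustrative choices, not the fusion rule defined by the patent.

```python
# Hypothetical sketch of the fusion module: group consecutive single operators
# into fusion operators; ungrouped operators remain as single operators.

def fuse_operators(single_ops, can_fuse):
    """Greedily merge each operator into the current group when can_fuse
    allows; otherwise start a new group."""
    fused, current = [], [single_ops[0]]
    for op in single_ops[1:]:
        if can_fuse(current[-1], op):
            current.append(op)
        else:
            fused.append(current)
            current = [op]
    fused.append(current)
    return fused
```

A group of length greater than one corresponds to a fusion operator (its members become sub-operators), while a group of length one corresponds to a remaining single operator, matching the "M fusion operators, or M fusion operators and N single operators" structure of the target operator.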
Optionally, the operator execution configuration information includes at least one of:
the first operator execution configuration information is used for indicating operator parallel execution strategies among the M fusion operators and the N single operators;
and the second operator execution configuration information is used for indicating an operator parallel execution strategy among all sub operators in each fusion operator.
Optionally, the execution module 703 is further configured to:
under the condition that the target operator comprises the M fusion operators and the N single operators, generating an operator execution instruction based on the first operator execution configuration information and the second operator execution configuration information;
and executing the target operator based on the operator execution instruction.
Optionally, the execution module 703 is further configured to:
under the condition that the target operator comprises the M fusion operators, executing configuration information based on the second operator to generate an operator execution instruction;
and executing the target operator based on the operator execution instruction.
Optionally, the execution module 703 is further configured to:
for each fusion operator, in response to the operator execution instruction, executing the second sub-operator based on the second operator execution configuration information in the case that a first sub-operator sends a first message to the second sub-operator; the output of the first sub-operator is the input of the second sub-operator, and the first message is used for representing at least part of the data associated with the first sub-operator after the execution of the first sub-operator is finished;
executing the N single operators based on the first operator execution configuration information in the case that a target sub-operator in each fusion operator sends a second message to the N single operators; the second message is used for representing at least part of the data associated with the target sub-operator after the execution of the target sub-operator is finished, and the output of the target sub-operator is the input of the N single operators.
Optionally, the execution module 703 is further configured to:
for each fusion operator, in response to the operator execution instruction, executing a fourth sub-operator based on the second operator execution configuration information in the case that a third sub-operator sends a third message to the fourth sub-operator;
the output of the third sub-operator is the input of the fourth sub-operator, and the third message is used for representing at least part of the data associated with the third sub-operator after the execution of the third sub-operator is finished.
Optionally, the apparatus further comprises:
the splitting module is used for splitting, for each fusion operator, the data associated with each sub-operator in the fusion operator based on a preset data splitting strategy.
Optionally, the execution module 703 is further configured to:
mapping the operator execution instruction from a logic space to a physical space of the target operator in hardware equipment;
and executing the target operator based on the operator execution instruction in the physical space.
Optionally, the apparatus further comprises:
the second generation module is used for generating a calculation graph corresponding to the neural network model based on a plurality of single operators in the neural network model, wherein the calculation graph is used for representing the data dependency relationship among the single operators.
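The calculation graph built by the second generation module can be sketched as follows. The names (`build_computation_graph`, the `(producer, consumer)` pair representation) are illustrative assumptions; the patent only requires that the graph represent the data dependency relationship among the single operators.

```python
# Hypothetical sketch: nodes are single operators; a directed edge from A to B
# records that the output of operator A is an input of operator B.

def build_computation_graph(operators, dependencies):
    """operators: iterable of operator names.
    dependencies: iterable of (producer, consumer) pairs."""
    graph = {op: [] for op in operators}
    for producer, consumer in dependencies:
        graph[producer].append(consumer)
    return graph
```

Such a dependency graph is the natural input to the fusion step: operators connected by an edge are candidates for fusion, and operators with no path between them can be executed in parallel.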
Fig. 8 is a schematic structural diagram of an electronic device according to the present invention. As shown in Fig. 8, the electronic device may include: a processor 810, a communication interface (Communications Interface) 820, a memory 830 and a communication bus 840, wherein the processor 810, the communication interface 820 and the memory 830 communicate with each other via the communication bus 840. The processor 810 may call logic instructions in the memory 830 to perform an operator execution method comprising: performing operator fusion processing on a plurality of single operators in a neural network model to generate a target operator corresponding to the neural network model; the target operator comprises M fusion operators, or M fusion operators and N single operators; M and N are integers greater than or equal to 1; generating operator execution configuration information based on the target operator; the operator execution configuration information is used for indicating a parallel execution strategy of the target operator; and executing the target operator based on the operator execution configuration information.
Further, the logic instructions in the memory 830 may be implemented in the form of software functional units and may be stored in a computer-readable storage medium when sold or used as a stand-alone product. Based on this understanding, the technical solution of the present invention, in essence, or the part contributing to the prior art, or a part of the technical solution, may be embodied in the form of a software product stored in a storage medium, including several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to perform all or part of the steps of the method according to the embodiments of the present invention. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.
In another aspect, the present invention also provides a computer program product comprising a computer program stored on a non-transitory computer-readable storage medium; when executed by a processor, the computer program is capable of performing the operator execution method provided above, the method comprising: performing operator fusion processing on a plurality of single operators in a neural network model to generate a target operator corresponding to the neural network model; the target operator comprises M fusion operators, or M fusion operators and N single operators; M and N are integers greater than or equal to 1; generating operator execution configuration information based on the target operator; the operator execution configuration information is used for indicating a parallel execution strategy of the target operator; and executing the target operator based on the operator execution configuration information.
In yet another aspect, the present invention also provides a non-transitory computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the operator execution method provided above, the method comprising: performing operator fusion processing on a plurality of single operators in a neural network model to generate a target operator corresponding to the neural network model; the target operator comprises M fusion operators, or M fusion operators and N single operators; M and N are integers greater than or equal to 1; generating operator execution configuration information based on the target operator; the operator execution configuration information is used for indicating a parallel execution strategy of the target operator; and executing the target operator based on the operator execution configuration information.
The apparatus embodiments described above are merely illustrative; the units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units, that is, they may be located in one place or distributed over a plurality of network elements. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. Those of ordinary skill in the art can understand and implement the invention without creative effort.
From the above description of the embodiments, it will be apparent to those skilled in the art that the embodiments may be implemented by means of software plus a necessary general hardware platform, or of course by means of hardware. Based on this understanding, the technical solution, in essence, or the part contributing to the prior art, may be embodied in the form of a software product, which may be stored in a computer-readable storage medium such as a ROM/RAM, a magnetic disk, or an optical disk, and which includes several instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the method described in each embodiment or in some parts of the embodiments.
Finally, it should be noted that: the above embodiments are only for illustrating the technical solution of the present invention, and are not limiting; although the invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims (13)

1. An operator execution method, comprising:
performing operator fusion processing on a plurality of single operators in a neural network model to generate a target operator corresponding to the neural network model; the target operator comprises M fusion operators, or M fusion operators and N single operators; m, N is an integer greater than or equal to 1;
generating operator execution configuration information based on the target operator; the operator execution configuration information is used for indicating a parallel execution strategy of the target operator;
and executing the target operator based on the operator execution configuration information.
2. The operator execution method according to claim 1, wherein the operator execution configuration information includes at least one of:
the first operator execution configuration information is used for indicating operator parallel execution strategies among the M fusion operators and the N single operators;
and the second operator execution configuration information is used for indicating an operator parallel execution strategy among all sub operators in each fusion operator.
3. The operator execution method according to claim 2, wherein the executing the target operator based on the operator execution configuration information includes:
under the condition that the target operator comprises the M fusion operators and the N single operators, generating an operator execution instruction based on the first operator execution configuration information and the second operator execution configuration information;
and executing the target operator based on the operator execution instruction.
4. The operator execution method according to claim 2, wherein the executing the target operator based on the operator execution configuration information includes:
under the condition that the target operator comprises the M fusion operators, executing configuration information based on the second operator to generate an operator execution instruction;
and executing the target operator based on the operator execution instruction.
5. The operator execution method according to claim 3, wherein the executing the target operator based on the operator execution instruction includes:
for each fusion operator, in response to the operator execution instruction, executing the second sub-operator based on the second operator execution configuration information under the condition that a first sub-operator sends a first message to the second sub-operator; the output of the first sub-operator is the input of the second sub-operator, and the first message is used for representing at least part of the data associated with the first sub-operator after the execution of the first sub-operator is finished;
executing the N single operators based on the first operator execution configuration information under the condition that a target sub-operator in each fusion operator sends a second message to the N single operators; the second message is used for representing at least part of the data associated with the target sub-operator after the execution of the target sub-operator is finished, and the output of the target sub-operator is the input of the N single operators.
6. The operator execution method according to claim 4, wherein the executing the target operator based on the operator execution instruction includes:
for each fusion operator, in response to the operator execution instruction, executing a fourth sub-operator based on the second operator execution configuration information under the condition that a third sub-operator sends a third message to the fourth sub-operator;
the output of the third sub-operator is the input of the fourth sub-operator, and the third message is used for representing at least part of the data associated with the third sub-operator after the execution of the third sub-operator is finished.
7. The operator execution method according to any one of claims 1 to 6, wherein before said executing the target operator, the method further comprises:
splitting, for each fusion operator, the data associated with each sub-operator in the fusion operator based on a preset data splitting strategy.
8. The operator execution method according to any one of claims 3 to 6, wherein the executing the target operator based on the operator execution instruction includes:
mapping the operator execution instruction from a logic space to a physical space of the target operator in hardware equipment;
and executing the target operator based on the operator execution instruction in the physical space.
9. The operator execution method according to any one of claims 1 to 6, characterized in that before the operator fusion processing is performed on a plurality of single operators in a neural network model, the method further comprises:
generating a calculation graph corresponding to the neural network model based on a plurality of single operators in the neural network model, wherein the calculation graph is used for representing the data dependency relationship among the single operators.
10. An operator execution apparatus, comprising:
the fusion module is used for carrying out operator fusion processing on a plurality of single operators in the neural network model to generate a target operator corresponding to the neural network model; the target operator comprises M fusion operators, or M fusion operators and N single operators; m, N is an integer greater than or equal to 1;
the first generation module is used for generating operator execution configuration information based on the target operator; the operator execution configuration information is used for indicating a parallel execution strategy of the target operator;
and the execution module is used for executing the target operator based on the operator execution configuration information.
11. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the operator execution method of any one of claims 1 to 9 when the program is executed.
12. A non-transitory computer readable storage medium having stored thereon a computer program, which when executed by a processor implements the operator execution method of any of claims 1 to 9.
13. A computer program product comprising a computer program which, when executed by a processor, implements the operator execution method of any one of claims 1 to 9.
CN202311212785.4A 2023-09-19 2023-09-19 Operator execution method, device, electronic equipment and storage medium Pending CN117196015A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311212785.4A CN117196015A (en) 2023-09-19 2023-09-19 Operator execution method, device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311212785.4A CN117196015A (en) 2023-09-19 2023-09-19 Operator execution method, device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN117196015A true CN117196015A (en) 2023-12-08

Family

ID=88999620

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311212785.4A Pending CN117196015A (en) 2023-09-19 2023-09-19 Operator execution method, device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN117196015A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN118446305A (en) * 2023-12-29 2024-08-06 荣耀终端有限公司 Reasoning method of model reasoning framework, electronic equipment and corresponding device


Similar Documents

Publication Publication Date Title
US20220083857A1 (en) Convolutional neural network operation method and device
CN110633785B (en) Method and system for calculating convolutional neural network
CN112230927B (en) File redirection method, code loading control method and device
CN118151906B (en) Operator automatic generation method, device, equipment and medium
CN118334323B (en) Insulator detection method and system based on ultraviolet image
CN116128019A (en) Parallel training method and device for transducer model
CN113168309A (en) Method, circuit and SOC for performing matrix multiplication
CN117196015A (en) Operator execution method, device, electronic equipment and storage medium
CN111352896B (en) Artificial intelligence accelerator, equipment, chip and data processing method
CN116796289A (en) Operator processing method and device, electronic equipment and storage medium
CN117170681A (en) Nuclear function generation method and device, electronic equipment and storage medium
KR20220078819A (en) Method and apparatus for performing deep learning operations
CN112200310B (en) Intelligent processor, data processing method and storage medium
CN117291259A (en) Operator optimization method and device, electronic equipment and storage medium
CN118467459A (en) Model chip architecture implementation method, apparatus, electronic device, storage medium and computer program product
CN115809688B (en) Model debugging method and device, electronic equipment and storage medium
CN115759260A (en) Inference method and device of deep learning model, electronic equipment and storage medium
CN115346099A (en) Image convolution method, chip, equipment and medium based on accelerator chip
CN113313239A (en) Artificial intelligence model design optimization method and device
CN116755714B (en) Method, device, equipment and storage medium for operating deep neural network model
CN119005271B (en) Neural network model parallel optimization method and device based on operator partitioning
CN118093436B (en) Random test method, device, equipment and storage medium for deep learning operator
US11689608B1 (en) Method, electronic device, and computer program product for data sharing
CN116562041B (en) A simulation method, device, electronic device and storage medium
CN117172289B (en) Tensor segmentation method, device and electronic device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination