[go: up one dir, main page]

CN119902771A - Computational graph processing method, device, equipment and storage medium - Google Patents

Computational graph processing method, device, equipment and storage medium Download PDF

Info

Publication number
CN119902771A
CN119902771A CN202411983319.0A CN202411983319A CN119902771A CN 119902771 A CN119902771 A CN 119902771A CN 202411983319 A CN202411983319 A CN 202411983319A CN 119902771 A CN119902771 A CN 119902771A
Authority
CN
China
Prior art keywords
shape
dynamic
input
data
graph
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202411983319.0A
Other languages
Chinese (zh)
Other versions
CN119902771B (en
Inventor
远翔
施慧
杨晓峰
陈鹏
蒋杰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd filed Critical Tencent Technology Shenzhen Co Ltd
Priority to CN202411983319.0A priority Critical patent/CN119902771B/en
Publication of CN119902771A publication Critical patent/CN119902771A/en
Application granted granted Critical
Publication of CN119902771B publication Critical patent/CN119902771B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Classifications

    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F8/00Arrangements for software engineering
    • G06F8/40Transformation of program code
    • G06F8/41Compilation
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/042Knowledge-based neural networks; Logical representations of neural networks
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/0464Convolutional networks [CNN, ConvNet]

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Biophysics (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Computational Linguistics (AREA)
  • Artificial Intelligence (AREA)
  • Mathematical Physics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Image Generation (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The application discloses a processing method, a device, equipment and a storage medium of a calculation graph, wherein the method comprises the steps of obtaining the calculation graph comprising P dynamic inputs and a barrel dividing mark of the calculation graph, wherein the barrel dividing mark is used for indicating groups to which each dynamic input belongs and barrel dividing modes of each group, the dynamic shapes of the dynamic inputs in the same group have the same value ranges, the value ranges of the dynamic shapes of the dynamic inputs are subjected to barrel dividing processing according to the barrel dividing modes of the corresponding groups, further, the barrel dividing processing can be performed on the value ranges of the dynamic shapes of the P dynamic inputs according to the indication of the barrel dividing mark, a plurality of barrels are obtained, and the calculation graph is subjected to static compiling based on the numerical value in each barrel, so that after the input data of the calculation graph are obtained, the calculation graph is executed on the input data based on the compiling results corresponding to the barrels, and the data calculation result of the input data is obtained, thereby saving the cost of a processor and improving the execution efficiency of the calculation graph.

Description

Processing method, device, equipment and storage medium for calculation graph
Technical Field
The present application relates to the field of internet technologies, and in particular, to the field of artificial intelligence technologies, and in particular, to a method, an apparatus, a device, and a storage medium for processing a computation graph.
Background
With the development of artificial intelligence (ARTIFICIAL INTELLIGENCE, AI) technology, a computational graph (Computation Graph) is widely used, and the computational graph is a graph structure for representing mathematical computation or program execution, and is particularly important in the implementation of an AI model, and can be used for describing a computation process in the AI model. The computing graph may be input by using a dynamic shape, which refers to a shape that can be dynamically changed, and it may indicate that the shape of the data input by the computing graph each time is not fixed, that is, the computing graph may support the input of data of multiple shapes, and such computing graph may be referred to as a dynamic shape graph.
For a dynamic shape graph, a common processing mode is compiling and executing, namely, dynamic compiling is performed on the dynamic shape graph in advance to obtain a compiling result supporting dynamic input, so that when the dynamic shape graph needs to be executed, input data of the dynamic shape graph is obtained, and the dynamic shape graph is executed on the input data based on the compiling result. Since the shape of the input data is often different when it is executed each time, such a processing manner easily causes a large overhead to the processor, resulting in a low execution efficiency.
Disclosure of Invention
The embodiment of the application provides a processing method, a device, equipment and a storage medium of a computational graph, which can save the cost of a processor and improve the execution efficiency of the computational graph.
In one aspect, an embodiment of the present application provides a method for processing a computation graph, where the method includes:
Obtaining a calculation graph, wherein the calculation graph comprises P dynamic inputs, and P is a positive integer, each dynamic input is provided with a dynamic shape, and each dynamic shape is provided with a value range;
The method comprises the steps of obtaining a sub-bucket mark of the calculation graph, wherein the sub-bucket mark is used for indicating each dynamic input group and a sub-bucket mode of each group, the dynamic shapes of the dynamic inputs in the same group have the same value range, and the value ranges of the dynamic shapes of the dynamic inputs are subjected to sub-bucket processing according to the sub-bucket modes of the corresponding groups;
Carrying out barrel separation processing on the value ranges of the P dynamic shapes which are dynamically input according to the indication of the barrel separation mark to obtain a plurality of barrels, wherein each barrel comprises P numerical values, and different numerical values in the P numerical values are selected from different value ranges;
Performing static compiling on the calculation graph based on the numerical value in each bucket to obtain a compiling result corresponding to each bucket;
After the input data of the computation graph is obtained, the computation graph is executed on the input data based on the compiling results corresponding to the buckets, and the data computation result of the input data is obtained.
In another aspect, an embodiment of the present application provides a processing apparatus for a computation graph, where the apparatus includes:
The computing device comprises an acquisition unit, a computing unit and a processing unit, wherein the acquisition unit is used for acquiring a computing graph, the computing graph comprises P dynamic inputs, and P is a positive integer, each dynamic input is provided with a dynamic shape, and each dynamic shape is provided with a value range;
The acquisition unit is also used for acquiring a barrel division mark of the calculation graph, wherein the barrel division mark is used for indicating each dynamic input group and a barrel division mode of each group, the dynamic shapes of the dynamic inputs in the same group have the same value range, and the value ranges of the dynamic shapes of the dynamic inputs are subjected to barrel division processing according to the barrel division modes of the corresponding groups;
The processing unit is used for carrying out barrel separation processing on the value ranges of the P dynamic shapes which are dynamically input according to the indication of the barrel separation mark to obtain a plurality of barrels, wherein each barrel comprises P numerical values, and different numerical values in the P numerical values are selected from different value ranges;
The processing unit is further used for performing static compiling on the calculation graph based on the numerical value in each bucket to obtain a compiling result corresponding to each bucket;
And the processing unit is further used for executing the computation graph on the input data based on the compiling results corresponding to the multiple barrels after the input data of the computation graph is acquired, so as to obtain the data computation result of the input data.
In yet another aspect, an embodiment of the present application provides a computer device, including an input interface and an output interface, the computer device further including:
A processor and a computer storage medium;
wherein the processor is adapted to implement one or more instructions and the computer storage medium stores one or more instructions adapted to be loaded by the processor and to perform the above-mentioned method of processing a computational graph.
In yet another aspect, embodiments of the present application provide a computer storage medium storing one or more instructions adapted to be loaded by a processor and to perform the above-mentioned method of processing a computational graph.
In yet another aspect, embodiments of the present application provide a computer program product comprising one or more instructions which, when executed by a processor, implement the above-mentioned method of processing a computational graph.
The embodiment of the application can acquire the calculation graphs of P dynamic inputs and the barrel dividing marks of the calculation graphs, wherein the barrel dividing marks are used for indicating groups to which each dynamic input belongs and barrel dividing modes of each group, the dynamic shapes of the dynamic inputs in the same group have the same value ranges, and the value ranges of the dynamic shapes of the dynamic inputs are processed in barrel dividing modes of the corresponding groups. Furthermore, before executing the calculation graph, the calculation graph can be subjected to barrel division processing according to the indication of barrel division marks, a plurality of barrels are obtained, the calculation graph is subjected to static compiling based on the numerical value in each barrel, so that after acquiring the input data of the calculation graph, the calculation graph is executed on the input data based on the compiling results corresponding to the barrels, and the data calculation result of the input data is obtained.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings required for the description of the embodiments will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present application, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a schematic diagram of a process for executing a dynamic computational graph by means of static bucket compilation according to an embodiment of the present application;
FIG. 2 is a flow chart of a method for processing a calculation map according to an embodiment of the present application;
FIG. 3 is a schematic diagram of a bucket-based process according to an embodiment of the present application;
FIG. 4 is a flowchart of a method for processing a calculation map according to another embodiment of the present application;
FIG. 5a is a schematic diagram of an extended data set according to an embodiment of the present application;
FIG. 5b is a schematic diagram of an expanded data and corresponding data calculation result according to an embodiment of the present application;
FIG. 5c is a schematic diagram of a process for generating a target shape calculation formula according to an embodiment of the present application;
FIG. 5d is a schematic diagram of another object shape calculation formula according to an embodiment of the present application;
FIG. 5e is a schematic diagram of a method for processing a calculation map according to an embodiment of the present application;
FIG. 6 is a schematic diagram of a processing device for a calculation map according to an embodiment of the present application;
fig. 7 is a schematic structural diagram of a computer device according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present application.
In the embodiment of the application, the calculation graph is a graph structure which can be used for describing the calculation process in an AI model, wherein the AI model is a model constructed based on an AI technology, and the AI technology is a theory, a method, a technology and an application system which are used for simulating, extending and expanding human intelligence, sensing environment, acquiring knowledge and obtaining the best result by using the knowledge by utilizing a digital computer or a machine controlled by the digital computer. By way of example, the AI model may be, for example, a generic neural network model (e.g., convolutional neural network model), a large model, etc., and the so-called large model may be a large language model (Large Language Model), which specifically refers to a deep neural network with large parameters.
Specifically, the computational graph may include, but is not limited to, Q input nodes, R operator nodes, and at least one output node, Q and R being positive integers. The input node is used for representing input (input) of the computational graph, which can be responsible for inputting data to be calculated to the computational graph, the operator node is used for representing operators, which are a generic term of a certain mathematical operation, one operator can represent at least one operation in one network layer in the AI model, and the output node is used for representing output (output) of the computational graph, which is responsible for outputting execution results of the computational graph (namely, data calculation results obtained after calculating the input data). It follows that the computational graph may include Q inputs, R operators, and at least one output.
In addition, both the input and the output of the computation graph may have respective shapes, the shape of the input specifically refers to the shape of the data supported by the input, and the shape of the output specifically refers to the shape of the data supported by the output. A shape is an array for representing a constituent structure of data, which may specifically include values of at least one dimension to describe the size (number of data elements) of the data in at least one dimension in a multidimensional space, where the data may be, for example, images, voices, text, etc., without limitation. By way of example, for example, a shape for data A of <23xf32>, the shape may indicate that data A is one-dimensional data having 1 dimension in multidimensional space and containing 23 data elements in that dimension, each data element having a format of f32 (i.e., 32-bit floating point number), and for example, a shape for data B of <23x10xf32>, the shape may indicate that data B is two-dimensional data having 2 dimensions in multidimensional space and containing 23 data elements in the 1 st dimension and 10 data elements in the 2 nd dimension, each data element having a format of f32.
Further, the shape can be specifically divided into a dynamic shape and a static shape, wherein the dynamic shape refers to a shape containing a dynamic dimension, and the static shape refers to a shape not containing a dynamic dimension. Wherein, the dynamic dimension is the dimension in which the index value supports dynamic change, i.e. the value of the dynamic dimension is unknown, and the unknown value can be represented by a target value (such as-1) in the shape. That is, when the value of one or several dimensions is-1, the value representing the dimensions is unknown and can be dynamically changed, the dimensions can be called dynamic dimensions, the shape in this case can be called dynamic shape, for example [ -1,30] is called dynamic shape, and when the value of each dimension in the shape is not-1, the value representing each dimension is known and is fixed, and the shape in this case is static shape, for example [1,30] is called static shape.
Based on the classification of the shape, the calculation map can be specifically classified into a dynamic shape map (dynamic calculation map) and a static shape map (static calculation map). The dynamic shape graph refers to a calculation graph with at least one input shape being a dynamic shape, namely, when the calculation graph has at least one input shape being a static shape, the calculation graph can be called as a dynamic shape graph. Correspondingly, the static shape graph refers to a calculation graph of which each input shape is a static shape, namely, when each input shape of the calculation graph is a static shape, the calculation graph can be called a static shape graph. For convenience of explanation, the input with the dynamic shape will be referred to as dynamic input, the input with the static shape will be referred to as static input, and the dynamic shape of each dynamic input has a respective value range.
In the AI scenario, since the shape of the input involved in each execution is often different, it is easy to cause that operators in the dynamic shape graph often need to add many extra judgments in the implementation process, and most operators supporting multiple inputs usually have requirements on the shape of each input, for example, add (add) operators, communication requires that the shape of two inputs be the same, or the shape of one input can be changed to be the same as the shape of another input in a broadcast manner, which easily causes that in actual computation, a processor needs to add and process some operators related to shape computation in addition to the operators in the computation graph to ensure correctness. However, for a static shape, since the shape of the input of each operator is known, there is no need to involve two types of overhead (i.e., overhead required for additional judgment and overhead required for processing additional operators) involved in the execution of the aforementioned dynamic shape, so that the running time of the static shape can be much faster than that of the dynamic shape, i.e., the execution efficiency of the static shape is higher than that of the dynamic shape.
It is proved that, for the dynamic shape graph and the static shape graph having the same calculation logic, if the shape of the input data of the dynamic shape graph is expanded to be the same as the shape of the input of the static shape graph when the dynamic shape graph is required to be executed, the static shape graph is executed using the expanded input data, and the corresponding output shape (i.e., the shape of the data calculation result obtained by the calculation of the dynamic shape graph) is predicted according to the shape of the input data, so that the data calculation result of the input data is extracted from the execution result of the static shape graph based on the predicted output shape, and the extracted data calculation result is identical with the data calculation result calculated by the dynamic shape graph.
For example, if a dynamic shape map has two inputs and the shape of the two inputs is < -1xf32> and < -1xf32>, respectively, where the dimension where-1 is located is in the range of values [1,10], there is another static shape map that also has two inputs and the shape of the two inputs is <10xf32> and <10xf32>, respectively, and the computational logic in both the dynamic shape map and the static shape map is add. When the dynamic shape graph needs to be executed, the shape of the input data of the dynamic shape graph can be expanded to <10xf32>, the static shape graph is executed by using the expanded input data, the shape of the actual input data is predicted to output the shape, and the data calculation result of the input data is extracted from the execution result of the static shape graph according to the predicted output shape, so that the extracted data calculation result is consistent with the data calculation result obtained by calculating the input data through the dynamic shape graph.
Based on the research results, the embodiment of the application provides a processing method of a calculation map aiming at a dynamic shape map, and the method can accelerate the execution of the dynamic shape map in a static sub-bucket compiling mode. The static sub-bucket compiling mode is to perform sub-bucket processing on a value range of a dynamic shape provided by at least one input of a dynamic shape graph to obtain a plurality of buckets, wherein each bucket comprises at least one numerical value, different numerical values are selected from the value ranges of different dynamic shapes, and static compiling is performed on the dynamic shape graph based on the numerical values in each bucket to obtain compiling results corresponding to each bucket. The principle of compiling the dynamic shape graph statically based on the numerical value in any barrel is that the specific value of each dimension in the dynamic shape in at least one input of the dynamic shape graph is determined based on the numerical value in any barrel, and the static shape of the corresponding barrel is obtained, so that the dynamic shape graph is compiled by the static shape of the barrel.
Specifically, before executing the dynamic shape graph, the method compiles the dynamic shape graph in advance in a static sub-bucket compiling manner to obtain compiling results corresponding to a plurality of buckets, wherein different compiling results correspond to static shapes of different buckets, so that after acquiring actual input data of the dynamic shape graph, one compiling result can be selected from the compiling results as a target compiling result, the static shape corresponding to the target compiling result is larger than or equal to the shape of the actual input data, the dynamic shape graph is executed on the basis of the target compiling result, the input of the dynamic shape is converted into the input of the static shape, the execution of the dynamic shape graph is converted into the execution of the static shape graph, the execution of the dynamic shape graph is accelerated, the execution efficiency of the dynamic shape graph is prompted, the additional cost introduced by the dynamic shape processing is avoided, the cost of a processor is saved, and the running efficiency of the processor is improved. It can be understood that any compiling result can be understood to be an executable dynamic shape diagram with known shape, the known shape is a static shape corresponding to the compiling result, so that when the dynamic shape diagram is executed on the actual input data based on any compiling result, the corresponding compiling result is essentially executed by adopting the actual input data.
The following describes, with reference to fig. 1, a specific example, the general procedure of executing a dynamic shape graph by static bucket compiling according to the method described above:
Assume that a dynamic shape map has at least two inputs, for example, arg0 (1 st input) dynamic shape is < -1x1x16xf32>, arg1 (2 nd input) dynamic shape is < -1x1280xf32>, and the value range of the 1 st dimension value in the dynamic shapes is 1-100. If the value range is subjected to barrel splitting processing by adopting an index barrel splitting strategy based on 2, 1,2,4,8 can be selected from the value range respectively, the values in different barrels are used as the values in the values range, the values in each barrel are used as the actual values (specific values) of the dynamic value of the 1 st dimension in the dynamic shape input respectively, so as to obtain the static shape of each barrel, for example, the static shape of the 1 st barrel comprises <1x1x16xf32> corresponding to Arg0, the <1x1280xf32> corresponding to Arg1, and the like, and the static shape of the 2 nd barrel comprises <2x1x16xf32> corresponding to Arg0, the <2x1280xf32> corresponding to Arg1, and the like.
Furthermore, the static compiling of the dynamic shape graph can be respectively carried out based on the static shape of each bucket to obtain the compiling result corresponding to each bucket, and in actual operation, the compiling result corresponding to the bucket of which the static shape is closest to and not smaller than the shape of the actual input data is selected to be executed according to the shape size of each actual input data so as to obtain the data calculation result of the actual input data. For example, if the value of the first dimension in the shape of the actual input data is 5, that is, the shape of the actual input data includes <5x1x16xf32> corresponding to Arg0, <5x1280xf32> corresponding to Arg1, etc., then it may be determined that the static shape (< 8x1x16xf32>, <8x1280xf32>, etc.) of the compilation result corresponding to the bucket containing the value 8 is closest to and not smaller than the shape of the actual input data, and thus the compilation result corresponding to the bucket containing the value 8 may be selected to be executed to obtain the data calculation result of the actual input data. Similarly, if the value of the first dimension in the shape of the actual input data is 1, i.e., the shape of the actual input data includes <1x1x16xf32> corresponding to Arg0, <1x1280xf32> corresponding to Arg1, etc., then the compilation result corresponding to the bucket containing the value 1 may be selected for execution, if the value of the first dimension in the shape of the actual input data is 3, i.e., the shape of the actual input data includes <3x1x16xf32> corresponding to Arg0, <3x1280xf32> corresponding to Arg1, etc., then the compilation result corresponding to the bucket containing the value 4 may be selected for execution, and so on.
Based on the above description, the following points need to be explained:
(1) The method provided by the embodiment of the application can effectively reduce the processor overhead when the dynamic shape graph runs by converting the calculation of the dynamic shape graph into the calculation of the static shape graph, thereby improving the running efficiency of the processor. Although the barrel separation processing may bring about some redundant calculation, the cost of the redundant calculation can be reduced by reasonably performing barrel separation, and finally the execution efficiency of the dynamic shape graph is improved.
(2) The application scenario of the method proposed by the embodiment of the application is not limited, namely the method proposed by the embodiment of the application can be applied to various AI scenarios. For example, in an inference scenario, a plurality of input data are often formed into a batch (batch), that is, one batch contains a plurality of input data and is provided for a neural network model to execute, and since the number of input data contained in one batch is often not fixed, in terms of the execution angle of a computation graph corresponding to the neural network model, the situation can be considered that dynamic shape is adopted for input, and the number of input data contained in the batch usually has a value range, the method proposed by the embodiment of the present application can be applied to accelerate the execution of the corresponding computation graph. As another example, in a large model scenario, since the number of tokens (minimum units or basic elements of text processing, such as a word, a phrase, a punctuation mark, or a character) in the input text is also often not fixed, but there is also often an upper limit, this scenario also belongs to the case where a dynamic shape exists in the input of a large model corresponding computation graph and the dynamic shape has a value range, so the method proposed by the embodiment of the present application can be applied to accelerate execution of the corresponding computation graph.
(3) The implementation main body of the method according to the embodiment of the present application is not limited, for example, the method according to the embodiment of the present application may be implemented by a computer device, which may be a terminal or a server, or the method according to the embodiment of the present application may be implemented by both the terminal and the server, which is not limited. The terminal may be a smart phone, a computer (such as a tablet computer, a notebook computer, a desktop computer, etc.), an intelligent wearable device (such as a smart watch, smart glasses), an intelligent voice interaction device, an intelligent household appliance (such as a smart television), a vehicle-mounted terminal or an aircraft, etc., the server may be an independent physical server, may also be a server cluster or a distributed system formed by a plurality of physical servers, and may also be a cloud server for providing cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, CDNs (Content Delivery Network, content distribution networks), and basic cloud computing services such as big data and artificial intelligent platforms, etc.
(4) In the embodiment of the present application, if related data such as user information is involved, when any of the method embodiments provided in the embodiment of the present application is applied to a specific product or technology, the related data is collected under the condition of obtaining user permission or consent, and the collection, use and processing of the related data complies with the related laws and regulations and standards of the related region.
In the embodiment of the present application, a specific implementation procedure of a processing method of a computing chart provided by an embodiment of the present application is described by taking a computer device to execute the processing method of the computing chart as an example. Referring to fig. 2, the processing method of the calculation map may generally include S201-S205:
S201, obtaining a calculation map.
In the embodiment of the present application, the calculation map obtained in step S201 specifically refers to a dynamic shape map (i.e., a dynamic calculation map). The computational graph may include a plurality of nodes, and the plurality of nodes may include, in particular, P input nodes, R operator nodes, and at least one output node, where P and R are positive integers, where one input node represents one dynamic input of the computational graph, one operator node represents one operator in the computational graph, and one output node represents one output of the computational graph.
It can be seen that the computational graph referred to in embodiments of the present application may include P dynamic inputs. It will be appreciated that the computation graph may include a total of Q input nodes (Q is a positive integer), and that the above-mentioned P input nodes representing P dynamic inputs belong to all or part of the Q input nodes, i.e. the computation graph may include a total of Q inputs (Q is a positive integer), where p=q (i.e. the value of P is equal to the total number of inputs included in the computation graph) if each of the Q inputs is a dynamic input, or where P < Q (the value of P is less than the total number of inputs included in the computation graph) if there are some of the Q inputs included in the computation graph that are dynamic inputs and some of the other inputs that are static inputs.
Each dynamic input has a dynamic shape, the dynamic shape of any dynamic input may include values of N dimensions, and dynamic dimensions (i.e., dimensions in which the values support dynamic changes) exist in the N dimensions, where N is a positive integer. It is understood that the number of dimensions (i.e., the value of N) in the dynamic shape of the different dynamic inputs may be the same or different, which is not limited. For example, the 1 st dynamic shape includes a value of 3 dimensions (i.e., the number of dimensions in the 1 st dynamic shape is 3), and the 2 nd dynamic shape includes a value of 1 dimension (i.e., the number of dimensions in the 2 nd dynamic shape is 1, in which case the number of dimensions in the 1 st dynamic input is different from the number of dimensions in the 2 nd dynamic input).
And, each dynamic shape has a range of values. Specifically, the value range of the dynamic shape can be the value range of the dynamic dimension in the dynamic shape, for example, the dynamic shape is < -1x10xf32>, wherein the dynamic dimension is the 1 st dimension and the value range of the 1 st dimension is 0-100, and the value range of the dynamic shape is 0-100. Or the range of values of the dynamic shape may be a range formed by adopting a plurality of preset static shapes, wherein any static shape in the range is obtained by setting the value of the dynamic dimension in the dynamic shape as a preset value, for example, the dynamic shape is still set to be < -1x10xf32>, the preset value may comprise 1, 5, 8, 10 and the like, and the range of values of the dynamic shape may comprise the static shapes of <1x10xf32>, <5x10xf32>, <8x10xf32>, <10x10xf32>, and the like.
S202, obtaining the barrel mark of the calculation map.
For a given computational graph (dynamic shape graph), in order to accelerate execution of the computational graph by using a static bucket compiling manner, a bucket mark of the computational graph needs to be acquired, so that a bucket manner of a value range of each dynamic shape of each dynamic input of the computational graph is known through the bucket mark, so that subsequent bucket processing can be accurately performed on the value range of each dynamic shape based on the corresponding bucket manner.
In a specific implementation, the embodiment of the application provides a shape group barrel marking method, which is used for simply and efficiently marking how to barrel the value range of each dynamically input dynamic shape of a dynamic shape graph. Specifically, the principle of the barrel marking method is approximately as follows:
and s11, grouping the dynamic inputs based on the value ranges of the dynamic shapes of the dynamic inputs to obtain the group to which the dynamic inputs belong. Specifically, the dynamic inputs corresponding to the dynamic shapes with the same value range can be divided into the same group, so as to obtain the group to which each dynamic input belongs.
And s12, respectively configuring a barrel dividing mode for each group. Specifically, for any group, a bucket-splitting manner may be configured for the group based on the service requirement or the experience value, and the bucket-splitting manner may be, for example, an exponential bucket-splitting manner based on a first value (e.g., a value of 2), a multiple bucket-splitting manner based on a second value (e.g., a value of 5), and so on. Or, a barrel dividing mode is configured for the corresponding group based on the historical value distribution condition of the dynamic shapes of the dynamic inputs in the group, so that after the subsequent barrel dividing processing is performed on the value range of the dynamic shapes of the corresponding dynamic inputs based on the barrel dividing mode, the obtained values in the barrels are the values frequently appearing in the history, and the static shapes determined based on the values in the barrels are the shapes frequently appearing in the history, and the maximum probability of the shapes is the same as or similar to the shape of the actual input data of the calculation graph, so that the operations of expanding the input data due to the shape difference and the like can be reduced to a certain extent, the processing resources of the processor are saved, and the performance of the processor is improved.
And s13, generating a barrel mark of the calculation map based on the group to which each dynamic input belongs and the barrel dividing mode of each group. Specifically, the corresponding dynamic input can be marked based on the group to which each dynamic input belongs to respectively to obtain the group mark of the corresponding dynamic input, and the corresponding group can be marked based on the bucket division mode of each group to obtain the attribute mark of the corresponding group, so that the bucket division mark of the calculation graph is constructed by adopting the group mark of each dynamic input and the attribute mark of each group.
Based on the description of s11-s13, it can be known that the bucket label generated by the bucket label method of shape group can be used for indicating each group to which each dynamic input belongs and the bucket mode of each group, and specifically can comprise the group label of each dynamic input and the attribute label of at least one group. Wherein, the dynamic shapes of each dynamic input in the same group have the same value range, and the value ranges of the dynamic shapes of each dynamic input are subjected to barrel separation processing according to the barrel separation mode of the corresponding group. Therefore, the embodiment of the application can realize that the barrel division mode of the value range of each dynamic shape input in the same group can record by adopting one piece of information, so that the barrel division mode of taking some memory to record the value range of each dynamic shape input respectively can be avoided, and the memory of a processor can be effectively saved.
The group to which the dynamic input belongs may refer to a group to which a dynamic dimension in a dynamic shape of the dynamic input belongs, in which case the group mark of the dynamic input is used to indicate the dynamic dimension in the dynamic shape of the dynamic input and the group to which the corresponding dynamic dimension belongs. For example, the group label of the dynamic input may include, but is not limited to, at least one of a node identification (e.g., node name) for representing the input node of the dynamic input, output information of the input node, a dynamic shape, and a group to which the dynamic dimension in the dynamic shape belongs, and the like. Illustratively, the dynamically entered group designation may be represented as follows:
{name:NODE_NAME,index:0,shape:{-1,10},shape_group:{0,-1}}
name, a field representing the node name, for storing the node name of the input node.
Index, which indicates the output sequence number field for storing the sequence number of the output of the input node to indicate the number of outputs of the input node by the stored value, and index 0 indicates the 1 st output.
The shape field is used for storing shape output by the input node, wherein-1 indicates that the value of the dimension where the shape is located is dynamically changed, and other values indicate that the value of the dimension where the shape is located is statically unchanged. It can be understood that, for a computational graph, the data output by the input node is the input data of the computational graph, so that the shape output by the input node is the shape input by the computational graph.
The group field is used for storing grouping information of the shape, and specifically comprises grouping indication values of each dimension in the shape, wherein when the grouping indication value of any dimension is a specified value (such as-1), the dimension is not grouped, when the grouping indication information of any dimension is a non-specified value (namely, a value except the specified value), the grouping information of the dimension is represented, and if the grouping indication information is 0, the corresponding dimension can be represented as belonging to the 0 th group.
In addition, one group corresponds to one value range, and the attribute mark of the group is used for indicating the value range corresponding to the group and the barrel division mode of the group. For example, the attribute flags for the groups may include, but are not limited to, at least one of group, bucket type (e.g., bucket algorithm and data required by the bucket algorithm), maximum and minimum values of a range of values, and the like. Illustratively, the group's attribute tags may be represented as follows:
{shape_group:0,algo:exponent,value:2,min:1,max:100}
shape_group, a group field for storing a group to indicate what group this is, through the stored group;
algo, an algorithm field, which is used for storing a barrel dividing algorithm, and exponent, which is used for indicating an index barrel dividing algorithm;
value represents the algorithm basis field for storing the data required by the bucket algorithm, e.g., 2 stored therein represents the index bucket algorithm with the base of 2.
Max_shape, a field representing the maximum value of the range, for storing the maximum value of the range of values, thereby indicating the maximum value of shape.
Min_shape, a field representing the maximum value of the range, is used for storing the minimum value of the value range, thereby indicating the minimum value of shape.
Based on the above description, for example, the computation graph may include the following input nodes:
input0:<-1x10x16xf32>
input1:<-1x2xf32>
input2:<-1xf32>
Then, the bucket label of the computational graph may be as follows:
{name:input0,index:0,shape:{-1,10,16},shape_group:{0,-1,-1}}
{name:input1,index:0,shape:{-1,2},shape_group:{0,-1}}
{name:input2,index:0,shape:{-1},shape_group:{0}}
{shape_group:0,algo:exponent,value:2,min:1,max:100}
From the above-mentioned bucket label, it can be known that the 1 st dimension in the shape of input0 (i.e., the 0 th input node), the 1 st dimension in the shape of input1 (i.e., the 1 st input node), and the 1 st dimension in the shape of input2 (i.e., the 2 nd input node) are all in the 0 th group. Based on the attribute mark of the 0 th group, the minimum value of the 1 st dimension value range in the shape of each input node is 1, the maximum value is 100, namely the value range is 1-100, and each value range is subjected to barrel division processing according to an index barrel division algorithm based on 2, namely barrel division is performed according to 1,2,4,8,16,32,64,100.
It should be noted that, in the above-mentioned sub-bucket label, the sub-bucket algorithm may be configured by itself according to the need, which is not limited in the embodiment of the present application. In addition, the above description is merely illustrative, and not restrictive, and in other embodiments, each of the group marks dynamically input in the bucket mark may include a corresponding value range of dynamic input, and in this case, the attribute mark of the group may not need to indicate the value range corresponding to the group.
S203, according to the indication of the barrel dividing mark, barrel dividing processing is carried out on the value ranges of the dynamic shapes of the P dynamic inputs, and a plurality of barrels are obtained.
In a specific implementation, the computer device can determine a barrel dividing mode of a value range of a dynamic shape of the P-th dynamic input according to an indication of a barrel dividing mark aiming at the P-th dynamic input (P epsilon [1, P ]), so as to perform barrel dividing processing on the value range of the dynamic shape of the P-th dynamic input based on the determined barrel dividing mode to obtain a plurality of values corresponding to the P-th dynamic input, and after obtaining a plurality of values corresponding to each dynamic input based on the principle, the plurality of values corresponding to the P-th dynamic input can be combined to obtain a plurality of barrels.
Wherein each bucket includes P values, different ones of the P values being selected from different ranges of values. For example, referring to fig. 3, a total of 2 dynamic inputs are shown, the value range of the dynamic shape of the 1 st dynamic input is subjected to barrel dividing processing to obtain two values of 1 and 2, the value range of the dynamic shape of the 2 nd dynamic input is subjected to barrel dividing processing to obtain two values of 4 and 8, and then 4 barrels can be obtained by combining the values corresponding to different dynamic inputs, wherein the 1 st dynamic input comprises 1 and 4 numerical values, the 2 nd dynamic input comprises 1 numerical value and 8 numerical value, the 3 st dynamic input comprises 2 numerical value and 4 numerical value, and the 4 th dynamic input comprises 2 numerical value and 8 numerical value.
S204, performing static compiling on the calculation graph based on the numerical value in each bucket to obtain a compiling result corresponding to each bucket.
In one embodiment, if the range of values of each dynamic shape input dynamically is a range formed by a plurality of preset static shapes, any value in each barrel obtained in this case is a static shape. Then the computer device may directly compile the computation graph based on the static shapes in each bucket, respectively, to obtain a compiling result corresponding to each bucket when executing S204. It will be appreciated that the embodiment of the present application does not limit the specific manner of compiling the computation graph based on the static shape, and for example, an existing compiler may be used to compile the computation graph based on the static shape.
In another implementation, if the value range of each dynamic shape of the dynamic input is the value range of the dynamic dimension in the corresponding dynamic shape, then any value in each bucket obtained in this case is the specific value of the dynamic dimension. The computer device may traverse a plurality of buckets while executing S204, determine a kth bucket currently traversed, k being a positive integer and less than or equal to a total number of buckets, obtain a static shape of the kth bucket by taking each value in the kth bucket as a value of a corresponding dynamically inputted dynamic shape, wherein when a P-th value in the kth bucket is selected from a range of values of the P-th dynamically inputted dynamic shape, the P-th value corresponds to the P-th dynamic input, P is [1, P ], and the P-th value is taken as the determined shape after the P-th dynamically inputted dynamic shape is taken as the P-th dynamically inputted dynamic shape, which may be referred to as the P-th dynamically inputted static shape, whereby it can be seen that the static shape of the kth bucket includes one static shape of each of the P dynamic inputs. Further, after the buckets are traversed, the calculation map may be statically compiled based on the static shape of each bucket, to obtain a compiling result corresponding to each bucket.
It should be noted that, if the input of the computation graph includes at least one static input (i.e., an input having a static shape) in addition to P dynamic inputs, the computer device may further add the static shape of each static input to the static shape of each bucket, so that the static shape of each bucket includes not only one static shape of each dynamic input, but also the static shape of each static input, so as to promote the comprehensiveness of the static shape of each bucket, and accurately perform static compiling on the computation graph based on the static shape of each bucket, to obtain a compiling result corresponding to each bucket.
S205, after the input data of the calculation graph is obtained, the calculation graph is executed on the input data based on the compiling results corresponding to the buckets, and the data calculation result of the input data is obtained.
As can be seen from the foregoing description, each bucket has a static shape. Also, the difference between the static shape of any bucket and the shape of the input data is that the values of the dynamic dimensions are different, i.e., the values in the static dimensions are the same between the static shape of any bucket and the shape of the input data. For example, the dynamic dimension is the 1 st dimension, then the static shape of any bucket may be <4x10x16xf32>, <8x10x16xf32>, etc., while the shape of the input data may be <5x10x16xf32 >.
Based on this, the computer device may select a target bucket from the plurality of buckets based on the shape of the input data when executing S205, the static shape of the target bucket being greater than or equal to the shape of the input data, wherein the static shape of the target bucket being greater than or equal to the shape of the input data means that the value of the static shape of the target bucket in the dynamic dimension is greater than or equal to the value of the input data in the dynamic dimension. Further, when the static shape of the target bucket is equal to or greater than the shape of the input data, it indicates that the compiling result corresponding to the target bucket can identify and process the input data, so that the computing graph can be directly executed on the input data based on the compiling result corresponding to the target bucket (i.e. the compiling result corresponding to the target bucket is directly executed by using the input data), thereby obtaining the data computing result of the input data.
When the static shape of the target bucket is larger than the shape of the input data, the shape of the input data is not matched with the shape of the data which is supported to be identified and processed by the compiling result corresponding to the target bucket, so that the input data can be expanded to obtain expanded data (namely, the expanded input data), the shape of the expanded data is the same as the static shape in the target bucket, a calculation graph is executed on the expanded data based on the compiling result corresponding to the target bucket, the data calculation result of the expanded data is obtained, and the reference information for shape prediction is obtained, wherein the reference information is generated based on the calculation logic of the calculation graph, and then the data calculation result of the expanded data is subjected to data extraction based on the target output shape, so that the data calculation result corresponding to the input data is obtained.
The specific implementation mode of the step of executing the calculation graph on the extended data based on the compiling result corresponding to the target bucket includes executing the compiling result corresponding to the target bucket by using the extended data. Therefore, when the static shape of the target bucket is larger than the shape of the input data, the input data is expanded to obtain expanded data with the shape matched with the static shape of the target bucket, so that the expanded data is used for executing the compiling result corresponding to the target bucket, and the successful execution of the target compiling result can be ensured. And the shape of the data calculation result corresponding to the input data is predicted, so that the data extraction is performed on the data calculation result of the expanded data based on the predicted shape, and the accuracy of the data calculation result corresponding to the finally obtained input data can be ensured.
It can be understood that in the above implementation, although the input data is expanded, the additional redundancy operation is caused, and experiments prove that the static sub-bucket compiling execution efficiency can still exceed the execution efficiency of the dynamic shape graph through reasonable sub-buckets.
The embodiment of the application can acquire the calculation graphs of P dynamic inputs and the barrel dividing marks of the calculation graphs, wherein the barrel dividing marks are used for indicating groups to which each dynamic input belongs and barrel dividing modes of each group, the dynamic shapes of the dynamic inputs in the same group have the same value ranges, and the value ranges of the dynamic shapes of the dynamic inputs are processed in barrel dividing modes of the corresponding groups. Furthermore, before executing the calculation graph, the calculation graph can be subjected to barrel division processing according to the indication of barrel division marks, a plurality of barrels are obtained, the calculation graph is subjected to static compiling based on the numerical value in each barrel, so that after acquiring the input data of the calculation graph, the calculation graph is executed on the input data based on the compiling results corresponding to the barrels, and the data calculation result of the input data is obtained.
Based on the related description of the method embodiment shown in fig. 2, another processing method of the calculation map is provided in the embodiment of the present application, and in the embodiment of the present application, the description is still made using the computer device as the execution body.
Referring to fig. 4, the processing method of the calculation map may generally include S401-S408:
s401, acquiring a calculation graph and acquiring a barrel mark of the calculation graph.
Wherein the computational graph may include a plurality of nodes, each node having respective data processing logic, and the data processing logic of the plurality of nodes constitutes the computational logic of the computational graph. The data processing logic of the input nodes can be logic of inputting data to the computational graph by the input nodes, the data processing logic of the operator nodes can be data operation logic adopted by the operator nodes, and the data processing logic of the output nodes can be logic of outputting data from the computational graph. Wherein an input node represents an input, an operator node represents an operator, and an output node represents an output, based on which the computer device may comprise Q inputs, R operators, and at least one output. And, there are P dynamic inputs in Q inputs, each dynamic input has a dynamic shape, each dynamic shape has a value range, P E [1, Q ].
The barrel dividing mark of the calculation graph is used for indicating each dynamic input group and the barrel dividing mode of each group, wherein the dynamic shapes of the dynamic inputs in the same group have the same value range, and the value ranges of the dynamic shapes of the dynamic inputs are subjected to barrel dividing processing according to the barrel dividing mode of the corresponding group. It can be understood that the generation manner of the bucket label of the calculation map can be referred to the related description of the foregoing method embodiment, and will not be repeated herein.
Optionally, after taking into account the bucket of the value range of the dynamic shape of each dynamic input of the computation graph and performing static compiling on the computation graph based on the value in each bucket, when executing the computation graph on the actual input data based on the compiling results corresponding to a plurality of buckets, there may be an operation of expanding the input data, and if there may be some arithmetic logic of the operators in the computation graph, the accuracy of the final data computation result of the input data may be affected due to the expanded operation. To avoid this problem, it is considered that the class of computation graphs do not have the partitionability (i.e., the property that the computation graph supports the partitioning of the range of values of the dynamic shape of the dynamic input), thereby prohibiting the execution of the partitioning process on the class of computation graphs. Based on this, the computer device can detect whether the calculated graph has the partitionability after acquiring the calculated graph, so as to execute the partitioning processing in the case that the calculated graph is detected to have the partitionability, that is, the partitioning processing in the embodiment of the application is executed in the case that the calculated graph is detected to have the partitionability.
The determining whether a computation graph has the partitionability (i.e. whether the computation graph can be executed by using the manner of the static compiling of the partitioner) may be implemented by the following method:
Assume that the dynamic input of a computational graph is < -1x10xf32>, wherein the dimension in which-1 is located (i.e., the 1 st dimension) is the dynamic dimension, and the range of values of the dynamic dimension is [1,100]. If the shape of the input data of the computation graph is <23x10xf32> and the input data is executed using the compiling result corresponding to the bucket having the static shape <100x10xf32>, in one execution, it is necessary to expand the input data using 77 preset data elements in the 1 st dimension so that the shape of the expanded input data is <100x10xf32>, i.e., so that the expanded input data has 100 data elements in the 1 st dimension, as shown in fig. 5 a. In this case, of the 100 data elements of the expanded input data in the 1 st dimension, only the first 23 data elements (i.e. the data elements in the solid line box) are meaningful inputs, while the last 77 data elements (i.e. the data elements in the dashed line box) are meaningless, then, for this expanded input data, if during execution of the computation graph, each operator in the computation graph can satisfy the following conditions:
(1) Any data element in the solid line frame only performs operation with other data elements in the solid line frame, any data element in the dashed line frame only performs operation with other data elements in the dashed line frame (i.e. the data element in the dashed line frame (i.e. the preset data element) does not participate in the calculation of the data element in the solid line frame (i.e. the data element in the input data), that is, the calculation operation for the data element in the dashed line frame has no relevance to the calculation operation for the data element in the solid line frame, and the two calculation operations are independent of each other), for example:
add input [0] [1], input [0] [9]. The method is characterized in that the 1 st data element in the 2 nd dimension in the input data is added with the 1 st data element in the 2 nd dimension in the input data, and when the operator is used for calculating the expanded data, the preset data element used for expansion in the 1 st dimension does not participate in the calculation process of the input data, namely the calculation operation for the preset data element and the calculation operation for the input data are mutually independent.
Reduce input, axis=1. The method is characterized in that the method indicates that the data elements in the 2 nd dimension in the input data are reduced (e.g. accumulated), and when the operator is used for calculating the expanded data, the preset data elements used for expansion in the 1 st dimension do not participate in the calculation process of the input data, namely the calculation operation for the preset data elements and the calculation operation for the input data are mutually independent.
(2) The operation process does not involve extended shape (i.e. after the input data is extended by using the preset data element, the operator does not involve a calculation operation for the preset data element when calculating the extended input data), for example:
average= (input [0] [0] +input [1] + ]. Sup.1..sup. + input [ g-1] [0 ])/g. The method comprises the steps of representing that average value operation is carried out on the first g data elements in the 1 st dimension in input data, wherein the value of g refers to the number of the data elements in the 1 st dimension in the input data, if the shape of the input data is <23x10xf32>, g=23, and when the operator is used for calculating the expanded data, the operation process does not involve calculation operation on preset data elements.
It can be seen that if the operator satisfies the above condition, then for the input data whose original input shape is <23x10xf32>, whether it is extended in the 1 st dimension will not affect its final calculation result. Only after expansion, some redundant data is added to its result. After removing these redundant data, the results are consistent with the direct execution of the dynamic shape graph.
Based on the above, each operator in the computational graph can be analyzed to determine whether the execution of each operator meets the above conditions in the static sub-bucket compiling scene, so that it can be automatically determined whether one computational graph can be executed by the static sub-bucket compiling mode (i.e. whether one computational graph has sub-barrelability).
The following is exemplified by several operators:
① Matmul (matrix multiplication) this operator has two inputs (i.e. lhs and rhs) whose shape is < AxB > < BxC >, where B is called the switchable dim. According to the semantics of Matmul operators, all data elements of the dimension where the connecting dim is located are multiplied and accumulated. If expansion is performed in this one dimension, the data elements used for expansion are caused to participate in the operation process of the input data, and thus, the data calculation result of the input data is caused to be wrong. Thus, for the Matmul operator, neither of its two inputs of the connecting dim can be binned, i.e. the operator does not have a binning condition. Also similar are general matmul, batching matmul operators, etc.
② Reduce operation the operator performs Reduce operation on a dimension of the input. For example, reduce_ suminput <100x10>,1. And performing accumulation operation of data elements in the 2 nd dimension in the input data, specifically accumulating 10 data elements corresponding to each data element in the 1 st dimension in the 2 nd dimension to obtain a data calculation result, wherein the shape of the data calculation result is <100>, and for such a reduction operator, determining that static sub-buckets cannot be performed in the dimension in which the reduction operation is performed, thereby determining that the operator does not have a sub-bucket condition.
③ The Gather is that the operator mainly has two parameters, input and index, and its logic is to select proper data elements from the inputs to fill into the data calculation result mainly by the values of the index parameters, and it can be seen that when the operator is used to calculate the data expanded based on the preset data elements, the calculation operation (index search) for the preset data elements may be involved, so that the operator cannot perform static binning, and thus it is determined that the operator does not have the binning condition.
Based on the above description, the input data includes at least one data element, the computation graph records at least one operator (i.e., the operators represented by the R operator nodes), and the manner of detecting whether the computation graph has the partitionability can be summarized as including the following steps s21-s24:
s21, traversing each operator recorded by the computation graph, and taking the currently traversed operator as the current operator.
S22, predicting whether the preset data elements influence the original calculation result of the input data or not when the preset data elements are used for calculating the expanded input data after the input data are expanded by the preset data elements according to the data processing logic of the current operator, wherein the original calculation result is the result obtained by calculating the input data by the current operator. Specifically, according to the data processing logic of the current operator, it is predicted whether a calculation operation for the preset data element exists when the input data is expanded by using the preset data element, if the calculation operation for the preset data element does not exist, it is determined that the preset data element does not affect the original calculation result of the input data, if the calculation operation for the preset data element exists, according to the data processing logic of the current operator, the calculation relevance between the preset data element and the input data (namely, the relevance between the calculation operation for the preset data element and the calculation result for the input data) is detected, if the calculation relevance is detected, it is determined that the preset data element affects the original calculation result of the input data, and if the calculation relevance is not detected, it is determined that the preset data element does not affect the original calculation result of the input data.
S23, if the preset data elements influence the original calculation result of the input data, determining that the current operator does not meet the bucket dividing condition, and if the preset data elements do not influence the original calculation result of the input data, determining that the current operator meets the bucket dividing condition.
S24, after each operator traverses, if each operator meets the bucket dividing condition, determining that the calculated graph has the bucket dividing property, and if at least one operator does not meet the bucket dividing condition, determining that the calculated graph does not have the bucket dividing property.
Based on the above description of s21-s24, the embodiment of the application can automatically judge whether the given calculation graph has the barrelability or not, so as to determine whether the calculation graph can be executed in a static barrel-dividing compiling mode, thereby effectively reducing the manual workload and improving the adaptation efficiency of the calculation graph.
S402, according to the indication of the barrel dividing mark, barrel dividing processing is carried out on the value ranges of the dynamic shapes of the P dynamic inputs, and a plurality of barrels are obtained.
S403, performing static compiling on the calculation graph based on the numerical value in each bucket to obtain a compiling result corresponding to each bucket.
In specific implementation, the specific implementation of steps S402-S403 may refer to the related descriptions of steps S203-S204 in the foregoing method example, which are not repeated herein. Also, as can be seen from the foregoing description of steps S203-S204, each bucket has a static shape, and the static shape of any bucket may be a set of static shapes that take respective inputs, i.e., the static shape of any bucket includes the static shapes of Q inputs. For example, if the computation graph includes 2 inputs, then the static shape of any bucket includes the static shape of the 1 st input <4x1280xf32> and one static shape of the 2 nd input <8x1280xf32>.
S404, after the input data of the calculation graph is acquired, selecting a target barrel from a plurality of barrels based on the shape of the input data, wherein the static shape in the target barrel is larger than or equal to the shape of the input data.
Wherein the computational graph may include Q inputs, one input for inputting one data to the computational graph, so the number of input data of the computational graph is Q. It will be appreciated that when Q is greater than 1, then each of the Q input data has a respective shape and the shape of the Q input data matches the shape of the Q input, e.g., the computation graph includes 2 inputs and each input is a dynamic input, the dynamic shape of each dynamic input is in turn < -1x1280xf32> and < -1x1280xf32>, then there are 2 input data of the computation graph, the shape of the 1 st input data may be <3x1280xf32> and the shape of the input data corresponding to the 2 nd input may be <6x1280xf32>. Any input data may include, but is not limited to, images, text, speech, etc.
Since the static shape of any bucket differs from the shape of the input data in terms of the magnitude of the dynamic dimension, the computer device may determine the magnitude relationship between the respective bucket and the input data by the magnitude between the static shape of each bucket and the magnitude of the shape of the input data in the dynamic dimension, thereby selecting the target bucket from the plurality of buckets based on the magnitude relationship such that the static shape of the target bucket is greater than or equal to the shape of the input data. Specifically, the computer device may determine, based on the size relationship, each bucket having a static shape greater than or equal to the shape of the input data from among the plurality of buckets, and randomly select one bucket from among the screened buckets as the target bucket. Or selecting the barrel with the smallest static shape from the screened barrels as the target barrel, so that the static shape of the target barrel is closest to the shape of the input data, thus the subsequent expansion operation on the input data can be reduced, the processing resource of the processor is saved, and the running efficiency of the processor is improved.
It is understood that when the computation graph includes 1 input, the static shape of any bucket includes 1 input static shape and the shape of the input data is also 1, in which case, the static shape of the target bucket being greater than or equal to the shape of the input data means that the value in the dynamic dimension in the static shape of the target bucket is greater than or equal to the value in the dynamic dimension in the input data. When the calculation graph comprises at least two inputs, the static shape of any barrel comprises at least two input static shapes and the shape of input data is at least two, and in this case, the static shape of the target barrel is larger than or equal to the shape of the input data, namely, the value of any dynamic input static shape contained in the target barrel in the dynamic dimension is larger than or equal to the value in the dynamic dimension in the static shape of the input data corresponding to the corresponding dynamic input.
For example, the computation graph includes 2 dynamic inputs, the static shapes of the 2 dynamic inputs in bucket 1 are <6x10x80xf32> and <10x1280xf32> in sequence, the static shapes in bucket 2 are <10x10x80xf32> and <10x1280xf32> in sequence, and if the shapes of the 2 input data of the computation graph are <7x10x80xf32> and <8x1280xf32> in sequence, the bucket 2 can be used as a target bucket because the value in the dynamic dimension (1 st dimension) of the 1 st dynamic input static shape included in bucket 2 is larger than the value in the dynamic dimension (1 st dimension) of the 1 st dynamic input static shape included in bucket 2 is also larger than the value in the dynamic dimension (2 nd dimension) of the static shape of the 2 nd dynamic input data.
S405, when the static shape of the target barrel is larger than the shape of the input data, the input data is expanded to obtain expanded data, and the shape of the expanded data is the same as the static shape in the target barrel.
In a specific implementation, if the static shape of the target bucket is greater than the shape of the input data, the computer device may take the value of the static shape of the target bucket in the dynamic dimension as the first value and the value of the shape of the input data in the dynamic dimension as the second value, so that the difference between the first value and the second value is taken as the target number, and further, the input data is expanded by adopting the preset data elements of the target number in the dynamic dimension to obtain expanded data, so that the value of the shape of the expanded data in the dynamic dimension is the same as the first value. For example, the static shape of the target bucket is <10x1280xf32>, and the shape of the input data is <8x1280xf32>, the first value is 10, the second value is 8, and the difference between the first value and the second value is 2, so that the input data can be expanded with 2 preset data elements in the 1 st dimension (i.e. the dynamic dimension) to obtain expanded data, and the value of the shape of the expanded data in the 1 st dimension is 10, even if the expanded data has 10 elements in the 1 st dimension.
It will be appreciated that, the above description is given by taking the example that the static shape of the target bucket includes 1 input static shape and the number of input data is 1, and when the static shape of the target bucket includes at least two input static shapes and the number of input data is at least two, the computer device may expand the input data corresponding to each input by adopting the above manner for each input, so as to obtain expanded data.
S406, executing a calculation graph on the extended data based on the compiling result corresponding to the target bucket, and obtaining a data calculation result of the extended data.
S407, acquiring reference information for shape prediction, and predicting a target output shape, which is a shape of a data calculation result corresponding to the input data, from the shape of the input data and the reference information.
In a specific implementation, since the extended data is obtained by extending the input data, it is necessary to acquire reference information for shape prediction, and predict a target output shape according to the shape of the input data and the reference information, so as to determine which data is calculated based on the input data and which data is calculated based on preset data elements for extension in the data calculation result of the extended data based on the target output shape. For example, referring to fig. 5b, after the input data is expanded from <23xf32> to <100xf32> in the 1 st dimension and a calculation map is performed on the expanded data (i.e., the expanded input data) based on the compiling result corresponding to the target bucket, the shape of the data calculation result of the obtained expanded data is <200xf32>, and it is required to determine which data is calculated based on the actual input data and which data is calculated based on the preset data elements in the data calculation result of the expanded data, so that the data calculation result of the input data is accurately extracted.
Wherein the above mentioned reference information is generated based on computational logic of the computational graph, the computational logic comprising data processing logic of the individual nodes in the computational graph.
In one particular implementation, the reference information may include computational logic of a computational graph. In this case, when the computer device predicts the target output shape according to the shape of the input data and the reference information, a shape prediction model trained in advance based on the AI technique may be acquired, so that the shape prediction model is invoked to perform output shape prediction according to the shape of the input data and the calculation logic of the calculation map, and the target output shape is obtained.
In another implementation, the reference information may include a target shape calculation formula, a variable in the target shape calculation formula is a numerical value in a shape of the input data, and a value of the target shape calculation formula represents a value of a t-th dimension in the target output shape, where t is a positive integer and is less than or equal to a dimension number in the target output shape. In this case, when predicting the target output shape from the shape of the input data and the reference information, the computer apparatus may invoke a target shape calculation formula, calculate a value of a t-th dimension in the target output shape based on the value in the shape of the input data, obtain a target value, and acquire values of respective dimensions other than the t-th dimension from the shape of the data calculation result of the expanded data, thereby integrating the acquired values of the respective dimensions with the target value, and obtain the target output shape.
In one embodiment, if the computational graph includes only one operator node, the computer device may generate the target shape calculation formula directly based on the data processing logic of the operator represented by the operator node.
For example, the inputs to the computational graph are:
Input0:<-1xf32>
Input1:<-1x10xf32>
The output is:
Output0:<-1xf32>
Output1:<-1x30xf32>
if the data processing logic of the operator in the calculation graph includes directly outputting the Input data corresponding to the 0 th Input (Input 0) and performing 2 times multiplication on the data element in the 1 st dimension in the Input data corresponding to the 1 st Input (Input 1), then based on the data processing logic, it can be known that the shape of the 0 th Output (Output 0) is equal to the value of the shape of the 0 th Input in each dimension, and the value size relationship between the shape of the 1 st Output (Output 1) and the shape of the 1 st Input in the 1 st dimension is a two times relationship.
Based on this, the generated target shape calculation formula may be as follows:
Output_0_dim_0=input_0_dim_0
Output_1_dim_0=input_1_dim_1*2
The meaning of the formula is that the dynamic shape graph has two outputs, each of which contains a dynamic dim (i.e., a value in the dynamic dimension). The formula will use the dynamic dim in the input to calculate the dynamic dim of the output. And it can be seen from the formula that the value of dim0 of Output0 (i.e., the value of the shape of Output0 in dimension 1) and the value of dim0 of Input0 (i.e., the value of the shape of Intput0 in dimension 1) are equal, while the value of dim0 of Output1 (i.e., the value of the shape of Output1 in dimension 1) is twice the value of dim0 of Input1 (i.e., the value of the shape of Input1 in dimension 1).
In another embodiment, the computational graph comprises a plurality of nodes, and the plurality of nodes can specifically comprise P input nodes, R operator nodes, output nodes and the like. The process of generating the target shape calculation based on the computational graph's arithmetic logic (i.e., the data processing logic of the plurality of nodes) may include s31-s33 as follows:
And s31, based on the topological structure of the computational graph, sequencing the plurality of nodes in the computational graph to obtain the topological sequence of the plurality of nodes. Specifically, the topology may include a connection relationship between each node, and when the computer device orders the plurality of nodes in the computation graph based on the topology, the computer device may obtain a topology sequence of the plurality of nodes by using an input node as a start point (i.e., a start node) and an output node as an end point (i.e., a last node).
And s32, processing the corresponding nodes according to the topological sequence of the plurality of nodes in turn according to the data processing logic of each node to obtain the shape calculation formula of each node. The variables in the shape calculation formula of any node are numerical values in the shape of the input data. Specific:
① The processing of the ith input node includes generating a shape calculation formula for the ith input node based on the variable x i and the dimension j of the dynamic dimension in the dynamic shape of the ith input node, recording the generated shape calculation formula on the output shape of the ith input node, and passing the output shape of the ith input node to the next node of the ith input node. The shape calculation formula of the ith input node is used for indicating that the value of the jth dimension in the dynamic shape of the ith input node is a variable x i, and i epsilon [1, P ] and j are positive integers. Illustratively, the shape calculation formula for the ith input node may be f (input i,j)=xi; i denotes the number of input nodes with dynamic shapes, j denotes the number of dimensions in this dynamic shape are dynamic dimensions, x i denotes the ith variable (i.e., the ith unknowns).
② The processing of the r operator node comprises the steps of deducing the output shape of the r operator node according to the data processing logic of the r operator node and the shape received by the r operator node, and transmitting the deduced output shape to the next node of the r operator node, wherein r is E [1, R ]. Wherein the output formula (i.e., the shape calculation formula in the output shape) of the r-th operator node may be changed as compared with the input formula (i.e., the received shape calculation formula) thereof, such that the output formula is equal to the input formula, or the output formula performs an arithmetic operation based on the input formula.
③ The processing of the output node comprises taking the shape received by the output node as the output shape of the output node, recording a shape calculation formula on the output shape of the output node, and storing the shape calculation formula recorded on the output shape (namely, storing a formula of dynamic dim of the output shape).
S33, the shape calculation formula of the last node is set as the target shape calculation formula. It is understood that, since each input node is the start node indicated by the topology sequence, and the output node is the last node indicated by the topology sequence, the specific implementation of step s33 is to take the shape calculation formula of the output node as the target shape calculation formula.
Based on the above description of s31-s33, the generation process of the target shape calculation formula is exemplarily described below with reference to fig. 5 c:
The design calculation graph has two dynamic inputs, namely input_1 and input_2, and the dynamic shapes of the input_1 and the input_2 are < -1x12xf32>. Then the shape calculation for input_1 may be generated as f (input 1,1)=x1 and the shape calculation for input_2 as f (input 2,1)=x2) such that the shape calculation for each dynamic input is recorded on the output shape for the corresponding dynamic input and the output shape is passed to the next node add.
Further, it is possible to continue to derive its corresponding output shape as < -1×12xf32> and obtain a corresponding shape calculation formula of f (add, 1) =broadcast (x 1,x2) from the data processing logic of the add operator and the received shape, and pass the derived output shape (carrying shape calculation formula) to the next node reshape (reshaped shape). It will be appreciated that the value in dimension 1 in the output shape may be x 1 or x 2 when the add operator is processed, and that given that the add operator here carries the semantics of broadcast, the broadcast semantics need to be added to the formula when the formula is generated.
Further, the data processing logic of reshape operators (e.g., reshaping the data element in the 1 st dimension into the data element in two dimensions) and the received shape can be continued to derive that its corresponding output shape is <2x-1x12xf32> and the corresponding shape calculation formula is f (reshape, 2) =broadcast (x 1,x2)/2, and the derived output shape (carrying shape calculation formula) is transferred to the next node transfer (transpose).
Further, the data processing logic of the transfer operator (e.g., transpose the data element in the 1 st dimension and the data element in the 2 nd dimension) and the received shape may be further used to derive that the corresponding output shape is < -1x2x12xf32> and obtain the corresponding shape calculation formula of f (transfer, 1) =broadcast (x 1,x2)/2, and transfer the derived output shape (carrying shape calculation formula) to the next node tile (concatenation).
Further, the data processing logic of the tile operator and the received shape may be further used to derive that the corresponding output shape is < -1×2x24xf32> and obtain a corresponding shape calculation formula of f (tile, 1) =broadcast (x 1,x2)/2*3, and transfer the derived output shape (carrying shape calculation formula) to the next node output (output).
Further, the shape received by the output node may be taken as the output shape of the output node, and the shape calculation formula recorded on the output shape of the output node (i.e., < -1x2x24xf32 >) may be f (output, 1) =broadcast (x 1,x2)/2*3, so that the shape calculation formula is taken as the target shape calculation formula.
Based on the above, by continuously analyzing each node, a formula containing the input dynamic dim as a variable can be accurately generated finally. In the actual implementation, only the obtained actual shape of the input data can be directly applied to the formula to calculate the output actual shape (namely the target output shape), so that the acquisition efficiency of the target output shape can be improved. It can be appreciated that if there are nodes that cannot be analyzed, the formula generation failure can be determined, in which case static bucket compilation of the computational graph may not be performed.
Based on the above description, it should also be noted that, the algorithm for generating the target shape calculation formula in the embodiment of the present application needs to derive the shape of the calculation output according to the shape input. In addition to the basic algorithm described above, special processing of part of the operators involved in shape calculation is required. Such as:
And the shape operator obtains the input shape as output data.
That is, for the processing of the normal operator, the algorithm only needs to generate the deduction formula according to the shape of the operator input and output. But when encountering the shape operator above, it passes the input shape as data to the next operator so that the shape participates in subsequent calculations as data. At this time, if the formula derivation is performed only according to the shape of the operator input and output, a large number of subsequent operators cannot generate the shape calculation formula because the shape calculation formula does not exist in the output shape. In this scenario, when encountering special calculations such as shape operator, the algorithm for generating the target shape calculation formula provided by the embodiment of the application binds the generated shape calculation formula in its input and its output data, and transmits the output data to the next node, so that the next node can deduce the shape calculation formula through the output data, and the subsequent calculation of the shape calculation formula output can also simultaneously process the shape calculation formula until encountering the operator such as reshape.
Based on this, the shape received by the r-th operator node is represented as the target shape. When the data processing logic of the r-th operator node does not include a computing operation for the target shape, the shape calculation formula in the target shape is recorded on the output shape of the r-th operator node. When the data processing logic of the r operator node comprises the calculation operation aiming at the target shape, the shape calculation formula in the target shape is recorded on the output data of the r operator node, and the processing of the r operator node further comprises the step of transmitting the output data of the r operator node to the next node of the r operator node, so that the normal transmission of the shape calculation formula in a calculation graph is realized, the follow-up nodes can be guaranteed to generate the shape calculation formula of the user, and the generation success rate and the accuracy of the final target shape calculation formula are improved.
For example, see FIG. 5d where the formula f () represents the shape calculation of the operator recorded on the output shape of the operator and f value () represents the shape calculation of the operator recorded on the output data of the operator. It can be seen from fig. 5d that:
The shape calculation formula f (add, 1) =x of the add operator is recorded on the output shape of the add operator, but the output shape of the next operator of the add operator (i.e., shape operator) is <2xi32> (where i32 represents a 32-bit integer), the shape calculation formula and any unknowns are not carried in the output shape, and the shape calculation formula (f value (shape, 1) =x) of the shape operator is recorded on the output data of the shape operator, so that the output data and the output shape of the shape operator need to be transferred to the next node slice.
Further, according to the data processing logic of the slice operator and the received output data of the shape and shape operator, deducing that the corresponding output shape is <1xi32> and obtaining a corresponding shape calculation formula of f value (slice, 1) =x, and transmitting the deduced output shape and the output data recorded with the corresponding shape calculation formula to the next node concat (splicing).
Further, according to the data processing logic of the concat operator and the received output data of the shape and slice operator, deducing that the corresponding output shape is <2xi32> and obtaining a corresponding shape calculation formula of f value (concat, 1) =x, and transmitting the deduced output shape and the output data recorded with the corresponding shape calculation formula to the next node reshape.
Further, according to the data processing logic of reshape operators and the received output data of the shape and concat operators, deducing that the corresponding output shape is < -1x6xf32> and obtaining the corresponding shape calculation formula is f (reshape, 1) =x, and transmitting the deduced output shape (recorded with the corresponding shape calculation formula) to the next node, and so on until the target shape calculation formula is obtained.
Therefore, the shape calculation formula of each operator is recorded on the output data of the operator from the beginning (including) of the shape operator to the end (not including) of the reshape operator, so that the output data and the output shape of each operator are required to be transmitted to the next node together during transmission, and the shape calculation formula of the add operator can be transmitted to the output shape of the reshape operator, thereby avoiding the problem that the shape calculation formula of the operator cannot be deduced after reshape, and further ensuring the success rate and accuracy of the deduction of the final target shape calculation formula.
S408, based on the target output shape, data extraction is carried out on the data calculation result of the expansion data, and the data calculation result corresponding to the input data is obtained.
Specifically, data corresponding to the target output shape may be extracted from the data calculation results of the expanded data as the data calculation results corresponding to the input data. For example, the target output shape is <2×10xf32>, then the first 2 data elements can be extracted in the 1 st dimension and the first 10 data elements can be extracted in the 2 nd dimension in the data calculation result of the extended data, so that the data calculation result corresponding to the input data is composed by the data elements extracted in each dimension.
Based on the above description of steps S401 to S408, the processing method of the calculation map according to the embodiment of the application may include a plurality of module contents. As shown in fig. 5e, the plurality of module contents may include, but are not limited to, input validity check (i.e., checking whether the input data is valid to ensure validity of the input data), barrelability analysis (i.e., analyzing whether the computation graph is barrelability to avoid performing optimization on the inapplicable computation graph based on a static barrelability compiling manner), output data space computation (i.e., predicting a shape of a data computation result corresponding to the input data to ensure that a correct data computation result is obtained based on the predicted shape), custom operator processing (e.g., processing dynamic output of the custom operator), barrelability mode selection (e.g., selecting a more targeted barrelability mode for each dynamic input value range based on an input distribution condition to improve performance), static barrelability compiling (i.e., barrelability is performed on each dynamic input value range using a corresponding barrelability mode, and performing static compiling on the computation graph based on values in each barrel), result extraction (i.e.e., selecting a compiling result corresponding to a suitable barrel according to the shape of the input data and performing result extraction (i.e., performing the result extraction from the compiling result of the data corresponding to the compiling result based on the shape calculated by the output data space).
Therefore, the processing method of the calculation map provided by the embodiment of the application can confirm that the input calculation map can be subjected to static barrel compiling through validity check and barrel splitting analysis (for example, the calculation of a plurality of input data can not be interfered with each other through code analysis). Further, the data relied on in the compiling process is determined through modules such as output data space calculation, barrel strategy selection and custom operator processing. In the compiling process, a plurality of compiling results are generated by carrying out static compiling for a plurality of times through the barrel values of the barrel dividing strategy, so that when the calculation diagram is actually executed, judgment can be carried out according to the shape of the input data, and the appropriate compiling result is selected to execute the corresponding input data, thereby obtaining the data calculation result of the input data.
Based on the above, the method provided by the embodiment of the application not only can support to execute the dynamic shape graph by using the static sub-bucket compiling mode, but also provides a plurality of automatic and efficient methods to ensure the correctness of the sub-bucket, so that the method can better cope with the actual production environment, and when the dynamic shape graph needs to be executed, the additional expenditure brought by directly executing the dynamic shape graph can be reduced, thereby possibly obtaining better execution efficiency. In addition, from practical application, the embodiment of the application considers a plurality of scenes such as commercialized application, test and the like, and adopts simple, convenient, automatic and high-performance design, so that the method has higher feasibility in practical production application.
Based on the descriptions of the above embodiments of the methods, the embodiments of the present application also disclose a processing apparatus of a computational graph, where the processing apparatus of the computational graph may be a computer program (including one or more instructions) running in a computer device, and the processing apparatus of the computational graph may perform the steps in any of the above methods. Referring to fig. 6, the processing device of the calculation map may operate the following units:
An obtaining unit 601, configured to obtain a computation graph, where the computation graph includes P dynamic inputs, and P is a positive integer, each dynamic input has a dynamic shape, and each dynamic shape has a value range;
The obtaining unit 601 is further configured to obtain a bucket label of the computational graph, where the bucket label is used to indicate a group to which each dynamic input belongs and a bucket manner of each group, and the dynamic shapes of each dynamic input in the same group have the same value range, and the value ranges of the dynamic shapes of each dynamic input are subjected to bucket processing according to the bucket manner of the corresponding group;
The processing unit 602 is configured to perform barrel division processing on the value ranges of the P dynamically input dynamic shapes according to the indication of the barrel division mark, so as to obtain multiple barrels, where each barrel includes P values, and different values in the P values are selected from different value ranges;
the processing unit 602 is further configured to statically compile the computation graph based on the values in each bucket, to obtain a compiling result corresponding to each bucket;
the processing unit 602 is further configured to execute, after obtaining input data of the computation graph, the computation graph on the input data based on compiling results corresponding to the multiple buckets, so as to obtain a data computation result of the input data.
In one embodiment, when the processing unit 602 is configured to statically compile the computation graph based on the value in each bucket, to obtain a compiling result corresponding to each bucket, the processing unit may be specifically configured to:
traversing the plurality of barrels, determining a kth barrel traversed currently, wherein k is a positive integer and is less than or equal to the total number of barrels;
respectively taking each numerical value in the kth barrel as the value of the dynamic shape of the corresponding dynamic input to obtain the static shape of the kth barrel, wherein when the p-th numerical value in the kth barrel is selected from the value range of the dynamic shape of the p-th dynamic input, the p-th numerical value corresponds to the p-th dynamic input, and p is E [1, P ];
After the buckets are traversed, compiling the calculation graph based on the static shape of each bucket respectively to obtain a compiling result corresponding to each bucket.
In another embodiment, each bucket has a static shape, and the processing unit 602, when configured to execute the computation graph on the input data based on the compiled results corresponding to the plurality of buckets, may be specifically configured to:
Selecting a target bucket from the plurality of buckets based on the shape of the input data, the static shape of the target bucket being greater than or equal to the shape of the input data;
When the static shape of the target barrel is larger than the shape of the input data, expanding the input data to obtain expanded data, wherein the shape of the expanded data is the same as the static shape in the target barrel;
Executing the calculation graph on the expansion data based on the compiling result corresponding to the target bucket to obtain a data calculation result of the expansion data;
The method comprises the steps of obtaining reference information for shape prediction, wherein the reference information is generated based on calculation logic of a calculation graph, and predicting a target output shape according to the shape of input data and the reference information, and the target output shape is the shape of a data calculation result corresponding to the input data;
And based on the target output shape, carrying out data extraction on the data calculation result of the expansion data to obtain the data calculation result corresponding to the input data.
In another embodiment, the reference information includes a target shape calculation formula, a variable in the target shape calculation formula is a numerical value in a shape of the input data, and a value of the target shape calculation formula represents a value of a t-th dimension in a target output shape, and t is a positive integer and is less than or equal to a dimension number in the target output shape;
accordingly, the processing unit 602, when configured to predict the target output shape according to the shape of the input data and the reference information, may be specifically configured to:
Invoking the target shape calculation formula, and calculating the value of the t dimension in the target output shape based on the value in the shape of the input data to obtain a target value;
Acquiring the numerical values of all the dimensions except the t dimension from the shape of the data calculation result of the expansion data;
And integrating the acquired values of all the dimensions with the target value to obtain the target output shape.
In another embodiment, the computational graph includes a plurality of nodes, each node having respective data processing logic, the data processing logic of the plurality of nodes constituting the computational logic of the computational graph;
accordingly, the processing unit 602 may be specifically configured to, when generating the target shape calculation formula based on the operation logic of the calculation map:
Based on the topological structure of the computational graph, sequencing a plurality of nodes in the computational graph to obtain the topological sequence of the plurality of nodes;
Processing corresponding nodes according to the topological order of the nodes in turn according to the data processing logic of each node to obtain a shape calculation formula of each node, wherein variables in the shape calculation formula of any node are numerical values in the shape of input data;
And taking the shape calculation formula of the last node as the target shape calculation formula.
In another embodiment, the binning process is performed if the computational graph is detected to be binnable;
Wherein the input data includes at least one data element, the computation graph records at least one operator, and the processing unit 602 may be specifically configured to, when detecting whether the computation graph has the partitionability:
traversing each operator recorded by the computation graph, and taking the currently traversed operator as a current operator;
Predicting whether the preset data element influences the original calculation result of the input data or not when the current operator calculates the expanded input data after expanding the input data by using the preset data element according to the data processing logic of the current operator, wherein the original calculation result is the result obtained by calculating the input data by the current operator;
if the preset data element does not influence the original calculation result of the input data, determining that the current operator meets the bucket dividing condition;
After each operator traverses, if each operator meets the bucket dividing condition, determining that the computational graph has the bucket dividing property, and if at least one operator does not meet the bucket dividing condition, determining that the computational graph does not have the bucket dividing property.
In another embodiment, the processing unit 602, when configured to predict, according to the data processing logic of the current operator, whether the preset data element affects the original calculation result of the input data when the current operator calculates the extended input data after expanding the input data with the preset data element, may be specifically configured to:
predicting whether a computing operation aiming at a preset data element exists when the current operator computes the expanded input data after expanding the input data by using the preset data element according to the data processing logic of the current operator;
and if the computing operation for the preset data element does not exist, determining that the preset data element does not influence the original computing result of the input data.
In another embodiment, the processing unit 602 may be further configured to:
If the computing operation aiming at the preset data element exists, detecting the computing relevance between the preset data element and the input data according to the data processing logic of the current operator;
if the calculation relevance is detected, determining that the preset data element influences the original calculation result of the input data;
and if the calculation relevance is not detected, determining that the preset data element does not influence the original calculation result of the input data.
According to another embodiment of the present application, each unit in the processing apparatus of the calculation chart shown in fig. 6 may be separately or completely combined into one or several other units, or some unit(s) thereof may be further split into a plurality of units with smaller functions, which may achieve the same operation without affecting the achievement of the technical effects of the embodiments of the present application. The above units are divided based on logic functions, and in practical applications, the functions of one unit may be implemented by a plurality of units, or the functions of a plurality of units may be implemented by one unit. In other embodiments of the present application, the processing device based on the computation graph may also include other units, and in practical applications, these functions may also be implemented with assistance of other units, and may be implemented by cooperation of a plurality of units.
According to another embodiment of the present application, a processing apparatus device of a computational graph as shown in fig. 6 may be constructed by running a computer program (including one or more instructions) capable of executing the steps involved in any of the methods described above on a general-purpose computing device such as a computer including a processing element such as a Central Processing Unit (CPU), a random access storage medium (RAM), a read only storage medium (ROM), and a storage element, and the methods of the embodiments of the present application may be implemented. The computer program may be recorded on, for example, a computer readable storage medium, and loaded into and executed by the computing device described above.
It should be noted that, in the embodiment of the present application, the term "module" or "unit" refers to a computer program or a part of a computer program having a predetermined function, and works together with other relevant parts to achieve a predetermined object, and may be implemented in whole or in part by using software, hardware (such as a processing circuit or a memory), or a combination thereof. Also, a processor (or multiple processors or memories) may be used to implement one or more modules or units. Furthermore, each module or unit may comprise a portion of the overall module or unit functionality of that module or unit.
The embodiment of the application can acquire the calculation graphs of P dynamic inputs and the barrel dividing marks of the calculation graphs, wherein the barrel dividing marks are used for indicating groups to which each dynamic input belongs and barrel dividing modes of each group, the dynamic shapes of the dynamic inputs in the same group have the same value ranges, and the value ranges of the dynamic shapes of the dynamic inputs are processed in barrel dividing modes of the corresponding groups. Furthermore, before executing the calculation graph, the calculation graph can be subjected to barrel division processing according to the indication of barrel division marks, a plurality of barrels are obtained, the calculation graph is subjected to static compiling based on the numerical value in each barrel, so that after acquiring the input data of the calculation graph, the calculation graph is executed on the input data based on the compiling results corresponding to the barrels, and the data calculation result of the input data is obtained.
Based on the description of the method embodiment and the device embodiment, the embodiment of the application also provides a computer device. Referring to fig. 7, the computer device includes at least a processor 701, an input interface 702, an output interface 703, and a computer storage medium 704. Wherein the processor 701, input interface 702, output interface 703, and computer storage medium 704 within a computer device may be connected by a bus or other means. The computer storage medium 704 may be stored in a memory of a computer device, the computer storage medium 704 being configured to store a computer program, the computer program comprising one or more instructions, the processor 701 being configured to execute one or more instructions of the computer program stored by the computer storage medium 704. The processor 701, or CPU (Central Processing Unit )), is a computing core and a control core of a computer device, which is adapted to implement one or more instructions, in particular to load and execute one or more instructions to implement a corresponding method flow or a corresponding function.
In one embodiment, the processor 701 of the present application may be configured to perform a series of processes on a computation graph, and specifically includes obtaining the computation graph, where the computation graph includes P dynamic inputs, P is a positive integer, each dynamic input has a dynamic shape, each dynamic shape has a value range, obtaining a bucket flag of the computation graph, where the bucket flag is used to indicate a group to which each dynamic input belongs and a bucket manner of each group, where each dynamic input dynamic shape in the same group has the same value range, and the value ranges of the dynamic inputs are subjected to bucket processing according to the bucket manners of the corresponding groups, performing bucket processing on the value ranges of the dynamic shapes of the P dynamic inputs according to the bucket flag to obtain a plurality of buckets, each bucket includes P values, different values in the P values are selected from different value ranges, performing the computation graph on the computation graph based on each bucket, and obtaining the computation result of the computation graph corresponding to the compiling result, and performing the computation graph on the compiling result.
The embodiment of the application also provides a computer storage medium (Memory), which is a Memory device in the computer device and is used for storing computer programs and data. It is understood that the computer storage media herein may include both built-in storage media in a computer device and extended storage media supported by the computer device. The computer storage media provides storage space that stores an operating system of the computer device. Also stored in the memory space is a computer program comprising one or more instructions, which may be one or more program codes, adapted to be loaded and executed by the processor 701. The computer storage medium may be a high-speed RAM memory, a non-volatile memory (non-volatile memory), such as at least one magnetic disk memory, or at least one computer storage medium located remotely from the processor.
In one embodiment, one or more instructions stored in a computer storage medium may be loaded and executed by a processor to implement the corresponding steps in any of the method embodiments described above, and in a specific implementation, the one or more instructions in the computer storage medium may be loaded and executed by the processor to:
Obtaining a calculation graph, wherein the calculation graph comprises P dynamic inputs, and P is a positive integer, each dynamic input is provided with a dynamic shape, and each dynamic shape is provided with a value range;
The method comprises the steps of obtaining a sub-bucket mark of the calculation graph, wherein the sub-bucket mark is used for indicating each dynamic input group and a sub-bucket mode of each group, the dynamic shapes of the dynamic inputs in the same group have the same value range, and the value ranges of the dynamic shapes of the dynamic inputs are subjected to sub-bucket processing according to the sub-bucket modes of the corresponding groups;
Carrying out barrel separation processing on the value ranges of the P dynamic shapes which are dynamically input according to the indication of the barrel separation mark to obtain a plurality of barrels, wherein each barrel comprises P numerical values, and different numerical values in the P numerical values are selected from different value ranges;
Performing static compiling on the calculation graph based on the numerical value in each bucket to obtain a compiling result corresponding to each bucket;
After the input data of the computation graph is obtained, the computation graph is executed on the input data based on the compiling results corresponding to the buckets, and the data computation result of the input data is obtained.
In one embodiment, when the computing graph is statically compiled based on the value in each bucket to obtain the compiling result corresponding to each bucket, the one or more instructions may be loaded and specifically executed by the processor:
traversing the plurality of barrels, determining a kth barrel traversed currently, wherein k is a positive integer and is less than or equal to the total number of barrels;
respectively taking each numerical value in the kth barrel as the value of the dynamic shape of the corresponding dynamic input to obtain the static shape of the kth barrel, wherein when the p-th numerical value in the kth barrel is selected from the value range of the dynamic shape of the p-th dynamic input, the p-th numerical value corresponds to the p-th dynamic input, and p is E [1, P ];
After the buckets are traversed, compiling the calculation graph based on the static shape of each bucket respectively to obtain a compiling result corresponding to each bucket.
In another embodiment, each bucket has a static shape, and the processing unit 602, when configured to execute the computation graph on the input data based on the compiled results corresponding to the plurality of buckets, may be specifically configured to:
Selecting a target bucket from the plurality of buckets based on the shape of the input data, the static shape of the target bucket being greater than or equal to the shape of the input data;
When the static shape of the target barrel is larger than the shape of the input data, expanding the input data to obtain expanded data, wherein the shape of the expanded data is the same as the static shape in the target barrel;
Executing the calculation graph on the expansion data based on the compiling result corresponding to the target bucket to obtain a data calculation result of the expansion data;
The method comprises the steps of obtaining reference information for shape prediction, wherein the reference information is generated based on calculation logic of a calculation graph, and predicting a target output shape according to the shape of input data and the reference information, and the target output shape is the shape of a data calculation result corresponding to the input data;
And based on the target output shape, carrying out data extraction on the data calculation result of the expansion data to obtain the data calculation result corresponding to the input data.
In another embodiment, the reference information includes a target shape calculation formula, a variable in the target shape calculation formula is a numerical value in a shape of the input data, and a value of the target shape calculation formula represents a value of a t-th dimension in a target output shape, and t is a positive integer and is less than or equal to a dimension number in the target output shape;
Accordingly, the one or more instructions may be loaded by the processor and executed in particular in predicting a target output shape based on the shape of the input data and the reference information:
Invoking the target shape calculation formula, and calculating the value of the t dimension in the target output shape based on the value in the shape of the input data to obtain a target value;
Acquiring the numerical values of all the dimensions except the t dimension from the shape of the data calculation result of the expansion data;
And integrating the acquired values of all the dimensions with the target value to obtain the target output shape.
In another embodiment, the computational graph includes a plurality of nodes, each node having respective data processing logic, the data processing logic of the plurality of nodes constituting the computational logic of the computational graph;
Accordingly, the one or more instructions may be loaded by a processor and executed in particular in generating the target shape calculation based on the computational logic of the computational graph:
Based on the topological structure of the computational graph, sequencing a plurality of nodes in the computational graph to obtain the topological sequence of the plurality of nodes;
Processing corresponding nodes according to the topological order of the nodes in turn according to the data processing logic of each node to obtain a shape calculation formula of each node, wherein variables in the shape calculation formula of any node are numerical values in the shape of input data;
And taking the shape calculation formula of the last node as the target shape calculation formula.
In another embodiment, the binning process is performed if the computational graph is detected to be binnable;
The computing graph records at least one operator, and correspondingly, when detecting whether the computing graph has the partitionability, the one or more instructions can be loaded and specifically executed by a processor:
traversing each operator recorded by the computation graph, and taking the currently traversed operator as a current operator;
Predicting whether the preset data element influences the original calculation result of the input data or not when the current operator calculates the expanded input data after expanding the input data by using the preset data element according to the data processing logic of the current operator, wherein the original calculation result is the result obtained by calculating the input data by the current operator;
if the preset data element does not influence the original calculation result of the input data, determining that the current operator meets the bucket dividing condition;
After each operator traverses, if each operator meets the bucket dividing condition, determining that the computational graph has the bucket dividing property, and if at least one operator does not meet the bucket dividing condition, determining that the computational graph does not have the bucket dividing property.
In another embodiment, when predicting whether, according to the data processing logic of the current operator, when the current operator calculates the extended input data after extending the input data using a preset data element, the preset data element affects an original calculation result of the input data, the one or more instructions may be loaded by the processor and specifically executed:
predicting whether a computing operation aiming at a preset data element exists when the current operator computes the expanded input data after expanding the input data by using the preset data element according to the data processing logic of the current operator;
and if the computing operation for the preset data element does not exist, determining that the preset data element does not influence the original computing result of the input data.
In another embodiment, the one or more instructions may be loaded by a processor and executed in particular:
If the computing operation aiming at the preset data element exists, detecting the computing relevance between the preset data element and the input data according to the data processing logic of the current operator;
if the calculation relevance is detected, determining that the preset data element influences the original calculation result of the input data;
and if the calculation relevance is not detected, determining that the preset data element does not influence the original calculation result of the input data.
The embodiment of the application can acquire the calculation graphs of P dynamic inputs and the barrel dividing marks of the calculation graphs, wherein the barrel dividing marks are used for indicating groups to which each dynamic input belongs and barrel dividing modes of each group, the dynamic shapes of the dynamic inputs in the same group have the same value ranges, and the value ranges of the dynamic shapes of the dynamic inputs are processed in barrel dividing modes of the corresponding groups. Furthermore, before executing the calculation graph, the calculation graph can be subjected to barrel division processing according to the indication of barrel division marks, a plurality of barrels are obtained, the calculation graph is subjected to static compiling based on the numerical value in each barrel, so that after acquiring the input data of the calculation graph, the calculation graph is executed on the input data based on the compiling results corresponding to the barrels, and the data calculation result of the input data is obtained.
It should be noted that, according to an aspect of the present application, there is also provided a computer program product or a computer program, which comprises one or more instructions stored in a computer storage medium. The processor of the computer device reads one or more instructions from the computer storage medium, and the processor executes the one or more instructions to cause the computer device to perform the methods provided in the various alternatives of any of the method embodiment aspects described above. It should be understood that the foregoing disclosure is only illustrative of the preferred embodiments of the present application and is not to be construed as limiting the scope of the application, which is defined by the appended claims.

Claims (16)

1.一种计算图的处理方法,其特征在于,包括:1. A method for processing a computational graph, comprising: 获取计算图,所述计算图包括P个动态输入,P为正整数;其中,每个动态输入具备一个动态形状,每个动态形状具有一个取值范围;Obtain a computation graph, wherein the computation graph includes P dynamic inputs, where P is a positive integer; wherein each dynamic input has a dynamic shape, and each dynamic shape has a value range; 获取所述计算图的分桶标记,所述分桶标记用于指示:每个动态输入所属的组别以及每个组别的分桶方式;其中,同一个组别中的各个动态输入的动态形状具有相同的取值范围,且各个动态输入的动态形状的取值范围按照相应组别的分桶方式进行分桶处理;Obtaining a bucket mark of the computation graph, the bucket mark being used to indicate: the group to which each dynamic input belongs and the bucketing method of each group; wherein the dynamic shapes of each dynamic input in the same group have the same value range, and the value range of the dynamic shape of each dynamic input is bucketed according to the bucketing method of the corresponding group; 按照所述分桶标记的指示,对所述P个动态输入的动态形状的取值范围进行分桶处理,得到多个桶;每个桶中包括P个数值,所述P个数值中的不同数值是从不同的取值范围中选取出的;According to the indication of the bucket mark, the value ranges of the dynamic shapes of the P dynamic inputs are bucketed to obtain a plurality of buckets; each bucket includes P values, and different values of the P values are selected from different value ranges; 基于每个桶中的数值对所述计算图进行静态编译,得到所述每个桶对应的编译结果;Static compile the computation graph based on the value in each bucket to obtain a compilation result corresponding to each bucket; 在获取到所述计算图的输入数据后,基于所述多个桶对应的编译结果,对所述输入数据执行所述计算图,得到所述输入数据的数据计算结果。After the input data of the calculation graph is acquired, the calculation graph is executed on the input data based on the compilation results corresponding to the multiple buckets to obtain a data calculation result of the input data. 2.如权利要求1所述的方法,其特征在于,动态输入的动态形状包括N个维度的数值,且所述N个维度中存在动态维度,N为正整数;2. The method of claim 1, wherein the dynamic shape of the dynamic input includes values of N dimensions, and there is a dynamic dimension in the N dimensions, and N is a positive integer; 其中,所述动态维度是指数值支持动态变化的维度;动态形状的取值范围是指:动态形状中的动态维度的取值范围。The dynamic dimension refers to a dimension whose value supports dynamic changes; the value range of the dynamic shape refers to: the value range of the dynamic dimension in the dynamic shape. 3.如权利要求2所述的方法,其特征在于,所述分桶标记包括:每个动态输入的组别标记,以及至少一个组别的属性标记;其中:3. The method according to claim 2, wherein the bucket marking comprises: a group mark of each dynamic input, and an attribute mark of at least one group; wherein: 动态输入所属的组别是指:动态输入的动态形状中的动态维度所属的组别;动态输入的组别标记用于指示:动态输入的动态形状中的动态维度,以及相应动态维度所属的组别;The group to which the dynamic input belongs refers to: the group to which the dynamic dimension in the dynamic shape of the dynamic input belongs; the group tag of the dynamic input is used to indicate: the dynamic dimension in the dynamic shape of the dynamic input, and the group to which the corresponding dynamic dimension belongs; 一个组别对应一个取值范围,组别的属性标记用于指示:组别对应的取值范围以及组别的分桶方式。A group corresponds to a value range, and the attribute tag of the group is used to indicate: the value range corresponding to the group and the bucketing method of the group. 4.如权利要求1-3任一项所述的方法,其特征在于,所述基于每个桶中的数值对所述计算图进行静态编译,得到所述每个桶对应的编译结果,包括:4. The method according to any one of claims 1 to 3, wherein the statically compiling the computation graph based on the value in each bucket to obtain the compilation result corresponding to each bucket comprises: 遍历所述多个桶,确定当前遍历的第k个桶,k为正整数且小于或等于桶的总数;Traversing the multiple buckets, determining the kth bucket currently traversed, where k is a positive integer and is less than or equal to the total number of buckets; 分别将所述第k个桶中的各个数值,作为相应动态输入的动态形状的取值,得到所述第k个桶的静态形状;其中,当所述第k个桶中的第p个数值从第p个动态输入的动态形状的取值范围中选取出时,所述第p个数值与所述第p个动态输入对应,p∈[1,P];Taking each value in the k-th bucket as a value of the dynamic shape of the corresponding dynamic input, respectively, to obtain the static shape of the k-th bucket; wherein, when the p-th value in the k-th bucket is selected from the value range of the dynamic shape of the p-th dynamic input, the p-th value corresponds to the p-th dynamic input, p∈[1,P]; 在所述多个桶均被遍历后,分别基于每个桶的静态形状对所述计算图进行编译,得到所述每个桶对应的编译结果。After the multiple buckets are traversed, the computation graph is compiled based on the static shape of each bucket to obtain a compilation result corresponding to each bucket. 5.如权利要求1-3任一项所述的方法,其特征在于,所述每个桶具有一个静态形状;所述基于所述多个桶对应的编译结果,对所述输入数据执行所述计算图,得到所述输入数据的数据计算结果,包括:5. The method according to any one of claims 1 to 3, wherein each of the buckets has a static shape; and executing the computation graph on the input data based on the compilation results corresponding to the multiple buckets to obtain the data computation result of the input data comprises: 基于所述输入数据的形状,从所述多个桶中选取目标桶,所述目标桶的静态形状大于或等于所述输入数据的形状;Based on the shape of the input data, selecting a target bucket from the multiple buckets, wherein the static shape of the target bucket is greater than or equal to the shape of the input data; 在所述目标桶的静态形状大于所述输入数据的形状时,对所述输入数据进行扩充处理,得到扩充数据,所述扩充数据的形状与所述目标桶中的静态形状相同;When the static shape of the target bucket is larger than the shape of the input data, the input data is expanded to obtain expanded data, and the shape of the expanded data is the same as the static shape in the target bucket; 基于所述目标桶对应的编译结果,对所述扩充数据执行所述计算图,得到所述扩充数据的数据计算结果;Based on the compilation result corresponding to the target bucket, executing the calculation graph on the extended data to obtain a data calculation result of the extended data; 获取用于形状预测的参考信息,所述参考信息是基于所述计算图的计算逻辑生成的;并根据所述输入数据的形状和所述参考信息预测目标输出形状,所述目标输出形状是所述输入数据对应的数据计算结果的形状;Acquire reference information for shape prediction, the reference information being generated based on the computational logic of the computational graph; and predict a target output shape according to the shape of the input data and the reference information, the target output shape being the shape of a data computation result corresponding to the input data; 基于所述目标输出形状,对所述扩充数据的数据计算结果进行数据提取,得到所述输入数据对应的数据计算结果。Based on the target output shape, data extraction is performed on the data calculation results of the expanded data to obtain the data calculation results corresponding to the input data. 6.如权利要求5所述的方法,其特征在于,所述参考信息包括目标形状计算式,所述目标形状计算式中的变量为输入数据的形状中的数值,且所述目标形状计算式的值表示目标输出形状中的第t个维度的取值,t为正整数且小于或等于所述目标输出形状中的维度数;6. The method according to claim 5, wherein the reference information comprises a target shape calculation formula, the variables in the target shape calculation formula are numerical values in the shape of the input data, and the value of the target shape calculation formula represents the value of the t-th dimension in the target output shape, where t is a positive integer and is less than or equal to the number of dimensions in the target output shape; 所述根据所述输入数据的形状和所述参考信息预测目标输出形状,包括:The predicting the target output shape according to the shape of the input data and the reference information comprises: 调用所述目标形状计算式,基于所述输入数据的形状中的数值,计算目标输出形状中的第t个维度的取值,得到目标数值;Calling the target shape calculation formula, based on the value in the shape of the input data, calculates the value of the tth dimension in the target output shape to obtain the target value; 从所述扩充数据的数据计算结果的形状中,获取除所述第t个维度以外的各个维度的数值;From the shape of the data calculation result of the extended data, obtaining the values of each dimension except the t-th dimension; 对获取到的各个维度的数值和所述目标数值进行整合,得到所述目标输出形状。The acquired values of each dimension and the target value are integrated to obtain the target output shape. 7.如权利要求6所述的方法,其特征在于,所述计算图包括多个节点,每个节点具有各自的数据处理逻辑,所述多个节点的数据处理逻辑构成所述计算图的运算逻辑;7. The method according to claim 6, wherein the computation graph comprises a plurality of nodes, each node having its own data processing logic, and the data processing logics of the plurality of nodes constitute the operation logic of the computation graph; 其中,基于所述计算图的运算逻辑生成所述目标形状计算式的过程,包括:The process of generating the target shape calculation formula based on the operation logic of the calculation graph includes: 基于所述计算图的拓扑结构,对所述计算图中的多个节点进行排序,得到所述多个节点的拓扑顺序;Based on the topological structure of the computation graph, sorting multiple nodes in the computation graph to obtain a topological order of the multiple nodes; 按照所述多个节点的拓扑顺序,依次根据每个节点的数据处理逻辑对相应节点进行处理,得到所述每个节点的形状计算式;其中,任一节点的形状计算式中的变量均为输入数据的形状中的数值;According to the topological order of the plurality of nodes, the corresponding nodes are processed in turn according to the data processing logic of each node to obtain a shape calculation formula of each node; wherein the variables in the shape calculation formula of any node are the values in the shape of the input data; 将最后一个节点的形状计算式,作为所述目标形状计算式。The shape calculation formula of the last node is used as the target shape calculation formula. 8.如权利要求7所述的方法,其特征在于,所述多个节点包括:P个输入节点、R个算子节点和输出节点,R为正整数;其中,一个输入节点表示所述计算图的一个动态输入,每个输入节点的动态形状包括动态维度;8. The method of claim 7, wherein the plurality of nodes comprises: P input nodes, R operator nodes and output nodes, where R is a positive integer; wherein an input node represents a dynamic input of the computation graph, and the dynamic shape of each input node comprises a dynamic dimension; 所述每个输入节点均为所述拓扑顺序指示的起始节点,且所述输出节点为所述拓扑顺序指示的最后一个节点;其中:Each of the input nodes is a starting node indicated by the topological order, and the output node is the last node indicated by the topological order; wherein: 对第i个输入节点的处理包括:基于变量xi和所述第i个输入节点的动态形状中的动态维度的维数j,生成所述第i个输入节点的形状计算式,将生成的形状计算式记录在所述第i个输入节点的输出形状上,并将所述第i个输入节点的输出形状传递至所述第i个输入节点的下一节点;其中,所述第i个输入节点的形状计算式用于指示:所述第i个输入节点的动态形状中的第j个维度的取值为变量xi;i∈[1,P],j为正整数;The processing of the i-th input node includes: generating a shape calculation formula of the i-th input node based on a variable x i and a dimension j of a dynamic dimension in a dynamic shape of the i-th input node, recording the generated shape calculation formula on an output shape of the i-th input node, and transferring the output shape of the i-th input node to a next node of the i-th input node; wherein the shape calculation formula of the i-th input node is used to indicate that the value of the j-th dimension in the dynamic shape of the i-th input node is the variable x i ; i∈[1, P], j is a positive integer; 对第r个算子节点的处理包括:根据所述第r个算子节点的数据处理逻辑和所述第r个算子节点接收到的形状,推导所述第r个算子节点的输出形状,并将推导出的输出形状传递给所述第r个算子节点的下一节点,r∈[1,R];The processing of the rth operator node includes: deriving an output shape of the rth operator node according to the data processing logic of the rth operator node and the shape received by the rth operator node, and passing the derived output shape to the next node of the rth operator node, r∈[1,R]; 对所述输出节点的处理包括:将所述输出节点接收到的形状,作为所述输出节点的输出形状,所述输出节点的输出形状上记录了形状计算式。The processing of the output node includes: using the shape received by the output node as the output shape of the output node, and recording a shape calculation formula on the output shape of the output node. 9.如权利要求8所述的方法,其特征在于,所述第r个算子节点接收到的形状表示为目标形状;9. The method of claim 8, wherein the shape received by the rth operator node is represented as a target shape; 当所述第r个算子节点的数据处理逻辑包括:针对所述目标形状的计算操作时,所述目标形状中的形状计算式记录在所述第r个算子节点的输出数据上,则对所述第r个算子节点的处理还包括:将所述第r个算子节点的输出数据传递给所述第r个算子节点的下一节点。When the data processing logic of the rth operator node includes: when performing a calculation operation on the target shape, the shape calculation formula in the target shape is recorded on the output data of the rth operator node, then the processing of the rth operator node also includes: passing the output data of the rth operator node to the next node of the rth operator node. 10.如权利要求1所述的方法,其特征在于,所述分桶处理是在检测到所述计算图具有可分桶性的情况下执行的;10. The method of claim 1, wherein the bucketing process is performed when it is detected that the computation graph is bucketizable; 其中,输入数据包括至少一个数据元素;所述计算图记录了至少一个算子,检测所述计算图是否具有所述可分桶性的方式包括:The input data includes at least one data element; the computation graph records at least one operator, and the method for detecting whether the computation graph has the bucketability includes: 遍历所述计算图所记录的各个算子,将当前遍历的算子作为当前算子;Traverse each operator recorded in the computation graph, and use the currently traversed operator as the current operator; 根据所述当前算子的数据处理逻辑,预测当使用预设数据元素扩充输入数据后,所述当前算子对扩充后的输入数据进行计算时,所述预设数据元素是否影响输入数据的原始计算结果;其中,所述原始计算结果是指:所述当前算子对输入数据进行计算所得到的结果;According to the data processing logic of the current operator, predict whether the preset data element affects the original calculation result of the input data when the current operator calculates the expanded input data after the input data is expanded using the preset data element; wherein the original calculation result refers to: the result obtained by the current operator calculating the input data; 若所述预设数据元素影响输入数据的原始计算结果,则确定所述当前算子未满足分桶条件;若所述预设数据元素不影响输入数据的原始计算结果,则确定所述当前算子满足分桶条件;If the preset data element affects the original calculation result of the input data, it is determined that the current operator does not meet the bucketing condition; if the preset data element does not affect the original calculation result of the input data, it is determined that the current operator meets the bucketing condition; 在所述各个算子均遍历后,若所述各个算子均满足分桶条件,则确定所述计算图具有可分桶性,若存在至少一个算子未满足分桶条件,则确定所述计算图未具有可分桶性。After all the operators are traversed, if all the operators meet the bucketing condition, it is determined that the computation graph is bucketable; if there is at least one operator that does not meet the bucketing condition, it is determined that the computation graph is not bucketable. 11.如权10所述的方法,其特征在于,所述根据所述当前算子的数据处理逻辑,预测当使用预设数据元素扩充输入数据后,所述当前算子对扩充后的输入数据进行计算时,所述预设数据元素是否影响输入数据的原始计算结果,包括:11. The method according to claim 10, characterized in that the predicting, based on the data processing logic of the current operator, whether the preset data element affects the original calculation result of the input data when the current operator calculates the expanded input data after the input data is expanded using the preset data element comprises: 根据所述当前算子的数据处理逻辑,预测当使用预设数据元素扩充输入数据后,所述当前算子对扩充后的输入数据进行计算时,是否存在针对所述预设数据元素的计算操作;According to the data processing logic of the current operator, predict whether there is a calculation operation for the preset data element when the current operator calculates the expanded input data after the input data is expanded using the preset data element; 若不存在针对所述预设数据元素的计算操作,则确定所述预设数据元素不影响输入数据的原始计算结果。If there is no calculation operation for the preset data element, it is determined that the preset data element does not affect the original calculation result of the input data. 12.如权利要求11所述的方法,其特征在于,所述方法还包括:12. The method according to claim 11, characterized in that the method further comprises: 若存在针对所述预设数据元素的计算操作,则根据所述当前算子的数据处理逻辑,检测所述预设数据元素和所述输入数据之间的计算关联性;If there is a calculation operation for the preset data element, detecting the calculation association between the preset data element and the input data according to the data processing logic of the current operator; 如果检测到所述计算关联性,则确定所述预设数据元素影响输入数据的原始计算结果;If the computational association is detected, determining that the preset data element affects the original computational result of the input data; 如果未检测到所述计算关联性,则确定所述预设数据元素不影响输入数据的原始计算结果。If the calculation association is not detected, it is determined that the preset data element does not affect the original calculation result of the input data. 13.一种计算图的处理装置,其特征在于,包括:13. A computing graph processing device, comprising: 获取单元,用于获取计算图,所述计算图包括P个动态输入,P为正整数;其中,每个动态输入具备一个动态形状,每个动态形状具有一个取值范围;An acquisition unit, used to acquire a computation graph, wherein the computation graph includes P dynamic inputs, where P is a positive integer; wherein each dynamic input has a dynamic shape, and each dynamic shape has a value range; 所述获取单元,还用于获取所述计算图的分桶标记,所述分桶标记用于指示:每个动态输入所属的组别以及每个组别的分桶方式;其中,同一个组别中的各个动态输入的动态形状具有相同的取值范围,且各个动态输入的动态形状的取值范围按照相应组别的分桶方式进行分桶处理;The acquisition unit is further used to acquire a bucket mark of the computation graph, wherein the bucket mark is used to indicate: the group to which each dynamic input belongs and the bucketing method of each group; wherein the dynamic shapes of the dynamic inputs in the same group have the same value range, and the value range of the dynamic shape of each dynamic input is bucketed according to the bucketing method of the corresponding group; 处理单元,用于按照所述分桶标记的指示,对所述P个动态输入的动态形状的取值范围进行分桶处理,得到多个桶;每个桶中包括P个数值,所述P个数值中的不同数值是从不同的取值范围中选取出的;a processing unit, configured to perform bucket processing on the value ranges of the dynamic shapes of the P dynamic inputs according to the indication of the bucket marking, to obtain a plurality of buckets; each bucket includes P values, and different values among the P values are selected from different value ranges; 所述处理单元,还用于基于每个桶中的数值对所述计算图进行静态编译,得到所述每个桶对应的编译结果;The processing unit is further used to statically compile the calculation graph based on the value in each bucket to obtain a compilation result corresponding to each bucket; 所述处理单元,还用于在获取到所述计算图的输入数据后,基于所述多个桶对应的编译结果,对所述输入数据执行所述计算图,得到所述输入数据的数据计算结果。The processing unit is further configured to, after acquiring the input data of the calculation graph, execute the calculation graph on the input data based on the compilation results corresponding to the multiple buckets to obtain a data calculation result of the input data. 14.一种计算机设备,包括输入接口和输出接口,其特征在于,还包括:处理器以及计算机存储介质;14. A computer device, comprising an input interface and an output interface, characterized in that it also comprises: a processor and a computer storage medium; 其中,所述处理器适于实现一条或多条指令,所述计算机存储介质存储有一条或多条指令,所述一条或多条指令适于由所述处理器加载并执行如权利要求1-12任一项所述的计算图的处理方法。The processor is suitable for implementing one or more instructions, the computer storage medium stores one or more instructions, and the one or more instructions are suitable for being loaded by the processor and executed by the method for processing a computational graph as described in any one of claims 1-12. 15.一种计算机存储介质,其特征在于,所述计算机存储介质存储有一条或多条指令,所述一条或多条指令适于由处理器加载并执行如权利要求1-12任一项所述的计算图的处理方法。15. A computer storage medium, characterized in that the computer storage medium stores one or more instructions, and the one or more instructions are suitable for being loaded by a processor and executing the method for processing a computational graph as described in any one of claims 1-12. 16.一种计算机程序产品,其特征在于,所述计算机程序产品包括一条或多条指令;所述计算机程序中的一条或多条指令被处理器执行时,实现如权利要求1-12任一项所述的计算图的处理方法。16. A computer program product, characterized in that the computer program product comprises one or more instructions; when the one or more instructions in the computer program are executed by a processor, the method for processing a computational graph as described in any one of claims 1 to 12 is implemented.
CN202411983319.0A 2024-12-28 2024-12-28 Computational graph processing method, device, equipment and storage medium Active CN119902771B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202411983319.0A CN119902771B (en) 2024-12-28 2024-12-28 Computational graph processing method, device, equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202411983319.0A CN119902771B (en) 2024-12-28 2024-12-28 Computational graph processing method, device, equipment and storage medium

Publications (2)

Publication Number Publication Date
CN119902771A true CN119902771A (en) 2025-04-29
CN119902771B CN119902771B (en) 2025-10-24

Family

ID=95467663

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202411983319.0A Active CN119902771B (en) 2024-12-28 2024-12-28 Computational graph processing method, device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN119902771B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN120723249A (en) * 2025-09-02 2025-09-30 北京清微智能科技有限公司 Compilation method, device, storage medium and electronic device based on dynamic shape scene

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112445776A (en) * 2020-11-20 2021-03-05 北京易观智库网络科技有限公司 Presto-based dynamic barrel dividing method, system, equipment and readable storage medium
CN117908894A (en) * 2022-10-19 2024-04-19 华为技术有限公司 Calculation graph processing method and device
CN117908967A (en) * 2024-03-15 2024-04-19 北京壁仞科技开发有限公司 Method, apparatus, medium and program product for supporting computation of dynamic shapes
CN118672592A (en) * 2024-06-19 2024-09-20 北京百度网讯科技有限公司 Compiling method, compiling device, compiling equipment and storage medium
US20240419449A1 (en) * 2021-11-01 2024-12-19 Cambricon Singgo (Nanjing) Technology Co., Ltd. Computational graph compiling and scheduling methods and related products

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112445776A (en) * 2020-11-20 2021-03-05 北京易观智库网络科技有限公司 Presto-based dynamic barrel dividing method, system, equipment and readable storage medium
US20240419449A1 (en) * 2021-11-01 2024-12-19 Cambricon Singgo (Nanjing) Technology Co., Ltd. Computational graph compiling and scheduling methods and related products
CN117908894A (en) * 2022-10-19 2024-04-19 华为技术有限公司 Calculation graph processing method and device
CN117908967A (en) * 2024-03-15 2024-04-19 北京壁仞科技开发有限公司 Method, apparatus, medium and program product for supporting computation of dynamic shapes
CN118672592A (en) * 2024-06-19 2024-09-20 北京百度网讯科技有限公司 Compiling method, compiling device, compiling equipment and storage medium

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN120723249A (en) * 2025-09-02 2025-09-30 北京清微智能科技有限公司 Compilation method, device, storage medium and electronic device based on dynamic shape scene

Also Published As

Publication number Publication date
CN119902771B (en) 2025-10-24

Similar Documents

Publication Publication Date Title
US11461317B2 (en) Method, apparatus, system, device, and storage medium for answering knowledge questions
CN111047563B (en) Neural network construction method applied to medical ultrasonic image
KR102134952B1 (en) Data processing method and system
CN110866029B (en) sql statement construction method, device, server and readable storage medium
CN108710662B (en) Language conversion method and device, storage medium, data query system and method
CN106293891B (en) Multidimensional investment index monitoring method
WO2025026013A1 (en) Multi-modal task processing method and system, multi-modal dialogue task processing method and system, and device
CN109408493A (en) A kind of moving method and system of data source
CN101751333A (en) Method, computer program and computer system for assisting in analyzing program
CN103927314B (en) A kind of method and apparatus of batch data processing
CN106909454B (en) Rule processing method and equipment
CN108509453B (en) Information processing method and device
CN108763536A (en) Data bank access method and device
CN111241059B (en) Database optimization method and device based on database
CN119902771B (en) Computational graph processing method, device, equipment and storage medium
CN110928941B (en) A data fragmentation extraction method and device
CN107871055B (en) Data analysis method and device
CN118897866A (en) A data query method and related device based on relational cluster database
CN117609870A (en) Structure recognition model training, model structure recognition method, device and medium
CN109711555B (en) Method and system for predicting single-round iteration time of deep learning model
CN116484947B (en) Operator automatic generation method, device, equipment and medium
CN119806541A (en) A compiling and executing method, device, equipment and medium for dynamic neural network
US11386155B2 (en) Filter evaluation in a database system
CN115730507A (en) Model engine construction method, kernel function processing method, device and storage medium
CN115269654B (en) Data cache supplementing method, device, equipment and medium

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant