CN110515739B - Deep learning neural network model load calculation method, device, equipment and medium
- Publication number
- CN110515739B (application CN201911008660.3A)
- Authority
- CN
- China
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/46—Multiprogramming arrangements
- G06F9/50—Allocation of resources, e.g. of the central processing unit [CPU]
- G06F9/5005—Allocation of resources, e.g. of the central processing unit [CPU] to service a request
- G06F9/5027—Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/46—Multiprogramming arrangements
- G06F9/50—Allocation of resources, e.g. of the central processing unit [CPU]
- G06F9/5005—Allocation of resources, e.g. of the central processing unit [CPU] to service a request
- G06F9/5027—Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals
- G06F9/5038—Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals considering the execution order of a plurality of tasks, e.g. taking priority or time dependency constraints into consideration
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/10—Interfaces, programming languages or software development kits, e.g. for simulating neural networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F2209/00—Indexing scheme relating to G06F9/00
- G06F2209/50—Indexing scheme relating to G06F9/50
- G06F2209/5017—Task decomposition
Abstract
The embodiment of the invention discloses a load calculation method, apparatus, device and medium for a deep learning neural network model. The method comprises: parsing a pre-constructed network model and decomposing the computing flow of the network model into at least two computing tasks; dividing each computing task to form at least one computing subtask; allocating resources to all computing subtasks associated with each computing task according to each resource allocation strategy, to obtain an allocation data set of the computing task under each resource allocation strategy; aggregating the allocation data sets of the computing tasks under each resource allocation strategy to form a load matrix of the network model; and calculating the running time of each computing subtask according to a performance parameter set of a chip to be evaluated, to determine a performance matrix.
Description
Technical Field
The embodiments of the present invention relate to the field of data processing, and in particular to a deep learning neural network model load calculation method, apparatus, device and medium.
Background
The rapid development of the artificial intelligence industry has placed higher demands on the computing power of computers, and major semiconductor manufacturers are actively developing and launching special-purpose chips for accelerating deep learning training and inference.
Chip development and manufacturing are relatively long processes. Generally, the rationality of a chip architecture design can only be verified, and its computing performance evaluated, after small-batch production and sample acquisition. This greatly lengthens the product development iteration cycle and can even delay the time to market indefinitely, which is unacceptable to semiconductor manufacturers.
The existing solution is to simulate the chip architecture on a dedicated server, with the server vendor providing a complete set of matched software and hardware, and to perform chip performance verification on that basis. However, this solution is expensive to build, and the simulation software runs slowly: even simple test samples generally take hours to run. In addition, when verifying an accelerator chip architecture that supports parallel computing, different ways of splitting the computing tasks and different scheduling strategies for the on-chip hardware resources lead to different chip operating loads and hence to different performance, and trying out and exploring these strategies helps to expose structural defects early in the chip design.
Disclosure of Invention
The embodiments of the invention provide a deep learning neural network model load calculation method, apparatus, device and medium, which can improve the speed of simulating the performance of a chip running a deep learning network model.
In a first aspect, an embodiment of the present invention provides a deep learning neural network model load calculation method, including:
analyzing a pre-constructed network model, and decomposing a calculation process of the network model into at least two calculation tasks; wherein the at least two computing tasks have a dependency relationship;
dividing each computing task according to at least one pre-configured resource allocation strategy to form at least one computing subtask;
respectively allocating resources to all computing subtasks associated with the computing task according to each resource allocation strategy to obtain an allocation data set of the computing task under each resource allocation strategy; the resources include computing resources and storage resources;
aggregating the allocation data sets of each computing task under each resource allocation strategy to form a load matrix of the network model;
and calculating the running time of each calculation subtask obtained by decomposing the network model under each resource allocation strategy according to the performance parameter set of the chip to be evaluated and the load matrix, and determining the performance matrix so as to evaluate the performance of the chip for running the network model.
In a second aspect, an embodiment of the present invention provides a deep learning neural network model load calculation apparatus, including:
the computing task analysis module is used for analyzing a pre-constructed network model and decomposing the computing process of the network model into at least two computing tasks; wherein the at least two computing tasks have a dependency relationship;
a computing task dividing module, configured to divide each computing task according to at least one pre-configured resource allocation strategy to form at least one computing subtask;
a resource allocation module, configured to respectively allocate resources to all computing subtasks associated with a computing task according to each resource allocation strategy to obtain an allocation data set of the computing task under each resource allocation strategy; the resources include computing resources and storage resources;
a load matrix generation module, configured to aggregate the allocation data sets of each computing task under each resource allocation strategy to form a load matrix of the network model;
and the performance matrix calculation module is used for calculating the running time of each calculation subtask obtained by decomposing the network model under each resource allocation strategy according to the performance parameter set of the chip to be evaluated and the load matrix, and determining the performance matrix so as to evaluate the performance of the chip for running the network model.
In a third aspect, an embodiment of the present invention further provides a computer device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor, when executing the program, implements the deep learning neural network model load calculation method according to any embodiment of the present invention.
In a fourth aspect, an embodiment of the present invention further provides a computer-readable storage medium on which a computer program is stored, where the computer program, when executed by a processor, implements the deep learning neural network model load calculation method according to any embodiment of the present invention.
The embodiment of the invention automatically parses the deep learning neural network model to form at least two computing tasks, further divides the computing tasks according to the pre-configured resource allocation strategies to form computing subtasks, allocates resources to the computing subtasks under the different resource allocation strategies to obtain the load matrix under the different strategies, and calculates the running time of each computing subtask under the different strategies based on the performance parameter set of the chip to be evaluated, thereby determining the performance matrix used to evaluate the performance of the chip running the network model. This solves the problems of high economic cost and low efficiency when simulating the running of a network model on a chip in the prior art, and improves the speed of simulating the performance of a chip running a deep learning network model.
Drawings
FIG. 1 is a flowchart of a deep learning neural network model load calculation method according to Example One of the present invention;
FIG. 2 is a flowchart of a deep learning neural network model load calculation method according to Example Two of the present invention;
FIG. 3 is a schematic structural diagram of a deep learning neural network model load calculation apparatus according to Example Three of the present invention;
FIG. 4 is a schematic structural diagram of a computer device according to Example Four of the present invention.
Detailed Description
The present invention will now be described in further detail with reference to the drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and do not limit it. It should further be noted that, for ease of description, only the structures associated with the present invention, rather than all structures, are shown in the drawings.
Example One
Fig. 1 is a flowchart of a deep learning neural network model load calculation method in Example One of the present invention. This embodiment is applicable to simulating the process of running a network model on a chip. The method can be executed by the deep learning neural network model load calculation apparatus provided in the embodiment of the present invention; the apparatus can be implemented in software and/or hardware and can be integrated into a computer device, such as a terminal device or a server. As shown in Fig. 1, the method of this embodiment specifically includes:
s110, analyzing a pre-constructed network model, and decomposing a calculation process of the network model into at least two calculation tasks; wherein the at least two computing tasks have a dependency relationship.
The network model refers to a deep learning neural network model (Deep Learning Neural Network).
The computational flow of the network model is used to represent a plurality of successive computational steps that the network model needs to perform at runtime. Wherein the calculation flow may be converted into a plurality of successive calculation steps.
A computational task is used to represent a certain computational step or steps. Each computational task is different.
A plurality of computing tasks with dependency relationships are combined in order; that is, the computing flow of the network model is a sequence of computing tasks, and the position of each computing task in the sequence is its execution order.
Illustratively, the data processing operations may include padding (Padding), reshaping (Reshape), convolution (Convolution) and pooling (Pooling), among others.
The network model may be built through a predefined programming interface: the user inputs data related to the network model through the interface to build the neural network model. Illustratively, the network model is established as $Net = \{layer_1, layer_2, \dots, layer_S\}$, where $layer_{step}$ represents one layer of the neural network.
It will be appreciated that the user may pass the network model to the programming interface, and the structure of the network model can be obtained via the programming interface; that is, the structure of each layer in the network model is determined and fixed during subsequent processing, so that the data processing at each layer can serve as one computing task of the network model. A continuous sequence of S computing tasks can thus be obtained, $Task = \{task_1, task_2, \dots, task_S\}$, where, for $step$ from $2$ to $S$, the input of computing task $task_{step}$ is the output of computing task $task_{step-1}$. The output of the previous computing task serving as the input of the next is what gives the computing tasks their dependency relationship, and the S computing tasks are executed sequentially to ensure the correctness of the calculation result.
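The decomposition just described can be illustrated with a minimal sketch; the names `Layer`, `ComputeTask` and `decompose_into_tasks` are illustrative assumptions, not part of the patent:

```python
from dataclasses import dataclass

@dataclass
class Layer:
    name: str   # e.g. "conv1"
    op: str     # data processing operation: padding, reshape, convolution, pooling...

@dataclass
class ComputeTask:
    step: int                      # position in the computing task sequence
    op: str
    depends_on: int | None = None  # previous task whose output is this task's input

def decompose_into_tasks(model: list[Layer]) -> list[ComputeTask]:
    """Map the data processing operation of each layer to one computing task;
    task step i consumes the output of task step i-1, so the tasks form a
    strictly ordered sequence with dependency relationships."""
    return [
        ComputeTask(step=i, op=layer.op, depends_on=i - 1 if i > 0 else None)
        for i, layer in enumerate(model)
    ]

model = [Layer("conv1", "convolution"), Layer("pool1", "pooling")]
tasks = decompose_into_tasks(model)  # S = 2 sequential computing tasks
```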
In fact, the computing tasks are divided from the viewpoint of the running time sequence of the network model; that is, the division follows the temporal order of the computing flow of the network model.
And S120, dividing each computing task according to at least one pre-configured resource allocation strategy to form at least one computing subtask.
Generally, a resource allocation strategy is used to allocate the resources required for executing a computing task, and may refer to a resource allocation manner; the resources specifically include computing resources and storage resources. The computing resources are used to execute computing tasks. The storage resources are used to store data associated with executing the computing task. The computing subtasks together make up the computing task, each being part of the computation within the task.
In practice, the computing task may also be further divided, e.g., subdivided into a plurality of computing subtasks. Each calculation subtask is different and independent, and all calculation subtasks form a complete calculation task. The dividing manner may be that the calculation amount of the calculation task is divided into n calculation subtasks equally, where n is greater than or equal to 1, and may be specifically set according to needs, which is not limited in the embodiments of the present invention. Exemplarily, the calculation task is convolution calculation on 10 feature maps, and the calculation task can be divided into 10 calculation subtasks, wherein each calculation subtask is convolution calculation on 1 feature map; or the calculation task can be divided into 5 calculation subtasks, each calculation subtask is used for performing convolution calculation on 2 feature maps, and the feature maps of the convolution calculation performed by each calculation subtask are different from each other.
As in the previous example, the computing task is $task_{step}$. The relationship between the computing task and its computing subtasks is $task_{step} = \{task_{step}^1, task_{step}^2, \dots, task_{step}^Q\}$, where $task_{step}^q$ is a computing subtask; the common subscript $step$ indicates that the subtasks all belong to the computing task $task_{step}$, and the superscript distinguishes the Q items split from $task_{step}$, indicating that the computing subtasks have no dependency relationship with one another and exist in parallel.
The manner of dividing each computing task can be determined according to the resource allocation strategy; that is, the number of computing subtasks into which each computing task can be divided, i.e., the value of n, is determined according to the resource allocation strategy.
Specifically, the computing resources comprise a plurality of computing units. When the number of computing units is n, the computing task is divided into n computing subtasks, so that one computing unit executes one computing subtask.
In addition, the space size of the storage resource can be used as the basis for dividing the calculation sub-tasks; or the division mode of the calculation subtasks can be determined comprehensively according to the calculation resources and the storage resources. The specific configuration may be set as required, and the embodiment of the present invention is not particularly limited.
In summary, a computing subtask is a unit task obtained by dividing a computing task; the unit tasks are executed in parallel at the same time.
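A short sketch of this even division, following the feature-map example above; the function name and the remainder-handling rule are illustrative assumptions:

```python
def split_task(num_feature_maps: int, n: int) -> list[int]:
    """Divide a computing task over num_feature_maps feature maps into n
    parallel computing subtasks of (nearly) equal size; each list entry is
    the number of feature maps handled by one subtask."""
    base, rem = divmod(num_feature_maps, n)
    return [base + (1 if q < rem else 0) for q in range(n)]

print(split_task(10, 10))  # [1, 1, ..., 1]: one feature map per subtask
print(split_task(10, 5))   # [2, 2, 2, 2, 2]: two feature maps per subtask
```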
S130, respectively allocating resources to all the computing subtasks associated with the computing task according to each resource allocation strategy to obtain an allocation data set of the computing task under each resource allocation strategy; the resources include computing resources and storage resources.
The distribution data set is used for describing resource distribution conditions of all the computing subtasks of the computing task division, and the distribution data set records the mapping relation between all the computing subtasks of the computing task division and resources.
The resource allocation for the computation subtasks is actually the allocation of computation resources and storage resources for the computation subtasks, that is, the allocation of processors and storage space for the computation subtasks.
Further, resource allocation differs between the resource allocation strategies in that the computing resources and/or storage resources are quantized and allocated to the computing subtasks in different quantities.
Specifically, the computing resources may be equally divided into N computing units to obtain a computing resource set $C = \{c_1, c_2, \dots, c_N\}$. It should be noted that the computing resources generally refer to the processors on the chip, and one computing unit is an integer number of processors: a computing unit may include one processor, two processors, or even more, and the embodiment of the present invention is not particularly limited in this respect.
At the same time, the storage resources are evenly divided into M equal parts to obtain a storage resource set $D = \{d_1, d_2, \dots, d_M\}$. The storage resources are the storage space allocated on the chip for running the network model, and the storage space can be divided equally.
In practice, a resource allocation strategy specifies the number of computing units and the number of storage units that each computing subtask can be allocated; that is, it ensures that the computing subtasks obtained by dividing the computing task exactly cover all the computing resources C and all the storage resources D, no more and no less, and that the allocations of different computing subtasks do not overlap.
Under one resource allocation strategy, each computing subtask obtained by dividing a computing task is allocated at least one computing unit and at least one storage unit; correspondingly, under different resource allocation strategies, each computing subtask is allocated different numbers of computing units and storage units. For each resource allocation strategy, the computing subtasks together with the numbers of computing units and storage units allocated to them form a sequence; collecting these sequences over the different resource allocation strategies yields a set, which is in fact the allocation data set of the computing task under each resource allocation strategy.
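As a sketch of one such allocation, assume for simplicity that the N computing units and M storage units divide evenly over the Q computing subtasks (the names and the even split are assumptions):

```python
def allocate_evenly(Q: int, N: int, M: int) -> list[tuple[range, range]]:
    """One resource allocation strategy: partition N computing units and M
    storage units over Q parallel computing subtasks so that every unit is
    assigned exactly once and the allocations do not overlap. Returns the
    (computing unit, storage unit) index ranges of each subtask."""
    assert N % Q == 0 and M % Q == 0, "illustrative even split only"
    n, m = N // Q, M // Q
    return [(range(q * n, (q + 1) * n), range(q * m, (q + 1) * m)) for q in range(Q)]

# Q=2 subtasks over N=4 computing units and M=8 storage units:
# [(range(0, 2), range(0, 4)), (range(2, 4), range(4, 8))]
print(allocate_evenly(2, 4, 8))
```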
And S140, aggregating the allocation data sets of the computing tasks under the resource allocation strategies to form a load matrix of the network model.
The load matrix is used for describing the resource allocation condition of the network model under different resource allocation strategies. The load matrix records the mapping relation between each calculation stage and resource allocation in the operation process of the network model.
According to the method, the distribution data set of each calculation task under each resource distribution strategy is obtained, and a load matrix of the network model is formed.
S150, calculating the running time of each calculation subtask obtained by the network model decomposition under each resource allocation strategy according to the performance parameter set of the chip to be evaluated and the load matrix, and determining the performance matrix so as to evaluate the performance of the chip running the network model.
The performance parameter set describes the performance of the chip to be evaluated and records the chip's performance parameters; illustratively, it is a Performance Dictionary (PD). The running time describes the length of time spent executing a computing subtask. The performance matrix is used to evaluate the running performance of the chip to be evaluated: the running time of each computing subtask is calculated from the chip's performance parameters and from the computing units and storage units allocated to the subtask under the different resource allocation strategies, forming a performance matrix corresponding to the load matrix. The running time of each computing subtask obtained by decomposing the network model under the different resource allocation strategies, and hence the running time of the network model itself under those strategies, can thus be determined, which determines the performance of the chip running the network model. It will be appreciated that the shorter the running time of the network model, the higher the performance of the chip running it.
Optionally, the calculating, according to the performance parameter set of the chip to be evaluated and the load matrix, of the running time of each computing subtask obtained by decomposing the network model under each resource allocation strategy to determine the performance matrix includes: calculating, according to the performance parameter set of the chip to be evaluated, the input data transfer time, input data processing time and result data transfer time of each computing subtask in the load matrix under each resource allocation strategy; taking the sum of the input data transfer time, input data processing time and result data transfer time of a computing subtask under a resource allocation strategy as the running time of that computing subtask under the strategy; and forming the performance matrix of the network model according to the running time of each computing subtask under each resource allocation strategy.
Typically, a computing subtask is executed in three phases: reading the input data from the storage space, processing the input data, and writing the result data produced by the processing back to the storage space. The running time of a computing subtask therefore comprises the input data transfer time, the input data processing time and the result data transfer time.
The input data transfer time describes the time taken by the computing subtask to acquire its input data, i.e., the time to transfer the input data from the allocated storage resource to the allocated computing resource. Specifically, the input data transfer time may be calculated based on the following formula:

$$T_{input} = \frac{InputSize}{InputBandwidth}$$

where $InputSize$ is the size of the input data of the computing subtask, measured in Byte, and $InputBandwidth$ is the data transmission bandwidth when transferring data from the storage resource $d$ to the computing resource $c$, measured in Byte/s, which can be obtained from the performance parameter set.
The input data processing time describes the time the computing subtask takes to process its input data, i.e., the time the computing resource $c$ spends processing the input data. Specifically, the input data processing time may be calculated based on the following formula:

$$T_{process} = InputSize \times CostPerByte$$

where $CostPerByte$ is the time consumed to process each byte of data; it may be determined by the type of the computing task $task_{step}$, or obtained from the performance parameter set.
The result data transfer time describes the time the computing subtask takes to output its result data, i.e., the time to transfer the result data from the allocated computing resource to the allocated storage resource. Specifically, the result data transfer time may be calculated based on the following formula:

$$T_{output} = \frac{OutputSize}{OutputBandwidth}$$

where $OutputSize$ is the size of the result data produced by the computing subtask, measured in Byte, and $OutputBandwidth$ is the data transmission bandwidth when transferring the computed result data from the computing resource $c$ to the storage resource $d$, measured in Byte/s, which can be obtained from the performance parameter set.
The sum of the input data transfer time, the input data processing time and the result data transfer time of a computing subtask is taken as the running time of that subtask under the resource allocation strategy: $T_{sub} = T_{input} + T_{process} + T_{output}$.

The running times of the different computing subtasks under the different resource allocation strategies are then collected, and the longest subtask running time is taken as the running time of the computing task. It will be appreciated that the computing subtasks into which a computing task is decomposed are executed in parallel, so the running time of the computing task equals the longest of the subtask running times. The running time of the computing task is determined based on the following formula:

$$T_{task} = \max_{q = 1, \dots, Q} T_{sub}^{(q)}$$
By calculating the input data transfer time, the input data processing time and the result data transfer time of each computing subtask under each resource allocation strategy, the running time of each computing subtask under each strategy is determined accurately, so that the time consumed by running the whole network model can be counted accurately.
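A minimal sketch of this running-time calculation, assuming the performance parameter set PD is a plain dictionary holding the input bandwidth, output bandwidth and per-byte processing cost (the key names are assumptions):

```python
def subtask_runtime(input_bytes: float, output_bytes: float, pd: dict) -> float:
    """Running time of one computing subtask = input data transfer time
    + input data processing time + result data transfer time."""
    t_input = input_bytes / pd["input_bandwidth"]    # Byte / (Byte/s) -> s
    t_process = input_bytes * pd["cost_per_byte"]    # Byte * (s/Byte) -> s
    t_output = output_bytes / pd["output_bandwidth"]
    return t_input + t_process + t_output

def task_runtime(subtasks: list[tuple[float, float]], pd: dict) -> float:
    """The subtasks run in parallel, so the computing task's running time
    is the longest subtask running time."""
    return max(subtask_runtime(i, o, pd) for i, o in subtasks)

pd = {"input_bandwidth": 1e9, "output_bandwidth": 1e9, "cost_per_byte": 1e-9}
print(task_runtime([(4096.0, 1024.0), (2048.0, 512.0)], pd))
```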
The embodiment of the invention automatically parses the deep learning neural network model to form at least two computing tasks, further divides the computing tasks according to the pre-configured resource allocation strategies to form computing subtasks, allocates resources to the computing subtasks under the different resource allocation strategies to obtain the load matrix under the different strategies, and calculates the running time of each computing subtask under the different strategies based on the performance parameter set of the chip to be evaluated, thereby determining the performance matrix used to evaluate the performance of the chip running the network model. This solves the problems of high economic cost and low efficiency when simulating the running of a network model on a chip in the prior art, and improves the speed of simulating the performance of a chip running a deep learning network model.
Example Two
Fig. 2 is a flowchart of a deep learning neural network model load calculation method according to Example Two of the present invention. This embodiment is refined on the basis of the above embodiment: parsing the pre-constructed network model and decomposing the computing flow of the network model into at least two computing tasks is embodied as parsing the network model to determine its hierarchical structure, where the hierarchical structure of the network model includes at least two layers, and taking the data processing operation associated with each layer as one computing task to form at least two computing tasks.
Specifically, the method of this embodiment specifically includes:
s210, analyzing a pre-constructed network model, and determining the hierarchical structure of the network model, wherein the hierarchical structure of the network model comprises at least two layers.
The hierarchical structure of the network model includes at least two layers, so the computing flow of the network model can be decomposed into at least two computing tasks.
The network model, the computing task, the dependency relationship, the resource allocation policy, the computing subtask, the allocation data set, the performance parameter set, and the performance matrix in this embodiment may refer to the description of the foregoing embodiments.
S220, taking the data processing operation associated with each layer as one computing task to form at least two computing tasks, wherein the at least two computing tasks have a dependency relationship.
The data processing operation associated with each layer is one computing task; that is, the data processing operation executed by one node is one computing task. The result of the data processing performed by one node is transmitted as input to the next node, so that the next node continues its data processing operation. Correspondingly, the calculation result of one computing task serves as the input data of the next computing task. The computing flow of the network model is thereby mapped onto a plurality of computing tasks with dependency relationships.
And S230, dividing each computing task according to at least one pre-configured resource allocation strategy to form at least one computing subtask.
Optionally, dividing each computing task according to at least one pre-configured resource allocation strategy to form at least one computing subtask includes: determining the allocation number of computing resources according to the at least one pre-configured resource allocation strategy, and dividing each computing task according to the allocation number to form at least one computing subtask, where the number of computing subtasks obtained by dividing each computing task is less than or equal to the allocation number.
Specifically, the allocation number of computing resources may refer to the number of computing units the chip can invoke. Generally, the greater the allocation number, the more computing subtasks a task is divided into. The allocation number determines the division of the computing subtasks: the computing resources can be equally divided into computing units whose count equals the allocation number, so the allocation number equals the number of computing units available for allocation under each resource allocation strategy, and is also the maximum number of computing units that can be allocated under each strategy. It will be appreciated that the allocation number of computing resources reflects the running capability of the chip and can therefore be obtained from the chip's performance parameter set.
At least one computing unit is required for each computing subtask to perform its data processing; the computing task can therefore be divided into at most the allocation number of computing subtasks. Generally, one computing subtask is executed by one computing unit. The allocation number can be larger than the number of computing subtasks, in which case at least one computing unit is idle and does not execute a computing subtask.
Determining the allocation number through the resource allocation strategy, and determining the number of computing subtasks from the allocation number, adapts the divided computing subtasks to the computing resources and guarantees that the computing subtasks are executed correctly, thereby ensuring correct simulated running of the network model.
Optionally, before dividing each computing task according to at least one pre-configured resource allocation strategy to form at least one computing subtask, the method further includes: receiving at least one resource configuration table; parsing each resource configuration table to obtain the combination relationship between the computing resources and the storage resources corresponding to the table; and using the combination relationship between the computing resources and the storage resources corresponding to one resource configuration table as one resource allocation strategy.
The resource configuration table records the combination relationship between the computing resources and the storage resources. The combination relationship is used to determine the computing resources and storage resources allocated to each computing subtask, i.e., to describe the relationship between the computing units and the storage units that are each mapped to the same computing subtask.
In fact, a resource configuration table allocates computing resources and storage resources to each computing subtask, and the computing resources and storage resources allocated to the same computing subtask stand in a combination relationship with one another.
Illustratively, the resource configuration table is as follows:
…
Accordingly, the resource allocation strategy of the Q computing subtasks of the computing task over the computing resources and the storage resources is $\{task_{step}^q \mapsto (C_q, D_q)\}_{q=1}^{Q}$, where $C_q \subseteq C$ and $D_q \subseteq D$ denote the computing units and storage units allocated to computing subtask $task_{step}^q$.
the resource allocation table may be input by a tester to specify the resource allocation policy to be tested. And analyzing the received resource configuration table input by the user to obtain a resource allocation strategy.
All allocation strategies (i.e., all ways of allocating resources to the individual computing subtasks) are exhaustively enumerated. Suppose K allocation methods are obtained; a load matrix describing the chip load condition at each step of the whole network model can then be obtained, with one row per computing task and one column per resource allocation strategy.
Determining the resource allocation of each computing subtask by receiving and parsing the resource configuration tables realizes flexible resource allocation, covers more chip load conditions, enlarges the test range of the chip, and improves the accuracy of the chip performance test.
Optionally, the allocation number of computing resources included in each resource configuration table is equal to the number of processors in the chip performance parameter set; that is, the number of computing units included in the computing resources equals the number of processors. Making the allocation number of computing resources equal to the number of processors in the chip adapts the allocation to the chip's processor configuration, improves the rationality of resource allocation, and improves the accuracy of the chip performance test.
Optionally, allocating resources to all computing subtasks associated with each computing task according to each resource allocation strategy to obtain the allocation data set of the computing task under each resource allocation strategy includes: traversing the resource configuration tables, starting from a target resource configuration table, until all resource configuration tables have been traversed; during the traversal of a resource configuration table, traversing the computing subtasks of the computing task, starting from a target computing subtask; for the current computing subtask, selecting a target computing resource from the combination relationship corresponding to the resource configuration table and acquiring the corresponding target storage resource; establishing a correspondence among the current computing subtask, the target computing resource and at least one target storage resource, until all computing subtasks have been traversed, where the computing resources corresponding to the different computing subtasks of the computing task are different and the storage resources corresponding to the different computing subtasks are different; generating the allocation data of the computing task under the resource configuration table according to the computing resources and storage resources corresponding to the computing subtasks; and, after all resource configuration tables have been traversed, obtaining the allocation data set of the computing task under the resource configuration tables.
Specifically, the target resource configuration table is one of the resource configuration tables. For example, it may be any randomly selected resource configuration table; or all resource configuration tables may be numbered and selected in numbered order, e.g., the table numbered 1 is selected first.
The allocation data is used for describing resource allocation conditions of each computing subtask obtained by computing task decomposition under different resource allocation tables, namely, used for determining corresponding computing resources and storage resources of each computing subtask under different resource allocation tables.
Starting from the target resource configuration table, the resource configuration tables are traversed in order to obtain the resource allocation condition under each resource configuration table.
All the computing subtasks of the current computing task are traversed, and a correspondence is established between each computing subtask and its computing resources and storage resources, thereby allocating computing resources and storage resources to every computing subtask obtained by decomposing the current computing task and reducing omissions.
The above steps are repeated until all resource configuration tables have been traversed. The allocation data of each computing subtask is thereby obtained, forming the allocation data set of the computing task under the different resource configuration tables and determining the resource allocation condition of each computing subtask under each resource configuration table.
For example, the allocation data set of the computing task at step one under one resource configuration table is as follows:
by traversing the resource allocation table and respectively establishing the corresponding relation between all the calculation subtasks obtained by the calculation tasks and the calculation resources and the storage resources allocated by the current resource allocation table when the current resource allocation table is traversed, resource allocation of the calculation subtasks based on the current resource allocation table is realized, the calculation subtasks can be allocated to the calculation resources and the storage resources, flexible resource allocation is realized, the accuracy of resource allocation is improved, and the accuracy of performance test of the chip is improved.
S240, respectively allocating resources to all computing subtasks associated with the computing task according to each resource allocation strategy to obtain an allocation data set of the computing task under each resource allocation strategy; the resources include computing resources and storage resources.
And S250, aggregating the allocation data sets of each computing task under each resource allocation strategy to form a load matrix of the network model.
Following the previous example, the allocation data sets of the computing task at each step under the K resource configuration tables are aggregated to obtain the load matrix: an S × K matrix whose element in row step and column k is the allocation data of computing task $task_{step}$ under the k-th resource configuration table.
and S260, calculating the running time of each calculation subtask obtained by decomposing the network model under each resource allocation strategy according to the performance parameter set of the chip to be evaluated and the load matrix, and determining the performance matrix so as to evaluate the performance of the chip for running the network model.
Illustratively, the set of performance parameters is PD.
The running time of the computing task at each step under the k-th resource configuration table is calculated based on the following formula:

$$t_{step,k} = \max_{q = 1, \dots, Q} \left( T_{input}^{(q)} + T_{process}^{(q)} + T_{output}^{(q)} \right)$$

The elements sharing the same step index represent the running times of the step = (0, 1, …, S-1) computing tasks, obtained by decomposing the computing flow of the whole network model, under the K different resource allocation strategies. Each column of elements $(t_{0,k}, t_{1,k}, \dots, t_{S-1,k})$ represents, under the k-th resource allocation strategy, the running time of each computing task of the whole neural network item by item; since the computing tasks have dependency relationships and run sequentially, the total running time of the network model under the k-th allocation strategy can be obtained by summing the column:

$$T_k = \sum_{step=0}^{S-1} t_{step,k}$$
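Reusing `task_runtime` from the earlier sketch, the performance matrix and the per-strategy totals can be written as follows; the layout of the load matrix here (a list of subtask input/output sizes per task and strategy) is an assumption for illustration:

```python
def performance_matrix(load_matrix, pd):
    """load_matrix[step][k] holds the (input_bytes, output_bytes) pairs of
    the subtasks of computing task `step` under strategy k; the performance
    matrix holds the corresponding running times t[step][k]."""
    return [[task_runtime(subtasks, pd) for subtasks in row] for row in load_matrix]

def total_runtimes(perf):
    """The computing tasks depend on one another and run sequentially, so
    the network model's total running time under strategy k is the sum of
    column k."""
    return [sum(row[k] for row in perf) for k in range(len(perf[0]))]
```

The column with the smallest total then identifies the resource allocation strategy under which the chip runs the network model fastest.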
Optionally, the calculating, according to the performance parameter set of the chip to be evaluated and the load matrix, of the running time of each computing subtask obtained by decomposing the network model under each resource allocation strategy to determine the performance matrix includes: calculating, according to the performance parameter set of the chip to be evaluated, the input data transfer time, input data processing time and result data transfer time of each computing subtask in the load matrix; taking the sum of the input data transfer time, input data processing time and result data transfer time of a computing subtask as the running time of that computing subtask under the resource allocation strategy; and forming the performance matrix of the network model according to the running time of each computing subtask under each resource allocation strategy.
The embodiment of the invention parses the network structure of the network model, determines the layers in the hierarchical structure, and takes the data processing operation associated with each layer as one computing task to form at least two computing tasks. The computing flow of the network model is thereby decomposed into computing tasks automatically, without depending on specific hardware to support simulation of the network model, which reduces the cost of simulating the network model; meanwhile, since the computing tasks are obtained by dividing the network structure, the accuracy of decomposing the computing flow of the network model is improved.
Example Three
Fig. 3 is a schematic structural diagram of a deep learning neural network model load calculation apparatus in Example Three of the present invention. This embodiment provides the apparatus corresponding to the deep learning neural network model load calculation method provided in the above embodiments of the present invention; the apparatus can be implemented in software and/or hardware and can be integrated into a computer device or the like.
Accordingly, the apparatus of the present embodiment may include:
the calculation task analysis module 310 is configured to analyze a pre-constructed network model, and decompose a calculation flow of the network model into at least two calculation tasks; wherein the at least two computing tasks have a dependency relationship;
the computing task dividing module 320 is configured to divide each computing task according to at least one pre-configured resource allocation strategy to form at least one computing subtask;
a resource allocation module 330, configured to allocate resources to all computation subtasks associated with the computation task according to each resource allocation policy, respectively, to obtain an allocation data set of the computation task under each resource allocation policy; the resources include computing resources and storage resources;
a load matrix generation module 340, configured to count an allocation data set of each computing task under each resource allocation policy, and form a load matrix of the network model;
and a performance matrix calculation module 350, configured to calculate, according to the performance parameter set of the chip to be evaluated and the load matrix, an operation time of each computation subtask obtained by decomposing the network model under each resource allocation policy, and determine a performance matrix, so as to evaluate performance of the chip in operating the network model.
The embodiment of the invention automatically parses the deep learning neural network model to form at least two computing tasks, further divides the computing tasks according to the pre-configured resource allocation strategies to form computing subtasks, allocates resources to the computing subtasks under the different resource allocation strategies to obtain the load matrix under the different strategies, and calculates the running time of each computing subtask under the different strategies based on the performance parameter set of the chip to be evaluated, thereby determining the performance matrix used to evaluate the performance of the chip running the network model. This solves the problems of high economic cost and low efficiency when simulating the running of a network model on a chip in the prior art, and improves the speed of simulating the performance of a chip running a deep learning network model.
Further, the computing task analysis module 310 includes a network model hierarchy analysis unit configured to: parse the network model to determine the hierarchical structure of the network model, where the hierarchical structure of the network model includes at least two layers; and form at least two computing tasks by taking the data processing operation associated with each layer as one computing task.
Further, the computing task dividing module 320 includes a resource allocation strategy dividing unit configured to: determine the allocation number of computing resources according to at least one pre-configured resource allocation strategy; and divide each computing task according to the allocation number to form at least one computing subtask, where the number of computing subtasks obtained by dividing each computing task is less than or equal to the allocation number.
Further, the deep learning neural network model load calculation apparatus further includes a resource configuration table receiving module configured to: before each computing task is divided according to at least one pre-configured resource allocation strategy to form at least one computing subtask, receive and parse at least one resource configuration table to obtain the combination relationship between the computing resources and the storage resources corresponding to each resource configuration table; and use the combination relationship between the computing resources and the storage resources corresponding to one resource configuration table as one resource allocation strategy.
Further, the resource allocation module 330 includes a resource configuration table traversal parsing unit configured to: traverse the resource configuration tables starting from a target resource configuration table until all resource configuration tables have been traversed; during the traversal of a resource configuration table, traverse the computing subtasks of the computing task starting from a target computing subtask; for the current computing subtask, select a target computing resource from the combination relationship corresponding to the resource configuration table and acquire the corresponding target storage resource; establish a correspondence among the current computing subtask, the target computing resource and at least one target storage resource until all computing subtasks have been traversed, where the computing resources corresponding to the different computing subtasks of the computing task are different and the storage resources corresponding to the different computing subtasks are different; generate the allocation data of the computing task under the resource configuration table according to the computing resources and the storage resources corresponding to the computing subtasks; and, after all resource configuration tables have been traversed, obtain the allocation data set of the computing task under the resource configuration tables.
Further, the performance matrix calculation module 350 includes a running time calculation unit configured to: calculate, according to the performance parameter set of the chip to be evaluated, the input data transfer time, input data processing time and result data transfer time of each computing subtask in the load matrix under each resource allocation strategy; take the sum of the input data transfer time, input data processing time and result data transfer time of a computing subtask under a resource allocation strategy as the running time of that computing subtask under the strategy; and form the performance matrix of the network model according to the running time of each computing subtask under each resource allocation strategy.
Further, the allocation number of computing resources included in each resource configuration table is equal to the number of processors in the chip performance parameter set.
The deep learning neural network model load calculation device can execute the deep learning neural network model load calculation method provided by the embodiment of the invention, and has the corresponding functional modules and beneficial effects of the executed deep learning neural network model load calculation method.
Example Four
Fig. 4 is a schematic structural diagram of a computer device in Example Four of the present invention. Fig. 4 shows a block diagram of an exemplary computer device 12 suitable for implementing the embodiments of the present invention; the computer device 12 shown in Fig. 4 is only an example and should not impose any limitation on the function and scope of use of the embodiments of the present invention.
As shown in Fig. 4, the computer device 12 takes the form of a general-purpose computing device. The components of the computer device 12 may include, but are not limited to: one or more processors or processing units 16, a system memory 28, and a bus 18 that couples the various system components, including the system memory 28, to the processing unit 16.
A program/utility 40 having a set (at least one) of program modules 42 may be stored, for example, in the system memory 28. Such program modules 42 include, but are not limited to, an operating system, one or more application programs, other program modules, and program data; each of these examples, or some combination thereof, may include an implementation of a network environment.
The processing unit 16 executes the programs stored in the system memory 28 to perform various functional applications and data processing, for example implementing the deep learning neural network model load calculation method provided by any embodiment of the present invention.
Example Five
Example Five of the present invention provides a computer-readable storage medium on which a computer program is stored. When executed by a processor, the computer program implements the deep learning neural network model load calculation method provided by all the inventive embodiments of the present application, the method comprising: parsing a pre-constructed network model and decomposing the computing flow of the network model into at least two computing tasks, where the at least two computing tasks have a dependency relationship; dividing each computing task according to at least one pre-configured resource allocation strategy to form at least one computing subtask; respectively allocating resources to all computing subtasks associated with each computing task according to each resource allocation strategy to obtain an allocation data set of the computing task under each resource allocation strategy, where the resources include computing resources and storage resources; aggregating the allocation data sets of each computing task under each resource allocation strategy to form a load matrix of the network model; and calculating, according to the performance parameter set of the chip to be evaluated and the load matrix, the running time of each computing subtask obtained by decomposing the network model under each resource allocation strategy, and determining the performance matrix.
More specific examples (a non-exhaustive list) of the computer readable storage medium include: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a Read-Only Memory (ROM), an Erasable Programmable Read-Only Memory (EPROM), a flash memory, an optical fiber, a portable CD-ROM, an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
A computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave.
Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, Radio Frequency (RF), etc., or any suitable combination of the foregoing.
Computer program code for carrying out operations for aspects of the present invention may be written in one or more programming languages, or a combination thereof, including object oriented programming languages such as Java, Smalltalk and C++, as well as conventional procedural programming languages such as the "C" programming language or similar programming languages.
It is to be noted that the foregoing is only illustrative of the preferred embodiments of the present invention and the technical principles employed. It will be understood by those skilled in the art that the present invention is not limited to the particular embodiments described herein, but is capable of various obvious changes, rearrangements and substitutions as will now become apparent to those skilled in the art without departing from the scope of the invention. Therefore, although the present invention has been described in greater detail by the above embodiments, the present invention is not limited to the above embodiments, and may include other equivalent embodiments without departing from the spirit of the present invention, and the scope of the present invention is determined by the scope of the appended claims.
Claims (10)
1. A deep learning neural network model load calculation method, characterized by comprising:
analyzing a pre-constructed network model, and decomposing a calculation process of the network model into at least two calculation tasks, wherein the at least two calculation tasks have a dependency relationship, and the network model is a deep learning neural network model;
dividing each computing task according to at least one pre-configured resource allocation strategy to form at least one computing subtask;
allocating resources to all computing subtasks associated with the computing task according to each resource allocation strategy to obtain an allocation data set of the computing task under each resource allocation strategy, wherein the resources include computing resources and storage resources;
counting the allocation data sets of each calculation task under each resource allocation strategy to form a load matrix of the network model;
and calculating, according to the performance parameter set of the chip to be evaluated and the load matrix, the running time of each calculation subtask obtained by decomposing the network model under each resource allocation strategy, and determining the performance matrix so as to evaluate the performance of the chip in running the network model.
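Read procedurally, claim 1 corresponds to a pipeline of roughly the following shape. This is a hedged sketch only: the helper functions (`decompose`, `divide`, `allocate`, `performance_matrix`) are hypothetical stand-ins for the steps recited above, not the claimed implementation.

```python
def evaluate_chip(model, policies, chip_params):
    # policies: assumed mapping from policy id -> resource allocation strategy
    tasks = decompose(model)                   # step 1: at least two dependent tasks
    load_matrix = {}
    for pid, policy in policies.items():
        allocations = []
        for task in tasks:
            subtasks = divide(task, policy)    # step 2: at least one subtask per task
            allocations.append(allocate(subtasks, policy))  # step 3: compute + storage
        load_matrix[pid] = allocations         # step 4: the load matrix, per policy
    return performance_matrix(load_matrix, chip_params)     # step 5: performance matrix
```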
2. The method of claim 1, wherein the parsing the pre-constructed network model to decompose the computation flow of the network model into at least two computation tasks comprises:
analyzing the network model to determine a hierarchical structure of the network model, wherein the hierarchical structure of the network model comprises at least two layers;
and taking the data processing operations associated with each layer as calculation tasks to form the at least two calculation tasks.
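A minimal sketch of this layer-to-task decomposition, assuming a hypothetical `parse_hierarchy` helper that returns the model's layers in execution order:

```python
def decompose(model):
    # One computing task per layer-level data processing operation; the layer
    # order induces the dependency relationship between consecutive tasks.
    layers = parse_hierarchy(model)            # hypothetical parser, ordered layers
    tasks = []
    for i, layer in enumerate(layers):
        tasks.append({
            "name": layer.name,
            "op": layer.operation,             # e.g. convolution, pooling
            "depends_on": layers[i - 1].name if i > 0 else None,
        })
    return tasks
```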
3. The method of claim 2, wherein the dividing of each computing task according to the at least one pre-configured resource allocation policy to form at least one computing subtask comprises:
determining the allocation quantity of computing resources according to the at least one pre-configured resource allocation strategy;
and dividing the calculation task according to the allocation quantity to form at least one calculation subtask, wherein the number of calculation subtasks obtained by dividing each calculation task is less than or equal to the allocation quantity.
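As an illustration of claim 3's division step (with a hypothetical scalar `units` standing in for a task's workload), the work can be spread evenly so that the subtask count never exceeds the allocation quantity:

```python
def divide(task, policy):
    # Split the task's work into at most policy["num_compute"] subtasks; the
    # number of subtasks is <= the allocated amount of computing resources.
    n = min(policy["num_compute"], task["units"])
    base, extra = divmod(task["units"], n)     # even split, remainder spread first
    return [{"parent": task["name"], "units": base + (1 if i < extra else 0)}
            for i in range(n)]
```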
4. The method of claim 3, further comprising, before dividing each computing task according to the at least one pre-configured resource allocation policy to form at least one computing subtask:
receiving at least one resource allocation table, and analyzing it to obtain the combination relationship between the computing resources and the storage resources corresponding to each resource allocation table;
and using the combination relationship between the computing resources and the storage resources corresponding to each resource allocation table as a resource allocation strategy.
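A sketch of how a received resource allocation table might be parsed into the combination relationship that serves as one strategy; the row layout here is an assumption for illustration, not a format specified by the patent:

```python
def parse_allocation_table(rows):
    # Each row pairs one computing resource with the storage resources it may
    # use; the resulting mapping is one resource allocation strategy.
    return {row["compute_id"]: list(row["storage_ids"]) for row in rows}

# Hypothetical usage:
strategy = parse_allocation_table([
    {"compute_id": "core0", "storage_ids": ["sram0"]},
    {"compute_id": "core1", "storage_ids": ["sram1", "dram0"]},
])
```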
5. The method according to claim 4, wherein the allocating resources to all the computing subtasks associated with the computing task according to each of the resource allocation policies to obtain an allocation data set of the computing task under each of the resource allocation policies comprises:
traversing each resource allocation table, starting from a target resource allocation table among the resource allocation tables, until all resource allocation tables are traversed;
during the traversal of a resource allocation table, traversing each computing subtask of the computing task, starting from a target computing subtask; for the current computing subtask traversed, selecting a target computing resource from the combination relationship corresponding to the resource allocation table, acquiring the corresponding target storage resource, and establishing a correspondence among the current computing subtask, the target computing resource, and at least one target storage resource, until all computing subtasks are traversed;
wherein the computing resources corresponding to the respective computing subtasks of the computing task are different from one another, and the storage resources corresponding to the respective computing subtasks of the computing task are different from one another;
generating an allocation data set of the computing task under the resource allocation table according to the computing resources corresponding to the computing subtasks and the corresponding storage resources;
and acquiring an allocation data set of the computing task under each resource allocation table after all the resource allocation tables are traversed.
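The traversal in claim 5 can be sketched as follows (hypothetical names); claim 3 guarantees that the subtask count does not exceed the number of computing resources, so each subtask can receive a distinct one:

```python
def allocate(subtasks, strategy):
    # Give each subtask a distinct computing resource and the storage
    # resources associated with it in the combination relationship.
    compute_ids = iter(strategy)
    allocation = []
    for subtask in subtasks:
        cid = next(compute_ids)                # distinct compute resource per subtask
        allocation.append({"subtask": subtask,
                           "compute": cid,
                           "storage": strategy[cid]})
    return allocation
```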
6. The method according to claim 1, wherein calculating, according to the performance parameter set of the chip to be evaluated and the load matrix, the running time of each calculation subtask obtained by decomposing the network model under each resource allocation policy, and determining the performance matrix comprises:
calculating, according to the performance parameter set of the chip to be evaluated, the input data transfer time, the input data processing consumption time, and the result data transfer time of each calculation subtask in the load matrix under each resource allocation strategy;
taking the sum of the input data transfer time, the input data processing consumption time and the result data transfer time of the calculation subtasks under the resource allocation strategy as the running time of the calculation subtasks under the resource allocation strategy;
and forming a performance matrix of the network model according to the running time of each computing sub-task under each resource allocation strategy.
7. The method of claim 4, wherein the allocated amount of computing resources included in each of the resource allocation tables is equal to the number of processors in the set of chip performance parameters.
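Claim 7's consistency condition lends itself to a one-line check; `num_processors` is an assumed field of the chip performance parameter set, introduced only for this sketch:

```python
def check_strategy(strategy, chip):
    # The number of allocated computing resources must equal the processor
    # count in the chip performance parameter set (claim 7).
    assert len(strategy) == chip["num_processors"], "allocation/processor mismatch"
```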
8. A deep learning neural network model load calculation device, comprising:
the computing task analysis module is used for analyzing a pre-constructed network model and decomposing the computing process of the network model into at least two computing tasks; wherein, the at least two calculation tasks have a dependency relationship, and the network model is a deep learning neural network model;
the computing task dividing module is used for dividing each computing task according to at least one pre-configured resource allocation strategy to form at least one computing subtask;
the resource allocation module is used for allocating resources to all the computing subtasks associated with the computing task according to each resource allocation strategy to obtain an allocation data set of the computing task under each resource allocation strategy; the resources include computing resources and storage resources;
a load matrix generation module, configured to count an allocation data set of each computation task under each resource allocation policy to form a load matrix of the network model;
and the performance matrix calculation module is used for calculating the running time of each calculation subtask obtained by decomposing the network model under each resource allocation strategy according to the performance parameter set of the chip to be evaluated and the load matrix, and determining the performance matrix so as to evaluate the performance of the chip for running the network model.
9. A computer device comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, characterized in that the processor implements the deep learning neural network model load calculation method according to any one of claims 1-7 when executing the program.
10. A computer-readable storage medium, on which a computer program is stored, wherein the program, when executed by a processor, implements the deep learning neural network model load calculation method according to any one of claims 1-7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201911008660.3A CN110515739B (en) | 2019-10-23 | 2019-10-23 | Deep learning neural network model load calculation method, device, equipment and medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110515739A CN110515739A (en) | 2019-11-29 |
CN110515739B (en) | 2020-01-31
Family
ID=68633608
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201911008660.3A Active CN110515739B (en) | 2019-10-23 | 2019-10-23 | Deep learning neural network model load calculation method, device, equipment and medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110515739B (en) |
Families Citing this family (26)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111158901B (en) * | 2019-12-09 | 2023-09-08 | 爱芯元智半导体(宁波)有限公司 | Optimization method, optimization device, computer equipment and storage medium for calculation graph |
CN111047017B (en) * | 2019-12-18 | 2023-06-23 | 北京安兔兔科技有限公司 | Neural network algorithm evaluation method and device and electronic equipment |
CN111162946B (en) * | 2019-12-30 | 2022-07-12 | 北京奇艺世纪科技有限公司 | Method for constructing model inference network, data processing method, data processing device and storage medium |
CN111340237B (en) * | 2020-03-05 | 2024-04-26 | 腾讯科技(深圳)有限公司 | Data processing and model running method, device and computer equipment |
CN111860758B (en) * | 2020-04-07 | 2024-05-03 | 北京嘀嘀无限科技发展有限公司 | Deep learning model operation method and device, electronic equipment and medium |
CN111738434B (en) * | 2020-06-03 | 2023-04-07 | 中国科学院计算技术研究所 | Method for executing deep neural network on heterogeneous processing unit |
CN111753973B (en) * | 2020-06-22 | 2024-11-26 | 深圳鲲云信息科技有限公司 | A neural network chip optimization method, system, device and storage medium |
CN111858070B (en) * | 2020-08-05 | 2023-12-01 | 中国工商银行股份有限公司 | Computing resource allocation method, device, equipment and storage medium |
CN112036559B (en) * | 2020-08-26 | 2024-12-27 | 北京灵汐科技有限公司 | Neural network structure division method, device, computer equipment and storage medium |
WO2022042519A1 (en) * | 2020-08-27 | 2022-03-03 | 北京灵汐科技有限公司 | Resource allocation method and apparatus, and computer device and computer-readable storage medium |
CN111984423B (en) * | 2020-09-02 | 2024-09-03 | 北京小米松果电子有限公司 | Method, device and medium for running deep learning model |
WO2022116142A1 (en) * | 2020-12-04 | 2022-06-09 | 深圳大学 | Resource scheduling method based on graph neural network |
CN112598112B (en) * | 2020-12-04 | 2021-09-10 | 深圳大学 | Resource scheduling method based on graph neural network |
US20220188620A1 (en) * | 2020-12-10 | 2022-06-16 | International Business Machines Corporation | Time estimator for deep learning architecture |
CN113268404B (en) * | 2021-05-28 | 2024-08-06 | 曙光信息产业(北京)有限公司 | Performance analysis and optimization method and device, computer equipment and storage medium |
CN113741932A (en) * | 2021-08-19 | 2021-12-03 | 浙江大华技术股份有限公司 | Intelligent identification algorithm upgrading method and device for equipment and electronic device |
CN113884857B (en) * | 2021-09-29 | 2024-03-08 | 上海阵量智能科技有限公司 | Chip, chip pressure testing method and device, electronic equipment and storage medium |
CN114020450A (en) * | 2021-10-08 | 2022-02-08 | 深圳云天励飞技术股份有限公司 | Neural network model execution method, device, system and electronic equipment |
KR20230142336A (en) * | 2022-04-01 | 2023-10-11 | 리벨리온 주식회사 | Method for measuring performance of neural processing device and Device for measuring performance |
CN114721802A (en) * | 2022-04-12 | 2022-07-08 | 北京灵汐科技有限公司 | Resource scheduling method, scheduling device and processing core |
CN117521841A (en) * | 2022-07-28 | 2024-02-06 | 华为技术有限公司 | Deep learning system and method |
CN116501594B (en) * | 2023-06-27 | 2023-09-08 | 上海燧原科技有限公司 | System modeling evaluation method and device, electronic equipment and storage medium |
CN116501505B (en) * | 2023-06-27 | 2023-09-12 | 上海燧原科技有限公司 | Method, device, equipment and medium for generating data stream of load task |
CN116737605B (en) * | 2023-08-11 | 2023-11-14 | 上海燧原科技有限公司 | Data prefetching method, device, equipment and medium based on chip multilevel storage |
CN118798275B (en) * | 2024-09-10 | 2025-02-25 | 中昊芯英(杭州)科技有限公司 | Model calculation method and related device |
CN119248499A (en) * | 2024-09-29 | 2025-01-03 | 上海稀宇极智科技有限公司 | Task processing load analysis method and device |
Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110333945A (en) * | 2019-05-09 | 2019-10-15 | 成都信息工程大学 | A dynamic load balancing method, system and terminal |
Family Cites Families (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8515885B2 (en) * | 2010-10-29 | 2013-08-20 | International Business Machines Corporation | Neuromorphic and synaptronic spiking neural network with synaptic weights learned using simulation |
CN106649060A (en) * | 2015-11-02 | 2017-05-10 | 中国移动通信集团公司 | Equipment performance testing method and device |
US10019668B1 (en) * | 2017-05-19 | 2018-07-10 | Google Llc | Scheduling neural network processing |
CN108197083B (en) * | 2018-01-31 | 2021-04-13 | 湖南农业大学 | A Data Center Workload Prediction Method Based on Wavelet Neural Network Fusion Linear Regression |
CN109901878B (en) * | 2019-02-25 | 2021-07-23 | 北京灵汐科技有限公司 | Brain-like computing chip and computing equipment |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||