
CN110515739B - Deep learning neural network model load calculation method, device, equipment and medium

Info

Publication number
CN110515739B
CN110515739B
Authority
CN
China
Prior art keywords
computing
resource allocation
calculation
network model
task
Prior art date
Legal status
Active
Application number
CN201911008660.3A
Other languages
Chinese (zh)
Other versions
CN110515739A (en)
Inventor
黎兴民
Current Assignee
Shanghai Suiyuan Intelligent Technology Co Ltd
Original Assignee
Shanghai Suiyuan Intelligent Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Shanghai Suiyuan Intelligent Technology Co Ltd filed Critical Shanghai Suiyuan Intelligent Technology Co Ltd
Priority to CN201911008660.3A
Publication of CN110515739A
Application granted
Publication of CN110515739B


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005 Allocation of resources to service a request
    • G06F9/5027 Allocation of resources to service a request, the resource being a machine, e.g. CPUs, Servers, Terminals
    • G06F9/5038 Allocation of resources considering the execution order of a plurality of tasks, e.g. taking priority or time dependency constraints into consideration
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/10 Interfaces, programming languages or software development kits, e.g. for simulating neural networks
    • G06F2209/00 Indexing scheme relating to G06F9/00
    • G06F2209/50 Indexing scheme relating to G06F9/50
    • G06F2209/5017 Task decomposition

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Mathematical Physics (AREA)
  • Debugging And Monitoring (AREA)

Abstract

The embodiment of the invention discloses a load calculation method, apparatus, device and medium for deep learning neural network models. The method comprises: analyzing a pre-constructed network model and decomposing the calculation flow of the network model into at least two calculation tasks; dividing each calculation task to form at least one calculation subtask; allocating resources to all calculation subtasks associated with each calculation task according to each resource allocation strategy to obtain an allocation data set of the calculation task under each resource allocation strategy; aggregating the allocation data sets of the calculation tasks under each resource allocation strategy to form a load matrix of the network model; and calculating the running time of each calculation subtask according to a performance parameter set of a chip to be evaluated, so as to determine a performance matrix.

Description

Deep learning neural network model load calculation method, device, equipment and medium
Technical Field
The embodiment of the invention relates to the field of data processing, and in particular to a deep learning neural network model load calculation method, apparatus, device and medium.
Background
The rapid development of the artificial intelligence industry has placed higher demands on the computing power of computers, and major semiconductor manufacturers are actively developing and launching special-purpose chips for accelerating deep learning training and inference.
Chip development and manufacturing is a relatively long process. In general, verification of the rationality of the chip architecture design and evaluation of its computing performance can only be performed after small-batch production and sample acquisition, which greatly lengthens the iterative cycle of product development and may even delay the time to market indefinitely, which is unacceptable for semiconductor manufacturers.
The existing solution is to simulate the chip architecture with a dedicated server, whose vendor provides a complete set of matched software and hardware, and to perform performance verification of the chip on that basis. However, this solution is expensive, and the simulation software runs slowly: even simple test samples generally need to run for hours. In addition, for the verification of an accelerator chip architecture supporting parallel computing, different splittings of the computing tasks and different scheduling strategies for the on-chip hardware resources lead to different chip operating loads and hence different performance; trial and exploration of these strategies helps to find structural defects early in the chip design.
Disclosure of Invention
The embodiment of the invention provides a deep learning neural network model load calculation method, apparatus, device and medium, which can improve the speed of simulating the performance of a chip running a deep learning network model.
In a first aspect, an embodiment of the present invention provides a method for calculating the load of a deep learning neural network model, including:
analyzing a pre-constructed network model, and decomposing a calculation process of the network model into at least two calculation tasks; wherein the at least two computing tasks have a dependency relationship;
dividing each computing task according to at least one pre-configured resource allocation strategy to form at least one computing subtask;
allocating resources to all computing subtasks associated with each computing task according to each resource allocation strategy, to obtain an allocation data set of the computing task under each resource allocation strategy; the resources include computing resources and storage resources;
aggregating the allocation data sets of each calculation task under each resource allocation strategy to form a load matrix of the network model;
and calculating the running time of each calculation subtask obtained by decomposing the network model under each resource allocation strategy according to the performance parameter set of the chip to be evaluated and the load matrix, and determining the performance matrix so as to evaluate the performance of the chip for running the network model.
In a second aspect, an embodiment of the present invention provides a deep learning neural network model load calculation apparatus, including:
the computing task analysis module is used for analyzing a pre-constructed network model and decomposing the computing process of the network model into at least two computing tasks; wherein the at least two computing tasks have a dependency relationship;
the computing task dividing module is used for dividing each computing task according to at least one pre-configured resource allocation strategy to form at least one computing subtask;
the resource allocation module is used for allocating resources to all the computing subtasks associated with each computing task according to each resource allocation strategy, to obtain an allocation data set of the computing task under each resource allocation strategy; the resources include computing resources and storage resources;
a load matrix generation module, configured to aggregate the allocation data sets of each computation task under each resource allocation policy to form a load matrix of the network model;
and the performance matrix calculation module is used for calculating the running time of each calculation subtask obtained by decomposing the network model under each resource allocation strategy according to the performance parameter set of the chip to be evaluated and the load matrix, and determining the performance matrix so as to evaluate the performance of the chip for running the network model.
In a third aspect, an embodiment of the present invention further provides a computer device, including a memory, a processor, and a computer program stored on the memory and executable on the processor, where the processor, when executing the program, implements the deep learning neural network model load calculation method according to any embodiment of the present invention.
In a fourth aspect, an embodiment of the present invention further provides a computer-readable storage medium on which a computer program is stored; when the computer program is executed by a processor, the deep learning neural network model load calculation method according to any embodiment of the present invention is implemented.
The embodiment of the invention automatically analyzes the deep learning neural network model to form at least two calculation tasks, further divides each calculation task according to the pre-configured resource allocation strategies to form calculation subtasks, and allocates resources to the calculation subtasks under the different resource allocation strategies to obtain a load matrix for each strategy. It then calculates the running time of each calculation subtask under the different resource allocation strategies based on the performance parameter set of the chip to be evaluated, thereby determining a performance matrix for evaluating the performance of the chip running the network model. This solves the problems of high economic cost and low efficiency of simulating a network model on a chip in the prior art, and improves the speed of simulating the performance of a chip running a deep learning network model.
Drawings
FIG. 1 is a flowchart of a deep learning neural network model load calculation method according to the first embodiment of the present invention;
FIG. 2 is a flowchart of a deep learning neural network model load calculation method according to the second embodiment of the present invention;
FIG. 3 is a schematic structural diagram of a deep learning neural network model load calculation apparatus according to the third embodiment of the present invention;
FIG. 4 is a schematic structural diagram of a computer device according to the fourth embodiment of the present invention.
Detailed Description
The present invention will now be described in further detail with reference to the drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and not restrictive of it. It should further be noted that, for ease of description, only the structures associated with the present invention, rather than all structures, are shown in the drawings.
Example One
Fig. 1 is a flowchart of a deep learning neural network model load calculation method according to the first embodiment. This embodiment is applicable to simulating the process of running a network model on a chip. The method can be executed by the deep learning neural network model load calculation apparatus provided in the embodiment of the present invention; the apparatus can be implemented in software and/or hardware and can be integrated into a computer device, such as a terminal device or a server. As shown in Fig. 1, the method of this embodiment specifically includes:
s110, analyzing a pre-constructed network model, and decomposing a calculation process of the network model into at least two calculation tasks; wherein the at least two computing tasks have a dependency relationship.
The network model may also be referred to as a deep learning neural network (Deep Learning Neural Network) model.
The computational flow of the network model is used to represent a plurality of successive computational steps that the network model needs to perform at runtime. Wherein the calculation flow may be converted into a plurality of successive calculation steps.
A computational task is used to represent a certain computational step or steps. Each computational task is different.
A plurality of computing tasks with dependency relationships are combined in order; that is, the computing flow of the network model is a sequence of computing tasks, and the position of each computing task in the sequence is its execution order.
Illustratively, the data processing operations may include padding (Padding), reshaping (Reshape), convolution (Convolution), and pooling (Pooling), among others.
The network model may be built through a predefined programming interface. The user can input data related to the network model through the predefined programming interface to build the neural network model. Illustratively, the network model is established as

$Net = \{L_0, L_1, \dots, L_{S-1}\}$

where $L_i$ represents one layer of the neural network.

It will be appreciated that the user may pass the network model to the programming interface, and the structure of the network model can be obtained via the programming interface; that is, the structure of each layer in the network model is determined and fixed during subsequent processing, so that the data processing of each layer can be used as one computational task of the network model, denoted $T_{step}$. A sequence of S consecutive computing tasks is thus obtained:

$Net = \{T_0, T_1, \dots, T_{S-1}\}, \quad step \in \{0, 1, \dots, S-1\}$

where, from $T_1$ to $T_{S-1}$, the input of computing task $T_{step}$ is the output of $T_{step-1}$; that is, the output of the previous computing task serves as the input of the next one. A dependency relationship therefore exists between adjacent computing tasks, and the S computing tasks are executed sequentially to ensure the correctness of the calculation result.
In fact, the computation tasks are divided from the viewpoint of the running time sequence of the network model; that is, the computation flow of the network model is partitioned along its timeline.
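As an illustration only, the task-sequence decomposition can be pictured in a few lines of Python (this sketch and all names in it are the editor's assumptions, not part of the patented method):

```python
from dataclasses import dataclass

@dataclass
class ComputeTask:
    step: int               # position in the execution order
    op: str                 # data processing operation, e.g. "Convolution"
    depends_on: int | None  # index of the task whose output feeds this one

def decompose(layers: list[str]) -> list[ComputeTask]:
    # Each layer becomes one computing task; task step-1 feeds task step.
    return [ComputeTask(step=i, op=op, depends_on=i - 1 if i > 0 else None)
            for i, op in enumerate(layers)]

tasks = decompose(["Padding", "Reshape", "Convolution", "Pooling"])
```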
And S120, dividing each computing task according to at least one pre-configured resource allocation strategy to form at least one computing subtask.
Generally, a resource allocation policy is used to allocate resources required for executing a computing task, and the resource allocation policy may refer to a resource allocation manner, where the resources may specifically include computing resources and storage resources. The computing resources are used to perform computing tasks. The storage resources are used for storing data associated with executing the computing task. The calculation sub-tasks are used for forming the calculation tasks and are part of calculation in the calculation tasks.
In practice, a computing task may be further divided, e.g. subdivided into a plurality of computing subtasks. Each computing subtask is distinct and independent, and together all subtasks form the complete computing task. One dividing manner is to divide the calculation amount of the computing task equally into n computing subtasks, where n is greater than or equal to 1 and may be set as needed; the embodiment of the present invention does not limit this. Exemplarily, if the computing task is convolution over 10 feature maps, it can be divided into 10 computing subtasks, each performing convolution on 1 feature map; or it can be divided into 5 computing subtasks, each performing convolution on 2 feature maps, with the feature maps handled by the different subtasks mutually distinct.
As in the previous example, the computing task $T_{step}$ is related to its computing subtasks as follows:

$T_{step} = \{t_{step}^0, t_{step}^1, \dots, t_{step}^{Q-1}\}$

where the $t_{step}^q$ are the computing subtasks. All subtasks carry the same subscript step, indicating that they jointly belong to the computing task $T_{step}$; the superscript distinguishes the Q items split from $T_{step}$ and indicates that the computing subtasks have no dependency relationships among themselves and exist in parallel.
The dividing manner of each computing task can be determined according to the resource allocation strategy; that is, the number of computing subtasks into which each computing task is divided (the value of n) is determined by the resource allocation strategy.
Specifically, if the computing resources comprise n computing units, the computing task is divided into n computing subtasks, so that each computing unit executes one computing subtask.
In addition, the space size of the storage resources can be used as a basis for dividing the computing subtasks; or the dividing manner can be determined jointly from the computing resources and the storage resources. The specific configuration may be set as required, and the embodiment of the present invention is not particularly limited.
In summary, a computing subtask is a unit task obtained by dividing a computing task into pieces that are executed in parallel at the same time.
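A minimal sketch of this equal division, assuming the workload can be expressed as a single number (names are illustrative, not from the patent):

```python
# t_step^q represented as (step, q, workload): equal shares of the parent task.
def split_task(step: int, total_work: float, q_count: int) -> list[tuple[int, int, float]]:
    share = total_work / q_count
    return [(step, q, share) for q in range(q_count)]

# e.g. convolution over 10 feature maps split into 5 subtasks of 2 maps each
subtasks = split_task(step=2, total_work=10, q_count=5)
```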
S130, respectively allocating resources to all the computing subtasks associated with the computing task according to each resource allocation strategy to obtain an allocation data set of the computing task under each resource allocation strategy; the resources include computing resources and storage resources.
The allocation data set describes the resource allocation of all the computing subtasks into which a computing task is divided, and records the mapping relationship between each of those subtasks and its resources.
The resource allocation for the computation subtasks is actually the allocation of computation resources and storage resources for the computation subtasks, that is, the allocation of processors and storage space for the computation subtasks.
In general, different resource allocation policies allocate resources differently: the computing resources and/or storage resources are quantized and assigned to the computing subtasks in different quantities.
Specifically, the computing resources may be equally divided into N computing units, i.e. the set of computing resources is

$C = \{c_0, c_1, \dots, c_{N-1}\}$

It should be noted that the computing resources generally refer to the processors on the chip, and one computing unit is an integer number of processors: a computing unit may contain one processor, two processors, or even more. The embodiment of the present invention is not particularly limited in this respect.

At the same time, the storage resources are evenly divided into M equal parts to obtain the storage resource set

$D = \{d_0, d_1, \dots, d_{M-1}\}$

The storage resources are the storage space allocated for running the network model on the chip, and this storage space can be divided equally.
In practice, a resource allocation policy specifies the number of computing units and the number of storage units allocated to each computing subtask; that is, it assigns each subtask $t_{step}^q$ a group of computing units $C_{step}^q \subseteq C$ and a group of storage units $D_{step}^q \subseteq D$ such that

$\bigcup_{q=0}^{Q-1} C_{step}^q = C, \quad \bigcup_{q=0}^{Q-1} D_{step}^q = D, \quad C_{step}^i \cap C_{step}^j = \emptyset \text{ and } D_{step}^i \cap D_{step}^j = \emptyset \ (i \neq j)$

That is, the computing subtasks obtained by dividing the computing task exactly exhaust all the computing resources C and storage resources D, neither more nor less, and the allocations of different subtasks do not overlap.

Under one resource allocation strategy, each computing subtask obtained by dividing the computing task is allocated at least one computing unit and at least one storage unit; correspondingly, under different resource allocation strategies, each computing subtask is allocated different numbers of computing units and storage units. Taking each computing subtask under each resource allocation strategy, together with the numbers of computing units and storage units allocated to it, as one sequence, a set can be obtained by collecting the data of the different resource allocation strategies; this set is the allocation data set of the computing task under each resource allocation strategy.
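The exact-cover, non-overlap constraint can be pictured with a round-robin partition; this is one possible layout, not the patent's prescribed one:

```python
def partition(unit_count: int, q_count: int) -> list[list[int]]:
    # Deal unit indices round-robin: every unit is used exactly once and
    # no two subtasks share a unit (an exact, non-overlapping cover).
    groups: list[list[int]] = [[] for _ in range(q_count)]
    for unit in range(unit_count):
        groups[unit % q_count].append(unit)
    return groups

compute_groups = partition(8, 4)  # C_step^q for each subtask q
storage_groups = partition(4, 4)  # D_step^q for each subtask q
assert sorted(i for g in compute_groups for i in g) == list(range(8))
```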
And S140, aggregating the allocation data sets of the calculation tasks under the resource allocation strategies to form a load matrix of the network model.
The load matrix is used for describing the resource allocation condition of the network model under different resource allocation strategies. The load matrix records the mapping relation between each calculation stage and resource allocation in the operation process of the network model.
In this way, the allocation data set of each calculation task under each resource allocation strategy is obtained, and the load matrix of the network model is formed.
S150, calculating the running time of each calculation subtask obtained by the network model decomposition under each resource allocation strategy according to the performance parameter set of the chip to be evaluated and the load matrix, and determining the performance matrix so as to evaluate the performance of the chip running the network model.
The performance parameter set is used for describing the performance of the chip to be evaluated. The performance parameter set records performance parameters of the chip, and is illustratively a Performance Dictionary (PD). Runtime is used to describe the length of time spent performing a computation subtask. The performance matrix is used for evaluating the operation performance of the chip to be evaluated. And calculating the running time of each calculation subtask according to the performance parameters of the chip and the calculation unit and the storage unit which are distributed by each calculation subtask under different resource distribution strategies to form a performance matrix corresponding to the load matrix. Therefore, the running time of each computing subtask obtained by decomposing the network model under different resource allocation strategies can be determined, namely the running time of the network model under different resource allocation strategies is determined, and therefore the performance of the chip running network model is determined. It will be appreciated that the shorter the run time of the network model, the higher the performance of the chip running the network model.
Optionally, calculating, according to the performance parameter set of the chip to be evaluated and the load matrix, the running time of each computation subtask obtained by decomposing the network model under each resource allocation policy to determine the performance matrix includes: according to the performance parameter set of the chip to be evaluated, calculating the input data transfer time, the input data processing consumption time and the result data transfer time of each computation subtask in the load matrix under each resource allocation strategy; taking the sum of the input data transfer time, the input data processing consumption time and the result data transfer time of a computation subtask under a resource allocation strategy as the running time of that subtask under that strategy; and forming the performance matrix of the network model from the running times of the computing subtasks under the resource allocation strategies.
Typically, a computation subtask is completed through three processes: reading input data from the storage space, processing the input data, and writing the result data back to the storage space. The running time of a computation subtask therefore comprises the input data transfer time, the input data processing consumption time, and the result data transfer time.
The input data transfer time describes the time the computing subtask takes to acquire its input data, i.e. the time to transfer the input data from the allocated storage resources to the allocated computing resources. Specifically, the input data transfer time may be calculated based on the following formula:

$t_{input} = \dfrac{InputDataSize}{InputBandwidth}$

where InputDataSize is the size of the input data of the computing subtask, measured in bytes, and InputBandwidth is the data transmission bandwidth, in Byte/s, when transferring data from the storage resources $D_{step}^q$ to the computing resources $C_{step}^q$; it can be obtained from the performance parameter set.
The input data processing consumption time describes the time the computing subtask spends processing its input data, i.e. the time the computing resources take to process the input data. Specifically, the processing consumption time may be calculated based on the following formula:

$t_{process} = InputDataSize \times CostPerByte$

where CostPerByte is the time consumed to process each byte of data. It may be determined by the type of the computing task $T_{step}$, and may also be obtained from the performance parameter set.
The result data transfer time describes the time the computing subtask takes to output its result data, i.e. the time to transfer the result data from the allocated computing resources to the allocated storage resources. Specifically, the result data transfer time may be calculated based on the following formula:

$t_{output} = \dfrac{OutputDataSize}{OutputBandwidth}$

where OutputDataSize is the size of the result data obtained by the computing subtask, measured in bytes, and OutputBandwidth is the data transmission bandwidth, in Byte/s, when transferring the computed result from the computing resources $C_{step}^q$ to the storage resources $D_{step}^q$; it can be obtained from the performance parameter set.
And taking the sum of the input data transfer time, the processing consumption time of the input data and the result data transfer time of the calculation subtask as the running time of the calculation subtask under the resource allocation strategy.
The running times of the different computing subtasks under the different resource allocation strategies are collected, and the longest subtask running time is taken as the running time of the computing task. It will be appreciated that the computation subtasks into which a computation task is decomposed are executed in parallel, so the running time of the computation task equals the longest running time among its subtasks. The running time of the computing task is determined based on the following formulas:

$RunTime(t_{step}^q) = t_{input} + t_{process} + t_{output}$

$RunTime(T_{step}) = \max_{q \in \{0, 1, \dots, Q-1\}} RunTime(t_{step}^q)$

where $RunTime(T_{step})$ is the running time of the computing task at the step-th position.
By calculating the input data transfer time, the input data processing consumption time and the result data transfer time of each calculation subtask under each resource allocation strategy, the running time of each calculation subtask under each strategy is determined accurately, so the time consumed by running the whole network model can be counted accurately.
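The three-phase runtime model above reduces to a few arithmetic steps; the following sketch uses assumed parameter names for the quantities read from the performance parameter set:

```python
def subtask_runtime(in_bytes: float, out_bytes: float,
                    in_bw: float, out_bw: float,
                    cost_per_byte: float) -> float:
    t_input = in_bytes / in_bw             # input data transfer time
    t_process = in_bytes * cost_per_byte   # input data processing time
    t_output = out_bytes / out_bw          # result data transfer time
    return t_input + t_process + t_output

def task_runtime(subtask_times: list[float]) -> float:
    # Subtasks run in parallel, so the task ends with its slowest subtask.
    return max(subtask_times)
```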
The embodiment of the invention automatically analyzes the deep learning neural network model to form at least two calculation tasks, further divides each calculation task according to the pre-configured resource allocation strategies to form calculation subtasks, and allocates resources to the calculation subtasks under the different resource allocation strategies to obtain a load matrix for each strategy. It then calculates the running time of each calculation subtask under the different resource allocation strategies based on the performance parameter set of the chip to be evaluated, thereby determining a performance matrix for evaluating the performance of the chip running the network model. This solves the problems of high economic cost and low efficiency of simulating a network model on a chip in the prior art, and improves the speed of simulating the performance of a chip running a deep learning network model.
Example Two
Fig. 2 is a flowchart of a deep learning neural network model load calculation method according to the second embodiment of the present invention. This embodiment is refined on the basis of the above embodiment: analyzing the pre-constructed network model and decomposing its calculation flow into at least two calculation tasks is embodied as analyzing the network model to determine its hierarchical structure, where the hierarchical structure includes at least two layers, and taking the data processing operation associated with each layer as one calculation task to form at least two calculation tasks.
Specifically, the method of this embodiment specifically includes:
s210, analyzing a pre-constructed network model, and determining the hierarchical structure of the network model, wherein the hierarchical structure of the network model comprises at least two layers.
The hierarchical structure of the network model comprises at least two layers, namely the calculation process of the network model can be decomposed into at least two calculation tasks.
The network model, the computing task, the dependency relationship, the resource allocation policy, the computing subtask, the allocation data set, the performance parameter set, and the performance matrix in this embodiment may refer to the description of the foregoing embodiments.
S220, taking the data processing operation associated with each layer as one calculation task to form at least two calculation tasks, where the at least two calculation tasks have a dependency relationship.
The data processing operation associated with one layer is one calculation task; that is, the data processing operation executed by one node is one calculation task. The result obtained by the data processing of one node is transmitted as input to the next node, so that the next node continues the data processing operation. Correspondingly, the calculation result of one calculation task serves as the input data of the next calculation task, and the calculation flow of the network model is thereby mapped to a plurality of calculation tasks with dependency relationships.
And S230, dividing each computing task according to at least one pre-configured resource allocation strategy to form at least one computing subtask.
Optionally, dividing each computing task according to at least one pre-configured resource allocation policy to form at least one computing subtask includes: determining the allocation quantity of computing resources according to the at least one pre-configured resource allocation policy, and dividing each computing task according to the allocation quantity to form at least one computing subtask, where the number of computing subtasks into which each computing task is divided is less than or equal to the allocation quantity.
Specifically, the allocation quantity of computing resources refers to the number of computing resources that the chip can invoke. Generally, the greater the allocation quantity, the more computing subtasks a task can be divided into; the allocation quantity thus determines the partitioning of the computing subtasks. The computing resources can be equally divided into computing units according to the allocation quantity, with the number of computing units equal to the allocation quantity, so the allocation quantity equals the number of computing units to be allocated in each resource allocation strategy and also equals the maximum number of computing units allocated in any strategy. It will be appreciated that the allocation quantity of computing resources describes the operational capability of the chip, so it may be derived from the chip's performance parameter set.
Each computing subtask requires at least one computing unit to perform its data processing; therefore, a computing task can be divided into at most the allocation quantity of computing subtasks. Generally, one computing subtask is executed by one computing unit. The allocation quantity may be larger than the number of computing subtasks; in that case, at least one computing unit is idle and executes no computing subtask.
The allocation quantity is determined through the resource allocation strategy, and the number of computing subtasks is determined according to the allocation quantity, so the divided computing subtasks are adapted to the computing resources. This ensures that the computing subtasks are executed correctly and hence that the network model is simulated correctly, as the sketch below illustrates.
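A one-line illustration of this cap (the function name and the rule of leaving surplus units idle are assumptions drawn from the text above):

```python
def subtask_count(requested: int, allocation_quantity: int) -> int:
    # A task may be split into at most `allocation_quantity` subtasks;
    # if fewer are requested, some computing units simply stay idle.
    return min(requested, allocation_quantity)
```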
Optionally, before dividing each computing task according to the at least one pre-configured resource allocation policy to form at least one computing subtask, the method further includes: receiving at least one resource configuration table; analyzing each resource configuration table to obtain the combination relationship between computing resources and storage resources corresponding to that table; and using the combination relationship between the computing resources and the storage resources corresponding to one resource configuration table as one resource allocation policy.
The resource configuration table records a combination relationship between the computing resources and the storage resources. The combination relationship determines the computing resources and storage resources allocated to each computing subtask, and describes the relationship between the computing units and the storage units that are mapped to the same computing subtask.
In fact, the resource configuration table allocates computing resources and storage resources to each computing subtask, and a combination relationship exists between the computing resources and the storage resources allocated to the same computing subtask.
Illustratively, the resource configuration table takes the following form (one row per computing subtask):

$R_k = \{(C_{step}^0, D_{step}^0), (C_{step}^1, D_{step}^1), \dots, (C_{step}^{Q-1}, D_{step}^{Q-1})\}$

Accordingly, the resource allocation policy of the Q computation subtasks of the computing task over the computing resources and the storage resources is

$t_{step}^q \mapsto (C_{step}^q, D_{step}^q), \quad q = 0, 1, \dots, Q-1$

where $C_{step}^q \subseteq C$ and $D_{step}^q \subseteq D$ satisfy the exact-cover, non-overlap conditions given above.
the resource allocation table may be input by a tester to specify the resource allocation policy to be tested. And analyzing the received resource configuration table input by the user to obtain a resource allocation strategy.
All allocation strategies (i.e. all ways of assigning the individual computing subtasks $t_{step}^q$ to resources) are enumerated exhaustively. Assuming K allocation methods are obtained, a load matrix describing the chip load condition at each step of the whole network model can be obtained:

$W = (w_{step,k})_{S \times K}$

where $w_{step,k}$ is the allocation data of the computing task at position step under the k-th resource allocation strategy.
The resource allocation condition of each computing subtask is determined by receiving and analyzing the resource configuration tables, so that flexible resource allocation is realized, more chip load conditions are covered, the test range of the chip is enlarged, and the accuracy of the chip performance test is improved.
Optionally, the allocation quantity of computing resources included in each resource configuration table is equal to the number of processors in the chip performance parameter set; that is, the number of computing units contained in the computing resources equals the number of processors. Matching the allocation quantity to the number of processors adapts the allocation to the chip's actual processors, improving the rationality of resource allocation and the accuracy of the chip performance test.
Optionally, allocating resources to all computing subtasks associated with a computing task according to each resource allocation policy, to obtain the allocation data set of the computing task under each resource allocation policy, includes: traversing the resource configuration tables, starting from a target resource configuration table, until all resource configuration tables have been traversed; during the traversal of one resource configuration table, traversing the computing subtasks of the computing task, starting from a target computing subtask; for the current computing subtask being traversed, selecting a target computing resource from the combination relationship corresponding to the resource configuration table and acquiring the corresponding target storage resource; and establishing a correspondence among the current computing subtask, the target computing resource and at least one target storage resource, until all computing subtasks have been traversed, where the computing resources corresponding to the different computing subtasks of the computing task are different and the storage resources corresponding to the different computing subtasks are different. The allocation data of the computing task under one resource configuration table is generated from the computing resources and storage resources corresponding to its subtasks, and after all resource configuration tables have been traversed, the allocation data set of the computing task under all resource configuration tables is obtained.
Specifically, the target resource configuration table is one of the resource configuration tables. For example, it may be any randomly selected table, or all tables may be numbered and selected in numbering order, e.g. the table numbered 1 is selected first.
The allocation data describes the resource allocation of each computing subtask obtained by decomposing the computing task under the different resource configuration tables; that is, it determines the computing resources and storage resources corresponding to each subtask under each table.
The resource configuration tables are traversed one by one, starting from the target table, to obtain the resource allocation condition specified by each table.
All the computing subtasks of the current computing task are traversed, and the correspondence between each computing subtask and its computing and storage resources is established, thereby allocating computing resources and storage resources to every subtask obtained by decomposing the current computing task and reducing omissions.
The above steps are repeated until all the resource configuration tables have been traversed. The allocation data of each computing subtask is thereby obtained, forming the allocation data set of the computing task under the different resource configuration tables and determining the resource allocation of each subtask under each table.
For example, the allocation data set of the computing task at position step under the K resource configuration tables is

$w_{step} = \{w_{step,0}, w_{step,1}, \dots, w_{step,K-1}\}$

where $w_{step,k}$ records, for each subtask $t_{step}^q$, the computing units and storage units allocated by the k-th table.
by traversing the resource allocation table and respectively establishing the corresponding relation between all the calculation subtasks obtained by the calculation tasks and the calculation resources and the storage resources allocated by the current resource allocation table when the current resource allocation table is traversed, resource allocation of the calculation subtasks based on the current resource allocation table is realized, the calculation subtasks can be allocated to the calculation resources and the storage resources, flexible resource allocation is realized, the accuracy of resource allocation is improved, and the accuracy of performance test of the chip is improved.
S240, respectively allocating resources to all computing subtasks associated with the computing task according to each resource allocation strategy to obtain an allocation data set of the computing task under each resource allocation strategy; the resources include computing resources and storage resources.
And S250, aggregating the allocation data sets of each calculation task under each resource allocation strategy to form a load matrix of the network model.
Following the previous example, the allocation data sets of the calculation tasks under the K resource configuration tables are aggregated to obtain the load matrix $W = (w_{step,k})_{S \times K}$, with one row per computing task and one column per resource allocation strategy.
and S260, calculating the running time of each calculation subtask obtained by decomposing the network model under each resource allocation strategy according to the performance parameter set of the chip to be evaluated and the load matrix, and determining the performance matrix so as to evaluate the performance of the chip for running the network model.
Illustratively, the set of performance parameters is PD.
The running time of the computing task at position step under the k-th resource configuration table is calculated based on the following formulas:

$RunTime(t_{step}^q \mid R_k) = t_{input} + t_{process} + t_{output}$

$p_{step,k} = \max_{q \in \{0, 1, \dots, Q-1\}} RunTime(t_{step}^q \mid R_k)$
Further, the performance matrix is calculated based on the load matrix:

$P = (p_{step,k})_{S \times K}$

Each row of elements, $\{p_{step,0}, p_{step,1}, \dots, p_{step,K-1}\}$, corresponds to calculation tasks with the same step item; that is, it gives the running times of the computing task at position step (step = 0, 1, ..., S-1) of the whole network model under the K different resource allocation strategies.

Each column of elements, $\{p_{0,k}, p_{1,k}, \dots, p_{S-1,k}\}$, represents the running time of each computing task of the whole neural network under the k-th resource allocation strategy $R_k$. The computing tasks have dependency relationships and execute sequentially, so summing a column yields the total running time of the network model under the k-th allocation strategy.
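Putting the pieces together, the performance matrix and the per-policy total can be sketched as follows (field and key names are assumptions, and the three-phase runtime is inlined from the formulas above):

```python
def performance_matrix(load: list[list[list[dict]]],
                       pd: dict) -> list[list[float]]:
    # load[step][k]: per-subtask allocation records of task `step` under
    # policy k; P[step][k] is the slowest subtask's three-phase runtime.
    P = []
    for task_row in load:
        row = []
        for alloc in task_row:
            times = [a["in_bytes"] / pd["in_bw"]
                     + a["in_bytes"] * pd["cost_per_byte"]
                     + a["out_bytes"] / pd["out_bw"] for a in alloc]
            row.append(max(times))
        P.append(row)
    return P

def total_runtime(P: list[list[float]], k: int) -> float:
    # Tasks execute sequentially, so the model's total time is column k's sum.
    return sum(row[k] for row in P)
```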
Optionally, calculating, according to the performance parameter set of the chip to be evaluated and the load matrix, the running time of each computation subtask obtained by decomposing the network model under each resource allocation policy to determine the performance matrix includes: according to the performance parameter set of the chip to be evaluated, calculating the input data transfer time, the input data processing consumption time and the result data transfer time of each computation subtask in the load matrix; taking the sum of the input data transfer time, the input data processing consumption time and the result data transfer time of a computation subtask as the running time of that subtask under the resource allocation strategy; and forming the performance matrix of the network model from the running times of the computing subtasks under the resource allocation strategies.
The embodiment of the invention analyzes the network structure of the network model, determines the layers in the hierarchical structure, and takes the data processing operation associated with each layer as one calculation task to form at least two calculation tasks. The calculation flow of the network model is thereby decomposed into calculation tasks automatically, without depending on specific hardware equipment to support the simulation of the network model, which reduces the cost of simulating the network model; meanwhile, since the calculation tasks are obtained by dividing the network structure, the accuracy of the decomposition of the calculation flow is improved.
Example Three
Fig. 3 is a schematic diagram of a deep learning neural network model load calculation apparatus according to the third embodiment of the present invention. This embodiment provides the apparatus corresponding to the deep learning neural network model load calculation method provided in the above embodiments; the apparatus can be implemented in software and/or hardware and can be integrated into a computer device, etc.
Accordingly, the apparatus of the present embodiment may include:
the calculation task analysis module 310 is configured to analyze a pre-constructed network model, and decompose a calculation flow of the network model into at least two calculation tasks; wherein the at least two computing tasks have a dependency relationship;
the computing task dividing module 320 is configured to divide each computing task according to at least one pre-configured resource allocation policy to form at least one computing subtask;
a resource allocation module 330, configured to allocate resources to all computation subtasks associated with the computation task according to each resource allocation policy, respectively, to obtain an allocation data set of the computation task under each resource allocation policy; the resources include computing resources and storage resources;
a load matrix generation module 340, configured to aggregate the allocation data sets of each computing task under each resource allocation policy and form a load matrix of the network model;
and a performance matrix calculation module 350, configured to calculate, according to the performance parameter set of the chip to be evaluated and the load matrix, an operation time of each computation subtask obtained by decomposing the network model under each resource allocation policy, and determine a performance matrix, so as to evaluate performance of the chip in operating the network model.
The embodiment of the invention automatically analyzes the deep learning neural network model to form at least two calculation tasks, further divides each calculation task according to the pre-configured resource allocation strategies to form calculation subtasks, and allocates resources to the calculation subtasks under the different resource allocation strategies to obtain a load matrix for each strategy. It then calculates the running time of each calculation subtask under the different resource allocation strategies based on the performance parameter set of the chip to be evaluated, thereby determining a performance matrix for evaluating the performance of the chip running the network model. This solves the problems of high economic cost and low efficiency of simulating a network model on a chip in the prior art, and improves the speed of simulating the performance of a chip running a deep learning network model.
Further, the calculation task analysis module 310 includes a network model hierarchy analysis unit configured to analyze the network model to determine the hierarchical structure of the network model, where the hierarchical structure includes at least two layers, and to form at least two calculation tasks by taking the data processing operation associated with each layer as one calculation task.
Further, the calculation task dividing module 320 includes a resource allocation policy dividing unit configured to determine the allocation quantity of computing resources according to at least one pre-configured resource allocation policy, and to divide each calculation task according to the allocation quantity to form at least one calculation subtask, where the number of calculation subtasks into which each calculation task is divided is less than or equal to the allocation quantity.
Further, the deep learning neural network model load calculation apparatus includes a resource configuration table receiving module configured, before each computing task is divided according to the at least one pre-configured resource allocation policy to form at least one computing subtask, to receive and analyze at least one resource configuration table to obtain the combination relationship between computing resources and storage resources corresponding to each resource configuration table, and to use the combination relationship corresponding to one resource configuration table as one resource allocation policy.
Further, the resource allocation module 330 includes a resource configuration table traversal parsing unit configured to: traverse the resource configuration tables, starting from a target resource configuration table, until all resource configuration tables have been traversed; during the traversal of one resource configuration table, traverse the computing subtasks of the computing task, starting from a target computing subtask; for the current computing subtask being traversed, select a target computing resource from the combination relationship corresponding to the resource configuration table and acquire the corresponding target storage resource; establish a correspondence among the current computing subtask, the target computing resource and at least one target storage resource, until all computing subtasks have been traversed, where the computing resources corresponding to the different computing subtasks of the computing task are different and the storage resources corresponding to the different computing subtasks are different; generate the allocation data of the computing task under the resource configuration table from the computing resources and storage resources corresponding to the computing subtasks; and after all resource configuration tables have been traversed, obtain the allocation data set of the computing task under the resource configuration tables.
Further, the performance matrix calculation module 350 includes a running time calculation unit configured to: calculate, according to the performance parameter set of the chip to be evaluated, the input data transfer time, the input data processing consumption time and the result data transfer time of each calculation subtask in the load matrix under each resource allocation policy; take the sum of those three times under a resource allocation policy as the running time of the calculation subtask under that policy; and form the performance matrix of the network model from the running times of the calculation subtasks under the resource allocation policies.
Further, the allocation quantity of computing resources included in each resource configuration table is equal to the number of processors in the chip performance parameter set.
The deep learning neural network model load calculation device can execute the deep learning neural network model load calculation method provided by the embodiment of the invention, and has the corresponding functional modules and beneficial effects of the executed deep learning neural network model load calculation method.
Example Four
Fig. 4 is a schematic diagram of a computer device according to the fourth embodiment of the present invention. Fig. 4 shows a block diagram of an exemplary computer device 12 suitable for implementing embodiments of the present invention. The computer device 12 shown in Fig. 4 is only an example and should not impose any limitation on the function or scope of the embodiments of the present invention.
As shown in Fig. 4, computer device 12 is embodied as a general-purpose computing device. The components of computer device 12 may include, but are not limited to: one or more processors or processing units 16, a system memory 28, and a bus 18 that couples the various system components, including the system memory 28, to the processing unit 16.
Bus 18 represents one or more of several types of bus structures, including a memory bus or memory controller, a peripheral bus, an accelerated graphics port, a processor, or a local bus using any of a variety of bus architectures, including, but not limited to, an Industry Standard Architecture (ISA) bus, a Micro Channel Architecture (MCA) bus, an Enhanced ISA (EISA) bus, a Video Electronics Standards Association (VESA) local bus, and a Peripheral Component Interconnect (PCI) bus, to name a few.
Computer device 12 typically includes a variety of computer system readable media. Such media may be any available media that is accessible by computer device 12 and includes both volatile and nonvolatile media, removable and non-removable media.
System memory 28 may include computer system readable media in the form of volatile memory, such as Random Access Memory (RAM) 30 and/or cache memory 32. Computer device 12 may further include other removable/non-removable, volatile/non-volatile computer system storage media. A storage system 34 may be used to read from and write to non-removable, non-volatile magnetic media (not shown in Fig. 4, commonly referred to as a "hard drive"). Although not shown in Fig. 4, a magnetic disk drive for reading from and writing to a removable non-volatile magnetic disk (e.g., a "floppy disk") and an optical disk drive for reading from and writing to a removable non-volatile optical disk (e.g., a Compact Disc Read-Only Memory (CD-ROM), a Digital Video Disc Read-Only Memory (DVD-ROM), or other optical media) may be provided; in these cases, each drive may be connected to bus 18 by one or more data media interfaces. System memory 28 may include at least one program product having at least one program module configured to carry out the functions of the embodiments of the present invention.
A program/utility 40, having a set (at least one) of program modules 42, may be stored, for example, in system memory 28. Such program modules 42 include, but are not limited to, an operating system, one or more application programs, other program modules, and program data; each or some combination of these examples may include an implementation of a network environment.
Computer device 12 may also communicate with one or more external devices 14 (e.g., a keyboard, a pointing device, a display 24, etc.), with one or more devices that enable a user to interact with computer device 12, and/or with any device (e.g., a network card, a modem, etc.) that enables computer device 12 to communicate with one or more other computing devices. Such communication may occur via an input/output (I/O) interface 22. Furthermore, computer device 12 may also communicate with one or more networks (e.g., a local area network (LAN) and/or a wide area network (WAN)) via a network adapter 20. As shown, network adapter 20 communicates with the other modules of computer device 12 via bus 18. It should be understood that although not shown in FIG. 4, other hardware and/or software modules may be used in conjunction with computer device 12, including, but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, and data archival storage systems, and the like.
The processing unit 16 executes the programs stored in the system memory 28 to perform various functional applications and data processing, such as implementing the deep learning neural network model load calculation method provided by any embodiment of the present invention.
Example five
The fifth embodiment of the present invention provides a computer-readable storage medium on which a computer program is stored. When the computer program is executed by a processor, it implements the deep learning neural network model load calculation method provided by the foregoing embodiments of the present application, comprising: analyzing a pre-constructed network model, and decomposing the calculation process of the network model into at least two calculation tasks, wherein the at least two calculation tasks have dependency relationships; dividing each calculation task according to at least one pre-configured resource allocation policy to form at least one calculation subtask; allocating resources to all calculation subtasks associated with the calculation task according to each resource allocation policy to obtain an allocation data set of the calculation task under each resource allocation policy, wherein the resources include computing resources and storage resources; counting the allocation data sets of each calculation task under each resource allocation policy to form a load matrix of the network model; and calculating, according to the performance parameter set of the chip to be evaluated and the load matrix, the running time of each calculation subtask obtained by decomposing the network model under each resource allocation policy, and determining the performance matrix of the network model so as to evaluate the performance of the chip running the network model.
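For readability only, the flow recapped above can be condensed into the following sketch; all helper names and data layouts here (decompose, divide, the tuple-based model, the bw and ops_per_s chip fields) are hypothetical stand-ins for the steps of the method, not the actual implementation.

def decompose(model):
    # One computing task per layer; list order encodes the dependency chain.
    return [{"name": n, "ops": ops, "bytes": b} for n, ops, b in model]

def divide(task, policy):
    # Divide a task into at most policy["compute_units"] computing subtasks.
    n = policy["compute_units"]
    return [{"ops": task["ops"] / n,
             "input_bytes": task["bytes"] / n,
             "output_bytes": task["bytes"] / n}
            for _ in range(n)]

def evaluate(model, policies, chip):
    # Build the load matrix (allocation data per policy and task), then the
    # performance matrix: per-subtask running times under every policy.
    load = {(p_id, task["name"]): divide(task, policy)
            for p_id, policy in enumerate(policies)
            for task in decompose(model)}
    return {key: [(s["input_bytes"] + s["output_bytes"]) / chip["bw"]
                  + s["ops"] / chip["ops_per_s"]
                  for s in subtasks]
            for key, subtasks in load.items()}

model = [("conv1", 2e9, 4e6), ("fc1", 5e8, 1e6)]
policies = [{"compute_units": 2}, {"compute_units": 4}]
chip = {"bw": 100e9, "ops_per_s": 10e12}
print(evaluate(model, policies, chip))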
More specific examples (a non-exhaustive list) of the computer readable storage medium include: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a RAM, a read-only memory (ROM), an erasable programmable read-only memory (EPROM), a flash memory, an optical fiber, a portable CD-ROM, an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
A computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave.
Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, Radio Frequency (RF), etc., or any suitable combination of the foregoing.
Computer program code for carrying out operations of the present invention may be written in one or more programming languages, or a combination thereof, including object-oriented programming languages such as Java, Smalltalk, and C++, as well as conventional procedural programming languages, such as the "C" programming language or similar programming languages.
It is to be noted that the foregoing is only illustrative of the preferred embodiments of the present invention and the technical principles employed. It will be understood by those skilled in the art that the present invention is not limited to the particular embodiments described herein, but is capable of various obvious changes, rearrangements and substitutions as will now become apparent to those skilled in the art without departing from the scope of the invention. Therefore, although the present invention has been described in greater detail by the above embodiments, the present invention is not limited to the above embodiments, and may include other equivalent embodiments without departing from the spirit of the present invention, and the scope of the present invention is determined by the scope of the appended claims.

Claims (10)

1. A deep learning neural network model load calculation method, characterized by comprising:
analyzing a pre-constructed network model, and decomposing a calculation process of the network model into at least two calculation tasks; wherein the at least two calculation tasks have a dependency relationship, and the network model is a deep learning neural network model;
dividing each computing task according to at least one pre-configured resource allocation policy to form at least one computing subtask;
allocating resources to all computing subtasks associated with the computing task according to each resource allocation policy to obtain an allocation data set of the computing task under each resource allocation policy; wherein the resources include computing resources and storage resources;
counting the allocation data sets of each calculation task under each resource allocation policy to form a load matrix of the network model;
and calculating, according to the performance parameter set of the chip to be evaluated and the load matrix, the running time of each calculation subtask obtained by decomposing the network model under each resource allocation policy, and determining the performance matrix, so as to evaluate the performance of the chip in running the network model.
2. The method of claim 1, wherein parsing the pre-constructed network model and decomposing the computation flow of the network model into at least two computation tasks comprises:
analyzing the network model to determine a hierarchical structure of the network model, wherein the hierarchical structure of the network model comprises at least two layers;
and taking the data processing operation associated with each layer as one calculation task, so as to form at least two calculation tasks.
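As an illustrative sketch only — the list-of-pairs representation of the hierarchical structure is an assumption of this example, not claimed structure — the step of claim 2 might look like:

def layers_to_tasks(hierarchy):
    # hierarchy: ordered (layer_name, operation) pairs, at least two layers.
    # Adjacent layers induce the dependency relationship between tasks.
    tasks = []
    for i, (name, op) in enumerate(hierarchy):
        tasks.append({"name": name, "op": op,
                      "depends_on": hierarchy[i - 1][0] if i > 0 else None})
    return tasks

print(layers_to_tasks([("conv1", "conv2d"), ("relu1", "relu")]))
# [{'name': 'conv1', 'op': 'conv2d', 'depends_on': None},
#  {'name': 'relu1', 'op': 'relu', 'depends_on': 'conv1'}]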
3. The method of claim 2, wherein the dividing of each of the computing tasks according to the at least one preconfigured resource allocation policy to form at least one computing subtask comprises:
determining the allocated amount of computing resources according to the at least one pre-configured resource allocation policy;
and dividing the calculation tasks according to the allocated amount to form at least one calculation subtask, wherein the number of calculation subtasks obtained by dividing each calculation task is less than or equal to the allocated amount.
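A minimal sketch of this division, assuming a task is measured as an integer number of work items (a representation invented for this example, not the claimed one):

def divide_by_allocation(work_items, allocated):
    # The number of subtasks never exceeds the allocated amount of
    # computing resources, as the claim requires.
    n = min(work_items, allocated)
    base, rem = divmod(work_items, n)
    return [base + (1 if i < rem else 0) for i in range(n)]

print(divide_by_allocation(10, 4))  # [3, 3, 2, 2]: four subtasks for 10 items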
4. The method of claim 3, further comprising, before dividing each of the computing tasks according to the at least one preconfigured resource allocation policy to form at least one computing subtask:
receiving at least one resource allocation table, and analyzing the at least one resource allocation table to obtain the combination relationship between the computing resources and the storage resources corresponding to each resource allocation table;
and using the combination relationship between the computing resources and the storage resources corresponding to each resource allocation table as one resource allocation policy.
5. The method according to claim 4, wherein the allocating resources to all the computing subtasks associated with the computing task according to each of the resource allocation policies to obtain an allocation data set of the computing task under each of the resource allocation policies comprises:
traversing the resource allocation tables, starting from a target resource allocation table among the resource allocation tables, until all the resource allocation tables are traversed;
during the traversal of a resource allocation table, traversing the computing subtasks of the computing task, starting from a target computing subtask among the computing subtasks; for the traversed current computing subtask, selecting a target computing resource from the combination relationship corresponding to the resource allocation table, acquiring the corresponding target storage resource, and establishing a correspondence among the current computing subtask, the target computing resource, and at least one target storage resource, until all the computing subtasks are traversed;
wherein the computing resources corresponding to the respective computing subtasks of the computing task are different from one another, and the storage resources corresponding to the respective computing subtasks of the computing task are different from one another;
generating an allocation data set of the computing task under the resource allocation table according to the computing resource corresponding to each computing subtask and the corresponding storage resources;
and acquiring an allocation data set of the computing task under each resource allocation table after all the resource allocation tables are traversed.
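By way of illustration, the double traversal of claim 5 can be sketched as follows, assuming each resource allocation table lists its computing and storage resources explicitly (a layout invented for this example):

def allocate(tables, subtasks):
    # For every resource allocation table, pair each computing subtask with
    # a distinct computing resource and a distinct storage resource, and
    # collect one allocation data set per table.
    datasets = []
    for table in tables:  # outer traversal over all resource allocation tables
        mapping = {}
        for sub, cu, mem in zip(subtasks, table["compute"], table["storage"]):
            mapping[sub] = {"compute": cu, "storage": [mem]}
        datasets.append(mapping)
    return datasets

tables = [{"compute": ["cu0", "cu1"], "storage": ["m0", "m1"]}]
print(allocate(tables, ["sub0", "sub1"]))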
6. The method according to claim 1, wherein the step of determining the performance matrix by calculating the running time of each computation subtask obtained by the network model decomposition under each resource allocation policy according to the performance parameter set of the chip to be evaluated and the load matrix comprises:
according to the performance parameter set of the chip to be evaluated, calculating the input data transfer time, input data processing consumption time, and result data transfer time of each calculation subtask in the load matrix under each resource allocation policy;
taking the sum of the input data transfer time, the input data processing consumption time, and the result data transfer time of a calculation subtask under a resource allocation policy as the running time of the calculation subtask under that policy;
and forming a performance matrix of the network model according to the running time of each calculation subtask under each resource allocation policy.
7. The method of claim 4, wherein the allocated amount of computing resources included in each of the resource allocation tables is equal to the number of processors in the set of chip performance parameters.
8. A deep learning neural network model load calculation device, comprising:
the computing task analysis module is used for analyzing a pre-constructed network model and decomposing the computing process of the network model into at least two computing tasks; wherein the at least two calculation tasks have a dependency relationship, and the network model is a deep learning neural network model;
the computing task dividing module is used for dividing each computing task according to at least one pre-configured resource allocation policy to form at least one computing subtask;
the resource allocation module is used for allocating resources to all the computing subtasks associated with the computing task according to each of the resource allocation policies to obtain an allocation data set of the computing task under each of the resource allocation policies; wherein the resources include computing resources and storage resources;
a load matrix generation module, configured to count an allocation data set of each computation task under each resource allocation policy to form a load matrix of the network model;
and the performance matrix calculation module is used for calculating, according to the performance parameter set of the chip to be evaluated and the load matrix, the running time of each calculation subtask obtained by decomposing the network model under each resource allocation policy, and determining the performance matrix, so as to evaluate the performance of the chip in running the network model.
9. A computer device comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, characterized in that the processor implements the deep learning neural network model load calculation method according to any one of claims 1-7 when executing the program.
10. A computer-readable storage medium, on which a computer program is stored, wherein the program, when executed by a processor, implements the deep learning neural network model load calculation method as claimed in any one of claims 1-7.
CN201911008660.3A 2019-10-23 2019-10-23 Deep learning neural network model load calculation method, device, equipment and medium Active CN110515739B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911008660.3A CN110515739B (en) 2019-10-23 2019-10-23 Deep learning neural network model load calculation method, device, equipment and medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911008660.3A CN110515739B (en) 2019-10-23 2019-10-23 Deep learning neural network model load calculation method, device, equipment and medium

Publications (2)

Publication Number Publication Date
CN110515739A CN110515739A (en) 2019-11-29
CN110515739B true CN110515739B (en) 2020-01-31

Family

ID=68633608

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911008660.3A Active CN110515739B (en) 2019-10-23 2019-10-23 Deep learning neural network model load calculation method, device, equipment and medium

Country Status (1)

Country Link
CN (1) CN110515739B (en)

Families Citing this family (26)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111158901B (en) * 2019-12-09 2023-09-08 爱芯元智半导体(宁波)有限公司 Optimization method, optimization device, computer equipment and storage medium for calculation graph
CN111047017B (en) * 2019-12-18 2023-06-23 北京安兔兔科技有限公司 Neural network algorithm evaluation method and device and electronic equipment
CN111162946B (en) * 2019-12-30 2022-07-12 北京奇艺世纪科技有限公司 Method for constructing model inference network, data processing method, data processing device and storage medium
CN111340237B (en) * 2020-03-05 2024-04-26 腾讯科技(深圳)有限公司 Data processing and model running method, device and computer equipment
CN111860758B (en) * 2020-04-07 2024-05-03 北京嘀嘀无限科技发展有限公司 Deep learning model operation method and device, electronic equipment and medium
CN111738434B (en) * 2020-06-03 2023-04-07 中国科学院计算技术研究所 Method for executing deep neural network on heterogeneous processing unit
CN111753973B (en) * 2020-06-22 2024-11-26 深圳鲲云信息科技有限公司 A neural network chip optimization method, system, device and storage medium
CN111858070B (en) * 2020-08-05 2023-12-01 中国工商银行股份有限公司 Computing resource allocation method, device, equipment and storage medium
CN112036559B (en) * 2020-08-26 2024-12-27 北京灵汐科技有限公司 Neural network structure division method, device, computer equipment and storage medium
WO2022042519A1 (en) * 2020-08-27 2022-03-03 北京灵汐科技有限公司 Resource allocation method and apparatus, and computer device and computer-readable storage medium
CN111984423B (en) * 2020-09-02 2024-09-03 北京小米松果电子有限公司 Method, device and medium for running deep learning model
WO2022116142A1 (en) * 2020-12-04 2022-06-09 深圳大学 Resource scheduling method based on graph neural network
CN112598112B (en) * 2020-12-04 2021-09-10 深圳大学 Resource scheduling method based on graph neural network
US20220188620A1 (en) * 2020-12-10 2022-06-16 International Business Machines Corporation Time estimator for deep learning architecture
CN113268404B (en) * 2021-05-28 2024-08-06 曙光信息产业(北京)有限公司 Performance analysis and optimization method and device, computer equipment and storage medium
CN113741932A (en) * 2021-08-19 2021-12-03 浙江大华技术股份有限公司 Intelligent identification algorithm upgrading method and device for equipment and electronic device
CN113884857B (en) * 2021-09-29 2024-03-08 上海阵量智能科技有限公司 Chip, chip pressure testing method and device, electronic equipment and storage medium
CN114020450A (en) * 2021-10-08 2022-02-08 深圳云天励飞技术股份有限公司 Neural network model execution method, device, system and electronic equipment
KR20230142336A (en) * 2022-04-01 2023-10-11 리벨리온 주식회사 Method for measuring performance of neural processing device and Device for measuring performance
CN114721802A (en) * 2022-04-12 2022-07-08 北京灵汐科技有限公司 Resource scheduling method, scheduling device and processing core
CN117521841A (en) * 2022-07-28 2024-02-06 华为技术有限公司 Deep learning system and method
CN116501594B (en) * 2023-06-27 2023-09-08 上海燧原科技有限公司 System modeling evaluation method and device, electronic equipment and storage medium
CN116501505B (en) * 2023-06-27 2023-09-12 上海燧原科技有限公司 Method, device, equipment and medium for generating data stream of load task
CN116737605B (en) * 2023-08-11 2023-11-14 上海燧原科技有限公司 Data prefetching method, device, equipment and medium based on chip multilevel storage
CN118798275B (en) * 2024-09-10 2025-02-25 中昊芯英(杭州)科技有限公司 Model calculation method and related device
CN119248499A (en) * 2024-09-29 2025-01-03 上海稀宇极智科技有限公司 Task processing load analysis method and device

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110333945A (en) * 2019-05-09 2019-10-15 成都信息工程大学 A dynamic load balancing method, system and terminal

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8515885B2 (en) * 2010-10-29 2013-08-20 International Business Machines Corporation Neuromorphic and synaptronic spiking neural network with synaptic weights learned using simulation
CN106649060A (en) * 2015-11-02 2017-05-10 中国移动通信集团公司 Equipment performance testing method and device
US10019668B1 (en) * 2017-05-19 2018-07-10 Google Llc Scheduling neural network processing
CN108197083B (en) * 2018-01-31 2021-04-13 湖南农业大学 A Data Center Workload Prediction Method Based on Wavelet Neural Network Fusion Linear Regression
CN109901878B (en) * 2019-02-25 2021-07-23 北京灵汐科技有限公司 Brain-like computing chip and computing equipment


Also Published As

Publication number Publication date
CN110515739A (en) 2019-11-29

Similar Documents

Publication Publication Date Title
CN110515739B (en) Deep learning neural network model load calculation method, device, equipment and medium
US7660884B2 (en) Apparatus, system, and method for generating a resource utilization description for a parallel data processing system
CN108205469B (en) MapReduce-based resource allocation method and server
US20130268941A1 (en) Determining an allocation of resources to assign to jobs of a program
EP2738675B1 (en) System and method for efficient resource management of a signal flow programmed digital signal processor code
CN111753983A (en) Method, system, device and storage medium for customizing neural network model
CN113296905A (en) Scheduling method, scheduling device, electronic equipment, storage medium and software product
CN112068957A (en) Resource allocation method, device, computer equipment and storage medium
US11809849B1 (en) Global modulo allocation in neural network compilation
CN116467061B (en) A method, device, storage medium and electronic equipment for task execution
CN110413539B (en) Data processing method and device
CN114816711A (en) Batch task processing method and device, computer equipment and storage medium
CN116701001B (en) Target task allocation method, device, electronic equipment and storage medium
CN115705496A (en) Quantum computer operating system and quantum computer
CN118963941A (en) Task allocation method and device
CN116560968A (en) Simulation calculation time prediction method, system and equipment based on machine learning
CN113704687B (en) Tensor calculation operation method, device and operation system
CN118210615A (en) Resource allocation method and device
CN115759260A (en) Inference method and device of deep learning model, electronic equipment and storage medium
CN111984418B (en) Automatic adjusting and optimizing method and device for granularity parameters of sparse matrix vector multiplication parallel tasks
CN115827225A (en) Distribution method of heterogeneous operation, model training method, device, chip, equipment and medium
Meng et al. PEARL: Enabling portable, productive, and high-performance deep reinforcement learning using heterogeneous platforms
Filelis-Papadopoulos et al. Characterization of hardware in self-managing self-organizing cloud environment
US20240062045A1 (en) Method and system for latency optimized heterogeneous deployment of convolutional neural network
CN117573523B (en) A parallel fuzz testing method based on complementarity

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant