WO2024109312A1 - Task Scheduling Execution Method, and Method and Apparatus for Generating Task Scheduling Execution Instructions
- Publication number: WO2024109312A1 (PCT/CN2023/120845)
- Authority: WIPO (PCT)
- Prior art keywords: task, group, model file, version model, neural network
- Prior art date
Classifications
- G06F9/48—Program initiating; Program switching, e.g. by interrupt
- G06F9/4806—Task transfer initiation or dispatching
- G06F9/4843—Task transfer initiation or dispatching by program, e.g. task dispatcher, supervisor, operating system
- G06F9/4881—Scheduling strategies for dispatcher, e.g. round robin, multi-level priority queues
- G06F9/50—Allocation of resources, e.g. of the central processing unit [CPU]
- G06N3/045—Combinations of networks
- G06N3/063—Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Definitions
- the present disclosure relates to chip technology, and in particular to a task scheduling execution method, a method and a device for generating task scheduling execution instructions.
- the chip may include a neural network accelerator, such as a brain processing unit (BPU).
- the neural network accelerator has multiple tasks to be processed, and it often executes these tasks in the order in which they were generated.
- Embodiments of the present disclosure provide a task scheduling execution method, a method and a device for generating a task scheduling execution instruction.
- a task scheduling execution method including:
- whether there is a first target task corresponding to a first version model file is determined, wherein the first version model file occupies computing resources of a neural network accelerator according to a predetermined occupancy ratio in a running state;
- based on the state information group of the first version model file and the predetermined occupancy ratio, a set of the first target tasks that meet the preset concurrent execution condition is determined as a first task group, wherein the state information group of the first version model file includes: states corresponding to the respective functional units of the neural network accelerator when the first version model file is in a running state;
- the first version model files corresponding to the first target tasks in the first task group are run concurrently.
- a method for generating a task scheduling execution instruction comprising:
- a first version model file corresponding to a first operator group is generated through compilation processing, wherein the first version model file occupies computing resources of the neural network accelerator according to a predetermined occupancy ratio in a running state;
- a state information group of the first version model file is generated based on the functional unit group corresponding to the first operator group, wherein the functional unit group corresponding to the first operator group includes: each functional unit of the neural network accelerator for the first operator group to run, and the state information group of the first version model file includes: each state corresponding to each functional unit of the neural network accelerator when the first version model file is in a running state;
- a task scheduling execution instruction is generated based on the first version model file, the state information group of the first version model file, and the predetermined occupancy ratio, and the task scheduling execution instruction is used to execute the above-mentioned task scheduling execution method.
- a task scheduling execution device comprising:
- a first determination module is used to determine whether there is a first target task corresponding to a first version model file, and the first version model file occupies computing resources of the neural network accelerator according to a predetermined occupancy ratio in a running state;
- a second determination module is configured to determine, based on the state information group of the first version model file and the predetermined occupancy ratio, the set of the first target tasks determined by the first determination module that meet the preset concurrent execution condition as a first task group, wherein the state information group of the first version model file includes: states corresponding to the respective functional units of the neural network accelerator when the first version model file is in a running state;
- the first running module is used to concurrently run the first version model files respectively corresponding to the first target tasks in the first task group determined by the second determining module.
- a device for generating a task scheduling execution instruction comprising:
- the first generation module is used to generate, through compilation processing, a first version model file corresponding to the first operator group, wherein the first version model file occupies computing resources of the neural network accelerator according to a predetermined occupancy ratio in a running state;
- a second generation module is used to generate a state information group of the first version model file generated by the first generation module based on the functional unit group corresponding to the first operator group, wherein the functional unit group corresponding to the first operator group includes: each functional unit of the neural network accelerator used for the running of the first operator group, and the state information group of the first version model file includes: the state corresponding to each functional unit of the neural network accelerator when the first version model file is in a running state;
- the third generation module is used to generate a task scheduling execution instruction based on the first version model file generated by the first generation module, the status information group of the first version model file generated by the second generation module, and the predetermined occupancy ratio, and the task scheduling execution instruction is used to execute the above-mentioned task scheduling execution method.
- a computer-readable storage medium wherein the storage medium stores a computer program, and the computer program is used to execute the above-mentioned task scheduling execution method or the method for generating task scheduling execution instructions.
- an electronic device including:
- a processor, and a memory for storing instructions executable by the processor;
- the processor is used to read the executable instructions from the memory and execute the instructions to implement the above-mentioned task scheduling execution method or the method for generating task scheduling execution instructions.
- a computer program product is provided, which, when executed, implements the above-mentioned task scheduling execution method or the above-mentioned method for generating task scheduling execution instructions.
- a first task group can be determined based on the status information group of the first version model file and the predetermined occupancy ratio of the computing resources of the neural network accelerator by the first version model file in the running state, and the first version model files corresponding to each first target task in the first task group can be run concurrently.
- FIG. 1 is a schematic diagram of the structure of a chip in an exemplary embodiment of the present disclosure.
- FIG. 2 is a schematic diagram of a neural network accelerator for implementing parallel processing of multiple tasks in an embodiment of the present disclosure.
- FIG. 3 is a flowchart of a task scheduling execution method provided by an exemplary embodiment of the present disclosure.
- FIG. 4 is a flowchart of a task scheduling execution method provided by another exemplary embodiment of the present disclosure.
- FIG. 5-1 is a schematic diagram of a task queue and a task scheduling table in a task scheduling execution method provided by an exemplary embodiment of the present disclosure.
- FIG. 5-2 is a schematic diagram of task splitting in a task scheduling execution method provided by an exemplary embodiment of the present disclosure.
- FIG. 6 is a flowchart of a task scheduling execution method provided by yet another exemplary embodiment of the present disclosure.
- FIG. 7 is a flowchart of a method for generating a task scheduling execution instruction provided by an exemplary embodiment of the present disclosure.
- FIG. 8 is a flowchart of a method for generating a task scheduling execution instruction provided by another exemplary embodiment of the present disclosure.
- FIG. 9 is a flowchart of a method for generating a task scheduling execution instruction provided by yet another exemplary embodiment of the present disclosure.
- FIG. 10 is a flowchart of a method for generating a task scheduling execution instruction provided by yet another exemplary embodiment of the present disclosure.
- FIG. 11 is a schematic diagram of the structure of a task scheduling execution device provided by an exemplary embodiment of the present disclosure.
- FIG. 12 is a schematic diagram of the structure of a task scheduling execution device provided by another exemplary embodiment of the present disclosure.
- FIG. 13 is a schematic diagram of the structure of a device for generating task scheduling execution instructions provided by an exemplary embodiment of the present disclosure.
- FIG. 14 is a schematic diagram of the structure of a device for generating task scheduling execution instructions provided by another exemplary embodiment of the present disclosure.
- FIG. 15 is a structural diagram of an electronic device provided by an exemplary embodiment of the present disclosure.
- Some chips may include a neural network accelerator, for example, an artificial intelligence (AI) chip may include a BPU.
- the neural network accelerator has multiple tasks to be processed, and the neural network accelerator often executes these tasks in the order in which the tasks are generated, and the neural network accelerator executes only one task at the same time.
- the neural network accelerator in the chip may include: a computing component and multiple functional units (Function Unit); the L1SRAM (Static Random-Access Memory) in Figure 1 can be used as the computing component, and the Tensor Core, Vector core, Scalar core, and DSU (Domain Specific Unit) in Figure 1 can each be used as a functional unit.
- the chip may also include other components, such as a graphics processor (Graphics Processing Unit, GPU), a digital signal processor (Digital Signal Processing, DSP), etc.
- the compilation stage and the execution stage can each be improved: in the compilation stage, a task scheduling execution instruction can be generated; in the execution stage, the task scheduling execution instruction generated in the compilation stage can be executed to realize parallel processing of multiple tasks by the neural network accelerator, thereby improving the computing efficiency of the neural network accelerator.
- Fig. 3 is a flowchart of a task scheduling execution method provided by an exemplary embodiment of the present disclosure.
- the method shown in Fig. 3 includes step 310, step 320 and step 330, and each step is described below.
- Step 310: determine whether there is a first target task corresponding to the first version model file, and the first version model file occupies the computing resources of the neural network accelerator according to a predetermined occupancy ratio in the running state.
- step 310 may be executed by the processor calling a corresponding instruction stored in a memory, or may be executed by a first determination module executed by the processor.
- in step 310, all tasks to be processed by the neural network accelerator can be determined and traversed to find which of them have corresponding first version model files. Each task with a corresponding first version model file can then be used as a first target task. In this way, by executing step 310, a number of first target tasks can be determined (for ease of explanation, assume the number of first target tasks is N, where N is an integer greater than or equal to 2).
- any first target task and the corresponding first version model file can be understood as follows: the first target task can be completed by running the first version model file, and when the first version model file is in operation, the proportion of computing resources occupied by the first version model file of the neural network accelerator is a predetermined proportion.
- any predetermined occupancy ratio can be any ratio greater than 0% and less than 100%, such as 30%, 40%, 60%, etc.; the predetermined occupancy ratios corresponding to different first version model files can be the same or different; the computing resources of the neural network accelerator may refer to the computing resources of the computing components in the neural network accelerator, such as the computing resources of the L1SRAM in Figure 1.
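As a concrete illustration of step 310, the following Python sketch (not from the patent; the Task and ModelFile structures are hypothetical stand-ins) scans the pending tasks for those that have a first version model file, i.e. a build whose predetermined occupancy ratio is below 100%:

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

@dataclass
class ModelFile:
    occupancy: float        # predetermined occupancy ratio in the running state (0.0-1.0)
    states: Dict[str, str]  # state information group: functional unit -> "shared"/"available"

@dataclass
class Task:
    name: str
    first_version: Optional[ModelFile] = None   # partial-occupancy build, if compiled
    second_version: Optional[ModelFile] = None  # full-occupancy build

def find_first_target_tasks(pending: List[Task]) -> List[Task]:
    """Step 310: traverse all pending tasks and keep each task that has a
    corresponding first version model file as a first target task."""
    return [t for t in pending if t.first_version is not None]
```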
- Step 320: based on the state information group of the first version model file and the predetermined occupancy ratio, determine the set of first target tasks that meet the preset concurrent execution conditions as the first task group, where the state information group of the first version model file includes: the states corresponding to the various functional units of the neural network accelerator when the first version model file is in the running state.
- step 320 may be executed by the processor calling a corresponding instruction stored in a memory, or may be executed by a second determination module executed by the processor.
- the corresponding state information group and the predetermined occupancy ratio can be determined for the first version model file corresponding to each of the N first target tasks, thereby obtaining N state information groups and N predetermined occupancy ratios; wherein, in the state information group corresponding to any first version model file, the state corresponding to any functional unit can be used to characterize whether the functional unit is used when the first version model file is in operation.
- based on the N state information groups and the N predetermined occupancy ratios, it can be determined which of the N first target tasks meet the preset concurrent execution conditions, thereby determining the first task group.
- for example, the N first target tasks may specifically be 4 first target tasks, namely Task 1 to Task 4, of which only Task 1, Task 3, and Task 4 meet the preset concurrent execution conditions. Then the set of Task 1, Task 3, and Task 4 can be determined as a first task group.
- as another example, the N first target tasks may specifically be 6 first target tasks, namely Task 1 to Task 6, where Task 1 to Task 3 meet the preset concurrent execution conditions and Task 4 to Task 6 meet the preset concurrent execution conditions. Then, the set consisting of Task 1 to Task 3 can be determined as one first task group, and the set consisting of Task 4 to Task 6 can be determined as another first task group.
- Step 330: concurrently run the first version model files corresponding to the first target tasks in the first task group.
- step 330 may be executed by the processor calling a corresponding instruction stored in a memory, or may be executed by a first running module executed by the processor.
- if there is one first task group, and this first task group is the set consisting of Task 1, Task 3, and Task 4, the first version model files corresponding to Task 1, Task 3, and Task 4 can be run concurrently by the neural network accelerator, thereby realizing parallel processing of Task 1, Task 3, and Task 4 by the neural network accelerator.
- if there are two first task groups as in the second example, the neural network accelerator can first concurrently run the first version model files corresponding to Tasks 1, 2, and 3 to achieve their parallel processing, and then concurrently run the first version model files corresponding to Tasks 4, 5, and 6 to achieve their parallel processing.
- alternatively, the neural network accelerator can first concurrently run the first version model files corresponding to Tasks 4, 5, and 6, and then concurrently run the first version model files corresponding to Tasks 1, 2, and 3.
- the first task group can be determined based on the status information group of the first version model file and the predetermined occupancy ratio of the computing resources of the neural network accelerator by the first version model file in the running state, and the first version model files corresponding to the first target tasks in the first task group can be run concurrently. This is equivalent to realizing the parallel processing of multiple tasks by the neural network accelerator through the task scheduling mechanism, thereby improving the computing efficiency of the neural network accelerator to better meet actual needs.
- the plurality of first target tasks satisfying the preset concurrent execution condition may include any one or more of the following: (1) among the states corresponding to the same functional unit in the state information groups of the first version model files respectively corresponding to the plurality of first target tasks, one state is a used state and the remaining states are all idle states, or all states are idle states; (2) the sum of the predetermined occupancy ratios respectively corresponding to the plurality of first target tasks is smaller than a preset ratio.
- the preset ratio may be 100%.
- the preset ratio may be a ratio less than 100% and close to 100%.
- the embodiments of the present disclosure are described by taking the case where the preset ratio is 100% as an example.
- the state corresponding to any functional unit can be one of the following two: the used state and the idle state; the used state can be represented by shared, and the idle state can be represented by available.
- the neural network accelerator includes three functional units, namely Tensor Core, Vector Core, and DSU.
- the N first target tasks are specifically four first target tasks, namely Task 1 to Task 4, and the status information groups corresponding to Task 1 to Task 4 are as follows:
- Task 1: Tensor Core-shared, Vector core-available, DSU-available
- Task 2: Tensor Core-shared, Vector core-shared, DSU-shared
- Task 3: Tensor Core-available, Vector core-shared, DSU-available
- Task 4: Tensor Core-available, Vector core-available, DSU-shared
- the form of "A-shared” indicates that the state corresponding to the functional unit A is in use
- the form of "B-available” indicates that the state corresponding to the functional unit B is idle.
- among Task 1, Task 3, and Task 4, only Task 1's state for the Tensor Core is the used state, while the other two are idle; only Task 3's state for the Vector core is the used state, while the other two are idle; and only Task 4's state for the DSU is the used state, while the other two are idle. Therefore, Task 1, Task 3, and Task 4 satisfy the condition defined in (1) above.
- although Task 1, Task 3, and Task 4 meet the condition defined in (1) above, assume the predetermined occupancy ratios corresponding to Task 1 to Task 4 are 40%, 25%, 50%, and 65% respectively. The sum of the three predetermined occupancy ratios for Task 1, Task 3, and Task 4 (155%) is greater than 100%, the sum for Task 1 and Task 4 (105%) is greater than 100%, and the sum for Task 3 and Task 4 (115%) is greater than 100%; only the sum for Task 1 and Task 3 (90%) is less than 100%. That is, only for Task 1 and Task 3 is the condition defined in (2) above also met, so it can be determined that Task 1 and Task 3 meet the preset concurrent execution conditions. In this way, Task 1 can be executed using the Tensor Core while Task 3 is executed using the Vector core; that is, the neural network accelerator can execute Task 1 and Task 3 at the same time.
- suppose the N first target tasks in the above example are not 4 but 5 first target tasks; for example, in addition to the above Task 1 to Task 4, Task 5 is also included, and the predetermined occupancy ratio corresponding to Task 5 is 30%.
- the status information group corresponding to task 5 is as follows:
- Task 5: Tensor Core-shared, Vector core-shared, DSU-available
- the predetermined occupancy ratios corresponding to Task 1 to Task 4 are 40%, 25%, 50%, and 65% respectively. As before, Task 1 and Task 3 meet the preset concurrent execution conditions. For Task 4 and Task 5, only one of them marks the Tensor Core as used, only one marks the Vector core as used, and only one marks the DSU as used, and the sum of their two predetermined occupancy ratios is 95% (65% + 30%), which is less than 100%; therefore Task 4 and Task 5 also meet the preset concurrent execution conditions. In this way, Task 1 and Task 3 can be divided into one first task group, and Task 4 and Task 5 into another first task group, thereby determining two first task groups.
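The two conditions can be checked mechanically. Below is a hedged sketch building on the hypothetical Task/ModelFile structures introduced earlier, with a preset ratio of 100%: condition (1) demands that no functional unit is marked shared by more than one task in the candidate set, and condition (2) bounds the summed occupancy ratios.

```python
def meets_concurrent_conditions(tasks, preset_ratio=1.0):
    """Preset concurrent execution conditions for a candidate set of first target tasks:
    (1) for each functional unit, at most one task's state information group marks
        it "shared"; the remaining states must be "available";
    (2) the sum of the predetermined occupancy ratios is less than the preset ratio."""
    units = tasks[0].first_version.states.keys()
    for unit in units:
        if sum(t.first_version.states[unit] == "shared" for t in tasks) > 1:
            return False  # two tasks would contend for the same functional unit
    return sum(t.first_version.occupancy for t in tasks) < preset_ratio

# Reproducing the example: Task 1 (40%) and Task 3 (50%) can form a first task group.
task1 = Task("Task1", ModelFile(0.40, {"Tensor Core": "shared", "Vector core": "available", "DSU": "available"}))
task3 = Task("Task3", ModelFile(0.50, {"Tensor Core": "available", "Vector core": "shared", "DSU": "available"}))
assert meets_concurrent_conditions([task1, task3])
```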
- in this way, the computing resources of the neural network accelerator can support the parallel processing of each first target task in a first task group, ensuring that each first target task in the group is successfully completed through parallel processing and improving the utilization of the computing components in the neural network accelerator.
- the method further includes step 301 , step 303 , step 305 and step 307 .
- Step 301: obtain a task queue, where each task in the task queue corresponds to a neural network model.
- step 301 may be executed by the processor calling corresponding instructions stored in the memory, or may be executed by an acquisition module executed by the processor.
- the task queue can be the BPU Task Queue in Figure 5-1.
- the neural network model can be considered as a sequence of operator units, that is, the neural network model can include multiple (for example, 40, 50, 100, etc.) operator units arranged in a certain order; wherein, the multiple operator units include but are not limited to convolution (Convolution, Conv) operator unit, pooling (Pool) operator unit, deconvolution operator unit, rectified linear unit (Rectified Linear Unit, ReLU) operator unit, batch normalization (Batch Normalization, BN) operator unit, etc.
- the relationship between any task in the task queue and the corresponding neural network model can be understood as follows: the task needs to rely on the neural network model to be completed.
- the task is a target detection task, and the neural network model is a model for target detection.
- the task is considered to be completed by providing the image to be detected as input to the neural network model for calculation and processing, and obtaining the target detection result output by the neural network model.
- Step 303: determine whether there is corresponding division method information for the target neural network model corresponding to the second target task, where the second target task is any task in the task queue; in response to the existence of corresponding division method information for the target neural network model, execute step 305; in response to the absence of corresponding division method information for the target neural network model, execute step 307.
- step 303 may be executed by the processor calling a corresponding instruction stored in the memory, or may be executed by a third determination module executed by the processor.
- the target storage area may store the correspondence between the neural network model and the division method information.
- the origin of the correspondence stored in the target storage area can refer to the relevant description of the compilation stage below, which will not be elaborated here.
- in step 303, the correspondences stored in the target storage area may be traversed. If the traversal shows that the stored correspondences contain division method information corresponding to the target neural network model, step 305 may be executed; if they do not, step 307 may be executed.
- Step 305: divide the second target task to obtain K divided tasks, and add the K divided tasks to the task scheduling table; the K divided tasks correspond to the K operator groups obtained by dividing the target neural network model according to the division method information corresponding to the target neural network model.
- step 305 may be executed by the processor calling a corresponding instruction stored in a memory, or may be executed by a first processing module executed by the processor.
- the task scheduling table can correspond to the Task Scheduler in Figure 5-1.
- the division method information corresponding to the target neural network model can be used to divide the target neural network model into K operator groups, each operator group including at least one operator unit in the target neural network model.
- the second target task can be divided based on the division method information corresponding to the target neural network model to obtain K division tasks corresponding to the K operator groups.
- K can be 2, 3, 4 or an integer greater than 4, which will not be listed here one by one.
- Step 307: add the second target task to the task scheduling table.
- step 307 may be executed by the processor calling a corresponding instruction stored in the memory, or may be executed by a second processing module executed by the processor.
- Step 310 includes step 3101.
- Step 3101: determine from the task scheduling table whether there is a first target task corresponding to the first version model file.
- in step 3101, all tasks in the task scheduling table may be traversed to determine which of them have corresponding first version model files, and each task having a corresponding first version model file may then be taken as a first target task.
- the second target task is Task1 in Figure 5-2.
- the target neural network model includes 5 operator units, namely Convolution1, Pooling1, Convolution2, Pooling2, and Convolution3.
- Task1 can be divided into 5 tasks, namely Task1.1, Task1.2, Task1.3, Task1.4, and Task1.5; among them, Task1.1 corresponds to Convolution1, Task1.2 corresponds to Pooling1, Task1.3 corresponds to Convolution2, Task1.4 corresponds to Pooling2, and Task1.5 corresponds to Convolution3.
- Task1 needs to use Tensor Core and Vector core
- Task1.1 needs to use Tensor Core
- Task1.2 needs to use Vector core
- Task1.3 needs to use Tensor Core
- Task1.4 needs to use Vector core
- Task1.5 needs to use Vector core.
- the functional units required to execute any one of Task1.1 to Task1.5 are fewer than the functional units required to execute Task1.
- since Task1.1 and Task1.2 can each be used as a first target task, and the sum of the two predetermined occupancy ratios corresponding to Task1.1 and Task1.2 is less than 100%, the neural network accelerator can process Task1.1 and Task1.2 in parallel to improve its computing efficiency.
- likewise, since Task1.3 and Task1.4 can each be used as a first target task, and the sum of the two predetermined occupancy ratios corresponding to Task1.3 and Task1.4 is less than 100%, the neural network accelerator can process Task1.3 and Task1.4 in parallel.
- for tasks for which corresponding division method information exists, several divided tasks of smaller granularity can be obtained through division processing; each divided task needs relatively fewer functional units when executed, which increases the probability that different tasks in the task scheduling table can be processed in parallel and is beneficial to improving the computing efficiency of the neural network accelerator.
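A sketch of the division in step 305 follows. The storage layout of the division method information is not specified in the patent; here it is assumed, for illustration, to be a list of operator-group sizes.

```python
def split_task(task_name, operator_units, division_info):
    """Step 305: divide a task into K divided tasks, one per operator group.
    division_info lists the size of each of the K operator groups."""
    divided, start = [], 0
    for k, size in enumerate(division_info, start=1):
        divided.append((f"{task_name}.{k}", operator_units[start:start + size]))
        start += size
    return divided

# The Figure 5-2 example: Task1's five operator units become Task1.1 ... Task1.5.
split_task("Task1",
           ["Convolution1", "Pooling1", "Convolution2", "Pooling2", "Convolution3"],
           [1, 1, 1, 1, 1])
```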
- the method further includes step 340 and step 350 .
- Step 340: determine a second task group, where the second task group includes: a set of third target tasks in the task scheduling table except for each first target task in the first task group.
- step 340 may be executed by the processor calling a corresponding instruction stored in the memory, or may be executed by a fourth determination module executed by the processor.
- in step 340, all tasks in the task scheduling table can be traversed to determine which tasks are not in the first task group. Each such task can be used as a third target task, and the set of all third target tasks can be used as the second task group.
- Step 350: run the second version model files corresponding to the third target tasks in the second task group in a preset order, and the second version model files completely occupy the computing resources in the running state.
- step 350 may be executed by the processor calling corresponding instructions stored in the memory, or may be executed by a second execution module executed by the processor.
- any third target task and the corresponding second version model file can be understood as follows: the third target task can be completed by running the second version model file, and when the second version model file is in operation, the second version model file occupies 100% of the computing resources of the neural network accelerator.
- any second version model file may have a status information group, and the status information group of any second version model file includes: the states corresponding to each functional unit of the neural network accelerator when the second version model file is in the running state, each state in the status information group being an exclusive state; the exclusive state can be represented by exclusive.
- in step 350, the time at which each third target task in the second task group was added to the task scheduling table can be determined, and the second version model files corresponding to the third target tasks in the second task group can be run in sequence by the neural network accelerator in order of adding time from earliest to latest, thereby realizing serial processing of the third target tasks in the second task group.
- in this way, even tasks that cannot be processed in parallel can be processed serially through the neural network accelerator, so that the tasks in the task scheduling table are all successfully processed without any task being omitted.
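Putting steps 310 through 350 together, the execution-stage logic might be sketched as follows. This is hypothetical: run_concurrently and run_serially stand in for the accelerator runtime, which the patent does not expose as an API, and the greedy grouping is one possible policy rather than the patent's prescribed one.

```python
def run_concurrently(model_files):
    """Stand-in for the neural network accelerator running files in parallel."""

def run_serially(model_file):
    """Stand-in for the neural network accelerator running one file alone."""

def schedule(table):
    """Run first task groups concurrently (step 330), then the second task
    group serially in order of addition to the table (steps 340/350)."""
    pool = find_first_target_tasks(table)            # step 310
    groups = []
    while pool:                                      # step 320 (greedy variant)
        group = [pool.pop(0)]
        for t in list(pool):
            if meets_concurrent_conditions(group + [t]):
                group.append(t)
                pool.remove(t)
        if len(group) > 1:
            groups.append(group)
    grouped = {t.name for g in groups for t in g}
    for g in groups:
        run_concurrently([t.first_version for t in g])
    for t in table:                                  # second task group, serial
        if t.name not in grouped:
            run_serially(t.second_version)
```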
- in the example of FIG. 5-1, there are three tasks in the task queue, namely Task1, Task2, and Task3.
- the neural network model corresponding to Task1 is model1
- the neural network model corresponding to Task2 is model2
- the neural network model corresponding to Task3 is model3.
- Model1 does not have corresponding division method information;
- model2 has corresponding division method information, which is used to divide model2 into operator group 1 and operator group 2;
- model3 has corresponding division method information, which is used to divide model3 into operator group 3 and operator group 4.
- Task1 may not be divided, while Task2 may be divided to obtain Task2.1 and Task2.2, and Task3 may be divided to obtain Task3.1 and Task3.2; Task2.1 corresponds to operator group 1, Task2.2 to operator group 2, Task3.1 to operator group 3, and Task3.2 to operator group 4. Task1, Task2.1, Task2.2, Task3.1, and Task3.2 can all be added to the Task Scheduler.
- the neural network accelerator includes two functional units, namely Tensor Core and Vector core, and Task1, Task2.1, and Task3.1 do not have corresponding first version model files, but only corresponding second version model files, and Task2.2 and Task3.2 both have corresponding first version model files.
- the Task Scheduler can also add status information groups of the second version model files corresponding to Task1, Task2.1, and Task3.1, as well as status information groups of the first version model files corresponding to Task2.2 and Task3.2.
- since Task1, Task2.1, and Task3.1 do not have corresponding first version model files, they cannot be used as first target tasks, but only as third target tasks. In this way, none of Task1, Task2.1, and Task3.1 can be processed in parallel with other tasks; each can only be processed separately. Since only one of Task2.2 and Task3.2 marks the Tensor Core as used, and only one of them marks the Vector core as used, if the sum of the predetermined occupancy ratios corresponding to their first version model files is less than 100%, it can be determined that Task2.2 and Task3.2 both meet the preset concurrent execution conditions. Then, the neural network accelerator can first process Task2.2 and Task3.2 in parallel, and then process Task1, Task2.1, and Task3.1 serially, thereby processing all tasks in the Task Scheduler.
- Any task scheduling execution method provided in the embodiments of the present disclosure may be executed by any appropriate device with data processing capabilities, including but not limited to: a terminal device and a server, etc.
- any task scheduling execution method provided in the embodiments of the present disclosure may be executed by a processor, such as the processor executing any task scheduling execution method mentioned in the embodiments of the present disclosure by calling corresponding instructions stored in a memory. This will not be described in detail below.
- Fig. 7 is a flowchart of a method for generating a task scheduling execution instruction provided by an exemplary embodiment of the present disclosure.
- the method shown in Fig. 7 includes step 710, step 720 and step 730, and each step is described below.
- Step 710: through compilation processing, a first version model file corresponding to the first operator group is generated, and the first version model file occupies the computing resources of the neural network accelerator according to a predetermined occupancy ratio in the running state.
- step 710 may be executed by the processor calling corresponding instructions stored in the memory, or may be executed by the first generating module executed by the processor.
- a complete neural network model can be used as the first operator group, or a set of several consecutive operator units in a complete neural network model can be used as the first operator group; the number of first operator groups can be multiple, and each of the multiple first operator groups can correspond to a first target task in the above text.
- a compiler may be used to perform compilation processing to generate first version model files corresponding to each first operator group.
- the specific compilation processing method may be any feasible method according to actual needs, and this disclosure will not elaborate on this.
- Step 720: generate a state information group of the first version model file based on the functional unit group corresponding to the first operator group.
- the functional unit group corresponding to the first operator group includes: each functional unit of the neural network accelerator used for the operation of the first operator group, and the status information group of the first version model file includes: the status corresponding to each functional unit of the neural network accelerator when the first version model file is in the running state.
- step 720 may be executed by the processor calling corresponding instructions stored in the memory, or may be executed by a second generation module executed by the processor.
- the functional unit group corresponding to the first operator group may be determined first. Assuming that the neural network accelerator includes three functional units, namely Tensor Core, Vector Core, and DSU, and the first operator group needs to use Tensor Core and Vector Core during operation, the functional unit group corresponding to the first operator group includes Tensor Core and Vector Core.
- the state information group of the first version model file may be generated with reference to the functional unit group corresponding to the first operator group. In the state information group of the first version model file, the state corresponding to any functional unit may be used to indicate whether the functional unit is used when the first version model file is in operation.
- Step 730: based on the first version model file, the state information group of the first version model file, and the predetermined occupancy ratio, generate a task scheduling execution instruction; the task scheduling execution instruction is used to execute the above-mentioned task scheduling execution method (specifically, the task scheduling execution method in the embodiment shown in Figure 3).
- step 730 may be executed by the processor calling corresponding instructions stored in the memory, or may be executed by a third generation module executed by the processor.
- the generation step of the first version model file and the generation step of the status information group of the first version model file can be executed in sequence, and then combined with the predetermined occupancy ratio corresponding to the first version model file, the task scheduling execution instruction can be generated.
- in the execution stage, by executing the task scheduling execution instruction generated in the compilation stage, the first target tasks and the first task group can be determined in sequence, so that the first version model files corresponding to each first target task in the first task group can be run concurrently. This is equivalent to realizing parallel processing of multiple tasks by the neural network accelerator through the task scheduling mechanism, thereby improving the computing efficiency of the neural network accelerator to better meet actual needs.
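The three ingredients of step 730 can be pictured as one compile-time artifact per first operator group. A minimal sketch follows; the instruction is represented as a plain dict purely for illustration, as the patent does not specify an encoding.

```python
def generate_schedule_instruction(model_file_bytes, state_info_group, occupancy):
    """Step 730: bundle the first version model file, its state information
    group, and the predetermined occupancy ratio into a task scheduling
    execution instruction for the execution stage."""
    return {
        "model_file": model_file_bytes,
        "state_info_group": state_info_group,   # e.g. {"Tensor Core": "shared", ...}
        "occupancy": occupancy,                 # e.g. 0.4 for 40%
    }
```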
- step 720 includes step 7201 and step 7203 .
- Step 7201: in response to the target functional unit being located in the functional unit group corresponding to the first operator group, determine that the state corresponding to the target functional unit in the state information group of the first version model file is the used state.
- step 7201 may be executed by the processor calling a corresponding instruction stored in a memory, or may be executed by a first determination submodule executed by the processor.
- Step 7203: in response to the target functional unit not being located in the functional unit group corresponding to the first operator group, determine that the state corresponding to the target functional unit in the state information group of the first version model file is the idle state.
- step 7203 may be executed by the processor calling a corresponding instruction stored in a memory, or may be executed by a second determination submodule executed by the processor.
- a neural network accelerator includes three functional units, namely Tensor Core, Vector core, and DSU.
- the functional unit group corresponding to the first operator group includes Tensor Core and Vector core. Since Tensor Core and Vector core are both located in the functional unit group corresponding to the first operator group, the states corresponding to Tensor Core and Vector core in the status information group of the first version model file can both be in use state. Since DSU is not located in the functional unit group corresponding to the first operator group, the state corresponding to DSU in the status information group of the first version model file can be an idle state. In this way, the status information group of the first version model file can be expressed as follows: Tensor Core-shared, Vector core-shared, DSU-available.
- by checking whether the target functional unit exists in the functional unit group corresponding to the first operator group, the state corresponding to each functional unit of the neural network accelerator can be determined efficiently and reliably, thereby efficiently and reliably generating the state information group of the first version model file.
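Steps 7201 and 7203 reduce to a membership test over the accelerator's functional units, sketched below with illustrative names:

```python
def make_state_info_group(all_units, used_units):
    """Steps 7201/7203: mark a unit "shared" (used state) if it belongs to the
    operator group's functional unit group, otherwise "available" (idle state)."""
    return {u: ("shared" if u in used_units else "available") for u in all_units}

make_state_info_group(["Tensor Core", "Vector core", "DSU"], {"Tensor Core", "Vector core"})
# -> {"Tensor Core": "shared", "Vector core": "shared", "DSU": "available"}
```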
- the method further includes step 701 and step 703 .
- Step 701: when the functional unit groups corresponding to the operator units in the neural network model are not completely the same, divide the neural network model into K operator groups corresponding to different functional unit groups, and record the corresponding division method information.
- the functional unit group corresponding to any operator group includes: various functional units of the neural network accelerator used for the operation of the operator group.
- step 701 may be executed by the processor calling a corresponding instruction stored in a memory, or may be executed by a third processing module executed by the processor.
- K may be 2, 3, 4 or an integer greater than 4, which are not listed here one by one.
- the functional unit groups corresponding to the various operator units in the neural network model may be determined first; wherein the functional unit groups corresponding to any operator unit include: various functional units of the neural network accelerator for the operation of the operator unit.
- the functional unit groups corresponding to the various operator units in the neural network model may be compared to determine whether the functional unit groups corresponding to the various operator units in the neural network model are exactly the same.
- if they are exactly the same, the neural network model does not need to be divided, and naturally there is no corresponding division method information for the neural network model.
- the neural network accelerator includes three functional units, namely Tensor Core, Vector Core, and DSU.
- the neural network model is the neural network model corresponding to Task2 in Figure 5-1, that is, the functional unit groups corresponding to each operator unit included in the neural network model only include Tensor Core. Then, the neural network model does not need to be divided. In this way, Task2 can run on Tensor Core as a whole during the execution phase. Similar to Task2, Task3 in Figure 5-1 can run on Vector Core as a whole during the execution phase.
- the neural network model can be divided into K operator groups corresponding to different functional unit groups, and the corresponding division method information can be recorded.
- a neural network accelerator includes three functional units, namely Tensor Core, Vector core, and DSU
- the neural network model includes 30 operator units, among which the functional unit groups corresponding to the first 10 operator units include Tensor Core and Vector core, the functional unit groups corresponding to the middle 10 operator units include Vector core and DSU, and the functional unit groups corresponding to the last 10 operator units include Tensor Core, Vector core, and DSU.
- the neural network model can be divided into three operator groups; among which, the first operator group includes the first 10 operator units of the 30 operator units included in the neural network model, the second operator group includes the middle 10 operator units of the 30 operator units included in the neural network model, and the third operator group includes the last 10 operator units of the 30 operator units included in the neural network model.
- the division method information recorded for the neural network model can be used to characterize that the 30 operator units included in the neural network model are evenly divided into three parts.
- the neural network model may be divided into two operator groups; wherein the first operator group includes the first 10 operator units of the 30 operator units included in the neural network model, and the second operator group includes the remaining 20 operator units of the 30 operator units included in the neural network model, and the division method information recorded for the neural network model may be used to characterize that the 30 operator units included in the neural network model are divided in a ratio of 1:2. It should be noted that after recording the division method information for the neural network model, the corresponding relationship between the neural network model and the division method information may also be recorded in the target storage area.
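Step 701 can be realized by walking the operator sequence and starting a new operator group whenever the functional unit group changes. A sketch follows, in which operator units are represented as (name, functional-unit-set) pairs, an assumed layout rather than the patent's:

```python
def divide_model(op_units):
    """Step 701: split a model's operator units into K operator groups such that
    consecutive units sharing a functional unit group stay together; the group
    sizes are recorded as the division method information."""
    groups, current_fu = [], None
    for name, fu in op_units:
        if fu != current_fu:         # functional unit group changed: new operator group
            groups.append([])
            current_fu = fu
        groups[-1].append(name)
    division_info = [len(g) for g in groups]
    return groups, division_info
```

Applied to the 30-unit example above, this yields three operator groups of 10 units each (division method information [10, 10, 10]), matching the description that the 30 operator units are evenly divided into three parts.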
- Step 703: each operator group in at least part of the K operator groups is respectively used as a first operator group.
- step 703 may be executed by the processor calling a corresponding instruction stored in the memory, or may be executed by a fifth determination module executed by the processor.
- each operator group in the K operator groups may be used as a first operator group.
- Step 730 includes:
- Step 7301: for each first operator group, generate a task scheduling execution instruction based on the first version model file corresponding to the first operator group, the status information group of that first version model file, the predetermined occupancy ratio corresponding to the first operator group, and the division method information; the task scheduling execution instruction is used to execute the above-mentioned task scheduling execution method (specifically, the task scheduling execution method in the embodiment shown in Figure 4).
- by executing step 7301, corresponding task scheduling execution instructions can be generated for each first operator group.
- when the functional unit groups corresponding to the various operator units in the neural network model are not completely the same, the neural network model can be divided with reference to the functional unit groups corresponding to the various operator units, so that different operator groups obtained by the division correspond to different functional unit groups, and the corresponding division method information is recorded.
- the correspondence stored in the target storage area can then be consulted to determine whether to add the second target task to the task scheduling table directly, or to divide the second target task first and then add the divided tasks to the table.
- the task division processing is beneficial to increase the probability that different tasks in the task scheduling table can be processed in parallel.
- the method further includes step 711 .
- Step 711: through compiling, second version model files corresponding to each first operator group are generated, and the second version model files completely occupy computing resources in the running state.
- step 711 may be executed by the processor calling a corresponding instruction stored in a memory, or may be executed by a fourth generation module executed by the processor.
- a compiler may be used to perform compilation processing to generate second version model files corresponding to each first operator group.
- the specific compilation processing method may be any feasible method according to actual needs, and this disclosure will not elaborate on this.
- Step 7301 includes step 73011.
- Step 73011: for each first operator group, a task scheduling execution instruction is generated based on the first version model file corresponding to the first operator group, the status information group of that first version model file, the predetermined occupancy ratio and division method information corresponding to the first operator group, and the second version model file corresponding to the first operator group; the task scheduling execution instruction is used to execute the above-mentioned task scheduling execution method (specifically, the task scheduling execution method in the embodiment shown in Figure 6).
- second version model files corresponding to each first operator group are generated, and the generated second version model files are used to generate task scheduling execution instructions.
- in this way, tasks that cannot be processed in parallel can be processed serially through the neural network accelerator, ensuring that they are successfully processed; thus, each task in the task scheduling table can be successfully processed.
- multi-version model files can be compiled for each operator group in multiple operator groups (for example, each first operator group mentioned above) to generate a first version model file and a second version model file for implementing the same function; wherein the first version model file occupies L1SRAM according to a predetermined occupancy ratio, and the second version model file occupies all L1SRAM.
- both the first version model file and the second version model file can have corresponding status information groups, each of which includes the status corresponding to each functional unit of the neural network accelerator.
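Reusing the sketches above (ModelFile and make_state_info_group), the multi-version compilation can be pictured as producing two builds per operator group; compile details are elided, and "exclusive" marks every unit in the second version's state information group:

```python
def compile_versions(all_units, used_units, occupancy):
    """Compile-stage sketch: a first version model file that occupies L1SRAM at
    the predetermined ratio, and a second version that occupies all of it."""
    first = ModelFile(occupancy, make_state_info_group(all_units, used_units))
    second = ModelFile(1.0, {u: "exclusive" for u in all_units})
    return first, second
```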
- Any method for generating a task scheduling execution instruction provided in the embodiments of the present disclosure may be executed by any appropriate device with data processing capabilities, including but not limited to: a terminal device and a server, etc.
- any method for generating a task scheduling execution instruction provided in the embodiments of the present disclosure may be executed by a processor, such as the processor executing any method for generating a task scheduling execution instruction in the embodiments of the present disclosure by calling a corresponding instruction stored in a memory. This will not be described in detail below.
- FIG. 11 is a schematic diagram of a task scheduling execution device provided by an exemplary embodiment of the present disclosure.
- the device shown in FIG. 11 can be used to implement any of the above-mentioned task scheduling execution method embodiments of the present disclosure.
- the device shown in FIG. 11 includes a first determination module 1110, a second determination module 1120, and a first running module 1130.
- a first determination module 1110 is used to determine whether there is a first target task corresponding to a first version model file, and the first version model file occupies computing resources of the neural network accelerator according to a predetermined occupancy ratio in a running state;
- the second determination module 1120 is used to determine the set of first target tasks determined by the first determination module 1110 that meet the preset concurrent execution condition as a first task group based on the state information group of the first version model file and the predetermined occupancy ratio, wherein the state information group of the first version model file includes: states corresponding to each functional unit of the neural network accelerator when the first version model file is in a running state;
- the first running module 1130 is used to concurrently run the first version model files corresponding to the first target tasks in the first task group determined by the second determining module 1120 .
- the plurality of first target tasks satisfying the preset concurrent execution conditions include any one or more of the following:
- one state is a used state, and the remaining states are all idle states; or, all states are idle states;
- the sum of the predetermined occupancy ratios respectively corresponding to the first target tasks in the plurality of first target tasks is smaller than the preset ratio.
- the device further includes:
- An acquisition module 1101 is used to acquire a task queue, each task in the task queue corresponds to a neural network model;
- the third determination module 1103 is used to determine whether there is corresponding division method information for the target neural network model corresponding to the second target task, and the second target task is any task in the task queue obtained by the acquisition module 1101;
- the first processing module 1105 is used for dividing the second target task to obtain K divided tasks in response to the third determining module 1103 determining that the target neural network model has corresponding division method information, and adding the K divided tasks to the task scheduling table; the K divided tasks correspond to the K operator groups obtained by dividing the target neural network model according to the division method information corresponding to the target neural network model;
- a second processing module 1107 is configured to add the second target task to the task scheduling table in response to the third determining module 1103 determining that the target neural network model does not have corresponding division method information;
- the first determining module 1110 is specifically configured to determine, from the task scheduling table, whether there is a first target task corresponding to the first version model file.
- the device further includes:
- a fourth determination module 1140 is used to determine a second task group, where the second task group includes: a set of third target tasks in the task scheduling table except for each first target task in the first task group determined by the second determination module 1120;
- the second running module 1150 is used to run the second version model files corresponding to each third target task in the second task group determined by the fourth determining module 1140 in a preset order, and the second version model files completely occupy the computing resources in the running state.
- FIG. 13 is a schematic diagram of the structure of a device for generating a task scheduling execution instruction provided by an exemplary embodiment of the present disclosure.
- the device shown in FIG. 13 can be used to implement any of the above-mentioned embodiments of the method for generating a task scheduling execution instruction of the present disclosure.
- the device shown in FIG. 13 includes a first generation module 1310, a second generation module 1320, and a third generation module 1330.
- a first generating module 1310 is used to generate a first version model file corresponding to the first operator group through compiling, wherein the first version model file occupies computing resources of the neural network accelerator according to a predetermined occupancy ratio in a running state;
- the second generation module 1320 is used to generate, based on the functional unit group corresponding to the first operator group, the state information group of the first version model file generated by the first generation module 1310, where the functional unit group corresponding to the first operator group includes: the functional units of the neural network accelerator that are used for running the first operator group;
- the state information group of the first version model file includes: the states respectively corresponding to the functional units of the neural network accelerator when the first version model file is in a running state;
- the third generation module 1330 is used to generate a task scheduling execution instruction based on the first version model file generated by the first generation module 1310, the state information group of the first version model file generated by the second generation module 1320, and the predetermined occupancy ratio.
- the task scheduling execution instruction is used to execute the task scheduling execution method in the embodiment shown in Figure 3 above.
- the second generation module 1320 includes:
- the first determining submodule 13201 is configured to determine that the state corresponding to the target functional unit in the state information group of the first version model file generated by the first generating module 1310 is a use state in response to the target functional unit being located in the functional unit group corresponding to the first operator group;
- the second determining submodule 13203 is used to determine that the state corresponding to the target functional unit in the state information group of the first version model file generated by the first generating module 1310 is an idle state in response to the target functional unit not being located in the functional unit group corresponding to the first operator group.
- the device further includes:
- the third processing module 1301 is used to, before the first version model file corresponding to the first operator group is generated through compilation, divide a neural network model into K operator groups corresponding to different functional unit groups when the functional unit groups respectively corresponding to the operator units in the neural network model are not completely the same, and record the corresponding division method information, where the functional unit group corresponding to any operator group includes: the functional units of the neural network accelerator that are used for running the operator group;
- a fifth determining module 1303 is configured to respectively use each operator group in at least part of the K operator groups divided by the third processing module 1301 as a first operator group;
- the third generation module 1330 is specifically used to generate, for each first operator group determined by the fifth determination module 1303, a task scheduling execution instruction based on the first version model file corresponding to the first operator group, the state information group of that first version model file, the predetermined occupancy ratio corresponding to the first operator group, and the division method information.
- the task scheduling execution instruction is used to execute the task scheduling execution method in the embodiment shown in Figure 4 above.
- the device further includes:
- the fourth generation module 1311 is used to generate, through compilation, the second version model files respectively corresponding to the first operator groups determined by the fifth determination module 1303, where a second version model file fully occupies the computing resources in a running state;
- the third generation module 1330 is specifically used to generate, for each first operator group determined by the fifth determination module 1303, a task scheduling execution instruction based on the first version model file corresponding to the first operator group, the state information group of that first version model file, the predetermined occupancy ratio and division method information corresponding to the first operator group, and the second version model file corresponding to the first operator group.
- the task scheduling execution instruction is used to execute the task scheduling execution method in the embodiment shown in Figure 6 above.
- the various optional embodiments, implementations, and examples disclosed above can be flexibly selected and combined as needed to achieve the corresponding functions and effects; the present disclosure does not enumerate them one by one.
- FIG15 is a block diagram of an electronic device according to an embodiment of the present disclosure.
- the electronic device 1500 includes one or more processors 1510 and a memory 1520 .
- the processor 1510 may be a central processing unit (CPU) or other forms of processing units having data processing capabilities and/or instruction execution capabilities, and may control other components in the electronic device 1500 to perform desired functions.
- the memory 1520 may include one or more computer program products, which may include various forms of computer-readable storage media, such as volatile memory and/or non-volatile memory.
- the volatile memory may include, for example, a random access memory (RAM) and/or a cache memory (cache), etc.
- the non-volatile memory may include, for example, a read-only memory (ROM), a hard disk, a flash memory, etc.
- one or more computer program instructions may be stored on the computer-readable storage medium, and the processor 1510 may run the one or more computer program instructions to implement any of the method embodiments of the present disclosure described above, for example: determining that there is a first target task corresponding to a first version model file, where the first version model file occupies the computing resources of the neural network accelerator at a predetermined occupancy ratio in a running state; determining, based on the state information group of the first version model file and the predetermined occupancy ratio, a set of first target tasks that satisfy the preset concurrent execution condition as a first task group, where the state information group of the first version model file includes: the states respectively corresponding to the functional units of the neural network accelerator when the first version model file is in a running state; and concurrently running the first version model files respectively corresponding to the first target tasks in the first task group.
- the electronic device 1500 may further include: an input device 1530 and an output device 1540 , and these components are interconnected via a bus system and/or other forms of connection mechanisms (not shown).
- the input device 1530 may include, for example, a keyboard, a mouse, etc.
- the output device 1540 may output various information to the outside.
- the output device 1540 may include, for example, a display, a speaker, a printer, a communication network and a remote output device connected thereto, and the like.
- FIG15 only shows some of the components related to the present disclosure in the electronic device 1500, omitting components such as a bus, an input/output interface, etc.
- the electronic device 1500 may further include any other appropriate components according to specific application scenarios.
- an embodiment of the present disclosure may also be a computer program product, which includes computer program instructions, which, when executed by a processor, enable the processor to execute the steps described in the above "Exemplary Method" section of this specification according to any method embodiment of the present disclosure, for example, determining that there is a first target task corresponding to a first version model file, and the first version model file occupies the computing resources of the neural network accelerator at a predetermined occupancy ratio in the running state; based on the state information group of the first version model file and the predetermined occupancy ratio, determining a set of first target tasks that meet the preset concurrent execution conditions as a first task group, and the state information group of the first version model file includes: the states corresponding to each functional unit of the neural network accelerator when the first version model file is in the running state; and concurrently running the first version model files corresponding to each first target task in the first task group.
- program code for performing the operations of the disclosed embodiments may be written in any combination of one or more programming languages, including object-oriented programming languages such as Java and C++, as well as conventional procedural programming languages such as the "C" language or similar programming languages.
- the program code may be executed entirely on the user computing device, partially on the user device, as a separate software package, partially on the user computing device and partially on a remote computing device, or entirely on a remote computing device or server.
- an embodiment of the present disclosure may also be a computer-readable storage medium having computer program instructions stored thereon, and when the computer program instructions are executed by a processor, the processor executes the steps in any method embodiment of the present disclosure described in the above “Exemplary Method” section of this specification.
- the computer readable storage medium may adopt any combination of one or more readable media.
- the readable medium may be a readable signal medium or a readable storage medium.
- the readable storage medium may include, but is not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the above.
- more specific examples (a non-exhaustive list) of readable storage media include: an electrical connection with one or more wires, a portable disk, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disk read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above.
Abstract
Provided are a task scheduling execution method, a method for generating task scheduling execution instructions, and corresponding apparatuses. The task scheduling execution method includes: determining that there are first target tasks corresponding to first version model files, where a first version model file occupies the computing resources of a neural network accelerator at a predetermined occupancy ratio in a running state (310); determining, based on the state information group of the first version model file and the predetermined occupancy ratio, a set of first target tasks that satisfy a preset concurrent execution condition as a first task group, where the state information group of the first version model file includes: the states respectively corresponding to the functional units of the neural network accelerator when the first version model file is in a running state (320); and concurrently running the first version model files respectively corresponding to the first target tasks in the first task group (330). This realizes parallel processing of multiple tasks by the neural network accelerator and can thus improve the computing efficiency of the neural network accelerator.
Description
The present disclosure claims priority to the Chinese patent application filed with the Chinese Patent Office on November 22, 2022, with application number CN202211467576.X and entitled "Task scheduling execution method, and method and apparatus for generating task scheduling execution instructions", the entire contents of which are incorporated into the present disclosure by reference.
The present disclosure relates to chip technology, and in particular to a task scheduling execution method, and a method and an apparatus for generating task scheduling execution instructions.
A chip may include a neural network accelerator, for example a Brain Processing Unit (BPU). In some cases, the neural network accelerator has multiple tasks pending, and it usually executes these tasks one after another in the chronological order in which the tasks were generated.
Summary of the Invention
Embodiments of the present disclosure provide a task scheduling execution method, and a method and an apparatus for generating task scheduling execution instructions.
According to one aspect of the embodiments of the present disclosure, a task scheduling execution method is provided, including:
determining that there are first target tasks corresponding to first version model files, where a first version model file occupies the computing resources of a neural network accelerator at a predetermined occupancy ratio in a running state;
determining, based on the state information group of the first version model file and the predetermined occupancy ratio, a set of the first target tasks that satisfy a preset concurrent execution condition as a first task group, where the state information group of the first version model file includes: the states respectively corresponding to the functional units of the neural network accelerator when the first version model file is in a running state;
and concurrently running the first version model files respectively corresponding to the first target tasks in the first task group.
According to another aspect of the embodiments of the present disclosure, a method for generating a task scheduling execution instruction is provided, including:
generating, through compilation, a first version model file corresponding to a first operator group, where the first version model file occupies the computing resources of a neural network accelerator at a predetermined occupancy ratio in a running state;
generating, based on the functional unit group corresponding to the first operator group, the state information group of the first version model file, where the functional unit group corresponding to the first operator group includes: the functional units of the neural network accelerator that are used for running the first operator group, and the state information group of the first version model file includes: the states respectively corresponding to the functional units of the neural network accelerator when the first version model file is in a running state;
and generating a task scheduling execution instruction based on the first version model file, the state information group of the first version model file, and the predetermined occupancy ratio, where the task scheduling execution instruction is used to execute the above task scheduling execution method.
According to a further aspect of the embodiments of the present disclosure, a task scheduling execution apparatus is provided, including:
a first determination module, used to determine that there are first target tasks corresponding to first version model files, where a first version model file occupies the computing resources of a neural network accelerator at a predetermined occupancy ratio in a running state;
a second determination module, used to determine, based on the state information group of the first version model file and the predetermined occupancy ratio, a set of the first target tasks determined by the first determination module that satisfy a preset concurrent execution condition as a first task group, where the state information group of the first version model file includes: the states respectively corresponding to the functional units of the neural network accelerator when the first version model file is in a running state;
and a first running module, used to concurrently run the first version model files respectively corresponding to the first target tasks in the first task group determined by the second determination module.
According to yet another aspect of the embodiments of the present disclosure, an apparatus for generating a task scheduling execution instruction is provided, including:
a first generation module, used to generate, through compilation, a first version model file corresponding to a first operator group, where the first version model file occupies the computing resources of a neural network accelerator at a predetermined occupancy ratio in a running state;
a second generation module, used to generate, based on the functional unit group corresponding to the first operator group, the state information group of the first version model file generated by the first generation module, where the functional unit group corresponding to the first operator group includes: the functional units of the neural network accelerator that are used for running the first operator group, and the state information group of the first version model file includes: the states respectively corresponding to the functional units of the neural network accelerator when the first version model file is in a running state;
and a third generation module, used to generate a task scheduling execution instruction based on the first version model file generated by the first generation module, the state information group of the first version model file generated by the second generation module, and the predetermined occupancy ratio, where the task scheduling execution instruction is used to execute the above task scheduling execution method.
According to yet another aspect of the embodiments of the present disclosure, a computer-readable storage medium is provided, the storage medium storing a computer program for executing the above task scheduling execution method or the above method for generating a task scheduling execution instruction.
According to yet another aspect of the embodiments of the present disclosure, an electronic device is provided, including:
a processor;
a memory for storing instructions executable by the processor;
the processor being used to read the executable instructions from the memory and execute the instructions to implement the above task scheduling execution method or the above method for generating a task scheduling execution instruction.
According to yet another aspect of the embodiments of the present disclosure, a computer program product is provided; when the instructions in the computer program product are executed by a processor, the above task scheduling execution method or the above method for generating a task scheduling execution instruction is implemented.
Based on the task scheduling execution method, the method and apparatus for generating task scheduling execution instructions, the computer-readable storage medium, the electronic device, and the program product provided by the above embodiments of the present disclosure, a first task group can be determined based on the state information group of a first version model file and the predetermined occupancy ratio at which the first version model file occupies the computing resources of the neural network accelerator in a running state, and the first version model files respectively corresponding to the first target tasks in the first task group can be run concurrently. This amounts to realizing, through a task scheduling mechanism, parallel processing of multiple tasks by the neural network accelerator, which can improve the computing efficiency of the neural network accelerator and better meet practical requirements.
FIG1 is a schematic structural diagram of a chip in an exemplary embodiment of the present disclosure.
FIG2 is a schematic diagram of the principle by which embodiments of the present disclosure realize parallel processing of multiple tasks by a neural network accelerator.
FIG3 is a schematic flowchart of a task scheduling execution method provided by an exemplary embodiment of the present disclosure.
FIG4 is a schematic flowchart of a task scheduling execution method provided by another exemplary embodiment of the present disclosure.
FIG5-1 is a schematic diagram of the task queue and the task scheduling table in a task scheduling execution method provided by an exemplary embodiment of the present disclosure.
FIG5-2 is a schematic diagram of task division in a task scheduling execution method provided by an exemplary embodiment of the present disclosure.
FIG6 is a schematic flowchart of a task scheduling execution method provided by a further exemplary embodiment of the present disclosure.
FIG7 is a schematic flowchart of a method for generating a task scheduling execution instruction provided by an exemplary embodiment of the present disclosure.
FIG8 is a schematic flowchart of a method for generating a task scheduling execution instruction provided by another exemplary embodiment of the present disclosure.
FIG9 is a schematic flowchart of a method for generating a task scheduling execution instruction provided by a further exemplary embodiment of the present disclosure.
FIG10 is a schematic flowchart of a method for generating a task scheduling execution instruction provided by yet another exemplary embodiment of the present disclosure.
FIG11 is a schematic structural diagram of a task scheduling execution apparatus provided by an exemplary embodiment of the present disclosure.
FIG12 is a schematic structural diagram of a task scheduling execution apparatus provided by another exemplary embodiment of the present disclosure.
FIG13 is a schematic structural diagram of an apparatus for generating a task scheduling execution instruction provided by an exemplary embodiment of the present disclosure.
FIG14 is a schematic structural diagram of an apparatus for generating a task scheduling execution instruction provided by another exemplary embodiment of the present disclosure.
FIG15 is a structural diagram of an electronic device provided by an exemplary embodiment of the present disclosure.
To explain the present disclosure, exemplary embodiments of the present disclosure are described in detail below with reference to the accompanying drawings. Obviously, the described embodiments are only a part of the embodiments of the present disclosure rather than all of them, and it should be understood that the present disclosure is not limited by the exemplary embodiments.
It should be noted that, unless otherwise specified, the relative arrangement of components and steps, numerical expressions, and numerical values set forth in these embodiments do not limit the scope of the present disclosure.
Overview of the Application
Some chips may include a neural network accelerator; for example, an artificial intelligence (AI) chip may include a BPU. In some cases, the neural network accelerator has multiple tasks pending; it usually executes these tasks one after another in the chronological order in which they were generated, executing only one task at a time.
Exemplary System
A neural network accelerator in a chip may include a computing component and multiple functional units (Function Units); the L1SRAM (Static Random-Access Memory) in FIG1 may serve as the computing component, and the Tensor Core, Vector core, Scalar core, and DSU (Domain Specific Unit) in FIG1 may each serve as a functional unit.
Optionally, in addition to the neural network accelerator, the chip may include other components, for example a graphics processing unit (GPU), a digital signal processor (DSP), and the like.
It should be noted that, as shown in FIG2, embodiments of the present disclosure can improve both the compilation phase and the execution phase: in the compilation phase, task scheduling execution instructions can be generated; in the execution phase, executing the task scheduling execution instructions generated in the compilation phase realizes parallel processing of multiple tasks by the neural network accelerator, thereby improving the computing efficiency of the neural network accelerator.
Exemplary Method
FIG3 is a schematic flowchart of a task scheduling execution method provided by an exemplary embodiment of the present disclosure. The method shown in FIG3 includes step 310, step 320, and step 330, each of which is described below.
Step 310: determine that there are first target tasks corresponding to first version model files, where a first version model file occupies the computing resources of the neural network accelerator at a predetermined occupancy ratio in a running state.
In an optional example, step 310 may be performed by a processor invoking corresponding instructions stored in a memory, or by a first determination module run by the processor.
In step 310, all tasks pending on the neural network accelerator can be determined and traversed to find out which of them have a corresponding first version model file; each task that has a corresponding first version model file can then be treated as one first target task. In this way, step 310 determines a number of first target tasks (for ease of description, it is assumed hereinafter that there are N first target tasks, where N may be an integer greater than or equal to 2).
It should be noted that the relationship between any first target task and its corresponding first version model file can be understood as follows: running the first version model file completes the first target task, and while the first version model file is running, its occupancy of the computing resources of the neural network accelerator is a predetermined occupancy ratio.
Optionally, any predetermined occupancy ratio may be any ratio greater than 0% and smaller than 100%, for example 30%, 40%, or 60%; the predetermined occupancy ratios corresponding to different first version model files may be the same or different; the computing resources of the neural network accelerator may refer to the computing resources of the computing component in the neural network accelerator, for example the computing resources of the L1SRAM in FIG1.
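For illustration only, step 310 can be pictured as the following minimal Python sketch, which scans pending tasks and keeps those that have a corresponding first version model file; the model_files registry, the task names, and the field names are assumptions of this sketch, not part of the disclosure.

```python
# Hypothetical registry mapping tasks to their compiled model files; "v1" holds a
# first version model file together with its predetermined occupancy ratio, and
# "v2" a second version model file that fully occupies the computing resources.
model_files = {
    "TaskA": {"v1": ("taska_v1.bin", 0.30)},
    "TaskB": {"v2": "taskb_v2.bin"},  # no first version file: not a first target task
}

def first_target_tasks(pending_tasks):
    """Step 310: keep the tasks that have a corresponding first version model file."""
    return [t for t in pending_tasks if "v1" in model_files.get(t, {})]

print(first_target_tasks(["TaskA", "TaskB"]))  # ['TaskA']
```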
Step 320: determine, based on the state information groups of the first version model files and the predetermined occupancy ratios, a set of first target tasks that satisfy a preset concurrent execution condition as a first task group, where the state information group of a first version model file includes: the states respectively corresponding to the functional units of the neural network accelerator when the first version model file is in a running state.
In an optional example, step 320 may be performed by a processor invoking corresponding instructions stored in a memory, or by a second determination module run by the processor.
After the N first target tasks are determined in step 310, the corresponding state information group and predetermined occupancy ratio can be determined for the first version model file corresponding to each of the N first target tasks, yielding N state information groups and N predetermined occupancy ratios. In the state information group corresponding to any first version model file, the state corresponding to any functional unit can be used to characterize whether that functional unit is used while the first version model file is running. By referring to the N state information groups and the N predetermined occupancy ratios, it can be determined which of the N first target tasks satisfy the preset concurrent execution condition, and the first task group can be determined accordingly.
In one example, the N first target tasks are specifically 4 first target tasks, namely task 1 to task 4, and only task 1, task 3, and task 4 satisfy the preset concurrent execution condition; then the set consisting of task 1, task 3, and task 4 can be determined as one first task group.
In another example, the N first target tasks are specifically 6 first target tasks, namely task 1 to task 6; task 1 to task 3 satisfy the preset concurrent execution condition, and task 4 to task 6 satisfy the preset concurrent execution condition; then the set consisting of task 1 to task 3 can be determined as one first task group, and the set consisting of task 4 to task 6 can be determined as another first task group.
Step 330: concurrently run the first version model files respectively corresponding to the first target tasks in the first task group.
In an optional example, step 330 may be performed by a processor invoking corresponding instructions stored in a memory, or by a first running module run by the processor.
If there is one first task group consisting of task 1, task 3, and task 4, the neural network accelerator can concurrently run the first version model files respectively corresponding to task 1, task 3, and task 4, thereby processing task 1, task 3, and task 4 in parallel.
If there are two first task groups, one consisting of task 1 to task 3 and the other consisting of task 4 to task 6, the neural network accelerator can first concurrently run the first version model files respectively corresponding to task 1, task 2, and task 3, thereby processing them in parallel, and then concurrently run the first version model files respectively corresponding to task 4, task 5, and task 6, thereby processing them in parallel. Of course, depending on the actual situation, the neural network accelerator may also first concurrently run the first version model files corresponding to task 4, task 5, and task 6, and then those corresponding to task 1, task 2, and task 3.
In the embodiments of the present disclosure, the first task group can be determined based on the state information group of the first version model file and the predetermined occupancy ratio at which the first version model file occupies the computing resources of the neural network accelerator in a running state, and the first version model files respectively corresponding to the first target tasks in the first task group are run concurrently. This amounts to realizing, through a task scheduling mechanism, parallel processing of multiple tasks by the neural network accelerator, which can improve the computing efficiency of the neural network accelerator and better meet practical requirements.
In an optional example, a plurality of first target tasks satisfy the preset concurrent execution condition when any one or more of the following hold:
(1) in the information set composed of the state information groups respectively corresponding to the first target tasks, among all the states corresponding to any one functional unit: one state is a use state and the remaining states are all idle states, or all the states are idle states;
(2) the sum of the predetermined occupancy ratios respectively corresponding to the first target tasks is smaller than a preset ratio.
Optionally, the preset ratio may be 100%, or a ratio smaller than but close to 100%. For ease of understanding, the embodiments of the present disclosure are described taking the case where the preset ratio is 100% as an example.
It should be noted that, in the state information group corresponding to any first target task, the state corresponding to any functional unit may be one of the following two: the use state or the idle state; the use state may be denoted by shared, and the idle state by available.
In one example, the neural network accelerator includes 3 functional units, namely Tensor Core, Vector core, and DSU; the N first target tasks are specifically 4 first target tasks, namely task 1 to task 4, whose respective state information groups are as follows:
Task 1: Tensor Core-shared, Vector core-available, DSU-available
Task 2: Tensor Core-shared, Vector core-shared, DSU-shared
Task 3: Tensor Core-available, Vector core-shared, DSU-available
Task 4: Tensor Core-available, Vector core-available, DSU-shared
Here, a form such as "A-shared" means that the state corresponding to functional unit A is the use state, and a form such as "B-available" means that the state corresponding to functional unit B is the idle state.
It is easy to see that among task 1, task 3, and task 4, only task 1 has the use state for Tensor Core while the other two have the idle state for it; only task 3 has the use state for Vector core while the other two have the idle state for it; and only task 4 has the use state for DSU while the other two have the idle state for it. Therefore, for task 1, task 3, and task 4, the condition defined in (1) above is satisfied. Meanwhile, assuming that the predetermined occupancy ratios corresponding to task 1 to task 4 are 30%, 30%, 25%, and 35% in order, the sum of the three predetermined occupancy ratios corresponding to task 1, task 3, and task 4 is clearly smaller than 100%, so the condition defined in (2) above is also satisfied, and it can be determined that task 1, task 3, and task 4 satisfy the preset concurrent execution condition. In this way, task 1 can be realized through the use of Tensor Core while task 3 is realized through the use of Vector core and task 4 through the use of DSU; that is, the neural network accelerator can execute task 1, task 3, and task 4 at the same time.
It should be noted that, with the condition defined in (1) above satisfied for task 1, task 3, and task 4, if the predetermined occupancy ratios corresponding to task 1 to task 4 are instead 40%, 25%, 50%, and 65% in order, the sum of the three ratios corresponding to task 1, task 3, and task 4 is clearly greater than 100%, the sum of the two ratios corresponding to task 1 and task 4 is greater than 100%, and the sum of the two ratios corresponding to task 3 and task 4 is greater than 100%, while the sum of the two ratios corresponding to task 1 and task 3 is smaller than 100%. That is, for task 1 and task 3, the condition defined in (2) above is satisfied, and it can be determined that task 1 and task 3 satisfy the preset concurrent execution condition. In this way, task 1 can be realized through the use of Tensor Core while task 3 is realized through the use of Vector core; that is, the neural network accelerator can execute task 1 and task 3 at the same time.
Suppose the N first target tasks in the above example are not 4 but 5 first target tasks, for example additionally including task 5 besides task 1 to task 4, with a predetermined occupancy ratio of 30% for task 5 and the following state information group:
Task 5: Tensor Core-shared, Vector core-shared, DSU-available
Assuming that the predetermined occupancy ratios corresponding to task 1 to task 4 are 40%, 25%, 50%, and 65% in order, it is easy to see that task 1 and task 3 satisfy the preset concurrent execution condition. In addition, since only one of task 4 and task 5 has the use state for Tensor Core, only one of them has the use state for Vector core, only one of them has the use state for DSU, and the sum of the two predetermined occupancy ratios corresponding to task 4 and task 5 is 95%, smaller than 100%, it can be determined that task 4 and task 5 also satisfy the preset concurrent execution condition. Thus, task 1 and task 3 can be put into one first task group and task 4 and task 5 into another, so that two first task groups are determined.
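The condition check worked through above can be sketched in Python as follows; this is only an illustrative reading of conditions (1) and (2), and the string encoding of states and the dict-based state information groups are assumptions of this sketch.

```python
SHARED, AVAILABLE = "shared", "available"  # use state / idle state

def satisfies_concurrent_condition(tasks, preset_ratio=1.0):
    """tasks: list of (state_info_group, predetermined_occupancy_ratio) pairs.

    Condition (1): per functional unit, at most one task is in the use state.
    Condition (2): the occupancy ratios sum to less than the preset ratio.
    """
    for unit in tasks[0][0]:
        if sum(states[unit] == SHARED for states, _ in tasks) > 1:
            return False  # two tasks would contend for the same functional unit
    return sum(ratio for _, ratio in tasks) < preset_ratio

# Task 1, task 3, and task 4 from the first example (ratios 30%, 25%, 35%):
task1 = ({"Tensor Core": SHARED, "Vector core": AVAILABLE, "DSU": AVAILABLE}, 0.30)
task3 = ({"Tensor Core": AVAILABLE, "Vector core": SHARED, "DSU": AVAILABLE}, 0.25)
task4 = ({"Tensor Core": AVAILABLE, "Vector core": AVAILABLE, "DSU": SHARED}, 0.35)
assert satisfies_concurrent_condition([task1, task3, task4])
```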
In the embodiments of the present disclosure, through the condition defined in (1) above, for each functional unit of the neural network accelerator, at most one first target task in the first task group can use that functional unit at any moment, which prevents different first target tasks in the first task group from using the same functional unit simultaneously and thus avoids usage conflicts on functional units. Through the condition defined in (2) above, it can be guaranteed that the computing resources of the neural network accelerator can support the parallel processing of the first target tasks in the first task group, so that these tasks are successfully completed through parallel processing, which improves the utilization of the computing component of the neural network accelerator.
On the basis of the embodiment shown in FIG3, as shown in FIG4, the method further includes step 301, step 303, step 305, and step 307.
Step 301: acquire a task queue, where each task in the task queue corresponds to a neural network model.
In an optional example, step 301 may be performed by a processor invoking corresponding instructions stored in a memory, or by an acquisition module run by the processor.
Optionally, the task queue may be the BPU Task Queue in FIG5-1.
It can be understood that a neural network model can be regarded as a sequence of operator units; that is, a neural network model may include multiple (for example 40, 50, or 100) operator units arranged in a certain order, including but not limited to convolution (Conv) operator units, pooling (Pool) operator units, deconvolution operator units, rectified linear unit (ReLU) operator units, batch normalization (BN) operator units, and so on.
It should be noted that the relationship between any task in the task queue and its corresponding neural network model can be understood as follows: the task depends on the neural network model to be completed. For example, if the task is an object detection task and the neural network model is a model for object detection, then providing the image to be detected as input to the neural network model for computation and obtaining the detection result output by the model can be regarded as completing the task.
Step 303: determine whether the target neural network model corresponding to a second target task has corresponding division method information, where the second target task is any task in the task queue; in response to the target neural network model having corresponding division method information, perform step 305; in response to the target neural network model having no corresponding division method information, perform step 307.
In an optional example, step 303 may be performed by a processor invoking corresponding instructions stored in a memory, or by a third determination module run by the processor.
Optionally, a target storage area may store correspondences between neural network models and division method information; the origin of these correspondences is explained in the description of the compilation phase below and is not expanded here.
In step 303, the correspondences stored in the target storage area can be traversed. If the traversal finds that the stored correspondences contain division method information corresponding to the target neural network model, step 305 can be performed; otherwise, step 307 can be performed.
Step 305: divide the second target task into K divided tasks, and add the K divided tasks to the task scheduling table, where the K divided tasks correspond to the K operator groups obtained by dividing the target neural network model according to its division method information.
In an optional example, step 305 may be performed by a processor invoking corresponding instructions stored in a memory, or by a first processing module run by the processor.
Optionally, the task scheduling table may be the Task Scheduler in FIG5-1.
It should be noted that the division method information corresponding to the target neural network model can be used to divide the target neural network model into K operator groups, each containing at least one operator unit of the target neural network model. In this way, the second target task can be divided based on the division method information corresponding to the target neural network model to obtain K divided tasks in one-to-one correspondence with the K operator groups. Optionally, K may be 2, 3, 4, or an integer greater than 4; the possibilities are not enumerated here.
Step 307: add the second target task to the task scheduling table.
In an optional example, step 307 may be performed by a processor invoking corresponding instructions stored in a memory, or by a second processing module run by the processor.
Step 310 includes step 3101.
Step 3101: determine, from the task scheduling table, that there are first target tasks corresponding to first version model files.
In step 3101, all tasks in the task scheduling table can be traversed to find out which of them have a corresponding first version model file; each task that has a corresponding first version model file can then be treated as one first target task.
In one example, the second target task is Task1 in FIG5-2, and the target neural network model includes 5 operator units, namely Convolution1, Pooling1, Convolution2, Pooling2, and Convolution3 in order. Task1 can then be divided into 5 divided tasks, namely Task1.1, Task1.2, Task1.3, Task1.4, and Task1.5, where Task1.1 corresponds to Convolution1, Task1.2 to Pooling1, Task1.3 to Convolution2, Task1.4 to Pooling2, and Task1.5 to Convolution3. Assuming that every Convolution executes on the Tensor Core and every Pooling executes on the Vector core, executing Task1 requires both the Tensor Core and the Vector core, while executing Task1.1 requires the Tensor Core, Task1.2 the Vector core, Task1.3 the Tensor Core, Task1.4 the Vector core, and Task1.5 the Tensor Core. Clearly, executing any one of Task1.1 to Task1.5 requires fewer functional units than executing Task1.
In this way, if Task1.1 and Task1.2 can each serve as a first target task and the sum of their two predetermined occupancy ratios is smaller than 100%, the neural network accelerator can process Task1.1 and Task1.2 in parallel, improving its computing efficiency. Similarly, if Task1.3 and Task1.4 can subsequently each serve as a first target task and the sum of their two predetermined occupancy ratios is smaller than 100%, the neural network accelerator can process Task1.3 and Task1.4 in parallel.
In the embodiments of the present disclosure, depending on whether the neural network model corresponding to the second target task in the task queue has corresponding division method information, it can be determined whether to add the second target task to the task scheduling table directly or to add the K divided tasks obtained by dividing it. For a task with corresponding division method information, the division yields several divided tasks of finer granularity than the task itself, and each divided task requires relatively fewer functional units when executed. This increases the probability that different tasks in the task scheduling table can be processed in parallel, which helps improve the computing efficiency of the neural network accelerator.
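As an illustration of steps 303 to 307, the sketch below consults recorded division method information and either splits a task into sub-tasks or adds it to the task scheduling table unchanged; the division_info registry and the Task1.1-style naming are assumptions of this sketch.

```python
# Hypothetical record of division method information per neural network model;
# each entry lists the operator groups obtained by dividing the model.
division_info = {
    "model1": None,  # no division method information recorded
    "model2": [["Convolution1"], ["Pooling1"]],  # K = 2 operator groups
}

def enqueue(task_name, model_name, schedule_table):
    groups = division_info.get(model_name)
    if groups:  # step 305: one divided task per operator group
        schedule_table += [f"{task_name}.{k}" for k in range(1, len(groups) + 1)]
    else:       # step 307: add the task itself
        schedule_table.append(task_name)

table = []
enqueue("Task1", "model1", table)
enqueue("Task2", "model2", table)
print(table)  # ['Task1', 'Task2.1', 'Task2.2']
```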
On the basis of the embodiment shown in FIG4, as shown in FIG6, the method further includes step 340 and step 350.
Step 340: determine a second task group, where the second task group includes: the set of third target tasks in the task scheduling table other than the first target tasks in the first task group.
In an optional example, step 340 may be performed by a processor invoking corresponding instructions stored in a memory, or by a fourth determination module run by the processor.
In step 340, all tasks in the task scheduling table can be traversed to find out which of them are not in the first task group; each such task can serve as one third target task, and the set of all third target tasks can serve as the second task group.
Step 350: run, in a preset order, the second version model files respectively corresponding to the third target tasks in the second task group, where a second version model file fully occupies the computing resources in a running state.
In an optional example, step 350 may be performed by a processor invoking corresponding instructions stored in a memory, or by a second running module run by the processor.
It should be noted that the relationship between any third target task and its corresponding second version model file can be understood as follows: running the second version model file completes the third target task, and while the second version model file is running, its occupancy of the computing resources of the neural network accelerator is 100%.
Optionally, any second version model file may have a state information group that includes: the states respectively corresponding to the functional units of the neural network accelerator when the second version model file is in a running state, where each state in this state information group is an exclusive state, which may be denoted by exclusive.
In step 350, the times at which the third target tasks in the second task group were added to the task scheduling table can be determined, and the neural network accelerator can run the second version model files respectively corresponding to these third target tasks one by one, from the earliest added to the latest, thereby processing the third target tasks in the second task group serially.
In the embodiments of the present disclosure, the tasks in the task scheduling table that cannot be processed in parallel can be processed serially by the neural network accelerator, so that all tasks in the task scheduling table are successfully processed and no task is left out.
In one example, as shown in FIG5-1, there are three tasks in the task queue, namely Task1, Task2, and Task3. Suppose the neural network model corresponding to Task1 is model1, that corresponding to Task2 is model2, and that corresponding to Task3 is model3; model1 has no corresponding division method information; model2 has corresponding division method information used to divide model2 into operator group 1 and operator group 2; and model3 has corresponding division method information used to divide model3 into operator group 3 and operator group 4. Then Task1 is not divided, Task2 can be divided into Task2.1 and Task2.2, and Task3 can be divided into Task3.1 and Task3.2, where Task2.1 corresponds to operator group 1, Task2.2 to operator group 2, Task3.1 to operator group 3, and Task3.2 to operator group 4. Task1, Task2.1, Task2.2, Task3.1, and Task3.2 can all be added to the Task Scheduler.
Suppose the neural network accelerator includes 2 functional units, namely Tensor Core and Vector core; Task1, Task2.1, and Task3.1 have no corresponding first version model file but only a corresponding second version model file, while Task2.2 and Task3.2 both have a corresponding first version model file. Then the Task Scheduler can also hold the state information groups of the second version model files corresponding to Task1, Task2.1, and Task3.1, and the state information groups of the first version model files corresponding to Task2.2 and Task3.2; the content added to the Task Scheduler is shown in FIG5-1. Here, a form such as "A_exclusive" means that the state corresponding to functional unit A is the exclusive state, "B_available" means that the state corresponding to functional unit B is the idle state, and "C_shared" means that the state corresponding to functional unit C is the use state.
Since Task1, Task2.1, and Task3.1 have no corresponding first version model file, none of them can serve as a first target task; each can only serve as a third target task, so none of them can be processed in parallel with other tasks and each must be processed individually. Since only one of Task2.2 and Task3.2 has the use state for Tensor Core and only one of them has the use state for Vector core, if the sum of the predetermined occupancy ratios corresponding to their first version model files is smaller than 100%, it can be determined that Task2.2 and Task3.2 satisfy the preset concurrent execution condition. The neural network accelerator can then first process Task2.2 and Task3.2 in parallel, and then process Task1, Task2.1, and Task3.1 serially, thereby processing all tasks in the Task Scheduler.
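For illustration, the overall dispatch of steps 330 to 350 can be sketched as follows; run_v1 and run_v2 are placeholders standing in for launching the first and second version model files on the accelerator, and are assumptions of this sketch rather than part of the disclosure.

```python
import threading

def dispatch(schedule_table, first_task_group, run_v1, run_v2):
    # Step 330: concurrently run the first version model files of the first task group.
    threads = [threading.Thread(target=run_v1, args=(t,)) for t in first_task_group]
    for th in threads:
        th.start()
    for th in threads:
        th.join()

    # Steps 340-350: every remaining task forms the second task group and is run
    # serially, in the order the tasks were added to the task scheduling table.
    second_task_group = [t for t in schedule_table if t not in first_task_group]
    for t in second_task_group:
        run_v2(t)  # each second version model file fully occupies the resources

dispatch(["Task2.2", "Task3.2", "Task1", "Task2.1", "Task3.1"],
         {"Task2.2", "Task3.2"},
         run_v1=lambda t: print("parallel:", t),
         run_v2=lambda t: print("serial:", t))
```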
Any task scheduling execution method provided by the embodiments of the present disclosure may be executed by any appropriate device with data processing capabilities, including but not limited to terminal devices and servers; alternatively, it may be executed by a processor, for example by the processor invoking corresponding instructions stored in a memory to execute any task scheduling execution method mentioned in the embodiments of the present disclosure. This is not repeated below.
FIG7 is a schematic flowchart of a method for generating a task scheduling execution instruction provided by an exemplary embodiment of the present disclosure. The method shown in FIG7 includes step 710, step 720, and step 730, each of which is described below.
Step 710: generate, through compilation, a first version model file corresponding to a first operator group, where the first version model file occupies the computing resources of the neural network accelerator at a predetermined occupancy ratio in a running state.
In an optional example, step 710 may be performed by a processor invoking corresponding instructions stored in a memory, or by a first generation module run by the processor.
Optionally, a complete neural network model may serve as a first operator group, or a set of several consecutive operator units in a complete neural network model may serve as a first operator group; there may be multiple first operator groups, and each may correspond to one first target task described above.
In step 710, a compiler can perform compilation to generate the first version model files respectively corresponding to the first operator groups; the specific compilation method can be any implementable method chosen according to actual requirements, which the present disclosure does not elaborate.
Step 720: generate, based on the functional unit group corresponding to the first operator group, the state information group of the first version model file, where the functional unit group corresponding to the first operator group includes: the functional units of the neural network accelerator that are used for running the first operator group, and the state information group of the first version model file includes: the states respectively corresponding to the functional units of the neural network accelerator when the first version model file is in a running state.
In an optional example, step 720 may be performed by a processor invoking corresponding instructions stored in a memory, or by a second generation module run by the processor.
In step 720, the functional unit group corresponding to the first operator group can be determined first. Suppose the neural network accelerator includes 3 functional units, namely Tensor Core, Vector core, and DSU, and running the first operator group requires the Tensor Core and the Vector core; then the functional unit group corresponding to the first operator group includes the Tensor Core and the Vector core. Next, referring to this functional unit group, the state information group of the first version model file can be generated; in it, the state corresponding to any functional unit can be used to characterize whether that functional unit is used while the first version model file is running.
Step 730: generate a task scheduling execution instruction based on the first version model file, the state information group of the first version model file, and the predetermined occupancy ratio, where the task scheduling execution instruction is used to execute the task scheduling execution method (specifically, the task scheduling execution method in the embodiment shown in FIG3).
In an optional example, step 730 may be performed by a processor invoking corresponding instructions stored in a memory, or by a third generation module run by the processor.
In the embodiments of the present disclosure, in the compilation phase, the generation of the first version model file and the generation of its state information group can be performed in sequence, and the task scheduling execution instruction can then be generated in combination with the predetermined occupancy ratio corresponding to the first version model file. In this way, in the execution phase, executing the task scheduling execution instruction generated in the compilation phase makes it possible to determine the first target tasks and the first task group in sequence, so as to concurrently run the first version model files respectively corresponding to the first target tasks in the first task group. This amounts to realizing, through a task scheduling mechanism, parallel processing of multiple tasks by the neural network accelerator, which can improve the computing efficiency of the neural network accelerator and better meet practical requirements.
On the basis of the embodiment shown in FIG7, any functional unit of the neural network accelerator can be taken as a target functional unit; as shown in FIG8, step 720 includes step 7201 and step 7203.
Step 7201: in response to the target functional unit being in the functional unit group corresponding to the first operator group, determine that the state corresponding to the target functional unit in the state information group of the first version model file is the use state.
In an optional example, step 7201 may be performed by a processor invoking corresponding instructions stored in a memory, or by a first determination submodule run by the processor.
Step 7203: in response to the target functional unit not being in the functional unit group corresponding to the first operator group, determine that the state corresponding to the target functional unit in the state information group of the first version model file is the idle state.
In an optional example, step 7203 may be performed by a processor invoking corresponding instructions stored in a memory, or by a second determination submodule run by the processor.
In one example, the neural network accelerator includes 3 functional units, namely Tensor Core, Vector core, and DSU, and the functional unit group corresponding to the first operator group includes the Tensor Core and the Vector core. Since the Tensor Core and the Vector core are both in the functional unit group corresponding to the first operator group, the states corresponding to them in the state information group of the first version model file can both be the use state; since the DSU is not in that functional unit group, the state corresponding to the DSU can be the idle state. The state information group of the first version model file can thus be expressed as: Tensor Core-shared, Vector core-shared, DSU-available.
In the embodiments of the present disclosure, by referring to whether the target functional unit is present in the functional unit group corresponding to the first operator group, the state corresponding to each functional unit of the neural network accelerator can be determined efficiently and reliably, so that the state information group of the first version model file can be generated efficiently and reliably.
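Steps 7201 and 7203 amount to a simple membership test, sketched below; the unit names and the dict encoding are assumptions of this sketch.

```python
ALL_UNITS = ("Tensor Core", "Vector core", "DSU")

def make_state_info_group(functional_unit_group):
    """Mark a unit "shared" (use state) if the first operator group runs on it,
    and "available" (idle state) otherwise."""
    return {u: "shared" if u in functional_unit_group else "available"
            for u in ALL_UNITS}

print(make_state_info_group({"Tensor Core", "Vector core"}))
# {'Tensor Core': 'shared', 'Vector core': 'shared', 'DSU': 'available'}
```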
On the basis of the embodiment shown in FIG7, as shown in FIG9, before step 710 the method further includes step 701 and step 703.
Step 701: when the functional unit groups respectively corresponding to the operator units in a neural network model are not completely the same, divide the neural network model into K operator groups corresponding to different functional unit groups, and record the corresponding division method information, where the functional unit group corresponding to any operator group includes: the functional units of the neural network accelerator that are used for running the operator group.
In an optional example, step 701 may be performed by a processor invoking corresponding instructions stored in a memory, or by a third processing module run by the processor.
Optionally, K may be 2, 3, 4, or an integer greater than 4; the possibilities are not enumerated here.
In step 701, the functional unit groups respectively corresponding to the operator units in the neural network model can be determined first, where the functional unit group corresponding to any operator unit includes: the functional units of the neural network accelerator used for running that operator unit. Next, these functional unit groups can be compared to determine whether they are completely the same.
When the functional unit groups respectively corresponding to the operator units of the neural network model are completely the same, the model need not be divided, and naturally there is no division method information corresponding to it.
In one example, the neural network accelerator includes 3 functional units, namely Tensor Core, Vector core, and DSU, and the neural network model is the one corresponding to Task2 in FIG5-1; that is, the functional unit group corresponding to every operator unit of the model includes only the Tensor Core. Then the model need not be divided, and in the execution phase Task2 can run entirely on the Tensor Core. Similarly, in the execution phase, Task3 in FIG5-1 can run entirely on the Vector core.
When the functional unit groups respectively corresponding to the operator units of the neural network model are not completely the same, the model can be divided into K operator groups corresponding to different functional unit groups, and the corresponding division method information can be recorded.
In one example, the neural network accelerator includes 3 functional units, namely Tensor Core, Vector core, and DSU, and the neural network model includes 30 operator units: the functional unit group corresponding to the first 10 operator units includes the Tensor Core and the Vector core; that corresponding to the middle 10 includes the Vector core and the DSU; and that corresponding to the last 10 includes the Tensor Core, the Vector core, and the DSU. The model can then be divided into 3 operator groups, the first containing the first 10 of the 30 operator units, the second the middle 10, and the third the last 10, and the division method information recorded for the model can characterize dividing the 30 operator units evenly into 3 parts. Alternatively, the model can be divided into 2 operator groups, the first containing the first 10 of the 30 operator units and the second the remaining 20, and the division method information recorded for the model can characterize dividing the 30 operator units at a ratio of 1:2. It should be noted that after the division method information is recorded for the model, the correspondence between the model and the division method information can also be recorded in the target storage area.
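A minimal sketch of step 701, under the assumption that the division simply groups consecutive operator units whose functional unit groups are identical and records the group boundaries as the division method information:

```python
def divide_model(unit_groups):
    """unit_groups: one functional unit group (a frozenset) per operator unit,
    in model order. Returns [start, end) boundaries, one per operator group."""
    groups, start = [], 0
    for i in range(1, len(unit_groups)):
        if unit_groups[i] != unit_groups[i - 1]:
            groups.append((start, i))
            start = i
    groups.append((start, len(unit_groups)))
    return groups  # recorded as the division method information

# The 30-operator example above: 10 + 10 + 10 operator units.
units = ([frozenset({"Tensor Core", "Vector core"})] * 10
         + [frozenset({"Vector core", "DSU"})] * 10
         + [frozenset({"Tensor Core", "Vector core", "DSU"})] * 10)
print(divide_model(units))  # [(0, 10), (10, 20), (20, 30)] -> K = 3
```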
Step 703: take each operator group in at least some of the K operator groups as one first operator group.
In an optional example, step 703 may be performed by a processor invoking corresponding instructions stored in a memory, or by a fifth determination module run by the processor.
Optionally, each of the K operator groups may be taken as one first operator group.
Step 730 includes:
Step 7301: for each first operator group, generate a task scheduling execution instruction based on the first version model file corresponding to the first operator group, the state information group of that first version model file, the predetermined occupancy ratio corresponding to the first operator group, and the division method information, where the task scheduling execution instruction is used to execute the task scheduling execution method (specifically, the task scheduling execution method in the embodiment shown in FIG4).
By performing step 7301, a corresponding task scheduling execution instruction can be generated for each first operator group.
In the embodiments of the present disclosure, when the functional unit groups respectively corresponding to the operator units in a neural network model are not completely the same, the model can be divided with reference to these functional unit groups so that the resulting operator groups correspond to different functional unit groups, and the division method information can be recorded. In this way, in the execution phase, the correspondences stored in the target storage area can be consulted to decide whether to add the second target task to the task scheduling table directly or to divide it first; dividing tasks helps increase the probability that different tasks in the task scheduling table can be processed in parallel.
On the basis of the embodiment shown in FIG9, as shown in FIG10, the method further includes step 711.
Step 711: generate, through compilation, the second version model files respectively corresponding to the first operator groups, where a second version model file fully occupies the computing resources in a running state.
In an optional example, step 711 may be performed by a processor invoking corresponding instructions stored in a memory, or by a fourth generation module run by the processor.
In step 711, a compiler can perform compilation to generate the second version model files respectively corresponding to the first operator groups; the specific compilation method can be any implementable method chosen according to actual requirements, which the present disclosure does not elaborate.
Step 7301 includes step 73011.
Step 73011: for each first operator group, generate a task scheduling execution instruction based on the first version model file corresponding to the first operator group, the state information group of that first version model file, the predetermined occupancy ratio corresponding to the first operator group, the division method information, and the second version model file corresponding to the first operator group, where the task scheduling execution instruction is used to execute the task scheduling execution method (specifically, the task scheduling execution method in the embodiment shown in FIG6).
In the embodiments of the present disclosure, by generating the second version model files respectively corresponding to the first operator groups and using them in generating the task scheduling execution instructions, the tasks in the task scheduling table that cannot be processed in parallel can be processed serially by the neural network accelerator in the execution phase, ensuring that these tasks are successfully processed; in this way, every task in the task scheduling table can be successfully processed.
In an optional example, in the compilation phase, multi-version model file compilation can be performed for each of multiple operator groups (for example each first operator group above) to generate a first version model file and a second version model file that implement the same function, where the first version model file occupies the L1SRAM at the predetermined occupancy ratio and the second version model file occupies all of the L1SRAM.
It should be noted that both the first version model file and the second version model file may have a corresponding state information group, each containing the states respectively corresponding to the functional units of the neural network accelerator; the state corresponding to any functional unit has three possible values: exclusive, shared, and available. Here, exclusive means the operator group needs to occupy all of the L1SRAM exclusively; available means the operator group does not need to use the functional unit and occupies only part of the L1SRAM; shared means the operator group needs to use the functional unit, can share the functional units other than those it uses, and occupies only part of the L1SRAM.
In this way, in the execution phase, by referring to the state information group of a first version model file and its predetermined occupancy ratio, it can be determined efficiently and reliably which tasks satisfy the preset concurrent execution condition; these tasks can then be processed in parallel, improving the computing efficiency of the neural network accelerator, while the tasks that cannot be processed in parallel can be processed serially.
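The compile-stage output per operator group can be pictured as the record sketched below; compile_model is a placeholder for the actual compiler, and the field names are assumptions of this sketch.

```python
def compile_operator_group(group, occupancy_ratio, used_units, all_units, compile_model):
    """Multi-version compilation: a first version model file that occupies part
    of the L1SRAM, and a second version model file that occupies all of it."""
    return {
        "v1": compile_model(group, sram_ratio=occupancy_ratio),
        "v1_states": {u: "shared" if u in used_units else "available" for u in all_units},
        "v1_ratio": occupancy_ratio,
        "v2": compile_model(group, sram_ratio=1.0),
        "v2_states": {u: "exclusive" for u in all_units},
    }

record = compile_operator_group(
    group=["Convolution1"], occupancy_ratio=0.30,
    used_units={"Tensor Core"}, all_units=("Tensor Core", "Vector core", "DSU"),
    compile_model=lambda g, sram_ratio: f"model({g}, {sram_ratio})")  # stub compiler
print(record["v1_states"])  # {'Tensor Core': 'shared', 'Vector core': 'available', 'DSU': 'available'}
```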
Any method for generating a task scheduling execution instruction provided by the embodiments of the present disclosure may be executed by any appropriate device with data processing capabilities, including but not limited to terminal devices and servers; alternatively, it may be executed by a processor, for example by the processor invoking corresponding instructions stored in a memory to execute any such method in the embodiments of the present disclosure. This is not repeated below.
Those of ordinary skill in the art will understand that all or some of the steps of the above method embodiments can be accomplished by hardware related to program instructions; the aforementioned program can be stored in a computer-readable storage medium and, when executed, performs the steps of the above method embodiments; the aforementioned storage medium includes various media that can store program code, such as a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk.
Exemplary Apparatus
FIG11 is a schematic structural diagram of a task scheduling execution apparatus provided by an exemplary embodiment of the present disclosure. The apparatus shown in FIG11 can be used to implement any of the above task scheduling execution method embodiments of the present disclosure, and includes a first determination module 1110, a second determination module 1120, and a first running module 1130.
The first determination module 1110 is used to determine that there are first target tasks corresponding to first version model files, where a first version model file occupies the computing resources of the neural network accelerator at a predetermined occupancy ratio in a running state;
the second determination module 1120 is used to determine, based on the state information group of the first version model file and the predetermined occupancy ratio, a set of the first target tasks determined by the first determination module 1110 that satisfy the preset concurrent execution condition as a first task group, where the state information group of the first version model file includes: the states respectively corresponding to the functional units of the neural network accelerator when the first version model file is in a running state;
the first running module 1130 is used to concurrently run the first version model files respectively corresponding to the first target tasks in the first task group determined by the second determination module 1120.
In an optional example, a plurality of first target tasks satisfy the preset concurrent execution condition when any one or more of the following hold:
in the information set composed of the state information groups respectively corresponding to the first target tasks, among all the states corresponding to any one functional unit: one state is a use state and the remaining states are all idle states, or all the states are idle states;
the sum of the predetermined occupancy ratios respectively corresponding to the first target tasks is smaller than a preset ratio.
In an optional example, as shown in FIG12, the apparatus further includes:
an acquisition module 1101, used to acquire a task queue, where each task in the task queue corresponds to a neural network model;
a third determination module 1103, used to determine whether the target neural network model corresponding to a second target task has corresponding division method information, where the second target task is any task in the task queue acquired by the acquisition module 1101;
a first processing module 1105, used to divide the second target task into K divided tasks in response to the third determination module 1103 determining that the target neural network model has corresponding division method information, and to add the K divided tasks to the task scheduling table, where the K divided tasks correspond to the K operator groups obtained by dividing the target neural network model according to its division method information;
a second processing module 1107, used to add the second target task to the task scheduling table in response to the third determination module 1103 determining that the target neural network model has no corresponding division method information;
the first determination module 1110 being specifically used to determine, from the task scheduling table, that there are first target tasks corresponding to first version model files.
In an optional example, as shown in FIG12, the apparatus further includes:
a fourth determination module 1140, used to determine a second task group, where the second task group includes: the set of third target tasks in the task scheduling table other than the first target tasks in the first task group determined by the second determination module 1120;
a second running module 1150, used to run, in a preset order, the second version model files respectively corresponding to the third target tasks in the second task group determined by the fourth determination module 1140, where a second version model file fully occupies the computing resources in a running state.
FIG13 is a schematic structural diagram of an apparatus for generating a task scheduling execution instruction provided by an exemplary embodiment of the present disclosure. The apparatus shown in FIG13 can be used to implement any of the above embodiments of the method for generating a task scheduling execution instruction of the present disclosure, and includes a first generation module 1310, a second generation module 1320, and a third generation module 1330.
The first generation module 1310 is used to generate, through compilation, a first version model file corresponding to a first operator group, where the first version model file occupies the computing resources of the neural network accelerator at a predetermined occupancy ratio in a running state;
the second generation module 1320 is used to generate, based on the functional unit group corresponding to the first operator group, the state information group of the first version model file generated by the first generation module 1310, where the functional unit group corresponding to the first operator group includes: the functional units of the neural network accelerator that are used for running the first operator group, and the state information group of the first version model file includes: the states respectively corresponding to the functional units of the neural network accelerator when the first version model file is in a running state;
the third generation module 1330 is used to generate a task scheduling execution instruction based on the first version model file generated by the first generation module 1310, the state information group of the first version model file generated by the second generation module 1320, and the predetermined occupancy ratio, where the task scheduling execution instruction is used to execute the task scheduling execution method in the embodiment shown in FIG3 above.
In an optional example, any functional unit of the neural network accelerator is taken as a target functional unit; as shown in FIG14, the second generation module 1320 includes:
a first determination submodule 13201, used to determine, in response to the target functional unit being in the functional unit group corresponding to the first operator group, that the state corresponding to the target functional unit in the state information group of the first version model file generated by the first generation module 1310 is the use state;
a second determination submodule 13203, used to determine, in response to the target functional unit not being in the functional unit group corresponding to the first operator group, that the state corresponding to the target functional unit in the state information group of the first version model file generated by the first generation module 1310 is the idle state.
In an optional example, as shown in FIG14, the apparatus further includes:
a third processing module 1301, used to, before the first version model file corresponding to the first operator group is generated through compilation, divide a neural network model into K operator groups corresponding to different functional unit groups when the functional unit groups respectively corresponding to the operator units in the neural network model are not completely the same, and record the corresponding division method information, where the functional unit group corresponding to any operator group includes: the functional units of the neural network accelerator that are used for running the operator group;
a fifth determination module 1303, used to take each operator group in at least some of the K operator groups divided by the third processing module 1301 as one first operator group;
the third generation module 1330 being specifically used to generate, for each first operator group determined by the fifth determination module 1303, a task scheduling execution instruction based on the first version model file corresponding to the first operator group, the state information group of that first version model file, the predetermined occupancy ratio corresponding to the first operator group, and the division method information, where the task scheduling execution instruction is used to execute the task scheduling execution method in the embodiment shown in FIG4 above.
In an optional example, as shown in FIG14, the apparatus further includes:
a fourth generation module 1311, used to generate, through compilation, the second version model files respectively corresponding to the first operator groups determined by the fifth determination module 1303, where a second version model file fully occupies the computing resources in a running state;
the third generation module 1330 being specifically used to generate, for each first operator group determined by the fifth determination module 1303, a task scheduling execution instruction based on the first version model file corresponding to the first operator group, the state information group of that first version model file, the predetermined occupancy ratio and division method information corresponding to the first operator group, and the second version model file corresponding to the first operator group, where the task scheduling execution instruction is used to execute the task scheduling execution method in the embodiment shown in FIG6 above.
In the apparatus of the present disclosure, the various optional embodiments, implementations, and examples disclosed above can be flexibly selected and combined as needed to achieve the corresponding functions and effects; the present disclosure does not enumerate them one by one.
Exemplary Electronic Device
FIG15 illustrates a block diagram of an electronic device according to an embodiment of the present disclosure. The electronic device 1500 includes one or more processors 1510 and a memory 1520.
The processor 1510 may be a central processing unit (CPU) or another form of processing unit with data processing capabilities and/or instruction execution capabilities, and may control other components in the electronic device 1500 to perform desired functions.
The memory 1520 may include one or more computer program products, which may include various forms of computer-readable storage media, such as volatile memory and/or non-volatile memory. The volatile memory may include, for example, random access memory (RAM) and/or cache memory (cache). The non-volatile memory may include, for example, read-only memory (ROM), hard disks, flash memory, and the like. One or more computer program instructions may be stored on the computer-readable storage medium, and the processor 1510 may run the one or more computer program instructions to implement any of the method embodiments of the present disclosure described above, for example: determining that there is a first target task corresponding to a first version model file, where the first version model file occupies the computing resources of the neural network accelerator at a predetermined occupancy ratio in a running state; determining, based on the state information group of the first version model file and the predetermined occupancy ratio, a set of first target tasks that satisfy the preset concurrent execution condition as a first task group, where the state information group of the first version model file includes: the states respectively corresponding to the functional units of the neural network accelerator when the first version model file is in a running state; and concurrently running the first version model files respectively corresponding to the first target tasks in the first task group.
In an optional example, the electronic device 1500 may further include an input device 1530 and an output device 1540, and these components are interconnected via a bus system and/or another form of connection mechanism (not shown).
The input device 1530 may include, for example, a keyboard and a mouse. The output device 1540 may output various information to the outside and may include, for example, a display, a speaker, a printer, a communication network and remote output devices connected to it, and the like.
Of course, for simplicity, FIG15 shows only some of the components of the electronic device 1500 that are related to the present disclosure, omitting components such as a bus and input/output interfaces. In addition, the electronic device 1500 may include any other appropriate components according to the specific application.
Exemplary Computer Program Product and Computer-Readable Storage Medium
In addition to the above methods and devices, an embodiment of the present disclosure may also be a computer program product, which includes computer program instructions that, when run by a processor, cause the processor to execute the steps of any method embodiment of the present disclosure described in the above "Exemplary Method" section of this specification, for example: determining that there is a first target task corresponding to a first version model file, where the first version model file occupies the computing resources of the neural network accelerator at a predetermined occupancy ratio in a running state; determining, based on the state information group of the first version model file and the predetermined occupancy ratio, a set of first target tasks that satisfy the preset concurrent execution condition as a first task group, where the state information group of the first version model file includes: the states respectively corresponding to the functional units of the neural network accelerator when the first version model file is in a running state; and concurrently running the first version model files respectively corresponding to the first target tasks in the first task group.
Program code for performing the operations of the embodiments of the present disclosure may be written in any combination of one or more programming languages, including object-oriented programming languages such as Java and C++, as well as conventional procedural programming languages such as the "C" language or similar programming languages. The program code may execute entirely on the user's computing device, partly on the user's device, as a standalone software package, partly on the user's computing device and partly on a remote computing device, or entirely on a remote computing device or server.
Furthermore, an embodiment of the present disclosure may also be a computer-readable storage medium on which computer program instructions are stored; when the computer program instructions are run by a processor, the processor executes the steps of any method embodiment of the present disclosure described in the above "Exemplary Method" section of this specification.
The computer-readable storage medium may use any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. The readable storage medium may include, but is not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the above. More specific examples (a non-exhaustive list) of readable storage media include: an electrical connection with one or more wires, a portable disk, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disk read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above.
The basic principles of the present disclosure have been described above in conjunction with specific embodiments. However, it should be pointed out that the merits, advantages, effects, and the like mentioned in the present disclosure are merely examples and not limitations, and should not be considered indispensable to the embodiments of the present disclosure. In addition, the specific details disclosed above are only for the purpose of illustration and ease of understanding, not limitation, and the present disclosure is not required to be implemented using these specific details.
Those skilled in the art may make various modifications and variations to the present disclosure without departing from the spirit and scope of the present application. Thus, if these modifications and variations fall within the scope of the claims of the present disclosure and their technical equivalents, the present disclosure is also intended to encompass them.
Claims (13)
- A task scheduling execution method, comprising: determining that there is a first target task corresponding to a first version model file, the first version model file occupying the computing resources of a neural network accelerator at a predetermined occupancy ratio in a running state; determining, based on the state information group of the first version model file and the predetermined occupancy ratio, a set of the first target tasks that satisfy a preset concurrent execution condition as a first task group, the state information group of the first version model file comprising: the states respectively corresponding to the functional units of the neural network accelerator when the first version model file is in a running state; and concurrently running the first version model files respectively corresponding to the first target tasks in the first task group.
- The method according to claim 1, wherein a plurality of the first target tasks satisfying the preset concurrent execution condition comprises any one or more of the following: in the information set composed of the state information groups respectively corresponding to the first target tasks, among all the states corresponding to any one of the functional units: one state is a use state and the remaining states are all idle states, or all the states are idle states; the sum of the predetermined occupancy ratios respectively corresponding to the first target tasks is smaller than a preset ratio.
- The method according to claim 1 or 2, further comprising: acquiring a task queue, each task in the task queue corresponding to a neural network model; determining whether the target neural network model corresponding to a second target task has corresponding division method information, the second target task being any task in the task queue; in response to the target neural network model having corresponding division method information, dividing the second target task into K divided tasks and adding the K divided tasks to a task scheduling table, the K divided tasks corresponding to the K operator groups obtained by dividing the target neural network model according to its division method information; and in response to the target neural network model having no corresponding division method information, adding the second target task to the task scheduling table; wherein determining that there is a first target task corresponding to a first version model file comprises: determining, from the task scheduling table, that there is a first target task corresponding to a first version model file.
- The method according to claim 3, further comprising: determining a second task group, the second task group comprising: the set of third target tasks in the task scheduling table other than the first target tasks in the first task group; and running, in a preset order, the second version model files respectively corresponding to the third target tasks in the second task group, a second version model file fully occupying the computing resources in a running state.
- A method for generating a task scheduling execution instruction, comprising: generating, through compilation, a first version model file corresponding to a first operator group, the first version model file occupying the computing resources of a neural network accelerator at a predetermined occupancy ratio in a running state; generating, based on the functional unit group corresponding to the first operator group, the state information group of the first version model file, the functional unit group corresponding to the first operator group comprising: the functional units of the neural network accelerator that are used for running the first operator group, and the state information group of the first version model file comprising: the states respectively corresponding to the functional units of the neural network accelerator when the first version model file is in a running state; and generating a task scheduling execution instruction based on the first version model file, the state information group of the first version model file, and the predetermined occupancy ratio, the task scheduling execution instruction being used to execute the task scheduling execution method according to claim 1 or 2.
- The method according to claim 5, wherein, taking any functional unit of the neural network accelerator as a target functional unit, generating the state information group of the first version model file based on the functional unit group corresponding to the first operator group comprises: in response to the target functional unit being in the functional unit group corresponding to the first operator group, determining that the state corresponding to the target functional unit in the state information group of the first version model file is a use state; and in response to the target functional unit not being in the functional unit group corresponding to the first operator group, determining that the state corresponding to the target functional unit in the state information group of the first version model file is an idle state.
- The method according to claim 5, wherein, before generating, through compilation, the first version model file corresponding to the first operator group, the method further comprises: when the functional unit groups respectively corresponding to the operator units in a neural network model are not completely the same, dividing the neural network model into K operator groups corresponding to different functional unit groups and recording the corresponding division method information, the functional unit group corresponding to any operator group comprising: the functional units of the neural network accelerator that are used for running the operator group; and taking each operator group in at least some of the K operator groups as one first operator group; and wherein generating the task scheduling execution instruction based on the first version model file, the state information group of the first version model file, and the predetermined occupancy ratio comprises: for each first operator group, generating a task scheduling execution instruction based on the first version model file corresponding to the first operator group, the state information group of that first version model file, the predetermined occupancy ratio corresponding to the first operator group, and the division method information, the task scheduling execution instruction being used to execute the task scheduling execution method according to claim 3.
- The method according to claim 7, further comprising: generating, through compilation, the second version model files respectively corresponding to the first operator groups, a second version model file fully occupying the computing resources in a running state; wherein generating the task scheduling execution instruction for each first operator group comprises: for each first operator group, generating a task scheduling execution instruction based on the first version model file corresponding to the first operator group, the state information group of that first version model file, the predetermined occupancy ratio corresponding to the first operator group, the division method information, and the second version model file corresponding to the first operator group, the task scheduling execution instruction being used to execute the task scheduling execution method according to claim 4.
- A task scheduling execution apparatus, comprising: a first determination module, used to determine that there is a first target task corresponding to a first version model file, the first version model file occupying the computing resources of a neural network accelerator at a predetermined occupancy ratio in a running state; a second determination module, used to determine, based on the state information group of the first version model file and the predetermined occupancy ratio, a set of the first target tasks determined by the first determination module that satisfy a preset concurrent execution condition as a first task group, the state information group of the first version model file comprising: the states respectively corresponding to the functional units of the neural network accelerator when the first version model file is in a running state; and a first running module, used to concurrently run the first version model files respectively corresponding to the first target tasks in the first task group determined by the second determination module.
- An apparatus for generating a task scheduling execution instruction, comprising: a first generation module, used to generate, through compilation, a first version model file corresponding to a first operator group, the first version model file occupying the computing resources of a neural network accelerator at a predetermined occupancy ratio in a running state; a second generation module, used to generate, based on the functional unit group corresponding to the first operator group, the state information group of the first version model file generated by the first generation module, the functional unit group corresponding to the first operator group comprising: the functional units of the neural network accelerator that are used for running the first operator group, and the state information group of the first version model file comprising: the states respectively corresponding to the functional units of the neural network accelerator when the first version model file is in a running state; and a third generation module, used to generate a task scheduling execution instruction based on the first version model file generated by the first generation module, the state information group of the first version model file generated by the second generation module, and the predetermined occupancy ratio, the task scheduling execution instruction being used to execute the task scheduling execution method according to claim 1 or 2.
- A computer-readable storage medium, the storage medium storing a computer program for executing the task scheduling execution method according to any one of claims 1-4, or for executing the method for generating a task scheduling execution instruction according to any one of claims 5-8.
- An electronic device, comprising: a processor; and a memory for storing instructions executable by the processor; the processor being used to read the executable instructions from the memory and execute the instructions to implement the task scheduling execution method according to any one of claims 1-4, or to implement the method for generating a task scheduling execution instruction according to any one of claims 5-8.
- A computer program product, wherein, when the instructions in the computer program product are executed by a processor, the task scheduling execution method according to any one of claims 1-4 is implemented, or the method for generating a task scheduling execution instruction according to any one of claims 5-8 is implemented.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
EP23864116.1A (published as EP4398101A1) | 2022-11-22 | 2023-09-22 | Task scheduling execution method, and generation method and apparatus for task scheduling execution instruction
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211467576.XA (published as CN115756794A) | 2022-11-22 | 2022-11-22 | Task scheduling execution method, and method and apparatus for generating task scheduling execution instructions
CN202211467576.X | 2022-11-22 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2024109312A1 (zh) | 2024-05-30
Family
ID=85335042
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2023/120845 (published as WO2024109312A1) | Task scheduling execution method, and method and apparatus for generating task scheduling execution instructions | 2022-11-22 | 2023-09-22
Country Status (3)
Country | Link |
---|---|
EP (1) | EP4398101A1 (zh) |
CN (1) | CN115756794A (zh) |
WO (1) | WO2024109312A1 (zh) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115756794A (zh) * | 2022-11-22 | 2023-03-07 | Beijing Horizon Information Technology Co., Ltd. | Task scheduling execution method, and method and apparatus for generating task scheduling execution instructions
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20070021998A1 (en) * | 2005-06-27 | 2007-01-25 | Road Ltd. | Resource scheduling method and system |
US20140089935A1 (en) * | 2011-05-19 | 2014-03-27 | Nec Corporation | Parallel processing device, parallel processing method, optimization device, optimization method and computer program |
CN111400010A (zh) * | 2020-03-18 | 2020-07-10 | China Construction Bank Corporation | Task scheduling method and apparatus
CN113760524A (zh) * | 2020-11-17 | 2021-12-07 | Beijing Wodong Tianjun Information Technology Co., Ltd. | Task execution method and apparatus
CN113886034A (zh) * | 2021-09-09 | 2022-01-04 | Shenzhen Aozhe Network Technology Co., Ltd. | Task scheduling method, system, electronic device, and storage medium
CN114661475A (zh) * | 2022-03-31 | 2022-06-24 | Beijing Baihai Technology Co., Ltd. | Distributed resource scheduling method and apparatus for machine learning
CN115756794A (zh) * | 2022-11-22 | 2023-03-07 | Beijing Horizon Information Technology Co., Ltd. | Task scheduling execution method, and method and apparatus for generating task scheduling execution instructions
2022
- 2022-11-22: CN application CN202211467576.XA filed, published as CN115756794A (status: pending)
2023
- 2023-09-22: EP application EP23864116.1A filed, published as EP4398101A1 (status: pending)
- 2023-09-22: PCT application PCT/CN2023/120845 filed, published as WO2024109312A1
Also Published As
Publication number | Publication date |
---|---|
EP4398101A1 (en) | 2024-07-10 |
CN115756794A (zh) | 2023-03-07 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
2024-03-20 | ENP | Entry into the national phase | Ref document number: 2023864116; Country of ref document: EP