
CN119987958A - A multi-task scheduling method based on multi-NPU - Google Patents


Info

Publication number: CN119987958A
Authority: CN (China)
Prior art keywords: task, NPU, tasks, algorithm, resources
Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to its accuracy)
Application number: CN202411749908.2A
Other languages: Chinese (zh)
Inventors: 秦翔, 马城城, 罗进杰, 王泉, 张承之, 王樱洁
Current Assignee: Xian Xiangteng Microelectronics Technology Co Ltd (the listed assignee may be inaccurate; Google has not performed a legal analysis)
Original Assignee: Xian Xiangteng Microelectronics Technology Co Ltd
Application filed by Xian Xiangteng Microelectronics Technology Co Ltd
Priority application: CN202411749908.2A
Publication: CN119987958A (pending legal status)


Classifications

    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Multi Processors (AREA)

Abstract

The invention relates to a multi-task scheduling method based on multiple NPUs, comprising the following steps: 1) acquire and monitor the resource state of each NPU, including core load, data transmission path state, and memory occupancy; 2) determine the priority of each task based on the scale of the algorithm task, the core resources it requires, the task issue order, and similar factors; 3) allocate the relevant resources to suitable tasks according to the current NPU resource state and task priority, and execute them; 4) after tasks are allocated, continue to assign suitable tasks according to the remaining resources and priorities, ensuring that every NPU is utilized as fully as possible; 5) reclaim the relevant resources immediately after each algorithm task completes, then repeat steps 3) and 4) until all algorithm tasks are finished. By allocating NPU resources and configuring tasks sensibly, the invention improves multi-task processing efficiency and resource utilization in a multi-NPU environment, reduces task waiting time and system energy consumption, and provides strong support for building high-performance, low-latency artificial-intelligence application platforms.

Description

Multi-task scheduling method based on multiple NPUs
Technical Field
The invention relates to the field of computer computing and deep-learning hardware acceleration, and in particular to an efficient scheduling method for multiple neural network processors (NPUs) in a multi-task environment.
Background
With the rapid development of computer vision and artificial intelligence, deep-learning-based image and video processing is widely applied in fields such as object detection, image classification, semantic segmentation, and scene understanding. Training and inference of neural network models typically require substantial computational resources.
A neural network processor (NPU) is a processor designed specifically for artificial-intelligence (AI) algorithms; it provides powerful computing capability to support the efficient operation of a variety of AI workloads. With the rapid development of AI technology, NPUs have been widely adopted in fields such as smart cameras, autonomous driving, and intelligent robots.
In complex multi-task processing scenarios, multiple NPUs may be required to execute a large number of AI algorithms simultaneously. Because these tasks generally differ in running time and execution frequency, a single fixed configuration is difficult to schedule well. How to schedule many tasks across multiple NPUs efficiently and fairly, so as to maximize resource utilization and reduce task latency, is the problem to be solved.
Disclosure of Invention
To solve the technical problems described in the background, the invention provides a multi-task scheduling method based on multiple NPUs. By allocating NPU resources and configuring tasks sensibly, the method improves multi-task processing efficiency and resource utilization in a multi-NPU environment, reduces task waiting time and system energy consumption, and provides strong support for building high-performance, low-latency artificial-intelligence application platforms.
The technical scheme of the invention is a multi-NPU-based multi-task scheduling method, characterized by comprising the following steps:
1) Acquire and monitor the resource state of each NPU, including core load, data transmission path state, and memory occupancy;
2) Determine the priority of each task based on the scale of the algorithm task, the core resources it requires, the task issue order, and similar factors;
3) Allocate the relevant resources to suitable tasks according to the current NPU resource state and task priority, and execute them;
4) After tasks are allocated, continue to assign suitable tasks according to the remaining resources and priorities, ensuring that every NPU is utilized as fully as possible;
5) Reclaim the relevant resources immediately after each algorithm task completes, then repeat steps 3) and 4) until all algorithm tasks are finished.
Further, the specific steps of step 1) include:
1.1) Acquire the state of every NPU core, including core-load information such as whether the core is running, whether it is in an abnormal state, and whether it is bound to a task, and continuously monitor this information;
1.2) Acquire the state of each data transmission path, including whether the path is transmitting, the numbers of busy and idle paths, and the amount of data remaining on each busy path, and continuously monitor this information;
1.3) Acquire and manage the free memory space of each NPU.
Further, the specific steps of step 2) include:
2.1) Different algorithm tasks vary greatly in scale, and the same algorithm task can be assigned different numbers of core resources as needed. Each task is therefore briefly evaluated to roughly estimate its execution time, and its priority is then determined from its position in the issue sequence and its assigned task level, so that tasks can be executed evenly and efficiently;
2.2) Tasks are first ranked by their assigned level, with higher-level tasks executed first; among tasks at the same level, those requiring more cores are executed first; and among otherwise equal tasks, the one issued first is executed first.
Further, the specific step of step 3) is: according to the current NPU state information obtained in step 1), especially the number of available cores, select from step 2) the highest-priority task whose core and memory requirements can currently be met, allocate the corresponding resources to it, and execute it.
Further, the specific step of step 4) is: after tasks begin executing in step 3), use the number of idle cores on each NPU to take from step 2) the highest-priority remaining tasks whose core requirements can be satisfied, allocate the corresponding resources, and execute them. This ensures that every core on every NPU is fully utilized, exploits the NPUs' computing power to the maximum, and improves overall performance.
Further, the specific step of step 5) is: analyze the completion status of each task from the current state information obtained in step 1). After an NPU finishes computing an algorithm task, transfer the result to the designated address over a transmission path, update information such as the NPU's core occupancy and memory usage, and reclaim all related resources. Repeat steps 3) and 4) until all tasks have been executed, then wait for the next round of tasks to arrive.
The invention thus provides a multi-task scheduling method based on multiple NPUs. By allocating NPU resources and configuring tasks sensibly, it improves multi-task processing efficiency and resource utilization in a multi-NPU environment, reduces task waiting time and system energy consumption, and provides strong support for building high-performance, low-latency artificial-intelligence application platforms.
Drawings
FIG. 1 is a schematic flow chart of the method of the present invention.
Detailed Description
The invention is described in further detail below with reference to the drawings and the detailed description.
Referring to Fig. 1, the steps of a specific embodiment of the invention are as follows:
1) Acquire and monitor the resource state of each NPU, including core load, data transmission path state, and memory occupancy:
1.1) Acquire the state of every NPU core, including core-load information such as whether the core is running, whether it is in an abnormal state, and whether it is bound to a task, and continuously monitor this information;
1.2) Acquire the state of each data transmission path, including whether the path is transmitting, the numbers of busy and idle paths, and the amount of data remaining on each busy path, and continuously monitor this information;
1.3) Acquire and manage the free memory space of each NPU.
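The per-NPU state tracked in step 1) can be sketched as a small data model. This is an illustrative sketch only; the class and field names (`NpuState`, `CoreState`, `PathState`, `free_mem`) are assumptions for exposition, not terms from the patent.

```python
from dataclasses import dataclass
from enum import Enum

class CoreState(Enum):
    IDLE = 0     # free for assignment
    RUNNING = 1  # currently executing a task
    FAULT = 2    # abnormal; excluded from scheduling
    BOUND = 3    # reserved by a task but not yet running

@dataclass
class PathState:
    transmitting: bool = False  # is the data path in the transmission state?
    bytes_remaining: int = 0    # data left to move on an active transfer

@dataclass
class NpuState:
    cores: list    # one CoreState per NPU core
    paths: list    # one PathState per data transmission path
    free_mem: int  # free NPU memory, in bytes

    def idle_cores(self) -> int:
        # cores available to the scheduler right now
        return sum(c is CoreState.IDLE for c in self.cores)

    def idle_paths(self) -> int:
        # transmission paths not currently busy
        return sum(not p.transmitting for p in self.paths)
```

A monitoring loop would refresh these fields continuously, as steps 1.1)–1.3) describe.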
2) Determine the priority of each task based on the scale of the algorithm task, the core resources it requires, the task issue order, and similar factors:
2.1) Different algorithm tasks vary greatly in scale, and the same algorithm task can be assigned different numbers of core resources as needed. Each task is therefore briefly evaluated to roughly estimate its execution time, and its priority is then determined from its position in the issue sequence and its assigned task level, so that tasks can be executed evenly and efficiently;
2.2) Tasks are first ranked by their assigned level, with higher-level tasks executed first; among tasks at the same level, those requiring more cores are executed first; and among otherwise equal tasks, the one issued first is executed first.
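The three-level ordering rule of steps 2.1)–2.2) maps naturally onto a composite sort key. A minimal sketch, with the `Task` class and its field names as illustrative assumptions:

```python
from dataclasses import dataclass

@dataclass
class Task:
    level: int         # assigned task level; higher levels run first
    cores_needed: int  # core resources the task requires
    issue_order: int   # position in the submission sequence (0 = first)

def priority_key(t: Task):
    # Negating level and cores_needed makes larger values sort first;
    # issue_order ascending breaks any remaining tie.
    return (-t.level, -t.cores_needed, t.issue_order)

tasks = [Task(1, 4, 0), Task(2, 2, 1), Task(2, 8, 2)]
ordered = sorted(tasks, key=priority_key)
```

Here `ordered` places the two level-2 tasks first, the 8-core one ahead of the 2-core one, and the level-1 task last, matching the rule in step 2.2).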
3) Allocate the relevant resources to suitable tasks according to the current NPU resource state and task priority, and execute them. Specifically:
According to the current NPU state information obtained in step 1), especially the number of available cores, select from step 2) the highest-priority task whose core and memory requirements can currently be met, allocate the corresponding resources to it, and execute it.
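The selection rule of step 3) reduces to a first-fit scan over the priority-ordered queue. A sketch using plain dicts; the key names (`"cores"`, `"mem"`) are illustrative assumptions:

```python
def pick_task(ready_tasks, idle_cores, free_mem):
    """Return the highest-priority task whose core and memory demands
    both fit the NPU's current free resources, or None if none fits.
    ready_tasks is assumed already sorted best-priority-first (step 2)."""
    for task in ready_tasks:
        if task["cores"] <= idle_cores and task["mem"] <= free_mem:
            return task
    return None
```

Because the queue is already priority-ordered, the first fitting task is also the highest-priority fitting task.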
4) After tasks are allocated, continue to assign suitable tasks according to the remaining resources and priorities, ensuring that every NPU is utilized as fully as possible. Specifically:
After tasks begin executing in step 3), use the number of idle cores on each NPU to take from step 2) the highest-priority remaining tasks whose core requirements can be satisfied, allocate the corresponding resources, and execute them, so that every core on every NPU is fully utilized, the NPUs' computing power is exploited to the maximum, and overall performance improves.
5) Reclaim the relevant resources immediately after each algorithm task completes, then repeat steps 3) and 4) until all algorithm tasks are finished. Specifically:
Analyze the completion status of each task from the current state information obtained in step 1). After an NPU finishes computing an algorithm task, transfer the result to the designated address over a transmission path, update information such as the NPU's core occupancy and memory usage, and reclaim all related resources. Repeat steps 3) and 4) until all tasks have been executed, then wait for the next round of tasks to arrive.
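Steps 3)–5) together form a greedy fill-and-reclaim loop. The sketch below is a simplified model under stated assumptions: tasks and NPUs are plain dicts with illustrative keys, and every task started in a round is assumed to finish before the next round (real NPU tasks complete asynchronously), so it shows the allocation order rather than a production scheduler.

```python
def schedule(tasks, npus):
    """tasks: dicts with 'level', 'cores', 'mem'; list order is issue order.
    npus: dicts with 'free_cores', 'free_mem'.
    Returns the indices of tasks in the order they were started."""
    # Step 2 priority: level desc, cores desc, issue order asc.
    pending = sorted(
        range(len(tasks)),
        key=lambda i: (-tasks[i]["level"], -tasks[i]["cores"], i),
    )
    started, running = [], []
    while pending:
        progressed = False
        # Steps 3) and 4): fill every NPU as far as its free resources allow.
        for npu in npus:
            for i in list(pending):
                t = tasks[i]
                if t["cores"] <= npu["free_cores"] and t["mem"] <= npu["free_mem"]:
                    npu["free_cores"] -= t["cores"]
                    npu["free_mem"] -= t["mem"]
                    running.append((npu, t))
                    pending.remove(i)
                    started.append(i)
                    progressed = True
        if not progressed:
            break  # no pending task fits any NPU even when all are idle
        # Step 5): the round's tasks complete; reclaim their resources.
        for npu, t in running:
            npu["free_cores"] += t["cores"]
            npu["free_mem"] += t["mem"]
        running.clear()
    return started
```

With one 4-core NPU and three tasks, the level-2 tasks run before the level-1 task, and within level 2 the larger task runs first; after each round the NPU's resources return to their full free amounts, mirroring the immediate reclamation of step 5).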
The above is only a specific embodiment of the invention, but the scope of the invention is not limited thereto; the scope of the invention is defined by the claims.
Technical matters not specifically described in the foregoing embodiment are the same as in the prior art.
The invention is not limited to the above-described embodiment and can be implemented with the above-described advantageous effects.

Claims (6)

1. A multi-NPU-based multi-task scheduling method, characterized by comprising the following steps:
1) Acquiring and monitoring the resource state of each NPU, including core load, data transmission path state, and memory occupancy;
2) Determining the priority of each task based on the scale of the algorithm task, the core resources it requires, the task issue order, and similar factors;
3) Allocating the relevant resources to suitable tasks according to the current NPU resource state and task priority, and executing them;
4) After tasks are allocated, continuing to assign suitable tasks according to the remaining resources and priorities, ensuring that every NPU is utilized as fully as possible;
5) Reclaiming the relevant resources immediately after each algorithm task completes, then repeating steps 3) and 4) until all algorithm tasks are finished.
2. The multi-NPU-based multi-task scheduling method of claim 1, wherein the specific steps of step 1) comprise:
1.1) Acquiring the state of every NPU core, including core-load information such as whether the core is running, whether it is in an abnormal state, and whether it is bound to a task, and continuously monitoring this information;
1.2) Acquiring the state of each data transmission path, including whether the path is transmitting, the numbers of busy and idle paths, and the amount of data remaining on each busy path, and continuously monitoring this information;
1.3) Acquiring and managing the free memory space of each NPU.
3. The multi-NPU-based multi-task scheduling method of claim 2, wherein the specific steps of step 2) comprise:
2.1) Different algorithm tasks vary greatly in scale, and the same algorithm task can be assigned different numbers of core resources as needed; each task is therefore briefly evaluated to roughly estimate its execution time, and its priority is then determined from its position in the issue sequence and its assigned task level, so that tasks can be executed evenly and efficiently;
2.2) Tasks are first ranked by their assigned level, with higher-level tasks executed first; among tasks at the same level, those requiring more cores are executed first; and among otherwise equal tasks, the one issued first is executed first.
4. The multi-NPU-based multi-task scheduling method of claim 3, wherein the specific step of step 3) is: according to the current NPU state information obtained in step 1), especially the number of available cores, selecting from step 2) the highest-priority task whose core and memory requirements can currently be met, allocating the corresponding resources to it, and executing it.
5. The multi-NPU-based multi-task scheduling method of claim 4, wherein the specific step of step 4) is: after tasks begin executing in step 3), using the number of idle cores on each NPU to take from step 2) the highest-priority remaining tasks whose core requirements can be satisfied, allocating the corresponding resources, and executing them, thereby ensuring that every core on every NPU is fully utilized, exploiting the NPUs' computing power to the maximum, and improving overall performance.
6. The multi-NPU-based multi-task scheduling method of claim 5, wherein the specific step of step 5) is: analyzing the completion status of each task from the current state information obtained in step 1); after an NPU finishes computing an algorithm task, transferring the result to the designated address over a transmission path, updating information such as the NPU's core occupancy and memory usage, and reclaiming all related resources; and repeating steps 3) and 4) until all tasks have been executed, then waiting for the next round of tasks to arrive.
CN202411749908.2A 2024-12-02 2024-12-02 A multi-task scheduling method based on multi-NPU Pending CN119987958A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202411749908.2A CN119987958A (en) 2024-12-02 2024-12-02 A multi-task scheduling method based on multi-NPU


Publications (1)

Publication Number Publication Date
CN119987958A true CN119987958A (en) 2025-05-13

Family

ID=95633496

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202411749908.2A Pending CN119987958A (en) 2024-12-02 2024-12-02 A multi-task scheduling method based on multi-NPU

Country Status (1)

Country Link
CN (1) CN119987958A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN120492133A (en) * 2025-07-17 2025-08-15 北京智芯微电子科技有限公司 Scheduling method and device of NPU computing task, artificial intelligent device and medium



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination