
CN119987958A - A multi-task scheduling method based on multi-NPU - Google Patents


Info

Publication number: CN119987958A
Authority: CN (China)
Prior art keywords: task, NPU, tasks, algorithm, resources
Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to its accuracy)
Application number: CN202411749908.2A
Other languages: Chinese (zh)
Inventors: 秦翔, 马城城, 罗进杰, 王泉, 张承之, 王樱洁
Current Assignee: Xian Xiangteng Microelectronics Technology Co Ltd (the listed assignee may be inaccurate; Google has not performed a legal analysis)
Original Assignee: Xian Xiangteng Microelectronics Technology Co Ltd
Application filed by Xian Xiangteng Microelectronics Technology Co Ltd
Priority application: CN202411749908.2A
Publication: CN119987958A (pending legal status)


Classifications

    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Multi Processors (AREA)

Abstract

The invention relates to a multi-task scheduling method based on multiple NPUs, comprising the following steps: 1) acquire and monitor the resource state of each NPU, including core load, data transmission path state, and memory occupancy; 2) determine the priority of each task based on the scale of the algorithm task, the core resources it requires, the task issue order, and similar factors; 3) allocate the relevant resources to suitable tasks according to the current NPU resource state and task priority, and execute them; 4) after tasks are allocated, continue to assign suitable tasks according to the remaining resources and priorities, ensuring that every NPU is utilized as fully as possible; 5) reclaim the relevant resources immediately after each algorithm task completes, then repeat steps 3) and 4) until all algorithm tasks are finished. By allocating NPU resources and configuring tasks sensibly, the invention improves multi-task processing efficiency and resource utilization in a multi-NPU environment, reduces task waiting time and system energy consumption, and provides strong support for building high-performance, low-latency artificial-intelligence application platforms.

Description

Multi-task scheduling method based on multiple NPUs
Technical Field
The invention relates to the field of computer computing and deep-learning hardware acceleration, and in particular to an efficient scheduling method for multiple neural network processors (NPUs) in a multi-task environment.
Background
With the rapid development of computer vision and artificial intelligence, deep-learning-based image and video processing is widely applied in fields such as object detection, image classification, semantic segmentation, and scene understanding. Training and inference of neural network models typically require substantial computational resources.
A neural network processor (NPU) is a processor designed specifically for artificial-intelligence (AI) algorithms; it provides powerful computing capability to support the efficient operation of a variety of AI workloads. With the rapid development of AI technology, NPUs have been widely adopted in fields such as smart cameras, autonomous driving, and intelligent robots.
In complex multi-task processing scenarios, multiple NPUs may be required to execute a large number of AI algorithms simultaneously. Because these tasks generally differ in running time and execution frequency, a single fixed configuration is difficult to schedule well. How to schedule many tasks across multiple NPUs efficiently and fairly, so as to maximize resource utilization and reduce task latency, is the problem to be solved.
Disclosure of Invention
To solve the technical problems described in the background, the invention provides a multi-task scheduling method based on multiple NPUs. By allocating NPU resources and configuring tasks sensibly, the method improves multi-task processing efficiency and resource utilization in a multi-NPU environment, reduces task waiting time and system energy consumption, and provides strong support for building high-performance, low-latency artificial-intelligence application platforms.
The technical scheme of the invention is a multi-NPU-based multi-task scheduling method, characterized by comprising the following steps:
1) Acquire and monitor the resource state of each NPU, including core load, data transmission path state, and memory occupancy;
2) Determine the priority of each task based on the scale of the algorithm task, the core resources it requires, the task issue order, and similar factors;
3) Allocate the relevant resources to suitable tasks according to the current NPU resource state and task priority, and execute them;
4) After tasks are allocated, continue to assign suitable tasks according to the remaining resources and priorities, ensuring that every NPU is utilized as fully as possible;
5) Reclaim the relevant resources immediately after each algorithm task completes, then repeat steps 3) and 4) until all algorithm tasks are finished.
Further, the specific steps of step 1) include:
1.1) Acquire the state of every NPU core, including core-load information such as whether the core is running, whether it is in an abnormal state, and whether it is bound to a task, and continuously monitor this information;
1.2) Acquire the state of each data transmission path, including whether the path is transmitting, the numbers of busy and idle paths, and the amount of data remaining on each busy path, and continuously monitor this information;
1.3) Acquire and manage the free memory space of each NPU.
Further, the specific steps of step 2) include:
2.1) Different algorithm tasks vary greatly in scale, and the same algorithm task can be assigned different numbers of core resources as needed. Each task is therefore briefly evaluated to roughly estimate its execution time, and its priority is then determined from its position in the issue sequence and its assigned task level, so that tasks can be executed evenly and efficiently;
2.2) Tasks are first ranked by their assigned level, with higher-level tasks executed first; among tasks at the same level, those requiring more cores are executed first; and among otherwise equal tasks, the one issued first is executed first.
Further, the specific step of step 3) is: according to the current NPU state information obtained in step 1), especially the number of available cores, select from step 2) the highest-priority task whose core and memory requirements can currently be met, allocate the corresponding resources to it, and execute it.
Further, the specific step of step 4) is: after tasks begin executing in step 3), use the number of idle cores on each NPU to take from step 2) the highest-priority remaining tasks whose core requirements can be satisfied, allocate the corresponding resources, and execute them. This ensures that every core on every NPU is fully utilized, exploits the NPUs' computing power to the maximum, and improves overall performance.
Further, the specific step of step 5) is: analyze the completion status of each task from the current state information obtained in step 1). After an NPU finishes computing an algorithm task, transfer the result to the designated address over a transmission path, update information such as the NPU's core occupancy and memory usage, and reclaim all related resources. Repeat steps 3) and 4) until all tasks have been executed, then wait for the next round of tasks to arrive.
The invention thus provides a multi-task scheduling method based on multiple NPUs. By allocating NPU resources and configuring tasks sensibly, it improves multi-task processing efficiency and resource utilization in a multi-NPU environment, reduces task waiting time and system energy consumption, and provides strong support for building high-performance, low-latency artificial-intelligence application platforms.
Drawings
FIG. 1 is a schematic flow chart of the method of the present invention.
Detailed Description
The invention is described in further detail below with reference to the drawings and the detailed description.
Referring to Fig. 1, the steps of a specific embodiment of the invention are as follows:
1) Acquire and monitor the resource state of each NPU, including core load, data transmission path state, and memory occupancy:
1.1) Acquire the state of every NPU core, including core-load information such as whether the core is running, whether it is in an abnormal state, and whether it is bound to a task, and continuously monitor this information;
1.2) Acquire the state of each data transmission path, including whether the path is transmitting, the numbers of busy and idle paths, and the amount of data remaining on each busy path, and continuously monitor this information;
1.3) Acquire and manage the free memory space of each NPU.
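The per-NPU state tracked in step 1) can be sketched as a small data model. This is an illustrative sketch only; the class and field names (`NpuState`, `CoreState`, `PathState`, `free_mem`) are assumptions for exposition, not terms from the patent.

```python
from dataclasses import dataclass
from enum import Enum

class CoreState(Enum):
    IDLE = 0     # free for assignment
    RUNNING = 1  # currently executing a task
    FAULT = 2    # abnormal; excluded from scheduling
    BOUND = 3    # reserved by a task but not yet running

@dataclass
class PathState:
    transmitting: bool = False  # is the data path in the transmission state?
    bytes_remaining: int = 0    # data left to move on an active transfer

@dataclass
class NpuState:
    cores: list    # one CoreState per NPU core
    paths: list    # one PathState per data transmission path
    free_mem: int  # free NPU memory, in bytes

    def idle_cores(self) -> int:
        # cores available to the scheduler right now
        return sum(c is CoreState.IDLE for c in self.cores)

    def idle_paths(self) -> int:
        # transmission paths not currently busy
        return sum(not p.transmitting for p in self.paths)
```

A monitoring loop would refresh these fields continuously, as steps 1.1)–1.3) describe.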
2) Determine the priority of each task based on the scale of the algorithm task, the core resources it requires, the task issue order, and similar factors:
2.1) Different algorithm tasks vary greatly in scale, and the same algorithm task can be assigned different numbers of core resources as needed. Each task is therefore briefly evaluated to roughly estimate its execution time, and its priority is then determined from its position in the issue sequence and its assigned task level, so that tasks can be executed evenly and efficiently;
2.2) Tasks are first ranked by their assigned level, with higher-level tasks executed first; among tasks at the same level, those requiring more cores are executed first; and among otherwise equal tasks, the one issued first is executed first.
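The three-level ordering rule of steps 2.1)–2.2) maps naturally onto a composite sort key. A minimal sketch, with the `Task` class and its field names as illustrative assumptions:

```python
from dataclasses import dataclass

@dataclass
class Task:
    level: int         # assigned task level; higher levels run first
    cores_needed: int  # core resources the task requires
    issue_order: int   # position in the submission sequence (0 = first)

def priority_key(t: Task):
    # Negating level and cores_needed makes larger values sort first;
    # issue_order ascending breaks any remaining tie.
    return (-t.level, -t.cores_needed, t.issue_order)

tasks = [Task(1, 4, 0), Task(2, 2, 1), Task(2, 8, 2)]
ordered = sorted(tasks, key=priority_key)
```

Here `ordered` places the two level-2 tasks first, the 8-core one ahead of the 2-core one, and the level-1 task last, matching the rule in step 2.2).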
3) Allocate the relevant resources to suitable tasks according to the current NPU resource state and task priority, and execute them. Specifically:
According to the current NPU state information obtained in step 1), especially the number of available cores, select from step 2) the highest-priority task whose core and memory requirements can currently be met, allocate the corresponding resources to it, and execute it.
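The selection rule of step 3) reduces to a first-fit scan over the priority-ordered queue. A sketch using plain dicts; the key names (`"cores"`, `"mem"`) are illustrative assumptions:

```python
def pick_task(ready_tasks, idle_cores, free_mem):
    """Return the highest-priority task whose core and memory demands
    both fit the NPU's current free resources, or None if none fits.
    ready_tasks is assumed already sorted best-priority-first (step 2)."""
    for task in ready_tasks:
        if task["cores"] <= idle_cores and task["mem"] <= free_mem:
            return task
    return None
```

Because the queue is already priority-ordered, the first fitting task is also the highest-priority fitting task.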
4) After tasks are allocated, continue to assign suitable tasks according to the remaining resources and priorities, ensuring that every NPU is utilized as fully as possible. Specifically:
After tasks begin executing in step 3), use the number of idle cores on each NPU to take from step 2) the highest-priority remaining tasks whose core requirements can be satisfied, allocate the corresponding resources, and execute them, so that every core on every NPU is fully utilized, the NPUs' computing power is exploited to the maximum, and overall performance improves.
5) Reclaim the relevant resources immediately after each algorithm task completes, then repeat steps 3) and 4) until all algorithm tasks are finished. Specifically:
Analyze the completion status of each task from the current state information obtained in step 1). After an NPU finishes computing an algorithm task, transfer the result to the designated address over a transmission path, update information such as the NPU's core occupancy and memory usage, and reclaim all related resources. Repeat steps 3) and 4) until all tasks have been executed, then wait for the next round of tasks to arrive.
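Steps 3)–5) together form a greedy fill-and-reclaim loop. The sketch below is a simplified model under stated assumptions: tasks and NPUs are plain dicts with illustrative keys, and every task started in a round is assumed to finish before the next round (real NPU tasks complete asynchronously), so it shows the allocation order rather than a production scheduler.

```python
def schedule(tasks, npus):
    """tasks: dicts with 'level', 'cores', 'mem'; list order is issue order.
    npus: dicts with 'free_cores', 'free_mem'.
    Returns the indices of tasks in the order they were started."""
    # Step 2 priority: level desc, cores desc, issue order asc.
    pending = sorted(
        range(len(tasks)),
        key=lambda i: (-tasks[i]["level"], -tasks[i]["cores"], i),
    )
    started, running = [], []
    while pending:
        progressed = False
        # Steps 3) and 4): fill every NPU as far as its free resources allow.
        for npu in npus:
            for i in list(pending):
                t = tasks[i]
                if t["cores"] <= npu["free_cores"] and t["mem"] <= npu["free_mem"]:
                    npu["free_cores"] -= t["cores"]
                    npu["free_mem"] -= t["mem"]
                    running.append((npu, t))
                    pending.remove(i)
                    started.append(i)
                    progressed = True
        if not progressed:
            break  # no pending task fits any NPU even when all are idle
        # Step 5): the round's tasks complete; reclaim their resources.
        for npu, t in running:
            npu["free_cores"] += t["cores"]
            npu["free_mem"] += t["mem"]
        running.clear()
    return started
```

With one 4-core NPU and three tasks, the level-2 tasks run before the level-1 task, and within level 2 the larger task runs first; after each round the NPU's resources return to their full free amounts, mirroring the immediate reclamation of step 5).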
The above is only a specific embodiment of the invention, but the scope of the invention is not limited thereto; the scope of the invention is defined by the claims.
Technical matters not specifically described in the foregoing embodiment are the same as in the prior art.
The invention is not limited to the above-described embodiment and can be implemented with the above-described advantageous effects.

Claims (6)

1. A multi-NPU-based multi-task scheduling method, characterized by comprising the following steps:
1) Acquiring and monitoring the resource state of each NPU, including core load, data transmission path state, and memory occupancy;
2) Determining the priority of each task based on the scale of the algorithm task, the core resources it requires, the task issue order, and similar factors;
3) Allocating the relevant resources to suitable tasks according to the current NPU resource state and task priority, and executing them;
4) After tasks are allocated, continuing to assign suitable tasks according to the remaining resources and priorities, ensuring that every NPU is utilized as fully as possible;
5) Reclaiming the relevant resources immediately after each algorithm task completes, then repeating steps 3) and 4) until all algorithm tasks are finished.
2. The multi-NPU-based multi-task scheduling method of claim 1, wherein the specific steps of step 1) comprise:
1.1) Acquiring the state of every NPU core, including core-load information such as whether the core is running, whether it is in an abnormal state, and whether it is bound to a task, and continuously monitoring this information;
1.2) Acquiring the state of each data transmission path, including whether the path is transmitting, the numbers of busy and idle paths, and the amount of data remaining on each busy path, and continuously monitoring this information;
1.3) Acquiring and managing the free memory space of each NPU.
3. The multi-NPU-based multi-task scheduling method of claim 2, wherein the specific steps of step 2) comprise:
2.1) Different algorithm tasks vary greatly in scale, and the same algorithm task can be assigned different numbers of core resources as needed; each task is therefore briefly evaluated to roughly estimate its execution time, and its priority is then determined from its position in the issue sequence and its assigned task level, so that tasks can be executed evenly and efficiently;
2.2) Tasks are first ranked by their assigned level, with higher-level tasks executed first; among tasks at the same level, those requiring more cores are executed first; and among otherwise equal tasks, the one issued first is executed first.
4. The multi-NPU-based multi-task scheduling method of claim 3, wherein the specific step of step 3) is: according to the current NPU state information obtained in step 1), especially the number of available cores, selecting from step 2) the highest-priority task whose core and memory requirements can currently be met, allocating the corresponding resources to it, and executing it.
5. The multi-NPU-based multi-task scheduling method of claim 4, wherein the specific step of step 4) is: after tasks begin executing in step 3), using the number of idle cores on each NPU to take from step 2) the highest-priority remaining tasks whose core requirements can be satisfied, allocating the corresponding resources, and executing them, thereby ensuring that every core on every NPU is fully utilized, exploiting the NPUs' computing power to the maximum, and improving overall performance.
6. The multi-NPU-based multi-task scheduling method of claim 5, wherein the specific step of step 5) is: analyzing the completion status of each task from the current state information obtained in step 1); after an NPU finishes computing an algorithm task, transferring the result to the designated address over a transmission path, updating information such as the NPU's core occupancy and memory usage, and reclaiming all related resources; and repeating steps 3) and 4) until all tasks have been executed, then waiting for the next round of tasks to arrive.
CN202411749908.2A 2024-12-02 2024-12-02 A multi-task scheduling method based on multi-NPU Pending CN119987958A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202411749908.2A CN119987958A (en) 2024-12-02 2024-12-02 A multi-task scheduling method based on multi-NPU


Publications (1)

Publication Number Publication Date
CN119987958A true CN119987958A (en) 2025-05-13

Family

ID=95633496

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202411749908.2A Pending CN119987958A (en) 2024-12-02 2024-12-02 A multi-task scheduling method based on multi-NPU

Country Status (1)

Country Link
CN (1) CN119987958A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN120492133A (en) * 2025-07-17 2025-08-15 北京智芯微电子科技有限公司 Scheduling method and device of NPU computing task, artificial intelligent device and medium



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination