Task scheduling method and system under a rapidly reconfigurable signal processing heterogeneous platform
Technical Field
The invention belongs to the technical field of signal processing, and particularly relates to a task scheduling method and system under a rapidly reconfigurable signal processing heterogeneous platform.
Background
At present, a rapidly reconfigurable signal processing heterogeneous platform uses heterogeneous processing resources such as CPUs, DSPs, GPUs and FPGAs to build a large number of signal processing boards, integrates the boards into a chassis, exchanges data between processing resources over the chassis backplane bus and switch board, and stores the data and files needed during processing in memory. The integrated software development environment is the software part of the platform and comprises a visual development interface, a task scheduling module, a resource monitoring module, a configuration control module, an encapsulation module, a communication control module and other functional modules. The task scheduling algorithm is the core of the task scheduling module and determines the performance and efficiency of a signal processing application.

A signal processing application is usually completed by multiple cooperating tasks with complex dependencies, commonly represented by a DAG (directed acyclic graph), in which nodes represent tasks and directed edges represent data dependencies between tasks. A task scheduling algorithm assigns the tasks of the task graph to appropriate computing resources according to a specified optimization objective and determines their execution order on those resources while respecting the communication dependencies among the tasks. In our scenario, the main optimization goal is the overall execution time of the signal processing application, i.e., the time difference between the start of the earliest-starting task and the completion of the latest-finishing task.

Task scheduling problems are divided into static and dynamic scheduling according to whether the task characteristics (the dependencies between tasks, the computation cost of each task on each processor and the communication cost between tasks) are known in advance. On our platform static scheduling is adopted: the hardware resources are usually fixed, the signal processing tasks come from a task component library, and estimates of these characteristics, obtained by statistical methods, are provided to the scheduling algorithm. For static scheduling in a heterogeneous system, the approaches can be divided into list-based scheduling, cluster-based scheduling, task-duplication-based scheduling and guided random search; heuristic algorithms such as HEFT, CPOP, Lookahead and CEFT are widely used in practice.
Typically, such a task scheduling algorithm works in two phases: a task selection phase and a processing resource selection phase. The HEFT algorithm computes a RankU value for every task node, starting from the exit task, based on the tasks' computation and communication costs, sorts the tasks in descending order of RankU as the priority order, and schedules each task in turn onto the processing resource that minimizes its earliest finish time (EFT). The CPOP algorithm pins the tasks on the critical path to a specific processing resource; the rest of its scheduling process is similar to that of HEFT. The Lookahead method differs from HEFT in that, in the processing resource selection phase, it uses the earliest finish time of the tasks immediately following the current task as the selection criterion. The CEFT algorithm is a scheduling method that combines list scheduling with task duplication.
Because of the data dependencies among tasks, a successor task can only be executed after all of its immediate predecessor tasks have finished and the data it depends on has been transmitted to the processing resource on which it resides. In scenarios with large data volumes, narrow communication bandwidth and high communication overhead, successor tasks therefore wait for a long time, processing resources accumulate idle time, and resource utilization is low; the weakness of the HEFT algorithm in this respect is particularly pronounced. To make full use of this idle time, some predecessor tasks can be executed in the idle slots of a processing resource, trading computation cost for the larger communication cost and thereby reducing the execution time of the whole application; this is the task duplication approach. However, the performance of plain task duplication still leaves room for improvement when some tasks have very high communication overhead. Addressing this shortcoming, the invention introduces binding of task pairs with high communication overhead and combines it with task duplication to improve the time performance of signal processing applications on the rapidly reconfigurable signal processing heterogeneous platform. Task duplication copies some predecessor tasks onto the hardware processing resource of a task and executes them there in order to save the communication cost from those predecessors to the task. High-communication-overhead task binding ties such a task pair together: once the data from the predecessors of the two tasks is ready, the bound pair is allocated to the same hardware processing resource according to a given rule, which eliminates the high communication overhead between the two tasks.
To meet the real-time requirements of signal processing in an increasing number of scenarios and to let developers rapidly iterate on and reconfigure deployed signal processing tasks, a rapidly reconfigurable signal processing heterogeneous platform is needed that combines heterogeneous hardware accelerators (a large pool of heterogeneous computing resources such as CPUs, GPUs, DSPs (digital signal processors) and FPGAs (field programmable gate arrays)) with an integrated development environment offering visual development, resource virtualization management and real-time task scheduling. Developers deploy signal processing applications in visualization software; an application is usually composed of multiple tasks with data dependencies and is represented as a DAG (directed acyclic graph). To accelerate signal processing and meet real-time requirements, it is crucial how the scheduling algorithm allocates appropriate hardware computing resources to each task according to the characteristics of the application and of the hardware in the platform. Unlike other scenarios, hardware resources are abundant in this heterogeneous platform, so the main concern of the scheduling algorithm is time performance.
Through the above analysis, the problems and defects of the prior art are as follows: when the data transmission volume is large, the communication bandwidth is narrow and the communication overhead is high, successor tasks wait for a long time, processing resources accumulate idle time, and resource utilization is low.
The difficulty in solving the above problems and defects is as follows: achieving fast reconfiguration and fast deployment of signal processing applications requires combining visualization software with heterogeneous hardware, since both software and hardware are involved, so how to build such a rapidly reconfigurable signal processing platform is one difficulty. Meanwhile, in scenarios with large data volumes, narrow communication bandwidth and high communication overhead, how to schedule tasks so as to improve the real-time performance of the signal processing application as much as possible is a further difficulty.
The significance of solving the above problems and defects is as follows: the method accelerates the rapid iteration and reconfiguration of signal processing applications and shortens the deployment cycle. The task scheduling algorithm provided by the invention further reduces the overall running time of a signal processing application and improves its real-time performance, while also improving the utilization of the hardware resources of the heterogeneous platform.
Disclosure of Invention
Aiming at the problems in the prior art, the invention provides a task scheduling method and system under a rapidly reconfigurable signal processing heterogeneous platform.
The invention is realized as follows: in the task scheduling method under the rapidly reconfigurable signal processing heterogeneous platform, the task selection phase screens out and binds task pairs with high communication overhead using a statistical criterion and selects the task to schedule using a priority criterion; the processing resource selection phase selects the processing resource that minimizes the earliest finish time of the task or of the bound task pair.
Using w_ij to represent the computation cost of task i on processing resource j and c_ik to represent the communication cost from task i to task k, then for a given DAG signal processing application comprising N tasks on a platform providing P heterogeneous processing resources there are a corresponding computation cost matrix W_{N×P} and a communication cost matrix C_{N×N}; the entry task is denoted EntryTask and the exit task ExitTask.
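For concreteness, the following is a minimal Python sketch of these inputs for a small illustrative DAG; the graph, the cost values and the variable names are placeholders chosen for the example, not data from the invention.

```python
# Illustrative inputs for a small DAG application (N = 4 tasks, P = 2 resources).
# W[i][j]: computation cost of task i on processing resource j.
# C[i][k]: (average) communication cost from task i to task k; 0.0 where no edge exists.
N, P = 4, 2
W = [[14.0, 16.0],   # task 0 = EntryTask
     [13.0, 19.0],   # task 1
     [11.0, 13.0],   # task 2
     [ 7.0, 17.0]]   # task 3 = ExitTask
C = [[0.0, 18.0, 12.0,  0.0],   # edges 0 -> 1 and 0 -> 2
     [0.0,  0.0,  0.0,  9.0],   # edge 1 -> 3
     [0.0,  0.0,  0.0, 23.0],   # edge 2 -> 3
     [0.0,  0.0,  0.0,  0.0]]
succ = {0: [1, 2], 1: [3], 2: [3], 3: []}   # immediate successors
pred = {0: [], 1: [0], 2: [0], 3: [1, 2]}   # immediate predecessors
ENTRY_TASK, EXIT_TASK = 0, 3
```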
Further, the priority criterion used in the task selection phase is

RankU(i) = w̄_i + max_{k ∈ succ(i)} ( c̄_ik + RankU(k) ),

with a larger RankU value giving a higher priority; for the exit task, RankU(ExitTask) = w̄_ExitTask. Here w̄_i denotes the mean computation cost of task i over the available processing resources, c̄_ik denotes the average communication cost from task i to task k, and succ(i) denotes the set of immediate successors of task i.
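A minimal sketch of this upward-rank computation follows, reusing the illustrative W, C and succ structures above and assuming that C already stores the average communication costs used in the formula; the helper is illustrative, not code of the invention.

```python
def rank_u(i, W, C, succ, memo=None):
    """Upward rank: RankU(i) = mean computation cost of task i plus the largest
    (average communication cost + RankU) over its immediate successors; for the
    exit task the maximum term is empty and RankU equals its mean cost."""
    memo = {} if memo is None else memo
    if i not in memo:
        w_mean = sum(W[i]) / len(W[i])
        memo[i] = w_mean + max(
            (C[i][k] + rank_u(k, W, C, succ, memo) for k in succ[i]),
            default=0.0,
        )
    return memo[i]
```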
Further, the criterion in the processing resource selection phase is to select the processing resource that minimizes the earliest finish time (EFT) of the task, computed as

EFT(i, j) = w_ij + EST(i, j),

where EFT(i, j) is the earliest finish time of task i on processing resource j and EST(i, j) is the earliest start time of task i on processing resource j, given by

EST(i, j) = max{ avail(j), max_{m ∈ pred(i)} ( AFT(m) + c_mi ) },

where avail(j) is the time at which processing resource j becomes available, AFT(m) is the actual finish time of task m (i.e., the minimum EFT obtained when m was scheduled), and pred(i) denotes the set of immediate predecessors of task i; clearly, for the entry task EntryTask, EST(EntryTask, j) = 0.
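Under the same assumptions, a sketch of this EST/EFT rule for a partially built schedule follows; avail, aft and proc_of are assumed bookkeeping structures holding the free time of each resource and the actual finish time and chosen resource of every already-scheduled task.

```python
def est(i, j, avail, aft, proc_of, C, pred):
    """Earliest start time of task i on resource j: the resource must be free and
    all predecessor data must have arrived (no transfer cost on the same resource).
    For the entry task pred(i) is empty, so EST is simply avail(j) (0 at the start)."""
    arrivals = [aft[m] + (0.0 if proc_of[m] == j else C[m][i]) for m in pred[i]]
    return max([avail[j]] + arrivals)

def eft(i, j, avail, aft, proc_of, W, C, pred):
    """Earliest finish time: EFT(i, j) = w_ij + EST(i, j)."""
    return W[i][j] + est(i, j, avail, aft, proc_of, C, pred)
```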
Further, before scheduling, the task scheduling method under the rapidly reconfigurable signal processing heterogeneous platform performs the following preparation: based on the hardware processing resources provided by the platform and the signal processing tasks provided by the system task library, the computation cost of each task on each hardware processing resource is estimated by statistical and empirical methods, w_ij representing the computation cost of task i on processing resource j; the dependencies of the DAG application deployed by the developer on the visual interface are analyzed, and the communication costs between dependent tasks are estimated by combining the parameters the developer sets for the tasks with the underlying hardware resource topology information table, c_ik representing the communication cost from task i to task k. Assuming the signal processing application graph consists of N tasks and the platform provides P heterogeneous processing resources, the corresponding computation cost matrix is W_{N×P} and the communication cost matrix is C_{N×N}.
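One way such statistical estimates might be derived is sketched below; the sample dictionaries (runtime_samples, transfer_samples) are hypothetical stand-ins for whatever measurement data the platform actually collects.

```python
import statistics

def estimate_costs(runtime_samples, transfer_samples, num_tasks, num_resources):
    """Build the cost matrices from measured samples.
    runtime_samples[(i, j)]  : list of measured run times of task i on resource j.
    transfer_samples[(i, k)] : list of measured transfer times from task i to task k.
    Returns W (num_tasks x num_resources) and C (num_tasks x num_tasks)."""
    W = [[statistics.mean(runtime_samples[(i, j)]) for j in range(num_resources)]
         for i in range(num_tasks)]
    C = [[statistics.mean(transfer_samples[(i, k)]) if (i, k) in transfer_samples else 0.0
          for k in range(num_tasks)]
         for i in range(num_tasks)]
    return W, C
```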
Further, the task scheduling method under the rapidly reconfigurable heterogeneous platform performs scheduling as follows (a sketch of this pass in code is given after the step list):
(1) analyzing the DAG application graph, finding the entry task EntryTask and the exit task ExitTask, and placing the EntryTask into the ready queue ReadyQueue;
(2) counting the communication costs between the tasks of the signal processing application, computing their mean μ and standard deviation σ, and screening out, based on the 3σ principle of the Gaussian distribution model, the task pairs whose communication cost falls outside 2σ, i.e., the task pairs (i, k) satisfying c_ik ≥ μ + 2σ; binding each such pair as a high-communication-overhead pair and adding the bound pairs to a binding List;
(3) computing the RankU values of all tasks and using them as the priority criterion during task selection, a larger RankU value meaning a higher priority;
(4) when the ready queue ReadyQueue is not empty, executing the steps (5) and (6); performing step (7) when the ready queue ReadyQueue is empty;
(5) judging whether a task pair from the binding List is present in the ready queue ReadyQueue; if so, dispatching that pair, allocating it to the processing resource that minimizes the earliest finish time (EFT) of the subsequent task of the pair, deleting the pair from the ready queue ReadyQueue and from the binding List, adding newly ready tasks to the ready queue ReadyQueue, and returning to step (4); if no task pair from the binding List is present in the ready queue ReadyQueue, executing step (6);
(6) selecting the task with the largest RankU in the ready queue ReadyQueue for scheduling, considering separately the case of not duplicating a predecessor task and the case of duplicating a predecessor task, selecting, over the two cases, the processing resource that minimizes the earliest finish time (EFT) of the task and executing the task on it, deleting the task from the ready queue ReadyQueue, adding newly ready tasks to the ready queue ReadyQueue, and returning to step (4);
(7) obtaining the complete task scheduling scheme; scheduling is finished.
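The following is a compact sketch of the scheduling pass of steps (1)-(7), reusing the rank_u(), est() and eft() helpers sketched earlier. Two details are simplifying assumptions rather than rules stated above: a bound pair (a, b) is dispatched only once every predecessor of b other than a has finished, and the duplication check copies at most the single predecessor whose data transfer dominates the start time.

```python
def schedule(W, C, succ, pred, entry, bind_list):
    """One pass of the sketched scheduler: dispatch bound pairs first (step (5)),
    otherwise the highest-RankU ready task with a replication-aware EFT (step (6))."""
    num_tasks, num_procs = len(W), len(W[0])
    memo = {}
    rank = {i: rank_u(i, W, C, succ, memo) for i in range(num_tasks)}
    avail = [0.0] * num_procs                       # free time of each resource
    aft, proc_of = {}, {}                           # finish time / resource per scheduled task
    waiting = {i: len(pred[i]) for i in range(num_tasks)}
    ready = [entry]                                 # step (1)

    def place(i, j, finish):                        # commit task i to resource j
        aft[i], proc_of[i] = finish, j
        avail[j] = finish
        if i in ready:
            ready.remove(i)
        for k in succ[i]:
            waiting[k] -= 1
            if waiting[k] == 0:
                ready.append(k)

    def eft_with_replication(i, j):                 # step (6): min of "no copy" / "copy" EFT
        plain = eft(i, j, avail, aft, proc_of, W, C, pred)
        remote = [m for m in pred[i] if proc_of[m] != j]
        if not remote:
            return plain
        m = max(remote, key=lambda m: aft[m] + C[m][i])        # dominant data transfer
        copy_done = max(avail[j], est(m, j, avail, aft, proc_of, C, pred)) + W[m][j]
        others = [aft[n] + (0.0 if proc_of[n] == j else C[n][i])
                  for n in pred[i] if n != m]
        return min(plain, W[i][j] + max([copy_done] + others))

    while ready:                                    # step (4)
        pair = next(((a, b) for (a, b) in bind_list
                     if a in ready and all(m in aft for m in pred[b] if m != a)), None)
        if pair is not None:                        # step (5): place the bound pair together
            a, b = pair
            best = None
            for j in range(num_procs):
                fa = eft(a, j, avail, aft, proc_of, W, C, pred)
                others = [aft[m] + (0.0 if proc_of[m] == j else C[m][b])
                          for m in pred[b] if m != a]
                fb = W[b][j] + max([fa] + others)   # same resource, so c_ab is eliminated
                if best is None or fb < best[0]:
                    best = (fb, fa, j)
            fb, fa, j = best
            place(a, j, fa)
            place(b, j, fb)
            bind_list.remove(pair)
        else:                                       # step (6): highest RankU, replication-aware
            i = max(ready, key=lambda t: rank[t])
            finish, j = min((eft_with_replication(i, j), j) for j in range(num_procs))
            place(i, j, finish)
    return aft, proc_of                             # step (7): the complete schedule
```

With the illustrative inputs above, schedule(W, C, succ, pred, ENTRY_TASK, bind_list=[]) behaves like a plain HEFT-style pass, while passing the bound pairs produced in step (2) activates the binding branch; the makespan can then be read off as aft[EXIT_TASK].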
Another object of the invention is to provide a rapidly reconfigurable signal processing heterogeneous platform system for implementing the task scheduling method under the rapidly reconfigurable heterogeneous platform. The rapidly reconfigurable signal processing heterogeneous platform system uses heterogeneous processing resources such as CPUs, DSPs, GPUs and FPGAs to build a large number of signal processing boards, integrates the boards into a chassis, exchanges data between processing resources over the chassis backplane bus and switch board, and stores the data and files needed during processing in memory. The integrated software development environment of the platform is its software part and comprises a visual development interface, a task scheduling module, a resource monitoring module, a configuration control module, an encapsulation module, a communication control module and other functional modules.
Combining all the above technical solutions, the advantages and positive effects of the invention are as follows: the invention combines task duplication with task binding, makes full use of the idle time of processing resources, offsets the larger communication cost with a certain amount of computation cost, and accelerates the execution of the whole application, so that real-time requirements can be met well. Compared with common scheduling algorithms of the same type, the algorithm proposed by the invention has better time performance.
The system of the invention uses a large pool of heterogeneous hardware resources to accelerate signal processing applications and, together with the upper-layer integrated software, realizes rapid reconfiguration and rapid iteration of application deployment. The algorithm of the invention is an improvement on the HEFT and CEFT algorithms; by using task duplication and task binding, it makes full use of the idle time of processing resources and offsets the larger communication cost with a certain amount of computation cost, thereby optimizing the time performance of the whole application and the utilization of system resources.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings needed to be used in the embodiments of the present application will be briefly described below, and it is obvious that the drawings described below are only some embodiments of the present application, and it is obvious for those skilled in the art that other drawings can be obtained from the drawings without creative efforts.
FIG. 1 is a block diagram of the task scheduling module provided by the present invention: 1, task selection module; 2, processing resource selection module.
FIG. 2 is a schematic structural diagram of the rapidly reconfigurable signal processing heterogeneous platform system provided by an embodiment of the present invention.
FIG. 3 is a flowchart of the task scheduling method under the rapidly reconfigurable signal processing heterogeneous platform provided by an embodiment of the present invention.
FIG. 4 is a schematic diagram of the simulation results provided by an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is further described in detail with reference to the following embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
Aiming at the problems in the prior art, the invention provides a task scheduling method, system and computer device under a rapidly reconfigurable heterogeneous platform; the invention is described in detail below with reference to the accompanying drawings.
The task scheduling method under the rapidly reconfigurable heterogeneous platform provided by the invention can also be implemented by persons of ordinary skill in the art using other steps.
As shown in FIG. 1, the task scheduling module under the rapidly reconfigurable heterogeneous platform provided by the present invention includes:
the task selection module 1, configured to select a task or a bound task pair according to the priority criterion used in the task selection phase;
the processing resource selection module 2, configured to select, in the processing resource selection phase, the processing resource that minimizes the earliest finish time of the task or of the bound task pair.
As shown in FIG. 2, the system architecture of the rapidly reconfigurable signal processing heterogeneous platform provided by the invention consists of a software part and a hardware part. The hardware part consists of heterogeneous hardware such as FPGAs, DSPs, GPUs and CPUs; the software part virtualizes the underlying hardware into uniform logical resources, and the upper-layer software comprises a real-time scheduling module, a configuration control module, a resource management module, a state monitoring module, a visual development interface and a display control interface. Developers can rapidly deploy and reconfigure signal processing applications in a graphical manner.
The technical solution of the present invention is further described with reference to fig. 3.
The task scheduling algorithm is a scheduling algorithm for accelerating directed-acyclic-graph signal processing applications on the rapidly reconfigurable heterogeneous signal processing platform. The scheduling process is divided into a task selection phase and a processing resource selection phase. Using w_ij to represent the computation cost of task i on processing resource j and c_ik to represent the communication cost from task i to task k, then for a given DAG signal processing application (assumed to consist of N task components, with the system providing P heterogeneous processing resources) there are a corresponding computation cost matrix W_{N×P} and a communication cost matrix C_{N×N}. The entry task is denoted EntryTask and the exit task ExitTask.
The priority criterion used in the task selection phase is

RankU(i) = w̄_i + max_{k ∈ succ(i)} ( c̄_ik + RankU(k) ),

with a larger RankU value giving a higher priority; clearly, for the exit task, RankU(ExitTask) = w̄_ExitTask. Here w̄_i denotes the mean computation cost of task i over the available processing resources, c̄_ik denotes the average communication cost from task i to task k, and succ(i) denotes the set of immediate successors of task i.
The criterion in the processing resource selection phase is to select the processing resource that minimizes the earliest finish time (EFT) of the task, computed as EFT(i, j) = w_ij + EST(i, j), where EFT(i, j) is the earliest finish time of task i on processing resource j and EST(i, j) is the earliest start time of task i on processing resource j. EST(i, j) is given by EST(i, j) = max{ avail(j), max_{m ∈ pred(i)} ( AFT(m) + c_mi ) }, where avail(j) is the time at which processing resource j becomes available, AFT(m) is the actual finish time of task m (i.e., the minimum EFT obtained when m was scheduled), and pred(i) denotes the set of immediate predecessors of task i; clearly, for the entry task EntryTask, EST(EntryTask, j) = 0.
The scheduling algorithm of the invention comprises the following steps:
In the first step, preparation before scheduling is performed, which specifically includes: based on the hardware processing resources provided by the platform and the signal processing tasks provided by the system task library, estimating by statistical and empirical methods the computation cost of each task on each hardware processing resource, w_ij representing the computation cost of task i on processing resource j; analyzing the dependencies of the signal processing application (in DAG form) deployed by the developer on the visual interface and estimating the communication costs between dependent tasks by combining the parameters the developer sets for the tasks with the underlying hardware resource topology information table, c_ik representing the communication cost from task i to task k; and obtaining the computation cost matrix W_{N×P} and communication cost matrix C_{N×N} corresponding to the signal processing application graph (assumed to consist of N tasks, with the system providing P heterogeneous processing resources).
And step two, scheduling is executed, and the method specifically comprises the following steps:
(1) Analyzing the DAG application graph, finding the entry task EntryTask and the exit task ExitTask, and placing the EntryTask into the ready queue ReadyQueue.
(2) Counting the communication costs between the tasks of the signal processing application, computing their mean μ and standard deviation σ, and screening out, based on the 3σ principle of the Gaussian distribution model, the task pairs whose communication cost falls outside 2σ, i.e., the task pairs (i, k) satisfying c_ik ≥ μ + 2σ; binding each such pair as a high-communication-overhead pair and adding the bound pairs to a binding List (a code sketch of this screening is given after the step list).
(3) Computing the RankU values of all tasks and using them as the priority criterion during task selection, a larger RankU value meaning a higher priority;
(4) when the ready queue ReadyQueue is not empty, executing the steps (5) and (6); step (7) is performed when the ready queue ReadyQueue is empty.
(5) Judging whether a task pair from the binding List is present in the ready queue ReadyQueue; if so, dispatching that pair, allocating it to the processing resource that minimizes the earliest finish time (EFT) of the subsequent task of the pair, deleting the pair from the ready queue ReadyQueue and from the binding List, adding newly ready tasks to the ready queue ReadyQueue, and returning to step (4); if no task pair from the binding List is present in the ready queue ReadyQueue, executing step (6).
(6) Selecting the task with the largest RankU in the ready queue ReadyQueue for scheduling, considering separately the case of not duplicating a predecessor task and the case of duplicating a predecessor task, selecting, over the two cases, the processing resource that minimizes the earliest finish time (EFT) of the task and executing the task on it, deleting the task from the ready queue ReadyQueue, adding newly ready tasks to the ready queue ReadyQueue, and returning to step (4).
(7) Obtaining the complete task scheduling scheme; scheduling is finished.
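The following sketches the step-(2) screening referred to above; edges is assumed to be the list of (predecessor, successor) task pairs that have a data dependency, and their communication costs are treated as a single Gaussian sample set.

```python
import statistics

def bind_high_comm_pairs(C, edges):
    """Step (2): keep the dependent task pairs whose communication cost lies
    outside mu + 2*sigma of all inter-task communication costs."""
    costs = [C[i][k] for (i, k) in edges]
    mu = statistics.mean(costs)
    sigma = statistics.pstdev(costs)        # population standard deviation
    return [(i, k) for (i, k) in edges if C[i][k] >= mu + 2 * sigma]
```

For the small illustrative DAG used earlier, bind_high_comm_pairs(C, [(0, 1), (0, 2), (1, 3), (2, 3)]) returns an empty list, since none of the four costs exceeds μ + 2σ; the screening only fires when some communication costs are strong outliers.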
The technical effects of the present invention will be described in detail with reference to simulations.
In order to evaluate the performance of the task scheduling algorithm of the invention against common algorithms of the same kind, the simulation mainly compares it with the classic HEFT algorithm and with the task-duplication-based CEFT algorithm. In the simulation, the performance parameter is the overall execution time of the application, the makespan, defined as:
makespan=max{AFT(ExitTask)}
The scheduling length ratio (SLR) is defined as

SLR = makespan / Σ_{i ∈ CP_MIN} min_{j ∈ P} w_ij,

i.e., the makespan normalized by the sum, over the tasks of the minimum critical path CP_MIN, of each task's smallest computation cost across all processing resources.
in our simulation, the average scheduling length ratio was used as a performance index for comparing the three scheduling algorithms. As can be seen from the simulation result graph, under the condition that the number of tasks is the same, the algorithm provided by the invention has better average scheduling length ratio performance. Compared with the classic HEFT algorithm, the performance of the algorithm is remarkably optimized, and due to the combined action of task replication and task binding, the communication overhead between tasks is greatly reduced. Compared with the CEFT algorithm with task replication, the performance of the algorithm is improved to a certain extent, and the communication overhead between the task pairs is eliminated due to the binding of the algorithm to the high-communication marketing task pairs.
It should be noted that the embodiments of the present invention can be realized by hardware, software, or a combination of software and hardware. The hardware portion may be implemented using dedicated logic; the software portions may be stored in a memory and executed by a suitable instruction execution system, such as a microprocessor or specially designed hardware. Those skilled in the art will appreciate that the apparatus and methods described above may be implemented using computer executable instructions and/or embodied in processor control code, such code being provided on a carrier medium such as a disk, CD-or DVD-ROM, programmable memory such as read only memory (firmware), or a data carrier such as an optical or electronic signal carrier, for example. The apparatus and its modules of the present invention may be implemented by hardware circuits such as very large scale integrated circuits or gate arrays, semiconductors such as logic chips, transistors, or programmable hardware devices such as field programmable gate arrays, programmable logic devices, etc., or by software executed by various types of processors, or by a combination of hardware circuits and software, e.g., firmware.
The above description is only for the purpose of illustrating the present invention and the appended claims are not to be construed as limiting the scope of the invention, which is intended to cover all modifications, equivalents and improvements that are within the spirit and scope of the invention as defined by the appended claims.