CN119292793A - Computing task allocation method, system, electronic device, medium and product - Google Patents
- Publication number
- CN119292793A (application number CN202411825959.9A)
- Authority
- CN
- China
- Prior art keywords
- computing task
- equipment
- computing
- task
- new
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Landscapes
- Debugging And Monitoring (AREA)
Abstract
The application discloses a computing task allocation method, system, electronic device, medium, and product, relating to the technical field of artificial intelligence computing and hardware architecture and applied to a computing task allocation system. The computing task allocation system comprises a device abstraction layer and various devices, the device abstraction layer comprising standardized interfaces for the devices. The method comprises: when a new device is accessed, allocating a computing task to it through the device abstraction layer; if the resource load of the new device exceeds the limit, screening out idle devices with low resource load; among the idle devices, selecting as the target device the one whose hardware resource units have a high overlap ratio with those of the new device and whose matching degree with the computing task is highest; migrating the computing task to the target device; and matching the instruction set. The application solves the technical problem that the allocation of computing tasks is not intelligent.
Description
Technical Field
The present application relates to the field of artificial intelligence computing and hardware architecture technologies, and in particular, to a computing task allocation method, a computing task allocation system, an electronic device, a storage medium, and a computer program product.
Background
In current AI application development and deployment, developers need to handle various heterogeneous hardware devices (such as GPUs, NPUs, and CPUs). The diversity of these devices poses great challenges to the reasonable allocation of computing tasks. The prior art must process a large number of computing tasks during AI application development; because of the differences between devices, it cannot flexibly invoke different devices, incompatibility among devices must be considered, and development difficulty increases. Moreover, traditional computing task allocation methods often rely on static rules to allocate tasks and cannot adapt to dynamic changes in the devices, so that task allocation is not intelligent and development complexity increases. Therefore, current AI application development and deployment suffers from the problem that computing task allocation is not intelligent.
Disclosure of Invention
The application mainly aims to provide a computing task allocation method, a computing task allocation system, electronic equipment, a storage medium and a computer program product, and aims to solve the technical problem that computing task allocation is not intelligent.
In order to achieve the above object, the present application provides a computing task allocation method, which is characterized in that the computing task allocation method is applied to a computing task allocation system, the computing task allocation system includes a device abstraction layer, the device abstraction layer defines a standardized interface for each device in the task allocation system, and the computing task allocation method includes:
when detecting that a new device is accessed to the computing task distribution system, distributing computing tasks to the new device for execution through the device abstraction layer;
screening out idle devices with the resource load lower than a preset resource load lower limit from the devices of the computing task distribution system under the condition that the resource load of the new device exceeds the preset resource load upper limit;
and marking the idle device whose hardware resource units have an overlap ratio higher than a preset overlap ratio threshold and whose matching degree with the computing task is highest as the target device, migrating the computing task to the target device for execution, and matching the computing task with an instruction set of the target device.
In one embodiment, the step of assigning computing tasks to the new device by the device abstraction layer comprises:
Acquiring equipment characteristics of the new equipment through the resource set of the new equipment, and detecting task demands of all computing tasks in the computing task distribution system;
matching the detected task demands with the equipment characteristics to obtain matching degrees;
And distributing the computing task with the highest matching degree to the new equipment through the equipment abstraction layer, and driving a hardware resource unit meeting the computing requirement of the computing task through a corresponding driving module of the new equipment so as to execute the computing task.
In an embodiment, the step of assigning, by the device abstraction layer, a computing task to the new device further comprises:
Monitoring whether new equipment is accessed in the computing task allocation system;
when detecting that a new device is accessed to the computing task allocation system, identifying the device type of the new device, and loading a corresponding driving module of the new device according to the device type, wherein the driving module is used for driving a hardware resource unit on the new device;
And identifying the hardware resource unit on the new equipment to obtain a resource set of the new equipment.
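The monitor / identify / load-driver / enumerate-resources steps above can be sketched as follows; the registry contents and function names are illustrative assumptions, not the patent's actual implementation:

```python
# Hypothetical sketch: identify the device type, load the corresponding
# driver module, and enumerate the hardware resource units into a resource set.
DRIVER_REGISTRY = {
    "gpu": "gpu_driver",
    "npu": "npu_driver",
    "cpu": "cpu_driver",
}

def on_device_attached(device_type, hardware_units):
    """Handle a newly accessed device: load its driver and build its resource set."""
    driver = DRIVER_REGISTRY.get(device_type)
    if driver is None:
        raise ValueError(f"no driver registered for device type {device_type!r}")
    # The resource set is the collection of hardware resource units on the device.
    resource_set = set(hardware_units)
    return {"driver": driver, "resources": resource_set}
```

A deduplicated set is used here so that repeated unit identifiers collapse into one resource entry.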
In an embodiment, the step of marking, as the target device, the idle device with the highest matching degree with the computing task, where the overlap ratio of the hardware resource units in the idle devices is higher than a preset overlap ratio threshold, includes:
calculating the overlap ratio of hardware resource units between each idle device and the new device;
filtering out idle devices whose overlap ratio is lower than a preset overlap ratio threshold, and calculating the matching degree between each remaining idle device and the computing task according to the task demands of the computing task and the device characteristics of each remaining idle device;
and marking the idle device with the highest matching degree as the target device.
In one embodiment, the step of migrating the computing task to the target device comprises:
acquiring a source memory space of the new device and a target memory space of the target device from a unified virtual address space, wherein the memory space of each device in the computing task allocation system is mapped in the unified virtual address space;
And storing the execution state of the computing task, transmitting the computing task from the source memory space to the target memory space through direct memory access, and distributing computing resources to the computing task after the transmission is completed to recover the execution state of the computing task.
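The save-state / copy / restore-state migration flow above can be sketched as follows; the unified virtual address space is modelled as a single dict keyed by device, and the in-place move stands in for a direct-memory-access transfer (all names are illustrative assumptions):

```python
# Minimal sketch of migration through a unified virtual address space.
virtual_address_space = {"new_device": {}, "target_device": {}}

def migrate_task(task, src, dst):
    """Checkpoint the task, move its memory src -> dst, then resume on dst."""
    saved_state = dict(task["state"])                 # save execution state
    buf = virtual_address_space[src].pop(task["id"])  # read source memory space
    virtual_address_space[dst][task["id"]] = buf      # write target memory space
    task["device"] = dst                              # allocate resources on dst
    task["state"] = saved_state                       # recover execution state
    return task
```

In a real system the copy would be a zero-copy or DMA operation between device memories rather than a Python dict move.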
In an embodiment, the step of matching the computing task with the instruction set of the target device comprises:
Determining a hardware acceleration library of the target equipment according to the equipment characteristics of the target equipment, and optimizing the acceleration path of each operator in the hardware acceleration library;
and mapping high-level operators in the computing task to a hardware acceleration instruction set of the target device according to the hardware acceleration library.
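The operator-to-instruction-set mapping above can be sketched as follows; the operator names and instruction mnemonics are illustrative assumptions, not the patent's hardware acceleration library:

```python
# Illustrative lowering of high-level operators onto a device's hardware
# acceleration instruction set.
ACCEL_LIBRARY = {
    "conv2d": "DEV_CONV2D_FAST",
    "matmul": "DEV_GEMM",
    "relu": "DEV_RELU",
}

def lower_operators(task_ops, accel_library):
    """Map each high-level operator to an accelerated instruction, falling back
    to a generic implementation when the library has no optimized path."""
    return [accel_library.get(op, f"GENERIC_{op.upper()}") for op in task_ops]
```

The fallback entry mirrors the idea that an unmatched operator still executes, just without the device's optimized acceleration path.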
In addition, in order to achieve the above object, the present application further provides a computing task allocation system, the computing task allocation system including a device abstraction layer, the device abstraction layer defining a standardized interface for each device in the task allocation system, the computing task allocation system including:
the new equipment access module is used for distributing the computing task to the new equipment for execution through the equipment abstraction layer when detecting that the new equipment is accessed to the computing task distribution system;
The resource reallocation module is used for screening out idle devices with the resource load lower than a preset resource load lower limit from the devices of the computing task allocation system under the condition that the resource load of the new device exceeds the preset resource load upper limit;
and the task migration module, used for marking the idle device with the highest matching degree with the computing task as the target device, migrating the computing task to the target device for execution, and matching the computing task with an instruction set of the target device, wherein the overlap ratio of the hardware resource units in the idle device is higher than a preset overlap ratio threshold.
Furthermore, to achieve the above object, the application proposes an electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the computer program being configured to implement the steps of the computing task allocation method as described above.
Furthermore, to achieve the above object, the present application also proposes a storage medium, which is a computer-readable storage medium, on which a computer program is stored, which computer program, when being executed by a processor, implements the steps of the computing task allocation method as described above.
Furthermore, to achieve the above object, the present application provides a computer program product comprising a computer program which, when executed by a processor, implements the steps of the computing task allocation method as described above.
The application provides a computing task allocation method applied to a computing task allocation system. The computing task allocation system comprises a device abstraction layer and devices, the device abstraction layer comprising standardized interfaces for the devices. When a new device is detected accessing the computing task allocation system, a computing task is allocated to the new device for execution through the device abstraction layer. When the resource load of the new device exceeds a preset resource load upper limit, idle devices whose resource load is lower than a preset resource load lower limit are screened out from the devices of the computing task allocation system. The idle device whose hardware resource units have an overlap ratio higher than a preset overlap ratio threshold and whose matching degree with the computing task is highest is marked as the target device, the computing task is migrated to the target device for execution, and the computing task is matched with the instruction set of the target device.
According to the application, a uniform operation interface is defined for each device by introducing a device abstraction layer, hiding the underlying hardware differences so that the system can flexibly handle computing devices of different brands and models. When a new device is accessed, rapid identification and task allocation for the new device are realized, ensuring effective utilization of computing resources while reducing manual intervention and improving the degree of automation of the system. By monitoring device resource load in real time and screening out idle devices, stable operation of the system under high load is ensured, and overall performance degradation caused by overload of a single device is avoided. By comprehensively considering the hardware resource overlap ratio and the task matching degree, the computing task can be executed on the most suitable device, improving task execution efficiency and accuracy; matching of the instruction set further improves task execution performance. Related schemes are often based on a simple load-balancing strategy and lack comprehensive consideration of dynamic changes in resource load and device compatibility. In contrast, the application provides a uniform management mode for different devices through the device abstraction layer, shields the hardware implementation differences of the underlying devices, enables computing tasks to be switched freely among different devices, and adjusts dynamically according to resource load and resource overlap ratio. It thus makes full use of the hardware resources of all devices, reduces task waiting time and execution delay, avoids waste and idling of device resources, realizes efficient utilization of resources, and improves resource utilization and task execution efficiency.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the application and together with the description, serve to explain the principles of the application.
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings required for the description of the embodiments or the prior art are briefly introduced below; it will be obvious to those skilled in the art that other drawings can be obtained from these drawings without inventive effort.
FIG. 1 is a flowchart of a computing task allocation method according to an embodiment of the present application;
FIG. 2 is a schematic flow chart of a second embodiment of a computing task allocation method according to the present application;
FIG. 3 is a schematic flow chart of a third embodiment of a computing task allocation method according to the present application;
FIG. 4 is a schematic block diagram of a computing task allocation device according to an embodiment of the present application;
fig. 5 is a schematic device structure diagram of a hardware running environment related to a computing task allocation method in an embodiment of the present application.
The achievement of the objects, functional features and advantages of the present application will be further described with reference to the accompanying drawings, in conjunction with the embodiments.
Detailed Description
It should be understood that the specific embodiments described herein are merely illustrative of the technical solution of the present application and are not intended to limit the present application.
For a better understanding of the technical solution of the present application, the following detailed description will be given with reference to the drawings and the specific embodiments.
The embodiment of the application is applied to a computing task allocation system, which comprises a device abstraction layer and devices, the device abstraction layer comprising standardized interfaces for the devices. The technical scheme mainly adopted is as follows: when a new device is detected accessing the computing task allocation system, the computing task is allocated to the new device for execution through the device abstraction layer; when the resource load of the new device exceeds a preset resource load upper limit, idle devices whose resource load is lower than a preset resource load lower limit are screened out from the devices of the computing task allocation system; the idle device whose hardware resource units have an overlap ratio higher than a preset overlap ratio threshold and whose matching degree with the computing task is highest is marked as the target device; the computing task is migrated to the target device for execution; and the instruction set of the computing task and the target device is matched.
In this embodiment, for convenience of description, the following description will be made with the computing task allocation system as an execution subject.
In the current AI application development and deployment environment, developers face the challenge of handling diversified heterogeneous hardware devices (covering GPUs, NPUs, CPUs, and other types) and numerous AI computing frameworks (such as TensorFlow and PyTorch). The diversity of devices and frameworks significantly increases the complexity of reasonably allocating computing tasks, and the prior art often requires developers to write specific code for each device and framework, which not only aggravates the complexity of the development process but also increases the economic cost of hardware expansion. Therefore, the problem of a lack of intelligence in the allocation of computing tasks in AI application development and deployment currently exists.
The application provides a solution: a device abstraction layer is introduced, a uniform operation interface is defined for each device, and the underlying hardware differences are hidden, so that the system can flexibly handle computing devices of different brands and models. When a new device is accessed, it is automatically identified and its driver loaded; tasks are screened and migrated in real time through dynamic adjustment according to the real-time resource condition of the devices, ensuring that tasks are executed on devices with sufficient resources and improving the response speed and stability of the system. At the same time, dynamic allocation according to task demand improves resource utilization and task execution efficiency.
It should be noted that the execution body of this embodiment may be a computing service device with data processing, network communication, and program running functions, such as a tablet computer, a personal computer, or a mobile phone, or an electronic device or computing task allocation system capable of implementing the above functions. This embodiment and the following embodiments are described with reference to a computing task allocation system.
Based on this, an embodiment of the present application provides a computing task allocation method, and referring to fig. 1, fig. 1 is a schematic flow chart of a first embodiment of the computing task allocation method of the present application.
In this embodiment, the method is applied to a computing task allocation system, where the computing task allocation system includes a device abstraction layer and devices, and a standardized interface for each device is defined in the device abstraction layer; the computing task allocation method includes steps S01-S03:
step S01, when detecting that a new device is accessed to the computing task distribution system, distributing a computing task to the new device for execution through a device abstraction layer;
It should be noted that the device abstraction layer is a middle layer that defines a standardized interface for each device, so that upper-layer applications can uniformly manage and control the devices without concern for the specific implementation of the underlying device. The standardized interface refers to a unified interface provided by the device abstraction layer for each device, used for communication and data exchange between devices; the tasks of each device are managed through this unified standardized interface. For example, whether on a GPU or an NPU, a developer may initiate convolution computation or matrix operation requests through the same interface, and the system automatically adapts these requests at the bottom level.
Additionally, it should be noted that when the system detects that a new device is accessed, the new device is identified through the device abstraction layer. The new device is a hardware device that has just been connected to the computing task allocation system. Once the new device is identified, the system allocates a matched computing task to it according to the current task queue and the performance characteristics of the device; a computing task is a task that needs to be executed on a hardware device.
It can be understood that, whereas existing solutions generally require manual configuration of the new device's driver and communication interface, here the new device can be seamlessly connected to the computing task allocation system through the standardized interface defined by the device abstraction layer, without any additional modification or configuration of the system. The system automatically detects the access of the new device and immediately allocates a computing task to it for execution, improving the flexibility and scalability of the system.
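The unified interface described above can be sketched in Python; the class and method names are illustrative assumptions, not the patent's actual API:

```python
from abc import ABC, abstractmethod

class DeviceInterface(ABC):
    """Standardized interface the abstraction layer defines for every device."""

    @abstractmethod
    def matmul(self, a, b):
        """Matrix-operation request; each backend adapts it to its hardware."""

class GPUDevice(DeviceInterface):
    def matmul(self, a, b):
        # Stand-in for dispatch to a GPU kernel; plain Python here.
        return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
                for row in a]

class NPUDevice(DeviceInterface):
    def matmul(self, a, b):
        # Same standardized interface, different backend.
        return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
                for row in a]

def run_matmul(device: DeviceInterface, a, b):
    # Upper-layer code depends only on the standardized interface,
    # not on which concrete device executes the request.
    return device.matmul(a, b)
```

The caller issues the same request regardless of backend, which is the property the abstraction layer provides.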
Step S02, screening out idle devices with the resource load lower than the preset resource load lower limit from the devices of the computing task distribution system under the condition that the resource load of the new device exceeds the preset resource load upper limit;
It should be noted that during task execution the system continuously monitors the resource load condition of each device. Resource load refers to the computing tasks a device is currently processing and the amount of resources they occupy. When the resource load of a device exceeds the preset resource load upper limit, the system triggers a resource load monitoring mechanism and screens out, from all devices, idle devices whose resource load is lower than the preset resource load lower limit. The preset resource load upper and lower limits are resource load thresholds set by the system, and an idle device is a device whose resource load is lower than the preset resource load lower limit.
It can be understood that, whereas existing schemes generally adopt a static or experience-based allocation policy, step S02 monitors the resource load of each device in real time, ensuring that measures are taken promptly when the resource load exceeds the preset upper limit. Idle devices whose resource load is lower than the preset lower limit are screened out, providing candidate targets for the migration of computing tasks. Dynamic monitoring and intelligent adjustment of device resource load are thus realized, avoiding task execution failure or performance degradation caused by resource overload.
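The threshold-based screening of step S02 can be sketched as follows; load values are normalized fractions and the function name is an illustrative assumption:

```python
def screen_idle_devices(loads, upper_limit, lower_limit, overloaded):
    """Return idle devices when `overloaded` exceeds the preset upper limit.

    `loads` maps device name -> current resource load (0.0-1.0). Devices
    whose load is below the preset lower limit count as idle candidates.
    """
    if loads[overloaded] <= upper_limit:
        return []  # no overload, no rebalancing needed
    return [dev for dev, load in loads.items()
            if dev != overloaded and load < lower_limit]
```

Returning an empty list when the trigger condition is not met mirrors the patent's flow, where screening only happens after the upper limit is exceeded.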
Step S03, marking the idle device with the highest matching degree with the computing task as the target device, migrating the computing task to the target device for execution, and matching the computing task with the instruction set of the target device, wherein the overlap ratio of the hardware resource units in the idle device is higher than a preset overlap ratio threshold.
It should be noted that a hardware resource unit refers to a hardware resource on a device, such as a CPU, GPU, or NPU. The overlap ratio refers to the degree of similarity of hardware resources between different devices, and the preset overlap ratio threshold is a hardware resource overlap threshold set by the system, used for screening devices matched with the target task. The target device is the device selected to execute the computing task after screening and matching, and the instruction set is the set of instructions supported by a device, used to direct the device to execute a specific computing task.
Additionally, it should be noted that after the idle devices are screened, the system further evaluates the matching degree between those devices and the current computing task. According to the overlap ratio of the hardware resource units (such as processor type and memory size) and the task demands (such as compute-intensive or memory-intensive), it selects the idle device with the highest matching degree as the target device, migrates the computing task to the target device for execution, and matches the instruction set of the computing task and the target device, ensuring that the task can be executed efficiently.
It can be understood that existing schemes often rely on static rules or simple load-balancing algorithms to distribute tasks and cannot adapt to the dynamic changes of devices and the diversity of tasks. By comprehensively considering the overlap ratio of hardware resource units and the matching degree with the computing task, the system can select the target device that best matches the task; after the task is migrated, the system automatically matches the instruction sets of the computing task and the target device, so that the task can be executed efficiently on the target device.
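The two-stage selection of step S03 can be sketched as follows; a Jaccard-style set overlap is one plausible reading of the hardware resource unit overlap ratio, and all names are illustrative assumptions:

```python
def overlap_ratio(units_a, units_b):
    """Jaccard-style overlap of two devices' hardware resource unit sets."""
    a, b = set(units_a), set(units_b)
    return len(a & b) / len(a | b) if (a | b) else 0.0

def pick_target(new_units, idle_devices, threshold, match_scores):
    """Keep idle devices whose overlap with the new device exceeds the preset
    threshold, then pick the one with the highest task matching degree."""
    eligible = [dev for dev, units in idle_devices.items()
                if overlap_ratio(new_units, units) > threshold]
    return max(eligible, key=lambda dev: match_scores[dev]) if eligible else None
```

Note the ordering: overlap filtering happens first, so a high matching degree alone cannot select a device whose hardware differs too much from the overloaded one.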
In the second embodiment of the present application, the same or similar content as in the first embodiment of the present application may be referred to the above description, and will not be repeated. On this basis, referring to fig. 2, in step S01, the step of allocating a computing task to a new device for execution through a device abstraction layer includes steps S11 to S13:
step S11, obtaining equipment characteristics of the new equipment through a resource set of the new equipment, and detecting task requirements of all computing tasks in a computing task distribution system;
The system obtains the device characteristics of the new device through the resource set of the new device. The resource set is the sum of the hardware resource units of the new device, including but not limited to hardware resources such as the CPU, GPU, memory, and storage, and may be represented by a device vector, for example D = (u_1, u_2, ..., u_n),
where each u_i denotes a hardware resource unit on the device, such as a computing core or a memory block. Device characteristics are refined from the resource set and reflect key attributes of device performance and applicable scenarios, such as computing capability, memory size, and storage speed; for example, a device rich in GPU resources may be characterized by high parallel computing capability and large memory. Meanwhile, the system detects the task demands of each computing task in the computing task allocation system. Task demands are the hardware and software conditions each computing task must satisfy during execution, including but not limited to computational complexity, memory occupation, and dependence on specific hardware accelerators; for example, some AI tasks may need high-performance GPUs to accelerate the training of a deep learning model, while other tasks may care more about the processing speed and memory size of the CPU.
Step S12, matching the detected task demands with equipment characteristics to obtain matching degrees;
It should be noted that the system matches the detected task demands with the device features to obtain matching degrees. The matching degree measures how well each computing task and the new device match in terms of resource demands and capabilities, and may be computed as M = Sim(F, R),
where F represents the device features, R represents the task demands, and Sim is a similarity measure between the two.
For example, if the task is primarily compute-intensive (e.g., convolution computation or matrix multiplication), the system may prefer a device with strong computing power that supports massive parallel computation (such as a GPU or NPU). If the task requires high memory bandwidth or large-scale data storage, the system may prefer a device with high-bandwidth video memory (such as a GPU) or a device with high memory capacity (such as the CPU of a high-performance server). For low-latency tasks (such as real-time video processing or real-time inference), the system may choose a device that responds faster, such as a low-latency NPU.
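The matching-degree computation can be sketched as a score over shared feature keys; the dot-product form and all feature values below are assumptions, since the exact Sim function is not specified:

```python
def matching_degree(device_features, task_demands):
    """Score how well device features F match task demands R.

    A dot product over shared feature keys is one plausible similarity
    measure Sim(F, R); it is an assumption, not the patent's exact formula.
    """
    shared = set(device_features) & set(task_demands)
    return sum(device_features[k] * task_demands[k] for k in shared)

# Illustrative feature/demand vectors (values are assumptions).
gpu = {"parallelism": 0.9, "memory_bandwidth": 0.8, "latency": 0.4}
cpu = {"parallelism": 0.3, "memory_bandwidth": 0.4, "latency": 0.6}
conv_task = {"parallelism": 1.0, "memory_bandwidth": 0.5}
```

With these numbers a compute-intensive convolution task scores higher on the GPU, matching the preference described above.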
And S13, distributing the computing task with the highest matching degree to the new equipment through the equipment abstraction layer, and driving a hardware resource unit meeting the computing requirement of the computing task through a corresponding driving module of the new equipment so as to execute the computing task.
It should be noted that the system distributes the computing task with the highest matching degree to the new device through the device abstraction layer, and drives the hardware resource unit that meets the computing requirement of the task through the driving module corresponding to the new device, so as to execute the computing task. The driving module is software dedicated to controlling and managing a specific hardware device; it translates instructions from upper-layer applications into signals the hardware can understand, thereby driving the hardware to perform the corresponding operations. For example, a GPU driving module can receive a computing instruction from an upper-layer application and convert it into parallel computing tasks that the GPU can execute.
For example, assume the current computing task is a convolutional neural network (CNN) inference task, which requires high parallel computing capability, and that two available devices A and B exist in the system. Device A (a GPU) supports high parallelism and provides 16GB of video memory, a memory bandwidth of 512GB/s, and a floating-point computing capability of 30 TFLOPS; device B (a CPU) has 16 computing cores with stronger single-core performance but limited parallel computing capability and a memory bandwidth of 50GB/s. The system judges that device A is most suitable for this task, because its strong parallel and floating-point computing capability lets it execute the convolution efficiently. If instead the current computing task is a large-scale data processing task requiring high bandwidth and large memory capacity, and device A (GPU) has 16GB of video memory with 512GB/s bandwidth while device B (CPU) has a much larger memory capacity with a bandwidth of 200GB/s, the system judges that device B is more suitable for the task. Finally, if the current computing task is a real-time video inference task and device B is an NPU specially optimized for low-latency inference with fast response, the system judges that device B (the NPU) is most suitable, because its low latency and fast response allow it to process video streams in real time.
In this embodiment, through the extraction of the resource set and the device features, the system can clearly understand the performance characteristics and application scope of the new device. Through the detection of task demands, the system can determine the specific requirements of each computing task, providing the data support needed for subsequent allocation and helping achieve more intelligent task allocation. By computing the matching degree, the system can accurately evaluate how well each computing task would execute on the new device, avoiding the allocation of unsuitable tasks, improving the utilization of computing resources, and raising the efficiency of task execution. Through the cooperation of the device abstraction layer and the driving module, the system can accurately allocate the computing task with the highest matching degree to the new device and fully utilize its hardware resources, which not only improves the execution efficiency of the computing task but also reduces the complexity and cost of developing and deploying applications.
In a possible implementation manner, in step S01, before the step of allocating, by the device abstraction layer, the computing task to the new device for execution, steps A01 to A03 are further included:
Step A01, monitoring whether new equipment is accessed in a computing task allocation system;
It should be noted that a device file subsystem monitoring module is deployed in the computing task allocation system and is responsible for monitoring device insertion and removal events in real time. When a new device accesses the system, the monitoring module immediately detects the event and triggers the corresponding processing flow. The monitoring module obtains and identifies the unique identifier of the new device (such as a device ID or serial number) by reading the device file of the new device or using another identification mechanism.
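The monitoring idea can be illustrated with a simple polling sketch. This is an assumption-laden stand-in: a real implementation would subscribe to kernel device events rather than poll, and the directory path is hypothetical.

```python
import os

def scan(path: str) -> set:
    """Snapshot the device-file names under a directory (e.g. a /dev-like
    path). Returns an empty set if the directory does not exist."""
    try:
        return set(os.listdir(path))
    except FileNotFoundError:
        return set()

def diff_devices(before: set, after: set):
    """Compare two snapshots and report (inserted, removed) device names,
    mirroring the insertion/removal events the monitoring module handles."""
    return sorted(after - before), sorted(before - after)
```

A caller would take a snapshot on each polling tick and feed consecutive snapshots to `diff_devices`; a non-empty `inserted` list corresponds to the "new device accessed" event that triggers steps A02–A03.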
Step A02, when detecting that the new equipment is accessed into the computing task distribution system, identifying the equipment type of the new equipment, and loading a corresponding driving module of the new equipment according to the equipment type, wherein the driving module is used for driving a hardware resource unit on the new equipment;
It should be noted that, when the monitoring module detects that a new device has been accessed, the system determines the device type of the new device through the device identification module (based on the device's hardware ID, model information, or a specific communication protocol). Once the device type is identified, the system searches the preconfigured driver library for a driving module matching that type and loads it into the system; the driving module then communicates with the new device to drive the hardware resource units on it.
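A driver lookup of this kind is essentially a registry keyed by device type. The sketch below is illustrative only: the type names and the string-returning loader callables are placeholders for real driver-loading logic.

```python
# Hypothetical registry mapping a device type to a driver loader.
# In a real system the values would load and bind actual kernel or
# user-space driver modules; strings stand in for that here.
DRIVER_REGISTRY = {
    "gpu": lambda dev: f"gpu-driver bound to {dev}",
    "npu": lambda dev: f"npu-driver bound to {dev}",
    "cpu": lambda dev: f"cpu-driver bound to {dev}",
}

def load_driver(device_id: str, device_type: str) -> str:
    """Look up and invoke the driving module for a device type; an
    unknown type is reported rather than silently ignored."""
    loader = DRIVER_REGISTRY.get(device_type)
    if loader is None:
        raise KeyError(f"no driver module for device type {device_type!r}")
    return loader(device_id)
```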
Additionally, it should be noted that the system dynamically optimizes the driver loading of the new device so that the time to load the device's corresponding driver is minimized. The optimization objective of dynamic driver loading is to minimize $T_{\text{load}}(d)$, where $T_{\text{load}}(d)$ denotes the time overhead of loading the driving module corresponding to device $d$.
And step A03, identifying a hardware resource unit on the new equipment to obtain a resource set of the new equipment.
It should be noted that, after the driving module is loaded and successfully communicates with the new device, the system identifies the hardware resource units on the new device (such as CPU cores, GPU, memory, and storage) through the resource identification module, and the information on these resource units is collected and integrated into a resource set for the subsequent allocation of computing tasks.
In this embodiment, the real-time monitoring mechanism lets the system discover the access of a new device immediately, providing a basis for subsequent device identification and driver loading, ensuring that the computing task allocation system can flexibly cope with dynamic changes in hardware, and improving the system's scalability and flexibility. Through device type identification and dynamic loading of the driving module, the correct driver is loaded automatically for the new device, driving its hardware resource units so that the new device can immediately participate in task computation, which improves the utilization of computing resources and overall system performance. By identifying the hardware resource units on the new device and acquiring its resource set, the system can more accurately understand the new device's computing capability, memory layout, and other hardware characteristics, providing a basis for intelligent allocation of computing tasks, ensuring that tasks run efficiently on the new device, and improving resource utilization and overall system performance.
In the third embodiment of the present application, the same or similar contents as those of the first and second embodiments can be referred to the above description, and the description thereof will be omitted. On this basis, referring to fig. 3, in step S03, the step of recording, as a target device, an idle device having a highest matching degree with a computing task, where the matching degree of the hardware resource units in each idle device is higher than a preset matching degree threshold, includes steps S21 to S23:
Step S21, calculating the coincidence ratio of hardware resource units between each idle device and the new device;
It should be noted that the system obtains the hardware resource unit information of the new device and of all currently idle devices, including but not limited to CPU model, GPU model, memory capacity, storage type, and storage capacity.
Then, the system compares the hardware resource units of each idle device with those of the new device using a resource overlap-ratio algorithm and calculates the overlap ratio between them. The overlap ratio is usually expressed as a percentage and reflects how similar the two devices are in hardware resources. A Jaccard-style form of the overlap-ratio model is:

$$\text{Overlap}(R_{\text{new}}, R_{\text{idle}}) = \frac{|R_{\text{new}} \cap R_{\text{idle}}|}{|R_{\text{new}} \cup R_{\text{idle}}|} \times 100\%$$

where $R_{\text{new}}$ denotes the resource set corresponding to the new device and $R_{\text{idle}}$ denotes the resource set corresponding to the idle device.
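Assuming a Jaccard-style overlap between the two resource sets (an interpretation of the percentage similarity described in the text, since the original formula is not reproduced here), the computation is a one-liner over labeled resource tags:

```python
def overlap_ratio(r_new: set, r_idle: set) -> float:
    """Percentage of shared resource units between the new device's
    resource set and an idle device's resource set (Jaccard-style):
    |intersection| / |union| * 100."""
    union = r_new | r_idle
    if not union:
        return 0.0
    return 100.0 * len(r_new & r_idle) / len(union)
```

Resource units are modeled as strings (e.g. `"gpu"`, `"ram16"`); a real system would compare structured hardware descriptors instead.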
Step S22, filtering idle equipment with the contact ratio lower than a preset contact ratio threshold, and calculating the matching degree of each idle equipment after filtering and a calculation task according to the task requirements of the calculation task and the equipment characteristics of each idle equipment after filtering;
It should be noted that, on the basis of step S21, the system filters out all idle devices whose overlap ratio is lower than the preset overlap-ratio threshold, keeping only the devices with higher overlap as candidates; the preset threshold is determined by the system's required resource-matching precision and the needs of the actual application scenario. The system then calculates the matching degree $\mathrm{Match}(F, D)$ between each candidate device and the task according to the computing requirements of the task (such as computation amount, memory requirement, and dependence on specific hardware) and the device features of each filtered idle device (such as CPU performance, GPU model, and memory capacity), where $F$ is the device feature of the candidate device and $D$ is the computing requirement of the task. The matching degree is expressed as a numerical value reflecting how well the device matches the task.
In step S23, the idle device with the highest matching degree is recorded as the target device.
It should be noted that, based on step S22, the system determines, according to the calculated matching degree, the idle device with the highest matching degree as the target device, and this target device will be responsible for executing the current computing task.
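Steps S21–S23 can be combined into one selection routine. This is a simplified sketch under stated assumptions: overlap is the fraction of shared resource tags, matching scores are supplied by the caller rather than computed from device features, and the 50% threshold is illustrative.

```python
def pick_target(idle_devices: dict, new_resources: set,
                scores: dict, overlap_threshold: float = 50.0):
    """idle_devices: {name: resource_set}; scores: {name: matching degree}.
    Returns the name of the idle device with the highest matching degree
    among those whose overlap with the new device passes the threshold,
    or None if no device qualifies."""
    best_name, best_score = None, float("-inf")
    for name, res in idle_devices.items():
        union = res | new_resources
        overlap = 100.0 * len(res & new_resources) / len(union) if union else 0.0
        if overlap < overlap_threshold:
            continue  # step S22: filter out low-overlap devices
        if scores[name] > best_score:
            best_name, best_score = name, scores[name]  # step S23: keep argmax
    return best_name
```

Note that a device with a high matching score but low overlap is still excluded, which is exactly the two-stage filter-then-rank behavior the steps describe.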
In this embodiment, by calculating the overlap ratio, the system can evaluate more accurately how well an idle device matches the new device in hardware resources, providing more precise information for subsequent task allocation. By identifying devices with high overlap, the system can use existing resources more effectively and avoid allocating tasks to devices with mismatched resources, reducing resource waste. Intelligent allocation to the most suitable device improves the efficiency and accuracy of task execution, and optimized allocation reduces task migration and waiting time between devices, accelerating task execution. Selecting the device with the highest matching degree as the target device ensures the task runs on the optimal resources, and intelligent task allocation prevents multiple tasks from competing for the same resources simultaneously, reducing resource conflicts and waiting time.
In a possible implementation manner, in step S03, the step of migrating the computing task to the target device includes steps B01 to B02:
Step B01, acquiring a source memory space of the new device and a target memory space of the target device in a unified virtual address space, wherein the memory space of each device in the computing task allocation system is mapped in the unified virtual address space;
It should be noted that the system uses a memory-mapping technique to map each device's physical memory space $M_i$ into the unified virtual address space $V$, ensuring data sharing among different devices. The optimization goal of the memory mapping is to minimize the memory copy time during data transmission:

$$\min T_{\text{copy}} = T_{\text{transfer}} + T_{\text{map}}$$

where $T_{\text{copy}}$ is the memory copy time during data transmission, $T_{\text{transfer}}$ is the data transmission time, and $T_{\text{map}}$ is the time required for the memory mapping.
Additionally, it should be noted that the system accesses a unified virtual address space, a logical address space into which the memory spaces of all devices in the computing task allocation system are mapped; through this space the system can access and operate on these memory regions in a unified, consistent manner. Based on the identification information of the new device and the target device, the system locates their source memory space (the memory region where the new device currently stores the computing task) and target memory space (the memory region where the target device will receive and process the computing task) within the unified virtual address space.
And step B02, storing the execution state of the computing task, transmitting the computing task from the source memory space to the target memory space through direct memory access, distributing computing resources for the computing task after the transmission is completed, and recovering the execution state of the computing task.
It should be noted that, on the basis of step B01, the system saves the execution state of the current computing task, including the task's current progress, the data in use, and any context information related to its execution. It then uses direct memory access (DMA) to transfer the computing task from the source memory space to the target memory space; DMA allows the devices (the new device and the target device) to exchange data directly without CPU intervention, reducing the transfer delay:

$$T_{\text{transfer}} = T_{\text{DMA}} + T_{\text{map}}$$

where $T_{\text{transfer}}$ is the data transmission time, $T_{\text{DMA}}$ is the DMA transfer time, and $T_{\text{map}}$ is the memory mapping time. Because DMA removes the time originally spent passing through the CPU, data transmission efficiency is greatly improved. After the transfer completes, the system allocates appropriate computing resources (such as CPU and GPU) to the computing task according to its requirements and the resource condition of the target device, restores the task's execution state, and continues execution on the target device. In the task migration model, the computing node $n$ of the task is rebound from the source device $d_{\text{src}}$ to the target device $d_{\text{tgt}}$, where $R_{\text{tgt}}$ is the resource set corresponding to the target device and $D$ is the computing requirement of the task, ensuring that the task can be executed efficiently on the new device.
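The save–transfer–restore sequence of step B02 can be sketched as follows. Real DMA happens below the operating system and cannot be shown in portable Python, so a plain `bytearray` copy stands in for the hardware transfer; the `TaskState` structure and field names are illustrative assumptions.

```python
from dataclasses import dataclass, field

@dataclass
class TaskState:
    """Hypothetical execution state: progress marker plus context info."""
    progress: int
    context: dict = field(default_factory=dict)

def migrate(task_state: TaskState, src: bytearray, dst: bytearray) -> TaskState:
    """Save the task's execution state, copy its payload from the source
    memory region to the target region (stand-in for the DMA transfer),
    and return the saved state to be restored on the target device."""
    saved = TaskState(task_state.progress, dict(task_state.context))  # save state
    dst[: len(src)] = src  # DMA-style copy: no per-byte CPU loop in real hardware
    return saved           # caller restores this state on the target device
```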
In this embodiment, the unified virtual address space provides a uniform memory access interface, allowing the system to access the memory space of any device indiscriminately, manage and utilize memory more effectively, and reduce memory fragmentation. Direct memory access greatly reduces data transmission delay and improves transmission efficiency. Meanwhile, the system can intelligently allocate computing resources according to the requirements of the computing task and the resource condition of the target device, solving the problem of inflexible computing resource allocation.
In one possible implementation, in step S03, the step of matching the instruction set of the computing task and the target device includes steps B11 to B12:
Step B11, determining a hardware acceleration library of the target equipment according to the equipment characteristics of the target equipment, and optimizing the acceleration path of each operator in the hardware acceleration library;
It should be noted that the hardware acceleration library is a set of algorithms and operations optimized for a specific hardware device (such as a GPU, FPGA, or ASIC); these algorithms and operations exploit the parallel processing capability of the hardware to accelerate computing tasks in the AI model. An acceleration path is one or more execution paths in the hardware acceleration library for each operator (such as matrix multiplication or convolution); these paths are optimized according to the hardware characteristics and the task requirements to achieve optimal performance.
Additionally, it should be noted that the system collects the detailed device features of the target device, including but not limited to processor type (such as CPU, GPU, or NPU), memory size, processing speed, and power consumption limits. According to the collected features, the hardware acceleration library that best matches the target device is selected from a predefined set of libraries, and the acceleration path of each operator in the library is optimized, including adjusting the execution order of operators, reducing unnecessary memory accesses, and using parallel processing capability, so as to maximize the execution efficiency of the computing task. In the optimization model, $I$ denotes the set of hardware instructions on the target device and $O$ denotes the operators the task needs to execute.
And step B12, mapping the high-level operators in the computing task to a hardware acceleration instruction set of the target device according to the hardware acceleration library.
It should be noted that, in AI applications, a high-level operator refers to a computing operation defined in a high-level programming language (such as Python) and a deep learning framework (such as TensorFlow or PyTorch); computing tasks are composed of series of such operators, for example addition, multiplication, and activation functions. The hardware acceleration instruction set is the low-level instruction set supported by the target device's hardware; these instructions are executed directly by the hardware to implement the various computing functions.
Additionally, it should be noted that the high-level operators in a computing task (such as matrix multiplication and convolution) are mapped onto the instruction set supported by the target device's hardware acceleration library, that is, the instructions the underlying hardware can directly understand and execute. The mapping process converts the semantics of a high-level operator into a concrete implementation in the underlying hardware instructions, including determining the operator's inputs and outputs, selecting appropriate hardware instructions, and setting the instructions' parameters.
Additionally, it should be noted that the system automatically replaces the high-level operators in the computing task with the acceleration-library operators of the target device through the device abstraction layer. For example, a convolution operation calls the cuDNN acceleration library on a GPU and a dedicated NPU convolution acceleration library on an NPU; the system substitutes operators automatically according to the device features, ensuring that the task executes efficiently during migration and achieves optimal performance on different devices.
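This per-device operator substitution is, at its core, a dispatch table keyed by (operator, device type). The sketch below is an assumption: the backend names follow the examples in the text (cuDNN on GPU, a hypothetical NPU convolution library), but the table and the string identifiers are illustrative, not real APIs.

```python
# Hypothetical lowering table: (high-level operator, device type) -> backend kernel name.
OP_BACKENDS = {
    ("conv2d", "gpu"): "cudnn.conv2d",      # cuDNN path on GPU, per the text's example
    ("conv2d", "npu"): "npu_accel.conv2d",  # hypothetical dedicated NPU conv library
    ("matmul", "gpu"): "cublas.gemm",       # illustrative GPU GEMM backend
}

def lower_operator(op: str, device_type: str) -> str:
    """Map a high-level operator to the accelerated kernel for the target
    device, falling back to a generic path when no entry exists."""
    try:
        return OP_BACKENDS[(op, device_type)]
    except KeyError:
        return f"fallback.{op}"  # generic, unaccelerated execution path
```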
In this embodiment, the most suitable hardware acceleration library is determined automatically from the device features of the target device, and the acceleration path of each operator in the library is further optimized, realizing intelligent allocation of the computing task. By automatically mapping high-level operators onto the hardware acceleration instruction set of the target device, the computing task is executed accurately and the hardware resources are used efficiently, which improves the execution efficiency of the AI application, reduces the complexity of development and deployment, and allows the AI application to adapt more quickly to different target devices.
It should be noted that the foregoing examples are only for understanding the present application, and are not meant to limit the method for assigning computing tasks according to the present application, and more forms of simple transformation based on the technical concept are all within the scope of the present application.
The present application also provides a computing task allocation system. Referring to fig. 4, the computing task allocation system includes a device abstraction layer and each device, with a standardized interface for each device defined in the device abstraction layer, and the computing task allocation apparatus includes:
A new device access module 10, configured to allocate a computing task to a new device for execution through a device abstraction layer when detecting that the new device is accessed to the computing task allocation system;
The resource reallocation module 20 is configured to screen each idle device with a resource load lower than a preset resource load lower limit from devices of the computing task allocation system when the resource load of the new device exceeds the preset resource load upper limit;
the task migration module 30 is configured to record, as a target device, an idle device with a highest matching degree with the computing task, where the matching degree of the hardware resource units in each idle device is higher than a preset matching degree threshold, migrate the computing task to the target device for execution, and match an instruction set of the computing task and the target device.
Optionally, the new device access module 10 is further configured to:
Acquiring equipment characteristics of the new equipment through a resource set of the new equipment, and detecting task demands of all computing tasks in a computing task distribution system;
Matching the detected task demands with equipment characteristics to obtain matching degrees;
And distributing the computing task with the highest matching degree to the new equipment through the equipment abstraction layer, and driving a hardware resource unit meeting the computing requirement of the computing task through a corresponding driving module of the new equipment so as to execute the computing task.
Optionally, the new device access module 10 is further configured to:
Monitoring whether new equipment is accessed in the computing task allocation system;
when detecting that the new equipment is accessed into the computing task distribution system, identifying the equipment type of the new equipment, and loading a corresponding driving module of the new equipment according to the equipment type, wherein the driving module is used for driving a hardware resource unit on the new equipment;
And identifying a hardware resource unit on the new equipment to obtain a resource set of the new equipment.
Optionally, the task migration module 30 is further configured to:
Calculating the overlap ratio of hardware resource units between each idle device and the new device;
Filtering idle equipment with the contact ratio lower than a preset contact ratio threshold, and calculating the matching degree of each filtered idle equipment and a calculation task according to the task requirements of the calculation task and the equipment characteristics of each filtered idle equipment;
And marking the idle device with the highest matching degree as the target device.
Optionally, the task migration module 30 is further configured to:
acquiring a source memory space of the new device and a target memory space of the target device from a unified virtual address space, wherein the memory space of each device in the computing task allocation system is mapped in the unified virtual address space;
The execution state of the calculation task is saved, the calculation task is transmitted from the source memory space to the target memory space through direct memory access, and after the transmission is completed, calculation resources are distributed for the calculation task, so that the execution state of the calculation task is recovered.
Optionally, the task migration module 30 is further configured to:
According to the equipment characteristics of the target equipment, determining a hardware acceleration library of the target equipment, and optimizing the acceleration path of each operator in the hardware acceleration library;
the high-level operators in the computing task are mapped onto a hardware acceleration instruction set of the target device according to the hardware acceleration library.
The computing task allocation device provided by the application can solve the technical problem of inflexible computing task allocation by adopting the computing task allocation method in the above embodiment. Compared with the prior art, the computing task allocation device provided by the application has the same beneficial effects as the computing task allocation method provided by the above embodiment, and the other technical features in the computing task allocation device are the same as the features disclosed by the method of the above embodiment, which are not repeated herein.
The application provides electronic equipment, which comprises at least one processor and a memory in communication connection with the at least one processor, wherein the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor so that the at least one processor can execute the computing task allocation method in the first embodiment.
Referring now to fig. 5, a schematic diagram of an electronic device suitable for implementing embodiments of the present application is shown. Electronic devices in embodiments of the present application may include, but are not limited to, mobile terminals such as mobile phones, notebook computers, and PADs (tablet computers), as well as stationary terminals such as digital TVs and desktop computers. The electronic device shown in fig. 5 is only an example and should not be construed as limiting the functionality and scope of use of the embodiments of the present application.
As shown in fig. 5, the electronic apparatus may include a processing device 1001 (e.g., a central processing unit or a graphics processor), which may perform various appropriate actions and processes according to a program stored in a read-only memory (ROM) 1002 or a program loaded from a storage device 1003 into a random access memory (RAM) 1004. The RAM 1004 also stores various programs and data required for the operation of the electronic device. The processing device 1001, the ROM 1002, and the RAM 1004 are connected to each other by a bus 1005. An input/output (I/O) interface 1006 is also connected to the bus. In general, the following may be connected to the I/O interface 1006: an input device 1007 including, for example, a touch screen, touch pad, keyboard, mouse, image sensor, or microphone; an output device 1008 including, for example, a liquid crystal display (LCD), speaker, or vibrator; the storage device 1003 including, for example, a magnetic tape or hard disk; and a communication device 1009. The communication device 1009 may allow the electronic device to communicate with other devices wirelessly or by wire to exchange data. While an electronic device having various systems is shown in the figure, it should be understood that not all of the illustrated systems are required to be implemented or provided; more or fewer systems may alternatively be implemented or provided.
In particular, according to embodiments of the present disclosure, the processes described above with reference to flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method shown in the flow chart. In such an embodiment, the computer program may be downloaded and installed from a network through a communication device, or installed from the storage device 1003, or installed from the ROM 1002. The above-described functions defined in the method of the disclosed embodiment of the application are performed when the computer program is executed by the processing device 1001.
The electronic device provided by the application adopts the computing task allocation method in the above embodiment and can solve the technical problem of inflexible computing task allocation. Compared with the prior art, the electronic device provided by the application has the same beneficial effects as the computing task allocation method provided by the above embodiment, and the other technical features in the electronic device are the same as the features disclosed by the method of the previous embodiment, which are not repeated herein.
It is to be understood that portions of the present disclosure may be implemented in hardware, software, firmware, or a combination thereof. In the description of the above embodiments, particular features, structures, materials, or characteristics may be combined in any suitable manner in any one or more embodiments or examples.
The foregoing is merely illustrative of the present application, and the present application is not limited thereto, and any person skilled in the art will readily recognize that variations or substitutions are within the scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.
The present application provides a computer-readable storage medium having computer-readable program instructions (i.e., a computer program) stored thereon for performing the computing task allocation method of the above-described embodiments.
The computer-readable storage medium provided by the present application may be, for example, a USB flash disk, but is not limited thereto; it may be an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system or device, or any combination of the foregoing. More specific examples of a computer-readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM) or flash memory, an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In this embodiment, the computer-readable storage medium may be any tangible medium that can contain or store a program for use by or in connection with an instruction execution system or device. Program code embodied on a computer-readable storage medium may be transmitted using any appropriate medium, including but not limited to electrical wiring, fiber-optic cable, radio frequency (RF), and the like, or any suitable combination of the foregoing.
The computer readable storage medium may be included in the electronic device or may exist alone without being incorporated into the electronic device.
The computer-readable storage medium carries one or more programs. When the one or more programs are executed by the electronic device, the electronic device performs the computing task allocation method in a computing task allocation system that includes a device abstraction layer: when a new device is detected accessing the computing task allocation system, the driver module corresponding to the new device is loaded, and the new device communicates with the device abstraction layer so that an existing computing task is allocated to the new device for execution; when the resources of the new device do not meet the resource requirements of the computing task, a target device meeting those requirements is screened out from the devices of the computing task allocation system, the computing task is migrated to the target device for execution, and the high-level operators of the computing task are mapped to the optimal instruction set supported by the target device.
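The allocation flow described above can be sketched in a minimal, hypothetical Python model. All class and function names here (`Device`, `meets_requirements`, `assign_or_migrate`) are assumptions made for illustration only and are not part of the disclosure:

```python
# Illustrative sketch of the new-device allocation and fallback-screening
# flow described above; every name here is hypothetical.

class Device:
    def __init__(self, name, resources, load=0.0):
        self.name = name
        self.resources = resources  # e.g. {"cpu": 8, "gpu_mem": 16}
        self.load = load            # current resource load, 0.0 .. 1.0

def meets_requirements(device, task_req):
    # A device qualifies if every required resource is available on it.
    return all(device.resources.get(k, 0) >= v for k, v in task_req.items())

def assign_or_migrate(new_device, task_req, all_devices):
    # Prefer the newly attached device; otherwise screen the rest of
    # the system for a device that satisfies the task's requirements.
    if meets_requirements(new_device, task_req):
        return new_device
    candidates = [d for d in all_devices if meets_requirements(d, task_req)]
    if not candidates:
        return None
    # Pick the least-loaded qualifying device as the migration target.
    return min(candidates, key=lambda d: d.load)
```

In this sketch the screening step is a simple resource check followed by a least-load tie-break; the disclosure's actual matching-degree and overlap-ratio criteria are elaborated in the claims below.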
Computer program code for carrying out operations of the present application may be written in any combination of one or more programming languages, including object-oriented programming languages such as Java, Smalltalk, and C++, as well as conventional procedural programming languages such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computer (for example, through the Internet using an Internet service provider).
The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present application. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The modules involved in the embodiments of the present application may be implemented in software or in hardware. In some cases, the name of a module does not constitute a limitation on the module itself.
The readable storage medium provided by the application is a computer-readable storage medium storing computer-readable program instructions (i.e., a computer program) for performing the computing task allocation method, and can solve the technical problem that computing task allocation is not intelligent. Compared with the prior art, the beneficial effects of the computer-readable storage medium provided by the application are the same as those of the computing task allocation method provided by the above embodiment, and are not described in detail herein.
The application also provides a computer program product comprising a computer program which, when executed by a processor, implements the steps of a computing task allocation method as described above.
The computer program product provided by the application can solve the technical problem that the allocation of the computing tasks is not intelligent. Compared with the prior art, the beneficial effects of the computer program product provided by the application are the same as those of the computing task allocation method provided by the above embodiment, and are not described herein.
The foregoing description is only a partial embodiment of the present application, and is not intended to limit the scope of the present application, and all the equivalent structural changes made by the description and the accompanying drawings under the technical concept of the present application, or the direct/indirect application in other related technical fields are included in the scope of the present application.
Claims (10)
1. A computing task allocation method, applied to a computing task allocation system, wherein the computing task allocation system comprises a device abstraction layer and respective devices, and the device abstraction layer comprises standardized interfaces of the respective devices, the computing task allocation method comprising the following steps:
when it is detected that a new device accesses the computing task allocation system, allocating a computing task to the new device for execution through the device abstraction layer;
when the resource load of the new device exceeds a preset resource load upper limit, screening out, from the devices of the computing task allocation system, idle devices whose resource load is lower than a preset resource load lower limit; and
marking, as a target device, the idle device having the highest matching degree with the computing task, migrating the computing task to the target device for execution, and matching the computing task with an instruction set of the target device, wherein the overlap ratio of hardware resource units in the idle device is higher than a preset overlap ratio threshold.
2. The computing task allocation method according to claim 1, wherein the step of allocating a computing task to the new device through the device abstraction layer comprises:
acquiring device features of the new device through a resource set of the new device, and detecting task demands of the computing tasks in the computing task allocation system;
matching the detected task demands with the device features to obtain matching degrees; and
allocating the computing task with the highest matching degree to the new device through the device abstraction layer, and driving, through the driver module corresponding to the new device, a hardware resource unit meeting the computing requirements of the computing task, so as to execute the computing task.
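The matching step described in claim 2 can be sketched as follows. This is an illustrative Python model only; the score definition (fraction of satisfied demands) and all names are assumptions for illustration, not the claimed computation:

```python
# Hypothetical matching-degree computation: score each pending task
# against the new device's features and pick the best match.

def matching_degree(task_demand, device_features):
    # Fraction of the task's demanded features that the device satisfies.
    if not task_demand:
        return 0.0
    met = sum(1 for k, v in task_demand.items()
              if device_features.get(k, 0) >= v)
    return met / len(task_demand)

def pick_task_for_device(tasks, device_features):
    # tasks: {task_id: demand dict}; returns the id of the best-matching task.
    return max(tasks, key=lambda t: matching_degree(tasks[t], device_features))
```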
3. The computing task allocation method according to claim 2, wherein the step of allocating a computing task to the new device for execution through the device abstraction layer further comprises:
monitoring whether a new device accesses the computing task allocation system;
when it is detected that a new device accesses the computing task allocation system, identifying the device type of the new device, and loading the driver module corresponding to the new device according to the device type, the driver module being used to drive hardware resource units on the new device; and
identifying the hardware resource units on the new device to obtain the resource set of the new device.
4. The computing task allocation method according to claim 1, wherein the step of marking, as the target device, the idle device having the highest matching degree with the computing task, the overlap ratio of whose hardware resource units is higher than a preset overlap ratio threshold, comprises:
calculating the overlap ratio of hardware resource units between each idle device and the new device;
filtering out idle devices whose overlap ratio is lower than the preset overlap ratio threshold, and calculating the matching degree between each remaining idle device and the computing task according to the task demands of the computing task and the device features of each remaining idle device; and
marking the idle device with the highest matching degree as the target device.
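The filter-then-rank procedure of claim 4 can be sketched as below. The Jaccard-style overlap measure and all names are assumptions for illustration; the disclosure does not specify a particular overlap formula:

```python
# Hypothetical filter: compute the overlap ratio of hardware resource
# units between each idle device and the new device, drop devices below
# the threshold, then rank the survivors by matching degree.

def overlap_ratio(units_a, units_b):
    # Jaccard-style overlap of the two devices' resource-unit sets
    # (an assumed measure; the patent does not fix the formula).
    a, b = set(units_a), set(units_b)
    return len(a & b) / len(a | b) if a | b else 0.0

def select_target(idle, new_units, scores, threshold=0.5):
    # idle: {name: resource-unit list}; scores: {name: matching degree}.
    survivors = [n for n, u in idle.items()
                 if overlap_ratio(u, new_units) >= threshold]
    return max(survivors, key=lambda n: scores[n]) if survivors else None
```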
5. The computing task allocation method according to claim 1, wherein the step of migrating the computing task to the target device comprises:
acquiring a source memory space of the new device and a target memory space of the target device from a unified virtual address space, wherein the memory space of each device in the computing task allocation system is mapped into the unified virtual address space; and
saving the execution state of the computing task, transferring the computing task from the source memory space to the target memory space through direct memory access, and, after the transfer is completed, allocating computing resources to the computing task to restore its execution state.
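The save-transfer-restore sequence of claim 5 can be modeled in a minimal sketch; here plain dictionaries stand in for the source and target memory spaces, and the bulk copy stands in for a DMA transfer within a unified virtual address space. All names are hypothetical:

```python
# Hypothetical migration sketch: save the task's execution state, copy
# its memory from the source to the target space (a stand-in for a DMA
# transfer within a unified virtual address space), then restore state.

def migrate(task, source_mem, target_mem):
    state = dict(task["state"])          # save execution state
    target_mem.update(source_mem)        # DMA-style bulk copy (simulated)
    source_mem.clear()                   # release the source memory space
    task["state"] = state                # restore state on the target
    task["device_mem"] = target_mem      # task now resides on the target
    return task
```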
6. The computing task allocation method according to claim 1, wherein the step of matching the computing task with the instruction set of the target device comprises:
determining a hardware acceleration library of the target device according to the device features of the target device, and optimizing the acceleration path of each operator in the hardware acceleration library; and
mapping high-level operators in the computing task to a hardware acceleration instruction set of the target device according to the hardware acceleration library.
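The operator-mapping step of claim 6 can be sketched as a lookup against the target device's acceleration library; the table contents, instruction mnemonics, and fallback behavior here are assumptions for illustration only:

```python
# Hypothetical operator lowering: look up each high-level operator of
# the task in the target device's acceleration library and emit the
# accelerated instruction; fall back to a generic instruction otherwise.

ACCEL_LIB = {  # assumed per-device hardware acceleration library
    "matmul": "npu.mma",
    "conv2d": "npu.conv",
}

def lower_operators(ops, accel_lib, fallback="cpu.generic"):
    # Map each high-level operator to its accelerated instruction,
    # or to a generic fallback when the library has no entry.
    return [accel_lib.get(op, fallback) for op in ops]
```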
7. A computing task allocation system, comprising a device abstraction layer that defines standardized interfaces for the respective devices in the computing task allocation system, the computing task allocation system further comprising:
a new-device access module, configured to allocate a computing task to a new device for execution through the device abstraction layer when it is detected that the new device accesses the computing task allocation system;
a resource reallocation module, configured to screen out, from the devices of the computing task allocation system, idle devices whose resource load is lower than a preset resource load lower limit when the resource load of the new device exceeds a preset resource load upper limit; and
a task migration module, configured to mark, as a target device, the idle device having the highest matching degree with the computing task, migrate the computing task to the target device for execution, and match the computing task with an instruction set of the target device, wherein the overlap ratio of hardware resource units in the idle device is higher than a preset overlap ratio threshold.
8. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the computer program being configured to implement the steps of the computing task allocation method of any one of claims 1 to 6.
9. A storage medium, characterized in that the storage medium is a computer-readable storage medium, on which a computer program is stored, which computer program, when being executed by a processor, implements the steps of the computing task allocation method according to any one of claims 1 to 6.
10. A computer program product, characterized in that it comprises a computer program which, when executed by a processor, implements the steps of the computing task allocation method according to any one of claims 1 to 6.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202411825959.9A CN119292793A (en) | 2024-12-12 | 2024-12-12 | Computing task allocation method, system, electronic device, medium and product |
Publications (1)
Publication Number | Publication Date |
---|---|
CN119292793A true CN119292793A (en) | 2025-01-10 |
Family
ID=94165692
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202411825959.9A Pending CN119292793A (en) | 2024-12-12 | 2024-12-12 | Computing task allocation method, system, electronic device, medium and product |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN119292793A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN119473639A (en) * | 2025-01-16 | 2025-02-18 | 深圳市研盛芯控电子技术有限公司 | Task processing method, device, computer system, computer equipment and readable storage medium |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113051049A (en) * | 2020-07-10 | 2021-06-29 | 北京迈格威科技有限公司 | Task scheduling system, method, electronic device and readable storage medium |
CN117112142A (en) * | 2023-09-12 | 2023-11-24 | 天津津航计算技术研究所 | Data center resource management system and application |
CN118626263A (en) * | 2024-06-17 | 2024-09-10 | 武汉卓尔信息科技有限公司 | Heterogeneous hardware computing power scheduling method, device, equipment and medium |
CN118963941A (en) * | 2024-07-23 | 2024-11-15 | 中电金信数字科技集团股份有限公司 | Task allocation method and device |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN114020470B (en) | Resource allocation method and device, readable medium and electronic equipment | |
US20150286492A1 (en) | Optimized resource allocation and management in a virtualized computing environment | |
US10860353B1 (en) | Migrating virtual machines between oversubscribed and undersubscribed compute devices | |
CN113886019B (en) | Virtual machine creation method, device, system, medium and equipment | |
CN114625536B (en) | Video memory allocation method, device, media and electronic equipment | |
CN119292793A (en) | Computing task allocation method, system, electronic device, medium and product | |
CN113641413A (en) | Target model loading and updating method and device, readable medium and electronic equipment | |
CN108064086B (en) | Bandwidth allocation method and device, computer equipment and storage medium | |
CN119271421A (en) | Task migration method, device and storage medium | |
CN115543965A (en) | Cross-machine-room data processing method, device, storage medium, and program product | |
CN115801785A (en) | Multi-user management method and device for cloud mobile phone, server and storage medium | |
CN115421787A (en) | Instruction execution method, apparatus, device, system, program product, and medium | |
US20220318656A1 (en) | Model parameter sharing between inference application instances in processing unit of information processing system | |
CN111813541B (en) | Task scheduling method, device, medium and equipment | |
CN111506426B (en) | Memory management method and device and electronic equipment | |
CN105677481A (en) | Method and system for processing data and electronic equipment | |
US11221875B2 (en) | Cooperative scheduling of virtual machines | |
CN112148458A (en) | Task scheduling method and device | |
US9405470B2 (en) | Data processing system and data processing method | |
CN116932168A (en) | Heterogeneous core scheduling method and device, storage medium and electronic equipment | |
CN113391882B (en) | Virtual machine memory management method and device, storage medium and electronic equipment | |
CN117453386A (en) | Memory bandwidth allocation in a multi-entity system | |
US20200133681A1 (en) | Enabling software sensor power operation requests via baseboard management controller (bmc) | |
CN117742957B (en) | Memory allocation method, device, electronic device and storage medium | |
CN118426973B (en) | Scheduling method of rendering engine |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||