CN117827451A

CN117827451A - Task processing method, multi-core graphics processor, electronic device and storage medium

Info

Publication number: CN117827451A
Application number: CN202311868583.5A
Authority: CN
Inventors: Name withheld at the inventors' request
Original assignee: Moore Threads Technology Co Ltd
Current assignee: Moore Threads Technology Co Ltd
Priority date: 2023-12-29
Filing date: 2023-12-29
Publication date: 2024-04-05

Abstract

The embodiments of the application disclose a task processing method for a multi-core graphics processing unit (GPU), a multi-core graphics processor, an electronic device and a computer storage medium. The method comprises: while the GPU is processing a first task, determining a target subtask of a first core from the first task based on the current processing capacity of the first core and the number of subtasks of the first task that have already been allocated to the cores, the first core being one of the multiple cores; and acquiring and processing the target subtask through the first core.

Description

Task processing method, multi-core graphics processor, electronic device and storage medium

Technical Field

The present application relates to, but is not limited to, the field of computer technology, and in particular to a task processing method, a multi-core graphics processor, an electronic device and a storage medium.

Background

A large-scale graphics processing unit (GPU) is usually composed of multiple cores. A task processed by the GPU can be divided into multiple subtasks that are assigned to the cores for execution. However, assigning subtasks to the cores may require a single global scheduling module, in which case the processing efficiency of the GPU is limited by the scheduling efficiency of that module. Alternatively, the set of subtasks that each core must execute is fixed in advance; because the cores take different amounts of time to execute their respective subtask sets, their running times can differ widely, resulting in poor GPU processing performance.

Summary of the Invention

The embodiments of the present application provide a task processing method, a multi-core graphics processor, an electronic device and a storage medium, which enable each core to actively acquire subtasks according to its own processing capacity. This implements subtask scheduling, balances the subtasks across the cores, and improves GPU processing performance.

The technical solution of the present application is implemented as follows:

An embodiment of the present application provides a task processing method, including:

while the graphics processor is processing a first task, determining a target subtask of a core from the first task based on the number of subtasks that the core can process and the number of subtasks of the first task that have already been allocated to cores; and acquiring and processing the target subtask through the core.

An embodiment of the present application provides a graphics processor, including:

multiple cores, wherein each core is configured to: while the graphics processor is processing a first task, determine a target subtask of the core from the first task based on the number of subtasks that the core can process and the number of subtasks of the first task that have already been allocated to cores; and acquire and process the target subtask.

An embodiment of the present application provides an electronic device, including the multi-core graphics processor described above.

An embodiment of the present application provides a computer-readable storage medium storing executable instructions which, when executed by a processor, implement the task processing method described above.

The embodiments of the present application provide a task processing method, a multi-core graphics processor, an electronic device and a computer storage medium. Because a core in the GPU can determine and promptly acquire target subtasks of the first task based on the number of subtasks it can process and the number of subtasks already allocated to cores, the faster a core processes subtasks and the more subtasks it completes, the more subtasks it can acquire. This reduces the accumulation of subtasks in some cores, improves load balance between cores, and thereby improves the task processing performance of the GPU.

It should be understood that the above general description and the following detailed description are merely exemplary and explanatory, and do not limit the technical solutions of the present disclosure.

Brief Description of the Drawings

The drawings herein are incorporated into and constitute a part of this specification. They illustrate embodiments consistent with the present application and, together with the specification, serve to explain the technical solution of the present application.

FIG. 1 is a schematic diagram of an optional global scheduling process provided in an embodiment of the present application;

FIG. 2 is a schematic diagram of an optional autonomous scheduling process in the related art provided in an embodiment of the present application;

FIG. 3 is a first schematic process diagram of an optional task processing method provided in an embodiment of the present application;

FIG. 4 is a second schematic process diagram of an optional task processing method provided in an embodiment of the present application;

FIG. 5 is a schematic diagram of an optional multi-core autonomous scheduling process provided in an embodiment of the present application;

FIG. 6 is a third schematic process diagram of an optional task processing method provided in an embodiment of the present application;

FIG. 7 is a fourth schematic process diagram of an optional task processing method provided in an embodiment of the present application;

FIG. 8 is a schematic diagram of the hardware structure of an optional electronic device provided in an embodiment of the present application.

Detailed Description

To make the purpose, technical solutions and advantages of the present application clearer, the present application is described in further detail below with reference to the accompanying drawings. The described embodiments should not be regarded as limiting the present application. All other embodiments obtained by those of ordinary skill in the art without creative effort fall within the scope of protection of the present application.

In the following description, reference is made to "some embodiments", which describes a subset of all possible embodiments. It should be understood that "some embodiments" may refer to the same subset or to different subsets of all possible embodiments, and that these may be combined with one another where they do not conflict.

In the following description, the terms "first/second/third" merely distinguish similar objects and do not imply a particular ordering of the objects. Where permitted, the specific order or sequence of "first/second/third" may be interchanged, so that the embodiments of the present application described here can be implemented in an order other than that illustrated or described here.

Unless otherwise defined, all technical and scientific terms used herein have the same meaning as commonly understood by those skilled in the technical field of the present application. The terms used herein are only for the purpose of describing the embodiments of the present application and are not intended to limit the present application.

To facilitate understanding of the present solution, the application background of the embodiments of the present application is described before the embodiments themselves.

In the related art, subtask scheduling among the cores of a GPU includes global scheduling and autonomous scheduling. Global scheduling requires a unified hardware module for subtask scheduling in the GPU, which assigns subtasks to all cores in the GPU. As shown in FIG. 1, after the central processing unit (CPU) 2 sends a task to GPU 1, the task is stored in the memory 13 of the GPU. The task includes multiple subtasks. The global scheduling module 11 reads the subtasks from the memory 13 and assigns them to multiple cores (cores 12-1, 12-2, 12-3 and 12-4 are shown as examples). Because the number of cores in a GPU far exceeds the number shown in the figure, the global scheduling module 11 may fail to keep up with the processing capacity of the cores: some cores finish their subtasks before the global scheduling module 11 has assigned them new ones, which causes a loss of GPU performance.

Autonomous scheduling has no global scheduling module; instead, each core in the GPU determines and processes its own subtask set in a pre-agreed manner. As shown in FIG. 2, after CPU 4 sends a task to GPU 3, the task is stored in the memory 33 of the GPU, and the task includes multiple subtasks. Multiple cores in GPU 3 (cores 32-1, 32-2, 32-3 and 32-4 are shown as examples) obtain their subtask sets from the memory 33 and process the subtasks in those sets. It should be noted that there are no dependencies between subtasks, so each core can process the subtasks in its own set independently. Each core may obtain its subtask set based on the position of the subtasks in the vector space, or based on that position combined with the subtask identifier; for example, the subtask set of core 32-1 may include the subtasks in the upper-left position whose subtask identifiers are even. Because the number of subtasks in each core's set can differ, and the processing time of each subtask can also differ, the time each core needs to finish its set may vary widely. The total running times of the cores therefore differ, possibly by a large margin, and the core with the longest running time becomes the performance bottleneck of the GPU.
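The bottleneck effect of pre-agreed subtask sets can be illustrated with a minimal sketch. The subtask durations and the round-robin partition rule below are hypothetical values chosen for illustration; they are not taken from the application.

```python
def static_running_times(durations, num_cores):
    """Total running time of each core when subtask i is pre-assigned
    to core i % num_cores (an assumed pre-agreed rule)."""
    per_core = [0] * num_cores
    for i, duration in enumerate(durations):
        per_core[i % num_cores] += duration
    return per_core


# Hypothetical processing times of eight independent subtasks.
durations = [8, 1, 1, 1, 8, 1, 1, 1]
per_core = static_running_times(durations, 4)
print(per_core)       # [16, 2, 2, 2]
print(max(per_core))  # 16: the slowest core bounds the whole task
```

With a total of 20 time units of work spread over 4 cores, an even split would take about 5 units, yet this fixed partition takes 16 because one core receives both long subtasks.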

To solve the above problems, an embodiment of the present application provides a task processing method that can be executed by a multi-core graphics processor of an electronic device. The electronic device may be a server, a laptop, a tablet computer, a desktop computer, a smart TV, a mobile device (such as a mobile phone, a portable video player, a personal digital assistant or a portable gaming device), or another device that has a display function and uses a graphics processor to process tasks. FIG. 3 is a first schematic process diagram of an optional task processing method provided in an embodiment of the present application. As shown in FIG. 3, the method includes S101 and S102.

S101. While the GPU is processing a first task, determine a target subtask of a first core from the first task based on the current processing capacity of the first core and the number of subtasks of the first task that have already been allocated to the cores; the first core is one of the multiple cores.

In the embodiments of the present application, the GPU processes the first task by having its multiple cores process the subtasks of the first task. The processing capacity of each core for subtasks can be characterized by its preset maximum number of tasks, that is, the maximum number of tasks it can process simultaneously. The processing capacities of the cores may be the same or different; this can be set as needed and is not limited by the embodiments of the present application. As the cores process subtasks, subtasks are continually completed; at any moment, the current processing capacity of a core is the number of subtasks it can currently take on.

In the embodiments of the present application, after the GPU obtains the first task sent by the CPU, each core in the GPU can obtain subtasks from the first task and process them. The first core can determine its own target subtasks from the first task according to its current processing capacity and the number of allocated subtasks, and then obtain and process them; here, the first core is one of the multiple cores.

In the embodiments of the present application, the subtasks already acquired by the first core, whether being processed or waiting to be processed, are the unfinished tasks of the first core; the number of unfinished tasks should be less than or equal to the maximum number of tasks of the first core. The number of subtasks that the first core can process can be determined from the maximum number of tasks and the number of unfinished tasks; the sum of the number of processable subtasks and the number of unfinished tasks should be less than or equal to the maximum number of tasks.

In some embodiments, the number of processable subtasks equals the difference between the maximum number of tasks and the number of unfinished tasks.

For example, the multiple cores include four cores, core 1 to core 4, and the maximum number of tasks of core 1 is 5. When core 1 holds no subtasks, its number of unfinished tasks is 0 and its number of processable subtasks is 5. After core 1 has acquired 5 subtasks and started processing them, once 1 subtask has finished, the number of unfinished tasks of core 1 is 4 and its number of processable subtasks is 1.
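The arithmetic of this example can be written out as a minimal sketch. The function name `processable_count` and its argument check are illustrative assumptions; the application only states the relationship between the three quantities.

```python
def processable_count(max_tasks: int, unfinished: int) -> int:
    """Number of subtasks a core can still take on, using the
    'difference' variant: maximum number of tasks minus unfinished tasks."""
    if not 0 <= unfinished <= max_tasks:
        raise ValueError("unfinished must be between 0 and max_tasks")
    return max_tasks - unfinished


# Core 1 with a maximum of 5 tasks, as in the example above.
print(processable_count(5, 0))  # 5: the core holds no subtasks yet
print(processable_count(5, 4))  # 1: one of five subtasks has finished
```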

In the embodiments of the present application, after the GPU receives the first task sent by the CPU, the first task can be stored in the memory of the GPU, and the cores of the GPU obtain subtasks from the memory. Subtasks that have already been obtained by a core are the allocated subtasks of the first task, and subtasks that have not yet been obtained by any core are the unallocated subtasks. The electronic device needs to maintain the operating performance of each core so that the number of subtasks in each core stays at the maximum number of tasks. Therefore, while the GPU is processing the first task, each core needs to obtain subtasks from the unallocated subtasks in a timely manner, according to the state of its own subtask processing, and execute them.

In the embodiments of the present application, the total number of subtasks of the first task is fixed, and the allocation order of all subtasks is fixed. The first core can obtain, from the unallocated subtasks of the first task, as many subtasks as it can process. When the number of allocated subtasks is known, the first core can use the allocation order to determine which of the unallocated subtasks can be allocated to it, up to the number of subtasks it can process.

It should be noted that each subtask has its own task identifier, and the task identifier can indicate the allocation order of the subtask. In some embodiments, task identifiers are letters and the allocation order follows alphabetical order. In some embodiments, task identifiers are numbers and the allocation order follows numerical order.

For example, task A includes M subtasks whose task identifiers are a₀ to a_{M-1}, and the subtasks are allocated in ascending order of the subscripts of their task identifiers. If 3 subtasks have already been allocated, the unallocated subtasks are a₃ to a_{M-1}; if one of the cores then needs 2 subtasks, that core acquires a₃ and a₄.
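The identifier arithmetic of this example can be sketched as follows, assuming numeric identifiers that start at 0 and are allocated in ascending order. The function name and the clamping to the total number of subtasks are assumptions made for illustration; the text above does not specify what happens when fewer subtasks remain than a core requests.

```python
def target_subtask_ids(allocated: int, wanted: int, total: int) -> range:
    """Subscripts of the next `wanted` unallocated subtasks, given that
    subtasks 0 .. allocated-1 have already been handed out."""
    start = min(allocated, total)
    end = min(allocated + wanted, total)  # assumed: never run past the last subtask
    return range(start, end)


# Task A with M = 10 subtasks, 3 already allocated, a core needs 2.
print(list(target_subtask_ids(3, 2, 10)))  # [3, 4], i.e. a3 and a4
print(list(target_subtask_ids(9, 2, 10)))  # [9]: only one subtask is left
```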

In some embodiments, the task identifier is a 64-bit integer, so that every subtask can have a globally unique task identifier.

In the embodiments of the present application, the first task obtained by the GPU may already have been split; that is, the GPU obtains the first task as multiple subtasks, and the first core reads subtasks one by one as needed. Alternatively, the first task obtained by the GPU may not have been split, in which case the first core splits it and reads subtasks as needed. This can be set as needed and is not limited by the embodiments of the present application.

In some embodiments, the subtasks of the first task are stored in a first-in-first-out (FIFO) queue in allocation order. The subtasks at the front of the FIFO, as many as the first core can process, are then the target subtasks, and the first core obtains them from the FIFO queue according to the number of subtasks it can process. In some embodiments, the GPU records the number of allocated subtasks; the first core obtains this number and combines it with the number of processable subtasks to determine the target subtasks from the first task. In some embodiments, the first core determines the identifiers of the target subtasks from the first task based on the number of processable subtasks and the number of allocated subtasks, and thereby determines the target subtasks.
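The FIFO variant can be sketched with Python's `collections.deque` standing in for the queue. The helper name `take_from_fifo` and the string identifiers are hypothetical; a hardware FIFO would of course behave differently in detail.

```python
from collections import deque


def take_from_fifo(fifo: deque, processable: int) -> list:
    """Remove and return up to `processable` subtasks from the front of the queue."""
    taken = []
    while fifo and len(taken) < processable:
        taken.append(fifo.popleft())
    return taken


# Six subtasks stored in allocation order.
fifo = deque(f"a{i}" for i in range(6))
first = take_from_fifo(fifo, 2)   # a core that can process 2 subtasks
second = take_from_fifo(fifo, 3)  # another core that can process 3
print(first)       # ['a0', 'a1']
print(second)      # ['a2', 'a3', 'a4']
print(list(fifo))  # ['a5']
```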

S102. Acquire and process the target subtask through the first core.

In the embodiments of the present application, after determining the target subtask, the first core can acquire the target subtask from the first task and process it.

In the embodiments of the present application, as long as not all subtasks of the first task have been allocated to the cores, that is, as long as unallocated subtasks remain in the GPU, the first core can obtain target subtasks from the unallocated subtasks whenever the number of subtasks it is processing is less than the maximum number of tasks. The first task is complete when all of its subtasks have been allocated and all cores in the GPU have finished processing their subtasks of the first task.

It can be understood that the first core in the GPU can determine and promptly obtain target subtasks of the first task based on the number of subtasks it can process and the number of subtasks already allocated to the cores. As a result, among the cores in the GPU, a core that processes subtasks faster completes more subtasks and can acquire more subtasks. This reduces the likelihood of subtasks piling up in some cores, improves load balance among the cores, and thereby improves the GPU's ability to process tasks.
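The load-balancing effect can be illustrated with a small event-driven simulation. This is a simplified sketch under stated assumptions: every core has a maximum of one task, each core needs a fixed hypothetical time per subtask, and acquiring a subtask takes no time. None of these numbers come from the application.

```python
import heapq


def simulate(total, time_per_subtask):
    """Each core acquires the next unallocated subtask as soon as it is free."""
    allocated = 0  # shared count of allocated subtasks
    done = [0] * len(time_per_subtask)
    # Heap entries: (time at which the core becomes free, core index).
    free_at = [(0, core) for core in range(len(time_per_subtask))]
    heapq.heapify(free_at)
    finish = 0
    while allocated < total:
        now, core = heapq.heappop(free_at)
        allocated += 1
        done[core] += 1
        end = now + time_per_subtask[core]
        finish = max(finish, end)
        heapq.heappush(free_at, (end, core))
    return done, finish


# Three cores; core 0 is the fastest and core 2 the slowest.
done, finish = simulate(12, [1, 2, 3])
print(done)    # [7, 3, 2]: the fastest core processed the most subtasks
print(finish)  # 7
```

For comparison, giving each of these three cores a fixed share of 4 subtasks would take 12 time units, because the slowest core needs 3 units per subtask.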

In some embodiments of the present application, the multi-core GPU includes a task counting module, and the number of subtasks already allocated to the cores is determined from the count value of the task counting module.

In the embodiments of the present application, the multi-core GPU is provided with a task counting module, through which the number of subtasks of the first task that have already been allocated to the cores can be determined. As the cores acquire subtasks of the first task, the task counting module accumulates the number of subtasks that have been acquired by cores, which yields the number of allocated subtasks and makes determining that number more efficient.

In some embodiments, the task counting module is located in the memory of the GPU, so that when the cores obtain subtasks of the first task from the memory, the task counting module can quickly accumulate the count value, improving the efficiency of determining the number of allocated subtasks. In some embodiments, the task counting module is located in any one of the cores, or in a component of the GPU other than the memory and the cores; this can be set as needed and is not limited by the embodiments of the present application. In some embodiments, the task counting module is a counter provided in the GPU.

In some embodiments of the present application, determining the target subtask of the first core from the first task in S101 may, as shown in FIG. 4, include S201 to S203.

S201. Send a subtask acquisition instruction to the task counting module through the first core. The subtask acquisition instruction includes the current processing capacity of the first core, which is represented by the number of processable subtasks.

In the embodiments of the present application, when the first core needs to acquire subtasks of the first task, it can send a subtask acquisition instruction to the task counting module. The instruction informs the task counting module of the current processing capacity of the first core, which makes it easy for the task counting module to accumulate the number of allocated subtasks. The current processing capacity of the first core is the number of subtasks that the first core can process; the first core carries this number in the subtask acquisition instruction sent to the task counting module.

In the embodiments of the present application, when the CPU sends the first task to the GPU, the cores usually have not yet acquired any subtasks. To decide when to acquire subtasks, the first core may check the number of subtasks it can process at a preset acquisition interval and determine that it needs to acquire subtasks when that number is greater than 0. The first core may also determine that it needs to acquire subtasks when it detects that a subtask in the first core has finished processing. This can be set as needed and is not limited by the embodiments of the present application.

S202. Respond to the subtask acquisition instruction through the task counting module by sending the count value to the first core.

In the embodiments of the present application, the task counting module records the number of allocated subtasks, that is, the count value. After receiving the subtask acquisition instruction, the task counting module responds by sending the count value to the first core.

S203. Determine, through the first core, the target subtask from the unallocated subtasks of the first task according to the count value and the current processing capacity.

In the embodiments of the present application, after receiving the count value from the task counting module, the first core can determine the target subtasks from the unallocated subtasks based on the count value and the number of processable subtasks; the number of target subtasks equals the number of processable subtasks.

For example, as shown in FIG. 5, after CPU 6 sends a task to GPU 5, the task is stored in the memory 53 of the GPU, and the task includes multiple subtasks. Multiple cores in GPU 5 (cores 52-1, 52-2, 52-3 and 52-4 are shown as examples) can send subtask acquisition instructions to the task counting module 54, receive the count value returned by the task counting module 54, and acquire subtasks from the memory 53 according to the count value and the number of processable subtasks, and then process them.

It can be understood that the GPU records the number of allocated subtasks through the task counting module, and a core obtains that number from the task counting module and uses it to determine which of the unallocated subtasks can be allocated to it as target subtasks. This improves the accuracy with which a core determines its target subtasks.

In some embodiments of the present application, after receiving the subtask acquisition instruction and sending the count value to the first core, the task counting module may further increase the count value by the number of processable subtasks to obtain an updated count value.

In the embodiments of the present application, each time the task counting module receives a subtask acquisition instruction carrying the number of processable subtasks from a core, it first returns the count value to that core and then adds the number of processable subtasks to the count value to obtain the updated count value. In other words, the first core acquires as many subtasks as it can process, and at the same time the number of allocated subtasks recorded by the task counting module increases by that amount. In this way, the count value is updated promptly and remains accurate.
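The return-then-increment behaviour of the task counting module resembles a fetch-and-add operation, and can be sketched as below. The class name `TaskCounter`, the method name `acquire`, and the use of a lock to make the read and the update indivisible are assumptions for illustration; the application does not describe how the module is implemented.

```python
import threading


class TaskCounter:
    """Records the number of allocated subtasks (the count value)."""

    def __init__(self) -> None:
        self._count = 0
        self._lock = threading.Lock()  # assumed: requests are handled one at a time

    def acquire(self, processable: int) -> int:
        """Return the current count value, then add `processable` to it."""
        with self._lock:
            old = self._count
            self._count += processable
            return old


counter = TaskCounter()
start_a = counter.acquire(5)  # a core that can process 5 subtasks
start_b = counter.acquire(3)  # another core that can process 3
print(start_a)  # 0: the first core takes subtasks 0..4
print(start_b)  # 5: the second core takes subtasks 5..7
```

Because each requester receives the count value from before its own increment, the identifier ranges handed to different cores do not overlap.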

在本申请的一些实施例中，第一核可以在接收到来自中央处理器的子任务获取指示信息的情况下，向任务计数模块发送子任务获取指令。In some embodiments of the present application, the first core may send a subtask acquisition instruction to the task counting module when receiving subtask acquisition indication information from the central processing unit.

在本申请实施例中，CPU下发第一任务给GPU的同时，会向各个核发送子任务获取指示信息；如此，第一核在接收到子任务获取指示信息的情况下，确定需要获取第一任务的子任务，向任务获取模块发送子任务获取指令。In an embodiment of the present application, when the CPU sends the first task to the GPU, it will send subtask acquisition indication information to each core; thus, when the first core receives the subtask acquisition indication information, it determines that it needs to acquire the subtask of the first task and sends the subtask acquisition instruction to the task acquisition module.

在本申请实施例中，CPU下发第一任务前，GPU的各个核内没有第一任务的子任务。在CPU下发第一任务，第一核接收到子任务获取指示信息的情况下，第一核可以将可处理的子任务的数量设置为预设最大任务数量。In the embodiment of the present application, before the CPU issues the first task, there is no subtask of the first task in each core of the GPU. When the CPU issues the first task and the first core receives the subtask acquisition indication information, the first core can set the number of processable subtasks to the preset maximum number of tasks.

In some embodiments, when the CPU issues the first task to the GPU, the CPU may broadcast the subtask acquisition indication information to each core.

It can be understood that the first core can learn from the subtask acquisition indication information that the CPU has issued the first task to the GPU, determine that the number of processable subtasks is the preset maximum number of tasks, and begin acquiring and processing target subtasks. In this way, each core can acquire subtasks of the first task promptly, which improves the processing efficiency of the first task.

In some embodiments of the present application, when a first number of subtasks in the first core have been completed, the first core may send a subtask acquisition instruction to the task counting module; the number of processable subtasks is then the first number, that is, the number of subtasks that have been completed.

In this embodiment, the first core can process multiple subtasks concurrently, but the subtasks differ in processing duration and therefore complete at different times. As soon as a subtask completes, the resources it occupied are freed and can be used to process a newly acquired subtask.

In this embodiment, the first core may monitor the processing status of its subtasks. Upon determining that a first number of subtasks in the first core have been completed, the first core determines that it needs to acquire new subtasks and sends a subtask acquisition instruction to the task counting module. Here, the first number is less than or equal to the preset maximum number of tasks; the embodiments of the present application do not otherwise limit its value.

In some embodiments, the first core may monitor subtask processing in real time. When it detects that subtasks have been completed, it takes the number of completed subtasks as the first number and then sends a subtask acquisition instruction to the task counting module to begin acquiring subtasks. Here, the first number may be one or more, and is less than or equal to the preset maximum number of tasks.

It can be understood that the first core can promptly acquire and process new subtasks whenever subtasks have been completed and resources are free. In this way, the first core acquires target subtasks promptly, which reduces the likelihood that core resources sit idle and improves the processing efficiency of the first task.
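As a non-limiting illustration, the two request triggers described above (the indication information from the CPU, and the completion of a first number of subtasks) can be sketched in Python as follows. The class `Core` and its method names are hypothetical and are used only to show how the number of processable subtasks might be derived in each case.

```python
class Core:
    """Hypothetical model of one GPU core's request logic."""

    def __init__(self, max_tasks):
        self.max_tasks = max_tasks  # preset maximum number of tasks
        self.running = set()        # identifiers of subtasks currently being processed

    def on_indication(self):
        """CPU has issued the first task: request the preset maximum."""
        return self.max_tasks

    def on_completed(self, finished_ids):
        """Free the slots of completed subtasks and return how many to request."""
        done = self.running & set(finished_ids)
        self.running -= done
        return len(done)  # the "first number" of completed subtasks


core = Core(max_tasks=4)
initial_request = core.on_indication()
core.running = {0, 1, 2, 3}               # assume four subtasks were acquired
refill_request = core.on_completed([1, 3])
```

In this sketch the refill request always equals the number of slots just freed, so the core never holds more than `max_tasks` subtasks at once.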

In some embodiments of the present application, the first core may send a subtask acquisition instruction to the task counting module when the time elapsed since it last sent a subtask acquisition instruction is greater than or equal to a preset acquisition time interval and the number of processable subtasks is greater than 0; the number of processable subtasks is the number of subtasks completed since the last subtask acquisition instruction was sent.

In this embodiment, after sending a subtask acquisition instruction, the first core may record the time elapsed since that instruction. At every preset acquisition time interval, it obtains the number of processable subtasks and, based on that number, sends a subtask acquisition instruction to the task counting module. The number of processable subtasks is the number of subtasks completed between two consecutive subtask acquisition instructions.

In some embodiments, the first core may reset a timer to zero each time it sends a subtask acquisition instruction, so that the time recorded by the timer is the time elapsed since the last subtask acquisition instruction was sent. When the timer value is greater than or equal to the preset acquisition time interval, the first core may obtain the number of processable subtasks and, based on that number, send a subtask acquisition instruction to the task counting module.

It should be noted that, when the time elapsed since the last subtask acquisition instruction is greater than or equal to the preset acquisition time interval and the number of processable subtasks is 0, the core may still send a subtask acquisition instruction requesting 0 target subtasks; after receiving such an instruction, the task counting module may ignore it.

It can be understood that the core may determine the number of processable subtasks at the preset acquisition time interval, send a subtask acquisition instruction to the task counting module accordingly, and acquire that number of target subtasks. In this way, the core acquires target subtasks promptly, which reduces the likelihood that core resources sit idle and improves the processing efficiency of the first task.
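As a non-limiting illustration, the interval-based trigger can be sketched in Python as follows. The class `IntervalRequester`, its method names, and the explicit `now` argument (used instead of a hardware timer so that the behaviour is deterministic) are assumptions. The sketch follows the variant in which no instruction is sent when the number of processable subtasks is 0.

```python
class IntervalRequester:
    """Hypothetical model of the timer-driven request logic of one core."""

    def __init__(self, interval, now=0.0):
        self.interval = interval       # preset acquisition time interval
        self.last_sent = now           # time of the last subtask acquisition instruction
        self.completed_since_last = 0  # subtasks completed since that instruction

    def record_completion(self, n=1):
        self.completed_since_last += n

    def poll(self, now):
        """Return the number of subtasks to request, or None if nothing is sent."""
        if now - self.last_sent < self.interval:
            return None                # interval not yet elapsed
        if self.completed_since_last == 0:
            return None                # nothing processable: skip the request
        processable = self.completed_since_last
        self.completed_since_last = 0
        self.last_sent = now           # corresponds to resetting the timer to zero
        return processable


requester = IntervalRequester(interval=10.0)
requester.record_completion(2)
early = requester.poll(5.0)      # interval not elapsed
on_time = requester.poll(10.0)   # interval elapsed, two subtasks completed
empty = requester.poll(25.0)     # interval elapsed, nothing completed
requester.record_completion()
late = requester.poll(26.0)      # one subtask completed since the last request
```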

In some embodiments of the present application, when the number of completed subtasks is greater than or equal to a preset number of processable subtasks, the first core may determine the number of processable subtasks and, based on that number, send a subtask acquisition instruction to the task counting module to begin acquiring subtasks. Here, the number of processable subtasks is the number of completed subtasks. The preset number of processable subtasks is less than the preset maximum number of tasks and may be set as needed; the embodiments of the present application do not limit it. In this way, load balancing within the core is achieved while the first core sends subtask acquisition instructions to the task counting module less frequently, which reduces the resource consumption of the first core.
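As a non-limiting illustration, the threshold-based trigger can be expressed as a small Python function. The function name `request_size` and the choice of returning 0 to mean "do not send an instruction yet" are assumptions made for this sketch.

```python
def request_size(completed, threshold, max_tasks):
    """Return how many subtasks to request, or 0 if the threshold is not reached.

    `threshold` is the preset number of processable subtasks and must be
    smaller than `max_tasks`, the preset maximum number of tasks.
    """
    if not 0 < threshold < max_tasks:
        raise ValueError("threshold must be positive and below max_tasks")
    if completed >= threshold:
        return completed  # request as many subtasks as have been completed
    return 0              # keep accumulating completions


below = request_size(completed=1, threshold=2, max_tasks=4)
at_or_above = request_size(completed=3, threshold=2, max_tasks=4)
```

A larger threshold means fewer instructions reach the task counting module, at the cost of freed resources waiting longer before being refilled.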

In some embodiments of the present application, determining the target subtask from the unallocated subtasks in the first task according to the count value and the current processing capability in S203 may, as shown in FIG. 6, include S301 and S302.

S301: When the count value is less than the total number of subtasks of the first task, determine the target task identifier of the target subtask according to the count value and the number of processable subtasks.

In this embodiment, the task counting module updates the count value each time it receives a subtask acquisition instruction, so the count value keeps increasing. After receiving the count value fed back by the task counting module, the first core may first use it to determine whether the first task still has unallocated subtasks, by comparing the count value with the total number of subtasks of the first task. If the count value is less than the total, the number of unallocated subtasks is greater than 0, that is, the first task has unallocated subtasks, and the first core may determine the target task identifiers from the unallocated subtasks according to the count value and the number of processable subtasks. If the count value is greater than or equal to the total, the number of unallocated subtasks is 0, that is, the first task has no unallocated subtasks, and the first core may determine that no target subtask exists. In that case the first core does not need to determine a target task identifier and stops acquiring target subtasks. The first core thus acquires target subtasks only while the first task still has unallocated subtasks, which improves the accuracy with which cores acquire target subtasks.

In this embodiment, the first core may determine, from the count value, the identifier of the next subtask waiting to be allocated, and from it the identifiers of as many target subtasks as it can process, that is, the target task identifiers.

In some embodiments of the present application, the subtasks of the first task include task identifiers, and the order of the task identifiers characterizes the allocation order of the subtasks of the first task. The first core may determine the starting identifier of the target task identifiers according to the count value and then, following the allocation order, take as many task identifiers as the number of processable subtasks as the target task identifiers.

In this embodiment, each subtask of the first task is given a task identifier, and the order of the task identifiers characterizes the allocation order of the subtasks. In some embodiments, the task identifiers may be numbers, and the subtasks are allocated in ascending numerical order. In some embodiments, the task identifiers may be even numbers, and the subtasks are allocated in ascending order of those even numbers. In some embodiments, the task identifiers may be letters, and the subtasks are allocated in alphabetical order. This may be set as needed and is not limited by the embodiments of the present application.

In this embodiment, after obtaining the count value, the first core may determine the starting identifier of the target task identifiers and then, following the allocation order, take as many identifiers as the number of processable subtasks as the target task identifiers.

For example, task A includes M subtasks, a_0 to a_{M-1}. When N subtasks have been allocated, the unallocated subtasks are a_N to a_{M-1}, where N is a positive integer less than M-1. When the first core sends the counter a subtask acquisition instruction carrying a number of processable subtasks equal to 2, the counter updates the count value from N to N+2 and feeds N back to the first core. The first core can then determine that the identifier of its first target subtask is a_N and that of the next is a_{N+1}, so the target task identifiers are a_N and a_{N+1}.

It can be understood that, because the task identifiers characterize the allocation order of the subtasks, the first core can determine the starting identifier of the target subtasks from the count value and then determine the target task identifiers from the number of processable subtasks, which improves the efficiency of determining the identifiers of the target subtasks.

S302: Determine the subtask corresponding to the target task identifier among the unallocated subtasks as the target subtask.

In this embodiment, after determining the target task identifier, the first core may obtain, from the unallocated subtasks, the subtask whose task identifier matches the target task identifier; that subtask is the target subtask.

It can be understood that the first core may first determine the target task identifier and then obtain the target subtask according to it, which improves the accuracy of obtaining the target subtask.
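As a non-limiting illustration, steps S301 and S302 can be sketched in Python for the case in which task identifiers are consecutive integers starting at 0. The function names are hypothetical. The sketch also assumes that the identifier range is cut off at the total number of subtasks when fewer subtasks remain than were requested; the application does not state how that case is handled.

```python
from typing import Dict, List


def determine_target_ids(count: int, processable: int, total: int) -> List[int]:
    """S301: derive the target task identifiers from the count value."""
    if count >= total:
        return []  # no unallocated subtasks remain, so no target subtask exists
    # Assumption: clamp the range at `total` when fewer subtasks remain.
    return list(range(count, min(count + processable, total)))


def select_target_subtasks(unallocated: Dict[int, str], target_ids: List[int]) -> List[str]:
    """S302: pick the subtasks whose identifiers match the target identifiers."""
    return [unallocated[i] for i in target_ids]


subtasks = {i: f"a_{i}" for i in range(6)}  # task A with M = 6 subtasks
ids = determine_target_ids(count=3, processable=2, total=6)
targets = select_target_subtasks(subtasks, ids)
tail = determine_target_ids(count=5, processable=4, total=6)
none_left = determine_target_ids(count=6, processable=1, total=6)
```

With N = 3 and two processable subtasks, the sketch yields the identifiers for a_3 and a_4, matching the worked example with a_N and a_{N+1}.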

In some embodiments of the present application, when the task counting module receives multiple subtask acquisition instructions from multiple cores, it sorts them by the number of processable subtasks they carry to obtain a response order, and then responds to the subtask acquisition instructions one after another in that order.

In this embodiment, multiple cores in the GPU may send subtask acquisition instructions to the task counting module at the same time. In that case, the task counting module may obtain the number of processable subtasks of each core, sort these numbers by size to obtain a response order, and then respond to the subtask acquisition instructions one by one in that order. The response order may be ascending or descending in the number of processable subtasks; this may be set as needed and is not limited by the embodiments of the present application.

It can be understood that the task counting module can respond to only one subtask acquisition instruction at a time. When it receives multiple subtask acquisition instructions simultaneously, it determines the response order according to the number of processable subtasks, which reduces response errors, improves the accuracy with which cores acquire target subtasks, and thereby improves task processing performance.

In some embodiments, the task counting module may respond to the cores in ascending order of the number of processable subtasks. That is, it may first respond to the cores that request fewer subtasks, so that those cores obtain their small number of target subtasks first. This lowers the probability that a single core acquires many subtasks while other cores have none left to acquire, and improves load balancing among the cores.
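As a non-limiting illustration, the ascending response order can be sketched in Python as follows. The function name `respond_in_order` and the representation of a request as a `(core_id, processable)` pair are assumptions. Python's `sorted` is stable, so requests with equal counts keep their arrival order in this sketch; the application does not specify a tie-breaking rule.

```python
def respond_in_order(count, requests):
    """Answer simultaneous requests in ascending order of requested subtasks.

    `requests` is a list of (core_id, processable) pairs. Returns the updated
    count value and a dict mapping each core to the count value fed back to it.
    """
    replies = {}
    for core_id, processable in sorted(requests, key=lambda r: r[1]):
        replies[core_id] = count  # feed back the current count value
        count += processable      # then advance it for the next core
    return count, replies


new_count, replies = respond_in_order(0, [("core0", 4), ("core1", 1), ("core2", 2)])
```

Here the core asking for one subtask is served first and starts at identifier 0, while the core asking for four is served last and starts at identifier 3.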

In some embodiments of the present application, after the target subtask is acquired and processed in S102, the method may further include: when all subtasks of the first core have been processed, feeding back idle information to the central processing unit through the first core.

In this embodiment, while processing subtasks, the first core keeps acquiring unallocated subtasks of the first task until all subtasks of the first task have been allocated, after which it can no longer acquire new subtasks. Once all of its subtasks have been processed and it has nothing left to process, the first core may feed back idle information to the CPU.

In this embodiment, when all cores in the GPU have fed back idle information to the CPU, the GPU has finished processing the first task and is idle. In other words, by feeding back idle information the first core informs the CPU of its own processing status, which in turn contributes to informing the CPU of the processing status of the GPU. The CPU can thus issue the next task once the GPU is idle, which improves the efficiency with which the CPU issues tasks.
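As a non-limiting illustration, the CPU-side bookkeeping implied above can be sketched in Python as follows. The class `IdleTracker` and its method name are hypothetical; the application does not describe how the CPU records idle information.

```python
class IdleTracker:
    """Hypothetical CPU-side record of which cores have reported idle."""

    def __init__(self, num_cores):
        self.pending = set(range(num_cores))  # cores that have not yet reported idle

    def report_idle(self, core_id):
        """Record idle information from one core; return True once the GPU is idle."""
        self.pending.discard(core_id)
        return not self.pending


tracker = IdleTracker(num_cores=3)
after_first = tracker.report_idle(0)
after_second = tracker.report_idle(2)
after_third = tracker.report_idle(1)  # last core reports: next task may be issued
```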

In some embodiments of the present application, before the target subtask of the first core is determined from the first task in S101 (during processing of the first task by the GPU, based on the current processing capability of the first core and the number of subtasks of the first task already allocated to the cores), the method may further include: receiving the first task issued by the central processing unit, where the first task includes the total number of subtasks in the first task.

In this embodiment, the first task that the GPU receives from the CPU contains the total number of subtasks of the first task. The GPU can therefore determine the total number of subtasks as it receives the first task, and the first core can use that total together with the count value to determine whether the first task still has unallocated subtasks. The first core can then stop acquiring target subtasks as soon as the first task has been fully allocated, which reduces the resource consumption of the core.

In some embodiments of the present application, before the target subtask of the first core is determined from the first task in S101 (during processing of the first task by the GPU, based on the current processing capability of the first core and the number of subtasks of the first task already allocated to the cores), the method may further include: clearing the count value to zero through the task counting module in response to a clear instruction from the central processing unit.

In this embodiment, the GPU may receive a clear instruction from the CPU; the clear instruction instructs the GPU to clear the count value of the task counting module. In response to the clear instruction, the GPU clears the count value of the task counting module, so that the count value becomes 0.

In some embodiments, the clear instruction and the first task may be received at the same time: the GPU receives the clear instruction together with the first task and then clears the count value in response. In some embodiments, the clear instruction may be received after or before the first task. In some embodiments, the issued first task may itself serve as the clear instruction. This may be set as needed and is not limited by the embodiments of the present application.

It can be understood that the GPU needs to clear the count value each time it receives a first task, so that the count value of the task counting module reflects the number of allocated subtasks of that first task only, which improves the accuracy of the count value.
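As a non-limiting illustration, the effect of the clear instruction can be sketched in Python as follows. The class `TaskCounter` and its methods `clear` and `fetch_add` are hypothetical names for the behaviour of the task counting module.

```python
class TaskCounter:
    """Hypothetical model of the task counting module with a clear operation."""

    def __init__(self):
        self.count = 0

    def clear(self):
        """Respond to the clear instruction from the CPU."""
        self.count = 0

    def fetch_add(self, processable):
        """Feed back the current count value, then add `processable` to it."""
        previous = self.count
        self.count += processable
        return previous


counter = TaskCounter()
counter.fetch_add(8)          # allocations left over from a previous task
counter.clear()               # clear instruction accompanying the new first task
start = counter.fetch_add(4)  # first request for the new task
```

Without the clear step, the first request for the new task in this sketch would be answered with 8 rather than 0, and its first eight subtasks would never be allocated.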

Based on the above task processing method, an embodiment of the present application shows a flow of a task processing method. As shown in FIG. 7, the method may include:

S11: The CPU sends the first task and a clear instruction to the GPU, and broadcasts subtask acquisition indication information to each core in the GPU; the first task includes the total number of subtasks.

S12: The GPU, through the task counting module, responds to the clear instruction and clears the count value to zero.

S13: The GPU, through the first core, responds to the subtask acquisition indication information and sends a subtask acquisition instruction to the task counting module; the number of processable subtasks carried in the subtask acquisition instruction is the preset maximum number of tasks.

S14: The GPU, through the task counting module, responds to the subtask acquisition instruction, sends the count value to the first core, and increases the count value by the number of processable subtasks to obtain the updated count value.

S15: The GPU, through the first core, determines whether the count value is less than the total number of subtasks; if so, S16 is executed; otherwise, S19 is executed.

S16: The GPU, through the first core, determines the target task identifier according to the count value and the number of processable subtasks.

S17: The GPU, through the first core, obtains from the unallocated subtasks the subtask whose task identifier matches the target task identifier, as the target subtask.

S18: The GPU, through the first core, determines whether any subtask in the core has been completed; if so, S14 is executed; otherwise, S18 continues to be executed.

S19: The GPU, through the first core, sends idle information to the CPU once all subtasks of the first core have been processed.

In this embodiment, after receiving idle information from all cores, the CPU can determine that processing of the first task is complete and can issue the next task to the GPU.
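As a non-limiting illustration, the flow S11 to S19 can be simulated in Python with discrete time steps. Everything in this sketch is an assumption made for illustration: the function name `simulate`, the tick-based model of processing time, the sequential servicing of cores within a tick, and the clamping of identifier ranges at the total number of subtasks.

```python
def simulate(total, num_cores, max_tasks, duration):
    """Return, per core, the list of subtask identifiers it processed.

    `duration(task_id)` gives the number of ticks a subtask takes (at least 1).
    """
    count = 0                                       # S12: count value cleared
    running = [dict() for _ in range(num_cores)]    # task id -> remaining ticks
    processed = [[] for _ in range(num_cores)]
    exhausted = [False] * num_cores                 # core saw count >= total
    idle = [False] * num_cores

    def acquire(core, processable):
        nonlocal count
        start, count = count, count + processable   # S14: feed back, then add
        if start >= total:                          # S15: nothing left to allocate
            exhausted[core] = True
            return
        for task_id in range(start, min(start + processable, total)):  # S16, S17
            running[core][task_id] = duration(task_id)

    for core in range(num_cores):                   # S13: initial request
        acquire(core, max_tasks)

    while not all(idle):
        for core in range(num_cores):
            if idle[core]:
                continue
            finished = [t for t, left in running[core].items() if left <= 1]
            for t in running[core]:
                running[core][t] -= 1               # one tick of processing
            for t in finished:
                del running[core][t]
                processed[core].append(t)
            if finished and not exhausted[core]:    # S18: refill freed slots
                acquire(core, len(finished))
            if exhausted[core] and not running[core]:
                idle[core] = True                   # S19: report idle to the CPU
    return processed


result = simulate(total=10, num_cores=3, max_tasks=2, duration=lambda t: 1 + t % 3)
all_ids = sorted(t for per_core in result for t in per_core)
sparse = simulate(total=2, num_cores=4, max_tasks=2, duration=lambda t: 1)
```

In this sketch every subtask is processed exactly once, and cores whose subtasks finish sooner end up acquiring more of them, which is the load-balancing effect the flow aims for.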

An embodiment of the present application provides a multi-core graphics processor (GPU), in which a first core is configured to determine, during processing of a first task by the GPU, a target subtask of the first core from the first task based on the current processing capability of the first core and the number of subtasks of the first task already allocated to the cores; the first core is one of the multiple cores; and the first core is further configured to acquire and process the target subtask.

In some embodiments, the GPU further includes a task counting module configured to determine a count value; the count value characterizes the number of subtasks already allocated to the cores.

In some embodiments, the first core is further configured to send a subtask acquisition instruction to the task counting module; the subtask acquisition instruction includes the current processing capability of the first core, which is characterized by the number of processable subtasks. The task counting module is further configured to respond to the subtask acquisition instruction by sending the count value to the first core. The first core is further configured to determine the target subtask from the unallocated subtasks in the first task according to the count value and the current processing capability.

In some embodiments, the task counting module is further configured to increase the count value by the number of processable subtasks to obtain an updated count value.

In some embodiments, the first core is further configured to send the subtask acquisition instruction to the task counting module upon receiving subtask acquisition indication information from the central processing unit.

In some embodiments, the first core is further configured to send the subtask acquisition instruction to the task counting module when a first number of subtasks in the first core have been completed; the number of processable subtasks is the first number, that is, the number of completed subtasks.

In some embodiments, the first core is further configured to send the subtask acquisition instruction to the task counting module when the time elapsed since the last subtask acquisition instruction was sent is greater than or equal to a preset acquisition time interval and the number of processable subtasks is greater than 0; the number of processable subtasks is the number of subtasks completed since the last subtask acquisition instruction was sent.

In some embodiments, the first core is further configured to: when the count value is less than the total number of subtasks of the first task, determine the target task identifier of the target subtask according to the count value and the number of processable subtasks; and determine the subtask corresponding to the target task identifier among the unallocated subtasks as the target subtask.

In some embodiments, the subtasks of the first task include task identifiers, and the order of the task identifiers characterizes the allocation order of the subtasks of the first task. The first core is further configured to determine the starting identifier of the target task identifiers according to the count value and, following the allocation order, take as many task identifiers as the number of processable subtasks as the target task identifiers.

In some embodiments, the first core is further configured to determine, when the count value is greater than or equal to the total number of subtasks of the first task, that the number of unallocated subtasks is 0 and that no target subtask exists.

In some embodiments, the task counting module is further configured to, upon receiving multiple subtask acquisition instructions from multiple cores, sort them by the number of processable subtasks they carry to obtain a response order, and respond to the subtask acquisition instructions one after another in that order.

In some embodiments, the first core is further configured to feed back idle information to the central processing unit when all subtasks of the first core have been processed.

In some embodiments, the task counting module is further configured to clear the count value to zero in response to a clear instruction from the central processing unit.

In some embodiments, the task counting module is located in the memory of the GPU.

It should be noted that, in the embodiments of the present application, the above task processing method may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the embodiments of the present application, in essence or in the part that contributes to the related art, may be embodied in the form of a software product. The software product is stored in a storage medium and includes several instructions for causing the graphics processor of an electronic device (which may be a personal computer, a server, a network device, or the like) to execute all or part of the methods described in the embodiments of the present application. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a magnetic disk, or an optical disc. The embodiments of the present application are therefore not limited to any specific hardware, software, or firmware, or to any combination of the three.

An embodiment of the present application provides a computer-readable storage medium storing a computer program that, when executed by a processor, implements some or all of the steps of the above method. The computer-readable storage medium may be transitory or non-transitory.

An embodiment of the present application provides a computer program including computer-readable code. When the computer-readable code runs in an electronic device, a graphics processor in the electronic device executes some or all of the steps of the above method.

An embodiment of the present application provides a computer program product that includes a non-transitory computer-readable storage medium storing a computer program. When the computer program is read and executed by a computer, some or all of the steps of the above method are implemented. The computer program product may be implemented in hardware, software, or a combination thereof. In some embodiments, the computer program product is embodied as a computer storage medium; in other embodiments, it is embodied as a software product, such as a software development kit (SDK).

It should be noted that the above descriptions of the embodiments tend to emphasize the differences between them; for their identical or similar aspects, the embodiments may be referred to one another. The descriptions of the above device, storage medium, computer program, and computer program product embodiments are similar to the description of the method embodiments and have similar beneficial effects. For technical details not disclosed in the device, storage medium, computer program, and computer program product embodiments of the present application, refer to the description of the method embodiments of the present application.

An embodiment of the present application further provides an electronic device. As shown in FIG. 8, the electronic device 700 includes a processor 701, a communication interface 702, and a memory 703, and the processor 701 includes a central processing unit 7011 and a multi-core graphics processor 7012. Among these components:

The central processing unit 7011 is configured to issue the first task to the multi-core graphics processor 7012.

The multi-core graphics processor 7012 is configured to execute the above task processing method.

The communication interface 702 enables the electronic device to communicate with other terminals or servers over a network.

The memory 703 is configured to store instructions and applications executable by the processor 701, and may also cache data that is to be processed or has been processed by the processor 701 and the modules of the electronic device 700 (for example, image data, audio data, voice communication data, and video communication data). The memory 703 may be implemented as flash memory (FLASH) or random access memory (RAM). The processor 701, the communication interface 702, and the memory 703 may exchange data over the bus 704.
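The interaction between the issued first task and the cores of the multi-core graphics processor 7012 can be simulated with threads, as in the minimal sketch below. It is not the application's implementation: `TOTAL_SUBTASKS`, `core_loop`, the per-core capacities, and the clamping of the last batch to the total number of subtasks are assumptions made for illustration, and a lock stands in for the atomic behavior of the task counting module.

```python
import threading

TOTAL_SUBTASKS = 20      # hypothetical number of subtasks in the first task
count_value = 0          # shared count of subtasks already allocated
lock = threading.Lock()  # assumption: models the task counting module's atomicity
processed = []           # (core_id, subtask_id) pairs, kept for inspection


def core_loop(core_id, capacity):
    """Each core keeps claiming up to `capacity` subtasks until none remain."""
    global count_value
    while True:
        with lock:
            start = count_value      # count value returned to the core
            count_value += capacity  # counter advances by the requested amount
        if start >= TOTAL_SUBTASKS:
            return  # nothing left; a real core would report idle to the CPU
        end = min(start + capacity, TOTAL_SUBTASKS)  # assumed clamp at the end
        for subtask_id in range(start, end):
            with lock:
                processed.append((core_id, subtask_id))


# Three simulated cores with different current processing capabilities.
cores = [threading.Thread(target=core_loop, args=(i, cap))
         for i, cap in enumerate([1, 2, 3])]
for t in cores:
    t.start()
for t in cores:
    t.join()

claimed_ids = sorted(subtask_id for _, subtask_id in processed)
```

Whatever the thread interleaving, every subtask identifier is claimed exactly once, because each core derives its batch from a count value that no other core received.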

It should be understood that references throughout the specification to "one embodiment" or "an embodiment" mean that a particular feature, structure, or characteristic related to the embodiment is included in at least one embodiment of the present application. Therefore, occurrences of "in one embodiment" or "in an embodiment" throughout the specification do not necessarily refer to the same embodiment. In addition, these particular features, structures, or characteristics may be combined in any suitable manner in one or more embodiments. It should also be understood that, in the various embodiments of the present application, the sequence numbers of the above steps/processes do not imply an order of execution; the execution order of the steps/processes should be determined by their functions and internal logic, and the sequence numbers should not limit the implementation of the embodiments of the present application in any way. The sequence numbers of the above embodiments of the present application are for description only and do not indicate the relative merits of the embodiments.

It should be noted that, in this document, the terms "include", "comprise", and any other variants thereof are intended to cover non-exclusive inclusion, so that a process, method, article, or device that includes a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, method, article, or device. Without further limitation, an element defined by the phrase "including a ..." does not exclude the existence of additional identical elements in the process, method, article, or device that includes the element.

In the several embodiments provided in the present application, it should be understood that the disclosed devices and methods may be implemented in other ways. The device embodiments described above are merely illustrative. For example, the division into units is only a division by logical function, and other divisions are possible in an actual implementation: multiple units or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the coupling, direct coupling, or communication connection between the components shown or discussed may be implemented through some interfaces, and the indirect coupling or communication connection between devices or units may be electrical, mechanical, or in other forms.

The units described above as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; they may be located in one place or distributed across multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.

In addition, the functional units in the embodiments of the present application may all be integrated into one processing unit, each unit may serve as a separate unit, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware, or in the form of hardware plus software functional units.

A person of ordinary skill in the art can understand that all or some of the steps of the above method embodiments may be completed by hardware related to program instructions. The program may be stored in a computer-readable storage medium and, when executed, performs the steps of the above method embodiments. The aforementioned storage medium includes various media that can store program code, such as a removable storage device, a read-only memory (ROM), a magnetic disk, or an optical disc.

Alternatively, if the above integrated unit of the present application is implemented in the form of a software functional module and sold or used as an independent product, it may also be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present application, in essence or in the part that contributes to the related art, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes a number of instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to execute all or part of the methods described in the embodiments of the present application. The aforementioned storage medium includes various media that can store program code, such as a removable storage device, a ROM, a magnetic disk, or an optical disc.

The above are merely embodiments of the present application and are not intended to limit the protection scope of the present application. Any modification, equivalent replacement, or improvement made within the spirit and scope of the present application shall fall within the protection scope of the present application.

Claims

1. A task processing method for a multi-core graphics processor GPU, characterized by comprising:

during processing of a first task by the GPU, determining a target subtask of a first core from the first task based on a current processing capability of the first core and the number of allocated subtasks in the first task that have been allocated to each core, the first core being one of the multiple cores; and

obtaining and processing the target subtask through the first core.

2. The method according to claim 1, characterized in that the multi-core GPU includes a task counting module, and the number of allocated subtasks allocated to each core is determined by a count value of the task counting module.

3. The method according to claim 2, wherein determining the target subtask of the first core from the first task comprises:

Sending a subtask acquisition instruction to a task counting module through the first core; the subtask acquisition instruction includes a current processing capability of the first core, and the current processing capability is represented by the number of subtasks that can be processed;

Responding to the subtask acquisition instruction through the task counting module, sending the count value to the first core;

The target subtask is determined from unassigned subtasks in the first task according to the count value and the current processing capability by the first core.

4. The method according to claim 3, further comprising: increasing the count value by the number of processable subtasks through the task counting module to obtain an updated count value.

5. The method according to claim 3, characterized in that the sending of the subtask acquisition instruction to the task counting module comprises:

When receiving the subtask acquisition instruction information from the central processing unit, the subtask acquisition instruction is sent to the task counting module.

6. The method according to claim 3, characterized in that the sending of the subtask acquisition instruction to the task counting module comprises:

When a first number of subtasks in the first core have been processed, the subtask acquisition instruction is sent to the task counting module; the number of processable subtasks is the first number, that is, the number of subtasks that have been processed.

7. The method according to claim 3, characterized in that the sending of the subtask acquisition instruction to the task counting module comprises:

When the time elapsed since the subtask acquisition instruction was last sent is greater than or equal to a preset acquisition time interval, and the number of processable subtasks is greater than 0, the subtask acquisition instruction is sent to the task counting module; the number of processable subtasks is the number of subtasks whose processing has been completed since the subtask acquisition instruction was last sent.

8. The method according to claim 3, characterized in that the step of determining the target subtask from unassigned subtasks in the first task according to the count value and the current processing capacity comprises:

In a case where the count value is less than the total number of subtasks of the first task, determining a target task identifier of the target subtask according to the count value and the number of processable subtasks;

A subtask among the unassigned subtasks corresponding to the target task identifier is determined as the target subtask.

9. The method according to claim 8, characterized in that the subtasks of the first task include task identifiers; the order of the task identifiers is used to characterize the order of allocation of the subtasks of the first task; and the determining of the target task identifier of the target subtask according to the count value and the number of processable subtasks comprises:

The starting identifier in the target task identifier is determined according to the count value, and the task identifiers of the number of processable subtasks are acquired in sequence according to the allocation order as the target task identifier.

10. The method according to claim 3, characterized in that the step of determining the target subtask from unassigned subtasks in the first task according to the count value and the current processing capacity comprises:

When the count value is greater than or equal to the total number of subtasks of the first task, it is determined that the number of unassigned subtasks is 0 and the target subtask does not exist.

11. The method according to claim 3, characterized in that the method further comprises:

When receiving multiple subtask acquisition instructions from multiple cores, the task counting module sorts the subtask acquisition instructions according to the number of processable subtasks in the subtask acquisition instructions to obtain a response order; and responds to the subtask acquisition instructions in sequence according to the response order.

12. The method according to any one of claims 1 to 11, characterized in that the method further comprises:

When all subtasks of the first core have been processed, idle information is fed back to the central processing unit through the first core.

13. The method according to any one of claims 1 to 11, characterized in that the method further comprises:

The first task is received from a central processing unit; the first task includes the total number of subtasks of the first task.

14. The method according to claim 13, characterized in that the method further comprises:

The task counting module responds to a clear instruction from the central processing unit to clear the count value.

15. The method according to claim 9, characterized in that the task identifier is represented by a 64-bit integer.

16. The method according to claim 2, characterized in that the task counting module is located in the memory of the GPU.

17. A multi-core graphics processor GPU, comprising:

a first core, configured to determine, during processing of a first task by the GPU, a target subtask of the first core from the first task based on a current processing capability of the first core and a number of allocated subtasks in the first task that have been allocated to each core; the first core being one of the multiple cores;

The first core is further configured to obtain and process the target subtask.

18. An electronic device, comprising: a multi-core graphics processor GPU as claimed in claim 17.

19. A computer storage medium, characterized in that executable instructions are stored thereon, and the executable instructions, when executed by a graphics processor, implement the task processing method according to any one of claims 1 to 16.
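For illustration only, the identifier determination of claims 8 to 10 and the response ordering of claim 11 can be sketched as below. Several points are assumptions and not statements of the claimed method: the function names `target_ids` and `respond` are hypothetical, identifiers are assumed to be consecutive integers starting at 0, the last batch is assumed to be clamped to the total number of subtasks, and requests are assumed to be sorted in descending order of processable subtasks (the claims do not state the sort direction). The sketch also computes each core's identifiers inside `respond` for brevity, whereas in the claims the core itself determines its target subtask from the returned count value.

```python
def target_ids(count_value, processable, total):
    """Identifiers a core would take for a given count value (claims 8 to 10)."""
    if count_value >= total:
        return []  # no unassigned subtasks remain, so no target subtask exists
    end = min(count_value + processable, total)  # assumed clamp at the task end
    return list(range(count_value, end))


def respond(requests, count_value, total):
    """Answer simultaneous requests in sorted order (claim 11).

    `requests` is a list of (core_id, processable) pairs.
    """
    # Assumption: larger requests are answered first.
    order = sorted(requests, key=lambda r: r[1], reverse=True)
    grants = {}
    for core_id, processable in order:
        grants[core_id] = target_ids(count_value, processable, total)
        count_value += processable  # counter advances by the requested amount
    return grants, count_value


grants, final_count = respond(
    [("core0", 2), ("core1", 4), ("core2", 3)], count_value=0, total=8
)
```

With these inputs, `core1` is answered first and takes identifiers 0 to 3, `core2` takes 4 to 6, and `core0` receives only identifier 7 because the task has 8 subtasks in total.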