CN116360858B - Data processing method, graphics processor, electronic device and storage medium - Google Patents
- Publication number
- CN116360858B (application number CN202310612804.6A)
- Authority
- CN
- China
- Prior art keywords
- data
- operated
- address
- storage area
- calculated
- Prior art date
- Legal status
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/30003—Arrangements for executing specific machine instructions
- G06F9/30007—Arrangements for executing specific machine instructions to perform operations on data operands
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T1/00—General purpose image data processing
- G06T1/20—Processor architectures; Processor configuration, e.g. pipelining
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Software Systems (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The present disclosure relates to the field of information processing technology, and in particular to a data processing method, a graphics processor, an electronic device, and a storage medium. The processing method includes: receiving an instruction to be executed; determining, according to the instruction, an operation type and an interaction address corresponding to the data to be operated on, and obtaining, based on the interaction address, the data to be operated on that is stored in the register file, where the interaction address includes a storage-area row-address alignment offset and a storage-area intermediate address, the storage-area row-address alignment offset indicates the address offset of the data to be operated on within the storage area under a preset data processing mode, and the storage-area intermediate address indicates the intermediate address of the data to be operated on within its corresponding storage area; and generating an operation result according to the data to be operated on and the operation type. The data processing method provided by the embodiments of the present disclosure helps improve the flexibility of access to the register file.
Description
Technical Field
The present disclosure relates to the field of information processing technology, and in particular to a data processing method, a graphics processor, an electronic device, and a storage medium.
Background Art
In the related art, a graphics processor and a host exchange data frequently: the host sends the data to be operated on to the graphics processor, which stores it in memory and then performs parallel operations on it according to the operators of that data, so as to process the data at high speed. However, as application scenarios vary from program to program, the graphics processor cannot simultaneously process data belonging to different data processing modes, which limits computation severely. How to process data more effectively is therefore a technical problem that developers urgently need to solve.
Summary of the Invention
The present disclosure proposes a technical solution for data processing.
According to one aspect of the present disclosure, a data processing method is provided, applied to a graphics processor that includes a compute core and a register file. The processing method includes: receiving an instruction to be executed; determining, according to the instruction, an operation type and an interaction address corresponding to the data to be operated on, and obtaining, based on the interaction address, the data to be operated on that is stored in the register file, where the interaction address includes a storage-area row-address alignment offset and a storage-area intermediate address, the storage-area row-address alignment offset indicates the address offset of the data to be operated on within the storage area under a preset data processing mode, and the storage-area intermediate address indicates the intermediate address of the data to be operated on within its corresponding storage area; and generating an operation result according to the data to be operated on and the operation type.
In a possible implementation, the register file includes at least one storage area, and the processing method further includes: receiving a plurality of pieces of data to be operated on; dividing the plurality of pieces of data into a plurality of data groups according to the data processing modes corresponding to the data, where data groups corresponding to different data processing modes contain different amounts of data; and allocating the plurality of data groups to the at least one storage area to obtain the interaction address corresponding to each piece of data in each data group, where each of the data groups is allocated to at least one storage area.
In a possible implementation, obtaining the interaction address corresponding to each piece of data in each of the data groups includes: taking a preset number of the lowest-order address bits of the base address of the logical address corresponding to each piece of data as the storage-area row-address alignment offset in the interaction address of that piece of data; and shifting the remaining address bits left by the preset number of bits and adding the logical offset address, the resulting sum being the storage-area intermediate address in the interaction address of that piece of data, where the logical offset address is the address offset of the logical address corresponding to each piece of data.
In a possible implementation, dividing the plurality of pieces of data into a plurality of data groups according to their corresponding data processing modes includes: dividing the plurality of pieces of data, according to their corresponding data processing modes, into a plurality of data groups each associated with a task identifier, where the task identifier maps each data group to the base-address position allocated to it in the at least one storage area.
In a possible implementation, the interaction address includes a storage-area row-address alignment offset, a storage-area intermediate address, and a hash value, and obtaining the interaction address corresponding to each piece of data in each data group includes: generating a hash value for each data group according to the task identifier of that group, where data groups with adjacent task identifiers have different hash values; and generating, according to the hash value of each data group, the interaction address of each piece of data in that group, where the data of data groups with different hash values is stored in different physical storage areas.
In a possible implementation, the interaction address includes a storage-area row-address alignment offset, a storage-area intermediate address, and a hash value, and obtaining the interaction address corresponding to each piece of data in each data group includes: generating a hash value for each data group according to the task identifier of that group and the sampling identifier of that group, where the sampling identifier indicates the different sampling points that each of the data groups processes; and generating, according to the hash value of each data group, the interaction address of each piece of data in that group, where the data of data groups with different hash values is stored in different physical storage areas.
In a possible implementation, the register file includes one storage area, and obtaining, based on the interaction address, the data to be operated on that is stored in the register file includes: determining, according to the interaction address of each piece of data, the row address and column address of that piece of data within its corresponding storage area, where the row address and column address indicate the position of a register within the storage area; and accessing the corresponding register according to the row address and column address of each piece of data within its storage area to obtain that piece of data.
In a possible implementation, the register file includes a plurality of storage areas, and obtaining, based on the interaction address, the data to be operated on that is stored in the register file includes: determining, for each piece of data among the plurality of pieces of data, the interaction address and the storage-area identifier corresponding to that piece of data, where the storage-area identifier is used, when the plurality of data groups are allocated to the plurality of storage areas, to determine the storage area corresponding to each data group; determining the storage area corresponding to each piece of data according to its storage-area identifier; determining, according to the interaction address of each piece of data, the row address and column address of that piece of data within its corresponding storage area, where the row address and column address indicate the position of a register within the storage area; and accessing the register in the corresponding storage area according to the row address and column address of each piece of data to obtain that piece of data.
In a possible implementation, obtaining the interaction address corresponding to each piece of data in each data group includes: generating the interaction address corresponding to each piece of data in each data group, and generating a segment number for each piece of data according to the thread number corresponding to that piece of data and the total number of threads of the preset data processing mode, where the segment number represents the segment offset produced when the piece of data is stored under the preset data processing mode. Determining, according to the interaction address of each piece of data, the row address and column address of that piece of data within its corresponding storage area then includes: generating the row address of each piece of data within its storage area according to its interaction address and segment number; and generating the column address of each piece of data within its storage area according to its interaction address, segment number, and hash value, or according to its interaction address and segment number.
In a possible implementation, allocating the plurality of data groups to the at least one storage area includes: for each of the data groups, distributing that data group evenly across every storage area.
In a possible implementation, obtaining, based on the interaction address, the data to be operated on that is stored in the register file includes: obtaining the data stored in the register file based on the interaction address through a plurality of pipelines of the graphics processor; and generating an operation result according to the data and the operation type includes: generating, through the plurality of pipelines of the graphics processor, an operation result corresponding to each piece of data according to the data and the operation type.
In a possible implementation, generating, through the plurality of pipelines of the graphics processor, an operation result corresponding to each piece of data according to the data and the operation type includes: when different pipelines among the plurality of pipelines access the same target port of the register file at the same time, arbitrating according to the priorities of the different pipelines and/or the priorities of read and write operations to determine the target pipeline corresponding to that port; and then generating the operation result corresponding to each piece of data, according to the data and the operation type, through the target pipeline and the other pipelines in turn.
According to one aspect of the present disclosure, a graphics processor is provided, including a compute core and a register file connected to the compute core. The compute core is configured to: receive an instruction to be executed; determine, according to the instruction, an operation type and an interaction address corresponding to the data to be operated on, and obtain, based on the interaction address, the data to be operated on that is stored in the register file, where the interaction address includes a storage-area row-address alignment offset and a storage-area intermediate address, the storage-area row-address alignment offset indicates the address offset of the data within the storage area under a preset data processing mode, and the storage-area intermediate address indicates the intermediate address of the data within its corresponding storage area; and generate an operation result according to the data to be operated on and the operation type.
According to one aspect of the present disclosure, an electronic device is provided, including a host and the graphics processor described above.
According to one aspect of the present disclosure, a computer-readable and writable data storage medium is provided, on which computer program instructions and data to be operated on are stored; when the computer program instructions are executed by a processor, the above data processing method is implemented.
In the embodiments of the present disclosure, an instruction to be executed can be received; the operation type and the interaction address corresponding to the data to be operated on are then determined according to that instruction, the data to be operated on that is stored in the register file is obtained based on the interaction address, and finally an operation result is generated according to the data and the operation type. By carrying out the flow through an interaction address that records the storage-area row-address alignment offset, and by converting the logical addresses that data of different data processing modes has in the register file into interaction addresses under a preset data processing mode, the embodiments of the present disclosure allow the register file to store data of different data processing modes at the same time and to support operations on data of different data processing modes, which helps improve the flexibility of access to the register file.
It should be understood that the above general description and the following detailed description are merely exemplary and explanatory, and do not limit the present disclosure. Other features and aspects of the present disclosure will become apparent from the following detailed description of exemplary embodiments with reference to the accompanying drawings.
Brief Description of the Drawings
The accompanying drawings, which are incorporated into and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and, together with the description, serve to explain the technical solution of the present disclosure.
Fig. 1 shows a flowchart of a data processing method according to an embodiment of the present disclosure.
Fig. 2 shows a flowchart of a data processing method according to an embodiment of the present disclosure.
Fig. 3 shows a reference schematic diagram of a data processing method according to an embodiment of the present disclosure.
Fig. 4 shows a reference schematic diagram of a data processing method according to an embodiment of the present disclosure.
Fig. 5 shows a block diagram of a graphics processor according to an embodiment of the present disclosure.
Fig. 6 shows a block diagram of an electronic device according to an embodiment of the present disclosure.
Detailed Description of Embodiments
Various exemplary embodiments, features, and aspects of the present disclosure are described in detail below with reference to the accompanying drawings. The same reference numerals in the drawings denote elements with the same or similar functions. Although various aspects of the embodiments are shown in the drawings, the drawings are not necessarily drawn to scale unless otherwise indicated.
The word "exemplary" is used here exclusively to mean "serving as an example, embodiment, or illustration". Any embodiment described herein as "exemplary" is not necessarily to be construed as superior to or better than other embodiments.
The term "and/or" herein merely describes an association between related objects and indicates that three relationships may exist; for example, "A and/or B" may mean that A exists alone, that A and B exist simultaneously, or that B exists alone. In addition, the term "at least one of" herein means any one of several items or any combination of at least two of them; for example, "including at least one of A, B, and C" may mean including any one or more elements selected from the set consisting of A, B, and C.
In addition, numerous specific details are given in the following detailed description in order to better explain the present disclosure. Those skilled in the art will understand that the present disclosure can also be practiced without certain specific details. In some instances, methods, means, elements, and circuits well known to those skilled in the art are not described in detail, so as to highlight the gist of the present disclosure.
In the related art, a register file can usually process the data to be operated on under only one data processing mode; WAVE32, WAVE64, and WAVE128 of the related art are not compatible with one another. Here WAVE (also called a warp or wavefront) is, exemplarily, a warp under the SIMT (single-instruction, multiple-thread) programming model: in WAVE32 one data processing flow is completed by 32 threads, in WAVE64 by 64 threads, and in WAVE128 by 128 threads, with the 32, 64, or 128 threads (and data items) jointly executing the same instruction sequence. Taking the incompatibility of WAVE32 and WAVE128 as an example: if the register file is a register matrix with several rows and several columns and the number of columns is 32, then the data stored under the WAVE32 data processing mode occupies one whole row, while the data stored under the WAVE128 data processing mode occupies four whole rows. When a pipeline instruction performs an operation of the related art such as repeat or burst, which requires adding 1 to and accumulating on the register-file address, the logical increment of 1 under the WAVE128 data processing mode is not equivalent to the logical increment of 1 under the WAVE32 data processing mode; if the two were forced to be handled together, address-access errors would occur. In other words, a register file of the related art cannot be compatible with the WAVE32 and WAVE128 data processing modes at the same time; indeed, the related art has no notion of a selectable data processing mode, only a fixed one.
In view of this, the embodiments of the present disclosure provide a data processing method that can receive an instruction to be executed, determine the operation type and the interaction address corresponding to the data to be operated on according to that instruction, obtain the data stored in the register file based on the interaction address, and finally generate an operation result according to the data and the operation type. By carrying out the flow through an interaction address that records the storage-area row-address alignment offset, and by converting the logical addresses that data of different data processing modes has in the register file into interaction addresses under a preset data processing mode, the embodiments allow the register file to store data of different data processing modes at the same time and to support operations on data of different data processing modes, which helps improve the flexibility of access to the register file.
Referring to Fig. 1, which shows a flowchart of a data processing method according to an embodiment of the present disclosure: the processing method is applied to a graphics processor; in one example the graphics processor is heterogeneously connected to a host, and the graphics processor includes a compute core and a register file. The processing method includes: Step S100, receiving an instruction to be executed. Exemplarily, the instruction may be expressed as an operator, such as addition or multiplication, and multiple pieces of data to be operated on may share the same operator. In one example, if the graphics processor is connected to a host, the instruction may be sent by the host.
Step S200, determining, according to the instruction to be executed, the operation type and the interaction address corresponding to the data to be operated on, and obtaining, based on the interaction address, the data to be operated on that is stored in the register file. The interaction address includes a storage-area row-address alignment offset and a storage-area intermediate address. The storage-area row-address alignment offset indicates the address offset of the data to be operated on within the storage area under the preset data processing mode, and the storage-area intermediate address indicates the intermediate address of the data to be operated on within its corresponding storage area; the composition of the interaction address is described in detail later. The intermediate address can sit between the translation of physical addresses and logical addresses so as to realize conversion between different kinds of addresses; it suffices that a correspondence exists between the intermediate address and the physical and logical addresses. In one example, the instruction to be executed may include an opcode and an operand field, where the opcode represents the operation type and the operand field is parsed to obtain the interaction address; the specific mapping depends on the actual situation.
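Purely as an illustration (the disclosure leaves the opcode/operand encoding open and notes that the specific mapping depends on the actual situation), the sketch below assumes a packed operand field whose low 10 bits carry the intermediate address, followed by one hash bit and two alignment-offset bits, matching the bit widths used in the worked example later in this description; all names and widths here are assumptions rather than the patented format.

```c
#include <stdint.h>

/* Hypothetical layout only: 10-bit intermediate address, 1 hash bit,
   2 row-alignment-offset bits packed into the operand field. */
typedef struct {
    uint8_t  opcode;         /* selects the operation type, e.g. add or multiply */
    uint16_t operand_field;  /* assumed to carry the packed interaction address */
} instruction_t;

typedef struct {
    uint32_t row_align_offset;   /* storage-area row-address alignment offset */
    uint32_t hash_bit;           /* optional hash bit */
    uint32_t intermediate_addr;  /* storage-area intermediate address */
} interaction_addr_t;

static interaction_addr_t decode_operand(const instruction_t *insn)
{
    interaction_addr_t ia;
    ia.intermediate_addr = insn->operand_field & 0x3FFu;        /* bits 0..9   */
    ia.hash_bit          = (insn->operand_field >> 10) & 0x1u;  /* bit 10      */
    ia.row_align_offset  = (insn->operand_field >> 11) & 0x3u;  /* bits 11..12 */
    return ia;
}
```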
Step S300, generating an operation result according to the data to be operated on and the operation type. In one example, the operation result can be saved to a storage medium of the graphics processor; if the graphics processor is connected to a host, the host can access that storage medium after receiving a computation-complete signal from the graphics processor so as to obtain the operation result. In another example, the operation result may simply be converted, through data processing of the related art, into a display signal for a display screen connected to the graphics processor, so as to display a picture.
In a possible implementation, obtaining, based on the interaction address, the data to be operated on that is stored in the register file in step S200 may include: obtaining that data based on the interaction address through a plurality of pipelines of the graphics processor. Step S300 may include: generating, through the plurality of pipelines of the graphics processor, an operation result corresponding to each piece of data according to the data to be operated on and the operation type. Exemplarily, the graphics processor may access the interaction address through at least one pipeline to obtain the data to be operated on and operate on it according to the instruction; the specific computation flow and the exact operation indicated by the instruction are not limited here and can be set by developers according to the actual situation. Exemplarily, address-related calculations may be carried out by the instruction-decoding address unit in the compute core, or placed in the register file for calculation; this is not limited here. Exemplarily, taking the case where a single thread instance handles one addition, thread index_n computes A[index_n] + B[index_n]. For arrays A[5] = {1, 3, 5, 7, 9} and B[5] = {2, 4, 6, 8, 10}, the element-wise sums 3 (1+2), 7 (3+4), 11 (5+6), 15 (7+8), and 19 (9+10) are the sub-results of the operation, and the array C[5] = {3, 7, 11, 15, 19} is the operation result.
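The per-lane addition just described can be pictured with a short host-side C sketch; the loop over index_n stands in for the hardware threads that would run in parallel on the graphics processor, so this is a functional illustration rather than GPU code.

```c
#include <stdio.h>

#define NUM_THREADS 5  /* one thread instance per array element in this example */

int main(void)
{
    int A[NUM_THREADS] = {1, 3, 5, 7, 9};
    int B[NUM_THREADS] = {2, 4, 6, 8, 10};
    int C[NUM_THREADS];

    /* Thread index_n computes A[index_n] + B[index_n]; on the GPU all lanes
       of a wavefront execute this same instruction in parallel. */
    for (int index_n = 0; index_n < NUM_THREADS; ++index_n) {
        C[index_n] = A[index_n] + B[index_n];
    }

    for (int i = 0; i < NUM_THREADS; ++i) {
        printf("C[%d] = %d\n", i, C[i]);  /* prints 3, 7, 11, 15, 19 */
    }
    return 0;
}
```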
In a possible implementation, generating, through the plurality of pipelines of the graphics processor, an operation result corresponding to each piece of data according to the data to be operated on and the operation type includes: when different pipelines among the plurality of pipelines access the same target port of the register file at the same time, arbitrating according to the priorities of the different pipelines and/or the priorities of read and write operations, so as to determine the target pipeline corresponding to that port. Exemplarily, the pipeline priorities and the read/write priorities are not limited here and can be set by developers according to actual requirements; for example, a pipeline's priority may be related to its timing or its position in the numbering order. The operation result corresponding to each piece of data is then generated, according to the data to be operated on and the operation type, through the target pipeline and the other pipelines in turn. Referring to Fig. 2, which shows a flowchart of a data processing method according to an embodiment of the present disclosure: in a possible implementation the register file includes at least one storage area; exemplarily, the register file may take the form of a general-purpose data register file, each storage area of which contains several registers. The processing method further includes: Step S10, receiving a plurality of pieces of data to be operated on. In one example, the pieces of data have a corresponding operation order. Exemplarily, the data may take the form of vectors, arrays, matrices, and so on, whose elements carry an operation-order attribute; this attribute can be used to divide the data into groups and to determine the storage order of each element. Taking arrays as an example, the instruction may express an element-wise addition of arrays, and the data to be operated on may include array A[5] = {1, 3, 5, 7, 9} and array B[5] = {2, 4, 6, 8, 10}; the calculations A[0]+B[0], A[1]+B[1], A[2]+B[2], A[3]+B[3], and A[4]+B[4] are then performed in order to obtain the new array C[5] = {3, 7, 11, 15, 19}. In one example, if the graphics processor is connected to a host, the plurality of pieces of data may be sent by the host, and their specific content depends on the upper-layer task running on the host.
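Returning briefly to the register-file port arbitration described at the start of the preceding paragraph, a minimal fixed-priority arbiter might look like the sketch below; the request structure, the rule that reads beat writes, and the rule that a lower pipeline number wins are illustrative assumptions, since the disclosure explicitly leaves the priority scheme to the developer.

```c
#include <stdbool.h>
#include <stddef.h>

typedef struct {
    int  pipeline_id;  /* assumed rule: lower id means higher pipeline priority */
    bool is_read;      /* request type at the register-file port */
    bool valid;        /* this pipeline is requesting the port this cycle */
} port_request_t;

/* Pick the winning pipeline for one register-file port.
   Assumed policy: read requests beat writes, then lower pipeline_id wins. */
static int arbitrate_port(const port_request_t *reqs, size_t n)
{
    int winner = -1;
    for (size_t i = 0; i < n; ++i) {
        if (!reqs[i].valid)
            continue;
        if (winner < 0) {
            winner = (int)i;
            continue;
        }
        if (reqs[i].is_read != reqs[winner].is_read) {
            if (reqs[i].is_read)          /* read beats write (assumed) */
                winner = (int)i;
            continue;
        }
        if (reqs[i].pipeline_id < reqs[winner].pipeline_id)
            winner = (int)i;              /* fall back to pipeline priority */
    }
    return winner;  /* -1 if no pipeline requested the port */
}
```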
Step S20, dividing the plurality of pieces of data into a plurality of data groups according to their corresponding data processing modes, where data groups corresponding to different data processing modes contain different amounts of data. Exemplarily, the data processing modes may include WAVE32, WAVE64, WAVE128, and so on from the related art. Taking WAVE32 as an example, if there are 64 pieces of data in total, the data corresponds to two WAVE32 segments, each WAVE32 containing 32 threads (threads 0 to 31); threads with corresponding numbers may be assigned according to the operation order of the data (other assignment rules are also possible and are not limited here). Continuing with the example of array A[5] = {1, 3, 5, 7, 9}, thread 0 corresponds to the number 1 and thread 1 to the number 3, and so on until the elements of array A have all been assigned within the WAVE32. If array A has fewer than 32 elements during assignment, a WAVE32 is still allocated to it and some of its threads remain idle. For example, if array A has 50 elements, 18 elements remain after one WAVE32 is allocated (one thread corresponds to one element in this example), so another WAVE32 is allocated, in which 14 threads are idle. Taking the WAVE32 and WAVE128 data processing modes as examples, the different amounts of data in data groups of different modes can be described as follows: WAVE32 has 32 threads and thus corresponds to 32 pieces of data, while WAVE128 has 128 threads and thus corresponds to 128 pieces of data.
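A possible sketch of step S20 for the WAVE32 case is shown below: it groups N pieces of data into WAVE32-sized groups, records how many lanes actually carry data, and leaves the remaining lanes idle, reproducing the 50-element example above. The group structure and the consecutive task-identifier assignment are illustrative assumptions.

```c
#include <stdio.h>

#define WAVE32_SIZE 32

typedef struct {
    int task_id;      /* task identifier of this data group (assumed consecutive) */
    int first_index;  /* index of the first element handled by lane 0 */
    int active_lanes; /* lanes that carry data; the rest are idle */
} wave_group_t;

/* Split num_elements pieces of data into WAVE32 groups, one element per lane. */
static int split_into_wave32(int num_elements, wave_group_t *groups, int max_groups)
{
    int num_groups = (num_elements + WAVE32_SIZE - 1) / WAVE32_SIZE;
    if (num_groups > max_groups)
        return -1;
    for (int g = 0; g < num_groups; ++g) {
        int remaining = num_elements - g * WAVE32_SIZE;
        groups[g].task_id      = g;
        groups[g].first_index  = g * WAVE32_SIZE;
        groups[g].active_lanes = remaining >= WAVE32_SIZE ? WAVE32_SIZE : remaining;
    }
    return num_groups;
}

int main(void)
{
    wave_group_t groups[8];
    int n = split_into_wave32(50, groups, 8);  /* the 50-element example: 2 groups */
    for (int g = 0; g < n; ++g)
        printf("task %d: %d active lanes, %d idle lanes\n",
               groups[g].task_id, groups[g].active_lanes,
               WAVE32_SIZE - groups[g].active_lanes);
    return 0;
}
```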
In a possible implementation, step S20 may include: dividing the plurality of pieces of data, according to their corresponding data processing modes, into a plurality of data groups each associated with a task identifier, where the task identifier maps each data group to the base-address position allocated to it in the at least one storage area. For example, if array A has 100 elements and the corresponding data processing mode is WAVE32, WAVE32 tasks with task identifiers 0, 1, 2, and 3 can be allocated.
Continuing with Fig. 2: Step S30, allocating the plurality of data groups to the at least one storage area to obtain the interaction address corresponding to each piece of data in each of the data groups, where each of the data groups is allocated to at least one storage area. The interaction address includes a storage-area row-address alignment offset and a storage-area intermediate address; the storage-area row-address alignment offset indicates the address offset of the plurality of pieces of data within the storage area under the preset data processing mode, and the storage-area intermediate address indicates the intermediate address of each piece of data within its corresponding storage area. Exemplarily, within the interaction address the storage-area row-address alignment offset may occupy the high-order bits and the storage-area intermediate address the low-order bits. In one example, the preset data processing mode may be the data processing mode with the smaller thread count; for example, if developers want the register file to be compatible with WAVE32, WAVE64, and WAVE128, then, since a WAVE64 can be split into two WAVE32s and a WAVE128 into four WAVE32s, WAVE32 can serve as the preset data processing mode, so that data processing modes with larger warps are aligned to the mode with the smaller thread count. If the total number of intermediate-address bits corresponding to WAVE32 is 10, the total number of intermediate-address bits corresponding to WAVE128 is 10, and the preset number of bits corresponding to WAVE128 (detailed later) is 2, the last two bits of the WAVE128 intermediate address can be taken as the storage-area row-address alignment offset. Of course, the storage-area row-address alignment offset may also be obtained through a mapping from the values of those last two bits, and so on; this is not restricted here.
在一种可能的实施方式中,步骤S30中得到所述多个待运算数据组中每个待运算数据组中每个待运算数据对应的交互地址,可包括:根据所述每个待运算数据对应的逻辑地址的基地址中自最低位预设位数的地址位作为所述每个待运算数据对应的交互地址中的存储区域行地址对齐偏移量,将剩余地址位左移所述预设位数后,加上逻辑偏移地址得到的和值作为所述每个待运算数据对应的交互地址中的存储区域中间地址。其中,所述逻辑偏移地址为每个待运算数据对应的逻辑地址的地址偏移量。示例性地,逻辑偏移地址用以表示不同待运算数据对应的逻辑偏移量,例如:第9个待运算数据相较于第0个待运算数据之间的逻辑偏移量为9。在一个示例中,上述逻辑地址、逻辑偏移地址可通过图形处理器中的编译器获取,在一个示例中,待运算指令可包括:运算类型、逻辑地址。在另一个示例中,逻辑地址可包括基地址、逻辑偏移量。在一个示例中,不同数据处理模式对应的预设位数不同。示例性地,若以WAVE32为预设的数据处理模式,则WAVE32的预设位数可为0,WAVE64的预设位数可为1,WAVE128的预设位数可为2。参阅图3所示,图3示出了根据本公开实施例提供的一种数据的处理方法的参考示意图,结合图3,此处以通用寄存器组的存储区域中n为0至3(可参考图中R_4n至R_4n+3、R0至R4,其中R为Register,寄存器的简写)时为例,首个线程实例所要处理的数据数组D[5]为{11,13,15,17,19},则19为第4个参与运算的待运算数据,11为第0个参与运算的待运算数据(可参考图中第一数据处理模式中DW0=11,其中,DW为Double-Word的简写,译为双字,在本公开实施例中以每个寄存器Bank可存储8个双字(可参考图中DW0至DW7)为例,示例性地,每个双字可对应一个线程以进行访问),在Wave32数据处理模式(可参考图中第一数据处理模式)下,存储区域行地址对齐偏移量为1行,若不考虑哈希位处理,则待运算数据11存储在当前任务起始行中R0位置,待运算数据19(可参考图中第一数据处理模式中DW0=19)存储在下一行的R4位置。在Wave128数据处理模式(可参考图中第二数据处理模式)下,对应的逻辑偏移量为4行(可参考图中段0至段3),若不考虑哈希位处理,则待运算数据11存储在当前任务起始行中R0位置(可参考图中第二数据处理模式中DW0=11),待运算数据19存储在当前任务起始行增加逻辑偏移量4行的R4位置(可参考图中第二数据处理模式中DW0=19)。In a possible implementation manner, obtaining the interaction address corresponding to each data to be operated in each data group to be operated in the plurality of data groups to be operated in step S30 may include: The address bits from the lowest preset number of bits in the base address of the corresponding logical address are used as the storage area row address alignment offset in the interactive address corresponding to each data to be operated, and the remaining address bits are shifted left by the preset After the number of bits is set, the sum obtained by adding the logical offset address is used as the intermediate address of the storage area in the interactive address corresponding to each data to be operated. Wherein, the logical offset address is an address offset of a logical address corresponding to each data to be operated. Exemplarily, the logical offset address is used to represent the logical offset corresponding to different data to be operated, for example: the logical offset between the 9th data to be operated and the 0th data to be operated is 9. In an example, the above logical address and logical offset address may be obtained by a compiler in a graphics processor, and in an example, the instruction to be operated may include: an operation type and a logical address. In another example, a logical address may include a base address, a logical offset. In one example, different data processing modes correspond to different preset bits. For example, if WAVE32 is used as the default data processing mode, the preset number of digits for WAVE32 can be 0, the preset number of digits for WAVE64 can be 1, and the preset number of digits for WAVE128 can be 2. Referring to FIG. 3, FIG. 3 shows a reference schematic diagram of a data processing method provided according to an embodiment of the present disclosure. In combination with FIG. R_4n to R_4n+3, R0 to R4, where R is Register, shorthand for register) as an example, the data array D[5] to be processed by the first thread instance is {11, 13, 15, 17, 19}, Then 19 is the 4th data to be calculated, and 11 is the 0th data to be calculated (refer to DW0=11 in the first data processing mode in the figure, where DW is the abbreviation of Double-Word, translated It is a double word. In the embodiment of the present disclosure, each register Bank can store 8 double words (refer to DW0 to DW7 in the figure) as an example. 
Exemplarily, each double word can correspond to a thread for access), In the Wave32 data processing mode (refer to the first data processing mode in the figure), the row address alignment offset of the storage area is 1 row. If the hash bit processing is not considered, the data to be operated 11 is stored in the current task start row In the middle R0 position, the data to be calculated 19 (refer to DW0=19 in the first data processing mode in the figure) is stored in the R4 position of the next row. In the Wave128 data processing mode (refer to the second data processing mode in the figure), the corresponding logical offset is 4 lines (refer to segment 0 to segment 3 in the figure), if the hash bit processing is not considered, the data to be operated 11 is stored in the R0 position in the start line of the current task (refer to DW0=11 in the second data processing mode in the figure), and the data to be calculated 19 is stored in the R4 position of the current task start line with a logical offset of 4 lines (can be Refer to DW0=19 in the second data processing mode in the figure).
Exemplarily, an embodiment of the present disclosure provides, for reference, a way of generating the storage-area row-address alignment offset. Here the preset data processing mode is WAVE32, the data processing mode of the data to be operated on is WAVE128, the total number of intermediate-address bits of WAVE32 is 10, and the total number of intermediate-address bits of WAVE128 is 10 (the high-order bits may be zero-padded). The pseudocode is: base_align_mod4 = (wave_mode == wave32) ? 0 : (base & 0x3), where base is the starting base row address, in the given data processing mode, of the current task corresponding to the data; base can be decided and produced by an address-generation module, and it suffices that this address-configuration module can generate base, which is not restricted here. Following the above example, base & 0x3 yields the two low-order bits on which WAVE128 is not aligned with WAVE32 as the storage-area row-address alignment offset; the specific value depends on the developer's actual situation. The storage-area row-address alignment offset is then moved to the high-order bits of the interaction address, i.e. base_align_mod4 <<= const_addr_bit, where const_addr_bit is the total number of intermediate-address bits of WAVE32, 10 in this example, so the storage-area row-address alignment offset becomes the 11th and 12th bits of the interaction address; in other words, in this example the value of const_addr_bit equals the total number of intermediate-address bits of the preset data processing mode. The values of the remaining address bits (bits 2 to 9 of the WAVE128 address in this example), added to the logical offset address, then form the storage-area intermediate address in the interaction address of each piece of data. In another example, the interaction address may additionally include a hash bit (detailed later), in which case the value of const_addr_bit equals the total number of intermediate-address bits of the preset data processing mode plus the number of hash bits. Exemplarily, if the data processing mode of the data is the same as the preset data processing mode, the storage-area row-address alignment offset is 0. Exemplarily, the pseudocode for generating the storage-area intermediate address ac_addr1 is: ac_addr1 = ((base >> base_shift_bit) << 2) + reg_off, where reg_off is the logical offset address obtained by translating the instruction's logical address and base_shift_bit is the address-alignment shift. Exemplarily, taking the case where the padded bit counts of the WAVE32 and WAVE128 intermediate addresses differ by two bits (padded at the high end to align the two intermediate addresses), the preset data format is WAVE32, and compatibility between WAVE32 and WAVE128 is desired, the pseudocode for generating the address-alignment shift base_shift_bit is: base_shift_bit = (wave_mode == wave32) ? 0 : 2. Exemplarily, the pseudocode for generating the interaction address ac_addr2 is: ac_addr2 = base_align_mod4 | ac_addr1.
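For reference, the pseudocode above can be gathered into one small C function; it follows the disclosed formulas for base_align_mod4, base_shift_bit, ac_addr1, and ac_addr2 with WAVE32 as the preset mode, a 10-bit intermediate address, and no hash bit, and makes no claim beyond that worked example.

```c
#include <stdint.h>

typedef enum { WAVE32, WAVE64, WAVE128 } wave_mode_t;

#define CONST_ADDR_BIT 10u  /* intermediate-address bits of the preset mode (WAVE32) */

/* Interaction address without a hash bit, following the disclosed pseudocode:
 *   base_align_mod4 = (wave_mode == wave32) ? 0 : (base & 0x3);
 *   base_align_mod4 <<= const_addr_bit;
 *   base_shift_bit  = (wave_mode == wave32) ? 0 : 2;
 *   ac_addr1 = ((base >> base_shift_bit) << 2) + reg_off;
 *   ac_addr2 = base_align_mod4 | ac_addr1;
 * base    - starting base row address of the current task in its mode
 * reg_off - logical offset address translated from the instruction's logical address
 */
static uint32_t make_interaction_address(wave_mode_t wave_mode,
                                         uint32_t base, uint32_t reg_off)
{
    uint32_t base_align_mod4 = (wave_mode == WAVE32) ? 0u : (base & 0x3u);
    base_align_mod4 <<= CONST_ADDR_BIT;

    uint32_t base_shift_bit = (wave_mode == WAVE32) ? 0u : 2u;
    uint32_t ac_addr1 = ((base >> base_shift_bit) << 2) + reg_off;

    return base_align_mod4 | ac_addr1;  /* ac_addr2 */
}
```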
In one possible implementation, the interaction address includes a storage area row address alignment offset, a storage area intermediate address, and a hash value. Obtaining, in step S30, the interaction address corresponding to each data to be operated in each of the plurality of data groups to be operated may include: generating, according to the task identifier corresponding to each data group to be operated, the hash value corresponding to that data group, where data groups with adjacent task identifiers have different hash values. Exemplarily, the hash value may simply be a parity flag represented by one bit (0 for even, 1 for odd): taking the task identifier modulo 2 yields its parity, and the resulting 0 or 1 serves as the hash value. In one example, the lowest bit of the task identifier may also be used directly as the hash value. The way the task identifier is generated is not elaborated here; it can be generated when the plurality of data to be operated is divided into different data groups to be operated, and it suffices that it represents the computation order of the data groups. Of course, in one example the hash value may also be taken as the remainder of the task identifier with respect to any other value, in which case the number of hash bits changes accordingly and the value of const_addr_bit mentioned above is adapted; this is not limited here and can be set by developers according to the actual situation. Here, taskid denotes the task identifier. Then, according to the hash value corresponding to each data group to be operated, the interaction address corresponding to each data to be operated in that group is generated, where data to be operated in data groups with different hash values is stored in different physical storage areas. Exemplarily, the pseudocode for generating the hash value is as follows: taskid_hash = (taskid & 0x1) << (const_addr_bit - 1), where const_addr_bit is one bit larger than in the case without a hash bit, because the hash bit is added. The resulting interaction address is then: the storage area row address alignment offset in the high bits, the hash value, and the storage area intermediate address in the low bits. Exemplarily, when the interaction address ac_addr2 is generated with a hash value, the pseudocode may be as follows: ac_addr2 = base_align_mod4 | taskid_hash | ac_addr1.
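A minimal sketch of this task-identifier hash, assuming the same illustrative CONST_ADDR_BIT width as before; the function names are hypothetical, and only the two formulas come from the text.

```c
#include <stdint.h>
#include <stdio.h>

#define CONST_ADDR_BIT 10u   /* assumed; one bit wider than without the hash bit */

/* Parity-style hash of the task identifier, as in the pseudocode above. */
static uint32_t make_taskid_hash(uint32_t taskid)
{
    return (taskid & 0x1u) << (CONST_ADDR_BIT - 1u);
}

/* Interaction address with the hash bit folded in:
 * ac_addr2 = base_align_mod4 | taskid_hash | ac_addr1. */
static uint32_t make_interaction_addr_hashed(uint32_t base_align_mod4,
                                             uint32_t ac_addr1, uint32_t taskid)
{
    return base_align_mod4 | make_taskid_hash(taskid) | ac_addr1;
}

int main(void)
{
    printf("0x%x\n", make_interaction_addr_hashed(0x1u << CONST_ADDR_BIT, 0x24u, 5u));
    return 0;
}
```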
Referring to FIG. 4, which shows a reference schematic diagram of a data processing method provided according to an embodiment of the present disclosure: in this embodiment, data is stored differently under different hash values. Here the preset data processing mode is taken to be the second data processing mode and the register group to be a general-purpose data register group; the first data processing mode occupies four rows of the general-purpose data register group (see segment 0 to segment 3 in the figure). R0 to R_2N-1 in the figure (R standing for Register) denote 2N registers; only one storage area is shown, and it should be understood that the number of storage areas can be set arbitrarily and is not limited by the figure. In the figure, the storage layouts of the even tasks and odd tasks of the first data processing mode (distinguishable by the hash value above) are interleaved, and so are those of the second data processing mode. Because odd and even tasks occupy different bank positions, during the read stage odd and even tasks can read from multiple register columns simultaneously (multiple banks, where registers in the same bank share one communication channel), which helps improve data processing speed. The specific interleaving manner is not limited here; it suffices that different hash values correspond to different storage orders of the data to be operated.
In one possible implementation, the interaction address includes a storage area row address alignment offset, a storage area intermediate address, and a hash value. Obtaining, in step S30, the interaction address corresponding to each data to be operated in each of the plurality of data groups to be operated may include: generating the hash value corresponding to each data group to be operated according to the task identifier corresponding to that data group and the sampling identifier corresponding to that data group, where the sampling identifier indicates the different sampling points that each data group to be operated targets for processing. Then, according to the hash value corresponding to each data group to be operated, the interaction address corresponding to each data to be operated in that group is generated, where data to be operated in data groups with different hash values is stored in different physical storage areas. Exemplarily, the pseudocode for generating the hash value is as follows: taskid_hash = ((taskid & 0x1) ^ (samplerid & 0x1)) << (const_addr_bit - 1), where samplerid is the sampling identifier described above, which can be generated by the computing core; reference may be made to related technology, and this is not elaborated here. The data processing method provided by the embodiments of the present disclosure therefore accounts not only for the influence of different task identifiers on the storage bank position but also for that of the sampling identifier, which helps balance subsequent pipeline accesses to the data to be operated and reduce access conflicts.
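Under the same illustrative assumptions as the earlier sketches, the sampler-aware variant only changes how the hash bit is derived; samplerid here is a hypothetical parameter standing in for the sampling identifier produced by the computing core.

```c
#include <stdint.h>
#include <stdio.h>

#define CONST_ADDR_BIT 10u   /* assumed field width, as in the earlier sketches */

/* Hash bit derived from the parity of the task identifier XORed with the
 * parity of the sampling identifier, per the pseudocode above. */
static uint32_t make_task_sampler_hash(uint32_t taskid, uint32_t samplerid)
{
    return ((taskid & 0x1u) ^ (samplerid & 0x1u)) << (CONST_ADDR_BIT - 1u);
}

int main(void)
{
    /* Same task, different sampling points -> different hash bits. */
    printf("0x%x 0x%x\n", make_task_sampler_hash(4u, 0u), make_task_sampler_hash(4u, 1u));
    return 0;
}
```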
In one possible implementation, the register group includes one storage area, and obtaining the data to be operated stored in the register group based on the interaction address includes: determining, according to the interaction address corresponding to each data to be operated, the row address and column address of that data in its corresponding storage area, where the row address and column address indicate the position of a register in the storage area; and then accessing the corresponding register according to that row address and column address to obtain each data to be operated. Exemplarily, when the register group includes only one storage area, each interaction address corresponds to a unique row address and column address, and the corresponding register in the register group can be accessed directly through them to obtain the data to be operated. It suffices that each interaction address corresponds to a unique row address and column address; the specific mapping can be determined according to actual needs.
In one possible implementation, the register group includes multiple storage areas. In one example, obtaining the data to be operated stored in the register group based on the interaction address in step S200 may include: determining, for each of the plurality of data to be operated and according to the plurality of data to be operated, the interaction address and the storage area identifier corresponding to that data, where the storage area identifier is used to determine, when the plurality of data groups to be operated is allocated to multiple storage areas, the storage area corresponding to each data group to be operated. Exemplarily, each of the data groups to be operated may have a storage area identifier; the specific allocation rule is not limited here, and the data groups may, for example, be allocated arbitrarily or weighted according to the storage space of each storage area. In one example, allocating the plurality of data groups to be operated to the at least one storage area may include: distributing each data group to be operated evenly across the storage areas. Taking even distribution, the WAVE32 processing format, and four storage areas as an example, the 32 threads of a WAVE32 (each thread corresponding to one data to be operated) can be split into four parts, with every 8 consecutive threads forming a small thread block carrying a thread block number: the thread block for threads 0-7 is stored in storage area 0, threads 8-15 in storage area 1, threads 16-23 in storage area 2, and threads 24-31 in storage area 3. Exemplarily, with four storage areas, the storage area identifier is the low two bits of the thread block number, indicating which storage area the block is stored in. Then, according to the storage area identifier corresponding to each data to be operated, the storage area corresponding to that data is determined; exemplarily, the storage area identifiers correspond one-to-one with the storage areas. Next, according to the interaction address corresponding to each data to be operated, the row address and column address of that data in its corresponding storage area are determined, where the row address and column address indicate the position of a register in the storage area. Finally, according to that row address and column address, the register in the corresponding storage area is accessed to obtain each data to be operated. Exemplarily, the row address and column address respectively indicate the row and column of the register to be accessed within the register group, so that it can be accessed accurately.
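The even-split mapping from threads to storage areas can be sketched as follows; the 8-thread block size and the four storage areas mirror the example in the text and are not fixed by the method itself.

```c
#include <stdint.h>
#include <stdio.h>

#define THREADS_PER_BLOCK 8u    /* example: 32 WAVE32 threads split into blocks of 8 */
#define NUM_STORAGE_AREAS 4u    /* example: four storage areas */

/* Storage area identifier = low two bits of the thread block number. */
static uint32_t storage_area_of_thread(uint32_t thread_id)
{
    uint32_t block_id = thread_id / THREADS_PER_BLOCK;   /* thread block number */
    return block_id & (NUM_STORAGE_AREAS - 1u);
}

int main(void)
{
    for (uint32_t t = 0; t < 32; ++t)
        printf("thread %2u -> storage area %u\n", t, storage_area_of_thread(t));
    return 0;
}
```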
In one possible implementation, obtaining in step S30 the interaction address corresponding to each data to be operated in each of the plurality of data groups to be operated may include: generating the interaction address corresponding to each data to be operated in each data group to be operated; and generating, according to the thread number corresponding to the data to be operated and the total number of threads corresponding to the preset data processing mode, the segment number corresponding to each data to be operated, where the segment number represents the segment offset produced when that data is stored under the preset data processing mode. Exemplarily, each data to be operated may correspond to one thread: if the data processing mode of the data is WAVE32, the thread number may range from 0 to 31; if it is WAVE128, from 0 to 127. As for the total number of threads corresponding to the preset data processing mode, it is 32 for WAVE32 and 128 for WAVE128. The segment number can be obtained by dividing the thread number directly by the total number of threads per segment: for example, with thread number 30 and a total of 32 threads, the segment number is 0 (30/32 = 0); with thread number 66 (e.g. the data processing mode of that data is WAVE128) and a total of 32 threads, the segment number is 2 (66/32 = 2). As shown in FIG. 4, the first data processing mode produces a segment offset when adapted to the preset data processing mode. Taking the first data processing mode as WAVE128 and the preset data processing mode as WAVE32: WAVE128 has 128 threads in total and WAVE32 has 32, so processing the WAVE128 data format using the WAVE32 data format is equivalent to one WAVE128 comprising four WAVE32 segments. If each register row can store the data to be operated for one WAVE32, four rows (or four segments) are needed in total, so a WAVE128 comprises four segments of data to be operated numbered 0 to 3 (see segment 0 to segment 3 in the figure). The embodiments of the present disclosure handle this situation by introducing the segment number, so that data to be operated in different data processing modes can be processed compatibly.
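The segment number is simply an integer division of the thread number by the preset mode's thread count; a minimal sketch matching the worked examples above (30/32 = 0, 66/32 = 2):

```c
#include <stdint.h>
#include <stdio.h>

/* Segment number of a thread under the preset data processing mode. */
static uint32_t segment_of_thread(uint32_t thread_id, uint32_t preset_threads)
{
    return thread_id / preset_threads;   /* integer division */
}

int main(void)
{
    printf("%u %u\n", segment_of_thread(30, 32), segment_of_thread(66, 32));  /* 0 2 */
    return 0;
}
```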
Exemplarily, the interaction address is first processed to obtain physical_addr1; the pseudocode is as follows: physical_addr1 = ac_addr2 + burst_offset, where burst_offset is the accumulated value corresponding to a repeat instruction or burst instruction in the related art. Exemplarily, the pipeline can obtain the above physical_addr1 and then process it to obtain the parameters needed to access the data to be operated. Exemplarily, the pseudocode is as follows: base_align_mod4 = (physical_addr1 >> const_addr_bit) & 0x3, taskid_hash = ((physical_addr1 >> (const_addr_bit - 1)) & 0x1) << 1, segmentid_hash = (4 - segmentid), physical_addr2 = physical_addr1 & ((0x1 << (const_addr_bit - 1)) - 0x1), where segmentid is the segment number, segment_num is the total number of segments in a wave, and segmentid_hash (the segment number hash) plays a role similar to that of the hash value above: it improves the balance of pipeline accesses and thus helps reduce the probability of access conflicts. In one example, determining the row address and column address of each data to be operated in its corresponding storage area according to its interaction address includes: generating the row address of each data to be operated in its corresponding storage area according to its interaction address and segment number. Exemplarily, the pseudocode for generating the row address line_addr is as follows: line_addr = ((physical_addr2 >> 2) << base_shift_bit) + segmentid + base_align_mod4. Then the column address of each data to be operated in its corresponding storage area is generated according to its interaction address, segment number and hash value, or according to its interaction address and segment number. Exemplarily, the pseudocode for generating the column address bank_addr is as follows: bank_addr = (taskid_hash + segmentid_hash + (physical_addr2 & 0x3)) % bank_num, where bank_num is the total number of banks in one storage area. In one example, to better support pipeline operation, the reading of the data to be operated can be placed inside the pipeline: when the pipeline is about to execute an instruction to be operated, it first reads the data to be operated and then performs the operation. The segment number described above better indicates accesses from data processing modes whose thread count exceeds that of the preset data processing mode, so that data to be operated in different data processing modes can be read compatibly. In another example, when different pipelines simultaneously read from or write to the same port of a register group, arbitration can be decided according to the priorities of the pipelines themselves; in one example, read operations and write operations can also be arbitrated separately. The priorities and the specific arbitration scheme are not limited here and can be set by developers according to actual conditions. In the embodiments of the present disclosure, when the pipeline finally accesses the register group, the unaligned part can be removed based on physical_addr1, and the storage area row address alignment offset, the hash value, and the storage area intermediate address can then be separated out. The physical address physical_addr2 can be the storage area intermediate address: the intermediate address is shifted right and then left by the number of bits required for alignment, and the segment number, the hash value, and the storage area row address alignment offset are added to obtain the row address finally accessed. The column address is obtained by summing the hash value, the segment number hash, and the low base_shift_bit bits of physical_addr2, and taking the remainder with respect to the number of banks, as in the formula above. In the embodiments of the present disclosure, the data and instructions to be operated fetched on each pipeline can be executed dynamically and cyclically multiple times in the form of vector operations, giving higher resource utilization. In terms of hardware implementation, each computing core can include WAVE scheduling and instruction scheduling/execution units to achieve compatible management of different WAVEs; scheduling can be performed according to the execution granularity and execution state of each WAVE, which improves scheduling efficiency and also helps raise the utilization of each pipeline.
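Putting the field decomposition and the two address formulas together gives the following sketch. Again, CONST_ADDR_BIT and BANK_NUM are illustrative assumptions rather than values fixed by the text; the field splits and the line_addr/bank_addr expressions follow the pseudocode above, with the physical_addr2 mask written with explicit parentheses.

```c
#include <stdint.h>
#include <stdio.h>

#define CONST_ADDR_BIT 10u   /* assumed intermediate-address field width */
#define BANK_NUM        4u   /* assumed number of banks per storage area */

typedef struct { uint32_t line_addr; uint32_t bank_addr; } reg_location_t;

static reg_location_t locate_register(uint32_t ac_addr2, uint32_t burst_offset,
                                      uint32_t segmentid, uint32_t base_shift_bit)
{
    uint32_t physical_addr1 = ac_addr2 + burst_offset;

    /* Split the physical address back into its fields. */
    uint32_t base_align_mod4 = (physical_addr1 >> CONST_ADDR_BIT) & 0x3u;
    uint32_t taskid_hash     = ((physical_addr1 >> (CONST_ADDR_BIT - 1u)) & 0x1u) << 1;
    uint32_t segmentid_hash  = 4u - segmentid;
    uint32_t physical_addr2  = physical_addr1 & ((0x1u << (CONST_ADDR_BIT - 1u)) - 0x1u);

    /* Row and bank (column) of the register to access. */
    reg_location_t loc;
    loc.line_addr = ((physical_addr2 >> 2) << base_shift_bit) + segmentid + base_align_mod4;
    loc.bank_addr = (taskid_hash + segmentid_hash + (physical_addr2 & 0x3u)) % BANK_NUM;
    return loc;
}

int main(void)
{
    reg_location_t loc = locate_register(0x424u, 0u, 2u, 2u);
    printf("line %u, bank %u\n", loc.line_addr, loc.bank_addr);
    return 0;
}
```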
With the data processing method provided by the embodiments of the present disclosure, each group of computing cores can directly support the WAVE32 data processing mode, and can also support WAVE64 by looping twice and WAVE128 by looping four times. This allows the graphics rendering pipeline in the graphics processor to execute the producer-consumer model of the related art in WAVE32 mode, while also supporting simultaneous processing of WAVEs in multiple modes; depending on the specific application scenario, WAVEs can be dynamically configured and assembled for data processing.
It can be understood that the method embodiments mentioned above in the present disclosure can be combined with one another to form combined embodiments without departing from their principles and logic; owing to space limitations, this is not elaborated here. Those skilled in the art can understand that, in the methods of the specific implementations above, the specific execution order of the steps should be determined by their functions and possible internal logic.
In addition, the present disclosure also provides an electronic device, a computer-readable and writable data storage medium, and a program, all of which can be used to implement any of the data processing methods provided by the present disclosure; for the corresponding technical solutions and descriptions, refer to the corresponding records in the method section, which are not repeated here.
Referring to FIG. 5, which shows a block diagram of a graphics processor provided according to an embodiment of the present disclosure: the graphics processor 100 includes a computing core 110 and a register group 120 connected to the computing core. In one example, the graphics processor 100 may be connected to a host. The computing core is configured to: receive an instruction to be operated; determine, according to the instruction to be operated, the operation mode and the interaction address corresponding to the data to be operated, and obtain, based on the interaction address, the data to be operated stored in the register group, where the interaction address includes a storage area row address alignment offset and a storage area intermediate address, the storage area row address alignment offset indicates the address offset of the data to be operated in the storage area under the preset data processing mode, and the storage area intermediate address indicates the intermediate address of the data to be operated in its corresponding storage area; and generate an operation result according to the data to be operated and the operation mode.
In one possible implementation, the register group includes at least one storage area, and the computing core is further configured to: receive a plurality of data to be operated; divide the plurality of data to be operated into a plurality of data groups to be operated according to the data processing modes corresponding to the plurality of data to be operated, where data groups to be operated corresponding to different data processing modes contain different amounts of data to be operated; and allocate the plurality of data groups to be operated to the at least one storage area to obtain the interaction address corresponding to each data to be operated in each of the data groups to be operated, where each of the data groups to be operated is allocated to at least one storage area.
In one possible implementation, obtaining the interaction address corresponding to each data to be operated in each of the plurality of data groups to be operated includes: taking, from the base address of the logical address corresponding to each data to be operated, a preset number of address bits starting from the lowest bit as the storage area row address alignment offset in the interaction address corresponding to that data; and shifting the remaining address bits left by the preset number of bits and adding the logical offset address, the resulting sum being the storage area intermediate address in the interaction address corresponding to that data, where the logical offset address is the address offset of the logical address corresponding to each data to be operated.
In one possible implementation, dividing the plurality of data to be operated into a plurality of data groups to be operated according to the data processing modes corresponding to the plurality of data to be operated includes: dividing the plurality of data to be operated, according to their corresponding data processing modes, into a plurality of data groups to be operated each associated with a task identifier, where the task identifier is used to map each of the data groups to be operated to a base address position in the at least one storage area.
In one possible implementation, the interaction address includes a storage area row address alignment offset, a storage area intermediate address, and a hash value, and obtaining the interaction address corresponding to each data to be operated in each of the plurality of data groups to be operated includes: generating, according to the task identifier corresponding to each data group to be operated, the hash value corresponding to that data group, where data groups with adjacent task identifiers have different hash values; and generating, according to the hash value corresponding to each data group to be operated, the interaction address corresponding to each data to be operated in that group, where data to be operated in data groups with different hash values is stored in different physical storage areas.
In one possible implementation, the interaction address includes a storage area row address alignment offset, a storage area intermediate address, and a hash value, and obtaining the interaction address corresponding to each data to be operated in each of the plurality of data groups to be operated includes: generating the hash value corresponding to each data group to be operated according to the task identifier corresponding to that data group and the sampling identifier corresponding to that data group, where the sampling identifier indicates the different sampling points that each data group to be operated targets for processing; and generating, according to the hash value corresponding to each data group to be operated, the interaction address corresponding to each data to be operated in that group, where data to be operated in data groups with different hash values is stored in different physical storage areas.
In one possible implementation, the register group includes one storage area, and obtaining the data to be operated stored in the register group based on the interaction address includes: determining, according to the interaction address corresponding to each data to be operated, the row address and column address of that data in its corresponding storage area, where the row address and column address indicate the position of a register in the storage area; and accessing the corresponding register according to that row address and column address to obtain each data to be operated.
In one possible implementation, the register group includes multiple storage areas, and obtaining the data to be operated stored in the register group based on the interaction address includes: determining, for each of the plurality of data to be operated and according to the plurality of data to be operated, the interaction address and storage area identifier corresponding to that data, where the storage area identifier is used to determine, when the plurality of data groups to be operated is allocated to multiple storage areas, the storage area corresponding to each data group to be operated; determining, according to the storage area identifier corresponding to each data to be operated, the storage area corresponding to that data; determining, according to the interaction address corresponding to each data to be operated, the row address and column address of that data in its corresponding storage area, where the row address and column address indicate the position of a register in the storage area; and accessing the register in the corresponding storage area according to that row address and column address to obtain each data to be operated.
In one possible implementation, obtaining the interaction address corresponding to each data to be operated in each of the plurality of data groups to be operated includes: generating the interaction address corresponding to each data to be operated in each data group to be operated; and generating, according to the thread number corresponding to the data to be operated and the total number of threads per segment corresponding to the preset data processing mode, the segment number corresponding to each data to be operated, where the segment number represents the segment offset produced when that data is stored under the preset data processing mode. Determining the row address and column address of each data to be operated in its corresponding storage area according to its interaction address includes: generating the row address of each data to be operated in its corresponding storage area according to its interaction address and segment number; and generating the column address of each data to be operated in its corresponding storage area according to its interaction address, segment number and hash value, or according to its interaction address and segment number.
In one possible implementation, allocating the plurality of data groups to be operated to the at least one storage area includes: distributing, for each of the plurality of data groups to be operated, that data group evenly to each storage area.
In one possible implementation, obtaining the data to be operated stored in the register group based on the interaction address includes: obtaining, through multiple pipelines of the graphics processor, the data to be operated stored in the register group based on the interaction address; and generating an operation result according to the data to be operated and the operation mode includes: generating, through the multiple pipelines of the graphics processor, the operation result corresponding to each data to be operated according to the data to be operated and the operation mode.
In one possible implementation, generating, through the multiple pipelines of the graphics processor, the operation result corresponding to each data to be operated according to the data to be operated and the operation mode includes: when different pipelines among the multiple pipelines of the graphics processor simultaneously access a target port of the register group, arbitrating according to the priorities of the different pipelines and/or the priorities of the read and write operations to determine the target pipeline corresponding to the target port; and generating, in sequence through the target pipeline and the other pipelines among the different pipelines, the operation result corresponding to each data to be operated according to the data to be operated and the operation mode.
This method has a specific technical relationship with the internal structure of a computer system and can solve the technical problem of how to improve hardware computing efficiency or execution effect (including reducing data storage volume, reducing data transmission volume, and increasing hardware processing speed), thereby obtaining the technical effect of improving the internal performance of the computer system in conformity with the laws of nature.
In some embodiments, the functions of, or the modules included in, the apparatus provided by the embodiments of the present disclosure can be used to execute the methods described in the method embodiments above; for their specific implementation, refer to the descriptions of the method embodiments above, which are not repeated here for brevity.
Embodiments of the present disclosure also propose a computer-readable and writable data storage medium on which computer program instructions and data to be processed are stored, the computer program instructions implementing the above method when executed by a processor. The computer-readable and writable data storage medium may be a volatile or non-volatile computer-readable and writable data storage medium.
Embodiments of the present disclosure also propose an electronic device, including: a processor; and a memory for storing processor-executable instructions, wherein the processor is configured to invoke the instructions stored in the memory to execute the above method.
Embodiments of the present disclosure also provide a computer program product, including computer-readable code, or a non-volatile computer-readable and writable data storage medium carrying computer-readable code; when the computer-readable code runs in a processor of an electronic device, the processor in the electronic device executes the above method.
The electronic device may be provided as a terminal device, a server, or a device in another form.
Referring to FIG. 6, which shows a block diagram of an electronic device 1900 according to an embodiment of the present disclosure: for example, the electronic device 1900 may be provided as a server or a terminal device. Referring to FIG. 6, the electronic device 1900 includes a processing component 1922, which further includes one or more processors, and memory resources represented by a memory 1932 for storing instructions executable by the processing component 1922, such as application programs. The application programs stored in the memory 1932 may include one or more modules, each corresponding to a set of instructions. In addition, the processing component 1922 is configured to execute instructions to perform the above method.
The electronic device 1900 may also include a power component 1926 configured to perform power management of the electronic device 1900, a wired or wireless network interface 1950 configured to connect the electronic device 1900 to a network, and an input/output interface 1958. The electronic device 1900 can operate based on an operating system stored in the memory 1932, such as the Microsoft server operating system (Windows Server™), the graphical-user-interface-based operating system from Apple (Mac OS X™), the multi-user multi-process computer operating system (Unix™), the free and open-source Unix-like operating system (Linux™), the open-source Unix-like operating system (FreeBSD™), or the like.
In an exemplary embodiment, a non-volatile computer-readable and writable data storage medium is also provided, such as the memory 1932 including computer program instructions, which can be executed by the processing component 1922 of the electronic device 1900 to complete the above method.
The present disclosure may be a system, a method and/or a computer program product. The computer program product may include a computer-readable and writable data storage medium carrying computer-readable program instructions for causing a processor to implement various aspects of the present disclosure.
A computer-readable and writable data storage medium may be a tangible device that can hold and store instructions for use by an instruction execution device. It may be, for example (but not limited to), an electrical storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device, or any suitable combination of the above. More specific examples (a non-exhaustive list) of computer-readable and writable data storage media include: a portable computer disk, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), static random access memory (SRAM), portable compact disc read-only memory (CD-ROM), digital versatile disc (DVD), memory stick, floppy disk, a mechanically encoded device such as a punch card or a raised-in-groove structure on which instructions are stored, and any suitable combination of the above. As used here, a computer-readable and writable data storage medium is not to be construed as a transient signal per se, such as a radio wave or other freely propagating electromagnetic wave, an electromagnetic wave propagating through a waveguide or other transmission medium (for example, a light pulse through a fiber-optic cable), or an electrical signal transmitted over a wire.
The computer-readable program instructions described here can be downloaded from a computer-readable and writable data storage medium to each computing/processing device, or downloaded to an external computer or external storage device through a network such as the Internet, a local area network, a wide area network and/or a wireless network. The network may include copper transmission cables, optical fiber transmission, wireless transmission, routers, firewalls, switches, gateway computers and/or edge servers. A network adapter card or network interface in each computing/processing device receives the computer-readable program instructions from the network and forwards them for storage in the computer-readable and writable data storage medium in the respective computing/processing device.
The computer program instructions used to perform the operations of the present disclosure may be assembly instructions, instruction set architecture (ISA) instructions, machine instructions, machine-dependent instructions, microcode, firmware instructions, state-setting data, or source code or object code written in any combination of one or more programming languages, including object-oriented programming languages such as Smalltalk and C++, and conventional procedural programming languages such as the "C" language or similar programming languages. The computer-readable program instructions may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. In cases involving a remote computer, the remote computer may be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computer (for example, through the Internet using an Internet service provider). In some embodiments, an electronic circuit such as a programmable logic circuit, a field-programmable gate array (FPGA) or a programmable logic array (PLA) is personalized by using state information of the computer-readable program instructions, and this electronic circuit can execute the computer-readable program instructions, thereby implementing various aspects of the present disclosure.
Aspects of the present disclosure are described here with reference to flowcharts and/or block diagrams of methods, apparatuses (systems) and computer program products according to embodiments of the disclosure. It should be understood that each block of the flowcharts and/or block diagrams, and combinations of blocks in the flowcharts and/or block diagrams, can be implemented by computer-readable program instructions.
These computer-readable program instructions may be provided to a processor of a general-purpose computer, a special-purpose computer or another programmable data processing apparatus to produce a machine, so that when executed by the processor of the computer or other programmable data processing apparatus, these instructions produce an apparatus that implements the functions/actions specified in one or more blocks of the flowcharts and/or block diagrams. These computer-readable program instructions may also be stored in a computer-readable and writable data storage medium; these instructions cause the computer, the programmable data processing apparatus and/or other devices to operate in a specific manner, so that the computer-readable medium storing the instructions includes an article of manufacture that includes instructions implementing various aspects of the functions/actions specified in one or more blocks of the flowcharts and/or block diagrams.
The computer-readable program instructions may also be loaded onto a computer, another programmable data processing apparatus or another device, so that a series of operational steps are performed on the computer, the other programmable data processing apparatus or the other device to produce a computer-implemented process, such that the instructions executed on the computer, the other programmable data processing apparatus or the other device implement the functions/actions specified in one or more blocks of the flowcharts and/or block diagrams.
The flowcharts and block diagrams in the figures illustrate the architecture, functionality and operation of possible implementations of systems, methods and computer program products according to multiple embodiments of the present disclosure. In this regard, each block in a flowchart or block diagram may represent a module, a program segment or a portion of instructions that contains one or more executable instructions for implementing the specified logical function. In some alternative implementations, the functions noted in the blocks may occur out of the order noted in the figures. For example, two consecutive blocks may in fact be executed substantially in parallel, or they may sometimes be executed in the reverse order, depending on the functionality involved. It should also be noted that each block of the block diagrams and/or flowcharts, and combinations of blocks in the block diagrams and/or flowcharts, can be implemented by a dedicated hardware-based system that performs the specified functions or actions, or by a combination of dedicated hardware and computer instructions.
The computer program product can be implemented specifically by hardware, software or a combination thereof. In one optional embodiment, the computer program product is embodied as a computer storage medium; in another optional embodiment, the computer program product is embodied as a software product, such as a software development kit (SDK) or the like.
The above descriptions of the various embodiments tend to emphasize the differences between them; for their common or similar points, reference may be made to one another, and for brevity they are not repeated here.
Those skilled in the art can understand that, in the methods of the specific implementations above, the order in which the steps are written does not imply a strict execution order or impose any limitation on the implementation process; the specific execution order of the steps should be determined by their functions and possible internal logic.
If the technical solution of this application involves personal information, a product applying this technical solution clearly informs users of the personal information processing rules and obtains the individual's voluntary consent before processing personal information. If the technical solution of this application involves sensitive personal information, a product applying this technical solution obtains the individual's separate consent before processing sensitive personal information and at the same time meets the requirement of "explicit consent". For example, at a personal information collection device such as a camera, a clear and prominent sign is set up to inform people that they have entered the personal information collection range and that personal information will be collected; if an individual voluntarily enters the collection range, this is deemed consent to the collection of their personal information. Alternatively, on a personal information processing device, where the personal information processing rules are communicated by obvious signs/information, the individual's authorization is obtained through pop-up messages or by asking the individual to upload their personal information themselves; the personal information processing rules may include information such as the personal information processor, the purpose of personal information processing, the processing method, and the types of personal information processed.
The embodiments of the present disclosure have been described above. The foregoing description is exemplary, not exhaustive, and is not limited to the disclosed embodiments. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the described embodiments. The terminology used herein was chosen to best explain the principles of the embodiments, their practical applications or improvements over technology in the market, or to enable others of ordinary skill in the art to understand the embodiments disclosed herein.
Claims (14)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310612804.6A CN116360858B (en) | 2023-05-26 | 2023-05-26 | Data processing method, graphics processor, electronic device and storage medium |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310612804.6A CN116360858B (en) | 2023-05-26 | 2023-05-26 | Data processing method, graphics processor, electronic device and storage medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN116360858A (en) | 2023-06-30 |
CN116360858B (en) | 2023-08-29 |
Family
ID=86922445
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202310612804.6A Active CN116360858B (en) | 2023-05-26 | 2023-05-26 | Data processing method, graphics processor, electronic device and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116360858B (en) |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110688158A (en) * | 2017-07-20 | 2020-01-14 | 上海寒武纪信息科技有限公司 | Computing device and processing system of neural network |
CN111381872A (en) * | 2018-12-28 | 2020-07-07 | 上海寒武纪信息科技有限公司 | Operation method, device and related product |
CN111897579A (en) * | 2020-08-18 | 2020-11-06 | 腾讯科技(深圳)有限公司 | Image data processing method, image data processing device, computer equipment and storage medium |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
IT201700082213A1 (en) * | 2017-07-19 | 2019-01-19 | Univ Degli Studi Di Siena | PROCEDURE FOR AUTOMATIC GENERATION OF PARALLEL CALCULATION CODE |
- 2023-05-26 CN CN202310612804.6A patent/CN116360858B/en active Active
Patent Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110688158A (en) * | 2017-07-20 | 2020-01-14 | 上海寒武纪信息科技有限公司 | Computing device and processing system of neural network |
CN111381872A (en) * | 2018-12-28 | 2020-07-07 | 上海寒武纪信息科技有限公司 | Operation method, device and related product |
CN111897579A (en) * | 2020-08-18 | 2020-11-06 | 腾讯科技(深圳)有限公司 | Image data processing method, image data processing device, computer equipment and storage medium |
Also Published As
Publication number | Publication date |
---|---|
CN116360858A (en) | 2023-06-30 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN107766148B (en) | Heterogeneous cluster and task processing method and device | |
CN114008586B (en) | Transpose operation using array of processing elements | |
CN115422098B (en) | GPU memory access adaptive optimization method and device based on extended page table | |
US20150046684A1 (en) | Technique for grouping instructions into independent strands | |
CN116431099B (en) | Data processing method, multiple input-output queue circuit and storage medium | |
US8941674B2 (en) | System and method for efficient resource management of a signal flow programmed digital signal processor code | |
US10229045B2 (en) | Conditional stack frame allocation | |
CN116467061B (en) | A method, device, storage medium and electronic equipment for task execution | |
CN110909527A (en) | Operation method, device, electronic device, and storage medium of text processing model | |
US9760282B2 (en) | Assigning home memory addresses to function call parameters | |
CN112631955A (en) | Data processing method, data processing device, electronic device, and medium | |
US9262162B2 (en) | Register file and computing device using the same | |
US20220405135A1 (en) | Scheduling in a container orchestration system utilizing hardware topology hints | |
CN116360858B (en) | Data processing method, graphics processor, electronic device and storage medium | |
US11983128B1 (en) | Multidimensional and multiblock tensorized direct memory access descriptors | |
EP4430478A1 (en) | Dynamic memory reconfiguration | |
US11550736B1 (en) | Tensorized direct memory access descriptors | |
US11620120B1 (en) | Configuration of secondary processors | |
CN117176963B (en) | Virtualized video encoding and decoding system and method, electronic equipment and storage medium | |
JP2018120323A (en) | Information processing system, information processing apparatus, peripheral device, data transfer method and data transfer program | |
CN116152043A (en) | Memory management method and device based on image processing and electronic equipment | |
CN117573205A (en) | SIMT-based register allocation method, device, equipment and storage medium | |
CN114546329A (en) | Method, apparatus, and medium for implementing data parity reordering |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
CP03 | Change of name, title or address | | Address after: B655, 4th Floor, Building 14, Cuiwei Zhongli, Haidian District, Beijing, 100036; Patentee after: Mole Thread Intelligent Technology (Beijing) Co.,Ltd.; Country or region after: China; Address before: 209, 2nd Floor, No. 31 Haidian Street, Haidian District, Beijing; Patentee before: Moore Threads Technology Co., Ltd.; Country or region before: China |