CN114116553B - Data processing device, method and system - Google Patents
- Publication number
- CN114116553B
- Authority
- CN
- China
- Prior art keywords
- access request
- transmission
- data
- transmission access
- data processing
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F13/00—Interconnection of, or transfer of information or other signals between, memories, input/output devices or central processing units
- G06F13/14—Handling requests for interconnection or transfer
- G06F13/20—Handling requests for interconnection or transfer for access to input/output bus
- G06F13/28—Handling requests for interconnection or transfer for access to input/output bus using burst mode transfer, e.g. direct memory access DMA, cycle steal
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F13/00—Interconnection of, or transfer of information or other signals between, memories, input/output devices or central processing units
- G06F13/14—Handling requests for interconnection or transfer
- G06F13/16—Handling requests for interconnection or transfer for access to memory bus
- G06F13/1668—Details of memory controller
- G06F13/1673—Details of memory controller using buffers
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Bus Control (AREA)
Abstract
Description
Technical Field
Embodiments of the present disclosure relate to a data processing device, a data processing method, and a data processing system.
Background
Direct memory access (DMA) is a mechanism for fast data transfer in computer systems. DMA transfers data from one address space to another. The central processing unit (CPU) sets up the DMA transfer operation, while the transfer itself is executed and completed directly by the DMA controller. The DMA controller takes over control of the system bus from the CPU completely and does not depend on the CPU during the transfer. DMA transfers mainly cover four cases, which are essentially the same: each moves data from one storage area to another. The four cases are peripheral to memory (Device to Host, D2H), memory to peripheral (H2D), memory to memory (H2H), and peripheral to peripheral (D2D), where memory is considered to belong to the host side.
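The four cases differ only in which side the source and destination belong to. The following is a minimal sketch of that naming scheme; the names `Side` and `dma_case` are hypothetical and are introduced here only for illustration.

```python
from enum import Enum


class Side(Enum):
    """Which side of the system a storage area belongs to (illustrative)."""
    HOST = "H"    # system memory is treated as the host side
    DEVICE = "D"  # a peripheral is the device side


def dma_case(source: Side, destination: Side) -> str:
    """Return the label (D2H, H2D, H2H or D2D) for a source/destination pair."""
    return f"{source.value}2{destination.value}"
```

For example, `dma_case(Side.DEVICE, Side.HOST)` yields the label for a peripheral-to-memory transfer.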
Summary of the Invention
At least one embodiment of the present disclosure provides a data processing device, including:
a computing unit configured to generate a transmission access request in response to data that needs to be transmitted;
a transmission access request buffer configured to buffer the transmission access request; and
a command processing unit configured to, in response to the transmission access request buffered in the transmission access request buffer, provide the transmission access request to a data transmission control device, whereby the data transmission control device responds to the transmission access request by transmitting the data that needs to be transmitted.
For example, in the data processing device according to at least one embodiment of the present disclosure, the computing unit is further configured to notify a central processing unit outside the data processing device after generating the transmission access request.
For example, in the data processing device according to at least one embodiment of the present disclosure, the computing unit is further configured to, after generating the transmission access request, query whether the transmission access request buffer is full and, in response to the transmission access request buffer not being full, write the transmission access request into the transmission access request buffer.
For example, in the data processing device according to at least one embodiment of the present disclosure, the transmission access request includes a transmission source address, a transmission destination address, and a transmission data size.
For example, in the data processing device according to at least one embodiment of the present disclosure, the transmission access request further includes a transmission status, which is one of pending transmission, transmission in progress, transmission completed, or transmission failed.
For example, the data processing device according to at least one embodiment of the present disclosure further includes a command buffer configured to buffer operation instructions for the command processing unit, wherein the command processing unit is further configured to, when at least one operation instruction is buffered in the command buffer and at least one transmission access request is buffered in the transmission access request buffer, process the at least one transmission access request buffered in the transmission access request buffer first.
For example, in the data processing device according to at least one embodiment of the present disclosure, the command processing unit is further configured to, after processing each operation instruction in the command buffer, query whether at least one transmission access request is buffered in the transmission access request buffer and, in response to a pending transmission access request being buffered there, process the pending transmission access request before returning to process the other operation instructions in the command buffer.
For example, in the data processing device according to at least one embodiment of the present disclosure, the transmission access request buffer is a first-in first-out buffer.
For example, in the data processing device according to at least one embodiment of the present disclosure, the command processing unit is further configured to provide the transmission access request to the data transmission control device located outside the data processing device.
At least one embodiment of the present disclosure further provides a data processing system, including any of the data processing devices described above and the data transmission control device described above, the data transmission control device being configured to respond to the transmission access request by transmitting the data that needs to be transmitted.
For example, the data processing system according to at least one embodiment of the present disclosure further includes a central processing unit, wherein the computing unit is further configured to notify the central processing unit after generating the transmission access request.
For example, in the data processing system according to at least one embodiment of the present disclosure, the central processing unit is configured to keep checking the completion status of the transmission access request after receiving the notification about the transmission access request.
For example, in the data processing system according to at least one embodiment of the present disclosure, the data transmission control device includes a direct memory access device.
At least one embodiment of the present disclosure further provides a data processing method, including: generating a transmission access request in response to data that needs to be transmitted; buffering the transmission access request; in response to the buffered transmission access request, providing the transmission access request to a data transmission control device; and causing the data transmission control device to respond to the transmission access request by transmitting the data that needs to be transmitted.
For example, the data processing method according to at least one embodiment of the present disclosure further includes: buffering operation instructions; and causing a command processing unit that executes both the operation instructions and the transmission access request to process the transmission access request first.
Brief Description of the Drawings
To explain the technical solutions of the embodiments of the present disclosure more clearly, the drawings of the embodiments are briefly introduced below. The drawings described below relate only to some embodiments of the present disclosure and do not limit the present disclosure.
Fig. 1 shows a schematic diagram of a computer system;
Fig. 2 shows a schematic diagram of a data processing device according to at least one embodiment of the present disclosure;
Fig. 3 shows a schematic diagram of a data processing system according to at least one embodiment of the present disclosure;
Fig. 4 shows a schematic diagram of a data processing method according to at least one embodiment of the present disclosure.
Detailed Description
To make the purposes, technical solutions, and advantages of the embodiments of the present disclosure clearer, the technical solutions of the embodiments are described clearly and completely below with reference to the drawings of the embodiments. The described embodiments are some, not all, of the embodiments of the present disclosure. All other embodiments that a person of ordinary skill in the art obtains from the described embodiments without creative effort fall within the protection scope of the present disclosure.
A computer system may include one or more central processing units (CPUs) and one or more deep computing units, for example graphics processing units (GPUs), tensor processing units (TPUs), and neural processing units (NPUs). To maximize overall system performance, the deep computing units and the CPU need to cooperate closely. For example, when a deep computing unit performs large-scale matrix operations, large batches of data must be transferred frequently. In this situation, DMA offers higher transfer efficiency.
Fig. 1 shows a computer system that includes a central processing unit (CPU) 101, a deep computing unit 100, a DMA controller 103, and a bus 104. The central processing unit (CPU) 101, the deep computing unit 100, and the DMA controller 103 communicate with one another through the bus 104.
When data required or produced by the deep computing unit 100 needs to be transferred by DMA, the following steps may be used, for example.
(1) The deep computing unit 100 performs data computation according to instructions and notifies the central processing unit 101 that the data to be transferred is ready. The data may be transferred from the deep computing unit 100 to memory or another peripheral, or from memory or another peripheral to the deep computing unit 100.
(2) The central processing unit 101, following the instructions of the application currently running in the system, issues a DMA transmission access request through an application programming interface (API) of the operating system. For example, this DMA transmission access request is written to a run queue of the operating system.
(3) The run queue writes the DMA transmission access request, together with other requests generated while the current program runs, into a command buffer shared by the CPU and the deep computing unit, for example in the form of PM4 packets.
(4) The command processor (CP) of the deep computing unit 100 processes the packets buffered in the command buffer and thereby provides the DMA transmission access request to the DMA controller (DMA engine).
(5) The DMA controller performs the data transfer, moving the data directly from the source address to the destination address. After completing the transfer, the DMA controller notifies the CPU that the data transfer is complete.
Steps (2) and (3) above still require work from the CPU. When a large number of data transfer operations are performed, the steps above are repeated many times, and steps (2) and (3) then degrade system performance.
At least one embodiment of the present disclosure provides a data processing device that includes a computing unit (Calculation Unit), a transmission access request buffer (Request Buffer), and a command processing unit (Command Processor). The computing unit is configured to generate a transmission access request in response to data that needs to be transmitted. The transmission access request buffer is configured to buffer the transmission access request. The command processing unit is configured to, in response to the transmission access request buffered in the transmission access request buffer, provide the transmission access request to a data transmission control device, whereby the data transmission control device responds to the transmission access request by transmitting the data that needs to be transmitted.
At least one embodiment of the present disclosure further provides a data processing system that includes the above data processing device and a data transmission control device, the data transmission control device being configured to respond to the transmission access request by transmitting the data that needs to be transmitted.
At least one embodiment of the present disclosure further provides a data processing method, including: generating a transmission access request in response to data that needs to be transmitted; buffering the transmission access request; in response to the buffered transmission access request, providing the transmission access request to a data transmission control device; and causing the data transmission control device to respond to the transmission access request by transmitting the data that needs to be transmitted.
When applied to DMA, the data processing device, system, and method of the above embodiments can further reduce CPU overhead, increase the effective bandwidth of DMA operations, and improve overall system performance.
The data processing device of the embodiments of the present disclosure may be, for example, a deep computing unit, which may be implemented as a graphics processing unit (GPU), a tensor processing unit (TPU), a neural processing unit (NPU), or the like. The embodiments of the present disclosure are not limited in this respect, and the device may be implemented as an integrated circuit or the like.
Fig. 2 shows a schematic diagram of a data processing device according to an embodiment of the present disclosure. Fig. 3 shows a schematic diagram of a data processing system according to an embodiment of the present disclosure; the data processing system includes, for example, the data processing device shown in Fig. 2. The data processing device is, for example, a deep computing unit, and the data processing system is, for example, a computer system. In this system the data processing device is on the peripheral (Device) side relative to the CPU, and correspondingly the CPU and memory are on the host (Host) side.
As shown in Fig. 2, the data processing device 200 includes a command processing unit 211, a command buffer 212, a transmission access request buffer 213, one or more computing units 230, and so on. The data processing device 200 may also include a storage device (not shown), for example high-bandwidth memory (HBM). The storage device of the data processing device 200 may be accessed by devices outside the data processing device 200, for example the CPU or another data processing device.
Fig. 2 shows two computing units 230 as an example and omits other possible computing units. Each computing unit 230 includes a warp scheduling/dispatch module, multiple computing cores (Kernel), a register file, a shared L1 cache, and so on. To schedule the threads that execute computing tasks across the multiple computing units 230, the data processing device 200 further includes a thread block scheduling unit 221.
The data processing device can be used for computing tasks such as matrix computation and image rendering, which can be executed in parallel by multiple threads. Before execution, the threads are divided into multiple thread blocks in the thread block scheduling unit 221, and the thread blocks are then distributed to the computing units. All threads in a thread block are usually assigned to the same computing unit. A thread block is in turn split into warps (thread warps); each warp contains a fixed number of threads, or fewer, for example 32 threads. Multiple thread blocks may execute in the same computing unit or in different computing units.
In each computing unit, the warp scheduling/dispatch module schedules and assigns warps so that the computing cores of the computing unit 230 run the corresponding warps. Each computing core includes an arithmetic logic unit (ALU), a floating-point unit, and so on. Depending on the number of computing cores in the computing unit, the warps in a thread block may execute simultaneously or in a time-shared manner. The threads in a warp execute the same instruction. For example, instruction fetch, decode, and issue are all completed in the warp scheduling/dispatch module. Memory-access instructions are issued to the shared cache in the computing unit (for example the shared L1 cache), or further to a unified cache, for read and write operations.
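While executing a computing task, the computing unit 230 needs to obtain the input data to be processed and also produces result data. These data may be stored, for example, in the storage device (such as HBM) of the data processing device 200. The input data or result data can be transferred (in or out) by DMA, in which case the computing unit 230 generates a transmission access request for the DMA transfer. The transmission access request buffer 213 buffers such requests: the computing unit 230 submits the transmission access request to the transmission access request buffer 213. After generating the transmission access request, the computing unit 230 also notifies the central processing unit outside the data processing device 200, for example through the bus.
Depending on its capacity, the transmission access request buffer 213 may include multiple entries and can therefore hold multiple transmission access requests. For example, the transmission access request buffer 213 may be a first-in first-out (FIFO) queue, so that requests generated earlier are processed earlier. As long as the transmission access request buffer 213 is not full, it keeps accepting and buffering new transmission access requests until it becomes full.
For example, after generating a transmission access request, the computing unit 230 queries whether the transmission access request buffer 213 is full. If the buffer is not full, the computing unit writes the transmission access request into the transmission access request buffer 213; otherwise it waits, for example, or discards the current transmission access request.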
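The query-then-write behaviour described above can be sketched as a bounded FIFO. This is only an illustrative software model, not the hardware of the disclosure: the names `RequestBuffer` and `submit` and the `capacity` parameter are assumptions made for the example.

```python
from collections import deque


class RequestBuffer:
    """Bounded FIFO standing in for the transmission access request buffer."""

    def __init__(self, capacity: int) -> None:
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity          # number of entries (assumed parameter)
        self._entries = deque()

    def is_full(self) -> bool:
        return len(self._entries) >= self.capacity

    def push(self, request) -> None:
        if self.is_full():
            raise OverflowError("request buffer is full")
        self._entries.append(request)

    def pop(self):
        """Remove and return the oldest request (first in, first out)."""
        return self._entries.popleft()

    def __len__(self) -> int:
        return len(self._entries)


def submit(buffer: RequestBuffer, request) -> bool:
    """Computing-unit side: query the buffer, then write only if there is room.

    Returns True if the request was written. On False the caller decides
    whether to retry later (wait) or to discard the request.
    """
    if buffer.is_full():
        return False
    buffer.push(request)
    return True
```

Returning a flag instead of blocking keeps both options from the text open: a caller that wants to wait can call `submit` again later, and a caller that wants to discard simply ignores the request.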
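For example, the transmission access request may include a transmission source address, a transmission destination address, and a transmission data size. Depending on the specific need, the source and destination addresses may each be a storage address in the storage device of the data processing device 200, a storage address in system memory, or a storage address in another peripheral of the computer system (for example another data processing device); the embodiments of the present disclosure are not limited in this respect. Depending on the situation, the request may therefore cover any of the following four cases: peripheral to memory (Device to Host, D2H), memory to peripheral (H2D), memory to memory (H2H), and peripheral to peripheral (D2D). The transmission data size is the length of data transferred contiguously during DMA; it may, for example, equal the width of the system bus (for example, 64 bits or 128 bits).
After a transmission access request has been buffered in the transmission access request buffer 213, the buffer may record, in addition to the transmission source address, transmission destination address, and transmission data size, the status of each transmission access request. The status may be, for example, pending transmission, transmission in progress, transmission completed, or transmission failed. This status information may be read, and even modified, by the command processing unit 211 or by other devices that can access the transmission access request buffer 213 (for example the CPU or the data transmission control device). A transmission access request may include other statuses as needed.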
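One way to picture a buffer entry is as a small record holding the three address and size fields plus a status. The sketch below is an assumption about layout made purely for illustration; the names `TransferRequest`, `TransferStatus`, and `is_finished` do not come from the disclosure.

```python
from dataclasses import dataclass
from enum import Enum


class TransferStatus(Enum):
    """The four statuses named in the text."""
    PENDING = "pending transmission"
    IN_PROGRESS = "transmission in progress"
    COMPLETED = "transmission completed"
    FAILED = "transmission failed"


@dataclass
class TransferRequest:
    """One entry of the transmission access request buffer (illustrative)."""
    source: int        # transmission source address
    destination: int   # transmission destination address
    size: int          # transmission data size
    status: TransferStatus = TransferStatus.PENDING

    def is_finished(self) -> bool:
        """True once the transfer has either completed or failed."""
        return self.status in (TransferStatus.COMPLETED, TransferStatus.FAILED)
```

A newly created request starts as pending, which matches the state a request has when the computing unit first writes it into the buffer.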
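For example, in at least one example, the command processing unit 211 may query whether the transmission access request buffer 213 holds at least one pending transmission access request. If so, it submits that request to the data transmission control device (for example a DMA controller), or submits the pending requests one after another, so that the data transmission control device can perform the data transfer. The data transmission control device handles each request according to its contents.
For example, once a transmission access request has been handled, that is, once the transfer is determined to have completed or failed, the request may be deleted from the transmission access request buffer 213. This operation may be performed, for example, by the command processing unit 211: it queries whether the transmission access request buffer 213 holds pending requests, reads the status of the requests still buffered there, and, when the data transfer for a given request is determined to have completed or failed, deletes that request from the transmission access request buffer 213.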
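The clean-up pass can be sketched as a filter over the buffered entries. Entries are modelled here as plain dictionaries with a `"status"` key, and the status strings and the name `retire_finished` are assumptions made for the example.

```python
# Statuses after which an entry no longer needs to stay in the buffer.
FINISHED = {"completed", "failed"}


def retire_finished(entries: list) -> list:
    """Delete completed or failed entries in place and return the removed ones.

    The relative order of the remaining entries is preserved, so the buffer
    keeps its first-in first-out ordering.
    """
    removed = [entry for entry in entries if entry["status"] in FINISHED]
    entries[:] = [entry for entry in entries if entry["status"] not in FINISHED]
    return removed
```

Returning the removed entries lets a caller report them onward, for instance to whatever component wants to know that a transfer failed.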
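In the embodiments of the present disclosure, the command buffer 212 in the data processing device 200 buffers instructions or requests from devices such as the CPU or from the data processing device 200 itself, for example in the form of packets (such as PM4 packets). Depending on its capacity, the command buffer 212 may include multiple entries and can therefore hold multiple packets. Correspondingly, the command processing unit 211 also executes the instructions or requests buffered in the command buffer 212 and may, for example, communicate and interact with devices such as the CPU. The command buffer 212 and the transmission access request buffer 213 may be provided independently of each other in the data processing device 200, or they may share one physical storage device while being controlled and managed separately at the logical level.
For example, in at least one example, when at least one operation instruction is buffered in the command buffer 212 and at least one transmission access request is buffered in the transmission access request buffer 213, the command processing unit 211 processes the buffered transmission access request first. That is, when the command processing unit 211 faces both a transmission access request and other operation instructions that need processing, it handles the transmission access request first and submits it to the data transmission control device.
Specifically, in one example, after processing each operation instruction in the command buffer, the command processing unit 211 queries whether at least one transmission access request is buffered in the transmission access request buffer 213. If so, it processes the pending transmission access requests buffered there and only then returns to the other operation instructions in the command buffer 212. For example, after providing all transmission access requests in the transmission access request buffer 213 to the data transmission control device, the command processing unit 211 returns to the position at which it left the command buffer 212 and continues processing the operation instructions there.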
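The priority rule in this example can be sketched as a loop that drains the request buffer after every operation instruction. This is a simplified software model under stated assumptions: `run_command_processor`, `execute`, and `forward` are hypothetical names, and the two callbacks stand in for executing an instruction and for handing a request to the data transmission control device.

```python
from collections import deque


def run_command_processor(commands, request_buffer, execute, forward):
    """Process `commands` in order; after each one, drain `request_buffer`.

    commands        list of operation instructions (the command buffer)
    request_buffer  deque of transmission access requests
    execute         callback that runs one operation instruction
    forward         callback that hands one request to the DMA controller
    """
    position = 0
    while position < len(commands):
        execute(commands[position])
        position += 1  # remember where to resume in the command buffer
        # Buffered requests take priority over the remaining instructions.
        while request_buffer:
            forward(request_buffer.popleft())
```

Because the check happens only after an instruction finishes, a request that arrives in the middle of an instruction waits at most for that one instruction. The sketch follows the "after each operation instruction" variant literally, so it does not look at the request buffer before the first instruction.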
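For example, in one example, the command processing unit 211 also responds to a message about a status change of a transmission access request (for example, that the data transfer has completed) by performing follow-up processing on that request while it is still buffered in the transmission access request buffer 213, for example modifying its status information or deleting it.
The computer system shown in Fig. 3 includes a central processing unit (CPU) 201, one or more data processing devices 200, a storage device 202, a data transmission control device 203, and a bus 204. The data processing device 200 is the data processing device shown in Fig. 2, and the figure shows two data processing devices 200 as an example: data processing device 200-1 and data processing device 200-2. For example, the data transmission control device can transfer data between data processing device 200-1 and data processing device 200-2 according to a transmission access request.
In the embodiments of the present disclosure, the central processing unit 201 may be a single-core or multi-core processor, and may be a RISC or CISC processor, for example an ARM processor or a RISC-V processor; the embodiments of the present disclosure are not limited in this respect. The data transmission control device 203 is, for example, a DMA controller, which may be provided separately or integrated into, for example, the central processing unit 201. The storage device 202 may be the system memory or external storage (for example a hard disk) of the computer system, and may be, for example, a semiconductor storage device; the embodiments of the present disclosure do not limit the type or structure of the storage device.
The central processing unit 201, the data processing devices 200, the storage device 202, and the DMA controller 203 communicate with one another through the bus 204. The embodiments of the present disclosure do not limit the type of bus; any suitable bus may be used, for example a PCIe bus.
For example, after receiving the notification about a transmission access request, the central processing unit 201 keeps checking the completion status of the request and, once it determines that the request has completed, notifies, for example, the application.
For example, the data transmission control device 203 performs the data transfer from the transmission source address to the transmission destination address according to the transmission access request. During the transfer, after the transfer completes, or after repeated attempts still fail, it may also update the status of the corresponding transmission access request, for example by providing a status-change message that updates the status stored for that request in the transmission access request buffer 213 of the data processing device 200.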
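The controller-side behaviour (report progress, retry, then report success or failure) can be sketched as follows. The retry limit `max_attempts`, the use of `OSError` to signal a failed attempt, and the callbacks `copy` and `notify` are all assumptions made for this illustration; the text only says that failure is reported after multiple attempts.

```python
def perform_transfer(request, copy, notify, max_attempts=3):
    """Run one transfer and report each status change through `notify`.

    request       dict with "source", "destination" and "size" keys
    copy          callback that moves the data; raises OSError on failure
    notify        callback(request, status) delivering a status-change message
    max_attempts  assumed retry limit before the transfer counts as failed
    """
    notify(request, "in progress")
    for _ in range(max_attempts):
        try:
            copy(request["source"], request["destination"], request["size"])
        except OSError:
            continue  # this attempt failed; try again if attempts remain
        notify(request, "completed")
        return True
    notify(request, "failed")
    return False
```

Keeping the status updates behind a `notify` callback mirrors the message-based update in the text: the controller does not write the buffer entry itself but tells another component which status to record.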
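An exemplary working process of the data processing device and the data processing system according to at least one embodiment of the present disclosure is described below with reference to Fig. 2 and Fig. 3.
During operation, the CPU 201 sends computing tasks to one or more of the data processing devices 200 for processing, according to the application currently running.
For example, in data processing device 200-1, while executing a computing task, the computing unit 230 performs computation according to instructions and generates a transmission access request (called the "first transmission access request" here). The computing unit 230 then queries whether the transmission access request buffer 213 is full. If the buffer is not full, it writes the first transmission access request into the transmission access request buffer 213 to await processing and marks its status as "pending transmission". The first transmission access request records the corresponding transmission source address, transmission destination address, and transmission data size. At the same time, the computing unit 230 notifies the CPU 201 that the first transmission access request has been generated. After receiving this notification, the CPU 201 keeps monitoring the completion status of the first transmission access request.
While the computing unit 230 executes the computing task, the command processing unit 211 in the data processing device 200 is processing the operation instructions buffered in the command buffer 212. After executing an operation instruction P1, the command processing unit 211 queries whether the transmission access request buffer 213 currently holds a pending transmission access request. Once the first transmission access request has been written into the transmission access request buffer 213, the command processing unit 211 finds it and learns that its status is "pending transmission". Because the first transmission access request has higher priority than the operation instructions in the command buffer 212, the command processing unit 211 switches to processing it and sends it to the data transmission control device 203. If the transmission access request buffer 213 holds further requests, the command processing unit 211 continues with them until all pending transmission access requests have been sent to the data transmission control device 203. Only then does the command processing unit 211 return to the command buffer 212 and continue, from the position of the previously processed instruction P1, with the remaining operation instructions.
The data transmission control device 203 performs the data transfer according to the transmission source address, transmission destination address, and transmission data size recorded in the first transmission access request, and notifies the command processing unit 211 of data processing device 200-1 to change the status of the first transmission access request to "transmission in progress". When the data transfer has completed, the data transmission control device 203 notifies the command processing unit 211 of data processing device 200-1, which changes the status of the first transmission access request in the transmission access request buffer 213 to "transmission completed" and, for example after a predetermined time or when triggered by another operation, deletes the completed request from the transmission access request buffer 213. In addition, the data transmission control device 203 may notify the CPU 201 that the first transmission access request has completed, so that the CPU 201 can report this to the upper-layer application.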
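The working process above can be walked through in a few lines by modelling memory as a `bytearray` and the request buffer as a queue. This is a toy, single-threaded simulation under stated assumptions: the function name `simulate`, the dictionary layout, and the short status strings are illustrative, and the CPU notifications are omitted.

```python
from collections import deque


def simulate(memory: bytearray, source: int, destination: int, size: int):
    """Walk one request through the flow; return (status history, entries left)."""
    history = []
    request_buffer = deque()

    # Computing unit: create the request, mark it pending, write it to the buffer.
    request = {"source": source, "destination": destination,
               "size": size, "status": "pending"}
    request_buffer.append(request)
    history.append(request["status"])

    # Command processing unit: pick up the oldest pending request and hand it
    # to the data transmission control device.
    pending = request_buffer[0]

    # Data transmission control device: report progress, copy, report completion.
    pending["status"] = "in progress"
    history.append(pending["status"])
    memory[destination:destination + size] = memory[source:source + size]
    pending["status"] = "completed"
    history.append(pending["status"])

    # Command processing unit: delete the completed request from the buffer.
    request_buffer.popleft()
    return history, len(request_buffer)
```

For instance, copying the first eight bytes of a sixteen-byte `bytearray` to offset eight leaves both halves identical and the buffer empty.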
在上述过程中,数据处理装置200主动发起传输访问请求,由此相比于经由CPU发起传输访问请求,可以降低系统开销,提升系统性能。In the above process, the
对应于上述装置与系统,本公开的实施例还提供了一种数据处理方法。如图4所示,该数据处理方法包括如下的步骤301~304:Corresponding to the above device and system, an embodiment of the present disclosure also provides a data processing method. As shown in Figure 4, the data processing method includes the following steps 301-304:
步骤301:响应于需要传输的数据而产生传输访问请求。Step 301: Generate a transfer access request in response to the data to be transferred.
步骤302:缓存传输访问请求。Step 302: Cache and transmit the access request.
步骤303:响应于缓存的传输访问请求,将传输访问请求提供至数据传输控制装置。Step 303: In response to the buffered transmission access request, provide the transmission access request to the data transmission control device.
步骤304:使得数据传输控制装置响应传输访问请求以传输需要传输的数据。Step 304: Make the data transmission control device respond to the transmission access request to transmit the data to be transmitted.
例如,通过计算单元进行计算并确定需要传输的数据,并且响应于需要传输的数据而产生传输访问请求。通过传输访问请求缓冲缓存传输访问请求。通过命令处理单元,响应于缓存在传输访问请求缓冲中的传输访问请求,将传输访问请求提供至数据传输控制装置。通过数据传输控制装置响应传输访问请求以传输需要传输的数据。For example, the computing unit performs calculations to determine the data that needs to be transmitted, and generates a transmission access request in response to the data that needs to be transmitted. Cache transfer access requests by transfer access request buffering. The transmission access request is provided to the data transmission control means by the command processing unit in response to the transmission access request buffered in the transmission access request buffer. The data to be transmitted is transmitted by the data transmission control device in response to the transmission access request.
例如,例如,根据本公开至少一实施例的数据处理方法还包括:缓存操作指令;使得执行操作指令以及传输访问请求的命令处理单元优先处理传输访问请求。For example, the data processing method according to at least one embodiment of the present disclosure further includes: caching operation instructions; enabling the command processing unit that executes the operation instructions and transmits access requests to process the transmission access requests preferentially.
对于该方法其他可能的操作步骤或选择等,可以参考前面结合图2和图3所描述的操作,这里不再赘述。For other possible operation steps or selections of the method, reference may be made to the operations described above in conjunction with FIG. 2 and FIG. 3 , which will not be repeated here.
该实施例的数据处理方法例如可通过数据处理装置发起传输访问请求,由此可以降低系统开销,提升系统性能。In the data processing method of this embodiment, for example, a data processing device can initiate a transmission access request, thereby reducing system overhead and improving system performance.
对于本公开,还有以下几点需要说明:For this disclosure, the following points need to be explained:
(1)本公开实施例附图只涉及到与本公开实施例涉及到的结构,其他结构可参考通常设计。(1) The drawings of the embodiments of the present disclosure only relate to the structures involved in the embodiments of the present disclosure, and other structures may refer to general designs.
(2)在不冲突的情况下,本公开的实施例及实施例中的特征可以相互组合以得到新的实施例。(2) In the case of no conflict, the embodiments of the present disclosure and the features in the embodiments can be combined with each other to obtain new embodiments.
The above are only exemplary implementations of the present disclosure and are not intended to limit its scope of protection, which is determined by the appended claims.
Claims (15)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111445101.6A CN114116553B (en) | 2021-11-30 | 2021-11-30 | Data processing device, method and system |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111445101.6A CN114116553B (en) | 2021-11-30 | 2021-11-30 | Data processing device, method and system |
Publications (2)
Publication Number | Publication Date |
---|---|
CN114116553A (en) | 2022-03-01 |
CN114116553B (en) | 2023-01-20 |
Family
ID=80368796
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202111445101.6A Active CN114116553B (en) | 2021-11-30 | 2021-11-30 | Data processing device, method and system |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114116553B (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116302504B (en) * | 2023-02-23 | 2024-08-27 | 海光信息技术股份有限公司 | Thread block processing system, method and related equipment |
Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2000354083A (en) * | 1999-04-09 | 2000-12-19 | Matsushita Electric Ind Co Ltd | Data transmitter |
Family Cites Families (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JPH03122745A (en) * | 1989-10-05 | 1991-05-24 | Mitsubishi Electric Corp | Dma control system |
JP2000215154A (en) * | 1999-01-25 | 2000-08-04 | Matsushita Electric Ind Co Ltd | Dma controller |
JP5102917B2 (en) * | 2008-02-22 | 2012-12-19 | 株式会社日立製作所 | Storage apparatus and access command transmission method |
TW201015321A (en) * | 2008-09-25 | 2010-04-16 | Panasonic Corp | Buffer memory device, memory system and data trnsfer method |
US8775699B2 (en) * | 2011-03-01 | 2014-07-08 | Freescale Semiconductor, Inc. | Read stacking for data processor interface |
US9658975B2 (en) * | 2012-07-31 | 2017-05-23 | Silicon Laboratories Inc. | Data transfer manager |
CN106202261A (en) * | 2016-06-29 | 2016-12-07 | 浪潮(北京)电子信息产业有限公司 | The distributed approach of a kind of data access request and engine |
JP6880402B2 (en) * | 2017-05-10 | 2021-06-02 | 富士通株式会社 | Memory access control device and its control method |
Also Published As
Publication number | Publication date |
---|---|
CN114116553A (en) | 2022-03-01 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US12061562B2 (en) | Computer memory expansion device and method of operation | |
US8149854B2 (en) | Multi-threaded transmit transport engine for storage devices | |
CN112527730A (en) | System, apparatus and method for processing remote direct memory access operations with device attached memory | |
CN100549992C (en) | Data transmitting and receiving method and system capable of reducing delay | |
JP2516300B2 (en) | Apparatus and method for optimizing the performance of a multi-processor system | |
WO2020087927A1 (en) | Method and device for memory data migration | |
CN114095251B (en) | SSLVPN implementation method based on DPDK and VPP | |
US9560117B2 (en) | Low latency cluster computing | |
WO2013082809A1 (en) | Acceleration method, device and system for co-processing | |
US11995351B2 (en) | DMA engines configured to perform first portion data transfer commands with a first DMA engine and second portion data transfer commands with second DMA engine | |
US11258887B2 (en) | Payload cache | |
US9286129B2 (en) | Termination of requests in a distributed coprocessor system | |
JP3266470B2 (en) | Data processing system with per-request write-through cache in forced order | |
CN114116553B (en) | Data processing device, method and system | |
US11687460B2 (en) | Network cache injection for coherent GPUs | |
CN115481072A (en) | Inter-core data transmission method, multi-core chip and machine-readable storage medium | |
JP2009217721A (en) | Data synchronization method in multiprocessor system and multiprocessor system | |
JP2006252358A (en) | Disk array device, its shared memory device, and control program and control method for disk array device | |
US20090089559A1 (en) | Method of managing data movement and cell broadband engine processor using the same | |
US7120758B2 (en) | Technique for improving processor performance | |
US20230195664A1 (en) | Software management of direct memory access commands | |
US20240095184A1 (en) | Address Translation Service Management | |
US10423424B2 (en) | Replicated stateless copy engine | |
WO2024183678A1 (en) | Method for acquiring lock of data object, network interface card, and computing device | |
CN118796485A (en) | A parallel processing method for computing tasks based on MESI protocol |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||