
CN114445260A - Distributed GPU communication method and device based on FPGA - Google Patents


Info

Publication number
CN114445260A
CN114445260A
Authority
CN
China
Prior art keywords
data
processing chip
gpu
fpga processing
fpga
Prior art date
Legal status
Granted
Application number
CN202210051088.4A
Other languages
Chinese (zh)
Other versions
CN114445260B (en)
Inventor
张静东
王江为
王媛丽
阚宏伟
Current Assignee
Suzhou Inspur Intelligent Technology Co Ltd
Original Assignee
Suzhou Inspur Intelligent Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Suzhou Inspur Intelligent Technology Co Ltd filed Critical Suzhou Inspur Intelligent Technology Co Ltd
Priority to CN202210051088.4A priority Critical patent/CN114445260B/en
Publication of CN114445260A publication Critical patent/CN114445260A/en
Application granted granted Critical
Publication of CN114445260B publication Critical patent/CN114445260B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T1/00General purpose image data processing
    • G06T1/20Processor architectures; Processor configuration, e.g. pipelining
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/50Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5027Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals
    • G06F9/5044Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals considering hardware capabilities

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

The present disclosure relates to an FPGA-based distributed GPU communication method and device. The method is applied to a communication device that includes a first FPGA processing chip. The first FPGA processing chip receives request information sent by a GPU, the request information including a data storage address; reads data from that address in the GPU; and sends the data either to a server or to a second FPGA processing chip, according to configuration information received from a remote resource scheduling center device. The FPGA processing chip serves as an adapter card between the GPU and the server, reducing the coupling between them, and can form a two-dimensional ring network topology with adjacent FPGA processing chips, which greatly improves flexibility between GPUs and reduces inter-GPU communication time.

Description

Method and device for distributed GPU communication based on FPGA

Technical Field

The present application relates to the field of communication technologies, and in particular to an FPGA-based distributed GPU communication method and device.

Background

A graphics processing unit (GPU) is a dedicated graphics processing chip and an important computing chip, whether used for graphics and image processing in its early days or, as now, widely used in artificial intelligence computing. The GPU is plugged into a data center server slot as a Peripheral Component Interconnect Express (PCIe) device and communicates with the host server and other GPU nodes through the PCIe interface.

Because the GPU is tightly coupled to the server through the PCIe interface, it cannot run independently of the server, and cross-node GPU communication can only go through a network card connected to a switch. The communication network topology is inflexible, data forwarding efficiency is low, and communication latency is high.

Summary of the Invention

In view of the above technical problems, it is necessary to provide an FPGA-based distributed GPU communication method and device. The FPGA processing chip serves as an adapter card between the GPU and the server, which not only reduces the coupling between the GPU and the server but also allows a two-dimensional ring network topology to be formed with adjacent FPGA processing chips, greatly improving the flexibility of the communication network topology, reducing inter-GPU communication time, and improving data forwarding efficiency.

A first aspect provides an FPGA-based distributed GPU communication method. The method is applied to a communication device that includes a first FPGA processing chip, which in turn includes a first interface, a second interface, a first network interface, and a plurality of second network interfaces. The first interface of the first FPGA processing chip is communicatively connected to a GPU; the second interface is communicatively connected to a server; the first network interface is communicatively connected to a remote resource scheduling center device; and the plurality of second network interfaces are used for communication with a second FPGA processing chip. The method includes:

receiving request information sent by the GPU, the request information including a data storage address;

reading data from the data storage address of the GPU; and

sending the data to the server according to configuration information received from the remote resource scheduling center device, or sending the data to the second FPGA processing chip according to the configuration information.
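The three steps of the method can be sketched in executable form. This is a minimal illustrative model only: the names (`Config`, `handle_request`) and the representation of GPU memory as a dictionary are assumptions for illustration, not part of the claimed method.

```python
from dataclasses import dataclass

@dataclass
class Config:
    # Hypothetical configuration from the remote resource scheduling
    # center device: route data to the "server" or to a "peer" chip.
    route: str

def handle_request(addr, gpu_memory, config):
    """Model of the first FPGA processing chip handling one GPU request:
    (1) the request carries the data storage address `addr`;
    (2) data is read from that address (GPU memory modeled as a dict);
    (3) the data is forwarded according to the configuration information."""
    data = gpu_memory[addr]            # step 2: direct read from GPU memory
    if config.route == "server":
        return ("server", data)        # forwarded through the second interface
    return ("second_fpga", data)       # forwarded through a second network interface

# Example: one request whose data is routed to a peer FPGA processing chip.
gpu_memory = {0x1000: b"gradients"}
print(handle_request(0x1000, gpu_memory, Config(route="peer")))
```

The point of the sketch is that the forwarding decision is driven entirely by configuration received from the scheduling center, not by the server host.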

In one possible implementation, the first FPGA processing chip further includes a data processing module, the data processing module including a GPU direct data access unit, and reading the data from the data storage address of the GPU includes:

reading the data from the data storage address of the GPU through the GPU direct data access unit.

In one possible implementation, the first FPGA processing chip further includes a bridge module, and before the data is sent to the server or to the second FPGA processing chip according to the configuration information, the method further includes:

receiving, through the bridge module, the configuration information sent by the remote resource scheduling center device.

In one possible implementation, the configuration information includes first indication information, and sending the data to the server according to the configuration information includes:

sending the data to the server according to the first indication information.

In one possible implementation, the configuration information includes second indication information, and sending the data to the second FPGA processing chip according to the configuration information includes:

sending the data to the second FPGA processing chip according to the second indication information.

In one possible implementation, the data processing module further includes an operation unit, and the configuration information further includes information on an operation rule and information on a target network interface; sending the data to the second FPGA processing chip according to the second indication information includes:

processing the data with the operation rule through the operation unit, according to the second indication information, to obtain a processing result; and

sending the data to the second FPGA processing chip through the target network interface among the plurality of second network interfaces.
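One plausible reading of this implementation is that the operation unit combines local data with data received from a neighbour (for example, an element-wise sum in a collective reduction) before the result leaves through the configured target interface. The rule names and the port model below are illustrative assumptions; the patent does not enumerate concrete operation rules.

```python
# Illustrative operation rules the configuration information might select.
RULES = {
    "sum": lambda local, incoming: [a + b for a, b in zip(local, incoming)],
    "max": lambda local, incoming: [max(a, b) for a, b in zip(local, incoming)],
}

def process_and_forward(local_data, incoming_data, rule_name, target_port, ports):
    """Apply the configured operation rule in the operation unit, then hand
    the result to the configured target network interface (each of the
    second network interfaces is modeled as a list of queued sends)."""
    result = RULES[rule_name](local_data, incoming_data)
    ports[target_port].append(result)  # send via the target second network interface
    return result

ports = {0: [], 1: [], 2: [], 3: []}   # four second network interfaces
print(process_and_forward([1, 2, 3], [4, 5, 6], "sum", 2, ports))
```

Performing the reduction on the FPGA before forwarding is what lets data be combined without crossing the PCIe bus again.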

A second aspect provides an FPGA-based distributed GPU communication device. The device includes a first FPGA processing chip that includes a first interface, a second interface, a first network interface, and a plurality of second network interfaces. The first interface of the first FPGA processing chip is communicatively connected to a GPU; the second interface is communicatively connected to a server; the first network interface is communicatively connected to a remote resource scheduling center device; and the plurality of second network interfaces are used for communication with a second FPGA processing chip. The first FPGA processing chip is configured to:

receive, through the first interface, request information sent by the GPU, the request information including a data storage address;

read data from the data storage address of the GPU; and

send the data to the server according to configuration information received from the remote resource scheduling center device through the first network interface, or send the data to the second FPGA processing chip according to the configuration information through the second network interfaces, which are used for communication with the second FPGA processing chip.

In one possible implementation, the first FPGA processing chip further includes a data processing module, the data processing module including a GPU direct data access unit, and the first FPGA processing chip is specifically configured to:

read the data from the data storage address of the GPU through the GPU direct data access unit.

In one possible implementation, the first FPGA processing chip further includes a bridge module, and the first FPGA processing chip is specifically configured to:

receive, through the bridge module, the configuration information sent by the remote resource scheduling center device.

In one possible implementation, the configuration information includes first indication information, and the first FPGA processing chip is specifically configured to:

send the data to the server according to the first indication information.

In the above FPGA-based distributed GPU communication method and device, the first FPGA processing chip receives request information sent by the GPU, the request information including a data storage address; reads data from that address in the GPU; and sends the data either to the server or to the second FPGA processing chip, according to configuration information received from the remote resource scheduling center device. Serving as an adapter card between the GPU and the server, the FPGA processing chip not only reduces the coupling between the GPU and the server and the overhead of server-side CPU, memory, and network resources, but can also form a two-dimensional ring network topology with adjacent FPGA processing chips, greatly improving flexibility between GPUs and reducing inter-GPU communication time.

Brief Description of the Drawings

FIG. 1 is a diagram of the application environment of the FPGA-based distributed GPU communication method in one embodiment of the present application;

FIG. 2 is a structural block diagram of the first FPGA processing chip in one embodiment;

FIG. 3 is a schematic flowchart of the FPGA-based distributed GPU communication method in one embodiment.

Detailed Description

To make the purpose, technical solutions, and advantages of the present application clearer, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are intended only to explain the present application, not to limit it.

In the prior art, the GPUDirect family is a set of GPU direct memory access technologies that let the GPU access the memory of other devices or the host through the PCIe chipset bus system, reducing unnecessary memory copies, lowering central processing unit (CPU) overhead, and improving data transfer efficiency. Among them, GPUDirect Shared Memory is a GPUDirect technology in which the GPU and other PCIe devices share host memory pages to implement data communication between different PCIe devices.

GPUDirect P2P is a GPUDirect technology for directly sharing video memory between GPUs: two GPU devices under the same PCIe root complex (PCIe RC) domain access each other's video memory directly, without staging data through host memory. Compared with GPUDirect Shared Memory, it eliminates the steps of copying data from GPU memory to host memory and from host memory to GPU memory, reducing data-path latency and improving data transfer efficiency.

GPUDirect Remote Direct Memory Access (GPUDirect RDMA) uses RDMA technologies, a physical network card, and a transmission network to exchange video memory data directly between GPUs. It addresses the large processing delays and high CPU usage of traditional network data transmission and enables GPUs on different nodes to access each other's video memory directly.

Both GPUDirect Shared Memory and GPUDirect P2P are built on the GPU being a PCIe device under a host: communication with other devices depends on the CPU, host memory, and the PCIe switching system; the devices are tightly coupled to the server's CPU and memory through PCIe; and communication is limited to GPUs within a single node. When GPUDirect Shared Memory is used to exchange video memory data between GPUs within a node, the data must pass through the CPUs and their memory, incurring high CPU overhead and large transfer delays. When GPUDirect P2P is used, direct exchange of video memory data through the PCIe chipset is limited to GPUs under the same PCIe RC domain; crossing the PCIe RC domains of two CPUs still requires the CPUs and their memory to participate in the transfer, so inter-GPU data-exchange latency and CPU overhead remain large.

Although GPUDirect RDMA uses RDMA to enable cross-node GPU communication, it still requires a high-performance network card under the same PCIe domain and the local server CPU to help the GPU complete cross-node transfers. The GPU remains tightly coupled to the server through PCIe and cannot run independently of it, and cross-node GPU communication can only go through a network card connected to a switch; the communication network topology is inflexible, packet forwarding efficiency is low, and communication latency is high.

To solve these problems, the embodiments of the present application provide an FPGA-based distributed GPU communication method and device. The method is first introduced; it is applied to the application environment shown in FIG. 1. As shown in FIG. 1, the communication device 100 includes a plurality of FPGA processing chips, each containing multiple network interfaces. The first network interface 230 is communicatively connected to the remote resource scheduling center device through a switch, and the second network interfaces 240 form a 2D ring communication topology with the surrounding FPGA processing chips. The number of second network interfaces 240 can be adjusted according to user needs and is not limited to four. Through the second network interfaces 240, an FPGA processing chip can send data to, or receive data from, adjacent FPGA processing chips. The first and second network interfaces are 100G optical network ports, enabling efficient communication among GPUs and between GPUs and the remote resource scheduling center device.
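In a two-dimensional ring (torus) of FPGA processing chips, the four second network interfaces of each chip would connect to its east, west, north, and south neighbours, wrapping around at the grid edges. The following is a sketch of the neighbour computation under that assumption; the grid-coordinate model is illustrative, not taken from the patent.

```python
def torus_neighbors(row, col, rows, cols):
    """Return the (row, col) coordinates of the four neighbours of a chip
    in a rows x cols 2D ring topology (wrap-around at the edges)."""
    return {
        "east":  (row, (col + 1) % cols),
        "west":  (row, (col - 1) % cols),
        "south": ((row + 1) % rows, col),
        "north": ((row - 1) % rows, col),
    }

# A corner chip (0, 0) in a 3x3 grid wraps around on both axes.
print(torus_neighbors(0, 0, 3, 3))
```

The wrap-around links are what make every row and column a complete ring, so no chip is more than rows/2 + cols/2 hops from any other.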

Any one of the FPGA processing chips is defined as the first FPGA processing chip, and an FPGA processing chip connected to it is defined as a second FPGA processing chip; the first and second FPGA processing chips have the same structure.

As shown in FIG. 2, the first FPGA processing chip 200 includes a first interface 210, a second interface 220, a first network interface 230, and a plurality of second network interfaces 240. The first interface 210 is communicatively connected to the GPU; the second interface 220 is communicatively connected to the server; the first network interface 230 is communicatively connected to the remote resource scheduling center device; and the second network interfaces 240 are used for communication with the second FPGA processing chip.

The first FPGA processing chip 200 further includes an instruction configuration module 270, a routing module 280, a first transceiver module 290, and a second transceiver module 2100. The first transceiver module 290 is a RoCE transceiver module, communicatively connected to the second network interfaces 240 and the routing module 280; it receives packets from adjacent FPGA processing chips over the RoCE protocol, parses them, packs the first FPGA processing chip's data into packets, and sends those packets to adjacent FPGA processing chips.

The second transceiver module 2100 is a RoCEv2 transceiver module, communicatively connected to the first network interface 230, the routing module 280, and the instruction configuration module 270. Over the RoCEv2 protocol, it receives the configuration information that the remote resource scheduling center device sends to the first network interface 230, parses it, and distributes the results to the corresponding modules, for example sending operation-rule information to the instruction configuration module 270 to complete tasks such as registration and initialization of GPU resources. It can also pack the data of the GPU connected to the first FPGA processing chip and, through the first network interface 230 and the switch, communicate with GPUs connected to other FPGA processing chips.
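The packing and parsing performed by the transceiver modules can be modeled as a minimal frame carrying a destination MAC address and a payload. The field layout below is a simplified stand-in for illustration, not the actual RoCE/RoCEv2 wire format.

```python
import struct

def pack_frame(dst_mac: bytes, payload: bytes) -> bytes:
    """Prepend a 6-byte destination MAC and a 2-byte big-endian payload
    length (simplified stand-in for the framing done by the transceiver)."""
    return dst_mac + struct.pack("!H", len(payload)) + payload

def parse_frame(frame: bytes):
    """Inverse of pack_frame: recover (dst_mac, payload)."""
    dst_mac = frame[:6]
    (length,) = struct.unpack("!H", frame[6:8])
    return dst_mac, frame[8:8 + length]

mac = bytes.fromhex("02aabbcc0001")    # hypothetical neighbour MAC
frame = pack_frame(mac, b"gpu-data")
print(parse_frame(frame))
```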

After the first FPGA processing chip powers on, the routing module 280 communicates with the adjacent FPGA processing chips through the second network interfaces 240, obtains the MAC address of each adjacent chip's second network interface 240, and saves it in memory for subsequent communication over the RoCE protocol.
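The routing module's power-on discovery can be modeled as filling a table that maps each local second network interface to the MAC address learned on it. The probe-response format is a hypothetical assumption for illustration.

```python
def build_mac_table(probe_responses):
    """Store each neighbour's MAC address keyed by the local second
    network interface it was learned on (sketch of the routing module's
    power-on discovery; the response format is hypothetical)."""
    table = {}
    for port, mac in probe_responses:
        table[port] = mac.lower()      # normalize for later frame building
    return table

# Example: replies learned on two of the second network interfaces.
responses = [(0, "02:AA:BB:CC:00:01"), (1, "02:AA:BB:CC:00:02")]
print(build_mac_table(responses))
```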

FIG. 3 shows a schematic flowchart of the FPGA-based distributed GPU communication method provided by an embodiment of the present application. As shown in FIG. 3, the method may include the following steps:

S310: receive request information sent by the GPU, the request information including a data storage address.

Before data communication begins, the remote resource scheduling center device initializes the communication device and sends the data to be processed to the GPU through the server. The GPU processes the received data, saves the result, and sends request information to the communication device so that the communication device transmits the data saved by the GPU. The request information includes a data storage address so that the communication device can accurately locate the data the GPU needs to transmit.

The communication device receives the request information sent by the GPU through the first interface 210 of the first FPGA processing chip 200. The first interface 210 is a PCIe interface that operates in Root Port mode and connects to the GPU's PCIe interface via a Gen5 x16 standard gold-finger connector.

S320: read data from the data storage address of the GPU.

According to the received data storage address, the first FPGA processing chip reads data from that address in GPU video memory by direct memory access (DMA), keeping the transfer latency small and improving data read efficiency.

S330: send the data to the server according to the configuration information received from the remote resource scheduling center device; or send the data to the second FPGA processing chip according to the configuration information.

The configuration information includes switching information, which determines the data transmission path. When the switching information indicates GPU-to-server communication, the first FPGA processing chip sends the data to the server through the second interface 220. Because the FPGA-based communication device is introduced between the GPU and the server, the coupling between them is reduced, which facilitates independent pooled management of GPUs. The second interface 220 is a PCIe interface in Endpoint mode, connected via the Gen5 x16 standard to match the host's communication interface mode.

When the switching information indicates communication from the GPU to other FPGA processing chips, the first FPGA processing chip sends the data to the second FPGA processing chip; because the GPU reaches the GPUs behind other FPGA processing chips through multiple network interfaces, the communication network topology is more flexible. Communicating with multiple FPGA processing chips through multiple network interfaces allows data to be computed and transmitted along multiple dimensions simultaneously, reducing the number of times data crosses the PCIe bus and the time required for data updates, and thereby reducing communication time overhead.

In the embodiment of the present application, the first FPGA processing chip receives request information sent by the GPU, the request information including a data storage address; reads data from that address in the GPU; and sends the data either to the server or to the second FPGA processing chip, according to configuration information received from the remote resource scheduling center device. Serving as an adapter card between the GPU and the server, the FPGA processing chip not only reduces the coupling between the GPU and the server and the overhead of server-side CPU, memory, and network resources, but can also form a two-dimensional ring network topology with adjacent FPGA processing chips, greatly improving flexibility between GPUs and reducing inter-GPU communication time.

In some embodiments, the first FPGA processing chip further includes a data processing module 250, which includes a GPU direct data access unit 251; reading data from the data storage address of the GPU includes:

reading data from the data storage address in GPU video memory through the GPU direct data access unit 251. The first FPGA processing chip 200 directly accesses the GPU's internal video memory through PCIe and the GPU direct data access unit 251 to read the data to be transmitted, effectively reducing inter-GPU communication latency.

In some embodiments, the first FPGA processing chip 200 further includes a bridge module 260; before the data is sent to the server or to the second FPGA processing chip according to the configuration information, the method further includes:

receiving, through the bridge module 260, the configuration information sent by the remote resource scheduling center device.

通过桥梁模块260与指令配置模块270、第一接口210、第二接口220和数据处理模块连接,远程资源调度中心设备通过交换机网络将配置信息发送至第一网络接口230,第一网络接口230对配置信息进行解析得到切换PCIe RC信号,并切换PCIe RC信号发送至指令配置模块270,指令配置模块270对切换PCIe RC信号进行判断,并将判断结果发送至桥梁模块260,桥梁模块260根据判断结果可以切换GPU的PCIe总线的连接关系。The bridge module 260 is connected to the instruction configuration module 270, the first interface 210, the second interface 220 and the data processing module, and the remote resource scheduling center device sends the configuration information to the first network interface 230 through the switch network, and the first network interface 230 pairs The configuration information is parsed to obtain the switching PCIe RC signal, and the switching PCIe RC signal is sent to the instruction configuration module 270. The instruction configuration module 270 judges the switching PCIe RC signal, and sends the judgment result to the bridge module 260, and the bridge module 260 according to the judgment result The connection relationship of the PCIe bus of the GPU can be switched.

在通信装置初始化阶段,远程资源调度中心设备通过交换机网络向第一FPGA处理芯片发送配置信息,控制切换GPU的PCIe总线的连接关系,确定数据传输路径,灵活选择GPU中数据的传输对象,解除GPU对服务器主机的依赖性。In the initialization phase of the communication device, the remote resource scheduling center device sends configuration information to the first FPGA processing chip through the switch network, controls and switches the connection relationship of the PCIe bus of the GPU, determines the data transmission path, flexibly selects the data transmission object in the GPU, and releases the GPU. Dependency on the server host.

In some embodiments, the configuration information includes first indication information, and sending the data to the server according to the configuration information includes:

sending the data to the server according to the first indication information.

After receiving the switch-PCIe-RC signal in the configuration information, the instruction configuration module 270 parses it and determines whether its value is 1 or 0. Here, the first indication information is 0: when the configuration information parses to 0, the GPU exchanges data with the server, and the bridge module 260 sends the data directly to the server.
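The 0/1 decision above can be sketched as follows. The field name `switch_pcie_rc` and the returned destination labels are illustrative assumptions, not identifiers from the patent:

```python
def route_data(config: dict, data):
    """Sketch of the instruction configuration module's parse-and-judge step:
    0 routes the GPU's data straight to the host server via the bridge,
    1 routes it toward a second FPGA processing chip."""
    flag = int(config.get("switch_pcie_rc", 0))
    if flag == 0:
        return ("server", data)        # bridge forwards directly to the server
    elif flag == 1:
        return ("second_fpga", data)   # via data processing module and transceiver
    raise ValueError("switch PCIe RC value must be 0 or 1")

print(route_data({"switch_pcie_rc": 0}, "payload"))  # ('server', 'payload')
```

The same function covers the second-indication case described below (value 1), where the data instead traverses the data processing module and the first transceiver module.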

The configuration information includes second indication information, and sending the data to the second FPGA processing chip according to the configuration information includes:

sending the data to the second FPGA processing chip according to the second indication information.

The second indication information is 1: when the configuration information parses to 1, the GPU exchanges data with the second FPGA processing chip. The bridge module 260 first sends the data to the data processing module 250, which processes it and sends the processed data to the first transceiver module 290, from which it is finally sent to the second FPGA processing chip through the second network interface 240.

In some embodiments, the data processing module 250 further includes an operation unit 252, and the configuration information further includes operation rule information and target network interface information. Sending the data to the second FPGA processing chip according to the second indication information includes:

processing the data with the operation rule in the operation unit 252 according to the second indication information to obtain a processing result; and

sending the data to the second FPGA processing chip through the target network interface among the plurality of second network interfaces.

During data exchange between the GPU and the second FPGA processing chip, if the receiving unit of the data processing module 250 has not received pending data from the FPGA processing chips corresponding to other GPUs, the instruction configuration module 270 instructs the operation unit 252, according to the configuration information, to perform no computation: the data read from the GPU by the GPU direct data access unit 251 is sent directly to the sending unit 253, which passes it to the first transceiver module 290. The first transceiver module 290 obtains from the routing module 280 the MAC address information of the target network interface specified in the configuration information and sends the data to the other FPGA processing chips through the target network interface.

If the receiving unit of the data processing module has received pending data sent by the FPGA processing chips corresponding to other GPUs, the instruction configuration module 270 instructs the operation unit 252 to perform the corresponding computation according to the operation rule information. After the GPU direct data access unit 251 reads the data from the GPU, it sends the data to the operation unit 252, which combines the data read from the GPU with the data received through the receiving unit 254 from the other FPGA processing chips according to the preconfigured operation rule, and sends the result to the sending unit 253. The sending unit 253 passes the data to the first transceiver module 290, which obtains from the routing module 280 the MAC address information of the target network interface specified in the configuration information and sends the data to the other FPGA processing chips through the target network interface. Each receiving FPGA processing chip writes the data into the memory of its corresponding GPU through its GPU direct data access unit 251, completing the update iteration of the GPU's data.
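The two branches above — pass-through when no peer data has arrived, element-wise combination under the configured operation rule when it has — can be sketched as follows. The rule names ("sum", "max") are illustrative assumptions; the patent only speaks of a preconfigured operation rule:

```python
# Hypothetical operation rules the instruction configuration module might select.
OP_RULES = {"sum": lambda a, b: a + b, "max": max}

def process(local_data, received, op_rule="sum"):
    """Sketch of operation unit 252: forward local GPU data unchanged when no
    peer data is pending, otherwise combine the two streams element-wise."""
    if received is None:
        return list(local_data)  # no computation, direct pass-through to sender
    op = OP_RULES[op_rule]
    return [op(x, y) for x, y in zip(local_data, received)]

print(process([1, 2, 3], None))          # [1, 2, 3]
print(process([1, 2, 3], [10, 20, 30]))  # [11, 22, 33]
```

With "sum" as the rule, chaining this step FPGA-to-FPGA accumulates every GPU's contribution, which matches the gradient-style update iteration the passage describes.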

In one embodiment, an FPGA-based distributed GPU communication apparatus is provided. The apparatus includes a first FPGA processing chip, which includes a first interface, a second interface, a first network interface, and a plurality of second network interfaces. The first interface of the first FPGA processing chip is communicatively connected to a GPU, and the second interface of the first FPGA processing chip is communicatively connected to a server; the first network interface is communicatively connected to a remote resource scheduling center device; the plurality of second network interfaces are used for communicative connection with second FPGA processing chips. The first FPGA processing chip is configured to:

receive, through the first interface, request information sent by the GPU, the request information including the address at which the data is stored;

read the data from the data storage address of the GPU; and

send the data to the server according to the configuration information received from the remote resource scheduling center device through the first network interface, or send the data to the second FPGA processing chip according to the configuration information.

In the embodiments of the present application, the first FPGA processing chip serves as an adapter card between the GPU and the server, which not only reduces the coupling between the GPU and the server but also forms a two-dimensional ring network topology with adjacent FPGA processing chips, greatly improving the flexibility of the communication network topology, reducing the time for GPUs to communicate with one another, and improving data forwarding efficiency.
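The two-dimensional ring topology mentioned above can be illustrated with a small sketch that computes, for an FPGA at grid position (r, c), the four neighbors reachable through its second network interfaces, with wrap-around at the grid edges. The grid-coordinate scheme and neighbor labels are assumptions for illustration only:

```python
def torus_neighbors(r, c, rows, cols):
    """Neighbors of node (r, c) on a rows x cols 2D ring (torus):
    each row and each column closes into a ring, so edges wrap around."""
    return {
        "left":  (r, (c - 1) % cols),
        "right": (r, (c + 1) % cols),
        "up":    ((r - 1) % rows, c),
        "down":  ((r + 1) % rows, c),
    }

# A corner node still has four neighbors because the rings wrap.
print(torus_neighbors(0, 0, rows=3, cols=4))
```

Such a topology lets data move between any two GPUs in a bounded number of FPGA-to-FPGA hops without ever traversing a server host, which is the forwarding-efficiency claim made in the paragraph above.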

In one embodiment, the first FPGA processing chip further includes a data processing module, which includes a GPU direct data access unit; the first FPGA processing chip is specifically configured to:

read the data from the data storage address of the GPU through the GPU direct data access unit.

In one embodiment, the first FPGA processing chip further includes a bridge module; the first FPGA processing chip is specifically configured to:

receive, through the bridge module, the configuration information sent by the remote resource scheduling center device.

In one embodiment, the configuration information includes first indication information; the first FPGA processing chip is specifically configured to:

send the data to the server according to the first indication information.

In some embodiments, the configuration information includes second indication information; the first FPGA processing chip is specifically configured to:

send the data to the second FPGA processing chip according to the second indication information.

In some embodiments, the data processing module further includes an operation unit, and the configuration information further includes operation rule information and target network interface information; the first FPGA processing chip is specifically configured to:

process the data with the operation rule in the operation unit according to the second indication information to obtain a processing result; and

send the data to the second FPGA processing chip through the target network interface among the plurality of second network interfaces.

The technical features of the above embodiments can be combined arbitrarily. For brevity, not all possible combinations of these technical features are described; however, as long as a combination of these technical features involves no contradiction, it should be considered within the scope of this specification.

The above embodiments express only several implementations of the present application, and their descriptions are relatively specific and detailed, but they should not therefore be construed as limiting the scope of the invention patent. It should be noted that those of ordinary skill in the art can make several modifications and improvements without departing from the concept of the present application, all of which fall within the protection scope of the present application. Therefore, the protection scope of this patent shall be subject to the appended claims.

Claims (10)

1. An FPGA-based distributed GPU communication method, characterized in that the method is applied to a communication apparatus, the communication apparatus comprising a first FPGA processing chip, the first FPGA processing chip comprising a first interface, a second interface, a first network interface, and a plurality of second network interfaces; the first interface of the first FPGA processing chip is communicatively connected to a GPU, and the second interface of the first FPGA processing chip is communicatively connected to a server; the first network interface is communicatively connected to a remote resource scheduling center device; and the plurality of second network interfaces are used for communicative connection with a second FPGA processing chip; the method comprising: receiving request information sent by the GPU, the request information comprising an address at which data is stored; reading the data from the data storage address of the GPU; and sending the data to the server according to configuration information received from the remote resource scheduling center device, or sending the data to the second FPGA processing chip according to the configuration information.

2. The method according to claim 1, characterized in that the first FPGA processing chip further comprises a data processing module, the data processing module comprising a GPU direct data access unit; and reading the data from the data storage address of the GPU comprises: reading the data from the data storage address of the GPU through the GPU direct data access unit.

3. The method according to claim 1 or 2, characterized in that the first FPGA processing chip further comprises a bridge module; and before sending the data to the server according to the configuration information received from the remote resource scheduling center device, or sending the data to the second FPGA processing chip according to the configuration information, the method further comprises: receiving, through the bridge module, the configuration information sent by the remote resource scheduling center device.

4. The method according to claim 3, characterized in that the configuration information comprises first indication information; and sending the data to the server according to the configuration information comprises: sending the data to the server according to the first indication information.

5. The method according to claim 3, characterized in that the configuration information comprises second indication information; and sending the data to the second FPGA processing chip according to the configuration information comprises: sending the data to the second FPGA processing chip according to the second indication information.

6. The method according to claim 5, characterized in that the data processing module further comprises an operation unit, and the configuration information further comprises operation rule information and target network interface information; and sending the data to the second FPGA processing chip according to the second indication information comprises: processing the data with the operation rule in the operation unit according to the second indication information to obtain a processing result; and sending the data to the second FPGA processing chip through the target network interface among the plurality of second network interfaces.

7. An FPGA-based distributed GPU communication apparatus, characterized in that the apparatus comprises a first FPGA processing chip, the first FPGA processing chip comprising a first interface, a second interface, a first network interface, and a plurality of second network interfaces; the first interface of the first FPGA processing chip is communicatively connected to a GPU, and the second interface of the first FPGA processing chip is communicatively connected to a server; the first network interface is communicatively connected to a remote resource scheduling center device; and the plurality of second network interfaces are used for communicative connection with a second FPGA processing chip; the first FPGA processing chip being configured to: receive, through the first interface, request information sent by the GPU, the request information comprising an address at which data is stored; read the data from the data storage address of the GPU; and send the data to the server according to configuration information received from the remote resource scheduling center device through the first network interface, or send the data to the second FPGA processing chip according to the configuration information.

8. The apparatus according to claim 7, characterized in that the first FPGA processing chip further comprises a data processing module, the data processing module comprising a GPU direct data access unit; and the first FPGA processing chip is specifically configured to: read the data from the data storage address of the GPU through the GPU direct data access unit.

9. The apparatus according to claim 7 or 8, characterized in that the first FPGA processing chip further comprises a bridge module; and the first FPGA processing chip is specifically configured to: receive, through the bridge module, the configuration information sent by the remote resource scheduling center device.

10. The apparatus according to claim 9, characterized in that the configuration information comprises first indication information; and the first FPGA processing chip is specifically configured to: send the data to the server according to the first indication information.
CN202210051088.4A 2022-01-17 2022-01-17 FPGA-based distributed GPU communication method and device Active CN114445260B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210051088.4A CN114445260B (en) 2022-01-17 2022-01-17 FPGA-based distributed GPU communication method and device


Publications (2)

Publication Number Publication Date
CN114445260A true CN114445260A (en) 2022-05-06
CN114445260B CN114445260B (en) 2024-01-12

Family

ID=81368275


Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115037747A (en) * 2022-05-31 2022-09-09 北京百度网讯科技有限公司 Data communication method and device, distributed system, device and medium
CN116383127A (en) * 2023-06-01 2023-07-04 苏州浪潮智能科技有限公司 Inter-node communication method, device, electronic device and storage medium
CN118426976A (en) * 2024-07-04 2024-08-02 浪潮(北京)电子信息产业有限公司 Memory expansion system, access method and device, medium and computer program product


Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104583933A (en) * 2012-08-23 2015-04-29 微软公司 Direct communication between GPU and FPGA components
US20180174268A1 (en) * 2012-08-23 2018-06-21 Microsoft Technology Licensing, Llc Direct communication between gpu and fpga components
CN107391432A (en) * 2017-08-11 2017-11-24 中国计量大学 A kind of heterogeneous Computing device and computing node interconnection network
CN108804376A (en) * 2018-06-14 2018-11-13 山东航天电子技术研究所 A kind of small-sized heterogeneous processing system based on GPU and FPGA
CN109240832A (en) * 2018-09-25 2019-01-18 中国电子科技集团公司电子科学研究院 A kind of hardware reconstruction system and method
CN113900793A (en) * 2021-07-29 2022-01-07 苏州浪潮智能科技有限公司 A server cluster and its deep learning collective communication system and method

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
ADRIAN M. CAULFIELD等: "A cloud-scale acceleration architecture", 2016 49TH ANNUAL IEEE/ACM INTERNATIONAL SYMPOSIUM ON MICROARCHITECTURE (MICRO) *


Also Published As

Publication number Publication date
CN114445260B (en) 2024-01-12


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant