CN114445260B - Distributed GPU communication method and device based on FPGA - Google Patents
- Publication number
- CN114445260B CN114445260B CN202210051088.4A CN202210051088A CN114445260B CN 114445260 B CN114445260 B CN 114445260B CN 202210051088 A CN202210051088 A CN 202210051088A CN 114445260 B CN114445260 B CN 114445260B
- Authority
- CN
- China
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T1/00—General purpose image data processing
- G06T1/20—Processor architectures; Processor configuration, e.g. pipelining
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/46—Multiprogramming arrangements
- G06F9/50—Allocation of resources, e.g. of the central processing unit [CPU]
- G06F9/5005—Allocation of resources, e.g. of the central processing unit [CPU] to service a request
- G06F9/5027—Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals
- G06F9/5044—Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals considering hardware capabilities
Abstract
The method is applied to a communication device that includes a first FPGA processing chip. The first FPGA processing chip receives request information sent by a GPU, where the request information includes a data storage address; reads data from that address of the GPU; and sends the data either to a server, according to configuration information received from a remote resource scheduling center device, or to a second FPGA processing chip, according to the same configuration information. Because the FPGA processing chip serves as a transfer card between the GPU and the server, the coupling between the GPU and the server is reduced, and a two-dimensional ring network topology can be formed between the FPGA processing chip and adjacent FPGA processing chips, which greatly improves the flexibility of communication between GPUs and reduces the time GPUs spend communicating with one another.
Description
Technical Field
The present disclosure relates to the field of communications technologies, and in particular, to a method and an apparatus for distributed GPU communications based on FPGA.
Background
The graphics processing unit (GPU) is a specialized graphics processing chip and an important computing chip, whether in its early use for graphics and image processing or in its current wide use in artificial intelligence (AI) computing. The GPU is plugged into a data center server slot as a Peripheral Component Interconnect Express (PCIe) device and communicates with the host server and other GPU nodes through a PCIe interface.
Because the GPU and the server are tightly coupled through this PCIe connection, the GPU cannot operate independently of the server. GPU communication between nodes can only go through a network card connected to a switch, so the communication network topology is inflexible, data forwarding efficiency is low, and communication delay is large.
Disclosure of Invention
Based on the above, it is necessary to provide an FPGA-based distributed GPU communication method and device in which the FPGA processing chip serves as a transfer card between the GPU and the server. This reduces the coupling between the GPU and the server, allows a two-dimensional ring network topology to be formed with adjacent FPGA processing chips, greatly improves the flexibility of the communication network topology, reduces the time GPUs spend communicating with one another, and improves data forwarding efficiency.
In a first aspect, an FPGA-based distributed GPU communication method is provided. The method is applied to a communication device that includes a first FPGA processing chip, and the first FPGA processing chip includes a first interface, a second interface, a first network interface, and a plurality of second network interfaces. The first interface of the first FPGA processing chip is communicatively connected to the GPU, and the second interface is communicatively connected to the server; the first network interface is communicatively connected to a remote resource scheduling center device; and the plurality of second network interfaces are used for communication connection with a second FPGA processing chip. The method includes the following steps:
receiving request information sent by the GPU, wherein the request information includes a data storage address;
reading data from the data storage address of the GPU;
sending the data to a server according to configuration information received from a remote resource scheduling center device, or sending the data to the second FPGA processing chip according to the configuration information.
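The three steps above can be summarized as a minimal transfer-card flow. The sketch below is an illustrative model only; every name in it (`Request`, `handle_request`, the `config["target"]` field) is an assumption for illustration, not something the patent specifies:

```python
from dataclasses import dataclass

@dataclass
class Request:
    storage_address: int  # data storage address carried in the GPU's request

def handle_request(request, gpu_memory, config):
    """Read the requested data, then forward it per the configuration.

    Illustrative sketch of the method's three steps; gpu_memory stands in
    for the GPU video memory, config for the scheduling center's settings.
    """
    data = gpu_memory[request.storage_address]   # step 2: read from the GPU
    if config["target"] == "server":             # step 3a: forward to server
        return ("server", data)
    return ("second_fpga", data)                 # step 3b: forward to peer FPGA
```

A caller supplies the two forwarding targets through the configuration, mirroring how the scheduling center selects the data path.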
In one possible implementation, the first FPGA processing chip further includes a data processing module, and the data processing module includes a GPU direct data access unit. Reading data from the data storage address of the GPU includes:
reading the data from the data storage address of the GPU through the GPU direct data access unit.
In one possible implementation, the first FPGA processing chip further includes a bridge module. Before sending the data to the server according to the configuration information received from the remote resource scheduling center device, or sending the data to the second FPGA processing chip according to the configuration information, the method further includes:
receiving, through the bridge module, the configuration information sent by the remote resource scheduling center device.
In one possible implementation, the configuration information includes first indication information, and sending the data to the server according to the configuration information includes:
sending the data to the server according to the first indication information.
In one possible implementation, the configuration information includes second indication information, and sending the data to the second FPGA processing chip according to the configuration information includes:
sending the data to the second FPGA processing chip according to the second indication information.
In one possible implementation, the data processing module further includes an operation unit, and the configuration information further includes information of an operation rule and information of a target network interface. Sending the data to the second FPGA processing chip according to the second indication information includes:
processing the data with the operation rule through the operation unit according to the second indication information, to obtain a processing result;
sending the processing result to the second FPGA processing chip through a target network interface among the plurality of second network interfaces.
In a second aspect, an apparatus for FPGA-based distributed GPU communication is provided. The apparatus includes a first FPGA processing chip comprising a first interface, a second interface, a first network interface, and a plurality of second network interfaces. The first interface of the first FPGA processing chip is communicatively connected to the GPU, and the second interface is communicatively connected to the server; the first network interface is communicatively connected to a remote resource scheduling center device; and the plurality of second network interfaces are used for communication connection with a second FPGA processing chip. The first FPGA processing chip is configured to:
receive, through the first interface, request information sent by the GPU, wherein the request information includes a data storage address;
read data from the data storage address of the GPU;
send the data to the server according to configuration information received, through the first network interface, from the remote resource scheduling center device; or send the data to the second FPGA processing chip according to the configuration information, wherein the second network interfaces are used for communication connection with the second FPGA processing chip.
In one possible implementation, the first FPGA processing chip further includes a data processing module, and the data processing module includes a GPU direct data access unit. The first FPGA processing chip is specifically configured to:
read data from the data storage address of the GPU through the GPU direct data access unit.
In one possible implementation, the first FPGA processing chip further includes a bridge module. The first FPGA processing chip is specifically configured to:
receive, through the bridge module, configuration information sent by the remote resource scheduling center device.
In one possible implementation, the configuration information includes first indication information. The first FPGA processing chip is specifically configured to:
send data to the server according to the first indication information.
According to the FPGA-based distributed GPU communication method and device, request information sent by the GPU is received, the request information including a data storage address; data is read from the data storage address of the GPU; and the data is sent to the server according to configuration information received from the remote resource scheduling center device, or to the second FPGA processing chip according to the configuration information. With the FPGA processing chip serving as a transfer card between the GPU and the server, the coupling between the GPU and the server is reduced, the cost of server-side resources such as CPU, memory, and network is reduced, and a two-dimensional ring network topology can be formed between the FPGA processing chip and adjacent FPGA processing chips, greatly improving flexibility between GPUs and reducing the time GPUs spend communicating with one another.
Drawings
FIG. 1 is an application environment diagram of the FPGA-based distributed GPU communication method in one embodiment of the present application;
FIG. 2 is a block diagram of a first FPGA processing chip in one embodiment;
FIG. 3 is a flow diagram of the FPGA-based distributed GPU communication method in one embodiment.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application will be further described in detail with reference to the accompanying drawings and examples. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the present application.
In the prior art, the GPUDirect family provides GPU direct memory access technologies: the GPU can access the memory of other sub-devices or of the host through the PCIe chipset bus system, which avoids unnecessary memory copies, reduces central processing unit (CPU) overhead, and improves data transmission efficiency. GPUDirect Shared Memory is the member of the GPUDirect family in which the GPU directly shares memory: it lets the GPU share a host memory page with other PCIe devices to implement data communication between different PCIe devices.
GPUDirect P2P is the member of the GPUDirect family in which GPUs directly share video memory. It refers to two GPU devices in the same PCIe root complex (PCIe RC) domain directly accessing each other's video memory, without copying data through host memory. Compared with GPUDirect Shared Memory, it removes the steps of copying data from GPU video memory to host memory and from host memory back to GPU video memory, reducing data path delay and improving data transmission efficiency.
GPU direct remote direct memory access (GPUDirect RDMA) uses RDMA technology, a physical network card, and the transmission network to exchange video memory data directly between GPUs. It addresses the large processing delay and high CPU occupancy of traditional network data transmission and enables GPUs on different nodes to exchange video memory directly.
GPUDirect Shared Memory and GPUDirect P2P were developed and implemented for PCIe devices with the GPU inside a host: communication with other devices depends on the CPU, host memory, the PCIe switching system, and so on; the devices are tightly coupled to the server's CPU and memory through PCIe; and communication is limited to GPUs within a single node. When GPUDirect Shared Memory is used to exchange video memory data between GPUs in a single node, the data must pass through modules such as each CPU and the CPU1 memory, so CPU overhead and data transmission delay are high. When GPUDirect P2P is used to exchange video memory data between GPUs, direct interaction is limited to GPUs under the same PCIe RC domain via the PCIe chipset; crossing the PCIe RC domains of two CPUs still requires the CPUs and CPU memory to participate in the transfer, so the delay and CPU cost of video memory interaction between GPUs remain large.
GPUDirect RDMA addresses inter-node GPU communication using RDMA, but it requires a high-performance network card in the same PCIe domain, the local server CPU, and other components to help the GPU complete inter-node data transmission. The GPU and the server remain tightly coupled through PCIe, the GPU cannot operate independently of the server, inter-node GPU communication can only go through a network card connected to a switch, the communication network topology is inflexible, packet forwarding efficiency is low, and communication delay is large.
To solve these problems in the prior art, embodiments of the present application provide an FPGA-based distributed GPU communication method and device. The method is described first. It is applied to the application environment shown in FIG. 1: the communication apparatus 100 includes a plurality of FPGA processing chips, each with a plurality of network interfaces. The first network interface 230 is communicatively connected to the remote resource scheduling center device through a switch, and the second network interfaces 240 form a 2D ring communication topology with the surrounding FPGA processing chips; the number of second network interfaces 240 can be adjusted according to user requirements and is not limited to 4. An FPGA processing chip may send data to, or receive data from, an adjacent FPGA processing chip through a second network interface 240. The first and second network interfaces are 100G optical network interfaces, enabling efficient communication between GPUs and between a GPU and the remote resource scheduling center device.
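The 2D ring topology described here is, in effect, a torus: each chip has wrap-around neighbors in two dimensions. The helper below is a minimal sketch of that neighbor relation, assuming four second network interfaces (north, south, east, west); the function and key names are illustrative, not taken from the patent:

```python
def torus_neighbors(row, col, rows, cols):
    """Wrap-around (2D ring) neighbors of the chip at (row, col).

    Illustrative model of the 2D ring communication topology; the modulo
    arithmetic gives the wrap-around links at the grid edges.
    """
    return {
        "north": ((row - 1) % rows, col),
        "south": ((row + 1) % rows, col),
        "west":  (row, (col - 1) % cols),
        "east":  (row, (col + 1) % cols),
    }

# a corner chip in a 3x3 grid wraps around to the opposite edges
n = torus_neighbors(0, 0, 3, 3)
```

Each neighbor pair corresponds to one second network interface link, so a chip can forward along either ring dimension without passing through a central switch.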
Any one of the FPGA processing chips is defined as a first FPGA processing chip, the FPGA processing chip connected with the first FPGA processing chip is defined as a second FPGA processing chip, and the first FPGA processing chip and the second FPGA processing chip have the same structure.
As shown in fig. 2, the first FPGA processing chip 200 includes a first interface 210, a second interface 220, a first network interface 230, and a plurality of second network interfaces 240; the first interface 210 of the first FPGA processing chip is in communication connection with the GPU, and the second interface 220 of the first FPGA processing chip 200 is in communication connection with the server; the first network interface 230 is communicatively coupled to a remote resource scheduling center device; the plurality of second network interfaces 240 are for communication connection with a second FPGA processing chip.
The first FPGA processing chip 200 further includes an instruction configuration module 270, a routing module 280, a first transceiver module 290, and a second transceiver module 2100. The first transceiver module 290 is a RoCE transceiver module communicatively connected to the second network interfaces 240 and the routing module 280; it receives data packets from adjacent FPGA processing chips through the RoCE protocol and parses them, and it packages the first FPGA processing chip's data and sends it to adjacent FPGA processing chips.
The second transceiver module 2100 is a RoCEv2 transceiver module communicatively connected to the first network interface 230, the routing module 280, and the instruction configuration module 270. It receives configuration information sent by the remote resource scheduling center device through the RoCEv2 protocol, parses it, and distributes the parsed results to the corresponding modules, for example sending algorithm information to the instruction configuration module 270, thereby completing tasks such as registration and initialization of GPU resources. It may also package the data of the GPU connected to the first FPGA processing chip and connect to the switch through the first network interface 230 to communicate with GPUs attached to other FPGA processing chips.
After the first FPGA processing chip is powered on, the routing module 280 communicates with the adjacent FPGA processing chips through the second network interfaces 240, obtains the MAC address information of each adjacent chip's second network interface 240, and stores it in memory so that the chips can communicate through the RoCE protocol.
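This power-on discovery step can be modeled as building a per-interface table of neighbor MAC addresses for later RoCE framing. A minimal sketch, with all names (`discover_neighbors`, the probe callback) assumed for illustration:

```python
def discover_neighbors(second_interfaces, probe):
    """On power-up, ask each second network interface for its peer's MAC
    address and cache the result for later RoCE framing (illustrative).

    `probe` stands in for the link-layer exchange with the neighbor
    attached to a given interface.
    """
    return {iface: probe(iface) for iface in second_interfaces}

# usage: a dict lookup stands in for the per-port handshake
peer_macs = {"north": "02:00:00:00:00:01", "south": "02:00:00:00:00:02"}
table = discover_neighbors(["north", "south"], peer_macs.get)
```

The cached table is what a transceiver module would consult when addressing a frame to a particular neighboring chip.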
FIG. 3 illustrates a flow diagram of the FPGA-based distributed GPU communication method according to one embodiment of the present application. As shown in FIG. 3, the method may include the following steps:
S310, receiving request information sent by the GPU, wherein the request information includes a data storage address.
Before data communication begins, the remote resource scheduling center device initializes the communication apparatus and sends the data to be processed to the GPU through the server. The GPU processes the received data, stores the result, and sends request information to the communication apparatus so that the apparatus can transmit the data stored by the GPU. The request information includes the data storage address, so the communication apparatus can accurately locate the data the GPU needs transmitted.
The communication apparatus receives the request information sent by the GPU through the first interface 210 of the first FPGA processing chip 200. The first interface 210 is a PCIe interface operating in Root Port mode and is connected to the PCIe interface of the GPU through a Gen5x16 standard golden finger.
S320, reading data from the data storage address of the GPU.
According to the received data storage address, the first FPGA processing chip reads data from that address in the GPU video memory by direct memory access (DMA), so the data transmission delay is small and data reading efficiency is improved.
S330, transmitting data to a server according to the configuration information received from the remote resource scheduling center device; or sending data to the second FPGA processing chip according to the configuration information.
The configuration information includes switching information that determines the data transmission path. When the switching information indicates that the GPU communicates with the server, the first FPGA processing chip sends the data to the server through the second interface 220. Because an FPGA-based communication apparatus is introduced between the GPU and the server, the coupling between them is reduced, which facilitates independent pooled management of GPUs. The second interface 220 is a PCIe Endpoint interface and adopts the Gen5x16 standard to match the communication interface mode of the GPU.
When the switching information indicates that the GPU communicates with other FPGA processing chips, the first FPGA processing chip sends the data to the second FPGA processing chip. Through the multiple network interfaces of the FPGA processing chips, the GPU communication network topology becomes more flexible. Because these network interfaces communicate with multiple FPGA processing chips, data can be computed and transmitted simultaneously along multiple dimensions, which reduces the number of times data must be computed and transmitted over the PCIe bus, shortens the time needed for data updates, and reduces the time cost of data communication.
In the embodiment of the application, request information sent by the GPU is received, the request information including a data storage address; data is read from the data storage address of the GPU; and the data is sent to the server according to configuration information received from the remote resource scheduling center device, or to the second FPGA processing chip according to the configuration information. With the FPGA processing chip serving as a transfer card between the GPU and the server, the coupling between the GPU and the server is reduced, the cost of server-side resources such as CPU, memory, and network is reduced, and a two-dimensional ring network topology can be formed with adjacent FPGA processing chips, greatly improving flexibility between GPUs and reducing inter-GPU communication time.
In some embodiments, the first FPGA processing chip further includes a data processing module 250 comprising a GPU direct data access unit 251. Reading data from the data storage address of the GPU includes:
reading the data from the data storage address in the GPU video memory through the GPU direct data access unit 251. The first FPGA processing chip 200 directly accesses the GPU's internal video memory through PCIe and the GPU direct data access unit 251 and reads the data to be transmitted, effectively reducing communication delay between GPUs.
In some embodiments, the first FPGA processing chip 200 further includes a bridge module 260. Before sending the data to the server according to the configuration information received from the remote resource scheduling center device, or sending the data to the second FPGA processing chip according to the configuration information, the method further includes:
receiving, through the bridge module 260, the configuration information sent by the remote resource scheduling center device.
The bridge module 260 is connected to the instruction configuration module 270, the first interface 210, the second interface 220, and the data processing module. The remote resource scheduling center device sends configuration information through the switch network to the first network interface 230, which parses it to obtain a switch-PCIe-RC signal and sends that signal to the instruction configuration module 270. The instruction configuration module 270 evaluates the signal and sends the result to the bridge module 260, which can then switch the connection of the GPU's PCIe bus according to the result.
In the initialization stage of the communication apparatus, the remote resource scheduling center device sends configuration information to the first FPGA processing chip through the switch network, controls the switching of the GPU's PCIe bus connection, and determines the data transmission path. This lets the GPU's data transmission target be selected flexibly and frees the GPU from dependence on the server host.
In some embodiments, the configuration information includes first indication information, and sending the data to the server according to the configuration information includes:
sending the data to the server according to the first indication information.
After receiving the switch-PCIe-RC signal in the configuration information, the instruction configuration module 270 parses it and determines whether its value is 1 or 0. When the parsed value is 0, the GPU exchanges data with the server, and the bridge module 260 sends the data directly to the server.
The configuration information includes second indication information, and sending the data to the second FPGA processing chip according to the configuration information includes:
sending the data to the second FPGA processing chip according to the second indication information.
When the parsed value is 1, the GPU exchanges data with the second FPGA processing chip: the bridge module 260 sends the data to the data processing module 250, which processes it; the processed data is sent to the first transceiver module 290 and finally to the second FPGA processing chip through a second network interface 240.
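The 0/1 judgment described in these paragraphs amounts to a two-way dispatch on the parsed switch-PCIe-RC value. A minimal sketch, with illustrative names only:

```python
def dispatch(switch_pcie_rc, data, send_to_server, send_to_fpga):
    """Route data by the parsed switch-PCIe-RC value: 0 -> server, 1 -> peer FPGA.

    The two callbacks stand in for the bridge module's output paths
    (second interface vs. data processing module); names are assumptions.
    """
    if switch_pcie_rc == 0:
        return send_to_server(data)
    if switch_pcie_rc == 1:
        return send_to_fpga(data)
    raise ValueError("switch-PCIe-RC signal must be 0 or 1")

# usage: route the same payload down each path
result = dispatch(0, "payload", lambda d: ("server", d), lambda d: ("fpga", d))
```

In hardware the dispatch would be a bus-switching decision rather than a function call, but the control logic is the same.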
In some embodiments, the data processing module 250 further includes an operation unit 252, and the configuration information further includes information of an operation rule and information of a target network interface. Sending the data to the second FPGA processing chip according to the second indication information includes:
processing the data with the operation rule through the operation unit according to the second indication information, to obtain a processing result;
sending the processing result to the second FPGA processing chip through a target network interface among the plurality of second network interfaces.
During data interaction between the GPU and the second FPGA processing chip, if the receiving unit of the data processing module 250 has not received data to be processed from an FPGA processing chip corresponding to another GPU, the instruction configuration module 270, according to the configuration information, directs the operation unit 252 to pass the data read from the GPU by the GPU direct data access unit 251 straight to the sending unit 253. The sending unit 253 sends the data to the first transceiver module 290, which obtains from the routing module 280 the MAC address information of the target network interface given in the configuration information and sends the data to the other FPGA processing chip through that target network interface.
If the receiving unit has received data to be processed from an FPGA processing chip corresponding to another GPU, the instruction configuration module 270 directs the operation unit 252 to compute according to the operation rule information. The GPU direct data access unit 251 obtains the data from the GPU and sends it to the operation unit 252, which performs a mixed computation, under the pre-configured operation rule, on the data read from the GPU and the data received by the receiving unit 254 from the other FPGA processing chip. The computation result goes to the sending unit 253 and then to the first transceiver module 290, which obtains from the routing module 280 the MAC address information of the target network interface in the configuration information and sends the result to the other FPGA processing chip through that interface. That chip writes the data into the corresponding GPU's video memory through its GPU direct data access unit 251, completing an update iteration of the GPUs' data.
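The mixed computation performed by the operation unit, combining locally read GPU data with a chunk received from a neighboring chip under a pre-configured rule, resembles one step of a ring reduction. The sketch below assumes element-wise addition as the operation rule, which the patent leaves unspecified; all names are illustrative:

```python
def operation_step(local_chunk, received_chunk, rule=None):
    """Combine data read from the local GPU with a neighbor's chunk.

    If no chunk was received (no pending data from another chip's GPU),
    the local data is forwarded unchanged, mirroring the pass-through
    path described above. The element-wise rule is an assumption.
    """
    if received_chunk is None:
        return list(local_chunk)
    rule = rule or (lambda a, b: a + b)  # assumed operation rule
    return [rule(a, b) for a, b in zip(local_chunk, received_chunk)]

# pass-through when nothing was received; element-wise reduction otherwise
print(operation_step([1, 2], None))      # [1, 2]
print(operation_step([1, 2], [10, 20]))  # [11, 22]
```

Iterating this step around the ring, with each chip writing the forwarded result into its GPU's video memory, is what completes the update iteration the paragraph describes.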
In one embodiment, an apparatus for FPGA-based distributed GPU communication is provided, the apparatus comprising a first FPGA processing chip comprising a first interface, a second interface, a first network interface, and a plurality of first network interfaces; the first interface of the first FPGA processing chip is in communication connection with the GPU, and the second interface of the first FPGA processing chip is in communication connection with the server; the first network interface is in communication connection with remote resource scheduling center equipment; the plurality of second network interfaces are used for being in communication connection with the second FPGA processing chip; the first FPGA processing chip is used for:
receiving request information sent by a GPU through a first interface, wherein the request information comprises an address of data storage;
reading data from a data storage address of the GPU;
transmitting data to the server according to configuration information received from the remote resource scheduling center device through the first network interface; or sending data to the second FPGA processing chip according to the configuration information, the second network interfaces being used for communication connection with the second FPGA processing chip.
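The configuration-driven branching above (to the server, or to the second FPGA processing chip, depending on the configuration information received from the scheduling center) can be sketched as a small dispatcher. The `Indication` enum, `dispatch` function, and string destinations below are invented for illustration and are not part of the patent.

```python
from enum import Enum

class Indication(Enum):
    # Hypothetical encoding of the two indication signals carried
    # in the configuration information.
    TO_SERVER = 1     # first indication information
    TO_PEER_FPGA = 2  # second indication information

def dispatch(indication, data, target_interface=None):
    """Route data per the configuration information: either up to the
    host server or out to a peer FPGA via a target network interface."""
    if indication is Indication.TO_SERVER:
        return ("server", data)
    if indication is Indication.TO_PEER_FPGA:
        # The target network interface id comes from the configuration.
        return (f"fpga:{target_interface}", data)
    raise ValueError("unknown indication")

assert dispatch(Indication.TO_SERVER, b"\x01") == ("server", b"\x01")
assert dispatch(Indication.TO_PEER_FPGA, b"\x01", 3) == ("fpga:3", b"\x01")
```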
In the embodiment of the application, the first FPGA processing chip serves as a transfer card between the GPU and the server, which reduces the coupling between the GPU and the server. The first FPGA processing chip and adjacent FPGA processing chips can form a two-dimensional ring network topology, which greatly improves the flexibility of the communication network topology, reduces the inter-GPU communication time, and improves data forwarding efficiency.
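In a two-dimensional ring (torus) topology, each FPGA card has four neighbours reached through wrap-around links. The sketch below shows the standard neighbour computation for a rows x cols grid; the patent does not give this formula, so it is offered only as a conventional construction of such a topology.

```python
def torus_neighbors(row, col, rows, cols):
    """Return the four wrap-around neighbours of node (row, col)
    in a rows x cols two-dimensional ring topology."""
    return {
        "up":    ((row - 1) % rows, col),
        "down":  ((row + 1) % rows, col),
        "left":  (row, (col - 1) % cols),
        "right": (row, (col + 1) % cols),
    }

# A corner node (0, 0) in a 3x4 grid wraps to the opposite edges,
# so every node has exactly four neighbours and no edge cases.
n = torus_neighbors(0, 0, 3, 4)
assert n["up"] == (2, 0)
assert n["left"] == (0, 3)
```

The wrap-around links are what keep the hop count between any two chips low and let ring-reduction traffic flow in both dimensions concurrently.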
In one embodiment, the first FPGA processing chip further comprises a data processing module comprising a GPU direct data access unit; the first FPGA processing chip is specifically configured to:
and reading data from the data storage address of the GPU through the GPU direct data access unit.
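Such a direct-access read is typically driven by a descriptor naming the source address and transfer length. The 16-byte layout below is hypothetical (the patent does not specify descriptor fields or widths); it is packed with Python's struct module purely to illustrate the idea.

```python
import struct

# Hypothetical DMA descriptor: 64-bit GPU source address, 32-bit byte
# length, 32-bit flags (e.g. bit 0 = interrupt on completion).
# Little-endian, matching common PCIe-attached hardware conventions.
DESC_FMT = "<QII"

def make_descriptor(gpu_addr, length, flags=0):
    """Pack a descriptor the direct data access unit might consume."""
    return struct.pack(DESC_FMT, gpu_addr, length, flags)

desc = make_descriptor(0x0000_7F00_1000_0000, 4096, flags=1)
assert len(desc) == struct.calcsize(DESC_FMT) == 16
addr, length, flags = struct.unpack(DESC_FMT, desc)
assert (addr, length, flags) == (0x00007F0010000000, 4096, 1)
```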
In one embodiment, the first FPGA processing chip further comprises a bridge module; the first FPGA processing chip is specifically configured to:
and receiving configuration information sent by the remote resource scheduling center equipment through the bridge module.
In one embodiment, the configuration information includes first indication information; the first FPGA processing chip is specifically configured to:
and sending data to the server according to the first indication information.
In some embodiments, the configuration information includes second indication information; the first FPGA processing chip is specifically configured to:
and sending data to a second FPGA processing chip according to the second indication information.
In some embodiments, the data processing module further comprises an operation unit, and the configuration information further comprises information of an operation rule and information of a target network interface; the first FPGA processing chip is specifically configured to:
according to the second indication information, processing the data by adopting an operation rule through an operation unit to obtain a processing result;
and sending data to the second FPGA processing chip through a target network interface in the plurality of second network interfaces.
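Before a frame leaves the chip, the routing module maps the configured target network interface to a link-layer address for the peer. The routing-table contents and `resolve_mac` function below are invented for illustration; only the lookup role of the routing module comes from the source.

```python
# Hypothetical routing table held by the routing module:
# target network interface id -> peer MAC address.
ROUTES = {
    0: "02:00:00:00:00:10",
    1: "02:00:00:00:00:11",
}

def resolve_mac(target_interface):
    """Look up the MAC address for the configured target interface,
    as the transceiver module would before emitting the frame."""
    try:
        return ROUTES[target_interface]
    except KeyError:
        raise ValueError(f"no route for interface {target_interface}")

assert resolve_mac(1) == "02:00:00:00:00:11"
```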
The technical features of the above embodiments may be combined arbitrarily. For brevity, not all possible combinations of these technical features are described; nevertheless, as long as a combination contains no contradiction, it should be considered within the scope of this specification.
The above examples represent only a few embodiments of the present application. Although they are described in detail, they are not to be construed as limiting the scope of the invention. It should be noted that those skilled in the art could make various modifications and improvements without departing from the spirit of the present application, all of which fall within its scope of protection. Accordingly, the scope of protection of the present application is determined by the appended claims.
Claims (7)
1. A method of distributed GPU communication based on FPGA, wherein the method is applied to a communication device, the communication device comprising a first FPGA processing chip comprising a first interface, a second interface, a first network interface and a plurality of second network interfaces; the first interface of the first FPGA processing chip is in communication connection with the GPU, and the second interface of the first FPGA processing chip is in communication connection with the server; the first network interface is in communication connection with remote resource scheduling center equipment; the plurality of second network interfaces are used for being in communication connection with a second FPGA processing chip; the method comprises the following steps:
receiving request information sent by the GPU, wherein the request information comprises an address of data storage;
reading data from the data storage address of the GPU;
transmitting the data to the server according to configuration information received from the remote resource scheduling center device; or sending the data to the second FPGA processing chip according to the configuration information; the configuration information includes first indication information and second indication information, the first indication information and the second indication information are PCIe RC switching signals, and sending the data to the server according to the configuration information includes:
according to the first indication information, sending the data to the server;
and sending the data to the second FPGA processing chip according to the configuration information, wherein the data comprises:
and sending the data to the second FPGA processing chip according to the second indication information.
2. The method of claim 1, wherein the first FPGA processing chip further comprises a data processing module comprising a GPU direct data access unit; the reading data from the data storage address of the GPU includes:
and reading data from the data storage address of the GPU through the GPU direct data access unit.
3. The method of claim 1 or 2, wherein the first FPGA processing chip further comprises a bridge module; and before transmitting the data to the server according to the configuration information received from the remote resource scheduling center device, or sending the data to the second FPGA processing chip according to the configuration information, the method further comprises:
and receiving the configuration information sent by the remote resource scheduling center equipment through the bridge module.
4. The method of claim 1, wherein the data processing module further comprises an operation unit, and the configuration information further comprises information of an operation rule and information of a target network interface; and sending the data to the second FPGA processing chip according to the second indication information includes:
according to the second indication information, the operation unit processes the data by adopting the operation rule to obtain a processing result;
and sending the data to the second FPGA processing chip through the target network interface in the second network interfaces.
5. An FPGA-based distributed GPU communication apparatus, wherein the apparatus includes a first FPGA processing chip including a first interface, a second interface, a first network interface, and a plurality of second network interfaces; the first interface of the first FPGA processing chip is in communication connection with the GPU, and the second interface of the first FPGA processing chip is in communication connection with the server; the first network interface is in communication connection with remote resource scheduling center equipment; the plurality of second network interfaces are used for being in communication connection with a second FPGA processing chip; the first FPGA processing chip is configured to:
receiving request information sent by the GPU through the first interface, wherein the request information comprises an address of data storage;
reading data from the data storage address of the GPU;
transmitting the data to the server according to configuration information received from the remote resource scheduling center device through the first network interface; or sending the data to the second FPGA processing chip according to the configuration information; the configuration information includes first indication information and second indication information, the first indication information and the second indication information are PCIe RC switching signals, and sending the data to the server according to the configuration information includes:
according to the first indication information, sending the data to the server;
and sending the data to the second FPGA processing chip according to the configuration information, wherein the data comprises:
and sending the data to the second FPGA processing chip according to the second indication information.
6. The apparatus of claim 5, wherein the first FPGA processing chip further comprises a data processing module comprising a GPU direct data access unit; the first FPGA processing chip is specifically configured to:
and reading data from the data storage address of the GPU through the GPU direct data access unit.
7. The apparatus of claim 5 or 6, wherein the first FPGA processing chip further comprises a bridge module; the first FPGA processing chip is specifically configured to:
and receiving the configuration information sent by the remote resource scheduling center equipment through the bridge module.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210051088.4A CN114445260B (en) | 2022-01-17 | 2022-01-17 | Distributed GPU communication method and device based on FPGA |
Publications (2)
Publication Number | Publication Date |
---|---|
CN114445260A CN114445260A (en) | 2022-05-06 |
CN114445260B true CN114445260B (en) | 2024-01-12 |
Family
ID=81368275
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210051088.4A Active CN114445260B (en) | 2022-01-17 | 2022-01-17 | Distributed GPU communication method and device based on FPGA |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114445260B (en) |
Families Citing this family (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115037747B (en) * | 2022-05-31 | 2024-11-05 | 北京百度网讯科技有限公司 | Data communication method and device, distributed system, equipment and medium |
CN116383127B (en) * | 2023-06-01 | 2023-08-18 | 苏州浪潮智能科技有限公司 | Inter-node communication method, inter-node communication device, electronic equipment and storage medium |
CN118426976B (en) * | 2024-07-04 | 2024-09-20 | 浪潮(北京)电子信息产业有限公司 | Memory expansion system, access method and device, medium and computer program product |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104583933A (en) * | 2012-08-23 | 2015-04-29 | 微软公司 | Direct communication between GPU and FPGA components |
CN107391432A (en) * | 2017-08-11 | 2017-11-24 | 中国计量大学 | A kind of heterogeneous Computing device and computing node interconnection network |
CN108804376A (en) * | 2018-06-14 | 2018-11-13 | 山东航天电子技术研究所 | A kind of small-sized heterogeneous processing system based on GPU and FPGA |
CN109240832A (en) * | 2018-09-25 | 2019-01-18 | 中国电子科技集团公司电子科学研究院 | A kind of hardware reconstruction system and method |
CN113900793A (en) * | 2021-07-29 | 2022-01-07 | 苏州浪潮智能科技有限公司 | Server cluster and deep learning aggregate communication system and method thereof |
Non-Patent Citations (1)
Title |
---|
A cloud-scale acceleration architecture; Adrian M. Caulfield et al.; 2016 49th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO); full text *
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN114445260B (en) | Distributed GPU communication method and device based on FPGA | |
US8346928B2 (en) | Administering an epoch initiated for remote memory access | |
US7984450B2 (en) | Dispatching packets on a global combining network of a parallel computer | |
CN113485823A (en) | Data transmission method, device, network equipment and storage medium | |
US7797445B2 (en) | Dynamic network link selection for transmitting a message between compute nodes of a parallel computer | |
CN102263698B (en) | Method for establishing virtual channel, method of data transmission and line card | |
CN114546913B (en) | Method and device for high-speed data interaction between multiple hosts based on PCIE interface | |
US10614026B2 (en) | Switch with data and control path systolic array | |
CN114647602B (en) | Cross-chip access control method, device, equipment and medium | |
US20220114132A1 (en) | Data Switch Chip and Server | |
CN101452430B (en) | Communication method between multi-processors and communication device comprising multi-processors | |
US7705850B1 (en) | Computer system having increased PCIe bandwidth | |
CN116644010A (en) | Data processing method, device, equipment and medium | |
CN119226205A (en) | Data processing method and device | |
KR20050080704A (en) | Apparatus and method of inter processor communication | |
CN117806999A (en) | Bit width and channel adjustable on-chip bus | |
CN117914808A (en) | Data transmission system, method and switch | |
WO2021196904A1 (en) | Device management method, apparatus, and computer system | |
US20030041176A1 (en) | Data transfer algorithm that does not require high latency read operations | |
CN102694717A (en) | Method, device and system for transmitting messages on PCIE bus | |
JP2023529831A (en) | Separate switch control path using directly attached dispatch | |
CN112597092B (en) | Data interaction method, robot and storage medium | |
US20240354141A1 (en) | Virtual data links | |
US20230283547A1 (en) | Computer System Having a Chip Configured for Memory Attachment and Routing | |
US20230280907A1 (en) | Computer System Having Multiple Computer Devices Each with Routing Logic and Memory Controller and Multiple Computer Devices Each with Processing Circuitry |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||