CN114445260B - Distributed GPU communication method and device based on FPGA - Google Patents
- Publication number
- CN114445260B CN114445260B CN202210051088.4A CN202210051088A CN114445260B CN 114445260 B CN114445260 B CN 114445260B CN 202210051088 A CN202210051088 A CN 202210051088A CN 114445260 B CN114445260 B CN 114445260B
- Authority
- CN
- China
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T1/00—General purpose image data processing
- G06T1/20—Processor architectures; Processor configuration, e.g. pipelining
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/46—Multiprogramming arrangements
- G06F9/50—Allocation of resources, e.g. of the central processing unit [CPU]
- G06F9/5005—Allocation of resources, e.g. of the central processing unit [CPU] to service a request
- G06F9/5027—Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals
- G06F9/5044—Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals considering hardware capabilities
Abstract
The method is applied to a communication device that includes a first FPGA processing chip. The first FPGA processing chip receives request information sent by a GPU, where the request information includes a data storage address; reads data from that address of the GPU; and sends the data either to a server, according to configuration information received from a remote resource scheduling center device, or to a second FPGA processing chip, according to the same configuration information. Because the FPGA processing chip serves as a transfer card between the GPU and the server, the coupling between the GPU and the server is reduced, and a two-dimensional ring network topology can be formed between the FPGA processing chip and adjacent FPGA processing chips, which greatly improves the flexibility of communication between GPUs and reduces the time GPUs spend communicating with one another.
Description
Technical Field
The present disclosure relates to the field of communications technologies, and in particular, to a method and an apparatus for distributed GPU communications based on FPGA.
Background
The graphics processing unit (GPU) is a specialized graphics processing chip and an important computing chip, whether in its early use for graphics and image processing or in its current wide use in artificial intelligence (AI) computing. The GPU is plugged into a data center server slot as a Peripheral Component Interconnect Express (PCIe) device and communicates with the host server and other GPU nodes through a PCIe interface.
Because the GPU and the server are tightly coupled through this PCIe connection, the GPU cannot operate independently of the server. GPU communication between nodes can only go through a network card connected to a switch, so the communication network topology is inflexible, data forwarding efficiency is low, and communication delay is large.
Disclosure of Invention
Based on the above, it is necessary to provide an FPGA-based distributed GPU communication method and device in which the FPGA processing chip serves as a transfer card between the GPU and the server. This reduces the coupling between the GPU and the server, allows a two-dimensional ring network topology to be formed with adjacent FPGA processing chips, greatly improves the flexibility of the communication network topology, reduces the time GPUs spend communicating with one another, and improves data forwarding efficiency.
In a first aspect, an FPGA-based distributed GPU communication method is provided. The method is applied to a communication device that includes a first FPGA processing chip, and the first FPGA processing chip includes a first interface, a second interface, a first network interface, and a plurality of second network interfaces. The first interface of the first FPGA processing chip is communicatively connected to the GPU, and the second interface is communicatively connected to the server; the first network interface is communicatively connected to a remote resource scheduling center device; and the plurality of second network interfaces are used for communication connection with a second FPGA processing chip. The method includes the following steps:
receiving request information sent by the GPU, wherein the request information includes a data storage address;
reading data from the data storage address of the GPU;
sending the data to a server according to configuration information received from a remote resource scheduling center device, or sending the data to the second FPGA processing chip according to the configuration information.
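The three steps above can be summarized as a minimal transfer-card flow. The sketch below is an illustrative model only; every name in it (`Request`, `handle_request`, the `config["target"]` field) is an assumption for illustration, not something the patent specifies:

```python
from dataclasses import dataclass

@dataclass
class Request:
    storage_address: int  # data storage address carried in the GPU's request

def handle_request(request, gpu_memory, config):
    """Read the requested data, then forward it per the configuration.

    Illustrative sketch of the method's three steps; gpu_memory stands in
    for the GPU video memory, config for the scheduling center's settings.
    """
    data = gpu_memory[request.storage_address]   # step 2: read from the GPU
    if config["target"] == "server":             # step 3a: forward to server
        return ("server", data)
    return ("second_fpga", data)                 # step 3b: forward to peer FPGA
```

A caller supplies the two forwarding targets through the configuration, mirroring how the scheduling center selects the data path.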
In one possible implementation, the first FPGA processing chip further includes a data processing module, and the data processing module includes a GPU direct data access unit. Reading data from the data storage address of the GPU includes:
reading the data from the data storage address of the GPU through the GPU direct data access unit.
In one possible implementation, the first FPGA processing chip further includes a bridge module. Before sending the data to the server according to the configuration information received from the remote resource scheduling center device, or sending the data to the second FPGA processing chip according to the configuration information, the method further includes:
receiving, through the bridge module, the configuration information sent by the remote resource scheduling center device.
In one possible implementation, the configuration information includes first indication information, and sending the data to the server according to the configuration information includes:
sending the data to the server according to the first indication information.
In one possible implementation, the configuration information includes second indication information, and sending the data to the second FPGA processing chip according to the configuration information includes:
sending the data to the second FPGA processing chip according to the second indication information.
In one possible implementation, the data processing module further includes an operation unit, and the configuration information further includes information of an operation rule and information of a target network interface. Sending the data to the second FPGA processing chip according to the second indication information includes:
processing the data with the operation rule through the operation unit according to the second indication information, to obtain a processing result;
sending the processing result to the second FPGA processing chip through a target network interface among the plurality of second network interfaces.
In a second aspect, an apparatus for FPGA-based distributed GPU communication is provided. The apparatus includes a first FPGA processing chip comprising a first interface, a second interface, a first network interface, and a plurality of second network interfaces. The first interface of the first FPGA processing chip is communicatively connected to the GPU, and the second interface is communicatively connected to the server; the first network interface is communicatively connected to a remote resource scheduling center device; and the plurality of second network interfaces are used for communication connection with a second FPGA processing chip. The first FPGA processing chip is configured to:
receive, through the first interface, request information sent by the GPU, wherein the request information includes a data storage address;
read data from the data storage address of the GPU;
send the data to the server according to configuration information received, through the first network interface, from the remote resource scheduling center device; or send the data to the second FPGA processing chip according to the configuration information, wherein the second network interfaces are used for communication connection with the second FPGA processing chip.
In one possible implementation, the first FPGA processing chip further includes a data processing module, and the data processing module includes a GPU direct data access unit. The first FPGA processing chip is specifically configured to:
read data from the data storage address of the GPU through the GPU direct data access unit.
In one possible implementation, the first FPGA processing chip further includes a bridge module. The first FPGA processing chip is specifically configured to:
receive, through the bridge module, configuration information sent by the remote resource scheduling center device.
In one possible implementation, the configuration information includes first indication information. The first FPGA processing chip is specifically configured to:
send data to the server according to the first indication information.
According to the FPGA-based distributed GPU communication method and device, request information sent by the GPU is received, the request information including a data storage address; data is read from the data storage address of the GPU; and the data is sent to the server according to configuration information received from the remote resource scheduling center device, or to the second FPGA processing chip according to the configuration information. With the FPGA processing chip serving as a transfer card between the GPU and the server, the coupling between the GPU and the server is reduced, the cost of server-side resources such as CPU, memory, and network is reduced, and a two-dimensional ring network topology can be formed between the FPGA processing chip and adjacent FPGA processing chips, greatly improving flexibility between GPUs and reducing the time GPUs spend communicating with one another.
Drawings
FIG. 1 is an application environment diagram of the FPGA-based distributed GPU communication method in one embodiment of the present application;
FIG. 2 is a block diagram of a first FPGA processing chip in one embodiment;
FIG. 3 is a flow diagram of the FPGA-based distributed GPU communication method in one embodiment.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application will be further described in detail with reference to the accompanying drawings and examples. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the present application.
In the prior art, the GPUDirect family provides GPU direct memory access technologies: the GPU can access the memory of other sub-devices or of the host through the PCIe chipset bus system, which avoids unnecessary memory copies, reduces central processing unit (CPU) overhead, and improves data transmission efficiency. GPUDirect Shared Memory is the member of the GPUDirect family in which the GPU directly shares memory: it lets the GPU share a host memory page with other PCIe devices to implement data communication between different PCIe devices.
GPUDirect P2P is the member of the GPUDirect family in which GPUs directly share video memory. It refers to two GPU devices in the same PCIe root complex (PCIe RC) domain directly accessing each other's video memory, without copying data through host memory. Compared with GPUDirect Shared Memory, it removes the steps of copying data from GPU video memory to host memory and from host memory back to GPU video memory, reducing data path delay and improving data transmission efficiency.
GPU direct remote direct memory access (GPUDirect RDMA) uses RDMA technology, a physical network card, and the transmission network to exchange video memory data directly between GPUs. It addresses the large processing delay and high CPU occupancy of traditional network data transmission and enables GPUs on different nodes to exchange video memory directly.
GPUDirect Shared Memory and GPUDirect P2P were developed and implemented for PCIe devices with the GPU inside a host: communication with other devices depends on the CPU, host memory, the PCIe switching system, and so on; the devices are tightly coupled to the server's CPU and memory through PCIe; and communication is limited to GPUs within a single node. When GPUDirect Shared Memory is used to exchange video memory data between GPUs in a single node, the data must pass through modules such as each CPU and the CPU1 memory, so CPU overhead and data transmission delay are high. When GPUDirect P2P is used to exchange video memory data between GPUs, direct interaction is limited to GPUs under the same PCIe RC domain via the PCIe chipset; crossing the PCIe RC domains of two CPUs still requires the CPUs and CPU memory to participate in the transfer, so the delay and CPU cost of video memory interaction between GPUs remain large.
GPUDirect RDMA addresses inter-node GPU communication using RDMA, but it requires a high-performance network card in the same PCIe domain, the local server CPU, and other components to help the GPU complete inter-node data transmission. The GPU and the server remain tightly coupled through PCIe, the GPU cannot operate independently of the server, inter-node GPU communication can only go through a network card connected to a switch, the communication network topology is inflexible, packet forwarding efficiency is low, and communication delay is large.
To solve these problems in the prior art, embodiments of the present application provide an FPGA-based distributed GPU communication method and device. The method is described first. It is applied to the application environment shown in FIG. 1: the communication apparatus 100 includes a plurality of FPGA processing chips, each with a plurality of network interfaces. The first network interface 230 is communicatively connected to the remote resource scheduling center device through a switch, and the second network interfaces 240 form a 2D ring communication topology with the surrounding FPGA processing chips; the number of second network interfaces 240 can be adjusted according to user requirements and is not limited to 4. An FPGA processing chip may send data to, or receive data from, an adjacent FPGA processing chip through a second network interface 240. The first and second network interfaces are 100G optical network interfaces, enabling efficient communication between GPUs and between a GPU and the remote resource scheduling center device.
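The 2D ring topology described here is, in effect, a torus: each chip has wrap-around neighbors in two dimensions. The helper below is a minimal sketch of that neighbor relation, assuming four second network interfaces (north, south, east, west); the function and key names are illustrative, not taken from the patent:

```python
def torus_neighbors(row, col, rows, cols):
    """Wrap-around (2D ring) neighbors of the chip at (row, col).

    Illustrative model of the 2D ring communication topology; the modulo
    arithmetic gives the wrap-around links at the grid edges.
    """
    return {
        "north": ((row - 1) % rows, col),
        "south": ((row + 1) % rows, col),
        "west":  (row, (col - 1) % cols),
        "east":  (row, (col + 1) % cols),
    }

# a corner chip in a 3x3 grid wraps around to the opposite edges
n = torus_neighbors(0, 0, 3, 3)
```

Each neighbor pair corresponds to one second network interface link, so a chip can forward along either ring dimension without passing through a central switch.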
Any one of the FPGA processing chips is defined as a first FPGA processing chip, the FPGA processing chip connected with the first FPGA processing chip is defined as a second FPGA processing chip, and the first FPGA processing chip and the second FPGA processing chip have the same structure.
As shown in fig. 2, the first FPGA processing chip 200 includes a first interface 210, a second interface 220, a first network interface 230, and a plurality of second network interfaces 240; the first interface 210 of the first FPGA processing chip is in communication connection with the GPU, and the second interface 220 of the first FPGA processing chip 200 is in communication connection with the server; the first network interface 230 is communicatively coupled to a remote resource scheduling center device; the plurality of second network interfaces 240 are for communication connection with a second FPGA processing chip.
The first FPGA processing chip 200 further includes an instruction configuration module 270, a routing module 280, a first transceiver module 290, and a second transceiver module 2100. The first transceiver module 290 is a RoCE transceiver module communicatively connected to the second network interfaces 240 and the routing module 280; it receives data packets from adjacent FPGA processing chips through the RoCE protocol and parses them, and it packages the first FPGA processing chip's data and sends it to adjacent FPGA processing chips.
The second transceiver module 2100 is a RoCEv2 transceiver module communicatively connected to the first network interface 230, the routing module 280, and the instruction configuration module 270. It receives configuration information sent by the remote resource scheduling center device through the RoCEv2 protocol, parses it, and distributes the parsed results to the corresponding modules, for example sending algorithm information to the instruction configuration module 270, thereby completing tasks such as registration and initialization of GPU resources. It may also package the data of the GPU connected to the first FPGA processing chip and connect to the switch through the first network interface 230 to communicate with GPUs attached to other FPGA processing chips.
After the first FPGA processing chip is powered on, the routing module 280 communicates with the adjacent FPGA processing chips through the second network interfaces 240, obtains the MAC address information of each adjacent chip's second network interface 240, and stores it in memory so that the chips can communicate through the RoCE protocol.
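This power-on discovery step can be modeled as building a per-interface table of neighbor MAC addresses for later RoCE framing. A minimal sketch, with all names (`discover_neighbors`, the probe callback) assumed for illustration:

```python
def discover_neighbors(second_interfaces, probe):
    """On power-up, ask each second network interface for its peer's MAC
    address and cache the result for later RoCE framing (illustrative).

    `probe` stands in for the link-layer exchange with the neighbor
    attached to a given interface.
    """
    return {iface: probe(iface) for iface in second_interfaces}

# usage: a dict lookup stands in for the per-port handshake
peer_macs = {"north": "02:00:00:00:00:01", "south": "02:00:00:00:00:02"}
table = discover_neighbors(["north", "south"], peer_macs.get)
```

The cached table is what a transceiver module would consult when addressing a frame to a particular neighboring chip.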
FIG. 3 illustrates a flow diagram of the FPGA-based distributed GPU communication method according to one embodiment of the present application. As shown in FIG. 3, the method may include the following steps:
S310, receiving request information sent by the GPU, wherein the request information includes a data storage address.
Before data communication begins, the remote resource scheduling center device initializes the communication apparatus and sends the data to be processed to the GPU through the server. The GPU processes the received data, stores the result, and sends request information to the communication apparatus so that the apparatus can transmit the data stored by the GPU. The request information includes the data storage address, so the communication apparatus can accurately locate the data the GPU needs transmitted.
The communication apparatus receives the request information sent by the GPU through the first interface 210 of the first FPGA processing chip 200. The first interface 210 is a PCIe interface operating in Root Port mode and is connected to the PCIe interface of the GPU through a Gen5x16 standard golden finger.
S320, reading data from the data storage address of the GPU.
According to the received data storage address, the first FPGA processing chip reads data from that address in the GPU video memory by direct memory access (DMA), so the data transmission delay is small and data reading efficiency is improved.
S330, transmitting data to a server according to the configuration information received from the remote resource scheduling center device; or sending data to the second FPGA processing chip according to the configuration information.
The configuration information includes switching information that determines the data transmission path. When the switching information indicates that the GPU communicates with the server, the first FPGA processing chip sends the data to the server through the second interface 220. Because an FPGA-based communication apparatus is introduced between the GPU and the server, the coupling between them is reduced, which facilitates independent pooled management of GPUs. The second interface 220 is a PCIe Endpoint interface and adopts the Gen5x16 standard to match the communication interface mode of the GPU.
When the switching information indicates that the GPU communicates with other FPGA processing chips, the first FPGA processing chip sends the data to the second FPGA processing chip. Through the multiple network interfaces of the FPGA processing chips, the GPU communication network topology becomes more flexible. Because these network interfaces communicate with multiple FPGA processing chips, data can be computed and transmitted simultaneously along multiple dimensions, which reduces the number of times data must be computed and transmitted over the PCIe bus, shortens the time needed for data updates, and reduces the time cost of data communication.
In the embodiment of the application, request information sent by the GPU is received, the request information including a data storage address; data is read from the data storage address of the GPU; and the data is sent to the server according to configuration information received from the remote resource scheduling center device, or to the second FPGA processing chip according to the configuration information. With the FPGA processing chip serving as a transfer card between the GPU and the server, the coupling between the GPU and the server is reduced, the cost of server-side resources such as CPU, memory, and network is reduced, and a two-dimensional ring network topology can be formed with adjacent FPGA processing chips, greatly improving flexibility between GPUs and reducing inter-GPU communication time.
In some embodiments, the first FPGA processing chip further includes a data processing module 250 comprising a GPU direct data access unit 251. Reading data from the data storage address of the GPU includes:
reading the data from the data storage address in the GPU video memory through the GPU direct data access unit 251. The first FPGA processing chip 200 directly accesses the GPU's internal video memory through PCIe and the GPU direct data access unit 251 and reads the data to be transmitted, effectively reducing communication delay between GPUs.
In some embodiments, the first FPGA processing chip 200 further includes a bridge module 260. Before sending the data to the server according to the configuration information received from the remote resource scheduling center device, or sending the data to the second FPGA processing chip according to the configuration information, the method further includes:
receiving, through the bridge module 260, the configuration information sent by the remote resource scheduling center device.
The bridge module 260 is connected to the instruction configuration module 270, the first interface 210, the second interface 220, and the data processing module. The remote resource scheduling center device sends configuration information through the switch network to the first network interface 230, which parses it to obtain a switch-PCIe-RC signal and sends that signal to the instruction configuration module 270. The instruction configuration module 270 evaluates the signal and sends the result to the bridge module 260, which can then switch the connection of the GPU's PCIe bus according to the result.
In the initialization stage of the communication apparatus, the remote resource scheduling center device sends configuration information to the first FPGA processing chip through the switch network, controls the switching of the GPU's PCIe bus connection, and determines the data transmission path. This lets the GPU's data transmission target be selected flexibly and frees the GPU from dependence on the server host.
In some embodiments, the configuration information includes first indication information, and sending the data to the server according to the configuration information includes:
sending the data to the server according to the first indication information.
After receiving the switch-PCIe-RC signal in the configuration information, the instruction configuration module 270 parses it and determines whether its value is 1 or 0. When the parsed value is 0, the GPU exchanges data with the server, and the bridge module 260 sends the data directly to the server.
The configuration information includes second indication information, and sending the data to the second FPGA processing chip according to the configuration information includes:
sending the data to the second FPGA processing chip according to the second indication information.
When the parsed value is 1, the GPU exchanges data with the second FPGA processing chip: the bridge module 260 sends the data to the data processing module 250, which processes it; the processed data is sent to the first transceiver module 290 and finally to the second FPGA processing chip through a second network interface 240.
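The 0/1 judgment described in these paragraphs amounts to a two-way dispatch on the parsed switch-PCIe-RC value. A minimal sketch, with illustrative names only:

```python
def dispatch(switch_pcie_rc, data, send_to_server, send_to_fpga):
    """Route data by the parsed switch-PCIe-RC value: 0 -> server, 1 -> peer FPGA.

    The two callbacks stand in for the bridge module's output paths
    (second interface vs. data processing module); names are assumptions.
    """
    if switch_pcie_rc == 0:
        return send_to_server(data)
    if switch_pcie_rc == 1:
        return send_to_fpga(data)
    raise ValueError("switch-PCIe-RC signal must be 0 or 1")

# usage: route the same payload down each path
result = dispatch(0, "payload", lambda d: ("server", d), lambda d: ("fpga", d))
```

In hardware the dispatch would be a bus-switching decision rather than a function call, but the control logic is the same.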
In some embodiments, the data processing module 250 further includes an operation unit 252, and the configuration information further includes information of an operation rule and information of a target network interface. Sending the data to the second FPGA processing chip according to the second indication information includes:
processing the data with the operation rule through the operation unit according to the second indication information, to obtain a processing result;
sending the processing result to the second FPGA processing chip through a target network interface among the plurality of second network interfaces.
During data interaction between the GPU and the second FPGA processing chip, if the receiving unit of the data processing module 250 has not received data to be processed from an FPGA processing chip corresponding to another GPU, the instruction configuration module 270, according to the configuration information, directs the operation unit 252 to pass the data read from the GPU by the GPU direct data access unit 251 straight to the sending unit 253. The sending unit 253 sends the data to the first transceiver module 290, which obtains from the routing module 280 the MAC address information of the target network interface given in the configuration information and sends the data to the other FPGA processing chip through that target network interface.
If the receiving unit has received data to be processed from an FPGA processing chip corresponding to another GPU, the instruction configuration module 270 directs the operation unit 252 to compute according to the operation rule information. The GPU direct data access unit 251 obtains the data from the GPU and sends it to the operation unit 252, which performs a mixed computation, under the pre-configured operation rule, on the data read from the GPU and the data received by the receiving unit 254 from the other FPGA processing chip. The computation result goes to the sending unit 253 and then to the first transceiver module 290, which obtains from the routing module 280 the MAC address information of the target network interface in the configuration information and sends the result to the other FPGA processing chip through that interface. That chip writes the data into the corresponding GPU's video memory through its GPU direct data access unit 251, completing an update iteration of the GPUs' data.
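The mixed computation performed by the operation unit, combining locally read GPU data with a chunk received from a neighboring chip under a pre-configured rule, resembles one step of a ring reduction. The sketch below assumes element-wise addition as the operation rule, which the patent leaves unspecified; all names are illustrative:

```python
def operation_step(local_chunk, received_chunk, rule=None):
    """Combine data read from the local GPU with a neighbor's chunk.

    If no chunk was received (no pending data from another chip's GPU),
    the local data is forwarded unchanged, mirroring the pass-through
    path described above. The element-wise rule is an assumption.
    """
    if received_chunk is None:
        return list(local_chunk)
    rule = rule or (lambda a, b: a + b)  # assumed operation rule
    return [rule(a, b) for a, b in zip(local_chunk, received_chunk)]

# pass-through when nothing was received; element-wise reduction otherwise
print(operation_step([1, 2], None))      # [1, 2]
print(operation_step([1, 2], [10, 20]))  # [11, 22]
```

Iterating this step around the ring, with each chip writing the forwarded result into its GPU's video memory, is what completes the update iteration the paragraph describes.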
In one embodiment, an apparatus for FPGA-based distributed GPU communication is provided, the apparatus comprising a first FPGA processing chip comprising a first interface, a second interface, a first network interface, and a plurality of first network interfaces; the first interface of the first FPGA processing chip is in communication connection with the GPU, and the second interface of the first FPGA processing chip is in communication connection with the server; the first network interface is in communication connection with remote resource scheduling center equipment; the plurality of second network interfaces are used for being in communication connection with the second FPGA processing chip; the first FPGA processing chip is used for:
receiving request information sent by a GPU through a first interface, wherein the request information comprises an address of data storage;
reading data from a data storage address of the GPU;
transmitting data to the server according to configuration information received from the remote resource scheduling center device through the first network interface; or sending data to the second FPGA processing chip according to the configuration information, the second network interfaces being used for communication connection with the second FPGA processing chip.
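The configuration-driven branching above (to the server, or to the second FPGA processing chip, depending on the configuration information received from the scheduling center) can be sketched as a small dispatcher. The `Indication` enum, `dispatch` function, and string destinations below are invented for illustration and are not part of the patent.

```python
from enum import Enum

class Indication(Enum):
    # Hypothetical encoding of the two indication signals carried
    # in the configuration information.
    TO_SERVER = 1     # first indication information
    TO_PEER_FPGA = 2  # second indication information

def dispatch(indication, data, target_interface=None):
    """Route data per the configuration information: either up to the
    host server or out to a peer FPGA via a target network interface."""
    if indication is Indication.TO_SERVER:
        return ("server", data)
    if indication is Indication.TO_PEER_FPGA:
        # The target network interface id comes from the configuration.
        return (f"fpga:{target_interface}", data)
    raise ValueError("unknown indication")

assert dispatch(Indication.TO_SERVER, b"\x01") == ("server", b"\x01")
assert dispatch(Indication.TO_PEER_FPGA, b"\x01", 3) == ("fpga:3", b"\x01")
```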
In the embodiment of the application, the first FPGA processing chip serves as a transfer card between the GPU and the server, which reduces the coupling between the GPU and the server. The first FPGA processing chip and adjacent FPGA processing chips can form a two-dimensional ring network topology, which greatly improves the flexibility of the communication network topology, reduces the inter-GPU communication time, and improves data forwarding efficiency.
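In a two-dimensional ring (torus) topology, each FPGA card has four neighbours reached through wrap-around links. The sketch below shows the standard neighbour computation for a rows x cols grid; the patent does not give this formula, so it is offered only as a conventional construction of such a topology.

```python
def torus_neighbors(row, col, rows, cols):
    """Return the four wrap-around neighbours of node (row, col)
    in a rows x cols two-dimensional ring topology."""
    return {
        "up":    ((row - 1) % rows, col),
        "down":  ((row + 1) % rows, col),
        "left":  (row, (col - 1) % cols),
        "right": (row, (col + 1) % cols),
    }

# A corner node (0, 0) in a 3x4 grid wraps to the opposite edges,
# so every node has exactly four neighbours and no edge cases.
n = torus_neighbors(0, 0, 3, 4)
assert n["up"] == (2, 0)
assert n["left"] == (0, 3)
```

The wrap-around links are what keep the hop count between any two chips low and let ring-reduction traffic flow in both dimensions concurrently.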
In one embodiment, the first FPGA processing chip further comprises a data processing module comprising a GPU direct data access unit; the first FPGA processing chip is specifically configured to:
and reading data from the data storage address of the GPU through the GPU direct data access unit.
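Such a direct-access read is typically driven by a descriptor naming the source address and transfer length. The 16-byte layout below is hypothetical (the patent does not specify descriptor fields or widths); it is packed with Python's struct module purely to illustrate the idea.

```python
import struct

# Hypothetical DMA descriptor: 64-bit GPU source address, 32-bit byte
# length, 32-bit flags (e.g. bit 0 = interrupt on completion).
# Little-endian, matching common PCIe-attached hardware conventions.
DESC_FMT = "<QII"

def make_descriptor(gpu_addr, length, flags=0):
    """Pack a descriptor the direct data access unit might consume."""
    return struct.pack(DESC_FMT, gpu_addr, length, flags)

desc = make_descriptor(0x0000_7F00_1000_0000, 4096, flags=1)
assert len(desc) == struct.calcsize(DESC_FMT) == 16
addr, length, flags = struct.unpack(DESC_FMT, desc)
assert (addr, length, flags) == (0x00007F0010000000, 4096, 1)
```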
In one embodiment, the first FPGA processing chip further comprises a bridge module; the first FPGA processing chip is specifically configured to:
and receiving configuration information sent by the remote resource scheduling center equipment through the bridge module.
In one embodiment, the configuration information includes first indication information; the first FPGA processing chip is specifically configured to:
and sending data to the server according to the first indication information.
In some embodiments, the configuration information includes second indication information; the first FPGA processing chip is specifically configured to:
and sending data to a second FPGA processing chip according to the second indication information.
In some embodiments, the data processing module further comprises an operation unit, and the configuration information further comprises information of an operation rule and information of a target network interface; the first FPGA processing chip is specifically configured to:
according to the second indication information, processing the data by adopting an operation rule through an operation unit to obtain a processing result;
and sending data to the second FPGA processing chip through a target network interface in the plurality of second network interfaces.
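Before a frame leaves the chip, the routing module maps the configured target network interface to a link-layer address for the peer. The routing-table contents and `resolve_mac` function below are invented for illustration; only the lookup role of the routing module comes from the source.

```python
# Hypothetical routing table held by the routing module:
# target network interface id -> peer MAC address.
ROUTES = {
    0: "02:00:00:00:00:10",
    1: "02:00:00:00:00:11",
}

def resolve_mac(target_interface):
    """Look up the MAC address for the configured target interface,
    as the transceiver module would before emitting the frame."""
    try:
        return ROUTES[target_interface]
    except KeyError:
        raise ValueError(f"no route for interface {target_interface}")

assert resolve_mac(1) == "02:00:00:00:00:11"
```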
The technical features of the above embodiments may be combined arbitrarily. For brevity, not all possible combinations of these technical features are described; nevertheless, as long as a combination contains no contradiction, it should be considered within the scope of this specification.
The above examples represent only a few embodiments of the present application. Although they are described in detail, they are not to be construed as limiting the scope of the invention. It should be noted that those skilled in the art could make various modifications and improvements without departing from the spirit of the present application, all of which fall within its scope of protection. Accordingly, the scope of protection of the present application is determined by the appended claims.
Claims (7)
1. A method of distributed GPU communication based on FPGA, wherein the method is applied to a communication device, the communication device comprising a first FPGA processing chip comprising a first interface, a second interface, a first network interface and a plurality of second network interfaces; the first interface of the first FPGA processing chip is in communication connection with the GPU, and the second interface of the first FPGA processing chip is in communication connection with the server; the first network interface is in communication connection with remote resource scheduling center equipment; the plurality of second network interfaces are used for being in communication connection with a second FPGA processing chip; the method comprises the following steps:
receiving request information sent by the GPU, wherein the request information comprises an address of data storage;
reading data from the data storage address of the GPU;
transmitting the data to the server according to configuration information received from the remote resource scheduling center device; or sending the data to the second FPGA processing chip according to the configuration information; the configuration information includes first indication information and second indication information, the first indication information and the second indication information are PCIe RC switching signals, and sending the data to the server according to the configuration information includes:
according to the first indication information, sending the data to the server;
and sending the data to the second FPGA processing chip according to the configuration information, wherein the data comprises:
and sending the data to the second FPGA processing chip according to the second indication information.
2. The method of claim 1, wherein the first FPGA processing chip further comprises a data processing module comprising a GPU direct data access unit; the reading data from the data storage address of the GPU includes:
and reading data from the data storage address of the GPU through the GPU direct data access unit.
3. The method of claim 1 or 2, wherein the first FPGA processing chip further comprises a bridge module; and before transmitting the data to the server according to the configuration information received from the remote resource scheduling center device, or sending the data to the second FPGA processing chip according to the configuration information, the method further comprises:
and receiving the configuration information sent by the remote resource scheduling center equipment through the bridge module.
4. The method of claim 1, wherein the data processing module further comprises an operation unit, and the configuration information further comprises information of an operation rule and information of a target network interface; and sending the data to the second FPGA processing chip according to the second indication information includes:
according to the second indication information, the operation unit processes the data by adopting the operation rule to obtain a processing result;
and sending the data to the second FPGA processing chip through the target network interface in the second network interfaces.
5. An FPGA-based distributed GPU communication apparatus, wherein the apparatus includes a first FPGA processing chip including a first interface, a second interface, a first network interface, and a plurality of second network interfaces; the first interface of the first FPGA processing chip is in communication connection with the GPU, and the second interface of the first FPGA processing chip is in communication connection with the server; the first network interface is in communication connection with remote resource scheduling center equipment; the plurality of second network interfaces are used for being in communication connection with a second FPGA processing chip; the first FPGA processing chip is configured to:
receiving request information sent by the GPU through the first interface, wherein the request information comprises an address of data storage;
reading data from the data storage address of the GPU;
transmitting the data to the server according to configuration information received from the remote resource scheduling center device through the first network interface; or sending the data to the second FPGA processing chip according to the configuration information; the configuration information includes first indication information and second indication information, the first indication information and the second indication information are PCIe RC switching signals, and sending the data to the server according to the configuration information includes:
according to the first indication information, sending the data to the server;
and sending the data to the second FPGA processing chip according to the configuration information, wherein the data comprises:
and sending the data to the second FPGA processing chip according to the second indication information.
6. The apparatus of claim 5, wherein the first FPGA processing chip further comprises a data processing module comprising a GPU direct data access unit; the first FPGA processing chip is specifically configured to:
and reading data from the data storage address of the GPU through the GPU direct data access unit.
7. The apparatus of claim 5 or 6, wherein the first FPGA processing chip further comprises a bridge module; the first FPGA processing chip is specifically configured to:
and receiving the configuration information sent by the remote resource scheduling center equipment through the bridge module.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210051088.4A CN114445260B (en) | 2022-01-17 | 2022-01-17 | Distributed GPU communication method and device based on FPGA |
Publications (2)
Publication Number | Publication Date |
---|---|
CN114445260A CN114445260A (en) | 2022-05-06 |
CN114445260B true CN114445260B (en) | 2024-01-12 |
Family
ID=81368275
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210051088.4A Active CN114445260B (en) | 2022-01-17 | 2022-01-17 | Distributed GPU communication method and device based on FPGA |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114445260B (en) |
Families Citing this family (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115037747B (en) * | 2022-05-31 | 2024-11-05 | 北京百度网讯科技有限公司 | Data communication method and device, distributed system, equipment and medium |
CN116383127B (en) * | 2023-06-01 | 2023-08-18 | 苏州浪潮智能科技有限公司 | Inter-node communication method, inter-node communication device, electronic equipment and storage medium |
CN118426976B (en) * | 2024-07-04 | 2024-09-20 | 浪潮(北京)电子信息产业有限公司 | Memory expansion system, access method and device, medium and computer program product |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104583933A (en) * | 2012-08-23 | 2015-04-29 | 微软公司 | Direct communication between GPU and FPGA components |
CN107391432A (en) * | 2017-08-11 | 2017-11-24 | 中国计量大学 | A kind of heterogeneous Computing device and computing node interconnection network |
CN108804376A (en) * | 2018-06-14 | 2018-11-13 | 山东航天电子技术研究所 | A kind of small-sized heterogeneous processing system based on GPU and FPGA |
CN109240832A (en) * | 2018-09-25 | 2019-01-18 | 中国电子科技集团公司电子科学研究院 | A kind of hardware reconstruction system and method |
CN113900793A (en) * | 2021-07-29 | 2022-01-07 | 苏州浪潮智能科技有限公司 | Server cluster and deep learning aggregate communication system and method thereof |
Non-Patent Citations (1)
Title |
---|
A cloud-scale acceleration architecture; Adrian M. Caulfield et al.; 2016 49th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO); full text *
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN114445260B (en) | Distributed GPU communication method and device based on FPGA | |
US8346928B2 (en) | Administering an epoch initiated for remote memory access | |
US7984450B2 (en) | Dispatching packets on a global combining network of a parallel computer | |
CN113485823A (en) | Data transmission method, device, network equipment and storage medium | |
US7797445B2 (en) | Dynamic network link selection for transmitting a message between compute nodes of a parallel computer | |
CN102263698B (en) | Method for establishing virtual channel, method of data transmission and line card | |
CN114546913B (en) | Method and device for high-speed data interaction between multiple hosts based on PCIE interface | |
US10614026B2 (en) | Switch with data and control path systolic array | |
CN114647602B (en) | Cross-chip access control method, device, equipment and medium | |
US20220114132A1 (en) | Data Switch Chip and Server | |
CN101452430B (en) | Communication method between multi-processors and communication device comprising multi-processors | |
US7705850B1 (en) | Computer system having increased PCIe bandwidth | |
CN116644010A (en) | Data processing method, device, equipment and medium | |
CN119226205A (en) | Data processing method and device | |
KR20050080704A (en) | Apparatus and method of inter processor communication | |
CN117806999A (en) | Bit width and channel adjustable on-chip bus | |
CN117914808A (en) | Data transmission system, method and switch | |
WO2021196904A1 (en) | Device management method, apparatus, and computer system | |
US20030041176A1 (en) | Data transfer algorithm that does not require high latency read operations | |
CN102694717A (en) | Method, device and system for transmitting messages on PCIE bus | |
JP2023529831A (en) | Separate switch control path using directly attached dispatch | |
CN112597092B (en) | Data interaction method, robot and storage medium | |
US20240354141A1 (en) | Virtual data links | |
US20230283547A1 (en) | Computer System Having a Chip Configured for Memory Attachment and Routing | |
US20230280907A1 (en) | Computer System Having Multiple Computer Devices Each with Routing Logic and Memory Controller and Multiple Computer Devices Each with Processing Circuitry |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||