Disclosure of Invention
The present application provides a server that at least solves the problem of poor server reliability in the related art.
The present application provides a server comprising a first cabinet, a second cabinet, a plurality of graphics processor nodes, a plurality of switching nodes, a first power cabinet, and a second power cabinet, wherein:
The first power cabinet, the second power cabinet, and the plurality of graphics processor nodes are arranged in the first cabinet, and the first power cabinet, the plurality of graphics processor nodes, and the second power cabinet are vertically stacked in sequence;
The plurality of switching nodes are arranged in the second cabinet and are horizontally stacked in sequence;
The graphics processor nodes are connected with the switching nodes.
In one possible implementation, the graphics processor node includes a plurality of connectors, and the plurality of connectors in the graphics processor node are located on a first side of the first cabinet;
The switching node comprises a plurality of connectors, and the connectors in the switching node are positioned on the second side of the second cabinet;
The first side of the first cabinet is opposite to the second side of the second cabinet, and the graphics processor node is connected with the switching node through a connector.
In one possible implementation, the graphics processor node includes N connectors, where N is the number of switching nodes and N is an integer greater than 1;
The switching node comprises M connectors, M is the number of the graphics processor nodes, and M is an integer greater than 1;
The N connectors in the graphics processor node are respectively connected to one connector in each of the N switching nodes.
In one possible embodiment, the first cabinet further comprises a second power supply copper bar, and the second cabinet further comprises a first power supply copper bar;
The first power supply copper bar is used for supplying power to the plurality of graphics processor nodes, and the second power supply copper bar is used for supplying power to the plurality of switching nodes;
The first power supply copper bar is positioned in the middle of the plurality of switching nodes and close to the first side of the first cabinet, and the second power supply copper bar is positioned in the middle of the plurality of graphics processor nodes and close to the second side of the second cabinet;
The two ends of the first power supply copper bar are respectively connected with the first power cabinet and the second power cabinet, and the first power supply copper bar is connected with the second power supply copper bar.
In one possible implementation, the graphics processor node includes a first power copper bar, and the switching node includes a second power copper bar;
The first power supply copper bar in the graphics processor node is connected with the first power supply copper bar of the second cabinet;
The second power supply copper bar in the switching node is connected with the second power supply copper bar of the first cabinet.
In one possible implementation, the first cabinet further includes a plurality of second radiators, and the second cabinet further includes a plurality of first radiators;
The first radiators are used for dissipating heat from the plurality of graphics processor nodes, and the second radiators are used for dissipating heat from the plurality of switching nodes;
The plurality of second radiators are respectively positioned between the first power cabinet and the plurality of graphics processor nodes and between the plurality of graphics processor nodes and the second power cabinet;
The plurality of first radiators are respectively positioned on the left side and the right side in the second cabinet.
In one possible embodiment, the plurality of first radiators includes a first water inlet radiator and a first water outlet radiator, the first water inlet radiator is connected to the cabinet water inlet pipe, the first water outlet radiator is connected to the cabinet water outlet pipe, and the plurality of graphics processor nodes are respectively connected to the first water inlet radiator and the first water outlet radiator.
In one possible implementation, the graphics processor node further includes a first water inlet interface and a first water outlet interface;
The first water inlet interface is connected with the first water inlet radiator, and the first water outlet interface is connected with the first water outlet radiator.
In one possible embodiment, the plurality of second radiators includes a second water inlet radiator and a second water outlet radiator, the second water inlet radiator is connected with the first water inlet radiator through a water inlet transfer pipe, the second water outlet radiator is connected with the first water outlet radiator through a water outlet transfer pipe, and the plurality of switching nodes are respectively connected with the second water inlet radiator and the second water outlet radiator.
In one possible embodiment, the switching node further comprises a second water inlet interface and a second water outlet interface;
The second water inlet interface is connected with the second water inlet radiator, and the second water outlet interface is connected with the second water outlet radiator.
In one possible implementation, the graphics processor node includes a plurality of graphics processors, a plurality of central processing units, a plurality of memories, a plurality of hard disks, a plurality of network cards, an integrated circuit chip, and a plurality of connectors;
The plurality of graphics processors are connected with the plurality of connectors through the integrated circuit chip, the graphics processors are connected with the central processing units, the central processing units are connected with the memories, and the central processing units are respectively connected with the hard disks and the network cards.
In one possible implementation, the graphics processor node further includes a first power supply copper bar, a first water inlet interface, and a first water outlet interface;
The first power supply copper bar is located in the middle position of the connectors, and the first water inlet interface and the first water outlet interface are respectively located on two sides of the connectors.
In one possible implementation, the switching node includes a plurality of switching chips, a plurality of cable connectors, a plurality of optical modules, and a plurality of connectors;
The plurality of switching chips are respectively connected with the plurality of cable connectors and the plurality of connectors, the plurality of cable connectors are respectively connected with the plurality of connectors, and the plurality of switching chips are connected with the plurality of optical modules.
In one possible implementation, the switching node further comprises a second power supply copper bar, a second water inlet interface, and a second water outlet interface;
The second power supply copper bar is positioned in the middle of the plurality of connectors, the plurality of connectors are divided into a plurality of first connectors and a plurality of second connectors, and the second water inlet interface and the second water outlet interface are respectively positioned on two sides of the plurality of connectors;
The plurality of cable connectors are divided into a plurality of first cable connectors and a plurality of second cable connectors, the plurality of first cable connectors are connected with the plurality of second connectors through cables, and the plurality of second cable connectors are connected with the plurality of first connectors through cables;
The plurality of switching chips comprise a first switching chip and a second switching chip, the first switching chip is connected with the plurality of first cable connectors through a printed circuit board, the first switching chip is also connected with the plurality of first connectors through the printed circuit board, the second switching chip is connected with the plurality of second cable connectors through the printed circuit board, and the second switching chip is also connected with the plurality of second connectors through the printed circuit board.
In one possible embodiment, the second water inlet interface is connected to a heat dissipation pipe of the second switching chip, the heat dissipation pipe of the second switching chip is connected to heat dissipation pipes of the plurality of optical modules, the heat dissipation pipes of the plurality of optical modules are connected to a heat dissipation pipe of the first switching chip, and the heat dissipation pipe of the first switching chip is connected to the second water outlet interface.
According to the present application, the plurality of graphics processor nodes are arranged in the first cabinet and vertically stacked in sequence, and the plurality of switching nodes are arranged in the second cabinet and horizontally stacked in sequence, wherein the graphics processor nodes are connected with the switching nodes. This solves the technical problem of poor server reliability: the difficulty of installation and maintenance is reduced, the integrity of signal transmission is improved, and the reliability of the server is improved.
Detailed Description
The technical solutions in the embodiments of the present application are described clearly and completely below with reference to the accompanying drawings. Evidently, the described embodiments are only some, not all, of the embodiments of the present application. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present application without creative effort fall within the scope of protection of the present application.
It should be noted that in the description of the present application, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. The terms "first," "second," and the like in this specification are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order.
As the parameter counts of large AI models increase to the order of billions or even trillions, the video memory capacity and computing power of a single GPU can no longer meet the demand. In the related art, a plurality of GPUs may be integrated into one physical unit through a high-bandwidth interconnection technology to form a server cluster with stronger collaborative computing capability, which is called a supernode server.
To achieve internal cross-node communication, the supernode server needs a cable bridge (cable tray) deployed at the back of the cabinet. The cable bridge can accommodate thousands of high-speed cables that connect the GPUs with the switching nodes.
Next, the structure of a supernode server in the related art is described with reference to fig. 1 and fig. 2.
Referring to fig. 1 and fig. 2, fig. 1 is a schematic diagram of a front side structure of a server in the related art according to an embodiment of the present application. Fig. 2 is a schematic diagram of a rear side structure of a server in the related art according to an embodiment of the present application. Fig. 1 may include a front side structure of a cabinet. Fig. 2 may include a rear side structure of the cabinet.
The front side structure of the cabinet may include power cabinets, a plurality of graphics processor nodes, and a plurality of switching nodes. The power cabinets are positioned at the upper end and the lower end of the cabinet.
The power cabinet can be used for supplying power to the graphics processor node and the switching node so as to ensure the normal operation of each node.
The graphics processor node may be used to perform operational tasks.
The switching nodes can be used for data exchange and communication between the nodes to construct a network for data transmission.
The graphics processor node can cooperate with the switching node to realize data processing and transmission.
The plurality of switching nodes may be located between the plurality of graphics processor nodes.
For example, a server may include 16 graphics processor nodes and 8 switching nodes. The 8 switching nodes are positioned in the middle of the 16 graphics processor nodes, and a power cabinet is arranged above and below the graphics processor nodes.
The rear side structure of the cabinet may include a plurality of radiators, a plurality of cable bridges, and a power supply copper bar. The plurality of radiators are positioned on the left side and the right side, the plurality of cable bridges are positioned between the radiators, and the power supply copper bar is positioned in the middle of the plurality of cable bridges.
The radiators may be used to dissipate heat from heat-generating components such as the graphics processor nodes and the switching nodes, so as to keep the server within a suitable operating temperature range.
The cable bridge may be used for connection between the graphics processor nodes and the switching nodes. Thousands of cables are typically deployed in the cable bridges and are responsible for transmitting the high-speed data signals required for communication between nodes.
The power supply copper bar can be used for power transmission of the server.
However, with the high-density wiring of the cable bridge, a single failed cable is difficult to locate and repair, and the cabling has to be replaced as a whole, so maintainability is extremely poor. In addition, long-distance wiring can cause serious signal integrity problems. As a result, the reliability of the server is poor.
To solve the above technical problem, an embodiment of the present application provides a server in which a plurality of graphics processor nodes are arranged in a first cabinet and vertically stacked in sequence, and a plurality of switching nodes are arranged in a second cabinet and horizontally stacked in sequence, wherein the graphics processor nodes are connected with the switching nodes. Therefore, no cable bridge is needed, the difficulty of installation and maintenance is reduced, the integrity of signal transmission is improved, and the reliability of the server is improved.
To enable those skilled in the art to better understand the solutions of the present application, the present application is described in further detail below with reference to the drawings and specific embodiments.
The specific hardware architecture of the server is described below.
Next, the server of the present application will be explained with reference to fig. 3.
The server may include a first cabinet, a second cabinet, a plurality of graphics processor nodes, a plurality of switching nodes, a first power cabinet, and a second power cabinet.
Referring to fig. 3, fig. 3 is a schematic structural diagram of a server according to an embodiment of the present application. Fig. 3 may include a first cabinet and a second cabinet.
The first power cabinet, the second power cabinet, and the plurality of graphics processor nodes are arranged in the first cabinet, and the first power cabinet, the plurality of graphics processor nodes, and the second power cabinet are vertically stacked in sequence.
The plurality of switching nodes are arranged in the second cabinet and are horizontally stacked in sequence.
The graphics processor nodes are connected with the switching nodes.
From top to bottom, the first cabinet comprises the first power cabinet, the plurality of graphics processor nodes, and the second power cabinet in sequence.
The power supply cabinet may be used to power the server.
The graphics processor nodes can be used for executing tasks such as graphics processing, computing and the like, and a plurality of graphics processor nodes can work in parallel to improve the computing efficiency and meet the complex computing demands.
The plurality of switching nodes in the second cabinet are sequentially arranged in a horizontally stacked manner.
In this way, the horizontal layout of the switching nodes forms an independent heat dissipation area, and thermal interference with the high-power consumption graphics processor node is avoided.
The switching node may be used to transmit data generated during the operation or data to be acquired, so as to implement data circulation and processing of the whole system.
The graphics processor node may be connected with the switching node by a connector.
The connectors may include, but are not limited to, orthogonal connectors.
Therefore, a cable bridge is not needed, the difficulty of installation and maintenance is reduced, the integrity of signal transmission is improved, and the reliability of the server is improved.
In one possible implementation, the graphics processor node includes a plurality of connectors, and the plurality of connectors in the graphics processor node are located on a first side of the first cabinet.
The switching node includes a plurality of connectors, and the plurality of connectors in the switching node are located on a second side of the second cabinet.
The first side of the first cabinet is opposite to the second side of the second cabinet, and the graphics processor node is connected with the switching node through a connector.
Wherein the connector may be an interface component for enabling data transmission.
The first side of the first cabinet is disposed opposite the second side of the second cabinet. This opposed arrangement makes it easier to connect the graphics processor nodes and the switching nodes through the connectors. With the face-to-face layout and the mating of the connectors, the graphics processor nodes and the switching nodes can establish a stable connection channel for data transmission and interaction, so that the whole system can work cooperatively to complete tasks such as AI training and data processing.
In one possible implementation, the graphics processor node includes N connectors, where N is the number of switching nodes and N is an integer greater than 1;
The switching node comprises M connectors, M is the number of the graphics processor nodes, and M is an integer greater than 1;
The N connectors in the graphics processor node are respectively connected to one connector in each of the N switching nodes.
Here, N may be twice M.
The connection between the graphics processor node and the switching node is explained below with reference to fig. 4.
Fig. 4 is a schematic diagram of a connection relationship between a graphics processor node and a switching node according to an embodiment of the present application. Referring to fig. 4, fig. 4 includes a plurality of graphics processor nodes and a plurality of switching nodes.
A mesh topology is formed between the graphics processor nodes and the switching nodes. The plurality of graphics processor nodes and the plurality of switching nodes are connected pairwise, that is, each graphics processor node is connected to each switching node.
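As a minimal sketch of the pairwise connection described above, the following Python snippet enumerates one link for every pair of graphics processor node and switching node and counts the connectors used on each node. The node counts (M = 4 and N = 8, chosen so that N is twice M) and the connector indexing convention are illustrative assumptions, not values taken from the embodiment.

```python
from itertools import product


def build_mesh(num_gpu_nodes, num_switch_nodes):
    """Return links as (gpu_node, gpu_connector, switch_node, switch_connector)."""
    if num_gpu_nodes < 2 or num_switch_nodes < 2:
        raise ValueError("M and N must both be integers greater than 1")
    links = []
    for gpu, switch in product(range(num_gpu_nodes), range(num_switch_nodes)):
        # Assumed indexing: connector `switch` on the GPU node mates with
        # connector `gpu` on the switching node.
        links.append((gpu, switch, switch, gpu))
    return links


def connectors_used(links):
    """Collect the connector indices used on every GPU node and switching node."""
    gpu_use, switch_use = {}, {}
    for gpu, gpu_conn, switch, switch_conn in links:
        gpu_use.setdefault(gpu, set()).add(gpu_conn)
        switch_use.setdefault(switch, set()).add(switch_conn)
    return gpu_use, switch_use


M, N = 4, 8  # hypothetical counts: M graphics processor nodes, N switching nodes
links = build_mesh(num_gpu_nodes=M, num_switch_nodes=N)
gpu_use, switch_use = connectors_used(links)

print("links:", len(links))
print("connectors per GPU node:", {len(c) for c in gpu_use.values()})
print("connectors per switching node:", {len(c) for c in switch_use.values()})
```

Under these assumptions every graphics processor node uses N connectors and every switching node uses M connectors, which matches the connector counts stated for each node type.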
In one possible embodiment, the first cabinet further comprises a second power supply copper bar, and the second cabinet further comprises a first power supply copper bar.
Next, the power supply copper bar will be explained with reference to fig. 5 and 6.
Fig. 5 is a schematic structural diagram of a power supply copper bar according to an embodiment of the present application. Fig. 6 is a schematic diagram of a server structure according to an embodiment of the present application. Referring to fig. 5, fig. 5 includes a first power supply copper bar, a second power supply copper bar, a first power cabinet, and a second power cabinet. Referring to fig. 6, fig. 6 includes a first cabinet and a second cabinet. The first cabinet includes a first power cabinet, a plurality of graphics processor nodes, a second power supply copper bar, and a second power cabinet. The second cabinet includes a plurality of switching nodes and a first power supply copper bar.
The first power supply copper bar is used for supplying power to the plurality of graphics processor nodes, and the second power supply copper bar is used for supplying power to the plurality of switching nodes;
The first power supply copper bar is positioned in the middle of the plurality of switching nodes and close to the first side of the first cabinet, and the second power supply copper bar is positioned in the middle of the plurality of graphics processor nodes and close to the second side of the second cabinet;
The two ends of the first power supply copper bar are respectively connected with the first power cabinet and the second power cabinet, and the first power supply copper bar is connected with the second power supply copper bar.
The first power supply copper bar can be connected with the second power supply copper bar through the connector.
In this way, the first power supply copper bar is dedicated to supplying power to the plurality of graphics processor nodes, and the second power supply copper bar is responsible for supplying power to the plurality of switching nodes. This clear division of power supply ensures that nodes with different functions obtain stable power, meets working requirements such as the high-power operation of the graphics processor nodes and the data transmission of the switching nodes, and ensures stable operation of the system.
In one possible implementation, the graphics processor node includes a first power copper bar, and the switching node includes a second power copper bar;
The first power supply copper bar in the graphics processor node is connected with the first power supply copper bar of the second cabinet;
The second power supply copper bar in the switching node is connected with the second power supply copper bar of the first cabinet.
In one possible implementation, the first cabinet further includes a plurality of second radiators, and the second cabinet further includes a plurality of first radiators.
Next, the positions of the radiators in the server will be explained with reference to fig. 7.
Fig. 7 is a schematic diagram of another server structure according to an embodiment of the present application. Referring to fig. 7, fig. 7 includes a first cabinet and a second cabinet, the first cabinet includes a first power cabinet, a second radiator, a plurality of graphics processor nodes, a second power copper bar, and a second power cabinet, and the second cabinet includes a first radiator, a plurality of switching nodes, and a first power copper bar.
The first radiators are used for dissipating heat from the plurality of graphics processor nodes, and the second radiators are used for dissipating heat from the plurality of switching nodes;
The plurality of second radiators are respectively positioned between the first power cabinet and the plurality of graphics processor nodes and between the plurality of graphics processor nodes and the second power cabinet;
The plurality of first radiators are respectively positioned on the left side and the right side in the second cabinet.
Next, the structure of the radiators will be explained with reference to fig. 8.
Fig. 8 is a schematic structural diagram of a radiator according to an embodiment of the present application. Referring to fig. 8, fig. 8 includes a plurality of first radiators and a plurality of second radiators.
The plurality of first radiators comprise a first water inlet radiator and a first water outlet radiator, the first water inlet radiator is connected with the cabinet water inlet pipe, the first water outlet radiator is connected with the cabinet water outlet pipe, and the plurality of graphics processor nodes are respectively connected with the first water inlet radiator and the first water outlet radiator.
The plurality of second radiators comprise a second water inlet radiator and a second water outlet radiator, the second water inlet radiator is connected with the first water inlet radiator through a water inlet transfer pipe, the second water outlet radiator is connected with the first water outlet radiator through a water outlet transfer pipe, and the plurality of switching nodes are respectively connected with the second water inlet radiator and the second water outlet radiator.
In this way, the graphics processor nodes are connected with the first water inlet radiator and the first water outlet radiator, and the switching nodes are connected with the second water inlet radiator and the second water outlet radiator, so that the large amount of heat generated by the operation of the nodes can be carried away directly, the nodes work at a suitable temperature, and system performance and stability are maintained. Moreover, because the first radiators and the second radiators are connected with each other through the transfer pipes, the heat dissipation systems of the graphics processor nodes and the switching nodes are integrated, which simplifies the structure and layout of the whole heat dissipation system, facilitates installation, maintenance, and management, and reduces system complexity and maintenance cost.
In one possible implementation, the graphics processor node further includes a first water inlet interface and a first water outlet interface;
the first water inlet interface is connected with the first water inlet radiator, and the first water outlet interface is connected with the first water outlet radiator.
In one possible embodiment, the switching node further comprises a second water inlet interface and a second water outlet interface;
The second water inlet interface is connected with the second water inlet radiator, and the second water outlet interface is connected with the second water outlet radiator.
In one possible implementation, a graphics processor node includes a plurality of graphics processors, a plurality of central processing units, a plurality of memories, a plurality of hard disks, a plurality of network cards, an integrated circuit chip, and a plurality of connectors.
The plurality of graphics processors are connected with the plurality of connectors through the integrated circuit chip, the graphics processors are connected with the central processing units, the central processing units are connected with the memories, and the central processing units are respectively connected with the hard disks and the network cards.
In one possible implementation, the graphics processor node further includes a first power supply copper bar, a first water inlet interface, and a first water outlet interface;
The first power supply copper bar is located in the middle position of the connectors, and the first water inlet interface and the first water outlet interface are respectively located on two sides of the connectors.
Next, the graphic processor node will be explained with reference to fig. 9.
Fig. 9 is a schematic structural diagram of a graphics processor node according to an embodiment of the present application. Referring to fig. 9, fig. 9 includes a graphics processor node.
The graphics processor node may include a plurality of graphics processors, a plurality of central processing units, a plurality of memories, a plurality of hard disks, a plurality of network cards, an integrated circuit chip, a plurality of connectors, a first power supply copper bar, a first water inlet interface, and a first water outlet interface.
The first water inlet interface and the first water outlet interface may be connected to the coolant pipes of the radiators. The circulating coolant carries away the heat generated while the node operates, so that each component works within a suitable temperature range.
An integrated circuit chip may be used to process and control signals from various components within the node.
In one possible implementation, the switching node includes a plurality of switching chips, a plurality of cable connectors, a plurality of optical modules, and a plurality of connectors;
The plurality of switching chips are respectively connected with the plurality of cable connectors and the plurality of connectors, the plurality of cable connectors are respectively connected with the plurality of connectors, and the plurality of switching chips are connected with the plurality of optical modules.
The switching node further comprises a second power supply copper bar, a second water inlet interface, and a second water outlet interface;
The second power supply copper bar is positioned in the middle of the plurality of connectors, the plurality of connectors are divided into a plurality of first connectors and a plurality of second connectors, and the second water inlet interface and the second water outlet interface are respectively positioned on two sides of the plurality of connectors;
The plurality of cable connectors are divided into a plurality of first cable connectors and a plurality of second cable connectors, the plurality of first cable connectors are connected with the plurality of second connectors through cables, and the plurality of second cable connectors are connected with the plurality of first connectors through cables;
The plurality of switching chips comprise a first switching chip and a second switching chip, the first switching chip is connected with the plurality of first cable connectors through a printed circuit board, the first switching chip is also connected with the plurality of first connectors through the printed circuit board, the second switching chip is connected with the plurality of second cable connectors through the printed circuit board, and the second switching chip is also connected with the plurality of second connectors through the printed circuit board.
In this way, the number of cables required inside the switching node is reduced and the signal integrity is improved by means of the printed circuit board connection.
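The internal wiring described above can be sketched as a small graph in which each edge is labelled with its medium (printed circuit board or cable). The Python snippet below is one reading of the text, not a definitive model: the component names are hypothetical labels for the groups named above, and every link is assumed to be bidirectional. It searches for the shortest route from each switching chip to each connector group and reports the media used along it.

```python
from collections import deque

# Links stated in the text: (component, component, medium).
EDGES = [
    ("first_switching_chip", "first_cable_connectors", "pcb"),
    ("first_switching_chip", "first_connectors", "pcb"),
    ("second_switching_chip", "second_cable_connectors", "pcb"),
    ("second_switching_chip", "second_connectors", "pcb"),
    ("first_cable_connectors", "second_connectors", "cable"),
    ("second_cable_connectors", "first_connectors", "cable"),
]


def neighbours(node):
    """Yield (adjacent component, medium) pairs, treating links as bidirectional."""
    for a, b, medium in EDGES:
        if a == node:
            yield b, medium
        elif b == node:
            yield a, medium


def path_media(start, goal):
    """Breadth-first search; return the media along a shortest path, or None."""
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        node, media = queue.popleft()
        if node == goal:
            return media
        for nxt, medium in neighbours(node):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, media + [medium]))
    return None


for chip in ("first_switching_chip", "second_switching_chip"):
    for group in ("first_connectors", "second_connectors"):
        print(chip, "->", group, path_media(chip, group))
```

In this reading, each switching chip reaches its own connector group over the printed circuit board alone and reaches the other group through one printed-circuit-board hop followed by one cable hop, so cables are only needed for the cross connections.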
Next, the switching node will be explained with reference to fig. 10.
Fig. 10 is a schematic structural diagram of a switching node according to an embodiment of the present application. Referring to fig. 10, fig. 10 includes a switching node.
The switching node may include a plurality of switching chips, a plurality of cable connectors, a plurality of optical modules, a plurality of connectors, a second power copper bar, a second water inlet interface, and a second water outlet interface.
In one possible embodiment, the plurality of switching chips includes a first switching chip including a first heat dissipation pipe and a second switching chip including a second heat dissipation pipe.
Next, the heat dissipation pipes will be explained with reference to fig. 11.
Fig. 11 is a schematic diagram of the heat dissipation pipes of a switching node according to an embodiment of the present application. Referring to fig. 11, fig. 11 includes a switching node, and the switching node includes heat dissipation pipes.
The second water inlet interface is connected with the heat dissipation pipe of the second switching chip, the heat dissipation pipe of the second switching chip is connected with the heat dissipation pipes of the plurality of optical modules, the heat dissipation pipes of the plurality of optical modules are connected with the heat dissipation pipe of the first switching chip, and the heat dissipation pipe of the first switching chip is connected with the second water outlet interface.
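Because the heat dissipation pipes are connected in series, the coolant warms up stage by stage as it flows from the second water inlet interface to the second water outlet interface. The Python sketch below illustrates this with the standard steady-state relation, temperature rise = heat load / (mass flow × specific heat). Only the flow order comes from the text; the heat loads, inlet temperature, and flow rate are hypothetical values chosen for illustration, and the specific heat of water is approximate.

```python
WATER_SPECIFIC_HEAT = 4186.0  # J/(kg*K), approximate value for liquid water

# Flow order taken from the text; the heat loads (watts) are hypothetical.
COOLANT_PATH = [
    ("second_switching_chip", 500.0),
    ("optical_modules", 200.0),
    ("first_switching_chip", 500.0),
]


def coolant_temperatures(inlet_temp_c, mass_flow_kg_s, path=COOLANT_PATH):
    """Return (stage, coolant temperature leaving that stage) for a series loop."""
    if mass_flow_kg_s <= 0:
        raise ValueError("mass flow must be positive")
    temperatures = []
    temp = inlet_temp_c
    for stage, heat_w in path:
        # Steady state: all heat from the stage is absorbed by the coolant.
        temp += heat_w / (mass_flow_kg_s * WATER_SPECIFIC_HEAT)
        temperatures.append((stage, temp))
    return temperatures


# Hypothetical operating point: 30 C inlet, 0.05 kg/s of water.
profile = coolant_temperatures(inlet_temp_c=30.0, mass_flow_kg_s=0.05)
for stage, temp in profile:
    print(f"{stage}: coolant leaves at {temp:.2f} C")
```

The sketch only shows that components later in the series loop receive warmer coolant than earlier ones; it does not model the actual thermal design of the embodiment.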
In the present application, "at least one" means one or more, and "a plurality" means two or more. "And/or" describes an association between associated objects and indicates that three relationships are possible. For example, "A and/or B" may indicate A alone, both A and B, or B alone, where A and B may be singular or plural. The character "/" generally indicates an "or" relationship between the objects before and after it; in a formula, the character "/" indicates a "division" relationship between the objects before and after it. "At least one of the following" or a similar expression means any combination of the listed items, including any combination of single items or plural items. For example, "at least one of a, b, or c" may represent a, b, c, a-b, a-c, b-c, or a-b-c, where a, b, and c may each be single or plural.
It will be appreciated that the various numerals referred to in the embodiments of the present application are merely for ease of description and are not intended to limit the scope of the embodiments of the present application. In the embodiments of the present application, the sequence number of each process does not indicate its order of execution. The order of execution of each process should be determined by its function and internal logic, and should not limit the implementation of the embodiments of the present application in any way.
It should be noted that the above embodiments are merely for illustrating the technical solution of the present application and not for limiting the same, and although the present application has been described in detail with reference to the above embodiments, it should be understood by those skilled in the art that the technical solution described in the above embodiments may be modified or some or all of the technical features may be equivalently replaced, and these modifications or substitutions do not make the essence of the corresponding technical solution deviate from the scope of the technical solution of the embodiments of the present application.