
CN115268581A - A high-performance computing power AI edge server system architecture - Google Patents

A high-performance computing power AI edge server system architecture

Info

Publication number
CN115268581A
Authority
CN
China
Prior art keywords
module
computing node
accelerator card
node module
server system
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202210709739.4A
Other languages
Chinese (zh)
Other versions
CN115268581B (en)
Inventor
林增权
吴戈
吕腾
李鸿强
莫良伟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Baode Computer System Co ltd
Original Assignee
Baode Computer System Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Baode Computer System Co ltd filed Critical Baode Computer System Co ltd
Priority to CN202210709739.4A priority Critical patent/CN115268581B/en
Priority claimed from CN202210709739.4A external-priority patent/CN115268581B/en
Publication of CN115268581A publication Critical patent/CN115268581A/en
Application granted granted Critical
Publication of CN115268581B publication Critical patent/CN115268581B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 1/00 Details not covered by groups G06F3/00 - G06F13/00 and G06F21/00
    • G06F 1/16 Constructional details or arrangements
    • G06F 1/18 Packaging or power distribution
    • G06F 1/183 Internal mounting support structures, e.g. for printed circuit boards, internal connecting means
    • G06F 1/185 Mounting of expansion boards
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 1/00 Details not covered by groups G06F3/00 - G06F13/00 and G06F21/00
    • G06F 1/16 Constructional details or arrangements
    • G06F 1/18 Packaging or power distribution
    • G06F 1/183 Internal mounting support structures, e.g. for printed circuit boards, internal connecting means
    • G06F 1/186 Securing of expansion boards in correspondence to slots provided at the computer enclosure
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 13/00 Interconnection of, or transfer of information or other signals between, memories, input/output devices or central processing units
    • G06F 13/38 Information transfer, e.g. on bus
    • G06F 13/40 Bus structure
    • G06F 13/4004 Coupling between buses
    • G06F 13/4022 Coupling between buses using switching circuits, e.g. switching matrix, connection or expansion network
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 13/00 Interconnection of, or transfer of information or other signals between, memories, input/output devices or central processing units
    • G06F 13/38 Information transfer, e.g. on bus
    • G06F 13/40 Bus structure
    • G06F 13/4063 Device-to-bus coupling
    • G06F 13/4068 Electrical coupling
    • G06F 13/4081 Live connection to bus, e.g. hot-plugging
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 2213/00 Indexing scheme relating to interconnection of, or transfer of information or other signals between, memories, input/output devices or central processing units
    • G06F 2213/0026 PCI express
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D 10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computer Hardware Design (AREA)
  • General Physics & Mathematics (AREA)
  • Power Engineering (AREA)
  • Human Computer Interaction (AREA)
  • Mathematical Physics (AREA)
  • Multi Processors (AREA)

Abstract

The present application discloses an AI edge server system architecture with high-performance computing power, intended to improve flexibility of use and to meet growing demands for data real-time performance and security. The AI edge server system architecture of this application includes a chassis, a CPU computing node module, an accelerator card computing node module, a routing board, and a power supply module. The CPU computing node module includes PCIe channels. The accelerator card computing node module is provided with a first accelerator-card hot-swap module; the accelerator cards installed in the first hot-swap module connect to the network and are switched and aggregated, through a switch module, onto the network interface of the accelerator card computing node module. The accelerator card computing node module is also provided with a second accelerator-card plug-in module, which is connected to the CPU computing node module by cable; the accelerator cards installed in the second plug-in module pass through the PCIe channels and are aggregated, through a switch module, onto the network interface of the accelerator card computing node module, which in turn connects to the network interface of the routing board.

Figure 202210709739

Description

A High-Performance Computing Power AI Edge Server System Architecture

Technical Field

This application relates to the field of computer technology, and in particular to an AI edge server system architecture with high-performance computing power.

Background

With the rapid growth in the number of IoT terminal devices and increasing demands for data real-time performance and security, edge computing is becoming critical in many industry application scenarios, such as road management and autonomous driving in smart transportation, quality inspection and equipment monitoring in smart manufacturing, and disease monitoring and auxiliary diagnosis in smart healthcare. Edge computing is still at an early stage of development in China. As data grows in volume and complexity, the limited number of accelerator cards in the edge computing servers widely available on the market constrains the computing power of edge computing.

At present, to address the limited number of accelerator cards in edge computing servers, large numbers of accelerator cards are added to increase edge computing power. Expansion is usually carried out via the high-speed serial computer expansion bus standard, PCI Express (peripheral component interconnect express, PCIe).

With the existing accelerator card expansion approach, a complex backplane structure is required to expand the accelerator cards. The PCIe bus rate is very high, so many high-speed connectors and signal reconstruction or signal amplification chips are needed; operation and maintenance are difficult, and it is hard to meet changing requirements.

Summary of the Invention

To solve the above technical problems, this application provides an AI edge server system architecture with high-performance computing power. It addresses the difficult operation and maintenance, the limited number of expandable accelerator cards, and the limited computing power of solutions based on standard servers, while improving flexibility of use and meeting growing demands for data real-time performance and security.

This application provides an AI edge server system architecture with high-performance computing power, including:

a chassis, a CPU computing node module, an accelerator card computing node module, a routing board, and a power supply module;

the power supply module is electrically connected to the CPU computing node module, the accelerator card computing node module, and the routing board, respectively;

the CPU computing node module is installed in the lower layer of the chassis and includes PCIe channels;

the accelerator card computing node module is installed in the upper layer of the chassis and is provided with a first accelerator-card hot-swap module; the accelerator cards installed in the first hot-swap module connect to the network; each accelerator card carries a switch module, through which the first hot-swap module is switched and aggregated onto the network interface of the accelerator card computing node module; the accelerator card computing node module is also provided with a second accelerator-card plug-in module, which is connected to the CPU computing node module by cable; the accelerator cards installed in the second plug-in module pass through the PCIe channels and are aggregated, through a switch module, onto the network interface of the accelerator card computing node module, which is connected to the network interface of the routing board.

Optionally, the CPU computing node module is provided with PCIe slots for installing GPU cards.

Optionally, the CPU computing node module further includes a hard disk module to provide storage.

Optionally, the CPU computing node module is further provided with fans for dissipating its heat.

Optionally, the CPU computing node module is provided with memory for storing data.

Optionally, a midplane is installed on the accelerator card computing node module;

the network interface of the accelerator card computing node module is connected to the midplane through a connector, and the network interfaces on the midplane are connected by network cables to the network interfaces of the routing board.

Optionally, an OCP 3.0 module is provided on the CPU computing node module.

Optionally, the second accelerator-card plug-in module includes 18 PCIe-capable accelerator card modules.

Optionally, the chassis is a 4U double-layer chassis.

Optionally, the chassis further includes a chassis top cover;

the top cover is arranged at the opening of the chassis and seals the chassis.

It can be seen from the above technical solutions that the embodiments of this application have the following advantages:

In this application, the CPU computing node module includes PCIe channels, and the accelerator card computing node module is provided with a first accelerator-card hot-swap module whose accelerator cards connect to the network. Each accelerator card carries a switch module, through which the first hot-swap module is switched and aggregated onto the network interface of the accelerator card computing node module. The accelerator card computing node module is also provided with a second accelerator-card plug-in module, connected to the CPU computing node module by cable; the accelerator cards installed in it pass through the PCIe channels and are aggregated, through a switch module, onto the network interface of the accelerator card computing node module, which connects to the network interface of the routing board. The accelerator cards are switched by the switch modules, connected over the network, and aggregated at the network interfaces of the routing board; through the routing board's conversion, two IP addresses are exposed for terminal connections. Each accelerator card works independently and can provide multiple signal channels, greatly increasing AI computing capability, improving flexibility of use, and simplifying operation and maintenance, thereby meeting growing demands for data real-time performance and security.

Brief Description of the Drawings

To explain the technical solutions in this application more clearly, the drawings needed for the description of the embodiments are briefly introduced below. Obviously, the drawings described below are only some embodiments of this application; those of ordinary skill in the art can obtain other drawings from them without creative effort.

Fig. 1 is a schematic structural diagram of the AI edge server system architecture with high-performance computing power in this application;

Fig. 2 is a schematic top view of the CPU computing node module in this application;

Fig. 3 is a schematic top view of the accelerator card computing node module in this application;

Fig. 4 is a schematic top view of the accelerator card in this application;

Fig. 5 is a schematic diagram of the power supply circuit of the AI edge server system architecture with high-performance computing power in this application.

Detailed Description

In this application, the orientations or positional relationships indicated by terms such as "upper", "lower", "left", "right", "front", "rear", "top", "bottom", "inner", "outer", "middle", "vertical", "horizontal", "transverse", and "longitudinal" are based on the orientations or positional relationships shown in the drawings. They are used only to describe the relative positions of components and do not limit the specific installation orientation of any component.

Moreover, some of the above terms may express meanings other than orientation or position; for example, the term "upper" may in some cases indicate an attachment or connection relationship. Those of ordinary skill in the art can understand the specific meanings of these terms in this application according to the specific situation.

Furthermore, the terms "installed", "arranged", "provided with", "connected", and "coupled" should be interpreted broadly. For example, a connection may be fixed, detachable, or integral; mechanical or electrical; direct, or indirect through an intermediary, or an internal communication between two devices, elements, or components. Those of ordinary skill in the art can understand the specific meanings of these terms in this application according to the specific situation.

In addition, the structures, proportions, and sizes drawn in the accompanying drawings of this application are intended only to accompany the content disclosed in the specification, for the understanding of those skilled in the art, and not to limit the conditions under which this application may be implemented; they therefore have no essential technical significance. Any structural modification, change of proportion, or adjustment of size that does not affect the effects this application can produce or the purposes it can achieve shall still fall within the scope covered by the technical content disclosed herein.

The embodiments of this application provide an AI edge server system architecture with high-performance computing power, which addresses the difficult operation and maintenance, the limited number of expandable accelerator cards, and the limited computing power of solutions based on standard servers, while improving flexibility of use and meeting growing demands for data real-time performance and security.

Edge computing is usually associated with the Internet of Things. IoT devices participate in increasingly powerful processing, so the large volumes of data they generate need to move to the "edge" of the network, where the data does not have to be continuously transferred back and forth between centralized servers for processing. Edge computing is therefore more efficient at managing the large volumes of data from IoT devices, with lower latency, faster processing, and better scalability. Edge computing is still at an early stage of development in China. As data grows in volume and complexity, the limited number of accelerator cards in the edge computing servers widely available on the market constrains edge computing power. Previously, expanding a large number of accelerator cards via PCIe required a complex backplane structure; the PCIe bus rate is very high, reaching 2.5 Gbps to 16 Gbps, and many high-speed connectors and signal reconstruction or signal amplification chips are needed, making operation and maintenance difficult and changing requirements hard to meet. The AI edge server system architecture with high-performance computing power in this application can effectively solve the above problems.

Please refer to Fig. 1, a schematic structural diagram of the AI edge server system architecture with high-performance computing power in this application, which includes:

a chassis, CPU computing node module 1, accelerator card computing node module 2, a routing board, and power supply module 11;

power supply module 11 is electrically connected to CPU computing node module 1, accelerator card computing node module 2, and the routing board, respectively;

CPU computing node module 1 is installed in the lower layer of the chassis and includes PCIe channels;

the accelerator card node module is installed in the upper layer of the chassis. Accelerator card computing node module 2 is provided with first accelerator-card hot-swap module 21; the accelerator cards installed in first hot-swap module 21 connect to the network; each accelerator card carries switch module 24, through which first hot-swap module 21 is switched and aggregated onto the network interface of accelerator card computing node module 2. Accelerator card computing node module 2 is also provided with a second accelerator-card plug-in module, connected to CPU computing node module 1 by cable; the accelerator cards installed in the second plug-in module pass through the PCIe channels and are aggregated, through switch module 24, onto the network interface of accelerator card computing node module 2, which is connected to the network interface of the routing board.

As network devices demand more bandwidth, flexibility, and performance, the PCIe standard emerged. PCIe (peripheral component interconnect express) is a high-speed serial computer expansion bus standard. PCIe uses high-speed serial, point-to-point, dual-simplex, high-bandwidth transmission; each connected device is allocated dedicated channel bandwidth rather than sharing bus bandwidth. It supports active power management, error reporting, reliable end-to-end transmission, hot swapping, and quality of service, with high data transfer rates. A channel is a path or interface that carries an external signal; a PCIe channel is a PCIe signal, with one channel per signal. For example, when measuring force, temperature, or humidity at multiple points, the signals collected by each channel are generally sent in turn to signal-conditioning circuits, then to A/D conversion, and then to the microprocessor. The number of channels is usually a multiple of 8, i.e., 8 channels, 16 channels, and so on.
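To make the "very high bus rate" concrete, the sketch below estimates usable per-lane and per-link PCIe throughput for the 2.5 Gbps to 16 Gbps range mentioned in this description. The signaling rates and line encodings (8b/10b for Gen1/2, 128b/130b for Gen3/4) are general PCIe facts assumed here, not figures taken from the patent.

```python
# Illustrative sketch (assumed PCIe generation parameters, not from the patent):
# usable throughput after line-encoding overhead.
LANE_RATES = {
    1: (2.5, 8 / 10),     # Gen1: 2.5 GT/s, 8b/10b encoding
    2: (5.0, 8 / 10),     # Gen2: 5.0 GT/s, 8b/10b encoding
    3: (8.0, 128 / 130),  # Gen3: 8.0 GT/s, 128b/130b encoding
    4: (16.0, 128 / 130), # Gen4: 16.0 GT/s, 128b/130b encoding
}

def lane_throughput_gbps(gen: int) -> float:
    """Usable data rate of a single lane, in Gbit/s."""
    raw_gt_s, payload_ratio = LANE_RATES[gen]
    return raw_gt_s * payload_ratio

def link_throughput_gbytes(gen: int, lanes: int) -> float:
    """Usable data rate of an xN link, in GB/s (8 bits per byte)."""
    return lane_throughput_gbps(gen) * lanes / 8

print(round(link_throughput_gbytes(4, 16), 2))  # a Gen4 x16 link
```

This shows why an expansion backplane carrying many such links needs high-speed connectors and signal-conditioning chips, as the background section notes.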

Referring further to Fig. 2, Fig. 3, and Fig. 4: power supply module 11 is connected to the motherboard connector in a hot-swappable manner to power the motherboard. Hot swapping means live insertion and removal, i.e., modules and boards can be inserted into or removed from the system without shutting off system power and without affecting normal operation, improving the system's reliability, serviceability, redundancy, and ability to recover promptly from failures. The motherboard is provided with PCIe slots 13 for installing GPU cards; typically three PCIe 5.0 slots are provided, expandable with three A100 GPU cards, which optimizes the utilization of computing resources. CPU computing node module 1 also includes hard disk module 12, consisting of twelve 3.5-inch drive modules; the backplane of hard disk module 12 is connected via PCIe channels to SlimSAS interface cables on the motherboard, providing storage and improving the computer's performance and flexibility of use. The CPU computing node module is also provided with fans 15 to dissipate heat and reduce the degradation of CPU performance at high temperature, and with memory 14, namely 32 DDR5 DIMMs, for storing data. CPU computing node module 1 further includes a motherboard, an OCP 3.0 module, two Sapphire Rapids series processors, a routing board, and a set of 1+1 hot-swappable power supply modules 11. In a 1+1 configuration each unit carries 50% of the load; if one power supply module 11 fails, the other takes over the entire load, preventing an outage caused by the failure of a single unit. Accelerator card computing node module 2 includes two front 25-card first accelerator-card hot-swap modules and two rear 16-card first accelerator-card hot-swap modules, plus 18 second accelerator cards routed over PCIe channels, for a total of 100 accelerator cards. Switch modules 24 are arranged on the accelerator cards, and the node also contains a set of 1+1 hot-swappable power supply modules 11 and cooling fans. Optionally, accelerator card computing node module 2 is also fitted with midplane 23; the network interface of accelerator card computing node module 2 is connected to midplane 23 through a connector, and the network interfaces on midplane 23 are connected by network cables to the network interfaces of the routing board.
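As a quick sanity check on the card counts described above, the module sizes stated in the embodiment can be tallied against the claimed total of 100 accelerator cards (a minimal sketch, not part of the patent):

```python
# Tally of accelerator cards per the embodiment's stated module sizes.
front_hot_swap = [25, 25]  # two front first-accelerator-card hot-swap modules
rear_hot_swap = [16, 16]   # two rear first-accelerator-card hot-swap modules
pcie_plug_in = 18          # second accelerator-card modules routed over PCIe

total_cards = sum(front_hot_swap) + sum(rear_hot_swap) + pcie_plug_in
print(total_cards)
```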

Referring further to Fig. 5, the power supply scheme may be as follows: the 16-card first accelerator-card hot-swap modules and the 25-card first accelerator-card hot-swap modules are connected directly through the network. The 18 second accelerator-card hot-swap modules use the PCIe channels on the motherboard of the CPU computing node module, each channel with a different clock number. The 18 second accelerator-card hot-swap modules of accelerator card computing node module 2 are powered by the supply of CPU computing node module 1: the 2000 W 1+1 redundant hot-swappable power supply of the CPU computing node module powers the motherboard and, through cables, the 18 second accelerator-card hot-swap modules and the fans of the CPU computing node module. The 16-card and 25-card first accelerator-card hot-swap modules of accelerator card computing node module 2 are both powered directly by the 2000 W 1+1 hot-swappable power supply above.
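The 1+1 redundancy described above can be sketched as a small load-sharing calculation. The 2000 W rating and the 50/50 sharing come from the description; the 1600 W example load is an arbitrary assumption for illustration:

```python
# Hedged sketch of 1+1 redundant PSU load sharing (example load assumed).
def psu_load_per_unit(total_load_w: float, healthy_units: int) -> float:
    """Watts each healthy power supply unit must deliver."""
    if healthy_units < 1:
        raise RuntimeError("no healthy power supply units")
    return total_load_w / healthy_units

RATING_W = 2000.0          # per-unit rating stated in the description
example_load_w = 1600.0    # assumed system load for illustration

normal = psu_load_per_unit(example_load_w, 2)    # both units share 50% each
failover = psu_load_per_unit(example_load_w, 1)  # survivor carries everything
print(normal, failover)
```

The failover value must stay at or below the per-unit rating for the "single failure causes no outage" property to hold.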

The 25-card and 16-card first accelerator-card hot-swap modules of the accelerator card node are directly network-connected. One switch module can switch at most 9 accelerator cards. The two groups of 25-card first hot-swap modules and the two groups of 16-card first hot-swap modules are switched and aggregated, through 10 switch modules placed at the back of the accelerator cards, onto 10 network interfaces, which connect through connectors to the first midplane; the 10 network interfaces on the first midplane are connected by network cables to 10 network interfaces on the routing board. The 18 second accelerator-card plug-in modules of the upper accelerator card node connect via MCIO interfaces to the MCIO interfaces of the motherboard on the lower CPU computing node, carrying PCIe signals, and are then switched and aggregated by 2 switch modules onto 2 network interfaces, which connect through connectors to the second midplane; the 2 network interfaces on the second midplane are connected by network cables to 2 network interfaces on the routing board. In the end, all accelerator cards are aggregated onto the routing board through a total of 12 network interfaces, and the routing board exposes 2 IP addresses for terminal connections. Each accelerator card works independently, and the system can provide up to more than 2,000 signal channels, greatly increasing AI computing capability, improving flexibility of use, and simplifying operation and maintenance, meeting growing demands for data real-time performance and security.
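The aggregation arithmetic in this paragraph can be checked with a short sketch, using the stated constraint that one switch module serves at most 9 accelerator cards:

```python
import math

# Checking the switch/interface counts stated in the embodiment.
CARDS_PER_SWITCH = 9  # one switch module handles at most 9 accelerator cards

network_attached = 2 * 25 + 2 * 16  # first accelerator-card hot-swap modules
switches_first = math.ceil(network_attached / CARDS_PER_SWITCH)

pcie_attached = 18  # second accelerator-card plug-in modules over PCIe
switches_second = math.ceil(pcie_attached / CARDS_PER_SWITCH)

total_interfaces = switches_first + switches_second  # one interface per switch
print(switches_first, switches_second, total_interfaces)
```

This reproduces the text's figures: 10 switch modules (and interfaces) for the 82 network-attached cards, 2 for the 18 PCIe-attached cards, and 12 network interfaces reaching the routing board in total.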

Optionally, the chassis is a 4U double-layer chassis. In the server field, "U" specifically denotes the thickness of a rack-mounted server; it is a unit expressing the server's external dimensions, an abbreviation of "unit", with the detailed dimensions defined by the Electronic Industries Alliance, a US industry body. Thickness is expressed in centimeters: 1U is 4.45 cm, and 4U is four times that, 17.8 cm. In this embodiment, the size of the chassis can be adjusted according to the equipment actually to be housed.
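The rack-unit arithmetic above can be expressed as a one-line conversion (a trivial sketch using the 1U = 4.45 cm figure from the description):

```python
# Rack-unit height conversion, per the description's 1U = 4.45 cm.
U_CM = 4.45

def chassis_height_cm(units: int) -> float:
    """Height of an N-U chassis in centimeters."""
    return units * U_CM

print(chassis_height_cm(4))  # the 4U chassis of this embodiment
```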

可选地，机箱还包括机箱上盖；Optionally, the chassis further includes a chassis top cover;

机箱上盖设置在机箱开口处，用于对机箱进行封口。减少灰尘进入机箱中，造成机箱中的部件损坏的情况。The chassis top cover is arranged at the opening of the chassis and seals the chassis, reducing the amount of dust that enters the chassis and damages the components inside.

需要声明的是，上述发明内容及具体实施方式意在证明本申请所提供技术方案的实际应用，不应解释为对本申请保护范围的限定。本领域技术人员在本申请的精神和原理内，当可作各种修改、等同替换或改进。本申请的保护范围以所附权利要求书为准。It should be noted that the above summary of the invention and specific embodiments are intended to illustrate the practical application of the technical solutions provided by this application, and should not be construed as limiting the scope of protection of this application. Those skilled in the art may make various modifications, equivalent replacements or improvements within the spirit and principles of this application. The scope of protection of this application is subject to the appended claims.

Claims (10)

1.一种高性能计算力的AI边缘服务器系统架构，其特征在于，包括：机箱、CPU计算节点模组、加速卡计算节点模组、路由板以及电源模组；所述电源模组分别与所述CPU计算节点模组、加速卡计算节点模组以及路由板电性连接；所述CPU计算节点模组安装在所述机箱下层，所述CPU计算节点模组包括PCIe通道；所述加速卡计算节点模组安装在所述机箱上层，所述加速卡计算节点模组上设置有第一加速卡热插拔模组，所述第一加速卡热插拔模组上安插的加速卡连接网络，所述加速卡上设置有交换模块，所述第一加速卡热插拔模组通过所述交换模块交换汇聚于所述加速卡计算节点模组的网络接口，所述加速卡计算节点模组上还设置有第二加速卡插拔模组，所述第二加速卡插拔模组通过线缆连接在所述CPU计算节点模组上，所述第二加速卡插拔模组上安插的加速卡通过所述PCIe通道并通过交换模块汇聚于所述加速卡计算节点模组的网络接口，所述加速卡计算节点模组的网络接口与所述路由板的网络接口连接。1. An AI edge server system architecture with high-performance computing power, characterized in that it comprises: a chassis, a CPU computing node module, an accelerator card computing node module, a routing board and a power supply module; the power supply module is electrically connected to the CPU computing node module, the accelerator card computing node module and the routing board respectively; the CPU computing node module is installed on the lower layer of the chassis and includes a PCIe channel; the accelerator card computing node module is installed on the upper layer of the chassis and is provided with a first accelerator card hot-swap module; the accelerator cards inserted in the first accelerator card hot-swap module connect to the network; a switch module is provided on the accelerator card; the first accelerator card hot-swap module is switched and aggregated through the switch module onto the network interface of the accelerator card computing node module; the accelerator card computing node module is further provided with a second accelerator card plug-in module, which is connected to the CPU computing node module through cables; the accelerator cards inserted in the second accelerator card plug-in module are aggregated through the PCIe channel and a switch module onto the network interface of the accelerator card computing node module; and the network interface of the accelerator card computing node module is connected to the network interface of the routing board.

2.根据权利要求1所述的AI边缘服务器系统架构，其特征在于，所述CPU计算节点模组上设置有PCIe插槽，所述PCIe插槽用于安插GPU卡。2. The AI edge server system architecture according to claim 1, wherein the CPU computing node module is provided with a PCIe slot for inserting a GPU card.

3.根据权利要求1所述的AI边缘服务器系统架构，其特征在于，所述CPU计算节点模组上还包括硬盘模组，用于提供存储功能。3. The AI edge server system architecture according to claim 1, wherein the CPU computing node module further includes a hard disk module for providing storage.

4.根据权利要求1所述的AI边缘服务器系统架构，其特征在于，所述CPU计算节点模块上还设置有风扇，用于给所述CPU计算节点模块散热。4. The AI edge server system architecture according to claim 1, wherein a fan is further provided on the CPU computing node module to dissipate its heat.

5.根据权利要求1所述的AI边缘服务器系统架构，其特征在于，所述CPU计算节点模块上设置有内存，用于存储数据。5. The AI edge server system architecture according to claim 1, wherein the CPU computing node module is provided with memory for storing data.

6.根据权利要求1所述的AI边缘服务器系统架构，其特征在于，所述加速卡计算节点模组上安装有中板；所述加速卡计算节点模组的网络接口通过连接器与所述中板连接，所述中板上的网络接口与所述路由板的网络接口的网线相连。6. The AI edge server system architecture according to claim 1, wherein a midplane is installed on the accelerator card computing node module; the network interface of the accelerator card computing node module is connected to the midplane through a connector, and the network interface on the midplane is connected by network cable to the network interface of the routing board.

7.根据权利要求1所述的AI边缘服务器系统架构，其特征在于，所述CPU计算节点模块上设置OCP3.0模块。7. The AI edge server system architecture according to claim 1, wherein an OCP 3.0 module is provided on the CPU computing node module.

8.根据权利要求1所述的AI边缘服务器系统架构，其特征在于，所述第二加速卡插拔模组包括18张支持PCIe的加速卡模组。8. The AI edge server system architecture according to claim 1, wherein the second accelerator card plug-in module includes 18 PCIe-capable accelerator card modules.

9.根据权利要求1至8中任一项所述的AI边缘服务器系统架构，其特征在于，所述机箱为4U双层机箱。9. The AI edge server system architecture according to any one of claims 1 to 8, wherein the chassis is a 4U double-layer chassis.

10.根据权利要求9所述的AI边缘服务器系统架构，其特征在于，所述机箱还包括机箱上盖；所述机箱上盖设置在所述机箱开口处，用于对所述机箱进行封口。10. The AI edge server system architecture according to claim 9, wherein the chassis further comprises a chassis top cover; the top cover is arranged at the opening of the chassis and seals the chassis.
CN202210709739.4A 2022-06-22 A high-performance computing AI edge server system architecture Active CN115268581B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210709739.4A CN115268581B (en) 2022-06-22 A high-performance computing AI edge server system architecture

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210709739.4A CN115268581B (en) 2022-06-22 A high-performance computing AI edge server system architecture

Publications (2)

Publication Number Publication Date
CN115268581A true CN115268581A (en) 2022-11-01
CN115268581B CN115268581B (en) 2025-04-15


Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117931722A (en) * 2024-03-20 2024-04-26 苏州元脑智能科技有限公司 Computing device and server system
CN117931722B (en) * 2024-03-20 2024-06-07 苏州元脑智能科技有限公司 Computing device and server system
CN119167409A (en) * 2024-11-25 2024-12-20 浙江大华技术股份有限公司 Distributed computing server and distributed computing system

Similar Documents

Publication Publication Date Title
JP3157935U (en) server
CN113448402B (en) A server supporting multi-backplane cascading
CN108090014A (en) The storage IO casees system and its design method of a kind of compatible NVMe
CN104460927A (en) 4U high-density storage system power supply equipment and method
CN108874711B (en) A Hard Disk Backplane System with Optimized Heat Dissipation
CN106919233A (en) A kind of high density storage server architecture system
CN207704358U (en) A kind of production domesticization server
CN106919533B (en) 4U high-density storage type server
CN111427833A (en) Server cluster
CN210428236U (en) High-density eight-path server
US12072827B2 (en) Scaling midplane bandwidth between storage processors via network devices
CN115268581A (en) A high-performance computing power AI edge server system architecture
CN106528463A (en) Four-subnode star server system capable of realizing hard disk sharing
CN218768130U (en) Hard disk backboard supporting CXL (CXL) signals and PCIe (peripheral component interface express) signals and storage device
CN114340248B (en) A storage server and its independent machine head control system
CN216352292U (en) Server mainboard and server
CN206686217U (en) A kind of multiserver network share framework
CN217847021U (en) AI edge server system architecture with high performance computing power
WO2024045752A1 (en) Server and electronic device
CN115268581B (en) A high-performance computing AI edge server system architecture
CN209103234U (en) A host that supports multiple GPUs in a 2U chassis
CN100541387C (en) A kind of server system based on the Opteron processor
CN111737174A (en) A hard disk backplane compatible with Tri-mode RAID function and its design method
CN218630661U (en) 4U server supporting 8GPU modules
CN112260969B (en) Blade type edge computing equipment based on CPCI framework

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant