CN113312304B - Interconnection device, motherboard and server - Google Patents
- Publication number
- CN113312304B (application number CN202110628626.7A)
- Authority
- CN
- China
- Prior art keywords
- interface
- data
- chip
- module
- interface unit
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F15/00—Digital computers in general; Data processing equipment in general
- G06F15/76—Architectures of general purpose stored program computers
- G06F15/78—Architectures of general purpose stored program computers comprising a single central processing unit
- G06F15/7807—System on chip, i.e. computer system on a single chip; System in package, i.e. computer system on one or more chips in a single package
- G06F15/781—On-chip cache; Off-chip memory
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F13/00—Interconnection of, or transfer of information or other signals between, memories, input/output devices or central processing units
- G06F13/38—Information transfer, e.g. on bus
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F13/00—Interconnection of, or transfer of information or other signals between, memories, input/output devices or central processing units
- G06F13/38—Information transfer, e.g. on bus
- G06F13/40—Bus structure
- G06F13/4004—Coupling between buses
- G06F13/4027—Coupling between buses using bus bridges
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- General Engineering & Computer Science (AREA)
- Computer Hardware Design (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Computing Systems (AREA)
- Microelectronics & Electronic Packaging (AREA)
- Multi Processors (AREA)
- Bus Control (AREA)
Abstract
The invention provides an interconnection device, a motherboard and a server. The interconnection device includes at least two groups of interfaces and a control module. Each group of interfaces contains at least one interface unit; the interface units of each group correspond one-to-one with the chiplets of a chip module, and each interface unit is connected to its corresponding chiplet. The control module establishes a link between any two interface units from different groups of interfaces, so that the two chiplets corresponding to those interface units can transfer data. When interconnection is implemented at the board level, each chiplet only needs to be connected by a bus in the circuit board to its interface unit on the interconnection device; no board-level traces are needed between each chiplet and the other chiplets. This reduces the number of pins brought out of each socket and makes routing inside the circuit board easier. The scheme is less constrained by the protocols used inside the chip modules, so it extends readily to interconnecting chip modules of different types. It can substantially reduce latency, simplify programming and preserve multi-socket performance.
Description
Technical field
The invention relates to the field of computer technology, and in particular to an interconnection device, a motherboard and a server.
Background
With the rapid development of chip manufacturing processes, chip line widths are now below 10 nanometers (nm). This makes it possible to integrate on the order of 10 billion transistors on a single chip and to place many types of modules on it, such as compute units, caches, control logic, high-speed I/O and distributed clocks. This poses serious challenges for chip design and verification. Today, testing and verifying a chip often takes longer than designing it. High chip complexity, combined with uncontrollable factors in the manufacturing process, means that a defect in a single module can make the whole chip defective or force it into a lower grade.
To keep the test flow simple and fast and to maintain good chip yield, chiplet technology, which partitions a chip into functional modules, has emerged (a chiplet is sometimes also called a die). Chiplet technology can alleviate the problems above to some extent. However, it unavoidably requires splitting some interconnect buses, so links to components on other chiplets have much worse latency and bandwidth than an internal bus (IB). For example, memory access latency between CPUs (central processing units) in a multi-socket x86 system is more than twice the internal latency, and between some complex multi-socket CPUs it can exceed three times the internal latency. The increased latency between chiplets generally causes latency-sensitive applications, such as databases and big-data computing, to lose more performance in a multi-socket interconnection. In other words, even when the number of sockets increases, system performance does not increase noticeably and may even decrease. The main reason is that such applications are not only compute-intensive but also need a lot of memory, and their data is strongly interdependent. In extreme cases, an application has to use only one socket instead of all the sockets in the system, because otherwise system performance drops.
At present, when multi-chiplet (multi-die) chips are interconnected, each chiplet generally needs to be fully interconnected with the chiplets in the other socket in order to guarantee latency and bandwidth. However, this multiplies the number of pins brought out of each socket, as in the case shown in FIG. 1 of two sockets with eight chiplets in total. At the board level this makes routing very difficult and greatly increases the cost of the circuit board. Alternatively, one chiplet can be connected to a chiplet (die) in another socket, and that chiplet then serves as a bridge to the other chiplets in the same socket; that is, the remote chiplet is used as a bridge. This approach, however, greatly increases latency.
Summary of the invention
The invention provides an interconnection device, a motherboard and a server, in order to reduce the difficulty of routing in the circuit board, improve system scalability, reduce latency, simplify programming and preserve multi-socket performance.
In a first aspect, the invention provides an interconnection device that includes a control module and at least two groups of interfaces in one-to-one correspondence with at least two sockets. One chip module is plugged into each socket; each chip module contains at least one chiplet, and at least one chip module is a multi-chip module containing at least two chiplets. Each group of interfaces contains at least one interface unit; the interface units of each group correspond one-to-one with the chiplets of the chip module for that group, and each interface unit is connected to its corresponding chiplet. The control module establishes a link between any two interface units from different groups of interfaces, so that the two chiplets corresponding to those interface units can transfer data.
In this scheme, the interconnection device connects the chiplets of chip modules plugged into different sockets, and its control module establishes links between chiplets in different chip modules so that the two chiplets can transfer data. When interconnection is implemented at the board level, each chiplet therefore only needs to be connected by a bus in the circuit board to its interface unit on the interconnection device; no board-level traces are needed between each chiplet and the other chiplets. This reduces the number of pins brought out of each socket and makes routing in the circuit board easier, which lowers the manufacturing cost of the circuit board. Because the control module sets up links between interface units inside the interconnection device, data transfer is less constrained by the protocols used inside the chip modules. The interconnection is therefore largely protocol-insensitive and extends readily to chip modules of different types. In addition, two chiplets from different chip modules can transfer data through the interconnection device at full bandwidth. Compared with the prior-art approach of using a remote chiplet as a bridge, the scheme can substantially reduce latency, simplify programming and preserve multi-socket performance.
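The reduction in board-level links can be illustrated with a small count. The sketch below is not taken from the patent: the function names are hypothetical, and the split of the eight chiplets in FIG. 1 into four per socket is an assumption made only for the example.

```python
def full_mesh_links(chiplets_per_socket):
    """Board-level links when every chiplet is wired directly to every
    chiplet in every other socket (the prior-art full interconnection)."""
    total = 0
    n = len(chiplets_per_socket)
    for i in range(n):
        for j in range(i + 1, n):
            total += chiplets_per_socket[i] * chiplets_per_socket[j]
    return total


def hub_links(chiplets_per_socket):
    """Board-level links when every chiplet is wired only to its own
    interface unit on the interconnection device."""
    return sum(chiplets_per_socket)


def links_per_chiplet_full_mesh(chiplets_per_socket, socket_index):
    """Off-socket links a single chiplet needs in the full-mesh case."""
    return sum(c for k, c in enumerate(chiplets_per_socket) if k != socket_index)


# Assumed layout: two sockets with four chiplets each (eight in total).
sockets = [4, 4]
print(full_mesh_links(sockets))                 # 16 direct chiplet-to-chiplet links
print(hub_links(sockets))                       # 8 chiplet-to-device links
print(links_per_chiplet_full_mesh(sockets, 0))  # 4 links per chiplet, versus 1 via the device
```

The gap widens as sockets are added, because the full-mesh count grows with the product of the chiplet counts while the device-based count grows with their sum.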
In a specific embodiment, each interface unit includes an address and control interface and a data interface. The address and control interface is connected to the address and control interface of the chiplet corresponding to that interface unit, and the data interface is connected to that chiplet's data interface. In other words, each interface unit is split into an address and control interface and a data interface, separating the address/control plane from the data plane. This makes the device insensitive to the transfer protocol and makes it easy for various high-speed modules that use the same protocol to form a high-performance system together. When two chiplets in different chip modules transfer data, the sending chiplet first sends the address information of the receiving chiplet to the interconnection device, which decodes it and configures the link while the chiplet prepares its data. Once the data is ready, it is sent immediately over the link that the interconnection device has already configured. This reduces the delay caused by link configuration and so reduces the latency of the whole system.
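The latency benefit of sending the address ahead of the data can be shown with a simple timing model. This is an illustrative sketch only: the two functions and the time values are hypothetical, and the model assumes link configuration and data preparation can proceed fully in parallel.

```python
def serialized_latency(t_config, t_prepare, t_transfer):
    """Address and data travel together: the link is configured only
    after the data has been prepared, then the data is transferred."""
    return t_prepare + t_config + t_transfer


def overlapped_latency(t_config, t_prepare, t_transfer):
    """Address/control plane is separate: the link is configured while
    the sender prepares its data, so only the slower of the two counts."""
    return max(t_config, t_prepare) + t_transfer


# Hypothetical durations in arbitrary time units.
t_config, t_prepare, t_transfer = 30, 50, 100
print(serialized_latency(t_config, t_prepare, t_transfer))  # 180
print(overlapped_latency(t_config, t_prepare, t_transfer))  # 150
```

Under these assumptions the configuration delay disappears entirely whenever it is shorter than the data preparation time.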
In a specific embodiment, when any two chiplets from different chip modules transfer data, one of them is the sender and the other is the receiver. The control module includes a storage module, a crossbar switch matrix, an address decoding module and a monitoring module. The storage module stores the node address information of the receiver, which the sender transmits through the address and control interface. The crossbar switch matrix is connected to every data interface. The address decoding module uses the node address information to establish a data link relationship between the data interface of the sender and the data interface of the receiver, and stores that relationship in a buffer queue in the storage module. The monitoring module monitors whether the crossbar switch matrix has an idle link that matches a data link relationship in the buffer queue; when a matching idle link exists, it controls the crossbar switch matrix, according to the matching data link relationship, to establish a data transmission link between the sender's data interface and the receiver's data interface. Through the cooperation of the storage module, the address decoding module and the monitoring module, a match between an established data link relationship and an idle link is detected promptly, so the data transmission link is configured quickly.
In a specific embodiment, the storage module is a static random-access memory (SRAM), which increases the read/write speed of the interconnection device when it configures a data transmission link, improving link configuration efficiency and reducing latency.
In a second aspect, the invention also provides a motherboard that includes a circuit board. At least two sockets are provided on the circuit board, and one chip module is plugged into each socket; each chip module contains at least one chiplet, and at least one chip module is a multi-chip module containing at least two chiplets. An interconnection device is also provided on the motherboard. The interconnection device includes a control module and at least two groups of interfaces. Each group of interfaces contains at least one interface unit; the interface units of each group correspond one-to-one with the chiplets of the chip module for that group, and each interface unit is connected to its corresponding chiplet. The control module establishes a link between any two interface units from different groups of interfaces, so that the two chiplets corresponding to those interface units can transfer data.
In this scheme, the interconnection device connects the chiplets of chip modules plugged into different sockets, and its control module establishes links between chiplets in different chip modules so that the two chiplets can transfer data. When interconnection is implemented at the board level, each chiplet therefore only needs to be connected by a bus in the circuit board to its interface unit on the interconnection device; no board-level traces are needed between each chiplet and the other chiplets. This reduces the number of pins brought out of each socket and makes routing in the circuit board easier, which lowers the manufacturing cost of the circuit board and of the motherboard. Because the control module sets up links between interface units inside the interconnection device, data transfer is less constrained by the protocols used inside the chip modules. The interconnection is therefore largely protocol-insensitive and extends readily to chip modules of different types. In addition, two chiplets from different chip modules can transfer data through the interconnection device at full bandwidth. Compared with the prior-art approach of using a remote chiplet as a bridge, the scheme can substantially reduce latency, simplify programming and preserve multi-socket performance.
In a specific embodiment, each chiplet has an address and control interface and a data interface, and each interface unit includes an address and control interface and a data interface. The address and control interface of the interface unit is connected to the address and control interface of the corresponding chiplet, and the data interface of the interface unit is connected to that chiplet's data interface. In other words, each interface unit is split into an address and control interface and a data interface, separating the address/control plane from the data plane. This makes the device insensitive to the transfer protocol and makes it easy for various high-speed modules that use the same protocol to form a high-performance system together. When two chiplets in different chip modules transfer data, the sending chiplet first sends the address information of the receiving chiplet to the interconnection device, which decodes it and configures the link while the chiplet prepares its data. Once the data is ready, it is sent immediately over the link that the interconnection device has already configured. This reduces the delay caused by link configuration and so reduces the latency of the whole system.
In a specific embodiment, when any two chiplets from different chip modules transfer data, one of them is the sender and the other is the receiver. The control module includes a storage module, a crossbar switch matrix, an address decoding module and a monitoring module. The storage module stores the node address information of the receiver, which the sender transmits through the address and control interface. The crossbar switch matrix is connected to every data interface. The address decoding module uses the node address information to establish a data link relationship between the data interface of the sender and the data interface of the receiver, and stores that relationship in a buffer queue in the storage module. The monitoring module monitors whether the crossbar switch matrix has an idle link that matches a data link relationship in the buffer queue; when a matching idle link exists, it controls the crossbar switch matrix, according to the matching data link relationship, to establish a data transmission link between the sender's data interface and the receiver's data interface. Through the cooperation of the storage module, the address decoding module and the monitoring module, a match between an established data link relationship and an idle link is detected promptly, so the data transmission link is configured quickly.
In a specific embodiment, each chiplet has a data bus that is connected to both the address and control interface and the data interface on that chiplet. This makes the interconnection device insensitive to the protocols used inside each chiplet and makes it easy to extend to interconnecting chip modules of different types.
In a specific embodiment, the chip module is a central processing unit and/or a dedicated high-performance chip.
In a specific embodiment, the dedicated high-performance chip is a graphics processing unit (GPU), an artificial intelligence (AI) chip, a field-programmable gate array (FPGA) or an application-specific integrated circuit (ASIC).
In a third aspect, the invention also provides a server that includes any one of the motherboards described above. The interconnection device connects the chiplets of chip modules plugged into different sockets, and its control module establishes links between chiplets in different chip modules so that the two chiplets can transfer data. When interconnection is implemented at the board level, each chiplet therefore only needs to be connected by a bus in the circuit board to its interface unit on the interconnection device; no board-level traces are needed between each chiplet and the other chiplets. This reduces the number of pins brought out of each socket and makes routing in the circuit board easier, which lowers the manufacturing cost of the circuit board and of the motherboard. Because the control module sets up links between interface units inside the interconnection device, data transfer is less constrained by the protocols used inside the chip modules. The interconnection is therefore largely protocol-insensitive and extends readily to chip modules of different types. In addition, two chiplets from different chip modules can transfer data through the interconnection device at full bandwidth. Compared with the prior-art approach of using a remote chiplet as a bridge, the scheme can substantially reduce latency, simplify programming and preserve multi-socket performance. This in turn lowers the cost and the latency of the server.
Brief description of the drawings
FIG. 1 is a schematic diagram of a prior-art full interconnection;
FIG. 2 is a structural block diagram of full interconnection through an interconnection device, provided by an embodiment of the invention;
FIG. 3 is a schematic structural diagram of chip modules of different types interconnected through an interconnection device, provided by an embodiment of the invention;
FIG. 4 is a structural block diagram of data transfer with the address/control plane separated from the data plane, provided by an embodiment of the invention;
FIG. 5 is a schematic diagram of the internal module path for interconnecting two chiplets from different chip modules, provided by an embodiment of the invention.
Reference signs:
10 - interconnection device; 11 - interface unit; 111 - address and control interface; 112 - data interface
12 - control module; 121 - crossbar switch matrix; 122 - buffer queue; 123 - monitoring module
20 - chip module; 21 - chiplet
Detailed description
To make the purpose, technical solutions and advantages of the embodiments of the invention clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the drawings. The described embodiments are only some of the embodiments of the invention, not all of them. All other embodiments obtained by a person of ordinary skill in the art from these embodiments without creative effort fall within the scope of protection of the invention.
To make the interconnection device provided by the embodiments easier to understand, its application scenario is described first. The interconnection device is used on a motherboard carrying at least two chip modules, to interconnect chiplets from different chip modules. The interconnection device is described in detail below with reference to the drawings.
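Referring to FIG. 2, the interconnection device 10 provided by an embodiment of the invention includes at least two groups of interfaces in one-to-one correspondence with at least two sockets. One chip module 20 is plugged into each socket; each chip module 20 contains at least one chiplet 21, and at least one chip module 20 is a multi-chip module containing at least two chiplets 21. Each group of interfaces contains at least one interface unit 11; the interface units 11 of each group correspond one-to-one with the chiplets 21 of the chip module 20 for that group, and each interface unit 11 is connected to its corresponding chiplet 21. That is, the number of sockets equals the number of chip modules 20, with one chip module 20 plugged into each socket. Every chip module 20 contains at least one chiplet 21, and at least one chip module 20 is a multi-chip module with at least two chiplets 21. The number of interface groups on the interconnection device 10 equals the number of chip modules 20, and each chip module 20 has its own group of interfaces. The number of chiplets 21 on a chip module 20 equals the number of interface units 11 in the corresponding group. Consequently, at least one group contains at least two interface units 11, in one-to-one correspondence with the at least two chiplets 21 of a multi-chip module. Every chiplet 21 is connected to one interface unit 11.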
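The correspondence between sockets, chip modules, interface groups and interface units can be captured in a small data structure. The following is a minimal sketch; `InterfaceUnit` and `build_interface_groups` are hypothetical names introduced for illustration, and the checks simply restate the constraints in the paragraph above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InterfaceUnit:
    group: int  # index of the interface group, equal to the socket index
    index: int  # position in the group, equal to the chiplet index in the module


def build_interface_groups(chiplets_per_module):
    """Create one interface group per chip module and one interface unit
    per chiplet, enforcing the constraints stated in the description."""
    if len(chiplets_per_module) < 2:
        raise ValueError("at least two sockets (chip modules) are required")
    if any(count < 1 for count in chiplets_per_module):
        raise ValueError("every chip module needs at least one chiplet")
    if not any(count >= 2 for count in chiplets_per_module):
        raise ValueError("at least one chip module must be a multi-chip module")
    return [
        [InterfaceUnit(group, index) for index in range(count)]
        for group, count in enumerate(chiplets_per_module)
    ]


# Example: one module with four chiplets and one single-chiplet module.
groups = build_interface_groups([4, 1])
print(len(groups), [len(g) for g in groups])  # 2 [4, 1]
```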
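The number of chip modules 20 may be any value not less than 2, for example 2, 3, 4, 5 or 10. The number of sockets equals the number of chip modules 20, and each chip module 20 is plugged into its socket, which connects it to the traces in the circuit board. FIG. 2 shows two chip modules 20 interconnected, and FIG. 3 shows four. As for the type, a chip module 20 may be a central processing unit or a dedicated high-performance chip, or some chip modules 20 may be central processing units while others are dedicated high-performance chips. The dedicated high-performance chip may be a graphics processing unit, an artificial intelligence (AI) chip, a field-programmable gate array or an application-specific integrated circuit. More specifically, FIG. 2 shows two central processing units (CPU0, CPU1) interconnected, and FIG. 3 shows four chip modules 20 interconnected: two central processing units (CPU0, CPU1), one graphics processing unit (GPU0) and one field-programmable gate array (FPGA). The chip module 20 may of course also be another type of accelerator card. The number of chiplets 21 in each chip module 20 may be any value not less than 1, for example 1, 2, 3 or 4; at least one chip module 20 is a multi-chip module, which contains any number of chiplets 21 not less than 2, for example 2, 3 or 4.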
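Referring to FIG. 2, the interconnection device 10 also includes a control module 12, which establishes a link between any two interface units 11 from different groups of interfaces so that the two chiplets 21 corresponding to those interface units 11 can transfer data. The interconnection device 10 connects the chiplets 21 of chip modules 20 plugged into different sockets, and its control module 12 establishes links between chiplets 21 in different chip modules 20 so that the two chiplets 21 can transfer data. When interconnection is implemented at the board level, each chiplet 21 therefore only needs to be connected by a bus in the circuit board to its interface unit 11 on the interconnection device; no board-level traces are needed between each chiplet 21 and the other chiplets 21. This reduces the number of pins brought out of each socket and makes routing in the circuit board easier, which lowers the manufacturing cost of the circuit board. Because the control module 12 sets up links between interface units 11 inside the interconnection device 10, data transfer is less constrained by the protocols used inside the chip modules 20. The interconnection is therefore largely protocol-insensitive and extends readily to chip modules 20 of different types. In addition, two chiplets 21 from different chip modules 20 can transfer data through the interconnection device 10 at full bandwidth. Compared with the prior-art approach of using a remote chiplet as a bridge, the scheme can substantially reduce latency, simplify programming and preserve multi-socket performance.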
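Referring to FIG. 2, each interface unit 11 may include an address and control interface 111 and a data interface 112. The address and control interface 111 is connected to the address and control interface of the chiplet 21 corresponding to that interface unit 11, and the data interface 112 is connected to that chiplet's data interface. In other words, each interface unit 11 is split into an address and control interface 111 and a data interface 112, separating the address/control plane from the data plane. This makes the device insensitive to the transfer protocol and makes it easy for various high-speed modules that use the same protocol to form a high-performance system together. When two chiplets 21 in different chip modules 20 transfer data, the sending chiplet 21 first sends the address information of the receiving chiplet 21 to the interconnection device 10, which decodes it and configures the link while the chiplet 21 prepares its data. Once the data is ready, it is sent immediately over the link that the interconnection device 10 has already configured. This reduces the delay caused by link configuration and so reduces the latency of the whole system. It should be understood that the interface unit 11 is not limited to this split into an address and control interface 111 and a data interface 112; other arrangements are possible. For example, the address and control interface 111 and the data interface 112 may be merged into a single interface, with the node address information and the data sent together.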
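Referring to FIG. 2 and FIG. 4, the control module 12 may include a storage module (not shown), a crossbar switch matrix 121, an address decoding module (not shown) and a monitoring module 123. For ease of description, when any two chiplets 21 from different chip modules 20 transfer data, one of them is defined as the sender and the other as the receiver. The storage module in the control module 12 stores the node address information of the receiver, which the sender transmits through the address and control interface 111. That is, after the address and control interface 111 of the interface unit 11 corresponding to the sender receives the receiver's node address information from the sender, that information is saved to the storage module so that it can later be decoded and the corresponding data link relationship configured.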
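Referring to FIG. 4, the crossbar switch matrix 121 is connected to every data interface 112, so that different data interfaces 112 can be connected or disconnected by turning the nodes of the crossbar switch matrix 121 on or off. The crossbar switch matrix 121 is a prior-art switch matrix with a switch gating function.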
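The gating behaviour of the crossbar switch matrix can be modelled in a few lines. This is a simplified sketch rather than the patented circuit: the `Crossbar` class and the port names are hypothetical, and it assumes each data interface takes part in at most one link at a time.

```python
class Crossbar:
    """Toy model of a crossbar switch matrix joining data interfaces."""

    def __init__(self, ports):
        self.ports = set(ports)
        self.peer = {}  # port -> port it is currently connected to

    def is_idle(self, a, b):
        """A link a<->b is idle when both ports exist and neither is in use."""
        return (
            a in self.ports
            and b in self.ports
            and a != b
            and a not in self.peer
            and b not in self.peer
        )

    def connect(self, a, b):
        """Close the crosspoint between a and b; return False if unavailable."""
        if not self.is_idle(a, b):
            return False
        self.peer[a] = b
        self.peer[b] = a
        return True

    def disconnect(self, a):
        """Open the crosspoint that port a takes part in, if any."""
        b = self.peer.pop(a, None)
        if b is not None:
            self.peer.pop(b, None)


# Hypothetical port names: socket 0 / socket 1, chiplet 0 / chiplet 1.
xb = Crossbar(["s0c0", "s0c1", "s1c0", "s1c1"])
print(xb.connect("s0c0", "s1c0"))  # True
print(xb.connect("s0c1", "s1c0"))  # False, s1c0 is already in use
```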
The address decoding module uses the node address information to establish a data link relationship between the data interface 112 of the sender and the data interface 112 of the receiver, and stores that relationship in the buffer queue 122 of the storage module. That is, the address decoding module reads and decodes the node address information held in the storage module, uses the decoded information to establish a data link relationship between the sender's data interface 112 and the receiver's data interface 112, and places the established relationship in the buffer queue 122 of the storage module, where it waits in line until the corresponding data transmission link can be configured from it.
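Referring to Fig. 4, the monitoring module 123 monitors whether the crossbar switch matrix 121 contains an idle link that matches a data link relationship in the buffer queue 122. When such a link exists, the monitoring module 123 uses the matching data link relationship to direct the crossbar switch matrix 121 to establish a data transmission link between the data interface 112 corresponding to the sending end and the data interface 112 corresponding to the receiving end. In other words, the monitoring module 123 watches the crossbar switch matrix 121 for idle links in real time and also reads the established data link relationships from the buffer queue 122 in real time. When it finds an idle link, it checks whether the two nodes connected by that idle link are the same as the two nodes of any data link relationship in the buffer queue 122. If they are, that is, if the two chiplets 21 joined by the idle link are the two chiplets 21 named in an established data link relationship, the crossbar switch matrix 121 is considered to contain an idle link that matches a data link relationship in the buffer queue 122. The monitoring module 123 then directs the crossbar switch matrix 121 to establish a data transmission link between these two chiplets 21, and the sending end sends its data to the receiving end. Specifically, the sending chiplet 21 must first have its data ready; once the data is ready, it is sent over the configured data transmission link. Working together, the storage module, the address decoding module and the monitoring module 123 detect promptly when an established data link relationship matches an idle link, so that the data transmission link is configured quickly. Note that this is only one way of controlling interface units 11 from different interface groups to establish a link so that their two chiplets 21 can exchange data. Any other scheme that connects or disconnects interface units 11 of different interface groups to the same effect may also be used.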
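The cooperation between the address decoding module, the buffer queue 122, the monitoring module 123 and the crossbar switch matrix 121 can be sketched as a toy software model. This is a minimal sketch under stated assumptions, not the patented hardware: the class and method names are hypothetical, and an "idle link" is modelled as a pair of data interfaces that are both currently unused.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class InterconnectModel:
    """Toy model of control module 12 (hypothetical, for illustration only)."""
    group_of: dict                                   # data interface -> interface group
    pending: deque = field(default_factory=deque)    # stands in for buffer queue 122
    active: set = field(default_factory=set)         # links closed in crossbar 121

    def request(self, sender: str, receiver: str) -> None:
        """Address decoding: queue a data link relationship for a node address."""
        if self.group_of[sender] == self.group_of[receiver]:
            raise ValueError("links are only set up between different interface groups")
        self.pending.append((sender, receiver))

    def _busy(self, port: str) -> bool:
        return any(port in link for link in self.active)

    def monitor_step(self) -> list:
        """Monitoring: connect every queued pair whose two interfaces are idle."""
        established = []
        for _ in range(len(self.pending)):
            sender, receiver = self.pending.popleft()
            if self._busy(sender) or self._busy(receiver):
                self.pending.append((sender, receiver))   # keep waiting in the queue
            else:
                self.active.add((sender, receiver))
                established.append((sender, receiver))
        return established

    def release(self, sender: str, receiver: str) -> None:
        """Open the crossbar nodes again once a transfer has finished."""
        self.active.discard((sender, receiver))
```

A caller would queue requests with `request`, call `monitor_step` repeatedly, and call `release` when a transfer completes; requests whose interfaces are busy stay queued until a later step.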
The storage module may be a static random access memory (SRAM), which raises the read and write speed of the interconnect device when it configures a data transmission link, thereby improving configuration efficiency and reducing latency. It should be understood that the storage module is not limited to SRAM; other storage media may also be used.
In addition, the interconnect device is not limited to a circuit module built from electrical circuits; an optical module may also be used as the interconnect device.
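The interconnect device 10 connects chiplets 21 in chip modules 20 plugged into different sockets, and its control module 12 establishes links between chiplets 21 in different chip modules 20 to carry data between them. For board-level interconnection, it is therefore sufficient to route a bus in the circuit board between each chiplet 21 and its interface unit 11 on the interconnect device 10; no board-level traces are needed between each chiplet 21 and every other chiplet 21. This reduces the number of pins brought out of each socket and eases routing inside the circuit board, which lowers the fabrication cost of the circuit board. Because the control module 12 establishes the links between interface units 11 inside the interconnect device 10, data transfer is largely unconstrained by the protocols used inside the chip modules 20; the interconnection is therefore largely protocol-insensitive and extends readily to chip modules 20 of different types. Moreover, two chiplets 21 in different chip modules 20 can transfer data through the interconnect device 10 at full bandwidth. Compared with the prior-art approach of using a remote chiplet as a bridge, the solution of this application substantially reduces latency, simplifies programming and preserves multi-socket performance.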
In addition, an embodiment of the present invention provides a motherboard that includes a circuit board (not shown). The circuit board may be a printed circuit board, which carries the components and interconnects them through its internal traces. Referring to Figs. 2 and 3, at least two chip modules 20 are arranged on the circuit board: the circuit board has at least two sockets, and one chip module 20 is plugged into each socket, so the number of sockets equals the number of chip modules 20. Each chip module 20 contains at least one chiplet 21, and at least one chip module 20 is a multi-chip module containing at least two chiplets 21.
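The number of chip modules 20 may be 2, 3, 4, 5, 10 or any other value not less than 2. The number of sockets equals the number of chip modules 20, and each chip module 20 is plugged into its own socket, which connects the chip module 20 to the traces in the circuit board. Fig. 2 shows two interconnected chip modules 20, and Fig. 3 shows four. As to type, the chip modules 20 may be central processing units, dedicated high-performance chips, or a mixture in which some chip modules 20 are central processing units and others are dedicated high-performance chips. A dedicated high-performance chip may be a graphics processing unit (GPU), an artificial intelligence (AI) chip, a field-programmable gate array (FPGA) or an application-specific integrated circuit (ASIC). More specifically, Fig. 2 shows two interconnected central processing units (CPU0, CPU1), and Fig. 3 shows four interconnected chip modules 20: two central processing units (CPU0, CPU1), one graphics processing unit (GPU0) and one field-programmable gate array (FPGA). The chip module 20 may of course also be another type of accelerator card. Each chip module 20 contains at least one chiplet 21, and at least one chip module 20 is a multi-chip module whose number of chiplets 21 is 2, 3, 4 or any other value not less than 2.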
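Referring to Fig. 2, an interconnect device 10 is also arranged on the motherboard. The interconnect device 10 includes at least two interface groups, and each interface group contains at least one interface unit 11. The interface units 11 of each group correspond one-to-one to the chiplets 21 of the chip module 20 associated with that group, and each interface unit 11 is connected to its corresponding chiplet 21. That is, the number of interface groups on the interconnect device 10 equals the number of chip modules 20, each chip module 20 has its own interface group, and the number of chiplets 21 on a chip module 20 equals the number of interface units 11 in its interface group. At least one interface group contains at least two interface units 11, which correspond one-to-one to the at least two chiplets 21 of a multi-chip module. Every chiplet 21 is connected to one interface unit 11.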
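The one-to-one correspondence between chip modules, interface groups, chiplets and interface units can be expressed as a small consistency check. The sketch below is illustrative only; the function name and the `IF[...]` labels for interface units are hypothetical.

```python
def build_interface_groups(chip_modules: dict) -> dict:
    """Create one interface group per chip module and one interface unit per chiplet."""
    if len(chip_modules) < 2:
        raise ValueError("at least two chip modules are required")
    if not any(len(chiplets) >= 2 for chiplets in chip_modules.values()):
        raise ValueError("at least one chip module must be a multi-chip module")
    groups = {}
    for module, chiplets in chip_modules.items():
        if not chiplets:
            raise ValueError(f"{module} has no chiplet")
        # One interface unit 11 per chiplet 21, keyed by the chiplet it serves.
        groups[module] = {die: f"IF[{module}.{die}]" for die in chiplets}
    return groups


groups = build_interface_groups({
    "CPU0": ["Die0", "Die1", "Die2", "Die3"],
    "CPU1": ["Die4", "Die5", "Die6", "Die7"],
})
print(groups["CPU1"]["Die7"])
```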
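As shown in Fig. 2, the interconnect device 10 further includes a control module 12, which establishes a link between any two interface units 11 belonging to different interface groups so that the two chiplets 21 corresponding to those interface units 11 can exchange data. The interconnect device 10 connects chiplets 21 in chip modules 20 plugged into different sockets, and its control module 12 establishes links between chiplets 21 in different chip modules 20 to carry data between them. For board-level interconnection, it is therefore sufficient to route a bus in the circuit board between each chiplet 21 and its interface unit 11 on the interconnect device 10; no board-level traces are needed between each chiplet 21 and every other chiplet 21. This reduces the number of pins brought out of each socket and eases routing inside the circuit board, which lowers the fabrication cost of the circuit board and the manufacturing cost of the motherboard. Because the control module 12 establishes the links between interface units 11 inside the interconnect device 10, data transfer is largely unconstrained by the protocols used inside the chip modules 20; the interconnection is therefore largely protocol-insensitive and extends readily to chip modules 20 of different types. Moreover, two chiplets 21 in different chip modules 20 can transfer data through the interconnect device 10 at full bandwidth. Compared with the prior-art approach of using a remote chiplet as a bridge, the solution of this application substantially reduces latency, simplifies programming and preserves multi-socket performance.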
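When configuring each interface unit 11, referring to Fig. 2, each chiplet 21 may be provided with an address and control interface and a data interface, and each interface unit 11 may include an address and control interface 111 and a data interface 112. The address and control interface 111 is connected to the address and control interface of the chiplet 21 corresponding to that interface unit 11, and the data interface 112 is connected to the data interface of that chiplet 21. Splitting each interface unit 11 in this way separates the address and control plane from the data plane, which makes the device insensitive to the transport protocol and allows various high-speed modules that use the same protocol to be combined into a high-performance system. When two chiplets 21 in different chip modules 20 exchange data, the sending chiplet 21 first sends the address information of the receiving chiplet 21 to the interconnect device 10, which decodes it and configures the link while the sending chiplet 21 prepares its data. Once the data is ready, it is sent immediately over the link that the interconnect device 10 has configured. This hides the delay of link configuration and lowers the latency of the whole system.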
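As to the address and control interface and the data interface on each chiplet 21, referring to Fig. 5, each chiplet 21 contains a data bus, and this data bus may be connected to both the address and control interface and the data interface of that chiplet 21. This makes the interconnect device 10 insensitive to the protocols used inside each chiplet 21 and eases extension to chip modules 20 of different types.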
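It should be understood that an interface unit 11 is not limited to the split into an address and control interface 111 and a data interface 112 described above; other arrangements are possible. For example, the address and control interface 111 and the data interface 112 may be merged into a single interface, with the node address information and the data sent together.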
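When configuring the control module 12, referring to Figs. 2 and 4, the control module 12 may include a storage module (not shown), a crossbar switch matrix 121, an address decoding module (not shown) and a monitoring module 123. For ease of description, when any two chiplets 21 in different chip modules 20 exchange data, one of them is called the sending end and the other the receiving end. The storage module of the control module 12 stores the node address information of the receiving end, which the sending end transmits through the address and control interface 111. That is, after the address and control interface 111 of the interface unit 11 corresponding to the sending end receives the node address information of the receiving end, that information is saved to the storage module so that it can later be decoded and used to set up the corresponding data link relationship.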
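Referring to Fig. 4, the crossbar switch matrix 121 is connected to every data interface 112, so that switching nodes of the crossbar switch matrix 121 on or off connects or disconnects different data interfaces 112. The crossbar switch matrix 121 is a conventional switch matrix with a switch-gating function.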
The address decoding module establishes, according to the node address information, a data link relationship between the data interface 112 corresponding to the sending end and the data interface 112 corresponding to the receiving end, and stores that relationship in the buffer queue 122 of the storage module. That is, the address decoding module reads and decodes the node address information held in the storage module, uses the decoded information to establish the data link relationship between the two data interfaces 112, and places the relationship in the buffer queue 122, where it waits until the corresponding data transmission link can be configured.
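As shown in Fig. 4, the monitoring module 123 monitors whether the crossbar switch matrix 121 contains an idle link that matches a data link relationship in the buffer queue 122. When a matching idle link exists, the monitoring module 123 uses the matching data link relationship to direct the crossbar switch matrix 121 to establish a data transmission link between the data interface 112 corresponding to the sending end and the data interface 112 corresponding to the receiving end. In other words, the monitoring module 123 watches the crossbar switch matrix 121 for idle links in real time and also reads the established data link relationships from the buffer queue 122 in real time. When it finds an idle link, it checks whether the two nodes connected by that idle link are the same as the two nodes of any data link relationship in the buffer queue 122. If they are, that is, if the two chiplets 21 joined by the idle link are the two chiplets 21 named in an established data link relationship, the crossbar switch matrix 121 is considered to contain an idle link that matches a data link relationship in the buffer queue 122. The monitoring module 123 then directs the crossbar switch matrix 121 to establish a data transmission link between these two chiplets 21, and the sending end sends its data to the receiving end. Specifically, the sending chiplet 21 must first have its data ready; once the data is ready, it is sent over the configured data transmission link. Working together, the storage module, the address decoding module and the monitoring module 123 detect promptly when an established data link relationship matches an idle link, so that the data transmission link is configured quickly. Note that this is only one way of controlling interface units 11 from different interface groups to establish a link so that their two chiplets 21 can exchange data. Any other scheme that connects or disconnects interface units 11 of different interface groups to the same effect may also be used.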
The storage module may be a static random access memory (SRAM), which raises the read and write speed of the interconnect device when it configures a data transmission link, thereby improving configuration efficiency and reducing latency. It should be understood that the storage module is not limited to SRAM; other storage media may also be used.
The main data communication scenarios are described below with reference to Figs. 2, 4 and 5. In practice, the main scenarios are broadcast query and directed data query. The broadcast query is described first.
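Assume that chiplet Die0 in the left-hand chip module 20 of Fig. 2 initiates a broadcast query because it needs the latest data in the system. Die0 queries the other chiplets in the same chip module 20 (Die1, Die2 and Die3) over the interconnect inside the chip module 20, in the conventional manner. Die0 queries the chiplets in the other chip module 20 (Die4, Die5, Die6 and Die7) through the interconnect device 10 provided by this application. To do so, Die0 first sends the node address information of Die4, Die5, Die6 and Die7 to the interconnect device 10 through its address and control interface 111, and at the same time prepares the query information, such as the address of the requested data. On receiving the node address information from Die0, the interconnect device 10 saves it to the storage module. The address decoding module decodes it, establishes the data link relationships Die0-Die4, Die0-Die5, Die0-Die6 and Die0-Die7, and places them in the buffer queue 122. Meanwhile, the monitoring module 123 checks the link state of the crossbar switch matrix 121. If an idle link exists, for example the link between Die0 and Die6, the Die0-Die6 data transmission link is configured by connecting the data interface 112 corresponding to Die0 to the data interface 112 corresponding to Die6. Once Die0 has prepared the query information, it sends that information to Die6 over the Die0-Die6 data transmission link. If Die6 holds the latest copy of the requested data, it returns that data to Die0 over the same Die0-Die6 link. If Die6 does not hold the latest data, this query is marked as complete. The queries to Die4, Die5 and Die7 are then completed one after another in the same way.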
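The broadcast query flow can be sketched as follows. This is a simplified illustration under stated assumptions: the function name, the result strings and the `idle_order` argument (the order in which links from the originating chiplet happen to become idle) are hypothetical and only serve to show that queued link relationships are served as matching idle links appear.

```python
def broadcast_query(origin: str, remote_dies: list, latest_holders: set,
                    idle_order: list) -> dict:
    """Serve queued link relationships in the order their links become idle."""
    queue = [(origin, die) for die in remote_dies]   # data link relationships
    results = {}
    for die in idle_order:                           # link origin<->die is idle now
        link = (origin, die)
        if link in queue:
            queue.remove(link)
            # The query travels over the link; data returns only if the die holds it.
            results[die] = "data returned" if die in latest_holders else "query complete"
    return results


print(broadcast_query("Die0", ["Die4", "Die5", "Die6", "Die7"],
                      {"Die6"}, ["Die6", "Die4", "Die5", "Die7"]))
```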
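Another common communication scenario, the directed data query, is described next. Assume that Die0 queries Die7 for data and that Die7 holds the latest copy of that data. Die0 first sends the node address information of Die7 to the interconnect device 10 through its address and control interface 111, and at the same time prepares the address information that locates the requested data on Die7. After the interconnect device 10 has configured the corresponding data transmission link, Die0 sends this prepared address information to Die7 through the interconnect device 10. Referring to Fig. 5, the basic transmission path is: core Core00 of Die0 -> internal bus IB0 of Die0 -> IB0 cross-socket access interface of Die0 -> data interface 112 corresponding to Die0 on the interconnect device 10 -> data interface 112 corresponding to Die7 on the interconnect device 10 -> IB7 cross-socket access interface of Die7 -> internal bus IB7 of Die7 -> memory or cache of Die7. After Die7 receives the address information of the requested data, it reads the data at that address and returns it to Die0 over the configured Die0-Die7 data transmission link, which completes the directed query. Note that Core00 and Core0n in Fig. 5 denote different cores of Die0, and Core70 and Core7n denote different cores of Die7.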
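The hop sequence of the directed query can be written out programmatically, which makes it easy to see that the interconnect device contributes exactly two hops (the two data interfaces 112). The helper below is a hypothetical illustration; its name and the label strings are assumptions, and only the order of hops follows the path described for Fig. 5.

```python
def request_path(src: int, dst: int, core: int = 0) -> list:
    """List the hops of a directed query from a core on Die<src> to Die<dst>."""
    return [
        f"Core{src}{core}",
        f"IB{src}",
        f"IB{src} cross-socket access interface",
        f"data interface 112 for Die{src}",
        f"data interface 112 for Die{dst}",
        f"IB{dst} cross-socket access interface",
        f"IB{dst}",
        f"Die{dst} memory or cache",
    ]


print(" -> ".join(request_path(0, 7)))
```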
In addition, the interconnect device is not limited to a circuit module built from electrical circuits; an optical module may also be used as the interconnect device.
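The interconnect device 10 connects chiplets 21 in chip modules 20 plugged into different sockets, and its control module 12 establishes links between chiplets 21 in different chip modules 20 to carry data between them. For board-level interconnection, it is therefore sufficient to route a bus in the circuit board between each chiplet 21 and its interface unit 11 on the interconnect device 10; no board-level traces are needed between each chiplet 21 and every other chiplet 21. This reduces the number of pins brought out of each socket and eases routing inside the circuit board, which lowers the fabrication cost of the circuit board and the manufacturing cost of the motherboard. Because the control module 12 establishes the links between interface units 11 inside the interconnect device 10, data transfer is largely unconstrained by the protocols used inside the chip modules 20; the interconnection is therefore largely protocol-insensitive and extends readily to chip modules 20 of different types. Moreover, two chiplets 21 in different chip modules 20 can transfer data through the interconnect device 10 at full bandwidth. Compared with the prior-art approach of using a remote chiplet as a bridge, the solution of this application substantially reduces latency, simplifies programming and preserves multi-socket performance.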
Furthermore, an embodiment of the present invention provides a server that includes any one of the motherboards described above. The interconnect device connects chiplets in chip modules plugged into different sockets, and its control module establishes links between chiplets in different chip modules to carry data between them. For board-level interconnection, it is therefore sufficient to route a bus in the circuit board between each chiplet and its interface unit on the interconnect device; no board-level traces are needed between each chiplet and every other chiplet. This reduces the number of pins brought out of each socket and eases routing inside the circuit board, which lowers the fabrication cost of the circuit board and the manufacturing cost of the motherboard. Because the control module establishes the links between interface units inside the interconnect device, data transfer is largely unconstrained by the protocols used inside the chip modules; the interconnection is therefore largely protocol-insensitive and extends readily to chip modules of different types. Moreover, two chiplets in different chip modules can transfer data through the interconnect device at full bandwidth. Compared with the prior-art approach of using a remote chiplet as a bridge, the solution of this application substantially reduces latency, simplifies programming and preserves multi-socket performance, which in turn lowers both the cost and the latency of the server.
The above are only specific embodiments of the present invention, and the scope of protection of the present invention is not limited to them. Any change or substitution that a person skilled in the art could readily conceive within the technical scope disclosed by the present invention shall fall within the scope of protection of the present invention. The scope of protection of the present invention shall therefore be determined by the scope of protection of the claims.
Claims (10)
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202110628626.7A CN113312304B (en) | 2021-06-04 | 2021-06-04 | A kind of interconnection device, motherboard and server |
Applications Claiming Priority (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202110628626.7A CN113312304B (en) | 2021-06-04 | 2021-06-04 | A kind of interconnection device, motherboard and server |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| CN113312304A | 2021-08-27 |
| CN113312304B | 2023-04-21 |
Family
ID=77377405
Family Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202110628626.7A (Active; CN113312304B) | 2021-06-04 | 2021-06-04 | A kind of interconnection device, motherboard and server |
Country Status (1)
| Country | Link |
|---|---|
| CN (1) | CN113312304B (en) |
Families Citing this family (2)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN113806285B (en) * | 2021-09-18 | 2024-06-25 | 北京爱芯科技有限公司 | Data processing module, chip and data processing method |
| CN116383114B (en) * | 2023-05-26 | 2023-09-08 | 北京壁仞科技开发有限公司 | Chips, chip interconnection systems, data transmission methods, electronic devices and media |
Citations (5)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN102622191A (en) * | 2012-02-24 | 2012-08-01 | 北京经纬恒润科技有限公司 | High-speed mass storage plate |
| CN112559440A (en) * | 2020-12-30 | 2021-03-26 | 海光信息技术股份有限公司 | Method and device for realizing serial service performance optimization in multi-small-chip system |
| CN112612748A (en) * | 2020-12-25 | 2021-04-06 | 南京蓝洋智能科技有限公司 | Super heterogeneous computing method based on extensible small chip architecture |
| CN112817905A (en) * | 2021-02-05 | 2021-05-18 | 中国电子科技集团公司第五十八研究所 | Interconnection bare chip, interconnection micro assembly, interconnection micro system and communication method thereof |
| CN112817902A (en) * | 2021-02-05 | 2021-05-18 | 中国电子科技集团公司第五十八研究所 | Interconnected bare chip interface management system and initialization method thereof |
Family Cites Families (12)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US9153539B2 (en) * | 2013-03-15 | 2015-10-06 | Nvidia Corporation | Ground-referenced single-ended signaling connected graphics processing unit multi-chip module |
| CN104750581A (en) * | 2015-04-01 | 2015-07-01 | 浪潮电子信息产业股份有限公司 | A Redundant Interconnected Memory Shared Server System |
| CN105871730B (en) * | 2016-03-22 | 2019-03-05 | 广东工业大学 | Network-on-chip router based on network code |
| US11461527B2 (en) * | 2018-02-02 | 2022-10-04 | Micron Technology, Inc. | Interface for data communication between chiplets or other integrated circuits on an interposer |
| CN108304341A (en) * | 2018-03-13 | 2018-07-20 | 算丰科技(北京)有限公司 | AI chip high speeds transmission architecture, AI operations board and server |
| CN109542824A (en) * | 2018-11-20 | 2019-03-29 | 北京锐安科技有限公司 | Equipment room information forwards mediating device and Information Exchange System |
| US10909652B2 (en) * | 2019-03-15 | 2021-02-02 | Intel Corporation | Enabling product SKUs based on chiplet configurations |
| CN110781112A (en) * | 2019-10-23 | 2020-02-11 | 中国人民解放军国防科技大学 | A Dual-Channel Serial RapidIO Interface Supporting Multiple Transmission Modes |
| CN111459862A (en) * | 2020-03-04 | 2020-07-28 | 北京网聘咨询有限公司 | Multi-path server system based on fusion framework |
| CN112269751B (en) * | 2020-11-12 | 2022-08-23 | 浙江大学 | Chip expansion method for hundred million-level neuron brain computer |
| CN213276462U (en) * | 2020-11-25 | 2021-05-25 | 海光信息技术股份有限公司 | Dual-socket server motherboard and dual-socket server |
| CN112835848B (en) * | 2021-02-05 | 2023-03-10 | 中国电子科技集团公司第五十八研究所 | Inter-chip interconnection bypass system and communication method for interconnected bare chips |
Non-Patent Citations (1)
| Title |
|---|
| 杨晖. 后摩尔时代Chiplet技术的演进与挑战. 集成电路应用, 2020, (05), full text. * |
Also Published As
| Publication number | Publication date |
|---|---|
| CN113312304A (en) | 2021-08-27 |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| PB01 | Publication | ||
| SE01 | Entry into force of request for substantive examination | ||
| GR01 | Patent grant | ||
| GR01 | Patent grant |
