
CN113312304B - A kind of interconnection device, motherboard and server - Google Patents

A kind of interconnection device, motherboard and server

Info

Publication number
CN113312304B
Authority
CN
China
Prior art keywords
interface
data
chip
module
interface unit
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110628626.7A
Other languages
Chinese (zh)
Other versions
CN113312304A (en)
Inventor
蔡云龙
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hygon Information Technology Co Ltd
Original Assignee
Hygon Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hygon Information Technology Co Ltd filed Critical Hygon Information Technology Co Ltd
Priority to CN202110628626.7A priority Critical patent/CN113312304B/en
Publication of CN113312304A publication Critical patent/CN113312304A/en
Application granted granted Critical
Publication of CN113312304B publication Critical patent/CN113312304B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G: PHYSICS
    • G06: COMPUTING OR CALCULATING; COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F15/00: Digital computers in general; Data processing equipment in general
    • G06F15/76: Architectures of general purpose stored program computers
    • G06F15/78: Architectures of general purpose stored program computers comprising a single central processing unit
    • G06F15/7807: System on chip, i.e. computer system on a single chip; System in package, i.e. computer system on one or more chips in a single package
    • G06F15/781: On-chip cache; Off-chip memory
    • G: PHYSICS
    • G06: COMPUTING OR CALCULATING; COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F13/00: Interconnection of, or transfer of information or other signals between, memories, input/output devices or central processing units
    • G06F13/38: Information transfer, e.g. on bus
    • G: PHYSICS
    • G06: COMPUTING OR CALCULATING; COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F13/00: Interconnection of, or transfer of information or other signals between, memories, input/output devices or central processing units
    • G06F13/38: Information transfer, e.g. on bus
    • G06F13/40: Bus structure
    • G06F13/4004: Coupling between buses
    • G06F13/4027: Coupling between buses using bus bridges
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Computer Hardware Design (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • Microelectronics & Electronic Packaging (AREA)
  • Multi Processors (AREA)
  • Bus Control (AREA)

Abstract

The invention provides an interconnection device, a motherboard and a server. The interconnection device includes at least two groups of interfaces and a control module. Each group of interfaces contains at least one interface unit; the interface units of each group correspond one-to-one with the chiplets of a chip module, and each interface unit is connected to its corresponding chiplet. The control module establishes a link between any two interface units from different groups, so that the two corresponding chiplets can transfer data. For board-level interconnection, each chiplet only needs a bus in the circuit board connecting it to an interface unit of the interconnection device; no board-level traces are needed between each chiplet and every other chiplet. This reduces the number of pins brought out of each socket and eases routing in the circuit board. The scheme is less constrained by the protocols used inside the chip modules, so it extends readily to interconnecting different types of chip modules. It can also substantially reduce latency, simplify programming and preserve multi-socket performance.

Figure 202110628626

Description

A kind of interconnection device, motherboard and server

Technical field

The invention relates to the field of computer technology, and in particular to an interconnection device, a motherboard and a server.

Background

With the rapid development of chip fabrication processes, chip line widths are now below 10 nanometers (nm). This allows on the order of 10 billion transistors to be integrated on a single chip, together with many types of modules such as compute units, caches, control logic, high-speed I/O and distributed clocks. This poses severe challenges for chip design and verification; test and verification of current chips often takes longer than the design itself. High chip complexity, combined with uncontrollable factors in manufacturing, means that a defect in a single module can render the whole chip defective or force it into a lower grade.

To keep the test flow simple and fast and to maintain good yield, chiplet technology, which partitions a chip into functional modules, has emerged (a chiplet is sometimes also called a die). Chiplet technology can alleviate, to some extent, the problems caused by the technology evolution described above. However, it inevitably requires some interconnect buses to be split, so the latency and bandwidth of links between components on separate chiplets are much worse than those of an internal bus (IB). For example, the memory-access latency between x86 multi-socket CPUs (central processing units) is more than twice the internal latency, and between some complex multi-socket CPUs it can exceed three times the internal latency. The increased latency between chiplets means that latency-sensitive applications, such as databases and big-data computing, lose even more performance under multi-socket interconnection. In other words, even as the number of sockets increases, system performance does not improve noticeably and may even decline. The main reason is that such applications are not only compute-intensive but also memory-hungry, and their data are strongly interdependent. In extreme cases, an application has to use only one socket rather than all the sockets in the system; otherwise system performance drops.

At present, when multi-chiplet (multi-die) chips are interconnected, each chiplet generally needs to be fully interconnected with the chiplets in the other socket in order to guarantee latency and bandwidth. However, this multiplies the number of pins brought out of each socket, as in the case shown in FIG. 1 of two sockets with eight chiplets in total. At the board level this makes routing very difficult and raises the cost of the circuit board substantially. Alternatively, one chiplet can be connected to a particular chiplet or die in the other socket, which then serves as a bridge to the other chiplets in that same socket; that is, a remote chiplet acts as a bridge. This interconnection method, however, increases latency substantially.

Summary of the invention

The invention provides an interconnection device, a motherboard and a server, in order to reduce the difficulty of routing in the circuit board, improve system scalability, reduce latency, simplify programming and preserve multi-socket performance.

In a first aspect, the invention provides an interconnection device that includes a control module and at least two groups of interfaces in one-to-one correspondence with at least two sockets. One chip module is plugged into each socket. Each chip module contains at least one chiplet, and at least one chip module is a multi-chip module containing at least two chiplets. Each group of interfaces contains at least one interface unit; the interface units of each group correspond one-to-one with the chiplets in the chip module corresponding to that group, and each interface unit is connected to its corresponding chiplet. The control module establishes a link between any two interface units from different groups of interfaces, so that the two chiplets corresponding to those interface units can transfer data.

In this scheme, the interconnection device connects the chiplets of chip modules plugged into different sockets, and its control module establishes a link between chiplets in different chip modules so that the two chiplets can transfer data. For board-level interconnection, each chiplet therefore only needs a bus in the circuit board connecting it to an interface unit of the interconnection device; no board-level traces are needed between each chiplet and every other chiplet. This reduces the number of pins brought out of each socket and eases routing in the circuit board, which lowers the fabrication cost of the board. Because the control module sets up links between interface units inside the interconnection device, data transfer is less constrained by the protocols used inside the chip modules. The interconnection is therefore largely protocol-insensitive and extends readily to interconnecting different types of chip modules. In addition, two chiplets from different chip modules can transfer data through the interconnection device at full bandwidth. Compared with the prior-art approach of using a remote chiplet as a bridge, the scheme of this application can substantially reduce latency, simplify programming and preserve multi-socket performance.
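The pin-count argument can be made concrete by counting board-level links. The following is a minimal sketch, not taken from the patent: the function names are hypothetical, and it assumes that each chiplet-to-chiplet or chiplet-to-device connection counts as one link and that the eight chiplets of the two-socket example are split four per socket.

```python
def full_mesh_links(chiplets_per_socket):
    """Board-level links when every chiplet is wired directly to every
    chiplet in every other socket (the full-interconnect approach)."""
    total = 0
    n = len(chiplets_per_socket)
    for i in range(n):
        for j in range(i + 1, n):
            total += chiplets_per_socket[i] * chiplets_per_socket[j]
    return total


def full_mesh_ports_per_socket(chiplets_per_socket, socket_index):
    """Link endpoints that one socket must bring out under full interconnect."""
    remote = sum(c for k, c in enumerate(chiplets_per_socket) if k != socket_index)
    return chiplets_per_socket[socket_index] * remote


def device_links(chiplets_per_socket):
    """Board-level links when each chiplet is wired only to its own
    interface unit on the interconnection device."""
    return sum(chiplets_per_socket)


def device_ports_per_socket(chiplets_per_socket, socket_index):
    """Link endpoints per socket with the interconnection device: one per chiplet."""
    return chiplets_per_socket[socket_index]


# Assumed example: two sockets, four chiplets each.
layout = [4, 4]
print(full_mesh_links(layout), full_mesh_ports_per_socket(layout, 0))  # 16 16
print(device_links(layout), device_ports_per_socket(layout, 0))        # 8 4
```

Under these assumptions the per-socket endpoint count grows with the number of remote chiplets for full interconnect, but stays equal to the socket's own chiplet count when an interconnection device is used.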

In a specific embodiment, each interface unit includes an address-and-control interface and a data interface. The address-and-control interface is connected to the address-and-control interface of the chiplet corresponding to that interface unit, and the data interface is connected to that chiplet's data interface. Splitting each interface unit in this way separates the address/control plane from the data plane. The arrangement is insensitive to the device transfer protocol and makes it easy for various high-speed modules that use the same protocol to form a high-performance system together. When two chiplets in different chip modules transfer data, the sending chiplet first sends the address information of the receiving chiplet to the interconnection device, which decodes it and configures the link while the chiplet prepares its data. Once the data are ready, they are transmitted immediately over the link the interconnection device has already configured. This reduces the delay caused by link configuration and thus lowers the latency of the whole system.
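The latency benefit of separating the address/control plane from the data plane comes from overlapping link configuration with data preparation. A minimal timing sketch follows; the function names and the nanosecond figures are hypothetical and chosen only for illustration, since the source gives no timing values.

```python
def serial_latency(setup_ns, prepare_ns, transfer_ns):
    """Link configuration starts only after the data are ready,
    so the three phases run back to back."""
    return prepare_ns + setup_ns + transfer_ns


def overlapped_latency(setup_ns, prepare_ns, transfer_ns):
    """The address is sent first, so link configuration runs in parallel
    with data preparation; the transfer starts when both have finished."""
    return max(setup_ns, prepare_ns) + transfer_ns


# Assumed figures: 30 ns to decode and configure, 50 ns to prepare, 100 ns to transfer.
print(serial_latency(30, 50, 100))      # 180
print(overlapped_latency(30, 50, 100))  # 150
```

In this simple model the configuration delay disappears from the critical path whenever it is no longer than the data-preparation time.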

In a specific embodiment, when any two chiplets from different chip modules transfer data, one of them is the sender and the other is the receiver. The control module includes a storage module, a crossbar switch matrix, an address decoding module and a monitoring module. The storage module stores the node address information of the receiver, which the sender transmits over the address-and-control interface. The crossbar switch matrix is connected to every data interface. The address decoding module uses the node address information to establish a data-link relationship between the data interface corresponding to the sender and the data interface corresponding to the receiver, and stores that relationship in a buffer queue in the storage module. The monitoring module monitors whether the crossbar switch matrix has an idle link that matches a data-link relationship in the buffer queue. When a matching idle link exists, the monitoring module controls the crossbar switch matrix, according to the matched relationship, to establish a data transmission link between the sender's data interface and the receiver's data interface. Through the cooperation of the storage module, the address decoding module and the monitoring module, a match between an established data-link relationship and an idle link is detected promptly, so the data transmission link is configured quickly.
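The cooperation of the address decoding module, the buffer queue and the monitoring module can be sketched as a small software model. This is an illustrative sketch only, not the patented hardware: the class and method names, the string port labels and the address map are hypothetical, and an "idle link" is modeled simply as both data interfaces being unused.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class ControlModuleModel:
    # Hypothetical map from receiver node address to data-interface label.
    address_map: dict
    # Buffer queue of pending (sender_port, receiver_port) data-link relationships.
    queue: deque = field(default_factory=deque)
    # Data interfaces currently connected through the crossbar.
    busy: set = field(default_factory=set)

    def request(self, sender_port, receiver_node_addr):
        """Address decoding: resolve the receiver's port and queue the relationship."""
        receiver_port = self.address_map[receiver_node_addr]
        self.queue.append((sender_port, receiver_port))

    def monitor_step(self):
        """Monitoring: connect each queued relationship whose two ports are idle.
        Returns the links established in this step, in queue order."""
        established = []
        still_waiting = deque()
        for sender, receiver in self.queue:
            if sender not in self.busy and receiver not in self.busy:
                self.busy.update((sender, receiver))
                established.append((sender, receiver))
            else:
                still_waiting.append((sender, receiver))
        self.queue = still_waiting
        return established

    def release(self, sender, receiver):
        """Mark both ports idle again once the transfer has finished."""
        self.busy.difference_update((sender, receiver))


cm = ControlModuleModel(address_map={0x10: "S1.D0", 0x11: "S1.D1"})
cm.request("S0.D0", 0x10)
cm.request("S0.D1", 0x10)  # contends with the first request for S1.D0
cm.request("S0.D2", 0x11)
print(cm.monitor_step())   # [('S0.D0', 'S1.D0'), ('S0.D2', 'S1.D1')]
cm.release("S0.D0", "S1.D0")
print(cm.monitor_step())   # [('S0.D1', 'S1.D0')]
```

The second request stays in the queue until its receiver port is released, which mirrors the role the text assigns to the monitoring module.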

In a specific embodiment, the storage module is a static random-access memory (SRAM), which increases the read/write speed of the interconnection device when it configures data transmission links, thereby improving link configuration efficiency and reducing latency.

In a second aspect, the invention also provides a motherboard that includes a circuit board. At least two sockets are provided on the circuit board, and one chip module is plugged into each socket. Each chip module contains at least one chiplet, and at least one chip module is a multi-chip module containing at least two chiplets. An interconnection device is also provided on the motherboard. The interconnection device includes a control module and at least two groups of interfaces. Each group of interfaces contains at least one interface unit; the interface units of each group correspond one-to-one with the chiplets in the chip module corresponding to that group, and each interface unit is connected to its corresponding chiplet. The control module establishes a link between any two interface units from different groups of interfaces, so that the two chiplets corresponding to those interface units can transfer data.

In this scheme, the interconnection device connects the chiplets of chip modules plugged into different sockets, and its control module establishes a link between chiplets in different chip modules so that the two chiplets can transfer data. For board-level interconnection, each chiplet therefore only needs a bus in the circuit board connecting it to an interface unit of the interconnection device; no board-level traces are needed between each chiplet and every other chiplet. This reduces the number of pins brought out of each socket and eases routing in the circuit board, which lowers the fabrication cost of the circuit board and the manufacturing cost of the motherboard. Because the control module sets up links between interface units inside the interconnection device, data transfer is less constrained by the protocols used inside the chip modules. The interconnection is therefore largely protocol-insensitive and extends readily to interconnecting different types of chip modules. In addition, two chiplets from different chip modules can transfer data through the interconnection device at full bandwidth. Compared with the prior-art approach of using a remote chiplet as a bridge, the scheme of this application can substantially reduce latency, simplify programming and preserve multi-socket performance.

In a specific embodiment, each chiplet is provided with an address-and-control interface and a data interface, and each interface unit likewise includes an address-and-control interface and a data interface. The address-and-control interface of an interface unit is connected to the address-and-control interface of its corresponding chiplet, and the data interface of the interface unit is connected to that chiplet's data interface. Splitting each interface unit in this way separates the address/control plane from the data plane. The arrangement is insensitive to the device transfer protocol and makes it easy for various high-speed modules that use the same protocol to form a high-performance system together. When two chiplets in different chip modules transfer data, the sending chiplet first sends the address information of the receiving chiplet to the interconnection device, which decodes it and configures the link while the chiplet prepares its data. Once the data are ready, they are transmitted immediately over the link the interconnection device has already configured. This reduces the delay caused by link configuration and thus lowers the latency of the whole system.

In a specific embodiment, when any two chiplets from different chip modules transfer data, one of them is the sender and the other is the receiver. The control module includes a storage module, a crossbar switch matrix, an address decoding module and a monitoring module. The storage module stores the node address information of the receiver, which the sender transmits over the address-and-control interface. The crossbar switch matrix is connected to every data interface. The address decoding module uses the node address information to establish a data-link relationship between the data interface corresponding to the sender and the data interface corresponding to the receiver, and stores that relationship in a buffer queue in the storage module. The monitoring module monitors whether the crossbar switch matrix has an idle link that matches a data-link relationship in the buffer queue. When a matching idle link exists, the monitoring module controls the crossbar switch matrix, according to the matched relationship, to establish a data transmission link between the sender's data interface and the receiver's data interface. Through the cooperation of the storage module, the address decoding module and the monitoring module, a match between an established data-link relationship and an idle link is detected promptly, so the data transmission link is configured quickly.

In a specific embodiment, each chiplet contains a data bus, and the data bus is connected to both the address-and-control interface and the data interface on that chiplet. This makes the interconnection device insensitive to the protocols used inside each chiplet and eases extension to interconnecting different types of chip modules.

In a specific embodiment, the chip module is a central processing unit and/or a dedicated high-performance chip.

In a specific embodiment, the dedicated high-performance chip is a graphics processing unit (GPU), an artificial intelligence (AI) chip, a field-programmable gate array (FPGA) or an application-specific integrated circuit (ASIC).

In a third aspect, the invention also provides a server that includes any one of the motherboards described above. The interconnection device connects the chiplets of chip modules plugged into different sockets, and its control module establishes a link between chiplets in different chip modules so that the two chiplets can transfer data. For board-level interconnection, each chiplet therefore only needs a bus in the circuit board connecting it to an interface unit of the interconnection device; no board-level traces are needed between each chiplet and every other chiplet. This reduces the number of pins brought out of each socket and eases routing in the circuit board, which lowers the fabrication cost of the circuit board and the manufacturing cost of the motherboard. Because the control module sets up links between interface units inside the interconnection device, data transfer is less constrained by the protocols used inside the chip modules. The interconnection is therefore largely protocol-insensitive and extends readily to interconnecting different types of chip modules. In addition, two chiplets from different chip modules can transfer data through the interconnection device at full bandwidth. Compared with the prior-art approach of using a remote chiplet as a bridge, the scheme of this application can substantially reduce latency, simplify programming and preserve multi-socket performance. This in turn lowers both the cost and the latency of the server.

Brief description of the drawings

FIG. 1 is a schematic diagram of a prior-art full interconnection;

FIG. 2 is a structural block diagram of full interconnection realized through an interconnection device according to an embodiment of the invention;

FIG. 3 is a schematic structural diagram of different types of chip modules interconnected through an interconnection device according to an embodiment of the invention;

FIG. 4 is a structural block diagram of data transfer with the address/control plane separated from the data plane according to an embodiment of the invention;

FIG. 5 is a schematic diagram of the internal module path by which two chiplets from different chip modules are interconnected according to an embodiment of the invention.

Reference signs:

10: interconnection device; 11: interface unit; 111: address-and-control interface; 112: data interface

12: control module; 121: crossbar switch matrix; 122: buffer queue; 123: monitoring module

20: chip module; 21: chiplet

Detailed description

To make the objectives, technical solutions and advantages of the embodiments of the invention clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the drawings. The described embodiments are only some, not all, of the embodiments of the invention. All other embodiments obtained by a person of ordinary skill in the art on the basis of these embodiments, without creative effort, fall within the scope of protection of the invention.

To aid understanding of the interconnection device provided by the embodiments of the invention, its application scenario is described first. The interconnection device is used on a motherboard carrying at least two chip modules, to interconnect chiplets from different chip modules. The interconnection device is described in detail below with reference to the drawings.

Referring to FIG. 2, the interconnection device 10 provided by the embodiment of the present invention includes at least two groups of interfaces in one-to-one correspondence with at least two slots. One chip module 20 is plugged into each slot; each chip module 20 contains at least one chiplet 21, and at least one chip module 20 is a multi-chip module containing at least two chiplets 21. Each group of interfaces contains at least one interface unit 11; the interface units 11 of each group correspond one-to-one with the chiplets 21 in the chip module 20 corresponding to that group; and each interface unit 11 is connected to its corresponding chiplet 21. In other words, the number of slots equals the number of chip modules 20, and one chip module 20 is plugged into each slot. Each chip module 20 contains at least one chiplet 21, at least one chip module 20 is a multi-chip module, and each multi-chip module contains at least two chiplets 21. The number of interface groups on the interconnection device 10 equals the number of chip modules 20, so each chip module 20 has its own group of interfaces. The number of chiplets 21 in each chip module 20 equals the number of interface units 11 in the corresponding group of interfaces. Accordingly, at least one group of interfaces contains at least two interface units 11, in one-to-one correspondence with the at least two chiplets 21 of a multi-chip module. Each chiplet 21 is connected to one interface unit 11.
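The correspondence described above (one interface group per chip module, one interface unit per chiplet) can be sketched as a small data model. This is only an illustrative sketch: the class names, the `for_modules` constructor and the `IF:` labels are hypothetical and do not appear in the embodiment; the module and die names follow the CPU0/CPU1 and Die0 to Die7 example of FIG. 2.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ChipModule:
    name: str
    chiplets: List[str]  # chiplets 21 contained in this chip module 20


@dataclass
class InterconnectionDevice:
    # interface_groups[module name] -> one interface-unit label per chiplet
    interface_groups: Dict[str, List[str]] = field(default_factory=dict)

    @classmethod
    def for_modules(cls, modules: List[ChipModule]) -> "InterconnectionDevice":
        # Constraints stated in the text: >= 2 modules, >= 1 chiplet each,
        # and at least one multi-chip module with >= 2 chiplets.
        if len(modules) < 2:
            raise ValueError("at least two chip modules are required")
        if any(len(m.chiplets) < 1 for m in modules):
            raise ValueError("every chip module needs at least one chiplet")
        if not any(len(m.chiplets) >= 2 for m in modules):
            raise ValueError("at least one chip module must be a multi-chip module")
        device = cls()
        for m in modules:
            # One interface unit 11 per chiplet 21 (hypothetical labels).
            device.interface_groups[m.name] = [f"IF:{c}" for c in m.chiplets]
        return device


modules = [
    ChipModule("CPU0", ["Die0", "Die1", "Die2", "Die3"]),
    ChipModule("CPU1", ["Die4", "Die5", "Die6", "Die7"]),
]
device = InterconnectionDevice.for_modules(modules)
```

Here the number of interface groups equals the number of chip modules, and each group has as many interface units as its module has chiplets.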

The number of chip modules 20 may be any value not less than 2, for example 2, 3, 4, 5 or 10. The number of slots equals the number of chip modules 20, and each chip module 20 is plugged into its corresponding slot, which connects the chip module 20 to the traces in the circuit board. FIG. 2 shows the interconnection of two chip modules 20, and FIG. 3 shows the interconnection of four chip modules 20. As to type, a chip module 20 may be a central processing unit or a dedicated high-performance chip; it is also possible for some of the chip modules 20 to be central processing units and the others to be dedicated high-performance chips. The dedicated high-performance chip may specifically be a graphics processing unit, an artificial intelligence (AI) chip, a field-programmable gate array or an application-specific integrated circuit. More specifically, FIG. 2 shows two interconnected central processing units (CPU0, CPU1), and FIG. 3 shows four interconnected chip modules 20: two central processing units (CPU0, CPU1), one graphics processing unit (GPU0) and one field-programmable gate array (FPGA). Of course, a chip module 20 may also be another type of accelerator card. The number of chiplets 21 contained in each chip module 20 may be any value not less than 1, for example 1, 2, 3 or 4, provided that at least one chip module 20 is a multi-chip module, in which the number of chiplets 21 is any value not less than 2, for example 2, 3 or 4.

Referring to FIG. 2, the interconnection device 10 further includes a control module 12. The control module 12 is used to establish a link between any two interface units 11 belonging to different groups of interfaces, so that the two chiplets 21 corresponding to those two interface units 11 can exchange data. The chiplets 21 of chip modules 20 plugged into different slots are all connected to the interconnection device 10, and the control module 12 of the interconnection device 10 establishes a link between chiplets 21 in different chip modules 20 to carry the data transfer between them. As a result, board-level interconnection only requires connecting each chiplet 21 to its interface unit 11 of the interconnection device 10 through buses in the circuit board; there is no need to route board-level traces between each chiplet 21 and the other chiplets 21. This reduces the number of pins brought out at each slot and simplifies routing in the circuit board, which lowers the manufacturing cost of the circuit board. Because the control module 12 controls the different interface units 11 inside the interconnection device 10 to establish links for data transmission, the transfer is less constrained by the various protocols inside the chip modules 20. The interconnection scheme is therefore largely protocol-insensitive and is easy to extend to interconnecting different types of chip modules 20. Furthermore, two chiplets 21 from different chip modules 20 can transfer data through the interconnection device 10 at full bandwidth. Compared with the prior-art approach of using a remote chiplet as a bridge, the solution of the present application greatly reduces latency, simplifies programming and preserves multi-socket performance.
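The wiring saving described above can be illustrated with a short calculation. The comparison baseline is an assumption made only for illustration: a hypothetical board on which every pair of chiplets in different chip modules has its own direct board-level connection. With the interconnection device, each chiplet needs only one connection, to its interface unit. The function names are hypothetical.

```python
from itertools import combinations
from typing import List


def direct_cross_module_links(chiplets_per_module: List[int]) -> int:
    """Links needed if every cross-module chiplet pair were wired directly
    (hypothetical baseline, not a scheme described in the text)."""
    return sum(a * b for a, b in combinations(chiplets_per_module, 2))


def links_via_interconnection_device(chiplets_per_module: List[int]) -> int:
    """Links needed when each chiplet connects only to its interface unit."""
    return sum(chiplets_per_module)


two_modules = [4, 4]          # e.g. CPU0 and CPU1 with four chiplets each
four_modules = [4, 4, 4, 4]   # four modules with four chiplets each

print(direct_cross_module_links(two_modules),
      links_via_interconnection_device(two_modules))    # 16 8
print(direct_cross_module_links(four_modules),
      links_via_interconnection_device(four_modules))   # 96 16
```

Under this assumption the direct-wiring count grows with the product of the chiplet counts, while the count through the interconnection device grows only with their sum.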

As to the arrangement of each interface unit 11, referring to FIG. 2, each interface unit 11 may include an address and control interface 111 and a data interface 112. The address and control interface 111 is connected to the address and control interface of the chiplet 21 corresponding to the interface unit 11, and the data interface 112 is connected to the data interface of that chiplet 21. In other words, each interface unit 11 is divided into an address and control interface 111 and a data interface 112, so that the control/address plane is separated from the data plane. This arrangement is insensitive to the device transmission protocol, so that various high-speed modules using the same protocol can conveniently be combined into a high-performance system. When two chiplets 21 in different chip modules 20 transfer data, the sending chiplet 21 first sends the address information of the receiving chiplet 21 to the interconnection device 10; the interconnection device 10 decodes the address and configures the link while the sending chiplet 21 prepares its data. Once the data is ready, it is transmitted immediately over the link already configured by the interconnection device 10. This hides the delay of link configuration and thereby reduces the latency of the whole system. It should be understood that each interface unit 11 is not limited to the division into an address and control interface 111 and a data interface 112 shown above; other arrangements may also be used. For example, the address and control interface 111 and the data interface 112 may be merged into a single interface, with the node address information and the data sent together.
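The latency benefit of separating the address/control plane from the data plane can be illustrated with a simple timing model. The model is an assumption, not something stated in the embodiment: with separate interfaces, link configuration overlaps with data preparation; with a single merged interface, configuration is assumed to start only after the address arrives together with the prepared data. The function names and numeric values are hypothetical.

```python
def latency_separated(t_setup: float, t_prepare: float, t_transfer: float) -> float:
    """Address is sent first, so link setup overlaps with data preparation."""
    return max(t_setup, t_prepare) + t_transfer


def latency_merged(t_setup: float, t_prepare: float, t_transfer: float) -> float:
    """Address travels with the data, so setup is assumed to start after preparation."""
    return t_prepare + t_setup + t_transfer


# Hypothetical timings in arbitrary units.
t_setup, t_prepare, t_transfer = 30, 50, 100
saving = (latency_merged(t_setup, t_prepare, t_transfer)
          - latency_separated(t_setup, t_prepare, t_transfer))
print(saving)  # 30
```

In this model the saving equals the smaller of the setup time and the preparation time, that is, the part of link configuration hidden behind data preparation.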

As to the arrangement of the control module 12, referring to FIG. 2 and FIG. 4, the control module 12 may include a storage module (not shown), a crossbar switch matrix 121, an address decoding module (not shown) and a monitoring module 123. For ease of description, when any two chiplets 21 from different chip modules 20 transfer data, one of the two chiplets 21 is defined as the sending end and the other as the receiving end. The storage module in the control module 12 stores the node address information of the receiving end, which the sending end transmits through the address and control interface 111. That is, after the address and control interface 111 of the interface unit 11 corresponding to the sending end receives the node address information of the receiving end, this information is transferred to the storage module so that it can subsequently be decoded and the corresponding data link relationship can be set up.

Referring to FIG. 4, the crossbar switch matrix 121 is connected to every data interface 112, so that different data interfaces 112 can be connected to or disconnected from one another by switching the connections between different nodes of the crossbar switch matrix 121 on or off. The crossbar switch matrix 121 is a prior-art switch matrix with a switch gating function.

The address decoding module establishes, according to the node address information, a data link relationship between the data interface 112 corresponding to the sending end and the data interface 112 corresponding to the receiving end, and stores the data link relationship in the cache queue 122 of the storage module. That is, the address decoding module reads and decodes the node address information held in the storage module, uses the decoded information to establish the data link relationship between the two data interfaces 112, and places the established data link relationship in the cache queue 122 of the storage module, where it waits in line until the corresponding data transmission link can be configured.
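The store, decode and enqueue steps of the address decoding module can be sketched as follows. The embodiment does not specify how node addresses are encoded; the 8-bit layout used here (upper four bits for the chip module index, lower four bits for the chiplet index), the port numbering and all class and method names are assumptions made for illustration.

```python
from collections import deque
from typing import Deque, Dict, Tuple


def decode_node_address(addr: int) -> Tuple[int, int]:
    """Hypothetical encoding: (module index << 4) | chiplet index."""
    return addr >> 4, addr & 0xF


class AddressDecoder:
    def __init__(self, port_of: Dict[Tuple[int, int], int]) -> None:
        # Maps (module index, chiplet index) to a data-interface port number.
        self.port_of = port_of
        # Stands in for the storage module: (sender port, receiver node address).
        self.stored_addresses: Deque[Tuple[int, int]] = deque()
        # Stands in for the cache queue 122: (sender port, receiver port).
        self.link_queue: Deque[Tuple[int, int]] = deque()

    def store(self, sender_port: int, receiver_addr: int) -> None:
        """Record node address information received on the address/control interface."""
        self.stored_addresses.append((sender_port, receiver_addr))

    def decode_all(self) -> None:
        """Decode every stored address and queue the resulting link relationship."""
        while self.stored_addresses:
            sender_port, addr = self.stored_addresses.popleft()
            receiver_port = self.port_of[decode_node_address(addr)]
            self.link_queue.append((sender_port, receiver_port))


decoder = AddressDecoder(port_of={(0, 0): 0, (1, 2): 6})
decoder.store(sender_port=0, receiver_addr=0x12)  # module 1, chiplet 2
decoder.decode_all()
```

After `decode_all()`, the queue holds one link relationship between port 0 and port 6, waiting for the monitoring module to find a matching idle link.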

Referring to FIG. 4, the monitoring module 123 monitors whether the crossbar switch matrix 121 contains an idle link that matches a data link relationship in the cache queue 122. When a matching idle link exists, the monitoring module 123 controls the crossbar switch matrix 121, according to the matching data link relationship, to establish a data transmission link between the data interface 112 corresponding to the sending end and the data interface 112 corresponding to the receiving end. That is, the monitoring module 123 monitors in real time whether the crossbar switch matrix 121 contains an idle link, and also reads in real time the data link relationships established in the cache queue 122. When an idle link is found, the monitoring module 123 determines in real time whether the two nodes connected by the idle link are the same as the two nodes of any data link relationship in the cache queue 122. If they are the same, that is, if the two chiplets 21 connected by the idle link are the same as the two chiplets 21 of an established data link relationship, the crossbar switch matrix 121 is deemed to contain an idle link matching a data link relationship in the cache queue 122. The monitoring module 123 then controls the crossbar switch matrix 121, according to the matching data link relationship, to establish a data transmission link between the two chiplets 21, and the sending end transmits data to the receiving end. Specifically, the sending chiplet 21 must first prepare the data to be sent; once the data is ready, it is transmitted over the configured data transmission link. Through the cooperation of the storage module, the address decoding module and the monitoring module 123, a match between an established data link relationship and an idle link is detected promptly, so the data transmission link is configured quickly. It should be noted that the above shows only one way of controlling interface units 11 from different interface groups to establish a link so that the two corresponding chiplets 21 can transfer data. Other ways of connecting or disconnecting interface units 11 of different interface groups so that the two corresponding chiplets 21 can transfer data may also be used.
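The matching performed by the monitoring module can be sketched as a polling loop over the queued data link relationships. This is an illustrative sketch under stated assumptions: a link between two ports is treated as idle when neither port is currently connected, and the `Crossbar` and `Monitor` classes, their methods and the port numbers are hypothetical rather than taken from the embodiment.

```python
from collections import deque
from typing import Deque, List, Optional, Tuple


class Crossbar:
    """Minimal stand-in for the crossbar switch matrix 121."""

    def __init__(self, num_ports: int) -> None:
        self.peer: List[Optional[int]] = [None] * num_ports

    def is_idle(self, a: int, b: int) -> bool:
        # Assumption: the a-b link is idle when both ports are unconnected.
        return self.peer[a] is None and self.peer[b] is None

    def connect(self, a: int, b: int) -> None:
        self.peer[a], self.peer[b] = b, a

    def release(self, a: int, b: int) -> None:
        self.peer[a], self.peer[b] = None, None


class Monitor:
    """Minimal stand-in for the monitoring module 123."""

    def __init__(self, crossbar: Crossbar, queue: Deque[Tuple[int, int]]) -> None:
        self.crossbar = crossbar
        self.queue = queue  # queued (sender port, receiver port) relationships

    def poll(self) -> List[Tuple[int, int]]:
        """Configure every queued relationship that matches an idle link."""
        established = []
        waiting: Deque[Tuple[int, int]] = deque()
        while self.queue:
            a, b = self.queue.popleft()
            if self.crossbar.is_idle(a, b):
                self.crossbar.connect(a, b)
                established.append((a, b))
            else:
                waiting.append((a, b))  # keep queued until a link frees up
        self.queue.extend(waiting)
        return established


queue = deque([(0, 4), (0, 5), (1, 6)])
monitor = Monitor(Crossbar(num_ports=8), queue)
first = monitor.poll()           # (0, 5) must wait: port 0 is busy with port 4
monitor.crossbar.release(0, 4)   # transfer on the 0-4 link has finished
second = monitor.poll()
```

In this run the first poll configures the 0-4 and 1-6 links, and the 0-5 relationship stays queued until port 0 is released.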

The storage module may be a static random-access memory (SRAM), which increases the read and write speed of the interconnection device when configuring a data transmission link, thereby improving link configuration efficiency and reducing latency. It should be understood that the storage module is not limited to the static random-access memory shown above; other storage media may also be used as the storage module.

In addition, the interconnection device is not limited to a circuit module formed of electrical circuits; an optical module may also be used as the interconnection device.

In summary, the chiplets 21 of chip modules 20 plugged into different slots are connected through the interconnection device 10, and the control module 12 of the interconnection device 10 establishes a link between chiplets 21 in different chip modules 20 to carry the data transfer between them. Board-level interconnection therefore only requires connecting each chiplet 21 to its interface unit 11 of the interconnection device 10 through buses in the circuit board, with no board-level traces between each chiplet 21 and the other chiplets 21. This reduces the number of pins brought out at each slot and simplifies routing in the circuit board, which lowers the manufacturing cost of the circuit board. Because the control module 12 controls the different interface units 11 inside the interconnection device 10 to establish links for data transmission, the transfer is less constrained by the various protocols inside the chip modules 20, so the interconnection scheme is largely protocol-insensitive and is easy to extend to interconnecting different types of chip modules 20. Two chiplets 21 from different chip modules 20 can transfer data through the interconnection device 10 at full bandwidth. Compared with the prior-art approach of using a remote chiplet as a bridge, the solution of the present application greatly reduces latency, simplifies programming and preserves multi-socket performance.

In addition, an embodiment of the present invention further provides a motherboard, which includes a circuit board (not shown). The circuit board may specifically be a printed circuit board, which serves as the carrier on which the components are mounted and interconnects the components through its internal traces. Referring to FIG. 2 and FIG. 3, at least two chip modules 20 are arranged on the circuit board. Specifically, at least two slots are provided on the circuit board, and one chip module 20 is plugged into each slot. Each chip module 20 contains at least one chiplet 21, at least one chip module 20 is a multi-chip module, and each multi-chip module contains at least two chiplets 21. In other words, the number of slots equals the number of chip modules 20, and one chip module 20 is plugged into each slot.

The number of chip modules 20 may be any value not less than 2, for example 2, 3, 4, 5 or 10. The number of slots equals the number of chip modules 20, and each chip module 20 is plugged into its corresponding slot, which connects the chip module 20 to the traces in the circuit board. FIG. 2 shows the interconnection of two chip modules 20, and FIG. 3 shows the interconnection of four chip modules 20. As to type, a chip module 20 may be a central processing unit or a dedicated high-performance chip; it is also possible for some of the chip modules 20 to be central processing units and the others to be dedicated high-performance chips. The dedicated high-performance chip may specifically be a graphics processing unit, an artificial intelligence (AI) chip, a field-programmable gate array or an application-specific integrated circuit. More specifically, FIG. 2 shows two interconnected central processing units (CPU0, CPU1), and FIG. 3 shows four interconnected chip modules 20: two central processing units (CPU0, CPU1), one graphics processing unit (GPU0) and one field-programmable gate array (FPGA). Of course, a chip module 20 may also be another type of accelerator card. Consistent with the requirement that each chip module 20 contains at least one chiplet 21, the number of chiplets 21 contained in each chip module 20 may be any value not less than 1, for example 1, 2, 3 or 4, provided that at least one chip module 20 is a multi-chip module, in which the number of chiplets 21 is any value not less than 2, for example 2, 3 or 4.

Referring to FIG. 2, an interconnection device 10 is also provided on the motherboard. The interconnection device 10 includes at least two groups of interfaces. Each group of interfaces contains at least one interface unit 11; the interface units 11 of each group correspond one-to-one with the chiplets 21 in the chip module 20 corresponding to that group; and each interface unit 11 is connected to its corresponding chiplet 21. In other words, the number of interface groups on the interconnection device 10 equals the number of chip modules 20, so each chip module 20 has its own group of interfaces. The number of chiplets 21 in each chip module 20 equals the number of interface units 11 in the corresponding group of interfaces. Accordingly, at least one group of interfaces contains at least two interface units 11, in one-to-one correspondence with the at least two chiplets 21 of a multi-chip module. Each chiplet 21 is connected to one interface unit 11.

As shown in FIG. 2, the interconnection device 10 further includes a control module 12. The control module 12 is used to establish a link between any two interface units 11 belonging to different groups of interfaces, so that the two chiplets 21 corresponding to those two interface units 11 can exchange data. The chiplets 21 of chip modules 20 plugged into different slots are all connected to the interconnection device 10, and the control module 12 of the interconnection device 10 establishes a link between chiplets 21 in different chip modules 20 to carry the data transfer between them. As a result, board-level interconnection only requires connecting each chiplet 21 to its interface unit 11 of the interconnection device 10 through buses in the circuit board; there is no need to route board-level traces between each chiplet 21 and the other chiplets 21. This reduces the number of pins brought out at each slot and simplifies routing in the circuit board, which lowers the manufacturing cost of the circuit board and hence of the motherboard. Because the control module 12 controls the different interface units 11 inside the interconnection device 10 to establish links for data transmission, the transfer is less constrained by the various protocols inside the chip modules 20. The interconnection scheme is therefore largely protocol-insensitive and is easy to extend to interconnecting different types of chip modules 20. Furthermore, two chiplets 21 from different chip modules 20 can transfer data through the interconnection device 10 at full bandwidth. Compared with the prior-art approach of using a remote chiplet as a bridge, the solution of the present application greatly reduces latency, simplifies programming and preserves multi-socket performance.

As to the arrangement of each interface unit 11, referring to FIG. 2, each chiplet 21 may be provided with an address and control interface and a data interface, and each interface unit 11 may include an address and control interface 111 and a data interface 112. The address and control interface 111 is connected to the address and control interface of the chiplet 21 corresponding to the interface unit 11, and the data interface 112 is connected to the data interface of that chiplet 21. In other words, each interface unit 11 is divided into an address and control interface 111 and a data interface 112, so that the control/address plane is separated from the data plane. This arrangement is insensitive to the device transmission protocol, so that various high-speed modules using the same protocol can conveniently be combined into a high-performance system. When two chiplets 21 in different chip modules 20 transfer data, the sending chiplet 21 first sends the address information of the receiving chiplet 21 to the interconnection device 10; the interconnection device 10 decodes the address and configures the link while the sending chiplet 21 prepares its data. Once the data is ready, it is transmitted immediately over the link already configured by the interconnection device 10. This hides the delay of link configuration and thereby reduces the latency of the whole system.

As to the arrangement of the address and control interface and the data interface on each chiplet 21, referring to FIG. 5, each chiplet 21 contains a data bus, and this data bus may be connected to both the address and control interface and the data interface of that chiplet 21. This makes the interconnection device 10 insensitive to the various protocols inside each chiplet 21 and facilitates extension to interconnecting different types of chip modules 20.

It should be understood that each interface unit 11 is not limited to the division into an address and control interface 111 and a data interface 112 shown above; other arrangements may also be used. For example, the address and control interface 111 and the data interface 112 may be merged into a single interface, with the node address information and the data sent together.

As to the arrangement of the control module 12, referring to FIG. 2 and FIG. 4, the control module 12 may include a storage module (not shown), a crossbar switch matrix 121, an address decoding module (not shown) and a monitoring module 123. For ease of description, when any two chiplets 21 from different chip modules 20 transfer data, one of the two chiplets 21 is defined as the sending end and the other as the receiving end. The storage module in the control module 12 stores the node address information of the receiving end, which the sending end transmits through the address and control interface 111. That is, after the address and control interface 111 of the interface unit 11 corresponding to the sending end receives the node address information of the receiving end, this information is transferred to the storage module so that it can subsequently be decoded and the corresponding data link relationship can be set up.

Referring to FIG. 4, the crossbar switch matrix 121 is connected to every data interface 112, so that different data interfaces 112 can be connected to or disconnected from one another by switching the connections between different nodes of the crossbar switch matrix 121 on or off. The crossbar switch matrix 121 is a prior-art switch matrix with a switch gating function.

The address decoding module establishes, according to the node address information, a data link relationship between the data interface 112 corresponding to the sending end and the data interface 112 corresponding to the receiving end, and stores the data link relationship in the cache queue 122 of the storage module. That is, the address decoding module reads and decodes the node address information held in the storage module, uses the decoded information to establish the data link relationship between the two data interfaces 112, and places the established data link relationship in the cache queue 122 of the storage module, where it waits in line until the corresponding data transmission link can be configured.

As shown in FIG. 4, the monitoring module 123 monitors whether the crossbar switch matrix 121 contains an idle link that matches a data link relationship in the cache queue 122. When a matching idle link exists, the monitoring module 123 controls the crossbar switch matrix 121, according to the matching data link relationship, to establish a data transmission link between the data interface 112 corresponding to the sending end and the data interface 112 corresponding to the receiving end. That is, the monitoring module 123 monitors in real time whether the crossbar switch matrix 121 contains an idle link, and also reads in real time the data link relationships established in the cache queue 122. When an idle link is found, the monitoring module 123 determines in real time whether the two nodes connected by the idle link are the same as the two nodes of any data link relationship in the cache queue 122. If they are the same, that is, if the two chiplets 21 connected by the idle link are the same as the two chiplets 21 of an established data link relationship, the crossbar switch matrix 121 is deemed to contain an idle link matching a data link relationship in the cache queue 122. The monitoring module 123 then controls the crossbar switch matrix 121, according to the matching data link relationship, to establish a data transmission link between the two chiplets 21, and the sending end transmits data to the receiving end. Specifically, the sending chiplet 21 must first prepare the data to be sent; once the data is ready, it is transmitted over the configured data transmission link. Through the cooperation of the storage module, the address decoding module and the monitoring module 123, a match between an established data link relationship and an idle link is detected promptly, so the data transmission link is configured quickly. It should be noted that the above shows only one way of controlling interface units 11 from different interface groups to establish a link so that the two corresponding chiplets 21 can transfer data. Other ways of connecting or disconnecting interface units 11 of different interface groups so that the two corresponding chiplets 21 can transfer data may also be used.

The storage module may be a static random access memory (SRAM), which increases the read and write speed of the interconnection module when it configures a data transmission link, thereby improving link configuration efficiency and reducing latency. It should be understood that the storage module is not limited to the SRAM described above; other storage media may also be used as the storage module.

The main data communication scenarios are described below with reference to FIG. 2, FIG. 4, and FIG. 5. In practice, the main data communication scenarios are broadcast query and directed data query. The implementation of broadcast query is described first.

Assume that chiplet Die0 in the left-hand chip module 20 in FIG. 2 initiates a broadcast query because it needs the latest data in the system. Die0 broadcasts the query to the other chiplets Die1, Die2, and Die3 in the same chip module 20 over the internal interconnect of the chip module 20, in the conventional manner of the prior art. Die0 broadcasts the query to the other chiplets Die4, Die5, Die6, and Die7, which are in a different chip module 20, through the interconnection device 10 provided by this application. To do so, Die0 first sends the node address information of Die4, Die5, Die6, and Die7 to the interconnection device 10 through its corresponding address and control interface 111, and at the same time prepares the query information, for example the address of the required data. After receiving the node address information sent by chiplet 21 Die0, the interconnection device 10 stores it in the storage module. The address decoding module decodes it and establishes the data link relationships Die0-Die4, Die0-Die5, Die0-Die6, and Die0-Die7, which are then stored in the cache queue 122. Meanwhile, the monitoring module 123 queries the link status in the crossbar switch matrix 121. If an idle link exists, for example the link between Die0 and Die6, the Die0-Die6 data transmission link is configured, connecting the data interface 112 corresponding to Die0 with the data interface 112 corresponding to Die6. Once Die0 has prepared the query information, it sends the query to Die6 over the Die0-Die6 data transmission link. If Die6 holds the latest data being queried, it returns that data to Die0 over the established Die0-Die6 data transmission link. If Die6 does not hold the latest data, this query is marked as complete. The queries to Die4, Die5, and Die7 are then completed one after another in the same way.
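The broadcast flow can be sketched in Python as follows. This is a simplified model under assumptions that go beyond the text: every link is treated as idle at the moment its relation is dequeued, remote chiplet memory is a plain dictionary, and the names `decode_relations` and `broadcast_query` are hypothetical.

```python
from collections import deque
from typing import Deque, Dict, List, Optional, Tuple

Relation = Tuple[str, str]  # (sending chiplet, receiving chiplet); assumed encoding


def decode_relations(sender: str, node_addresses: List[str]) -> Deque[Relation]:
    """Address decoding: one sender-receiver relation per target node address."""
    return deque((sender, receiver) for receiver in node_addresses)


def broadcast_query(sender: str, node_addresses: List[str], wanted_address: int,
                    remote_memory: Dict[str, Dict[int, str]]) -> Dict[str, Optional[str]]:
    """Query each target in turn; None marks a query completed without newer data."""
    cache_queue = decode_relations(sender, node_addresses)
    results: Dict[str, Optional[str]] = {}
    while cache_queue:
        # Assumption: the link for this relation is idle, so it is configured at once.
        _, receiver = cache_queue.popleft()
        results[receiver] = remote_memory.get(receiver, {}).get(wanted_address)
    return results


# Only Die6 holds the latest data at the queried address in this example.
remote_memory = {"Die6": {0x1000: "latest value"}}
answers = broadcast_query("Die0", ["Die4", "Die5", "Die6", "Die7"], 0x1000, remote_memory)
print(answers)
```

In the scheme described above, the order in which targets are served depends on which links the monitoring module 123 finds idle; the sketch serves them in queue order only to stay short.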

Another common communication scenario, directed data query, is described next. Assume that Die0 queries Die7 for data and that Die7 holds the latest data for the query. Die0 first sends the node address information of Die7 to the interconnection device 10 through its corresponding address and control interface 111, and at the same time prepares the address information that locates the queried data on Die7. After the interconnection device 10 has configured the corresponding data transmission link, Die0 sends the prepared address information to Die7 through the interconnection device 10. Referring to FIG. 5, the basic transmission path at this point is: core Core00 of Die0 -> internal bus IB0 of Die0 -> IB0 cross-slot access interface of Die0 -> data interface 112 corresponding to Die0 on the interconnection device 10 -> data interface 112 corresponding to Die7 on the interconnection device 10 -> IB7 cross-slot access interface of Die7 -> internal bus IB7 of Die7 -> memory or cache of Die7. After Die7 receives the address information, it reads the queried data at that address and transmits it to Die0 over the configured Die0-Die7 data transmission link, completing the directed query. It should be noted that Core00 and Core0n in FIG. 5 denote different cores of chiplet Die0, and Core70 and Core7n denote different cores of chiplet Die7.
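The directed query can be traced with a short Python sketch that lists the hops in the order given above and then performs the read at the receiving chiplet. It is an illustration only: the hop labels are plain strings, chiplet memory is modeled as a dictionary, and `request_path` and `directed_query` are hypothetical names.

```python
from typing import Dict, List, Tuple


def request_path(src: int, dst: int, core: int = 0) -> List[str]:
    """Hops traversed by the address request, in the order given in the text."""
    return [
        f"Die{src} core Core{src}{core}",
        f"Die{src} internal bus IB{src}",
        f"Die{src} IB{src} cross-slot access interface",
        f"interconnection device data interface for Die{src}",
        f"interconnection device data interface for Die{dst}",
        f"Die{dst} IB{dst} cross-slot access interface",
        f"Die{dst} internal bus IB{dst}",
        f"Die{dst} memory or cache",
    ]


def directed_query(src: int, dst: int, address: int,
                   memory: Dict[int, Dict[int, str]]) -> Tuple[List[str], str]:
    """Send the address along the path, then return the data read at the receiver."""
    path = request_path(src, dst)
    data = memory[dst][address]  # the receiver reads the queried data at that address
    return path, data


path, data = directed_query(0, 7, 0x2000, {7: {0x2000: "payload"}})
print(" -> ".join(path))
print(data)
```

The response travels back over the same configured Die0-Die7 link, so the return path is simply the hop list in reverse.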

In addition, the interconnection device is not limited to a circuit module built from electrical circuits; an optical module may also be used as the interconnection device.

The interconnection device 10 connects the chiplets 21 in the chip modules 20 inserted in different slots, and the control module 12 of the interconnection device 10 establishes links between chiplets 21 in different chip modules 20, enabling data transmission between the two chiplets 21. As a result, board-level interconnection only requires bus connections within the circuit board between each chiplet 21 and the interface unit 11 of the interconnection device; there is no need to route board-level traces between each chiplet 21 and every other chiplet 21. This reduces the number of pins brought out from each slot and simplifies routing within the circuit board, which lowers the processing cost of the circuit board and the manufacturing cost of the motherboard. Because the control module 12 controls different interface units 11 within the interconnection device 10 to establish links for data transmission, the transmission is less constrained by the various protocols used inside the chip modules 20. The interconnection scheme is therefore largely protocol-insensitive and is easy to extend to interconnecting different types of chip modules 20. Furthermore, when two chiplets 21 from different chip modules 20 transmit data through the interconnection device 10, they can transmit at full bandwidth. Compared with the prior-art approach of using a remote chiplet as a bridge for interconnection, the solution of this application can substantially reduce latency, simplify programming, and preserve multi-way (multi-socket) performance.

Furthermore, an embodiment of the present invention provides a server that includes any one of the motherboards described above. The interconnection device connects the chiplets in the chip modules inserted in different slots, and the control module of the interconnection device establishes links between chiplets in different chip modules, enabling data transmission between the two chiplets. As a result, board-level interconnection only requires bus connections within the circuit board between each chiplet and the interface unit of the interconnection device; there is no need to route board-level traces between each chiplet and every other chiplet. This reduces the number of pins brought out from each slot and simplifies routing within the circuit board, which lowers the processing cost of the circuit board and the manufacturing cost of the motherboard. Because the control module controls different interface units within the interconnection device to establish links for data transmission, the transmission is less constrained by the various protocols used inside the chip modules. The interconnection scheme is therefore largely protocol-insensitive and is easy to extend to interconnecting different types of chip modules. When two chiplets from different chip modules transmit data through the interconnection device, they can transmit at full bandwidth. Compared with the prior-art approach of using a remote chiplet as a bridge for interconnection, the solution of this application can substantially reduce latency, simplify programming, and preserve multi-way (multi-socket) performance. This in turn reduces both the cost and the latency of the server.

The above describes only specific embodiments of the present invention, but the scope of protection of the present invention is not limited thereto. Any change or substitution that a person skilled in the art could readily conceive within the technical scope disclosed by the present invention shall fall within the scope of protection of the present invention. Therefore, the scope of protection of the present invention shall be determined by the scope of protection of the claims.

Claims (10)

1. An interconnection device, comprising:
at least two groups of interfaces in one-to-one correspondence with at least two slots, wherein a chip module is inserted in each slot; each chip module contains at least one chiplet, and at least one chip module is a multi-chip module comprising at least two chiplets; each group of interfaces comprises at least one interface unit; the at least one interface unit of each group of interfaces is in one-to-one correspondence with the at least one chiplet in the chip module corresponding to that group of interfaces; and each interface unit is connected to its corresponding chiplet; and
a control module configured to establish a link between any two interface units from different groups of interfaces, so that the two chiplets corresponding to the two interface units can transmit data.
2. The interconnection device of claim 1, wherein each interface unit comprises an address and control interface and a data interface; the address and control interface is connected to the address and control interface of the chiplet corresponding to the interface unit, and the data interface is connected to the data interface of the chiplet corresponding to the interface unit.
3. The interconnection device of claim 2, wherein when any two chiplets from different chip modules transmit data, one of the two chiplets is a sending end and the other is a receiving end;
the control module comprises:
a storage module configured to store node address information of the receiving end, which the sending end transmits through the address and control interface in the interface unit corresponding to the sending end;
a crossbar switch matrix connected to the data interface in each interface unit;
an address decoding module configured to establish, according to the node address information, a data link relationship between the data interface in the interface unit corresponding to the sending end and the data interface in the interface unit corresponding to the receiving end, and to store the data link relationship in a cache queue of the storage module; and
a monitoring module configured to monitor whether the crossbar switch matrix contains an idle link matching a data link relationship in the cache queue, and further configured, when a matching idle link exists, to control the crossbar switch matrix according to the matched data link relationship to establish a data transmission link between the data interface in the interface unit corresponding to the sending end and the data interface in the interface unit corresponding to the receiving end.
4. The interconnection device of claim 3, wherein the storage module is a static random access memory.
5. A motherboard, comprising:
a circuit board;
at least two slots arranged on the circuit board;
a chip module inserted in each slot, wherein each chip module contains at least one chiplet, and at least one chip module is a multi-chip module comprising at least two chiplets; and
an interconnection device disposed on the motherboard, comprising:
at least two groups of interfaces in one-to-one correspondence with the at least two slots, wherein each group of interfaces comprises at least one interface unit; the at least one interface unit of each group of interfaces is in one-to-one correspondence with the at least one chiplet in the chip module corresponding to that group of interfaces; and each interface unit is connected to its corresponding chiplet; and
a control module configured to establish a link between any two interface units from different groups of interfaces, so that the two chiplets corresponding to the two interface units can transmit data.
6. The motherboard of claim 5, wherein each chiplet is provided with an address and control interface and a data interface, and each interface unit comprises an address and control interface and a data interface;
the address and control interface of the interface unit is connected to the address and control interface of the chiplet corresponding to the interface unit, and the data interface of the interface unit is connected to the data interface of the chiplet corresponding to the interface unit.
7. The motherboard of claim 6, wherein when any two chiplets from different chip modules transmit data, one of the two chiplets is a sending end and the other is a receiving end;
the control module comprises:
a storage module configured to store node address information of the receiving end, which the sending end transmits through the address and control interface in the interface unit corresponding to the sending end;
a crossbar switch matrix connected to the data interface in each interface unit;
an address decoding module configured to establish, according to the node address information, a data link relationship between the data interface in the interface unit corresponding to the sending end and the data interface in the interface unit corresponding to the receiving end, and to store the data link relationship in a cache queue of the storage module; and
a monitoring module configured to monitor whether the crossbar switch matrix contains an idle link matching a data link relationship in the cache queue, and further configured, when a matching idle link exists, to control the crossbar switch matrix according to the matched data link relationship to establish a data transmission link between the data interface in the interface unit corresponding to the sending end and the data interface in the interface unit corresponding to the receiving end.
8. The motherboard of claim 6, wherein a data bus is provided in each chiplet, and the data bus is connected to both the address and control interface and the data interface on the chiplet.
9. The motherboard of claim 5, wherein the chip module is a central processing unit and/or a dedicated high-performance chip.
10. A server comprising the motherboard of any one of claims 5 to 9.
Application number: CN202110628626.7A
Title: A kind of interconnection device, motherboard and server
Priority date: 2021-06-04
Filing date: 2021-06-04
Publications: CN113312304A (2021-08-27), CN113312304B (2023-04-21)
Family ID: 77377405
Country: CN
Status: Active