
CN119149485A - Highly programmable processor array network - Google Patents

Highly programmable processor array network Download PDF

Info

Publication number
CN119149485A
CN119149485A
Authority
CN
China
Prior art keywords
processor
random access
array
dynamic random
access memory
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202411613152.9A
Other languages
Chinese (zh)
Inventor
段帅君
刘坤
李�瑞
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xinfangzhou Shanghai Integrated Circuit Co ltd
Original Assignee
Xinfangzhou Shanghai Integrated Circuit Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xinfangzhou Shanghai Integrated Circuit Co ltd filed Critical Xinfangzhou Shanghai Integrated Circuit Co ltd
Priority to CN202411613152.9A priority Critical patent/CN119149485A/en
Publication of CN119149485A publication Critical patent/CN119149485A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F15/00Digital computers in general; Data processing equipment in general
    • G06F15/76Architectures of general purpose stored program computers
    • G06F15/80Architectures of general purpose stored program computers comprising an array of processing units with common control, e.g. single instruction multiple data processors
    • G06F15/8038Associative processors

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Hardware Design (AREA)
  • Theoretical Computer Science (AREA)
  • Computing Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Multi Processors (AREA)

Abstract

The present invention belongs to the field of computer technology and specifically relates to a highly programmable processor array network, comprising: a processor array having a plurality of processor units distributed in an array, each processor unit having a processor node and a register group, with adjacent processor nodes exchanging data through the register groups; and a dynamic random access memory layer having one or more layers of dynamic random access memory, stacked above the processor array, with each dynamic random access memory connected to each processor node. Because the invention is designed in the form of a processor array, the size of the array can be customized at will during packaging and cut according to demand, and a single-wafer large chip can also be realized.

Description

Highly programmable processor array network
Technical Field
The invention belongs to the technical field of computers, and particularly relates to a highly programmable processor array network.
Background
Traditional Field Programmable Gate Arrays (FPGAs) offer extremely high design flexibility through programmable logic blocks and interconnect resources, and are widely used in the field of customized computing and acceleration. However, because their basic building blocks are logic gates and look-up tables, FPGAs are relatively complex to program and relatively energy-inefficient when performing highly complex computational tasks.
Disclosure of Invention
To address the technical problems of high programming complexity and relatively low energy efficiency when traditional FPGAs implement highly complex computing tasks, the invention aims to provide a highly programmable processor array network.
To solve the foregoing technical problem, the present invention provides a highly programmable processor array network, including:
The processor array is provided with a plurality of processor units distributed in an array, the processor units are provided with processor nodes and register groups, and data interaction is carried out between adjacent processor nodes through the register groups;
The dynamic random access memory layer is provided with one or a plurality of layers of dynamic random access memories, the dynamic random access memory layer is stacked on the upper layer of the processor array, and each dynamic random access memory is respectively connected with each processor node.
Optionally, in a highly programmable processor array network as described above, the processor array has 2^n of the processor units distributed in an array, where n is a natural number;
when n is even, the processor array has 2^(n/2) rows and 2^(n/2) columns;
when n is odd, the processor array has 2^((n-1)/2) rows and 2^((n+1)/2) columns.
Optionally, in the highly programmable processor array network as described above, in the single processor unit, the register group is located around the processor node, and the register group performs data interaction with other adjacent processor units in four directions respectively.
Optionally, in a highly programmable processor array network as described above, in a single said processor unit, said processor node and said register group are connected by a metal layer copper interconnect;
in the adjacent processor units, the two adjacent register groups are connected through metal layer copper interconnection.
Optionally, in the highly programmable processor array network as described above, in a single processor unit, storage controller nodes are disposed around the processor nodes, and the processor nodes respectively perform data interaction with the register groups in four directions through the storage controller nodes in four directions.
Optionally, in the highly programmable processor array network as described above, in a single processor unit, the processor node and the storage controller node are connected by a metal layer copper interconnect, and the storage controller node and the register group are connected by a metal layer copper interconnect;
in the adjacent processor units, the two adjacent register groups are connected through metal layer copper interconnection.
Optionally, in a highly programmable processor array network as described previously, the processor nodes comprise a central processing unit (Central Processing Unit, CPU), a graphics processing unit (Graphics Processing Unit, GPU), a tensor processing unit (Tensor Processing Unit, TPU), a data processing unit (Data Processing Unit, DPU), an image processing unit (Image Processing Unit, IPU), or a neural-network processing unit (Neural-network Processing Unit, NPU).
Optionally, in a highly programmable processor array network as described above, the projection of the processor array coincides completely with the region of the dynamic random access memory layer.
Optionally, in a highly programmable processor array network as described above, the processor node is connected to the dynamic random access memory layer through the register group.
Optionally, in the highly programmable processor array network as described above, the register group and the dynamic random access memory layer are connected by a metal layer copper interconnect.
Optionally, in the highly programmable processor array network as described above, the dynamic random access memory layer has a separate memory space belonging to each of the processor nodes.
Optionally, in a highly programmable processor array network as described above, each of the processor nodes shares all address space within the dynamic random access memory layer.
The positive effects of the invention are as follows:
1. The invention is designed in the form of a processor array; the size of the processor array can be customized at will during packaging and cut to the required dimensions, and a single-wafer large chip can also be realized.
2. All processor nodes in the invention are equal and decentralized: each processor node in the network has host functionality, with no master-slave division. Each processor node has its own memory management, control logic, and external communication capabilities. Any node can initiate computing tasks, manage data transfers, and control the collaboration of other nodes, without relying on a single central controller.
3. The invention provides a higher level of abstraction by using the processor node as a basic unit, reduces programming complexity, and facilitates development of complex algorithms and applications. The fine control and dynamic resource management of the processor nodes can effectively reduce idle power consumption and improve the energy efficiency ratio when processing specific tasks.
Drawings
The present disclosure will become more apparent with reference to the accompanying drawings. It is to be understood that these drawings are solely for purposes of illustration and are not intended as a definition of the limits of the invention. In the figure:
FIG. 1 is a schematic diagram of a partial connection of a processor array according to the present invention;
FIG. 2 is a schematic diagram of another partial connection of a processor array according to the present invention;
Fig. 3 is a schematic view of a partial connection according to the present invention.
Detailed Description
Other advantages and effects of the present invention will become readily apparent to those skilled in the art from the disclosure in this specification, which is illustrated by way of specific examples. The invention may also be practiced or applied through other, different embodiments, and the details in this specification may be modified or varied in various ways without departing from the spirit and scope of the present invention.
It should be noted that the following embodiments and features in the embodiments may be combined with each other without conflict.
In the description of the present invention, it should be noted that orientation terms such as "outer," "middle," "inner," and the like indicate orientations or positional relationships based on those shown in the drawings. They are used only for convenience in describing the invention and simplifying the description; they do not indicate or imply that the referenced device or element must have a specific orientation or be constructed and operated in a specific orientation, and should not be construed as limiting the scope of protection of the invention.
Furthermore, the terms "first," "second," and the like are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features. Thus, a feature defined as "first" or "second" may explicitly or implicitly include one or more such features. In the description of the present invention, "a plurality" means two or more, unless otherwise specifically defined.
Referring to figs. 1-3, embodiments of the present invention provide a highly programmable processor array network (CPPA, Configurable Processor Array Network) that replaces the logic-gate-array basic cells of an FPGA with small processor nodes (e.g., CPU cores or GPU cores) to form a highly programmable processor array network, achieving a higher level of computational abstraction and improved energy efficiency.
The highly programmable processor array network of the present invention comprises a processor array 1 and a dynamic random access memory layer 2.
The processor array 1 has a plurality of processor units 11 distributed in an array; each processor unit 11 has a processor node 111 and a register group 112, and adjacent processor nodes 111 exchange data through the register groups 112.
The dynamic random access memory layer 2 has one or several layers of dynamic random access memories 21 (Dynamic Random Access Memory, DRAM), the dynamic random access memory layer 2 is stacked on the upper layer of the processor array 1, and each dynamic random access memory 21 is connected to each processor node 111.
The invention is designed in the form of a processor array; the size of the processor array can be customized at will during packaging and cut to the required dimensions, and a single-wafer large chip can also be realized.
The CPPA is constructed from a many-core processor-node core layer and a plurality of DRAM layers through three-dimensional stacking technology. All processor nodes realize efficient data exchange and computing-resource optimization by sharing the DRAM space stacked above them, making the design particularly suitable for data-intensive tasks such as AI and big-data processing.
Each processor node is designed as an autonomous unit and has own memory management, control logic and external communication capability. Each processor node is equipped with complete computing resources and control logic, capable of executing program code independently, managing local memory, and interacting with peripherals. This means that any processor node can initiate computing tasks, manage data transfer, and control the collaboration of other nodes, without relying on a single central controller.
Task scheduling and allocation in the invention are no longer controlled by a single host; instead, a distributed task scheduling algorithm running on the network balances task allocation based on the current load of the processor nodes, resource availability, and task characteristics. Processor nodes communicate directly with each other through standardized protocols, without relaying through a central controller. Each processor node can dynamically decide whether to accept new tasks, delegate tasks to other nodes, or request resources according to its current load, resource state, and task queue, realizing a true peer-to-peer computing environment.
Specifically, memory resources can be dynamically allocated through a resource scheduling system with a distributed task scheduling algorithm, data distribution is optimized according to task requirements and running conditions, data copying and migration are reduced, and data access efficiency is improved.
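The distributed scheduling described above, in which each node decides from its own load and its peers' state whether to accept or delegate a task, can be sketched in Python. This is a minimal illustrative model, not the patent's implementation; the node capacity, the `offer` method, and the least-loaded delegation policy are all assumptions made for the example.

```python
from dataclasses import dataclass, field

@dataclass
class ProcessorNode:
    """One autonomous node: tracks its own load and decides locally."""
    node_id: int
    capacity: int = 4                      # max concurrent tasks (assumed)
    tasks: list = field(default_factory=list)

    @property
    def load(self) -> float:
        return len(self.tasks) / self.capacity

    def offer(self, task, peers):
        """Accept the task if not saturated, else delegate to the least-loaded peer."""
        if self.load < 1.0:
            self.tasks.append(task)
            return self.node_id
        target = min(peers, key=lambda p: p.load)
        target.tasks.append(task)
        return target.node_id

nodes = [ProcessorNode(i) for i in range(4)]
# Saturate node 0 with four tasks, then offer it one more: it delegates.
for t in range(4):
    nodes[0].offer(f"t{t}", nodes[1:])
chosen = nodes[0].offer("t4", nodes[1:])
```

Running the example saturates node 0, after which the fifth offer is passed to a peer: scheduling happens entirely between nodes, with no central controller involved.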
In some embodiments, the processor array 1 has 2^n processor units 11 distributed in an array, where n is a natural number. When n is even, processor array 1 has 2^(n/2) rows and 2^(n/2) columns. When n is odd, processor array 1 has 2^((n-1)/2) rows and 2^((n+1)/2) columns.
This design makes it convenient to cut the processor array 1 to the required network size as needed.
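The row/column rule above is simple to state in code. The sketch below only illustrates the arithmetic; the function name is invented for the example.

```python
def array_shape(n: int) -> tuple[int, int]:
    """Rows and columns for a processor array of 2**n units:
    a square 2**(n/2) x 2**(n/2) grid when n is even, otherwise
    2**((n-1)/2) rows by 2**((n+1)/2) columns."""
    if n % 2 == 0:
        rows = cols = 2 ** (n // 2)
    else:
        rows = 2 ** ((n - 1) // 2)
        cols = 2 ** ((n + 1) // 2)
    assert rows * cols == 2 ** n   # every unit is accounted for
    return rows, cols
```

For example, n = 4 gives a 4 x 4 array and n = 5 gives a 4 x 8 array; in every case the product of rows and columns is exactly 2^n, so the array can be cut to any power-of-two size.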
In some embodiments, referring to fig. 1, in a single processor unit 11, a register group 112 is located around a processor node 111, and the register group 112 performs data interaction with other adjacent processor units 11 in four directions, respectively.
In some embodiments, in a single processor unit 11, the processor node 111 and the register group 112 are connected by a metal layer copper interconnect.
In the adjacent processor unit 11, the adjacent two register groups 112 are connected by metal layer copper interconnection.
The metal layer copper interconnect ensures low delay of data transmission.
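The neighbour-to-neighbour exchange described above, with a register group shared by each pair of adjacent units, can be modelled as one shared buffer per mesh edge. The sketch below is a software analogy for illustration only; real register groups are hardware structures, and the dict-based buffer and function names are inventions of the example.

```python
def make_edges(rows, cols):
    """One shared register-group buffer per adjacent pair of units,
    keyed by the (unordered) pair of grid coordinates."""
    edges = {}
    for r in range(rows):
        for c in range(cols):
            for dr, dc in ((0, 1), (1, 0)):   # east and south neighbours
                nr, nc = r + dr, c + dc
                if nr < rows and nc < cols:
                    edges[frozenset([(r, c), (nr, nc)])] = {}
    return edges

def send(edges, src, dst, key, value):
    edges[frozenset([src, dst])][key] = value     # producer side writes

def recv(edges, src, dst, key):
    return edges[frozenset([src, dst])].pop(key)  # consumer side reads

edges = make_edges(2, 2)
send(edges, (0, 0), (0, 1), "x", 42)   # unit (0,0) writes to its east register group
val = recv(edges, (0, 1), (0, 0), "x")  # unit (0,1) reads from its west register group
```

A 2 x 2 array has four adjacent pairs and therefore four edge buffers; either endpoint of an edge can act as producer or consumer, matching the four-direction exchange in the text.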
In some embodiments, referring to fig. 2, in a single processor unit 11, memory Controller (MC) nodes 113 are disposed around the processor node 111, and the processor node 111 performs data interaction with four-directional register groups 112 through the four-directional Memory controller nodes 113, respectively.
In some embodiments, referring to FIG. 3, in a single processor unit 11, processor node 111 is connected to memory controller node 113 through a metal layer copper interconnect, and memory controller node 113 is connected to register group 112 through a metal layer copper interconnect.
In the adjacent processor unit 11, the adjacent two register groups 112 are connected by metal layer copper interconnection.
In some embodiments, the processor node 111 includes a central processing unit (CPU), a graphics processing unit (GPU), a tensor processing unit (TPU), a data processing unit (DPU), an image processing unit (IPU), or a neural-network processing unit (NPU).
Therefore, the CPPA mainly comprising a many-core CPU/GPU/TPU/DPU/IPU/NPU core layer and a plurality of DRAM layers can be constructed by adopting the three-dimensional stacking technology.
In some embodiments, the projection of the processor array 1 coincides completely with the area of the upper dynamic random access memory layer 2.
In some embodiments, referring to FIG. 3, processor node 111 is coupled to DRAM layer 2 through register bank 112.
In some embodiments, the register file 112 is connected to the DRAM layer 2 via a metal layer copper interconnect.
Specifically, when the dynamic random access memory layer 2 comprises multiple layers of dynamic random access memory 21, the register group 112 is connected to the dynamic random access memory 21 of each layer through metal layer copper interconnects.
In some embodiments, to achieve decentralization between processor nodes, the dynamic random access memory layer 2 contains an independent memory space belonging to each processor node 111. Each independent storage space is controlled and allocated by the local memory management circuit of its processor node.
In some embodiments, each processor node 111 shares all address space within dynamic random access memory layer 2 in order to achieve high-speed communication of data and strong correlation of data between the processor nodes. I.e. all processor nodes 111 have access to all addresses within the dynamic random access memory layer 2.
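The two memory organizations described above, a private region per node plus a globally shared address space, can be combined in one sketch. This Python model is illustrative only; the class name, region sizes, and word-array representation are assumptions of the example, not the patent's design.

```python
class DramLayer:
    """Stacked DRAM model: a private region per node plus a shared region."""
    def __init__(self, num_nodes, private_words, shared_words):
        # One independent word array per node, managed locally by that node.
        self.private = [[0] * private_words for _ in range(num_nodes)]
        # One word array visible at the same addresses to every node.
        self.shared = [0] * shared_words

    def write_private(self, node, addr, value):
        self.private[node][addr] = value   # only this node's manager touches it

    def write_shared(self, node, addr, value):
        self.shared[addr] = value          # any node may write

    def read_shared(self, node, addr):
        return self.shared[addr]           # visible to every node

dram = DramLayer(num_nodes=4, private_words=16, shared_words=64)
dram.write_shared(node=0, addr=5, value=99)
seen = [dram.read_shared(node=i, addr=5) for i in range(4)]
```

After node 0 writes to the shared region, every node reads the same value at the same address, while the private regions remain untouched: high-speed data sharing and per-node independence coexist in one layer.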
In some embodiments, in the CPPA of the present invention, a high-efficiency NoC (network on chip) may be integrated to optimize the transfer of data between the processor node and the DRAM, supporting high-bandwidth, low-latency communications. The NoC adopts a dynamic routing algorithm to intelligently adjust the data flow direction according to the network condition and the task priority, so that the data processing parallelism is improved. The NoC architecture adopts a multi-dimensional ring or two-dimensional grid topology, so that data can be transmitted in multiple directions, and transmission bottlenecks are reduced. The dynamic routing algorithm is based on a distributed self-adaptive mechanism, collects network traffic information and task priority indexes, and dynamically plans an optimal path for data transmission by combining a first-in first-out (FIFO) queue with a priority queue management strategy. In addition, the flow control mechanism embedded in the NoC can prevent data packet collision and congestion, and further improves the fluency and efficiency of data transmission.
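A toy version of the NoC ideas in this paragraph can be sketched in Python: deterministic X-then-Y routing on a 2D mesh as a baseline path planner, and an output port that drains packets in FIFO order within each priority level. Both the routing function and the port class are illustrative assumptions, not the patent's dynamic adaptive algorithm.

```python
import heapq

def route_xy(src, dst):
    """Baseline path on a 2D mesh: move along the row (X) first, then the column (Y)."""
    (r, c), (tr, tc) = src, dst
    path = [(r, c)]
    while c != tc:
        c += 1 if tc > c else -1
        path.append((r, c))
    while r != tr:
        r += 1 if tr > r else -1
        path.append((r, c))
    return path

class RouterPort:
    """Outgoing port: higher priority drains first, FIFO within a priority level."""
    def __init__(self):
        self.queue = []
        self.seq = 0   # monotone counter preserves FIFO order among equal priorities
    def push(self, packet, priority=0):
        heapq.heappush(self.queue, (-priority, self.seq, packet))
        self.seq += 1
    def pop(self):
        return heapq.heappop(self.queue)[2]

port = RouterPort()
port.push("bulk-a"); port.push("urgent", priority=1); port.push("bulk-b")
order = [port.pop() for _ in range(3)]
hops = route_xy((0, 0), (1, 2))
```

The urgent packet overtakes the bulk traffic while the two bulk packets keep their arrival order, matching the combined FIFO-plus-priority queue management strategy; an adaptive router would replace `route_xy` with a path chosen from live traffic information.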
In some embodiments, in the CPPA of the present invention, the microarchitecture of the processor nodes may be fine-tuned at the Hardware Description Language (HDL) level, including instruction sets, cache sizes, and interconnect topologies.
The present invention has been described in detail above with reference to the embodiments shown in the drawings, and those skilled in the art can make various modifications to the invention based on the above description. Accordingly, certain details of the embodiments are not to be construed as limiting the invention, which is defined by the appended claims.

Claims (12)

1. A highly programmable processor array network, characterized in that the highly programmable processor array network comprises:
a processor array, the processor array having a plurality of processor units distributed in an array, each processor unit having a processor node and a register group, adjacent processor nodes exchanging data through the register groups;
a dynamic random access memory layer, the dynamic random access memory layer having one or more layers of dynamic random access memory, the dynamic random access memory layer being stacked above the processor array, each dynamic random access memory being connected to each processor node respectively.
2. The highly programmable processor array network according to claim 1, wherein the processor array has 2^n processor units distributed in an array, where n is a natural number;
when n is even, the processor array has 2^(n/2) rows and 2^(n/2) columns;
when n is odd, the processor array has 2^((n-1)/2) rows and 2^((n+1)/2) columns.
3. The highly programmable processor array network according to claim 1, wherein, in a single processor unit, the register group is located around the processor node, and the register group exchanges data with other adjacent processor units in four directions respectively.
4. The highly programmable processor array network according to claim 3, wherein, in a single processor unit, the processor node and the register group are connected via a metal layer copper interconnect;
in adjacent processor units, two adjacent register groups are connected via a metal layer copper interconnect.
5. The highly programmable processor array network according to claim 1, wherein, in a single processor unit, storage controller nodes are arranged around the processor node, and the processor node exchanges data with the register groups in four directions through the storage controller nodes in those four directions respectively.
6. The highly programmable processor array network according to claim 5, wherein, in a single processor unit, the processor node and the storage controller node are connected via a metal layer copper interconnect, and the storage controller node and the register group are connected via a metal layer copper interconnect;
in adjacent processor units, two adjacent register groups are connected via a metal layer copper interconnect.
7. The highly programmable processor array network according to claim 1, wherein the processor nodes comprise a central processing unit, a graphics processing unit, a tensor processing unit, a data processing unit, an image processing unit, or a neural-network processing unit.
8. The highly programmable processor array network according to any one of claims 1 to 7, wherein the projection of the processor array coincides completely with the region of the dynamic random access memory layer.
9. The highly programmable processor array network according to claim 8, wherein the processor node is connected to the dynamic random access memory layer through the register group.
10. The highly programmable processor array network according to claim 9, wherein the register group and the dynamic random access memory layer are connected via a metal layer copper interconnect.
11. The highly programmable processor array network according to claim 8, wherein the dynamic random access memory layer has an independent storage space belonging to each of the processor nodes.
12. The highly programmable processor array network according to claim 8, wherein each of the processor nodes shares all address space within the dynamic random access memory layer.
CN202411613152.9A 2024-11-13 2024-11-13 Highly programmable processor array network Pending CN119149485A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202411613152.9A CN119149485A (en) 2024-11-13 2024-11-13 Highly programmable processor array network


Publications (1)

Publication Number Publication Date
CN119149485A true CN119149485A (en) 2024-12-17

Family

ID=93803658

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202411613152.9A Pending CN119149485A (en) 2024-11-13 2024-11-13 Highly programmable processor array network

Country Status (1)

Country Link
CN (1) CN119149485A (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1645352A (en) * 2004-01-21 2005-07-27 汤姆森许可贸易公司 Method for managing data in an array processor and array processor carrying out this method
CN108090022A (en) * 2016-11-22 2018-05-29 英特尔公司 Programmable integrated circuit with stacked memory dies for storing configuration data
US20190379380A1 (en) * 2019-08-20 2019-12-12 Intel Corporation Stacked programmable integrated circuitry with smart memory
CN118626434A (en) * 2024-04-25 2024-09-10 北京清微智能科技有限公司 A three-dimensional storage reconfigurable computing array and packaging method

Similar Documents

Publication Publication Date Title
CN113312299B (en) Safety communication system between cores of multi-core heterogeneous domain controller
US20190258601A1 (en) Memory Processing Core Architecture
CN102141975B (en) computer system
US20250097282A1 (en) Three-class vertex degree aware-based 1.5-dimensional graph division method and application
CN117493237B (en) Computing device, server, data processing method, and storage medium
CN104166597B (en) A kind of method and device for distributing long-distance inner
US11645225B2 (en) Partitionable networked computer
WO2024159988A1 (en) Data-flow-driven reconfigurable processor chip and reconfigurable processor cluster
CN107959643A (en) A kind of exchange system and its routing algorithm built by exchange chip
US11461234B2 (en) Coherent node controller
CN116578523B (en) Network-on-chip system and control method thereof
CN104125293A (en) Cloud server and application method thereof
CN215601334U (en) 3D-IC baseband chip and stacked chip
Skeie et al. Flexible DOR routing for virtualization of multicore chips
CN118519753B (en) A computing resource aggregation method and system based on pooled memory
Kao et al. Design of high-radix clos network-on-chip
CN118445089A (en) Three-dimensional stacked storage and computing integrated SRAM and CPU integrated storage and computing architecture and implementation method
CN119149485A (en) Highly programmable processor array network
CN116610630B (en) Multi-core system and data transmission method based on network-on-chip
CN118939425A (en) An intelligent computing cluster based on multi-core CPU-NPU collaboration
CN114116167B (en) High-performance computing-oriented regional autonomous heterogeneous many-core processor
Mamidala et al. Optimizing mpi collectives using efficient intra-node communication techniques over the blue gene/p supercomputer
CN107341057A (en) A data processing method and device
Dhanraj Enhancement of LiMIC-Based Collectives for Multi-core Clusters
CN119211163B (en) A PCIe switch chip system supporting online computing

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination