
CN120144534B - Chip design method and chip system - Google Patents

Chip design method and chip system

Info

Publication number: CN120144534B (application CN202510629574.3A)
Authority: CN (China)
Prior art keywords: dram, chip, shared, local, dram controller
Legal status: Active (the legal status is an assumption and is not a legal conclusion)
Other languages: Chinese (zh)
Other versions: CN120144534A
Inventor: 段帅君
Original and current assignee: Xinfangzhou Shanghai Integrated Circuit Co ltd
Application filed by Xinfangzhou Shanghai Integrated Circuit Co ltd
Priority application: CN202510629574.3A
Publication of application CN120144534A; application granted; publication of grant CN120144534B

Classifications

    • G06F15/7846: On-chip cache and off-chip main memory (architectures of general purpose stored program computers comprising a single central processing unit with memory on one IC chip)
    • G06F15/785: Single-CPU architectures with memory on one IC chip and decentralized control, e.g. smart memories
    • G11C11/401: Digital stores using transistors forming cells needing refreshing or charge regeneration, i.e. dynamic cells
    • G11C5/025: Geometric lay-out considerations of storage- and peripheral-blocks in a semiconductor storage device
    • Y02D10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Hardware Design (AREA)
  • Theoretical Computer Science (AREA)
  • Microelectronics & Electronic Packaging (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Dram (AREA)

Abstract

The invention belongs to the technical field of semiconductors, and particularly relates to a chip design method and a chip system. The chip design method comprises integrating a local DRAM controller, a global DRAM controller and a shared DRAM controller on each processor core, and integrating a plurality of processor cores on a logic chip. A DRAM chip is vertically stacked on the logic chip; each global DRAM controller is connected to the shared DRAM storage space, and each shared DRAM controller is connected to its corresponding private DRAM storage space. By connecting the global DRAM controller directly to the shared DRAM storage space, the invention reduces intermediate links, shortens the data path and improves data exchange speed; by connecting the shared DRAM controller directly to each processor core's private DRAM storage space, it realizes memory-sharing data interaction among the processor cores.

Description

Chip design method and chip system
Technical Field
The invention belongs to the technical field of semiconductors, and particularly relates to a chip design method and a chip system.
Background
Three-dimensional DRAM technology greatly improves storage density and data access speed by vertically stacking storage cells, but in large-scale parallel data processing scenarios, efficiently managing data flow and avoiding bus congestion become key factors constraining system performance.
Traditional data routing algorithms struggle to handle highly concurrent accesses and high bandwidth requirements, particularly in distributed bus architectures, where the flexibility and efficiency of data routing directly affect the response speed and energy-efficiency ratio of the overall memory system.
Disclosure of Invention
In view of the technical problem that delay or congestion in existing data transmission paths degrades storage response speed and energy-efficiency ratio, the invention aims to provide a chip design method and a chip system.
In order to solve the foregoing technical problem, a first aspect of the present invention provides a chip design method, including:
Integrating, on a processor core, a local DRAM controller together with a global DRAM controller and a shared DRAM controller each connected to the local DRAM controller; integrating a plurality of such processor cores on a logic chip; and interconnecting the local DRAM controllers within the logic chip;
And vertically stacking a DRAM chip having a shared DRAM storage space and a plurality of private DRAM storage spaces on the logic chip by a three-dimensional stacked-memory technology, connecting each global DRAM controller to the shared DRAM storage space, and connecting each shared DRAM controller to its corresponding private DRAM storage space.
Optionally, in the chip design method as described above, the local DRAM controllers are connected through a NoC structure, so that the local DRAM controller of each processor core is responsible for managing local memory access requests thereof, and cooperates with local controllers of other processor cores through the NoC.
Optionally, in the chip design method as described above, a logic layer is further integrated on the logic chip, and each local DRAM controller is connected to the logic layer through an independent bus to form a plurality of parallel data transmission channels, so as to allow a plurality of data transmissions to be performed simultaneously.
Optionally, in the chip design method as described above, the logic layer is a CPU or an NPU.
Optionally, in the chip design method as described above, the chip design method further includes:
a central arbiter is designed for allocating bandwidth resources of the bus.
Optionally, in the chip design method as described above, the chip design method further includes:
And integrating a memory data prefetching model in the local DRAM controller, wherein the memory data prefetching model is used for predicting future data access, and the global DRAM controller or the shared DRAM controller is utilized to load data from the shared DRAM storage space or the private DRAM storage space into a cache in advance.
Optionally, in the chip design method as described above, the global DRAM controller is connected to the shared DRAM memory space by using a metal layer copper interconnect.
Optionally, in the chip design method as described above, when the shared DRAM controller is connected to the corresponding private DRAM storage space, a metal layer copper interconnection is used to implement connection.
In order to solve the foregoing technical problem, a second aspect of the present invention provides a chip system, including:
The logic chip is integrated with a plurality of processor cores, the processor cores are integrated with a local DRAM controller, a global DRAM controller and a shared DRAM controller, the local DRAM controllers are respectively connected with the global DRAM controller and the shared DRAM controller, and the local DRAM controllers of the processor cores are connected with each other;
the DRAM chip is vertically stacked on the logic chip and is provided with a shared DRAM storage space and a plurality of private DRAM storage spaces, the shared DRAM storage spaces are respectively connected with the global DRAM controller, and each private DRAM storage space is connected with a corresponding shared DRAM controller.
Optionally, in the aforementioned chip system, the local DRAM controllers are connected through a NoC structure, so that the local DRAM controller of each processor core is responsible for managing local memory access requests thereof, and cooperates with local controllers of other processor cores through the NoC.
Optionally, in the chip system as described above, the logic chip further integrates a logic layer, and each of the local DRAM controllers is connected to the logic layer through an independent bus to form a plurality of parallel data transmission channels, so as to allow multiple data transmissions to be performed simultaneously.
Optionally, in the foregoing chip system, the logic layer is a CPU or an NPU.
Optionally, in the chip system as described above, the chip system further includes:
And the central arbiter is used for distributing bandwidth resources of the bus.
Optionally, in the chip system as described above, the local DRAM controller integrates a memory data prefetching model, where the memory data prefetching model is used to predict future data accesses, and the global DRAM controller or the shared DRAM controller is used to load data from the shared DRAM storage space or the private DRAM storage space into the cache in advance.
Optionally, in the chip system as described above, the global DRAM controller is connected to the shared DRAM memory space by using a metal layer copper interconnect.
Optionally, in the chip system as described above, the connection is implemented by using a metal layer copper interconnect when the shared DRAM controller is connected to the corresponding private DRAM storage space.
Advantageous effects of the invention include:
1. On the basis of processor cores already provided with local DRAM controllers, the invention adds two further DRAM controllers. Specifically, the global DRAM controller is directly connected to the shared DRAM storage space, which reduces intermediate links, shortens the data path and improves data exchange speed; and the shared DRAM controllers are directly connected to the private DRAM storage space of each processor core, thereby realizing memory-sharing data interaction among the processor cores.
2. The local DRAM controller of each processor core is connected with the logic layer through the independent high-speed bus to form a plurality of parallel data transmission channels, so that a plurality of data transmission can be performed simultaneously, and the data throughput is obviously improved.
3. The invention adopts a distributed memory management strategy: the local DRAM controller of each processor core is responsible for managing local memory access requests, reducing the pressure on global arbitration, and cooperates, via NoC (Network-on-Chip) technology, with the local DRAM controllers of other processor cores to ensure efficient scheduling and consistency of global memory access.
4. The invention can dynamically allocate resources according to task priority, data locality and bus idle state by the design of the central arbiter, thereby ensuring high-efficiency and orderly data access.
5. The invention predicts future data access through the memory data prefetching model integrated in the local DRAM controller, is closely cooperated with the cache subsystem, optimizes data flow, reduces unnecessary data migration, and further improves the overall performance and response speed of the system.
Drawings
The present disclosure will become more apparent with reference to the accompanying drawings. It is to be understood that these drawings are solely for purposes of illustration and are not intended as a definition of the limits of the invention. In the figure:
FIG. 1 is a schematic diagram of a partial connection relationship according to the present invention;
FIG. 2 is a diagram showing a partial connection relationship between a DRAM chip and a logic chip according to the present invention;
FIG. 3 is a schematic diagram of a portion of a DRAM chip vertically stacked on a logic chip according to the present invention;
FIG. 4 is a block diagram showing a connection between a DRAM chip and a logic chip according to the present invention.
Detailed Description
Other advantages and effects of the present invention will become readily apparent to those skilled in the art from the disclosure herein, which describes the invention by way of specific embodiments. The invention may also be practiced or carried out in other, different embodiments, and the details in this description may be modified or varied in various ways without departing from the spirit and scope of the present invention.
It should be noted that the following embodiments and features in the embodiments may be combined with each other without conflict.
In the description of the present invention, it should be noted that orientation terms such as "outer", "middle", "inner" and the like indicate orientations or positional relationships based on those shown in the drawings, are used only for convenience in describing the invention and simplifying the description, and do not indicate or imply that the referenced device or element must have a specific orientation or be constructed and operated in a specific orientation; they should therefore not be construed as limiting the protection scope of the present invention.
Furthermore, the terms "first", "second" and the like are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features. Thus, a feature defined as "first" or "second" may explicitly or implicitly include one or more such features, and in the description of the present invention, "a plurality" or "a number of" means two or more, unless otherwise specifically defined.
The embodiment of the invention provides a chip design method, which comprises the following steps:
S1, integrating a local DRAM controller, a global DRAM controller connected with the local DRAM controller and a shared DRAM controller connected with the local DRAM controller on a processor core, integrating a plurality of processor cores on a logic chip, and connecting the local DRAM controllers in the logic chip.
S2, vertically stacking the DRAM chips with the shared DRAM storage space and the private DRAM storage spaces on the logic chip through a three-dimensional stacking storage technology, connecting all the global DRAM controllers with the shared DRAM storage space, and connecting all the shared DRAM controllers with the private DRAM storage spaces corresponding to all the shared DRAM controllers.
In the prior art, a local DRAM controller is typically provided on the processor core and is responsible for managing its local memory access requests. Building on this, the invention additionally integrates two DRAM controllers in each processor core, both of which are directly connected to the DRAM stacked above.
The invention divides the DRAM into one shared DRAM memory space and several private DRAM memory spaces. The shared DRAM memory space is common to, and accessible by, all processor cores; each private DRAM memory space is unique to one processor core, and a processor core cannot directly access the private DRAM memory spaces of other cores. Accordingly, the two added DRAM controllers are a global DRAM controller and a shared DRAM controller: the global DRAM controller is directly connected to the shared DRAM storage space, which reduces intermediate links, shortens the data path and improves data exchange speed; each shared DRAM controller is connected to its core's own private DRAM storage space, so that memory-sharing data interaction among the processor cores is realized through the shared DRAM controllers.
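The division of labor among the three controllers can be summarized as a small routing function. The following Python sketch is illustrative only; the address map, region sizes, and decision rules are assumptions made for exposition and are not specified in the patent.

```python
# Behavioral sketch of the three-controller routing described above.
# The address map and sizes are illustrative assumptions, not from the patent.

SHARED_BASE = 0x0000_0000        # start of the shared DRAM storage space
SHARED_SIZE = 0x1000_0000        # 256 MiB shared space (assumed)
PRIVATE_SIZE = 0x0400_0000       # 64 MiB private space per core (assumed)

def route_access(core_id: int, addr: int) -> str:
    """Pick which on-core controller services a memory access.

    - shared space          -> global DRAM controller (direct, short path)
    - the core's own space  -> local DRAM controller
    - another core's space  -> shared DRAM controller (memory-sharing path)
    """
    if SHARED_BASE <= addr < SHARED_BASE + SHARED_SIZE:
        return "global"
    owner = (addr - SHARED_BASE - SHARED_SIZE) // PRIVATE_SIZE
    return "local" if owner == core_id else "shared"
```

In this toy model, a core touching the shared region always takes the short global-controller path, while any reach into another core's private region flows through the shared-DRAM-controller path that the patent describes for inter-core data interaction.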
By the method, the data transmission path is optimized, delay is reduced, and the bandwidth utilization rate is improved, so that the overall performance of the storage system is remarkably improved in large data processing, high-performance computing and cloud computing environments.
In some embodiments, the local DRAM controllers are connected through a NoC structure, so that the local DRAM controller of each processor core is responsible for managing its local memory access requests and cooperates, through the NoC, with the local controllers of other processor cores to ensure efficient scheduling and consistency of global memory access.
In some embodiments, a logic layer is further integrated on the logic chip, and each local DRAM controller is connected to the logic layer through an independent bus to form a plurality of parallel data transmission channels, so that a plurality of data transmissions are allowed to be performed simultaneously, and data throughput is significantly improved.
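The throughput benefit of giving each local DRAM controller its own bus can be seen in a toy timing model. The transfer times below are arbitrary units chosen purely for illustration; nothing here is taken from the patent.

```python
# Toy comparison: one shared bus serializes transfers, while one
# independent bus per local DRAM controller lets them fully overlap.

def total_time_single_bus(transfer_times):
    """All transfers queue behind one another on a single bus."""
    return sum(transfer_times)

def total_time_parallel_buses(transfer_times):
    """Each transfer runs on its own channel, so they finish together
    with the longest one dominating."""
    return max(transfer_times)
```

For example, four cores each moving one block in [4, 3, 5, 2] time units finish in 14 units on a single bus but in 5 units with independent channels, which is the "significantly improved data throughput" the text refers to.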
In some embodiments, the logical layer is a CPU or NPU.
In some embodiments, the chip design method further comprises designing a central arbiter for allocating bandwidth resources of the bus.
The central arbiter in this embodiment may directly adopt the prior art; for example, it may process bus access requests according to preset rules and real-time load information, or dynamically allocate resources according to at least one of task priority, data locality and bus idle state, so as to ensure efficient and orderly data access.
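One way such an arbiter could weigh those factors is a simple scoring policy, sketched below. The request fields, the scoring order, and the tie-breaking rule are assumptions for illustration; the patent leaves the concrete policy to the prior art.

```python
# Illustrative central-arbiter policy: among requests whose target bus is
# idle, grant the one with the highest (priority, locality) score.
# Field names and the ranking rule are assumptions, not from the patent.

def arbitrate(requests, bus_busy):
    """requests: iterable of dicts with keys
         'id'       - request identifier
         'bus'      - which bus the request needs
         'priority' - task priority, higher wins
         'local'    - True if the data lies in the requester's private space
       bus_busy: dict mapping bus name -> True if currently occupied."""
    ready = [r for r in requests if not bus_busy.get(r["bus"], False)]
    if not ready:
        return None                  # nothing can be granted this cycle
    return max(ready, key=lambda r: (r["priority"], r["local"]))["id"]
```

Here the bus-idle check filters the candidate set, priority ranks what remains, and locality breaks ties in favor of accesses that stay in the requester's private space.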
In some embodiments, the chip design method further includes integrating a memory data pre-fetch model in the local DRAM controller, the memory data pre-fetch model for predicting future data accesses, and utilizing the global DRAM controller or the shared DRAM controller to load data from the shared DRAM memory space or the private DRAM memory space into the cache in advance.
The memory data prefetching model in this embodiment may directly employ the prior art, such as a model that predicts future data accesses using locality principles. By loading data from DRAM to cache in advance and tightly cooperating with the cache subsystem, the data flow is optimized, unnecessary data migration is reduced, and the overall performance and response speed of the system are further improved.
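As a concrete instance of such a locality-based predictor, a minimal stride prefetcher can be sketched as follows. The specific algorithm is an assumption for illustration; the patent only requires that the model predict future data accesses.

```python
# Minimal stride prefetcher: after seeing the same address stride twice in
# a row, predict that the next access continues the stride, so the data
# can be loaded into the cache ahead of demand.

class StridePrefetcher:
    def __init__(self):
        self.last_addr = None
        self.last_stride = None

    def access(self, addr):
        """Record an access; return a predicted next address, or None."""
        prediction = None
        if self.last_addr is not None:
            stride = addr - self.last_addr
            if stride != 0 and stride == self.last_stride:
                prediction = addr + stride   # same stride seen twice
            self.last_stride = stride
        self.last_addr = addr
        return prediction
```

In the architecture above, a non-None prediction would be handed to the global or shared DRAM controller to fetch the line from the shared or private DRAM storage space before the core asks for it.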
In some embodiments, the global DRAM controller is connected to the shared DRAM memory space using a metal layer copper interconnect.
In some embodiments, the shared DRAM controller is connected to the corresponding private DRAM memory space using a metal layer copper interconnect.
The embodiment of the invention also provides a chip system which comprises a logic chip and a DRAM chip, wherein the DRAM chip is vertically stacked on the logic chip through a three-dimensional stacking storage technology.
The logic chip is integrated with a plurality of processor cores, the processor cores are integrated with a local DRAM controller, a global DRAM controller and a shared DRAM controller, and the local DRAM controllers are respectively connected with the global DRAM controller and the shared DRAM controller, and the local DRAM controllers of the processor cores are connected.
The DRAM chip is provided with a shared DRAM storage space and a plurality of private DRAM storage spaces, wherein the shared DRAM storage spaces are respectively connected with the global DRAM controller, and each private DRAM storage space is connected with a corresponding shared DRAM controller.
As shown in fig. 1, four processor cores in a logic chip are shown: a first processor core 10, a second processor core 20, a third processor core 30 and a fourth processor core 40. Of course, the logic chip may also have more or fewer processor cores. Taking four processor cores as an example:
The first processor core 10 has a first local DRAM controller 11, a first global DRAM controller 12, and a first shared DRAM controller 13, wherein the first local DRAM controller 11 is connected to the first global DRAM controller 12 and the first shared DRAM controller 13, respectively.
The second processor core 20 has a second local DRAM controller 21, a second global DRAM controller 22, and a second shared DRAM controller 23, and the second local DRAM controller 21 is connected to the second global DRAM controller 22 and the second shared DRAM controller 23, respectively.
The third processor core 30 has a third local DRAM controller 31, a third global DRAM controller 32, and a third shared DRAM controller 33, wherein the third local DRAM controller 31 is connected to the third global DRAM controller 32 and the third shared DRAM controller 33, respectively.
The fourth processor core 40 has a fourth local DRAM controller 41, a fourth global DRAM controller 42, and a fourth shared DRAM controller 43, wherein the fourth local DRAM controller 41 is connected to the fourth global DRAM controller 42 and the fourth shared DRAM controller 43, respectively.
The first local DRAM controller 11, the second local DRAM controller 21, the third local DRAM controller 31 and the fourth local DRAM controller 41 are signal-connected to one another, so that cooperation among the processor cores is realized through a distributed memory management strategy, ensuring efficient scheduling and consistency of global memory access.
The four processor cores are respectively allocated with an independent private DRAM memory space, specifically, the first processor core 10 corresponds to a first private DRAM memory space 51, and the first private DRAM memory space 51 is connected to the first shared DRAM controller 13. The second processor core 20 corresponds to a second private DRAM memory space 52, the second private DRAM memory space 52 being connected to the second shared DRAM controller 23. The third processor core 30 corresponds to a third private DRAM memory space 53, the third private DRAM memory space 53 being connected to the third shared DRAM controller 33. The fourth processor core 40 corresponds to a fourth private DRAM memory space 54, the fourth private DRAM memory space 54 being connected to the fourth shared DRAM controller 43.
The four processor cores have shared DRAM memory 55 in common. Wherein the shared DRAM memory space 55 is respectively connected to the first global DRAM controller 12, the second global DRAM controller 22, the third global DRAM controller 32, and the fourth global DRAM controller 42.
As shown in figs. 2 and 3, there are 8 processor cores in the logic chip 6, and the local DRAM controllers corresponding to the processor cores form a local DRAM controller cluster. Each local DRAM controller is connected to the corresponding memory spaces in the DRAM chip 7 through a shared DRAM controller or a global DRAM controller.
As shown in fig. 4, the logic chip 6 has two processor cores, processor core 61 and processor core 62, each of which is connected with the corresponding memory spaces in the DRAM chip 7 by hybrid bonding.
Since the DRAM chip 7 is vertically stacked on the logic chip 6 by the three-dimensional stacked-memory technology, hybrid bonding is adopted, as it is better suited to three-dimensional stacking scenarios.
In some embodiments, the local DRAM controllers of the processor cores are connected through a NoC structure, so that the local DRAM controller of each processor core is responsible for managing its local memory access requests and cooperates, through the NoC, with the local controllers of other processor cores to ensure efficient scheduling and consistency of global memory access.
In some embodiments, the logic chip is further integrated with a logic layer, and each local DRAM controller is connected to the logic layer through an independent bus to form a plurality of parallel data transmission channels, so that a plurality of data transmissions are allowed to be performed simultaneously, and the data throughput is significantly improved.
The system-on-chip also includes a central arbiter for allocating bandwidth resources of the bus.
In some embodiments, the logical layer is a CPU or NPU.
In some embodiments, a memory data prefetch model is integrated in the local DRAM controller, the memory data prefetch model being used to predict future data accesses, data being loaded into the cache from either the shared DRAM memory space or the private DRAM memory space in advance using either the global DRAM controller or the shared DRAM controller.
In some embodiments, the global DRAM controller is connected to the shared DRAM memory space using a metal layer copper interconnect.
In some embodiments, the shared DRAM controller is connected to the corresponding private DRAM memory space using a metal layer copper interconnect.
The embodiments of the invention realize efficient data read and write operations by adopting a distributed-bus DRAM control technology, in particular by using distributed control logic within a three-dimensional stacked structure and optimizing the data path, and are especially suitable for high-performance computing scenarios such as intelligent driving. In this way, not only are storage bandwidth and density increased, but the system's ability to handle real-time data is also enhanced. The architecture combining distributed control logic with a multi-bus design further improves the efficiency of data read and write operations.
The present invention has been described in detail above with reference to the drawings and embodiments, and those skilled in the art can make various modifications to the invention based on the above description. Accordingly, certain details of the embodiments are not to be interpreted as limiting the invention, which is defined by the appended claims.

Claims (8)

1. A chip design method, characterized in that the chip design method comprises:

integrating a local DRAM controller, together with a global DRAM controller and a shared DRAM controller each connected to the local DRAM controller, on a processor core; integrating several such processor cores on a logic chip; and, within the logic chip, interconnecting the local DRAM controllers;

vertically stacking a DRAM chip having a shared DRAM storage space and several private DRAM storage spaces on the logic chip using three-dimensional stacked-memory technology; connecting each global DRAM controller to the shared DRAM storage space; and connecting each shared DRAM controller to its corresponding private DRAM storage space;

wherein the local DRAM controllers are interconnected through a NoC structure, so that the local DRAM controller of each processor core manages that core's local memory access requests and cooperates with the local controllers of the other processor cores through the NoC;

and wherein a logic layer is further integrated on the logic chip, and each local DRAM controller is connected to the logic layer through an independent bus, forming multiple parallel data transmission channels that allow several data transfers to proceed simultaneously.

2. The chip design method according to claim 1, characterized in that the logic layer is a CPU or an NPU; and/or the chip design method further comprises: designing a central arbiter, the central arbiter being used to allocate the bandwidth resources of the buses.

3. The chip design method according to claim 1, characterized in that the chip design method further comprises: integrating a memory data prefetch model in the local DRAM controller, the memory data prefetch model being used to predict future data accesses and to load data in advance from the shared DRAM storage space or the private DRAM storage space into a cache through the global DRAM controller or the shared DRAM controller.

4. The chip design method according to claim 1, characterized in that the global DRAM controller is connected to the shared DRAM storage space through metal-layer copper interconnects; and the shared DRAM controller is connected to its corresponding private DRAM storage space through metal-layer copper interconnects.

5. A chip system, characterized in that the chip system comprises:

a logic chip, the logic chip integrating several processor cores, each processor core integrating a local DRAM controller, a global DRAM controller, and a shared DRAM controller, the local DRAM controller being connected to the global DRAM controller and to the shared DRAM controller, and the local DRAM controllers of the processor cores being interconnected;

a DRAM chip, the DRAM chip being vertically stacked on the logic chip and having a shared DRAM storage space and several private DRAM storage spaces, the shared DRAM storage space being connected to each global DRAM controller, and each private DRAM storage space being connected to its corresponding shared DRAM controller;

wherein the local DRAM controllers are interconnected through a NoC structure;

and wherein a logic layer is further integrated on the logic chip, and each local DRAM controller is connected to the logic layer through an independent bus.

6. The chip system according to claim 5, characterized in that the logic layer is a CPU or an NPU; and/or the chip system further comprises: a central arbiter, the central arbiter being used to allocate the bandwidth resources of the buses.

7. The chip system according to claim 5, characterized in that a memory data prefetch model is integrated in the local DRAM controller, the memory data prefetch model being used to predict future data accesses and to load data in advance from the shared DRAM storage space or the private DRAM storage space into a cache through the global DRAM controller or the shared DRAM controller.

8. The chip system according to claim 5, characterized in that the global DRAM controller is connected to the shared DRAM storage space through metal-layer copper interconnects; and the shared DRAM controller is connected to its corresponding private DRAM storage space through metal-layer copper interconnects.
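The claims state that each core's local DRAM controller cooperates with the local controllers of other cores through a NoC, but do not fix a topology or routing scheme. A minimal sketch of one common choice, dimension-ordered (XY) routing on a 2D mesh, is shown below; the mesh width, linear core ids, and the `xy_route` name are illustrative assumptions, not details from the patent.

```python
# Hypothetical sketch: dimension-ordered (XY) routing on a 2D mesh NoC,
# one common way the per-core local DRAM controllers could be interconnected.
# Core numbering and mesh width are assumptions for illustration only.

def xy_route(src, dst, width):
    """Return the list of (x, y) mesh coordinates a packet visits from src to dst.

    src, dst: linear core ids laid out row-major on a `width`-wide 2D mesh.
    Routes fully in the X dimension first, then in Y (deadlock-free on a mesh).
    """
    sx, sy = src % width, src // width
    dx, dy = dst % width, dst // width
    path = [(sx, sy)]
    x, y = sx, sy
    while x != dx:                       # step through the X dimension first
        x += 1 if dx > x else -1
        path.append((x, y))
    while y != dy:                       # then step through the Y dimension
        y += 1 if dy > y else -1
        path.append((x, y))
    return path
```

For example, on a 4-wide mesh a request from core 0 to core 5 crosses one X link and one Y link, so a local controller can bound worst-case hop count from the mesh dimensions alone.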
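Claims 2 and 6 add a central arbiter that allocates bandwidth on the independent buses between the local DRAM controllers and the logic layer, without specifying a policy. The sketch below assumes a simple round-robin policy; the class name, the boolean request vector, and the single-grant-per-cycle model are hypothetical choices made for illustration.

```python
# Hypothetical sketch: a round-robin central arbiter granting one bus cycle
# per call among the per-core buses. The request-vector interface is assumed.

class RoundRobinArbiter:
    def __init__(self, n_ports):
        self.n = n_ports
        self.last = self.n - 1            # port granted on the previous cycle

    def grant(self, requests):
        """requests: list of bools, one per port. Return granted port or None.

        Scans ports starting just after the last winner, so every requester
        is served within n_ports cycles (starvation-free).
        """
        for i in range(1, self.n + 1):
            port = (self.last + i) % self.n
            if requests[port]:
                self.last = port
                return port
        return None                       # no port is requesting this cycle
```

Round-robin is only one option; a real arbiter for this design might instead weight grants toward cores with prefetch traffic or latency-critical requests.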
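Claims 3 and 7 place a memory data prefetch model inside the local DRAM controller to predict future accesses and load data into a cache ahead of demand. The patent does not say which prediction model is used; the sketch below assumes a classic stride detector, with the table layout, `degree` parameter, and names being illustrative.

```python
# Hypothetical sketch: a stride-detecting prefetch model of the kind that
# could sit in the local DRAM controller. All names and the confirmation
# policy (two equal strides before prefetching) are assumptions.

class StridePrefetcher:
    def __init__(self, degree=2):
        self.degree = degree              # how many blocks to fetch ahead
        self.last_addr = None             # most recent demand address
        self.stride = None                # stride observed between last two

    def access(self, addr):
        """Record a demand access; return the addresses to prefetch (may be empty)."""
        prefetches = []
        if self.last_addr is not None:
            stride = addr - self.last_addr
            if stride != 0 and stride == self.stride:
                # Same stride twice in a row: predict the pattern continues.
                prefetches = [addr + stride * k for k in range(1, self.degree + 1)]
            self.stride = stride
        self.last_addr = addr
        return prefetches
```

Addresses returned here would then be handed to the global or shared DRAM controller, per the claims, so the prefetched lines arrive in the cache before the core asks for them.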
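The claims partition the stacked DRAM into one shared space (reached via the global DRAM controller) and per-core private spaces (each reached via that core's shared DRAM controller), which implies the local controller must decode every physical address to the right path. The sketch below assumes a flat map with the shared region at the bottom of the address space; the region sizes, base addresses, and the "remote" forwarding case are illustrative assumptions.

```python
# Hypothetical sketch: steering a physical address to the global controller
# (shared DRAM space), this core's shared controller (its private space), or
# the NoC (another core's private space). The memory map is assumed.

SHARED_BASE = 0x0000_0000
SHARED_SIZE = 256 * 1024 * 1024           # assumed 256 MiB shared region
PRIVATE_SIZE = 64 * 1024 * 1024           # assumed 64 MiB private region per core

def route_request(addr, core_id):
    """Return which path serves `addr` for the core `core_id`."""
    if SHARED_BASE <= addr < SHARED_BASE + SHARED_SIZE:
        return "global"                   # shared space: use the global DRAM controller
    private_base = SHARED_BASE + SHARED_SIZE + core_id * PRIVATE_SIZE
    if private_base <= addr < private_base + PRIVATE_SIZE:
        return "shared"                   # own private space: use the shared DRAM controller
    return "remote"                       # another core's space: forward over the NoC
```

Decoding in the local controller keeps the common case (a core touching its own private space) off the NoC entirely, which is consistent with the claims' goal of parallel, independent transfer channels.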
CN202510629574.3A 2025-05-16 2025-05-16 Chip design method and chip system Active CN120144534B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202510629574.3A CN120144534B (en) 2025-05-16 2025-05-16 Chip design method and chip system


Publications (2)

Publication Number Publication Date
CN120144534A CN120144534A (en) 2025-06-13
CN120144534B true CN120144534B (en) 2025-08-19

Family

ID=95947558


Country Status (1)

Country Link
CN (1) CN120144534B (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116151345A (en) * 2023-04-20 2023-05-23 西安紫光展锐科技有限公司 Data transmission method, device, electronic equipment and storage medium
CN116737617A (en) * 2023-08-11 2023-09-12 上海芯高峰微电子有限公司 Access controller
CN118445089A (en) * 2024-07-08 2024-08-06 芯方舟(上海)集成电路有限公司 Three-dimensional stacked storage and computing integrated SRAM and CPU integrated storage and computing architecture and implementation method

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6738870B2 (en) * 2000-12-22 2004-05-18 International Business Machines Corporation High speed remote storage controller
CN100375067C (en) * 2005-10-28 2008-03-12 中国人民解放军国防科学技术大学 Heterogeneous multi-core microprocessor local space shared storage method
KR20070112950A (en) * 2006-05-24 2007-11-28 삼성전자주식회사 Multi-Port Memory Devices, Multi-Processor Systems Including Multi-Port Memory Devices, and Methods of Data Delivery in Multi-Processor Systems
CN103019809B (en) * 2011-09-28 2015-05-27 中国移动通信集团公司 Business processing device and method, and business processing control device
US11704271B2 (en) * 2020-08-20 2023-07-18 Alibaba Group Holding Limited Scalable system-in-package architectures
CN113643739B (en) * 2021-09-02 2025-02-07 西安紫光国芯半导体股份有限公司 LLC chip and cache system
US12294368B2 (en) * 2021-09-24 2025-05-06 Intel Corporation Three-dimensional stacked programmable logic fabric and processor design architecture




Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant