
CN120144534B - Chip design method and chip system - Google Patents

Chip design method and chip system

Info

Publication number: CN120144534B (application CN202510629574.3A)
Authority: CN (China)
Prior art keywords: dram, chip, shared, local, dram controller
Legal status: Active (the legal status is an assumption and is not a legal conclusion)
Other languages: Chinese (zh)
Other versions: CN120144534A
Inventor: 段帅君
Original and current assignee: Xinfangzhou Shanghai Integrated Circuit Co ltd
Application filed by Xinfangzhou Shanghai Integrated Circuit Co ltd
Priority application: CN202510629574.3A
Publication of application CN120144534A; application granted; publication of grant CN120144534B

Classifications

    • G06F15/7846: On-chip cache and off-chip main memory (architectures of general purpose stored program computers comprising a single central processing unit with memory on one IC chip)
    • G06F15/785: Single-CPU architectures with memory on one IC chip and decentralized control, e.g. smart memories
    • G11C11/401: Digital stores using transistors forming cells needing refreshing or charge regeneration, i.e. dynamic cells
    • G11C5/025: Geometric lay-out considerations of storage- and peripheral-blocks in a semiconductor storage device
    • Y02D10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Hardware Design (AREA)
  • Theoretical Computer Science (AREA)
  • Microelectronics & Electronic Packaging (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Dram (AREA)

Abstract

The invention belongs to the technical field of semiconductors, and particularly relates to a chip design method and a chip system. The chip design method comprises integrating a local DRAM controller, a global DRAM controller and a shared DRAM controller on each processor core, and integrating a plurality of processor cores on a logic chip. A DRAM chip is vertically stacked on the logic chip; each global DRAM controller is connected to the shared DRAM storage space, and each shared DRAM controller is connected to its corresponding private DRAM storage space. By connecting the global DRAM controller directly to the shared DRAM storage space, the invention reduces intermediate links, shortens the data path and improves data exchange speed; by connecting the shared DRAM controller directly to each processor core's private DRAM storage space, it realizes memory-sharing data interaction among the processor cores.

Description

Chip design method and chip system
Technical Field
The invention belongs to the technical field of semiconductors, and particularly relates to a chip design method and a chip system.
Background
Three-dimensional DRAM technology greatly improves storage density and data access speed by vertically stacking storage cells, but in large-scale parallel data processing scenarios, efficiently managing data flow and avoiding bus congestion become key factors constraining system performance.
Traditional data routing algorithms struggle to handle highly concurrent accesses and high bandwidth requirements, particularly in distributed bus architectures, where the flexibility and efficiency of data routing directly affect the response speed and energy-efficiency ratio of the overall memory system.
Disclosure of Invention
In view of the technical problem that delay or congestion in existing data transmission paths degrades storage response speed and energy-efficiency ratio, the invention aims to provide a chip design method and a chip system.
In order to solve the foregoing technical problem, a first aspect of the present invention provides a chip design method, including:
Integrating, on a processor core, a local DRAM controller together with a global DRAM controller and a shared DRAM controller each connected to the local DRAM controller; integrating a plurality of such processor cores on a logic chip; and interconnecting the local DRAM controllers within the logic chip;
And vertically stacking a DRAM chip having a shared DRAM storage space and a plurality of private DRAM storage spaces on the logic chip by a three-dimensional stacked-memory technology, connecting each global DRAM controller to the shared DRAM storage space, and connecting each shared DRAM controller to its corresponding private DRAM storage space.
Optionally, in the chip design method as described above, the local DRAM controllers are connected through a NoC structure, so that the local DRAM controller of each processor core is responsible for managing local memory access requests thereof, and cooperates with local controllers of other processor cores through the NoC.
Optionally, in the chip design method as described above, a logic layer is further integrated on the logic chip, and each local DRAM controller is connected to the logic layer through an independent bus to form a plurality of parallel data transmission channels, so as to allow a plurality of data transmissions to be performed simultaneously.
Optionally, in the chip design method as described above, the logic layer is a CPU or an NPU.
Optionally, in the chip design method as described above, the chip design method further includes:
a central arbiter is designed for allocating bandwidth resources of the bus.
Optionally, in the chip design method as described above, the chip design method further includes:
And integrating a memory data prefetching model in the local DRAM controller, wherein the memory data prefetching model is used for predicting future data access, and the global DRAM controller or the shared DRAM controller is utilized to load data from the shared DRAM storage space or the private DRAM storage space into a cache in advance.
Optionally, in the chip design method as described above, the global DRAM controller is connected to the shared DRAM memory space by using a metal layer copper interconnect.
Optionally, in the chip design method as described above, when the shared DRAM controller is connected to the corresponding private DRAM storage space, a metal layer copper interconnection is used to implement connection.
In order to solve the foregoing technical problem, a second aspect of the present invention provides a chip system, including:
The logic chip is integrated with a plurality of processor cores, the processor cores are integrated with a local DRAM controller, a global DRAM controller and a shared DRAM controller, the local DRAM controllers are respectively connected with the global DRAM controller and the shared DRAM controller, and the local DRAM controllers of the processor cores are connected with each other;
the DRAM chip is vertically stacked on the logic chip and is provided with a shared DRAM storage space and a plurality of private DRAM storage spaces, the shared DRAM storage spaces are respectively connected with the global DRAM controller, and each private DRAM storage space is connected with a corresponding shared DRAM controller.
Optionally, in the aforementioned chip system, the local DRAM controllers are connected through a NoC structure, so that the local DRAM controller of each processor core is responsible for managing local memory access requests thereof, and cooperates with local controllers of other processor cores through the NoC.
Optionally, in the chip system as described above, the logic chip further integrates a logic layer, and each of the local DRAM controllers is connected to the logic layer through an independent bus to form a plurality of parallel data transmission channels, so as to allow multiple data transmissions to be performed simultaneously.
Optionally, in the foregoing chip system, the logic layer is a CPU or an NPU.
Optionally, in the chip system as described above, the chip system further includes:
And the central arbiter is used for distributing bandwidth resources of the bus.
Optionally, in the chip system as described above, the local DRAM controller integrates a memory data prefetching model, where the memory data prefetching model is used to predict future data accesses, and the global DRAM controller or the shared DRAM controller is used to load data from the shared DRAM storage space or the private DRAM storage space into the cache in advance.
Optionally, in the chip system as described above, the global DRAM controller is connected to the shared DRAM memory space by using a metal layer copper interconnect.
Optionally, in the chip system as described above, the connection is implemented by using a metal layer copper interconnect when the shared DRAM controller is connected to the corresponding private DRAM storage space.
Advantageous effects of the invention include:
1. On the basis of processor cores already provided with local DRAM controllers, the invention adds two further DRAM controllers. Specifically, the global DRAM controller is directly connected to the shared DRAM storage space, which reduces intermediate links, shortens the data path and improves data exchange speed; and the shared DRAM controllers are directly connected to the private DRAM storage space of each processor core, thereby realizing memory-sharing data interaction among the processor cores.
2. The local DRAM controller of each processor core is connected with the logic layer through the independent high-speed bus to form a plurality of parallel data transmission channels, so that a plurality of data transmission can be performed simultaneously, and the data throughput is obviously improved.
3. The invention adopts a distributed memory management strategy: the local DRAM controller of each processor core is responsible for managing local memory access requests, reducing the pressure on global arbitration, and cooperates, via NoC (Network-on-Chip) technology, with the local DRAM controllers of other processor cores to ensure efficient scheduling and consistency of global memory access.
4. The invention can dynamically allocate resources according to task priority, data locality and bus idle state by the design of the central arbiter, thereby ensuring high-efficiency and orderly data access.
5. The invention predicts future data access through the memory data prefetching model integrated in the local DRAM controller, is closely cooperated with the cache subsystem, optimizes data flow, reduces unnecessary data migration, and further improves the overall performance and response speed of the system.
Drawings
The present disclosure will become more apparent with reference to the accompanying drawings. It is to be understood that these drawings are solely for purposes of illustration and are not intended as a definition of the limits of the invention. In the figure:
FIG. 1 is a schematic diagram of a partial connection relationship according to the present invention;
FIG. 2 is a diagram showing a partial connection relationship between a DRAM chip and a logic chip according to the present invention;
FIG. 3 is a schematic diagram of a portion of a DRAM chip vertically stacked on a logic chip according to the present invention;
FIG. 4 is a block diagram showing a connection between a DRAM chip and a logic chip according to the present invention.
Detailed Description
Other advantages and effects of the present invention will become readily apparent to those skilled in the art from the disclosure herein, which describes the invention by way of specific embodiments. The invention may also be practiced or carried out in other, different embodiments, and the details in this description may be modified or varied in various ways without departing from the spirit and scope of the present invention.
It should be noted that the following embodiments and features in the embodiments may be combined with each other without conflict.
In the description of the present invention, it should be noted that orientation terms such as "outer", "middle", "inner" and the like indicate orientations or positional relationships based on those shown in the drawings, are used only for convenience in describing the invention and simplifying the description, and do not indicate or imply that the referenced device or element must have a specific orientation or be constructed and operated in a specific orientation; they should therefore not be construed as limiting the protection scope of the present invention.
Furthermore, the terms "first", "second" and the like are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features. Thus, a feature defined as "first" or "second" may explicitly or implicitly include one or more such features, and in the description of the present invention, "a plurality" or "a number of" means two or more, unless otherwise specifically defined.
The embodiment of the invention provides a chip design method, which comprises the following steps:
S1, integrating a local DRAM controller, a global DRAM controller connected with the local DRAM controller and a shared DRAM controller connected with the local DRAM controller on a processor core, integrating a plurality of processor cores on a logic chip, and connecting the local DRAM controllers in the logic chip.
S2, vertically stacking the DRAM chips with the shared DRAM storage space and the private DRAM storage spaces on the logic chip through a three-dimensional stacking storage technology, connecting all the global DRAM controllers with the shared DRAM storage space, and connecting all the shared DRAM controllers with the private DRAM storage spaces corresponding to all the shared DRAM controllers.
In the prior art, a local DRAM controller is typically provided on the processor core and is responsible for managing its local memory access requests. Building on this, the invention additionally integrates two DRAM controllers in each processor core, both of which are directly connected to the DRAM stacked above.
The invention divides the DRAM into one shared DRAM memory space and several private DRAM memory spaces. The shared DRAM memory space is common to, and accessible by, all processor cores; each private DRAM memory space is unique to one processor core, and a processor core cannot directly access the private DRAM memory spaces of other cores. Accordingly, the two added DRAM controllers are a global DRAM controller and a shared DRAM controller: the global DRAM controller is directly connected to the shared DRAM storage space, which reduces intermediate links, shortens the data path and improves data exchange speed; each shared DRAM controller is connected to its core's own private DRAM storage space, so that memory-sharing data interaction among the processor cores is realized through the shared DRAM controllers.
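The division of labor among the three controllers can be summarized as a small routing function. The following Python sketch is illustrative only; the address map, region sizes, and decision rules are assumptions made for exposition and are not specified in the patent.

```python
# Behavioral sketch of the three-controller routing described above.
# The address map and sizes are illustrative assumptions, not from the patent.

SHARED_BASE = 0x0000_0000        # start of the shared DRAM storage space
SHARED_SIZE = 0x1000_0000        # 256 MiB shared space (assumed)
PRIVATE_SIZE = 0x0400_0000       # 64 MiB private space per core (assumed)

def route_access(core_id: int, addr: int) -> str:
    """Pick which on-core controller services a memory access.

    - shared space          -> global DRAM controller (direct, short path)
    - the core's own space  -> local DRAM controller
    - another core's space  -> shared DRAM controller (memory-sharing path)
    """
    if SHARED_BASE <= addr < SHARED_BASE + SHARED_SIZE:
        return "global"
    owner = (addr - SHARED_BASE - SHARED_SIZE) // PRIVATE_SIZE
    return "local" if owner == core_id else "shared"
```

In this toy model, a core touching the shared region always takes the short global-controller path, while any reach into another core's private region flows through the shared-DRAM-controller path that the patent describes for inter-core data interaction.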
By the method, the data transmission path is optimized, delay is reduced, and the bandwidth utilization rate is improved, so that the overall performance of the storage system is remarkably improved in large data processing, high-performance computing and cloud computing environments.
In some embodiments, the local DRAM controllers are connected through a NoC structure, so that the local DRAM controller of each processor core is responsible for managing its local memory access requests and cooperates, through the NoC, with the local controllers of other processor cores to ensure efficient scheduling and consistency of global memory access.
In some embodiments, a logic layer is further integrated on the logic chip, and each local DRAM controller is connected to the logic layer through an independent bus to form a plurality of parallel data transmission channels, so that a plurality of data transmissions are allowed to be performed simultaneously, and data throughput is significantly improved.
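The throughput benefit of giving each local DRAM controller its own bus can be seen in a toy timing model. The transfer times below are arbitrary units chosen purely for illustration; nothing here is taken from the patent.

```python
# Toy comparison: one shared bus serializes transfers, while one
# independent bus per local DRAM controller lets them fully overlap.

def total_time_single_bus(transfer_times):
    """All transfers queue behind one another on a single bus."""
    return sum(transfer_times)

def total_time_parallel_buses(transfer_times):
    """Each transfer runs on its own channel, so they finish together
    with the longest one dominating."""
    return max(transfer_times)
```

For example, four cores each moving one block in [4, 3, 5, 2] time units finish in 14 units on a single bus but in 5 units with independent channels, which is the "significantly improved data throughput" the text refers to.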
In some embodiments, the logical layer is a CPU or NPU.
In some embodiments, the chip design method further comprises designing a central arbiter for allocating bandwidth resources of the bus.
The central arbiter in this embodiment may directly adopt the prior art; for example, it may process bus access requests according to preset rules and real-time load information, or dynamically allocate resources according to at least one of task priority, data locality and bus idle state, so as to ensure efficient and orderly data access.
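One way such an arbiter could weigh those factors is a simple scoring policy, sketched below. The request fields, the scoring order, and the tie-breaking rule are assumptions for illustration; the patent leaves the concrete policy to the prior art.

```python
# Illustrative central-arbiter policy: among requests whose target bus is
# idle, grant the one with the highest (priority, locality) score.
# Field names and the ranking rule are assumptions, not from the patent.

def arbitrate(requests, bus_busy):
    """requests: iterable of dicts with keys
         'id'       - request identifier
         'bus'      - which bus the request needs
         'priority' - task priority, higher wins
         'local'    - True if the data lies in the requester's private space
       bus_busy: dict mapping bus name -> True if currently occupied."""
    ready = [r for r in requests if not bus_busy.get(r["bus"], False)]
    if not ready:
        return None                  # nothing can be granted this cycle
    return max(ready, key=lambda r: (r["priority"], r["local"]))["id"]
```

Here the bus-idle check filters the candidate set, priority ranks what remains, and locality breaks ties in favor of accesses that stay in the requester's private space.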
In some embodiments, the chip design method further includes integrating a memory data pre-fetch model in the local DRAM controller, the memory data pre-fetch model for predicting future data accesses, and utilizing the global DRAM controller or the shared DRAM controller to load data from the shared DRAM memory space or the private DRAM memory space into the cache in advance.
The memory data prefetching model in this embodiment may directly employ the prior art, such as a model that predicts future data accesses using locality principles. By loading data from DRAM to cache in advance and tightly cooperating with the cache subsystem, the data flow is optimized, unnecessary data migration is reduced, and the overall performance and response speed of the system are further improved.
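As a concrete instance of such a locality-based predictor, a minimal stride prefetcher can be sketched as follows. The specific algorithm is an assumption for illustration; the patent only requires that the model predict future data accesses.

```python
# Minimal stride prefetcher: after seeing the same address stride twice in
# a row, predict that the next access continues the stride, so the data
# can be loaded into the cache ahead of demand.

class StridePrefetcher:
    def __init__(self):
        self.last_addr = None
        self.last_stride = None

    def access(self, addr):
        """Record an access; return a predicted next address, or None."""
        prediction = None
        if self.last_addr is not None:
            stride = addr - self.last_addr
            if stride != 0 and stride == self.last_stride:
                prediction = addr + stride   # same stride seen twice
            self.last_stride = stride
        self.last_addr = addr
        return prediction
```

In the architecture above, a non-None prediction would be handed to the global or shared DRAM controller to fetch the line from the shared or private DRAM storage space before the core asks for it.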
In some embodiments, the global DRAM controller is connected to the shared DRAM memory space using a metal layer copper interconnect.
In some embodiments, the shared DRAM controller is connected to the corresponding private DRAM memory space using a metal layer copper interconnect.
The embodiment of the invention also provides a chip system which comprises a logic chip and a DRAM chip, wherein the DRAM chip is vertically stacked on the logic chip through a three-dimensional stacking storage technology.
The logic chip is integrated with a plurality of processor cores, the processor cores are integrated with a local DRAM controller, a global DRAM controller and a shared DRAM controller, and the local DRAM controllers are respectively connected with the global DRAM controller and the shared DRAM controller, and the local DRAM controllers of the processor cores are connected.
The DRAM chip is provided with a shared DRAM storage space and a plurality of private DRAM storage spaces, wherein the shared DRAM storage spaces are respectively connected with the global DRAM controller, and each private DRAM storage space is connected with a corresponding shared DRAM controller.
As shown in fig. 1, four processor cores in a logic chip are shown: a first processor core 10, a second processor core 20, a third processor core 30 and a fourth processor core 40. Of course, the logic chip may also have more or fewer processor cores. Taking four processor cores as an example:
The first processor core 10 has a first local DRAM controller 11, a first global DRAM controller 12, and a first shared DRAM controller 13, wherein the first local DRAM controller 11 is connected to the first global DRAM controller 12 and the first shared DRAM controller 13, respectively.
The second processor core 20 has a second local DRAM controller 21, a second global DRAM controller 22, and a second shared DRAM controller 23, and the second local DRAM controller 21 is connected to the second global DRAM controller 22 and the second shared DRAM controller 23, respectively.
The third processor core 30 has a third local DRAM controller 31, a third global DRAM controller 32, and a third shared DRAM controller 33, wherein the third local DRAM controller 31 is connected to the third global DRAM controller 32 and the third shared DRAM controller 33, respectively.
The fourth processor core 40 has a fourth local DRAM controller 41, a fourth global DRAM controller 42, and a fourth shared DRAM controller 43, wherein the fourth local DRAM controller 41 is connected to the fourth global DRAM controller 42 and the fourth shared DRAM controller 43, respectively.
The first local DRAM controller 11, the second local DRAM controller 21, the third local DRAM controller 31 and the fourth local DRAM controller 41 are signal-connected to one another, so that cooperation among the processor cores is realized through a distributed memory management strategy, ensuring efficient scheduling and consistency of global memory access.
The four processor cores are respectively allocated with an independent private DRAM memory space, specifically, the first processor core 10 corresponds to a first private DRAM memory space 51, and the first private DRAM memory space 51 is connected to the first shared DRAM controller 13. The second processor core 20 corresponds to a second private DRAM memory space 52, the second private DRAM memory space 52 being connected to the second shared DRAM controller 23. The third processor core 30 corresponds to a third private DRAM memory space 53, the third private DRAM memory space 53 being connected to the third shared DRAM controller 33. The fourth processor core 40 corresponds to a fourth private DRAM memory space 54, the fourth private DRAM memory space 54 being connected to the fourth shared DRAM controller 43.
The four processor cores have shared DRAM memory 55 in common. Wherein the shared DRAM memory space 55 is respectively connected to the first global DRAM controller 12, the second global DRAM controller 22, the third global DRAM controller 32, and the fourth global DRAM controller 42.
As shown in figs. 2 and 3, there are 8 processor cores in the logic chip 6, and the local DRAM controllers corresponding to the processor cores form a local DRAM controller cluster. Each local DRAM controller is connected to the corresponding memory spaces in the DRAM chip 7 through a shared DRAM controller or a global DRAM controller.
As shown in fig. 4, the logic chip 6 has two processor cores, processor core 61 and processor core 62, each of which is connected with the corresponding memory spaces in the DRAM chip 7 by hybrid bonding.
Since the DRAM chip 7 is vertically stacked on the logic chip 6 by the three-dimensional stacked-memory technology, hybrid bonding is adopted, as it is better suited to three-dimensional stacking scenarios.
In some embodiments, the local DRAM controllers of the processor cores are connected through a NoC structure, so that the local DRAM controller of each processor core is responsible for managing its local memory access requests and cooperates, through the NoC, with the local controllers of other processor cores to ensure efficient scheduling and consistency of global memory access.
In some embodiments, the logic chip is further integrated with a logic layer, and each local DRAM controller is connected to the logic layer through an independent bus to form a plurality of parallel data transmission channels, so that a plurality of data transmissions are allowed to be performed simultaneously, and the data throughput is significantly improved.
The system-on-chip also includes a central arbiter for allocating bandwidth resources of the bus.
In some embodiments, the logical layer is a CPU or NPU.
In some embodiments, a memory data prefetch model is integrated in the local DRAM controller, the memory data prefetch model being used to predict future data accesses, data being loaded into the cache from either the shared DRAM memory space or the private DRAM memory space in advance using either the global DRAM controller or the shared DRAM controller.
In some embodiments, the global DRAM controller is connected to the shared DRAM memory space using a metal layer copper interconnect.
In some embodiments, the shared DRAM controller is connected to the corresponding private DRAM memory space using a metal layer copper interconnect.
The embodiments of the invention realize efficient data read and write operations by adopting a distributed-bus DRAM control technology, in particular by using distributed control logic within a three-dimensional stacked structure and optimizing the data path, and are especially suitable for high-performance computing scenarios such as intelligent driving. In this way, not only are storage bandwidth and density increased, but the system's ability to handle real-time data is also enhanced. The architecture combining distributed control logic with a multi-bus design further improves the efficiency of data read and write operations.
The present invention has been described in detail above with reference to the drawings and embodiments, and those skilled in the art can make various modifications to the invention based on the above description. Accordingly, certain details of the embodiments are not to be interpreted as limiting the invention, which is defined by the appended claims.

Claims (8)

1. A chip design method, characterized in that the chip design method comprises:

integrating a local DRAM controller, together with a global DRAM controller and a shared DRAM controller each connected to the local DRAM controller, on a processor core; integrating several such processor cores on a logic chip; and, within the logic chip, interconnecting the local DRAM controllers;

vertically stacking a DRAM chip having a shared DRAM storage space and several private DRAM storage spaces on the logic chip using three-dimensional stacked-memory technology; connecting each global DRAM controller to the shared DRAM storage space; and connecting each shared DRAM controller to its corresponding private DRAM storage space;

wherein the local DRAM controllers are interconnected through a NoC structure, so that the local DRAM controller of each processor core manages that core's local memory access requests and cooperates with the local controllers of the other processor cores through the NoC;

and wherein a logic layer is further integrated on the logic chip, and each local DRAM controller is connected to the logic layer through an independent bus, forming multiple parallel data transmission channels that allow several data transfers to proceed simultaneously.

2. The chip design method according to claim 1, characterized in that the logic layer is a CPU or an NPU; and/or the chip design method further comprises: designing a central arbiter, the central arbiter being used to allocate the bandwidth resources of the buses.

3. The chip design method according to claim 1, characterized in that the chip design method further comprises: integrating a memory data prefetch model in the local DRAM controller, the memory data prefetch model being used to predict future data accesses and to load data in advance from the shared DRAM storage space or the private DRAM storage space into a cache through the global DRAM controller or the shared DRAM controller.

4. The chip design method according to claim 1, characterized in that the global DRAM controller is connected to the shared DRAM storage space through metal-layer copper interconnects; and the shared DRAM controller is connected to its corresponding private DRAM storage space through metal-layer copper interconnects.

5. A chip system, characterized in that the chip system comprises:

a logic chip, the logic chip integrating several processor cores, each processor core integrating a local DRAM controller, a global DRAM controller, and a shared DRAM controller, the local DRAM controller being connected to the global DRAM controller and to the shared DRAM controller, and the local DRAM controllers of the processor cores being interconnected;

a DRAM chip, the DRAM chip being vertically stacked on the logic chip and having a shared DRAM storage space and several private DRAM storage spaces, the shared DRAM storage space being connected to each global DRAM controller, and each private DRAM storage space being connected to its corresponding shared DRAM controller;

wherein the local DRAM controllers are interconnected through a NoC structure;

and wherein a logic layer is further integrated on the logic chip, and each local DRAM controller is connected to the logic layer through an independent bus.

6. The chip system according to claim 5, characterized in that the logic layer is a CPU or an NPU; and/or the chip system further comprises: a central arbiter, the central arbiter being used to allocate the bandwidth resources of the buses.

7. The chip system according to claim 5, characterized in that a memory data prefetch model is integrated in the local DRAM controller, the memory data prefetch model being used to predict future data accesses and to load data in advance from the shared DRAM storage space or the private DRAM storage space into a cache through the global DRAM controller or the shared DRAM controller.

8. The chip system according to claim 5, characterized in that the global DRAM controller is connected to the shared DRAM storage space through metal-layer copper interconnects; and the shared DRAM controller is connected to its corresponding private DRAM storage space through metal-layer copper interconnects.
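The claims state that each core's local DRAM controller cooperates with the local controllers of other cores through a NoC, but do not fix a topology or routing scheme. A minimal sketch of one common choice, dimension-ordered (XY) routing on a 2D mesh, is shown below; the mesh width, linear core ids, and the `xy_route` name are illustrative assumptions, not details from the patent.

```python
# Hypothetical sketch: dimension-ordered (XY) routing on a 2D mesh NoC,
# one common way the per-core local DRAM controllers could be interconnected.
# Core numbering and mesh width are assumptions for illustration only.

def xy_route(src, dst, width):
    """Return the list of (x, y) mesh coordinates a packet visits from src to dst.

    src, dst: linear core ids laid out row-major on a `width`-wide 2D mesh.
    Routes fully in the X dimension first, then in Y (deadlock-free on a mesh).
    """
    sx, sy = src % width, src // width
    dx, dy = dst % width, dst // width
    path = [(sx, sy)]
    x, y = sx, sy
    while x != dx:                       # step through the X dimension first
        x += 1 if dx > x else -1
        path.append((x, y))
    while y != dy:                       # then step through the Y dimension
        y += 1 if dy > y else -1
        path.append((x, y))
    return path
```

For example, on a 4-wide mesh a request from core 0 to core 5 crosses one X link and one Y link, so a local controller can bound worst-case hop count from the mesh dimensions alone.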
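Claims 2 and 6 add a central arbiter that allocates bandwidth on the independent buses between the local DRAM controllers and the logic layer, without specifying a policy. The sketch below assumes a simple round-robin policy; the class name, the boolean request vector, and the single-grant-per-cycle model are hypothetical choices made for illustration.

```python
# Hypothetical sketch: a round-robin central arbiter granting one bus cycle
# per call among the per-core buses. The request-vector interface is assumed.

class RoundRobinArbiter:
    def __init__(self, n_ports):
        self.n = n_ports
        self.last = self.n - 1            # port granted on the previous cycle

    def grant(self, requests):
        """requests: list of bools, one per port. Return granted port or None.

        Scans ports starting just after the last winner, so every requester
        is served within n_ports cycles (starvation-free).
        """
        for i in range(1, self.n + 1):
            port = (self.last + i) % self.n
            if requests[port]:
                self.last = port
                return port
        return None                       # no port is requesting this cycle
```

Round-robin is only one option; a real arbiter for this design might instead weight grants toward cores with prefetch traffic or latency-critical requests.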
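Claims 3 and 7 place a memory data prefetch model inside the local DRAM controller to predict future accesses and load data into a cache ahead of demand. The patent does not say which prediction model is used; the sketch below assumes a classic stride detector, with the table layout, `degree` parameter, and names being illustrative.

```python
# Hypothetical sketch: a stride-detecting prefetch model of the kind that
# could sit in the local DRAM controller. All names and the confirmation
# policy (two equal strides before prefetching) are assumptions.

class StridePrefetcher:
    def __init__(self, degree=2):
        self.degree = degree              # how many blocks to fetch ahead
        self.last_addr = None             # most recent demand address
        self.stride = None                # stride observed between last two

    def access(self, addr):
        """Record a demand access; return the addresses to prefetch (may be empty)."""
        prefetches = []
        if self.last_addr is not None:
            stride = addr - self.last_addr
            if stride != 0 and stride == self.stride:
                # Same stride twice in a row: predict the pattern continues.
                prefetches = [addr + stride * k for k in range(1, self.degree + 1)]
            self.stride = stride
        self.last_addr = addr
        return prefetches
```

Addresses returned here would then be handed to the global or shared DRAM controller, per the claims, so the prefetched lines arrive in the cache before the core asks for them.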
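The claims partition the stacked DRAM into one shared space (reached via the global DRAM controller) and per-core private spaces (each reached via that core's shared DRAM controller), which implies the local controller must decode every physical address to the right path. The sketch below assumes a flat map with the shared region at the bottom of the address space; the region sizes, base addresses, and the "remote" forwarding case are illustrative assumptions.

```python
# Hypothetical sketch: steering a physical address to the global controller
# (shared DRAM space), this core's shared controller (its private space), or
# the NoC (another core's private space). The memory map is assumed.

SHARED_BASE = 0x0000_0000
SHARED_SIZE = 256 * 1024 * 1024           # assumed 256 MiB shared region
PRIVATE_SIZE = 64 * 1024 * 1024           # assumed 64 MiB private region per core

def route_request(addr, core_id):
    """Return which path serves `addr` for the core `core_id`."""
    if SHARED_BASE <= addr < SHARED_BASE + SHARED_SIZE:
        return "global"                   # shared space: use the global DRAM controller
    private_base = SHARED_BASE + SHARED_SIZE + core_id * PRIVATE_SIZE
    if private_base <= addr < private_base + PRIVATE_SIZE:
        return "shared"                   # own private space: use the shared DRAM controller
    return "remote"                       # another core's space: forward over the NoC
```

Decoding in the local controller keeps the common case (a core touching its own private space) off the NoC entirely, which is consistent with the claims' goal of parallel, independent transfer channels.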
CN202510629574.3A 2025-05-16 2025-05-16 Chip design method and chip system Active CN120144534B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202510629574.3A CN120144534B (en) 2025-05-16 2025-05-16 Chip design method and chip system


Publications (2)

Publication Number Publication Date
CN120144534A CN120144534A (en) 2025-06-13
CN120144534B true CN120144534B (en) 2025-08-19

Family

ID=95947558


Country Status (1)

Country Link
CN (1) CN120144534B (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116151345A (en) * 2023-04-20 2023-05-23 西安紫光展锐科技有限公司 Data transmission method, device, electronic equipment and storage medium
CN116737617A (en) * 2023-08-11 2023-09-12 上海芯高峰微电子有限公司 Access controller
CN118445089A (en) * 2024-07-08 2024-08-06 芯方舟(上海)集成电路有限公司 Three-dimensional stacked storage and computing integrated SRAM and CPU integrated storage and computing architecture and implementation method

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6738870B2 (en) * 2000-12-22 2004-05-18 International Business Machines Corporation High speed remote storage controller
CN100375067C (en) * 2005-10-28 2008-03-12 中国人民解放军国防科学技术大学 Heterogeneous multi-core microprocessor local space shared storage method
KR20070112950A (en) * 2006-05-24 2007-11-28 삼성전자주식회사 Multi-Port Memory Devices, Multi-Processor Systems Including Multi-Port Memory Devices, and Methods of Data Delivery in Multi-Processor Systems
CN103019809B (en) * 2011-09-28 2015-05-27 中国移动通信集团公司 Business processing device and method, and business processing control device
US11704271B2 (en) * 2020-08-20 2023-07-18 Alibaba Group Holding Limited Scalable system-in-package architectures
CN113643739B (en) * 2021-09-02 2025-02-07 西安紫光国芯半导体股份有限公司 LLC chip and cache system
US12294368B2 (en) * 2021-09-24 2025-05-06 Intel Corporation Three-dimensional stacked programmable logic fabric and processor design architecture




Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant