
CN120217995A - Chip design system and method based on multi-agent - Google Patents


Info

Publication number
CN120217995A
Authority
CN
China
Prior art keywords
module
agent
hardware core
core module
chip
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202510314033.1A
Other languages
Chinese (zh)
Inventor
贾天宇
闫沛然
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Peking University
Original Assignee
Peking University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Peking University filed Critical Peking University
Priority to CN202510314033.1A priority Critical patent/CN120217995A/en
Publication of CN120217995A publication Critical patent/CN120217995A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F30/00Computer-aided design [CAD]
    • G06F30/30Circuit design
    • G06F30/39Circuit design at the physical level
    • G06F30/392Floor-planning or layout, e.g. partitioning or placement
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F30/00Computer-aided design [CAD]
    • G06F30/30Circuit design
    • G06F30/32Circuit design at the digital level
    • G06F30/327Logic synthesis; Behaviour synthesis, e.g. mapping logic, HDL to netlist, high-level language to RTL or netlist
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/004Artificial life, i.e. computing arrangements simulating life
    • G06N3/006Artificial life, i.e. computing arrangements simulating life based on simulated virtual individual or collective life forms, e.g. social simulations or particle swarm optimisation [PSO]
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F2115/00Details relating to the type of the circuit
    • G06F2115/02System on chip [SoC] design
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F2115/00Details relating to the type of the circuit
    • G06F2115/08Intellectual property [IP] blocks or IP cores

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computer Hardware Design (AREA)
  • Evolutionary Computation (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Geometry (AREA)
  • General Health & Medical Sciences (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Data Mining & Analysis (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Architecture (AREA)
  • Design And Manufacture Of Integrated Circuits (AREA)

Abstract

The embodiment of the present application provides a chip design system and method based on multi-agents. The system includes multiple agents, through which the following steps are performed in collaboration: receiving the user's design request, and based on the chip design requirement information included in the design request, executing the entire chip design process, thereby generating a chip design file (including chip layout information) for subsequent actual chip manufacturing. It can be seen that the present application scheme realizes the application of LLM-based agents to the exploration of the entire chip design process, which can reduce human participation and improve chip design efficiency.

Description

Chip design system and method based on multiple intelligent agents
Technical Field
The application relates to the technical field of chips, in particular to a chip design system and method based on multiple intelligent agents.
Background
With the development of generative artificial intelligence, Large Language Models (LLMs) have shown remarkable intelligent capability and can improve the efficiency and productivity of various industries. For example, in the field of chip design, research has explored using LLMs for chip design to reduce design complexity and shorten the design cycle. However, in existing research, when an LLM is used for chip design, the achievable designs are limited and of relatively low complexity, such as pipeline designs or simpler Finite State Machine (FSM) designs; in addition, the application of the LLM is often limited to a specific stage of the chip design flow, and applying the LLM to exploration of the whole chip design flow is omitted.
Disclosure of Invention
In view of the above-mentioned problems in the background art, embodiments of the present application provide a multi-agent-based chip design system, method, apparatus and storage medium to solve or at least partially solve the above-mentioned problems.
In a first embodiment, the present application provides a chip design file generation system. The system includes a plurality of agents;
Wherein the plurality of agents cooperatively perform the following:
receiving a design request of a user, wherein the design request comprises chip design requirement information;
And executing chip design processing based on the chip design requirement information to generate a chip design file, wherein the chip design file contains chip layout information.
In a second embodiment, the application provides a chip design file generation method. The method comprises the following steps:
the following operation steps are cooperatively performed by using a plurality of agents:
receiving a design request of a user, wherein the design request comprises chip design requirement information;
and executing chip design processing based on the chip design requirement information to generate a chip layout file.
In a third embodiment, the present application provides an electronic device. The electronic device comprises the chip design file generation system provided by the first embodiment of the application.
In a fourth embodiment, the present application further provides a computer-readable storage medium storing a computer program which, when executed by a computer, implements the chip design file generation method according to the second embodiment of the present application.
In the technical scheme provided by the embodiments of the application, a plurality of agents cooperate to receive a design request of a user and, based on the chip design requirement information included in the design request, execute the whole chip design processing flow, thereby generating a chip design file (including chip layout information) for subsequent actual manufacture of the chip. Thus, the scheme of the application applies LLM-based agents to exploration of the whole chip design flow, which can reduce human participation and improve chip design efficiency.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions of the prior art, the following description will briefly explain the drawings used in the embodiments or the description of the prior art, and it is obvious that the drawings in the following description are some embodiments of the present application, and other drawings can be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flow chart of a chip design scheme according to an embodiment of the present application;
fig. 2a to 2d are schematic diagrams of a chip design system structure and a plurality of intelligent agent cooperation whole-flow exploration chip design principle in the system according to an embodiment of the present application;
FIG. 3 is a schematic diagram of an open source hardware core library establishment principle according to an embodiment of the present application;
Fig. 4a and fig. 4b are schematic diagrams of an agent design hardware core module according to an embodiment of the present application;
FIG. 5 is a schematic diagram of an agent generating module codes for a hardware core module according to an embodiment of the present application;
FIG. 6 shows two examples of chip layouts designed according to embodiments of the present application;
FIG. 7 is a bar chart showing the energy efficiency and performance improvement of a chip designed by the scheme of the application according to an embodiment of the application;
FIG. 8 is a bar chart showing the energy efficiency improvement of a chip designed according to the embodiment of the present application compared with a corresponding chip;
Fig. 9 is a schematic diagram of an NPU execution workflow and of the process of generating systolic array code as the number of agents in the workflow varies, according to an embodiment of the present application.
Detailed Description
In the field of chip design, a chip design often spans a long process, from chip architecture definition and related code development and testing to physical implementation. Especially for the current new generation of chip products, the complexity requirements are higher and the design cycle is longer, so design complexity and a long design cycle have become the main bottleneck for new-generation chip products. It is therefore important to explore agile chip design methods to accelerate the design cycle of increasingly complex chips.
At present, with the development of generative artificial intelligence, the advent of Large Language Models (LLMs) such as GPT-4 and Llama 3 provides a brand-new perspective and technical support for the agile design of increasingly complex new-generation chips. As shown in the exemplary flow chart of chip design with an LLM in fig. 1, a chip for chat robots (ChatCPU) is designed with an LLM; ChatCPU is a pipelined RISC-V CPU designed and implemented with an LLM. A RISC-V CPU refers to a Central Processing Unit (CPU) designed and implemented based on the RISC-V Instruction Set Architecture (ISA), and this example demonstrates how the chip design flow can be accelerated by generative artificial intelligence techniques. In the particular implementation of the ChatCPU design, it is the EDA (Electronic Design Automation) physical implementation phase that is based on LLM-executed script generation, task decomposition, and execution. However, the above-mentioned existing chip design scheme using an LLM has limitations:
1) Chips generated by an LLM are relatively low in design complexity and limited. For example, only pipeline designs or simpler Finite State Machine (FSM) designs can be implemented.
2) The application of the LLM is often limited to a specific stage of the chip design flow, while applying the LLM to exploration of the full chip design flow is omitted. For example, applications are limited to RTL (Register Transfer Level) code writing.
In order to solve the above problems, the embodiment of the application provides a chip design solution, and the basic idea is as follows:
Open-Source Hardware (OSH) is another important innovative approach to support agile chip design, and in recent years many open-source hardware designs have been released, which have significantly shortened the chip development cycle. By utilizing existing open-source hardware designs, a designer can accelerate the development process and reuse existing modules rather than design each module from scratch. Based on this, the solution of the present application combines the advantages of LLMs and open-source hardware and explores an end-to-end chip design solution that can be used for complex chips, such as a complex Domain-Specific Architecture (DSA) or Domain-Specific System on Chip (DSSoC). A DSA is a computing architecture optimized specifically for a certain class of applications or tasks, which improves the execution efficiency of a particular task by integrating specialized hardware accelerators. A DSSoC is a chip design tailored to a particular task, typically employing a heterogeneous architecture that includes multiple accelerators to handle heavy workloads. Conventional DSSoC design processes are tedious and complex, requiring extensive expertise and complex design tools at each design stage. To solve these problems, the application adopts an LLM-based multi-agent system to realize guided chip architecture exploration, related code generation, verification and the like, and meanwhile utilizes open-source hardware IP obtained from online available resources to accelerate the integration of chips (such as a DSSoC) and shorten the design cycle through the LLM.
The related code includes Hardware Description Language (HDL) code. HDL code may describe a circuit at various levels, including the behavioral level, the RTL level, the gate level, and the like; RTL code is a specific type of HDL code used to describe the register-transfer-level behavior of a circuit. Each of the above LLM-based agents is given a specific role, memory capability, planning capability, and tool-use capability so that the agent can perform a specified task. In addition, in the present application, the end-to-end chip design process is partitioned and assigned to different agents, each responsible for the specific tasks specified. Moreover, the scheme provided by the application has been verified through two DSSoC design examples; the chip design cycle of the examples is short — for the Internet of Things and mobile devices, for example, the design cycle is only 2-4 weeks — and through the domain-specific DSSoC architecture design, a design-efficiency improvement of about 23.81-32.43 times is realized compared with existing SoCs (systems on chip) with similar performance.
In order to enable those skilled in the art to better understand the present application, the following description will clearly and completely describe the technical solution according to the embodiments of the present application according to the accompanying drawings. It will be apparent that the described embodiments are only some, but not all, embodiments of the application. All other embodiments, which can be made by those skilled in the art based on the embodiments of the application without making any inventive effort, are intended to be within the scope of the application.
Furthermore, some of the flows described in the specification, claims, and drawings may include a plurality of operations occurring in a particular order, and the operations may be performed out of that order or concurrently. Sequence numbers of operations such as 101, 102, etc. are merely used to distinguish the various operations; the sequence numbers themselves do not represent any order of execution. In addition, the flows may include more or fewer operations, and the operations may be performed sequentially or in parallel. It should be noted that the descriptions of "first" and "second" herein are used to distinguish different messages, devices, modules, etc.; they do not represent a sequence, nor do they limit "first" and "second" to different types. The term "and/or" in the present application merely describes an association relation of associated objects and indicates that three relations may exist; for example, A and/or B indicates that A may exist alone, A and B may exist together, or B may exist alone. It should also be noted that the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a product or system that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such product or system. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of additional identical elements in a product or system comprising that element.
The following describes the technical schemes provided by the embodiments of the present application.
First, words involved in the embodiments of the present application will be described. It will be understood that this description is for a clearer understanding of embodiments of the application and is not necessarily to be construed as limiting the embodiments of the application.
Agent: In the present application, an agent refers to a software-type agent designed based on an LLM for processing a series of specific tasks.
Chip design: i.e., IC (Integrated Circuit) design.
EDA physical implementation: in an Electronic Design Automation (EDA) process, the physical implementation stage refers to the process of converting a logic design into a physical layout, the goal of which is to generate a layout file that can actually be produced in a semiconductor manufacturing process. The layout file may be, but is not limited to, the GDSII (Graphic Data System II) file format. The GDSII file format is a standard (binary) file format widely used in the semiconductor industry, dedicated to storing and exchanging the physical layout data of Integrated Circuit (IC) designs; it contains all the geometric information and hierarchy of the chip design, and generating it is a key step in delivering the design from EDA tools to the manufacturing plant.
A hardware core refers to a module that implements a specific function and can be used directly, or after slight modification, for integration into a new chip design, such as a larger System on Chip (SoC) design. A hardware core may be a simple logic unit or a complex processing unit, and may be developed in-house or obtained externally.
Hardware IP (intellectual property core): a special form of hardware core that includes not only design files and technical documents but also the related intellectual-property protection. Hardware IP is typically developed by a third party and licensed to users through a licensing agreement.
It should be understood from the foregoing that hardware cores include hardware IP (also referred to as hardware IP cores or hard cores), i.e., standardized hardware core modules that are designed in advance (and have undergone physical design, layout, etc.), are usually provided in the form of RTL code, and can be reused in different chip designs. In particular, a hardware core may be a complete processor core, or a dedicated circuit module that performs a particular function. For example, a hardware core may be, but is not limited to, a processor core (e.g., ARM, RISC-V), a memory controller, or an input/output interface (e.g., USB, HDMI); such cores have been validated and optimized, are functionally stable, and can be used directly or after modification for integration into a new chip design.
The hardware core library refers to a collection of hardware cores. In a hardware core library, each hardware core corresponds to an independent "building block," and a designer can build a complex chip system by combining such blocks.
In an embodiment of the present application, an open-source hardware core library, such as an Open-Source Hardware (OSH) IP library, is used. OSH IP libraries are open to the public, allowing anyone to view, modify, distribute and use the hardware IP in the library. In the following description of the embodiments provided in the present application, a "hardware core" is referred to as a "hardware core module".
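The "building block" view of a hardware core library can be sketched as a simple lookup structure. This is a minimal illustration, not the patent's implementation; the class and field names (`HardwareCore`, `function`, `rtl_path`) are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class HardwareCore:
    """One reusable 'building block' in the hardware core library."""
    name: str          # e.g. "picorv32", "usb_ctrl"
    function: str      # functional category the core implements
    rtl_path: str      # location of the reusable RTL sources
    verified: bool = True

class HardwareCoreLibrary:
    """Minimal collection of hardware cores with lookup by function."""
    def __init__(self) -> None:
        self._cores: list[HardwareCore] = []

    def add(self, core: HardwareCore) -> None:
        self._cores.append(core)

    def find(self, function: str) -> list[HardwareCore]:
        # Return all verified cores matching the requested function.
        return [c for c in self._cores if c.function == function and c.verified]

lib = HardwareCoreLibrary()
lib.add(HardwareCore("picorv32", "cpu", "rtl/picorv32.v"))
lib.add(HardwareCore("usb_ctrl", "io", "rtl/usb.v"))
matches = lib.find("cpu")
```

An agent with access to such a library would query it by required function (as the "architecture expert" does below) rather than designing each module from scratch.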
Fig. 2a shows a schematic diagram of the chip design system according to the present application. The system is an LLM-based multi-agent system. In a specific implementation, the system is deployed on a corresponding electronic device, which may be a terminal device or a server device; the terminal device may be, for example, a smart phone, a desktop computer, a notebook computer, or an intelligent wearable device, and the server device may be, for example, a single server, a cluster formed by multiple servers, a cloud server, a virtual server, and the like. Whether the system is deployed on a terminal device or a server device, a corresponding system entry is provided so that a user can interact with the system and thereby access its various functions and services (such as an automated chip design service). The system portal may be, but is not limited to, a web page, an application page, or an applet page. In practical applications, a web page is generally used as the main portal for user interaction with the system: with a web page as the system portal, a user can access the system with only a browser, without installing a specific application program; the web page can be used on various devices such as desktop computers, notebook computers, tablet computers, and smart phones; and, because all users access the same web-page version, unified management and updating are facilitated.
As shown in fig. 2a, the chip design system provided in this embodiment includes a plurality of agents. Through cooperation, the agents can effectively complete the whole chip design process in a high-quality, automated manner.
In the chip design process, each agent is responsible for a different design task, and the design task each agent is responsible for may be determined by the identity information configured for it. Specifically, the identity information of each agent mainly comprises several components — a role specification, behavior guidelines, partners, and an output format — which ensure that the behavior and functions of the agent meet the design requirements. The role specification defines the role of an agent in the overall system, describing its main tasks and goals and its position and relationships in the system. The behavior guidelines specify the behavioral criteria and policies the agent should follow when performing tasks, to ensure that the agent's behavior is consistent and as expected. The partners are the other agents or entities with which the agent needs to cooperate while executing tasks. The output format defines the content and format of the output the agent should generate after completing a task, which ensures that the agent's output can be properly parsed and used by other agents or systems.
Illustratively, assuming the second agent 22 (i.e., the "architecture expert" agent shown in fig. 2a) exists in the present system for the provided automated chip design service, the role specification, behavior guidelines, partners and output format of that second agent 22 may be defined to include the following, respectively:
1) The role specification includes: the main task of selecting appropriate hardware IP and optimizing the architecture according to the Task Flow Graph (TFG); the goal of ensuring that the design meets PPA (power consumption, performance, area) requirements; the rights to access all hardware core libraries in the system and to modify design documentation; and the relationships with other roles, namely cooperating with the third agent 23 (a writing agent, specifically an "RTL writer" agent) and the fourth agent 24 (a testing agent, specifically an "RTL tester" agent) by providing detailed module description documents for the hardware core modules.
The Task Flow Graph (TFG) is a directed acyclic graph in which each node represents the critical information of a task block. (The task blocks are referred to as functional modules in other embodiments below.) The decomposition structure, execution sequence and dependency relationships of a task (such as an algorithm task) can be clearly shown through the task flow graph.
2) The behavior guidelines include: decision rules, for selecting the most suitable hardware core module according to the Task Flow Graph (TFG) and optimizing according to PPA indexes; priority setting, for preferentially processing tasks that directly affect system performance, such as module selection on a critical path; exception handling, for searching for an alternative scheme or notifying the project manager (management agent 10) if a certain hardware core module is unavailable; and a communication protocol, for exchanging data with the third agent 23 and the fourth agent 24 using a RESTful API.
3) The partners include the third agent 23 (which is responsible for converting module description documents into Verilog code; the second agent 22 needs to provide detailed module description documents to the third agent 23) and the management agent 10 (which is responsible for coordinating the overall chip design progress; other agents need to report progress and problems to the management agent on a regular basis, etc.).
4) The output format includes: a JSON format for the generated module description document; a structured data format; a metadata format; and an interface specification (e.g., the module description document is sent to the third agent 23 and the fourth agent 24 via the RESTful API).
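The four identity components above could be expressed as a structured configuration. The following is a hypothetical sketch for the "architecture expert" agent; the field names and values are illustrative only and are not the patent's actual schema.

```python
import json

# Hypothetical identity configuration for the "architecture expert" agent.
# Field names (role_specification, behavior_guidelines, ...) mirror the four
# components described in the text but are an assumed, illustrative schema.
ARCHITECTURE_EXPERT = {
    "role_specification": {
        "main_task": "select hardware IP and optimize architecture from the TFG",
        "goal": "meet PPA (power, performance, area) requirements",
        "permissions": ["read:hardware_core_library", "write:design_docs"],
    },
    "behavior_guidelines": {
        "decision_rule": "choose cores by TFG match, then optimize for PPA",
        "priority": "critical-path module selection first",
        "exception_handling": "fall back to alternatives or escalate to manager",
        "communication": "RESTful API",
    },
    "partners": ["rtl_writer", "rtl_tester", "manager"],
    "output_format": {"module_description": "json"},
}

# A JSON identity document like this can be parsed by other agents or systems.
serialized = json.dumps(ARCHITECTURE_EXPERT, indent=2)
config = json.loads(serialized)
```

Because the configuration is plain JSON, the same document can serve both as the agent's prompt-side role definition and as machine-readable metadata for its partners.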
In this embodiment, the plurality of agents cooperate to specifically perform the following steps:
101. Receiving a design request of a user, wherein the design request comprises chip design requirement information;
102. Executing chip design processing based on the chip design requirement information to generate a chip design file, wherein the design file contains chip layout information.
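Steps 101 and 102 can be sketched as a two-stage pipeline. This is a heavily simplified stand-in — the real step 102 is the full multi-agent design flow, while here it only emits a stub design file; the function and key names are hypothetical.

```python
def receive_design_request(request: dict) -> dict:
    # Step 101: extract the chip design requirement information from the request.
    return request["requirements"]

def execute_chip_design(requirements: dict) -> dict:
    # Step 102 (stub): the real system would run the multi-agent design flow;
    # here we only echo the requirements into a placeholder design file that
    # carries the chip layout information.
    return {"layout": f"GDSII for {requirements['chip_type']}",
            "requirements": requirements}

request = {"requirements": {"chip_type": "DSSoC", "scenario": "IoT"}}
design_file = execute_chip_design(receive_design_request(request))
```

The design file produced at the end of step 102 is what downstream manufacturing consumes, which is why it must contain the layout information rather than only RTL.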
In a particular implementation, the plurality of agents includes a management agent 10 (the "manager" agent shown in fig. 2a) and at least one execution agent. The management agent is used to coordinate and manage each task and subtask in the whole design flow and ensures that tasks are completed efficiently and in order. For example, the management agent 10 may decompose the entire chip design task into a plurality of subtasks, assign the subtasks to the other most suitable agents, and monitor how the other agents complete the tasks, etc. The at least one execution agent includes the second agent 22, the third agent 23, and the fourth agent 24 mentioned above, among others.
Considering that having too many agents in one system increases the complexity of communication, while too few agents can lead to ambiguous responsibility and reduced output quality, six different agents are defined for deployment in the present system to balance these factors. The six agents include one management agent 10 and five execution agents. The five execution agents include a first agent 21 (a functional agent, such as an "algorithm expert" agent), a second agent 22 (i.e., the "architecture expert" agent), a third agent 23 (the "RTL writer" agent), a fourth agent 24 (the "RTL tester" agent), and a fifth agent 25 (i.e., an "EDA expert" agent for EDA physical implementation).
The detailed description of the functions of each agent will be described in other embodiments below, and detailed descriptions thereof will be omitted herein.
Also, in this embodiment, different agents may be equipped with different LLMs. For example, the third agent 23 may use a fine-tuned Llama 3.1 70B model, tuned on an open-source Verilog dataset with LoRA (Low-Rank Adaptation). LoRA is an efficient tuning method for large-scale pre-trained models that aims to reduce the number of parameters and computational resources required for tuning through low-rank matrix decomposition. For knowledge-intensive tasks, the first agent 21 may use a Chain-of-Thought (CoT) method, which enhances LLM reasoning capabilities. For labor-intensive tasks, such as those of the third agent 23 and fourth agent 24, the ReAct (Reasoning and Acting) method can be used for planning; ReAct is a framework that combines reasoning and acting, aimed at enhancing the ability of an LLM to solve complex problems. For agents that require a memory module to support decision making, such as the third agent 23, relevant information may be stored in an internal memory module for quick access.
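The ReAct-style planning mentioned above interleaves reasoning steps with tool actions. The following is a minimal sketch of that loop only, with plain functions standing in for the LLM call and the tool APIs; it is not the patent's implementation, and all names are illustrative.

```python
def react_agent(task, reason, act, max_steps=5):
    """Minimal ReAct-style loop: alternate reasoning and tool actions.

    `reason` maps (task, observations) to a (thought, action) pair and
    `act` executes an action and returns an observation. Both are
    stand-ins for an LLM call and a tool API in a real agent.
    """
    observations = []
    for _ in range(max_steps):
        thought, action = reason(task, observations)
        if action == "finish":
            return thought          # final answer produced by the reasoner
        observations.append(act(action))
    return None                     # gave up within the step budget

# Toy reasoner: request one tool lookup, then finish with the observation.
def reason(task, obs):
    if not obs:
        return ("need data", "lookup")
    return (f"answer:{obs[-1]}", "finish")

def act(action):
    return 42 if action == "lookup" else None

result = react_agent("toy task", reason, act)
```

The step budget (`max_steps`) matters in practice: without it, a reasoner that never emits "finish" would loop indefinitely.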
In addition, some agents are also provided with (embedded with) the required tools. To improve the utilization efficiency of the tools, a dedicated API (Application Programming Interface) is designed for each tool. Each API is a configuration or script file for a particular set of tool operations and serves as a bridge for agents to call those tools. For example, the second agent 22 cooperates with the management agent 10 and the first agent 21, has a memory module embedding a hardware core library (e.g., the OSH IP library) for efficient access, and is further provided with simulation tools to support intelligent decision making and design optimization.
Combining the above, in implementing the automated chip design process, the functions of the six agents included in the system are as follows:
1. Management agent 10 (i.e., the "manager" agent shown in FIG. 2 a)
The management agent is used to receive a design request of a user, perform chip design task decomposition based on the chip design requirement information contained in the design request to obtain a plurality of subtasks, and distribute each subtask to the most suitable other agent (execution agent).
In a particular implementation, a user may submit a design request through the system portal (e.g., a web page). For example, a user may input a chip design prompt through the system web page (e.g., input "please design a dssoc that is suitable for IOT end"), where the prompt includes the chip design requirement information, and then click to submit, thereby sending a design request to the system. The design request is received by the management agent in the system. After receiving the design request, the management agent parses the chip design requirement information provided by the user from the request, decomposes the complex overall chip design task based on that information, and outputs the decomposed subtasks so as to distribute them to the corresponding agents for processing. In addition to decomposing the task for a request, the management agent may also perform other operations, such as recording timestamps, user identification (e.g., user ID), and the like.
The chip design requirement information includes, but is not limited to, design targets such as required functional information and constraint conditions. The constraints include, for example, the chip application scenario (e.g., Internet of Things (IoT)), the chip type (e.g., DSSoC), the chip's PPA (Performance, Power, Area) requirements, and interface standards.
The decomposed subtasks comprise a functional relationship diagram generation task, an architecture generation task, a code writing task, a testing task, and a physical implementation task. The functional relationship diagram generation task may be formed based on the functional information contained in the chip design requirement information.
When the management agent performs a plurality of subtask assignments, specifically, a functional relationship diagram generating task is assigned to a first agent 21, an architecture generating task is assigned to a second agent 22, a code writing task is assigned to a third agent 23, a testing task is assigned to a fourth agent 24, and a physical realization task is assigned to a fifth agent 25.
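The assignment described above can be sketched as a fixed routing table. The task labels and agent names below are illustrative stand-ins, not identifiers from this embodiment, and the decomposition that an LLM performs in the real system is reduced here to a static mapping:

```python
# Hypothetical routing table: subtask label -> executing agent.
SUBTASK_ROUTING = {
    "functional_relation_graph": "first_agent",   # agent 21
    "architecture_generation":   "second_agent",  # agent 22
    "code_writing":              "third_agent",   # agent 23
    "testing":                   "fourth_agent",  # agent 24
    "physical_implementation":   "fifth_agent",   # agent 25
}

def decompose_and_assign(design_request: dict) -> list:
    """Decompose a design request into the five subtasks and route each one."""
    # In the real system an LLM performs the decomposition; here it is fixed.
    return [(task, agent) for task, agent in SUBTASK_ROUTING.items()]
```

The management agent would then forward each `(subtask, agent)` pair to the corresponding executing agent.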
It should be noted that, in addition to the above-mentioned functions, the management agent may perform other functions, and reference may be made to the related content in other embodiments for other functions that may be performed.
2. Five executing agents (including a first agent 21, a second agent 22, a third agent 23, a fourth agent 24, a fifth agent 25)
With continued reference to FIG. 2a, the overall chip design mainly includes three phases: an architecture definition phase, a code generation and verification phase, and an EDA physical implementation phase. The chip architecture definition phase is implemented by two LLM-based agents; that is, as shown in FIG. 2b, the chip architecture is explored and defined by the first agent 21 and the second agent 22 cooperatively performing four steps: 1) task analysis, 2) infrastructure generation, 3) architecture assessment, and 4) architecture definition. Step 1), task analysis, mainly refers to the first agent 21's analysis of the provided algorithm information. Steps 2) to 4) are mainly implemented by the second agent 22 based on the output of the first agent 21. The code generation and verification phase is realized by the cooperation of the third agent 23 and the fourth agent 24, and the EDA physical implementation is realized by the fifth agent 25.
The functions of each execution agent are specifically realized as follows:
2.1 first agent 21 (e.g. an "algorithmic expert" agent)
The first agent 21 is specifically configured to generate a functional relationship diagram based on the inputted algorithm information and send the functional relationship diagram to the second agent 22 when executing the task of generating the functional relationship diagram. Each node in the functional relation diagram represents a functional module, each edge represents a data dependency relationship between two functional modules, and each functional module is responsible for completing part of algorithm tasks in the algorithm information.
In specific implementation, the algorithm information includes algorithm codes. Of course, in other embodiments, the algorithm information may also include other information, such as an algorithm document. An algorithm document refers to a file or collection of materials detailing the design, implementation, method of use, and performance characteristics of an algorithm, which is intended to assist developers, users, and other related personnel in understanding the principles of operation of the algorithm. And, the first agent 21 may analyze the inputted algorithm code (or algorithm code and algorithm document) and generate a corresponding functional relationship diagram using a software analysis tool. The functional relationship graph may be a Task Flow Graph (TFG).
In order to realize the architecture exploration of the workload, the steps of generating the functional relation diagram in this embodiment are specifically as follows:
First, the first agent 21 parses an inputted algorithm code from a functional layer, for example, specifically, may parse the algorithm code in combination with an algorithm document, so as to fully understand the function and structure of the algorithm, and divide the algorithm code into a plurality of functional modules, where each functional module may include a single functional unit or an aggregation of a plurality of functional units. Each functional unit corresponds to at least a portion of the algorithm code. Whereby each functional module and/or functional unit is responsible for the execution of a specific piece of code.
In Example 11, in Vision Transformer algorithm code, the Transformer code is treated as a single functional unit, and that single functional unit is taken as a functional module. Vision Transformer (ViT) is a deep learning model based on the Transformer architecture.
In Example 12, the algorithm code contains a piece of code implementing a matrix multiplication algorithm, whose inputs are two matrices A and B and whose output is matrix C; the process is a triple loop computing the product and accumulation for each element. This piece of code can be decomposed into a number of functional units, each responsible for executing part of the code: an initialization unit (initializing the result matrix C), a computing unit (implementing the specific computation logic of the matrix multiplication), an auxiliary function unit (verifying the validity of the input matrices), and a post-processing unit (performing necessary processing on the result matrix, e.g., normalization). The initialization unit, the auxiliary function unit, and the post-processing unit are aggregated into a functional module m1, which is responsible for initializing the result matrix C, verifying the validity of the input matrices, and further processing (e.g., normalizing) the result matrix. The single computing unit is determined as a functional module m2, which is responsible for the specific computation logic of the matrix multiplication algorithm.
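The decomposition in Example 12 can be illustrated in code. This is a hedged sketch (the function names and the normalization choice are assumptions): module m2 corresponds to the computing unit, while module m1 aggregates the initialization, auxiliary-function, and post-processing units:

```python
def validate_inputs(A, B):
    """Auxiliary function unit (part of m1): check matrix shape compatibility."""
    return len(A[0]) == len(B)

def init_result(rows, cols):
    """Initialization unit (part of m1): zero-fill the result matrix C."""
    return [[0.0] * cols for _ in range(rows)]

def compute(A, B, C):
    """Computing unit (module m2): triple loop of products and accumulation."""
    for i in range(len(A)):
        for j in range(len(B[0])):
            for k in range(len(B)):
                C[i][j] += A[i][k] * B[k][j]
    return C

def postprocess(C):
    """Post-processing unit (part of m1): normalize by the largest magnitude."""
    peak = max(abs(x) for row in C for x in row) or 1.0
    return [[x / peak for x in row] for row in C]

def matmul(A, B):
    assert validate_inputs(A, B)
    return postprocess(compute(A, B, init_result(len(A), len(B[0]))))
```

The boundary between m1 and m2 follows directly from which functions carry the heavy computation and which only prepare or finish the result.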
The first agent 21 then uses built-in performance analysis tools (e.g., gprof or the PyTorch Profiler) to analyze the execution time, call frequency, basic operation types and their proportions, and data-flow dependencies of the functional modules. The analysis generates a report, from which a functional relationship graph (as a task flow graph, TFG) can be generated, e.g., in JSON format.
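A TFG emitted in JSON format might look like the following sketch; the field names are assumptions for illustration, since the actual report schema is not specified in this embodiment:

```python
import json

# Hypothetical TFG for the matrix-multiplication example; field names assumed.
tfg = {
    "nodes": [
        {"id": "m1", "function": "init / validation / post-processing",
         "op_type": "control", "exec_time_pct": 5.0},
        {"id": "m2", "function": "matrix-multiply inner loops",
         "op_type": "compute", "exec_time_pct": 95.0, "call_count": 1},
    ],
    "edges": [
        {"src": "m1", "dst": "m2", "dependency": "initialized result matrix C"},
        {"src": "m2", "dst": "m1", "dependency": "raw product for normalization"},
    ],
}
report = json.dumps(tfg, indent=2)  # text handed to the second agent
```

Each node is a workload node as described below, and each edge records a data dependency between two functional modules.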
From the above, each node in the functional relationship diagram is understood as a workload node, which represents at least part of the algorithmic tasks in the algorithmic information (in particular, the execution tasks of part of the algorithmic code).
2.2 Second agent 22 ("architecture expert" agent)
The second agent 22 has a memory module. The main function of the memory module is to give the LLM memory capability, so that it can store and recall past interaction content. A memory module is typically made up of a plurality of memory components, each responsible for a different memory task. In this embodiment, a hardware core library (e.g., an OSH IP library) is embedded in the memory module, and each hardware core module in the library is developed in advance.
The second agent 22 is specifically configured to allocate a hardware core module to each functional module in the functional relationship graph to obtain module information of at least one hardware core module, perform an evaluation operation according to that module information, and, after the evaluation, search the hardware core library for a matching hardware core module according to the module information. The hardware core module is intended for integration into the system-on-chip design.
In a specific implementation, as shown in FIG. 2b, step 2), infrastructure generation, performed by the second agent 22 can be understood as allocating a hardware core module to each functional module in the functional relationship diagram, so as to obtain module information of at least one hardware core module. The module information of a hardware core module comprises hardware configuration information and the corresponding functional module. The hardware configuration information includes key attribute information such as storage attributes (cache size, memory hierarchy, etc.), computation attributes, and bus bandwidth. The corresponding functional module is the workload (i.e., the task load, a node in the functional relationship graph) of the hardware core module, indicating the algorithmic task that the hardware core module is responsible for executing (e.g., at least a portion of the algorithm code). In addition, the module information may also include other information, such as a hardware identifier (e.g., a hardware name).
In this embodiment, when implementing step 2), the second agent selects at least one suitable hardware core module according to the functional relationship graph (TFG), such as one or more of a CPU (central processing unit), a GPU (graphics processing unit), and an NPU (neural network processing unit), so as to generate a basic on-chip architecture (e.g., a basic SoC architecture) document. Specifically, the second agent takes the functional relationship graph as the input of its LLM model; the LLM model internally matches functional modules to hardware core modules according to the characteristics of each functional module in the graph (i.e., the tasks shown in the graph are matched with the architecture), so as to select an appropriate hardware core module for each functional module, and outputs an underlying chip architecture document based on the selected hardware core modules. When selecting a suitable hardware core module for a functional module, it is specifically determined what configuration (attributes) the matching hardware core module should have (for example, a matched GPU needs large memory and/or strong computing capability), so as to generate a hardware table containing the hardware configuration information and the task loads (i.e., the corresponding functional modules); this hardware table is the basic chip architecture document.
For example, it is assumed that the functional module m1 and the functional module m2 in the foregoing example 12 are included in the functional relationship diagram. Depending on the characteristics of the functional module m1, the hardware core module selected for this functional module m1 may be a CPU, since this part of the task of the functional module m1 mainly involves initialization, logic judgment, data verification and post-processing, which are relatively simple and do not require a large amount of computation, and are suitably executed on the CPU. Depending on the characteristics of the functional module m2, the hardware core module selected for the functional module m2 may be a GPU or a dedicated accelerator (e.g. DSP, TPU, etc.), and the task of the functional module m2 requires a large amount of parallel computation, requiring hardware with a high computational power, and is therefore suitable for execution on the GPU or the dedicated accelerator. The CPU, GPU (or dedicated accelerator) selected according to the above will output a hardware table as the basic chip architecture document.
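The matching judgment, which the LLM performs in the real system, can be caricatured as a rule table. The thresholds and fields below are purely illustrative assumptions:

```python
def select_hardware(node: dict) -> dict:
    """Toy stand-in for the LLM's module-to-core matching judgment."""
    heavy_parallel = (node.get("op_type") == "compute"
                      and node.get("parallelism", 0) > 100)
    if heavy_parallel:
        # e.g. module m2: massive parallel computation -> GPU / accelerator
        return {"core": "GPU", "config": {"memory_gb": 16, "compute": "high"}}
    # e.g. module m1: init, logic judgment, verification, post-processing -> CPU
    return {"core": "CPU", "config": {"cache_kb": 512}}

hardware_table = [
    {"module": n["id"], **select_hardware(n)}
    for n in [{"id": "m1", "op_type": "control"},
              {"id": "m2", "op_type": "compute", "parallelism": 4096}]
]
```

The resulting `hardware_table` plays the role of the basic chip architecture document described above.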
It should be noted here that different functional modules may be assigned to different hardware core modules or may also be assigned to the same hardware core module.
From the above, the basic chip architecture document in this embodiment can be understood as a hardware table containing the module information of the hardware core modules (such as the hardware core module identifier, the configuration information of the hardware core module, and the corresponding functional modules). For example, the hardware table may be as shown in Table 1 below:
TABLE 1
The Acc (Accumulator) is a special register for storing intermediate calculation results and is commonly used for arithmetic and logical operations.
Further, the second agent performs step 3), architecture assessment, shown in the figure; step 3) can be understood as performing the architecture assessment based on the basic chip architecture document (i.e., the module information of the at least one hardware core module). Specifically, the second agent performs DSE (Design Space Exploration) on the basic chip architecture document using the simulation tool (simulator) configured in it, so as to evaluate the PPA (performance, power, area) of the basic chip architecture and optimize the architecture parameters, bringing the architecture's performance, function, area, etc. into a relatively good balance, and then determines that the basic chip architecture meets the requirements.
Still further, after the evaluation is passed, the second agent performs step 4), architecture definition, shown in the figure based on the simulation report (PPA report), to generate a chip architecture and output an architecture document for it. Step 4) can be understood as retrieving a matching hardware core module from the hardware core library, based on the module information of the at least one hardware core module, for the subsequent processing steps.
In the implementation, the matched hardware core module can be searched from the hardware core library according to the hardware identifier (such as a hardware name), hardware configuration information and the like contained in the module information of the hardware core module.
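The retrieval step can be sketched as a filter over the library. The record fields below are assumptions; an actual OSH IP library entry would carry much richer metadata:

```python
# Hypothetical in-memory slice of the hardware core library.
LIBRARY = [
    {"name": "CVA6", "type": "CPU", "config": {"cache_kb": 512}},
    {"name": "OpenCL-GPU", "type": "GPU", "config": {"memory_gb": 16}},
]

def find_core(library, name=None, min_config=None):
    """Return the first core matching the identifier and configuration floors."""
    for core in library:
        if name is not None and core["name"] != name:
            continue  # hardware identifier mismatch
        if min_config and any(core["config"].get(k, 0) < v
                              for k, v in min_config.items()):
            continue  # configuration below the required threshold
        return core
    return None  # no match: the second agent must design a new module
```

A `None` result corresponds to the case, described further below, where the second agent designs a new hardware core module itself.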
The hardware core library is, for example, an Open-Source Hardware (OSH) IP library. Open source IP is a key method for accelerating chip design. To realize flexible retrieval and modification of existing IP, in the embodiment of the present application an LLM is used to analyze and integrate existing open source IP, forming a structured OSH IP library (e.g., an IP library containing hardware IP) for agents (such as the second agent 22) to query and further modify and develop.
Fig. 3 shows an example flow chart for establishing an OSH IP library. The process for establishing the OSH IP library comprises the following steps:
(1) Collect documents and open source code, and use an LLM to generate organized code and documents as structured text.
First, for each target open source IP (e.g., CVA6 CPU), its related content is collected, including related IP documents and open source IP codes. The IP documents may include technical documents such as user manuals, API documents, design specifications, and the like. The open source IP code may include hardware description language code such as Verilog, VHDL, systemC.
The collected documents and open source code are then parsed using LLM to generate organized code and documents. Specific steps may be, but are not limited to, the following:
11) Extract module identifications and their corresponding function descriptions from the documents and open source code; an identification may be a name. For example, for the CVA6 CPU, besides the module name, the names and functions of the various units in the module (e.g., ALU, FPU, cache) are identified.
12) Extract the interface definitions between modules, making explicit the input and output signals and their data types. For example, the interface between the CPU and the memory controller.
13) Extract state machine descriptions: for modules containing state machines, extract the state transition diagrams and their descriptions, which help in understanding the workflow and behavior of the modules.
14) Extract relationships, such as the relationships between individual units within a module and the relationships with other (external) modules.
15) Extract code, such as the code of each unit in the module.
Finally, based on the information obtained in steps 11) to 15), a structured text is generated whose content comprises names (such as the module name and the names of the units in the module), function descriptions, interface descriptions, state machine descriptions, code (such as the module code and/or the code of each unit), and relationships (such as the relationships with other modules and among the units in the module).
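One way to hold the structured text of steps 11) to 15) is one record per unit; the field layout below is an assumed sketch, not the format of this embodiment:

```python
from dataclasses import dataclass, field

@dataclass
class StructuredIPRecord:
    """One structured-text entry for an IP unit (fields mirror steps 11-15)."""
    name: str                                        # 11) identification
    function: str                                    # 11) function description
    interfaces: dict = field(default_factory=dict)   # 12) signal -> direction:width
    state_machine: str = ""                          # 13) state-transition notes
    relations: list = field(default_factory=list)    # 14) links to other units
    code: str = ""                                   # 15) source of this unit

# Hypothetical entry for one CVA6 unit (interface names are illustrative).
alu = StructuredIPRecord(
    name="CVA6.ALU",
    function="integer arithmetic and logic operations",
    interfaces={"operand_a": "in:64", "operand_b": "in:64", "result": "out:64"},
    relations=["CVA6.issue_stage"],
)
```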
(2) Construct a graph-structure document (i.e., a module graph-structure document)
From the structured text generated above, a document with a graph structure is formed (such as the structured IP document shown in FIG. 4). In this document, a node may encode a unit of the current target open source IP module (e.g., the CVA6 CPU) and its attributes, and edges represent the relationships between different units.
The same methods given in (1) and (2) above can be applied to different open source IP modules to generate the corresponding organized documents and code, so that a graph-structure document is finally formed.
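The graph-structure document itself can be sketched as a node/edge store; this is a simplified stand-in for the kind of structured IP document described above, with illustrative unit names:

```python
ip_graph = {"nodes": {}, "edges": []}

def add_unit(graph, unit_id, attrs):
    """A node encodes one unit of the target open source IP plus its attributes."""
    graph["nodes"][unit_id] = attrs

def add_relation(graph, src, dst, kind):
    """An edge records the relationship between two different units."""
    graph["edges"].append({"src": src, "dst": dst, "kind": kind})

add_unit(ip_graph, "CVA6.ALU", {"function": "integer ops"})
add_unit(ip_graph, "CVA6.LSU", {"function": "load/store unit"})
add_relation(ip_graph, "CVA6.ALU", "CVA6.LSU", "address operand forwarding")
```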
(3) Creating a brief summary description (understandably, a description of the key functions of the module)
Next, a brief summary description is created for each IP block, capturing its core functions and features.
For example, the summary description of the CVA6 CPU is: a high-performance RISC-V processor supporting the RV64IMAFDC instruction set, with a multi-stage pipeline and dynamic branch prediction.
(4) Classification and integration
The IP modules are classified and integrated according to function and purpose and grouped into corresponding categories (such as CPU, GPU, DSA, and IO), forming a comprehensively structured open source IP library covering a wide range of IP modules.
Illustratively, the IP blocks may be grouped into, but are not limited to, the following respective categories:
CPU: IP modules such as CVA6 and Rocket Chip.
GPU: IP modules such as OpenCL GPUs and NVIDIA CUDA cores.
DSA (Domain-Specific Accelerator): IP modules such as TPUs for machine learning and dedicated accelerators for encryption algorithms.
IO (Input/Output): IP modules such as USB controllers and PCIe interfaces.
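The classification step can be approximated with a keyword heuristic; in the system an LLM performs the classification, so the keyword rules here are illustrative assumptions only:

```python
CATEGORIES = {"CPU": [], "GPU": [], "DSA": [], "IO": []}
_RULES = {
    "CPU": ("processor", "risc"),
    "GPU": ("graphics", "cuda", "opencl"),
    "DSA": ("accelerator", "tpu"),
    "IO":  ("usb", "pcie", "controller"),
}

def classify(summary: str) -> str:
    """Pick the first category whose keywords appear in the summary description."""
    low = summary.lower()
    for cat, keywords in _RULES.items():
        if any(k in low for k in keywords):
            return cat
    return "IO"  # fallback bucket, an arbitrary assumption

CATEGORIES[classify("a high performance RISC-V processor")].append("CVA6")
```

The summary description of step (3) thus directly feeds the grouping of step (4).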
The structured OSH IP library formed in the steps (1) - (4) can provide great support for rapid query and flexible modification of subsequent intelligent agents.
For example, when using a CPU module (such as module2 shown in fig. 3) in the OSH IP library, the second agent 22 may delete redundant units therein, modify one or more units, and/or add new units thereto, according to actual requirements.
Still further, as shown in FIG. 2b, when the second agent 22 performs the architecture definition of step 4), if no matching module is found in the hardware core library for an expected hardware core module (e.g., a special NPU design), the second agent will use the LLM configured in it to design and generate that new hardware core module.
The design generation method comprises selecting a hardware core module of the same type from the hardware core library as an initial hardware core module, and modifying the initial hardware core module to design the new hardware core module. The new hardware core module is designed through hierarchy driving and data-flow driving. Hierarchy driving means adopting a hierarchical method to refine and modify the design layer by layer in a top-down manner; data-flow driving means refining and modifying any hardware unit on each layer according to its data transmission path. A data-flow-driven design is performed inside a hardware unit: after data enters the hardware unit, it passes sequentially through a plurality of processing units along a predetermined data path, and each processing unit is responsible for a specific task and transmits its result to the next processing unit. To implement data-flow driving within a unit, a data interface is defined between the processing units to ensure that data can be efficiently transferred from one processing unit to the next.
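The hierarchical, top-down refinement can be sketched over a nested dictionary; the unit names and the two operations shown are hypothetical simplifications of the modification schemes described below:

```python
# Hypothetical hierarchical module tree: each level's units may nest sub-units.
npu = {"units": {
    "compute":      {"units": {"pe_array": {"units": {}}}},
    "storage":      {"units": {}},
    "interconnect": {"units": {}},
    "debug":        {"units": {}},   # assumed redundant for this workload
}}

def refine(module, delete=(), add=()):
    """Apply one level's modification scheme: delete and/or add hardware units."""
    for name in delete:
        module["units"].pop(name, None)
    for name in add:
        module["units"][name] = {"units": {}}
    return module

refine(npu, delete=["debug"])                      # level 1: drop redundant unit
refine(npu["units"]["compute"], add=["img2col"])   # level 2: add Img2col support
```

Each `refine` call corresponds to one level's modification scheme, applied only after the management agent's evaluation passes.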
Illustratively, assume that a functional module represented by a node in the functional relationship graph (TFG) is responsible for tasks involving a large number of matrix multiplication operations with high performance requirements (e.g., CNN algorithms, sparse computation), and that it is determined this module should be assigned to an NPU for execution. That is, according to the functional relationship graph, a baseline architecture with a special NPU domain-specific architecture (DSA) is finally generated, and the corresponding node in the graph is allocated to the NPU Intellectual Property (IP). However, the expected NPU module cannot be found in the existing hardware core library, although an NPU module of the same type can be; this same-type NPU module is therefore taken as the reference initial NPU module. A hierarchical method is then adopted to gradually refine and modify the initial NPU module in a top-down manner, so that the modified module finally supports operations such as the CNN algorithm and sparse computation. Specifically, the second agent 22 determines the design requirements from the module information of the NPU module (here, designing an NPU supporting the Convolutional Neural Network (CNN) algorithm and sparse computation), then determines the modification scheme layer by layer in a top-down manner according to those requirements, and may feed the scheme back to the management agent for evaluation; after the evaluation passes, the second agent 22 performs the modification operations according to the scheme.
The feedback to the management agent may be performed after the modification scheme for one level is determined, or after the modification schemes for all levels are determined. The modification operations included in a modification scheme comprise deleting some existing redundant functional units, and/or adding new functional units, and/or modifying one or more functional units.
As shown in FIG. 4a, the initial NPU module is divided into three levels, namely a first Level 1, a second Level 2, and a third Level 3, wherein each lower level corresponds to a certain unit in the level above it. The modification process for these three levels may be as follows:
First, according to the tasks the module must handle (here, supporting the CNN algorithm and sparse computation), the modification scheme for the first Level 1 is determined to be deletion of redundant units: all units other than the computing units, the storage units, and the interconnection bus are redundant. After the scheme is fed back to the management agent 10 for evaluation, the second agent 22 deletes those other units, thereby completing the modification of the first Level 1.
Then, the second Level 2 is modified; Level 2 corresponds to the computing units on Level 1, that is, it includes each unit within the computing units. For Level 2, in order to realize the CNN algorithm, the determined modification scheme is to add an Img2col unit in this level to support the Img2col technique; through data-flow-driven design inside the Img2col unit, it is determined that data entering the unit must pass sequentially, along the corresponding data path, through four processing units: an input configuration unit, a padding unit, an image expansion unit, and a matrix control unit. After the scheme is fed back to the management agent 10 for evaluation, the second agent 22 adds the Img2col unit on Level 2, comprising these four processing units with the data dependency input configuration unit -> padding unit -> image expansion unit -> matrix control unit (the arrow "->" appearing in the present application indicates the data flow direction). Img2col is a commonly used matrix transformation technique, widely applied in Convolutional Neural Networks (CNNs), particularly when implementing convolutional layers: by expanding local areas of the input image into matrix columns, a convolution operation can be converted into matrix multiplication, simplifying the computation and enabling efficient acceleration with highly optimized linear algebra libraries (such as BLAS).
The subunits of the Img2col unit function as follows: the input configuration unit is responsible for receiving and parsing external input parameters such as image size, convolution kernel size, stride, and padding, which guide the subsequent image expansion; the padding unit is responsible for adding zero-valued pixels around the input image to ensure that the convolution does not lose edge information; the image expansion unit is responsible for expanding the input image data according to the specified convolution window and storing it as matrix columns; and the matrix control unit is responsible for managing the timing and status of the entire Img2col process, ensuring that the units execute in the correct order and handling state transitions and error conditions that may occur.
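The Img2col transformation itself is well defined. A minimal reference implementation in plain Python, mirroring the padding and image-expansion subunits (with no claim about the hardware unit's actual micro-architecture), is:

```python
def im2col(image, kh, kw, stride=1, pad=0):
    """Unfold kh x kw sliding windows of a 2-D image into rows of a matrix."""
    h, w = len(image), len(image[0])
    # padding unit: surround the image with zero-valued pixels
    padded = [[0] * (w + 2 * pad) for _ in range(h + 2 * pad)]
    for i in range(h):
        for j in range(w):
            padded[i + pad][j + pad] = image[i][j]
    out_h = (h + 2 * pad - kh) // stride + 1
    out_w = (w + 2 * pad - kw) // stride + 1
    # image expansion unit: one row per convolution-window position
    return [[padded[i * stride + di][j * stride + dj]
             for di in range(kh) for dj in range(kw)]
            for i in range(out_h) for j in range(out_w)]
```

With the image unfolded this way, convolution reduces to a matrix multiplication of the flattened kernel against each window row.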
Finally, the third Level 3 is modified. Specifically, in order to support sparse computation, according to the data-flow driving of sparse computation, the modification scheme for the PE unit is determined to be adding a zero-detection unit and a MAC unit inside it, with the data dependency zero-detection unit -> MAC unit. After the scheme is fed back to the management agent 10 for evaluation, the second agent 22 adds the zero-detection unit and the MAC unit to the PE unit, thereby forming a new PE unit. A PE array refers to a collection of parallel PE units; each PE unit is a separate computational unit (e.g., a floating point unit (FPU) or a dedicated accelerator such as a matrix multiplication unit) that can perform basic arithmetic and logical operations. These PE units are typically used to perform certain types of computing tasks. The MAC unit mainly performs multiply-accumulate operations and is widely used in computing tasks such as matrix multiplication, convolution in Convolutional Neural Networks (CNNs), and filters in signal processing.
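The behavior of the new PE unit can be sketched functionally: a zero-detection stage gates the MAC stage so that multiplications with a zero operand are skipped, which is the essence of the sparse computation support (a behavioral model only, not RTL):

```python
def sparse_pe(weights, activations):
    """Behavioral model of one PE: zero-detection unit -> MAC unit."""
    acc, macs_done = 0, 0
    for w, a in zip(weights, activations):
        if w == 0 or a == 0:      # zero-detection unit: skip useless work
            continue
        acc += w * a              # MAC unit: multiply and accumulate
        macs_done += 1
    return acc, macs_done
```

On sparse operand streams, `macs_done` falls well below the stream length, which is the source of the energy and latency savings.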
After the modification is completed, the second agent 22 invokes the simulation tool configured in it to perform simulation evaluation on the modified initial NPU module, compares the module's PPA (performance, power, area) with the PPA required by the user, and iteratively optimizes by adjusting parameters and improving the module's functions according to the comparison result until the user's requirements can be met. The PPA required by the user may be issued by the management agent 10 to the second agent 22. The second agent then feeds the module document of the modified initial NPU module back to the management agent 10, which reviews the module modification design against a set of predefined data-flow inspection criteria. During inspection, as shown in the left-hand diagram of FIG. 4b, the management agent 10 may, for example, be concerned with one or more of unit redundancy, unit loss, and potential data-flow blocking or congestion. If any problems are found, the management agent 10 instructs the second agent 22 to make modifications. After the second agent 22 finally completes the modified design of the initial NPU module, the management agent 10 flattens the entire modified module for final module evaluation. Flattening a hardware core module (such as the modified NPU module here) generally refers to converting a complex multi-level module structure into a single-layer form that is easier to understand and implement; for example, in the flattened modified NPU module, the PE array and each processing unit in the Img2col unit are all in the top layer.
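The flattening described can be sketched as a recursive lift of every nested unit to the top layer; the dotted-name convention for preserving provenance is an assumption:

```python
def flatten(module, prefix=""):
    """Lift every unit of a hierarchical module into a single top-level dict."""
    flat = {}
    for name, sub in module.get("units", {}).items():
        path = prefix + name
        flat[path] = {k: v for k, v in sub.items() if k != "units"}
        flat.update(flatten(sub, path + "."))   # recurse into nested units
    return flat

# Hypothetical post-modification hierarchy for the NPU example.
npu_tree = {"units": {"compute": {"units": {"img2col": {"units": {}},
                                            "pe_array": {"units": {}}}},
                      "storage": {"units": {}}}}
top = flatten(npu_tree)
```

After flattening, every unit sits at the top level, matching the single-layer form used for the final module evaluation.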
To sum up, as shown in the right-hand diagram of FIG. 4b, during the process of modifying and designing a new hardware core module by the second agent 22, a feedback-based adaptive system is formed between the management agent 10 and the second agent 22. The second agent 22 performs the modular design exploration according to predefined guidelines (hierarchical, top-down, and data-flow driven), while the management agent is responsible for supervision and validation evaluation. The interaction (feedback, communication) between the second agent 22 and the management agent 10 is performed through structured module description text, including a summary context of the current module information and the discussion communication.
After the design of a hardware core module is finally completed, for example when the modification of the initial NPU module is completed, the second agent determines that the designed hardware core module is the expected target hardware core module, and writes a structured module description document for it to assist subsequent RTL writing and the like. The content of the structured module description document includes, but is not limited to, module identification (e.g., module name and the names of the units in the module), function (e.g., key function description), structural hierarchy, interface signals, relationships with other external modules, relationships between units in the module, and state machine descriptions. In addition, the second agent 22 may also generate and output a corresponding module design summary document for the hardware core module, which includes an overview of its module design process.
The above describes a new hardware core module design implementation taking a hardware core module selected from the hardware core library as the initial hardware core module as an example. Of course, in other embodiments, when the expected target hardware core module is not found in the hardware core library, the second agent may also determine the corresponding initial hardware core module in other ways: for example, it may directly determine and automatically generate a corresponding initial hardware core module in a hierarchy-determined and data-flow-driven manner based on the module information of the expected target hardware core module, submit that initial module to the management agent for feedback evaluation, and then modify it according to the problems fed back by the management agent until the requirements are met.
Based on the above, the second agent 22 is further configured to generate a hardware core module as the target hardware core module based on the module information of the target hardware core module when a matching target hardware core module is not found in the hardware core library.
When configured to generate a hardware core module as the target hardware core module based on the module information of the target hardware core module, the second agent 22 is specifically configured to:
determine an initial hardware core module, wherein the initial hardware core module is of the same type as the required target hardware core module;
modify the initial hardware core module based on the module information of the target hardware core module; and
determine the modified initial hardware core module as the target hardware core module.
The initial hardware core module comprises at least one layer, and each layer is provided with at least one hardware unit. The modification comprises performing, layer by layer in a top-down manner, at least one modification operation on the hardware units of each layer, wherein the at least one modification operation includes deleting redundant hardware units, adding hardware units, and modifying hardware units; when a hardware unit is added and/or modified, the processing units to be contained in that hardware unit are determined in a data-flow-driven manner.
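The layer-by-layer modification procedure just described can be sketched as follows. The data shapes and unit names are illustrative assumptions, not the agents' actual internal representation:

```python
def modify_module(layers, required_units, dataflow):
    """Sketch of the top-down modification procedure: traverse layers from
    top to bottom, delete redundant units, add missing units, and use the
    data-flow description to decide what processing units a new unit holds.
    All data shapes here are assumptions for illustration."""
    for layer in layers:  # layers listed top-down: Level1, Level2, ...
        # delete redundant hardware units not needed by the target module
        layer["units"] = [u for u in layer["units"]
                          if u in required_units[layer["name"]]]
        # add hardware units the target module needs but the layer lacks
        for unit in required_units[layer["name"]]:
            if unit not in layer["units"]:
                layer["units"].append(unit)
                # data-flow driving decides the unit's internal processing units
                print(f"added {unit} with processing units {dataflow.get(unit, [])}")
    return layers

layers = [{"name": "Level1", "units": ["compute", "legacy_dma"]},
          {"name": "Level2", "units": ["PE_array"]}]
required = {"Level1": ["compute", "storage"],
            "Level2": ["PE_array", "sparse_engine"]}
dataflow = {"storage": ["sram_bank"], "sparse_engine": ["zero_skip_PE"]}
layers = modify_module(layers, required, dataflow)
assert layers[0]["units"] == ["compute", "storage"]
```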
Further, the second agent 22 may also add the self-designed target hardware core module to the hardware core library to expand the hardware core library. Thus, the second agent 22 may also be used to:
generating a module diagram structural document and a module key function description according to the related information of the target hardware core module;
Adding the target hardware core module into an adaptive module classification group in the hardware core library according to the module diagram structural document and the module key function description;
The related information includes code information (module code) of the target hardware core module acquired from the third agent, and may further include a module document and the like. The module document may be a technical document, generated by the second agent, whose content details the design specifications, implementation, method of use, performance characteristics, critical functions, interfaces, etc. of this target hardware core module. In the module diagram structure, one node represents one hardware unit in the target hardware core module, and the node attributes and edges represent the relationships among different units in the target hardware core module.
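A module diagram structure of the kind described (nodes for hardware units with attributes, edges for inter-unit relations) might be sketched as below. The classification helper is a toy stand-in, since the library's actual grouping rules are not specified in the text:

```python
# Minimal sketch of the module diagram structure: nodes are hardware units
# (with attributes), edges are relations between units. Names illustrative.
module_graph = {
    "nodes": {
        "PE_array":    {"level": 2, "function": "MAC computation"},
        "accumulator": {"level": 3, "function": "partial-sum accumulation"},
        "weight_buf":  {"level": 2, "function": "weight storage"},
    },
    "edges": [
        ("weight_buf", "PE_array", "weight stream"),
        ("PE_array", "accumulator", "partial sums"),
    ],
}

def classify(graph, key_function_text):
    """Toy keyword classifier standing in for the hardware core library's
    actual (unspecified) module classification grouping logic."""
    if "MAC" in key_function_text or "convolution" in key_function_text:
        return "compute_accelerators"
    return "general_modules"

group = classify(module_graph, "MAC computation array for convolution")
assert group == "compute_accelerators"
```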
The specific implementation of adding the target hardware core module to the hardware core library can refer to the process of establishing the hardware core library (OSH IP library) described in connection with fig. 3 in other embodiments, and will not be described in detail here. In addition, the step of adding the target hardware core module to the hardware core library given above may be triggered after the code generation and test verification of the target hardware core module by the third agent and the fourth agent.
2.3, A third agent 23 (the "RTL writer" agent shown in fig. 2a, etc.) and a fourth agent 24 (the "RTL tester" agent shown in fig. 2a, etc.).
After the second agent 22 generates a comprehensive module description document (such as the Arch document shown in fig. 2c) and a module design summary document for the newly designed hardware core module, these documents are handed over to the code generation and verification stage.
As in the example given with reference to fig. 2c, the code generation and verification phase can be divided into four steps: 1) IP code generation, 2) IP code verification, 3) SoC integration, and 4) SoC verification. Steps 1) and 3) are performed by the third agent 23 (i.e., the third agent is responsible for converting the corresponding description document, e.g., the module description document, into RTL code, such as Verilog code), and steps 2) and 4) are performed by the fourth agent 24. The above IP code refers to RTL code, which may come from the OSH IP library and/or be generated using an LLM. IP code generation and IP code verification mean generating a corresponding IP code (i.e., RTL code) for the hardware core module and verifying that IP code. In addition, RTL code generation follows a hierarchical structure and a data-flow-driven direction; specifically, the third agent 23 and the fourth agent 24 perform their respective tasks in a hierarchy-driven and data-flow-driven manner based on the received module description documents and the like.
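The four-step flow can be sketched as an orchestration skeleton. Every helper below is a placeholder standing in for LLM-driven agent behavior, which the application does not specify at the code level:

```python
def third_agent_generate_rtl(doc):       # placeholder: LLM-based RTL generation
    return f"// RTL for {doc['name']}"

def fourth_agent_verify_ip(rtl, doc):    # placeholder: UVM-based IP verification
    assert doc["name"] in rtl

def third_agent_integrate(rtl, library): # placeholder: SoC integration
    return [rtl] + library

def fourth_agent_verify_soc(soc):        # placeholder: SoC-level verification
    assert len(soc) > 0

def code_generation_and_verification(module_doc, osh_ip_library):
    """Sketch of the four-step flow: 1) IP code generation, 2) IP code
    verification, 3) SoC integration, 4) SoC verification."""
    rtl = third_agent_generate_rtl(module_doc)         # step 1
    fourth_agent_verify_ip(rtl, module_doc)            # step 2
    soc = third_agent_integrate(rtl, osh_ip_library)   # step 3
    fourth_agent_verify_soc(soc)                       # step 4
    return soc

soc = code_generation_and_verification({"name": "NPU"}, ["// RTL for CVA6 CPU"])
assert len(soc) == 2
```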
Specifically, for a self-designed hardware core module (an NPU module generated by the second agent 22 through self-design, as described above in connection with fig. 4a), the third agent 23 may use a fine-tuned LLM (e.g., the Llama 3.1 70B model) to generate RTL code for the hardware core module upon receiving the corresponding module description document and module design summary document. In addition, the module description document (or the module description document and the module design summary document) may also be provided to the fourth agent 24 to facilitate the development of the reference hardware core module and the test cases. Upon receiving the RTL code from the third agent 23, the fourth agent 24 performs test verification using the established UVM verification environment and tools such as VCS. Once all test cases pass, the self-designed hardware core module (hardware IP module) is assembled and integrated with the other hardware core modules obtained from the OSH IP library, and SoC-level test verification is carried out. In the present application, the procedures for hardware IP verification and SoC verification are the same, except that the test objects differ. For the other hardware core modules obtained from the OSH IP library, their RTL code may be obtained from the OSH IP library or generated using an LLM, which is not limited herein.
It should be noted here that the hardware core module marked with the "Self IP" identifier in the figures (e.g., fig. 2c) represents the hardware core module newly modified and designed by the second agent 22, rather than one previously stored in the hardware core library (e.g., the OSH IP library). For a hardware core module found in the hardware core library, its module code may be obtained from the hardware core library; of course, in other embodiments, the module code may also be generated using an LLM. For example, the second agent 22 may send the module description information of a hardware core module obtained from the hardware core library to the third agent 23 and the fourth agent 24, so that the third agent 23 and the fourth agent 24 cooperatively generate the module code of that hardware core module.
For example, referring to the example given in dashed box a in fig. 5, when the third agent 23 performs its task in a hierarchy-driven (bottom-up) and data-flow-driven manner according to the module description document of one NPU module included in the chip architecture, it first extracts the name, structure, function description, and the like of each PE unit on the third level (Level3) from the module description document, and then generates a corresponding unit code file for each PE unit according to the extracted information. After the unit code file of each unit on Level3 is generated, the name, structure, function description, and the like of each unit on the second level (Level2) are extracted from the module description document, so that a corresponding unit code file is generated for each such unit. Taking the PE array on Level2 as an example, its unit code file is generated according to the extracted information such as the name, structure, and function description of the PE array, its relationships with other external units, the relationships among the PE units inside it, and the transmission path of the data stream inside the PE array. From the unit code files of the units on Level2 and the data-flow relationships among those units, the level code file of Level2 can be generated; this Level2 code file also serves as the unit code file of the computing unit on the first level (Level1). After the unit code file of each unit on Level2 is generated, the name, structure, function description, and the like of each unit on Level1 are extracted from the module description document so as to generate the corresponding unit code files.
The unit code files of the computing unit, the storage unit, and the like on Level1 are generated in the same way as described above for Level2 and Level3. Based on all the generated unit code files, the module code file of the NPU module is finally generated.
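The bottom-up, level-by-level generation order described above can be sketched as follows; `describe` stands in for extracting unit information from the module description document, and all names are illustrative:

```python
def generate_module_code(hierarchy, describe):
    """Sketch of the bottom-up generation order: unit code files for the
    deepest level are generated first, then each shallower level, ending
    with the units that make up the module-level file. `describe(unit)`
    stands in for extracting name/structure/function text from the
    module description document."""
    code_files = {}
    for level in sorted(hierarchy, reverse=True):  # e.g. Level3 -> Level2 -> Level1
        for unit in hierarchy[level]:
            # deeper-level files already exist in code_files by the time a
            # shallower unit that instantiates them is generated
            code_files[unit] = f"// {describe(unit)}"
    return code_files

hierarchy = {1: ["compute", "storage"], 2: ["PE_array"], 3: ["PE"]}
files = generate_module_code(hierarchy, lambda u: f"unit {u}")
assert list(files) == ["PE", "PE_array", "compute", "storage"]
```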
It should be noted that, in the present application, a code file (e.g., a unit code file) may be, but is not limited to, a Verilog code file, which is hardware description language (HDL) code, specifically code at the RTL level. For the third agent 23, in order to enhance the LLM's capability in Verilog code generation, the application fine-tunes the LLM (e.g., the Llama 3.1 70B model) on an open-source dataset and integrates the fine-tuned LLM as a core component of the third agent. That is, the code generation described in the above example is performed by the third agent 23 using the fine-tuned LLM inside it. In addition, as shown in step 2) of fig. 2c, after generating the module code file of a hardware core module, the third agent 23 may also call a code analysis tool (e.g., Linter, a static code analysis tool) to detect potential errors, style inconsistencies, and other problems in the module code, and output an inspection report. The inspection report (Linter report) typically contains a series of warnings and error messages, indicating where the code needs improvement, together with modification suggestions and the like. The third agent 23 modifies the generated module code file according to the inspection report until it is free of errors, and then issues the module code file to the fourth agent 24 for test verification.
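The lint-and-fix iteration can be sketched as a simple loop; `run_linter` and `llm_fix` are hypothetical stand-ins for the code analysis tool call and the LLM-based modification:

```python
def lint_and_fix(code, run_linter, llm_fix, max_rounds=5):
    """Sketch of the third agent's lint loop: run a static analysis tool,
    and if the inspection report contains warnings/errors, modify the code
    (here via an LLM stand-in); repeat until the report is clean."""
    for _ in range(max_rounds):
        report = run_linter(code)     # returns a list of issues
        if not report:
            return code               # clean -> hand off to the fourth agent
        code = llm_fix(code, report)  # modify code per the inspection report
    raise RuntimeError("code still has lint issues after max_rounds")

# toy linter: flags code that still contains the string "TODO"
fixed = lint_and_fix(
    "assign y = x; // TODO width",
    run_linter=lambda c: ["TODO found"] if "TODO" in c else [],
    llm_fix=lambda c, r: c.replace("TODO width", "width checked"),
)
assert "TODO" not in fixed
```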
Further, referring to what is shown in dashed box b in fig. 5, in order to speed up the testing process, the present application creates a UVM (Universal Verification Methodology) verification platform environment in advance, which enables the fourth agent 24 to focus on developing test cases and the reference hardware core model. For example, following the foregoing example, the fourth agent 24 receives the module code file of the NPU module from the third agent 23, and invokes a simulation tool (e.g., VCS or ModelSim) on the established UVM verification platform to test and verify the module code of the NPU module based on the test cases and the reference hardware core module generated for the NPU. When the test verification fails, a test verification report is fed back to the third agent 23 so that the third agent 23 modifies the module code of the NPU module based on the report. The test verification method is to use the test cases as the input of the reference hardware core module and of the NPU module respectively, and to determine the test verification result by comparing the output results of the two.
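The comparison-based verification (each test case fed to both the reference model and the design under test, outputs compared) can be sketched as follows; the ReLU example is purely illustrative:

```python
def verify_against_reference(test_cases, reference_model, dut_model):
    """Sketch of the comparison-based test verification: feed each test
    case to both the reference hardware core model and the design under
    test; verification passes only if every pair of outputs matches
    (a stand-in for a UVM scoreboard comparison)."""
    failures = []
    for case in test_cases:
        if reference_model(case) != dut_model(case):
            failures.append(case)
    return failures  # empty list -> test verification passed

# toy example: reference and DUT both implement ReLU, so outputs match
ref = lambda x: max(x, 0)           # reference model
dut = lambda x: x if x > 0 else 0   # design under test
assert verify_against_reference([-3, 0, 5], ref, dut) == []
```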
After the test verification is passed, before the SoC integration of step 3) is performed, i.e., before the hardware core modules are assembled and integrated, as shown in dashed box c in fig. 5, the third agent 23 evaluates the interface compatibility between the different hardware IP modules. If any interface is found to be incompatible, the third agent 23 develops an adapter module according to a predefined interface template (e.g., the illustrated adaptation layer template). These templates mainly solve three types of problems: data width, clock domain crossing, and protocol consistency. The third agent adjusts these templates as needed; specifically, it generates the final adapter code according to the specific interface requirements between the hardware IP modules. Finally, the respective hardware core modules are assembled and integrated. Specifically, the third agent performs logic synthesis processing on the module codes of the corresponding hardware core modules to obtain a chip-level code file (such as the dssoc.v file shown in fig. 2c, specifically a netlist file). The third agent 23 then issues the chip-level code file to the fourth agent 24 for system-level test verification. Once the test verification passes, the chip-level code file is transferred to the physical agent (the fifth agent) to enter the EDA physical implementation stage for subsequent physical implementation.
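The interface-compatibility check and template-based adapter generation might be sketched as follows; the interface fields and template names are assumptions for illustration:

```python
def build_adapter(src_if, dst_if, templates):
    """Sketch of adapter generation before SoC integration: compare two
    interfaces, pick the matching predefined template (data width, clock
    domain crossing, or protocol consistency), and specialize it.
    Interface fields and template names are illustrative assumptions."""
    if src_if["width"] != dst_if["width"]:
        kind = "width_converter"
    elif src_if["clock"] != dst_if["clock"]:
        kind = "cdc_synchronizer"
    elif src_if["protocol"] != dst_if["protocol"]:
        kind = "protocol_bridge"
    else:
        return None  # interfaces already compatible, no adapter needed
    return templates[kind].format(src=src_if["name"], dst=dst_if["name"])

templates = {"width_converter":  "adapter {src}->{dst}: width convert",
             "cdc_synchronizer": "adapter {src}->{dst}: CDC sync",
             "protocol_bridge":  "adapter {src}->{dst}: protocol bridge"}
npu_if = {"name": "NPU", "width": 64, "clock": "clk_a", "protocol": "AXI"}
dsp_if = {"name": "DSP", "width": 32, "clock": "clk_a", "protocol": "AXI"}
assert build_adapter(npu_if, dsp_if, templates) == "adapter NPU->DSP: width convert"
```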
2.4, A fifth agent 25 (the "EDA expert" agent shown in figs. 2a, 2d, etc.)
As shown with reference to fig. 2d, the fifth agent is provided internally with the relevant EDA scripts and tool manuals. At the EDA physical implementation stage, the physical agent develops custom scripts and applies them to the corresponding EDA tools (e.g., Genus and Innovus) according to specific requirements. Through iterative adjustment, the physical agent eventually generates and outputs a GDSII file (a physical layout file) that meets the specified design requirements.
Illustratively, the GDSII file generation process may include the following steps:
S1, synthesis, namely converting the code in the chip-level code file into a gate-level netlist.
S2, layout planning, namely determining the overall layout of the chip.
S3, power planning, namely designing a power distribution network.
S4, layout, namely placing the logic unit on a chip.
S5, clock Tree Synthesis (CTS) is implemented by generating a clock tree to ensure low jitter of a clock signal.
S6, wiring, namely connecting all logic units and a power supply network.
S7, static timing analysis (STA), namely verifying whether the timing meets the requirements.
Through the above, after the physical implementation steps of synthesis, layout, wiring and the like are completed, a physical view (which is a chip layout) of the chip design is formed, so that a design file of the chip can be generated based on the physical view. According to the design file, subsequent chip manufacture can be performed.
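The S1-S7 physical implementation sequence can be sketched as a script-driven loop; `run_step` stands in for invoking the actual EDA tools (e.g., Genus/Innovus) via custom scripts, which are not shown here:

```python
def physical_implementation(netlist, run_step):
    """Sketch of the fifth agent's EDA flow: run the S1-S7 steps in order
    (synthesis, floorplanning, power planning, placement, clock tree
    synthesis, routing, static timing analysis). `run_step` is a
    placeholder for driving the real EDA tools with custom scripts."""
    steps = ["synthesis", "floorplan", "power_plan", "placement",
             "clock_tree_synthesis", "routing", "static_timing_analysis"]
    artifact = netlist
    for step in steps:
        artifact = run_step(step, artifact)  # iterative adjustment per step
    return artifact                          # final layout -> GDSII export

log = []
gds = physical_implementation(
    "soc_netlist",
    lambda step, art: (log.append(step), f"{art}+{step}")[1],
)
assert log[-1] == "static_timing_analysis"
```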
In summary, the execution tasks of the third agent 23, the fourth agent 24 and the fifth agent 25 can be briefly described as follows:
A third agent 23, configured to perform a code generation task, so as to generate the module code of a hardware core module according to the module description document of the hardware core module;
A fourth agent 24, configured to perform a code testing task, generate a test case and a reference hardware core module according to a module description document of the hardware core module, and test a module code of the target hardware core module based on the test case and the reference hardware core module;
The third agent 23 is further configured to, after determining that the module codes pass the code verification task performed by the fourth agent, perform logic synthesis processing on the module codes of the hardware core modules (including the hardware core module newly designed by the second agent and the hardware core modules found in the hardware core library) to obtain a chip-level code file;
the fourth agent 24 is further configured to test the chip-level code file;
And the fifth agent 25 is configured to trigger to execute a physical implementation task after determining that the chip-level code file passes the test, so as to generate a chip layout according to the chip-level code file, and generate the chip design file according to the chip layout. The chip design file contains chip layout information and is used for chip manufacturing.
It should be added that the chip-level code file in the present application can be understood as a netlist (see, for example, the netlist shown in fig. 1, e.g., a gate-level netlist). A netlist is a text representation of a circuit design and an important element in the chip design flow: it converts a high-level design description into a low-level logic-gate-level representation for subsequent physical implementation and verification. That is, the netlist can be considered an important step in converting from a high-level design description (e.g., Verilog or VHDL code) to a physical implementation (layout design).
To verify the present application, two chip design scenarios are listed below.
With the scheme of the application, two domain-specific system-on-chips (DSSoCs) were developed for the Internet of Things (IoT) and mobile application domains, with the design information listed in table 2. Fig. 6 shows the generated chip layouts for these two domains. The Internet-of-Things chip (whose layout is shown as case A on the left in fig. 6) is intended to support the MobileNet (a lightweight deep learning model designed specifically for mobile and embedded devices), ResNet (residual neural network), and DS-CNN (depthwise separable convolutional neural network) algorithms, while the mobile system-on-chip (whose layout is shown as case B on the right in fig. 6) is capable of running ViT, Llama2-7B, and 3D GS (3D Gaussian splatting). For case A, the chip architecture includes a CPU, a neural network processing unit (NPU), and a digital signal processor (DSP), where the NPU adds sparse computation and img2col support. For case B, the chip architecture integrates a CPU, a GPU, and an NPU, the GPU being extended with a special function unit (SFU) and a tensor computation module, and the NPU being further optimized for INT8/FP16 mixed-precision computation and sparse computation.
In the process of code generation and SoC integration, case A adopts the OSH CVA6 as the CPU, integrates several OSH modules to construct the DSP, and independently develops the NPU. For case B, a single-core OSH C910 CPU is used, and Nyuzi is chosen as the GPU; the GPU is further enhanced by adding an SFU and tensor computation units. The third agent (the "RTL writer" agent) also independently developed the NPU code for case B. Finally, the physical agent (the "EDA expert" agent) implements case A with a flat physical design flow, while case B combines a modular approach with a hierarchical physical design flow. Different technology nodes (7 nm and 22 nm) are used depending on different cost considerations. After the entire workflow design process is completed, the final designs of the two SoCs are obtained, as shown in fig. 6 and table 2.
TABLE 2 design information
Table 2 above shows the detailed design specifications of the two chips, whose operating frequencies are 500 MHz and 1 GHz, respectively. The total area and power consumption are 4.0 square millimeters and 419.6 milliwatts for case A, and 30.5 square millimeters and 22.6 watts for case B. Considering manual debugging and tool use in the design flow, the design time for case A is about 2 weeks, while case B takes about 4 weeks. In contrast, an existing agilely designed SoC (system on chip) is reported to have a design time of 10-12 weeks. Clearly, the scheme of the application significantly shortens the design cycle, demonstrating its efficiency in rapid design.
Fig. 7 illustrates the energy-efficiency and performance improvements for case A and case B. By designing DSA (domain-specific architecture) chips with the solution of the present application, significant efficiency and performance gains are obtained. For example, the MobileNet algorithm achieves a 23.45-fold efficiency improvement and a 26.72-fold performance improvement over the baseline case A architecture. In the architecture optimization process of the present application, additional 1.47-fold and 1.34-fold improvements are obtained. For case B, the evaluation results show that the ViT algorithm achieves an impressive 36.64-fold energy-efficiency and 43.44-fold performance improvement on the final improved architecture.
In addition, the application also compares the energy efficiency of the existing internet of things terminal STM32MP25 SoC and case A, and the energy efficiency of the mobile terminal Jetson Nano and case B. As shown in FIG. 8, the DS-CNN operating on case A is 23.81 times more energy efficient than STM32MP25, while Llama2-7B operating on case B is 32.43 times more energy efficient than Jetson Nano. Other algorithms also exhibit varying degrees of improvement.
To verify the impact of the multi-agent system in the methodology of the present application, workflow performance under different numbers of agents was also evaluated. As shown in fig. 9, the application demonstrates how the execution workflow of a neural processing unit (NPU) and the code generation process of a systolic array vary with the number of agents, normalized to the highest energy efficiency, smallest total area, and shortest running time. For the NPU design task, when the number of agents in the workflow is excessive, e.g., up to 10, the design time reaches a maximum of about 7 times that with 6 agents, while the energy efficiency is minimized. Likewise, for code generation of the systolic array, the code-writing process involving 5 agents results in a design time of almost 5 times the shortest time, with the total area expanding to about 3 times the smallest area. This phenomenon is mainly due to the increased communication overhead between agents and the information distortion that may occur during communication, which ultimately reduces efficiency.
The application also provides a chip design method based on the system provided by the application. Specifically, the method comprises the following steps:
s1, executing the following operation steps by using a plurality of agents to cooperatively:
s11, receiving a design request of a user, wherein the design request comprises chip design requirement information;
and S12, executing chip design processing based on the chip design requirement information to generate a chip layout file.
For a specific implementation of each step in the above method embodiment, reference may be made to the relevant content in other embodiments. In addition, the above method embodiment may include not only the steps shown above but also other related steps; for these, reference may likewise be made to the relevant content in other embodiments, which will not be described in detail herein.
The application further provides electronic equipment. The electronic device is provided with the system provided in the other embodiments of the application. The electronic device may be various terminal devices such as a smart phone, a personal computer (e.g., desktop computer, notebook computer), a tablet, etc., or may be a server device such as a single server, a server cluster, a virtual server, etc.
Further, the electronic device may also include other components, such as an interface module, a control panel, and the like.
Accordingly, the present application also provides a computer-readable storage medium storing a computer program, where the computer program is executed by a computer to implement the steps or functions of the method embodiments provided by the present application.
And, embodiments of the present application also provide a computer program product, which includes a computer program, and the computer program can implement the steps or functions in the method embodiments provided by the present application when the computer program is executed by a processor.
From the above description of the embodiments, it will be apparent to those skilled in the art that the embodiments may be implemented by means of software plus a necessary general hardware platform, or of course by means of hardware. Based on such understanding, the foregoing technical solution may be embodied, essentially or in the part contributing to the prior art, in the form of a software product, which may be stored in a computer-readable storage medium, such as a ROM (Read-Only Memory)/RAM (Random Access Memory), magnetic disk, optical disk, etc., and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to execute the method described in the respective embodiments or in some parts of the embodiments.
It should be noted that the above-mentioned embodiments are merely for illustrating the technical solution of the present application, and not for limiting the same, and although the present application has been described in detail with reference to the above-mentioned embodiments, it should be understood by those skilled in the art that the technical solution described in the above-mentioned embodiments may be modified or some technical features may be equivalently replaced, and these modifications or substitutions do not make the essence of the corresponding technical solution deviate from the spirit and scope of the technical solution of the embodiments of the present application.

Claims (10)

1. A chip design system based on multiple agents is characterized by comprising multiple agents;
the plurality of agents cooperatively perform the following:
receiving a design request of a user, wherein the design request comprises chip design requirement information;
And executing chip design processing based on the chip design requirement information to generate a chip design file, wherein the chip design file contains chip layout information.
2. The system of claim 1, wherein the chip design requirement information includes required functional information including algorithm information, wherein the plurality of agents includes at least one executing agent including a first agent and a second agent;
The first agent is used for generating a functional relation diagram based on the algorithm information; each node in the functional relation diagram represents a functional module, and each edge represents the data dependency relation between two functional modules;
The second agent is provided with a memory module, wherein the memory module is embedded with a hardware core library, and the second agent is used for distributing a hardware core module for each functional module in the functional relation diagram to obtain module information of at least one required hardware core module, and searching for the matched hardware core module from the hardware core library according to the module information of the hardware core module, wherein the module information comprises configuration information of the hardware core module and the corresponding functional module.
3. The system of claim 2, wherein the target hardware core module is one of the at least one hardware core module required, and
The second agent is further configured to generate, when the matching target hardware core module is not found from the hardware core library, a hardware core module as the target hardware core module based on module information of the target hardware core module.
4. The system of claim 3, wherein the second agent, when configured to generate a hardware core module as the target hardware core module based on the module information of the target hardware core module, is specifically configured to:
Determining an initial hardware core module, wherein the type of the initial hardware core module is the same as that of a required target hardware core module;
Modifying the initial hardware core module based on the module information of the target hardware core module;
determining the modified initial hardware core module as the target hardware core module;
The initial hardware core module comprises at least one layer, each layer is provided with at least one hardware unit, the modification comprises at least one modification operation of the hardware units on the layer in a top-down mode, wherein the at least one modification operation comprises the steps of deleting redundant hardware units, adding hardware units and modifying hardware units, and when one hardware unit is added and/or modified, determining a processing unit to be contained in the hardware unit according to data stream driving.
5. The system of claim 4, wherein the second agent is further configured to:
generating a module diagram structural document and a module key function description according to the related information of the target hardware core module;
Adding the target hardware core module into an adaptive module classification group in the hardware core library according to the module diagram structural document and the module key function description;
The related information comprises a module code of a target hardware core module acquired from a third agent, wherein the third agent is one of the at least one execution agent, and one node in the module diagram structure represents one hardware unit in the target hardware core module and the attribute and edge of the hardware unit represent the relation among different units in the target hardware core module.
6. The system of any one of claims 3 to 5, wherein the at least one executing agent further comprises a third agent, a fourth agent, a fifth agent;
the second agent is further configured to generate a module description document of the target hardware core module;
the third agent is used for generating a module code of the target hardware core module according to the module description document;
The fourth agent is configured to generate a test case and a reference hardware core module according to the module description document, and test a module code of the target hardware core module based on the test case and the reference hardware core module;
The third agent is further configured to perform logic synthesis processing on the module code of the target hardware core module and the module code of each hardware core module found from the hardware core library after determining that the module code of the target hardware core module passes the test, so as to obtain a chip-level code file;
The fourth agent is further configured to test the chip-level code file;
The fifth agent is configured to generate a chip layout according to the chip-level code file that passes the test, and generate the chip design file according to the chip layout.
7. The system of claim 6, wherein the plurality of agents further comprises a management agent;
The management agent is used for receiving a design request of a user and executing chip design task decomposition based on chip design requirement information contained in the design request so as to allocate an adaptive processing task for the at least one execution agent.
8. The chip design method based on the multiple agents is characterized by comprising the following steps of:
the following operation steps are cooperatively performed by using a plurality of agents:
receiving a design request of a user, wherein the design request comprises chip design requirement information;
and executing chip design processing based on the chip design requirement information to generate a chip layout file.
9. An electronic device comprising a system as claimed in any one of the preceding claims 1 to 7.
10. A computer readable storage medium, characterized in that the computer readable storage medium has stored therein a computer program which, when executed by a computer, is capable of realizing the method of the preceding claim 8.
CN202510314033.1A 2025-03-17 2025-03-17 Chip design system and method based on multi-agent Pending CN120217995A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202510314033.1A CN120217995A (en) 2025-03-17 2025-03-17 Chip design system and method based on multi-agent

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202510314033.1A CN120217995A (en) 2025-03-17 2025-03-17 Chip design system and method based on multi-agent

Publications (1)

Publication Number Publication Date
CN120217995A 2025-06-27

Family

ID=96100437

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202510314033.1A Pending CN120217995A (en) 2025-03-17 2025-03-17 Chip design system and method based on multi-agent

Country Status (1)

Country Link
CN (1) CN120217995A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN121257419A (en) * 2025-12-03 2026-01-02 珠海硅芯科技有限公司 Cross-level error tracking and repair method and system for stacked chip systems

Similar Documents

Publication Publication Date Title
US10372859B2 (en) System and method for designing system on chip (SoC) circuits using single instruction multiple agent (SIMA) instructions
Teich Hardware/software codesign: The past, the present, and predicting the future
US11256845B2 (en) Machine-learning driven prediction in integrated circuit design
WO2021190597A1 (en) Processing method for neural network model, and related device
Xiao et al. Plasticity-on-chip design: Exploiting self-similarity for data communications
CN118171609B (en) Key path delay optimization method based on logic netlist
US20240020537A1 (en) Methodology to generate efficient models and architectures for deep learning
US12277374B2 (en) Synthesis placement bounds based on physical timing analysis
US12175176B2 (en) Fast synthesis of logical circuit design with predictive timing
Antonov et al. Algo500—a new approach to the joint analysis of algorithms and computers
CA3187339A1 (en) Reducing resources in quantum circuits
Yang et al. A new design approach of hardware implementation through natural language entry
CN120217995A (en) Chip design system and method based on multi-agent
Esmaeilzadeh et al. An Open-Source ML-Based Full-Stack Optimization Framework for Machine Learning Accelerators
US20220075920A1 (en) Automated Debug of Falsified Power-Aware Formal Properties using Static Checker Results
Odetola et al. 2l-3w: 2-level 3-way hardware-software co-verification for the mapping of deep learning architecture (dla) onto fpga boards
Abarajithan et al. Cgra4ml: A framework to implement modern neural networks for scientific edge computing
US20230126888A1 (en) Computation of weakly connected components in a parallel, scalable and deterministic manner
Lu et al. GAN-place: Advancing open source placers to commercial-quality using generative adversarial networks and transfer learning
Odetola et al. 2l-3w: 2-level 3-way hardware–software co-verification for the mapping of convolutional neural network (cnn) onto fpga boards
US12223248B1 (en) System, method, and computer program product for optimization-based printed circuit board design
Chang et al. Large processor chip model
US20230214574A1 (en) Extended regular expression matching in a directed acyclic graph by using assertion simulation
US12367334B2 (en) Runtime and memory efficient attribute query handling for distributed engine
Zhang et al. Generative AI through CAS lens: An integrated overview of algorithmic optimizations, architectural advances, and automated designs

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination