CN115756296A

CN115756296A - Buffer management method and device, control program and controller

Info

Publication number: CN115756296A
Application number: CN202111028812.3A
Authority: CN
Inventors: 李亚文; 刘衡祁; 徐金林
Original assignee: Sanechips Technology Co Ltd
Current assignee: Sanechips Technology Co Ltd
Priority date: 2021-09-02
Filing date: 2021-09-02
Publication date: 2023-03-07
Also published as: WO2023030195A1

Abstract

The embodiment of the application provides a cache management method and device, a control program and a controller; the method comprises the following steps: the cache management unit MMU identifies the type of an externally connected cache controller based on the CPU configuration information; confirming an offset address in a table look-up mode based on an address management submodule and an address area corresponding to the type of the cache controller; calculating a logical address of the type of the cache controller based on the offset address; wherein, the number of the external connection channels gated by different cache controller types is different. Through the application, the problem that different controllers are compatible under the same frame is solved, and then multiple controllers can be compatible under the same frame, and the effect of improving the storage efficiency is achieved.

Description

Buffer management method and device, control program and controller

技术领域technical field

本申请涉及通信技术领域，具体而言，涉及一种缓存管理方法和装置、控制程序及控制器。The present application relates to the technical field of communications, and in particular, to a cache management method and device, a control program, and a controller.

背景技术Background technique

在以太网交换芯片的应用中，往往需要根据应用场景和成本等，选择不同型号的片外缓存单元，即缓存控制器，所以芯片的缓存管理单元(Memory Management Unit,简称MMU)不仅要能够满足基本的功能需求，且要具有良好的兼容性和可移植性，以便在不同的应用场景下，根据存储大小、速率、功耗和成本等因素连接不同的缓存控制器，而不需要进行多次开发，从而节省人力和成本。In the application of Ethernet switching chips, it is often necessary to select different types of off-chip cache units, that is, cache controllers, according to application scenarios and costs. Therefore, the memory management unit (MMU) of the chip must not only meet Basic functional requirements, and must have good compatibility and portability, so that in different application scenarios, different cache controllers can be connected according to factors such as storage size, speed, power consumption, and cost, without the need for multiple development, thereby saving manpower and cost.

随着处理器和存储控制器频率和带宽的不断提升，其各自性能也在不断提高，但缓存访问效率却往往成为系统性能的瓶颈问题，而缓存访问效率与MMU的实现方式有关。在以太网交换芯片中，MMU的主要功能是负责报文(packet,PK)的数据、报文描述符(packetdescriptor,PD)的写请求、写数据分发，并按照写请求保序释放；再将数据根据读请求保序读回，在这个过程中，通过片外地址管理、物理地址映射和拼包等技术尽可能最大化利用缓存带宽，以提高存储控制器的效率。当下主流的缓存控制器有DDR3/DDR4/DDR5(DoubleData Rate,DDR双倍速率)、HBM(High Bandwidth Memory)(高带宽存储器)等，如何在同一种框架下兼容不同的控制器且能够保证其存储效率是待解决的主要问题。As the frequency and bandwidth of processors and storage controllers continue to increase, their respective performances are also improving, but the cache access efficiency often becomes the bottleneck of system performance, and the cache access efficiency is related to the implementation of the MMU. In the Ethernet switch chip, the main function of the MMU is to be responsible for the data of the packet (packet, PK), the write request of the packet descriptor (packet descriptor, PD), and the distribution of the written data, and to release them in order according to the write request; The data is read back in order according to the read request. During this process, technologies such as off-chip address management, physical address mapping, and packetization are used to maximize the use of cache bandwidth to improve the efficiency of the memory controller. The current mainstream cache controllers include DDR3/DDR4/DDR5 (Double Data Rate, DDR double rate), HBM (High Bandwidth Memory) (high bandwidth memory), etc. How to be compatible with different controllers under the same framework and ensure their Storage efficiency is the main issue to be solved.

针对上述如何在同一种框架下兼容不同的控制器的问题，目前尚未提出有效的解决方案。For the above-mentioned problem of how to be compatible with different controllers under the same framework, no effective solution has been proposed so far.

发明内容Contents of the invention

本申请实施例提供了一种缓存管理方法和装置、控制程序及控制器，以至少解决相关技术中如何在同一种框架下兼容不同的控制器的问题。Embodiments of the present application provide a cache management method and device, a control program, and a controller, so as to at least solve the problem of how to be compatible with different controllers under the same framework in the related art.

根据本申请的一个实施例，提供了一种缓存管理方法，包括：缓存管理单元MMU基于CPU配置信息，识别外接的缓存控制器类型；基于地址管理子模块和上述缓存控制器类型对应的地址区域，以查表的方式确认偏移地址；基于上述偏移地址计算出上述缓存控制器类型的逻辑地址；其中，不同的上述缓存控制器类型选通的对外连接通道数不同。According to an embodiment of the present application, a cache management method is provided, including: the cache management unit MMU identifies the external cache controller type based on the CPU configuration information; based on the address management submodule and the address area corresponding to the above cache controller type , confirming the offset address by means of table lookup; calculating the logical address of the above-mentioned cache controller type based on the above-mentioned offset address; wherein, the number of external connection channels selected by different cache controller types is different.

根据本申请的另一个实施例，提供了一种缓存管理装置，包括：识别单元，用于使缓存管理单元MMU基于CPU配置信息，识别外接的缓存控制器类型；确认单元，用于基于地址管理子模块和上述缓存控制器类型对应的地址区域，以查表的方式确认偏移地址；计算单元，用于基于上述偏移地址计算出上述缓存控制器类型的逻辑地址；其中，不同的上述缓存控制器类型选通的对外连接通道数不同。According to another embodiment of the present application, a cache management device is provided, including: an identification unit for enabling the cache management unit MMU to identify the type of an external cache controller based on CPU configuration information; a confirmation unit for address-based management The address area corresponding to the submodule and the above-mentioned cache controller type confirms the offset address by means of a look-up table; the calculation unit is used to calculate the logical address of the above-mentioned cache controller type based on the above-mentioned offset address; wherein, the different above-mentioned cache The number of external connection channels gated by the controller type is different.

根据本申请的又一个实施例，还提供了一种计算机可读存储控制程序，上述计算机可读存储控制程序中存储有计算机程序，其中，上述计算机程序被设置为运行时执行上述任一项方法实施例中的步骤。According to yet another embodiment of the present application, a computer-readable storage control program is also provided, wherein a computer program is stored in the above-mentioned computer-readable storage control program, wherein the above-mentioned computer program is configured to perform any one of the above-mentioned methods when running Steps in the examples.

根据本申请的又一个实施例，还提供了一种控制器，包括缓存器和处理器，上述控制器中存储有计算机程序，上述处理器被设置为运行上述计算机程序以执行上述任一项方法实施例中的步骤。According to another embodiment of the present application, there is also provided a controller, including a buffer and a processor, the above-mentioned controller stores a computer program, and the above-mentioned processor is configured to run the above-mentioned computer program to perform any one of the above-mentioned methods Steps in the examples.

通过本申请，由于采用了缓存管理单元MMU基于CPU配置信息，识别外接的缓存控制器类型；基于地址管理子模块和上述缓存控制器类型对应的地址区域，以查表的方式确认偏移地址；基于上述偏移地址计算出上述缓存控制器类型的逻辑地址；其中，不同的上述缓存控制器类型选通的对外连接通道数不同；因此，实现了同一种框架下不同的控制器的切换，可以解决同一种框架下兼容不同的控制器的问题，进而在同一框架下可兼容多种控制器，以及提高存储效率的效果。Through this application, because the cache management unit MMU is used to identify the type of external cache controller based on the CPU configuration information; based on the address management submodule and the address area corresponding to the above cache controller type, the offset address is confirmed in a table lookup manner; The logical address of the above-mentioned cache controller type is calculated based on the above-mentioned offset address; wherein, the number of external connection channels selected by different above-mentioned cache controller types is different; therefore, the switching of different controllers under the same framework is realized, which can Solve the problem of being compatible with different controllers under the same framework, and then be compatible with multiple controllers under the same framework, and improve the effect of storage efficiency.

附图说明Description of drawings

图1是根据本申请实施例的缓存管理方法的移动终端的硬件结构框图；FIG. 1 is a block diagram of a hardware structure of a mobile terminal according to a cache management method according to an embodiment of the present application;

图2是根据本申请实施例的缓存管理方法的流程图；FIG. 2 is a flowchart of a cache management method according to an embodiment of the present application;

图3是根据本申请实施例的缓存管理系统的架构示意图；FIG. 3 is a schematic diagram of the architecture of a cache management system according to an embodiment of the present application;

图4是根据本申请实施例的缓存管理方法中的HBM的optionE模式的示意图；FIG. 4 is a schematic diagram of an optionE mode of HBM in a cache management method according to an embodiment of the present application;

图5是根据本申请实施例的缓存管理方法中的地址对应关系的示意图一；FIG. 5 is a first schematic diagram of address correspondence in a cache management method according to an embodiment of the present application;

图6是根据本申请实施例的缓存管理方法中的地址对应关系的示意图二；FIG. 6 is a second schematic diagram of address correspondence in a cache management method according to an embodiment of the present application;

图7是根据本申请实施例的缓存管理方法中的地址对应关系的示意图三；FIG. 7 is a schematic diagram 3 of address correspondence in the cache management method according to an embodiment of the present application;

图8是根据本申请实施例的缓存管理方法中的地址对应关系的示意图四；FIG. 8 is a schematic diagram 4 of address correspondence in the cache management method according to an embodiment of the present application;

图9是根据本申请实施例的缓存管理方法中的地址映射的示意图一；FIG. 9 is a first schematic diagram of address mapping in a cache management method according to an embodiment of the present application;

图10是根据本申请实施例的缓存管理方法中的地址映射的示意图二；FIG. 10 is a second schematic diagram of address mapping in a cache management method according to an embodiment of the present application;

图11是根据本申请实施例的缓存管理方法中的地址映射的示意图三；FIG. 11 is a third schematic diagram of address mapping in a cache management method according to an embodiment of the present application;

图12是根据本申请实施例的缓存管理方法的片外地址管理示意图；FIG. 12 is a schematic diagram of off-chip address management according to a cache management method according to an embodiment of the present application;

图13是根据本申请实施例的缓存管理装置的结构示意图。FIG. 13 is a schematic structural diagram of a cache management device according to an embodiment of the present application.

具体实施方式Detailed ways

下文中将参考附图并结合实施例来详细说明本申请的实施例。Embodiments of the present application will be described in detail below with reference to the drawings and in combination with the embodiments.

需要说明的是，本申请的说明书和权利要求书及上述附图中的术语“第一”、“第二”等是用于区别类似的对象，而不必用于描述特定的顺序或先后次序。It should be noted that the terms "first" and "second" in the description and claims of the present application and the above drawings are used to distinguish similar objects, but not necessarily used to describe a specific sequence or sequence.

本申请实施例中所提供的方法实施例可以在移动终端、计算机终端或者类似的运算装置中执行。以运行在移动终端上为例，图1是本申请实施例的缓存管理方法的移动终端的硬件结构框图。如图1所示，移动终端可以包括一个或多个(图1中仅示出一个)处理器102(处理器102可以包括但不限于微处理器MCU或可编程逻辑器件FPGA等的处理装置)和用于存储数据的存储器104，其中，上述移动终端还可以包括用于通信功能的传输设备106以及输入输出设备108。本领域普通技术人员可以理解，图1所示的结构仅为示意，其并不对上述移动终端的结构造成限定。例如，移动终端还可包括比图1中所示更多或者更少的组件，或者具有与图1所示不同的配置。The method embodiments provided in the embodiments of the present application may be executed in mobile terminals, computer terminals or similar computing devices. Taking running on a mobile terminal as an example, FIG. 1 is a block diagram of a hardware structure of a mobile terminal according to a cache management method according to an embodiment of the present application. As shown in Figure 1, the mobile terminal may include one or more (only one is shown in Figure 1) processors 102 (processors 102 may include but not limited to processing devices such as microprocessor MCU or programmable logic device FPGA, etc.) and a memory 104 for storing data, wherein the above-mentioned mobile terminal may also include a transmission device 106 and an input and output device 108 for communication functions. Those skilled in the art can understand that the structure shown in FIG. 1 is only for illustration, and it does not limit the structure of the above mobile terminal. For example, the mobile terminal may also include more or fewer components than those shown in FIG. 1 , or have a different configuration from that shown in FIG. 1 .

存储器104可用于存储计算机程序，例如，应用软件的软件程序以及模块，如本申请实施例中的缓存管理方法对应的计算机程序，处理器102通过运行存储在存储器104内的计算机程序，从而执行各种功能应用以及数据处理，即实现上述的方法。存储器104可包括高速随机存储器，还可包括非易失性存储器，如一个或者多个磁性存储装置、闪存、或者其他非易失性固态存储器。在一些实例中，存储器104可进一步包括相对于处理器102远程设置的存储器，这些远程存储器可以通过网络连接至移动终端。上述网络的实例包括但不限于互联网、企业内部网、局域网、移动通信网及其组合。The memory 104 can be used to store computer programs, for example, software programs and modules of application software, such as computer programs corresponding to the cache management method in the embodiment of the present application, and the processor 102 executes various functions by running the computer programs stored in the memory 104 A functional application and data processing, that is, to realize the above-mentioned method. The memory 104 may include high-speed random access memory, and may also include non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid-state memory. In some examples, the memory 104 may further include a memory that is remotely located relative to the processor 102, and these remote memories may be connected to the mobile terminal through a network. Examples of the aforementioned networks include, but are not limited to, the Internet, intranets, local area networks, mobile communication networks, and combinations thereof.

传输装置106用于经由一个网络接收或者发送数据。上述的网络具体实例可包括移动终端的通信供应商提供的无线网络。在一个实例中，传输装置106包括一个网络适配器(Network Interface Controller，简称为NIC)，其可通过基站与其他网络设备相连从而可与互联网进行通讯。在一个实例中，传输装置106可以为射频(Radio Frequency，简称为RF)模块，其用于通过无线方式与互联网进行通讯。The transmission device 106 is used to receive or transmit data via a network. The specific example of the above network may include a wireless network provided by the communication provider of the mobile terminal. In one example, the transmission device 106 includes a network interface controller (NIC for short), which can be connected to other network devices through a base station so as to communicate with the Internet. In one example, the transmission device 106 may be a radio frequency (Radio Frequency, RF for short) module, which is used to communicate with the Internet in a wireless manner.

图2是根据本申请实施例的缓存管理方法的流程图，如图2所示，该流程包括如下步骤：Fig. 2 is a flowchart of a cache management method according to an embodiment of the present application. As shown in Fig. 2, the process includes the following steps:

步骤S202，缓存管理单元MMU基于中央处理器CPU配置信息，识别外接的缓存控制器类型；Step S202, the cache management unit MMU identifies the type of the external cache controller based on the CPU configuration information of the central processing unit;

步骤S204，基于地址管理子模块和上述缓存控制器类型对应的地址区域，以查表的方式确认偏移地址；Step S204, based on the address management submodule and the address area corresponding to the above cache controller type, confirm the offset address by means of table lookup;

步骤S206，基于上述偏移地址计算出上述缓存控制器类型的逻辑地址；其中，不同的上述缓存控制器类型选通的对外连接通道数不同。Step S206, calculating the logical address of the above-mentioned cache controller type based on the above-mentioned offset address; wherein, the number of external connection channels selected by different cache controller types is different.

通过本申请实施例，由于采用了缓存管理单元MMU基于CPU配置信息，识别外接的缓存控制器类型；基于地址管理子模块和上述缓存控制器类型对应的地址区域，以查表的方式确认偏移地址；基于上述偏移地址计算出上述缓存控制器类型的逻辑地址；其中，不同的上述缓存控制器类型选通的对外连接通道数不同；因此，实现了同一种框架下不同的控制器的切换，可以解决同一种框架下兼容不同的控制器的问题，进而同一框架下可兼容多种控制器，以及提高存储效率的效果。Through the embodiment of this application, since the cache management unit MMU is used to identify the type of the external cache controller based on the CPU configuration information; based on the address management submodule and the address area corresponding to the above cache controller type, the offset is confirmed in the form of a table lookup Address; the logical address of the above-mentioned cache controller type is calculated based on the above-mentioned offset address; wherein, the number of external connection channels selected by different above-mentioned cache controller types is different; therefore, the switching of different controllers under the same framework is realized , can solve the problem of being compatible with different controllers under the same framework, and then can be compatible with multiple controllers under the same framework, and can improve storage efficiency.

在一个或多个实施例中，上述缓存管理方法还包括：上述MMU的地址映射子模块读取预设的地址映射关系；上述地址映射关系为将片内逻辑地址转换为能被缓存芯片接受的物理地址。In one or more embodiments, the above cache management method further includes: the address mapping submodule of the above MMU reads the preset address mapping relationship; the above address mapping relationship is to convert the on-chip logical address into a value acceptable to the cache chip physical address.

在一个或多个实施例中，上述缓存管理方法还包括：上述方法还包括：上述MMU的地址映射子模块读取预设的地址映射关系；上述地址映射关系为将片内逻辑地址转换为能被缓存芯片接受的物理地址；根据不同应用场景，通过CPU进行重配置，以获取重配置后与上述应用场景对应的地址映射关系。In one or more embodiments, the above-mentioned cache management method further includes: the above-mentioned method also includes: the address mapping submodule of the above-mentioned MMU reads a preset address mapping relationship; the above-mentioned address mapping relationship is to convert the on-chip logical address into The physical address accepted by the cache chip; according to different application scenarios, reconfigure through the CPU to obtain the address mapping relationship corresponding to the above application scenarios after reconfiguration.

在一个或多个实施例中，上述缓存管理方法还包括：根据片外缓存的需求和缓存控制器的结构属性，按照块单元block将CPU发送给上述MMU的数据包进行切分，其中，切分后的数据包与block地址相对应，一个数据包对应多个block地址，上述缓存控制器的类型不同时地址管理的区间不同。In one or more embodiments, the above-mentioned cache management method further includes: according to the requirements of the off-chip cache and the structural attributes of the cache controller, segmenting the data packet sent by the CPU to the above-mentioned MMU according to the block unit block, wherein the segmentation The divided data packet corresponds to the block address, and one data packet corresponds to multiple block addresses. The address management intervals are different when the types of the above cache controllers are different.

在一个或多个实施例中，上述缓存管理方法还包括：上述MMU接收到报文缓存管理单元PMU发送的片外数据，其中，上述片外数据包括报文描述符PD和报文PK；提取上述PD的数据信息；将上述数据信息进行拼包得到拼包数据，并删除上述PK中的第一数据；这里，第一数据可以包括拼包数据中的无效数据。In one or more embodiments, the buffer management method further includes: the MMU receives off-chip data sent by the packet buffer management unit PMU, wherein the off-chip data includes a packet descriptor PD and a packet PK; extracting The data information of the above-mentioned PD; the above-mentioned data information is bundled to obtain the bundled data, and the first data in the above-mentioned PK is deleted; here, the first data may include invalid data in the bundled data.

对上述拼包数据进行移位拼接，提取出第二数据；这里，第二数据可以包括拼包数据中的有效数据。将上述拼包数据中的上述PD的数据信息和提取出的上述第二数据再次进行拼包，得到目标拼包数据。上述目标拼包数据发送至片外缓存进行存储。Shifting and splicing is performed on the above grouped data to extract second data; here, the second data may include valid data in the grouped data. The data information of the above-mentioned PD in the above-mentioned grouped data and the extracted second data are grouped again to obtain the target grouped data. The above-mentioned target grouping data is sent to the off-chip cache for storage.

在一个或多个实施例中，上述缓存管理方法还包括：在上述PD的数据位长度加上提取出的上述第二数据的数据位长度小于或等于总线位宽时，同时输出上述PD和上述第二数据；In one or more embodiments, the above cache management method further includes: when the data bit length of the above PD plus the extracted data bit length of the second data is less than or equal to the bus bit width, simultaneously output the above PD and the above second data;

在上述PD的数据位长度加上提取出的上述第二数据的数据位长度大于总线位宽时，将上述PD和上述第二数据拆分为第一拆分数据和第二拆分数据；When the data bit length of the PD plus the extracted data bit length of the second data is greater than the bus bit width, split the PD and the second data into first split data and second split data;

输出上述PD和上述第一拆分数据；outputting the aforementioned PD and the aforementioned first split data;

将上述第二拆分数据进行补零操作，得到数据位长度等于上述总线位宽的补零处理数据，并输出上述补零处理数据。performing a zero-padding operation on the above-mentioned second split data to obtain zero-padding processing data with a data bit length equal to the above-mentioned bus bit width, and outputting the above-mentioned zero-padding processing data.

在一个或多个实施例中，上述缓存管理方法还包括：报文缓存管理单元PMU下发写报文；In one or more embodiments, the above cache management method further includes: a message cache management unit PMU sends a write message;

上述PMU存储上述写报文，发送写释放命令给流量缓存管理单元TMMU；上述TMMU下发报文描述符至队列管理单元QMU；The above-mentioned PMU stores the above-mentioned write message, and sends a write release command to the traffic buffer management unit TMMU; the above-mentioned TMMU sends a message descriptor to the queue management unit QMU;

上述QMU通过上述TMMU下发写命令，并透传给上述MMU，上述QMU完成存储上述写命令后向上述TMMU发写释放信号；The above-mentioned QMU issues a write command through the above-mentioned TMMU, and transparently transmits it to the above-mentioned MMU, and the above-mentioned QMU sends a write release signal to the above-mentioned TMMU after storing the above-mentioned write command;

命令队列下发读报文命令，读取读报文数据并将上述读报文数据返回到上述PMU；The command queue issues a read message command, reads the read message data and returns the above read message data to the above PMU;

上述TMMU下发读命令，并读取报文描述符数据。The above-mentioned TMMU issues a read command and reads the packet descriptor data.

基于上述实施例，在一应用实施例中，本申请提供的缓存管理装置中，MMU位于片外存储控制器和片上控制模块之间，图3是根据本申请实施例的缓存管理系统的架构示意图，如图3所示，MMU位于PMU(Packet Memory Unit)报文存储单元、TMMU(TM MemoryManagement Unit内存管理单元)、CMD_FIFO(Command FIFO,命令先进先出队列)和HBM(high Bandwidth Memory,高带宽存储器)/DDR之间，本申请实施例的缓存管理方法主要能实现功能如下：Based on the above embodiments, in an application embodiment, in the cache management device provided by the present application, the MMU is located between the off-chip memory controller and the on-chip control module. FIG. 3 is a schematic diagram of the architecture of the cache management system according to the embodiment of the present application. , as shown in Figure 3, the MMU is located in the PMU (Packet Memory Unit) message storage unit, TMMU (TM Memory Management Unit memory management unit), CMD_FIFO (Command FIFO, command first-in-first-out queue) and HBM (high Bandwidth Memory, high bandwidth memory)/DDR, the buffer management method of the embodiment of the present application can mainly realize the following functions:

1)负责多队列报文PK、报文描述符PD写请求和写数据的分发，写释放的保存；1) Responsible for the distribution of multi-queue message PK, message descriptor PD write request and write data, and the storage of write release;

2)负责多队列PK、PD读请求的分发，读数据的保存；2) Responsible for the distribution of multi-queue PK and PD read requests, and the storage of read data;

3)支持多类型的缓存控制器的模式切换；3) Support mode switching of multiple types of cache controllers;

4)逻辑地址到片外物理地址的映射可重配置。4) The mapping from logical address to off-chip physical address can be reconfigured.

上述PMU存储上述写报文，发送写释放命令给流量缓存管理单元TMMU；上述TMMU下发报文描述符至队列管理单元QMU；上述QMU通过上述TMMU下发写命令，并透传给上述MMU，上述QMU完成存储上述写命令后向上述TMMU发写释放信号；The above-mentioned PMU stores the above-mentioned write message, and sends a write release command to the traffic buffer management unit TMMU; the above-mentioned TMMU sends a message descriptor to the queue management unit QMU; the above-mentioned QMU sends a write command through the above-mentioned TMMU, and transparently transmits it to the above-mentioned MMU, After the above-mentioned QMU finishes storing the above-mentioned write command, it sends a write release signal to the above-mentioned TMMU;

命令队列下发读报文命令，读取读报文数据并将上述读报文数据返回到上述PMU；上述TMMU下发读命令，并读取报文描述符数据。在整个框架中，为了解决兼容性和提高片外访问带宽及效率，主要采用的技术有地址管理、PC均衡、片外地址映射和拼包技术，兼容性是贯穿整个缓存管理过程，从而兼顾所有支持控制器的功能和性能。The command queue issues a read message command, reads the read message data, and returns the read message data to the PMU; the above TMMU issues a read command, and reads the message descriptor data. In the whole framework, in order to solve the compatibility and improve the off-chip access bandwidth and efficiency, the main technologies used include address management, PC balance, off-chip address mapping and packetization technology. Compatibility runs through the entire cache management process, so as to take into account all Supports controller functionality and performance.

在一应用实施例中，上述缓存管理方法包括：In an application embodiment, the above cache management method includes:

第一步，多控制器切换。通过CPU配置，缓存管理模块识别外接的缓存控制器类型，地址管理子模块根据控制器类型对应的地址区域，通过查表的方式确认偏移地址，从而计算出该类型控制器的逻辑地址；配置不同的缓存控制器类型，选通的对外连接通道数不同。本申请根据片外缓存的需求，设计16个Channel(通道)的片外缓存接口，每个Channel均完整支持AXI4总线的写地址、写数据、写响应、读地址和读数据五个通道。在MMU中，分HBM和DDR模式，支持模式切换可配置，默认为16Channel的HBM。通过CPU配置，可切换到DDR模式下，用户根据自身的需求，可选择连接DDR4或DDR5等不同类型，只要连接的容量大小不小于地址管理的最大节点数对应的数据容量，连接的HBM/DDR空间均可被有效利用；The first step is multi-controller switching. Through CPU configuration, the cache management module identifies the type of external cache controller, and the address management submodule confirms the offset address by looking up the table according to the address area corresponding to the controller type, thereby calculating the logical address of this type of controller; configuration Different types of cache controllers have different numbers of external connection channels to be selected. According to the requirements of the off-chip cache, this application designs off-chip cache interfaces of 16 Channels, and each Channel fully supports the five channels of write address, write data, write response, read address and read data of the AXI4 bus. In the MMU, there are HBM and DDR modes, and the mode switching is configurable. The default is HBM of 16Channel. Through CPU configuration, it can be switched to DDR mode. Users can choose to connect to different types such as DDR4 or DDR5 according to their own needs. As long as the capacity of the connection is not less than the data capacity corresponding to the maximum number of nodes managed by the address, the connected HBM/DDR Space can be used effectively;

由于支持不同类型的缓存控制器，各类型的控制器的传输速率并不一定和MMU的系统时钟同步，为了适配不同的控制器和保证数据的线速传输，划分片上的部分缓存区域，以异步(先进先出队列)FIFO的形式，先将数据缓存，同时通过自研逻辑的预读功能，将数据和命令预读出来进入Ready等待状态，当有效缓存控制器到来的时候，通过握手机制同周期将数据送出，这样可以保证最大程度利用片外控制器的带宽，且可靠地将数据流跨时钟处理。Due to the support of different types of cache controllers, the transfer rate of each type of controller is not necessarily synchronized with the system clock of the MMU. In order to adapt to different controllers and ensure the line-speed transmission of data, some cache areas on the chip are divided. In the form of asynchronous (first-in-first-out queue) FIFO, the data is cached first, and at the same time, through the pre-reading function of the self-developed logic, the data and commands are pre-read out and enter the Ready waiting state. When the effective cache controller arrives, through the handshake mechanism The data is sent out in the same cycle, which can ensure the maximum utilization of the bandwidth of the off-chip controller and reliably process the data stream across clocks.

第二步，配置地址映射。地址映射子模块根据配置的类型，读取系统预设的地址映射关系，此地址映射关系可根据应用场景，通过CPU进行重配置，以调整最佳的映射方式；The second step is to configure address mapping. The address mapping sub-module reads the address mapping relationship preset by the system according to the configuration type, and the address mapping relationship can be reconfigured through the CPU according to the application scenario to adjust the best mapping method;

由于逻辑地址不能直接索引到HBM/DDR的地址引脚上，经过MMU的处理，将逻辑地址转换为能被缓存芯片接受的地址，称为物理地址。因为缓存HBM/DDR的多层次结构，地址映射的方式和片外读写带宽、存储速率及效率有很大关系。Since the logical address cannot be directly indexed to the address pin of the HBM/DDR, after processing by the MMU, the logical address is converted into an address that can be accepted by the cache chip, which is called a physical address. Because of the multi-level structure of caching HBM/DDR, the way of address mapping has a lot to do with the off-chip read and write bandwidth, storage rate and efficiency.

本申请的地址映射，是将每个物理通道(数据总线位宽128bit)分两个虚拟通道(Pseudo Channel,伪信道，简称PC，数据总线位宽64bit)，两个虚拟通道共用一组地址和控制总线。The address mapping of this application is to divide each physical channel (data bus bit width 128bit) into two virtual channels (Pseudo Channel, pseudo channel, referred to as PC, data bus bit width 64bit), and the two virtual channels share a set of addresses and control bus.

图4是根据本申请实施例的缓存管理方法中的HBM的optionE模式的示意图，在HBM的OptionE模式下如图4所示，每个虚拟通道对应一个控制器，控制器以缓存频率的一半运行。Psgnt代表的虚拟通道(PS)，是控制器内部仲裁信号，控制器根据PS的物理接口确定Psgnt的值。逻辑在处理的时候，只需把一个HBMstack的8个控制器当成16个控制器对待，一个控制对应一个物理PS。Fig. 4 is a schematic diagram of the optionE mode of HBM in the buffer management method according to the embodiment of the present application. In the optionE mode of HBM, as shown in Fig. 4, each virtual channel corresponds to a controller, and the controller operates at half the cache frequency . The virtual channel (PS) represented by Psgnt is an internal arbitration signal of the controller, and the controller determines the value of Psgnt according to the physical interface of PS. When processing logic, you only need to treat the 8 controllers of an HBMstack as 16 controllers, and one control corresponds to one physical PS.

SID是数据8Hi的特有地址，可以当做一个bank(存储体)地址，8Hi的bank数是4Hi的二倍，分别是32个bank，16个bank。连续4个ID的bank属于一个存储体集合bank_group。8Hi、4Hi分别是8个bank，4个bank。SID is the unique address of data 8Hi, which can be regarded as a bank (storage body) address. The number of banks of 8Hi is twice that of 4Hi, which are 32 banks and 16 banks respectively. Banks with 4 consecutive IDs belong to a bank set bank_group. 8Hi and 4Hi are 8 banks and 4 banks respectively.

laddr[N:0]是8Byte地址(一个控制的总线位宽为128bit，划分为两个PS，每个PS的位宽为64bit)，由于HBM控制器的预取倍为4，一次存储256bit，会占用4个地址，所以laddr[1:0]实际逻辑不会分配地址，实际默认为0，控制器也不用。laddr[N:0] is an 8Byte address (a control bus has a bit width of 128 bits, which is divided into two PSs, and the bit width of each PS is 64 bits). Since the prefetch times of the HBM controller is 4, 256 bits are stored at a time. It will occupy 4 addresses, so the actual logic of laddr[1:0] will not assign an address, and the actual default is 0, and the controller does not use it.

MMU送出的地址为384B地址(实际地址为2的整数次幂送出)，AXI的地址为1字节，PS通道对应的颗粒每个地址存128bit数据，所以四者地址对应关系如下：The address sent by the MMU is the 384B address (the actual address is sent by an integer power of 2), the address of the AXI is 1 byte, and each address of the particle corresponding to the PS channel stores 128bit data, so the correspondence between the four addresses is as follows:

图5是根据本申请实施例的缓存管理方法中的地址对应关系的示意图一，使用Samsung HBM2 4Hi4G时，每个PS的存储空间为4GB/16＝2Gb,对应关系如图5所示，{A[31:A28],A[4:0]}填充为0：FIG. 5 is a schematic diagram of address correspondence in the cache management method according to an embodiment of the present application. When using Samsung HBM2 4Hi4G, the storage space of each PS is 4GB/16=2Gb, and the correspondence is as shown in FIG. 5, {A [31:A28],A[4:0]} is filled with 0:

图6是根据本申请实施例的缓存管理方法中的地址对应关系的示意图二，使用Samsung HBM2 8Hi8G时，每个PS的存储空间为8GB/16＝4Gb,对应关系如图6所示，{A[31:A29],A[4:0]}填充为0：Fig. 6 is a schematic diagram 2 of the address correspondence in the cache management method according to the embodiment of the present application. When using Samsung HBM2 8Hi8G, the storage space of each PS is 8GB/16=4Gb, and the correspondence is as shown in Fig. 6, {A [31:A29],A[4:0]} is filled with 0:

图7是根据本申请实施例的缓存管理方法中的地址对应关系的示意图三，使用Samsung HBM2E 4Hi8G时，每个PS的存储空间为8GB/16＝4Gb,对应关系如图7所示，{A[31:A29],A[4:0]}填充为0：FIG. 7 is a schematic diagram of address correspondence in the cache management method according to an embodiment of the present application. Three, when using Samsung HBM2E 4Hi8G, the storage space of each PS is 8GB/16=4Gb, and the corresponding relationship is shown in FIG. 7, {A [31:A29],A[4:0]} is filled with 0:

图8是根据本申请实施例的缓存管理方法中的地址对应关系的示意图四，使用Samsung HBM2E 8Hi16G时，每个PS的存储空间为8GB/16＝4Gb,对应关系如图8所示，{A[31:A30],A[4:0]}填充为0：FIG. 8 is a schematic diagram 4 of the address correspondence in the cache management method according to the embodiment of the present application. When using Samsung HBM2E 8Hi16G, the storage space of each PS is 8GB/16=4Gb, and the corresponding relationship is shown in FIG. 8, {A [31:A30],A[4:0]} is filled with 0:

图9是根据本申请实施例的缓存管理方法中的地址映射的示意图一，要充分发挥HBM的带宽，需均衡的使用16个PS，同时通道内增加Bank切换，当前采用4Hi4G的颗粒，逻辑地址和物理地址间的映射关系如图9所示。Figure 9 is a schematic diagram of address mapping in the cache management method according to the embodiment of the present application. To fully utilize the bandwidth of HBM, it is necessary to use 16 PS in a balanced manner. The mapping relationship between physical addresses and physical addresses is shown in Figure 9.

图10是根据本申请实施例的缓存管理方法中的地址映射的示意图二，外挂DDR5的时候，可配置连接3个通道的DDR5，其逻辑地址和物理地址的映射关系如图10所示。FIG. 10 is a schematic diagram of address mapping in the buffer management method according to the embodiment of the present application. When DDR5 is plugged in, DDR5 connected to 3 channels can be configured, and the mapping relationship between its logical address and physical address is shown in FIG. 10 .

图11是根据本申请实施例的缓存管理方法中的地址映射的示意图三，外挂DDR4的时候，可配置连接3个通道的DDR5，其逻辑地址和物理地址的映射关系如图11所示。FIG. 11 is a schematic diagram of address mapping in the buffer management method according to the embodiment of the present application. Third, when DDR4 is plugged in, DDR5 connected to 3 channels can be configured, and the mapping relationship between its logical address and physical address is shown in FIG. 11 .

第三步，片外地址管理。根据片外缓存的需求，和缓存控制器的结构特点，片外缓存地址是以块block(可配置的固定大小数据单元)为单位进行管理，称为虚拟地址(也称作逻辑地址)，处理器发送给MMU的一个数据包需按照block进行切分，以便与block地址进行对应，这样处理后，一个数据包有可能对应多个block数据，即需要产生多个block地址。根据片外缓存控制器的类型不同，地址管理的区间不同。The third step is off-chip address management. According to the requirements of the off-chip cache and the structural characteristics of the cache controller, the off-chip cache address is managed in units of blocks (configurable fixed-size data units), which are called virtual addresses (also called logical addresses). A data packet sent by the device to the MMU needs to be segmented according to the block so as to correspond to the block address. After such processing, a data packet may correspond to multiple block data, that is, multiple block addresses need to be generated. Depending on the type of off-chip cache controller, the range of address management is different.

图12是根据本申请实施例的缓存管理方法的片外地址管理示意图，如图12所示，Trunk(中继端口)地址为128K，T[16:13]为16个大链表ID，T[12:10]为每个大链表下面的子链表ID，T[9:0]为每个子链表内链表编号，B[3:0]为每个trunk下面blk的个数，C[2:0]为每个blk下面的切片个数(数据+1代表切片个数)。链表ID申请采用RR(Round-Robin(轮询))调度，当一条流进来时会先RR申请选中大链表，大链表内再RR选中一个子链表，然后从子链表内申请一个链表。同一条流会优先耗尽一个turnk中的blk，不同流会重新申请trunk。地址管理的{T[9:0]，B[3:0]，T[12:10]，T[16:13]}可作为一个counter，此地址是连续变化。当外挂HBM的时候，T[16:13]个16个channel一一对应；当外挂DDR的时候，对应选通3个通道，使用T[14:13]作为channel_ID，即选通0(4、8、12)、1(5、9、13)、2(6、10、14)通道，3、7、11、15通道可配置不使用。由于T[16:13]的低两bit的粒度更小，所以其变化更快，通过地址映射后的缓存访问效率会更好。根据设定的位宽大小，MMU可管理2(4+3+10+4)＝2M个节点，总共可管理2Mx3k(blk大小可配，以3k为例)＝48G的数据。Fig. 12 is a schematic diagram of off-chip address management according to the cache management method of the embodiment of the present application. As shown in Fig. 12, the Trunk (trunk port) address is 128K, and T[16:13] is 16 large linked list IDs, and T[ 12:10] is the sub-list ID under each large linked list, T[9:0] is the linked list number in each sub-linked list, B[3:0] is the number of blks under each trunk, C[2:0 ] is the number of slices under each blk (data + 1 represents the number of slices). Linked list ID application adopts RR (Round-Robin (polling)) scheduling. When a stream comes in, RR application will first select the large linked list, then RR will select a sub-linked list in the large linked list, and then apply for a linked list from the sub-linked list. The same flow will first exhaust the blk in a turnk, and different flows will re-apply for the trunk. Address management {T[9:0], B[3:0], T[12:10], T[16:13]} can be used as a counter, and the address changes continuously. When plugging in HBM, T[16:13] correspond to 16 channels one by one; when plugging in DDR, correspondingly select 3 channels, use T[14:13] as channel_ID, that is, select 0(4, 8, 12), 1 (5, 9, 13), 2 (6, 10, 14) channels, 3, 7, 11, 15 channels can be configured not to use. Since the granularity of the lower two bits of T[16:13] is smaller, it changes faster, and the cache access efficiency after address mapping will be better. According to the set bit width, the MMU can manage 2(4+3+10+4)=2M nodes, and can manage a total of 2Mx3k (blk size can be configured, take 3k as an example)=48G data.

为了保证整体的性能和带宽，在地址管理模块需保证各伪信道PC间的均衡访问，避免出现短时间内个别PC频繁访问，而其余PC空闲的情况，如果出现某个PC返回较慢，反馈给MMU忙状态时候，可根据实时命令统计和单PC历史命令统计，减少一次调度，从数据流的整个过程分析，PC之间依然是均衡访问。In order to ensure the overall performance and bandwidth, the address management module needs to ensure balanced access among the pseudo-channel PCs to avoid frequent access by individual PCs in a short period of time while the rest of the PCs are idle. If a PC returns slowly, feedback When the MMU is busy, one scheduling can be reduced based on real-time command statistics and single PC historical command statistics. From the analysis of the entire process of data flow, there is still a balanced access between PCs.

第四步，小包拼接技术。将片上PD存储至片外，理论上因其包长的原因，会降低片外存储的带宽和效率，但因其存储数量较大，为了节省片上资源，将PD搬移至片外进行存储，此处采用拼包技术对其进行存储，拼包的目的是通过挤泡将片外需要拼包的PK和PD中的无效字节挤掉，然后将有效字节进行拼接，以此来提高片外缓存的利用率。The fourth step is small packet splicing technology. Storing on-chip PDs off-chip will theoretically reduce the bandwidth and efficiency of off-chip storage due to the packet length. Packing technology is used to store it. The purpose of packing is to squeeze out the invalid bytes in the PK and PD that need to be packed outside the chip by squeezing bubbles, and then splicing the valid bytes to improve the off-chip Cache utilization.

拼包的步骤是MMU收到PMU送过来的片外包PD和PK，首先进行PD信息的提取和拼包，同时将PK中的无效字节挤掉，对小包进行移位拼接，提取出有效数据，然后将拼包后的PD信息和提取出有效数据的PK再次进行拼包，将结果发送至片外进行存储。The step of grouping is that the MMU receives the outsourced PD and PK sent by the PMU, firstly extracts and combines the PD information, and at the same time squeezes out the invalid bytes in the PK, shifts and splices the small packets, and extracts valid data , and then combine the packaged PD information and the PK from which the effective data is extracted again, and send the result to off-chip for storage.

由于片外的总线位宽是384B，除了单包(不需要拼包)，不同种类的拼包情况对于每一拍输出的结果也不一样，小包拼接分两种情况：小包拼小包(PD_len+PK_len≤384B)；小包拼小包(PD_len+PK_len>384B)。当第一种小包拼小包时，PD的长度加上提取出的数据长度小于总线位宽，可以一拍输出；当第二种小包拼小包的时候，由于长度大于总线位宽，需要分两拍输出，第一拍输出高384B，剩下的部分在第二拍尾部补零输出，这样会对线速产生影响，需尽量规避。Since the bit width of the off-chip bus is 384B, except for single packets (does not need to be combined), different types of combined packages will have different output results for each shot. There are two cases of small package splicing: PK_len≤384B); small package and small package (PD_len+PK_len>384B). When the first type of small packets are combined into small packets, the length of the PD plus the length of the extracted data is less than the bus bit width, and can be output in one shot; when the second type of small packets are combined into small packets, because the length is greater than the bus bit width, it needs to be divided into two shots. Output, the output of the first beat is 384B high, and the remaining part is output with zeros at the end of the second beat, which will affect the line speed and should be avoided as much as possible.

本申请的方案不仅能够兼容多种类型的缓存控制器，且能够提升片外读写带宽和访问效率。以下通过实测数据，分别举例说明外挂HBM/DDR5/DDR4三种类型控制器的情况下，带宽和效率提升前后的实测数据。The solution of the present application is not only compatible with various types of cache controllers, but also can improve off-chip read and write bandwidth and access efficiency. The following uses the actual measurement data to illustrate the actual measurement data before and after the bandwidth and efficiency improvement in the case of external HBM/DDR5/DDR4 three types of controllers.

使用标准地址映射的测试外挂HBM的结果如表1所示。本申请的效率提升方法测试结果如表2所示，表3为外挂DDR5的测试数据，表4为外挂DDR4的测试数据，经过实测结果对比分析，各模式下的片外总带宽和存储效率均有提升。The results of testing the plug-in HBM using standard address mapping are shown in Table 1. The test results of the efficiency improvement method of the present application are shown in Table 2. Table 3 is the test data of the external DDR5, and Table 4 is the test data of the external DDR4. After comparative analysis of the actual measurement results, the total off-chip bandwidth and storage efficiency in each mode are all the same. There is an improvement.

表1Table 1

用例编号use case number 片外存储包长(Byte)Off-chip storage package length (Byte) 总带宽(Gbps)Total Bandwidth (Gbps) 效率efficiency 11 128128 862862 0.350.35 22 192192 13521352 0.550.55 33 256256 16431643 0.670.67 44 288288 16071607 0.650.65 55 352352 17391739 0.710.71 66 384384 17691769 0.720.72 77 768768 16481648 0.670.67 88 11521152 16341634 0.670.67 99 15361536 16281628 0.660.66 1010 15681568 16011601 0.650.65

表2Table 2

表2(续)Table 2 (continued)

用例编号use case number 片外存储包长(Byte)Off-chip storage package length (Byte) 总带宽(Gbps)Total Bandwidth (Gbps) 效率efficiency 77 768768 21692169 0.880.88 88 11521152 21662166 0.880.88 99 15361536 21332133 0.860.86 1010 15681568 20282028 0.830.83

表3table 3

用例编号use case number 片外存储包长(Byte)Off-chip storage package length (Byte) 总带宽(Gbps)Total Bandwidth (Gbps) 效率efficiency 11 128128 136.89136.89 0.590.59 22 192192 154.22154.22 0.670.67 33 256256 163.54163.54 0.710.71 44 384384 168.49168.49 0.730.73 55 768768 170.63170.63 0.740.74 66 11521152 172.45172.45 0.750.75 77 11841184 169.61169.61 0.740.74 88 15361536 173.55173.55 0.750.75 99 15681568 169.95169.95 0.740.74 1010 1228812288 166.54166.54 0.720.72

表4Table 4

用例编号use case number 片外存储包长(Byte)Off-chip storage package length (Byte) 总带宽(Gbps)Total Bandwidth (Gbps) 效率efficiency 11 128128 200.64200.64 65.3565.35 22 192192 247.85247.85 80.7380.73 33 256256 267.03267.03 86.9886.98 44 384384 272.29272.29 88.6988.69 55 768768 270.61270.61 88.1588.15 66 11521152 243.78243.78 79.4179.41 77 15361536 252.68252.68 82.3182.31 88 15681568 232.25232.25 75.6575.65 99 61446144 233.99233.99 76.2276.22 1010 1228812288 220.05220.05 71.6871.68

通过以上的实施方式的描述，本领域的技术人员可以清楚地了解到根据上述实施例的方法可借助软件加必需的通用硬件平台的方式来实现，当然也可以通过硬件，但很多情况下前者是更佳的实施方式。基于这样的理解，本申请的技术方案本质上或者说对现有技术做出贡献的部分可以以软件产品的形式体现出来，该计算机软件产品存储在一个存储介质(如ROM/RAM、磁碟、光盘)中，包括若干指令用以使得一台终端设备(可以是手机，计算机，服务器，或者网络设备等)执行本申请各个实施例上述的方法。Through the description of the above embodiments, those skilled in the art can clearly understand that the method according to the above embodiments can be implemented by means of software plus a necessary general-purpose hardware platform, and of course also by hardware, but in many cases the former is Better implementation. Based on such an understanding, the technical solution of the present application can be embodied in the form of a software product in essence or the part that contributes to the prior art, and the computer software product is stored in a storage medium (such as ROM/RAM, disk, CD) contains several instructions to enable a terminal device (which may be a mobile phone, a computer, a server, or a network device, etc.) to execute the above-mentioned methods in various embodiments of the present application.

在本实施例中还提供了一种图形渲染的处理装置，该装置用于实现上述实施例及优选实施方式，已经进行过说明的不再赘述。如以下所使用的，术语“模块”可以实现预定功能的软件和/或硬件的组合。尽管以下实施例所描述的装置较佳地以软件来实现，但是硬件，或者软件和硬件的组合的实现也是可能并被构想的。This embodiment also provides a processing device for graphics rendering, which is used to implement the above embodiments and preferred implementation modes, and what has been described will not be repeated here. As used below, the term "module" may be a combination of software and/or hardware that realizes a predetermined function. Although the devices described in the following embodiments are preferably implemented in software, implementations in hardware, or a combination of software and hardware are also possible and contemplated.

图13本申请实施例的缓存管理装置的结构框图，如图13所示，该装置包括：FIG. 13 is a structural block diagram of a cache management device according to an embodiment of the present application. As shown in FIG. 13 , the device includes:

识别单元1302，用于使缓存管理单元MMU基于CPU配置信息，识别外接的缓存控制器类型；The identification unit 1302 is configured to enable the cache management unit MMU to identify the type of the external cache controller based on the CPU configuration information;

确认单元1304，用于基于地址管理子模块和上述缓存控制器类型对应的地址区域，以查表的方式确认偏移地址；A confirmation unit 1304, configured to confirm the offset address in a table look-up manner based on the address management submodule and the address area corresponding to the above cache controller type;

计算单元1306，用于基于上述偏移地址计算出上述缓存控制器类型的逻辑地址；其中，不同的上述缓存控制器类型选通的对外连接通道数不同。The calculation unit 1306 is configured to calculate the logical address of the above-mentioned cache controller type based on the above-mentioned offset address; wherein, the number of external connection channels selected by different cache controller types is different.

通过本申请，由于采用了缓存管理单元MMU基于CPU配置信息，识别外接的缓存控制器类型；基于地址管理子模块和上述缓存控制器类型对应的地址区域，以查表的方式确认偏移地址；基于上述偏移地址计算出上述缓存控制器类型的逻辑地址；其中，不同的上述缓存控制器类型选通的对外连接通道数不同；因此，实现了同一种框架下不同的控制器的切换，可以解决同一种框架下兼容不同的控制器的问题，进而同一框架下可兼容多种控制器，以及提高存储效率的效果。Through this application, because the cache management unit MMU is used to identify the type of external cache controller based on the CPU configuration information; based on the address management submodule and the address area corresponding to the above cache controller type, the offset address is confirmed in a table lookup manner; The logical address of the above-mentioned cache controller type is calculated based on the above-mentioned offset address; wherein, the number of external connection channels selected by different above-mentioned cache controller types is different; therefore, the switching of different controllers under the same framework is realized, which can Solve the problem of being compatible with different controllers under the same framework, and then be compatible with multiple controllers under the same framework, and improve the effect of storage efficiency.

需要说明的是，上述各个模块是可以通过软件或硬件来实现的，对于后者，可以通过以下方式实现，但不限于此：上述模块均位于同一处理器中；或者，上述各个模块以任意组合的形式分别位于不同的处理器中。It should be noted that the above-mentioned modules can be realized by software or hardware. For the latter, it can be realized by the following methods, but not limited to this: the above-mentioned modules are all located in the same processor; or, the above-mentioned modules can be combined in any combination The forms of are located in different processors.

本申请的实施例还提供了一种计算机可读存储控制程序，该计算机可读存储控制程序中存储有计算机程序，其中，该计算机程序被设置为运行时执行上述任一项方法实施例中的步骤。Embodiments of the present application also provide a computer-readable storage control program, where a computer program is stored in the computer-readable storage control program, wherein the computer program is configured to perform any of the above method embodiments when running. step.

在一个示例性实施例中，上述计算机可读存储控制程序可以包括但不限于：CPU和存储控制器的驱动程序、FPGA与HBM/DDR3连接的控制程序等。In an exemplary embodiment, the above-mentioned computer-readable storage control program may include but not limited to: a driver program for a CPU and a storage controller, a control program for connecting an FPGA to an HBM/DDR3, and the like.

本申请的实施例还提供了一种控制器，包括缓存器(缓存部分数据)和处理器，该控制器中存储有计算机程序，该控制器被设置为运行计算机程序以执行上述任一项方法实施例中的步骤。Embodiments of the present application also provide a controller, including a buffer (cache part of the data) and a processor, a computer program is stored in the controller, and the controller is configured to run the computer program to perform any one of the above methods Steps in the examples.

在一个示例性实施例中，上述控制器还可以包括协议转换的传输设备，其中，该传输设备和上述控制器连接，实现与缓存控制器的连接。In an exemplary embodiment, the above-mentioned controller may further include a transmission device for protocol conversion, wherein the transmission device is connected to the above-mentioned controller to realize the connection with the cache controller.

本实施例中的具体示例可以参考上述实施例及示例性实施方式中所描述的示例，本实施例在此不再赘述。For specific examples in this embodiment, reference may be made to the examples described in the foregoing embodiments and exemplary implementation manners, and details will not be repeated here in this embodiment.

显然，本领域的技术人员应该明白，上述的本申请的各模块或各步骤可以用通用的计算装置来实现，它们可以集中在单个的计算装置上，或者分布在多个计算装置所组成的网络上，它们可以用计算装置可执行的程序代码来实现，从而，可以将它们存储在存储装置中由计算装置来执行，并且在某些情况下，可以以不同于此处的顺序执行所示出或描述的步骤，或者将它们分别制作成各个集成电路模块，或者将它们中的多个模块或步骤制作成单个集成电路模块来实现。这样，本申请不限制于任何特定的硬件和软件结合。Obviously, those skilled in the art should understand that each module or each step of the above-mentioned application can be realized by a general-purpose computing device, and they can be concentrated on a single computing device, or distributed in a network composed of multiple computing devices In fact, they can be implemented in program code executable by a computing device, and thus, they can be stored in a storage device to be executed by a computing device, and in some cases, can be executed in an order different from that shown here. Or described steps, or they are fabricated into individual integrated circuit modules, or multiple modules or steps among them are fabricated into a single integrated circuit module for implementation. As such, the present application is not limited to any specific combination of hardware and software.

以上所述仅为本申请的优选实施例而已，并不用于限制本申请，对于本领域的技术人员来说，本申请可以有各种更改和变化。凡在本申请的原则之内，所作的任何修改、等同替换、改进等，均应包含在本申请的保护范围之内。The above descriptions are only preferred embodiments of the present application, and are not intended to limit the present application. For those skilled in the art, there may be various modifications and changes in the present application. Any modifications, equivalent replacements, improvements, etc. made within the principles of this application shall be included within the scope of protection of this application.

Claims

1. A method for cache management, comprising:

the cache management unit MMU identifies the type of an externally connected cache controller based on the CPU configuration information;

confirming an offset address in a table look-up mode based on address management sub-modules and address areas corresponding to the types of the cache controllers;

calculating a logical address of the cache controller type based on the offset address; and the number of the external connection channels gated by different cache controller types is different.

2. The method of claim 1, further comprising:

the address mapping submodule of the MMU reads a preset address mapping relation; the address mapping relation is that the on-chip logic address is converted into a physical address which can be accepted by the cache chip.

3. The method of claim 1, further comprising:

the address mapping submodule of the MMU reads a preset address mapping relation; the address mapping relation is that the on-chip logic address is converted into a physical address which can be accepted by the cache chip;

and according to different application scenes, reconfiguring through the CPU to obtain an address mapping relation corresponding to the application scenes after reconfiguration.

4. The method of claim 1, further comprising:

and segmenting data packets sent to the MMU by the CPU according to the demand of the off-chip cache and the structural attribute of a cache controller, wherein the segmented data packets correspond to block addresses, one data packet corresponds to a plurality of block addresses, and the cache controller has different types and different address management intervals.

5. The method of claim 1, further comprising:

the MMU receives off-chip data sent by a message cache management unit PMU, wherein the off-chip data comprises a message descriptor PD and a message PK;

extracting data information of the PD;

the data information is spliced to obtain spliced data, and the first data in the PK is deleted;

shifting and splicing the spliced data to extract second data;

splicing and packaging the data information of the PD in the spliced and packaged data and the extracted second data again to obtain target spliced and packaged data

And sending the target packet data to an off-chip cache for storage.

6. The method of claim 5, further comprising:

when the data bit length of the PD plus the data bit length of the extracted second data is less than or equal to the bus bit width, outputting the PD and the second data simultaneously;

when the data bit length of the PD plus the extracted data bit length of the second data is larger than the bus bit width, splitting the PD and the second data into first split data and second split data;

outputting the PD and the first split data;

and carrying out zero padding operation on the second split data to obtain zero padding processing data with the data bit length equal to the bus bit width, and outputting the zero padding processing data.

7. The method according to any one of claims 1 to 6, further comprising:

a message cache management unit (PMU) issues a write message;

the PMU stores the write message and sends a write release command to a traffic cache management unit (TMMU); the TMMU transmits a message descriptor to a queue management unit QMU;

the QMU sends a write command through the TMMU and transmits the write command to the MMU, and the QMU sends a write release signal to the TMMU after finishing storing the write command;

the command queue issues a message reading command, reads message reading data and returns the message reading data to the PMU;

and the TMMU issues a read command and reads the message descriptor data.

8. A cache management apparatus, comprising:

the identification unit is used for enabling the cache management unit MMU to identify the type of the externally connected cache controller based on the CPU configuration information of the central processing unit;

the confirming unit is used for confirming the offset address in a table look-up mode based on the address management submodule and the address area corresponding to the type of the cache controller;

a calculating unit, configured to calculate a logical address of the cache controller type based on the offset address; and the number of the external connection channels gated by different cache controller types is different.

9. A computer-readable storage control program, characterized in that a computer program is stored in the computer-readable storage program, wherein the computer program is arranged to perform the method of any of claims 1 to 7 when executed.

10. A controller comprising a buffer and a processor, wherein a computer program is stored in the controller, and wherein the processor is configured to execute the computer program to perform the method according to any of claims 1 to 7.