CN114238167B - Information prefetching method, processor and electronic equipment - Google Patents

Information prefetching method, processor and electronic equipment

Info

Publication number
CN114238167B
Authority
CN
China
Prior art keywords
page table
cache space
level
address translation
space
Prior art date
Legal status
Active
Application number
CN202111529899.2A
Other languages
Chinese (zh)
Other versions
CN114238167A (en)
Inventor
胡世文
Current Assignee
Hygon Information Technology Co Ltd
Original Assignee
Hygon Information Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Hygon Information Technology Co Ltd filed Critical Hygon Information Technology Co Ltd
Priority to CN202111529899.2A
Publication of CN114238167A
Application granted
Publication of CN114238167B
Legal status: Active

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 12/00: Accessing, addressing or allocating within memory systems or architectures
    • G06F 12/02: Addressing or allocation; Relocation
    • G06F 12/08: Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
    • G06F 12/0802: Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
    • G06F 12/0862: Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches, with prefetch
    • G06F 12/0806: Multiuser, multiprocessor or multiprocessing cache systems
    • G06F 12/0811: Multiuser, multiprocessor or multiprocessing cache systems with multilevel cache hierarchies
    • G06F 12/10: Address translation
    • G06F 12/1009: Address translation using page tables, e.g. page table structures
    • G06F 12/1027: Address translation using associative or pseudo-associative address translation means, e.g. translation look-aside buffer [TLB]
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D 10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Memory System Of A Hierarchy Structure (AREA)

Abstract

Provided are an information prefetching method, a processor, and an electronic device. The method is used for a processor. The processor includes a first-level cache space, a first page table walker, a second page table walker, and at least one preset cache space. The at least one preset cache space includes a target preset cache space; the first page table walker is arranged at the same path level as the target preset cache space and is communicatively connected to it. The information prefetching method includes: in response to the first page table walker being selected, from the first page table walker and the second page table walker according to a preset rule, to perform an address translation operation and obtain a physical address, the first page table walker sends a prefetch request to the target preset cache space; and in response to the prefetch request, the target preset cache space performs an information prefetch operation based on the physical address. The method can reduce address translation latency while providing a data/instruction prefetching function, thereby reducing the latency of read/write operations and improving overall system performance.

Description

Information Prefetching Method, Processor, and Electronic Device

Technical Field

Embodiments of the present disclosure relate to an information prefetching method, a processor, and an electronic device.

Background Art

In the field of computer technology, one of the important functions of a computer operating system is memory management. In a multi-process operating system, each process has its own virtual address space and can use any virtual address within the range specified by the system. The addresses used by a central processing unit (CPU) when executing an application program are virtual addresses. When the operating system allocates memory to a process, it needs to map the virtual addresses in use to physical addresses; the physical address is the real physical memory access address. Distinguishing between virtual addresses and physical addresses in this way simplifies program compilation: the compiler can compile programs against a continuous, sufficiently large virtual address space, and the virtual addresses of different processes are mapped to different physical addresses, so the system can run multiple processes at the same time, improving the operating efficiency of the entire computer system. In addition, since an application program can use but cannot modify the address translation, one process cannot access the memory contents of another process, which increases the security of the system.

Summary of the Invention

At least one embodiment of the present disclosure provides an information prefetching method for a processor. The processor includes a first-level cache space, a first page table walker, a second page table walker, and at least one preset cache space; the first-level cache space and the at least one preset cache space are communicatively connected in sequence to form a communication link. The at least one preset cache space includes a target preset cache space; the first page table walker is arranged at the same path level as the target preset cache space and is communicatively connected to it; the second page table walker is arranged at the same path level as the first-level cache space and is communicatively connected to it. The method includes: in response to the first page table walker being selected, from the first page table walker and the second page table walker according to a preset rule, to perform an address translation operation and obtain a physical address, the first page table walker sends a prefetch request to the target preset cache space, where the prefetch request includes the physical address; and in response to the prefetch request, the target preset cache space performs an information prefetch operation based on the physical address.

For example, in a method provided by an embodiment of the present disclosure, the processor further includes a processor core, and the target preset cache space performing the information prefetch operation based on the physical address includes: determining a prefetch cache space, where the prefetch cache space is at least one of the first-level cache space and the at least one preset cache space; based on the physical address, the target preset cache space obtaining the target information stored at the physical address; and the target preset cache space sending the target information to the prefetch cache space.

For example, in a method provided by an embodiment of the present disclosure, determining the prefetch cache space includes: acquiring a preset identifier, where the preset identifier represents level information of a cache space and is stored in a designated storage space or carried in the prefetch request; and determining the prefetch cache space according to the preset identifier.

For example, in a method provided by an embodiment of the present disclosure, the first-level cache space includes a first-level instruction space and a first-level data space, the level information indicates the first level, and determining the prefetch cache space according to the preset identifier includes: in response to the target information being of an instruction type, determining that the prefetch cache space is the first-level instruction space; and in response to the target information being of a data type, determining that the prefetch cache space is the first-level data space.

For example, in a method provided by an embodiment of the present disclosure, the target preset cache space obtaining, based on the physical address, the target information stored at the physical address includes: based on the physical address, obtaining the target information by querying level by level along the path from the target preset cache space to the memory.

For example, in a method provided by an embodiment of the present disclosure, the target preset cache space sending the target information to the prefetch cache space includes: sending the target information to the prefetch cache space by passing it level by level along the path from the target preset cache space to the prefetch cache space.

For example, a method provided by an embodiment of the present disclosure further includes: the target preset cache space sending the physical address to the processor core.

For example, in a method provided by an embodiment of the present disclosure, the target preset cache space sending the physical address to the processor core includes: the target preset cache space sending the physical address to the processor core by passing it level by level.

For example, in a method provided by an embodiment of the present disclosure, the processor further includes a page table entry cache space, the processor core is arranged at the same path level as the page table entry cache space and is communicatively connected to it, and the method further includes: in response to the page table entry data required for address translation not existing in the page table entry cache space, generating the address translation request by the processor core.

For example, a method provided by an embodiment of the present disclosure further includes: in response to the first page table walker being selected, from the first page table walker and the second page table walker according to the preset rule, to perform the address translation operation, the first page table walker performing the address translation operation in response to the address translation request to obtain the physical address.

For example, in a method provided by an embodiment of the present disclosure, the first page table walker performing the address translation operation in response to the address translation request to obtain the physical address includes: the first page table walker receiving the address translation request generated by the processor core, obtaining page table entry data from the memory via the target preset cache space, and performing the address translation operation using the page table entry data to obtain the physical address.

For example, in a method provided by an embodiment of the present disclosure, the first page table walker obtaining the page table entry data from the memory via the target preset cache space and performing the address translation operation using the page table entry data includes: the first page table walker obtaining, according to the address translation request, the page table entry data by querying level by level along the path from the target preset cache space to the memory, and performing translation using the page table entry data to obtain the physical address.

For example, a method provided by an embodiment of the present disclosure further includes: in response to the first page table walker being determined to perform the address translation operation, the first page table walker receiving the address translation request forwarded by the second page table walker.

For example, in a method provided by an embodiment of the present disclosure, the address translation request includes translation information, and the translation information includes: an address translation request sequence number, the virtual address value to be translated, and the initial address of the highest-level page table.

For example, in a method provided by an embodiment of the present disclosure, the translation information further includes a request type identifier, where the request type identifier indicates whether the target information stored at the physical address is of an instruction type or a data type.

For example, a method provided by an embodiment of the present disclosure further includes: the processor core determining, according to the storage states of the first-level cache space and the at least one preset cache space, the cache space to be used for storing the target information, and causing the address translation request to carry, in the form of the preset identifier, the level information of the cache space used for storing the target information; and the first page table walker parsing out the preset identifier and causing the prefetch request to carry the preset identifier.

For example, a method provided by an embodiment of the present disclosure further includes: in response to the second page table walker being selected, from the first page table walker and the second page table walker according to the preset rule, to perform the address translation operation, the second page table walker performing the address translation operation in response to the address translation request to obtain the physical address.

For example, in a method provided by an embodiment of the present disclosure, the second page table walker performing the address translation operation in response to the address translation request to obtain the physical address includes: the second page table walker receiving the address translation request generated by the processor core, obtaining page table entry data from the memory according to the address translation request, and performing address translation using the page table entry data to obtain the physical address.

For example, in a method provided by an embodiment of the present disclosure, the preset rule includes: when the page table entry data required for address translation does not exist in the page table entry cache space, or when the page table level corresponding to the page table entry data required for address translation in the page table entry cache space is greater than a threshold, determining that the address translation operation is performed by the first page table walker. A sketch illustrating one possible form of such a rule is given at the end of this summary.

For example, in a method provided by an embodiment of the present disclosure, the processor further includes a request buffer, the request buffer is arranged at the same path level as the first page table walker and is communicatively connected to the first page table walker and to the target preset cache space, and the method further includes: sending, by the processor core, a queue of pending address translation requests to the request buffer.

For example, in a method provided by an embodiment of the present disclosure, the at least one preset cache space includes a second-level cache space through an Nth-level cache space, where N is an integer greater than 2, the Nth-level cache space is closest to the memory and farthest from the processor core, and any level of cache space from the second-level cache space to the Nth-level cache space serves as the target preset cache space.

For example, in a method provided by an embodiment of the present disclosure, the Nth-level cache space is a shared cache space, and the Nth-level cache space serves as the target preset cache space.

For example, in a method provided by an embodiment of the present disclosure, the second-level cache space is a private or shared cache space, and the second-level cache space serves as the target preset cache space.

At least one embodiment of the present disclosure further provides a processor including a first-level cache space, a first page table walker, a second page table walker, and at least one preset cache space. The first-level cache space and the at least one preset cache space are communicatively connected in sequence to form a communication link. The at least one preset cache space includes a target preset cache space; the first page table walker is arranged at the same path level as the target preset cache space and is communicatively connected to it; the second page table walker is arranged at the same path level as the first-level cache space and is communicatively connected to it. The first page table walker is configured to: in response to being selected, from the first page table walker and the second page table walker according to a preset rule, to perform an address translation operation and obtain a physical address, send a prefetch request to the target preset cache space, where the prefetch request includes the physical address. The target preset cache space is configured to: in response to the prefetch request, perform an information prefetch operation based on the physical address.

For example, in a processor provided by an embodiment of the present disclosure, the target preset cache space is further configured to determine a prefetch cache space, obtain, based on the physical address, the target information stored at the physical address, and send the target information to the prefetch cache space, where the prefetch cache space is at least one of the first-level cache space and the at least one preset cache space.

At least one embodiment of the present disclosure further provides an electronic device including the processor provided by any embodiment of the present disclosure.
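For illustration only, one possible form of the preset rule mentioned above, which selects a page table walker based on the state of the page table entry cache space, is sketched below in C. The ptec_lookup helper and the threshold value are assumptions made for the sketch, not requirements of the present disclosure.

```c
/* Minimal sketch of one possible preset rule: the first page table
 * walker (near the target preset cache space) is chosen when the page
 * table entry cache holds no useful entry, or only an entry whose page
 * table level is above a threshold, so the walk would mostly miss the
 * upper caches anyway. ptec_lookup() and LEVEL_THRESHOLD are
 * illustrative assumptions. */
#include <stdint.h>
#include <stdbool.h>

enum walker { SECOND_PTW = 0, FIRST_PTW = 1 };

#define LEVEL_THRESHOLD 2   /* assumed threshold on the page table level */

/* Hypothetical query of the page table entry cache space: returns true if
 * an entry covering vaddr exists and writes its page table level. */
extern bool ptec_lookup(uint64_t vaddr, unsigned *level_out);

enum walker select_walker(uint64_t vaddr)
{
    unsigned level;
    if (!ptec_lookup(vaddr, &level))
        return FIRST_PTW;      /* no PTE data cached for this address */
    if (level > LEVEL_THRESHOLD)
        return FIRST_PTW;      /* cached level too high: long walk ahead */
    return SECOND_PTW;         /* short walk: keep it near the core */
}
```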

Description of the Drawings

FIG. 1 is a schematic diagram of an address translation process;

FIG. 2 is a schematic diagram of the architecture of a multi-core processor;

FIG. 3 is a schematic diagram of the data flow of address translation performed by the processor shown in FIG. 2;

FIG. 4 is a schematic diagram of a process of address translation and data requesting performed by the processor shown in FIG. 2;

FIG. 5 is a schematic diagram of the architecture of a processor provided by some embodiments of the present disclosure;

FIG. 6 is a schematic flowchart of an information prefetching method provided by some embodiments of the present disclosure;

FIG. 7 is a schematic diagram of a process of address translation and data requesting performed by a processor provided by an embodiment of the present disclosure;

FIG. 8 is an exemplary flowchart of step S20 in FIG. 6;

FIG. 9 is an exemplary flowchart of step S21 in FIG. 8;

FIG. 10 is an exemplary flowchart of step S212 in FIG. 9;

FIG. 11 is a schematic flowchart of another information prefetching method provided by some embodiments of the present disclosure;

FIG. 12 is a schematic flowchart of another information prefetching method provided by some embodiments of the present disclosure;

FIG. 13 is a schematic flowchart of another information prefetching method provided by some embodiments of the present disclosure;

FIG. 14 is a schematic flowchart of another information prefetching method provided by some embodiments of the present disclosure;

FIG. 15 is a schematic block diagram of an electronic device provided by some embodiments of the present disclosure;

FIG. 16 is a schematic block diagram of another electronic device provided by some embodiments of the present disclosure.

Detailed Description

In order to make the objectives, technical solutions, and advantages of the embodiments of the present disclosure clearer, the technical solutions of the embodiments of the present disclosure are described clearly and completely below with reference to the accompanying drawings of the embodiments of the present disclosure. Obviously, the described embodiments are some, but not all, of the embodiments of the present disclosure. Based on the described embodiments of the present disclosure, all other embodiments obtained by those of ordinary skill in the art without creative effort fall within the protection scope of the present disclosure.

Unless otherwise defined, technical or scientific terms used in the present disclosure shall have the ordinary meaning understood by a person of ordinary skill in the art to which the present disclosure belongs. As used in the present disclosure, "first", "second", and similar words do not denote any order, quantity, or importance, but are merely used to distinguish different components. "Comprise", "include", and similar words mean that the element or item preceding the word covers the elements or items listed after the word and their equivalents, without excluding other elements or items. "Connected", "coupled", and similar words are not limited to physical or mechanical connections, but may include electrical connections, whether direct or indirect. "Upper", "lower", "left", "right", and the like are only used to indicate relative positional relationships; when the absolute position of the described object changes, the relative positional relationship may also change accordingly.

FIG. 1 is a schematic diagram of an address translation process, showing the address translation process of a four-level page table. As shown in FIG. 1, a virtual address is divided into several segments, denoted EXT, OFFSET_lvl4, OFFSET_lvl3, OFFSET_lvl2, OFFSET_lvl1, and OFFSET_pg. In this example, the upper virtual address segment EXT is not used. The virtual address segments OFFSET_lvl4, OFFSET_lvl3, OFFSET_lvl2, and OFFSET_lvl1 represent the offsets into the four levels of page tables, that is, OFFSET_lvl4 is the offset into the fourth-level page table, OFFSET_lvl3 is the offset into the third-level page table, OFFSET_lvl2 is the offset into the second-level page table, and OFFSET_lvl1 is the offset into the first-level page table.

The initial address of the highest-level page table (that is, the fourth-level page table) is stored in the architectural register REG_pt, whose content is set by the operating system and cannot be changed by an application program. In the second-level, third-level, and fourth-level page tables, each page table entry stores the start address of the next-level page table. A first-level page table entry (Page Table Entry, PTE) stores the upper bits of the physical address of the corresponding memory page; combining it with the virtual address offset (OFFSET_pg) yields the physical address corresponding to the virtual address. By obtaining the start address of the next-level page table level by level in this way, the first-level page table entry (PTE) is eventually obtained and the corresponding physical address is derived from it, completing the translation from virtual address to physical address.
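For illustration only, the walk described above can be sketched in C as follows. The 9-bit index segments, the 12-bit page offset, and the read_phys() memory-access helper are assumptions made for the sketch; the present disclosure does not limit the segment widths or page size.

```c
/* Minimal sketch of the four-level walk described above.
 * Field widths (9-bit indices, 12-bit page offset) and the memory
 * access helper read_phys() are illustrative assumptions. */
#include <stdint.h>

#define LEVELS      4
#define IDX_BITS    9          /* bits per OFFSET_lvlX segment (assumed) */
#define PG_BITS     12         /* bits of OFFSET_pg (assumed 4 KiB pages) */
#define IDX_MASK    ((1u << IDX_BITS) - 1)
#define PG_MASK     ((1ull << PG_BITS) - 1)

extern uint64_t read_phys(uint64_t paddr);   /* hypothetical memory read */

/* reg_pt holds the start address of the highest-level (fourth-level) page table */
uint64_t translate(uint64_t vaddr, uint64_t reg_pt)
{
    uint64_t table = reg_pt;
    for (int level = LEVELS; level >= 1; level--) {
        unsigned shift = PG_BITS + (level - 1) * IDX_BITS;
        unsigned idx   = (vaddr >> shift) & IDX_MASK;      /* OFFSET_lvl<level> */
        uint64_t entry = read_phys(table + idx * sizeof(uint64_t));
        if (level == 1)
            /* first-level PTE: upper bits of the page's physical address,
             * combined with OFFSET_pg */
            return (entry & ~PG_MASK) | (vaddr & PG_MASK);
        table = entry;   /* entry holds the start address of the next-level table */
    }
    return 0; /* not reached */
}
```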

It should be noted that although FIG. 1 shows a four-level page table, the embodiments of the present disclosure are not limited thereto. Any number of page table levels may be used, for example a two-level, three-level, or five-level page table, and a single-level page table may also be used, which may be determined according to actual requirements and is not limited by the embodiments of the present disclosure. For example, a system may support pages of different sizes, and the size of each page is determined by the number of bits of the virtual address offset OFFSET_pg. In the same system, larger pages require fewer address translation levels.

FIG. 2 is a schematic diagram of the architecture of a multi-core processor. For example, as shown in FIG. 2, the processor has four processor cores (CPU cores). The processor also has multiple levels of cache, for example a first-level cache (L1 Cache), a second-level cache (L2 Cache), and a last-level cache (LLC). In this example, the last-level cache is actually the third-level cache (L3 Cache). Of course, the embodiments of the present disclosure are not limited thereto; the processor may have any number of cache levels, so the last-level cache may also be a cache of any level, which may be determined according to actual requirements.

For example, in this example, the last-level cache is shared by multiple processor cores, and the second-level cache is private to each processor core. That is, multiple processor cores share one last-level cache, while each processor core is provided with its own dedicated second-level cache. The last-level cache and the second-level cache are used to store instructions and data, and the last-level cache is connected to the memory. It should be noted that, in other examples, the second-level cache may also be a shared cache, which is not limited by the embodiments of the present disclosure.

For example, a dedicated first-level cache is provided separately for each processor core, and the first-level cache is arranged inside the processor core. For example, the first-level cache may include a first-level instruction cache (L1I cache) and a first-level data cache (L1D cache), which cache instructions and data respectively. The processor further includes a memory, and the processor core carries out instruction transfer and data reading through the data caching mechanism of the multi-level caches and the memory.

For example, a translation lookaside buffer (TLB) is provided separately for each processor core, and the translation lookaside buffer may include a translation lookaside buffer for instructions (ITLB) and a translation lookaside buffer for data (DTLB). Both the ITLB and the DTLB are arranged inside the processor core.

Address translation is a very time-consuming process. For a multi-level page table, multiple memory accesses are usually required to obtain the corresponding physical address. Taking the four-level page table shown in FIG. 1 as an example, the memory must be accessed four times to obtain the corresponding physical address. Therefore, in order to save address translation time and improve computer system performance, a TLB (for example, including an ITLB and a DTLB) may be provided in the processor core to store previously used first-level page table entries (PTEs). When address translation is required, the TLB is queried first to see whether the required PTE is present; if it hits, the corresponding physical address can be obtained immediately. Similar to the CPU cache architecture, the TLB can have various organizations, such as fully associative, set associative, and directly indexed. The TLB can also be organized as a multi-level structure: the lowest-level TLB is the smallest and fastest, and when the lowest-level TLB misses, the next-level TLB is searched.
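For illustration only, the TLB lookup path described above may be sketched as follows; the fully associative organization, the entry count, and the tlb_entry fields are assumptions of the sketch rather than features of the present disclosure, and translate() is the walk sketched earlier.

```c
/* Minimal sketch of a TLB lookup before a page table walk.
 * The fully associative layout, entry count, and tlb_entry fields are
 * illustrative assumptions. */
#include <stdint.h>
#include <stdbool.h>

#define TLB_ENTRIES 64
#define PG_BITS     12
#define PG_MASK     ((1ull << PG_BITS) - 1)

struct tlb_entry {
    bool     valid;
    uint64_t vpn;   /* virtual page number */
    uint64_t pfn;   /* physical frame number (upper bits of the PTE) */
};

static struct tlb_entry tlb[TLB_ENTRIES];

extern uint64_t translate(uint64_t vaddr, uint64_t reg_pt); /* page table walk */

uint64_t lookup(uint64_t vaddr, uint64_t reg_pt)
{
    uint64_t vpn = vaddr >> PG_BITS;
    for (int i = 0; i < TLB_ENTRIES; i++)        /* fully associative search */
        if (tlb[i].valid && tlb[i].vpn == vpn)
            return (tlb[i].pfn << PG_BITS) | (vaddr & PG_MASK);  /* TLB hit */
    /* TLB miss: fall back to the (slow) page table walk */
    return translate(vaddr, reg_pt);
}
```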

Although the TLB can eliminate much of the address translation latency, accesses to the page tables for address translation cannot be avoided entirely during program execution. To reduce the time required for translation, a hardware page table walker (PTW) is usually provided separately for each processor core, and the hardware page table walker is arranged inside the processor core. By using the hardware page table walker, the multi-level page table can be walked to obtain the final physical address of the memory page.

The L1I cache and L1D cache are accessed using physical addresses (in a physically indexed, virtually tagged manner), and the second-level cache, the last-level cache, and the memory are also accessed using physical addresses. Therefore, before data can be accessed, address translation must be performed through the ITLB or DTLB. Like a normal data read request, a read request from the hardware page table walker can travel at most through the first-level cache, the second-level cache, and the last-level cache before reaching the memory. If the data requested by the hardware page table walker exists in some level of cache, that cache returns the data and no longer passes the hardware page table walker's request to the lower-level cache/memory.

FIG. 3 is a schematic diagram of the data flow of address translation performed by the processor shown in FIG. 2. As shown in FIG. 3, in one possible case, the TLB misses and the memory must be accessed for address translation; in this case, the memory needs to be accessed four times to obtain the physical address of the final memory page. In new application scenarios such as big data, cloud computing, and artificial intelligence (AI), very large instruction and data spaces are often used at the same time, and the hot instruction segments and data segments are numerous and scattered. Therefore, these new applications often suffer many cache misses and TLB misses. As a result, the data requested by the hardware page table walker is often not present in any level of cache, and address translation can only be completed through multiple memory accesses.

In a typical CPU architecture, the instructions and data of a program are stored in the memory, and the processor core runs at a much higher frequency than the memory. Therefore, fetching data or instructions from the memory takes hundreds of clock cycles, which often causes the processor core to stall because it cannot continue executing the relevant instructions, resulting in a performance loss. For this reason, modern high-performance processors contain a multi-level cache architecture to keep recently accessed data, while prefetching data and instructions that are about to be accessed into the caches in advance. By prefetching data and instructions into the caches ahead of time, the corresponding read and write operations can hit the caches, thereby reducing latency.

When the processor shown in FIG. 2 is used, the process of address translation and data requesting is shown in FIG. 4. For example, when a data read request encounters a TLB miss, a page table walk must first be performed to obtain the physical address, that is, the four levels of page table entries are read from the memory for address translation, and then the corresponding data is obtained from the cache/memory according to the translated physical address. The time between the two five-pointed stars in FIG. 4 is the total latency of the operation, including the address translation latency (the longest distance between the solid lines) and the data read latency (the distance between the dashed lines).

In the example shown in FIG. 4, a data read/write operation that requires address translation through a page table walk (an operation that may itself be a data prefetch request) no longer has any opportunity to prefetch data; it can only obtain the data from the memory through the multi-level caches after the physical address has been obtained. This makes data prefetching ineffective, fails to reduce latency, and has a negative impact on the overall performance of the system.

At least one embodiment of the present disclosure provides an information prefetching method, a processor, and an electronic device. The information prefetching method can implement a data/instruction prefetching function while reducing address translation latency, effectively reducing the latency of data/instruction read and write operations and improving overall system performance.

Hereinafter, embodiments of the present disclosure will be described in detail with reference to the accompanying drawings. It should be noted that the same reference numerals in different drawings are used to refer to the same elements that have already been described.

At least one embodiment of the present disclosure provides an information prefetching method for a processor. The processor includes a first-level cache space, a first page table walker, a second page table walker, and at least one preset cache space; the first-level cache space and the at least one preset cache space are communicatively connected in sequence to form a communication link. The at least one preset cache space includes a target preset cache space; the first page table walker is arranged at the same path level as the target preset cache space and is communicatively connected to it. The second page table walker is arranged at the same path level as the first-level cache space and is communicatively connected to it. The information prefetching method includes: in response to the first page table walker being selected, from the first page table walker and the second page table walker according to a preset rule, to perform an address translation operation and obtain a physical address, the first page table walker sends a prefetch request to the target preset cache space, where the prefetch request includes the physical address; and in response to the prefetch request, the target preset cache space performs an information prefetch operation based on the physical address.

At least one embodiment of the present disclosure provides a processor. The processor includes a first-level cache space, a first page table walker, a second page table walker, and at least one preset cache space. The first-level cache space and the at least one preset cache space are communicatively connected in sequence to form a communication link. The at least one preset cache space includes a target preset cache space; the first page table walker is arranged at the same path level as the target preset cache space and is communicatively connected to it. The second page table walker is arranged at the same path level as the first-level cache space and is communicatively connected to it. The first page table walker is configured to: in response to being selected, from the first page table walker and the second page table walker according to a preset rule, to perform an address translation operation and obtain a physical address, send a prefetch request to the target preset cache space, where the prefetch request includes the physical address. The target preset cache space is configured to: in response to the prefetch request, perform an information prefetch operation based on the physical address.
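For illustration only, the cooperation between the first page table walker and the target preset cache space may be sketched as below: the walker, which sits next to the target preset cache space (for example the LLC in FIG. 5), finishes a translation and immediately issues a prefetch request carrying the physical address, so the target cache can start fetching before the address travels back to the core. The structure fields and the helper functions walk_page_tables, target_cache_prefetch, and send_pa_to_core are hypothetical names introduced for the sketch and are not defined by the present disclosure.

```c
/* Minimal sketch of the prefetch flow: translate, then prefetch at the
 * target preset cache space, then return the physical address.
 * All structs and helpers are illustrative assumptions. */
#include <stdint.h>
#include <stdbool.h>

struct prefetch_req {
    uint64_t phys_addr;    /* physical address produced by the translation */
    bool     is_instr;     /* request type identifier: instruction or data */
    unsigned target_level; /* preset identifier: cache level to prefetch into */
};

extern uint64_t walk_page_tables(uint64_t vaddr, uint64_t reg_pt); /* first PTW walk */
extern void target_cache_prefetch(const struct prefetch_req *req); /* target preset cache */
extern void send_pa_to_core(uint64_t paddr);                       /* return PA to core */

/* Executed by the first page table walker when it is selected by the preset rule. */
void first_ptw_handle(uint64_t vaddr, uint64_t reg_pt, bool is_instr, unsigned level)
{
    uint64_t pa = walk_page_tables(vaddr, reg_pt);   /* address translation */

    struct prefetch_req req = { pa, is_instr, level };
    target_cache_prefetch(&req);  /* target preset cache prefetches based on the PA */
    send_pa_to_core(pa);          /* the PA is also passed back, level by level */
}
```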

FIG. 5 is a schematic diagram of the architecture of a processor provided by some embodiments of the present disclosure. The processor provided by the embodiments of the present disclosure is first described below with reference to FIG. 5, and then the information prefetching method provided by the embodiments of the present disclosure is described.

As shown in FIG. 5, in some embodiments of the present disclosure, the processor includes a processor core, a first-level cache space, a first page table walker, a second page table walker, and at least one preset cache space. In some examples, the first page table walker and the second page table walker may be the aforementioned hardware page table walkers (PTWs). Here, saying that the first page table walker and the second page table walker may be the aforementioned PTW means that they implement a similar address translation function based on a similar address translation principle; the hardware structure and placement of the first page table walker may differ from those of the aforementioned PTW, and the hardware structure and placement of the second page table walker may also differ from those of the aforementioned PTW, which is not limited by the embodiments of the present disclosure. The first page table walker is a newly added hardware page table walker (PTW) arranged at the same path level as any level of the processor's cache spaces other than the first-level cache space, and the second page table walker may be arranged at the same path level as the first-level cache space.

For example, the first-level cache space is the L1 cache, which is arranged inside the processor core. For example, the first-level cache space is arranged at the same path level as the processor core and is communicatively connected to it, so that the processor core can obtain data or instructions directly from the first-level cache space. Here, "arranged at the same path level" means that the physical locations in the chip are adjacent or close, and data can be exchanged and transferred directly. Therefore, the first-level cache space being arranged at the same path level as the processor core may mean that the first-level cache space is placed next to the processor core, is close to it, and can exchange and transfer data directly with the processor core. For example, a "communicative connection" means that data/instructions can be transferred directly.

For example, the second page table walker is arranged at the same path level as the first-level cache space and is communicatively connected to it. The second page table walker being arranged at the same path level as the first-level cache space may mean that the second page table walker is placed next to the first-level cache space, is close to it, and the first-level cache space can exchange and transfer data directly with the second page table walker. As another example, the second page table walker may be arranged inside the processor core; logically, the second page table walker may be arranged at the same path level as the processor core and be communicatively connected to the processor core.

In some examples, the first-level cache space includes an L1I cache and an L1D cache, where the L1I cache is used to store instructions and the L1D cache is used to store data. Of course, the embodiments of the present disclosure are not limited thereto; in other examples, the L1I cache and the L1D cache may not be distinguished, and only a single L1 cache may be provided, used to store both data and instructions.

For example, in some examples, the at least one preset cache space includes a second-level cache space through an Nth-level cache space, where N is an integer greater than 2. The Nth-level cache space is closest to the memory and farthest from the processor core. For example, in the example shown in FIG. 4, the at least one preset cache space may include the second-level cache space (L2 cache) and the last-level cache space (LLC), that is, N=3 in this case. Of course, the embodiments of the present disclosure are not limited thereto; N may be any integer greater than 2, for example 4, 5, or 6, and correspondingly the processor has a four-level, five-level, or six-level cache architecture. As another example, the at least one preset cache space may include only one cache space, that is, only the second-level cache space, in which case the processor has a two-level cache architecture. It should be noted that, in the processors provided by the embodiments of the present disclosure, all cache levels other than the first-level cache space may be collectively referred to as preset cache spaces.

For example, the first-level cache space and the at least one preset cache space are communicatively connected in sequence to form a communication link, so that data can be obtained level by level downward. For example, when the processor core needs to obtain data, it can first query the first-level cache space; if there is no hit, it continues to query the second-level cache space; and if there is still no hit, it queries the last-level cache space. If the last-level cache space also misses, the data is obtained from the memory. Likewise, when the second page table walker needs to obtain data, it can first query the first-level cache space; if there is no hit, it continues to query the second-level cache space; and if there is still no hit, it queries the last-level cache space. If the last-level cache space also misses, the data is obtained from the memory.
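For illustration only, the level-by-level query along the communication link may be sketched as follows; the cache_level structure and the probe and memory_read helpers are assumptions of the sketch.

```c
/* Minimal sketch of the level-by-level lookup along the communication
 * link described above. The cache_level struct and probe()/memory_read()
 * helpers are illustrative assumptions. */
#include <stdint.h>
#include <stdbool.h>
#include <stddef.h>

struct cache_level {
    const char *name;                                   /* "L1", "L2", "LLC", ... */
    bool (*probe)(uint64_t paddr, uint64_t *data_out);  /* true on hit */
};

extern uint64_t memory_read(uint64_t paddr);            /* final fallback */

/* levels[0] is the first-level cache; levels[nlevels-1] is the Nth-level cache. */
uint64_t lookup_hierarchy(struct cache_level *levels, size_t nlevels, uint64_t paddr)
{
    uint64_t data;
    for (size_t i = 0; i < nlevels; i++)
        if (levels[i].probe(paddr, &data))
            return data;          /* hit: lower levels are not queried */
    return memory_read(paddr);    /* all levels missed: go to memory */
}
```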

For example, the at least one preset cache space includes a target preset cache space, and the target preset cache space may be any one of the plurality of preset cache spaces. For example, any level of cache space from the second-level cache space to the Nth-level cache space may serve as the target preset cache space. The first page table walker is arranged at the same path level as the target preset cache space and is communicatively connected to it.

For example, in the example of FIG. 5, the last-level cache space serves as the target preset cache space, and the first page table walker is arranged at the same path level as the last-level cache space and is communicatively connected to it. Here, "arranged at the same path level" means that the physical locations in the chip are adjacent or close, and data can be exchanged and transferred directly. Therefore, the first page table walker being arranged at the same path level as the last-level cache space may mean that the first page table walker is placed next to the last-level cache space, is close to it, and the last-level cache space can exchange and transfer data directly with the first page table walker.

For example, in some examples, the Nth-level cache space is a shared cache space and serves as the target preset cache space; this is the case shown in FIG. 5. In other examples, the second-level cache space is a private or shared cache space and serves as the target preset cache space. That is, in some processor architectures, the second-level cache space is provided separately for each processor core and is of the private type, while in other processor architectures, the second-level cache space is shared by multiple processor cores and is of the shared type. Regardless of whether the second-level cache space is private or shared, it can serve as the target preset cache space.

It should be noted that although FIG. 5 shows the last-level cache space serving as the target preset cache space and the first page table walker arranged next to the last-level cache space, this does not constitute a limitation on the embodiments of the present disclosure. In other examples, in the case where the second-level cache space serves as the target preset cache space, the first page table walker is arranged next to the second-level cache space and is communicatively connected to it. In some examples, when the processor includes more cache levels, any level of cache space other than the first-level cache space may serve as the target preset cache space, and the placement of the first page table walker is adjusted accordingly. Note that the first page table walker is not arranged inside the processor core; in other words, the first page table walker is not arranged next to the first-level cache space.

For example, the first page table walker is configured to, in response to the first page table walker being selected from the first page table walker and the second page table walker based on a preset rule to perform an address translation operation and obtaining a physical address, send a prefetch request to the target preset cache space. For example, the prefetch request includes the physical address; that is, the prefetch request carries the physical address translated by the first page table walker. For example, when a virtual address needs to be translated into a physical address, if there is a miss in the ITLB or DTLB and it is determined that the address translation operation is to be performed by the first page table walker, the processor core sends an address translation request to the first page table walker. The architecture of the TLB is not limited to the ITLB/DTLB arrangement; any applicable architecture may be adopted, which is not limited by the embodiments of the present disclosure.

For example, the address translation request may trigger the first page table walker to perform the address translation operation. The address translation request may be passed to the first page table walker through the multi-level cache architecture, or through a pipeline inside the processor; the embodiments of the present disclosure do not limit how the address translation request is delivered. In the case where the address translation request is passed to the first page table walker through the multi-level cache architecture, the address translation request uses a data read request type that the multi-level cache architecture can recognize.

For example, the address translation operation may be the address translation process of a multi-level page table; reference may be made to the description of FIG. 1, which is not repeated here. It should be noted that the page table used for address translation is not limited to 4 levels; any number of page table levels may be used, such as a 2-level, 3-level, or 5-level page table, and a single-level page table may also be used, which can be determined according to actual requirements and is not limited by the embodiments of the present disclosure. For example, the more page table levels there are, the more memory accesses each address translation requires, and therefore the greater the performance improvement that the processor provided by the embodiments of the present disclosure can offer. For example, the physical page size of the page table is not limited and can be determined according to actual requirements.
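For example, the following minimal C++ sketch illustrates such a multi-level walk under assumed parameters (4 levels, 4 KB pages, 9 index bits per level, 64-bit entries); the read_entry callback is a hypothetical stand-in for the walker's access to the cache/memory path and is not part of the present disclosure.

    #include <cstdint>
    #include <functional>

    // Hypothetical 4-level walk over 4 KB pages (9 index bits per level).
    // read_entry stands in for the walker's access to the cache/memory path.
    uint64_t walk_page_tables(uint64_t reg_pt, uint64_t virtual_address,
                              const std::function<uint64_t(uint64_t)>& read_entry) {
        uint64_t table_base = reg_pt;                  // base of the fourth-level (top) page table
        for (int level = 3; level >= 0; --level) {     // fourth-level table down to first-level entry
            uint64_t index = (virtual_address >> (12 + 9 * level)) & 0x1FF;
            uint64_t entry = read_entry(table_base + index * sizeof(uint64_t));
            table_base = entry & ~0xFFFULL;            // next-level table base, or final page frame
        }
        return table_base | (virtual_address & 0xFFF); // page frame | in-page offset
    }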

The address translation request may include translation information. The translation information may include the address translation request sequence number, the virtual address value to be translated, and the initial address of the top-level page table. After receiving the address translation request, the first page table walker is triggered to perform the address translation operation, and based on the translation information it can obtain the content required for the operation, such as the virtual address value and the initial address of the top-level page table. In some examples, Addr_Trans_Req may indicate that the request is an address translation request, Addr_Trans_SN may denote the address translation request sequence number, REG_pt may denote the initial address of the top-level page table (that is, the REG_pt value of the process), and VA may denote the virtual address value to be translated.

For example, the translation information may also include a request type identifier. The request type identifier indicates whether the target information stored at the physical address is of the instruction type or the data type. In some examples, I/D may be used to indicate whether the request corresponds to an instruction or to data, for example I for an instruction and D for data.
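For example, the translation information could be grouped as in the following sketch; the field widths and the struct layout are illustrative assumptions, and only the names (Addr_Trans_SN, REG_pt, VA, and the I/D identifier) follow the description above.

    #include <cstdint>

    // Illustrative layout only; the actual encoding is implementation-defined.
    enum class RequestKind : uint8_t { Instruction, Data };   // the I/D request type identifier

    struct AddrTransReq {          // marked as Addr_Trans_Req on the request path
        uint32_t    Addr_Trans_SN; // address translation request sequence number
        uint64_t    REG_pt;        // initial address of the top-level page table of the process
        uint64_t    VA;            // virtual address value to be translated
        RequestKind type;          // whether the target information is an instruction or data
    };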

Since the first page table walker is placed beside the target preset cache space (the last-level cache space in this example), it is relatively close to memory. Therefore, the time the first page table walker needs to fetch a page table entry from memory is shorter, which significantly improves the efficiency of address translation and greatly reduces the time spent on it. The first page table walker is not placed beside the first-level cache space (L1 cache), breaking away from the convention of placing the page table walker inside the processor core, and because the first page table walker can be closer to memory, the latency of its memory accesses and of address translation can be reduced, improving the system performance of the processor. This placement of the first page table walker is suitable for a variety of emerging application scenarios (such as big data, cloud computing, and AI) and a variety of CPU architectures, and can further improve the performance of these emerging application scenarios.

It should be noted that, in the embodiments of the present disclosure, the first page table walker may be placed beside any level of cache space other than the first-level cache space, or may be placed directly beside memory. This can be determined according to actual requirements, for example according to factors such as the processor architecture, the process technology, cache size and latency, memory latency, whether cache coherence is supported, and the characteristics of common applications; the embodiments of the present disclosure do not limit this.

For example, the target preset cache space is configured to perform an information prefetch operation based on the physical address in response to the prefetch request. For example, the target preset cache space is further configured to determine a prefetch cache space, obtain the target information stored at the physical address based on the physical address, and send the target information to the prefetch cache space. For example, the prefetch cache space is at least one of the first-level cache space and the at least one preset cache space, that is, it may be any one or more of the first-level cache space and the preset cache spaces. For example, in some examples, the prefetch cache space is a cache space that is closer to the processor core than the target preset cache space in the communication link formed by the first-level cache space and the at least one preset cache space, which can improve prefetch efficiency. Of course, in other examples, the prefetch cache space may also be a cache space farther from the processor core than the target preset cache space.

That is, before the processor core receives the physical address and requests the information (for example, data or an instruction), the target preset cache space performs an information prefetch operation according to the physical address and stores the target information corresponding to that physical address in the prefetch cache space. As a result, when the processor core requests the information based on the physical address, it can hit in the prefetch cache space, which effectively reduces latency and implements data/instruction prefetching.

For example, as shown in FIG. 5, the processor further includes a page table entry cache space, which may be the aforementioned translation lookaside buffer (TLB). For example, the page table entry cache space may include a translation lookaside buffer for instructions (ITLB) and a translation lookaside buffer for data (DTLB). For example, the processor core is arranged at the same path level as the page table entry cache space and is communicatively connected to it. Here, "arranged at the same path level" means that the physical locations in the chip are adjacent or close to each other, and that data can be exchanged and transferred directly. Therefore, arranging the processor core and the page table entry cache space at the same path level may mean that the page table entry cache space is placed beside the processor core, at a short distance from it, and that the processor core can exchange and transfer data directly with the page table entry cache space. For example, "communicatively connected" means that data/instructions can be transferred directly. As another example, the page table entry cache space may be located inside the processor core and may be at the same path level as the second page table walker, which is also located inside the processor core; the second page table walker is communicatively connected to the page table entry cache space. For example, the second page table walker is arranged at the same path level as the translation lookaside buffer for instructions (ITLB) and the translation lookaside buffer for data (DTLB). For example, the page table entry cache space may be located inside the processor core.

For example, the page table entry cache space, the first-level cache space, and the at least one preset cache space are communicatively connected in sequence to form a communication link, so that data can be fetched level by level downward. For example, when the second page table walker located inside the processor core needs to obtain data (for example, page table entry data), it may first query the page table entry cache space; if there is no hit, it continues to query the first-level cache space; if there is still no hit, it continues to the second-level cache space; and if there is still no hit, it queries the last-level cache space. If the last-level cache space also misses, the data is fetched from memory.
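For example, this level-by-level fall-through could be modeled as in the sketch below; the CacheLevel type and the final memory callback are assumptions used only to illustrate querying each level in order before going to memory.

    #include <cstdint>
    #include <functional>
    #include <optional>
    #include <vector>

    // Hypothetical model of one level on the lookup path (page table entry cache space,
    // first-level cache, second-level cache, ..., last-level cache): a lookup returns
    // the cached value on a hit and nothing on a miss.
    using CacheLevel = std::function<std::optional<uint64_t>(uint64_t)>;

    // Query each level in order; fall back to the memory access only if every level misses.
    uint64_t fetch_level_by_level(const std::vector<CacheLevel>& levels,
                                  const std::function<uint64_t(uint64_t)>& read_from_memory,
                                  uint64_t address) {
        for (const CacheLevel& level : levels) {
            if (auto hit = level(address)) {
                return *hit;                  // hit at this level, no need to go further down
            }
        }
        return read_from_memory(address);     // missed at every level: fetch from memory
    }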

For example, the page table entry cache space stores at least part of the page table entry data from the page table entry data of the first-level page table to the page table entry data of the Mth-level page table, where M is an integer greater than 1. That is, the page table entry cache space can store any recently used page table entry data, such as PTEs.

For example, as shown in FIG. 5, the processor may further include a request buffer. The request buffer may also be called a page request buffer (PRB), and the request buffer is arranged at the same path level as the first page table walker. The request buffer is communicatively connected to the first page table walker and to the target preset cache space; the request buffer is, for example, located between the first page table walker and the target preset cache space. Here, "arranged at the same path level" means that the physical locations in the chip are adjacent or close to each other, and that data can be exchanged and transferred directly. Therefore, arranging the request buffer at the same path level as the first page table walker may mean that the request buffer is placed beside the first page table walker, at a short distance from it, and that the first page table walker can exchange and transfer data directly with the request buffer. At the same time, the request buffer can also exchange and transfer data directly with the target preset cache space.

The request buffer is configured to store a queue of pending address translation requests sent by the processor core. When the processor provided by the embodiments of the present disclosure includes multiple processor cores, the first page table walker cannot process address translation requests sent simultaneously by multiple processor cores, so a request buffer can be used to store the queue of pending address translation requests. The first page table walker can fetch address translation requests from the request buffer in order and perform the corresponding address translation operations.
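For example, a minimal sketch of such a request buffer as a first-in-first-out queue is given below; the fixed capacity and the reduced request fields are assumptions for illustration only.

    #include <cstddef>
    #include <cstdint>
    #include <deque>
    #include <optional>

    // Reduced request layout for illustration (other fields, such as the I/D identifier, omitted).
    struct AddrTransReq {
        uint32_t Addr_Trans_SN;   // request sequence number
        uint64_t REG_pt;          // top-level page table base of the requesting process
        uint64_t VA;              // virtual address to be translated
    };

    // A minimal page request buffer: processor cores enqueue pending requests,
    // and the first page table walker dequeues them one at a time, in order.
    class PageRequestBuffer {
    public:
        explicit PageRequestBuffer(std::size_t capacity) : capacity_(capacity) {}

        bool enqueue(const AddrTransReq& request) {
            if (queue_.size() >= capacity_) return false;   // buffer full, the core must retry
            queue_.push_back(request);
            return true;
        }

        std::optional<AddrTransReq> dequeue() {
            if (queue_.empty()) return std::nullopt;
            AddrTransReq request = queue_.front();
            queue_.pop_front();
            return request;
        }

    private:
        std::size_t capacity_;
        std::deque<AddrTransReq> queue_;
    };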

For example, in some examples, the processor core is configured such that, in response to the first page table walker being selected from the first page table walker and the second page table walker based on a preset rule to perform the address translation operation, the first page table walker performs the address translation operation in response to the address translation request to obtain the physical address. For example, the processor dynamically selects, based on the preset rule, whether the first page table walker or the second page table walker will handle a new address translation request, choosing the more suitable page table walker to reduce address translation latency. For example, the preset rule includes: when the page table entry data required for the address translation is not present in the page table entry cache space, or when the page table level corresponding to the page table entry data required for the address translation in the page table entry cache space is greater than a threshold, it is determined that the address translation operation is performed by the first page table walker; otherwise, it is determined that the address translation operation is performed by the second page table walker. For example, the threshold is related to the CPU architecture and to various factors in the chip process, and can be set arbitrarily according to actual requirements; the embodiments of the present disclosure do not limit this.

For example, in some examples, if the page table entry data required for the address translation is not present in the page table entry cache space, it is determined that the address translation operation is performed by the first page table walker. For example, in other examples, if the page table level corresponding to the page table entry data required for the address translation in the page table entry cache space is greater than a threshold, it is determined that the address translation operation is performed by the first page table walker. Assuming the threshold is 2, when the page table entry data required for the address translation in the page table entry cache space is third-level or fourth-level page table entry data, it is determined that the address translation operation is performed by the first page table walker. It should be noted that the preset rule is not limited to the manners described above; any applicable rule may be used to select one of the first page table walker and the second page table walker to perform the address translation operation, which can be determined according to actual requirements and is not limited by the embodiments of the present disclosure.
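For example, the preset rule described above could be expressed as in the following sketch; the probe result type and the default threshold of 2 are illustrative assumptions.

    #include <optional>

    enum class Walker { First, Second };

    // Hypothetical result of probing the page table entry cache space: empty if the
    // required page table entry data is absent, otherwise the page table level it belongs to.
    using TlbProbeResult = std::optional<int>;

    // Preset rule: use the first page table walker when the required entry is missing,
    // or when only a high-level entry (level greater than the threshold) is cached;
    // otherwise use the second (in-core) page table walker.
    Walker select_walker(const TlbProbeResult& cached_level, int threshold = 2) {
        if (!cached_level.has_value()) return Walker::First;  // required entry data not cached
        if (*cached_level > threshold) return Walker::First;  // e.g. only level-3/level-4 data cached
        return Walker::Second;
    }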

For example, the processor core is configured to generate an address translation request in response to the page table entry data required for the address translation not being present in the page table entry cache space, and, when it is determined that the address translation operation is performed by the first page table walker, to send the address translation request to the first page table walker. For example, when a virtual address needs to be translated into a physical address, if the page table entry data required for the address translation misses in the ITLB or DTLB, an address translation operation is needed. The processor then determines, based on the preset rule, which of the first page table walker and the second page table walker will perform the address translation operation. When it is determined that the address translation operation is performed by the first page table walker, the processor core sends the address translation request to the first page table walker. When it is determined that the address translation operation is performed by the second page table walker, the processor core sends the address translation request to the second page table walker. For example, the architecture of the TLB is not limited to the ITLB/DTLB arrangement; any applicable architecture may be adopted, which is not limited by the embodiments of the present disclosure.

For example, in some examples, the first page table walker is further configured to receive the address translation request generated by the processor core, obtain page table entry data from memory via the target preset cache space, and perform the address translation operation using the page table entry data to obtain the physical address.

For example, in this example, the first page table walker is not directly communicatively connected to memory and does not access memory directly; instead, it accesses memory indirectly through the target preset cache space to obtain page table entry data. For example, the first page table walker is further configured to obtain the page table entry data through a level-by-level lookup along the path from the target preset cache space to memory, according to the address translation request. This level-by-level lookup is similar to the way data is fetched level by level through a multi-level cache. For example, when the target preset cache space is the last-level cache space, the first page table walker accesses memory through the last-level cache space; when the target preset cache space is the second-level cache space or another level of cache space, the first page table walker queries downward level by level through the target preset cache space to access memory. For example, the first-level cache space to the Nth-level cache space store at least part of the page table entry data from the page table entry data of the first-level page table to the page table entry data of the Mth-level page table, where M is an integer greater than 1.

In this way, the page table entry data read by the first page table walker can be kept in the target preset cache space and in the cache spaces between the target preset cache space and memory, which makes it easier, during the next address translation, to look up page table entries that may be present in the cache spaces; on a hit there is no need to access memory again, further improving the efficiency of address translation and achieving lower latency than a memory access. Moreover, in the embodiments of the present disclosure, under a multi-core architecture, since the page table entry data read by the first page table walker is kept in the target preset cache space, the cache coherence mechanism can ensure that the first page table walker obtains the correct page table entry contents.

For example, in some examples, the first page table walker may include a multi-level page table cache (Page Walk Cache, PWT). The multi-level page table cache is configured to cache at least part of the page table entry data from the page table entry data of the first-level page table to the page table entry data of the Mth-level page table, where M is an integer greater than 1. For example, the multi-level page table cache is a cache inside the page table walker used to hold any recently used page table entries, such as first-level, second-level, third-level, and fourth-level page table entries. If an address translation finds the corresponding page table entry in the multi-level page table cache, the accesses to the higher-level page tables can be skipped, reducing the number of memory accesses and the address translation latency. It should be noted that the multi-level page table cache is a micro-architectural optimization of the page table walker and may also be omitted; this can be determined according to actual requirements and is not limited by the embodiments of the present disclosure. Similarly, the second page table walker may also include a multi-level page table cache.
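For example, a minimal sketch of such a multi-level page table cache is given below; keying entries by (page table level, entry address) is an assumption chosen only to illustrate how a hit lets the walker skip the higher-level accesses.

    #include <cstdint>
    #include <map>
    #include <optional>
    #include <utility>

    // A minimal multi-level page table cache: recently used entries are keyed by their
    // page table level and by the address of the entry itself.
    class PageWalkCache {
    public:
        void insert(int level, uint64_t entry_address, uint64_t entry_value) {
            entries_[{level, entry_address}] = entry_value;
        }

        // Returns the cached entry if present; a hit lets the walker resume the walk at
        // this level and skip the accesses to all higher-level page tables.
        std::optional<uint64_t> lookup(int level, uint64_t entry_address) const {
            auto it = entries_.find({level, entry_address});
            if (it == entries_.end()) return std::nullopt;
            return it->second;
        }

    private:
        std::map<std::pair<int, uint64_t>, uint64_t> entries_;
    };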

For example, the first page table walker is further configured to send an error feedback instruction to the processor core in response to being unable to obtain the page table entry data required for the address translation, so as to determine a page table entry read error. That is, under the processor architecture, when the page table being accessed at a certain level is not in memory, or when the data operation does not match the attributes of the page table entry to which the data belongs, a page fault is triggered and handled by the operating system. Therefore, when the page table entry data required for the address translation cannot be obtained, the first page table walker sends an error feedback instruction to the processor core; the error feedback instruction is, for example, an interrupt instruction or another type of instruction indicating that a page table entry read error exists, thereby triggering the page fault.

For example, in some examples, the first page table walker is further configured to send a data return instruction to the processor core. After the first page table walker completes the address translation operation, the corresponding physical address is obtained, so the first page table walker sends a data return instruction to the processor core, thereby passing the physical address to the processor core. The data return instruction may be passed to the processor core through the multi-level cache architecture, or through a pipeline inside the processor; the embodiments of the present disclosure do not limit how the data return instruction is delivered. In the case where the data return instruction is passed to the processor core through the multi-level cache architecture, the data return instruction uses a request-response type that the multi-level cache architecture can recognize.

For example, the data return instruction includes the address translation request sequence number, the physical address and attributes of the memory page, and so on. For example, in some examples, Addr_Trans_Resp may indicate that the information is a reply to the address translation request Addr_Trans_SN (that is, that the information is a data return instruction), Addr_Trans_SN may denote the address translation request sequence number, and PTE may denote the contents of the corresponding first-level page table entry, which for example contains the physical address and attributes of the memory page.
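For example, the data return instruction could be laid out as in the following sketch; the field widths are illustrative assumptions, and only the names Addr_Trans_Resp, Addr_Trans_SN, and PTE follow the description above.

    #include <cstdint>

    // Illustrative layout only; the actual encoding is implementation-defined.
    struct AddrTransResp {          // marked as Addr_Trans_Resp on the return path
        uint32_t Addr_Trans_SN;     // sequence number of the address translation request being answered
        uint64_t PTE;               // first-level page table entry: physical page address plus
                                    //   attribute bits (present, writable, and so on)
    };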

For example, the processor core is configured to generate an address translation request in response to the second page table walker being selected from the first page table walker and the second page table walker based on the preset rule to perform the address translation operation, and the second page table walker performs the address translation operation in response to the address translation request to obtain the physical address. For the process in which the second page table walker performs the address translation operation, reference may be made to FIG. 2, which is not repeated here.

For example, in some examples, the second page table walker receives the address translation request generated by the processor core, obtains page table entry data from memory according to the address translation request, and performs the address translation using the page table entry data to obtain the physical address. For example, in the case where it is determined that the address translation operation is performed by the first page table walker, the second page table walker may receive the address translation request from the processor core and then forward it to the first page table walker, thereby providing diverse transmission paths for address translation requests. After receiving the address translation request, the first page table walker performs the address translation operation; for the specific process, reference may be made to the above embodiments, which is not repeated here.

It should be noted that, in the embodiments of the present disclosure, the processor may have a single-core architecture or a multi-core architecture, which is not limited by the embodiments of the present disclosure. The number of caches and the way they are arranged are also not limited and can be determined according to actual requirements. The processor is not limited to the structure shown in FIG. 5 and may include more or fewer components; the way the components are connected is not limited.

FIG. 6 is a schematic flowchart of an information prefetching method provided by some embodiments of the present disclosure. The information prefetching method can be used with the processor shown in FIG. 5. In some embodiments, as shown in FIG. 6, the information prefetching method includes the following operations.

Step S10: in response to the first page table walker being selected from the first page table walker and the second page table walker based on a preset rule to perform an address translation operation and obtaining a physical address, the first page table walker sends a prefetch request to the target preset cache space, where the prefetch request includes the physical address;

Step S20: in response to the prefetch request, the target preset cache space performs an information prefetch operation based on the physical address.

For example, in step S10, the processor selects, based on the preset rule, the first page table walker from the first page table walker and the second page table walker to perform the address translation operation so as to reduce address translation latency, and generates an address translation request. The first page table walker receives the address translation request sent by the processor, performs the address translation operation, and obtains the physical address.

For example, the preset rule includes: when the page table entry data required for the address translation is not present in the page table entry cache space, or when the page table level corresponding to the page table entry data required for the address translation in the page table entry cache space is greater than a threshold, it is determined that the address translation operation is performed by the first page table walker; otherwise, it is determined that the address translation operation is performed by the second page table walker. For example, the threshold is related to the CPU architecture and to various factors in the chip process, and can be set arbitrarily according to actual requirements; the embodiments of the present disclosure do not limit this.

For example, after the first page table walker performs the address translation operation and obtains the physical address, the first page table walker sends a prefetch request to the target preset cache space. For example, the prefetch request includes the physical address, that is, the prefetch request carries the physical address, so that the target preset cache space can obtain the physical address.

For example, in step S20, after receiving the prefetch request, the target preset cache space performs an information prefetch operation according to the physical address carried in the prefetch request. For example, the prefetch request is a request used to trigger the target preset cache space to perform the information prefetch operation, and any applicable request type may be used, which is not limited by the embodiments of the present disclosure. For example, the information prefetch operation is used to implement information prefetching; the prefetched target information may be data or an instruction, and the target information is stored in the storage space indicated by the physical address.

FIG. 7 is a schematic diagram of the process of performing address translation and requesting data using the processor provided by the embodiments of the present disclosure. As shown in FIG. 7, when a TLB miss occurs for a data read request, a page table walk is first needed to obtain the physical address. For example, in the case where the address translation is performed by the first page table walker, the first page table walker reads the four levels of page table entries from memory to perform the address translation, thereby obtaining the physical address. The physical address is sent to the processor core, and the processor core then obtains the corresponding data from the cache/memory according to the physical address.

Since the first page table walker is placed beside the target preset cache space (for example, the LLC), when the first page table walker performs address translation by walking the page tables, the first page table walker and the target preset cache space obtain the physical address of the request earlier than the processor core does. At this point, before the processor core requests data based on the physical address, the target preset cache space can obtain the corresponding data from memory according to the physical address in advance and send it to the processor core (or to a designated cache, such as the L1 cache or the L2 cache), thereby implementing data prefetching. The time between the two five-pointed stars in FIG. 7 is the total latency of the operation, including the address translation latency (the longest distance between the solid lines) and the data prefetch latency (the distance between the dashed lines).

In this example, the time of the data prefetch is shown by the dashed line in FIG. 7. Compared with the time to request data in FIG. 4 (the time shown by the dashed line in FIG. 4), the time saved by data prefetching is roughly equal to the latency from the processor core to the target preset cache space (for example, the LLC). Therefore, on the basis that the first page table walker is close to memory and thus saves address translation latency, this data prefetching method can further reduce the latency of obtaining data.

In the above manner, the information prefetching method can implement the data/instruction prefetching function while reducing address translation latency, effectively reducing the latency of data/instruction read and write operations and improving overall system performance.

FIG. 8 is an exemplary flowchart of step S20 in FIG. 6. In some examples, the above step S20 may further include the following operations.

Step S21: determining the prefetch cache space;

Step S22: based on the physical address, the target preset cache space obtains the target information stored at the physical address;

Step S23: the target preset cache space sends the target information to the prefetch cache space.

For example, in step S21, the prefetch cache space first needs to be determined; the prefetch cache space is used to cache the target information stored at the physical address. For example, the prefetch cache space is at least one of the first-level cache space and the at least one preset cache space, that is, it may be any one or more of the first-level cache space and the preset cache spaces. For example, in some examples, the prefetch cache space is a cache space that is closer to the processor core than the target preset cache space in the communication link formed by the first-level cache space and the at least one preset cache space, which can improve prefetch efficiency. In the processor architecture shown in FIG. 5, the prefetch cache space may be the L2 cache or the L1 cache (the L1I cache or the L1D cache).

FIG. 9 is an exemplary flowchart of step S21 in FIG. 8. In some examples, as shown in FIG. 9, the above step S21 may further include the following operations.

Step S211: obtaining a preset identifier;

Step S212: determining the prefetch cache space according to the preset identifier.

For example, in step S211, the preset identifier represents the level information of a cache space, that is, it indicates which level of cache space the prefetch cache space is. For example, when the preset identifier is 1, the prefetch cache space is the L1 cache; when the preset identifier is 2, the prefetch cache space is the L2 cache, and so on. It should be noted that the embodiments of the present disclosure do not limit the specific data format or representation of the preset identifier, as long as the level of cache space of the prefetch cache space can be determined from the preset identifier.

For example, the preset identifier is stored in a designated storage space or carried in the prefetch request.

For example, in some examples, the preset identifier is stored in a designated storage space, that is, the preset identifier may be set in advance and remain fixed. When the preset identifier needs to be obtained, it only needs to be read from the designated storage space. This simplifies how the preset identifier is obtained.

For example, in other examples, the preset identifier is carried in the prefetch request. When the first page table walker sends the prefetch request to the target preset cache space, the prefetch request carries the preset identifier, so that the target preset cache space can obtain the preset identifier and determine which level of cache space the prefetch cache space is. How the first page table walker determines the preset identifier is described later and is not repeated here. In this way, the prefetch cache space can be selected dynamically, so that it is not fixed to a particular level of cache space and can be set flexibly for each prefetch, improving overall processing efficiency.

It should be noted that, in the embodiments of the present disclosure, the way of obtaining the preset identifier is not limited to the manners described above and may be any other applicable manner, which can be determined according to actual requirements; the embodiments of the present disclosure do not limit this.

For example, in step S212, after the preset identifier is obtained, the prefetch cache space can be determined according to the preset identifier. For example, when the preset identifier is 1, the prefetch cache space is determined to be the L1 cache; when the preset identifier is 2, the prefetch cache space is determined to be the L2 cache, and so on. In the processor architecture shown in FIG. 5, the target preset cache space is the LLC, so the determined prefetch cache space, the L1 cache or the L2 cache, is closer to the processor core than the LLC, which can improve prefetch efficiency.

For example, in some examples, the first-level cache space includes a first-level instruction space (for example, the L1I cache) and a first-level data space (for example, the L1D cache). In one possible situation, the level information represented by the preset identifier indicates the first level, that is, the preset identifier is 1; then the above step S212 may further include the following operations, as shown in FIG. 10.

Step S212a: in response to the target information being of the instruction type, determining that the prefetch cache space is the first-level instruction space;

Step S212b: in response to the target information being of the data type, determining that the prefetch cache space is the first-level data space.

For example, in steps S212a and S212b, since the first-level cache space includes the L1I cache and the L1D cache, which cache different types of information, it is necessary to further determine whether the prefetch cache space is the L1I cache or the L1D cache. If the target information stored at the physical address is of the instruction type, the prefetch cache space is determined to be the L1I cache; if the target information stored at the physical address is of the data type, the prefetch cache space is determined to be the L1D cache. The target information can thereby be prefetched into the correct cache.
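For example, determining the prefetch cache space from the preset identifier and the request type could look like the following sketch; the identifier values 1 and 2 follow the examples above, while the enumeration of cache levels and everything else is an illustrative assumption.

    #include <stdexcept>

    enum class PrefetchCache { L1I, L1D, L2 };
    enum class InfoType { Instruction, Data };   // the I/D request type identifier

    // Map the preset identifier (1 = first-level cache, 2 = second-level cache) and the
    // type of the target information to a concrete prefetch cache space.
    PrefetchCache resolve_prefetch_cache(int preset_id, InfoType type) {
        switch (preset_id) {
            case 1:
                // Steps S212a/S212b: split the first level into instruction and data caches.
                return (type == InfoType::Instruction) ? PrefetchCache::L1I : PrefetchCache::L1D;
            case 2:
                return PrefetchCache::L2;
            default:
                throw std::invalid_argument("preset identifier not covered by this sketch");
        }
    }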

Returning to FIG. 8, in step S22, based on the physical address, the target preset cache space obtains the target information stored at the physical address. For example, in some examples, step S22 may include: based on the physical address, obtaining the target information through a level-by-level lookup along the path from the target preset cache space to memory. If the target information hits in a certain level of cache, it can be obtained directly. If the target information misses in the caches, it needs to be fetched from memory. The level-by-level lookup is similar to the way data is fetched level by level through a multi-level cache.

For example, in some examples, the physical address PA may be expressed as: PA = (first-level PTE value) << X | OFFSET_pg. Here, OFFSET_pg denotes the virtual address offset, and X denotes the log value of the memory page size. For example, for a 4 KB page, X is 12. It should be noted that this is only one example of how the physical address may be computed and does not constitute a limitation on the embodiments of the present disclosure.
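For example, that computation can be written as the small helper below; the default of X = 12 corresponds to the 4 KB page example given above.

    #include <cstdint>

    // PA = (first-level PTE value) << X | OFFSET_pg, where X = log2(memory page size).
    uint64_t compose_physical_address(uint64_t first_level_pte_value,
                                      uint64_t offset_pg,
                                      unsigned x = 12 /* 4 KB pages */) {
        return (first_level_pte_value << x) | offset_pg;
    }

    // For instance, a first-level PTE value of 0x12345 with a 4 KB page and an in-page
    // offset of 0xABC yields the physical address 0x12345ABC.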

For example, in step S23, after obtaining the target information, the target preset cache space sends the target information to the prefetch cache space, thereby caching the target information in the prefetch cache space. For example, in some examples, step S23 may include: sending the target information to the prefetch cache space in a level-by-level manner along the path from the target preset cache space to the prefetch cache space. The level-by-level transfer is similar to the way data is passed level by level through a multi-level cache.
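For example, steps S21 to S23 taken together could be sketched as follows; the two callbacks are hypothetical stand-ins for the level-by-level lookup toward memory (step S22) and the level-by-level transfer toward the prefetch cache space (step S23).

    #include <cstdint>
    #include <functional>

    enum class PrefetchCache { L1I, L1D, L2 };

    // Handle a prefetch request in the target preset cache space: the prefetch cache space
    // is assumed to have been determined already (step S21), the target information is
    // fetched for the physical address (step S22) and then pushed toward the core (step S23).
    void handle_prefetch_request(uint64_t physical_address,
                                 PrefetchCache prefetch_cache,
                                 const std::function<uint64_t(uint64_t)>& fetch_toward_memory,
                                 const std::function<void(PrefetchCache, uint64_t, uint64_t)>& send_toward_core) {
        uint64_t target_info = fetch_toward_memory(physical_address);     // step S22
        send_toward_core(prefetch_cache, physical_address, target_info);  // step S23
    }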

In the above manner, the target information is cached in the prefetch cache space; when the processor core requests the target information according to the physical address, it can hit in the prefetch cache space, effectively reducing the latency of data/instruction read and write operations and improving overall system performance. This prefetching approach requires only minor hardware changes and only a small amount of additional hardware resources, making it easy to implement.

FIG. 11 is a schematic flowchart of another information prefetching method provided by some embodiments of the present disclosure. In some embodiments, in addition to steps S10-S20, the information prefetching method may further include steps S30-S40. Steps S10-S20 in this embodiment are substantially the same as steps S10-S20 shown in FIG. 6 and are not repeated here.

Step S30: in response to the first page table walker being selected from the first page table walker and the second page table walker based on the preset rule to perform the address translation operation, the first page table walker performs the address translation operation in response to the address translation request to obtain the physical address;

Step S40: the target preset cache space sends the physical address to the processor core.

For example, in step S30, when a virtual address needs to be translated into a physical address, if there is a miss in the ITLB or DTLB and it is determined according to the preset rule that the address translation operation is performed by the first page table walker, the processor core sends an address translation request to the first page table walker. The architecture of the TLB is not limited to the ITLB/DTLB arrangement; any applicable architecture may be adopted, which is not limited by the embodiments of the present disclosure. For example, the address translation request may trigger the first page table walker to perform the address translation operation. For example, the address translation operation may be the address translation process of a multi-level page table; reference may be made to the description of FIG. 1, which is not repeated here. It should be noted that the page table used for address translation is not limited to 4 levels; any number of page table levels may be used, such as a 2-level, 3-level, or 5-level page table, and a single-level page table may also be used, which can be determined according to actual requirements and is not limited by the embodiments of the present disclosure. For example, the more page table levels there are, the more memory accesses each address translation requires, and therefore the greater the performance improvement that the processor provided by the embodiments of the present disclosure can offer. For example, the physical page size of the page table is not limited and can be determined according to actual requirements.

For example, in some examples, step S30 may include: the first page table walker receives the address translation request generated by the processor core, obtains page table entry data from memory via the target preset cache space, and performs the address translation operation using the page table entry data to obtain the physical address. For example, the first page table walker obtains the page table entry data through a level-by-level lookup along the path from the target preset cache space to memory according to the address translation request, and performs the translation using the page table entry data to obtain the physical address.

For example, the address translation request includes translation information, and the translation information includes the address translation request sequence number, the virtual address value to be translated, and the initial address of the top-level page table. In some examples, Addr_Trans_Req may indicate that the request is an address translation request, Addr_Trans_SN may denote the address translation request sequence number, REG_pt may denote the initial address of the top-level page table (that is, the REG_pt value of the process), and VA may denote the virtual address value to be translated.

For example, the translation information may also include a request type identifier that indicates whether the target information stored at the physical address is of the instruction type or the data type. In some examples, I/D may be used to indicate whether the request corresponds to an instruction or to data, for example I for an instruction and D for data. Thus, in the case where the prefetch cache space is the first-level cache space, the prefetch cache space can be determined to be the L1I cache or the L1D cache according to the type of the target information.

For example, in step S40, after the first page table walker performs the address translation operation to obtain the physical address and sends the prefetch request to the target preset cache space, the target preset cache space sends the physical address to the processor core. Since the prefetch request carries the physical address, the target preset cache space can obtain the physical address. Of course, the embodiments of the present disclosure are not limited to this; the first page table walker may also send the physical address to the target preset cache space separately, which can be determined according to actual requirements.

For example, in some examples, step S40 may include: the target preset cache space sends the physical address to the processor core in a level-by-level manner, that is, it is passed to the processor core through the multi-level cache architecture. Of course, the embodiments of the present disclosure are not limited to this; the physical address may also be passed to the processor core through a pipeline inside the processor, which is not limited by the embodiments of the present disclosure. For example, a data return instruction may be used to transfer the physical address. For example, the data return instruction includes the address translation request sequence number, the physical address and attributes of the memory page, and so on. For example, in some examples, Addr_Trans_Resp may indicate that the information is a reply to the address translation request Addr_Trans_SN (that is, that the information is a data return instruction), Addr_Trans_SN may denote the address translation request sequence number, and PTE may denote the contents of the corresponding first-level page table entry, which for example contains the physical address and attributes of the memory page.

It should be noted that, after the target preset cache space receives the prefetch request, steps S20 and S40 may be executed in parallel, that is, the information prefetch operation is performed while the physical address is sent to the processor core. Here, "while" may mean starting execution at the same moment, or that there is a small time difference between the two operations, which is not limited by the embodiments of the present disclosure. Of course, the embodiments of the present disclosure are not limited to this; steps S20 and S40 may also be executed in a certain order, for example step S20 first and then step S40, or step S40 first and then step S20, which can be determined according to actual requirements.

图12为本公开一些实施例提供的另一种信息预取方法的流程示意图。例如,在一些示例中,处理器还包括页表项缓存空间和请求缓存区,处理器核与页表项缓存空间设置在同一路径等级,处理器核与页表项缓存空间通信连接。页表项缓存空间例如可以设置在处理器核内部。请求缓存区与第一页表遍历器设置在同一路径等级。该实施例中的步骤S10-S40与图11中所示的步骤S10-S40基本相同,此处不再赘述。在一些实施例中,除了包括步骤 S10-S40,该信息预取方法还可以进一步包括:FIG. 12 is a schematic flowchart of another information prefetching method provided by some embodiments of the present disclosure. For example, in some examples, the processor further includes a page table entry cache space and a request cache area, the processor core and the page table entry cache space are set at the same path level, and the processor core is communicatively connected to the page table entry cache space. For example, the page table entry cache space can be set inside the processor core. The request buffer is set at the same path level as the first page table walker. Steps S10-S40 in this embodiment are basically the same as steps S10-S40 shown in FIG. 11, and are not repeated here. In some embodiments, in addition to steps S10-S40, the information prefetching method may further include:

Step S50: in response to the page table entry data required for address translation not existing in the page table entry cache space, generating an address translation request by the processor core.

Step S60: sending a queue of pending address translation requests to the request buffer by the processor core.

For example, in step S50, when a virtual address needs to be translated into a physical address, if there is a miss in the ITLB or DTLB, that is, the page table entry data required for the address translation does not exist in the page table entry cache space, the processor core generates an address translation request.
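A minimal C++ sketch of this miss-handling path is given below. The request fields follow the translation information described elsewhere in this disclosure (sequence number, virtual address to be translated, initial address of the top-level page table, request type), but the concrete types, names, and the map-based TLB stand-in are illustrative assumptions.

```cpp
#include <cstdint>
#include <optional>
#include <unordered_map>

enum class RequestType { Instruction, Data };

// Hypothetical format of the address translation request generated on a miss.
struct AddrTransRequest {
    uint32_t    sequence_number;
    uint64_t    virtual_address;
    uint64_t    top_level_page_table_base;
    RequestType type;
};

// Minimal stand-in for an ITLB/DTLB lookup: returns the cached entry on a hit.
std::optional<uint64_t> tlb_lookup(const std::unordered_map<uint64_t, uint64_t>& tlb,
                                   uint64_t virtual_page) {
    auto it = tlb.find(virtual_page);
    if (it == tlb.end()) return std::nullopt;
    return it->second;
}

// On a miss, the processor core builds the address translation request.
AddrTransRequest on_tlb_miss(uint32_t next_sn, uint64_t va,
                             uint64_t page_table_base, RequestType type) {
    return AddrTransRequest{next_sn, va, page_table_base, type};
}
```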

For example, in step S60, when the processor includes multiple processor cores, the first page table walker cannot process address translation requests sent by multiple processor cores at the same time, so the processor cores send a queue of pending address translation requests to the request buffer. The first page table walker can then fetch the address translation requests from the request buffer in order and perform the corresponding address translation operations.
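The sketch below illustrates one way such a request buffer could behave, reusing the AddrTransRequest type from the sketch after step S50. The FIFO policy and the locking scheme are assumptions; the disclosure only requires that requests from multiple cores are buffered and served in order.

```cpp
#include <deque>
#include <mutex>
#include <optional>

// AddrTransRequest as defined in the earlier sketch after step S50.
class RequestBuffer {
public:
    void enqueue(const AddrTransRequest& req) {          // called by a processor core
        std::lock_guard<std::mutex> guard(mutex_);
        pending_.push_back(req);
    }
    std::optional<AddrTransRequest> dequeue() {          // called by the first page table walker
        std::lock_guard<std::mutex> guard(mutex_);
        if (pending_.empty()) return std::nullopt;
        AddrTransRequest req = pending_.front();
        pending_.pop_front();
        return req;
    }
private:
    std::mutex mutex_;
    std::deque<AddrTransRequest> pending_;
};
```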

For example, in some examples, the information prefetching method may further include: in response to the first page table walker being determined to perform the address translation operation, receiving, by the first page table walker, the address translation request forwarded by the second page table walker.

For example, in the case where the first page table walker is selected to perform the address translation operation, the second page table walker may receive the address translation request from the processor core and then forward the address translation request to the first page table walker, thereby providing diverse transmission paths for address translation requests. After receiving the address translation request, the first page table walker performs the address translation operation; for the specific address translation process, reference may be made to the above embodiments, and details are not repeated here.

FIG. 13 is a schematic flowchart of another information prefetching method provided by some embodiments of the present disclosure. As shown in FIG. 13, in some examples, the information prefetching method may further include steps S70 and S80.

Step S70: determining, by the processor core, the cache space used for storing the target information according to the storage states of the first-level cache space and the at least one preset cache space, and causing the address translation request to carry, in the form of a preset identifier, the level information of the cache space used for storing the target information;

Step S80: parsing out the preset identifier by the first page table walker, and causing the prefetch request to carry the preset identifier.

For example, in step S70, when a virtual address needs to be translated into a physical address and it is determined that the address translation operation is to be performed by the first page table walker, the processor core generates an address translation request and sends it to the first page table walker. At this time, the processor core determines the cache space used for storing the target information according to the storage states of the first-level cache space and the preset cache space, and causes the address translation request to carry a preset identifier, where the preset identifier indicates the level of the cache space used for storing the target information. For example, a corresponding field may be added to the address translation request to represent the preset identifier, so as to specify the prefetch cache space. In this way, the preset identifier can be passed to the first page table walker. It should be noted that the cache space used for storing the target information may be determined according to the number of cache misses per thousand instructions (Misses Per Thousand Instructions, MPKI); of course, it may also be determined according to factors such as how idle each cache space is, the validity of the cached data, and the hit rate, which is not limited by the embodiments of the present disclosure.

For example, in some examples, if the MPKI value of the first-level cache space is relatively large, it is determined that the first-level cache space is used as the prefetch cache space, that is, the target information stored at the physical address is prefetched and cached into the first-level cache space. In this case, the processor core sets the preset identifier to 1 and sends it to the page table walker along with the address translation request. For example, in this case, the prefetch cache space can be further refined in combination with the method for choosing between the L1I cache and the L1D cache shown in FIG. 10.

For example, in other examples, if the MPKI value of the second-level cache space is relatively large, it is determined that the second-level cache space is used as the prefetch cache space, that is, the target information stored at the physical address is prefetched and cached into the second-level cache space. In this case, the processor core sets the preset identifier to 2 and sends it to the first page table walker along with the address translation request.
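A minimal sketch of how the processor core might pick the prefetch cache level from per-level MPKI statistics and encode it as the preset identifier is shown below. The numeric encoding (1 for the first level, 2 for the second level) follows the examples above, while the "pick the level with the largest MPKI" comparison is only one assumed heuristic.

```cpp
#include <cstdint>
#include <vector>

// mpki_per_level[0] is the MPKI of the first-level cache space,
// mpki_per_level[1] that of the second-level cache space, and so on.
// Returns the preset identifier: 1 means prefetch into the first level,
// 2 into the second level, etc.
uint8_t choose_prefetch_level(const std::vector<double>& mpki_per_level) {
    uint8_t best_level = 1;
    double best_mpki = mpki_per_level.empty() ? 0.0 : mpki_per_level[0];
    for (size_t i = 1; i < mpki_per_level.size(); ++i) {
        if (mpki_per_level[i] > best_mpki) {
            best_mpki = mpki_per_level[i];
            best_level = static_cast<uint8_t>(i + 1);
        }
    }
    return best_level;  // carried in the address translation request as the preset identifier
}
```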

For example, in step S80, when the first page table walker needs to send a prefetch request to the target preset cache space, the first page table walker sends the preset identifier obtained by parsing the address translation request to the target preset cache space along with the prefetch request, so that the target preset cache space can determine the prefetch cache space according to the preset identifier. For example, a corresponding field may be added to the prefetch request to represent the preset identifier, so as to specify the prefetch cache space.
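The sketch below illustrates how the preset identifier could travel from the address translation request into the prefetch request; only the presence of a preset identifier field is taken from the disclosure, and the struct layouts and names are illustrative assumptions.

```cpp
#include <cstdint>

// Illustrative message formats.
struct AddrTransRequestWithHint {
    uint32_t sequence_number;
    uint64_t virtual_address;
    uint8_t  preset_identifier;   // cache level chosen by the processor core
};

struct PrefetchRequest {
    uint64_t physical_address;
    uint8_t  preset_identifier;   // copied through so the target preset cache
                                  // space knows where to place the target information
};

// The first page table walker parses the identifier out of the translation
// request and attaches it to the prefetch request it issues.
PrefetchRequest build_prefetch_request(const AddrTransRequestWithHint& req,
                                       uint64_t translated_physical_address) {
    return PrefetchRequest{translated_physical_address, req.preset_identifier};
}
```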

In the above manner, the processor core can determine the cache space used for this prefetch and pass the preset identifier to the target preset cache space through the first page table walker, so that the target preset cache space learns which cache space is used for this prefetch. As a result, the prefetch cache space can be selected dynamically: it is not fixed to a certain cache level and can be set flexibly for each prefetch, thereby improving the overall processing efficiency.

FIG. 14 is a schematic flowchart of another information prefetching method provided by some embodiments of the present disclosure. As shown in FIG. 14, in some examples, the information prefetching method may further include steps S90 and S100.

Step S90: in response to the second page table walker being selected from the first page table walker and the second page table walker based on the preset rule to perform the address translation operation, performing, by the second page table walker, the address translation operation in response to the address translation request to obtain the physical address.

For example, the processor core generates an address translation request in response to the page table entry data required for the address translation not existing in the page table entry cache space, and sends the address translation request to the second page table walker when it is determined that the address translation operation is to be performed by the second page table walker. For example, the architecture of the TLB is not limited to the ITLB/DTLB arrangement; any applicable architecture may be adopted, which is not limited by the embodiments of the present disclosure. In the case where it is determined that the address translation operation is to be performed by the second page table walker, the second page table walker receives the address translation request generated by the processor core, obtains the page table entry data from the memory according to the address translation request, and performs the address translation using the page table entry data to obtain the physical address.
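For illustration, the sketch below models the page table walk performed by the selected walker: starting from the top-level page table base carried by the request, it reads one page table entry per level until it reaches the entry that maps the page. The four-level layout, 9-bit indices, 4 KiB pages, and the "present" bit convention are assumptions, and memory is modelled as a flat read callback.

```cpp
#include <cstdint>
#include <functional>

// read_memory models a load issued by the page table walker (in this
// disclosure, such loads travel through the cache hierarchy toward memory).
using ReadMemory = std::function<uint64_t(uint64_t physical_address)>;

// A sketch of a 4-level walk with 9 bits of index per level and 4 KiB pages.
bool page_table_walk(uint64_t virtual_address, uint64_t top_level_base,
                     const ReadMemory& read_memory, uint64_t* physical_address) {
    uint64_t table_base = top_level_base;
    for (int level = 3; level >= 0; --level) {
        uint64_t index = (virtual_address >> (12 + 9 * level)) & 0x1FF;
        uint64_t entry = read_memory(table_base + index * sizeof(uint64_t));
        if ((entry & 0x1) == 0) return false;   // entry not present: walk fails
        table_base = entry & ~0xFFFULL;         // next-level table, or final page frame
    }
    *physical_address = table_base | (virtual_address & 0xFFF);
    return true;
}
```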

For example, in some examples, the method may further include:

Step S100: in response to the first page table walker being determined to perform the address translation operation, forwarding the address translation request to the first page table walker by the second page table walker.

For example, the processor core generates an address translation request in response to the page table entry data required for the address translation not existing in the page table entry cache space, and, in the case where it is determined that the address translation operation is to be performed by the first page table walker, the address translation request may be forwarded to the first page table walker through the second page table walker. For example, when a virtual address needs to be translated into a physical address, if the page table entry data required for the address translation misses in the ITLB or DTLB, an address translation operation is required. At this time, the processor determines, based on the preset rule, one of the first page table walker and the second page table walker to perform the address translation operation. When it is determined that the address translation operation is to be performed by the first page table walker, the processor core triggers the second page table walker to forward the address translation request to the first page table walker.
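A sketch of the dispatch decision implied by the preset rule described in the claims is shown below: the first page table walker (co-located with the target preset cache space) is chosen when the required page table entry data is absent from the page table entry cache space or its page table level exceeds a threshold, and otherwise the second page table walker (co-located with the first-level cache space) handles the request. The boolean inputs and the threshold value are illustrative assumptions.

```cpp
// Which walker the preset rule selects for a given translation.
enum class Walker { First, Second };

Walker select_walker(bool pte_data_cached, int cached_page_table_level,
                     int level_threshold) {
    // Fall back to the first page table walker when the cached page table
    // information is missing or too high-level to be useful.
    if (!pte_data_cached || cached_page_table_level > level_threshold) {
        return Walker::First;
    }
    return Walker::Second;
}
```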

It should be noted that, in the embodiments of the present disclosure, the information prefetching method is not limited to the steps described above and may include more or fewer steps, and the execution order of the steps is not limited, which may be determined according to actual needs. For a detailed description of the method, reference may be made to the above description of the processor, which is not repeated here.

At least one embodiment of the present disclosure further provides an electronic device, and the electronic device includes the processor provided by any embodiment of the present disclosure. The electronic device can implement the data/instruction prefetch function while reducing the address translation latency, effectively reducing the latency of data/instruction read and write operations and improving the overall performance of the system.

FIG. 15 is a schematic block diagram of an electronic device provided by some embodiments of the present disclosure. As shown in FIG. 15, the electronic device 100 includes a processor 110, and the processor 110 is a processor provided by any embodiment of the present disclosure, for example, the processor shown in FIG. 5. The electronic device 100 can be used in emerging application scenarios such as big data, cloud computing, and artificial intelligence (AI); correspondingly, the electronic device 100 may be a big data computing device, a cloud computing device, an artificial intelligence device, and the like, which is not limited by the embodiments of the present disclosure.

FIG. 16 is a schematic block diagram of another electronic device provided by some embodiments of the present disclosure. As shown in FIG. 16, the electronic device 200 is, for example, suitable for implementing the information prefetching method provided by the embodiments of the present disclosure. The electronic device 200 may be a terminal device, a server, or the like. It should be noted that the electronic device 200 shown in FIG. 16 is only an example, and it does not impose any limitation on the function and scope of use of the embodiments of the present disclosure.

As shown in FIG. 16, the electronic device 200 may include a processing device (for example, a central processing unit, a graphics processor, etc.) 21, which can perform various appropriate actions and processes according to a program stored in a read-only memory (ROM) 22 or a program loaded from a storage device 28 into a random access memory (RAM) 23. For example, the processing device 21 may be a processor provided by any embodiment of the present disclosure, for example, the processor shown in FIG. 5. The RAM 23 also stores various programs and data required for the operation of the electronic device 200. The processing device 21, the ROM 22, and the RAM 23 are connected to each other through a bus 24. An input/output (I/O) interface 25 is also connected to the bus 24.

Generally, the following devices may be connected to the I/O interface 25: an input device 26 including, for example, a touch screen, a touch pad, a keyboard, a mouse, a camera, a microphone, an accelerometer, a gyroscope, and the like; an output device 27 including, for example, a liquid crystal display (LCD), a speaker, a vibrator, and the like; a storage device 28 including, for example, a magnetic tape, a hard disk, and the like; and a communication device 29. The communication device 29 may allow the electronic device 200 to perform wireless or wired communication with other electronic devices to exchange data. Although FIG. 16 shows the electronic device 200 having various devices, it should be understood that it is not required to implement or include all of the illustrated devices, and the electronic device 200 may alternatively implement or include more or fewer devices.

For a detailed description and the technical effects of the electronic devices 100/200, reference may be made to the above descriptions of the processor and the information prefetching method, which are not repeated here.

The following points need to be noted:

(1) The drawings of the embodiments of the present disclosure only relate to the structures involved in the embodiments of the present disclosure, and other structures may refer to common designs.

(2) In the case of no conflict, the embodiments of the present disclosure and the features in the embodiments may be combined with each other to obtain new embodiments.

The above are only specific embodiments of the present disclosure, but the protection scope of the present disclosure is not limited thereto; the protection scope of the present disclosure shall be subject to the protection scope of the claims.

Claims (25)

1. An information prefetching method for a processor, wherein the processor comprises a first-level cache space, a first page table walker, a second page table walker, and at least one preset cache space, the first-level cache space and the at least one preset cache space are sequentially communicatively connected to form a communication link, the at least one preset cache space comprises a target preset cache space, the first page table walker and the target preset cache space are arranged at a same path level, the first page table walker is communicatively connected to the target preset cache space, the second page table walker and the first-level cache space are arranged at a same path level, the second page table walker is communicatively connected to the first-level cache space, and the same path level means being physically adjacent or close and being capable of directly performing data interaction and transfer, the method comprising:
in response to selecting, based on a preset rule, the first page table walker from the first page table walker and the second page table walker to perform an address translation operation to obtain a physical address, sending, by the first page table walker, a prefetch request to the target preset cache space, wherein the prefetch request comprises the physical address; and in response to the prefetch request, performing, by the target preset cache space, an information prefetch operation based on the physical address;
or, in response to selecting, based on the preset rule, the second page table walker from the first page table walker and the second page table walker to perform the address translation operation, performing, by the second page table walker, the address translation operation in response to an address translation request to obtain the physical address.
2. The method according to claim 1, wherein the processor further comprises a processor core, and performing, by the target preset cache space, the information prefetch operation based on the physical address comprises:
determining a prefetch cache space, wherein the prefetch cache space is at least one of the first-level cache space and the at least one preset cache space;
obtaining, by the target preset cache space, target information stored at the physical address based on the physical address; and
sending, by the target preset cache space, the target information to the prefetch cache space.
3. The method according to claim 2, wherein determining the prefetch cache space comprises:
obtaining a preset identifier, wherein the preset identifier represents level information of a cache space, and the preset identifier is stored in a designated storage space or carried in the prefetch request; and
determining the prefetch cache space according to the preset identifier.
4. The method according to claim 3, wherein the first-level cache space comprises a first-level instruction space and a first-level data space, the level information indicates the first level, and determining the prefetch cache space according to the preset identifier comprises:
in response to the target information being of an instruction type, determining that the prefetch cache space is the first-level instruction space; and
in response to the target information being of a data type, determining that the prefetch cache space is the first-level data space.
5. The method according to claim 2, wherein obtaining, by the target preset cache space, the target information stored at the physical address based on the physical address comprises:
obtaining the target information based on the physical address by querying level by level along the path from the target preset cache space to a memory.
6. The method according to claim 2, wherein sending, by the target preset cache space, the target information to the prefetch cache space comprises:
sending the target information to the prefetch cache space by passing it level by level along the path from the target preset cache space to the prefetch cache space.
7. The method according to claim 2, further comprising:
sending, by the target preset cache space, the physical address to the processor core.
8. The method according to claim 7, wherein sending, by the target preset cache space, the physical address to the processor core comprises:
sending, by the target preset cache space, the physical address to the processor core in a level-by-level manner.
9. The method according to claim 3, wherein the processor further comprises a page table entry cache space, the processor core and the page table entry cache space are arranged at a same path level, and the processor core is communicatively connected to the page table entry cache space, and the method further comprises:
in response to page table entry data required for address translation not existing in the page table entry cache space, generating the address translation request by the processor core.
10. The method according to claim 9, further comprising:
in response to selecting, based on the preset rule, the first page table walker from the first page table walker and the second page table walker to perform the address translation operation, performing, by the first page table walker, the address translation operation in response to the address translation request to obtain the physical address.
11. The method according to claim 10, wherein performing, by the first page table walker, the address translation operation in response to the address translation request to obtain the physical address comprises:
receiving, by the first page table walker, the address translation request generated by the processor core, obtaining page table entry data from a memory via the target preset cache space, and performing the address translation operation using the page table entry data to obtain the physical address.
12. The method according to claim 11, wherein obtaining, by the first page table walker, the page table entry data from the memory via the target preset cache space and performing the address translation operation using the page table entry data comprises:
obtaining, by the first page table walker according to the address translation request, the page table entry data by querying level by level along the path from the target preset cache space to the memory, and performing translation using the page table entry data to obtain the physical address.
13. The method according to claim 12, further comprising:
in response to the first page table walker being determined to perform the address translation operation, receiving, by the first page table walker, the address translation request forwarded by the second page table walker.
14. The method according to claim 12, wherein the address translation request comprises translation information, and the translation information comprises an address translation request sequence number, a virtual address value to be translated, and an initial address of a top-level page table.
15. The method according to claim 14, wherein the translation information further comprises a request type identifier, and the request type identifier indicates whether the target information stored at the physical address is of an instruction type or a data type.
16. The method according to claim 9, further comprising:
determining, by the processor core, the cache space used for storing the target information according to storage states of the first-level cache space and the at least one preset cache space, and causing the address translation request to carry, in the form of the preset identifier, level information of the cache space used for storing the target information; and
parsing out the preset identifier by the first page table walker, and causing the prefetch request to carry the preset identifier.
17. The method according to claim 1, wherein performing, by the second page table walker, the address translation operation in response to the address translation request to obtain the physical address comprises:
receiving, by the second page table walker, the address translation request generated by the processor core, obtaining page table entry data from a memory according to the address translation request, and performing address translation using the page table entry data to obtain the physical address.
18. The method according to claim 10, wherein the preset rule comprises: when the page table entry data required for address translation does not exist in the page table entry cache space, or a page table level corresponding, in the page table entry cache space, to the page table entry data required for address translation is greater than a threshold, determining that the address translation operation is performed by the first page table walker.
19. The method according to claim 9, wherein the processor further comprises a request buffer, the request buffer and the first page table walker are arranged at a same path level, and the request buffer is communicatively connected to the first page table walker and communicatively connected to the target preset cache space, and the method further comprises:
sending, by the processor core, a queue of pending address translation requests to the request buffer.
20. The method according to claim 2, wherein the at least one preset cache space comprises a second-level cache space to an N-th level cache space, N being an integer greater than 2, and
the N-th level cache space is closest to a memory and farthest from the processor core, and any one of the second-level cache space to the N-th level cache space serves as the target preset cache space.
21. The method according to claim 20, wherein the N-th level cache space is a shared-type cache space, and the N-th level cache space serves as the target preset cache space.
22. The method according to claim 20, wherein the second-level cache space is a private-type or shared-type cache space, and the second-level cache space serves as the target preset cache space.
23. A processor, comprising a first-level cache space, a first page table walker, a second page table walker, and at least one preset cache space,
wherein the first-level cache space and the at least one preset cache space are sequentially communicatively connected to form a communication link, the at least one preset cache space comprises a target preset cache space, the first page table walker and the target preset cache space are arranged at a same path level, the first page table walker is communicatively connected to the target preset cache space, the second page table walker and the first-level cache space are arranged at a same path level, the second page table walker is communicatively connected to the first-level cache space, and the same path level means being physically adjacent or close and being capable of directly performing data interaction and transfer,
the first page table walker is configured to: in response to selecting, based on a preset rule, the first page table walker from the first page table walker and the second page table walker to perform an address translation operation to obtain a physical address, send a prefetch request to the target preset cache space, wherein the prefetch request comprises the physical address;
the target preset cache space is configured to: in response to the prefetch request, perform an information prefetch operation based on the physical address; and
the second page table walker is configured to: in response to selecting, based on the preset rule, the second page table walker from the first page table walker and the second page table walker to perform the address translation operation, perform the address translation operation in response to an address translation request to obtain the physical address.
24. The processor according to claim 23, wherein the target preset cache space is further configured to determine a prefetch cache space, obtain, based on the physical address, target information stored at the physical address, and send the target information to the prefetch cache space, and
the prefetch cache space is at least one of the first-level cache space and the at least one preset cache space.
25. An electronic device, comprising the processor according to claim 23 or 24.
CN202111529899.2A 2021-12-14 2021-12-14 Information prefetching method, processor and electronic equipment Active CN114238167B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111529899.2A CN114238167B (en) 2021-12-14 2021-12-14 Information prefetching method, processor and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111529899.2A CN114238167B (en) 2021-12-14 2021-12-14 Information prefetching method, processor and electronic equipment

Publications (2)

Publication Number Publication Date
CN114238167A CN114238167A (en) 2022-03-25
CN114238167B true CN114238167B (en) 2022-09-09

Family

ID=80756076

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111529899.2A Active CN114238167B (en) 2021-12-14 2021-12-14 Information prefetching method, processor and electronic equipment

Country Status (1)

Country Link
CN (1) CN114238167B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115098169B (en) * 2022-06-24 2024-03-05 海光信息技术股份有限公司 Method and device for retrieving instructions based on capacity sharing
CN115080464B (en) * 2022-06-24 2023-07-07 海光信息技术股份有限公司 Data processing method and data processing device
CN118312098B (en) * 2024-04-09 2024-12-13 中科驭数(北京)科技有限公司 RDMA-based physical memory management method, device, equipment and medium
CN119576809B (en) * 2025-02-08 2025-06-20 北京微核芯科技有限公司 Data pre-fetching method, device, computer equipment and storage medium


Family Cites Families (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6775747B2 (en) * 2002-01-03 2004-08-10 Intel Corporation System and method for performing page table walks on speculative software prefetch operations
US8244978B2 (en) * 2010-02-17 2012-08-14 Advanced Micro Devices, Inc. IOMMU architected TLB support
US20130326143A1 (en) * 2012-06-01 2013-12-05 Broadcom Corporation Caching Frequently Used Addresses of a Page Table Walk
US9563562B2 (en) * 2012-11-27 2017-02-07 Nvidia Corporation Page crossing prefetches
KR102069273B1 (en) * 2013-03-11 2020-01-22 삼성전자주식회사 System on chip and operating method thereof
GB2528842B (en) * 2014-07-29 2021-06-02 Advanced Risc Mach Ltd A data processing apparatus, and a method of handling address translation within a data processing apparatus
US9405702B2 (en) * 2014-11-14 2016-08-02 Cavium, Inc. Caching TLB translations using a unified page table walker cache
US9928176B2 (en) * 2016-07-20 2018-03-27 Advanced Micro Devices, Inc. Selecting cache transfer policy for prefetched data based on cache test regions
US10684957B2 (en) * 2018-08-23 2020-06-16 Advanced Micro Devices, Inc. Apparatus and method for neighborhood-aware virtual to physical address translations
CN111930643B (en) * 2020-09-28 2021-01-12 深圳芯邦科技股份有限公司 Data processing method and related equipment

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5680565A (en) * 1993-12-30 1997-10-21 Intel Corporation Method and apparatus for performing page table walks in a microprocessor capable of processing speculative instructions
US6088780A (en) * 1997-03-31 2000-07-11 Institute For The Development Of Emerging Architecture, L.L.C. Page table walker that uses at least one of a default page size and a page size selected for a virtual address space to position a sliding field in a virtual address
CN107608912A (en) * 2013-08-20 2018-01-19 华为技术有限公司 Internal memory physics address inquiring method and device
CN107766469A (en) * 2017-09-29 2018-03-06 北京金山安全管理系统技术有限公司 A kind of method for caching and processing and device
CN111143242A (en) * 2018-11-02 2020-05-12 华为技术有限公司 Cache prefetching method and device
CN111552654A (en) * 2019-02-08 2020-08-18 三星电子株式会社 A processor that detects redundancy in page table walks
US11176055B1 (en) * 2019-08-06 2021-11-16 Marvell Asia Pte, Ltd. Managing potential faults for speculative page table access
CN113190499A (en) * 2021-05-26 2021-07-30 北京算能科技有限公司 High-capacity on-chip cache oriented cooperative prefetcher and control method thereof

Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
Exploiting Page Table Locality for Agile TLB Prefetching; Georgios Vavouliotis et al.; 2021 ACM/IEEE 48th Annual International Symposium on Computer Architecture (ISCA); 2021-08-04; pp. 85-97 *
PageSeer: Using Page Walks to Trigger Page Swaps in Hybrid Memory Systems; Apostolos Kokolis et al.; 2019 IEEE International Symposium on High Performance Computer Architecture (HPCA); 2019-03-28; pp. 596-607 *
Design of the memory management unit in a SPARC V8 processor; Xiao Jianqing et al.; Science Technology and Engineering; 2010-11-08 (No. 31); full text *
Prefetching and caching techniques for intelligent Web agents; Zhao Zheng et al.; Journal of Tianjin University (Science and Technology); 2001-05-25 (No. 05); full text *
An efficient compressed Page_Walk_Cache structure; Jia Chaoyang et al.; Computer Engineering and Science; 2020-09-30; Vol. 42, No. 9; pp. 1521-1528 *

Also Published As

Publication number Publication date
CN114238167A (en) 2022-03-25

Similar Documents

Publication Publication Date Title
CN114238167B (en) Information prefetching method, processor and electronic equipment
US9665486B2 (en) Hierarchical cache structure and handling thereof
CN112416817B (en) Prefetching method, information processing apparatus, device, and storage medium
US9563568B2 (en) Hierarchical cache structure and handling thereof
US8185692B2 (en) Unified cache structure that facilitates accessing translation table entries
CN112527395B (en) Data prefetching method and data processing apparatus
KR20120060230A (en) Systems and methods for processing memory requests
US20160179720A1 (en) Device table in system memory
WO2023108938A1 (en) Method and apparatus for solving address ambiguity problem of cache
CN114218132B (en) Information prefetching method, processor and electronic equipment
EP4409418A1 (en) Re-reference interval prediction (rrip) with pseudo-lru supplemental age information
CN114925001A (en) Processor, page table prefetching method and electronic equipment
CN112416436B (en) Information processing method, information processing device and electronic equipment
CN114238176B (en) Processor, address translation method for processor and electronic equipment
CN119440880B (en) Processor cache structure, processor and data cache method
US20150261679A1 (en) Host bridge with cache hints
US11841800B2 (en) Apparatus and method for handling stash requests
CN114281720B (en) Processor, address translation method for processor and electronic equipment
CN115080464B (en) Data processing method and data processing device
WO2024139385A1 (en) Data processing method and apparatus, and chip and computer-readable storage medium
CN115098410A (en) Processor, data processing method for processor, and electronic device
CN114281715A (en) Cache synthesis prefetching method and device, processor and electronic equipment
US12265470B1 (en) Bypassing cache directory lookups for processing-in-memory instructions
JP6565729B2 (en) Arithmetic processing device, control device, information processing device, and control method for information processing device
JP2024533656A (en) Use of request classes and reuse records in one cache for the insertion policy of another cache - Patents.com

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant