CN118796278A

CN118796278A - Processor instruction fetching method, device, equipment and storage medium

Info

Publication number: CN118796278A
Application number: CN202411289756.2A
Authority: CN
Inventors: 李祖松; 郇丹丹; 商家玮
Original assignee: Beijing Micro Core Technology Co ltd
Current assignee: Beijing Micro Core Technology Co ltd
Priority date: 2024-09-14
Filing date: 2024-09-14
Publication date: 2024-10-18
Anticipated expiration: 2044-09-14
Also published as: CN118796278B

Abstract

The present invention provides a processor instruction fetching method, apparatus, device and storage medium, and relates to the field of computer technology. The method is applied to a processor in an electronic device that comprises the processor and a memory, the memory comprising a cache. The method comprises: receiving an instruction fetch request, the instruction block of which includes ordinary instructions and compressed instructions; determining a valid result of the instruction fetch request according to the request, where the valid result indicates whether the request needs to fetch instructions across lines from the cache, and the cache comprises an odd bank and an even bank; when the valid result is valid, fetching a target instruction from the two consecutive cache lines in which the instruction block is located, the instruction block being determined from the fetch address in the request; and when the valid result is invalid, fetching the target instruction from the single cache line in which the instruction block is located.

Description

Processor instruction fetching method, device, equipment and storage medium

Technical Field

The present invention relates to the field of computer technology, and in particular to a processor instruction fetching method, apparatus, device and storage medium.

Background Art

In the instruction set architecture (ISA) design of modern processors, compressed instructions (C instructions for short) were introduced to reduce program code size and improve code density. Compressed instructions can reduce the instruction cache miss rate, thereby improving processor performance and reducing power consumption, chip area and cost.

For example, the compressed instruction subset (RVC, including RV64C and RV32C) of the RISC-V instruction set (Reduced Instruction Set Computer Five, the fifth-generation reduced instruction set) can shrink the program code segment by 25% to 30% and reduce instruction cache misses by 20% to 25%, a performance gain comparable to doubling the instruction cache capacity. The ARM architecture (Advanced RISC Machines) and the MIPS architecture (Microprocessor without Interlocked Pipeline Stages) likewise reduce code size by providing 16-bit compressed instruction sets (ARM's Thumb and Thumb2, MIPS's MIPS16 and microMIPS). Relative to their standard 32-bit instruction sets, the compressed ISAs reduce code size by about 25% to 30%.

However, compressed instructions bring a new problem: because ordinary instructions and compressed instructions differ in length, a single instruction may span two cache lines, which makes the instruction fetch unit harder to design.

Therefore, how to fetch instructions across cache lines has become a technical problem that the industry urgently needs to solve.

Summary of the Invention

The present invention provides a processor instruction fetching method, apparatus, device and storage medium that effectively implement instruction fetching across cache lines and improve processor performance.

The present invention provides a processor instruction fetching method, applied to a processor in an electronic device, the electronic device comprising the processor and a memory, and the memory comprising a cache; the method comprises the following steps:

receiving an instruction fetch request, wherein the instruction block of the instruction fetch request includes ordinary instructions and compressed instructions;

determining a valid result of the instruction fetch request according to the instruction fetch request, wherein the valid result indicates whether the instruction fetch request needs to fetch instructions across lines from the cache, and the cache includes an odd bank and an even bank so that the addresses of any two consecutive cache lines fall in two different banks, one odd and one even;

when the valid result is valid, fetching a target instruction, according to the instruction fetch request, from the two consecutive cache lines in which the instruction block is located, the instruction block being determined from the fetch address in the instruction fetch request;

when the valid result is invalid, fetching the target instruction, according to the instruction fetch request, from the cache line in which the instruction block is located.

According to a processor instruction fetching method provided by the present invention, determining the valid result of the instruction fetch request according to the instruction fetch request includes:

determining an identification bit of the instruction block according to the fetch address, wherein the identification bit marks whether the instruction block contains a cross-line instruction, a cross-line instruction being an instruction whose bytes are split across two different cache lines;

determining the valid result of the instruction fetch request according to the identification bit of the instruction block.

According to a processor instruction fetching method provided by the present invention, the identification bit takes the value 1 or 0, and determining the valid result of the instruction fetch request according to the identification bit of the instruction block includes:

when the identification bit is 1, determining the valid result to be valid;

when the identification bit is 0, determining the valid result to be invalid.

According to a processor instruction fetching method provided by the present invention, the instruction fetch request includes the fetch address, and the fetch address is the access address of the target cache line corresponding to the instruction fetch request. The fetch address includes a tag, an index and a block offset. The tag is compared against the tags of the cache lines in all ways of the cache (multi-way tag comparison). The index indicates the line number of the cache line corresponding to the instruction fetch request and consists of a high-order index and a low-order index: the high-order index indicates the row address of the cache line, and the low-order index indicates the bank address. The block offset indicates the position of the requested instruction block within the cache line;

fetching the target instruction, according to the instruction fetch request, from the two consecutive cache lines in which the instruction block is located includes:

determining the way corresponding to the instruction fetch request according to the tag in the fetch address;

determining the cache line and the bank corresponding to the instruction fetch request according to the index in the fetch address and the corresponding way, wherein each cache line has a corresponding tag bank and a corresponding data bank, the tag bank storing the tag information of each cache line and the data bank storing the instructions in each cache line;

fetching the target instruction according to the cache line corresponding to the instruction fetch request, the corresponding bank, and the block offset.
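The address split described above can be illustrated with a short sketch. The field widths below (64-byte lines, 64 sets, two banks) and the names `FetchAddress` and `split_fetch_address` are assumptions chosen for illustration; the source does not fix these values.

```python
from dataclasses import dataclass

# Assumed geometry (not specified by the source): 64-byte lines, 64 sets, 2 banks.
OFFSET_BITS = 6   # block offset within a 64-byte cache line
INDEX_BITS = 6    # full index (high-order index + low-order index)
BANK_BITS = 1     # low-order index: selects the even (0) or odd (1) bank


@dataclass(frozen=True)
class FetchAddress:
    tag: int           # compared against the tags of all ways
    high_index: int    # row address inside the selected bank
    low_index: int     # bank address
    block_offset: int  # byte position of the instruction block in the line


def split_fetch_address(addr: int) -> FetchAddress:
    """Split a fetch address into tag, high/low index and block offset."""
    block_offset = addr & ((1 << OFFSET_BITS) - 1)
    index = (addr >> OFFSET_BITS) & ((1 << INDEX_BITS) - 1)
    tag = addr >> (OFFSET_BITS + INDEX_BITS)
    return FetchAddress(
        tag=tag,
        high_index=index >> BANK_BITS,
        low_index=index & ((1 << BANK_BITS) - 1),
        block_offset=block_offset,
    )
```

Under these assumed widths, adding one cache line (64 bytes) to an address always flips `low_index`, which is the property the odd/even banking relies on.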

According to a processor instruction fetching method provided by the present invention, before receiving the instruction fetch request, the method further includes:

receiving an instruction prefetch request, wherein the instruction prefetch request contains the prefetch address of the prefetch request;

looking up the prefetch address in the cache to obtain a matching result;

when the matching result is that no match is found, performing an instruction prefetch operation.

According to a processor instruction fetching method provided by the present invention, the memory further includes a next-level storage system, and performing the instruction prefetch operation includes:

sending, by an instruction prefetcher, an allocation request to the cross-cache-line prefetch entries, wherein the allocation request asks the cross-cache-line prefetch entries to allocate one prefetch entry for the prefetch request;

sending the prefetch request to the next-level storage system according to the prefetch entry, wherein the next-level storage system uses the prefetch request and the history flag bits of the cache line to predict an instruction access request, the instruction access request being either a single-cache-line access request or a cross-cache-line access request.

According to a processor instruction fetching method provided by the present invention, sending the prefetch request to the next-level storage system according to the prefetch entry includes:

predicting a prefetch flag bit of the prefetch request according to the prefetch entry, wherein the prefetch flag bit indicates whether the prefetch request needs to prefetch instructions across cache lines.

According to a processor instruction fetching method provided by the present invention, predicting the prefetch flag bit of the prefetch request according to the prefetch entry includes:

initially setting the prefetch flag bits of all cache lines corresponding to the prefetch request to 1;

when, for any cache line corresponding to the prefetch request, a first target count, namely the number of times no cross-line instruction prefetch has occurred for that cache line, reaches a first count threshold, setting the prefetch flag bit of that cache line to 0, wherein a prefetch flag bit of 0 indicates that cross-line instruction prefetching is no longer performed for that cache line;

when, for any cache line corresponding to the prefetch request, a second target count, namely the number of times a cross-line instruction prefetch has occurred for that cache line, reaches a second count threshold, setting the prefetch flag bit of that cache line to 1, wherein a prefetch flag bit of 1 indicates that cross-line instruction prefetching is resumed for that cache line.
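The threshold-based flag prediction above can be sketched as a small per-line state machine. The class name `CrossLinePrefetchFlag`, the default thresholds, and the choice to reset one counter whenever the opposite event is observed are all assumptions made for illustration; the source only states that the flag starts at 1, drops to 0 when the first count reaches the first threshold, and returns to 1 when the second count reaches the second threshold.

```python
class CrossLinePrefetchFlag:
    """Per-cache-line prefetch flag with two counters (illustrative sketch)."""

    def __init__(self, first_threshold: int = 4, second_threshold: int = 2):
        self.flag = 1  # initial value per the source: cross-line prefetch enabled
        self.first_threshold = first_threshold    # assumed value
        self.second_threshold = second_threshold  # assumed value
        self.no_cross_count = 0  # "first target count"
        self.cross_count = 0     # "second target count"

    def observe(self, crossed: bool) -> int:
        """Record whether the line needed a cross-line access; return the flag."""
        if crossed:
            self.cross_count += 1
            self.no_cross_count = 0  # assumption: counts are consecutive
            if self.flag == 0 and self.cross_count >= self.second_threshold:
                self.flag = 1        # resume cross-line prefetching
                self.cross_count = 0
        else:
            self.no_cross_count += 1
            self.cross_count = 0     # assumption: counts are consecutive
            if self.flag == 1 and self.no_cross_count >= self.first_threshold:
                self.flag = 0        # stop cross-line prefetching
                self.no_cross_count = 0
        return self.flag
```

With thresholds of 3 and 2, three observations without a crossing clear the flag, and two subsequent crossings set it again. Whether a real design counts consecutive or cumulative events is not stated in the source.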

According to a processor instruction fetching method provided by the present invention, looking up the prefetch address in the cache to obtain the matching result includes:

recording, by the prefetcher, the addresses of prefetch requests that have already been sent;

matching the prefetch address against the addresses of the prefetch requests already sent and, if no match is found, looking up the prefetch address in the cache to obtain the matching result.
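The two-stage filtering above (first against prefetches already sent, then against the cache contents) can be sketched as follows. The function name `should_prefetch`, the use of Python sets to model the prefetcher's record and the cache, and the decision to silently drop a request that matches an already-sent address are assumptions; the source does not say what happens on such a match.

```python
def should_prefetch(prefetch_addr: int,
                    sent_lines: set,
                    cached_lines: set,
                    line_bytes: int = 64) -> bool:
    """Return True if a prefetch should be issued for prefetch_addr.

    sent_lines models the prefetcher's record of already-sent prefetches,
    cached_lines models the lines currently present in the cache; both hold
    cache line numbers (address // line_bytes).
    """
    line = prefetch_addr // line_bytes
    if line in sent_lines:
        return False          # assumption: duplicates of in-flight prefetches are dropped
    if line in cached_lines:
        return False          # cache lookup matched: nothing to prefetch
    sent_lines.add(line)      # record the address of the prefetch being sent
    return True
```

Checking the record of sent requests first avoids a cache lookup for addresses that are already being prefetched.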

In a second aspect, the present invention further provides a processor instruction fetching apparatus, applied to a processor in an electronic device, the electronic device comprising the processor and a memory, and the memory comprising a cache; the apparatus includes the following modules:

a receiving module, configured to receive an instruction fetch request, wherein the instruction block of the instruction fetch request includes ordinary instructions and compressed instructions;

an instruction fetch module, configured to determine a valid result of the instruction fetch request according to the instruction fetch request, wherein the valid result indicates whether the instruction fetch request needs to fetch instructions across lines from the cache, and the cache includes an odd bank and an even bank so that the addresses of any two consecutive cache lines fall in two different banks, one odd and one even;

In a third aspect, the present invention further provides an electronic device, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the computer program, implements any of the processor instruction fetching methods described above.

In a fourth aspect, the present invention further provides a non-transitory computer-readable storage medium storing a computer program which, when executed by a processor, implements any of the processor instruction fetching methods described above.

In a fifth aspect, the present invention further provides a computer program product comprising a computer program which, when executed by a processor, implements any of the processor instruction fetching methods described above.

In the processor instruction fetching method, apparatus, device and storage medium provided by the present invention, the method is applied to a processor in an electronic device, the electronic device includes the processor and a memory, and the memory includes a cache. The method includes: first, receiving an instruction fetch request whose instruction block includes ordinary instructions and compressed instructions; then, determining a valid result of the instruction fetch request according to the request, where the valid result indicates whether the request needs to fetch instructions across lines from the cache, and the cache includes an odd bank and an even bank so that the addresses of any two consecutive cache lines fall in two different banks; next, when the valid result is valid, fetching a target instruction from the two consecutive cache lines in which the instruction block is located, the instruction block being determined from the fetch address in the request; conversely, when the valid result is invalid, fetching the target instruction from the cache line in which the instruction block is located.

In the present invention, the cache includes an odd bank and an even bank, so that the addresses of any two consecutive cache lines fall in two different banks, one odd and one even. The valid result of the instruction fetch request, that is, whether instructions need to be fetched across lines from the cache, is determined according to the request. When a cross-line fetch is needed, the target instruction is fetched from the two consecutive cache lines in which the instruction block is located; when no cross-line fetch is needed, a single-line read is performed. Both halves of a cross-line instruction can thus be fetched from the two cache lines at the same time, which improves processor performance.

Brief Description of the Drawings

To illustrate the technical solutions of the present invention or of the prior art more clearly, the drawings used in the description of the embodiments or the prior art are briefly introduced below. The drawings described below show some embodiments of the present invention; a person of ordinary skill in the art can derive other drawings from them without creative effort.

FIG. 1 is a schematic flow chart of the processor instruction fetching method provided by the present invention.

FIG. 2 is a schematic diagram of an instruction crossing cache lines as provided by the present invention.

FIG. 3 is a schematic diagram of the cache banks provided by the present invention.

FIG. 4 is a schematic diagram of the fetch address provided by the present invention.

FIG. 5 is a schematic diagram of the processor instruction prefetching method provided by the present invention.

FIG. 6 is a schematic structural diagram of the processor instruction fetching apparatus provided by the present invention.

FIG. 7 is a schematic structural diagram of the electronic device provided by the present invention.

Detailed Description

To make the purpose, technical solutions and advantages of the present invention clearer, the technical solutions of the present invention are described clearly and completely below with reference to the drawings. The described embodiments are some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on these embodiments without creative effort fall within the scope of protection of the present invention.

The terms "first", "second" and the like in the specification and claims of the present application distinguish similar objects and do not describe a specific order or sequence. Terms used in this way are interchangeable where appropriate, so that the embodiments of the present application can be implemented in orders other than those illustrated or described here. Objects distinguished by "first" and "second" are generally of one class, and their number is not limited; for example, there may be one first node or several. In addition, "and/or" in the specification and claims denotes at least one of the connected objects, and the character "/" generally indicates an "or" relationship between the objects before and after it.

The processor instruction fetching method, apparatus, device and storage medium of the present invention are described below with reference to FIG. 1 to FIG. 7.

FIG. 1 is a schematic flow chart of the processor instruction fetching method provided by the present invention. The method is applied to a processor in an electronic device, the electronic device includes the processor and a memory, and the memory includes a cache. As shown in FIG. 1, the method includes the following steps:

Step 101: receive an instruction fetch request, wherein the instruction block of the instruction fetch request includes ordinary instructions and compressed instructions;

Specifically, the processor instruction fetching method provided by the present invention is applicable to instruction fetching in processors, and in particular to instruction fetching in processors that support compressed instructions. The method may be executed by the processor in the electronic device; in addition to the processor, the electronic device includes a cache.

The processor is, for example, a central processing unit (CPU). The cache memory (Cache, "cache" for short), that is, the first-level storage system, sits between the processor and main memory. A cache is a small-capacity, high-speed buffer built from fast static random-access memory (SRAM) cells and can be integrated directly on the CPU chip. Frequently accessed program blocks and data blocks in main memory are copied into the cache to speed up the CPU's reads and writes of instructions and data. Because of the locality of program accesses, in most cases the CPU can obtain instructions and data directly from the cache without accessing main memory. As technology has developed, more and more data processing relies on a multi-level cache hierarchy. Correspondingly, the memory also includes a next-level storage system such as an L2 cache, whose capacity is relatively large.

For a processor that supports compressed instructions, a single instruction may span two cache lines. For example, suppose the processor's cache line size is 64 bytes (512 bits) and a cache line stores both compressed instructions (RVC, assumed to be 16 bits) and ordinary instructions (RVI, assumed to be 32 bits); an ordinary instruction may then cross a cache line boundary. FIG. 2 is a schematic diagram of an instruction crossing cache lines as provided by the present invention. As shown in FIG. 2, suppose bits 0 to 479 of a 512-bit instruction cache line hold 32-bit ordinary instructions, bits 480 to 495 hold one 16-bit compressed instruction, and a 32-bit ordinary instruction starts at bit 496. That last instruction crosses the cache line boundary: its first 16 bits are in one cache line and its other 16 bits are in the next cache line. In the processor instruction fetching method provided by the present invention, the cache supports reading two cache lines at the same time to preserve processor performance.
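The boundary arithmetic in this example can be checked with a few lines of code. The helper `crosses_line` is a hypothetical name introduced only for this illustration; it works on byte addresses, so bit 496 corresponds to byte 62.

```python
LINE_BYTES = 64  # 512-bit cache line, as in the example above


def crosses_line(byte_addr: int, length_bytes: int,
                 line_bytes: int = LINE_BYTES) -> bool:
    """Return True if an instruction's bytes fall in two different cache lines."""
    first_line = byte_addr // line_bytes
    last_line = (byte_addr + length_bytes - 1) // line_bytes
    return first_line != last_line


# The 16-bit instruction at bits 480..495 (bytes 60..61) stays in one line.
print(crosses_line(60, 2))
# The 32-bit instruction starting at bit 496 (byte 62) spills into the next line.
print(crosses_line(62, 4))
```

Because instructions are 2-byte aligned when compressed instructions are present, the only crossing case for a 32-bit instruction in a 64-byte line is a start at byte offset 62.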

First, an instruction fetch request is received. The request contains a fetch address, and its instruction block includes ordinary instructions and compressed instructions. Whether an instruction is compressed can be determined from the compressed-instruction identification bits in the instruction encoding. In the RISC-V instruction set, for example, if the lowest two bits of the encoding satisfy Instruction[1:0] == 2'b11 (binary 11), the instruction is an ordinary 32-bit instruction; otherwise it is a compressed instruction (also called an RVC instruction).
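The length rule above can be sketched as a walk over the 16-bit halfwords of an instruction block. The function names are hypothetical, and the sketch assumes only 16-bit and 32-bit instructions exist; it ignores the longer encodings that the RISC-V specification reserves.

```python
def instruction_length(low_halfword: int) -> int:
    """Length in bytes: 4 if Instruction[1:0] == 0b11, otherwise 2 (RVC)."""
    return 4 if (low_halfword & 0b11) == 0b11 else 2


def walk_block(halfwords):
    """Walk a block given as a list of 16-bit halfwords.

    Returns (instructions, spills), where instructions is a list of
    (byte_offset, length_bytes) pairs and spills is True when the last
    instruction extends past the end of the block.
    """
    instructions = []
    i = 0
    while i < len(halfwords):
        size = instruction_length(halfwords[i])
        instructions.append((i * 2, size))
        i += size // 2
    return instructions, i > len(halfwords)
```

For instance, a block holding `c.nop` (0x0001), a full `addi x0, x0, 0` (halfwords 0x0013, 0x0000) and then only the low halfword 0x0013 of another 32-bit instruction reports a spill, which is the situation that requires reading the next cache line.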

Step 102: determine a valid result of the instruction fetch request according to the instruction fetch request, wherein the valid result indicates whether the instruction fetch request needs to fetch instructions across lines from the cache, and the cache includes an odd bank and an even bank so that the addresses of any two consecutive cache lines fall in two different banks, one odd and one even;

Specifically, after the instruction fetch request is received, the flag bit of the instruction block in the request can be used to determine whether the request is a valid request, that is, whether instructions need to be fetched across lines from the cache.

It should be noted that the tag storage and the data storage of the cache in the present invention are each divided into odd and even banks. FIG. 3 is a schematic diagram of the cache banks provided by the present invention. As shown in FIG. 3, the cache storage can be divided into 2 banks, 4 banks, and in general 2ⁿ banks, with the bank address given by the low-order bits of the cache index (Index). With 2 banks, the bank address is the lowest 1 bit of the index; with 4 banks, it is the lowest 2 bits; with 2ⁿ banks, it is the lowest n bits, where n is a positive integer.

With this odd/even banking, the addresses of two consecutive cache lines are guaranteed to fall in two different banks, one odd and one even, so cross-line instruction fetching can be implemented with single-port static random-access memory (SRAM), which substantially reduces processor power consumption and area.
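The banking property can be illustrated with a short sketch. The helper names `bank_of_line` and `row_in_bank` are hypothetical; `line_addr` is the cache line number (the address with the block offset removed), and `bank_bits` is n for a cache with 2ⁿ banks.

```python
def bank_of_line(line_addr: int, bank_bits: int = 1) -> int:
    """Bank address: the lowest bank_bits bits of the line number."""
    return line_addr & ((1 << bank_bits) - 1)


def row_in_bank(line_addr: int, bank_bits: int = 1) -> int:
    """Row address inside the selected bank: the remaining upper bits."""
    return line_addr >> bank_bits


# Consecutive lines always land in different banks, so a single-port SRAM
# per bank can serve both halves of a cross-line fetch in the same cycle.
for line in range(8):
    print(line, bank_of_line(line), bank_of_line(line + 1))
```

The same holds for 4 or more banks, since adding 1 to the line number always changes its lowest bit.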

进而，对于最后一条不是普通指令跨高速缓存行的情况，则只会发送一个高速缓存行的读请求，则认为其属于无效信号。Furthermore, in the case where the last instruction is not a common instruction across cache lines, only a read request for a cache line will be sent, which is considered to be an invalid signal.

步骤103、在有效结果为有效的情况下，根据取指令请求从指令块所在的连续两个高速缓存行中取出目标指令；指令块为基于取指令请求中的取指地址确定的；Step 103: if the valid result is valid, fetch the target instruction from two consecutive cache lines where the instruction block is located according to the instruction fetch request; the instruction block is determined based on the instruction fetch address in the instruction fetch request;

具体地，在步骤102中处理器判断所述取指令请求为有效请求的情况下，则执行跨行取指令的操作。Specifically, when the processor determines in step 102 that the instruction fetch request is a valid request, an operation of fetching instructions across rows is performed.

例如，根据取指令请求中的取指地址向高速缓存器发出跨行访问请求，取指地址可以用于确定所述跨行访问请求对应的高速缓存行地址、存储体地址以及块内偏移等。进而，基于所述跨行访问请求对应的高速缓存行地址、存储体地址以及块内偏移等从高速缓存器中对应的访问地址取出对应的目标指令，实现跨高速缓存行取指令，从而提高了处理器性能。For example, a cross-row access request is issued to the cache memory according to the instruction fetch address in the instruction fetch request, and the instruction fetch address can be used to determine the cache row address, storage body address, and block offset corresponding to the cross-row access request. Then, based on the cache row address, storage body address, and block offset corresponding to the cross-row access request, the corresponding target instruction is fetched from the corresponding access address in the cache memory, so as to implement cross-cache row instruction fetching, thereby improving processor performance.

步骤104、在有效结果为无效的情况下，根据取指令请求从指令块所在的高速缓存行中取出目标指令。Step 104: When the valid result is invalid, fetch the target instruction from the cache line where the instruction block is located according to the instruction fetch request.

Specifically, when the processor determines in step 102 that the instruction fetch request is an invalid request, a single-line instruction read is performed.

For example, a single-line access request is issued to the cache according to the instruction fetch address in the instruction fetch request. The instruction fetch address can be used to determine the cache line address, bank address, block offset, and so on of the single-line access request. The target instruction is then fetched from the corresponding access address in the cache based on that cache line address, bank address, and block offset. Only one cache line is read, which substantially reduces the power consumption and area of the processor.
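Steps 103 and 104 can be illustrated with a minimal Python sketch that turns one instruction fetch request into either two line reads or one. The 64-byte line size and the function name `line_read_addresses` are assumptions made for illustration; the description does not specify them.

```python
from typing import List

LINE_BYTES = 64  # assumed cache line size; not stated in the description


def line_read_addresses(fetch_addr: int, cross_line_valid: bool) -> List[int]:
    """Return the line-aligned addresses read for one instruction fetch request."""
    first_line = fetch_addr & ~(LINE_BYTES - 1)
    if cross_line_valid:
        # Step 103: read the line holding the block and the next consecutive line.
        return [first_line, first_line + LINE_BYTES]
    # Step 104: a single cache line read is sufficient.
    return [first_line]
```

Under these assumptions, a fetch at address 0x1038 reads lines 0x1000 and 0x1040 when the valid result is valid, and only line 0x1000 otherwise.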

The method provided in this embodiment is applied to a processor in an electronic device. The electronic device includes the processor and a memory, and the memory includes a cache. The method includes: first, receiving an instruction fetch request, where the instruction block of the instruction fetch request includes ordinary instructions and compressed instructions; then, determining a valid result of the instruction fetch request according to the instruction fetch request, where the valid result indicates whether the instruction fetch request needs to fetch instructions across lines from the cache, and the cache includes an odd bank and an even bank, so that the addresses of two consecutive cache lines in the cache lie in two different banks; next, when the valid result is valid, fetching the target instruction from the two consecutive cache lines in which the instruction block is located according to the instruction fetch request, the instruction block being determined based on the instruction fetch address in the instruction fetch request; conversely, when the valid result is invalid, fetching the target instruction from the cache line in which the instruction block is located according to the instruction fetch request.

According to a processor instruction fetching method provided by the present invention, determining the valid result of the instruction fetch request according to the instruction fetch request includes:

determining an identification bit of the instruction block according to the instruction fetch address, where the identification bit marks whether the instruction block contains a cross-line instruction, a cross-line instruction being an instruction whose bytes are split across two different cache lines;

determining the valid result of the instruction fetch request according to the identification bit of the instruction block.

Specifically, in some embodiments, step 102 (determining the valid result of the instruction fetch request according to the instruction fetch request) is implemented as follows:

First, the identification bit of the instruction block is determined according to the instruction fetch address. Note that the identification bit marks whether the instruction block contains a cross-line instruction, that is, an instruction whose bytes are split across two different cache lines. Specifically, for example, the instruction block (a set of instructions) of the instruction fetch request is first determined from the instruction fetch address. If, during the instruction fetch stage, the last 2 bytes (16 bits) of an instruction block are found to be the first half of an ordinary (RVI) instruction, the identification bit is set; in this case two cache lines are fetched at the same time, which guarantees that the second half of the instruction can be fetched. Otherwise, only one cache line is fetched, to save instruction fetch bandwidth.
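The check on the last 2 bytes can be sketched as follows. This is an illustrative sketch, not the patented circuit. It relies on the RISC-V length encoding, in which a 16-bit parcel whose two lowest bits are `11` begins a 32-bit instruction and any other value denotes a 16-bit compressed instruction. It assumes that the block starts on an instruction boundary and that no encodings longer than 32 bits occur; the function name `identification_bit` is hypothetical.

```python
def identification_bit(block: bytes) -> int:
    """Return 1 if the last 2 bytes of `block` start a 32-bit instruction, else 0.

    Assumes `block` begins on an instruction boundary and contains only
    16-bit (compressed) and 32-bit (ordinary) RISC-V instructions.
    """
    if len(block) % 2 != 0:
        raise ValueError("block length must be a multiple of 2 bytes")
    pos = 0
    while pos < len(block):
        parcel = int.from_bytes(block[pos:pos + 2], "little")
        # Lowest two bits 0b11 mark a 32-bit instruction; otherwise 16-bit.
        size = 4 if (parcel & 0b11) == 0b11 else 2
        if pos + size > len(block):
            return 1  # the instruction's second half lies beyond the block
        pos += size
    return 0
```

For instance, a block consisting of a compressed `c.nop` (0x0001), a 32-bit `nop` (0x00000013), and then only the low half of another 32-bit `nop` yields 1, so two cache lines would be fetched.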

Further, the valid result of the instruction fetch request can be determined according to the identification bit of the instruction block. For example, the possible values of the identification bit and the valid result corresponding to each value can be preset. Once the identification bit has been determined, the valid result of the instruction fetch request is obtained by simply matching the value of the identification bit against its preset meaning; that is, it is determined whether the instruction fetch request needs to fetch instructions across lines from the cache.

In the method provided in this embodiment, the identification bit of the instruction block is first determined according to the instruction fetch address of the instruction fetch request, where the identification bit marks whether the instruction block contains a cross-line instruction, that is, an instruction whose bytes are split across two different cache lines. The valid result of the instruction fetch request is then determined according to the identification bit of the instruction block; in other words, the identification bit signals whether each request is valid. When the last instruction is not an ordinary instruction that crosses a cache line, only a read request for one cache line is sent. The processor can thus fetch instructions across lines while avoiding wasted bandwidth and reducing its power consumption.

According to a processor instruction fetching method provided by the present invention, the value of the identification bit is 1 or 0, and determining the valid result of the instruction fetch request according to the identification bit of the instruction block includes:

when the value of the identification bit is 1, determining the valid result to be valid;

when the value of the identification bit is 0, determining the valid result to be invalid.

Specifically, in some embodiments, all possible values of the identification bit can be preset, together with pre-stored rules that map each value to the meaning of the corresponding valid result. For example, the preset possible values of the identification bit are 1 and 0: a value of 1 indicates that the valid result is valid, and a value of 0 indicates that the valid result is invalid.

Correspondingly, in some embodiments, determining the valid result of the instruction fetch request according to the identification bit of the instruction block is implemented as follows:

The identification bit of the instruction fetch request is matched against the preset values to obtain the corresponding valid result. For example, when the value of the identification bit is 1, the valid result is determined to be valid, and when the value is 0, the valid result is determined to be invalid. The identification bit thus determines whether the instruction fetch request is a valid request, and therefore whether a cross-line instruction fetch is performed, which improves processor performance and avoids wasting bandwidth.

In the method provided in this embodiment, the valid result is determined to be valid when the value of the identification bit is 1 and invalid when the value is 0. The method determines from the identification bit whether the instruction fetch request is a valid request, and therefore whether a cross-line instruction fetch is performed, which improves processor performance and avoids wasting bandwidth.

According to a processor instruction fetching method provided by the present invention, the instruction fetch request includes an instruction fetch address, which is the access address of the target cache line corresponding to the instruction fetch request. The instruction fetch address includes a tag, an index, and a block offset. The tag is used for a multi-way tag comparison against the cache lines in the cache. The index indicates the line number of the cache line corresponding to the instruction fetch request and consists of a high-order index and a low-order index: the high-order index indicates the row address of the cache line, and the low-order index indicates the bank address. The block offset indicates the position of the instruction block of the instruction fetch request within the cache line.

Fetching the target instruction from the two consecutive cache lines in which the instruction block is located according to the instruction fetch request includes:

determining the way corresponding to the instruction fetch request according to the tag in the instruction fetch address;

determining the cache line and the bank corresponding to the instruction fetch request according to the index in the instruction fetch address and the corresponding way, where each cache line has a tag bank and a data bank, the tag bank stores the tag information of each cache line, and the data bank stores the instructions in each cache line;

fetching the target instruction according to the cache line, the bank, and the block offset corresponding to the instruction fetch request.

Specifically, in some embodiments, the instruction fetch request includes an instruction fetch address, which is the access address of the target cache line corresponding to the instruction fetch request, so that the target instruction at that address can subsequently be fetched according to the instruction fetch address.

By way of example, FIG. 4 is a schematic diagram of the instruction fetch address provided by the present invention. As shown in FIG. 4, the instruction fetch address includes a tag (Tag), an index (Index), and a block offset (Block Offset). The tag is used for a multi-way tag comparison against the cache lines in the cache. The index indicates the line number of the cache line corresponding to the instruction fetch request. In the present invention, the index consists of a high-order index and a low-order index: the high-order index indicates the row address of the cache line, and the low-order index indicates the bank address. The block offset indicates the position of the instruction block of the instruction fetch request within the cache line, that is, the position within the cache line of the content being accessed in the instruction cache.

Note that, in this embodiment, each cache line has a tag bank and a data bank. The tag bank stores the tag information of each cache line, and the data bank stores the instructions in each cache line. The tag banks and data banks of the instruction cache are divided into odd and even banks, for example into 2 banks, 4 banks, ..., or 2ⁿ banks (a power of two), and the bank address is given by the low-order bits of the cache index (Index). With 2 banks, the bank address is the lowest 1 bit of the index; with 4 banks, it is the lowest 2 bits; with 2ⁿ banks, it is the lowest n bits, where n is a positive integer. The instruction cache is indexed by virtual address, so two adjacent cache lines in the virtual address space are placed in different banks. Two cache lines can therefore be read at once, which ensures that two adjacent cache lines can be accessed simultaneously in one clock cycle.
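The address split and the bank selection can be made concrete with a short Python sketch. The field widths below (6 offset bits, 7 index bits) and the names `FetchAddress` and `decode` are assumptions chosen for illustration; only the rule that the lowest n index bits select one of 2ⁿ banks comes from the description.

```python
from typing import NamedTuple

OFFSET_BITS = 6  # assumed: 64-byte cache lines
INDEX_BITS = 7   # assumed: 128 line numbers per way


class FetchAddress(NamedTuple):
    tag: int     # compared against every way
    row: int     # high-order index: row address inside a bank
    bank: int    # low-order index: which bank holds the line
    offset: int  # position inside the cache line


def decode(addr: int, bank_bits: int = 1) -> FetchAddress:
    """Split a fetch address into tag, row, bank, and block offset."""
    offset = addr & ((1 << OFFSET_BITS) - 1)
    index = (addr >> OFFSET_BITS) & ((1 << INDEX_BITS) - 1)
    tag = addr >> (OFFSET_BITS + INDEX_BITS)
    bank = index & ((1 << bank_bits) - 1)  # lowest n bits of the index
    row = index >> bank_bits               # remaining high-order bits
    return FetchAddress(tag, row, bank, offset)
```

Because the bank is taken from the lowest index bits, the line at `addr` and the line at `addr + 64` always decode to different banks, which is what allows both to be read in the same cycle from single-port memories.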

Correspondingly, fetching the target instruction in step 103 from the two consecutive cache lines in which the instruction block is located, according to the instruction fetch request, includes the following.

First, the way corresponding to the instruction fetch request is determined according to the tag in the instruction fetch address; that is, the tag in the instruction fetch address is compared against the tags of all ways to find the matching way.

Further, the cache line and the bank corresponding to the instruction fetch request are determined according to the index in the instruction fetch address and the corresponding way. Once the way has been determined, it is combined with the index in the instruction fetch address to identify the cache line and the bank. The index consists of a high-order index and a low-order index: the high-order index indicates the row address of the cache line, and the low-order index indicates the bank address.

Further, the target instruction is fetched according to the cache line, the bank, and the block offset corresponding to the instruction fetch request. With the bank already determined in the preceding steps, the target instruction is accessed using the block offset, that is, the address within the block, which completes the cross-line instruction fetch. Two adjacent cache lines in the virtual address space are placed in different banks so that both can be read at once; this ensures that two adjacent cache lines can be accessed simultaneously in one clock cycle, realizing cross-line instruction fetching.
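The three steps above (tag comparison to find the way, index to find the row and bank, block offset to locate the bytes) can be modelled in software as follows. This is a behavioural sketch under stated assumptions, not the hardware design: the tiny geometry (2 ways, 2 banks, 4 line numbers, 64-byte lines) and the class name `BankedICache` are hypothetical, and the two bank reads that hardware would perform in parallel are simply performed one after the other.

```python
LINE_BYTES = 64
OFFSET_BITS = 6
INDEX_BITS = 2  # assumed tiny cache: 4 line numbers per way
BANK_BITS = 1   # 2 banks: even (0) and odd (1)
NUM_WAYS = 2


class BankedICache:
    def __init__(self):
        rows = 1 << (INDEX_BITS - BANK_BITS)
        banks = 1 << BANK_BITS
        # tags[bank][row][way] and data[bank][row][way]
        self.tags = [[[None] * NUM_WAYS for _ in range(rows)] for _ in range(banks)]
        self.data = [[[None] * NUM_WAYS for _ in range(rows)] for _ in range(banks)]

    @staticmethod
    def _decode(addr):
        line = addr >> OFFSET_BITS
        index = line & ((1 << INDEX_BITS) - 1)
        tag = line >> INDEX_BITS
        return tag, index >> BANK_BITS, index & ((1 << BANK_BITS) - 1)

    def fill(self, addr, line_bytes, way):
        """Install one cache line (test helper, not part of the fetch path)."""
        tag, row, bank = self._decode(addr)
        self.tags[bank][row][way] = tag
        self.data[bank][row][way] = line_bytes

    def read_line(self, addr):
        tag, row, bank = self._decode(addr)   # index selects row and bank
        for way in range(NUM_WAYS):           # multi-way tag comparison
            if self.tags[bank][row][way] == tag:
                return self.data[bank][row][way]
        return None                           # miss

    def fetch(self, addr, nbytes, cross_line):
        offset = addr & (LINE_BYTES - 1)      # block offset
        first = self.read_line(addr)
        if first is None:
            return None
        if not cross_line:
            return first[offset:offset + nbytes]
        # The next consecutive line always lives in the other bank.
        second = self.read_line((addr & ~(LINE_BYTES - 1)) + LINE_BYTES)
        if second is None:
            return None
        return (first + second)[offset:offset + nbytes]
```

With lines 0x00 and 0x40 installed, `fetch(0x3C, 8, cross_line=True)` returns the last 4 bytes of the first line followed by the first 4 bytes of the second, while a single-line fetch only ever touches one bank.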

In the method provided in this embodiment, the instruction fetch request includes an instruction fetch address, which is the access address of the target cache line corresponding to the instruction fetch request. The instruction fetch address includes a tag, an index, and a block offset. The tag is used for a multi-way tag comparison against the cache lines in the cache. The index indicates the line number of the cache line corresponding to the instruction fetch request and consists of a high-order index, which indicates the row address of the cache line, and a low-order index, which indicates the bank address. The block offset indicates the position of the instruction block of the instruction fetch request within the cache line. When the valid result of the instruction fetch request is valid, the present invention determines the way corresponding to the instruction fetch request according to the tag in the instruction fetch address, and determines the cache line and the bank corresponding to the instruction fetch request according to the index in the instruction fetch address and the corresponding way, where each cache line has a tag bank that stores its tag information and a data bank that stores its instructions. The target instruction is then fetched according to the cache line, the bank, and the block offset corresponding to the instruction fetch request. The present invention ensures that two adjacent cache lines can be accessed simultaneously in one clock cycle, realizing instruction fetching across cache lines and thereby improving processor performance.

According to a processor instruction fetching method provided by the present invention, before receiving the instruction fetch request, the method further includes:

receiving an instruction prefetch request, where the instruction prefetch request contains the prefetch address of the prefetch request;

looking up the prefetch address in the cache to obtain a matching result;

performing an instruction prefetch operation when the matching result is a miss.

Specifically, in some embodiments, the method further includes instruction prefetching before step 102, prior to receiving the instruction fetch request. That is, in addition to supporting cross-cache-line fetching of compressed instructions, the present invention also supports prefetching across cache lines. Instruction prefetching maintains instruction throughput while avoiding wasted bandwidth, which both improves processor performance and reduces the processor's power consumption and area.

By way of example, the prefetch mechanism is as follows:

First, an instruction prefetch request is received. The instruction prefetch request contains the prefetch address of the prefetch request, for example an access address.

Further, after receiving the instruction prefetch request, the processor looks up the prefetch address in the cache to obtain a matching result. There are two possible matching results: first, the prefetch address hits in the cache; second, the prefetch address misses in the cache. A prefetch pointer is added to the fetch target queue, positioned between the prediction pointer and the instruction fetch pointer. The prefetch pointer reads the target address of the current instruction packet (the jump target if a jump is taken; otherwise the start address of the next sequential instruction packet) and sends it to the prefetcher.

In the first case, the address to be prefetched is already present in the cache, so the prefetch request is cancelled in order to reduce the processor's power consumption.

In the second case, the address to be prefetched is not present in the cache, so the instruction prefetch operation is performed in order to maintain instruction throughput. That is, when the address cannot be found in the first-level storage system (such as the L1 Cache, i.e. the cache), the next-level storage system (such as the L2 Cache, i.e. the second-level cache) must be queried and the prefetch operation carried out.

In other words, the prefetcher completes address translation and accesses the tag bank (ITAG SRAM) of the instruction cache; if the address is found to be in the instruction cache already, the prefetch request is cancelled.
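The selection of the prefetch address and the hit/miss filter can be sketched in Python as below. The function names, the 64-byte line size, and the use of a set of resident line addresses to stand in for the ITAG lookup are all assumptions made for illustration; address translation is omitted.

```python
from typing import Optional, Set

LINE_BYTES = 64  # assumed cache line size


def prefetch_target(packet_start: int, packet_bytes: int,
                    taken: bool, jump_target: int) -> int:
    """Address read by the prefetch pointer for the current instruction packet."""
    # Jump target if the packet jumps; otherwise the next sequential packet.
    return jump_target if taken else packet_start + packet_bytes


def filter_prefetch(prefetch_addr: int, resident_lines: Set[int]) -> Optional[int]:
    """Return the line address to prefetch, or None if the request is cancelled."""
    line = prefetch_addr & ~(LINE_BYTES - 1)
    if line in resident_lines:  # stands in for a hit in the tag bank
        return None             # already cached: cancel to save power
    return line                 # miss: go on to the next-level storage system
```

A request whose line is already resident produces no traffic to the next level, which is the power-saving behaviour described for the first matching result.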

The method provided in this embodiment includes a prefetch mechanism: for example, an instruction prefetch request containing the prefetch address of the prefetch request is received, the prefetch address is looked up in the cache to obtain a matching result, and an instruction prefetch operation is performed when the matching result is a miss. The cache in the present invention supports fetching two cache lines at once and can also prefetch cache lines, which maintains instruction throughput in this situation while avoiding wasted bandwidth, both improving processor performance and reducing the processor's power consumption and area.

According to a processor instruction fetching method provided by the present invention, the memory further includes a next-level storage system, and performing the instruction prefetch operation includes:

sending, by an instruction prefetcher, an allocation request to the cross-cache-line prefetch entries, where the allocation request asks the cross-cache-line prefetch entries to allocate one prefetch entry for the prefetch request;

sending a prefetch request to the next-level storage system according to the prefetch entry, where the prefetch request is used by the next-level storage system to predict an instruction access request based on the prefetch request and the history flag bit of the cache line, and the instruction access request is either a single-cache-line access request or a cross-cache-line access request.

Specifically, in some embodiments, prefetching refers to prefetching cache lines from the next-level storage system in the memory. The next-level storage system can be any data storage accessible to the computer; in this embodiment it is, for example, the second-level cache (L2 Cache). The L2 Cache is an important part of the CPU cache hierarchy. It sits behind the first-level cache (L1 Cache) of the CPU core and usually has a larger capacity than the L1 Cache but a slower access speed. Its role is to further reduce the number of times the CPU accesses main memory, thereby improving the operating efficiency of the computer. In practice, the size and performance of the L2 Cache have a significant impact on overall CPU performance.

Correspondingly, the instruction prefetch operation performed by the processor in the prefetch mechanism can be implemented through the following steps. FIG. 5 is a schematic diagram of the processor instruction prefetching method provided by the present invention. As shown in FIG. 5, the prefetch mechanism includes:

First, the instruction prefetcher sends an allocation request to the cross-cache-line prefetch entries. The allocation request asks the cross-cache-line prefetch entries to allocate one prefetch entry for the prefetch request, for example by requesting one entry from the prefetch entries (RrefetchEntry).

Further, a prefetch request is sent to the next-level storage system according to the prefetch entry. Specifically, the prefetch entry (RrefetchEntry) sends a prefetch request to the next-level storage system, such as the second-level cache (L2 Cache), so that the corresponding cache line is prefetched into the next-level storage system, such as the L2 Cache. The prefetch request is used by the next-level storage system to predict an instruction access request based on the prefetch request and the history flag bit of the cache line; the instruction access request is either a single-cache-line access request or a cross-cache-line access request. In this way the required instruction cache lines are prefetched in time, and memory access bandwidth is not wasted, because an adjacent cache line is prefetched only when it is needed.
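The allocation of a prefetch entry and the resulting request to the next level can be sketched as follows. This is a simplified model under stated assumptions: the class names `PrefetchEntry` and `PrefetchEntries`, the fixed capacity, and the choice to refuse allocation when all entries are in use are hypothetical, since the description does not say how a full set of entries is handled.

```python
from dataclasses import dataclass
from typing import List, Optional

LINE_BYTES = 64  # assumed cache line size


@dataclass
class PrefetchEntry:
    line_addr: int    # line-aligned prefetch address
    cross_line: bool  # predicted: also prefetch the next consecutive line


class PrefetchEntries:
    """Fixed pool of prefetch entries (capacity is an assumed parameter)."""

    def __init__(self, capacity: int = 4):
        self.capacity = capacity
        self.entries: List[PrefetchEntry] = []

    def allocate(self, prefetch_addr: int, cross_line: bool) -> Optional[PrefetchEntry]:
        if len(self.entries) >= self.capacity:
            return None  # assumed policy: no free entry, request is not allocated
        entry = PrefetchEntry(prefetch_addr & ~(LINE_BYTES - 1), cross_line)
        self.entries.append(entry)
        return entry


def lines_to_request(entry: PrefetchEntry) -> List[int]:
    """Line addresses sent to the next-level storage system for one entry."""
    lines = [entry.line_addr]
    if entry.cross_line:
        lines.append(entry.line_addr + LINE_BYTES)  # cross-cache-line access
    return lines
```

An entry whose flag predicts a cross-line instruction requests two consecutive lines; any other entry requests one, so the adjacent line consumes bandwidth only when it is expected to be used.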

In the method provided in this embodiment, the prefetch address is looked up in the cache to obtain a matching result. When the matching result is a miss, the instruction prefetcher sends an allocation request to the cross-cache-line prefetch entries, asking them to allocate one prefetch entry for the prefetch request. A prefetch request is then sent to the next-level storage system according to the prefetch entry. The prefetch request is used by the next-level storage system to predict an instruction access request based on the prefetch request and the history flag bit of the cache line, where the instruction access request is either a single-cache-line access request or a cross-cache-line access request. In this way the required instruction cache lines are prefetched in time, and memory access bandwidth is not wasted, because an adjacent cache line is prefetched only when it is needed. The present invention can prefetch cache lines and so maintains instruction throughput when cross-line instruction fetching is supported, while avoiding wasted bandwidth, both improving processor performance and reducing the processor's power consumption and area.

According to a processor instruction fetching method provided by the present invention, sending a prefetch request to the next-level storage system according to the prefetch entry includes:

predicting a prefetch flag bit of the prefetch request according to the prefetch entry, where the prefetch flag bit indicates whether the prefetch request needs to prefetch instructions across cache lines.

Specifically, in some embodiments, sending a prefetch request to the next-level storage system according to the prefetch entry can be implemented as follows.

It will be understood that the prefetch request is used by the next-level storage system to predict an instruction access request based on the prefetch request and the history flag bit of the cache line, and that the instruction access request is either a single-cache-line access request or a cross-cache-line access request.

In other words, for each prefetch request, the flag bit that indicates whether cross-cache-line prefetching is needed is itself predicted, and the prediction determines whether an access request is issued for one cache line only or for the adjacent cache line as well. In this way the required instruction cache lines are prefetched in time, and memory access bandwidth is not wasted, because an adjacent cache line is prefetched only when it is needed.

In the method provided in this embodiment, the required instruction cache lines are prefetched in time, and memory access bandwidth is not wasted, because an adjacent cache line is prefetched only when it is needed.

According to a processor instruction fetching method provided by the present invention, predicting the prefetch flag bit of the prefetch request according to the prefetch entry includes:

initially setting the prefetch flag bits of all cache lines corresponding to the prefetch request to 1;

when, for any cache line corresponding to the prefetch request, a first target count of occasions on which no cross-line instruction prefetch occurred reaches a first count threshold, setting the prefetch flag bit of that cache line to 0, where a prefetch flag bit of 0 indicates that cross-line instruction prefetching is no longer performed for that cache line;

when, for any cache line corresponding to the prefetch request, a second target count of occasions on which a cross-line instruction prefetch occurred reaches a second count threshold, setting the prefetch flag bit of that cache line to 1, where a prefetch flag bit of 1 indicates that cross-line instruction prefetching is resumed for that cache line.

具体地，在一些实施例中，对预取请求的标志是否需要进行跨Cache行的预取的标志位也进行预测的过程示例如下：Specifically, in some embodiments, an example of a process of predicting whether a flag of a prefetch request needs to perform prefetching across cache lines is as follows:

对于每个预取请求中的每个高速缓存行，可以通过历史信息进行预测是否需要进行跨Cache行的预取的标志位。For each cache line in each prefetch request, a flag bit may be used to predict whether cross-cache line prefetching is required based on historical information.

例如，初始化将所有的跨Cache行预取的标志位设置为1，采用饱和计数器进行计数，当该Cache行没有发生指令跨Cache行的情况到达一定的次数，则将该Cache行的跨Cache行预取的标志位设置为0，不再进行跨Cache行的预取，直到该Cache行发生指令跨Cache行的情况到达一定的次数，再将该Cache行的跨Cache行预取的标志位设置为1，恢复跨Cache行的预取。For example, all cross-cache line prefetch flags are initialized to 1, and a saturation counter is used for counting. When the cache line does not have an instruction cross-cache line situation for a certain number of times, the cross-cache line prefetch flag of the cache line is set to 0, and cross-cache line prefetch is no longer performed until the cache line has an instruction cross-cache line situation for a certain number of times, and then the cross-cache line prefetch flag of the cache line is set to 1 to resume cross-cache line prefetch.
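The flag update described above can be sketched as a small per-cache-line state machine. This is only an illustration under stated assumptions: the class name, the threshold values (4 and 2), and the choice to reset one counter whenever the opposite event occurs are not given in the original text, which only specifies a saturating counter and two count thresholds.

```python
class CrossLinePrefetchPredictor:
    """Per-cache-line cross-line prefetch flag driven by saturating counters."""

    def __init__(self, clear_threshold=4, set_threshold=2):
        self.clear_threshold = clear_threshold  # "first count threshold" (assumed value)
        self.set_threshold = set_threshold      # "second count threshold" (assumed value)
        self.flag = 1       # initialized to 1: cross-line prefetch enabled
        self.no_cross = 0   # accesses in which no instruction crossed the line
        self.cross = 0      # accesses in which an instruction crossed the line

    def update(self, crossed):
        """Record one access to the line and return the updated flag bit."""
        if crossed:
            # Saturate at the threshold instead of counting without bound.
            self.cross = min(self.cross + 1, self.set_threshold)
            self.no_cross = 0  # assumption: opposite events restart the count
            if self.flag == 0 and self.cross >= self.set_threshold:
                self.flag = 1  # resume cross-line prefetching
        else:
            self.no_cross = min(self.no_cross + 1, self.clear_threshold)
            self.cross = 0
            if self.flag == 1 and self.no_cross >= self.clear_threshold:
                self.flag = 0  # stop cross-line prefetching
        return self.flag
```

Using separate thresholds for clearing and setting the flag gives the predictor hysteresis, so a single atypical access does not flip the decision.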

With the method provided in this embodiment, the flag bit indicating whether cross-cache-line prefetching is required can be predicted from historical information, so that the required instruction cache lines can be prefetched in time and memory access bandwidth is not wasted, because only the adjacent cache lines that actually need to be fetched are prefetched.

According to the processor instruction fetching method provided by the present invention, querying the prefetch address in the cache to obtain a matching result comprises:

matching the prefetch address against the addresses of the prefetch requests that have already been sent and, if no match is found, querying the prefetch address in the cache to obtain the matching result.

Specifically, in some embodiments, querying the prefetch address in the cache to obtain the matching result during the instruction prefetch operation may be implemented by the following steps:

It should be noted that, in this embodiment, to ensure that a prefetch request is not sent repeatedly to the next-level storage system, such as the L2 Cache, the prefetcher records the addresses of the prefetch requests that have already been sent, that is, the physical addresses of the sent prefetch requests.

Then, when a prefetch request is received and before it is sent to the next-level storage system, its address is first matched against the recorded addresses of the sent prefetch requests. If no match is found, the prefetch address is queried in the cache to obtain the matching result. In other words, before applying for a cross-cache-line prefetch item, every prefetch request checks the record of sent prefetch request addresses; if its cache line address is the same as that of a prefetch request already sent, the current prefetch request is cancelled. In this way the required instruction cache lines can be prefetched in time, and memory access bandwidth is not wasted, because only the adjacent cache lines that actually need to be fetched are prefetched.
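The filtering order described above (sent-request record first, then the cache lookup) can be sketched as follows. The function name, the use of Python sets to stand in for the prefetcher's record and the cache contents, the returned status strings, and the 64-byte line size are all illustrative assumptions rather than details from the original text.

```python
def filter_prefetch(prefetch_addr, sent_lines, icache_lines, line_bytes=64):
    """Decide what happens to a prefetch request before it reaches the L2 Cache.

    sent_lines:   set of line addresses of prefetch requests already sent
    icache_lines: set of line addresses currently present in the cache
    """
    line_addr = prefetch_addr // line_bytes  # compare at cache line granularity
    if line_addr in sent_lines:
        # Same cache line as a request already sent: cancel the duplicate.
        return "cancelled"
    if line_addr in icache_lines:
        # Matching result is a hit: the line is already cached, no prefetch needed.
        return "hit"
    # No match anywhere: record the address and issue the prefetch.
    sent_lines.add(line_addr)
    return "issued"
```

Checking the sent-request record before the cache means a duplicate request is dropped without spending a cache lookup on it.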

In the method provided in this embodiment, the prefetcher records the addresses of the prefetch requests that have already been sent. When a prefetch request is received, the prefetch address is first matched against those recorded addresses, and only if no match is found is the prefetch address queried in the cache. The advantage is that prefetch requests are not sent repeatedly to the next-level storage system, such as the L2 Cache, which reduces the power consumption of the processor.

The processor instruction fetching device provided by the present invention is described below. The processor instruction fetching device described below and the processor instruction fetching method described above may be referred to in correspondence with each other.

FIG. 6 is a schematic structural diagram of the processor instruction fetching device provided by the present invention. The device is applied to a processor in an electronic device, the electronic device includes the processor and a memory, and the memory includes a cache. As shown in FIG. 6, the processor instruction fetching device 600 includes the following modules.

The receiving module 610 is configured to receive an instruction fetch request; the instruction block of the instruction fetch request includes ordinary instructions and compressed instructions;

The instruction fetch module 620 is configured to determine a valid result of the instruction fetch request according to the instruction fetch request; the valid result indicates whether the instruction fetch request needs to fetch instructions across cache lines from the cache; the cache includes an odd bank and an even bank, so that the addresses of two consecutive cache lines in the cache are located in two different banks, one odd and one even;

The device provided in this embodiment is applied to a processor in an electronic device, the electronic device includes the processor and a memory, and the memory includes a cache. The device includes a receiving module 610, configured to receive an instruction fetch request, where the instruction block of the instruction fetch request includes ordinary instructions and compressed instructions, and an instruction fetch module 620, configured to determine a valid result of the instruction fetch request according to the instruction fetch request. The valid result indicates whether the instruction fetch request needs to fetch instructions across cache lines from the cache. The cache includes an odd bank and an even bank, so that the addresses of two consecutive cache lines in the cache are located in two different banks, one odd and one even. When the valid result is valid, the target instruction is fetched, according to the instruction fetch request, from the two consecutive cache lines in which the instruction block is located, the instruction block being determined from the instruction fetch address in the instruction fetch request. Conversely, when the valid result is invalid, the target instruction is fetched, according to the instruction fetch request, from the single cache line in which the instruction block is located.

In the present invention, the cache includes an odd bank and an even bank, so that the addresses of two consecutive cache lines in the cache are located in two different banks. The valid result of the instruction fetch request, that is, whether instructions need to be fetched across cache lines from the cache, is determined according to the instruction fetch request. When a cross-line fetch is needed, the target instruction is fetched from the two consecutive cache lines in which the instruction block is located. Because those two lines sit in different banks, they can be read at the same time, which achieves simultaneous fetching of a cross-line instruction and improves processor performance.
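The bank arrangement can be illustrated with a short sketch, assuming the fetch address is split into a tag, a high-order index (row address), a low-order index (bank address) and a block offset, as in the method embodiment. The bit widths below (64-byte lines, one bank bit, six row bits) and the function names are assumptions chosen for illustration; the original text does not give concrete widths.

```python
OFFSET_BITS = 6  # block offset: 64-byte cache line (assumed)
BANK_BITS = 1    # low-order index: selects the even (0) or odd (1) bank
ROW_BITS = 6     # high-order index: row address inside a bank (assumed)


def split_fetch_address(addr):
    """Split a fetch address into (tag, row, bank, offset)."""
    offset = addr & ((1 << OFFSET_BITS) - 1)
    bank = (addr >> OFFSET_BITS) & ((1 << BANK_BITS) - 1)
    row = (addr >> (OFFSET_BITS + BANK_BITS)) & ((1 << ROW_BITS) - 1)
    tag = addr >> (OFFSET_BITS + BANK_BITS + ROW_BITS)
    return tag, row, bank, offset


def banks_for_fetch(addr, cross_line):
    """List the (bank, row) pairs read for one fetch request."""
    _, row, bank, _ = split_fetch_address(addr)
    accesses = [(bank, row)]
    if cross_line:
        # The next consecutive line differs in the low-order index bit,
        # so it always falls in the other bank and can be read in parallel.
        _, next_row, next_bank, _ = split_fetch_address(addr + (1 << OFFSET_BITS))
        accesses.append((next_bank, next_row))
    return accesses
```

Placing the bank bit directly above the block offset is what guarantees that any two consecutive cache lines land in different banks, including the case where the row address also advances.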

According to the processor instruction fetching device 600 provided by the present invention, the instruction fetch module 620 is specifically configured to:

According to the processor instruction fetching device 600 provided by the present invention, the value of the identification bit is 1 or 0, and the instruction fetch module 620 is further configured to:

According to the processor instruction fetching device 600 provided by the present invention, the instruction fetch request includes the instruction fetch address, and the instruction fetch address is the access address of the target cache line corresponding to the instruction fetch request. The instruction fetch address includes a tag, an index and a block offset. The tag is used for multi-way tag comparison against the cache lines in the cache. The index indicates the line number of the cache line corresponding to the instruction fetch request and consists of a high-order index and a low-order index: the high-order index indicates the row address of the cache line, and the low-order index indicates the address of the storage bank. The block offset indicates the specific position, within the cache line, of the instruction block of the instruction fetch request;

The instruction fetch module 620 is further configured to:

According to the processor instruction fetching device 600 provided by the present invention, the device further includes an instruction prefetch module;

The instruction prefetch module is configured to:

According to the processor instruction fetching device 600 provided by the present invention, the memory further includes a next-level storage system; the instruction prefetch module is specifically configured to:

According to the processor instruction fetching device 600 provided by the present invention, the instruction prefetch module is further configured to:

FIG. 7 is a schematic diagram of the physical structure of an electronic device. As shown in FIG. 7, the electronic device may include a processor 710, a communications interface 720, a memory 730 and a communication bus 740, where the processor 710, the communications interface 720 and the memory 730 communicate with one another through the communication bus 740. The processor 710 may call logic instructions in the memory 730 to execute the processor instruction fetching method, which is applied to a processor in an electronic device, the electronic device including the processor and a memory, and the memory including a cache; the method includes: receiving an instruction fetch request; the instruction block of the instruction fetch request includes ordinary instructions and compressed instructions;

In addition, the logic instructions in the memory 730 may be implemented in the form of software functional units and, when sold or used as an independent product, may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention in essence, or the part that contributes to the prior art, or a part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device or the like) to perform all or some of the steps of the methods described in the embodiments of the present invention. The storage medium includes any medium that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk or an optical disc.

In another aspect, the present invention further provides a computer program product. The computer program product includes a computer program that can be stored on a non-transitory computer-readable storage medium. When the computer program is executed by a processor, the computer can execute the processor instruction fetching method provided by the methods above, which is applied to a processor in an electronic device, the electronic device including the processor and a memory, and the memory including a cache; the method includes:

In yet another aspect, the present invention further provides a non-transitory computer-readable storage medium on which a computer program is stored. When executed by a processor, the computer program implements the processor instruction fetching method provided by the methods above, which is applied to a processor in an electronic device, the electronic device including the processor and a memory, and the memory including a cache; the method includes:

The device embodiments described above are merely illustrative. The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. A person of ordinary skill in the art can understand and implement it without creative effort.

From the description of the embodiments above, those skilled in the art can clearly understand that each embodiment can be implemented by software plus a necessary general-purpose hardware platform, and of course also by hardware. Based on this understanding, the above technical solution in essence, or the part that contributes to the prior art, may be embodied in the form of a software product. The computer software product may be stored in a computer-readable storage medium, such as a ROM/RAM, a magnetic disk or an optical disc, and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device or the like) to execute the methods described in the embodiments or in certain parts of the embodiments.

Finally, it should be noted that the above embodiments are intended only to illustrate the technical solutions of the present invention, not to limit them. Although the present invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that the technical solutions described in the foregoing embodiments may still be modified, or some of their technical features may be replaced with equivalents, and that such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims

1. A processor instruction fetching method, characterized in that it is applied to a processor in an electronic device, the electronic device comprises the processor and a memory, the memory comprises a cache; the method comprises:

receiving an instruction fetch request; the instruction block of the instruction fetch request includes ordinary instructions and compressed instructions;

determining, according to the instruction fetch request, a valid result of the instruction fetch request; the valid result is used to indicate whether the instruction fetch request needs to fetch instructions across cache lines from the cache; the cache includes an odd bank and an even bank, so that the addresses of two consecutive cache lines in the cache are located in two different banks, one odd and one even;

When the valid result is valid, fetching the target instruction from two consecutive cache lines where the instruction block is located according to the instruction fetch request; the instruction block is determined based on the instruction fetch address in the instruction fetch request;

When the valid result is invalid, the target instruction is fetched from the cache line where the instruction block is located according to the instruction fetch request.

2. The processor instruction fetching method according to claim 1, wherein determining a valid result of the instruction fetching request according to the instruction fetching request comprises:

determining the identification bit of the instruction block according to the instruction fetch address; the identification bit is used to mark whether the instruction block contains a cross-line instruction, a cross-line instruction being an instruction whose bytes fall in two different cache lines;

A valid result of the instruction fetch request is determined according to the identification bit of the instruction block.

3. The processor instruction fetching method according to claim 2, wherein the value of the identification bit is 1 or 0, and determining the valid result of the instruction fetch request according to the identification bit of the instruction block comprises:

when the value of the identification bit is 1, determining that the valid result is valid;

when the value of the identification bit is 0, determining that the valid result is invalid.

4. The processor instruction fetching method according to claim 1, characterized in that the instruction fetch request includes the instruction fetch address, the instruction fetch address is the access address of the target cache line corresponding to the instruction fetch request, the instruction fetch address includes a tag, an index, and an offset within a block, the tag is used to perform a multi-way tag comparison with the cache line in the cache, the index is used to indicate the row number of the cache line corresponding to the instruction fetch request, the index includes a high-order index and a low-order index, the high-order index is used to indicate the row address of the cache line, the low-order index is used to indicate the address of a storage body, and the offset within the block is used to indicate the specific position of the instruction block of the instruction fetch request in the cache line;

The fetching of the target instruction from two consecutive cache lines where the instruction block is located according to the instruction fetch request comprises:

Determining a way corresponding to the instruction fetch request according to a tag in the instruction fetch address;

determining the cache line and the storage bank corresponding to the instruction fetch request according to the index in the instruction fetch address and the corresponding way; wherein each cache line corresponds to a tag storage bank and a data storage bank, the tag storage bank is used to store tag information of each cache line, and the data storage bank is used to store the instructions in each cache line;

The target instruction is fetched according to the cache line corresponding to the instruction fetch request, the corresponding storage bank, and the offset within the block.

5. The processor instruction fetching method according to any one of claims 1 to 4, characterized in that before receiving the instruction fetching request, it also includes:

receiving an instruction prefetch request, wherein the instruction prefetch request includes a prefetch address of the prefetch request;

querying the prefetch address in the cache to obtain a matching result;

When the matching result is no match, an instruction prefetch operation is performed.

6. The processor instruction fetching method according to claim 5, wherein the memory further comprises a next-level storage system; and the instruction prefetching operation comprises:

Sending an allocation request to a cross-cache line prefetch item by using an instruction prefetcher, wherein the allocation request is used for the cross-cache line prefetch item to apply for allocation of a prefetch item for the prefetch request;

According to the prefetch item, the prefetch request is sent to the next-level storage system; the prefetch request is used for the next-level storage system to predict the instruction access request based on the prefetch request and the historical flag of the cache line; the instruction access request includes an access request for a single cache line or an access request across cache lines.

7. The processor instruction fetching method according to claim 6, wherein sending the prefetch request to the next-level storage system according to the prefetch item comprises:

predicting the prefetch flag bit of the prefetch request according to the prefetch item; the prefetch flag bit is used to indicate whether the prefetch request needs to prefetch instructions across cache lines.

8. The processor instruction fetching method according to claim 7, wherein predicting the prefetch flag of the prefetch request according to the prefetch item comprises:

Initially setting the prefetch flag bits of all cache lines corresponding to the prefetch request to 1;

when, for any cache line among the cache lines corresponding to the prefetch request, a first target count of occasions on which no cross-line instruction prefetch occurred reaches a first count threshold, setting the prefetch flag bit of that cache line to 0; a prefetch flag bit of 0 indicates that cross-line instruction prefetching is no longer performed for that cache line;

when, for any cache line among the cache lines corresponding to the prefetch request, a second target count of occasions on which a cross-line instruction prefetch occurred reaches a second count threshold, setting the prefetch flag bit of that cache line to 1; a prefetch flag bit of 1 indicates that cross-line instruction prefetching is resumed for that cache line.

9. The processor instruction fetching method according to claim 5, wherein the step of querying the pre-fetched address in the cache to obtain a matching result comprises:

Using a prefetcher to record addresses of prefetch requests that have been sent;

matching the prefetch address against the addresses of the sent prefetch requests and, if no match is found, querying the prefetch address in the cache to obtain the matching result.

10. A processor instruction fetching device, characterized in that it is applied to a processor in an electronic device, the electronic device comprises the processor and a memory, the memory comprises a cache; the device comprises:

a receiving module, configured to receive an instruction fetch request; the instruction block of the instruction fetch request includes ordinary instructions and compressed instructions;

an instruction fetch module, configured to determine a valid result of the instruction fetch request according to the instruction fetch request; the valid result is used to indicate whether the instruction fetch request needs to fetch instructions across cache lines from the cache; the cache includes an odd bank and an even bank, so that the addresses of two consecutive cache lines in the cache are located in two different banks, one odd and one even;

11. An electronic device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor implements the processor instruction fetching method as claimed in any one of claims 1 to 9 when executing the computer program.

12. A non-transitory computer-readable storage medium having a computer program stored thereon, wherein when the computer program is executed by a processor, the processor instruction fetching method according to any one of claims 1 to 9 is implemented.