CN118369652A - Priority-based cache line fitting in a compressed memory system for a processor-based system
- Publication number
- CN118369652A (application number CN202280081645.5A)
- Authority
- CN
- China
- Prior art keywords
- cache line
- compressed
- overflow
- size
- candidate
- Legal status
- Pending
Classifications
- G06F12/084—Multiuser, multiprocessor or multiprocessing cache systems with a shared cache
- G06F12/0895—Caches characterised by their organisation or structure of parts of caches, e.g. directory or tag array
- G06F12/0284—Multiple user address space allocation, e.g. using different base addresses
- G06F12/04—Addressing variable-length words or parts of words
- G06F12/0811—Multiuser, multiprocessor or multiprocessing cache systems with multilevel cache hierarchies
- G06F12/0886—Variable-length word access
- G06F3/0608—Saving storage space on storage systems
- G06F3/0659—Command handling arrangements, e.g. command buffers, queues, command scheduling
- G06F3/0679—Non-volatile semiconductor memory device, e.g. flash memory, one time programmable memory [OTP]
- G06F2212/1024—Latency reduction
- G06F2212/1056—Simplification
- G06F2212/305—Providing cache or TLB in specific location of a processing system being part of a memory device, e.g. cache DRAM
- G06F2212/401—Compressed data
- G06F2212/466—Metadata, control data
Abstract
Description
Priority application
This application claims priority to U.S. patent application serial number 17/572,471, filed on January 10, 2022, entitled "PRIORITY-BASED CACHE-LINE FITTING IN COMPRESSED MEMORY SYSTEMS OF PROCESSOR-BASED SYSTEMS," which is incorporated herein by reference in its entirety.
This application also claims priority to U.S. patent application serial number 17/572,472, filed on January 10, 2022, entitled "PRIORITY-BASED CACHE-LINE FITTING IN COMPRESSED MEMORY SYSTEMS OF PROCESSOR-BASED SYSTEMS," which is incorporated herein by reference in its entirety.
Technical Field
The disclosed embodiments relate generally to memory systems, and in particular to priority-based cache line fitting in compressed memory systems of processor-based systems.
Background
As the applications executed by conventional processor-based systems grow in size and complexity, memory bandwidth can become a constraint on system performance. Although available memory bandwidth can be increased by using wider memory communication channels, this approach may incur penalties in terms of increased cost and/or the additional area required for memory on an integrated circuit (IC). One way to increase memory bandwidth in a processor-based system without widening the memory communication channel is to use data compression. A data compression system can be employed in a processor-based system to store data in a compressed format, thereby increasing effective storage capacity without increasing physical storage capacity. In this regard, some conventional data compression systems provide a compression engine to compress data to be written to the main system memory. After performing compression, the compression engine writes the compressed data to the system memory together with metadata that maps the virtual address of the compressed data to the physical address in the system memory where the compressed data is actually stored. However, because metadata is used for address mapping, memory accesses incur additional reads and writes, which negatively affects system performance. For example, accessing a particular cache line in memory may require accessing metadata in memory and an additional layer of address calculation to determine the location of the compressed cache line in memory that corresponds to that particular cache line. This increases the complexity, cost, and latency of processor-based systems that employ storage capacity compression.
Summary
Accordingly, there is a need for systems, methods, and techniques that minimize the overhead of conventional compression systems. The techniques described herein can eliminate the metadata for most cache lines and thus improve the average performance of cache line accesses.
In one aspect, a method for compressing data in a compressed memory system of a processor-based system is provided. The method includes partitioning a memory region into a plurality of data regions, each data region associated with a respective priority level. The method further includes (i) selecting a first cache line from a first data region of the plurality of data regions, and (ii) selecting a second cache line from a second data region of the plurality of data regions, where the first data region has a higher priority level than the second data region. The method further includes (i) compressing the first cache line to obtain a first compressed cache line, and (ii) compressing the second cache line to obtain a second compressed cache line. Upon determining that the first cache line is compressible, the method includes (i) writing the first compressed cache line to a first predetermined portion of a candidate compressed cache line, and (ii) writing the second cache line, or a second portion of the second compressed cache line, to a second predetermined portion of the candidate compressed cache line. The first predetermined portion is larger than the second predetermined portion.
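A minimal sketch of this packing flow, assuming 64-byte cache lines, a half-line first portion, a 24-byte second portion, and zlib as a stand-in compressor (none of these specifics are mandated by the patent):

```python
import zlib

LINE_SIZE = 64                   # uncompressed cache line size in bytes (assumed)
FIRST_PORTION = LINE_SIZE // 2   # larger portion, reserved for the high-priority line
SECOND_PORTION = 24              # smaller portion for the low-priority line (assumed)

def compress(line: bytes) -> bytes:
    """Stand-in compressor; the method does not mandate a specific algorithm."""
    return zlib.compress(line, 9)

def pack(first_line: bytes, second_line: bytes):
    """Try to fit both compressed lines into one candidate compressed cache line.

    Returns (candidate, overflow): overflow is None when both lines fit,
    otherwise it holds the second line's data that must spill elsewhere.
    """
    c_first = compress(first_line)
    c_second = compress(second_line)
    if len(c_first) <= FIRST_PORTION and len(c_second) <= SECOND_PORTION:
        # Both fit: the high-priority line occupies the first (larger)
        # predetermined portion, the low-priority line the second portion.
        candidate = c_first.ljust(FIRST_PORTION, b"\0") + c_second
        return candidate.ljust(LINE_SIZE, b"\0"), None
    # Otherwise the second line's data overflows (handled separately below).
    return None, c_second
```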
In some implementations, the method further includes: upon determining that (i) the first cache line is incompressible, (ii) the second cache line is incompressible, or (iii) the second compressed cache line does not fit within the second predetermined portion of the candidate compressed cache line, setting an overflow pointer in the candidate compressed cache line. According to a compression ratio of the second cache line or a size of the second compressed cache line, the overflow pointer points to one overflow block of a plurality of overflow blocks. Each overflow block of the plurality of overflow blocks has a different size.
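The choice among the differently sized overflow blocks can be sketched as picking the smallest block that holds the spilled data. The pool of sizes below is an illustrative assumption:

```python
OVERFLOW_SIZES = (16, 32, 48, 64)  # hypothetical pool; each block has a different size

def pick_overflow_block(spilled_size: int) -> int:
    """Return the smallest overflow block size that can hold the spilled data."""
    for size in OVERFLOW_SIZES:
        if spilled_size <= size:
            return size
    raise ValueError("data exceeds the largest overflow block")
```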
In some implementations, the method further includes: receiving a read request for the first cache line or the second cache line; and in response to receiving the read request, upon determining that the overflow pointer is set, retrieving data from an overflow block of the plurality of overflow blocks according to the overflow pointer.
In some implementations, the method further includes: upon determining that the first cache line is compressible, setting a first compression ratio control bit in the candidate compressed cache line; and upon determining that the first cache line is incompressible: writing a first portion of the first cache line to the candidate compressed cache line; writing the remaining portion of the first cache line to an overflow block; resetting the first compression ratio control bit in the candidate compressed cache line; and setting an overflow pointer in the candidate compressed cache line to point to the overflow block.
In some implementations, the method further includes, in response to receiving a read request for the first cache line: upon determining that the first compression ratio control bit is set, retrieving the first cache line from the candidate compressed cache line; and upon determining that the first compression ratio control bit is reset: retrieving a first portion of the first cache line from the candidate compressed cache line, and retrieving a second portion of the first cache line from the overflow block based on the overflow pointer.
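Under an assumed field layout (a dict standing in for the candidate line's fields), the read path just described reduces to a branch on the control bit:

```python
def read_first_line(candidate: dict, overflow_blocks: dict) -> bytes:
    """Reassemble the first cache line from the candidate line and, if
    needed, the overflow block named by the candidate's overflow pointer."""
    if candidate["ratio_bit_set"]:
        # Control bit set: the line was compressible and lives entirely
        # in the candidate's first portion.
        return candidate["first_portion"]
    # Control bit reset: the line was incompressible, so its tail sits
    # in the overflow block pointed to by the overflow pointer.
    return candidate["first_portion"] + overflow_blocks[candidate["overflow_ptr"]]
```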
In some implementations, the method further includes, upon determining that the first cache line is incompressible: writing the second cache line or the second compressed cache line to the overflow block, according to whether the second cache line is compressible; and resetting a second compression ratio control bit in the candidate compressed cache line to indicate whether the second cache line is incompressible.
In some implementations, the method further includes, in response to receiving a read request for the second cache line: retrieving the second cache line or the second compressed cache line from the overflow block based on the second compression ratio control bit.
In some implementations, the method further includes, in response to receiving a cache line write request for the first cache line: compressing the first cache line to obtain a first updated cache line having a first size; upon determining that the first size is equal to or smaller than a first predetermined size of the candidate compressed cache line, writing the first updated cache line to the candidate compressed cache line; upon determining that the first size is greater than the first predetermined size and equal to or smaller than a second predetermined size of the candidate compressed cache line, performing a read-modify-write operation on the candidate compressed cache line based on the first updated cache line; and upon determining that the first size is greater than the second predetermined size of the candidate compressed cache line, performing a read-modify-write operation on the candidate compressed cache line and a read-modify-write operation on an overflow block of the plurality of overflow blocks, based on the first updated cache line.
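The three write cases for an updated first line can be classified purely from sizes. The thresholds are parameters here because the patent defines them per candidate line; the case labels are illustrative:

```python
def classify_first_line_write(new_size: int,
                              first_predetermined: int,
                              second_predetermined: int) -> str:
    """Map the updated line's compressed size to one of the three write paths."""
    if new_size <= first_predetermined:
        return "write-candidate"            # fits its reserved portion outright
    if new_size <= second_predetermined:
        return "rmw-candidate"              # read-modify-write of the candidate line
    return "rmw-candidate-and-overflow"     # also read-modify-write an overflow block
```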
In some implementations, the first predetermined size is half the size of the candidate compressed cache line.
In some implementations, the method further includes: when writing the second cache line or the second portion of the second compressed cache line to the second predetermined portion of the candidate compressed cache line, writing an end bit index in the candidate compressed cache line to indicate where, within the candidate compressed cache line, the second cache line or the second portion of the second compressed cache line is written; and calculating the second predetermined size based on the end bit index in the candidate compressed cache line.
In some implementations, the method further includes, in response to receiving a cache line write request for the second cache line: compressing the second cache line to obtain a second updated cache line having a second size; upon determining that the sum of the second size and the size of the first compressed cache line is smaller than the first predetermined size of the candidate compressed cache line, performing a read-modify-write operation to write the second updated cache line to the candidate compressed cache line; and upon determining that the sum is not smaller than the first predetermined size, (i) performing a first read-modify-write operation to write a first portion of the second updated cache line to the candidate compressed cache line, and (ii) performing a second read-modify-write operation to write the remaining portion of the second updated cache line to the overflow block pointed to by the overflow pointer.
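A hedged sketch of the second-line write decision above: given the resident first line's size, how many bytes of the updated second line stay in the candidate and how many spill to the overflow block (function and parameter names are assumptions):

```python
def place_second_line(second_size: int, first_size: int, first_predetermined: int):
    """Return (bytes_in_candidate, bytes_in_overflow) for the updated second line."""
    if second_size + first_size < first_predetermined:
        # One read-modify-write suffices: the whole line stays in the candidate.
        return second_size, 0
    # Split: whatever room remains next to the first compressed line goes
    # into the candidate, and the remainder goes to the overflow block.
    in_candidate = max(0, first_predetermined - first_size)
    return in_candidate, second_size - in_candidate
```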
In some implementations, the method further includes, in response to receiving a cache line write request for the first cache line or the second cache line: compressing the first cache line or the second cache line to obtain an updated compressed cache line having an updated size; and upon determining that the updated size cannot fit within the overflow block pointed to by the overflow pointer, releasing the overflow pointer and updating it to point to a new overflow block of the plurality of overflow blocks.
In some implementations, the first compressed cache line, the second compressed cache line, and the candidate compressed cache line have equal sizes.
In some implementations, the first data region and the second data region have equal sizes.
In some implementations, the second predetermined portion is smaller than half the size of the candidate compressed cache line.
In some implementations, the first compressed cache line and the second compressed cache line are written to the candidate compressed cache line in opposite directions, separated by one or more bytes.
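Opposite-direction packing can be illustrated by writing one compressed line forward from the start of the candidate and the other backward from the end, keeping at least a one-byte gap between them (the 64-byte size and the helper itself are assumptions):

```python
def pack_opposite(c_first: bytes, c_second: bytes,
                  line_size: int = 64, gap: int = 1) -> bytes:
    """Pack two compressed lines into one buffer from opposite ends."""
    if len(c_first) + gap + len(c_second) > line_size:
        raise ValueError("lines do not fit with the required separation")
    buf = bytearray(line_size)
    buf[:len(c_first)] = c_first                # first line grows left to right
    buf[line_size - len(c_second):] = c_second  # second line grows right to left
    return bytes(buf)
```

Growing toward each other lets either line expand on a later rewrite without immediately disturbing the other, as long as the gap holds.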
另一方面,提供了一种基于处理器的系统的经压缩存储器系统。该经压缩存储器系统包括被配置为将存储区域划分为多个数据区域的存储器划分电路。每个数据区域与相应的优先级等级相关联。该经压缩存储器系统还包括高速缓存行选择电路,该高速缓存行选择电路被配置为:(i)从多个数据区域中的第一数据区域选择第一高速缓存行,以及(ii)从多个数据区域中的第二数据区域选择第二高速缓存行。第一数据区域具有比第二数据区域更高的优先级等级。该经压缩存储器系统还包括压缩电路,该压缩电路被配置为:(i)压缩第一高速缓存行以获得第一经压缩高速缓存行,以及(ii)压缩第二高速缓存行以获得第二经压缩高速缓存行。该经压缩存储器系统还包括高速缓存行封装电路,该高速缓存行封装电路被配置为:根据确定第一高速缓存行是可压缩的:(i)将第一经压缩高速缓存行写入候选经压缩高速缓存行的第一预先确定的部分,以及(ii)将第二高速缓存行或第二经压缩高速缓存行的第二部分写入候选经压缩高速缓存行的第二预先确定的部分。第一预先确定的部分大于第二预先确定的部分。On the other hand, a compressed memory system of a processor-based system is provided. The compressed memory system includes a memory partitioning circuit configured to partition a storage area into a plurality of data areas. Each data area is associated with a corresponding priority level. The compressed memory system also includes a cache line selection circuit configured to: (i) select a first cache line from a first data area of the plurality of data areas, and (ii) select a second cache line from a second data area of the plurality of data areas. The first data area has a higher priority level than the second data area. The compressed memory system also includes a compression circuit configured to: (i) compress the first cache line to obtain a first compressed cache line, and (ii) compress the second cache line to obtain a second compressed cache line. The compressed memory system also includes a cache line packing circuit configured to: based on determining that the first cache line is compressible: (i) write the first compressed cache line to a first predetermined portion of a candidate compressed cache line, and (ii) write the second cache line or the second portion of the second compressed cache line to a second predetermined portion of the candidate compressed cache line. The first predetermined portion is greater than the second predetermined portion.
在一些具体实施中,高速缓存行封装电路还被配置为:根据确定(i)第一高速缓存行是不可压缩的、(ii)第二高速缓存行是不可压缩的,或者(iii)第二经压缩高速缓存行未拟合在候选经压缩高速缓存行的第二预先确定的部分内,在候选经压缩高速缓存行中设定溢出指针。根据第二高速缓存行的压缩率或第二经压缩高速缓存行的大小,溢出指针指向多个溢出块中的一个溢出块,并且多个溢出块中的每个溢出块具有不同的大小。In some implementations, the cache line packing circuit is further configured to: in response to determining that (i) the first cache line is incompressible, (ii) the second cache line is incompressible, or (iii) the second compressed cache line does not fit within a second predetermined portion of the candidate compressed cache line, set an overflow pointer in the candidate compressed cache line. In response to a compression ratio of the second cache line or a size of the second compressed cache line, the overflow pointer points to one of a plurality of overflow blocks, and each of the plurality of overflow blocks has a different size.
在一些具体实施中,高速缓存行封装电路还被配置为:接收对第一高速缓存行或第二高速缓存行的读取请求;以及响应于接收到该读取请求:根据确定溢出指针被设定,根据溢出指针从多个溢出块中的溢出块检索数据。In some specific implementations, the cache line packing circuit is also configured to: receive a read request for the first cache line or the second cache line; and in response to receiving the read request: retrieve data from an overflow block in the plurality of overflow blocks according to the overflow pointer based on determining that the overflow pointer is set.
在一些具体实施中,高速缓存行封装电路还被配置为:根据确定第一高速缓存行是可压缩的:在候选经压缩高速缓存行中设定第一压缩率控制位;以及根据确定第一高速缓存行是不可压缩的:将第一高速缓存行的第一部分写入候选经压缩高速缓存行;将第一高速缓存行的剩余部分写入溢出块;重置候选经压缩高速缓存行中的第一压缩率控制位;以及在候选经压缩高速缓存行中设定溢出指针以指向溢出块。In some specific implementations, the cache line packing circuit is also configured to: based on determining that the first cache line is compressible: set a first compression rate control bit in the candidate compressed cache line; and based on determining that the first cache line is incompressible: write a first portion of the first cache line to the candidate compressed cache line; write the remaining portion of the first cache line to an overflow block; reset the first compression rate control bit in the candidate compressed cache line; and set an overflow pointer in the candidate compressed cache line to point to the overflow block.
在一些具体实施中,高速缓存行封装电路还被配置为:响应于接收到对第一高速缓存行的读取请求:根据确定第一压缩率控制位被设定:从候选经压缩高速缓存行检索第一高速缓存行;以及根据确定第一压缩率控制位被重置:从候选经压缩高速缓存行检索第一高速缓存行的第一部分;以及基于溢出指针来从溢出块检索第一高速缓存行的第二部分。In some specific implementations, the cache line packing circuit is also configured to: in response to receiving a read request for the first cache line: based on determining that the first compression rate control bit is set: retrieve the first cache line from a candidate compressed cache line; and based on determining that the first compression rate control bit is reset: retrieve a first portion of the first cache line from the candidate compressed cache line; and retrieve a second portion of the first cache line from the overflow block based on the overflow pointer.
在一些具体实施中,高速缓存行封装电路还被配置为:根据确定第一高速缓存行是不可压缩的:根据第二高速缓存行是否是可压缩的,将第二高速缓存行或第二经压缩高速缓存行写入溢出块;以及重置候选经压缩高速缓存行中的第二压缩率控制位以指示第二高速缓存行是否是不可压缩的。In some specific implementations, the cache line packing circuit is further configured to: based on determining that the first cache line is incompressible: write the second cache line or the second compressed cache line to the overflow block based on whether the second cache line is compressible; and reset the second compression rate control bit in the candidate compressed cache line to indicate whether the second cache line is incompressible.
在一些具体实施中,高速缓存行封装电路还被配置为:响应于接收到对第二高速缓存行的读取请求:基于第二压缩率控制位来从溢出块检索第二高速缓存行或第二经压缩高速缓存行。In some implementations, the cache line packing circuit is further configured to: in response to receiving a read request for the second cache line: retrieve the second cache line or the second compressed cache line from the overflow block based on the second compression rate control bit.
In some implementations, the cache line packing circuit is further configured to, in response to receiving a cache line write request for the first cache line: compress the first cache line to obtain a first updated cache line having a first size; in accordance with a determination that the first size is equal to or less than a first predetermined size of the candidate compressed cache line, write the first updated cache line to the candidate compressed cache line; in accordance with a determination that the first size is greater than the first predetermined size and equal to or less than a second predetermined size of the candidate compressed cache line, perform a read-modify-write operation on the candidate compressed cache line based on the first updated cache line; and in accordance with a determination that the first size is greater than the second predetermined size of the candidate compressed cache line, perform a read-modify-write operation on the candidate compressed cache line based on the first updated cache line and perform a read-modify-write operation on an overflow block of the plurality of overflow blocks.
In some implementations, the first predetermined size is half the size of the candidate compressed cache line.
In some implementations, the cache line packing circuit is further configured to: when writing the second cache line, or the second portion of the second compressed cache line, to the second predetermined portion of the candidate compressed cache line, write an end bit index in the candidate compressed cache line indicating where within the candidate compressed cache line the second cache line or the second portion of the second compressed cache line is written; and compute the second predetermined size based on the end bit index in the candidate compressed cache line.
In some implementations, the cache line packing circuit is further configured to, in response to receiving a cache line write request for the second cache line: compress the second cache line to obtain a second updated cache line having a second size; in accordance with a determination that the sum of the second size and the size of the first compressed cache line is less than the first predetermined size of the candidate compressed cache line, perform a read-modify-write operation to write the second updated cache line to the candidate compressed cache line; and in accordance with a determination that the sum of the second size and the size of the first compressed cache line is not less than the first predetermined size of the candidate compressed cache line, (i) perform a first read-modify-write operation to write a first portion of the second updated cache line to the candidate compressed cache line, and (ii) perform a second read-modify-write operation to write the remaining portion of the second updated cache line to the overflow block pointed to by the overflow pointer.
In some implementations, the cache line packing circuit is further configured to, in response to receiving a cache line write request for the first cache line or the second cache line: compress the first cache line or the second cache line to obtain an updated compressed cache line having an updated size; and in accordance with a determination that the updated size cannot fit within the overflow block pointed to by the overflow pointer, release the overflow pointer and update the overflow pointer to point to a new overflow block of the plurality of overflow blocks.
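The three-way size check in the write path above can be sketched as follows. This is a minimal illustration, not the claimed circuit; the function name, threshold parameters, and string results are hypothetical — the claims fix only the order of the comparisons.

```python
def choose_write_action(compressed_size, first_size, second_size):
    """Pick the cheapest update for a compressed cache line.

    first_size  - bytes reserved exclusively for this line (half the
                  compressed line in some implementations)
    second_size - first_size plus the slack left free by the other line
                  (computed from the end bit index)
    """
    if compressed_size <= first_size:
        # Fits in the reserved region: a plain write suffices.
        return "write"
    if compressed_size <= second_size:
        # Spills into the shared region: read-modify-write the line.
        return "rmw_line"
    # Exceeds the line: read-modify-write the line and an overflow block.
    return "rmw_line_and_overflow"
```

With a 64-byte line split as in the examples below (32 bytes reserved, 48 bytes available after accounting for the other line), a 30-byte result is a plain write while a 60-byte result also touches an overflow block.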
In some implementations, the first compressed cache line, the second compressed cache line, and the candidate compressed cache line are equal in size.
In some implementations, the first data region and the second data region are equal in size.
In some implementations, the second predetermined portion is less than half the size of the candidate compressed cache line.
In some implementations, the cache line packing circuit is further configured to write the first compressed cache line and the second compressed cache line to the candidate compressed cache line in opposite directions, the first compressed cache line and the second compressed cache line being separated by one or more bytes.
In another aspect, a compressed memory system for a processor-based system is provided. The compressed memory system includes a storage region that includes a plurality of cache lines, each cache line having one of a plurality of priority levels. The compressed memory system also includes a compressed storage region that includes a plurality of compressed cache lines. Each compressed cache line includes a first group of data bits configured to hold, in a first direction, a portion of a first cache line or of a compressed first cache line, the first cache line having a first priority level. Each compressed cache line also includes a second group of data bits configured to hold, in a second direction opposite the first direction, a portion of a second cache line or of a compressed second cache line, the second cache line having a second priority level lower than the first priority level. The first group of data bits includes a greater number of bits than the second group of data bits.
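The two-directional layout above can be illustrated with a short sketch: the high-priority payload grows from one end of the line and the low-priority payload from the other, with any slack left as a gap between them. The helper below is an assumption for illustration, not the claimed circuit.

```python
def pack_opposite(line_size, hp_payload, lp_payload):
    """Place hp_payload at the low end of the line (growing rightward) and
    lp_payload at the high end (growing leftward); unused middle bytes
    stay zero and act as the separating gap."""
    if len(hp_payload) + len(lp_payload) > line_size:
        raise ValueError("payloads do not fit in one line")
    line = bytearray(line_size)
    line[:len(hp_payload)] = hp_payload
    if lp_payload:
        line[line_size - len(lp_payload):] = lp_payload
    return bytes(line)
```

Packing 20 high-priority bytes and 10 low-priority bytes into a 64-byte line leaves a 34-byte gap in the middle, which is what lets either side grow without immediately disturbing the other.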
In some implementations, the compressed memory system further includes an overflow storage region that includes a plurality of overflow bins, each overflow bin configured to hold a different number of bytes. Each compressed cache line further includes a group of overflow pointer bits configured to hold a pointer to one of the plurality of overflow bins.
In some implementations, each compressed cache line further includes: a first control bit indicating a compression ratio of the first cache line; and a second control bit indicating a compression ratio of the second cache line.
In some implementations, the compressed memory system further includes an overflow storage region that includes a plurality of overflow bins, where each overflow bin is configured to hold a different number of bytes. Each compressed cache line further includes a group of bits configured to hold the size of one of the plurality of overflow bins.
In some implementations, each compressed cache line further includes an end bit index indicating the end of the second group of data bits.
In some implementations, each overflow bin is configured to hold, in the first direction, bits of the first cache line and/or bits of the second cache line or of a compressed second cache line.
In some implementations, the first group of data bits and the second group of data bits are separated by one or more bytes.
In some implementations, each cache line and each compressed cache line have the same size.
In some implementations, when the compressed first cache line or the compressed second cache line does not fit in the compressed cache line, each compressed cache line further includes a group of overflow pointer bits configured to hold a pointer to one of the plurality of overflow bins. The overflow bin is configured to hold bits of the first cache line, of the compressed first cache line, of the second cache line, and/or of the compressed second cache line.
In some implementations, the second group of data bits is further configured to hold a plurality of control bits indicating an overflow in the compressed cache line and the end of the second group of data bits.
In another aspect, a non-transitory computer-readable medium is provided. The non-transitory computer-readable medium has stored thereon computer-executable instructions that, when executed by a processor, cause the processor to perform any of the methods described herein.
Various implementations of systems, methods, and devices within the scope of the appended claims each have several aspects, no single one of which is solely responsible for the attributes described herein. Without limiting the scope of the appended claims, after considering this disclosure, and particularly after considering the section entitled "Detailed Description," it will be understood how the aspects of various implementations can be used to achieve higher throughput in the storage of a memory device.
BRIEF DESCRIPTION OF THE DRAWINGS
So that the present disclosure can be understood in greater detail, a more particular description may be had by reference to the features of various implementations, some of which are illustrated in the appended drawings. The appended drawings, however, illustrate only the more pertinent features of the present disclosure and are therefore not to be considered limiting, for the description may admit to other effective features.
FIG. 1 is a schematic diagram of an example system having a cache line compression/decompression hardware engine, in accordance with some implementations.
FIG. 2A is a schematic diagram of a compressed memory system 200, in accordance with some implementations.
FIG. 2B is a sequence diagram illustrating a method for reading a cache line using the compressed memory system shown in FIG. 2A, in accordance with some implementations.
FIG. 2C is a sequence diagram illustrating a method for writing a cache line using the compressed memory system shown in FIG. 2A, in accordance with some implementations.
FIG. 3 is a schematic diagram of a packing scheme for a compressed memory system, in accordance with some implementations.
FIGS. 4-9 illustrate example compressed line layouts, in accordance with some implementations.
FIG. 10 illustrates a flowchart of an example method for reading a high-priority cache line, in accordance with some implementations.
FIG. 11 illustrates a flowchart of an example method for reading a low-priority cache line, in accordance with some implementations.
FIG. 12 illustrates a flowchart of an example method for writing a high-priority cache line, in accordance with some implementations.
FIG. 13 illustrates a flowchart of an example method for writing a low-priority cache line, in accordance with some implementations.
FIG. 14 is a block diagram of an example compressed memory system of a processor-based system, in accordance with some implementations.
In accordance with common practice, the various features illustrated in the drawings may not be drawn to scale. Accordingly, the dimensions of the various features may be arbitrarily expanded or reduced for clarity. In addition, some of the drawings may not depict all of the components of a given system, method, or device. Finally, like reference numerals may be used to denote like features throughout the specification and drawings.
DETAILED DESCRIPTION
Numerous details are described herein in order to provide a thorough understanding of the example implementations illustrated in the accompanying drawings. However, some implementations may be practiced without many of the specific details, and the scope of the claims is limited only by those features and aspects specifically recited in the claims. Furthermore, well-known methods, components, and circuits have not been described in exhaustive detail so as not to unnecessarily obscure the more pertinent aspects of the implementations described herein.
FIG. 1 is a schematic diagram of an example system 100 having a cache line compression/decompression hardware engine 110, in accordance with some implementations. The system 100 includes a chip 102, which in turn includes a processor 104, a level 2 (L2) cache or L2 tightly coupled memory (TCM) 106, and the compression/decompression hardware engine 110. The processor 104 includes a memory management unit (MMU), a data cache, and an instruction cache. The data cache and the instruction cache constitute a level 1 (L1) cache. The L2 cache or L2 TCM 106 includes L2 control and an L2 cache tag or state module 108. The system also includes a memory controller 114 for accessing main external memory (e.g., double data rate dynamic random access memory (DRAM), sometimes referred to as DDR). Although not shown in FIG. 1, some implementations may include conventional peripherals, other storage devices, a Peripheral Component Interconnect Express (PCIe) interface, a direct memory access (DMA) controller, and/or an integrated memory controller (IMC).
Some implementations may include two or more processors or processor blocks, and/or a shared level 3 (L3) cache for storing cache data used by any of the processor blocks or shared among the processor blocks. Some implementations include an internal system bus that allows each of the processor blocks to access the shared L3 cache and/or other shared resources, including the memory controller 114. The processor-based system 100 is configured to store cache data in uncompressed form in cache entries in a cache memory. The cache entries may be cache lines. For example, the cache memory may be an L2 cache memory (e.g., the L2 cache 106). The cache memory may be private to a processor core in the processor 104 or shared among multiple processor cores. The processor-based system 100 includes a system memory 112 that includes a compressed data region configured to store data in compressed form in memory entries (which may be memory lines). For example, the system memory 112 may include double data rate (DDR) static random access memory (SRAM). The processor 104 is configured to access the system memory 112 during read and write operations to execute software instructions and perform other processor operations.
FIG. 2A is a schematic diagram of a compressed memory system 200, in accordance with some implementations. The compressed memory system is configured to compress cache data from cache entries evicted from the cache memory, and to read metadata used to access physical addresses in the compressed system memory in order to write the compressed cache data. Providing the ability to store compressed data in a compressed data region (e.g., blocks 202, 204, 206) increases the storage capacity of the processor-based system 100 beyond the physical memory size of the system memory 112. In some implementations, the processor 104 uses virtual addressing, and virtual-to-physical address translation is performed to effectively address the compressed data region without awareness of the compression scheme or the compressed sizes within the compressed data region. In this regard, a compression or decompression engine 212 is provided in the compressed memory system 200 to compress uncompressed data from the processor 104 to be written into the compressed data region, and to decompress compressed data received from the compressed data region so as to provide such data to the processor 104 in uncompressed form.
The compression or decompression engine 212 may include a compression circuit configured to compress data from the processor 104 to be written into the compressed data region. For example, as shown in FIG. 2A, the compression circuit may be configured to compress a 64-byte (64B) data word into a 48-byte (48B), 32-byte (32B), or 16-byte (16B) compressed data word, which may be stored in a corresponding storage block 202 (64B), 204 (48B), 206 (32B), or 204 (16B), each having a size smaller than each full memory entry of the system memory 112. If uncompressed data from the processor 104 cannot be compressed down to the next smaller storage block size configured for the compressed memory system 200, such uncompressed data is stored uncompressed over the entire width of one of the memory entries. For example, one of the memory entries may be 64B wide and may therefore store a 64B storage block, such as storage block 202 (64B). The compression or decompression engine 212 also includes a decompression circuit configured to decompress compressed data from the compressed data region to be provided to the processor 104.
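The choice among the 16B/32B/48B/64B storage blocks amounts to rounding the compressed size up to the next bucket; a minimal sketch (helper name assumed):

```python
BLOCK_SIZES = (16, 32, 48, 64)  # storage block pools described above

def pick_block_size(compressed_len):
    """Return the smallest storage block that holds the compressed data.
    Data that does not compress below the full 64B line width is stored
    uncompressed in a 64B block."""
    for size in BLOCK_SIZES:
        if compressed_len <= size:
            return size
    return BLOCK_SIZES[-1]  # stored uncompressed at full line width
```

So a line that compresses to 17 bytes still consumes a 32B block, which is why the free lists described later are maintained per block size.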
However, to provide faster memory access without compression and decompression, a cache memory 214 (e.g., the L2 cache 106) is provided. Cache entries in the cache memory 214 are configured to store cache data in uncompressed form. Each of the cache entries may have the same width as each of the memory entries for performing efficient memory read and write operations. The cache entries are accessed by respective virtual address ("VA") tags (e.g., tags stored in the L2 cache tags 108) because, as discussed above, the compressed memory system 200 provides the processor 104 with more addressable memory space than the physical address space provided in the compressed data region. When the processor 104 issues a memory read request for a memory read operation, the virtual address of the memory read request is used to search the cache memory 214 to determine whether the virtual address matches one of the virtual address tags of the cache entries. If so, a cache hit occurs, and the cache data in the hit cache entry is returned to the processor 104 without the need to decompress it. However, because the number of cache entries is smaller than the number of memory entries, a cache miss can occur when the cache data for the memory read request is not contained in the cache memory 214.
Thus, with continuing reference to FIG. 2A, in response to a cache miss, the cache memory 214 is configured to provide the virtual address of the memory read request to the compression circuit to retrieve data from the compressed data region. In this regard, the compression circuit may first consult a metadata cache containing metadata cache entries, each containing metadata indexed by a virtual address. The metadata cache can be accessed more quickly than the compressed data region. The metadata is data, such as a pointer, used to access the physical address (PA) in the compressed data region at which the memory entry containing the compressed data for the virtual address is located. If the metadata cache contains the metadata for the memory read request, the compression circuit uses the metadata to access the correct memory entry in the compressed data region and provide the corresponding compressed data to the decompression circuit. If the metadata cache does not contain the metadata for the memory read request, the compression circuit provides the virtual address of the memory read request to the metadata circuit 210, which contains metadata in corresponding metadata entries for the entire virtual address space of the processor-based system 100. Thus, the metadata circuit 210 can be linearly addressed by the virtual address of the memory read request. The metadata is then used for the memory read request to access the correct memory entry in the compressed data region and provide the corresponding compressed data to the decompression circuit.
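The two-level lookup described above — metadata cache first, then the linearly addressed metadata circuit on a miss — can be sketched as follows (the class and attribute names are assumptions for illustration):

```python
class MetadataLookup:
    """Metadata cache in front of the full, linearly addressed table."""

    def __init__(self, table):
        self.table = table   # full metadata table, indexed by virtual address
        self.cache = {}      # metadata cache entries
        self.misses = 0

    def get(self, vaddr):
        if vaddr in self.cache:
            return self.cache[vaddr]     # fast path: metadata cache hit
        self.misses += 1                 # miss: read the metadata circuit
        meta = self.table[vaddr]
        self.cache[vaddr] = meta         # fill the cache for next time
        return meta
```

The second access to the same virtual address is served from the cache, which is the whole point of placing the metadata cache in front of the slower in-memory table.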
Continuing with reference to FIG. 2A, the decompression circuit receives the compressed data in response to the memory read request. The decompression circuit decompresses the compressed data into uncompressed data, which can then be provided to the processor 104. The uncompressed data is also stored in the cache memory 214. However, if the cache memory 214 does not have an available cache entry, the cache memory 214 may evict one of the existing cache entries to the compressed data region to make room for storing the uncompressed data.
To do so, the cache memory 214 first sends the virtual address and the uncompressed cache data of the evicted cache entry to the compression circuit. The compression circuit receives the virtual address and the uncompressed cache data of the evicted cache entry and initiates a metadata read operation to the metadata cache to obtain the metadata associated with the virtual address. During, before, or after the metadata read operation, the compression circuit compresses the uncompressed cache data into compressed data to be stored in the compressed data region. If the metadata read operation to the metadata cache results in a cache miss, the metadata cache issues a metadata read operation to the metadata circuit 210 in the system memory 112 to obtain the metadata associated with the virtual address, and the metadata cache is stalled in the meantime. Because an access to the compressed data region can take much longer than the rate at which the processor 104 can issue memory access operations, uncompressed data received from the processor 104 for subsequent memory write requests can be buffered in a memory request buffer.
After the metadata is returned from the compressed data region to update the metadata cache, the metadata cache provides the metadata to the compression circuit. The compression circuit determines whether the new compressed size of the data fits into the same storage block size in the compressed data region as was previously used to store the data for the virtual address of the evicted cache entry. For example, the processor 104 may have updated the cache data in the evicted cache entry since it was last stored in the compressed data region. If a new storage block is needed to store the compressed data of the evicted cache entry, the compression circuit recycles the pointer to the current storage block in the compressed memory system 200 associated with the virtual address of the evicted cache entry into one of the free memory lists of pointers to available storage blocks in the compressed data region (e.g., the list 216 of free 64B blocks, the list 218 of free 48B blocks, the list 220 of free 32B blocks, and the list 222 of free 16B blocks). The compression circuit then obtains, from one of the free memory lists, a pointer to a newly available storage block of the desired storage block size in the compressed data region to store the compressed data of the evicted cache entry. The compression circuit then stores the compressed data of the evicted cache entry in that storage block in the compressed data region, associated with the virtual address of the evicted cache entry determined from the metadata.
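The free-list bookkeeping above — recycle the old block's pointer and claim one of the desired size — reduces to a pop/append pair per size class. A sketch under assumed names:

```python
def reallocate_block(free_lists, old_size, new_size, old_ptr):
    """free_lists maps a block size (16/32/48/64) to a list of free
    pointers. If the compressed data still fits its old block size, the
    old pointer is reused; otherwise the old pointer is recycled and a
    pointer of the new size is claimed."""
    if new_size == old_size:
        return old_ptr
    free_lists[old_size].append(old_ptr)   # recycle the old block
    return free_lists[new_size].pop()      # claim a free block of new size
```

The metadata update that follows (pointing the virtual address at the returned pointer) is omitted here; the sketch shows only the pointer exchange between size classes.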
If a new storage block is allocated to the virtual address of the evicted cache entry, the metadata in the metadata cache entry whose virtual address tag corresponds to the evicted cache entry is updated based on the pointer to the new storage block. The metadata cache then updates, based on the pointer to the new storage block, the metadata in the metadata entry of the metadata circuit corresponding to that virtual address.
Because the metadata of the metadata circuit 210 is stored in the system memory 112, the metadata circuit 210 can consume an excessive amount of the system memory 112, negatively affecting system performance. It is therefore desirable to minimize the amount of system memory 112 required to store metadata while still providing effective data compression. In this regard, some implementations of the compressed memory system 200 reduce the metadata size using the techniques described herein.
FIG. 2A illustrates uncompressed data indexed linearly by metadata, in accordance with some implementations. Each uncompressed cache line is represented by 3 bytes of metadata, and the metadata points to a compressed block (e.g., a block 202, 204, or 206) of size 16, 32, 48, or 64 bytes. In some implementations, if an entire uncompressed line is zero, the metadata circuit 210 marks it as such without pointing to any compressed block. In some implementations, the compressed memory system 200 includes a register file 208 that holds base pointers for the blocks 202, 204, and 206, and the metadata circuit 210 provides an offset from the base pointer for a block.
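With per-size base pointers in the register file 208 and an offset supplied by the metadata, a block's physical address is a simple offset from the base. The sketch below scales the offset by the block size, i.e., treats it as a block index; that scaling is an assumption about how the offset is encoded, not something the text pins down.

```python
def block_address(base_pointers, block_size, offset):
    """base_pointers maps each block size to the base address held in the
    register file; the metadata supplies the offset for the block."""
    return base_pointers[block_size] + offset * block_size
```

This is why 3 bytes of metadata suffice per line: an offset within one size class, plus a couple of bits selecting the class, covers a large compressed region.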
FIG. 2B is a sequence diagram illustrating a method for reading a cache line using the compressed memory system shown in FIG. 2A, in accordance with some implementations. After an L2 cache miss, the read address for external memory is sent to the compression or decompression engine 212, which retrieves the cached metadata if it is available. If the metadata is not available in the metadata cache, it is read from external memory (e.g., DDR). Based on the metadata, the compressed data is fetched from a block of the compressed data region in external memory. The compressed data is then decompressed by the decompression circuit, and the uncompressed data is returned to the processor. With this approach, an all-zero cache line (e.g., 64 bytes of zeros) requires one read, and any other cache line requires two reads: one for the metadata and one for the compressed block.
FIG. 2C is a sequence diagram illustrating a method for writing a cache line using the compressed memory system shown in FIG. 2A, in accordance with some implementations. After an L2 cache miss, the uncompressed data is sent to the compression or decompression engine 212, and the compression circuit compresses the data to a new size. As in FIG. 2B, the metadata is retrieved from the metadata cache if available there, and from external memory otherwise. Depending on the size of the compressed data, a new index for the new size may be retrieved from a free list (e.g., one of the lists 216, 218, 220, 222), and the compressed data is written to the compressed data region. In some cases, the old index can be recycled, and the metadata is updated in external memory. With this approach, a cache line write requires, depending on the size of the compressed block: for an all-zero cache line, one read-modify-write for the metadata; for any other cache line, one read-modify-write for the metadata plus either one write or one read-modify-write for the compressed data.
FIG. 3 is a schematic diagram of a packing scheme 300 for a compressed memory system that helps avoid the extra metadata reads and writes described above, in accordance with some implementations. FIG. 3 shows a storage region (e.g., a compression candidate) partitioned into a high-priority portion 302 and a low-priority portion 304 of equal size. In practice, the storage region may be partitioned into any number of partitions with different priority levels, and/or the partitions may hold unequal amounts of data. One high-priority cache line and one low-priority cache line are selected and compressed to fit in a compressed cache line 306. When the compressed high- and low-priority cache lines do not fit in a single compressed cache line, an overflow pointer is kept in the compressed line, pointing to an overflow block (e.g., overflow bin 308, 310, or 312, each bin having a different size). To avoid read-modify-writes, some implementations use multiple overflow block sizes (e.g., overflow bin 308 holds 32 bytes of data, overflow bin 310 holds 64 bytes, and overflow bin 312 holds 96 bytes). To improve the write performance of high-priority lines, a compressed low-priority line occupies only the lower half of the compressed line.
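The choice of overflow bin follows from how many bytes spill past the compressed line's capacity. A sketch under the 32/64/96-byte bin sizes named above (the function name is assumed, and the capacity argument abstracts away the control bits):

```python
OVERFLOW_BINS = (32, 64, 96)  # overflow bin sizes from the packing scheme

def plan_overflow(hp_len, lp_len, line_capacity):
    """Return the overflow bin size needed for the bytes that do not fit
    in the compressed line, or None when both lines fit."""
    spill = hp_len + lp_len - line_capacity
    if spill <= 0:
        return None
    for bin_size in OVERFLOW_BINS:
        if spill <= bin_size:
            return bin_size
    raise ValueError("spill exceeds the largest overflow bin")
```

Sizing the bins in fixed steps is what lets a line grow into an overflow bin with a plain write rather than a read-modify-write of neighboring data.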
FIGS. 4 to 9 illustrate example compressed line layouts according to some implementations. These examples are provided for illustration and should not be read to mean that the layout or organization is limited to them. Referring to FIG. 4, a compressed line (CL) 400 may include: (i) HP 402, indicating the compressed data for the high-priority cache line; (ii) LP 404, indicating the compressed data for the low-priority cache line; (iii) HPC 406, a flag indicating whether the high-priority cache line is compressed (e.g., 1 indicates compressed, 0 indicates uncompressed); (iv) LPC 408, a flag indicating whether the low-priority cache line is compressed (e.g., 1 indicates compressed, 0 indicates uncompressed); (v) OFB 410, indicating the overflow bin size (e.g., 0 indicates 32 bytes, 1 indicates 64 bytes, 2 indicates 96 bytes, and 3 indicates no overflow); (vi) OFP 412, indicating the overflow pointer index; and/or (vii) LPE 414, a low-priority end bit index indicating where the low-priority compressed data ends. This information can be used to extend the high-priority cache line on the write path. The LPE value can be between 11 and 255; a value of 0 may mean there is no remaining space for HP to expand. Implementations may use any number of these fields. In some implementations, high-priority compression or decompression proceeds from the HPC bit toward the OFP, as indicated by arrow 416. In some implementations, low-priority compression or decompression proceeds from the OFP toward the location the LPE points to (as indicated by arrow 418) and, if there is an overflow, continues from right to left in the overflow area (as indicated by arrow 420). According to some implementations, region 420 of the CL is unused by the CL, and/or HP 402 and LP 404 are separated by a gap 422. In the example shown in FIG. 4, 32 bytes are dedicated to HP, 32 bytes are usable by either HP or LP, and there are 31 control bits, including OFP 412, LPE 414, OFB 410, and LPC 408. According to some implementations, the compressed line can have slightly different layouts depending on the HP and LP compressed sizes, as further described below with reference to FIGS. 5 to 10. For example, the number of control bits may be reduced (e.g., because there is no overflow, no OFP is needed and those bits are consumed by LP).
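The control fields of CL 400 can be modeled as a packed bit field. The sketch below is illustrative: the patent fixes the field meanings and the 31-bit total, but the individual widths chosen here (1 + 2 + 8 + 20 bits, which sum to 31) are assumptions of this sketch, as are the function names.

```python
# Assumed control-field widths for the FIG. 4 layout; only the total
# (31 bits) and the field meanings come from the text.
FIELDS = (          # (name, bit width), low bits first
    ("LPC", 1),     # low-priority compressed flag
    ("OFB", 2),     # overflow bin size: 0=32B, 1=64B, 2=96B, 3=no overflow
    ("LPE", 8),     # low-priority end bit index (11..255)
    ("OFP", 20),    # overflow pointer index (width is an assumption)
)

def pack_ctrl(values: dict) -> int:
    """Pack the control fields into one integer."""
    word, shift = 0, 0
    for name, width in FIELDS:
        v = values[name]
        assert 0 <= v < (1 << width), f"{name} out of range"
        word |= v << shift
        shift += width
    return word

def unpack_ctrl(word: int) -> dict:
    """Inverse of pack_ctrl."""
    out, shift = {}, 0
    for name, width in FIELDS:
        out[name] = (word >> shift) & ((1 << width) - 1)
        shift += width
    return out
```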
FIG. 5 shows an example compressed line layout 500 for when HP is less than 256 bits and the low-priority line fits in the CL, according to some implementations. According to some implementations, HPC is set to 1, OFB is set to 0b11 (no overflow), and LPC is set to 1. Only 11 control bits are present (as opposed to 31 in FIG. 4).
FIG. 6 shows an example compressed line layout 600 for when both HP and LP fit in one CL and HP is greater than 255 bits, according to some implementations. According to some implementations, HPC is set to 1, OFB is set to 0b11, and LPC is set to 1. In this example, HP has expanded into the lower half of the CL.
FIG. 7 shows an example compressed line layout 700 for when HP is less than 256 bits and LP has an overflow, according to some implementations. According to some implementations, HPC is set to 1, OFB is any value between 0x0 and 0x2, LPC is set to 1 or 0, and OFP points to overflow block 424.
FIG. 8 shows an example compressed line layout 800 for when HP is greater than 255 bits but less than 480 bits and LP has an overflow, according to some implementations. According to some implementations, HPC is set to 1, OFB is set to a value between 0 and 2, LPC is 1 or 0, and OFP points to overflow block 426.
FIG. 9 shows an example compressed line layout 900 for when HP is equal to 512 bits (incompressible) and LP has an overflow, according to some implementations. According to some implementations, HPC is set to 0, OFB is set to a value between 0 and 2, LPC is set to 0 or 1, and OFP points to overflow block 428. Overflow block 428 includes the remaining 32 bits of HP 430, a gap 432, and LP 404.
Example read/write overhead and statistics
For cache line reads, a high-priority cache line and a low-priority cache line that fit in one compressed cache line each require one read; other cache lines require two reads, one for the compressed line and one for the overflow block. For cache line writes, a compressed high-priority cache line that fits in 255 bits requires one write; a compressed high-priority cache line that fits in 256 to 480 bits requires one read-modify-write; and incompressible high-priority cache lines (>480 bits) require one read-modify-write for the compressed line plus one read-modify-write for the overflow block. A low-priority cache line that fits in the compressed line requires one read-modify-write; a low-priority cache line that does not fit in the compressed line requires one read-modify-write for the compressed line plus one read-modify-write for the overflow block.
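The high-priority cost rules above can be restated as a small cost model. The 255/480-bit thresholds come directly from the text; the function names are illustrative, and "RMW" abbreviates read-modify-write.

```python
def hp_write_cost(compressed_bits: int) -> list:
    """Memory operations needed to write a high-priority line."""
    if compressed_bits <= 255:
        return ["write CL"]                # fast path: one plain write
    if compressed_bits <= 480:
        return ["RMW CL"]                  # fits in CL, but needs a merge
    return ["RMW CL", "RMW overflow"]      # incompressible (>480 bits)

def hp_read_cost(fits_in_cl: bool) -> int:
    """Number of reads: CL alone, or CL plus overflow block."""
    return 1 if fits_in_cl else 2
```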
The table below shows example cache line read/write costs and statistics for modem data (50,017,075 bytes of compression candidate data).
According to the statistics, 84.7% of high-priority cache lines need only one read/one write per cache line read/write. An additional 11% of high-priority cache lines need one read/one read-modify-write. Only 4.3% of high-priority cache lines need two reads/two read-modify-writes. For low-priority cache lines, 78.6% need one read/one read-modify-write, and 21.4% need two reads/two read-modify-writes.
In FIGS. 10 to 13 described below, memory reads and writes are indicated by patterned rectangles, and all other operations (except the initial block that receives the read or write request) are shown without a pattern. In addition, operations predicted to have high probability are shown with solid lines (i.e., solid arrows indicate high-probability paths), and other paths are shown as dashed or dotted arrows. The path probabilities are based on the data distribution of a particular data set (e.g., modem data).
FIG. 10 shows a flowchart of an example method 1000 for reading a high-priority cache line, according to some implementations. The method 1000 includes receiving (1002) an HP read request and reading (1004) the CL. The method also includes determining (1006) whether HPC equals 1. If HPC is 1, the HP is decompressed. If HPC is not 1, the overflow block pointed to by the OFP is read (1010), and 480 bits are copied (1012) from the compressed line and 32 bits from the OFP block.
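A minimal Python sketch of this read path, with the compressed line modeled as a dict and `decompress`/`read_overflow` supplied as placeholder callables. None of these names come from the patent; bits are modeled as strings for simplicity.

```python
def read_hp(cl, read_overflow, decompress):
    """Sketch of method 1000 (FIG. 10): read a high-priority line."""
    if cl["HPC"] == 1:                      # (1006) line is compressed
        return decompress(cl["HP"])         # decompress in place
    ofp_block = read_overflow(cl["OFP"])    # (1010) follow overflow ptr
    # (1012) uncompressed 512-bit line: 480 bits from CL + 32 from OFP
    return cl["HP"][:480] + ofp_block[:32]
```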
FIG. 11 shows a flowchart of an example method 1100 for reading a low-priority cache line, according to some implementations. The method 1100 includes receiving (1102) an LP read request and reading (1104) the CL. The method also includes determining (1106) whether OFB is set to 3 (binary 0b11). If OFB is set to 3, the LP is decompressed (1108) from bit 11 up to the LPE bit. If OFB is not set to 3, the OFP is read (1110), and the bits from bit 32 of the CL up to the LPE are decompressed together with the bits added from the overflow block pointed to by the OFP (1112).
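The low-priority read path can be sketched the same way. The bit indices follow the text (LP data starts at bit 11 when there is no overflow, at bit 32 otherwise); the helpers and dict layout are placeholders of this sketch.

```python
def read_lp(cl, read_overflow, decompress):
    """Sketch of method 1100 (FIG. 11): read a low-priority line."""
    if cl["OFB"] == 3:                       # (1106) 0b11: no overflow
        return decompress(cl["bits"][11:cl["LPE"]])        # (1108)
    tail = read_overflow(cl["OFP"])          # (1110) fetch overflow bin
    return decompress(cl["bits"][32:cl["LPE"]] + tail)     # (1112)
```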
FIG. 12 shows a flowchart of an example method 1200 for writing a high-priority cache line, according to some implementations. The method includes receiving (1202) an HP write request and compressing (1204) the HP line to HPSZ bits. The method also includes determining (1206) whether HPSZ is less than 256 bits. If HPSZ is less than 256 bits, HPC is set to 1 and 32 bytes of HP are written (1210). The path through blocks 1202, 1204, 1206, and 1210 forms a short path and yields fast write access to the high-priority cache line. If HPSZ is not less than 256 bits, the CL is read (1208). It is then determined (1212) whether HPSZ is less than 511 - LPE. If so, the CL is updated with HP and HPC is set to 1 (1214), and the CL is written (1216). If HPSZ is not less than 511 - LPE, it is further determined (1218) whether an OFP is set (i.e., whether the CL has an OFP). If an OFP exists, the OFP is read (1220). If no OFP exists, the LP content is read and the OFP is released (1222). It is then determined (1224) whether the HP is compressible. If the HP is compressible, a new OFP block with a bit size greater than LPSZ - (480 - HPSZ) is obtained (1232), the new OFP is updated with the LP (1234), and the CL is filled with HP and HPC is set to 1 (1236). If the HP is incompressible, a new OFP block with a bit size greater than LPSZ + 32 is obtained (1226), the 32-bit HP overflow is copied and the new OFP is updated with the LP (1228), and the CL is filled with HP and HPC is set to 0 (1230). Subsequently, the OFP is written (1238) and the CL is written (1240).
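The branching structure of method 1200 can be sketched as a function that returns the sequence of memory operations rather than performing them. The 256-bit and 511 - LPE thresholds come from the text; the function name and operation labels are illustrative, and block numbers from FIG. 12 appear as comments.

```python
def hp_write_ops(hpsz: int, lpe: int) -> list:
    """Sketch of the memory operations for a high-priority write."""
    if hpsz < 256:
        return ["write CL"]                # (1210) short/fast path
    ops = ["read CL"]                      # (1208)
    if hpsz < 511 - lpe:                   # (1212) still fits beside LP
        ops += ["update CL", "write CL"]   # (1214, 1216)
    else:                                  # LP must move to an overflow bin
        ops += ["read/alloc OFP", "write OFP", "write CL"]  # (1220..1240)
    return ops
```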
FIG. 13 shows a flowchart of an example method 1300 for writing a low-priority cache line, according to some implementations. The method 1300 includes receiving (1302) an LP write request, compressing the line to LPSZ bits, and setting LPC to 1 (1304). The method also includes determining (1306) whether LPSZ is greater than 480 bits. If LPSZ is greater than 480 bits, LPSZ is set to 512 (to indicate no compression) and LPC is set to 0 (1308). If LPSZ is not greater than 480 bits, the compressed line is read (1310). It is further determined (1312) whether HPC is set to 0. If HPC is set to 0, the OFP line is read (1314). It is further determined whether the current block is larger than LPSZ plus 32. If so, the OFP is updated (1324). If the current block is not larger than LPSZ plus 32, a new OFP block with a bit size greater than LPSZ + 32 is obtained (1318), the 32 bits of HP are copied and the new OFP is updated (1320), and the old OFP is pushed onto the free list (1322). Subsequently, the OFP is written (1326), the compressed line is updated for OFP, OFB, and LPC (1328), and the CL is written (1330). If HPC is not equal to 0 (block 1312), it is further determined (1332) whether the OFP is not set to 0xFF. If it is not, the OFP block is released (1334). If the OFP is set to 0xFF, the HP is decompressed (1336) to find the HP end bit. It is further determined (1338) whether the HP exceeds 32 bytes. If so, LPSPACE is set to the HP end bit minus 11. If not, LPSPACE is set to 256 - 11. In either case, it is further determined (1344) whether LPSZ is less than LPSPACE. If so, the LP is packed (1346) into the CL, the compressed line is updated (1348) for OFB (set to 0b11), LPC, and LPE, and the CL is written (1350). If LPSZ is not less than LPSPACE, an OFP block with a bit size greater than LPSZ - LPSPACE - 20 is obtained (1352), the LP is packed (1354) into the OFP and the CL, the CL is updated (1356) for OFP, OFB, LPC, and LPE, the CL is written (1358), and the OFP is written (1360).
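The LPSPACE computation (blocks 1338-1344) can be sketched as follows. The constants follow the text (11 control bits, a 256-bit HP half); the function names are our own, and `hp_end_bit` would in practice come from scanning the HP compressed bits as described above.

```python
def lp_space(hp_end_bit: int, hp_exceeds_32_bytes: bool) -> int:
    """Room (in bits) available to the low-priority line in the CL."""
    if hp_exceeds_32_bytes:       # HP spills past its dedicated half
        return hp_end_bit - 11    # per block 1340
    return 256 - 11               # full lower half minus control bits (1342)

def lp_fits(lpsz: int, hp_end_bit: int, hp_exceeds_32_bytes: bool) -> bool:
    """Block 1344: does the compressed LP fit without an overflow bin?"""
    return lpsz < lp_space(hp_end_bit, hp_exceeds_32_bytes)
```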
FIG. 14 is a block diagram of an example compressed memory system 1400 of a processor-based system, according to some implementations. The system 1400 includes a memory partitioning circuit 1402 configured to partition a storage region into a plurality of data regions, each associated with a respective priority level. The storage region contains compression candidate data and may include data and/or instructions. For example, a software image may be linked with different sections, and each section may have a different address range. The memory partitioning circuit 1402 may be configured to distinguish the data regions based on address ranges. In FIG. 3, according to some implementations, the storage region is shown divided into a high-priority hole 302 and a low-priority hole 304. Although only two holes (sometimes referred to as data regions) and two priority levels are shown in FIG. 3, more than two holes and/or more than two priority levels may be used in some implementations.
The compressed memory system 1400 also includes a cache line selection circuit 1404 configured to: (i) select a first cache line from a first data region of the plurality of data regions, and (ii) select a second cache line from a second data region of the plurality of data regions. For example, the cache line selection circuit 1404 may use the address of a cache line to determine whether the line belongs to the first or the second data region, and select the first and second cache lines according to that determination. The first data region has a higher priority level than the second data region. For example, in FIG. 3, a cache line from the high-priority hole 302 and a cache line from the low-priority hole 304 are selected to form the compressed cache line 306.
The compressed memory system 1400 also includes a compression circuit 1406 configured to: (i) compress the first cache line to obtain a first compressed cache line, and (ii) compress the second cache line to obtain a second compressed cache line. Examples of compression circuits are described above with reference to FIGS. 1, 2A, 2B, and 2C, according to some implementations.
The compressed memory system 1400 also includes a cache line packing circuit 1408 configured to, in accordance with a determination that the first cache line is compressible: (i) write the first compressed cache line to a first predetermined portion of a candidate compressed cache line, and (ii) write the second cache line, or a second portion of the second compressed cache line, to a second predetermined portion of the candidate compressed cache line. The first predetermined portion is larger than the second predetermined portion. For example, FIG. 5 shows a high-priority cache line compressed and written to HP 402, which represents the first predetermined portion; FIG. 5 also shows a low-priority cache line compressed and written to LP 404 (the second predetermined portion). In some implementations, the first and second predetermined portions correspond to two different portions of the compressed cache line: the first predetermined portion is filled first with high-priority compressed data, and the second predetermined portion is then filled with low-priority data. High-priority data can overflow into the second predetermined portion, but low-priority data cannot overflow into the first predetermined portion. In some implementations, the second predetermined portion includes control bits and the first predetermined portion does not. In some implementations, the compressed data are written in opposite directions (e.g., toward the boundary between the two portions).
In some implementations, the cache line packing circuit 1408 is further configured to set an overflow pointer (e.g., OFP 412 in FIG. 4) in the candidate compressed cache line in accordance with a determination that (i) the first cache line is incompressible, (ii) the second cache line is incompressible, or (iii) the second compressed cache line does not fit within the second predetermined portion of the candidate compressed cache line. Depending on the compression ratio of the second cache line, or the size of the second compressed cache line, the overflow pointer points to one of a plurality of overflow blocks (e.g., one of overflow bins 308, 310, or 312), each of which has a different size. For example, overflow bins 308, 310, and 312 hold different amounts of data. In some implementations, the cache line packing circuit 1408 is further configured to: receive a read request for the first or second cache line; and, in response to receiving the read request, in accordance with a determination that the overflow pointer is set, retrieve data from the overflow block indicated by the overflow pointer. Examples of overflow blocks and overflow logic are described above with reference to FIGS. 3, 4, 7, 8, and 9, and example read methods are described above with reference to FIGS. 10 and 11, according to some implementations.
In some implementations, the cache line packing circuit 1408 is further configured to: in accordance with a determination that the first cache line is compressible, set a first compression control bit (e.g., HPC 406) in the candidate compressed cache line; and, in accordance with a determination that the first cache line is incompressible, write a first portion of the first cache line to the candidate compressed cache line, write the remaining portion of the first cache line to an overflow block, reset the first compression control bit in the candidate compressed cache line, and set an overflow pointer in the candidate compressed cache line to point to the overflow block. One example is described above with reference to FIG. 9, according to some implementations. In some implementations, the cache line packing circuit 1408 is further configured to, in response to receiving a read request for the first cache line: in accordance with a determination that the first compression control bit is set, retrieve the first cache line from the candidate compressed cache line; and, in accordance with a determination that the first compression control bit is reset, retrieve the first portion of the first cache line from the candidate compressed cache line and retrieve the second portion of the first cache line from the overflow block based on the overflow pointer. In some implementations, the cache line packing circuit 1408 is further configured to, in accordance with a determination that the first cache line is incompressible: write the second cache line or the second compressed cache line to the overflow block, depending on whether the second cache line is compressible; and reset a second compression control bit (e.g., LPC 408) in the candidate compressed cache line to indicate whether the second cache line is incompressible. In some implementations, the cache line packing circuit 1408 is further configured to, in response to receiving a read request for the second cache line, retrieve the second cache line or the second compressed cache line from the overflow block based on the second compression control bit. Example methods for reading high-priority and low-priority cache lines are described above with reference to FIGS. 10 and 11, respectively, according to some implementations.
In some implementations, the cache line packing circuit 1408 is further configured to, in response to receiving a cache line write request for the first cache line: compress the first cache line (e.g., using the compression circuit 1406) to obtain a first updated cache line having a first size; in accordance with a determination that the first size is equal to or less than a first predetermined size of the candidate compressed cache line, write the first updated cache line to the candidate compressed cache line; in accordance with a determination that the first size is greater than the first predetermined size and equal to or less than a second predetermined size of the candidate compressed cache line, perform a read-modify-write operation on the candidate compressed cache line based on the first updated cache line; and, in accordance with a determination that the first size is greater than the second predetermined size of the candidate compressed cache line, perform a read-modify-write operation on the candidate compressed cache line based on the first updated cache line and a read-modify-write operation on an overflow block of the plurality of overflow blocks. In some implementations, the first predetermined size is half the size of the candidate compressed cache line. In some implementations, the cache line packing circuit 1408 is further configured to: when writing the second cache line, or the second portion of the second compressed cache line, to the second predetermined portion of the candidate compressed cache line, write an end bit index (e.g., LPE 414) in the candidate compressed cache line indicating where within the candidate compressed cache line that data was written; and compute the second predetermined size based on the end bit index in the candidate compressed cache line.
In some implementations, the cache line packing circuit 1408 is further configured to, in response to receiving a cache line write request for the second cache line: compress the second cache line to obtain a second updated cache line having a second size; in accordance with a determination that the sum of the second size and the size of the first compressed cache line is less than the first predetermined size of the candidate compressed cache line, perform a read-modify-write operation to write the second updated cache line to the candidate compressed cache line; and, in accordance with a determination that the sum is not less than the first predetermined size, (i) perform a first read-modify-write operation to write a first portion of the second updated cache line to the candidate compressed cache line, and (ii) perform a second read-modify-write operation to write the remaining portion of the second updated cache line to the overflow block pointed to by the overflow pointer. In some implementations, for high-priority cache lines, there is no control bit indicating where the HP ends. In that case, the system decompresses the HP portion to determine this information. In some implementations, this is not a true decompression: the end is instead found by scanning through the HP compressed bits (a process very similar to decompression). Some implementations reserve 8 to 9 bits to indicate the end of the HP, as is done on the LP side.
In some implementations, the cache line packing circuit 1408 is further configured to, in response to receiving a cache line write request for the first or second cache line: compress the first or second cache line to obtain an updated compressed cache line having an updated size; and, in accordance with a determination that the updated size cannot fit within the overflow block pointed to by the overflow pointer, release that overflow block and update the overflow pointer to point to a new overflow block of the plurality of overflow blocks. Examples of operating the free lists are described above with reference to FIGS. 12 and 13, according to some implementations.
In some implementations, the first compressed cache line, the second compressed cache line, and the candidate compressed cache line have equal sizes. For example, in FIG. 3, each cache line from regions 302 and 304, as well as the compressed cache line 306, is 64 bytes.
In some implementations, the first data region and the second data region have equal sizes. For example, in FIG. 3, regions 302 and 304 have equal sizes (e.g., the regions contain equal numbers of cache lines).
In some implementations, the second predetermined portion is less than half the size of the candidate compressed cache line. Examples of layouts are described above with reference to FIGS. 4 to 9, according to some implementations.
In some implementations, the cache line packing circuit 1408 is further configured to write the first compressed cache line and the second compressed cache line into the candidate compressed cache line in opposite directions, with the first and second compressed cache lines separated by one or more bytes.
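A byte-granular toy illustration of this opposite-direction packing follows. The real scheme works at bit granularity and includes control bits, both omitted here; the function name is our own.

```python
def pack_line(hp: bytes, lp: bytes, line_size: int = 64) -> bytearray:
    """Write HP from the left and LP from the right of one line,
    leaving the unused gap between them zeroed."""
    assert len(hp) + len(lp) <= line_size, "would need an overflow block"
    line = bytearray(line_size)
    line[:len(hp)] = hp               # HP grows left-to-right
    if lp:
        line[-len(lp):] = lp          # LP grows right-to-left
    return line
```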
In another aspect, a compressed memory system of a processor-based system is provided. The compressed memory system includes a storage region comprising a plurality of cache lines. For example, FIG. 3 shows two storage regions 302 and 304. Each cache line has one of a plurality of priority levels. For example, the cache lines in region 302 have a higher priority than the cache lines in region 304. The compressed memory system also includes a compressed storage region (e.g., a region including compressed cache line 306) comprising a plurality of compressed cache lines. Each compressed cache line includes a first set of data bits configured to hold, in a first direction, a portion of a first cache line or a portion of a compressed first cache line, the first cache line having a first priority level. For example, according to some implementations, the layout shown in FIG. 4 shows a compressed cache line including HP 402, which stores bits for a high-priority cache line. Each compressed cache line also includes a second set of data bits configured to hold, in a second direction opposite the first direction, a portion of a second cache line or a portion of a compressed second cache line, the second cache line having a second priority level lower than the first priority level. For example, according to some implementations, the layout shown in FIG. 4 shows a compressed cache line including LP 404, which stores bits for a low-priority cache line. The first set of data bits includes a greater number of bits than the second set of data bits. HP 402 stores compressed bits in the direction opposite to LP 404. The layout in FIG. 4 shows that bits for the high-priority cache line can occupy HP 402 and region 420, while bits for the low-priority cache line can occupy only LP 404, which is smaller than HP 402.
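The opposite-direction packing described above can be sketched as follows. This is an illustrative model only: the names `pack_line` and `LINE_SIZE`, the one-byte default gap, and the choice to write the low-priority payload back-to-front from the end of the line are assumptions made for the sketch, not details taken from the patent.

```python
from typing import Optional

LINE_SIZE = 64  # bytes per compressed cache line (cf. FIG. 3)

def pack_line(hp_bytes: bytes, lp_bytes: bytes, gap: int = 1) -> Optional[bytearray]:
    """Pack a high-priority payload from the front of a fixed-size line and a
    low-priority payload from the back (cf. HP 402 / LP 404 in FIG. 4).
    Returns None when the two payloads plus the separating gap do not fit."""
    if len(hp_bytes) + gap + len(lp_bytes) > LINE_SIZE:
        return None  # would require an overflow bin
    line = bytearray(LINE_SIZE)
    # High-priority bits grow in the first direction, from byte 0 upward.
    line[:len(hp_bytes)] = hp_bytes
    # Low-priority bits grow in the opposite direction, so this sketch
    # writes them reversed, starting from the last byte of the line.
    line[LINE_SIZE - len(lp_bytes):] = lp_bytes[::-1]
    return line
```

With this layout, any slack between the two payloads (the gap and unused bytes) sits in the middle of the line, which is one way the high-priority payload could later grow into region 420 without moving the low-priority bits.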
In some implementations, the compressed memory system also includes an overflow storage region comprising a plurality of overflow bins (e.g., overflow blocks 426). Each overflow bin is configured to hold a different number of bytes. For example, overflow blocks 308, 310, and 312 each hold a different number of bytes. Each compressed cache line also includes a set of overflow pointer bits (e.g., OFP 412) configured to hold a pointer to one of the plurality of overflow bins.
In some implementations, each compressed cache line further includes a first control bit (e.g., HPC 406) indicating the compression ratio of the first cache line, and a second control bit (e.g., LPC 408) indicating the compression ratio of the second cache line.
In some implementations, the compressed memory system further includes an overflow storage region comprising a plurality of overflow bins. Each overflow bin is configured to hold a different number of bytes. Each compressed cache line further includes a set of bits (e.g., OFB 410) configured to hold the size of one of the plurality of overflow bins.
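Because the bins come in several sizes, one natural policy is to pick the smallest bin that can hold the bytes that did not fit in the line. The sketch below assumes this best-fit policy and a hypothetical set of bin sizes; the patent only states that each bin holds a different number of bytes (e.g., overflow blocks 308, 310, and 312).

```python
# Hypothetical bin sizes chosen for illustration; the actual sizes
# associated with overflow blocks 308/310/312 are not specified here.
OVERFLOW_BIN_SIZES = (16, 32, 64)

def choose_overflow_bin(overflow_bytes: int) -> int:
    """Return the size of the smallest overflow bin that can hold the
    overflowing bytes, i.e., the value a field like OFB 410 could record."""
    for size in sorted(OVERFLOW_BIN_SIZES):
        if overflow_bytes <= size:
            return size
    raise ValueError("overflow larger than the largest available bin")
```

A best-fit choice like this minimizes wasted overflow storage, at the cost of fragmenting the overflow region into multiple bin sizes.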
In some implementations, each compressed cache line also includes an end bit index (e.g., LPE 414) indicating the end of the second set of data bits.
In some implementations, each overflow bin is configured to hold, in the first direction, bits of the first cache line and/or bits of the second cache line or the compressed second cache line. For example, as shown in FIG. 4, overflow bin 420 holds bits in the same direction as HP 402.
In some implementations, the first set of data bits and the second set of data bits are separated by one or more bytes. For example, in FIG. 4, HP 402 and LP 404 are separated by gap 422.
In some implementations, each cache line and each compressed cache line have the same size. For example, in FIG. 3, each of cache lines 302 and 304 and each compressed cache line 306 is 64 bytes.
In some implementations, when the compressed first cache line or the compressed second cache line does not fit in the compressed cache line, each compressed cache line further includes a set of overflow pointer bits configured to hold a pointer to one of the plurality of overflow bins. The overflow bin is configured to hold bits of the first cache line, the compressed first cache line, the second cache line, and/or the compressed second cache line. In other words, as shown in FIGS. 5 and 6, OFP 412 is optional and is used only when there is an overflow. When both compressed cache lines fit in the compressed cache line, the OFP bits are not used.
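The condition that decides whether the overflow pointer comes into play can be stated as a simple size check. As before, `LINE_SIZE` and the one-byte separating gap are assumptions of this sketch, not figures from the patent.

```python
LINE_SIZE = 64  # bytes per compressed cache line (cf. FIG. 3)

def needs_overflow(hp_size: int, lp_size: int, gap: int = 1) -> bool:
    """True when the two compressed payloads plus the separating gap exceed
    one line, i.e., when OFP 412 would be set to point at an overflow bin.
    When both payloads fit, the overflow pointer bits go unused."""
    return hp_size + gap + lp_size > LINE_SIZE
```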
In some implementations, the second set of data bits is further configured to hold a plurality of control bits indicating an overflow in the compressed cache line (e.g., OFP 412, OFB 410) and the end of the second set of data bits (e.g., LPE 414).
It will be understood that although the terms "first," "second," etc. may be used herein to describe various elements, these elements should not be limited by these terms. These terms are only used to distinguish one element from another.
The terminology used herein is for the purpose of describing particular implementations only and is not intended to limit the claims. As used in the description of the implementations and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will also be understood that the term "and/or" as used herein refers to and encompasses any and all possible combinations of one or more of the associated listed items. It will be further understood that the terms "comprises" and/or "comprising," when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
As used herein, the term "if" may be construed to mean "when" or "upon" or "in response to determining" or "in accordance with a determination" or "in response to detecting" that a stated antecedent condition is true, depending on the context. Similarly, the phrase "if it is determined [that a stated antecedent condition is true]" or "if [a stated antecedent condition is true]" or "when [a stated antecedent condition is true]" may be construed to mean "upon determining" or "in response to determining" or "in accordance with a determination" or "upon detecting" or "in response to detecting" that the stated antecedent condition is true, depending on the context.
The foregoing description, for purposes of explanation, has been described with reference to specific implementations. However, the illustrative discussions above are not intended to be exhaustive or to limit the claims to the precise forms disclosed. Many modifications and variations are possible in view of the above teachings. The implementations were chosen and described in order to best explain the principles of operation and practical applications, thereby enabling others skilled in the art to make use of them.
Claims (34)
Applications Claiming Priority (4)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US17/572,472 | 2022-01-10 | ||
US17/572,472 US11829292B1 (en) | 2022-01-10 | 2022-01-10 | Priority-based cache-line fitting in compressed memory systems of processor-based systems |
US17/572,471 | 2022-01-10 | ||
PCT/US2022/081266 WO2023133018A1 (en) | 2022-01-10 | 2022-12-09 | Priority-based cache-line fitting in compressed memory systems of processor-based systems |
Publications (1)
Publication Number | Publication Date |
---|---|
CN118369652A true CN118369652A (en) | 2024-07-19 |
Family
ID=86898908
Family Applications (2)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202280082234.8A Pending CN118401929A (en) | 2022-01-10 | 2022-12-09 | Priority-based cache line fitting in a compressed memory system of a processor-based system |
CN202280081645.5A Pending CN118369652A (en) | 2022-01-10 | 2022-12-09 | Priority-based cache line fitting in a compressed memory system for a processor-based system |
Family Applications Before (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202280082234.8A Pending CN118401929A (en) | 2022-01-10 | 2022-12-09 | Priority-based cache line fitting in a compressed memory system of a processor-based system |
Country Status (2)
Country | Link |
---|---|
US (1) | US11829292B1 (en) |
CN (2) | CN118401929A (en) |
Families Citing this family (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US12299285B2 (en) * | 2022-07-06 | 2025-05-13 | Rambus Inc. | Compressed memory buffer device |
US12204750B2 (en) * | 2022-09-26 | 2025-01-21 | Lemon Inc. | Metadata management for transparent block level compression |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106575260A (en) * | 2014-09-26 | 2017-04-19 | 英特尔公司 | Caching technologies employing data compression |
CN109313605A (en) * | 2016-06-24 | 2019-02-05 | 高通股份有限公司 | The storage priority-based and access of compressed memory lines in memory in processor-based system |
CN109416666A (en) * | 2016-06-28 | 2019-03-01 | Arm有限公司 | Caching with compressed data and label |
CN113204501A (en) * | 2020-01-30 | 2021-08-03 | 三星电子株式会社 | Cache storage device, method of operating the same, and system including the same |
Family Cites Families (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US7257693B2 (en) | 2004-01-15 | 2007-08-14 | Intel Corporation | Multi-processor computing system that employs compressed cache lines' worth of information and processor capable of use in said system |
US7246132B2 (en) | 2004-05-13 | 2007-07-17 | Destiny Technology Corporation | Method of storing compressed data |
JP5706754B2 (en) * | 2011-05-13 | 2015-04-22 | キヤノン株式会社 | Data processing apparatus and data processing method |
US10838862B2 (en) | 2014-05-21 | 2020-11-17 | Qualcomm Incorporated | Memory controllers employing memory capacity compression, and related processor-based systems and methods |
US10552329B2 (en) | 2014-12-23 | 2020-02-04 | Prophetstor Data Services, Inc. | SSD caching system for hybrid storage |
US10019363B2 (en) * | 2015-04-03 | 2018-07-10 | Hewlett Packard Enterprise Development Lp | Persistent memory versioning and merging |
US9823854B2 (en) | 2016-03-18 | 2017-11-21 | Qualcomm Incorporated | Priority-based access of compressed memory lines in memory in a processor-based system |
US20170371797A1 (en) | 2016-06-24 | 2017-12-28 | Qualcomm Incorporated | Pre-fetch mechanism for compressed memory lines in a processor-based system |
US10176090B2 (en) | 2016-09-15 | 2019-01-08 | Qualcomm Incorporated | Providing memory bandwidth compression using adaptive compression in central processing unit (CPU)-based systems |
US10198362B2 (en) | 2017-02-07 | 2019-02-05 | Qualcomm Incorporated | Reducing bandwidth consumption when performing free memory list cache maintenance in compressed memory schemes of processor-based systems |
US10169246B2 (en) | 2017-05-11 | 2019-01-01 | Qualcomm Incorporated | Reducing metadata size in compressed memory systems of processor-based systems |
EP3562042B1 (en) * | 2018-04-23 | 2025-07-23 | TotalEnergies OneTech | Method of input data compression, associated computer program product, computer system and extraction method |
2022
- 2022-01-10 US US17/572,472 patent/US11829292B1/en active Active
- 2022-12-09 CN CN202280082234.8A patent/CN118401929A/en active Pending
- 2022-12-09 CN CN202280081645.5A patent/CN118369652A/en active Pending
Also Published As
Publication number | Publication date |
---|---|
US20230236979A1 (en) | 2023-07-27 |
US11829292B1 (en) | 2023-11-28 |
CN118401929A (en) | 2024-07-26 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
KR102459964B1 (en) | Memory module providing virtual memory capacity and operating method thereof | |
US6735673B2 (en) | Apparatus and methods for cache line compression | |
USRE43483E1 (en) | System and method for managing compression and decompression of system memory in a computer system | |
US8521962B2 (en) | Managing counter saturation in a filter | |
US7636810B2 (en) | Method, system, and apparatus for memory compression with flexible in-memory cache | |
US7844778B2 (en) | Intelligent cache replacement mechanism with varying and adaptive temporal residency requirements | |
US6829679B2 (en) | Different caching treatment of memory contents based on memory region | |
CN115168248B (en) | Cache memory supporting SIMT architecture and corresponding processor | |
US10198362B2 (en) | Reducing bandwidth consumption when performing free memory list cache maintenance in compressed memory schemes of processor-based systems | |
US11188467B2 (en) | Multi-level system memory with near memory capable of storing compressed cache lines | |
CN118369652A (en) | Priority-based cache line fitting in a compressed memory system for a processor-based system | |
EP2884395A2 (en) | Storage system having data storage lines with different data storage line sizes | |
EP3430519A1 (en) | Priority-based access of compressed memory lines in memory in a processor-based system | |
CN108694133A (en) | Apparatus, method and system for instant cache associativity | |
US9996478B1 (en) | No allocate cache policy | |
CN115357196A (en) | Dynamically scalable set associative cache method, device, equipment and medium | |
US11868244B2 (en) | Priority-based cache-line fitting in compressed memory systems of processor-based systems | |
TW202244735A (en) | Dram-aware caching | |
KR102731761B1 (en) | Priority-based cache line fitting in compressed memory systems on processor-based systems | |
US20240193093A1 (en) | Packet Cache System and Method | |
CN111506521B (en) | Memory, data request writing method and controller | |
CN117472791A (en) | Data access method and data access system |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||