CN113424160B - Processing method, processing device and related equipment

Info

Publication number
CN113424160B
Authority
CN
China
Prior art keywords: cache line, data, memory, read, command
Legal status: Active
Application number: CN201980092068.8A
Other languages: Chinese (zh)
Other versions: CN113424160A (en)
Inventors: 张志强, 何世明, 周文旻, 孙波
Current Assignee: Huawei Technologies Co Ltd
Original Assignee: Huawei Technologies Co Ltd
Application filed by Huawei Technologies Co Ltd
Publication of CN113424160A
Application granted
Publication of CN113424160B

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 12/00: Accessing, addressing or allocating within memory systems or architectures
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D 10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Memory System Of A Hierarchy Structure (AREA)

Abstract

A processing method, apparatus, and related device. The method includes the following steps: a data processing apparatus receives a memory access command (S200), where the memory access command includes an access address and indication information, and the indication information indicates a data-flow characteristic of the intelligent peripheral (IP) corresponding to the memory access command; the apparatus determines a read strategy or a write strategy corresponding to the memory access command according to the IP data-flow characteristic (S210-S230); and the apparatus processes the access address according to the read strategy or the write strategy (S240-S250). The method can provide different read strategies or write strategies to meet the application requirements of different scenarios, ensuring that the data processing apparatus obtains power-consumption and performance benefits.

Description

Processing method, processing device and related equipment
Technical Field
The present disclosure relates to the field of cache technologies, and in particular, to a processing method, apparatus, and related device.
Background
The system cache (SC) is an important component of a data processing device (such as a computer or a mobile terminal). The SC temporarily stores instructions and data for the central processing unit (central processing unit, CPU) and exchanges data with external memory such as a hard disk, so that the CPU can access data at higher speed, shortening access time and improving system performance.
A mobile terminal system on chip (SoC) generally includes intelligent peripherals (intelligent peripheral, IP) such as an application processor (application central processing unit, ACPU), a graphics processing unit (graphic process unit, GPU), a display processing unit (display process unit, DPU), a video decoder (VDEC), a video encoder (VENC), an image signal processor (image signal processor, ISP), and a neural network processing unit (neural-network process unit, NPU). In practical application scenarios, these IPs place different demands on the system cache. For example, some IPs expect the system to provide a piece of on-chip random access memory (random access memory, RAM) inside the chip for caching intermediate data generated while the IP works, while other IPs expect the system to provide storage space for data transfer between IPs or to serve as the IP's last level cache (LLC), in order to reduce the bandwidth and latency of IP accesses to dynamic random access memory (dynamic random access memory, DRAM) and improve the working performance of the IPs.
An existing system cache cannot meet the requirements of different IPs at the same time, so the DRAM is accessed frequently, increasing power consumption and degrading performance. How to enable the system cache to simultaneously meet the requirements of different IPs is the technical problem to be solved.
Disclosure of Invention
The application provides a processing method, a processing apparatus, and related devices, which enable the SC to support multiple mode characteristics simultaneously, meet the application requirements of different IPs in different scenarios, reduce the bandwidth and latency of IP accesses to the DRAM, and improve the working performance of the IPs.
In a first aspect, a processing method is provided, the method comprising: the method comprises the steps that a processing device receives a memory access command, wherein the memory access command comprises an access address and indication information, and the indication information is used for indicating IP data flow characteristics of an intelligent peripheral corresponding to the memory access command; the processing device determines a read strategy or a write strategy corresponding to the memory access command according to the IP data stream characteristics; and the processing device processes the access address according to the read strategy or the write strategy.
In the embodiment of the application, the processing device provides different read strategies or write strategies by identifying and matching the indication information in the memory access command, so that the application requirements of different scenes are met, and the data processing device is ensured to obtain power consumption and performance benefits.
In one possible implementation manner, the processing device determines a read policy or a write policy corresponding to the memory access command according to a preset relationship between the IP data flow characteristic and the read policy or the write policy.
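As a rough illustration, the preset relationship described above can be modeled as a lookup table keyed by the indication information carried in the memory access command. The stream names and policy labels below are illustrative assumptions and do not appear in the patent text.

```python
# Hypothetical sketch of the preset relationship between an IP data-flow
# characteristic (carried as indication information in the memory access
# command) and the matching read/write strategy. All names are assumptions.
POLICY_TABLE = {
    # indication -> (read strategy, write strategy)
    "gpu_intermediate": ("fourth_read", "first_write"),   # scratchpad-like data
    "frame_transfer":   ("third_read",  "second_write"),  # producer/consumer frames
    "cpu_control":      ("second_read", "third_write"),   # descriptor-style data
}

def select_policy(indication: str, is_write: bool) -> str:
    """Return the strategy the processing device would apply to this command."""
    read_policy, write_policy = POLICY_TABLE[indication]
    return write_policy if is_write else read_policy
```

A configuration register holding such a table lets the same hardware serve every IP: only the indication bits in the command change.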
In one possible implementation, the memory access command is a read command, and the read strategy is a first read strategy, which includes: determining whether the read command hits a cache line; if it hits, reading the target data from the cache line corresponding to the access address; if it misses, reading the target data corresponding to the access address from the main memory.
In this embodiment of the application, when the read strategy is the first read strategy, if the read command misses, the target data is read directly from the main memory; no cache line is allocated and the target data is not written into the cache. This reduces cache line allocation, saves cache resources, and reduces power consumption.
In one possible implementation, the memory access command is a read command, and the read strategy is a second read strategy, which includes: determining whether the read command hits a cache line; if it hits, reading the target data from the cache line corresponding to the access address; if it misses, allocating a cache line, reading the target data corresponding to the access address from the main memory, and writing the target data into the allocated cache line.
In this embodiment of the application, when the read strategy is the second read strategy, if the read command misses, a cache line is allocated, and the target data read from the main memory is written into the allocated cache line, ensuring data consistency.
In one possible implementation, the memory access command is a read command, and the read strategy is a third read strategy, which includes: determining whether the read command hits a cache line; if it hits, reading the target data from the cache line corresponding to the access address and releasing the hit cache line; if it misses, reading the target data corresponding to the access address from the main memory.
In this embodiment of the application, when the read strategy is the third read strategy, to prevent data loss in scenarios where the data is being read for the last time, the target data in a hit cache line is written back to the main memory rather than simply discarded, so that it remains available for maintenance and debugging.
In one possible implementation, the memory access command is a read command, and the read strategy is a fourth read strategy, which includes: determining whether the read command hits a cache line; if it hits, reading the target data from the cache line corresponding to the access address and invalidating the hit cache line; if it misses, reading the target data corresponding to the access address from the main memory.
In this embodiment of the application, when the read strategy is the fourth read strategy, once the target data has been read, the processing device can directly discard the data in the hit cache line to complete the invalidation. This reduces cache resource usage and avoids writing dirty data (that is, modified data) back to the main memory during later cache line replacement, thereby reducing main memory accesses and improving performance.
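The four read strategies above can be sketched in a minimal behavioral model in which the cache and main memory are dictionaries mapping aligned addresses to data. The function name and the dictionary model are illustrative assumptions, not the patent's implementation.

```python
# Sketch of the four read strategies. A dict stands in for the tag/data
# RAMs and for DRAM; strategy labels mirror the text above.

def cached_read(cache: dict, main_memory: dict, addr: int, policy: str):
    if addr in cache:                      # the read command hits a cache line
        data = cache[addr]
        if policy == "third_read":         # release: write back, then drop the line
            main_memory[addr] = data       # data is preserved in main memory
            del cache[addr]
        elif policy == "fourth_read":      # invalidate: drop without write-back
            del cache[addr]                # dirty data is never flushed to DRAM
        return data
    data = main_memory[addr]               # miss: read from main memory
    if policy == "second_read":            # only the second strategy allocates
        cache[addr] = data                 # a line and fills it with target data
    return data
```

For example, under the fourth strategy a hit returns the cached value and leaves main memory untouched, so a later replacement never writes the stale line back.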
In one possible implementation, the memory access command is a write command, and the write strategy is a first write strategy, which includes: determining whether the write command hits a cache line; if it hits, writing the target data into the cache line corresponding to the access address; if it misses, allocating a cache line and writing the target data into the allocated cache line, where the data amount of the target data is a non-integer multiple of the cache line capacity.
In this embodiment of the application, when the write strategy is the first write strategy, after a cache line is allocated it is not necessary to first read the corresponding line from the main memory and merge (merge) it with the target data to keep data consistency; the target data can be written directly into the allocated cache line, reducing main memory accesses and improving performance.
In one possible implementation, the memory access command is a write command, and the write strategy is a second write strategy, which includes: determining whether the write command hits a cache line; if it hits, writing the target data into the cache line corresponding to the access address; if it misses, allocating a cache line, reading the corresponding line from the main memory and merging it with the target data, and writing the result into the allocated cache line, where the data amount of the target data is a non-integer multiple of the cache line capacity.
In this embodiment of the application, when the write strategy is the second write strategy, for a write of a partial data amount, after a cache line is allocated, the corresponding line must first be read from the main memory and merged with the target data before being written into the allocated cache line, so as to maintain data consistency.
In one possible implementation, the memory access command is a write command, and the write strategy is a third write strategy, which includes: determining whether the write command hits a cache line; if it hits, writing the target data into the cache line corresponding to the access address; if it misses, writing the target data to the access address in the main memory.
In this embodiment of the application, when the write strategy is the third write strategy, if the memory access command misses, no cache line is allocated; the command directly accesses the main memory, and the target data is written to the access address in the main memory. This reduces unnecessary cache line allocation, reduces power consumption, and improves performance.
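The three write strategies above can be sketched in the same dictionary model, for a partial-line write (the data amount is smaller than one cache line). Modeling the merge as a byte-slice assignment, and every name here, are illustrative assumptions.

```python
# Sketch of the three write strategies for a partial-line write. Lines are
# bytearrays; `offset` is the write position inside the line.
LINE_SIZE = 4  # bytes per cache line, chosen only for the example

def cached_write(cache, main_memory, addr, offset, data, policy):
    if addr in cache:                        # hit: write into the cache line
        line = cache[addr]
    elif policy == "third_write":            # miss, no allocation: write straight
        line = main_memory[addr]             # to the access address in main memory
    elif policy == "second_write":           # miss: allocate, read and merge
        line = bytearray(main_memory[addr])  # fetch the full line from DRAM so
        cache[addr] = line                   # the untouched bytes stay coherent
    else:                                    # "first_write" miss: allocate without
        line = bytearray(LINE_SIZE)          # reading DRAM (the caller does not
        cache[addr] = line                   # need the untouched bytes)
    line[offset:offset + len(data)] = data   # merge the partial write into the line
```

The difference between the first and second strategies is visible in the miss path: only the second pays a main-memory read to preserve the bytes the command does not overwrite.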
In one possible implementation, the allocating the cache line by the processing device includes: the processing device allocates the cache line according to the IP data stream characteristics.
In one possible implementation, the allocating, by the processing device, the cache line according to the IP data flow characteristic includes: the processing device determines the allocation strategy of the cache line according to the IP data flow characteristics; and the processing device allocates the cache line according to the allocation strategy.
In a second aspect, embodiments of the present application provide a processing apparatus, including: a slave interface, a tag controller and a data controller; the slave interface, the tag controller and the data controller are connected with each other, and the slave interface is used for receiving a memory access command, wherein the memory access command comprises an access address and indication information, and the indication information is used for indicating the IP data flow characteristics of the intelligent peripheral corresponding to the memory access command; the tag controller is configured to determine a read policy or a write policy corresponding to the memory access command according to the IP data flow characteristic; and the data controller is used for processing the access address according to the read strategy or the write strategy.
In a possible implementation manner, the processing device further includes a configuration register, where the configuration register is configured to store a preset relationship between the IP data flow characteristic and the read policy or the write policy, and the tag controller determines the read policy or the write policy corresponding to the memory access command according to the preset relationship.
In a possible implementation manner, the processing device further comprises a tag random access memory and a data random access memory, wherein the tag random access memory is used for storing tag information of a cache line; the data random access memory is used for storing data of the cache line; when the memory access command is a read command, the read policy is a first read policy, and the tag controller is specifically configured to query the tag random access memory according to the first read policy, and determine whether the read command hits in a cache line; if hit, the data controller reads the target data in the data random access memory, and if miss, the data controller reads the target data corresponding to the access address in the main memory.
In a possible implementation manner, the processing device further comprises a tag random access memory and a data random access memory, wherein the tag random access memory is used for storing tag information of a cache line; the data random access memory is used for storing data of the cache line; when the memory access command is a read command, the read policy is a second read policy, and the tag controller is specifically configured to query the tag random access memory according to the second read policy, and determine whether the read command hits in a cache line; if hit, the data controller reads the target data from the data random access memory, if miss, the data controller allocates a cache line, reads the target data corresponding to the access address from the main memory, and writes the target data into the allocated cache line.
In a possible implementation manner, the processing device further comprises a tag random access memory and a data random access memory, wherein the tag random access memory is used for storing tag information of a cache line; the data random access memory is used for storing data of the cache line; when the memory access command is a read command, the read policy is a third read policy, and the tag controller is specifically configured to query the tag random access memory according to the third read policy, and determine whether the read command hits in a cache line; and if the target data is hit, the data controller reads the target data from the data random access memory, releases the hit cache line, and if the target data is not hit, the data controller reads the target data corresponding to the access address from the main memory.
In a possible implementation manner, the processing device further comprises a tag random access memory and a data random access memory, wherein the tag random access memory is used for storing tag information of a cache line; the data random access memory is used for storing data of the cache line; when the memory access command is a read command, the read policy is a fourth read policy, and the tag controller is specifically configured to query the tag random access memory according to the fourth read policy, and determine whether the read command hits in a cache line; if hit, the data controller reads the target data from the data random access memory, and carries out invalidation processing on the hit cache line, if miss, the data controller reads the target data corresponding to the access address from the main memory.
In a possible implementation manner, the processing device further comprises a tag random access memory and a data random access memory, wherein the tag random access memory is used for storing tag information of a cache line; the data random access memory is used for storing data of the cache line; when the memory access command is a write command, the write strategy is a first write strategy; the tag controller is specifically configured to query the tag random access memory according to the first write policy, and determine whether the write command hits a cache line; if hit, the data controller writes target data in the data random access memory, if miss, the data controller allocates a cache line, and writes the target data in the allocated cache line, wherein the data amount corresponding to the target data is a non-integer multiple of the capacity of the cache line.
In a possible implementation manner, the processing device further comprises a tag random access memory and a data random access memory, wherein the tag random access memory is used for storing tag information of a cache line; the data random access memory is used for storing data of the cache line; when the memory access command is a write command, the write strategy is a second write strategy; the tag controller is specifically configured to query the tag random access memory according to the second write policy, and determine whether the write command hits a cache line; if hit, the data controller writes the target data in the data random access memory; if miss, the data controller allocates a cache line, reads the corresponding line from the main memory and merges it with the target data, and writes the result into the allocated cache line, wherein the data amount corresponding to the target data is a non-integer multiple of the capacity of the cache line.
In a possible implementation manner, the processing device further comprises a tag random access memory and a data random access memory, wherein the tag random access memory is used for storing tag information of a cache line; the data random access memory is used for storing data of the cache line; when the memory access command is a write command, the write strategy is a third write strategy; the tag controller is specifically configured to query the tag random access memory according to the third write policy, and determine whether the write command hits a cache line; if hit, the data controller writes the target data in the data random access memory, and if miss, the data controller writes the target data to the access address in the main memory.
In one possible implementation, the data controller allocating the cache line includes: the data controller allocates the cache line according to the IP data flow characteristic.
In one possible implementation, the data controller allocating the cache line according to the IP data flow characteristic includes: the data controller determines the allocation strategy of the cache line according to the IP data flow characteristic, and allocates the cache line according to the allocation strategy.
In a third aspect, the present application provides a semiconductor chip, which may include the processing apparatus provided in the second aspect or any implementation of the second aspect, and a central processing unit coupled to the processing apparatus.
In a fourth aspect, the present application provides a system-on-a-chip (SoC) chip including the processing apparatus provided in the second aspect or any implementation of the second aspect, and a central processing unit coupled to the processing apparatus. The chip system may consist of a chip, or may include a chip and other discrete devices.
In a fifth aspect, the present application provides a terminal device, which includes the second aspect and the processing apparatus provided in combination with any implementation manner of the second aspect, and an external memory of the processing apparatus, where the processing apparatus and the external memory of the processing apparatus are disposed in different semiconductor chips.
In a sixth aspect, the present application provides a terminal device, which includes the processing apparatus of the second aspect and any implementation manner of the second aspect, and a central processing unit coupled to the processing apparatus. The central processing unit is used for running a general operating system necessary for the terminal equipment and is used for being coupled with the processing device to complete related processing functions in the processing device. The terminal device may also include a communication interface for the terminal device to communicate with other devices or a communication network.
In a seventh aspect, the present application provides a computer storage medium storing a computer program, which when executed by a processor, can implement the flow executed by the processing device in the processing method provided in the first aspect and any implementation manner of the first aspect.
In an eighth aspect, an embodiment of the present invention provides a computer program, where the computer program includes instructions that, when the program is executed by a computer, cause the computer to perform the flow executed by the processing apparatus in the processing method provided in the first aspect or any implementation manner of the first aspect.
Drawings
Fig. 1 is a schematic diagram of a hardware structure of an SoC + DRAM according to an embodiment of the present application;
fig. 2 is a schematic flow chart of a processing method according to an embodiment of the present application;
FIG. 3 is a schematic diagram of a memory access command format according to an embodiment of the present application;
FIG. 4 is a schematic diagram of a first read strategy processing flow according to an embodiment of the present application;
fig. 5 is a schematic diagram of a storage format of a tag random access memory according to an embodiment of the present application;
FIG. 6 is a schematic diagram of a second read strategy processing flow according to an embodiment of the present disclosure;
fig. 7 is a schematic diagram of a third read strategy processing flow provided in an embodiment of the present application;
FIG. 8 is a schematic diagram of a fourth read strategy processing flow according to an embodiment of the present disclosure;
FIG. 9 is a schematic diagram of a first write strategy processing flow according to an embodiment of the present application;
FIG. 10 is a schematic diagram of a second write strategy processing flow according to an embodiment of the present application;
FIG. 11 is a schematic diagram of a third write strategy processing flow according to an embodiment of the present application;
fig. 12 is a schematic structural diagram of a processing device according to an embodiment of the present application;
fig. 13 is a schematic structural diagram of another processing apparatus according to an embodiment of the present application.
Detailed Description
The following description of the embodiments of the present application will be made clearly and fully with reference to the accompanying drawings, in which it is evident that the embodiments described are only some, but not all, of the embodiments of the present application. All other embodiments, which can be made by one of ordinary skill in the art without undue burden from the present disclosure, are within the scope of the present disclosure.
The terms "first," "second," "third," and "fourth" and the like in the description and in the claims of this application and in the drawings, are used for distinguishing between different objects and not for describing a particular sequential order. Furthermore, the terms "comprise" and "have," as well as any variations thereof, are intended to cover a non-exclusive inclusion. For example, a process, method, system, article, or apparatus that comprises a list of steps or elements is not limited to only those listed steps or elements but may include other steps or elements not listed or inherent to such process, method, article, or apparatus.
Reference herein to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment may be included in at least one embodiment of the present application. The appearances of such phrases in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. Those of skill in the art will explicitly and implicitly appreciate that the embodiments described herein may be combined with other embodiments.
As used in this specification, the terms "component," "module," "system," and the like are intended to refer to a computer-related entity, either hardware, firmware, a combination of hardware and software, or software in execution. For example, a component may be, but is not limited to being, a process running on a processor, an object, an executable, a thread of execution, a program, and/or a computer. By way of illustration, both an application running on a computing device and the computing device can be a component. One or more components may reside within a process and/or thread of execution and a component may be localized on one computer and/or distributed between 2 or more computers. Furthermore, these components can execute from various computer readable media having various data structures stored thereon. The components may communicate by way of local and/or remote processes such as in accordance with a signal having one or more data packets (e.g., data from two components interacting with one another in a local system, distributed system, and/or across a network such as the internet with other systems by way of the signal).
First, some terms in this application are explained for easy understanding by those skilled in the art.
(1) A system on chip (SoC), also known as a system-level chip, is a product: an integrated circuit with a dedicated purpose that contains a complete system and the entire contents of its embedded software. It also refers to the technique of achieving the whole design process, from determining the system functions, through software/hardware partitioning, to completing the design.
(2) Random access memory (random access memory, RAM) is used to store and hold data. It is internal memory that exchanges data directly with the central processing unit (central processing unit, CPU), also called main memory, and can be read and written at any time; it is typically used as temporary storage for the operating system and other running programs. RAM cannot retain data when power is turned off; if data must be preserved, it has to be written to long-term storage (e.g., a hard disk).
(3) Random access memory (RAM) can be further divided into static random access memory (static random access memory, SRAM) and dynamic random access memory (dynamic random access memory, DRAM). The basic principle of both is the same: charge is stored in the memory cell. SRAM has a complex structure, small capacity per unit area, and fast access; DRAM has a simple structure, large capacity per unit area, and slower access than SRAM. Because of its simple cell structure, the charge stored in a DRAM capacitor gradually leaks away over time, so a timed recharge (refresh) is required to maintain the stored information.
(4) A cache memory is a very small but high-speed memory located between the CPU and DRAM, typically consisting of SRAM. The speed of the CPU is far higher than that of the memory; when the CPU accesses data directly from memory it must wait for a certain period, whereas the cache memory can store a part of the data that the CPU has just used or uses cyclically. If the CPU needs that data again, it can be fetched directly from the cache memory, avoiding repeated memory accesses, reducing CPU waiting time, and improving system efficiency.
(5) A cache is organized as a set of fixed-size blocks of data called cache lines, whose size is based on the size of a burst read or burst write cycle. Each cache line is filled completely in a single burst read cycle. Even if the processor accesses only one byte of memory, the cache controller initiates an entire memory access cycle and requests an entire block of data. The address of the first byte of a cache line is always a multiple of the burst cycle size, so the start of a cache line always coincides with the beginning of a burst cycle.
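The alignment rule above can be checked with simple bit arithmetic; the 64-byte line size here is an illustrative assumption.

```python
# With a power-of-two line size, the line's base address is the access
# address with its low bits cleared, and the offset is those low bits.
LINE = 64  # assumed cache line size in bytes

def line_base(addr: int) -> int:
    """Start address of the cache line containing `addr` (a multiple of LINE)."""
    return addr & ~(LINE - 1)

def line_offset(addr: int) -> int:
    """Byte position of `addr` inside its cache line."""
    return addr & (LINE - 1)
```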
(6) The minimum operation unit of a cache is a line (line). Each cache line has a tag (tag) and an index (index) field for addressing. By addressing mode, caches can be divided into fully associative mapping, direct mapping, and multi-way set-associative mapping. Fully associative mapping means that data at any address can be placed in any line of the cache, and the processor addresses it precisely by tag and index. Direct mapping means that for any address only one line in the cache can match, and that line's index is fixed. Multi-way set-associative mapping integrates several parallel direct-mapped caches, so that multiple lines correspond to one index, and the tag of each line is compared against the desired address. Whatever the mapping, the cache faces the problem of replacement: as the system runs, all cache lines become occupied, and when data at a new address is requested, existing data must be replaced, using schemes such as round-robin replacement, random replacement, or least recently used (LRU) replacement. By exploiting the spatial and temporal locality of accesses, the hit probability is increased and performance and power-consumption benefits are obtained.
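The tag/index addressing and LRU replacement described above can be sketched with an ordered map per set; the sizes and class name are illustrative assumptions.

```python
# Sketch of a multi-way set-associative lookup with LRU replacement. Each set
# is an OrderedDict from tag to line data, ordered from least to most
# recently used.
from collections import OrderedDict

LINE = 64    # bytes per line (assumed)
SETS = 256   # number of sets (assumed)
WAYS = 4     # lines per set (assumed)

def split_address(addr: int):
    """Decompose an address into the (tag, index) pair used for lookup."""
    block = addr // LINE
    return block // SETS, block % SETS

class SetAssociativeCache:
    def __init__(self):
        self.sets = [OrderedDict() for _ in range(SETS)]

    def lookup(self, addr: int):
        tag, index = split_address(addr)
        lines = self.sets[index]
        if tag in lines:
            lines.move_to_end(tag)      # hit: mark as most recently used
            return lines[tag]
        return None                     # miss

    def fill(self, addr: int, data):
        tag, index = split_address(addr)
        lines = self.sets[index]
        if tag not in lines and len(lines) >= WAYS:
            lines.popitem(last=False)   # evict the least recently used line
        lines[tag] = data
        lines.move_to_end(tag)
```

Addresses whose blocks differ by a multiple of SETS land in the same set, so filling a fifth such line evicts the least recently used of the first four.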
To facilitate understanding of the embodiments of the present application, taking a multi-way set-associative cache as an example, the technical problems to be solved in the embodiments of the present application and the corresponding practical application scenarios are illustrated below.
Referring to fig. 1, fig. 1 is a schematic hardware structure diagram of an SoC + DRAM according to an embodiment of the present application. As shown in fig. 1, in the system on a chip (SoC) of a mobile terminal (e.g., a smart phone, a palm computer, etc.), intelligent peripherals (intelligent peripheral, IP) such as an application processor (application central processing unit, ACPU) S110, a video encoder (VENC) S120, an image signal processor (image signal processor, ISP) S130, a display processor (display process unit, DPU) S140, a graphics processor (graphic process unit, GPU) S150, and a modem (modem) S160 are connected to an off-chip DRAM S600 through a system bus (system switch) S200, a cache memory (system cache) S300, a memory controller (memory controller) S400, and a memory physical interface (port physical layer, PHY) S500; one or more instances of the memory controller S400 and the memory physical interface S500 may exist. The off-chip DRAM S600 is the access center for the programs and data of the SoC: its access speed is slower than that of the cache memory S300, but its storage space is larger, and it is accessed through the path of the memory controller S400 and the memory physical interface S500. The cache memory S300 is an on-chip cache that can store the data the current processor needs to access, satisfying the processor's requirement for fast access, but its storage space is small. The hit rate of processor accesses in the cache memory S300 is therefore critical to the power consumption and performance of the whole system.
Currently, in an SoC, multiple IPs generally cooperate to complete an application scenario. For example, in a game scenario, the GPU S150 needs to obtain data such as textures from the ACPU S110, perform multi-level rendering on the obtained data, and finally output the rendered data to the DPU S140 for display. In this scenario, different IPs exhibit different access behaviors in the system. For example, while the GPU performs multi-level rendering on the acquired data, the GPU S150 needs to continuously write data into the off-chip DRAM S600 first, then read the data from the off-chip DRAM S600 and write it into the cache memory S300, write it into the off-chip DRAM S600 again after intermediate processing, then read the processed data from the off-chip DRAM S600 again, and repeat this process. After rendering completes, the intermediate result data generated during rendering does not need to be retained; this behavior is similar to that of an on-chip scratchpad memory (scratchpad memory, SPM). When the GPU S150 outputs the rendered data to the DPU S140 for display, the rendered data only needs to be transferred once: the producer (producer), i.e., the GPU S150, writes the whole frame of data into the off-chip DRAM S600, and the consumer (consumer), i.e., the DPU S140, reads it and writes it into the cache memory S300. When outputting data corresponding to control information such as descriptors (descriptor) to the GPU S150, the CPU controls writing of the data into the off-chip DRAM S600, and the GPU S150 reads the data from the off-chip DRAM S600 and writes it into the cache memory S300, exploiting the spatial locality and temporal locality possessed by the data.
It will be appreciated that the current use of the cache memory S300 is not flexible enough, and the requirements of different IPs cannot be met at the same time, so the IPs frequently access the off-chip DRAM S600, severely affecting IP performance and power consumption. For example, when the GPU performs multi-level rendering on the acquired data, it must frequently access the off-chip DRAM S600, which seriously affects its own power consumption and performance; the existing cache memory S300 cannot support the GPU S150 using the cache in SPM mode to reduce accesses to the off-chip DRAM S600 (i.e., the GPU S150 cannot read and write directly in the cache memory S300, and the existing cache memory S300 cannot provide a piece of on-chip RAM for storing the intermediate data generated by the GPU S150). In addition, when the GPU S150 outputs the rendered data to the DPU for display, it must also access the off-chip DRAM S600 and allocate cache lines (cache line), which likewise affects system power consumption and performance; the existing cache memory S300 cannot support the GPU S150 and the DPU S140 using the cache in data buffer (data buffer, DB) mode to reduce accesses to the off-chip DRAM S600. That is, if the GPU S150 and the DPU S140 implement a strict synchronization mechanism, the GPU S150 does not need to write the whole frame of data into the off-chip DRAM S600 but can write the data directly into the cache memory S300, and the DPU S140 can read the data directly from the cache memory S300; because the GPU S150 and the DPU S140 are strictly synchronized, the data can be read while it is being written, so that finer-grained data transfer, i.e., no longer whole-frame transfer, can be achieved.
In addition, in the prior-art scheme, the cache memory S300 only supports IPs using the cache in normal cache (normal cache, NC) mode, exploiting spatial locality and temporal locality to reduce accesses to the off-chip DRAM S600 (e.g., when the CPU outputs data corresponding to control information such as descriptors to the GPU S150), thereby reducing power consumption and improving performance; it cannot support an SPM mode or a DB mode, and cannot dynamically modify the attributes of the corresponding cache lines in each mode.
In summary, the technical problems to be solved in the present application include how to design a cache memory that supports the data stream characteristics of different IPs so as to meet the requirements of different IPs, and how to determine a cache line allocation policy according to the IP data stream characteristics, thereby reducing accesses to the off-chip DRAM, reducing power consumption, and improving performance.
Based on the foregoing, a processing method and related devices provided in the embodiments of the present application are described below. Referring to fig. 2, fig. 2 is a schematic flow chart of a processing method according to an embodiment of the present application. As shown in fig. 2, the method includes, but is not limited to, the steps of:
s200: the data processing device receives a memory access command.
In particular, the data processing device may be a cache memory, and the memory access command may be a command by which an IP accesses the DRAM.
Further, the memory access command includes an access address and indication information, where the indication information is used to indicate the IP data stream characteristic corresponding to the memory access command. The IP data stream characteristic is configured by upper-layer software of the system and characterizes the processing behavior of the IP on data in a specific application scenario. Through the indication information in the memory access command, the data processing device can determine the data stream characteristic corresponding to the IP that sent the memory access command, i.e., the specific mode in which the IP needs to use the system cache.
For example, referring to fig. 3, fig. 3 is a schematic diagram of a memory access command format according to an embodiment of the present application. As shown in fig. 3, the memory access command includes a command type indication field, an access address field, an indication information field, and other attribute fields, which may include fields such as a transmission identification (transaction ID) and a security flag (security flag). The command type indication field is used for indicating that the command is a read command or a write command, the access address field is used for indicating a memory address which the command needs to access, and the indication information field is used for allocation policy management of a cache line and state maintenance of the cache line.
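As an illustration of the command layout in fig. 3, the sketch below models the fields as a simple record. The field names, types, and sample values are assumptions chosen for readability; they are not taken from the patent figure.

```python
from dataclasses import dataclass

@dataclass
class MemoryAccessCommand:
    """Illustrative model of the memory access command format in fig. 3."""
    is_write: bool            # command type indication field: read or write
    address: int              # access address field: memory address to access
    indication: int           # indication information field: IP data stream characteristic
    transaction_id: int = 0   # other attribute fields: transmission identification
    security_flag: bool = False  # other attribute fields: security flag

# A hypothetical read command carrying indication information "3".
cmd = MemoryAccessCommand(is_write=False, address=0x8000_1240, indication=3)
assert not cmd.is_write and cmd.indication == 3
```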
S210: judging whether the memory access command is a read command or a write command, if the memory access command is a read command, executing step S220; if the memory access command is a write command, step S230 is performed.
Specifically, after receiving a memory access command, the data processing apparatus queries the command type indication field of the command and determines whether the memory access command is a read command or a write command.
S220: and the data processing device determines a read strategy corresponding to the read command according to the IP data stream characteristics.
S230: and the data processing device determines a write strategy corresponding to the write command according to the IP data stream characteristics.
Optionally, the data processing device determines a read policy corresponding to the read command or a write policy corresponding to the write command according to a preset relationship between the IP data stream characteristics and the read policy or the write policy.
Specifically, each indication information field corresponds to a set of configuration registers, where the set of configuration registers includes a cache line allocation mode configuration register, a write command allocation policy configuration register, a read command allocation policy configuration register, an out-of-cache enable configuration register, an out-of-cache lookup enable configuration register, a cache line allocation capability configuration register, a cache line count configuration register, and a cache line replacement enable configuration register, and the set of configuration registers performs corresponding configuration for each indication information field, so as to determine a read policy or a write policy corresponding to a memory access command, and a cache line allocation policy.
Further, after receiving the memory access command, the data processing apparatus queries a read command allocation policy configuration value or a write command allocation policy configuration value in the relevant configuration register according to the indication information field therein, and determines a read policy or a write policy corresponding to the memory access command according to the corresponding configuration value.
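The lookup just described, from the indication information field to a per-indication register group and then to a read or write policy, can be sketched as follows. The register names follow the text, while the policy labels and the concrete configuration values are invented for illustration.

```python
# Hypothetical per-indication configuration register groups.
# Keys are indication information values; policy labels are illustrative.
config_registers = {
    0: {"read_alloc_policy": "first_read", "write_alloc_policy": "first_write"},
    1: {"read_alloc_policy": "second_read", "write_alloc_policy": "second_write"},
}

def determine_policy(indication: int, is_write: bool) -> str:
    """Query the register group for this indication field and return the
    read or write policy configured there (steps S220 / S230)."""
    regs = config_registers[indication]
    return regs["write_alloc_policy"] if is_write else regs["read_alloc_policy"]

assert determine_policy(0, is_write=False) == "first_read"
assert determine_policy(1, is_write=True) == "second_write"
```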
S240: the data processing device executes corresponding read operation processing flow according to the determined read strategy.
S250: and the data processing device executes a corresponding write operation processing flow according to the determined write strategy.
Specifically, the data processing device reads target data or writes target data in the storage space corresponding to the access address after determining the corresponding read strategy or write strategy.
It should be noted that, the read policies may include a first read policy, a second read policy, a third read policy, and a fourth read policy, and the write policies may include a first write policy, a second write policy, and a third write policy, and different read policies or write policies, and corresponding processing flows are different.
In one possible implementation, the memory access command is a read command, the read policy is a first read policy, and the data processing apparatus executes the process flow shown in fig. 4. Referring to fig. 4, fig. 4 is a schematic diagram of a first read strategy processing flow according to an embodiment of the present application. After the data processing apparatus determines the read strategy as the first read strategy according to the IP data stream characteristics, steps S410 to S440 are performed.
S410: and inquiring a tag random access memory (tag RAM) corresponding to the cache line according to the access address.
Specifically, tag information of the cache lines is stored in the tag RAM, the tag information includes address information and state information corresponding to each cache line, and the data processing device can obtain addresses and states of different cache lines in the cache by querying the tag RAM.
Referring to fig. 5, fig. 5 is a schematic diagram of a storage format of a tag random access memory according to an embodiment of the present application. As shown in fig. 5, the tag information corresponding to each cache line includes a cache line address field, a cache line status field, an indication information field, and an error check and correction (error correcting code, ECC) field. The cache line address field is used for storing the address of the cache line in the cache; the cache line status field is used for storing the status of the cache line, where the status includes a valid state (valid) and an invalid state (invalid), as well as a clean state (clean) and a dirty state (dirty); the indication information field is used for storing the indication information carried by the command that allocated the cache line, and can be used for replacement and management of the cache line. The ECC field is used for storing error correction information for the tag information corresponding to the cache line.
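As a concrete illustration of the tag entry layout in fig. 5, the sketch below models one tag RAM entry per cache line and the hit condition used in step S420. Field names and widths are illustrative, not taken from the figure.

```python
from dataclasses import dataclass

@dataclass
class TagEntry:
    """Illustrative tag RAM entry for one cache line (fig. 5)."""
    line_address: int   # cache line address field: upper bits of the memory address
    valid: bool         # cache line status: valid / invalid
    dirty: bool         # cache line status: clean / dirty
    indication: int     # indication information of the allocating command
    ecc: int = 0        # error check and correction bits for this entry

def matches(entry: TagEntry, addr_tag: int) -> bool:
    # A hit requires both an address match and the valid state.
    return entry.valid and entry.line_address == addr_tag

e = TagEntry(line_address=0x12345, valid=True, dirty=False, indication=2)
assert matches(e, 0x12345)
e.valid = False            # an invalidated line can no longer hit
assert not matches(e, 0x12345)
```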
S420: judging whether the read command hits the cache line, if yes, executing step S430; if not, step S440 is performed.
S430: and reading target data in the cache line corresponding to the access address.
S440: and reading target data corresponding to the access address in a main memory (namely DRAM).
It should be understood that the capacity of the cache memory is far smaller than that of the main memory, so the cache and the main memory are not in one-to-one correspondence; cache positions are obtained by mapping main-memory addresses, and the mapping modes include direct mapping, fully associative mapping, and set-associative mapping. For example, in set-associative mapping, a memory address is divided into three fields: a tag field, an index field, and an offset field. The tag field is the upper t bits of the memory address, the index field is the middle s bits, and the offset field is the lower b bits, where m is the total address width and t + s + b = m. The cache is divided into S sets of E cache lines each, and each cache line holds B memory units, where S = 2 to the power of s and B = 2 to the power of b. Thus, for a memory address, the middle s bits determine which set the corresponding memory unit maps to, the lowest b bits determine the offset of the memory unit within the cache line, and the upper t bits correspond to the cache line address field stored in the tag RAM. Since multiple memory addresses may map to the same cache line, the upper t bits of the memory address (i.e., the cache line address field) are used to check whether the cache line holds the memory unit that the memory access command needs to access. The specific mapping method is not limited in this application.
Further, the data processing apparatus first determines, through the middle s bits of the access address, to which set the accessed memory unit is mapped, and then compares the upper t bits of the access address with the cache line address field of each cache line in the set. If a cache line matches and its state is valid, the memory access command is said to hit the cache line; if no cache line matches, the memory access command is said to miss the cache line.
For example, suppose the total size of the cache is 32 kilobytes (KB), organized as an 8-way set-associative cache (i.e., 8 cache lines per set) with 64 bytes (bytes) per cache line; a total of 64 sets can then be calculated. If the main-memory address is 32 bits, then since each cache line holds 64 bytes, i.e., 2 to the power of 6, bits [ 0, 5 ] of the main-memory address distinguish which byte within the cache line; since there are 64 sets in total, i.e., 2 to the power of 6, the middle 6 bits of the main-memory address, i.e., bits [ 6, 11 ], distinguish which of the 64 sets; and the remaining bits [ 12, 31 ] are compared one by one with the cache line addresses of the 8 cache lines in the set. If bits [ 12, 31 ] are consistent with the cache line address of a certain cache line and the state of that cache line is valid, the memory access command hits the cache line; that is, the memory unit to be accessed by the memory access command is the memory unit corresponding to that cache line.
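The worked example above can be checked in code. This is a sketch that decomposes a 32-bit address under the stated geometry (32 KB total, 8-way, 64-byte lines, 64 sets); it is not part of the patented method itself.

```python
def split_address(addr: int, line_bytes: int = 64, num_sets: int = 64):
    """Decompose an address for the example: offset = bits [0,5],
    set index = bits [6,11], tag = bits [12,31]."""
    b = line_bytes.bit_length() - 1        # 6 offset bits (64 = 2**6)
    s = num_sets.bit_length() - 1          # 6 index bits  (64 sets = 2**6)
    offset = addr & (line_bytes - 1)       # byte within the cache line
    index = (addr >> b) & (num_sets - 1)   # which of the 64 sets
    tag = addr >> (b + s)                  # compared against the tag RAM
    return tag, index, offset

tag, index, offset = split_address(0x12345678)
assert (tag, index, offset) == (0x12345, 0x19, 0x38)
```

The three values correspond exactly to the cache line address field, the set selector, and the in-line byte offset described in the text.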
It can be seen that when the read policy is the first read policy, if the read command does not hit a cache line, the target data is read directly from the main memory; no cache line is allocated, and the target data is not written into the cache. This reduces the allocation of cache lines, saves cache resources, and reduces power consumption.
In one possible implementation, the memory access command is a read command, the read policy is a second read policy, and the data processing apparatus executes the process flow shown in fig. 6. Referring to fig. 6, fig. 6 is a schematic diagram of a second read strategy processing flow provided in an embodiment of the present application. After the data processing apparatus determines the read policy to be the second read policy according to the IP data stream characteristics, steps S610 to S660 are performed.
S610: and inquiring a tag RAM corresponding to the cache line according to the access address.
S620: judging whether the read command hits the cache line, if yes, executing step S630; if not, step S640 is performed.
S630: and reading target data in the cache line corresponding to the access address.
S640: judging whether the cache line count configuration value is smaller than the cache line allocation capability configuration value, if so, executing step S650; if not, step S660 is performed.
Specifically, the cache line allocation capability configuration value stored in the cache line allocation capability register is used to indicate the maximum number of cache lines allowed to be allocated for the IP data stream characteristic (i.e., the indication information); for example, if the cache line allocation capability configuration value is 20, the maximum number of cache lines allowed to be allocated is 20. It should be noted that the cache line allocation capability configuration value can be dynamically configured and adjusted, and can be configured to any value smaller than the cache capacity of the whole system according to requirements. When the number of allocated cache lines reaches the cache line allocation capability configuration value, the data processing apparatus will not allow allocation of new cache lines for the memory access command.
The cache line count configuration register is configured to count the number of allocated cache lines corresponding to the IP data stream characteristic. Each time a cache line is allocated, the cache line count configuration value stored in the cache line count configuration register is incremented by 1, and if a cache line is replaced, released, or invalidated, the cache line count configuration value is decremented by 1.
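The counting rule above amounts to a simple admission check; a minimal sketch, with names and values chosen for illustration:

```python
def may_allocate(count: int, capability: int) -> bool:
    """A new cache line may be allocated only while the count of already
    allocated lines for this indication is below the configured capability
    (steps S640-S660)."""
    return count < capability

# +1 on allocation; -1 on replacement, release, or invalidation.
count, capability = 19, 20
assert may_allocate(count, capability)
count += 1                      # a cache line is allocated
assert not may_allocate(count, capability)
count -= 1                      # a cache line is replaced, released, or invalidated
assert may_allocate(count, capability)
```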
S650: and distributing the cache line, reading target data corresponding to the access address in the main memory, and writing the target data into the distributed cache line.
Optionally, the data processing apparatus allocates the cache line according to IP data flow characteristics.
Specifically, the data processing device determines a cache line allocation policy according to the IP data stream characteristics, and allocates the cache line according to the determined allocation policy.
Further, when the data processing device determines that the read command does not hit the cache line, the data processing device queries the configuration value in the relevant configuration register according to the indication information in the read command, and if the cache line count configuration value is smaller than the cache line allocation capability configuration value, the data processing device allocates the cache line for the read command, reads the target data corresponding to the access address in the main memory, and writes the target data into the allocated cache line.
Specifically, the data processing apparatus queries the configuration value in the allocation mode configuration register to determine a specific policy for allocating a cache line. The cache line allocation mode configuration value indicates the cache line allocation mode corresponding to the IP data stream characteristic. The cache line allocation modes include a first allocation mode, a second allocation mode, and a third allocation mode. The first allocation mode may be the SPM mode, whose behavior is similar to that of an on-chip RAM and whose priority is the highest; the second allocation mode may be the DB mode, a buffer for data transfer between different IPs or different modules, with a priority lower than that of the SPM mode; the third allocation mode may be the NC mode, which behaves as a normal cache and has the lowest priority. That is, the SPM mode has the highest priority, the NC mode has the lowest priority, and the DB mode lies between them. Furthermore, a memory access command configured with a higher-priority allocation mode may replace a cache line configured with a lower-priority allocation mode. For example, if the allocation mode corresponding to the indication information in a certain memory access command is the SPM mode and the command needs to allocate a cache line, it may replace a cache line corresponding to the DB mode or the NC mode.
It will be appreciated that by configuring the allocation mode configuration registers, the system cache may support different allocation modes, i.e. the system cache may be used in different modes, thus meeting the requirements of different IPs of the system.
It should be noted that, the configuration value of the cache line allocation mode may be dynamically configured and adjusted, and may be configured according to the needs of the application scenario or the needs of the application scenario. Moreover, by dynamically configuring the configuration value of the cache line allocation mode and changing the priority thereof, the problem that the cache line with high priority can only be released through the cache maintenance operation (cache maintain operation, CMO) in the existing scheme can be solved, the processing efficiency is improved, the power consumption is reduced, and the performance of the whole data processing device is further improved.
The replacement algorithm corresponding to the NC mode may be a least recently used (least recently used, LRU), pseudo least recently used (pseudo LRU), or least frequently used (least frequently used, LFU) algorithm. When the allocation mode corresponding to the configuration value in the allocation mode configuration register is the NC mode, allocation of a cache line in the invalid (invalid) state is allowed, as is replacement of a cache line whose allocation mode is configured as the NC mode; if neither type of cache line exists, the read command accesses the main memory directly. The replacement algorithm corresponding to the DB mode may be a random (random) algorithm. When the allocation mode is the DB mode, allocation of a cache line in the invalid state is allowed, as is replacement of a cache line whose allocation mode is the NC mode; in addition, according to the configuration value of the replacement configuration register, a cache line whose indication information stored in the tag RAM is the same and whose allocation mode is configured as the DB mode may be replaced; if none of the above types of cache lines exists, the read command accesses the main memory directly. The algorithm corresponding to the SPM mode may be a random (random) algorithm. When the allocation mode is the SPM mode, allocation of an invalid cache line is allowed, as is replacement of a cache line configured in the NC mode or the DB mode; if the allocation modes of all allocated cache lines are the SPM mode, the read command accesses the main memory directly.
It should be noted that, since the SPM mode has the highest priority, a cache line whose allocation mode, as indicated by the indication information stored in the tag RAM, is the SPM mode is not allowed to be replaced.
It will be appreciated that the allocation or replacement priority of cache lines is, in descending order, an invalid cache line, a cache line corresponding to the NC mode, and a cache line corresponding to the DB mode; a cache line corresponding to the SPM mode does not allow replacement. Furthermore, the allocation mode configuration register corresponding to each piece of indication information allows dynamic modification to change its priority. For example, if the allocation mode configured in the allocation mode configuration register corresponding to certain indication information is the SPM mode, dynamically modifying that register so that the configured allocation mode becomes the NC mode lowers the allocation priority of the cache lines corresponding to that indication information.
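The allocation/replacement ordering described above can be sketched as a victim-selection routine. This is a deliberately simplified model: it ignores the same-indication rules for the NC and DB modes and the replacement enable register, and all names are illustrative.

```python
PRIORITY = {"SPM": 3, "DB": 2, "NC": 1}  # SPM highest, NC lowest

def pick_victim(lines, requester_mode):
    """Prefer an invalid line; otherwise replace the valid line whose mode has
    the lowest priority below the requester's. SPM lines are never replaced."""
    for line in lines:
        if not line["valid"]:
            return line
    candidates = [l for l in lines
                  if l["mode"] != "SPM"
                  and PRIORITY[l["mode"]] < PRIORITY[requester_mode]]
    if candidates:
        return min(candidates, key=lambda l: PRIORITY[l["mode"]])
    return None  # no replaceable line: the command goes directly to main memory

ways = [{"valid": True, "mode": "SPM"},
        {"valid": True, "mode": "DB"},
        {"valid": True, "mode": "NC"}]
assert pick_victim(ways, "SPM")["mode"] == "NC"   # lowest priority replaced first
assert pick_victim(ways, "NC") is None            # NC cannot replace DB or SPM
```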
S660: allocation of a cache line is not allowed.
Specifically, in the case where the cache line count configuration value is greater than or equal to the cache line allocation capability configuration value, the data processing apparatus does not allow allocation of the cache line for the memory access command.
Further, the data processing device queries the configuration value in the allocation mode configuration register. If the allocation mode corresponding to the configuration value is the NC mode, only a cache line with the same indication information stored in the tag RAM is allowed to be replaced, and if no such cache line exists, the read command accesses the main memory directly. If the allocation mode corresponding to the configuration value is the DB mode, the configuration value in the replacement configuration register is queried; if that value allows replacement of a cache line with the same indication information stored in the tag RAM and such a cache line exists, one cache line with the same indication information is randomly selected for replacement, otherwise the read command accesses the main memory directly. If the allocation mode corresponding to the configuration value is the SPM mode, allocation of a cache line is not allowed, and the read command accesses the main memory directly.
It should be noted that the replacement configuration register takes effect only when the allocation mode corresponding to the indication information is the DB mode; that is, it determines whether replacement of cache lines with the same indication information is allowed. When the configuration value of the replacement configuration register allows replacement of cache lines with the same indication information, such replacement is performed when the cache line count configuration value is greater than or equal to the cache line allocation capability configuration value, or when no allocatable cache line is found. When the configuration value does not allow replacement of cache lines with the same indication information, no replacement is performed regardless of whether a qualifying cache line exists.
Specifically, in the process of allocating or replacing a cache line, the system cache needs to be queried. In the query process, the ways (way) of the system cache that must be looked up for the indication information are configured by the out-of-cache lookup enable configuration register, each bit (bit) of which corresponds to one way of the system cache. For example, if the system cache has 16 ways, the configuration value of the out-of-cache lookup enable configuration register has 16 bits. When a cache line is allocated or replaced, the ways of the system cache in which the indication information allows allocation or replacement are configured by the out-of-cache enable configuration register, each bit of which likewise corresponds to one way of the system cache. It should be noted that, in a specific application, the ways of the system cache corresponding to the out-of-cache lookup enable configuration register must be constrained to include the ways corresponding to the out-of-cache enable configuration register; that is, the range of the system cache queried for cache lines must include the range of the system cache in which cache lines are allowed to be allocated or replaced.
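The subset constraint between the two way masks can be checked with a simple bit test. A sketch, assuming one bit per way as described above; the mask values are examples only.

```python
def masks_consistent(lookup_mask: int, alloc_mask: int) -> bool:
    """The ways enabled for allocation/replacement must be a subset of the
    ways enabled for lookup: no allocation bit may fall outside the lookup mask."""
    return (alloc_mask & ~lookup_mask) == 0

lookup = 0b1111_0000_0000_0000   # 16-way cache: this indication looks up ways 12-15
alloc  = 0b0011_0000_0000_0000   # ...but may allocate/replace only in ways 12-13
assert masks_consistent(lookup, alloc)
assert not masks_consistent(alloc, lookup)  # the lookup range must cover the alloc range
```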
In one possible implementation, the memory access command is a read command, the read policy is a third read policy, and the data processing apparatus executes a process flow as shown in fig. 7. Referring to fig. 7, fig. 7 is a schematic diagram of a third read strategy processing flow provided in an embodiment of the present application. After the data processing apparatus determines that the read policy is the third read policy according to the IP data stream characteristics, steps S710 to S740 are performed.
S710: and inquiring a tag RAM corresponding to the cache according to the access address.
S720: judging whether the read command hits the cache line, if yes, executing step S730; if not, step S740 is performed.
S730: and reading target data in the cache line corresponding to the access address, and releasing the hit cache line.
Specifically, after the data processing device reads the target data in the cache line corresponding to the access address, the data in the cache line is written into the main memory (i.e., the DRAM), the cache line is discarded so that it becomes an invalid cache line, and the state stored in the tag RAM corresponding to the cache line is adjusted to the invalid state, thereby completing the release process for the hit cache line.
S740: and reading target data corresponding to the access address in the main memory.
It should be appreciated that the third read policy is generally applied to the DB mode for maintenance and debugging purposes; to prevent loss of data from the preceding application scenario, when the memory access command hits the cache line, the target data in the cache line is written into the main memory rather than simply discarded.
In one possible implementation, the memory access command is a read command, the read policy is a fourth read policy, and the data processing apparatus executes a process flow as shown in fig. 8. Referring to fig. 8, fig. 8 is a schematic diagram of a fourth read strategy processing flow provided in an embodiment of the present application. After the data processing apparatus determines the read policy to be the fourth read policy according to the IP data stream characteristics, steps S810 to S840 are performed.
S810: Query the tag RAM corresponding to the cache according to the access address.
S820: Determine whether the memory access command hits a cache line; if yes, perform step S830; if not, perform step S840.
S830: Read the target data in the cache line corresponding to the access address, and invalidate the hit cache line.
Specifically, after the data processing apparatus reads the target data in the cache line corresponding to the access address, the data in the cache line is discarded directly so that the line becomes an invalid cache line, and the state stored in the tag RAM corresponding to the cache line is adjusted to the invalid state; the data in the cache line therefore does not need to be written into the main memory.
S840: Read the target data corresponding to the access address from the main memory.
It should be noted that the fourth read strategy is generally applied to the DB mode. During data transfer between different IPs or different modules, the data is one-shot, so it can be discarded directly after being read, completing the invalidation of the hit cache line. This reduces the use of cache resources and avoids writing the dirty (modified) data back into main memory when the cache line is later replaced, thereby reducing accesses to main memory and improving performance.
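The only behavioral difference from the third read strategy is that no write-back occurs on a hit. A minimal sketch, with the same illustrative dictionary model and hypothetical names as before:

```python
def read_fourth_policy(addr, cache, main_memory):
    """Fourth read strategy: on a hit, read the one-shot data and discard
    the line outright -- no write-back to main memory. On a miss, read
    from main memory without allocating a line."""
    line = cache.pop(addr, None)      # hit -> line is invalidated immediately
    if line is not None:
        return line["data"]           # dirty data is NOT written to DRAM
    return main_memory[addr]          # miss: read main memory directly
```

Skipping the write-back is what saves the extra main-memory access compared with the release flow of the third read strategy.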
In a possible implementation manner, the memory access command is a write command, the write policy is a first write policy, and the data processing apparatus executes a process flow as shown in fig. 9. Referring to fig. 9, fig. 9 is a schematic diagram of a first write strategy processing procedure provided in an embodiment of the present application. After the data processing apparatus determines that the write strategy is the first write strategy according to the IP data stream characteristics, steps S910 to S960 are performed.
S910: Query the tag RAM corresponding to the cache line according to the access address.
S920: Determine whether the memory access command hits a cache line; if yes, perform step S930; if not, perform step S940.
S930: Write the target data into the cache line corresponding to the access address.
S940: Determine whether the cache line count configuration value is smaller than the cache line allocation capability configuration value; if yes, perform step S950; if not, perform step S960.
S950: Allocate a cache line, and write the target data into the allocated cache line.
S960: Do not allow allocation of a cache line.
Specifically, the allocation or replacement of the cache line is also performed based on the priority of the allocation mode corresponding to the indication information, and the specific process may refer to the descriptions of fig. 4 to 8, which are not repeated herein.
It should be noted that the first write strategy is generally used in the SPM mode, specifically for a write operation of a partial data volume, that is, where the target data volume to be written is a non-integer multiple of the cache line capacity. For example, if the capacity of each cache line is 32 bytes and the target data volume to be written is 18 bytes or 42 bytes, then after the cache line is allocated, it is not necessary to first read the existing data from the main memory and merge (merge) it with the target data to keep the data consistent; the target data can be written directly into the allocated cache line, thereby reducing accesses to the main memory and improving performance.
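The flow of steps S910 to S960, including the allocation-quota check, can be sketched as follows. The dictionary cache, the counter, and all names are illustrative assumptions, not part of the embodiment:

```python
def write_first_policy(addr, data, cache, line_count, alloc_limit):
    """First write strategy: on a hit, write into the hit line. On a miss,
    allocate a line only if the line count is below the allocation
    capability configuration value, then write the partial data directly
    (no read-merge from main memory). Returns (new_count, allocated_or_hit)."""
    if addr in cache:                          # S920/S930: hit
        cache[addr]["data"] = data
        return line_count, True
    if line_count < alloc_limit:               # S940/S950: allocate and write
        cache[addr] = {"data": data}           # no read from main memory
        return line_count + 1, True
    return line_count, False                   # S960: allocation refused
```

When the quota is exhausted, the command would fall through to main memory; that path is omitted here for brevity.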
In one possible implementation, the memory access command is a write command, the write policy is a second write policy, and the data processing apparatus executes a process flow as shown in fig. 10. Referring to fig. 10, fig. 10 is a schematic diagram of a second write strategy processing flow provided in an embodiment of the present application. After the data processing apparatus determines that the write strategy is the second write strategy according to the IP data stream characteristics, steps S1010-S1060 are performed.
S1010: Query the tag RAM corresponding to the cache line according to the access address.
S1020: Determine whether the memory access command hits a cache line; if yes, perform step S1030; if not, perform step S1040.
S1030: Write the target data into the cache line corresponding to the access address.
S1040: Determine whether the cache line count configuration value is smaller than the cache line allocation capability configuration value; if yes, perform step S1050; if not, perform step S1060.
S1050: Allocate a cache line, read and merge the target data with the data in the main memory, and write the merged data into the allocated cache line.
S1060: Do not allow allocation of a cache line.
Specifically, the allocation or replacement of the cache line is also performed based on the priority of the allocation mode corresponding to the indication information, and the specific process may refer to the descriptions of fig. 4 to 8, which are not repeated herein.
It should be noted that the only difference between the second write strategy and the first write strategy is that, for a write operation of a partial data volume, after the cache line is allocated, the existing data needs to be read from the main memory and merged with the target data before being written into the allocated cache line, so as to maintain the consistency of the data.
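The read-merge step of S1050 can be sketched as follows; the byte-level merge, the 32-byte line size, and all names are illustrative assumptions:

```python
def write_second_policy(addr, data, offset, cache, main_memory, line_size=32):
    """Second write strategy (miss path shown): on a partial write, first
    read the full line from main memory, merge the target data into it,
    and write the merged line into the (hit or newly allocated) cache line."""
    if addr in cache:
        line = bytearray(cache[addr]["data"])                    # hit: use cached line
    else:
        line = bytearray(main_memory.get(addr, bytes(line_size)))  # read from DRAM
    line[offset:offset + len(data)] = data                       # merge partial data
    cache[addr] = {"data": bytes(line)}                          # write into the line
    return cache[addr]["data"]
```

The merge preserves the bytes of the line that the partial write does not touch, which is exactly the consistency concern the strategy addresses.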
In one possible implementation, the memory access command is a write command, the write policy is a third write policy, and the data processing apparatus executes a process flow as shown in fig. 11. Referring to fig. 11, fig. 11 is a schematic diagram of a third write strategy processing flow provided in an embodiment of the present application. After the data processing apparatus determines that the write strategy is the third write strategy according to the IP data stream characteristics, steps S1110 to S1140 are performed.
S1110: Query the tag RAM corresponding to the cache according to the access address.
S1120: Determine whether the memory access command hits a cache line; if yes, perform step S1130; if not, perform step S1140.
S1130: Write the target data into the cache line corresponding to the access address.
S1140: Write the target data to the access address in the main memory.
The specific process of the third write strategy may refer to the descriptions of fig. 4 to 8 and is not repeated here.
It should be noted that, in the third write strategy, if the memory access command does not hit a cache line, allocation of a cache line is not allowed; the memory access command directly accesses the main memory, and the target data is written to the access address in the main memory. This reduces unnecessary allocation of cache lines, reduces power consumption, and improves performance.
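The no-allocate-on-miss behavior of steps S1110 to S1140 can be sketched as follows, again using an illustrative dictionary model with hypothetical names:

```python
def write_third_policy(addr, data, cache, main_memory):
    """Third write strategy: on a hit, update the cache line in place.
    On a miss, write straight to the access address in main memory
    without allocating a cache line."""
    if addr in cache:                  # S1120/S1130: hit
        cache[addr]["data"] = data
    else:                              # S1140: bypass the cache entirely
        main_memory[addr] = data
```

Because the miss path never touches the tag RAM state, streams configured with this strategy cannot pollute the cache with lines they will not reuse.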
The foregoing describes the methods of the embodiments of the present application in detail. To facilitate better implementation of the foregoing solutions, related apparatuses for implementing those solutions are correspondingly provided below.
Referring to fig. 12, fig. 12 is a schematic structural diagram of a processing apparatus according to an embodiment of the present application. As shown in fig. 12, the processing device 1200 includes a processing module 1210. Wherein,
a processing module 1210, configured to: receive a memory access command, where the memory access command includes an access address and indication information, and the indication information is used to indicate an IP data flow characteristic of an intelligent peripheral corresponding to the memory access command; determine a read strategy or a write strategy corresponding to the memory access command according to the IP data flow characteristic; and process the access address according to the read strategy or the write strategy.
In one possible implementation, the processing apparatus 1200 further includes a storage module 1220, configured to store a preset relationship between an IP data stream and a read policy and a preset relationship between an IP data stream and a write policy, and the processing module 1210 determines the read policy or the write policy corresponding to the memory access command according to the preset relationship between the IP data stream characteristics and the read policy and the preset relationship between the IP data stream and the write policy.
Optionally, the storage module 1220 is further configured to store tag information corresponding to each cache line and data in the cache line, where the storage module 1220 may be further configured to store a configuration value corresponding to the indication information, and specific content and application of the tag information and specific configuration content in the configuration register may refer to related descriptions of the foregoing method embodiments, which are not described herein again.
Optionally, the processing module 1210 is further configured to determine a cache line allocation policy and maintain a status of a cache line according to a configuration value corresponding to the indication information, and a specific process may refer to the related description of the above method embodiment, which is not described herein.
In this embodiment of the present application, the processing module 1210 determines the read policy or write policy corresponding to the memory access command by querying the preset relationship, stored in the storage module 1220, between the IP data flow characteristics and the read or write policies, and then executes different processing flows according to the different read or write policies, so as to meet the requirements of different IPs in different application scenarios, reduce the DRAM bandwidth consumed by IP accesses, reduce power consumption, and improve the working performance of the IP.
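The preset relationship between IP data stream characteristics and read/write policies can be sketched as a simple lookup table. The stream names below ("video_in", "display", "isp") and the table layout are purely hypothetical examples, not values from the embodiment:

```python
# Hypothetical preset relationship: stream characteristic -> (read, write) policy.
POLICY_TABLE = {
    "video_in": ("read_1", "write_1"),
    "display":  ("read_3", "write_3"),
    "isp":      ("read_4", "write_2"),
}

def select_policy(stream_characteristic, is_write):
    """Resolve the read or write policy for the indication information
    carried in a memory access command."""
    read_policy, write_policy = POLICY_TABLE[stream_characteristic]
    return write_policy if is_write else read_policy
```

In hardware this lookup would be realized by the configuration register queried with the indication information, rather than by a software dictionary.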
In one possible implementation, the memory access command is a read command; the reading strategy is a first reading strategy; the processing module 1210 is specifically configured to: judging whether the read command hits in the cache line, if yes, reading target data in the cache line corresponding to the access address, and if no, reading target data corresponding to the access address in the main memory.
In one possible implementation, the memory access command is a read command; the reading strategy is a second reading strategy; the processing module 1210 is specifically configured to: judging whether the read command hits the cache line, if yes, reading target data in the cache line corresponding to the access address, if not, distributing the cache line, reading the target data corresponding to the access address in the main memory, and writing the target data into the distributed cache line.
In one possible implementation, the memory access command is a read command; the read strategy is a third read strategy; the processing module 1210 is specifically configured to: judging whether the memory access command hits in a cache line, if yes, reading target data in the cache line corresponding to the access address, releasing the hit cache line, and if not, reading the target data corresponding to the access address in a main memory.
In one possible implementation, the memory access command is a read command; the read strategy is a fourth read strategy; the processing module 1210 is specifically configured to: judging whether the memory access command hits in a cache line, if yes, reading target data in the cache line corresponding to the access address, performing invalidation processing on the hit cache line, and if not, reading the target data corresponding to the access address in a main memory.
In one possible implementation, the memory access command is a write command; the write strategy is a first write strategy; the processing module 1210 is specifically configured to: judge whether the memory access command hits a cache line; if yes, write target data in the cache line corresponding to the access address; if not, allocate a cache line and write the target data in the allocated cache line, where the data amount corresponding to the target data is a non-integer multiple of the cache line capacity.
In one possible implementation, the memory access command is a write command; the write strategy is a second write strategy; the processing module 1210 is specifically configured to: judge whether the memory access command hits a cache line; if yes, write target data in the cache line corresponding to the access address; if not, allocate a cache line, read and merge the target data with the data in the main memory, and write the merged data in the allocated cache line, where the data amount corresponding to the target data is a non-integer multiple of the cache line capacity.
In one possible implementation, the memory access command is a write command; the write strategy is a third write strategy; the processing module 1210 is specifically configured to: judge whether the memory access command hits a cache line; if yes, write target data in the cache line corresponding to the access address; if not, write the target data to the access address in the main memory.
In one possible implementation, the processing module 1210 is further configured to allocate a cache line according to an IP data flow characteristic.
Optionally, the processing module 1210 determines a cache line allocation policy according to the IP data flow characteristics, and allocates the cache line according to the determined allocation policy.
It should be understood that the structure of the processing apparatus and the processing procedure for the memory access command described above are only examples, and should not be construed as a specific limitation, and each module in the processing apparatus may be added, reduced or combined as needed. In addition, operations and/or functions of each module in the processing apparatus are respectively for implementing corresponding flows of each method in fig. 1 to 11, and are not described herein for brevity.
Referring to fig. 13, fig. 13 is a schematic structural diagram of yet another processing apparatus according to an embodiment of the present application. The processing device shown in fig. 13 is further refined with respect to the processing device shown in fig. 12. As shown in fig. 13, the processing device 1300 may be a cache memory on a SoC, including: slave interface 1310, tag controller 1320, configuration register 1330, tag random access memory 1340, data controller 1350, data random access memory 1360, and master interface 1370.
It should be appreciated that the functions performed by the processing module 1210 may be performed by the slave interface 1310, the tag controller 1320, the data controller 1350 and the master interface 1370, respectively, and the functions performed by the storage module 1220 may be performed by the configuration register 1330, the tag random access memory 1340 and the data random access memory 1360, respectively.
The slave interface 1310 receives a memory access command and data from an IP through the system bus, the memory access command including an access address and indication information; the slave interface 1310 forwards the memory access command to the tag controller 1320 and forwards the data to the data controller 1350. The configuration register 1330 stores the cache line allocation policy configuration corresponding to each piece of indication information, and the tag controller 1320 queries the configuration value in the configuration register 1330 according to the indication information in the memory access command, and determines the read policy or write policy corresponding to the memory access command and the cache line allocation policy corresponding to the indication information. Tag information corresponding to each cache line, including address information and status information of the cache line, is stored in the tag random access memory 1340; the tag controller 1320 maintains the status of the cache lines by reading and writing the tag random access memory 1340, and forwards a query completion command to the data controller 1350 after completing the query of the tag random access memory 1340. The data controller 1350 receives the query completion command forwarded by the tag controller 1320 and the data from the slave interface 1310, and completes the read and write operations on the data random access memory 1360; the data in the cache lines is stored in the data random access memory 1360. The data controller 1350 forwards commands and data that require access to the memory management controller (DRAM memory controller, DMC) to the master interface 1370. The master interface 1370 receives the commands and data from the data controller 1350 and forwards them to the memory management controller.
It should be understood that, for the specific processing procedure of the processing device shown in fig. 13 for the memory access command, reference may be made to the related descriptions of the respective method flows in fig. 1 to 11, which are not repeated herein for brevity.
An embodiment of the present invention further provides a computer storage medium. The computer storage medium may store a program, and when executed, the program performs some or all of the steps of any one of the foregoing method embodiments.
The embodiments of the present invention also provide a computer program comprising instructions which, when executed by a computer, cause the computer to perform part or all of the steps of any one of the processing methods.
In the foregoing embodiments, the descriptions of the embodiments are emphasized, and for parts of one embodiment that are not described in detail, reference may be made to the related descriptions of other embodiments.
It should be noted that, for simplicity of description, the foregoing method embodiments are all expressed as a series of action combinations, but it should be understood by those skilled in the art that the present application is not limited by the order of actions described, as some steps may be performed in other order or simultaneously in accordance with the present application. Further, those skilled in the art will also appreciate that the embodiments described in the specification are all preferred embodiments, and that the acts and modules referred to are not necessarily required in the present application.
In the several embodiments provided in this application, it should be understood that the disclosed apparatus may be implemented in other manners. For example, the apparatus embodiments described above are merely illustrative; the division of the units is merely a division of logical functions, and there may be other division manners in actual implementation. For example, multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. In addition, the mutual couplings, direct couplings, or communication connections shown or discussed may be indirect couplings or communication connections through some interfaces, apparatuses, or units, and may be in electrical or other forms.
The units described above as separate components may or may not be physically separate, and components shown as units may or may not be physical units, may be located in one place, or may be distributed over a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional unit in the embodiments of the present application may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units may be implemented in hardware or in software functional units.
The integrated units described above, if implemented in the form of software functional units and sold or used as stand-alone products, may be stored in a computer-readable storage medium. Based on such an understanding, the technical solution of the present application essentially, or the part contributing to the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, or a network device, and in particular may be a processor in the computer device) to perform all or part of the steps of the methods of the embodiments of the present application. The aforementioned storage medium may include any medium capable of storing program code, such as a USB flash drive, a removable hard disk, a magnetic disk, an optical disc, a read-only memory (Read-Only Memory, ROM), or a random access memory (Random Access Memory, RAM).

Claims (24)

1. A method of processing, comprising:
receiving a memory access command, wherein the memory access command comprises an access address and indication information, and the indication information is used for indicating the IP data flow characteristics of the intelligent peripheral corresponding to the memory access command;
Determining a read strategy or a write strategy corresponding to the memory access command according to the IP data flow characteristics;
and processing the access address according to the read strategy or the write strategy.
2. The method of claim 1, wherein determining a read policy or a write policy corresponding to the memory access command based on the IP data flow characteristics comprises:
and determining the read strategy or the write strategy corresponding to the memory access command according to the preset relation between the IP data flow characteristics and the read strategy or the write strategy.
3. The method of claim 1 or 2, wherein the memory access command is a read command; the read strategy is a first read strategy comprising:
judging whether the read command hits in the cache line, if yes, reading target data in the cache line corresponding to the access address, and if no, reading target data corresponding to the access address in the main memory.
4. The method of claim 1 or 2, wherein the memory access command is a read command; the read strategy is a second read strategy, the second read strategy comprising:
judging whether the read command hits the cache line, if yes, reading target data in the cache line corresponding to the access address, if not, allocating a cache line, reading the target data corresponding to the access address in the main memory, and writing the target data into the allocated cache line.
5. The method of claim 1 or 2, wherein the memory access command is a read command; the read strategy is a third read strategy, the third read strategy comprising:
judging whether the read command hits in a cache line, if yes, reading target data in the cache line corresponding to the access address, releasing the hit cache line, and if no, reading the target data corresponding to the access address in a main memory.
6. The method of claim 1 or 2, wherein the memory access command is a read command; the read strategy is a fourth read strategy, the fourth read strategy comprising:
judging whether the read command hits in the cache line, if yes, reading target data in the cache line corresponding to the access address, and performing invalidation processing on the hit cache line, and if not, reading the target data corresponding to the access address in the main memory.
7. The method of claim 1 or 2, wherein the memory access command is a write command; the write strategy is a first write strategy comprising:
judging whether the write command hits a cache line, if yes, writing target data in the cache line corresponding to the access address, if not, allocating a cache line, and writing the target data in the allocated cache line, wherein the data amount corresponding to the target data is a non-integer multiple of the capacity of the cache line.
8. The method of claim 1 or 2, wherein the memory access command is a write command; the write strategy is a second write strategy comprising:
judging whether the write command hits a cache line, if yes, writing target data in the cache line corresponding to the access address, if not, allocating a cache line, reading and merging the target data in a main memory, and writing the merged target data in the allocated cache line, wherein the data amount corresponding to the target data is a non-integer multiple of the capacity of the cache line.
9. The method of claim 1 or 2, wherein the memory access command is a write command; the write strategy is a third write strategy, the third write strategy comprising:
judging whether the write command hits in the cache line, if yes, writing target data in the cache line corresponding to the access address, and if no, writing the target data to the access address in the main memory.
10. The method of claim 4, 7 or 8, wherein the allocating the cache line comprises: allocating the cache line according to the IP data flow characteristics.
11. The method of claim 10, wherein said allocating said cache line according to said IP data flow characteristics comprises:
determining an allocation strategy of the cache line according to the IP data flow characteristics; and
allocating the cache line according to the allocation strategy.
12. A processing apparatus, comprising: a slave interface, a tag controller and a data controller; wherein the slave interface, the tag controller and the data controller are connected with each other,
the slave interface is configured to receive a memory access command, where the memory access command includes an access address and indication information, and the indication information is used to indicate an IP data flow characteristic of the intelligent peripheral corresponding to the memory access command;
the tag controller is configured to determine a read policy or a write policy corresponding to the memory access command according to the IP data flow characteristic;
and the data controller is used for processing the access address according to the read strategy or the write strategy.
13. The processing apparatus according to claim 12, further comprising a configuration register for storing a preset relationship between the IP data flow characteristic and the read policy or the write policy, wherein the tag controller determines the read policy or the write policy corresponding to the memory access command according to the preset relationship.
14. The processing apparatus according to claim 12 or 13, wherein the processing apparatus further comprises a tag random access memory and a data random access memory, the tag random access memory being configured to store tag information of a cache line; the data random access memory is used for storing data of the cache line; when the memory access command is a read command, the read policy is a first read policy,
the tag controller is specifically configured to query the tag random access memory according to the first read policy, and determine whether the read command hits in a cache line; if hit, the data controller reads the target data in the data random access memory, and if miss, the data controller reads the target data corresponding to the access address in the main memory.
15. The processing apparatus according to claim 12 or 13, wherein the processing apparatus further comprises a tag random access memory and a data random access memory, the tag random access memory being configured to store tag information of a cache line; the data random access memory is used for storing data of the cache line; when the memory access command is a read command, the read policy is a second read policy,
the tag controller is specifically configured to query the tag random access memory according to the second read policy, and determine whether the read command hits in a cache line; if hit, the data controller reads the target data from the data random access memory, if miss, the data controller allocates a cache line, reads the target data corresponding to the access address from the main memory, and writes the target data into the allocated cache line.
16. The processing apparatus according to claim 12 or 13, wherein the processing apparatus further comprises a tag random access memory and a data random access memory, the tag random access memory being configured to store tag information of a cache line; the data random access memory is used for storing data of the cache line; when the memory access command is a read command, the read policy is a third read policy,
the tag controller is specifically configured to query the tag random access memory according to the third read policy, and determine whether the read command hits in a cache line; and if the target data is hit, the data controller reads the target data from the data random access memory, releases the hit cache line, and if the target data is not hit, the data controller reads the target data corresponding to the access address from the main memory.
17. The processing apparatus according to claim 12 or 13, wherein the processing apparatus further comprises a tag random access memory and a data random access memory, the tag random access memory being configured to store tag information of a cache line; the data random access memory is used for storing data of the cache line; when the memory access command is a read command, the read policy is a fourth read policy,
The tag controller is specifically configured to query the tag random access memory according to the fourth read policy, and determine whether the read command hits in a cache line; if hit, the data controller reads the target data from the data random access memory, and carries out invalidation processing on the hit cache line, if miss, the data controller reads the target data corresponding to the access address from the main memory.
18. The processing apparatus according to claim 12 or 13, wherein the processing apparatus further comprises a tag random access memory and a data random access memory, the tag random access memory being configured to store tag information of a cache line; the data random access memory is used for storing data of the cache line; when the memory access command is a write command, the write strategy is a first write strategy;
the tag controller is specifically configured to query the tag random access memory according to the first write policy, and determine whether the write command hits a cache line; if hit, the data controller writes target data in the data random access memory, if miss, the data controller allocates a cache line, and writes the target data in the allocated cache line, wherein the data amount corresponding to the target data is a non-integer multiple of the capacity of the cache line.
19. The processing apparatus according to claim 12 or 13, wherein the processing apparatus further comprises a tag random access memory and a data random access memory, the tag random access memory being configured to store tag information of a cache line; the data random access memory is used for storing data of the cache line; when the memory access command is a write command, the write strategy is a second write strategy;
the tag controller is specifically configured to query the tag random access memory according to the second write policy, and determine whether the write command hits a cache line; if hit, the data controller writes the target data in the data random access memory, if miss, the data controller allocates a cache line, reads and merges the target data in the main memory, and writes the target data in the cache line obtained by allocation, wherein the data amount corresponding to the target data is a non-integer multiple of the capacity of the cache line.
20. The processing apparatus according to claim 12 or 13, wherein the processing apparatus further comprises a tag random access memory and a data random access memory, the tag random access memory being configured to store tag information of a cache line and the data random access memory being configured to store data of the cache line; when the memory access command is a write command, the write policy is a third write policy;
the tag controller is specifically configured to query the tag random access memory according to the third write policy and determine whether the write command hits a cache line; if it hits, the data controller writes the target data into the data random access memory; if it misses, the data controller writes the target data to the access address in the main memory.
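The third write policy is a no-allocate (write-around) scheme: a hit updates the cached copy, while a miss bypasses the cache and writes straight to main memory. A minimal illustrative sketch, with the same assumed structures as above:

```python
# Sketch of a write-no-allocate policy: misses go directly to main memory
# and no cache line is allocated for them.

class WriteNoAllocCache:
    LINE = 64  # assumed cache-line size in bytes

    def __init__(self, main_memory):
        self.main_memory = main_memory  # dict: line address -> bytearray(LINE)
        self.tag_ram = {}               # line address -> valid flag
        self.data_ram = {}              # line address -> bytearray(LINE)

    def write(self, addr, data):
        line = addr - (addr % self.LINE)
        off = addr % self.LINE
        if self.tag_ram.get(line):      # hit: update the cached copy
            self.data_ram[line][off:off + len(data)] = data
        else:                           # miss: bypass the cache entirely
            self.main_memory[line][off:off + len(data)] = data
```

This keeps write-once traffic from evicting lines that other IPs are reusing, at the cost of slower writes on a miss.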
21. The processing apparatus of claim 15, 18 or 19, wherein the data controller allocating the cache line comprises: the data controller allocating the cache line according to IP data flow characteristics.
22. The processing apparatus as in claim 21 wherein the data controller allocating the cache line based on the IP data flow characteristics comprises:
the data controller determines the allocation strategy of the cache line according to the IP data flow characteristics;
the data controller allocates the cache line according to the allocation policy.
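Claims 21 and 22 describe selecting an allocation policy from the data-flow characteristics of the issuing IP. The mapping below is purely illustrative: the characteristic names (`streaming`, `reuse_distance`), the thresholds, and the policy names are all assumptions, not taken from the patent.

```python
# Sketch of policy selection driven by IP data-flow characteristics:
# the data controller first picks a policy, then allocates accordingly.

def pick_allocation_policy(ip_flow):
    """Map an IP's data-flow characteristics (a dict) to a policy name."""
    if ip_flow.get("streaming"):        # one-pass data: avoid cache pollution
        return "no-allocate"
    if ip_flow.get("reuse_distance", float("inf")) < 1000:
        return "allocate-lru"           # data reused soon: keep it resident
    return "allocate-random"            # fallback for unknown access patterns
```

The two-step structure (determine the policy, then allocate under it) mirrors the split the claim makes between choosing the allocation strategy and performing the allocation.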
23. A semiconductor chip, comprising:
the processing device of any of claims 12 to 22 and a central processing unit coupled to the processing device.
24. A terminal device, comprising:
The processing device according to any one of claims 12 to 22, and a memory external to the processing device, wherein the processing device and the external memory are provided in different semiconductor chips.
CN201980092068.8A 2019-03-30 2019-03-30 Processing method, processing device and related equipment Active CN113424160B (en)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2019/080699 WO2020199061A1 (en) 2019-03-30 2019-03-30 Processing method and apparatus, and related device

Publications (2)

Publication Number Publication Date
CN113424160A CN113424160A (en) 2021-09-21
CN113424160B true CN113424160B (en) 2024-01-30

Family

ID=72664880

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201980092068.8A Active CN113424160B (en) 2019-03-30 2019-03-30 Processing method, processing device and related equipment

Country Status (2)

Country Link
CN (1) CN113424160B (en)
WO (1) WO2020199061A1 (en)

Families Citing this family (19)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112256604B (en) * 2020-10-19 2022-07-08 海光信息技术股份有限公司 Direct memory access system and method
CN112684981B (en) * 2020-12-23 2023-12-22 北京浪潮数据技术有限公司 Method, system and device for recording read operation of solid state disk and readable storage medium
CN112559573B (en) * 2020-12-24 2024-04-16 京东科技控股股份有限公司 Data caching method, device, equipment and computer readable medium
CN113791989B (en) * 2021-09-15 2023-07-14 深圳市中科蓝讯科技股份有限公司 Cache-based cache data processing method, storage medium and chip
CN113778693B (en) * 2021-11-12 2022-02-08 北京壁仞科技开发有限公司 Cache operation method, cache operation device, electronic equipment and processor
CN114036077B (en) * 2021-11-17 2022-10-21 海光信息技术股份有限公司 Data processing method and related device
CN114036089B (en) * 2021-11-17 2022-10-14 海光信息技术股份有限公司 Data processing method, device, register, processor and electronic device
CN114036084B (en) * 2021-11-17 2022-12-06 海光信息技术股份有限公司 Data access method, shared cache, chip system and electronic equipment
CN114116530B (en) * 2021-12-06 2022-09-13 海光信息技术股份有限公司 Storage control method and device, data processing method and device, and storage medium
CN114745554B (en) * 2022-03-30 2024-12-27 瑞芯微电子股份有限公司 Display method and device based on SoC cascade, electronic device and storage medium
CN115168247B (en) * 2022-09-02 2022-12-02 北京登临科技有限公司 Method for dynamically sharing memory space in parallel processor and corresponding processor
CN118113455A (en) * 2022-11-29 2024-05-31 华为技术有限公司 Memory access method and related device
CN115878507B (en) * 2023-01-19 2023-07-21 北京象帝先计算技术有限公司 System-on-chip memory access method, device and electronic equipment
CN116303126B (en) * 2023-03-22 2023-09-01 摩尔线程智能科技(北京)有限责任公司 Cache, data processing method and electronic device
CN116932424B (en) * 2023-09-14 2023-12-15 上海芯联芯智能科技有限公司 Cache access method, device, medium and equipment based on ECC detection
CN118502925B (en) * 2024-07-17 2024-10-15 山东浪潮科学研究院有限公司 GPU cache access method and device
CN119127731B (en) * 2024-11-15 2025-03-04 中科亿海微电子科技(苏州)有限公司 A monitoring filter
CN119415048A (en) * 2025-01-07 2025-02-11 芯来智融半导体科技(上海)有限公司 Data writing method, device, equipment and medium
CN119473630A (en) * 2025-01-09 2025-02-18 鼎道智芯(上海)半导体有限公司 Resource scheduling method, resource access method and electronic equipment

Citations (7)

Publication number Priority date Publication date Assignee Title
CN101751245A (en) * 2010-01-18 2010-06-23 北京龙芯中科技术服务中心有限公司 Processor Cache write-in invalidation processing method based on memory access history learning
CN104252422A (en) * 2013-06-26 2014-12-31 华为技术有限公司 Memory access method and memory controller
CN104360955A (en) * 2014-12-08 2015-02-18 山东工商学院 Application independent caching system and application independent caching method
CN104360825A (en) * 2014-11-21 2015-02-18 浪潮(北京)电子信息产业有限公司 Hybrid internal memory system and management method thereof
CN105243026A (en) * 2014-05-30 2016-01-13 展讯通信(上海)有限公司 Memory access control method and apparatus for terminal device
CN105426324A (en) * 2014-05-29 2016-03-23 展讯通信(上海)有限公司 Memory access control method and apparatus of terminal device
WO2017185375A1 (en) * 2016-04-29 2017-11-02 华为技术有限公司 Method for data access and memory controller

Family Cites Families (6)

Publication number Priority date Publication date Assignee Title
US8868867B2 (en) * 2011-09-15 2014-10-21 The Regents Of The University Of California Method for reducing latency of accessing data stored in a file system on a computer storage device by caching file system permission information in the computer storage device
CN103309762B (en) * 2013-06-21 2015-12-23 杭州华三通信技术有限公司 Unit exception disposal route and device
CN106445398B (en) * 2015-08-04 2019-05-31 深圳市中兴微电子技术有限公司 A kind of embedded file system and its implementation based on novel memory devices
CN106502926B (en) * 2016-09-26 2019-11-19 华为技术有限公司 A kind of internal memory monitoring method, internal storage access controller and SoC system
US10719448B2 (en) * 2017-06-13 2020-07-21 Alibaba Group Holding Limited Cache devices with configurable access policies and control methods thereof
CN108388529B (en) * 2018-01-26 2021-03-09 武汉中元华电电力设备有限公司 Method for actively realizing data exchange between peripheral and CPU

Non-Patent Citations (1)

Title
A Cache Strategy for Accelerating Read and Write Access in Wide-Area File Systems; Ma Liuying; Cai Jieming; Liu Liu; Liu Zhenjun; Journal of Computer Research and Development (Issue S1); full text *

Also Published As

Publication number Publication date
WO2020199061A1 (en) 2020-10-08
CN113424160A (en) 2021-09-21

Similar Documents

Publication Publication Date Title
CN113424160B (en) Processing method, processing device and related equipment
US11531617B2 (en) Allocating and accessing memory pages with near and far memory blocks from heterogenous memories
CN105740164B (en) Multi-core processor supporting cache consistency, reading and writing method, device and equipment
US10248576B2 (en) DRAM/NVM hierarchical heterogeneous memory access method and system with software-hardware cooperative management
US9218286B2 (en) System cache with partial write valid states
US9158685B2 (en) System cache with cache hint control
KR102290464B1 (en) System-on-chip and address translation method thereof
JP6859361B2 (en) Performing memory bandwidth compression using multiple Last Level Cache (LLC) lines in a central processing unit (CPU) -based system
US20110161597A1 (en) Combined Memory Including a Logical Partition in a Storage Memory Accessed Through an IO Controller
US9043570B2 (en) System cache with quota-based control
US7809889B2 (en) High performance multilevel cache hierarchy
US20140089600A1 (en) System cache with data pending state
CN112997161A (en) Method and apparatus for using storage system as main memory
CN106383792A (en) Missing perception-based heterogeneous multi-core cache replacement method
CN115481054A (en) Data processing method, device and system, system-level SOC chip and computer equipment
US8977817B2 (en) System cache with fine grain power management
US9396122B2 (en) Cache allocation scheme optimized for browsing applications
US9311251B2 (en) System cache with sticky allocation
US20150100739A1 (en) Enhancing Lifetime of Non-Volatile Cache by Injecting Random Replacement Policy
CN113138851B (en) Data management method, related device and system
JP5976225B2 (en) System cache with sticky removal engine
CN116775560B (en) Write allocation methods, caching systems, systems on a chip, electronic components and electronic devices
EP3506112A1 (en) Multi-level system memory configurations to operate higher priority users out of a faster memory level
US11422935B2 (en) Direct mapping mode for associative cache
US11954037B2 (en) Memory address allocation and cache mapping to retain data in cache

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant