CN119357084A - Method and computer program product for adaptively combining cache coherent directory entries - Google Patents

Info

Publication number
CN119357084A
CN119357084A CN202411908731.6A CN202411908731A CN119357084A CN 119357084 A CN119357084 A CN 119357084A CN 202411908731 A CN202411908731 A CN 202411908731A CN 119357084 A CN119357084 A CN 119357084A
Authority
CN
China
Prior art keywords
entry
directory
processor core
target
entries
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202411908731.6A
Other languages
Chinese (zh)
Other versions
CN119357084B (en)
Inventor
张�诚
李锐喆
孙超
赵彤
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Carpura Technology Co ltd
Original Assignee
Beijing Carpura Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Carpura Technology Co ltd
Priority to CN202411908731.6A
Publication of CN119357084A
Application granted
Publication of CN119357084B
Legal status: Active
Anticipated expiration

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00Accessing, addressing or allocating within memory systems or architectures
    • G06F12/02Addressing or allocation; Relocation
    • G06F12/08Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
    • G06F12/0802Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
    • G06F12/0891Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches using clearing, invalidating or resetting means
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00Accessing, addressing or allocating within memory systems or architectures
    • G06F12/02Addressing or allocation; Relocation
    • G06F12/08Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
    • G06F12/0802Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
    • G06F12/0806Multiuser, multiprocessor or multiprocessing cache systems
    • G06F12/084Multiuser, multiprocessor or multiprocessing cache systems with a shared cache

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Memory System Of A Hierarchy Structure (AREA)

Abstract

The present invention relates to the field of computers and discloses an adaptive combination method and computer program product for cache coherence directory entries. The method comprises: in response to a read request from a processor core, obtaining a first set of entries corresponding to a cache block; when the first set of entries contains an available entry, saving the number information of the processor core in the available entry; when the first set of entries contains no entries or no available entry, requesting a new entry and saving the number information in the new entry; in response to a write request from the processor core, obtaining a second set of entries corresponding to the cache block; determining the corresponding processor cores from the number information saved in each entry of the second set of entries, and controlling each of those processor cores to invalidate its copy of the cache block; and reducing the second set of entries to a single entry and saving the number information in that entry. The method makes full use of the directory capacity and can reduce snoop operations that target all processor cores.

Description

Adaptive combination method and computer program product for cache coherence directory entries
Technical Field
The present disclosure relates to the field of computers, and in particular to an adaptive combination method and computer program product for cache coherence directory entries.
Background
Each processor core on a modern many-core CPU has a cache to speed up data access while keeping programming simple. When a parallel program runs, multiple processes or threads running on multiple processor cores may read and write data in the same shared memory region. To ensure that the parallel program runs correctly, cache coherence must be provided between the processor cores of one CPU and between the CPUs of one computing node, so that the data of the same cache block (cache line) stays consistent across the private caches of the processor cores in that node.
The prior art mainly comprises two kinds of cache coherence protocol: snoop-based coherence protocols (snoop protocols for short) and directory-based coherence protocols (directory protocols for short). They are characterized as follows:
The snoop protocol relies on a network connection, such as a bus, over which a request sent by the private cache of one processor core is broadcast to the private caches of all other processor cores in the system. The memory access requests of all processor cores are also ordered on the bus, which satisfies the ordering requirements that the cache coherence model and the memory consistency model place on memory accesses and resolves conflicting requests for the same data block.
The directory protocol uses a directory structure to manage the shared access state of each cache block (cache line). In this protocol, a memory access request sent by the private cache of a processor core is first sent to the directory structure responsible for the corresponding cache block. The directory structure records the current sharing state of the cache block, and a controller decides from that state whether the request is answered by the private cache of a processor core or by memory.
The snoop protocol and the directory protocol each have advantages and disadvantages. The snoop protocol is cheap to implement in hardware and has low running power consumption, but because all processor cores compete for access to the bus and are answered in order, it easily becomes a bottleneck for parallel performance. The directory protocol lets coherence maintenance for cache blocks at different addresses proceed in parallel, so cache coherence can be implemented efficiently; however, recording the shared access state of a large number of cache blocks costs hardware and power, and looking up the record of a cache block in the directory introduces additional latency.
A single CPU now contains up to hundreds of cores (AMD has released a commercial CPU with 192 cores), and a computing node of a supercomputer typically has at least two CPUs, so cache coherence must be supported among nearly 400 or more processor cores. Neither a pure snoop protocol nor a pure directory protocol copes well with that many processor cores. Many many-core CPUs therefore mix the directory and snoop approaches, a technique known as snoop filtering (snoop filter). With snoop filtering, one directory entry can manage the sharing state of several cache blocks with consecutive addresses, and the sharing state may be recorded imprecisely, for example as the processor core numbers of the owners of one or a few copies of the cache block plus the number of currently valid shared copies. When the sharing state of a cache block is not recorded precisely, a single cache coherence operation may need to snoop all processor cores. For example, if a directory entry can record the numbers of at most two owner cores of a cache block, and a cache block is read-shared by three or more cores when one core is about to modify it, then almost all processor cores on the CPU must be snooped.
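The two-owner example above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the behavior of any particular CPU: the class name `ImpreciseEntry`, the constant `NUM_CORES = 192`, and the rule "snoop every other core when the copy count exceeds the recorded owners" are hypothetical choices made for the sketch.

```python
from dataclasses import dataclass, field

NUM_CORES = 192   # assumed core count, matching the 192-core CPU mentioned above
MAX_OWNERS = 2    # the entry can name at most two owner cores


@dataclass
class ImpreciseEntry:
    """Hypothetical snoop-filter entry: two owner slots plus a copy counter."""
    owners: list = field(default_factory=list)  # recorded owner core numbers
    valid_copies: int = 0                       # number of currently valid copies

    def add_sharer(self, core: int) -> None:
        self.valid_copies += 1
        if len(self.owners) < MAX_OWNERS:
            self.owners.append(core)            # later sharers are only counted

    def snoop_targets(self, writer: int) -> list:
        """Cores that must be snooped before `writer` modifies the block."""
        if self.valid_copies <= len(self.owners):
            # Every sharer is named in the entry: snoop only those cores.
            return sorted(set(self.owners) - {writer})
        # Some sharers are unrecorded: fall back to snooping every other core.
        return [c for c in range(NUM_CORES) if c != writer]


entry = ImpreciseEntry()
entry.add_sharer(3)
entry.add_sharer(7)
print(entry.snoop_targets(writer=3))        # only core 7 is snooped
entry.add_sharer(100)                       # third sharer cannot be recorded
print(len(entry.snoop_targets(writer=3)))   # every other core is snooped
```

With two sharers the write costs one snoop; with a third, unrecorded sharer it costs a snoop of all other cores, which is the overhead the invention aims to reduce.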
Under either the directory protocol or the hybrid approach, the number of directory entries usually has to match the total number of cache blocks in the private caches of all processor cores. For example, when each directory entry records the sharing state of one cache block, the number of directory entries should be no less than the total number of cache blocks in the private caches of all processor cores; otherwise, cache blocks may be swapped out of private caches because directory entries run out. Making each directory entry record the sharing state of several cache blocks with consecutive addresses reduces the number of directory entries and the hardware overhead, but when the memory access behavior of the parallel program is highly random, the directory entries are likely to fall far short. Directory entries on a CPU have therefore rarely been in surplus.
In parallel programs, most data accesses by each thread are to private variables, and some methods that avoid redundant cache coherence operations have already been disclosed. However, these methods let accesses to private variables bypass the cache coherence protocol, so directory entries that were previously scarce become free; the more redundant cache coherence operations are eliminated, the more directory entries are left free.
In the future, directories are therefore likely to have considerable spare capacity while each individual directory entry still cannot record cache block sharing precisely. Coherence operations then often have to snoop almost all processor cores, even though the directory capacity is not fully used.
Disclosure of Invention
To overcome the shortcomings described in the background and to reduce snoop operations that target all processor cores, the invention provides an adaptive combination method for cache coherence directory entries, an adaptive combination system for cache coherence directory entries, and a computer program product.
At least one embodiment of the present application provides an adaptive combination method for cache coherence directory entries, the method comprising:
in response to a request from a current processor core for read ownership of a current cache block, acquiring, from a preset directory, a first target directory entry set corresponding to the address of the current cache block;
when the first target directory entry set contains a target entry that meets a preset free-space condition, storing the number information of the current processor core in the target entry;
when the first target directory entry set contains no entries, or contains no target entry with free space for the number information of the current processor core, requesting a new entry from the preset directory and, if the request succeeds, storing the number information of the current processor core in the new entry and adding the new entry to the first target directory entry set.
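The read-ownership steps above can be sketched as follows. This is a minimal model under assumptions, not the claimed implementation: the class `Directory`, the constants `ENTRY_CAPACITY` and `DIRECTORY_SIZE`, and the representation of an entry as a plain list of core numbers are hypothetical, and the check for an already recorded core is an addition not stated in the text.

```python
ENTRY_CAPACITY = 2   # assumed: how many core numbers one entry can hold
DIRECTORY_SIZE = 4   # assumed: total number of entries in the directory


class Directory:
    """Hypothetical directory mapping a block address to a set of entries."""

    def __init__(self, size: int = DIRECTORY_SIZE):
        self.free_entries = size
        self.sets = {}   # block address -> list of entries (lists of core numbers)

    def allocate(self):
        """Request a new entry; return None when the directory is full."""
        if self.free_entries == 0:
            return None
        self.free_entries -= 1
        return []

    def handle_read(self, addr: int, core: int) -> bool:
        """Record `core` as a sharer of `addr`; False if no entry is available."""
        entry_set = self.sets.setdefault(addr, [])
        # Assumption (not in the text): a core already recorded is not added twice.
        if any(core in entry for entry in entry_set):
            return True
        # An entry with free space exists: store the core number there.
        for entry in entry_set:
            if len(entry) < ENTRY_CAPACITY:
                entry.append(core)
                return True
        # No entries, or none with free space: request a new entry.
        new_entry = self.allocate()
        if new_entry is None:
            return False
        new_entry.append(core)
        entry_set.append(new_entry)
        return True


directory = Directory()
for core in (1, 2, 3):
    directory.handle_read(0x40, core)
print(directory.sets[0x40])   # the set has grown to two entries
```

The entry set for one address grows only when the existing entries are full, so the directory's spare capacity is consumed on demand.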
At least one embodiment of the present application also provides an adaptive combination system for cache coherence directory entries, the system comprising:
a first target directory entry set determining module, configured to acquire, in response to a request from the current processor core for read ownership of the current cache block, a first target directory entry set corresponding to the address of the current cache block from a preset directory;
a first saving module, configured to store the number information of the current processor core in a target entry when the first target directory entry set contains a target entry that meets the preset free-space condition;
a second saving module, configured to request a new entry from the preset directory when the first target directory entry set contains no entries or contains no target entry, and, if the request succeeds, to store the number information of the current processor core in the new entry and add the new entry to the first target directory entry set.
At least one embodiment of the present application also provides an electronic device comprising a memory, a processor and a computer program stored on the memory, the processor executing the computer program to carry out the steps of the method as described above.
At least one embodiment of the present application also provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of the method as described above.
At least one embodiment of the present application also provides a computer program product comprising a computer program which, when executed by a processor, implements the steps of the method as described above.
The embodiments of the application provide an adaptive combination method for cache coherence directory entries, an adaptive combination system for cache coherence directory entries, and a computer program product. Compared with the prior art, the method makes full use of the directory capacity and can reduce snoop operations that target all processor cores; and because the entries of a target directory entry set are kept in order according to a preset storage policy, looking up entries in the target directory entry set is more efficient.
In some alternative embodiments, the method further comprises:
in response to a request from the current processor core for write ownership of the current cache block, acquiring, from the preset directory, a second target directory entry set corresponding to the address of the current cache block;
determining the corresponding processor cores from the processor core number information stored in each entry of the second target directory entry set, and issuing an invalidate-copy instruction so that each of those processor cores invalidates its copy of the current cache block;
issuing an entry update instruction so that the second target directory entry set contains only one initial directory entry, and storing the number information of the current processor core in the initial directory entry.
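The write-ownership steps above can be sketched as a single function. This is a hedged illustration: the function name `handle_write`, the `invalidate` callback, and the list-of-lists representation of an entry set are hypothetical, and skipping the writer's own core when invalidating is an assumption not stated in the text.

```python
def handle_write(sets: dict, addr: int, writer: int, invalidate):
    """Give `writer` write ownership of `addr`.

    sets:       block address -> list of entries (each a list of core numbers)
    invalidate: callback (core, addr) standing in for the invalidate-copy instruction
    Returns the invalidated cores and the number of entries released.
    """
    entry_set = sets.get(addr, [])
    # Collect every core recorded in any entry of the set.
    # Assumption: the writer's own copy is not invalidated.
    targets = sorted({c for entry in entry_set for c in entry if c != writer})
    for core in targets:
        invalidate(core, addr)
    # Collapse the set to one initial entry that records only the writer.
    released = max(len(entry_set) - 1, 0)
    sets[addr] = [[writer]]
    return targets, released


sets = {0x40: [[1, 2], [3]]}
log = []
targets, released = handle_write(sets, 0x40, 2, lambda core, addr: log.append(core))
print(targets, released, sets[0x40])
```

Because the recorded sharers are exact, only the cores named in the entries are contacted, and the surplus entries return to the directory for use by other cache blocks.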
In some alternative embodiments, the method further comprises:
in response to a swap-out request from the current processor core for the current cache block, looking up in the preset directory a target directory entry that corresponds to the address of the current cache block and stores the number information of the current processor core;
when the target directory entry exists, deleting the number information of the current processor core from the target directory entry.
In some optional embodiments, after the deleting of the number information of the current processor core from the target directory entry, the method further comprises:
deleting, from the target directory entry set that holds the target directory entry, any entry that meets a preset condition, wherein the preset condition comprises that the entry stores no number information of any processor core.
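The swap-out handling and the removal of empty entries can be sketched as follows. This is an illustrative sketch only: the function name `handle_swap_out` and the list-of-lists representation are hypothetical, and dropping the address from the map once its set is empty is an assumption made for tidiness.

```python
def handle_swap_out(sets: dict, addr: int, core: int) -> int:
    """Remove `core` from the entry set of `addr`.

    sets: block address -> list of entries (each a list of core numbers)
    Returns the number of entries released because they became empty.
    """
    entry_set = sets.get(addr, [])
    for entry in entry_set:
        if core in entry:
            entry.remove(core)      # delete the core's number information
            break
    else:
        return 0                    # no target entry exists: nothing to do
    before = len(entry_set)
    # Preset condition: an entry that stores no core number is deleted.
    entry_set[:] = [entry for entry in entry_set if entry]
    if not entry_set:
        sets.pop(addr, None)        # assumption: forget addresses with no entries
    return before - len(entry_set)


sets = {0x40: [[1, 2], [3]]}
print(handle_swap_out(sets, 0x40, 3), sets)
```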
In some alternative embodiments, the method further comprises:
when the request for a new entry fails, issuing an entry update instruction so that the first target directory entry set contains only one initial directory entry, and storing the number information of the current processor core in the initial directory entry in a preset basic content format, whereby the first target directory entry set changes to an imprecise-sharing-record state, a directory entry set in that state containing exactly one directory entry.
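The fallback above can be sketched as a function that collapses an entry set into one entry in the basic content format. This is a sketch under assumptions: `BasicEntry`, `collapse_to_imprecise`, the two owner slots, and the choice of which previously recorded core keeps the second slot are all hypothetical; the text only requires that the current core's number be stored in the single remaining entry.

```python
from dataclasses import dataclass

OWNER_SLOTS = 2   # assumed: owner core numbers the basic format can hold


@dataclass
class BasicEntry:
    """Hypothetical basic content format: a copy counter plus a few owners."""
    valid_copies: int
    owners: list


def collapse_to_imprecise(entry_set: list, core: int):
    """Replace a full entry set by one imprecise entry that includes `core`.

    entry_set: list of entries, each a list of core numbers
    Returns the new entry and the number of entries released.
    """
    known = {c for entry in entry_set for c in entry}
    known.add(core)
    # The current core is always recorded; remaining slots take the
    # lowest-numbered other sharers (an arbitrary choice for this sketch).
    others = sorted(known - {core})
    owners = [core] + others[:OWNER_SLOTS - 1]
    released = max(len(entry_set) - 1, 0)
    return BasicEntry(valid_copies=len(known), owners=owners), released


entry, released = collapse_to_imprecise([[1, 2], [3, 4]], core=5)
print(entry, released)
```

After the collapse the entry still counts five valid copies but names only two of them, so a later write to this block may again require a broad snoop.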
In some alternative embodiments, the target directory entry set includes:
an isomorphic directory entry set and a heterogeneous directory entry set, wherein:
for entries in the same isomorphic directory entry set, the content formats of the entries are the same;
for entries in the same heterogeneous directory entry set, the entries use one or more content formats.
In some optional embodiments, the content format of an entry includes a bitmap enumeration mode or a number enumeration mode, wherein:
the bitmap enumeration mode uses a bitmap to record which of a group of processor cores with consecutive processor core numbers own a copy of the cache block;
the number enumeration mode records, as processor core number values, which processor cores own a copy of the cache block.
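The two content formats can be contrasted with a short sketch. The function names, the 16-core bitmap window, and the three number slots are assumptions chosen for illustration; the text does not fix these sizes.

```python
def encode_bitmap(base: int, cores, width: int = 16):
    """Bitmap enumeration: one bit per core in the window [base, base + width)."""
    bits = 0
    for core in cores:
        if not base <= core < base + width:
            raise ValueError(f"core {core} is outside the bitmap window")
        bits |= 1 << (core - base)
    return base, bits


def decode_bitmap(base: int, bits: int):
    """Recover the owner core numbers from a bitmap entry."""
    return [base + i for i in range(bits.bit_length()) if (bits >> i) & 1]


def encode_numbers(cores, slots: int = 3):
    """Number enumeration: store the core numbers themselves, up to `slots`."""
    cores = sorted(cores)
    if len(cores) > slots:
        raise ValueError("too many cores for one number-enumeration entry")
    return tuple(cores)


base, bits = encode_bitmap(32, [33, 35, 40])
print(bin(bits), decode_bitmap(base, bits))
print(encode_numbers([150, 3]))
```

The bitmap form suits many sharers with nearby core numbers, while the number form suits a few sharers spread across the whole core range, which is one reason a heterogeneous set may mix both.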
In some alternative embodiments, the entries in the first target directory entry set and in the second target directory entry set are ordered according to a preset storage policy.
In some alternative embodiments, the preset storage policy includes:
for any two adjacent entries in the target directory entry set, referred to as a first entry and a second entry, the value of the processor core number information stored in the first entry is smaller than, or larger than, the value of the processor core number information stored in the second entry.
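One way such an ordering can speed up lookup is binary search over the entries. The sketch below assumes ascending order, non-empty entries in number enumeration mode, and core numbers sorted within each entry; the function name `find_entry` is hypothetical and the text does not prescribe a search algorithm.

```python
from bisect import bisect_right


def find_entry(entry_set: list, core: int):
    """Return the index of the entry that records `core`, or None.

    entry_set: non-empty lists of core numbers, ordered so that every number
    in one entry is smaller than every number in the next entry (assumed).
    """
    firsts = [entry[0] for entry in entry_set]
    # Last entry whose smallest core number does not exceed `core`.
    index = bisect_right(firsts, core) - 1
    if index >= 0 and core in entry_set[index]:
        return index
    return None


ordered_set = [[1, 4], [9, 12], [20]]
print(find_entry(ordered_set, 12), find_entry(ordered_set, 5))
```

Only one candidate entry has to be inspected per lookup, instead of scanning every entry in the set.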
In some alternative embodiments, the system further comprises:
a second target directory entry set determining module, configured to acquire, in response to a request from the current processor core for write ownership of the current cache block, a second target directory entry set corresponding to the address of the current cache block from the preset directory;
an invalidation module, configured to determine the corresponding processor cores from the processor core number information stored in each entry of the second target directory entry set, and to issue an invalidate-copy instruction so that each of those processor cores invalidates its copy of the current cache block;
an updating module, configured to issue an entry update instruction so that the second target directory entry set contains only one initial directory entry, and to store the number information of the current processor core in the initial directory entry.
Drawings
One or more embodiments are illustrated by way of example and not limitation in the figures of the accompanying drawings, in which like references indicate similar elements, and in which the figures of the drawings are not to be taken in a limiting sense, unless otherwise indicated.
FIG. 1 is a flowchart of an adaptive combination method for cache coherence directory entries according to an embodiment of the present disclosure;
FIG. 2 is a flowchart of another adaptive combination method for cache coherence directory entries according to an embodiment of the present disclosure;
FIG. 3 is a flowchart of another adaptive combination method for cache coherence directory entries according to an embodiment of the present disclosure.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the embodiments of the present invention more apparent, the following detailed description of the embodiments of the present invention will be given with reference to the accompanying drawings. However, those of ordinary skill in the art will understand that in various embodiments of the present invention, numerous technical details have been set forth in order to provide a better understanding of the present invention. The claimed invention may be practiced without these specific details and with various changes and modifications based on the following embodiments.
In view of the shortcomings of the prior art, an object of the embodiments of the present invention is to provide an adaptive combination method for cache coherence directory entries, an adaptive combination system for cache coherence directory entries, and a computer program product. Compared with the prior art, the method makes full use of the directory capacity and can reduce snoop operations that target all processor cores; and because the entries of a target directory entry set are kept in order according to a preset storage policy, looking up entries in the target directory entry set is more efficient.
Embodiment one:
This embodiment of the invention relates to an adaptive combination method for cache coherence directory entries.
The implementation details of the method are described below from the perspective of reading a cache block. These details are provided only to aid understanding and are not required to implement this embodiment.
The adaptive combination method for cache coherence directory entries of this embodiment can be applied to electronic devices with communication, computing, and data storage capabilities. As shown in FIG. 1, the method provided in this embodiment includes the following steps:
Step 110, in response to a request from the current processor core for read ownership of the current cache block, acquiring a first target directory entry set corresponding to the address of the current cache block from a preset directory.
Specifically, in response to the current processor core's request for read ownership of the current cache block, the preset directory looks up the first target directory entry set corresponding to the address of the current cache block. The entries stored in the target directory entry set record the numbers of the owner cores of the current cache block.
Step 120, when the first target directory entry set contains a target entry that meets a preset free-space condition, storing the number information of the current processor core in the target entry.
The preset free-space condition is that the free space available in the entry is no smaller than the space required to store the number information of the current processor core.
Specifically, when the first target directory entry set contains an entry with free space for the number information of the current processor core, the number information is recorded in that entry.
Step 130, when the first target directory entry set contains no entries or contains no target entry, requesting a new entry from the preset directory and, if the request succeeds, storing the number information of the current processor core in the new entry and adding the new entry to the first target directory entry set.
Specifically, when the first target directory entry set contains no entries, or none of its entries has free space for the number information of the current processor core, a new entry is requested from the directory. Once the request succeeds, the new entry is added to the first target directory entry set and the number information of the current processor core is recorded in it.
Embodiment two:
Building on the foregoing embodiment, the implementation details of the adaptive combination method for cache coherence directory entries are described below from the perspective of writing a cache block. These details are provided only to aid understanding and are not required to implement this embodiment.
The adaptive combination method for cache coherence directory entries of this embodiment can be applied to electronic devices with communication, computing, and data storage capabilities. As shown in FIG. 2, the method provided in this embodiment includes the following steps:
Step 210, in response to a request from the current processor core for write ownership of the target cache block, acquiring a second target directory entry set corresponding to the address of the target cache block from a preset directory.
Specifically, in response to the current processor core's request for write ownership of the current cache block, the preset directory looks up the second target directory entry set corresponding to the address of the current cache block.
Step 220, determining the corresponding processor cores from the processor core number information stored in each entry of the second target directory entry set, and issuing an invalidate-copy instruction so that each of those processor cores invalidates its copy of the target cache block.
Specifically, an operation that invalidates the valid copy of the current cache block is initiated toward each relevant processor core according to the second target directory entry set.
Step 230, issuing an entry update instruction so that the second target directory entry set contains only one initial directory entry, removing the redundant directory entries, and storing the number information of the current processor core in the initial directory entry.
Specifically, after the valid copies of the current cache block have been invalidated, the second target directory entry set keeps only one initial directory entry, and the number information of the current processor core is recorded in that entry.
Embodiment three:
Building on the above embodiments, this embodiment of the invention relates to an adaptive combination method for cache coherence directory entries.
The implementation details of the method are described below. These details are provided only to aid understanding and are not required to implement this embodiment.
The adaptive combination method for cache coherence directory entries of this embodiment can be applied to electronic devices with communication, computing, and data storage capabilities. As shown in FIG. 3, the method provided in this embodiment includes the following steps:
Step 310, in response to a request from the current processor core for read ownership of the target cache block, acquiring a first target directory entry set corresponding to the address of the target cache block from a preset directory;
Step 320, when the first target directory entry set contains a target entry that meets a preset free-space condition, storing the number information of the current processor core in the target entry;
Step 330, when the first target directory entry set contains no entries, or contains no target entry with free space for the number information of the current processor core, requesting a new entry from the preset directory and, if the request succeeds, storing the number information of the current processor core in the new entry and adding the new entry to the first target directory entry set;
Step 340, in response to a request from the current processor core for write ownership of the target cache block, acquiring a second target directory entry set corresponding to the address of the target cache block from the preset directory;
Step 350, determining the corresponding processor cores from the processor core number information stored in each entry of the second target directory entry set, and issuing an invalidate-copy instruction so that each of those processor cores invalidates its copy of the target cache block;
Step 360, issuing an entry update instruction so that the second target directory entry set contains only one initial directory entry, removing the redundant directory entries, and storing the number information of the current processor core in the initial directory entry.
As an example, the method disclosed in this embodiment may include the following process flow:
in response to the current processor core's request for read ownership of the current cache block, the preset directory looks up the first target directory entry set corresponding to the address of the current cache block, the entries of which record the numbers of the owner cores of the current cache block;
when an entry in the first target directory entry set has free space for the number information of the current processor core, the number information of the current processor core is recorded in that entry;
when the first target directory entry set contains no entries, or none of its entries has free space for the number information of the current processor core, a new entry is requested from the directory; once the request succeeds, the new entry is added to the first target directory entry set and the number information of the current processor core is recorded in it;
in response to the current processor core's request for write ownership of the current cache block, the preset directory looks up the second target directory entry set corresponding to the address of the current cache block and, according to that set, initiates toward each relevant processor core an operation that invalidates the valid copy of the current cache block; the second target directory entry set then keeps only one initial directory entry, in which the number information of the current processor core is recorded.
Embodiment four:
Based on the foregoing embodiments, this embodiment further explains and describes the adaptive association method of cache coherence directory entries provided in the foregoing embodiments.
In the related art, no matter how many private caches of the same cache block exist in the private caches of the processor cores, the address of the cache block can only correspond to at most one entry in the directory (in the second-level directory protocol, one entry of the same cache block in the first-level directory and an entry in the second-level directory of each core group are actually the same entry). One of the most main technical characteristics of the invention is that the address of the same cache block corresponds to an entry set in a directory, the number of entries in the set changes along with the running process of a parallel program, and all entries in the set are combined to record the sharing condition of the same cache block among all processor cores, namely, the sharing condition of a recordable part of each entry.
In step 310, in response to a request from the current processor core for read ownership of a target cache block, a first target directory entry set corresponding to the address of the target cache block is obtained from a preset directory.
In some embodiments, the target directory entry set comprises:
a homogeneous directory entry set and a heterogeneous directory entry set, wherein:
for entries in the same homogeneous directory entry set, all entries have the same content format;
for entries in the same heterogeneous directory entry set, the entries use one or more content formats.
In some optional embodiments, the content format of an entry is a bitmap enumeration mode or a number enumeration mode, and an entry can be converted between the two modes, wherein:
the bitmap enumeration mode records, in the form of a bitmap, the ownership of cache block copies by several processor cores with consecutive processor core numbers;
the number enumeration mode records, in the form of processor core number values, the ownership of cache block copies by several processor cores.
Optionally, all entries in the same directory entry set may have the same content format, in which case the set is called a homogeneous directory entry set. The entries in the same directory entry set may also use several content formats, in which case the set is called a heterogeneous directory entry set; this means that a directory entry can select one of several formats in which to store the number information of the processor cores sharing the cache block. When the number of processor cores in a computing node reaches the hundreds, a single directory entry can hardly record exactly how a cache block is shared among all processor cores, so a simplified format is often adopted.
For example, a directory entry has 64 bits, of which 34 bits hold the address tag of the cache block, 10 bits record the number of currently valid copies, and two further 10-bit fields record the processor core numbers of the owners of two valid copies. In the present application, this simplified format is referred to as the preset basic content format.
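The 64-bit layout in this example can be illustrated with a short packing routine. This is only a sketch: the field widths come from the example, while the field order and the names `pack_basic` and `unpack_basic` are assumptions made for illustration and are not specified by the application.

```python
# Field widths from the example above; the field order is an assumption.
TAG_BITS, COUNT_BITS, CORE_BITS = 34, 10, 10  # 34 + 10 + 2 * 10 = 64


def pack_basic(tag, count, core_a, core_b):
    """Pack one directory entry in the basic content format into a 64-bit word."""
    assert 0 <= tag < (1 << TAG_BITS)
    assert 0 <= count < (1 << COUNT_BITS)
    assert 0 <= core_a < (1 << CORE_BITS)
    assert 0 <= core_b < (1 << CORE_BITS)
    word = tag
    word = (word << COUNT_BITS) | count
    word = (word << CORE_BITS) | core_a
    word = (word << CORE_BITS) | core_b
    return word


def unpack_basic(word):
    """Inverse of pack_basic: return (tag, count, core_a, core_b)."""
    core_mask = (1 << CORE_BITS) - 1
    core_b = word & core_mask
    core_a = (word >> CORE_BITS) & core_mask
    count = (word >> (2 * CORE_BITS)) & ((1 << COUNT_BITS) - 1)
    tag = word >> (2 * CORE_BITS + COUNT_BITS)
    return tag, count, core_a, core_b


entry = pack_basic(tag=0x1ABCDEF, count=2, core_a=17, core_b=530)
print(unpack_basic(entry))
```

With 10-bit core numbers, such an entry can name cores 0 to 1023, which is consistent with a node holding hundreds of processor cores.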
The homogeneous and heterogeneous directory entry sets are further described below:
1) Homogeneous directory entry sets based on the basic content format. Every entry in the set uses the basic content format, all entries carry the same address tag, and each entry can record the processor core numbers of the owners of two valid copies. When the set contains N entries, the owners of at most 2N valid copies can therefore be recorded exactly;
2) Heterogeneous directory entry sets based on a linked list structure. All entries in the set are organized as a linked list, or as a linking or index structure similar to those used for file storage. Taking the linked list as an example, only the entry at the head of the list records the address tag, and each entry has several bits that identify the next entry. When the list is doubly linked (which helps to sort and de-duplicate the entries in the set), each entry also has several bits that identify the previous entry. A directory generally also adopts a cache-like set-associative mapping policy to speed up lookup by cache block address, and the number of directory entries within one set of the directory is usually not large (for example, not more than 64). Because all entries of a directory entry set come from the same set of the directory, at most 7 bits are needed to identify the previous or the next entry, and the remaining 50 bits of a non-head entry can record the processor core numbers of 5 valid-copy owners (the number enumeration mode). A non-head entry can also record, as a bitmap, the ownership of cache block copies by several processor cores with consecutive core numbers (the bitmap enumeration mode); in that case, 10 of the remaining 50 bits record the core number (or core group number) of the first processor core covered by the bitmap, and the other 40 bits record the ownership of cache block copies by the corresponding 40 processor cores.
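The two uses of the 50-bit payload of a non-head entry can be sketched as follows. The widths (two 7-bit links, 10-bit core numbers, a 40-bit bitmap) are taken from the description; the function names, the bit order, and the way unused number slots are padded (by repeating the last core number) are assumptions, since the application does not state how empty slots or the mode itself are marked.

```python
LINK_BITS, CORE_BITS, MAP_BITS, SLOTS = 7, 10, 40, 5
PAYLOAD_BITS = 64 - 2 * LINK_BITS  # 50 bits left in a non-head, doubly linked entry


def encode_numbers(cores):
    """Number enumeration: up to five 10-bit core numbers.

    Assumption: unused slots repeat the last core number, so decoding
    as a set recovers exactly the recorded owners.
    """
    cores = sorted(set(cores))
    assert 1 <= len(cores) <= SLOTS
    assert all(0 <= c < (1 << CORE_BITS) for c in cores)
    padded = cores + [cores[-1]] * (SLOTS - len(cores))
    payload = 0
    for c in padded:
        payload = (payload << CORE_BITS) | c
    return payload


def decode_numbers(payload):
    """Return the sorted core numbers stored by encode_numbers."""
    cores = set()
    for _ in range(SLOTS):
        cores.add(payload & ((1 << CORE_BITS) - 1))
        payload >>= CORE_BITS
    return sorted(cores)


def encode_bitmap(base, cores):
    """Bitmap enumeration: 10-bit starting core number plus a 40-bit ownership bitmap."""
    assert 0 <= base < (1 << CORE_BITS)
    bitmap = 0
    for c in cores:
        assert base <= c < base + MAP_BITS  # core must fall inside the window
        bitmap |= 1 << (c - base)
    return (base << MAP_BITS) | bitmap


def decode_bitmap(payload):
    """Return the sorted core numbers whose bit is set."""
    base = payload >> MAP_BITS
    bitmap = payload & ((1 << MAP_BITS) - 1)
    return [base + i for i in range(MAP_BITS) if (bitmap >> i) & 1]


print(decode_numbers(encode_numbers([7, 300, 912])))
print(decode_bitmap(encode_bitmap(64, [64, 70, 103])))
```

Both encodings fit in `PAYLOAD_BITS`: five 10-bit numbers use 50 bits, and a 10-bit base plus a 40-bit bitmap also uses 50 bits.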
In a heterogeneous directory entry set, entries in the number enumeration mode and entries in the bitmap enumeration mode can coexist. The number enumeration mode suits the case where the core numbers of the processor cores sharing the cache block are sparse, and the bitmap enumeration mode suits the case where they are dense. As the sharing of a cache block among the processor cores changes while the parallel program runs, entries can be adaptively converted between the two modes, so that the sharing of the cache block is recorded exactly with as few combined entries as possible.
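One way to picture the choice between the two modes is a greedy planner that covers a sorted list of sharer core numbers with as few entries as it can. This is an illustrative heuristic only, not the conversion rule of the application: the function name `plan_entries`, the tuple layout of the result, and the rule "use a bitmap when more than five sharers fall inside one 40-core window" are all assumptions.

```python
MAP_BITS, SLOTS = 40, 5  # bitmap window width and number-enumeration slots per entry


def plan_entries(cores):
    """Greedily cover the sharer core numbers with entries.

    Returns a list of ("bitmap", base, members) or ("numbers", members) tuples.
    """
    cores = sorted(set(cores))
    plan, i = [], 0
    while i < len(cores):
        base = cores[i]
        # Count the sharers that a bitmap starting at `base` would cover.
        j = i
        while j < len(cores) and cores[j] < base + MAP_BITS:
            j += 1
        if j - i > SLOTS:
            # Dense region: one bitmap entry records more owners than one number entry.
            plan.append(("bitmap", base, cores[i:j]))
            i = j
        else:
            # Sparse region: list up to five core numbers explicitly.
            plan.append(("numbers", cores[i:i + SLOTS]))
            i += SLOTS
    return plan


print(plan_entries(range(40)))            # dense sharers
print(plan_entries([900, 3, 517, 200]))   # sparse sharers
```

Under these assumptions, forty consecutive sharers need a single bitmap entry instead of eight number-enumeration entries, while four widely scattered sharers fit in one number-enumeration entry.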
In step 320, if there is a target entry satisfying a preset free space condition in the first target directory entry set, the number information of the current processor core is stored in the target entry.
Optionally, whether in a homogeneous or a heterogeneous directory entry set, one directory entry can usually record the sharing of the same cache block by several processor cores. Before the number information of the current processor core is recorded in the target directory entry set, the entries in the set should therefore be checked for free space in which that information can be recorded (for example, in the number enumeration mode or the bitmap enumeration mode), which improves the utilization of directory entries.
In step 330, if the first target directory entry set contains no entry, or contains no target entry with free space for storing the number information of the current processor core, a new entry is requested from the preset directory; if the request succeeds, the number information of the current processor core is stored in the new entry, and the new entry is added to the first target directory entry set.
In some embodiments, the method further comprises:
If the request for the new entry is unsuccessful, an entry update instruction is issued so that the first target directory entry set contains only one initial directory entry and the redundant entries are removed, and the number information of the current processor core is stored in the initial directory entry in the preset basic content format. The first target directory entry set thereby changes to a state in which sharing is recorded inexactly, and an entry set in that state contains exactly one entry.
Optionally, once a new entry has been obtained, adding it to the target directory entry set and recording the number information of the current processor core in it is the natural procedure. A request for a new entry normally succeeds when the directory has free entries (when the directory adopts a set-associative mapping policy, when the corresponding set of the directory has free entries). When the directory has no free entry, the request must still succeed if the target directory entry set contains no entry (that is, the data of the cache block is being read into a cache for the first time); if the target directory entry set already contains entries, the request may succeed or fail, depending mainly on the relevant priority policy (similar to a cache replacement algorithm). When the request for a new entry fails, the target directory entry set can no longer record exactly how the cache block is shared among all processor cores; the record is then kept in the basic content format, and the set always retains exactly one entry (the remaining entries are released). When a request for a new entry causes an entry that is in use to be preempted, the directory entry set to which the preempted entry belongs can likewise no longer record exactly how its cache block is shared among all processor cores.
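The read path of steps 310 to 330, including the fallback to an inexact record, can be sketched with a small software model. Everything named here is hypothetical: the `Directory` class, the two-owner entries (`ENTRY_SLOTS`), the return strings, and in particular the priority policy (a non-empty set never preempts, and an empty set preempts from the largest other set) are assumptions chosen only to make the sketch runnable; the application leaves the policy open.

```python
ENTRY_SLOTS = 2  # owners that one basic-format entry can name exactly


class Directory:
    def __init__(self, total_entries):
        self.free = total_entries  # unallocated directory entries
        self.sets = {}             # block address -> list of entries (lists of core numbers)
        self.imprecise = {}        # block address -> valid-copy count of a collapsed set

    def _collapse(self, addr):
        """Keep only the initial entry of a set and mark its record as inexact."""
        entries = self.sets[addr]
        self.imprecise[addr] = sum(len(e) for e in entries)
        self.free += len(entries) - 1
        self.sets[addr] = [entries[0]]

    def _request_entry(self, addr):
        if self.free > 0:
            self.free -= 1
            return True
        if self.sets.get(addr):
            return False  # hypothetical policy: a non-empty set never preempts
        # Directory full and this block has no entry yet: preempt from the largest set.
        victims = [a for a, e in self.sets.items() if len(e) > 1]
        if not victims:
            return False  # evicting a whole block is outside this sketch
        self._collapse(max(victims, key=lambda a: len(self.sets[a])))
        self.free -= 1
        return True

    def read(self, addr, core):
        """Record a read-ownership request from `core` for block `addr`."""
        entries = self.sets.setdefault(addr, [])
        if addr in self.imprecise:
            self.imprecise[addr] += 1  # only the copy count can still be updated
            return "imprecise"
        for e in entries:
            if len(e) < ENTRY_SLOTS:   # an entry with free space exists
                e.append(core)
                return "recorded"
        if self._request_entry(addr):
            entries.append([core])
            return "new-entry"
        if not entries:
            return "rejected"          # not modeled further
        self._collapse(addr)
        self.imprecise[addr] += 1
        return "imprecise"


demo = Directory(total_entries=2)
print([demo.read(0xA, c) for c in (1, 2, 3, 4, 5)])
```

In this model the fifth reader of block `0xA` finds the directory full, so the set collapses to its initial entry and only the count of valid copies remains exact.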
In step 340, in response to a request from the current processor core for write ownership of the target cache block, a second target directory entry set corresponding to the address of the target cache block is obtained from the preset directory.
In step 350, the corresponding processor cores are determined from the processor core number information stored in each entry of the second target directory entry set, and a copy invalidation instruction is issued so that each of those processor cores invalidates its copy of the target cache block.
In step 360, an entry update instruction is issued to cause only one initial directory entry to be included in the second target directory entry set and to remove redundant directory entries, and the number information of the current processor core is saved in the initial directory entry.
As an example, the specific flow of the method disclosed in this embodiment includes:
in response to a request from the current processor core for read ownership of the current cache block, the preset directory queries the first target directory entry set corresponding to the address of the current cache block, wherein the entries in the set record the numbers of the processor cores that hold the current cache block;
when an entry in the first target directory entry set has free space for recording the number information of the current processor core, the number information of the current processor core is recorded in that entry;
when the first target directory entry set contains no entry, or contains no entry with free space for recording the number information of the current processor core, a new entry is requested from the directory; once the request succeeds, the new entry is added to the first target directory entry set and the number information of the current processor core is recorded in the new entry;
in response to a request from the current processor core for write ownership of the current cache block, the preset directory queries the second target directory entry set corresponding to the address of the current cache block and, according to that set, initiates an operation at each relevant processor core to invalidate its valid copy of the current cache block; the second target directory entry set then retains only one initial directory entry, in which the number information of the current processor core is recorded.
When the second target directory entry set records exactly all the processor cores sharing the current cache block, the operation of invalidating the valid copy of the current cache block is initiated at each of those processor cores individually (as in a classical implementation of a cache coherence protocol, the current processor core can obtain the latest copy of the cache block during the invalidation). When the second target directory entry set does not record exactly all the processor cores sharing the current cache block, the operation of invalidating all valid copies of the current cache block has to be initiated at nearly all processor cores. If the second target directory entry set is empty, no invalidation needs to be initiated, and a new directory entry is requested instead. After the write request, only the current processor core holds a valid copy of the current cache block among all processor cores, so the second target directory entry set ultimately needs to keep only one entry.
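The write path of steps 340 to 360 can be sketched in the same spirit. The function name `handle_write`, the plain dictionary and set used to model the directory, and the reading of "nearly all processor cores" as "all cores except the writer" are assumptions; treating the set as exact again after the write is also an inference from the fact that only the writer then holds a valid copy.

```python
def handle_write(sets, imprecise, all_cores, addr, writer):
    """Return the cores that must invalidate their copy, then reset the entry set.

    sets:      block address -> list of entries (lists of core numbers)
    imprecise: set of block addresses whose sharing record is inexact
    """
    entries = sets.get(addr, [])
    if addr in imprecise:
        # Inexact record: fall back to invalidating at every other core.
        targets = set(all_cores) - {writer}
    else:
        # Exact record: invalidate only at the recorded sharers.
        targets = {c for e in entries for c in e} - {writer}
    # Keep one initial entry that names only the writer (allocation is not modeled).
    sets[addr] = [[writer]]
    imprecise.discard(addr)
    return targets


sets = {0x40: [[1, 2], [5]]}
print(sorted(handle_write(sets, set(), range(8), 0x40, writer=2)))
print(sets[0x40])
```

When the set is empty, the function returns no invalidation targets and simply creates the single entry for the writer, which corresponds to requesting a new directory entry.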
In some embodiments, the method further comprises:
in response to a swap-out request from the current processor core for the target cache block, acquiring, from the preset directory, the target directory entry that holds the address of the target cache block and the number information of the current processor core;
deleting the number information of the current processor core from the target directory entry when the target directory entry exists.
In some embodiments, after the deleting of the number information of the current processor core from the target directory entry, the method further comprises:
deleting the entries that satisfy a preset condition from the target directory entry set that contains the target directory entry.
In some embodiments, the preset condition includes:
the entry stores no number information of any processor core.
Optionally, whether or not the target directory entry set records the sharing of the current cache block exactly, the relevant content of the set has to be updated before the current cache block is swapped out. When a target directory entry exists (which corresponds to an exact record), the number information of the current processor core has to be removed from the content of the target directory entry. The target directory entry set can then be further tidied, for example by ensuring that at most one entry in the set has free space, by removing entries that hold no substantive content, or by converting an entry between the bitmap enumeration mode and the number enumeration mode.
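The swap-out handling described above can be sketched as follows, again over a plain dictionary model. The name `handle_swap_out` and the boolean return value are assumptions; the sketch covers only the removal of the core number and the deletion of entries that no longer name any core, not the optional compaction or mode conversion.

```python
def handle_swap_out(sets, addr, core):
    """Remove `core` from the entry set of block `addr`.

    Returns True if a target directory entry naming the core was found.
    """
    entries = sets.get(addr, [])
    for e in entries:
        if core in e:
            e.remove(core)  # delete the number information of the core
            break
    else:
        return False        # no target entry: inexact record or unknown block
    # Preset condition: drop entries that no longer store any core number.
    remaining = [e for e in entries if e]
    if remaining:
        sets[addr] = remaining
    else:
        del sets[addr]
    return True


sets = {0x40: [[1, 2], [5]]}
handle_swap_out(sets, 0x40, 5)
print(sets)
```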
In some embodiments, in the first target directory entry set and the second target directory entry set, entries are ordered according to a preset preservation policy.
In some embodiments, the preset preservation policy includes:
For any two adjacent entries in the target directory entry set, namely a first entry and a second entry, the value of the processor core number information stored in the first entry is smaller than (ascending order) or larger than (descending order) the value of the processor core number information stored in the second entry.
Optionally, if the set contains several entries, the entry related to the number information of the current processor core has to be found among all of them. If the entries are in no particular order, all of them usually have to be traversed. To speed up the search, the entries can be kept in a fixed order at all times, for example with the processor core numbers in ascending or descending order; after the contents of the set change, the order is maintained by adjusting the contents between entries.
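The benefit of keeping the entries ordered can be illustrated with a binary search over the entries instead of a full traversal. The sketch assumes ascending order, with every core number in one entry smaller than every core number in the next entry and each entry itself sorted; the function name `find_entry` and the list-of-lists model are illustrative assumptions.

```python
def find_entry(entries, core):
    """Return the index of the entry that stores `core`, or -1 if none does.

    entries: list of sorted core-number lists in ascending order across entries.
    """
    lo, hi = 0, len(entries)
    while lo < hi:
        mid = (lo + hi) // 2
        if entries[mid][-1] < core:  # largest number in this entry is too small
            lo = mid + 1
        else:
            hi = mid
    # `lo` is now the first entry whose largest number is >= core.
    if lo < len(entries) and core in entries[lo]:
        return lo
    return -1


entries = [[1, 4], [9, 12], [30, 31]]
print(find_entry(entries, 9), find_entry(entries, 5))
```

Only about log2(N) entries are inspected for a set of N entries, at the cost of rearranging contents between entries whenever the set changes.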
Embodiment five:
Another embodiment of the present application relates to an adaptive joint system for cache coherence directory entries.
The implementation details of the adaptive joint system of this embodiment are described below from the perspective of reading a cache block. These details are provided only for ease of understanding and are not required for implementing this embodiment. The adaptive joint system for cache coherence directory entries provided in this embodiment includes:
a first target directory entry set determining module, configured to obtain, in response to a request from the current processor core for read ownership of a target cache block, a first target directory entry set corresponding to the address of the target cache block from a preset directory;
a first saving module, configured to store the number information of the current processor core in a target entry when the first target directory entry set contains a target entry that satisfies a preset free space condition;
a second saving module, configured to request a new entry from the preset directory when the first target directory entry set contains no entry, or contains no target entry with free space for storing the number information of the current processor core, and, when the request succeeds, to store the number information of the current processor core in the new entry and add the new entry to the first target directory entry set.
It can be clearly understood by those skilled in the art that, for convenience and brevity of description, specific working processes of each module in the adaptive joint system of cache coherence directory entries may refer to corresponding processes in the foregoing method embodiments, and this embodiment is not repeated herein.
It should be noted that, each module involved in this embodiment is a logic module, and in practical application, one logic unit may be one physical unit, or may be a part of one physical unit, or may be implemented by a combination of multiple physical units. In addition, in order to highlight the innovative part of the present application, units less closely related to solving the technical problem presented by the present application are not introduced in the present embodiment, but it does not indicate that other units are not present in the present embodiment.
Embodiment six:
Based on the foregoing embodiments, another embodiment of the present application relates to an adaptive joint system for cache coherence directory entries.
The implementation details of the adaptive joint system of this embodiment are described below from the perspective of writing a cache block. These details are provided only for ease of understanding and are not required for implementing this embodiment. The adaptive joint system for cache coherence directory entries provided in this embodiment includes:
a second target directory entry set determining module, configured to obtain, in response to a request from the current processor core for write ownership of the target cache block, a second target directory entry set corresponding to the address of the target cache block from a preset directory;
an invalidation module, configured to determine the corresponding processor cores from the processor core number information stored in each entry of the second target directory entry set, and to issue a copy invalidation instruction so that each of those processor cores invalidates its copy of the target cache block;
an updating module, configured to issue an entry update instruction so that the second target directory entry set contains only one initial directory entry and the redundant directory entries are removed, and to store the number information of the current processor core in the initial directory entry.
It can be clearly understood by those skilled in the art that, for convenience and brevity of description, specific working processes of each module in the adaptive joint system of cache coherence directory entries may refer to corresponding processes in the foregoing method embodiments, and this embodiment is not repeated herein.
It should be noted that, each module involved in this embodiment is a logic module, and in practical application, one logic unit may be one physical unit, or may be a part of one physical unit, or may be implemented by a combination of multiple physical units. In addition, in order to highlight the innovative part of the present application, units less closely related to solving the technical problem presented by the present application are not introduced in the present embodiment, but it does not indicate that other units are not present in the present embodiment.
Embodiment seven:
Based on the foregoing embodiments, another embodiment of the present application relates to an adaptive joint system for cache coherence directory entries.
The implementation details of the adaptive joint system of this embodiment are described below. These details are provided only for ease of understanding and are not required for implementing this embodiment. The adaptive joint system for cache coherence directory entries provided in this embodiment includes:
a first target directory entry set determining module, configured to obtain, in response to a request from the current processor core for read ownership of a target cache block, a first target directory entry set corresponding to the address of the target cache block from a preset directory;
a first saving module, configured to store the number information of the current processor core in a target entry when the first target directory entry set contains a target entry that satisfies a preset free space condition;
a second saving module, configured to request a new entry from the preset directory when the first target directory entry set contains no entry, or contains no target entry with free space for storing the number information of the current processor core, and, when the request succeeds, to store the number information of the current processor core in the new entry and add the new entry to the first target directory entry set;
a second target directory entry set determining module, configured to obtain, in response to a request from the current processor core for write ownership of the target cache block, a second target directory entry set corresponding to the address of the target cache block from the preset directory;
an invalidation module, configured to determine the corresponding processor cores from the processor core number information stored in each entry of the second target directory entry set, and to issue a copy invalidation instruction so that each of those processor cores invalidates its copy of the target cache block;
an updating module, configured to issue an entry update instruction so that the second target directory entry set contains only one initial directory entry and the redundant directory entries are removed, and to store the number information of the current processor core in the initial directory entry.
It can be clearly understood by those skilled in the art that, for convenience and brevity of description, specific working processes of each module in the adaptive joint system of cache coherence directory entries may refer to corresponding processes in the foregoing method embodiments, and this embodiment is not repeated herein.
It should be noted that, each module involved in this embodiment is a logic module, and in practical application, one logic unit may be one physical unit, or may be a part of one physical unit, or may be implemented by a combination of multiple physical units. In addition, in order to highlight the innovative part of the present application, units less closely related to solving the technical problem presented by the present application are not introduced in the present embodiment, but it does not indicate that other units are not present in the present embodiment.
Embodiment eight:
Another embodiment of the application is directed to an electronic device comprising at least one processor and a memory communicatively coupled to the at least one processor, wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of the embodiments described above.
The memory and the processor are connected by a bus. The bus may comprise any number of interconnected buses and bridges and connects the various circuits of the one or more processors and of the memory together. The bus may also connect various other circuits, such as peripherals, voltage regulators, and power management circuits, which are well known in the art and are therefore not described further here. A bus interface provides an interface between the bus and a transceiver. The transceiver may be a single element or several elements, such as several receivers and transmitters, and provides a means for communicating with various other apparatus over a transmission medium. Data processed by the processor is transmitted over a wireless medium via an antenna; the antenna also receives data and passes it to the processor.
The processor is responsible for managing the bus and for general processing, and may also provide various functions including timing, peripheral interfaces, voltage regulation, power management, and other control functions. The memory may be used to store data used by the processor in performing operations.
Embodiment nine:
Another embodiment of the application relates to a computer-readable storage medium storing a computer program. The computer program implements the above-described method embodiments when executed by a processor.
That is, those skilled in the art will understand that all or part of the steps of the methods in the embodiments described above may be implemented by a program stored in a storage medium, where the program includes several instructions for causing a device (which may be a single-chip microcomputer, a chip, or the like) or a processor to perform all or part of the steps of the methods in the embodiments of the application. The storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.
In some embodiments of the application, a computer program product is also provided, comprising a computer program which, when executed by a processor, implements the steps of the method described in the above embodiments.
It will be understood by those of ordinary skill in the art that the foregoing embodiments are specific examples of carrying out the application and that various changes in form and details may be made therein without departing from the spirit and scope of the application.

Claims (12)

1.一种缓存一致性目录表项的自适应联合方法,其特征在于,所述方法包括:1. A method for adaptively combining cache coherent directory entries, characterized in that the method comprises: 响应于当前处理器核对当前cache块的读入所有权的请求,从预设目录中获取所述当前cache块的地址对应的第一目标目录表项集合;In response to a request from the current processor core for read ownership of the current cache block, obtaining a first target directory entry set corresponding to the address of the current cache block from a preset directory; 在所述第一目标目录表项集合中存在满足预设空闲空间条件的目标表项的情况下,将所述当前处理器核的编号信息保存在所述目标表项中;If there is a target entry that meets a preset free space condition in the first target directory entry set, storing the number information of the current processor core in the target entry; 在所述第一目标目录表项集合中不存在表项,或所述第一目标目录表项集合中不存在用于保存所述当前处理器核的编号信息的空闲空间的目标表项的情况下,向所述预设目录申请新表项,在成功申请到所述新表项的情况下,将所述当前处理器核的编号信息保存在所述新表项中,并将所述新表项保存在所述第一目标目录表项集合中。When there is no entry in the first target directory entry set, or when there is no target entry in the first target directory entry set with free space for storing the numbering information of the current processor core, a new entry is applied for from the preset directory, and when the new entry is successfully applied for, the numbering information of the current processor core is saved in the new entry, and the new entry is saved in the first target directory entry set. 2.根据权利要求1所述的方法,其特征在于,所述方法还包括:2. 
The method according to claim 1, characterized in that the method further comprises: 响应于所述当前处理器核对所述当前cache块的写入所有权的请求,从所述预设目录中获取所述当前cache块的地址对应的第二目标目录表项集合;In response to a request by the current processor to verify the write ownership of the current cache block, obtaining a set of second target directory entries corresponding to the address of the current cache block from the preset directory; 根据所述第二目标目录表项集合中各表项中保存的处理器核的编号信息确定对应处理器核,并发出无效副本指令,以使各所述处理器核执行无效所述当前cache块的副本的操作;Determine the corresponding processor core according to the number information of the processor core stored in each entry in the second target directory entry set, and issue an invalid copy instruction to enable each processor core to execute an operation of invalidating the copy of the current cache block; 发出表项更新指令,以使所述第二目标目录表项集合中仅包含一个初始目录表项,并在所述初始目录表项中保存所述当前处理器核的编号信息。An entry update instruction is issued so that the second target directory entry set includes only one initial directory entry, and the number information of the current processor core is saved in the initial directory entry. 3.根据权利要求1所述的方法,其特征在于,所述方法还包括:3. The method according to claim 1, characterized in that the method further comprises: 响应于所述当前处理器核针对所述当前cache块的换出请求,从所述预设目录中获取所述当前cache块的地址和保存有所述当前处理器核的编号信息的目标目录表项;In response to a swap-out request of the current processor core for the current cache block, acquiring from the preset directory an address of the current cache block and a target directory entry storing serial number information of the current processor core; 在存在所述目标目录表项的情况下,从所述目标目录表项中删除所述当前处理器核的编号信息。In the case where the target directory entry exists, the number information of the current processor core is deleted from the target directory entry. 4.根据权利要求3所述的方法,其特征在于,在所述从所述目标目录表项中删除所述当前处理器核的编号信息之后,所述方法还包括:4. 
The method according to claim 3, characterized in that after deleting the number information of the current processor core from the target directory entry, the method further comprises: deleting, from the target directory entry set that stores the target directory entry, any entry that meets a preset condition; wherein the preset condition comprises: no number information of any processor core is stored in the entry.
5. The method according to claim 1, characterized in that the method further comprises: if the application for the new entry is unsuccessful, issuing an entry update instruction so that the first target directory entry set contains only one initial directory entry, with redundant directory entries removed, and saving the number information of the current processor core in the initial directory entry in a preset basic content format.
6. The method according to any one of claims 1 to 5, characterized in that the target directory entry set comprises: a homogeneous directory entry set and a heterogeneous directory entry set; wherein: for entries located in the same homogeneous directory entry set, all entries have the same content format; for entries located in the same heterogeneous directory entry set, the content formats of the entries include one or more formats.
7. The method according to claim 6, characterized in that the content format of an entry comprises: a bitmap enumeration format or a number enumeration format; wherein: the bitmap enumeration format records, as a bitmap, the ownership of copies of the cache block by several processor cores with consecutive processor core numbers; the number enumeration format records, as processor core number values, the ownership of copies of the cache block by several processor cores.
8. The method according to claim 2, characterized in that, in the first target directory entry set and the second target directory entry set, the entries are arranged in order according to a preset saving strategy.
9. The method according to claim 8, characterized in that the preset saving strategy comprises: for any two adjacent entries, a first entry and a second entry, in a target directory entry set, the value of the processor core number information stored in the first entry is less than, or is greater than, the value of the processor core number information stored in the second entry.
10. An adaptive combining system for cache coherence directory entries, characterized by comprising: a first target directory entry set determination module, configured to obtain, in response to a request by the current processor core for read-in ownership of the current cache block, a first target directory entry set corresponding to the address of the current cache block from a preset directory; a first saving module, configured to save the number information of the current processor core in a target entry if the first target directory entry set contains a target entry that satisfies a preset free space condition; a second saving module, configured to apply to the preset directory for a new entry if the first target directory entry set contains no entry, or contains no target entry with free space for saving the number information of the current processor core, and, if the new entry is successfully obtained, to save the number information of the current processor core in the new entry and save the new entry in the first target directory entry set.
11. The system according to claim 10, characterized by further comprising: a second target directory entry set determination module, configured to obtain, in response to a request by the current processor core for write ownership of the current cache block, a second target directory entry set corresponding to the address of the current cache block from the preset directory; an invalidation module, configured to determine the corresponding processor cores according to the processor core number information stored in each entry of the second target directory entry set, and to issue an invalidate-copy instruction so that each of those processor cores invalidates its copy of the current cache block; an updating module, configured to issue an entry update instruction so that the second target directory entry set contains only one initial directory entry, and to save the number information of the current processor core in the initial directory entry.
12. A computer program product, comprising a computer program, characterized in that, when the computer program is executed by a processor, the steps of the method according to any one of claims 1 to 9 are implemented.
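The read-in ownership, write ownership, and entry-deletion steps recited in the claims above can be illustrated with a minimal Python sketch. It is not the patented implementation. The names `Directory`, `ENTRY_CAPACITY`, `MAX_ENTRIES`, `evict`, and `as_bitmap` are hypothetical, each entry is modeled as a plain set of core numbers (the number enumeration format), and a failed application for a new entry is simulated by an assumed per-set limit. The claims do not state what happens to the cores dropped when a set is collapsed after a failed application, so the sketch only returns them to the caller.

```python
from typing import Dict, List, Set

ENTRY_CAPACITY = 2  # hypothetical: core numbers one entry can hold
MAX_ENTRIES = 2     # hypothetical: per-set limit standing in for a failed application


def as_bitmap(cores: Set[int], base: int, width: int) -> int:
    """Bitmap enumeration: bit i set means core (base + i) holds a copy."""
    bits = 0
    for core in cores:
        if not base <= core < base + width:
            raise ValueError("core outside the bitmap's consecutive range")
        bits |= 1 << (core - base)
    return bits


class Directory:
    """Maps a cache block address to its set of directory entries."""

    def __init__(self) -> None:
        self.sets: Dict[int, List[Set[int]]] = {}

    def read(self, addr: int, core: int) -> Set[int]:
        """Record read-in ownership; return cores dropped by a collapse, if any."""
        entries = self.sets.setdefault(addr, [])
        if any(core in entry for entry in entries):
            return set()                      # already recorded
        for entry in entries:
            if len(entry) < ENTRY_CAPACITY:   # target entry with free space
                entry.add(core)
                return set()
        if len(entries) < MAX_ENTRIES:        # application for a new entry succeeds
            entries.append({core})
            return set()
        return self._collapse(addr, core)     # application failed (claim 5)

    def write(self, addr: int, core: int) -> Set[int]:
        """Record write ownership; return the cores whose copies must be invalidated."""
        return self._collapse(addr, core)

    def evict(self, addr: int, core: int) -> None:
        """Delete the core's number, then delete entries left empty (claim 4)."""
        entries = self.sets.get(addr, [])
        for entry in entries:
            entry.discard(core)
        self.sets[addr] = [entry for entry in entries if entry]

    def _collapse(self, addr: int, core: int) -> Set[int]:
        """Keep a single initial entry holding only `core`."""
        entries = self.sets.get(addr, [])
        others = {c for entry in entries for c in entry} - {core}
        self.sets[addr] = [{core}]
        return others
```

For example, with the assumed limits, reads by cores 0, 1, and 2 of the same block leave the entry set as `[{0, 1}, {2}]`, and a subsequent `write` by core 1 returns `{0, 2}` as the cores to invalidate and leaves `[{1}]`. A real directory would also keep the entries ordered as in claims 8 and 9, which this sketch does not enforce.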
CN202411908731.6A 2024-12-24 2024-12-24 Method and computer program product for adaptively combining cache coherent directory entries Active CN119357084B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202411908731.6A CN119357084B (en) 2024-12-24 2024-12-24 Method and computer program product for adaptively combining cache coherent directory entries

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202411908731.6A CN119357084B (en) 2024-12-24 2024-12-24 Method and computer program product for adaptively combining cache coherent directory entries

Publications (2)

Publication Number Publication Date
CN119357084A true CN119357084A (en) 2025-01-24
CN119357084B CN119357084B (en) 2025-05-06

Family

ID=94319189

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202411908731.6A Active CN119357084B (en) 2024-12-24 2024-12-24 Method and computer program product for adaptively combining cache coherent directory entries

Country Status (1)

Country Link
CN (1) CN119357084B (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20080244190A1 (en) * 2007-03-30 2008-10-02 Shedivy David A Method, Apparatus, System and Program Product Supporting Efficient Eviction of an Entry From a Central Coherence Directory
CN104778132A (en) * 2015-04-08 2015-07-15 浪潮电子信息产业股份有限公司 Multi-core processor directory cache replacement method
CN107229593A (en) * 2016-03-25 2017-10-03 华为技术有限公司 The buffer consistency operating method and multi-disc polycaryon processor of multi-disc polycaryon processor
CN107894914A (en) * 2016-09-30 2018-04-10 华为技术有限公司 Buffer consistency treating method and apparatus
CN105659216B (en) * 2014-09-29 2019-03-19 华为技术有限公司 The CACHE DIRECTORY processing method and contents controller of multi-core processor system
CN114385439A (en) * 2021-12-13 2022-04-22 北京大学 Multi-chip consistent monitoring filtering method based on cache group mapping
CN118377637A (en) * 2024-06-26 2024-07-23 北京卡普拉科技有限公司 Method, device, equipment and storage medium for reducing redundant cache consistency operation

Also Published As

Publication number Publication date
CN119357084B (en) 2025-05-06

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant