
CN119226190B - Processor and data access monitoring method - Google Patents


Info

Publication number
CN119226190B
CN119226190B (application CN202411759219.XA)
Authority
CN
China
Prior art keywords
data
core
cache
system memory
access
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202411759219.XA
Other languages
Chinese (zh)
Other versions
CN119226190A (en)
Inventor
陈复
唐菊飞
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Xinliji Semiconductor Co ltd
Original Assignee
Shanghai Xinliji Semiconductor Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Xinliji Semiconductor Co ltd filed Critical Shanghai Xinliji Semiconductor Co ltd
Priority to CN202411759219.XA priority Critical patent/CN119226190B/en
Publication of CN119226190A publication Critical patent/CN119226190A/en
Application granted granted Critical
Publication of CN119226190B publication Critical patent/CN119226190B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00Accessing, addressing or allocating within memory systems or architectures
    • G06F12/02Addressing or allocation; Relocation
    • G06F12/08Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
    • G06F12/0802Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
    • G06F12/0877Cache access modes
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00Accessing, addressing or allocating within memory systems or architectures
    • G06F12/02Addressing or allocation; Relocation
    • G06F12/08Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
    • G06F12/0802Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
    • G06F12/0806Multiuser, multiprocessor or multiprocessing cache systems
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00Accessing, addressing or allocating within memory systems or architectures
    • G06F12/02Addressing or allocation; Relocation
    • G06F12/08Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
    • G06F12/0802Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
    • G06F12/0893Caches characterised by their organisation or structure

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Memory System Of A Hierarchy Structure (AREA)

Abstract


The present invention provides a processor and a data access monitoring method in the technical field of computation and memory access for multi-core processors. The processor includes an input/output block that connects each core to the system memory. The input/output block contains a history access record table, which temporarily stores the addresses of at least part of the data accessed by each core's cache. Because the table records what the caches have previously accessed, once the information of the desired target data is found in the table, the core holding that data can be asked directly to release it, which significantly reduces the frequency of broadcast snooping.

Description

Processor and data access monitoring method
Technical Field
The invention provides a processor and a data access monitoring method, and belongs to the technical field of operation and memory access of multi-core processors.
Background
A system having a processor typically uses a memory controller to control and access the resources of the system memory. When the memory controller receives memory access requests, it buffers them and processes the buffered requests according to specified priorities.
In the prior art, each core of the processor has its own cache, so a given physical block of system memory may be accessed by more than one core at the same time, and inconsistency of data among the caches must be avoided. A cache race condition between multiple processors arises mainly when a write occurs; at that point the caches and the bus must be monitored by means of a snoop cache coherency protocol (Snoop Cache Coherency Protocol), and the behavior of each cache is learned from the snoop, which determines what action to take.
A common coherency protocol is the MOESI (Modified, Owned, Exclusive, Shared, Invalid) protocol. Under this protocol each cache line includes status bits indicating its MOESI state. These states include Modified (M), indicating that the cache line has been modified; Exclusive (E); Shared (S); and Invalid (I). The Owned (O) state indicates that the cache line has been modified, that copies of it may be shared with other caches, and that the corresponding data in memory is stale.
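As an illustrative sketch (the enum and helper names are ours, not from the patent), the five MOESI states and the properties just described can be modeled as:

```python
from enum import Enum

class MOESI(Enum):
    MODIFIED = "M"   # line is dirty and this cache holds the only copy
    OWNED = "O"      # line is dirty, but copies may exist in other caches
    EXCLUSIVE = "E"  # line is clean and this cache holds the only copy
    SHARED = "S"     # line is clean; copies may exist in other caches
    INVALID = "I"    # line is not usable

def memory_is_stale(state: MOESI) -> bool:
    # In M and O the cache holds newer data than system memory.
    return state in (MOESI.MODIFIED, MOESI.OWNED)

def may_have_sharers(state: MOESI) -> bool:
    # O and S permit copies of the line in other caches.
    return state in (MOESI.OWNED, MOESI.SHARED)
```

This captures the distinguishing feature of the Owned state: it is the only state that is simultaneously dirty and possibly shared.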
However, with the above technique the processor must continuously change the state of cache lines during operation, which clearly reduces processor performance. The method proposed herein reduces the need to access system memory, thereby reducing the number of data movements between system memory and the caches and improving system performance.
Disclosure of Invention
The invention provides a processor and a data access monitoring method to solve the problem that processor performance is degraded because snooping cache behavior continuously changes the state of cache lines.
The invention provides a processor comprising a plurality of cores, all of which are connected to a system memory, and an input/output block. The input/output block connects each core to the system memory and includes a history access record table, which temporarily stores the addresses of at least part of the data accessed by each core's cache.
Further, the history table of the input/output block is connected to a coherency circuit unit and a memory controller, wherein the coherency circuit unit is connected to the core, and the memory controller is connected to the system memory.
Further, the coherence circuit unit implements either the MOESI or the MESI protocol.
Further, delays may occur when a cache line switches between different states. For example, in the MESI protocol a cache line may switch from the exclusive (E) state to the shared (S) state, or from the modified (M) state to the invalid (I) state. Each state switch involves the sending and receiving of messages and the processing of responses to those messages. The state switching delay can be calculated by the following equation:

$STA = \sum_{i=1}^{n} \left( T_{init,i} + Y \cdot T_{prop,i} + T_{ack,i} \right)$

where $n$ is the total number of state-switching steps; $T_{init,i}$ is the initialization time of step $i$; $T_{prop,i}$ is the propagation time of step $i$; $Y$ is the influence factor of the propagation distance; and $T_{ack,i}$ is the acknowledgement time of step $i$.
In cache coherency protocols, when one core needs to update the cache state of another core, this is done through message passing. The message passing delay includes the propagation time of the message on the bus and the processing time at the target core, and is calculated by the following formula:

$MPD = \sum_{j=1}^{m} \left( T_{send,j} + T_{trans,j} \cdot N_j + T_{proc,j} \right)$

where $m$ is the total number of messaging steps; $T_{send,j}$ is the sending time of step $j$; $T_{trans,j}$ is the per-node transmission time of step $j$; $N_j$ is the number of transmission nodes in step $j$; and $T_{proc,j}$ is the processing time of step $j$.
In some operations, such as write operations, one core may need to wait for responses from other cores to complete the update of the cache state. The waiting response delay is the time taken from sending a request until all necessary responses are received, calculated by the following formula:

$RED = \max_{1 \le k \le p} T_{resp,k} - T_{req}$

where $T_{req}$ is the time at which the request was sent; $p$ is the total number of response steps; and $T_{resp,k}$ is the response time of the $k$-th step.
Adding the three delays gives the total delay in the cache coherency protocol:

$TD = STA + MPD + RED$

The performance of the cache coherency protocol is evaluated by calculating this total delay: if the total delay is below a preset threshold, the performance is considered acceptable; if it is above the threshold, the protocol or hardware design needs to be optimized to reduce the delay.
Further, the input step of the input/output block includes:
When any core of the processor performs read-write operation on a Cache (Cache) of the processor, capturing relevant information (such as a system memory address of data) of the operation and inputting the relevant information into a history access record table of an input/output block;
The history access record list in the input/output block interacts with a consistency circuit unit, and the consistency circuit unit maintains the consistency of data between cores based on MOESI or MESI protocol;
when the cache state of the core changes (such as data is modified, replaced or invalid), the change information is input into a history access record table through a consistency circuit unit, and the record is updated;
When the core needs to load data from the system memory to the cache, or write back data from the cache to the system memory, the information of the operation interacts with the memory controller through the input/output block.
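The four input steps above can be sketched as a small software model; `HistoryTable` and its field names are hypothetical, chosen only for illustration:

```python
class HistoryTable:
    """Bounded, cache-like record of addresses accessed by each core's cache."""
    def __init__(self, capacity: int = 1024):
        self.capacity = capacity
        self.entries = {}  # address -> {"core": core id, "valid": bool}

    def record(self, address: int, core: int) -> None:
        # Steps 1 and 4: capture a cache read/write or a memory load/write-back.
        if len(self.entries) >= self.capacity and address not in self.entries:
            self.entries.pop(next(iter(self.entries)))  # evict the oldest entry
        self.entries[address] = {"core": core, "valid": True}

    def invalidate(self, address: int) -> None:
        # Steps 2 and 3: the coherency circuit reports that the line was
        # modified, replaced, or invalidated, and the record is updated.
        if address in self.entries:
            self.entries[address]["valid"] = False

table = HistoryTable()
table.record(0x1000, core=0)  # core 0 reads address 0x1000 into its cache
table.invalidate(0x1000)      # the line is later invalidated
```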
Further, the outputting step of the input/output block includes:
when the core needs to access specific data, a query request is sent to a history access record table in the input/output block;
If the history record indicates that particular data has been accessed (i.e., is present in a cache of a core), then the core that is ready to fetch the data waits for the core that has accessed the data to release the data (possibly coordinated by a coherency circuit unit);
If the data is released, the core to be taken acquires the data from the system memory or the cache of another core;
If the history record indicates that the particular data was not accessed (i.e., is not present in the cache of any core), then the I/O block will broadcast a snoop message to all cores;
in the process of data acquisition, the input/output block and the memory controller work cooperatively to ensure the correct transmission of data from the system memory to the cache;
Meanwhile, the input/output block updates the history access record table in real time.
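A minimal sketch of this output path (hypothetical helper names; in hardware this is a lookup circuit, not software) shows how a history-table hit avoids the broadcast:

```python
def fetch(address, history, caches, memory):
    """Return (data, snooped): a snoop is broadcast only on a history-table miss."""
    entry = history.get(address)
    if entry is not None:
        # Hit: only the recorded holder is asked to release the line.
        caches[entry["core"]].pop(address, None)
        snooped = False
    else:
        # Miss: broadcast a snoop so every core checks and releases the line.
        for cache in caches.values():
            cache.pop(address, None)
        snooped = True
    history[address] = {"core": "requester"}  # update the record in real time
    return memory[address], snooped

memory = {0x2000: 42}
caches = {0: {0x2000: 42}, 1: {}}
history = {0x2000: {"core": 0}}
data, snooped = fetch(0x2000, history, caches, memory)  # targeted release, no broadcast
```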
Further, the accessing step of the history access record table includes:
when any core of the processor executes read-write operation on a Cache (Cache), capturing related information of the operation by a system, wherein the information comprises a system memory address of data, an operation type (read/write) and a core number for executing the operation;
the captured operation information is input into a history access record table in the input/output block, wherein the history access record table stores at least one part of the information, and the at least one part comprises a system memory address of data;
The history access record table interacts with a consistency circuit unit, and the consistency circuit unit maintains the consistency of data between cores based on MOESI or MESI protocols;
when the cache state of the core changes (such as data is modified, replaced or invalidated), the consistency circuit unit transfers the change information to the history access record table;
When the core needs to load data from the system memory to the cache or write back data from the cache to the system memory, the information of the operation interacts with the memory controller through the input/output block;
The memory controller is responsible for actual data transmission, and the input/output block records the relevant information of the operation (especially the system memory address of the data) into the history access record table;
when the core needs to access specific data, sending a query request to a history access record table in an input/output block, wherein the query request comprises a system memory address of the data to be accessed;
after receiving the query request, the history access record table checks the stored record to determine whether the data has been accessed by caches of other cores;
If the data has been accessed, the core that is ready to fetch the data waits for the core that has accessed the data to release the data;
once the data is released, the core that is ready to fetch may fetch the data from system memory or another core's cache;
If the data is not accessed, the I/O block will broadcast a snoop message to all cores, ensuring that no cores are using the data;
Then, preparing a core for taking the data to acquire the data from a system memory;
during data acquisition, the input/output block updates the history access record table in real time to reflect the new data access status, including the latest access state and access time of the recorded data.
The invention provides a data access monitoring method, which comprises a processor with a plurality of cores, and comprises the following steps:
Recording related information of data which is accessed by a cache in a history access record table, wherein the related information comprises an address of the data in a system memory;
when a core needs to access specific data, querying the history access record table;
if the history access record shows that the specific data has been accessed, the core that has accessed it releases the specific data, and the core that is to fetch the data then accesses it from the system memory;
if the history access record shows that the specific data has not been accessed, a snoop is broadcast to each core so that the specific data is released, and the core that is to fetch the data then accesses it from the system memory.
Further, the step in which the core that has accessed the specific data releases it, and the core that is to fetch the data accesses it from the system memory, comprises:
When a request core needs to access specific data, inquiring a history access record table to determine whether the data is accessed by a cache of a holding core;
If the query result shows that the data has been accessed (i.e., exists in a cache of a core), then go to the next step;
upon receiving the notification, the holding core prepares to release the data, which includes marking the data in its cache as "invalid" or "dirty" (if the data has been modified) and preparing to write it back to the system memory (if the data is "dirty");
the coherency circuit unit monitors this process and if the data is dirty, the coherency circuit unit coordinates writing the data back to system memory from the cache holding the core;
Once the data is released (or, if the data was never cached, directly), the requesting core retrieves the data by accessing the system memory;
and finally, updating the history access record table to finish data access.
Further, the step of broadcasting a snoop to each core so that the specific data is released, and the core that is to fetch the data accesses it from the system memory, comprises:
When a request core needs to access specific data, inquiring a history access record table to determine whether the data is accessed by caches of other cores;
If the history table indicates that the particular data is not accessed (i.e., is not in any core's cache), then go to the next step;
the request core triggers a snoop mechanism of the processor, and the snoop mechanism sends snoop requests to all other cores through an internal bus or a communication network, wherein the snoop requests contain address information of data to be accessed in a system memory;
After each core receives the snoop request, checking whether the cache contains the requested data;
If the hit core finds that the requested data is contained in its cache, it will be ready to release the data, the hit core marks the relevant data in the cache as "invalid" or "dirty" (if the data has been modified);
The hit core broadcasts a data release notification to all cores through the processor's internal mechanisms, indicating that it is ready to release data or has written data back to system memory;
After receiving the data release notification, the request core acquires information that the data is now in the system memory or is about to be released back to the system memory;
The request core reads the data from the system memory, loads the data into the cache, updates the history access record table, and records the result of the data access and the latest access state of the data.
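The behavior of a core receiving a snoop request, as described above, can be sketched as follows (a software model with hypothetical names; "dirty" data is written back before the line is invalidated):

```python
def handle_snoop(core_cache, address, memory):
    """Check the cache for a snooped address; on a hit, release the line."""
    line = core_cache.get(address)
    if line is None:
        return "miss"                       # the core does not hold the data
    if line["dirty"]:
        memory[address] = line["data"]      # write modified data back first
    core_cache[address] = {"data": line["data"], "dirty": False, "valid": False}
    return "released"                       # data is now available in memory

memory = {}
cache_b = {0x40: {"data": 9, "dirty": True, "valid": True}}
status = handle_snoop(cache_b, 0x40, memory)  # dirty line written back, invalidated
```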
Further, the step of each core after receiving the snoop request further includes:
After each core receives the snoop request and inspects its cache, the time from the receipt of the snoop request to the determination of whether it hits (i.e., whether the requested data is contained in the cache) is recorded, which is referred to as the "snoop response time";
for each hit core, the total time from the determination of the hit to the preparation of the release of the data (including marking the data as "invalid" or "dirty" and, if necessary, writing the data back to system memory) is recorded, this time being referred to as the "data preparation time";
After a certain amount of snoop event data has been collected (e.g., every thousand snoops or every few minutes), an average snoop delay is calculated by the following equation:

$\overline{T}_{snoop} = \frac{1}{N} \left( \sum_{i=1}^{N} \log_{\beta} R_i + \alpha \sum_{j=1}^{N_{hit}} e^{P_j} \right)$

where $N$ is the number of snoop events; $N_{hit}$ is the number of snoop events that hit; $\{R_i\}$ is the set of response times of all snoop events; $\{P_j\}$ is the set of data preparation times of all hit snoop events; $\beta$ is the logarithm base used for the logarithmic transformation of the response times; and $\alpha$ is the coefficient of the exponential function used to adjust the exponential growth of the data preparation time.
Wherein if a snoop request misses the cache of any core, the data preparation time is 0;
The efficiency of the snoop mechanism is judged against a preset threshold:
if the average snoop delay is below the threshold, the snoop mechanism is judged to be efficient;
if the average snoop delay is above the threshold, the snoop mechanism is judged to be inefficient.
If the snoop mechanism is determined to be inefficient, the cause of the high snoop latency may be internal bus congestion, communication network latency, inefficient processor snoop logic, etc.
Based on the analysis results, corresponding handling measures are taken, including upgrading hardware components such as the internal bus, the communication network, or the processor, and improving the algorithms or policies of the snoop mechanism, for example by reducing unnecessary snoop requests and optimizing data access patterns.
The adopted measures and their effects are fed back to the relevant teams or personnel, adjustments are iterated according to the actual results, and the performance of the snoop mechanism is continuously monitored.
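The averaging and threshold steps can be sketched as follows; because the patent's original formula is not reproduced here, the exact combination of the logarithmic and exponential terms (and the parameter names `beta` and `alpha`) is an assumption:

```python
import math

def average_snoop_delay(response_times, prep_times, beta=2.0, alpha=0.1):
    """Average snoop delay over N events: log-base-beta transform of every
    response time, plus an alpha-weighted exponential of each hit's data
    preparation time. Misses contribute no preparation term (prep time 0)."""
    n = len(response_times)
    resp_term = sum(math.log(max(t, 1e-9), beta) for t in response_times)
    prep_term = alpha * sum(math.exp(t) for t in prep_times)
    return (resp_term + prep_term) / n

def snoop_efficiency(avg_delay, threshold):
    # Below the preset threshold the snoop mechanism is judged efficient.
    return "efficient" if avg_delay < threshold else "inefficient"
```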
The invention is beneficial in that the information the caches have accessed is recorded in the history access record table, so that once the information of the required target data is found in the table, the core holding the data can be asked directly to release it, significantly reducing the frequency of broadcast snooping. Combining the history access record table with the coherency circuit unit effectively maintains data coherency among the cores, reduces data conflicts and errors, and improves system reliability. During a data request, querying the history record quickly determines whether the data has been accessed by another core, reducing unnecessary data transfers and delay and improving the overall efficiency of data access. Real-time monitoring and recording of data accesses reduces frequent accesses to the system memory, lowering memory bandwidth usage and improving system performance. Recording the access history also makes it possible to locate problems quickly when errors or faults occur, improving the fault recovery capability of the system. Monitoring the delay and efficiency of the snoop mechanism provides data to support performance optimization, helping development teams identify bottlenecks and improve them. Finally, the history access record table and the snoop mechanism, updated in real time, manage data accesses dynamically, allowing the processor to cope flexibly with different load conditions.
Drawings
FIG. 1 is a schematic diagram of a processor architecture according to an embodiment of the present invention;
FIG. 2 is a flow chart of data access monitoring according to an embodiment of the invention.
Detailed Description
Referring to FIG. 1, this embodiment discloses a processor 100 including a plurality of cores 101, 102, 103, 104 (C0, C1, C2, ..., Cn), each of which is coupled to a cache (denoted by the "$" symbol). The processor 100 has an input/output block (I/O block) 120 connected to each of the cores 101-104, and the processor 100 is connected to a system memory 130 through the input/output block 120.
Specifically, the input/output block 120 includes a history table 121 coupled to a first multiplexer 122, a second multiplexer 123, and a coherency circuit unit 124, wherein the first multiplexer 122 is further coupled to the second multiplexer 123. The memory controller 125 is connected to the second multiplexer 123. The coherency circuit 124 is coupled to the core 101.
Further, the first multiplexer 122 is connected to each of the cores 101 to 104, so that the request command of each of the cores 101 to 104 can be transmitted to the history table 121 or the second multiplexer 123 through the first multiplexer 122. The output result of the history table 121 may be transmitted to the second multiplexer 123 or the coherency circuit unit 124. The output of the second multiplexer 123 may be transmitted to the memory controller 125 to access the data in the system memory 130. The output of the coherency circuit unit 124 may be transmitted to each of the cores 101-104.
The history table 121 is a cache-like memory queue whose contents include an address, a validity flag, and a data source, corresponding to the addresses of at least part of the data fetched by the caches. The coherency circuit unit 124 may employ the known MOESI or MESI protocol.
Referring to FIG. 2, an embodiment of the present invention discloses a data access monitoring method, which takes the correspondence between core A and core B as an example, and applies the MESI protocol:
Block 201 discloses that core A accesses a data S;
Block 202 discloses that core A first checks the history access record table; whether the data S is recorded there depends on whether it has previously been accessed, and once the data S has been accessed it is recorded in the history access record table;
Block 203 discloses a determining step for determining whether the data S is recorded in the history access record;
Block 204 discloses that if data S is in the history table and shown as having been accessed by core B, the state of data S is then examined; if the state of data S is declared shared (S) or invalid (I), core A may access data S from the system memory, as disclosed in block 205.
In the determination step disclosed in block 204, if the data S is not in the shared (S) or invalid (I) condition, its state is either modified (M) or exclusive (E), and the flow proceeds as disclosed in block 206.
Block 206 discloses that if the state of data S is modified (M) or exclusive (E), core A requests core B to discard/release data S and invalidate it in core B, after which core A accesses data S from the system memory, as further disclosed in block 205.
If it is found in the determining step of block 203 that the data S is not recorded in the history table, then core A broadcasts a snoop for each core as disclosed in block 207;
then, as indicated by the block 208, each core is required to perform state clearing/releasing of the data S and declare the state of the data S according to the cache coherency protocol.
After each core has completed clearing and the state of the data S has been re-declared, core A may access the data S from system memory, as shown in block 205.
Since the information that the caches have accessed is recorded in the history table, once the information of the target data is found there, the core holding the data can be asked directly to release it, so the frequency of broadcast snooping can be markedly reduced. It will be appreciated that the greater the capacity of the history table, the less frequently snoops need to be broadcast.
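The decision flow of blocks 201-208 can be sketched as a software model (names are ours; MESI states as single letters):

```python
def access_data(addr, history, cache_states, memory):
    """Core A's access path from FIG. 2: history lookup, then a targeted or
    broadcast release, then the access to system memory (block 205)."""
    rec = history.get(addr)
    if rec is None:                              # block 203: not recorded
        for core in cache_states:                # block 207: broadcast snoop
            cache_states[core][addr] = "I"       # block 208: clear and declare
        broadcast = True
    else:
        holder = rec["core"]                     # block 204: check the state
        if cache_states[holder].get(addr, "I") in ("M", "E"):
            cache_states[holder][addr] = "I"     # block 206: targeted release
        broadcast = False
    history[addr] = {"core": "A"}                # record core A's access
    return memory[addr], broadcast               # block 205: memory access

cache_states = {"B": {0x10: "M"}}
history = {0x10: {"core": "B"}}
data, broadcast = access_data(0x10, history, cache_states, {0x10: 5})
# core B's modified line is invalidated without any broadcast snoop
```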
In the MESI protocol, a cache line has four states: modified (M), exclusive (E), shared (S), and invalid (I). When the data state of a cache line changes, for example a switch from the exclusive (E) state to the shared (S) state, or from the modified (M) state to the invalid (I) state, a state switching operation is required. The state switching delay is obtained by the following formula:

$STA = \sum_{i=1}^{n} \left( T_{init,i} + Y \cdot T_{prop,i} + T_{ack,i} \right)$

where $n$ is the total number of state-switching steps; $T_{init,i}$ is the initialization time of step $i$; $T_{prop,i}$ is the propagation time of step $i$; $Y$ is the influence factor of the propagation distance; and $T_{ack,i}$ is the acknowledgement time of step $i$.
When a core modifies data in a cache line, the state of the cache line changes from exclusive (E) or shared (S) to modified (M). When other cores need to access the data, access rights to the data are requested through messaging. Depending on the type of request and the state of the current cache line, the cache controller may decide whether to send the data to the requester (shared state) or invalidate the requester's cache line (invalid state). Each state switch involves the sending and receiving of messages and the processing of responses to those messages. The messaging delay includes the propagation time of the message on the bus and the processing time at the target core. The response processing time depends on the speed at which the target core processes the received message. The message passing delay is obtained by the following formula:
$MPD = \sum_{j=1}^{m} \left( T_{send,j} + T_{trans,j} \cdot N_j + T_{proc,j} \right)$

where $m$ is the total number of messaging steps; $T_{send,j}$ is the sending time of step $j$; $T_{trans,j}$ is the per-node transmission time of step $j$; $N_j$ is the number of transmission nodes in step $j$; and $T_{proc,j}$ is the processing time of step $j$.
In some operations, such as write operations, one core may need to wait for the response of the other core to complete the update of the cache state. When a core needs to modify data in a cache line, it will first check the state of the cache line. If the cache line is in the exclusive (E) state, the core may directly modify the data and update the state to modified (M). If a cache line is in the shared (S) state, the core needs to send a message to other cores that own the cache line requesting them to invalidate the respective cache line. After sending the request, the core needs to wait for the responses of all relevant cores. Waiting for a response delay refers to the time taken from sending a request until all necessary responses are received. If a core fails to respond in time, it may cause the write operation to be blocked, thereby increasing the overall delay. Wherein the wait response delay is obtained by the following formula:
$RED = \max_{1 \le k \le p} T_{resp,k} - T_{req}$

where $T_{req}$ is the time at which the request was sent; $p$ is the total number of response steps; and $T_{resp,k}$ is the response time of the $k$-th step.
Adding the three delays gives the total delay in the cache coherency protocol:
Total delay = state switching delay + messaging delay + wait for response delay.
The performance of the cache coherency protocol can be evaluated by calculating the total delay: if the total delay is below a preset threshold the performance is considered acceptable, and if it is above the threshold the protocol or hardware design must be optimized to reduce the delay. By subdividing the delay into the state switching delay (STA), the message passing delay (MPD), and the waiting response delay (RED), the main sources of delay in the cache coherency protocol can be identified more precisely, helping developers optimize specific delay sources and improving optimization efficiency.

The specific calculation formulas make the delay quantifiable, and the computed total delay (TD) allows the performance of the cache coherency protocol to be evaluated intuitively. A preset threshold serves as the criterion for performance evaluation: comparing the total delay with the threshold determines whether the performance of the protocol is acceptable, and when the total delay exceeds the threshold, it indicates the direction for optimizing the protocol or the hardware design. Developers can analyze which links have room for optimization from the parameters in the delay formulas and optimize them accordingly. Reducing the delay sources in the cache coherency protocol lowers the total delay and improves the response speed and overall performance of the system. In application scenarios with high real-time requirements, such as gaming and real-time communication, reduced delay noticeably improves the user experience.
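A numeric sketch of the three delay components and the threshold check; the summation forms are reconstructions consistent with the variable definitions above, since the patent's formula images are not reproduced here:

```python
def state_switch_delay(steps, gamma):
    # STA = sum_i (T_init,i + Y * T_prop,i + T_ack,i); gamma plays the role of Y
    return sum(t_init + gamma * t_prop + t_ack for t_init, t_prop, t_ack in steps)

def messaging_delay(steps):
    # MPD = sum_j (T_send,j + T_trans,j * N_j + T_proc,j)
    return sum(t_send + t_trans * nodes + t_proc
               for t_send, t_trans, nodes, t_proc in steps)

def wait_response_delay(t_request_sent, response_arrivals):
    # RED = time from sending the request until the last response arrives
    return max(response_arrivals) - t_request_sent

def total_delay(sta, mpd, red):
    # TD = STA + MPD + RED
    return sta + mpd + red

def acceptable(td, threshold):
    # Below the preset threshold, protocol performance is considered acceptable.
    return td <= threshold
```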
Optimizing the performance of the cache coherency protocol helps reduce system instability caused by excessive delay, improves the reliability and stability of the system, and lowers the risk of system crashes or hangs. In-depth research on cache coherency protocol delay can also promote innovation and development of the protocols themselves, letting developers explore more efficient and concise protocol designs that adapt to constantly changing computing environments.
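As a minimal sketch of this evaluation (the delay values and the threshold are illustrative assumptions, not values from the patent):

```python
# Hypothetical sketch: total-delay evaluation for a cache coherency protocol.
# The component delays (STA, MPD, RED) would come from hardware counters in
# practice; here they are plain numbers in nanoseconds.

def total_delay(sta_ns: float, mpd_ns: float, red_ns: float) -> float:
    """Total delay = state switching + message passing + wait-for-response."""
    return sta_ns + mpd_ns + red_ns

def evaluate(td_ns: float, threshold_ns: float) -> str:
    """Compare the total delay against a preset threshold."""
    return "acceptable" if td_ns < threshold_ns else "needs optimization"

td = total_delay(sta_ns=12.0, mpd_ns=30.0, red_ns=18.0)  # 60.0
print(td, evaluate(td, threshold_ns=100.0))  # 60.0 acceptable
```

The threshold would be tuned per workload; the sketch only shows the comparison step described above.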
When any core of the processor performs a read/write operation on its cache, information about the operation (e.g., the system memory address of the data and the operation type) is captured, and the captured information is then entered into the history access record table of the input/output block. This step ensures that each time a core accesses the cache, basic data is provided for subsequent data access monitoring. The history access record table in the input/output block interacts with the coherency circuit unit, which maintains the coherency of data between the cores based on a cache coherency protocol such as MOESI or MESI. When the cache state of a core changes (e.g., data is modified, replaced, or invalidated), the change information is entered into the history access record table through the coherency circuit unit, and the record is updated in real time. When a core needs to load data from the system memory into the cache, or write data back from the cache to the system memory, the information of these operations interacts with the memory controller through the input/output block, and the memory controller performs the corresponding memory read/write operations according to the received information, ensuring correct loading and write-back of the data.
This technical scheme realizes real-time access monitoring of data by capturing the read/write operation information of the processor cores on the cache and entering it into the history access record table, which helps a system administrator or developer understand the flow of data in real time and ensures the correctness and security of the data. The history access record table in the input/output block interacts with the coherency circuit unit, which maintains the coherency of data between the cores based on the MOESI or MESI protocol, so that the multi-core processor does not produce data conflicts or inconsistencies when processing data, improving the stability and reliability of the system. When the cache state of a core changes (e.g., data is modified, replaced, or invalidated), the change information is entered into the history access record table through the coherency circuit unit and the record is updated, ensuring that the information in the history access record table always remains consistent with the actual state of the core caches and providing an accurate basis for subsequent data access and processing. When a core needs to load data from the system memory into the cache, or write data back from the cache to the system memory, the operation information interacts with the memory controller through the input/output block, which makes memory access more controllable and flexible and improves overall system performance. By monitoring the data access situation and maintaining data consistency in real time, the system can discover and handle potential problems promptly, which helps reduce system maintenance costs and reduces the system crashes and repair work caused by data errors or inconsistencies.
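The capture-and-update behaviour of the history access record table might be sketched as follows (the class, method, and field names are illustrative assumptions, not from the patent):

```python
# Hypothetical model of the history access record table: it captures each
# core's cache read/write, and the coherency circuit pushes state changes
# into it so the table tracks the actual cache state.

from __future__ import annotations
from dataclasses import dataclass

@dataclass
class Record:
    address: int   # system memory address of the data
    op: str        # "read" or "write"
    core_id: int   # core that performed the operation
    state: str     # cache line state, e.g. "M", "O", "E", "S", "I"

class HistoryAccessRecordTable:
    def __init__(self):
        self._records: dict[int, Record] = {}  # keyed by memory address

    def capture(self, address: int, op: str, core_id: int, state: str = "E"):
        """Capture a core's cache read/write and store it in the table."""
        self._records[address] = Record(address, op, core_id, state)

    def update_state(self, address: int, new_state: str):
        """Called by the coherency circuit when a cache state changes."""
        if address in self._records:
            self._records[address].state = new_state

    def lookup(self, address: int) -> Record | None:
        """Query whether the data at `address` has been accessed."""
        return self._records.get(address)

table = HistoryAccessRecordTable()
table.capture(address=0x1000, op="write", core_id=0, state="M")
table.update_state(0x1000, "O")   # coherency circuit reports a state change
print(table.lookup(0x1000).state)  # O
```

A real table would also record access times and evict stale entries; the sketch only shows the capture, state-update, and query paths described above.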
When a core needs to access specific data, it sends a query request to the history access record table in the input/output block. After receiving the request, the table is queried to determine the location of the required data. If the history access record shows that the specific data has been accessed (i.e., it exists in the cache of some core), the core preparing to fetch the data enters a waiting state, waiting for the core that has accessed the data to release it. If the history access record shows that the specific data has not been accessed (i.e., it does not exist in the cache of any core), the input/output block broadcasts a snoop message to all cores to find out whether the data exists in the caches of other cores without having been recorded (although this is rare, it may be caused by a delay or an error in recording). Once the core that has accessed the data releases it (possibly under the coordination of the coherency circuit unit), the core preparing to fetch the data can obtain it from the system memory or from the cache of another core. During the data fetching process, the input/output block and the memory controller work in close cooperation to ensure that the data is correctly transferred from the system memory to the cache, or that the latest data is fetched from the cache of another core, and the history access record table is updated in real time.
This data access method has the following advantages. By querying the history access record table, the core preparing to fetch data can quickly learn whether the data is occupied by another core, avoiding unnecessary waiting. Once the data is released, the core preparing to fetch it can immediately obtain it from the system memory or from another core's cache, improving the speed and efficiency of data access. Through the coordination of the coherency circuit unit, data remains consistent when accessed and modified, avoiding data conflicts and errors. During data acquisition, the cooperative work of the input/output block and the memory controller ensures the correct transfer of data from the system memory to the cache and maintains data integrity. Through efficient data access management and optimization, the number of memory accesses is reduced, lowering system cost and energy consumption. The input/output block updates the history access record table in real time, providing the system with accurate data access state information so that resources can be allocated and utilized more reasonably. Flexible data access strategies and optimization allow the scheme to meet different application scenarios and requirements, improving the adaptability of the system. Through real-time recording and querying of data accesses, potential problems can be discovered and handled in time, reducing the risk of system failures.
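The query-and-fetch decision above can be sketched minimally as follows (the table representation and return labels are illustrative assumptions):

```python
# Hypothetical sketch of the data-fetch decision flow described above.
# `history` maps a memory address to the id of the core currently holding
# it in cache; an absent key means the data has never been accessed.

def fetch_data(address: int, requester: int, history: dict) -> str:
    holder = history.get(address)
    if holder is not None and holder != requester:
        # Data held by another core: wait for release, then fetch from
        # system memory or the holder's cache.
        action = "wait-then-fetch"
    elif holder is None:
        # Not recorded anywhere: broadcast a snoop message to all cores,
        # then fetch from system memory.
        action = "snoop-then-fetch"
    else:
        action = "cache-hit"
    history[address] = requester  # update the record in real time
    return action

h = {0x2000: 1}
print(fetch_data(0x2000, requester=0, history=h))  # wait-then-fetch
print(fetch_data(0x3000, requester=0, history=h))  # snoop-then-fetch
print(fetch_data(0x2000, requester=0, history=h))  # cache-hit (now held by 0)
```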
When any core of the processor performs a read/write operation on the cache, the system captures detailed information about the operation, including the system memory address of the data, the operation type (read or write), and the number of the core that performed the operation. The captured operation information is then entered into the history access record table in the input/output block. The record table stores at least the system memory address of the data, which is the key basis for data access queries. The history access record table maintains close interaction with the coherency circuit unit, which maintains the coherency of data between cores based on the MOESI or MESI protocol. These protocols ensure that copies of data remain synchronized and consistent as they are shared or modified among multiple cores. When the cache state of a core changes (e.g., the data is modified, replaced, or invalidated), the coherency circuit unit passes the change information to the history access record table, ensuring that the information in the record table always remains consistent with the actual state of the core caches. When a core needs to load data from the system memory into the cache, or write data back from the cache to the system memory, the information of these operations interacts with the memory controller through the input/output block. The memory controller is responsible for the actual data transfer tasks, and at the same time the input/output block records the relevant information of these operations (especially the system memory address of the data) into the history access record table, ensuring that the record table reflects data accesses in real time. When a core needs to access specific data, it issues a query request to the history access record table in the input/output block.
Upon receiving a query request, the history access record table examines the stored records to determine whether the data has been accessed by the cache of another core. If the data has been accessed, the core preparing to fetch it waits for the core that accessed the data to release it (this process is coordinated by the coherency circuit unit); once the data is released, the core preparing to fetch it may obtain the data from the system memory or from another core's cache. If the data has not been accessed, the input/output block broadcasts a snoop message to all cores to ensure that no core is using the data, and the data is acquired from the system memory. During data acquisition, the input/output block updates the history access record table in real time to show the new data access situation, including the latest access state and access time of the recorded data.
The history access record table allows a core to quickly query the access state of specific data, avoiding unnecessary memory accesses and data transfer delays. Through real-time updates and queries, the system can discover and handle data access conflicts in time, reducing the waiting time between cores. Cores can use cache resources more effectively because the history access record table provides accurate information about whether data is cached. The scheme optimizes the use of memory bandwidth by reducing unnecessary memory accesses, improving overall system performance. The interaction of the history access record table with the coherency circuit unit guarantees the consistency of data between cores and avoids data conflicts and errors, and real-time updating of the data access state ensures that the system can always obtain the latest data state information. Through real-time updates and queries, the system can discover potential data access problems in time, reducing the risk of system crashes and data loss. Through flexible data access strategies and optimization of the data transfer and access process, the scheme can adapt to different application scenarios and requirements, simplifying the system and centralizing the management of the data access state. Quick localization and resolution of data access problems reduces the debugging and maintenance costs of the system.
When a processor core (referred to as the "requesting core") needs to access particular data, it first queries the history access record table. The record table stores previous accesses to the data by all cores, including whether the data is held in the cache of some core. The purpose of the query is to determine whether the data has been cached by another core (referred to as the "holding core"). The history access record table returns a query result telling the requesting core whether the data has been accessed. If the data has been accessed (i.e., it is present in the cache of some core), the requesting core needs to wait for the holding core to release the data. The requesting core informs the holding core, via a bus or communication mechanism internal to the processor, that it wants to access the data. Upon receiving the notification, the holding core prepares to release the data: it marks the data in its cache as "invalid" or "dirty", and if the data is "dirty" (i.e., has been modified), the holding core needs to be ready to write the data back to the system memory. The coherency circuit unit monitors the data release process. If the data is dirty, the coherency circuit unit coordinates writing the data back from the holding core's cache to the system memory. Once the data is released (or directly, if the data is not cached anywhere), the requesting core may access the system memory to retrieve the data, and it loads the data from the system memory into its own cache for subsequent quick access. Finally, the history access record table is updated to reflect the latest access to the data.
By querying the history access record table, the requesting core can quickly determine the location of the data, avoiding unnecessary memory accesses and thereby improving data access efficiency. The requesting core can quickly acquire the required data once it is released, or obtain it directly from the system memory, reducing waiting time. The coherency circuit unit is introduced to guarantee the consistency of data among multiple cores: in particular, when data is modified (i.e., becomes dirty), the coherency circuit unit coordinates writing the data back to the system memory, ensuring the accuracy of the data in system memory. By marking cached data as invalid or dirty and writing it back to the system memory in time, data conflicts and errors are effectively avoided, cache resources are fully utilized, and memory bandwidth consumption is reduced by eliminating unnecessary memory accesses. At the same time, updating the history access record table in real time manages the data access state more effectively and improves overall resource utilization, while strict data access control and consistency maintenance reduce the risk of system crashes and data loss. Any potential problem during data access can be discovered and handled in time, enhancing the stability and reliability of the system. The scheme adapts well to the multi-core processor environment, supporting data sharing and consistent access among multiple cores, and provides strong support for the efficient operation of the multi-core processor by optimizing the data access strategy. Centralized management of data access states simplifies the design and implementation of the system, while a clear data access flow and consistency maintenance mechanism reduce its development and maintenance cost.
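The release step coordinated by the coherency circuit unit might look like this in miniature (function and state names are illustrative assumptions; a real implementation operates on cache lines in hardware):

```python
# Hypothetical sketch of the data-release step: a "dirty" line is written
# back to system memory before release; a clean line is simply invalidated.

def release_line(state: str, cache_value: int, memory: dict,
                 address: int) -> str:
    """Release a cache line held by the holding core.

    state: "dirty" if the line was modified, "clean" otherwise.
    Returns the resulting line state ("invalid").
    """
    if state == "dirty":
        memory[address] = cache_value  # write back before releasing
    return "invalid"

mem = {0x40: 7}
new_state = release_line("dirty", cache_value=99, memory=mem, address=0x40)
print(mem[0x40], new_state)  # 99 invalid
```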
When a requesting core needs to access certain data, it first looks up the history access record table to determine whether the data has been accessed by the caches of other cores. If the history access record table shows that the data has not been accessed (i.e., it is not in the cache of any core), the requesting core knows that it needs to fetch the data from the system memory. The requesting core triggers the processor's snoop mechanism by sending snoop requests to all other cores via the internal bus or communication network. The snoop request contains the address of the data to be accessed in the system memory, so that the other cores can check whether their own caches contain the data. After receiving the snoop request, each core checks whether its cache contains the requested data. If a core (called a hit core) finds the requested data in its cache, it prepares to release the data: the hit core marks the data in its cache as "invalid" or "dirty" (if the data has been modified), and if the data is "dirty" (i.e., has been modified), the hit core writes it back to the system memory to ensure the accuracy of the data in system memory. The hit core then broadcasts a data release notification to all cores through the internal mechanisms of the processor; this notification indicates that the hit core is ready to release the data (if the data is unmodified) or has written the data back to the system memory (if the data is "dirty"). After receiving the data release notification, the requesting core knows that the data is now in, or about to be written back to, the system memory. The requesting core may then read the data from the system memory and load it into its own cache for subsequent quick access. Finally, the requesting core updates the history access record table, recording the result of the data access and the latest access state of the data.
The snoop mechanism ensures that when a core needs to access specific data that is cached by other cores, the data can be released or written back to the system memory in time, guaranteeing data consistency. By broadcasting the data release notification, all cores learn the latest state of the data in real time, avoiding data conflicts and errors. When the data is not in the requesting core's cache, the snoop mechanism quickly locates the actual position of the data (i.e., the system memory or the caches of other cores), reducing unnecessary memory accesses, improving data access efficiency, and reducing system latency. Snoop requests and data release notifications are sent through the internal bus or communication network; this communication is local and targeted and does not consume excessive bus bandwidth, so compared with traditional bus snooping, the mechanism is more efficient and energy-saving. Strict data access control and consistency maintenance reduce the risk of system crashes and data loss, and any potential problem during data access can be discovered and handled in time, enhancing the stability and reliability of the system. As the number of processor cores increases, the snoop mechanism continues to ensure consistent and efficient access to data across the cores. By centrally managing the data access state and the history access record table, the scheme simplifies the management of data access, allowing developers to focus on the logical implementation of applications without paying too much attention to the underlying data access details.
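The snoop broadcast and dirty write-back described above can be sketched as follows (the cache representation is an illustrative assumption; each core's cache is modelled as a dictionary of address to (value, dirty) pairs):

```python
# Hypothetical sketch of a snoop broadcast: the requesting core asks every
# other core whether it caches `address`; a hit core with a dirty copy
# writes it back to system memory before releasing (invalidating) it.

def snoop(address: int, requester: int, caches: dict, memory: dict) -> int:
    """caches: core_id -> {address: (value, dirty)}. Returns the value
    the requester finally reads from system memory."""
    for core_id, cache in caches.items():
        if core_id == requester:
            continue
        if address in cache:                   # hit core found
            value, dirty = cache.pop(address)  # mark invalid / release
            if dirty:
                memory[address] = value        # write back dirty data
            break
    return memory[address]                     # requester reads from memory

mem = {0x10: 1}
caches = {0: {}, 1: {0x10: (5, True)}, 2: {}}
print(snoop(0x10, requester=0, caches=caches, memory=mem))  # 5
```

In hardware the notification is broadcast to all cores rather than short-circuited at the first hit; the sketch keeps only the hit-check, write-back, and invalidate steps.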
When a core in the system needs to access a data item that may be present in the caches of other cores, it initiates a snoop request. Each core receiving the snoop request checks its cache to determine whether it contains the requested data (i.e., whether it hits). For each snoop request, the time from receipt of the request to the determination of whether it hits is recorded; this is referred to as the snoop response time. For hit cores, the total time from the determination of the hit to the readiness to release the data (including marking the data as "invalid" or "dirty" and, where necessary, writing the data back to system memory) is further recorded; this is referred to as the data preparation time. After a certain amount of snoop event data has been collected (e.g., every thousand snoops or every few minutes), the average snoop delay is calculated by the following formula:
ASD = (1/N) × [ α · Σ_{i=1..N} log_β(RT_i) + γ · Σ_{j=1..M} e^(λ·DPT_j) ]
where N denotes the number of snoop events and M denotes the number of hit snoop events; {RT_i} denotes the set of response times of all snoop events; {DPT_j} denotes the set of data preparation times of all hit snoop events; α and γ denote weight coefficients used to adjust the importance of the response time and the data preparation time in the average delay; β denotes the base of the logarithm, used to logarithmically transform the corresponding times; and λ denotes the coefficient of the exponential function, used to adjust the exponential growth of the data preparation time. The efficiency of the snoop mechanism is judged against a preset threshold: if the average snoop delay is below the set threshold, the snoop mechanism is considered efficient; otherwise it is considered inefficient. If the snoop mechanism is judged inefficient, the causes of the high snoop delay are analyzed; possible causes include internal bus congestion, communication network delay, and insufficient processor snoop logic. Corresponding measures are taken according to the analysis results, such as upgrading hardware components (the internal bus, communication network, processor, etc.) or improving the algorithms or strategies of the snoop mechanism (e.g., reducing unnecessary snoop requests or optimizing data access patterns), and the measures taken and their effects are fed back to the relevant teams or personnel. Iterative adjustments are made according to the actual effect, and the performance of the snoop mechanism is continuously monitored to ensure its continued efficient operation.
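A sketch of the average-snoop-delay calculation under one plausible reading of the parameters described above: response times are log-transformed with base β, hit preparation times are weighted exponentially with coefficient λ, and the terms are weighted by α and γ (the overall 1/N normalization and the parameter values are assumptions for illustration):

```python
import math

# Hypothetical implementation of the average snoop delay (ASD). The exact
# form of the patent's formula is not reproduced here; this follows the
# parameter descriptions: alpha/gamma weights, log base beta, exponent lam.

def average_snoop_delay(response_times, prep_times,
                        alpha=1.0, gamma=1.0, beta=2.0, lam=0.01):
    n = len(response_times)  # N: all snoop events (prep_times covers hits)
    resp = alpha * sum(math.log(rt, beta) for rt in response_times)
    prep = gamma * sum(math.exp(lam * dp) for dp in prep_times)
    return (resp + prep) / n

asd = average_snoop_delay([8.0, 8.0], [0.0])
print(asd)  # (log2(8) + log2(8) + e^0) / 2 = 3.5
threshold = 10.0
print("efficient" if asd < threshold else "inefficient")  # efficient
```

The threshold comparison at the end mirrors the efficiency judgment described in the text.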
The scheme can accurately measure the performance of the snoop mechanism by recording the response time and data preparation time of each snoop request. The average snoop delay formula takes into account not only the response time of the snoop request but also the data preparation time, and the weight coefficients α and γ adjust their relative importance in the average delay. The logarithm base β and the exponential coefficient λ in the formula may be adjusted according to the specific application scenario to suit different performance requirements and hardware environments. By recording the response time and data preparation time of each snoop request and applying the formula, an accurate average snoop delay is obtained, providing precise performance data so that system administrators or developers can better understand how the snoop mechanism is operating. When the average snoop delay is above the set threshold, a performance optimization procedure is triggered: through in-depth analysis of the causes of high delay and corresponding measures (such as upgrading hardware components or improving algorithms or strategies), snoop delay can be significantly reduced and system performance improved. The average snoop delay formula is not only used for current performance evaluation; by continuously monitoring the performance of the snoop mechanism, it also provides strong support for continuous improvement of the system. The feedback and iterative adjustment mechanisms allow the system to be optimized continuously according to the actual effect to accommodate changing workloads and hardware environments. An efficient snoop mechanism reduces data access delay and improves the response speed of the system.
By optimizing the snoop mechanism, data access conflicts and errors can be reduced, the risk of system crashes or data loss is lowered, and the reliability and stability of the system are enhanced, ensuring stable long-term operation. Based on accurate performance evaluation, the scheme can precisely judge the efficiency of the snoop mechanism; when the average snoop delay is above the set threshold, the system can automatically trigger a performance optimization flow, including cause analysis and the formulation and implementation of countermeasures. The scheme provides a tool for deeply analyzing the causes of high snoop delay, helping to identify problems such as internal bus congestion, communication network delay, and insufficient processor snoop logic. This flexibility enables a system administrator or developer to formulate an effective solution to a specific problem. The scheme not only focuses on current performance but also ensures, by continuously monitoring the performance of the snoop mechanism, that the system keeps running efficiently over the long term. The feedback and iterative adjustment mechanisms allow the system to be optimized continuously according to the actual effect to adapt to changing workloads and hardware environments. By optimizing the snoop mechanism, the scheme reduces data access delay and improves overall system performance, and an efficient snoop mechanism helps reduce data access conflicts and errors and enhances the reliability and stability of the system.
It will be apparent to those skilled in the art that various modifications and variations can be made to the present invention without departing from the spirit or scope of the invention. Thus, it is intended that the present invention also include such modifications and alterations insofar as they come within the scope of the appended claims or the equivalents thereof.

Claims (8)

1.一种处理器,所述处理器包括多个核心且所有核心与系统内存进行连接,其特征在于:所述处理器包括输入/输出区块,所述输入/输出区块用以连接各所述核心及所述系统内存,其中,所述输入/输出区块均包括历史存取记录表,所述历史存取记录表用于暂存各所述核心的高速缓存存取的数据的地址;1. A processor, comprising a plurality of cores and all the cores are connected to a system memory, wherein the processor comprises an input/output block, the input/output block is used to connect each of the cores and the system memory, wherein each of the input/output blocks comprises a historical access record table, the historical access record table is used to temporarily store the address of the data accessed by the cache of each of the cores; 所述输入/输出区块的输入步骤,包括:The input step of the input/output block includes: 当处理器的任一核心对其高速缓存进行读写操作时,对所述读写操作的相关信息进行捕获并输入到输入/输出区块的历史存取记录表中;When any core of the processor performs a read or write operation on its cache, relevant information of the read or write operation is captured and input into a historical access record table of the input/output block; 输入/输出区块中的历史存取记录表与一致性电路单元进行交互,通过一致性电路单元基于MOESI或MESI协议对核心之间数据的一致性进行维护;The historical access record table in the input/output block interacts with the consistency circuit unit, and the consistency circuit unit maintains the consistency of data between cores based on the MOESI or MESI protocol; 当核心的高速缓存状态发生变化时,将变化信息通过一致性电路单元输入到历史存取记录表中,对记录进行更新;When the core cache state changes, the change information is input into the historical access record table through the consistency circuit unit to update the record; 当核心需要从系统内存加载数据到高速缓存,或从高速缓存写回数据到系统内存时,所述读写操作的信息通过输入/输出区块与内存控制器进行交互;When the core needs to load data from the system memory to the cache, or write data back from the cache to the system memory, the information of the read and write operations interacts with the memory controller through the input/output block; 所述核心用于接收窥探请求,所述核心接收到窥探请求后的步骤,包括:The core is used to receive a snoop request, and the steps after the core receives the snoop request include: 在每个核心接收到窥探请求并检查其高速缓存后,记录从接收到窥探请求到确定是否命中的时间,所述时间为窥探响应时间;After each core receives the snoop request and 
checks its cache, the time from receiving the snoop request to determining whether it hits is recorded, and the time is the snoop response time; 对于每个命中核心,记录从确定命中到准备释放数据的总时间,所述从确定命中到准备释放数据包括将数据标记为“无效”或“脏”,以及将数据写回到系统内存;所述总时间为数据准备时间;For each hit core, record the total time from determining the hit to preparing to release the data, where the time from determining the hit to preparing to release the data includes marking the data as "invalid" or "dirty" and writing the data back to the system memory; the total time is the data preparation time; 在收集窥探事件数据后,计算平均窥探延迟;其中,平均窥探时间通过如下公式进行计算:After collecting snooping event data, the average snooping delay is calculated; the average snooping time is calculated using the following formula: 其中,N表示窥探事件数,表示命中的窥探事件数;表示所有窥探响应时间集合;表示所有命中窥探事件的数据准备时间集合;表示权重系数,用于调整响应时间和数据准备时间在平均延迟中的重要性;β表示对数的底数,用于对相应时间进行对数交换;表示指数函数的系数,用于调整数据准备时间的指数增长;Where N is the number of snooping events, Indicates the number of snoop events that hit; represents the set of all snoop response times; Represents the set of data preparation times for all snooping events that hit; and represents the weight coefficient, which is used to adjust the importance of response time and data preparation time in the average delay; β represents the base of the logarithm, which is used to perform logarithmic conversion on the response time; represents the coefficient of the exponential function, which is used to adjust the exponential growth of data preparation time; 基于预设的阈值来判断窥探机制的效率;Determine the efficiency of the snooping mechanism based on a preset threshold; 若平均窥探延迟低于设定的阈值,则判断窥探机制效率高;If the average snooping delay is lower than the set threshold, the snooping mechanism is judged to be efficient; 若平均窥探延迟高于设定的阈值,则判断窥探机制效率低;If the average snooping delay is higher than the set threshold, the snooping mechanism is judged to be inefficient; 若判断窥探机制的效率低,则对窥探延迟高的原因进行分析;根据分析结果,采取相应的处置措施;If the snooping mechanism is judged to be inefficient, the cause of the high snooping delay is analyzed; 
and corresponding disposal measures are taken according to the analysis results; 将采取的处置措施及其效果反馈给相关团队或人员,根据实际效果进行迭代调整,并对窥探机制的性能进行持续监控。The measures taken and their effects will be fed back to the relevant teams or personnel, iterative adjustments will be made based on the actual results, and the performance of the snooping mechanism will be continuously monitored. 2.根据权利要求1所述的一种处理器,其特征在于,所述输入/输出区块的所述历史存取记录表连接一个一致性电路单元及内存控制器,其中,所述一致性电路单元连接至所述核心,所述内存控制器连接所述系统内存。2. A processor according to claim 1, characterized in that the historical access record table of the input/output block is connected to a consistency circuit unit and a memory controller, wherein the consistency circuit unit is connected to the core, and the memory controller is connected to the system memory. 3.根据权利要求2所述的一种处理器,其特征在于,所述一致性电路单元包括MOSEI及MESI协议中的协议。3. A processor according to claim 2, characterized in that the consistency circuit unit includes a protocol in MOSEI and MESI protocols. 4.根据权利要求1所述的一种处理器,其特征在于,所述输入/输出区块的输出步骤,包括:4. 
The processor according to claim 1, wherein the output step of the input/output block comprises: 当核心需要访问特定数据时,向输入/输出区块中的历史存取记录表发出查询请求;When the core needs to access specific data, it issues a query request to the historical access record table in the input/output block; 若历史存取记录显示特定数据已被存取,则准备取用所述特定数据的核心会等待已存取所述高速缓存存取的数据的核心释放所述特定数据;If the historical access record shows that the specific data has been accessed, the core that is ready to access the specific data will wait for the core that has accessed the cached data to release the specific data; 若数据被释放,则准备取用的核心从系统内存或另一个核心的高速缓存中获取所述特定数据;If the data is released, the core that is ready to use it obtains the specific data from the system memory or the cache of another core; 若历史存取记录显示特定数据未被存取,则输入/输出区块会向所有核心广播窥探消息;If the historical access records show that specific data has not been accessed, the input/output block broadcasts a snoop message to all cores; 在数据获取过程中,输入/输出区块与内存控制器协同工作,对数据从系统内存到高速缓存的正确传输进行保障;During the data acquisition process, the input/output block works with the memory controller to ensure the correct transfer of data from the system memory to the cache; 同时,输入/输出区块对历史存取记录表进行实时更新。At the same time, the input/output block updates the historical access record table in real time. 5.根据权利要求1所述的一种处理器,其特征在于,所述历史存取记录表的存取步骤,包括:5. 
A processor according to claim 1, characterized in that the step of accessing the historical access record table comprises: when any core of the processor performs a read or write operation on the cache, the system captures information about the operation; the captured operation information is entered into the historical access record table in the input/output block; the historical access record table stores at least a portion of the operation information, the portion including the system memory address of the data; the historical access record table interacts with the coherency circuit unit, which maintains data coherency between cores based on the MOESI or MESI protocol; when a core's cache state changes, the coherency circuit unit passes the change information to the historical access record table; when a core needs to load data from system memory into the cache, or write data back from the cache to system memory, information about the operation is exchanged with the memory controller through the input/output block; the memory controller performs the actual data transfer, while the input/output block records information about the operation in the historical access record table; when a core needs to access specific data, it issues a query request to the historical access record table in the input/output block, the query request containing the system memory address of the data to be accessed; after receiving the query request, the historical access record table checks the stored records to determine whether the data has been accessed by another core's cache; if the data has been accessed, the core preparing to access the data waits for the core that has accessed it to release the data; once the data is released, the requesting core can obtain the data from system memory or from another core's cache; if the data has not been accessed, the input/output block broadcasts a snoop message to all cores to ensure that no core is using the data; the core preparing to access the data then obtains it from system memory; during data acquisition, the input/output block updates the historical access record table in real time to reflect the new data access status. 6.
A data access monitoring method, implemented using a processor as claimed in any one of claims 1 to 5, characterized in that the monitoring method comprises: recording, in the historical access record table, information about data that has been accessed by a cache, the information including the address of the data in system memory; when a core needs to access specific data, querying the historical access record table; if the historical access record table shows that the specific data has been accessed, the core that has accessed the specific data releases it, and the core preparing to access the specific data accesses it in system memory; if the historical access record table shows that the specific data has not been accessed, broadcasting a snoop to each core so that the specific data is released, and the core preparing to access the specific data accesses it in system memory. 7.
The data access monitoring method according to claim 6, wherein the step in which the core that has accessed the specific data releases it, and the core preparing to access the specific data accesses it in system memory, comprises: when a requesting core needs to access specific data, querying the historical access record table to determine whether the specific data has been accessed by the holding core's cache; the historical access record table returns a query result indicating whether the specific data has been accessed; if the query result shows that the data has been accessed, proceeding to the next step; the requesting core notifies the holding core, through a bus or communication mechanism inside the processor, that it intends to access the specific data; on receiving the notification, the holding core prepares to release the specific data, including marking the specific data as "invalid" or "dirty" in its cache and preparing to write it back to system memory; the coherency circuit unit monitors this process, and if the specific data is "dirty", it coordinates writing the data from the holding core's cache back to system memory; once the specific data is released, the requesting core obtains it by accessing system memory and loads it from system memory into its cache; finally, the historical access record table is updated, completing the data access. 8.
The data access monitoring method according to claim 6, wherein the step of broadcasting a snoop to each core so that the specific data is released, and the core preparing to access the specific data accessing it in system memory, comprises: when a requesting core needs to access specific data, querying the historical access record table to determine whether the specific data has been accessed by another core's cache; if the historical access record table shows that the specific data has not been accessed, proceeding to the next step; the requesting core triggers the processor's snoop mechanism, which sends a snoop request to all other cores over an internal bus or communication network; on receiving the snoop request, each core checks whether its cache contains the requested data; if a hit core finds that its cache contains the requested data, it prepares to release it: the hit core marks the relevant data in its cache as "invalid" or "dirty", and if the data is "dirty", writes it back to system memory; the hit core then broadcasts a data release notification to all cores through the processor's internal mechanism; on receiving the data release notification, the requesting core learns that the data is now in system memory or about to be written back to system memory; the requesting core then reads the data from system memory, loads it into its cache, and updates the historical access record table to record the result of this access and the latest access status of the data.
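The query-then-release arbitration of claims 6–8 can be sketched as a small simulation. The following is a minimal, hypothetical Python model (all class and method names — `HistoryTable`, `Core.release`, `Core.snoop` — are our own illustration, not taken from the patent) showing the two paths: wait-for-release when the table records a holder, and broadcast-snoop when it does not:

```python
# Hypothetical sketch of the query-then-snoop flow described in claims 6-8.
# Names and structure are illustrative only, not from the patent text.

class HistoryTable:
    """Records which core's cache (if any) last accessed each memory address."""
    def __init__(self):
        self.records = {}              # addr -> id of the core holding the line

    def query(self, addr):
        return self.records.get(addr)  # None => no recorded access

    def update(self, addr, core_id):
        self.records[addr] = core_id


class Core:
    def __init__(self, core_id, memory, table, cores):
        self.id, self.memory, self.table, self.cores = core_id, memory, table, cores
        self.cache = {}                # addr -> (value, dirty_flag)

    def release(self, addr):
        """Write back a dirty line and drop it from the cache (claim 7 path)."""
        if addr in self.cache:
            value, dirty = self.cache.pop(addr)
            if dirty:
                self.memory[addr] = value   # write-back before release

    def snoop(self, addr):
        """Respond to a broadcast snoop (claim 8 path): release the line if held."""
        self.release(addr)

    def read(self, addr):
        # Step 1: query the historical access record table.
        holder = self.table.query(addr)
        if holder is not None and holder != self.id:
            # Step 2a: a recorded holder must release the line first.
            self.cores[holder].release(addr)
        elif holder is None:
            # Step 2b: no record -> broadcast a snoop to every other core.
            for core in self.cores:
                if core.id != self.id:
                    core.snoop(addr)
        # Step 3: fetch from system memory and record the new access.
        value = self.memory[addr]
        self.cache[addr] = (value, False)
        self.table.update(addr, self.id)
        return value


memory = {0x100: 7}
table = HistoryTable()
cores = []
for i in range(2):
    cores.append(Core(i, memory, table, cores))

cores[0].read(0x100)                 # core 0 caches addr 0x100 (snoop path: no record)
cores[0].cache[0x100] = (42, True)   # core 0 dirties the line
print(cores[1].read(0x100))          # core 1 forces write-back, then reads 42
```

In the last line, core 1's query finds core 0 recorded as the holder, forces a write-back of the dirty line to system memory, then reads the up-to-date value, mirroring the claim 7 path; core 0's first read, with no record in the table, instead takes the claim 8 broadcast-snoop path.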
CN202411759219.XA 2024-12-03 2024-12-03 Processor and data access monitoring method Active CN119226190B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202411759219.XA CN119226190B (en) 2024-12-03 2024-12-03 Processor and data access monitoring method


Publications (2)

Publication Number Publication Date
CN119226190A CN119226190A (en) 2024-12-31
CN119226190B true CN119226190B (en) 2025-03-14

Family

ID=94067232

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202411759219.XA Active CN119226190B (en) 2024-12-03 2024-12-03 Processor and data access monitoring method

Country Status (1)

Country Link
CN (1) CN119226190B (en)

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN118964232A (en) * 2024-10-16 2024-11-15 上海芯力基半导体有限公司 Data access method and processor

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5335335A (en) * 1991-08-30 1994-08-02 Compaq Computer Corporation Multiprocessor cache snoop access protocol wherein snoop means performs snooping operations after host bus cycle completion and delays subsequent host bus cycles until snooping operations are completed
US6807608B2 (en) * 2002-02-15 2004-10-19 International Business Machines Corporation Multiprocessor environment supporting variable-sized coherency transactions
US7353341B2 (en) * 2004-06-03 2008-04-01 International Business Machines Corporation System and method for canceling write back operation during simultaneous snoop push or snoop kill operation in write back caches
CN116126517A (en) * 2022-12-13 2023-05-16 海光信息技术股份有限公司 Access request processing method, multi-core processor system, chip and electronic device



Similar Documents

Publication Publication Date Title
EP1153349B1 (en) Non-uniform memory access (numa) data processing system that speculatively forwards a read request to a remote processing node
JP3661761B2 (en) Non-uniform memory access (NUMA) data processing system with shared intervention support
JP3644587B2 (en) Non-uniform memory access (NUMA) data processing system with shared intervention support
US6209065B1 (en) Mechanism for optimizing generation of commit-signals in a distributed shared-memory system
US6286090B1 (en) Mechanism for selectively imposing interference order between page-table fetches and corresponding data fetches
JP5037566B2 (en) Optimizing concurrent access with a directory-type coherency protocol
US7613882B1 (en) Fast invalidation for cache coherency in distributed shared memory system
US9170946B2 (en) Directory cache supporting non-atomic input/output operations
CA2280172C (en) Non-uniform memory access (numa) data processing system that holds and reissues requests at a target processing node in response to a retry
US20130205096A1 (en) Forward progress mechanism for stores in the presence of load contention in a system favoring loads by state alteration
US6266743B1 (en) Method and system for providing an eviction protocol within a non-uniform memory access system
JPH06274461A (en) Multiprocessor system having cache matching ensuring function validating range designation
JP2008525901A (en) Early prediction of write-back of multiple owned cache blocks in a shared memory computer system
Alian et al. Data direct I/O characterization for future I/O system exploration
US11126564B2 (en) Partially coherent memory transfer
US20140006716A1 (en) Data control using last accessor information
CN118964232B (en) Data access method and processor
US6615321B2 (en) Mechanism for collapsing store misses in an SMP computer system
US7774554B2 (en) System and method for intelligent software-controlled cache injection
CN118550849B (en) Cache consistency maintenance method, multi-core system and electronic device
CN112136118A (en) Transport protocol in a data processing network
CN119226190B (en) Processor and data access monitoring method
Mallya et al. Simulation based performance study of cache coherence protocols
CN119201008B (en) A multi-core processor and a method for centrally managing cache consistency
JPH09311820A (en) Multiprocessor system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant