CN119226190B

CN119226190B - Processor and data access monitoring method

Info

Publication number: CN119226190B
Application number: CN202411759219.XA
Authority: CN
Inventors: 陈复; 唐菊飞
Original assignee: Shanghai Xinliji Semiconductor Co ltd
Current assignee: Shanghai Xinliji Semiconductor Co ltd
Priority date: 2024-12-03
Filing date: 2024-12-03
Publication date: 2025-03-14
Anticipated expiration: 2044-12-03
Also published as: CN119226190A

Abstract

The present invention proposes a processor and data access monitoring method. It belongs to the technical field of operation and memory access of multi-core processors; the processor includes an input/output block, and the input/output block is used to connect each of the cores and the system memory, wherein the input/output block includes a historical access record table, and the historical access record table is used to temporarily store the address of at least a part of the data accessed by the cache of each core. Since the information of the cache accessed before is recorded in the historical access record table, once the information of the desired target data is found in the historical record table, the core with the data can be accurately required to release the data, thereby significantly reducing the frequency of broadcast snooping.

Description

Processor and data access monitoring method

Technical Field

The invention provides a processor and a data access monitoring method, and belongs to the technical field of operation and memory access of multi-core processors.

Background

A system having a processor typically utilizes a memory controller to control and access resources of the system memory. When the memory controller is in effect receiving memory access requests, the memory controller buffers the memory access requests and processes the buffered memory access requests according to certain specified priorities.

In the prior art, each core of the processor is respectively provided with a cache, so that a certain physical block of the system memory can be accessed by more than one core at the same time, and the situation that data are different among a plurality of caches must be avoided. The Condition of cache contention (CACHE RACE Condition) between multiple processors occurs mainly when there is a write action, at which time it is necessary to snoop the cache and the bus by means of a snoop cache coherency protocol (Snoop Cache Coherency Protocol), and the behavior of the cache is known from the snoop, thus determining which activity to take.

A common coherency protocol is the MOESI (Modified Owned Exclusive SHARED INVALID) protocol. Each cache line according to this protocol includes a status bit indicating the MOESI status of the cache line. These states include Modified (M) to indicate that the cache line is Modified, exclusive (E) or Shared (S), or Invalid (I). And the Owned (O) state indicates that the cache line in the cache that may share copies with other caches and that the data in memory is stale is modified.

However, according to the above-mentioned technical means, the processor needs to continuously change the state of the cache line during the operation process, which obviously reduces the performance of the processor. The method reduces the access requirement to the system memory, thereby reducing the number of times of data movement between the system memory and the cache memory, and improving the system performance.

Disclosure of Invention

The invention provides a processor and a data access monitoring method, which are used for solving the problem that the performance of the processor is reduced because the state of a cache line is continuously changed by the behavior of a cache through snoop:

The invention provides a processor, which comprises a plurality of cores and all cores are connected with a system memory, and comprises an input/output block, wherein the input/output block is used for connecting each core and the system memory, and comprises a history access record table, and the history access record table is used for temporarily storing addresses of at least one part of data accessed by a cache of each core.

Further, the history table of the input/output block is connected to a coherency circuit unit and a memory controller, wherein the coherency circuit unit is connected to the core, and the memory controller is connected to the system memory.

Further, the coherence circuit unit includes any one of MOSEI and MESI protocols.

Further, delays may occur when a cache line switches between different states. For example, in the MESI protocol, a cache line may switch from an exclusive (E) state to a shared (S) state, or from a modified (M) state to an invalid (I) state, each state switch involving the sending and receiving of messages and the processing of responses to those messages, and the state switch delay may be calculated by the following equation:

Wherein n represents the total number of state switching steps; indicating the initialization time of the i step; Representing the propagation time of step i; Y represents the influence factor of the propagation distance; Indicating the acknowledgement time of the i-th step;

in cache coherency protocols, when one core needs to update the cache state of the other core, this needs to be done by messaging. The messaging delay includes the propagation time of the message on the bus and the processing time at the target core. Calculated by the following formula:

Wherein m represents the total number of messaging steps; Indicating the transmission time of the j-th step; Representing the transmission time of the j th step; representing the number of transmission nodes in step j; representing the processing time of the j step;

in some operations, such as write operations, one core may need to wait for the response of the other core to complete the update of the cache state. Waiting for a response delay refers to the time taken from sending a request until all necessary responses are received. Calculated by the following formula:

Wherein, Indicating the time at which the request was sent and p indicating the total number of corresponding steps,Indicating the response time of the kth step.

Adding the three delays to obtain the total delay in the cache consistency protocol:

the performance of the cache coherence protocol is evaluated by calculating the total delay, and if the total delay is below a preset threshold, the performance of the cache coherence protocol is considered acceptable, and if above the threshold, the protocol or hardware design needs to be optimized to reduce the delay. Further, the input step of the input/output block includes:

When any core of the processor performs read-write operation on a Cache (Cache) of the processor, capturing relevant information (such as a system memory address of data) of the operation and inputting the relevant information into a history access record table of an input/output block;

The history access record list in the input/output block interacts with a consistency circuit unit, and the consistency circuit unit maintains the consistency of data between cores based on MOESI or MESI protocol;

when the cache state of the core changes (such as data is modified, replaced or invalid), the change information is input into a history access record table through a consistency circuit unit, and the record is updated;

When the core needs to load data from the system memory to the cache, or write back data from the cache to the system memory, the information of the operation interacts with the memory controller through the input/output block.

Further, the outputting step of the input/output block includes:

when the core needs to access specific data, a query request is sent to a history access record table in the input/output block;

If the history record indicates that particular data has been accessed (i.e., is present in a cache of a core), then the core that is ready to fetch the data waits for the core that has accessed the data to release the data (possibly coordinated by a coherency circuit unit);

If the data is released, the core to be taken acquires the data from the system memory or the cache of another core;

If the history record indicates that the particular data was not accessed (i.e., is not present in the cache of any core), then the I/O block will broadcast a snoop message to all cores;

in the process of data acquisition, the input/output block and the memory controller work cooperatively to ensure the correct transmission of data from the system memory to the cache;

Meanwhile, the input/output block updates the history access record table in real time.

Further, the accessing step of the history access record table includes:

when any core of the processor executes read-write operation on a Cache (Cache), capturing related information of the operation by a system, wherein the information comprises a system memory address of data, an operation type (read/write) and a core number for executing the operation;

the captured operation information is input into a history access record table in the input/output block, wherein the history access record table stores at least one part of the information, and the at least one part comprises a system memory address of data;

The history access record table interacts with a consistency circuit unit, and the consistency circuit unit maintains the consistency of data between cores based on MOESI or MESI protocols;

when the cache state of the core changes (such as data is modified, replaced or invalidated), the consistency circuit unit transfers the change information to the history access record table;

When the core needs to load data from the system memory to the cache or write back data from the cache to the system memory, the information of the operation interacts with the memory controller through the input/output block;

The memory controller is responsible for actual data transmission, and the input/output block records the relevant information of the operation (especially the system memory address of the data) into the history access record table;

when the core needs to access specific data, sending a query request to a history access record table in an input/output block, wherein the query request comprises a system memory address of the data to be accessed;

after receiving the query request, the history access record table checks the stored record to determine whether the data has been accessed by caches of other cores;

If the data has been accessed, the core that is ready to fetch the data waits for the core that has accessed the data to release the data;

once the data is released, the core that is ready to fetch may fetch the data from system memory or another core's cache;

If the data is not accessed, the I/O block will broadcast a snoop message to all cores, ensuring that no cores are using the data;

Then, preparing a core for taking the data to acquire the data from a system memory;

in the process of data acquisition, the input/output block updates the history access record table in real time and displays new data access conditions, wherein the new access conditions comprise information such as the latest access state, access time and the like of the recorded data.

The invention provides a data access monitoring method, which comprises a processor with a plurality of cores, and comprises the following steps:

Recording related information of data which is accessed by a cache in a history access record table, wherein the related information comprises an address of the data in a system memory;

when the core processor needs to access specific data, inquiring a history access record table;

if the history access record shows that the specific data is accessed, releasing the specific data by the core which has accessed the specific data, and preparing to access the specific data by taking the core of the specific data into the system memory;

if the history access record indicates that the specific data is not accessed, a snoop is broadcast to each core, the specific data is released, and the core of the specific data is ready to be taken to the system memory for accessing the specific data.

Further, the step of releasing the specific data from the core having accessed the specific data and preparing to access the specific data from the core having accessed the specific data to the system memory comprises:

When a request core needs to access specific data, inquiring a history access record table to determine whether the data is accessed by a cache of a holding core;

If the query result shows that the data has been accessed (i.e., exists in a cache of a core), then go to the next step;

the holding core, upon receipt of the notification, prepares to release the data, including marking the data as "invalid" or "dirty" from the cache (if the data has been modified) and preparing it for writing it back to system memory (if the data is "dirty");

the coherency circuit unit monitors this process and if the data is dirty, the coherency circuit unit coordinates writing the data back to system memory from the cache holding the core;

Once the data is released (or if the data is never cached, directly accesses the system memory), the requesting core retrieves the data by accessing the system memory;

and finally, updating the history access record table to finish data access.

Further, the step of broadcasting a snoop to each core to release the specific data and prepare to access the specific data from the core of the specific data to the system memory comprises:

When a request core needs to access specific data, inquiring a history access record table to determine whether the data is accessed by caches of other cores;

If the history table indicates that the particular data is not accessed (i.e., is not in any core's cache), then go to the next step;

the request core triggers a snoop mechanism of the processor, and the snoop mechanism sends snoop requests to all other cores through an internal bus or a communication network, wherein the snoop requests contain address information of data to be accessed in a system memory;

After each core receives the snoop request, checking whether the cache contains the requested data;

If the hit core finds that the requested data is contained in its cache, it will be ready to release the data, the hit core marks the relevant data in the cache as "invalid" or "dirty" (if the data has been modified);

The hit core broadcasts a data release notification to all cores through the processor's internal mechanisms, indicating that it is ready to release data or has written data back to system memory;

After receiving the data release notification, the request core acquires information that the data is now in the system memory or is about to be released back to the system memory;

The request core reads the data from the system memory, loads the data into the cache, updates the history access record table, and records the result of the data access and the latest access state of the data.

Further, the step of each core after receiving the snoop request further includes:

After each core receives the snoop request and inspects its cache, the time from the receipt of the snoop request to the determination of whether it hits (i.e., whether the requested data is contained in the cache) is recorded, which is referred to as the "snoop response time";

for each hit core, the total time from the determination of the hit to the preparation of the release of the data (including marking the data as "invalid" or "dirty" and, if necessary, writing the data back to system memory) is recorded, this time being referred to as the "data preparation time";

After a certain amount of snoop event data is collected (e.g., every thousand snoops or minutes), an average snoop delay is calculated, wherein the average snoop time is calculated by the following equation:

Where N represents the number of snoop events, A snoop event number representing a hit; representing a set of response times for all snoop events; A set of data preparation times representing all hit snoop events; And Beta represents the base of logarithm and is used for carrying out logarithmic exchange on the corresponding time; the coefficients representing the exponential function are used to adjust the exponential increase in the data preparation time.

Wherein if a snoop request misses the cache of any core, the data preparation time is 0;

Judging the efficiency of a snoop mechanism based on a preset threshold;

if the average snoop delay is lower than the set threshold, judging that the snoop mechanism is high in efficiency;

If the average snoop delay is higher than the set threshold, judging that the snoop mechanism is low in efficiency;

If the snoop mechanism is determined to be inefficient, the cause of the high snoop latency may be internal bus congestion, communication network latency, inefficient processor snoop logic, etc.

Based on the analysis results, corresponding handling measures are taken, including upgrading hardware components such as internal buses, communication networks or processors, and improving algorithms or policies of snoop mechanisms, such as reducing unnecessary snoop requests, optimizing data access patterns, etc.

And feeding back the adopted disposal measures and the effects thereof to related teams or personnel, performing iterative adjustment according to the actual effects, and continuously monitoring the performance of the snoop mechanism.

The invention has the beneficial effects that the information which is accessed by the cache is recorded in the history access record table, so that once the information of the target data which is required is found in the history access record table, the core with the data can be accurately required to release the data, thereby obviously reducing the frequency of broadcasting snooping. The method comprises the steps of combining a history access record list with a consistency circuit unit, effectively maintaining data consistency among cores, reducing data conflict and error, improving reliability of a system, quickly judging whether data is accessed by other cores through inquiry of the history record in a data request process, reducing unnecessary data transmission and delay, improving overall efficiency of data access, reducing frequent access to a system memory, reducing occupation of memory bandwidth and improving system performance through real-time monitoring and recording access conditions of the data, quickly positioning problems when errors or faults occur through recording history access information, improving fault recovery capability of the system, providing data support for system performance optimization through monitoring delay and efficiency of a snoop mechanism, helping development teams to identify bottlenecks and improve, and dynamically managing the data access by the history access record list and the snoop mechanism updated in real time, so that a processor can flexibly cope with different load conditions.

Drawings

FIG. 1 is a schematic diagram of a processor architecture according to an embodiment of the present invention;

FIG. 2 is a flow chart of data access monitoring according to an embodiment of the invention.

Detailed Description

Referring to fig. 1, the embodiment discloses a processor 100 including a plurality of cores 101, 102, 103, 104 of C0, C1, C2., cn, etc., and each core 101-104 is respectively coupled with a cache (denoted by "$" symbol). The processor 100 has an input/output block (I/Oblock) 120, and the I/O block 120 is connected to each of the cores 101-104. Next, the processor 100 is connected to a system memory 130 through the i/o block 120.

Specifically, the input/output block 120 includes a history table 121 coupled to a first multiplexer 122, a second multiplexer 123, and a coherency circuit unit 124, wherein the first multiplexer 122 is further coupled to the second multiplexer 123. The memory controller 125 is connected to the second multiplexer 123. The coherency circuit 124 is coupled to the core 101.

Further, the first multiplexer 122 is connected to each of the cores 101 to 104, so that the request command of each of the cores 101 to 104 can be transmitted to the history table 121 or the second multiplexer 123 through the first multiplexer 122. The output result of the history table 121 may be transmitted to the second multiplexer 123 or the coherency circuit unit 124. The output of the second multiplexer 123 may be transmitted to the memory controller 125 to access the data in the system memory 130. The output of the coherency circuit unit 124 may be transmitted to each of the cores 101-104.

The history table 121 is a cache-like memory queue, and the contents of the memory queue include an address, a validity, and a data source, so as to correspond to the address of the data taken by at least a portion of the cache. The coherency circuit unit 124 may employ a known MOESI protocol or MESI protocol.

Referring to FIG. 2, an embodiment of the present invention discloses a data access monitoring method, which takes the correspondence between core A and core B as an example, and applies the MESI protocol:

Block 201 discloses that core A accesses a data S;

Block 202 discloses that core A first goes to the history access record table to check the record in the history record table, whether the data S is recorded in the history access record table or not is recorded in the history access record table according to whether the data S is accessed or not, and the data S is recorded in the history access record table once the data S is accessed;

Block 203 discloses a determining step for determining whether the data S is recorded in the history access record;

Block 204 discloses that if data S is in the history list and is shown as being accessed by core B, then a determination is again made as to the status of data S, wherein if the status of data S is declared shared invalid, then access of data S to system memory by core A may be performed as disclosed in block 205.

In the determination step disclosed in block 204, if the data S is not the shared (S) or invalid (I) condition, the state of the data S is either modified (M) or exclusive (E), which can be performed as disclosed in block 206.

Block 206 discloses that core A requests core B to discard/release data S and invalidates data S in core B, if the state of data S is modified (M) or exclusive (E), which causes core A to access data S to system memory, as further disclosed in block 205.

If it is found in the determining step of block 203 that the data S is not recorded in the history table, then core A broadcasts a snoop for each core as disclosed in block 207;

then, as indicated by the block 208, each core is required to perform state clearing/releasing of the data S and declare the state of the data S according to the cache coherency protocol.

After each core has completed clearing and the state of the data S has been re-declared, core A may access the data S from system memory, as shown in block 205.

Since the information that the cache has accessed is recorded in the history table, once the information of the target data is found in the history table, the core with the data can be accurately required to release the data, so that the frequency of broadcasting snooping can be obviously reduced. It will be appreciated that the greater the memory capacity of the history table, the less frequent the snoop will be broadcast.

In the MESI protocol, there are four states for a cache line, modify (M), exclusive (E), shared (S), and invalid (I). When the data state of a cache line changes, for example, a switch from an exclusive (E) state to a shared (S) state, or a switch from a modified (M) state to an invalid (I) state, a state switching operation is required. The state switching delay is obtained by the following formula:

When a core modifies data in a cache line, the state of the cache line changes from exclusive (E) or shared (S) to modified (M). When other cores need to access the data, access rights to the data are requested through messaging. Depending on the type of request and the state of the current cache line, the cache controller may decide whether to send the data to the requester (shared state) or invalidate the requester's cache line (invalid state). Each state switch involves the sending and receiving of messages and the processing of responses to those messages. The messaging delay includes the propagation time of the message on the bus and the processing time at the target core. The response processing time depends on the speed at which the target core processes the received message. The message passing delay is obtained by the following formula:

In some operations, such as write operations, one core may need to wait for the response of the other core to complete the update of the cache state. When a core needs to modify data in a cache line, it will first check the state of the cache line. If the cache line is in the exclusive (E) state, the core may directly modify the data and update the state to modified (M). If a cache line is in the shared (S) state, the core needs to send a message to other cores that own the cache line requesting them to invalidate the respective cache line. After sending the request, the core needs to wait for the responses of all relevant cores. Waiting for a response delay refers to the time taken from sending a request until all necessary responses are received. If a core fails to respond in time, it may cause the write operation to be blocked, thereby increasing the overall delay. Wherein the wait response delay is obtained by the following formula:

adding the three delays can result in a total delay in the cache coherency protocol.

Total delay = state switching delay + messaging delay + wait for response delay.

The performance of the cache coherence protocol can be evaluated by calculating the total delay, if the total delay is lower than a preset threshold, the performance of the cache coherence protocol is considered acceptable, if the total delay is higher than the threshold, the protocol or hardware design is required to be optimized to reduce the delay, the main sources of the delay in the cache coherence protocol can be more accurately identified by subdividing the delay into a state switching delay (STA), a message delivery delay (MPD) and a waiting response delay (RED), and the optimization efficiency is improved by helping developers optimize specific delay sources. The delay quantization is possible through a specific calculation formula, and the performance of the cache consistency protocol can be intuitively evaluated through the Total Delay (TD) obtained through calculation. A preset threshold is set as a criterion for performance evaluation. By comparing the total delay to a threshold, it can be determined whether the performance of the cache coherency protocol is acceptable. When the total delay is above the threshold, a direction is provided to optimize the protocol or hardware design. And a developer can analyze which links have an optimization space according to various parameters in the delay calculation formula, so that targeted optimization is performed. By optimizing the delay source in the cache consistency protocol, the total delay can be reduced, and the response speed and the overall performance of the system can be improved. The reduction of the delay is important to the improvement of the user experience, and the reduction of the delay can obviously improve the user experience in application scenes with higher real-time requirements, such as games, real-time communication and the like. The performance optimization of the cache consistency protocol is beneficial to reducing the unstable phenomenon of the system caused by too high delay, improving the reliability and stability of the system and reducing the risk of system breakdown or crash. Through the deep research on cache consistency protocol delay, innovation and development of protocols can be promoted, and developers can explore more efficient and concise protocol designs so as to adapt to continuously changing computing environments.

When any core of the processor performs a read/write operation on its Cache (Cache), information about the operation (e.g., system memory address of data, operation type, etc.) is captured, and the captured information is then input into the history table of the input/output block. This step ensures that each time a core accesses the cache memory, basic data is provided for subsequent data access monitoring, the history access record table in the input/output block interacts with the coherency circuit unit, the coherency of data between the cores is maintained based on a cache coherency protocol such as MOESI or MESI, when the cache state of the cores changes (such as data is modified, replaced or invalidated), the change information is input into the history access record table through the coherency circuit unit, the record is updated in real time, and when the cores need to load data from the system memory into the cache, or write back data from the cache into the system memory, the information of these operations interacts with the memory controller through the input/output block, and the memory controller performs corresponding memory read-write operations according to the received information, so as to ensure correct loading and write-back of the data.

The technical scheme realizes real-time access monitoring of data by capturing read-write operation information of a processor core to a cache and inputting the read-write operation information to a historical access record table, is beneficial to a system administrator or a developer to know the flow condition of the data in real time, ensures the correctness and the safety of the data, interacts with a consistency circuit unit through the historical access record table in an input/output block, maintains the consistency of the data between the cores through the consistency circuit unit based on MOESI or MESI protocols, ensures that the multi-core processor cannot generate data conflict or inconsistency when processing the data, improves the stability and the reliability of the system, inputs the change information into the historical access record table through the consistency circuit unit when the cache state of the core is changed (such as data is modified, replaced or invalid), updates the record, ensures that the information in the historical access record table is always consistent with the actual state of the core cache, provides an accurate basis for subsequent data access and processing, ensures that the system needs to load the data from the system to the cache or the situation that the data is not consistent when the system is written back to the cache, and the system is more convenient to control the system through the overall memory, the system can control the system, the system can be more flexibly updated and the memory access and the memory is more convenient to control the system, the system is more convenient to input and the system has more improved in the overall system access and the system, the system has more data access and the system access and the memory is more convenient to control and more convenient to update and the memory access and more necessary to control the system, by monitoring the data access condition and maintaining the data consistency in real time, the system can discover and process potential problems in time, thereby being beneficial to reducing the maintenance cost of the system and reducing the system breakdown and repair work caused by data errors or inconsistencies.

When a core needs to access specific data, it will send a query request to the history access record table in the input/output block, after receiving the request, it will query the history access record table to determine the position of the required data, if the history access record shows that the specific data has been accessed (i.e. exists in the cache of a certain core), the core ready to take the data will enter a waiting state, waiting for the core that has accessed the data to release the data, if the history access record shows that the specific data has not been accessed (i.e. does not exist in the cache of any core), the input/output block will broadcast a snoop message to all cores to find out if the data has not been recorded in the caches of other cores (although this is less the case, may be due to a delay or an error in recording), once the core that has accessed the data has been released (possibly completed by a coordination of a consistency circuit unit), the core ready to take the data can be fetched from the system memory or the cache of another core, in the data fetching process, the input/output block and the controller will work in close cooperation, ensuring that the data is transferred from the system to the cache of the system or the latest data is fetched from the cache of any core, and the history access record is updated in real time.

The data access method has the advantages that the history access record list is inquired, the cores for preparing and taking the data can quickly know whether the data are occupied by other cores or not, so that unnecessary waiting is avoided, once the data are released, the cores for preparing and taking the data can immediately acquire the data from a system memory or a cache of another core, the speed and efficiency of data access are improved, the data can be kept consistent when the data are accessed and modified through coordination of the consistency circuit units, data conflict and errors are avoided, in the data acquisition process, the cooperative work of the input/output block and the memory controller ensures the correct transmission of the data from the system memory to the cache, the integrity of the data is maintained, the access times of the system to the memory are reduced through efficient data access management and optimization, the cost and energy consumption of the system are reduced, the input/output block can update the history access record list in real time, accurate data access state information is provided for the system, the system can be more reasonably distributed and utilized, different application scenes and requirements can be met through flexible data access strategies and optimization, the system flexibility is improved, the system can be adapted to the system is improved, the problem of the system is discovered through the cooperative access record and the history access record can be timely discovered, and the risk of the system access is reduced.

When any core of the processor performs a read/write operation to the cache, the system captures detailed information about the operation, including the system memory address of the data, the type of operation (read or write), and the core number at which the operation was performed, and the captured operation information is then entered into the history access record table in the input/output block. The record list stores at least the system memory address of the data, which is the key basis for data access inquiry, and the history access record list and the consistency circuit unit keep close interaction. The coherency circuit unit maintains coherency of data between cores based on MOESI or MESI protocols. These protocols ensure that copies of data remain synchronized and consistent as they are shared or modified among multiple cores, and that coherency circuitry passes these change information to the history table as the cache state of the cores changes (e.g., the data is modified, replaced, or invalidated). This ensures that the information in the record table always remains consistent with the actual state of the core cache, and that the information of these operations will interact with the memory controller through the I/O block when the core needs to load data from the system memory into the cache, or write data back from the cache to the system memory. The memory controller is responsible for the actual data transfer tasks, and at the same time, the I/O block records the relevant information of these operations (especially the system memory address of the data) into the history access record table. This ensures that the record table reflects the access of the data in real time, and when the core needs to access specific data, it will issue a query request to the history access record table in the input/output block. The history access record table, upon receiving a query request, examines the stored record to determine if the data has been accessed by the caches of the other cores. If the data has been accessed, the core that is ready to fetch the data waits for the core that has accessed the data to release the data (this process is coordinated by the coherency circuitry unit), once the data is released, the core that is ready to fetch may fetch the data from system memory or another core's cache. If the data is not accessed, the input/output block broadcasts a snoop message to all cores to ensure that no cores are using the data and acquire the data from the system memory, and in the process of data acquisition, the input/output block updates the history access record table in real time to display new data access conditions, including the latest access state, access time and other information of the recorded data.

The history access record table allows the core to quickly inquire the access state of specific data so as to avoid unnecessary memory access and data transmission delay, the system can timely find and process data access conflict through real-time update and inquiry, reduce waiting time among the cores, the core can more effectively utilize cache resources, because the history access record table provides accurate information about whether the data are cached or not, the scheme optimizes the use of memory bandwidth through reducing unnecessary memory access, improves the overall system performance, interaction of the history access record table and a consistency circuit unit ensures the consistency of the data among the cores, avoids data conflict and error, real-time update of the data access state ensures that the system can always acquire latest data state information, the system can timely find potential data access problems through real-time update and inquiry, reduce the risk of system crash and data loss, and the scheme can adapt to different application scenes and requirements through flexible data access strategies and optimization in the data transmission and access process, thereby simplifying the system and centrally designing the system access state. The quick positioning and solving of the data access problem reduces the debugging and maintenance costs of the system.

When a processor core (referred to as a "request core") needs to access particular data, it first queries the history access record table. The record table stores previous access to data by all cores, including whether the data is held by a cache of a core. The purpose of the query is to determine whether the data has been cached by another core (referred to as the "holding core"). The history access record table returns a query result telling the requesting core whether the data has been accessed. If data has been accessed (i.e., is present in the cache of a core), the requesting core needs to wait for the holding core to release the data. The requesting core informs the holding core via a bus or communication mechanism internal to the processor that it wants to access the data. The holding core, upon receiving the notification, prepares to release the data. The holding core marks the data as "invalid" or "dirty" from the cache, and if the data is "dirty" (i.e., has been modified), the holding core needs to be ready to write the data back to system memory. The coherency circuit unit monitors the data release process. If the data is dirty, the coherency circuitry coordinates writing the data back to system memory from the cache holding the core, the requesting core may access the system memory to retrieve the data once the data is released (or directly access the system memory if the data is not cached), the requesting core loads the data from the system memory into its cache for subsequent quick access, and finally, the history access record table is updated to reflect the latest access of the data.

The request core can quickly determine the position of data by inquiring the history access record table, unnecessary memory access is avoided, so that the data access efficiency is improved, the request core can quickly acquire the required data when the data is released or directly accessed from the system memory, the waiting time is reduced, and the consistency circuit unit is introduced to ensure the consistency of the data among a plurality of cores. Particularly when data is modified (i.e. changed into dirty data), a consistency circuit unit coordinates writing the data back to a system memory, thereby ensuring the accuracy of the data in the system memory, effectively avoiding data conflict and error by marking the data as invalid or dirty from a cache and timely writing back to the system memory, fully utilizing cache resources, reducing the consumption of memory bandwidth by reducing unnecessary memory access, simultaneously, more effectively managing the data access state by updating a history access record table in real time, improving the utilization rate of the whole resource, and reducing the risks of system breakdown and data loss by strict data access control and consistency maintenance. In the process of data access, any potential problem can be found and processed in time, so that the stability and reliability of the system are enhanced, the system can be well adapted to the environment of a multi-core processor, and data sharing and consistent access among a plurality of cores are supported. By optimizing the data access strategy, powerful support is provided for efficient operation of the multi-core processor. By centrally managing the data access states, the design and implementation of the system is simplified. Meanwhile, by providing clear data access flow and consistency maintenance mechanism, the development and maintenance cost of the system is reduced.

When a requesting core needs to access certain data, it first looks up the history table to determine if the data has been accessed by the caches of other cores, if the history table shows that the data has not been accessed (i.e. is not in the caches of any of the cores), the requesting core knows that it needs to fetch data from system memory, the requesting core triggers the processor's snoop mechanism by sending snoop requests to all other cores via the internal bus or communication network, the snoop requests contain address information of the data to be accessed in the system memory, so that other cores can check if their own caches contain the data, each core checks if it contains the requested data in its cache after receiving the snoop requests, and if a certain core (called a hit core) finds that it contains the requested data in its cache, it will be ready to release the data. The hit core marks the data as "invalid" or "dirty" from its cache (if the data is modified), if the data is "dirty" (i.e., has been modified), the hit core will write it back to system memory to ensure accuracy of the data in system memory, the hit core broadcasts a data release notification to all cores through the internal mechanisms of the processor, this notification indicating that the hit core is ready to release the data (if the data is not modified) or has written the data back to system memory (if the data is "dirty"), the request core knows that the data is now in or about to be released back to system memory after receiving the data release notification. Thus, the requesting core may read data from system memory and load it into its own cache for subsequent quick access. Finally, the request core updates the history access record table, and records the result of the data access and the latest access state of the data.

The snoop mechanism ensures that when a certain core needs to access specific data, if the data is cached by other cores, the data can be released or written back to the system memory in time, so that the consistency of the data is ensured. By broadcasting the data release notification, all cores can know the latest state of the data in real time, so that data collision and errors are avoided, when the data is not in the cache of the request core, a snoop mechanism can be rapidly positioned to the actual position of the data (namely, the system memory or the caches of other cores), unnecessary memory access is reduced, the efficiency of data access is improved, the delay of the system is reduced, snoop requests and data release notification are sent through an internal bus or a communication network, the communication is local and targeted, excessive consumption of bus bandwidth is not caused, compared with the traditional bus monitoring mode, the snoop mechanism is more efficient and energy-saving, the risks of system breakdown and data loss are reduced through strict data access control and consistency maintenance, and any potential problems can be timely discovered and processed in the process of data access, so that the stability and reliability of the system are enhanced. As the number of processor cores increases, snoop mechanisms can ensure consistent and efficient access of data across multiple cores. By centrally managing the data access status and history access record table, the scheme simplifies the management process of data access. The developer can be more focused on the logical implementation of the application without having to pay too much attention to the underlying data access details.

When a certain core in the system needs to access a certain data item, which may be present in the caches of other cores, a snoop request is initiated, each core receiving the snoop request checks its cache to determine whether the requested data is contained (i.e. hit), for each snoop request, the time from the receipt of the request to the determination of whether it hits, referred to as the snoop response time, is recorded, for hit cores, the total time from the determination of hits to the preparation of release of the data (including marking the data as "invalid" or "dirty" and possibly writing the data back to system memory) is further recorded, referred to as the data preparation time, and after a certain amount of snoop event data is collected (e.g. every thousand snoops or minutes), the average snoop delay is calculated. Wherein the average snoop time is calculated by the following formula:

Where N represents the number of snoop events, A snoop event number representing a hit; representing a set of response times for all snoop events; A set of data preparation times representing all hit snoop events; And Beta represents the base of logarithm and is used for carrying out logarithmic exchange on the corresponding time; The coefficients representing the exponential function are used to adjust the exponential increase in the data preparation time. The efficiency of the snoop mechanism is determined based on a preset threshold. If the average snoop delay is lower than the set threshold, the snoop mechanism is considered to be high in efficiency, otherwise, the snoop mechanism is considered to be low in efficiency, if the snoop mechanism is judged to be low in efficiency, the reasons for the high snoop delay are analyzed, possible reasons include internal bus congestion, communication network delay, insufficient processor snoop logic and the like, corresponding treatment measures are adopted according to analysis results, such as upgrading hardware components (internal buses, communication networks, processors and the like) or improving algorithms or strategies of the snoop mechanism (such as reducing unnecessary snoop requests, optimizing data access modes and the like), and the adopted treatment measures and effects thereof are fed back to related teams or personnel. And performing iterative adjustment according to the actual effect, and continuously monitoring the performance of the snoop mechanism to ensure continuous and efficient operation of the snoop mechanism.

The scheme can accurately measure the performance of the snoop mechanism by recording the response time and the data preparation time of each snoop request, and the calculation formula of the average snoop delay not only considers the response time of the snoop request, but also includes the data preparation time, and the relative importance of the response time and the data preparation time in the average delay is adjusted through the weight coefficients alpha and gamma. The logarithmic base number beta and the exponential function coefficient lambda in the formula are allowed to be adjusted according to specific application scenes so as to adapt to different performance requirements and hardware environments, and the accurate average snoop delay can be obtained by recording the response time and the data preparation time of each snoop request and calculating by combining the formula, so that accurate performance data is provided for system administrators or developers, and the system administrators or developers can better know the operation condition of a snoop mechanism. The formula can trigger a performance optimization procedure when the average snoop delay is above a set threshold. By performing in-depth analysis on the cause of high delay and taking corresponding treatment measures (such as upgrading hardware components, improving algorithms or strategies, etc.), snoop delay can be significantly reduced, and system performance can be improved. The average snoop delay formula is not only used for current performance evaluation, but also provides powerful support for continuous improvement of the system by continuously monitoring the performance of the snoop mechanism. Feedback and iterative adjustment mechanisms enable the system to be optimized continuously according to actual effects to accommodate changing workload and hardware environments. The high-efficiency snoop mechanism can reduce the data access delay and improve the response speed of the system. By optimizing the snoop mechanism, the data access conflict and error can be reduced, the risk of system breakdown or data loss is reduced, the reliability and stability of the system are enhanced, and the system can be ensured to stably run for a long time. Based on accurate performance evaluation, the scheme can accurately judge the efficiency of the snoop mechanism, and when the average snoop delay is higher than a set threshold, the system can automatically trigger a performance optimization flow, including cause analysis, treatment measure formulation and implementation and the like. The scheme provides a tool for deeply analyzing the reason of high snoop delay, and helps to identify problems such as internal bus congestion, communication network delay, insufficient snoop logic of a processor and the like. This flexibility enables a system administrator or developer to formulate an effective solution to a particular problem. The scheme not only pays attention to the current performance, but also ensures that the system can keep high-efficiency running for a long time by continuously monitoring the performance of the snoop mechanism. The feedback and iteration adjustment mechanism enables the system to be optimized continuously according to the actual effect so as to adapt to the continuously changing workload and hardware environment, the scheme can reduce data access delay and improve the overall performance of the system by optimizing the snoop mechanism, and the efficient snoop mechanism is beneficial to reducing data access conflicts and errors and enhancing the reliability and stability of the system.

It will be apparent to those skilled in the art that various modifications and variations can be made to the present invention without departing from the spirit or scope of the invention. Thus, it is intended that the present invention also include such modifications and alterations insofar as they come within the scope of the appended claims or the equivalents thereof.

Claims

1. A processor, comprising a plurality of cores and all the cores are connected to a system memory, wherein the processor comprises an input/output block, the input/output block is used to connect each of the cores and the system memory, wherein each of the input/output blocks comprises a historical access record table, the historical access record table is used to temporarily store the address of the data accessed by the cache of each of the cores;

The input step of the input/output block includes:

When any core of the processor performs a read or write operation on its cache, relevant information of the read or write operation is captured and input into a historical access record table of the input/output block;

The historical access record table in the input/output block interacts with the consistency circuit unit, and the consistency circuit unit maintains the consistency of data between cores based on the MOESI or MESI protocol;

When the core cache state changes, the change information is input into the historical access record table through the consistency circuit unit to update the record;

When the core needs to load data from the system memory to the cache, or write data back from the cache to the system memory, the information of the read and write operations interacts with the memory controller through the input/output block;

The core is used to receive a snoop request, and the steps after the core receives the snoop request include:

After each core receives the snoop request and checks its cache, the time from receiving the snoop request to determining whether it hits is recorded, and the time is the snoop response time;

For each hit core, record the total time from determining the hit to preparing to release the data, where the time from determining the hit to preparing to release the data includes marking the data as "invalid" or "dirty" and writing the data back to the system memory; the total time is the data preparation time;

After collecting snooping event data, the average snooping delay is calculated; the average snooping time is calculated using the following formula:

Where N is the number of snooping events, Indicates the number of snoop events that hit; represents the set of all snoop response times; Represents the set of data preparation times for all snooping events that hit; and represents the weight coefficient, which is used to adjust the importance of response time and data preparation time in the average delay; β represents the base of the logarithm, which is used to perform logarithmic conversion on the response time; represents the coefficient of the exponential function, which is used to adjust the exponential growth of data preparation time;

Determine the efficiency of the snooping mechanism based on a preset threshold;

If the average snooping delay is lower than the set threshold, the snooping mechanism is judged to be efficient;

If the average snooping delay is higher than the set threshold, the snooping mechanism is judged to be inefficient;

If the snooping mechanism is judged to be inefficient, the cause of the high snooping delay is analyzed; and corresponding disposal measures are taken according to the analysis results;

The measures taken and their effects will be fed back to the relevant teams or personnel, iterative adjustments will be made based on the actual results, and the performance of the snooping mechanism will be continuously monitored.

2. A processor according to claim 1, characterized in that the historical access record table of the input/output block is connected to a consistency circuit unit and a memory controller, wherein the consistency circuit unit is connected to the core, and the memory controller is connected to the system memory.

3. A processor according to claim 2, characterized in that the consistency circuit unit includes a protocol in MOSEI and MESI protocols.

4. The processor according to claim 1, wherein the output step of the input/output block comprises:

When the core needs to access specific data, it issues a query request to the historical access record table in the input/output block;

If the historical access record shows that the specific data has been accessed, the core that is ready to access the specific data will wait for the core that has accessed the cached data to release the specific data;

If the data is released, the core that is ready to use it obtains the specific data from the system memory or the cache of another core;

If the historical access records show that specific data has not been accessed, the input/output block broadcasts a snoop message to all cores;

During the data acquisition process, the input/output block works with the memory controller to ensure the correct transfer of data from the system memory to the cache;

At the same time, the input/output block updates the historical access record table in real time.

5. A processor according to claim 1, characterized in that the step of accessing the historical access record table comprises:

When any core of the processor performs a read or write operation on the cache, the system captures relevant information of the operation;

The captured operation information is input into a historical access record table in the input/output block; the historical access record table stores at least a portion of the operation information, the at least a portion including a system memory address of the data;

The historical access record table interacts with the consistency circuit unit, and the consistency circuit unit maintains the consistency of data between cores based on the MOESI or MESI protocol;

When the core cache state changes, the consistency circuit unit transmits the change information to the historical access record table;

When the core needs to load data from the system memory to the cache, or write data back from the cache to the system memory, the information of the operation interacts with the memory controller through the input/output block;

The memory controller is responsible for the actual data transfer, while the input/output block records the relevant information of the operation into the historical access record table;

When the core needs to access specific data, a query request is issued to the historical access record table in the input/output block, wherein the query request includes the system memory address of the data to be accessed;

After receiving the query request, the historical access record table checks the stored records to determine whether the data has been accessed by the cache of other cores;

If the data has been accessed, the core that is ready to access the data will wait for the core that has accessed the data to release the data;

Once the data is released, the core that is ready to access it can fetch said data from system memory or another core's cache;

If the data is not being accessed, the I/O block broadcasts a snoop message to all cores to ensure that no core is using the data;

Then, the core that is ready to access the data obtains the data from the system memory;

During the data acquisition process, the input/output block will update the historical access record table in real time and display the new data access status.

6. A data access monitoring method, implemented by using a processor as claimed in any one of claims 1 to 5; characterized in that: the monitoring method comprises:

In the historical access record table, relevant information of the data that has been accessed by the cache is recorded; the relevant information includes the address of the data in the system memory;

When the core processor needs to access specific data, it queries the historical access record table;

If the historical access record table shows that the specific data has been accessed, the core that has accessed the specific data releases the specific data and is ready to access the core of the specific data to the system memory to access the specific data;

If the historical access record table shows that the specific data has not been accessed, a snoop is broadcast to each core to release the specific data, and the core that is ready to access the specific data goes to the system memory to access the specific data.

7. The data access monitoring method according to claim 6, wherein the step of releasing the specific data from the core that has accessed the specific data and preparing to access the specific data from the core to the system memory comprises:

When the requesting core needs to access specific data, the historical access record table is queried to determine whether the specific data has been accessed by the cache of the holding core;

The historical access record table will return a query result indicating whether the specific data has been accessed; if the query result shows that the data has been accessed, proceed to the next step;

The requesting core notifies the holding core through a bus or communication mechanism inside the processor, indicating that the specific data is to be accessed; after receiving the notification, the holding core prepares to release the specific data, including marking the specific data as "invalid" or "dirty" from the cache and preparing to write it back to the system memory;

The consistency circuit unit monitors this process, and if specific data is "dirty", the consistency circuit unit coordinates writing the specific data from the cache holding the core back to the system memory;

Once the specific data is released, the requesting core obtains the specific data by accessing the system memory; and loads the specific data from the system memory into its cache;

Finally, the historical access record table is updated to complete data access.

8. The data access monitoring method according to claim 6, wherein the step of broadcasting snooping to each core to release the specific data and preparing to use the core of the specific data to access the specific data in the system memory comprises:

When the requesting core needs to access specific data, the historical access record table is queried to determine whether the specific data has been accessed by the cache of other cores;

If the historical access record table shows that the specific data has not been accessed, proceed to the next step;

The requesting core triggers a snoop mechanism of the processor, which sends a snoop request to all other cores via an internal bus or communication network;

After receiving the snoop request, each core checks whether its cache contains the specific data requested;

If the hit core finds that its cache contains the requested specific data, it will prepare to release the specific data; the hit core marks the relevant data in the cache as "invalid" or "dirty"; if the data is "dirty", the hit core writes the data back to the system memory;

The hit core broadcasts data release notification to all cores through the processor's internal mechanism;

After receiving the data release notification, the requesting core obtains information that the data is now in the system memory or is about to be released back to the system memory;

The requesting core reads the data from the system memory, loads it into the cache, and updates the historical access record table to record the result of this data access and the latest access status of the data.