
CN117687895A - Pseudo lockstep execution across CPU cores - Google Patents


Info

Publication number
CN117687895A
Authority
CN
China
Prior art keywords
cache
core
count
committed
instructions
Prior art date
Legal status
Pending
Application number
CN202311155346.4A
Other languages
Chinese (zh)
Inventor
Balaram Sinharoy
Peter Hochschild
Current Assignee
Google LLC
Original Assignee
Google LLC
Priority date
2022-09-12
Filing date
2023-09-08
Publication date
2024-03-12
Priority claimed from US 18/234,633 (published as US 2024/0086327 A1)
Application filed by Google LLC
Publication of CN117687895A

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 11/00 - Error detection; Error correction; Monitoring
    • G06F 11/36 - Prevention of errors by analysis, debugging or testing of software
    • G06F 11/362 - Debugging of software
    • G06F 11/3644 - Debugging of software by instrumenting at runtime
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 12/00 - Accessing, addressing or allocating within memory systems or architectures
    • G06F 12/02 - Addressing or allocation; Relocation
    • G06F 12/08 - Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
    • G06F 12/0802 - Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
    • G06F 12/0877 - Cache access modes

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Hardware Design (AREA)
  • Quality & Reliability (AREA)
  • Debugging And Monitoring (AREA)

Abstract

The present disclosure provides for automatically detecting errors, such as SDCs, in a multi-core computing environment. For example, a core may operate in an error detection mode in which multiple cores repeat the same instruction execution and compare the results. Based on the results, it may be determined whether one of the cores is faulty.

Description

Pseudo lockstep execution across CPU cores
Cross Reference to Related Applications
The present application claims the benefit of the filing date of U.S. Provisional Patent Application No. 63/405,618, filed September 12, 2022, the disclosure of which is incorporated herein by reference.
Background
As Central Processing Units (CPUs) have evolved, Silent Data Corruption (SDC) has increased significantly. This may be the result of technology node scaling, transistor unreliability, and the reduction of design margins and guard bands. SDC may also be the result of insufficient burn-in testing, an exponential increase in the number of transistors, and the like.
Many different types of faults may occur in a CPU or in a larger system including the CPU. Examples of such errors include machine check exceptions (MCEs), SDC, and the like. Memory and cache subsystems typically have a number of failure detection and mitigation mechanisms, such as parity, Error Correction Codes (ECC), multi-level redundancy, memory scrubbing, memory mirroring, Redundant Array of Independent Memory (RAIM), cache line deletion, set deletion, spare data lanes, Cyclic Redundancy Checks (CRC), and the like. CPU cores also incorporate a number of detection and mitigation mechanisms, such as parity and ECC protection of various architectural registers, caches, and other structures, processor instruction retry, use of radiation-hardened flip-flops, residue checking of floating-point and fixed-point pipelines, and the like.
If a checker is placed in a logic path or storage structure to check for errors and the checker fires, the result may be a detected unrecoverable error (DUE) or an MCE. If no checker detects an error and the error alters the end result, the error becomes an SDC. The error-checking mechanisms in existing CPU cores are not robust enough to detect SDC errors.
Disclosure of Invention
The present disclosure provides for automatically detecting errors, such as SDCs, in a multi-core computing environment. For example, a core may operate in an error detection mode in which multiple cores repeat the same instruction execution and compare the results. Based on the results, it may be determined whether one of the cores is faulty. In a production environment, the cores may be run in the error detection mode while running production code. The cores may execute the same workload under the same conditions or parameters (e.g., voltage, frequency, temperature, altitude, etc.) in the same runtime environment. During deployment, the cores may also be run in the error detection mode for a period of time to detect failed cores. After deployment, the cores may be run in the error detection mode periodically to detect cores that begin to fail over time.
Drawings
FIG. 1 is a schematic diagram of an example system according to aspects of the present disclosure.
FIG. 2 is a block diagram of an example environment for implementing a system in accordance with aspects of the present disclosure.
FIG. 3 is a flowchart of an example method of error checking using instruction execution across multiple cores, in accordance with aspects of the present disclosure.
Detailed Description
The present disclosure provides an error detection mechanism, for example, for detecting Silent Data Corruption (SDC). The mechanism includes a primary core and a secondary core that operate in pseudo-lockstep mode, where both cores execute the same instruction segment. The line eviction synchronizer ensures that corresponding cache lines generated from the primary core's L2 cache and the secondary core's L2 cache are provided to the checker at approximately the same time. The corresponding cache lines may then be compared to determine if there is an exact match.
To implement the error detection mechanism, each core may include a counter to track committed instructions. Each core may further include instructions for managing its operation, such as coordinating the counts of committed instructions before processing interrupts, and identifying cache lines that should not be included in the comparison. A system on a chip (SoC) including the primary core and the secondary core may include an eviction table to store cache lines that have not yet been compared. It may further include a comparator that checks that a pair of lines from the eviction table have the same value. Furthermore, it may include a mechanism (e.g., a synchronizer) to drain the two L2 caches line by line so that the lines are sent to the eviction table for inspection by the comparator.
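As an illustration only, and not part of the original disclosure, the following C++ sketch models the bookkeeping just described: a per-core committed-instruction counter and an eviction table that parks a cache line until its counterpart arrives for comparison. All names and sizes (e.g., the 64-byte line) are assumptions.

```cpp
// Hypothetical model of the structures described above: a line evicted by
// one core waits in the eviction table until the corresponding line (same
// address) arrives from the other core, then the two copies are compared.
#include <array>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <unordered_map>

constexpr std::size_t kLineBytes = 64;  // assumed cache-line size
using CacheLine = std::array<std::uint8_t, kLineBytes>;

struct CoreState {
    std::uint64_t committed_instructions = 0;  // reset to zero per segment
    bool paused = false;                       // fetch/completion halted
};

struct EvictionTable {
    std::unordered_map<std::uint64_t, CacheLine> pending;  // keyed by address

    // Returns nullopt while the partner line is still outstanding;
    // otherwise returns whether the two copies match exactly.
    std::optional<bool> check(std::uint64_t addr, const CacheLine& line) {
        auto it = pending.find(addr);
        if (it == pending.end()) {
            pending.emplace(addr, line);  // first arrival: park it
            return std::nullopt;
        }
        const bool match = (it->second == line);  // comparator: exact match
        pending.erase(it);
        return match;
    }
};
```

A false result from check() corresponds to the mismatch condition that indicates an error in one of the cores, as described below.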
FIG. 1 is a schematic diagram illustrating an example system and method for handling error detection in a core. A software thread is divided into segments, and both cores 120, 130 run the same code on the same data.
The first core 120 and the second core 130 may be any of various types of processing cores. For example, the processing cores may be cores of a CPU, a Graphics Processing Unit (GPU), a Tensor Processing Unit (TPU), a Field Programmable Gate Array (FPGA), an Application Specific Integrated Circuit (ASIC), or the like.
The first core 120 may be a "primary" core and the second core 130 a "secondary" core. The first core 120 may be visible to application software, while the second core 130 is not. The secondary core simply repeats what the primary core does, without having any effect on the rest of the system. While the first core 120 may store data beyond its L2 cache, such as to the System Level Cache (SLC) or memory, the second core 130 does not store data outside of its L2 cache. Interrupts generated by the first core are handled normally after initial synchronization, but interrupts generated by the second core 130 may be handled differently based on the type of interrupt.
The first core 120 and the second core 130 may process an instruction segment at different rates, touch different cache lines, and execute different instructions. However, the first core 120 and the second core 130 will commit the same set of instructions. In the L2 caches 125, 135, the Least Recently Used (LRU) state of a cache line may differ between the two caches, because instructions executed by the first core 120 and the second core 130 may touch different cache lines at different times.
When an instruction segment is executed by the first core 120 and the second core 130, the results are written to the L2 caches 125, 135. Each write may be referred to as a store. In some examples, the results may initially be cached in an L1 cache (not shown) within the respective core 120, 130. The timing of eviction from the L1 cache and storage to the L2 cache may differ slightly between the two cores. If the L2 cache is not inclusive of the L1 cache, then when a cache line is evicted from the L2 cache, the same cache line, if present in the L1 cache, must also be evicted from the L1 cache. Lines in the two L2 caches should have the same content when evicted, because the two L2 caches 125, 135 are synchronized prior to eviction to ensure that the caches have seen the same set of store operations.
As shown, each core 120, 130 may include a respective main Translation Lookaside Buffer (TLB) 122, 132 that holds a subset of the entries in the page table in memory. Each core may have an instruction TLB and a data TLB that hold a subset of the entries in the main TLB. According to some examples, for the primary core 120, only the first TLB 122 may track reference, change, or other page information. When a new entry is created in the first TLB 122, the same entry is also created in the second TLB 132. When an entry is deleted from the first TLB 122, the entry is also removed from the second TLB 132.
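For illustration (hypothetical code, not from the disclosure), the TLB mirroring rule can be sketched as follows: every create or delete in the first TLB 122 is replayed into the second TLB 132, so both cores always translate identically.

```cpp
// Hypothetical sketch of the TLB mirroring described above.
#include <cstdint>
#include <unordered_map>

struct TlbEntry {
    std::uint64_t phys_page = 0;
    bool referenced = false;  // tracked only in the primary TLB
    bool changed = false;
};

struct MirroredTlbs {
    std::unordered_map<std::uint64_t, TlbEntry> primary;    // TLB 122
    std::unordered_map<std::uint64_t, TlbEntry> secondary;  // TLB 132

    void create(std::uint64_t virt_page, std::uint64_t phys_page) {
        primary[virt_page] = TlbEntry{phys_page};
        secondary[virt_page] = TlbEntry{phys_page};  // mirrored create
    }
    void remove(std::uint64_t virt_page) {
        primary.erase(virt_page);
        secondary.erase(virt_page);                  // mirrored delete
    }
};
```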
At the end of execution of each instruction segment, the entire L2 caches 125, 135 may be flushed to ensure that the contents of the two L2 caches match. If the contents match, subsequent instruction segments may be executed by the first core 120 and/or the second core 130. In this manner, the first core 120 and the second core 130 operate in pseudo-lockstep mode, in which the cores 120, 130 execute the same instruction segments and the results are compared, with execution continuing or terminating based on the comparison. If the contents do not match, it may be determined that an error has been detected.
According to some examples, each core 120, 130 may indicate a number or count of Instruction Set Architecture (ISA) instructions that have been committed at a given point in time. For example, a "committed instruction count" may track the number of instructions committed by the primary core 120 and the secondary core 130. The count may be reset by, for example, the respective core 120, 130. The first core 120 and the second core 130 may be synchronized to start execution from the same program counter, with the counts set to zero. For example, each core 120, 130 may include a counter that tracks the "committed instruction count". Privileged instructions may execute before execution of an instruction segment starts or when the instruction segment is near completion. Such privileged instructions may include, for example, instructions to reset all architectural state, flush the L1 and L2 caches and TLBs, reset the committed instruction count registers, pause or unpause the operation of either core, skip comparison for portions of the instruction segment, evict a cache line from L2, and so forth.
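A hypothetical sketch of this segment-setup sequence follows (function names are placeholders, not the disclosed ISA):

```cpp
// Hypothetical segment setup: flush caches and TLBs, zero the committed
// instruction counters, then start both cores from the same program counter.
#include <cstdint>
#include <initializer_list>

struct Core {  // stand-in for a real core control interface
    void flush_caches_and_tlbs() {}
    void reset_committed_count() {}
    void start_at(std::uint64_t pc) {}
};

void begin_segment(Core& primary, Core& secondary, std::uint64_t segment_pc) {
    for (Core* c : {&primary, &secondary}) {
        c->flush_caches_and_tlbs();   // caches identical at segment entry
        c->reset_committed_count();   // counts start from zero
    }
    primary.start_at(segment_pc);     // both cores begin execution from
    secondary.start_at(segment_pc);   // the same program counter
}
```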
The L1 caches and the L2 caches 125, 135 may be flushed prior to executing the first instruction in a segment. The register holding the committed instruction count may be set to zero. When a cache line is loaded into the first L2 cache 125, the same line is loaded into the second L2 cache 135 in the same manner. Although the corresponding loads into the first L2 cache 125 and the second L2 cache 135 may occur at approximately the same time, the loads need not be synchronized.
For example, the line eviction synchronizer 140 may be a module in the primary core or the primary L2 cache that sends read requests for the committed instruction counts of the primary and secondary cores, which results in a temporary suspension of instruction completion in each core. The line eviction synchronizer 140 may force the eviction of the same cache line from each of the first L2 cache 125 and the second L2 cache 135 at approximately the same time. Cache lines evicted from the first L2 cache 125 and the second L2 cache 135 may be sent to the checker 150.
When the synchronizer 140 receives the two counts, it determines which count is higher and by how much; the difference in counts is N. The synchronizer 140 signals the core with the higher count to flush its pipeline and halt instruction fetching. The synchronizer 140 also sends a signal, along with the number N, to the core with the lower count. That core resumes instruction completion, waits for N more instructions to commit, and then flushes its pipeline and halts instruction fetching. When all completed stores from the cores have been drained to the L2 caches, the selected L2 line may be evicted from the drained caches. After the selected L2 line is evicted, the synchronizer 140 unpauses the cores and normal execution resumes. If the L2 cache is not inclusive, a line evicted from the L2 cache must also be evicted from the L1 data cache.
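In software form, the synchronizer's count-equalization step might look as follows; this is a sketch under the assumptions above, whereas the real mechanism is hardware:

```cpp
// Hypothetical model of synchronizer 140: pause the core that is ahead,
// let the core that is behind commit N more instructions, evict the
// selected line from both drained L2 caches, then resume both cores.
#include <cstdint>

struct SyncCore {
    std::uint64_t committed = 0;
    void flush_pipeline_and_halt_fetch() {}
    void resume_until_committed(std::uint64_t target) { committed = target; }
    void resume() {}
};

void evict_from_both_l2(std::uint64_t line_addr) { /* hardware action */ }

void synchronize_and_evict(SyncCore& a, SyncCore& b, std::uint64_t line_addr) {
    SyncCore& ahead  = (a.committed >= b.committed) ? a : b;
    SyncCore& behind = (a.committed >= b.committed) ? b : a;
    const std::uint64_t n = ahead.committed - behind.committed;  // difference N

    ahead.flush_pipeline_and_halt_fetch();                 // ahead stops now
    behind.resume_until_committed(behind.committed + n);   // commit N more
    behind.flush_pipeline_and_halt_fetch();                // stop at same count

    evict_from_both_l2(line_addr);  // both caches have seen the same stores
    ahead.resume();                 // normal execution resumes
    behind.resume();
}
```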
When the "committed instruction count" of each L2125, 135 reaches the same number, all lines evicted from the first L2 cache 125 are also evicted from the second L2 cache 135 at about the same time. In this regard, the first L2 cache 125 and the second L2 cache 135 may communicate with each other directly or through one or more other components. Communication between the first and second L2 caches that coordinate eviction may be initiated by either the first L2 cache 125 or the second L2 cache 135. For example, the first L2 cache 125 may send a read request to the second cache 135 to obtain a "committed instruction count" for the second L2 cache 135. The committed instruction count indicates the number of loads and stores completed by the core at a given point in time, e.g., a value entered as a result of a store operation. The second L2 cache 135 may send its count in response.
According to an example method for communication and coordination between the first and second L2 caches, upon receipt of the count, the first L2 cache 125 may pause its store operations. This may cause the second L2 cache 135 to also pause its store operations.
If the count from the second L2 cache 135 is lower than the count of the first L2 cache 125, the first L2 cache 125 may evict its cache line. The evicted line may be sent to the checker 150 and temporarily stored for later comparison. The first L2 cache 125 then signals the second L2 cache 135 to unpause its store operations. The first L2 cache 125 may also send its count to the second L2 cache 135, either together with the unpause signal or separately. When the second L2 cache 135 reaches the count indicated by the first L2 cache 125, the second L2 cache 135 evicts its cache line, e.g., by sending it to the checker 150 for comparison with the temporarily stored corresponding cache line from the first L2 cache 125.
If the count from the second L2 cache 135 is higher than the count of the first L2 cache 125, the first L2 cache 125 instructs the second L2 cache 135 to evict its cache line to the checker 150. The first L2 cache 125 may also instruct the second L2 cache 135 to unpause its store operations after the line is evicted, and the first L2 cache 125 unpauses as well. When the first L2 cache 125 reaches the same count as the second L2 cache 135, it evicts its line and sends it to the checker 150 for comparison.
If the count from the second L2 cache 135 is the same as the count of the first L2 cache 125, the first L2 cache 125 evicts its cache line to the checker 150 and unpauses itself. The first L2 cache 125 also sends a signal to the second L2 cache 135 to evict its cache line to the checker 150 and to unpause its store operations.
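The three cases above can be summarized in a hypothetical sketch (interface names are assumptions; in hardware, evict_when_count_reaches would be event-driven rather than a blocking call, so the sketch only fixes the ordering of events):

```cpp
// Hypothetical model of the cache-to-cache coordination: the cache that is
// behind in committed count evicts later, once it has caught up; the
// checker parks whichever copy arrives first.
#include <cstdint>

struct L2Cache {
    std::uint64_t committed_count = 0;
    void pause_stores() {}
    void unpause_stores() {}
    void evict_line_to_checker(std::uint64_t addr) {}
    void evict_when_count_reaches(std::uint64_t target, std::uint64_t addr) {}
};

void coordinate_eviction(L2Cache& first, L2Cache& second, std::uint64_t addr) {
    first.pause_stores();               // pausing the first cache causes the
    second.pause_stores();              // second to pause as well
    const std::uint64_t c1 = first.committed_count;
    const std::uint64_t c2 = second.committed_count;

    if (c2 < c1) {                      // second cache is behind
        first.evict_line_to_checker(addr);          // parked for comparison
        second.unpause_stores();                    // let it catch up
        second.evict_when_count_reaches(c1, addr);
        first.unpause_stores();
    } else if (c2 > c1) {               // first cache is behind
        second.evict_line_to_checker(addr);
        first.unpause_stores();
        first.evict_when_count_reaches(c2, addr);
        second.unpause_stores();
    } else {                            // counts already equal
        first.evict_line_to_checker(addr);
        second.evict_line_to_checker(addr);
        first.unpause_stores();
        second.unpause_stores();
    }
}
```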
According to a second example method for communication and coordination between the L2 caches 125, 135, a synchronizer in the first core 120 or the first L2 cache 125 sends a read request to read the "committed instruction count" from each of the first core 120 and the second core 130. This may result in a temporary suspension of instruction completion in the cores 120, 130. The synchronizer then receives the two counts and determines which count is higher and by how much; the difference in counts may be represented by N. The synchronizer signals the core with the higher count to flush its pipeline and halt instruction fetching. The synchronizer also signals the core with the lower count, indicating the difference N. That core resumes instruction completion, waits for N more instructions to commit, and then flushes its pipeline and halts instruction fetching. When all completed stores from both cores have been drained to L2, the selected L2 cache line may be evicted. According to some examples, multiple L2 cache lines may be evicted simultaneously. After the selected line is evicted, the synchronizer un-halts the cores and normal execution resumes. If the L2 cache is not inclusive, a line evicted from L2 is also evicted from L1.
The checker 150 may be implemented, for example, in control logic in an L2 cache or an L3 cache, at an interface of a core-to-core communication network, or in control logic between the two cores. The checker 150 compares a line of data from the first L2 cache 125, generated by the instructions, with the corresponding line of data from the second L2 cache 135. In this regard, the checker may check each L2 cache line written during execution of the instruction segment to determine whether the result from the first core 120 matches the result from the second core 130. If the contents of each line match, the next instruction segment may be similarly processed and analyzed by the two cores 120, 130. If the contents of a line do not match, it may be determined that an error occurred in one of the first core 120 or the second core 130.
According to some examples, the checker 150 may have a storage structure for storing cache lines received from the first L2 cache 125 and the second L2 cache 135. The structure may have a plurality of entries corresponding to outstanding L2 misses that the plurality of cores may have. The storage structure may be, for example, a table or other structure.
While the above examples compare the results of two cores 120, 130 executing instruction segments, in other examples additional cores may be included in the analysis. For example, a third, fourth, or more cores may also process the instruction segments processed by the first core 120 and the second core 130, and the values stored by all cores or any subset of the cores may be compared with each other. In some examples, when the values stored by each core do not match, analyzing the additional cores may help identify which core failed. For example, if four cores store the same value and a fifth core stores a different value, it may be determined that the fifth core is experiencing an error or failure.
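As an illustration of this multi-core variant (hypothetical code, not from the disclosure), a simple majority vote over the evicted lines can single out the deviating core:

```cpp
// Hypothetical majority vote across three or more cores: if one core's
// line disagrees with the value the majority produced, that core is the
// likely source of the error.
#include <cstddef>
#include <cstdint>
#include <map>
#include <optional>
#include <vector>

using Line = std::vector<std::uint8_t>;

// Returns the index of the suspect core, or nullopt if all lines agree
// (no error) or no clear majority exists.
std::optional<std::size_t> find_failing_core(const std::vector<Line>& lines) {
    std::map<Line, std::size_t> votes;
    for (const Line& l : lines) ++votes[l];
    if (votes.size() == 1) return std::nullopt;  // all match

    const Line* majority = nullptr;
    std::size_t best = 0;
    for (const auto& [value, count] : votes) {
        if (count > best) { best = count; majority = &value; }
    }
    if (best <= lines.size() / 2) return std::nullopt;  // no clear majority

    for (std::size_t i = 0; i < lines.size(); ++i) {
        if (lines[i] != *majority) return i;  // first deviating core
    }
    return std::nullopt;
}
```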
While the above system compares the results of instructions executed by the first core and the second core, inherent differences in the execution of the two cores are still allowed. Such inherent differences may be due to, for example, speculative execution, out-of-order execution, different branch predictions, and the like. These differences may result in the LRU states in the two L2 caches 125, 135 being different. The L2 LRU state of the primary core 120 is honored for cache replacement. In other words, the primary core decides which line to replace based on its LRU or replacement algorithm, and the secondary core replaces the same line regardless of its own LRU state.
According to some examples, upon detecting an L2 miss, a new L2 cache line to load is determined. The request may be sent to the L3 cache, SLC, or memory, and when the line returns, the way selection is communicated back by the L3 cache, SLC, or memory so that both L2 caches know which "way" of the congruence class to load the line into.
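A hypothetical sketch of this mirrored placement rule follows (the structure and the replacement policy are placeholders):

```cpp
// Hypothetical model of mirrored line placement: the primary's replacement
// choice of "way" is broadcast with the returning line, and the secondary
// installs the line into that same way regardless of its own LRU state.
#include <cstdint>
#include <vector>

struct L2Set {                                      // one congruence class
    std::vector<std::uint64_t> tags = std::vector<std::uint64_t>(8, 0);
    int pick_replacement_way() const { return 0; }  // placeholder LRU policy
    void install(int way, std::uint64_t tag) { tags.at(way) = tag; }
};

void handle_l2_miss(L2Set& primary_set, L2Set& secondary_set,
                    std::uint64_t tag) {
    const int way = primary_set.pick_replacement_way();  // primary decides
    primary_set.install(way, tag);
    secondary_set.install(way, tag);  // secondary honors the same way
}
```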
External interrupts may be routed to the first core 120 instead of the second core 130. Internal interrupts may be handled normally by the first core 120. Before taking an interrupt, the first core 120 may coordinate the committed instruction counts of the two cores 120, 130. For example, a synchronizer in the first core 120 sends a request to read the "committed instruction count" of the second core 130, causing it to halt instruction fetching and flush its pipeline. After the suspension, the second core 130 waits for a signal from the first core 120 indicating where to resume execution. Once the same "committed instruction count" is reached, the first core 120 takes the interrupt. At this point, both cores 120, 130 should have the same architectural state. The primary core 120 may set the caches 125, 135 to an "incomparable" mode so that subsequent stores drained to L2 lines are not checked for errors.
In the incomparable mode, all subsequent stores may be drained to L2 lines, which will not be checked for errors. According to some examples, an L2 cache line storing data in an "incomparable" mode may include a flag (e.g., a set bit or other indicator) that marks the line so that the checker 150 knows not to compare the line. According to other examples, a flag or bit marking a cache line as incomparable may be in the L2 directory.
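The per-line flag might be modeled as follows; this is a sketch assuming, as one of the examples above states, that the flag lives in the L2 directory:

```cpp
// Hypothetical "incomparable" flag: stores drained while the mode is set
// tag the line, and the checker skips flagged lines rather than comparing
// them.
#include <cstdint>
#include <unordered_map>

struct L2Directory {
    std::unordered_map<std::uint64_t, bool> incomparable;  // per-line flag
    bool incomparable_mode = false;  // set by the primary core around interrupts

    void record_store(std::uint64_t line_addr) {
        if (incomparable_mode) incomparable[line_addr] = true;
    }
    // Checker-side test: only lines never touched in the mode are compared.
    bool should_compare(std::uint64_t line_addr) const {
        auto it = incomparable.find(line_addr);
        return it == incomparable.end() || !it->second;
    }
};
```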
For internal interrupts generated by the second core 130, the second core 130 may be suspended and wait for a signal from the first core 120 indicating a point in the instruction segment at which the second core 130 may resume execution. Examples of such internal interrupts may include page faults, unaligned accesses, illegal operations, supervisor calls (SVCs), and the like. If the first core 120 reaches an internally generated interrupt first, the second core 130 will not see it, because the second core will be paused. If the first core 120 reaches the interrupt after the second core 130, it may coordinate the committed instruction counts and set the caches to the incomparable mode, as described above.
When the first core 120 returns from the interrupt, it sends a special interrupt to the second core to allow the operating system to make the second core's architectural state the same as that of the first core. The committed instruction counts of both cores 120, 130 are then reset to zero, and the two cores 120, 130 restart from the same program counter at approximately the same time. Before execution begins, all lines marked as "incomparable" should be evicted from the first L2 cache 125 and the second L2 cache 135, so that after the restart the contents of all L1 and L2 caches are identical.
According to some examples, error checking may be skipped if one or more events occur during execution of a code segment. Such events may include the code containing self-modifying code, code that depends on core-specific variables, and the like. To skip a particular code region, the region within a segment may be tagged, bracketed, or otherwise marked using the instruction set architecture. When a core encounters such a region, it may set the L2 cache to the "incomparable" state until the code region ends.
FIG. 2 is a block diagram of an example environment 200 for implementing the error checking system of FIG. 1. The system may be implemented on a device (e.g., server computing device 215) having one or more processors in one or more locations. The user computing device 212 and the server computing device 215 may be communicatively coupled to one or more storage devices 230 through a network 260. The storage device 230 may be a combination of volatile and nonvolatile memory and may be located in the same or different physical location as the computing devices 212, 215. For example, storage device 230 may include any type of non-transitory computer-readable medium capable of storing information, such as a hard disk drive, a solid state drive, a tape drive, optical storage, a memory card, ROM, RAM, DVD, CD-ROM, and other write-capable and read-only memories.
The server computing device 215 may include one or more processors 213 and memory 214. Memory 214 may store information accessible by processor 213, including instructions 221 executable by processor 213. Memory 214 may also include data 223 that may be retrieved, manipulated, or stored by processor 213. Memory 214 may be a non-transitory computer readable medium such as volatile and non-volatile memory capable of storing information accessible by processor 213. The processor 213 may include one or more processor cores, e.g., for CPU, GPU, TPU, FPGA, etc.
The instructions 221 may include one or more instructions that, when executed by the processor 213, cause the one or more processors to perform actions defined by the instructions. The instructions 221 may be stored in an object code format for direct processing by the processor 213, or in other formats including interpretable scripts or a collection of separate source code modules that are interpreted or precompiled as desired. The instructions 221 may include instructions for comparing the results of instruction segments executed by two processor cores. The instructions 221 may further include instructions to determine whether one of the processor cores is experiencing an error condition or is failing based on a comparison of the results.
The data 223 may be retrieved, stored, or modified by the processor 213 according to instructions 221. The data 223 may be stored in a computer register, as a table with a number of different fields and records, in a relational database or a non-relational database, or as JSON, YAML, proto or XML documents. The data 223 may also be formatted in a computer readable format such as, but not limited to, binary values, ASCII, or Unicode. In addition, the data 223 may include information sufficient to identify relevant information, such as numbers, descriptive text, proprietary codes, pointers, references to data stored in other memory including other network locations, or information used by a function to calculate relevant data.
The user computing device 212 may also be configured, like the server computing device 215, with one or more processors 216, memory 217, instructions 218, and data 219. The user computing device 212 may also include user output 226 and user input 224. User input 224 may include any suitable mechanism or technique for receiving input from a user, such as a keyboard, mouse, mechanical actuator, soft actuator, touch screen, microphone, and sensor.
The server computing device 215 may be configured to send data to the user computing device 212, and the user computing device 212 may be configured to display at least a portion of the received data on a display implemented as part of the user output 226. User output 226 may also be used to display an interface between user computing device 212 and server computing device 215. The user output 226 may alternatively or additionally include one or more speakers, transducers, or other audio outputs, a haptic interface providing non-visual and non-audible information to a platform user of the user computing device 212, or other haptic feedback.
Although fig. 2 illustrates the processors 213, 216 and memories 214, 217 as being within the computing devices 215, 212, the components described in this specification, including the processors 213, 216 and memories 214, 217, may include multiple processors and memories that may operate in different physical locations rather than within the same computing device. For example, some of the instructions 221, 218 and data 223, 219 may be stored on a removable SD card, while others may be stored within a read-only computer chip. Some or all of the instructions and data may be stored in a location that is physically remote from the processors 213, 216 but still accessible to the processors 213, 216. Similarly, the processors 213, 216 may include a collection of processors that may operate concurrently and/or sequentially. The computing devices 215, 212 may each include one or more internal clocks that provide timing information that may be used for time measurement of operations and programs run by the computing devices 215, 212.
The server computing device 215 may be configured to receive a request from the user computing device 212 to process data. For example, environment 200 may be part of a computing platform configured to provide various services to users through various user interfaces and/or APIs that expose platform services. The one or more services may be a machine learning framework or a set of tools for generating a neural network or other machine learning model based on specified tasks and training data.
Devices 212, 215 are capable of direct and indirect communication over network 260. For example, devices 215, 212 may establish listening sockets that accept initiating connections for sending and receiving information. The network 260 itself may include various configurations and protocols, including the Internet, World Wide Web, intranets, virtual private networks, wide area networks, local area networks, and private networks using communication protocols proprietary to one or more companies. The network 260 may support a variety of short-range and long-range connections. Short-range and long-range connections may be made over different bandwidths (e.g., 2.402 GHz to 2.480 GHz, commonly associated with the Bluetooth® standard, or 2.4 GHz and 5 GHz, commonly associated with the Wi-Fi® communication protocol) or in compliance with various communication standards (e.g., the LTE® standard). Additionally or alternatively, the network 260 may also support wired connections between the devices 212, 215, including over various types of Ethernet connections.
While a single server computing device 215 and user computing device 212 are shown in fig. 2, it should be understood that aspects of the present disclosure may be implemented in accordance with a variety of different configurations and numbers of computing devices, including in a sequential or parallel processing paradigm, or over a distributed network of multiple devices. In some implementations, aspects of the present disclosure may be performed on a single device and any combination thereof.
FIG. 3 is a flow chart of an example process 300 for error detection. The example process 300 may be performed on a system of one or more processors in one or more locations.
As indicated at block 310, the first core and the second core execute an instruction segment. The first core may be, for example, a primary core visible to the application running the instructions, while the second core is a secondary core not visible to the application. Although two cores are described in this example, additional cores may be included in the analysis. The execution may be performed by the first core and the second core substantially simultaneously, while allowing for differences in processing speed and other events, such that the execution need not be completely synchronized.
In block 320, the results of the executed instruction segment are stored by each of the first core and the second core in respective first and second caches. The cache may be, for example, an L2 cache. According to some examples, each of the first cache and the second cache may maintain a count of stored values committed by their respective first core and second core.
In block 330, the associated cache line is evicted from the first cache and the second cache and sent to the error checker. The eviction may be coordinated by, for example, the first L2 cache or the second L2 cache. This coordination may utilize counts of committed instructions maintained by the first cache and the second cache. The checker may be a module implemented in logic in L2 or L3 or on a bus interface. The checker may include a storage structure for temporarily storing cache lines received from the first cache and the second cache. Corresponding cache lines from the first cache and the second cache may be received at the checker at different times. For example, one cache may send a cache line and pause until another cache catches up with and sends the corresponding cache line.
In block 340, the checker compares the cache lines associated with each other. For example, the checker may compare each stored value in the corresponding cache line to determine if there is a match.
In block 350, based on the comparison, it is determined whether the first core or the second core is experiencing an error or failure. For example, if the compared values do not match, it may be determined that one of the cores is experiencing an error. According to some examples, the nature or extent of the mismatch may indicate the severity of the error or failure of the core.
Aspects of the present disclosure may be implemented in digital electronic circuitry, in computer-readable storage media, as one or more computer programs, or in combinations of one or more of the foregoing. The computer-readable storage medium may be non-transitory, for example, as one or more instructions executable by the cloud computing platform and stored on the tangible storage device.
In this specification, the phrase "configured to" is used in different contexts in connection with computer systems, hardware, parts of hardware circuits, and computer programs, engines, or modules. When a system is configured to perform one or more operations, this means that the system has the appropriate software, firmware, and/or hardware installed on the system that, when in operation, causes the system to perform the one or more operations. When a piece of hardware is said to be configured to perform one or more operations, this means that the piece of hardware includes one or more circuits that, when operated, receive input and generate output corresponding to the one or more operations from the input. When a computer program, engine, or module is referred to as being configured to perform one or more operations, it means that the computer program comprises one or more program instructions that, when executed by one or more computers, cause the one or more computers to perform the one or more operations.
Although the operations shown in the figures and described in the claims are illustrated in a particular order, it should be understood that the operations may be performed in a different order than shown, and that some operations may be omitted, performed more than once, and/or performed in parallel with other operations. Furthermore, the separation of different system components configured to perform different operations should not be understood as requiring the components to be separated. The described components, modules, programs, and engines may be integrated together as a single system or as part of multiple systems.
The foregoing alternative examples are not mutually exclusive, unless otherwise specified, but may be implemented in various combinations to achieve unique advantages. As these and other variations and combinations of the above-described features may be utilized without departing from the subject matter defined by the claims, the foregoing description of the embodiments should be taken by way of illustration rather than by way of limitation of the subject matter defined by the claims. Furthermore, the provision of examples described herein, and terms such as "for example," "comprising," and the like, should not be construed as limiting the claimed subject matter to a particular example; rather, these examples are intended to illustrate only one of many possible embodiments. Furthermore, the same reference numbers in different drawings may identify the same or similar elements.

Claims (20)

1. A method, comprising:
executing, by a first processor core, a first instruction segment;
executing, by a second processor core, the first instruction segment;
comparing, with one or more processors, a result of the first instruction segment executed by the first processor core with a result of the first instruction segment executed by the second processor core;
determining, with the one or more processors and based on the comparison, whether one of the first processor core or the second processor core is experiencing an error.
2. The method as recited in claim 1, further comprising:
storing, by the first processor core, the results of executing the first instruction segment in a first cache; and
the results of executing the first instruction segment are stored in a second cache by the second processor core.
3. The method of claim 2, further comprising coordinating eviction of corresponding cache lines from the first cache and the second cache.
4. The method of claim 3, wherein coordinating eviction of a corresponding cache line comprises coordinating a count.
5. The method of claim 4, wherein the count is a count of committed instructions, and wherein coordinating eviction of corresponding cache lines comprises:
sending a signal by the first cache to the second cache requesting a count of committed instructions;
receiving, by the first cache, the count of committed instructions from the second cache; and
comparing the count of committed instructions from the second cache with a count of committed instructions from the first cache.
6. The method as recited in claim 5, further comprising:
if the count of committed instructions from the second cache is less than the count of committed instructions from the first cache, evicting lines comprising the committed stores from the first cache to a checker for temporary storage and allowing the second cache to catch up;
if the count of committed instructions from the second cache is greater than the count of committed instructions from the first cache, evicting a line from the second cache that includes the committed store to the checker for temporary storage and allowing the first cache to catch up; and
if the count of committed instructions from the second cache is the same as the count of committed instructions from the first cache, evicting lines from the first cache and the second cache to the checker.
7. The method of claim 4, wherein the count comprises a count of at least one of a load or a store.
8. The method of claim 1, further comprising determining that both cores are operating properly when the result from the first core matches the result from the second core.
9. The method of claim 1, further comprising executing, by the first core and the second core, a second instruction segment.
10. The method of claim 9, wherein the first core and the second core operate in an error checking mode in which the results from each core are compared for a limited period of time when the first core and the second core are deployed.
11. The method of claim 9, wherein the first core and the second core operate in an error checking mode in which the results from each core are compared for an extended period of time when the first core and the second core are in a test phase.
12. A system, comprising:
a first processor core operable to execute a first instruction segment;
a first cache in communication with the first processor core, the first cache operable to store results of execution of the first instruction segment by the first processor core;
a second processor core operable to execute the first instruction segment;
a second cache in communication with the second processor core, the second cache operable to store results of execution of the first instruction segment by the second core; and
one or more processors in communication with the first cache and the second cache, the one or more processors configured to:
comparing the first cached content with the second cached content; and
determine, based on the comparison, whether the first processor core or the second processor core is experiencing an error.
13. The system according to claim 12, wherein:
the first processor core is configured to store the results of executing the first instruction segment in a first cache; and
the second processor core is configured to store the results of executing the first instruction segment in a second cache.
14. The system of claim 13, wherein the one or more processors are further configured to coordinate eviction of corresponding cache lines from the first cache and the second cache.
15. The system of claim 14, wherein coordinating the eviction of the corresponding cache line comprises coordinating the counting of committed instructions, and wherein coordinating the eviction of the corresponding cache line comprises:
sending a signal by the first cache to the second cache requesting a count of committed instructions;
receiving, by the first cache, the count of committed instructions from the second cache; and
comparing the count of committed instructions from the second cache with a count of committed instructions from the first cache.
16. The system according to claim 15, wherein:
if the count of committed instructions from the second cache is less than the count of committed instructions from the first cache, the one or more processors evict a line from the first cache that includes the committed stores to a checker for temporary storage and allow the second cache to catch up;
if the count of committed instructions from the second cache is greater than the count of committed instructions from the first cache, the one or more processors evict a line from the second cache that includes the committed store to the checker for temporary storage and allow the first cache to catch up; and
the one or more processors evict a line from the first cache and the second cache to the checker if the count of committed instructions from the second cache is the same as the count of committed instructions from the first cache.
17. The system of claim 14, wherein coordinating eviction of a corresponding cache line comprises coordinating a count of at least one of loads or stores.
18. The system of claim 12, wherein the first core and the second core operate in an error checking mode in which the results from each core are compared for a limited period of time when the first core and the second core are deployed.
19. The system of claim 12, wherein the first core and the second core operate in an error checking mode in which the results from each core are compared for an extended period of time when the first core and the second core are in a test phase.
20. A non-transitory computer-readable medium storing instructions executable by one or more processors for performing a method of detecting silence data corruption, the method comprising:
comparing a first result of a first instruction segment executed by a first processor core with a second result of the first instruction segment executed by a second processor core; and
determining, based on the comparison, whether one of the first processor core or the second processor core is experiencing an error.
CN202311155346.4A 2022-09-12 2023-09-08 Pseudo lockstep execution across CPU cores Pending CN117687895A (en)

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
US63/405,618 2022-09-12
US18/234,633 2023-08-16
US18/234,633 US20240086327A1 (en) 2022-09-12 2023-08-16 Pseudo Lock-Step Execution Across CPU Cores

Publications (1)

Publication Number Publication Date
CN117687895A (en) 2024-03-12

Family

ID=90132719

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311155346.4A Pending CN117687895A (en) 2022-09-12 2023-09-08 Pseudo lockstep execution across CPU cores

Country Status (1)

Country Link
CN (1) CN117687895A (en)


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination