
CN112416615A - Multi-core processor, method and device for realizing cache consistency of multi-core processor and storage medium - Google Patents

Multi-core processor, method and device for realizing cache consistency of multi-core processor and storage medium

Info

Publication number
CN112416615A
Authority
CN
China
Prior art keywords
state
cache line
bus
cache
processor
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202011222430.XA
Other languages
Chinese (zh)
Other versions
CN112416615B (en
Inventor
叶政
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Gree Electric Appliances Inc of Zhuhai
Original Assignee
Gree Electric Appliances Inc of Zhuhai
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Gree Electric Appliances Inc of Zhuhai filed Critical Gree Electric Appliances Inc of Zhuhai
Priority to CN202011222430.XA priority Critical patent/CN112416615B/en
Publication of CN112416615A publication Critical patent/CN112416615A/en
Application granted granted Critical
Publication of CN112416615B publication Critical patent/CN112416615B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/54Interprogram communication
    • G06F9/544Buffers; Shared memory; Pipes
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F13/00Interconnection of, or transfer of information or other signals between, memories, input/output devices or central processing units
    • G06F13/14Handling requests for interconnection or transfer
    • G06F13/16Handling requests for interconnection or transfer for access to memory bus
    • G06F13/1668Details of memory controller
    • G06F13/1678Details of memory controller using bus width
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/54Interprogram communication
    • G06F9/545Interprogram communication where tasks reside in different layers, e.g. user- and kernel-space

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Memory System Of A Hierarchy Structure (AREA)

Abstract

The invention discloses a method and a device for realizing cache consistency of a multi-core processor, a multi-core processor and a storage medium. The method comprises the following steps: on the basis of the MESI protocol, adding a modified and shared state, so that each cache line in the cache of each processor in the multi-core processor is marked as any one of a modified state, a modified and shared state, an exclusive state, a shared state and an invalid state, wherein the modified and shared state indicates that the data in the cache line has been updated but not yet written back to the memory, and that valid copies of the same data also exist in the caches of other cores; and tracking the cache line corresponding to each processor core through bus monitoring, so as to perform state transitions on the corresponding cache line according to different data access modes and realize the cache consistency of the multi-core processor. The scheme improves the performance of the multi-core processor by effectively reducing memory write-back operations.

Description

Multi-core processor, method and device for realizing cache consistency of multi-core processor and storage medium
Technical Field
The invention belongs to the technical field of computers, and particularly relates to a method and a device for realizing cache consistency of a multi-core processor, a multi-core processor and a storage medium.
Background
Multi-core means that a processor contains multiple central processing unit (CPU) cores, with two or more cores packaged together and integrated into one processor circuit. A multi-core processor is a single chip (also referred to as a silicon core) that can be inserted directly into a single processor socket, while the operating system uses all the associated resources and treats each execution core as a separate logical processor. By dividing tasks among multiple execution cores, a multi-core processor can execute more tasks in a given clock cycle.
Multi-core processors have been widely used in industrial products, and as their range of application keeps expanding, the demands placed on them keep increasing. As the core control unit of a product, the multi-core processor plays an extremely important role in product stability, performance optimization and other aspects.
Cache consistency of a multi-core processor ensures that each processor core can access the latest memory data, regardless of whether the data cached in that core is valid or has been updated. Keeping the cache of each core synchronized with the memory in real time is the simplest way to realize cache consistency of a multi-core processor, but it has the following drawback: the bandwidth consumed on the on-chip bus of the multi-core processor increases greatly, which seriously degrades the performance of the multi-core processor.
The above is only for the purpose of assisting understanding of the technical aspects of the present invention, and does not represent an admission that the above is prior art.
Disclosure of Invention
The invention aims to provide a method and a device for realizing cache consistency of a multi-core processor, a multi-core processor and a storage medium. It addresses the problem that realizing cache consistency by keeping the cache of each core synchronized with the memory in real time increases the bandwidth consumed on the on-chip bus and degrades the performance of the multi-core processor, and it achieves the effect of improving the performance of the multi-core processor by effectively reducing memory write-back operations.
The invention provides a method for realizing cache consistency of a multi-core processor, which comprises the following steps: in the case that the cache coherence protocol of the multi-core processor is the MESI protocol, adding a modified and shared state on the basis of the MESI protocol, so that each cache line in the cache of each processor in the multi-core processor is marked as any one of a modified state, a modified and shared state, an exclusive state, a shared state and an invalid state, wherein the modified and shared state indicates that the data in the cache line has been updated but not yet written back to the memory, and that valid copies of the same data also exist in the caches of other cores; and tracking the cache line corresponding to each processor core through bus monitoring, so as to perform state transitions on the corresponding cache line according to different data access modes and realize the cache consistency of the multi-core processor. The data access modes include a local access mode and a bus access mode.
In some embodiments, further comprising: establishing and maintaining a state machine according to the state and operation of each cache line to perform state conversion on the corresponding cache line according to different data access modes, so as to realize the cache consistency of the multi-core processor; the operations, comprising: local operations and bus operations.
In some embodiments, performing state transitions on respective cache lines according to different data access modes includes: the cache line in the modified state is converted to the modified and shared state in a bus read operation.
In some embodiments, performing state transitions on the corresponding cache line according to different data access modes further includes: under a bus write operation, the cache line in the exclusive state no longer issues a memory write-back signal, but sends its data directly over the bus to the cache line of the processor core that issued the bus write signal, after which the latest data is cached in the cache line of that processor core.
In some embodiments, performing state transitions on the corresponding cache line according to different data access modes further includes: under a bus write operation, the cache line in the shared state no longer issues a memory write-back signal, but sends its data directly over the bus to the cache line of the processor core that issued the bus write signal.
In some embodiments, performing state transitions on the corresponding cache line according to different data access modes further includes: when the cache line in the modified state or the cache line in the modified and shared state snoops a bus write operation, the memory write-back may be deferred, and the data is instead transmitted over the bus to the processor core that issued the bus write signal.
In another aspect, the present invention provides an apparatus for implementing cache consistency of a multi-core processor, where the apparatus includes: a state configuration unit, configured to add a modified and shared state on the basis of the MESI protocol in the case that the cache coherence protocol of the multi-core processor is the MESI protocol, so that each cache line in the cache of each processor in the multi-core processor is marked as any one of a modified state, a modified and shared state, an exclusive state, a shared state and an invalid state, wherein the modified and shared state indicates that the data in the cache line has been updated but not yet written back to the memory, and that valid copies of the same data also exist in the caches of other cores; and a state transition unit, configured to track the cache line corresponding to each processor core through bus monitoring, so as to perform state transitions on the corresponding cache line according to different data access modes and realize the cache consistency of the multi-core processor. The data access modes include a local access mode and a bus access mode.
In some embodiments, further comprising: the state conversion unit is also configured to establish and maintain a state machine according to the state and operation of each cache line, so as to perform state conversion on the corresponding cache line according to different data access modes and realize the cache consistency of the multi-core processor; the operations, comprising: local operations and bus operations.
In some embodiments, the state transition unit performs state transition on the corresponding cache line according to different data access modes, including: the cache line in the modified state is converted to the modified and shared state in a bus read operation.
In some embodiments, the state transition unit performs state transitions on the corresponding cache line according to different data access modes, and further includes: under a bus write operation, the cache line in the exclusive state no longer issues a memory write-back signal, but sends its data directly over the bus to the cache line of the processor core that issued the bus write signal, after which the latest data is cached in the cache line of that processor core.
In some embodiments, the state transition unit performs state transitions on the corresponding cache line according to different data access modes, and further includes: under a bus write operation, the cache line in the shared state no longer issues a memory write-back signal, but sends its data directly over the bus to the cache line of the processor core that issued the bus write signal.
In some embodiments, the state transition unit performs state transitions on the corresponding cache line according to different data access modes, and further includes: when the cache line in the modified state or the cache line in the modified and shared state snoops a bus write operation, the memory write-back may be deferred, and the data is instead transmitted over the bus to the processor core that issued the bus write signal.
In accordance with the above apparatus, a further aspect of the present invention provides a multi-core processor, comprising: the device for realizing the cache consistency of the multi-core processor is described above.
In another aspect, the present invention provides a storage medium, where the storage medium includes a stored program, and when the program runs, a device in which the storage medium is located is controlled to execute the above method for implementing cache coherence of a multicore processor.
Therefore, according to the scheme of the invention, a modified and shared state (MS) is added on the basis of the MESI protocol, and the cache line corresponding to each processor core is tracked through bus monitoring, so that the memory write-back operation can be effectively reduced, and the performance of the multi-core processor is improved.
Additional features and advantages of the invention will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention.
The technical solution of the present invention is further described in detail by the accompanying drawings and embodiments.
Drawings
FIG. 1 is a flowchart illustrating an embodiment of a method for implementing cache coherency in a multi-core processor according to the present invention;
FIG. 2 is a schematic structural diagram of an embodiment of an apparatus for implementing cache coherency in a multi-core processor according to the present invention;
FIG. 3 is a diagram illustrating state transition of a MESI multi-core cache coherency protocol;
FIG. 4 is a diagram illustrating state transitions of an embodiment of a multi-core cache coherency protocol according to the invention.
The reference numbers in the embodiments of the present invention are as follows, in combination with the accompanying drawings:
102-a state configuration unit; 104-state transition unit.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the technical solutions of the present invention will be clearly and completely described below with reference to the specific embodiments of the present invention and the accompanying drawings. It is to be understood that the described embodiments are merely exemplary of the invention, and not restrictive of the full scope of the invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
According to an embodiment of the present invention, a method for implementing cache coherence of a multi-core processor is provided, as shown in fig. 1, which is a schematic flow diagram of an embodiment of the method of the present invention. The method for realizing the cache consistency of the multi-core processor can comprise the following steps: step S110 and step S120.
At step S110, in the case that the cache coherence protocol of the multi-core processor is the MESI protocol, a modified and shared state (MS) is added on the basis of the MESI protocol, so that each cache line in the cache of each processor in the multi-core processor is marked as any one of a modified state (M), a modified and shared state (MS), an exclusive state (E), a shared state (S) and an invalid state (I). The modified and shared state (MS) indicates that the data in the cache line has been updated but not yet written back to the memory, and that valid copies of the same data also exist in the caches of other cores.
The MESI protocol marks each cache line in the cache of each processor in the multi-core processor as any one of a modified state (M), an exclusive state (E), a shared state (S) and an invalid state (I), and maintains a state machine that performs state transitions on the corresponding cache line according to different data access modes. The modified state (M) indicates that the data in the cache line has been updated but not yet written back to the memory, and that no valid copy of the same data is held in the caches of other cores. The exclusive state (E) indicates that the data in the cache line is synchronized with the memory, and that no valid copy of the same data is held in the caches of other cores. The shared state (S) indicates that the data in the cache line is synchronized with the memory, and that valid copies of the same data are held in the caches of other cores. The invalid state (I) indicates that the data in the cache line is no longer valid and must not be used.
Therefore, a modified and shared state (MS) is added on the basis of the MESI protocol. With this state, a cache line in the modified state (M) no longer incurs the latency of a memory write-back under a bus read operation, so memory write-back operations are effectively reduced.
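The five states described above can be summarized by two properties: whether the line is dirty (updated but not yet written back) and whether other cores hold a valid copy. The following is a minimal sketch of that classification; the names `LineState`, `ATTRIBUTES`, `is_dirty` and `is_shared` are hypothetical and do not appear in the patent.

```python
from enum import Enum


class LineState(Enum):
    """Cache-line states: the four MESI states plus the added MS state."""
    M = "modified"
    MS = "modified and shared"
    E = "exclusive"
    S = "shared"
    I = "invalid"


# (dirty, shared) per valid state, following the definitions in the text.
# dirty  = data updated but not yet written back to memory
# shared = a valid copy of the same data exists in another core's cache
ATTRIBUTES = {
    LineState.M: (True, False),
    LineState.MS: (True, True),
    LineState.E: (False, False),
    LineState.S: (False, True),
}


def is_dirty(state: LineState) -> bool:
    """True if the line holds data newer than memory (never for I)."""
    return ATTRIBUTES.get(state, (False, False))[0]


def is_shared(state: LineState) -> bool:
    """True if other cores may hold a valid copy (never for I)."""
    return ATTRIBUTES.get(state, (False, False))[1]
```

Seen this way, MS fills the one combination that plain MESI cannot express: dirty and shared at the same time.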
In step S120, the cache line corresponding to each processor core is tracked through bus monitoring, so as to perform state transition on the corresponding cache line according to different data access modes, thereby implementing cache consistency of the multi-core processor. The data access pattern includes: a local access mode and a bus access mode. Wherein the local access mode corresponds to local operation and the bus access mode corresponds to bus operation.
Therefore, by tracking the cache line corresponding to each processor core through bus monitoring, a cache line in the exclusive state (E) or in the shared state (S) no longer triggers a memory write-back under a bus write operation, so memory write-back operations are effectively reduced.
In some embodiments, further comprising: and establishing and maintaining a state machine according to the state and operation of each cache line so as to perform state conversion on the corresponding cache line according to different data access modes and realize the cache consistency of the multi-core processor. The operations, comprising: local operations and bus operations.
The state of a cache line can be known from its mark, and the operations on a cache line include local operations and bus operations. In the initially established state machine, the only state of a cache line is the invalid state (I). The state machine is a control center composed of a state register and a combinational logic circuit; driven by control signals, it moves between preset states, coordinates the related signals and completes specific operations. The operations on a cache line in the state machine are divided into local operations and bus operations: a local operation is an operation performed on the line by the processor core that owns it, and a bus operation is an operation in which another processor core accesses the cache line of this core and then sends a synchronization signal to this core over the bus.
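As an illustration of the state machine just described, the sketch below models a per-line state machine in software: it starts in the invalid state (I) and is driven by local or bus operations through a transition table. The class and operation names are hypothetical, and the table is deliberately left to the caller because the full transition diagram (FIG. 4) is not reproduced in the text.

```python
from enum import Enum, auto


class Op(Enum):
    """Operations seen by a cache line (names are illustrative)."""
    LOCAL_READ = auto()   # issued by the core that owns the line
    LOCAL_WRITE = auto()
    BUS_READ = auto()     # snooped from another core over the bus
    BUS_WRITE = auto()


def is_local(op: Op) -> bool:
    """Local operations come from the owning core; the rest arrive via the bus."""
    return op in (Op.LOCAL_READ, Op.LOCAL_WRITE)


class CacheLineFSM:
    """Table-driven state machine for one cache line."""

    def __init__(self, transitions):
        self.state = "I"  # per the text, a line initially is in the invalid state
        self.transitions = dict(transitions)

    def step(self, op: Op) -> str:
        # Pairs missing from the table leave the state unchanged.
        self.state = self.transitions.get((self.state, op), self.state)
        return self.state
```

For example, a table containing `("I", Op.LOCAL_WRITE): "M"` (standard MESI behaviour, assumed here) and `("M", Op.BUS_READ): "MS"` (the transition stated in the text) takes a line from I to M to MS.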
Therefore, the high-performance multi-core cache consistency method is improved on the basis of the MESI protocol, so that the communication rate of the multi-core processor can be improved to a great extent, the performance of the processor is improved, the power consumption can be reduced by reducing unnecessary memory write-back, the heating condition of the processor is relieved, and the product can normally work in a more severe working environment.
In some embodiments, performing state transitions on respective cache lines according to different data access modes includes: the cache line in the modified state (M) is converted to the modified and shared state (MS) in a bus read operation. By adding the modified and shared state (MS), the cache line in the modified state (M) is converted to the modified and shared state (MS) during a bus read operation, avoiding performance issues caused by memory write back.
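The difference from plain MESI on a snooped bus read can be sketched as a small function that returns the next state and whether a memory write-back is needed. Only the M to MS transition comes from the text; the plain-MESI rows are the textbook behaviour, and keeping MS unchanged on a further bus read is an assumption. The function name `on_bus_read` is hypothetical.

```python
def on_bus_read(state: str, extended: bool = True):
    """Return (next_state, memory_writeback) for a line that snoops a bus read.

    extended=True  models MESI with the added MS state.
    extended=False models plain MESI for comparison.
    """
    if state == "MS" and not extended:
        raise ValueError("MS does not exist in plain MESI")
    if state == "M":
        # Text: M becomes MS and no write-back is issued.
        # Plain MESI: the dirty line is written back and becomes S.
        return ("MS", False) if extended else ("S", True)
    if state == "MS":
        return ("MS", False)  # assumption: another reader changes nothing
    if state == "E":
        return ("S", False)   # standard MESI: the line is clean, just share it
    return (state, False)     # S and I are unaffected by a bus read
```

The only row where the two protocols differ is M, which is exactly where plain MESI pays for a write-back.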
In some embodiments, performing state transitions on the corresponding cache line according to different data access modes further includes: under a bus write operation, the cache line in the exclusive state (E) no longer issues a memory write-back signal, but sends its data directly over the bus to the cache line of the processor core that issued the bus write signal, after which the latest data is cached in the cache line of that processor core. With the modified and shared state (MS) added, this handling of the exclusive state (E) also avoids a memory write-back.
In some embodiments, performing state transitions on the corresponding cache line according to different data access modes further includes: under a bus write operation, the cache line in the shared state (S) no longer issues a memory write-back signal, but sends its data directly over the bus to the cache line of the processor core that issued the bus write signal. With the modified and shared state (MS) added, this handling of the shared state (S) likewise avoids a memory write-back.
In some embodiments, performing state transitions on the corresponding cache line according to different data access modes further includes: when the cache line in the modified state (M) or the cache line in the modified and shared state (MS) snoops a bus write operation, the memory write-back may be deferred, and the data is instead transmitted over the bus to the processor core that issued the bus write signal. This further improves the performance of the processor.
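The bus-write handling described in the three embodiments above can be sketched as a cache-to-cache transfer that leaves memory untouched. This is a minimal model under stated assumptions: `CoreLine` and `snoop_bus_write` are hypothetical names, memory is a plain dictionary, and the holder moving to the invalid state (I) afterwards follows standard MESI rather than anything stated in the text.

```python
class CoreLine:
    """One core's copy of a single cache line (hypothetical model)."""

    def __init__(self, state="I", data=None):
        self.state = state
        self.data = data


def snoop_bus_write(holder, requester, memory, addr):
    """Serve a bus write issued by `requester` from `holder`'s copy.

    Returns True if the data was forwarded cache to cache.
    Memory is never written here, whatever the holder's state.
    """
    if holder.state == "I":
        # No valid copy to forward: the requester falls back to memory.
        requester.data = memory[addr]
        return False
    # M, MS, E or S: send the data directly over the bus, skip the write-back.
    requester.data = holder.data
    # Assumption (standard MESI): the holder's copy is invalidated.
    holder.state, holder.data = "I", None
    return True


memory = {0x40: 0}            # stale value in memory
core_a = CoreLine("M", 42)    # core A holds the dirty line
core_b = CoreLine()           # core B issues the bus write
forwarded = snoop_bus_write(core_a, core_b, memory, 0x40)
```

After the call, core B holds the latest data while memory still contains the stale value, so responsibility for the eventual write-back passes to core B once it completes its write.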
Through a large number of tests, the technical scheme of the embodiment is adopted, a modified and shared state (MS) is added on the basis of the MESI protocol, and the cache line corresponding to each processor core is tracked through bus monitoring, so that the memory write-back operation can be effectively reduced, and the performance of the multi-core processor is improved.
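To make the claimed reduction concrete, the toy model below counts memory write-backs for one access pattern: core A writes a line and core B then reads it, repeated several times. It is an illustrative sketch, not a reproduction of the tests mentioned above; it ignores evictions, and the function name `count_writebacks` is hypothetical.

```python
def count_writebacks(rounds: int, extended: bool) -> int:
    """Count memory write-backs when core A writes a line and core B then reads it.

    extended=True  models MESI with the added MS state.
    extended=False models plain MESI.
    Assumption: a local write always returns core A's line to M.
    """
    writebacks = 0
    state_a = "I"
    for _ in range(rounds):
        # Core A performs a local write: its line becomes modified.
        state_a = "M"
        # Core B reads the same line; core A snoops the bus read.
        if extended:
            state_a = "MS"   # per the text: M becomes MS, no write-back
        else:
            state_a = "S"    # plain MESI: write back first, then share
            writebacks += 1
    return writebacks
```

In this model plain MESI performs one write-back per round, while the extended protocol performs none until the line is eventually evicted.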
According to the embodiment of the invention, the implementation device of the cache consistency of the multi-core processor is also provided, which corresponds to the implementation method of the cache consistency of the multi-core processor. Referring to fig. 2, a schematic diagram of an embodiment of the apparatus of the present invention is shown. The device for realizing the cache consistency of the multi-core processor can comprise: a state configuration unit 102 and a state transition unit 104.
The state configuration unit 102 is configured to, in the case that the cache coherence protocol of the multi-core processor is the MESI protocol, add a modified and shared state (MS) on the basis of the MESI protocol, so that each cache line in the cache of each processor in the multi-core processor is marked as any one of a modified state (M), a modified and shared state (MS), an exclusive state (E), a shared state (S) and an invalid state (I). The modified and shared state (MS) indicates that the data in the cache line has been updated but not yet written back to the memory, and that valid copies of the same data also exist in the caches of other cores. For the specific function and processing of the state configuration unit 102, see step S110.
The MESI protocol marks each cache line in the cache of each processor in the multi-core processor as any one of a modified state (M), an exclusive state (E), a shared state (S) and an invalid state (I), and maintains a state machine that performs state transitions on the corresponding cache line according to different data access modes. The modified state (M) indicates that the data in the cache line has been updated but not yet written back to the memory, and that no valid copy of the same data is held in the caches of other cores. The exclusive state (E) indicates that the data in the cache line is synchronized with the memory, and that no valid copy of the same data is held in the caches of other cores. The shared state (S) indicates that the data in the cache line is synchronized with the memory, and that valid copies of the same data are held in the caches of other cores. The invalid state (I) indicates that the data in the cache line is no longer valid and must not be used.
Therefore, a modified and shared state (MS) is added on the basis of the MESI protocol. With this state, a cache line in the modified state (M) no longer incurs the latency of a memory write-back under a bus read operation, so memory write-back operations are effectively reduced.
And the state conversion unit 104 is configured to track the cache line corresponding to each processor core through bus monitoring, so as to perform state conversion on the corresponding cache line according to different data access modes, thereby realizing the cache consistency of the multi-core processor. The data access pattern includes: a local access mode and a bus access mode. The detailed function and processing of the state transition unit 104 are shown in step S120. Wherein the local access mode corresponds to local operation and the bus access mode corresponds to bus operation.
Therefore, by tracking the cache line corresponding to each processor core through bus monitoring, a cache line in the exclusive state (E) or in the shared state (S) no longer triggers a memory write-back under a bus write operation, so memory write-back operations are effectively reduced.
In some embodiments, further comprising: the state transition unit 104 is further configured to establish and maintain a state machine according to the state and operation of each cache line, so as to perform state transition on the corresponding cache line according to different data access modes, thereby implementing cache consistency of the multi-core processor. The operations, comprising: local operations and bus operations.
The state of a cache line can be known from its mark, and the operations on a cache line include local operations and bus operations. In the initially established state machine, the only state of a cache line is the invalid state (I). The state machine is a control center composed of a state register and a combinational logic circuit; driven by control signals, it moves between preset states, coordinates the related signals and completes specific operations. The operations on a cache line in the state machine are divided into local operations and bus operations: a local operation is an operation performed on the line by the processor core that owns it, and a bus operation is an operation in which another processor core accesses the cache line of this core and then sends a synchronization signal to this core over the bus.
Therefore, the high-performance multi-core cache consistency device is improved on the basis of the MESI protocol, so that the communication rate of a multi-core processor can be improved to a great extent, the performance of a processor is improved, the power consumption can be reduced by reducing unnecessary memory write-back, the heating condition of the processor is relieved, and the product can normally work in a more severe working environment.
In some embodiments, the state transition unit 104 performs state transition on the corresponding cache line according to different data access modes, including: the state transition unit 104, in particular, is further configured to transition the cache line in the modified state (M) to the modified and shared state (MS) in a bus read operation. By adding the modified and shared state (MS), the cache line in the modified state (M) is converted to the modified and shared state (MS) during a bus read operation, avoiding performance issues caused by memory write back.
In some embodiments, the state transition unit 104 performs state transitions on the corresponding cache line according to different data access modes, and further includes: the state transition unit 104 is specifically further configured so that, under a bus write operation, the cache line in the exclusive state (E) no longer issues a memory write-back signal, but sends its data directly over the bus to the cache line of the processor core that issued the bus write signal, after which the latest data is cached in the cache line of that processor core. With the modified and shared state (MS) added, this handling of the exclusive state (E) also avoids a memory write-back.
In some embodiments, the state transition unit 104 performs state transitions on the corresponding cache line according to different data access modes, and further includes: the state transition unit 104 is specifically further configured so that, under a bus write operation, the cache line in the shared state (S) no longer issues a memory write-back signal, but sends its data directly over the bus to the cache line of the processor core that issued the bus write signal. With the modified and shared state (MS) added, this handling of the shared state (S) likewise avoids a memory write-back.
In some embodiments, the state transition unit 104 performs state transitions on the corresponding cache line according to different data access modes, and further includes: the state transition unit 104 is specifically further configured so that, when the cache line in the modified state (M) or the cache line in the modified and shared state (MS) snoops a bus write operation, the memory write-back is deferred and the data is instead transmitted over the bus to the processor core that issued the bus write signal. This further improves the performance of the processor.
Since the processes and functions implemented by the apparatus of this embodiment substantially correspond to the embodiments, principles, and examples of the method shown in fig. 1, details not covered in the description of this embodiment can be found in the related descriptions of the foregoing embodiments and are not repeated here.
Extensive testing shows that, with the technical scheme of the invention, adding a modified and shared state (MS) to the MESI protocol effectively reduces memory write-back operations and improves the performance of the multi-core processor.
According to an embodiment of the invention, a multi-core processor corresponding to the apparatus for implementing cache coherence of a multi-core processor is also provided. The multi-core processor may include the apparatus for implementing cache coherence of a multi-core processor described above.
In the related scheme, the multi-core cache coherence protocol used is the MESI protocol: each cache line is marked as one of four states, modified (M), exclusive (E), shared (S) or invalid (I), and a state machine is maintained that transitions the state of the corresponding cache line according to the data access mode. The MESI protocol is an invalidate-based cache coherence protocol and one of the most common protocols that support write-back caches. It largely solves the high bandwidth demand and low performance of approaches that keep each core's cache synchronized with memory in real time, but it has shortcomings; for example, the memory write-backs triggered by certain operations increase bus bandwidth usage.
In some embodiments, the present invention provides a method for implementing cache coherence of a multi-core processor, namely a high-performance multi-core cache coherence method that improves on the MESI protocol. This method can substantially raise the communication rate of the multi-core processor and thereby the performance of the processor, and by eliminating unnecessary memory write-backs it reduces power consumption and heat generation, so that the product can work normally in harsher operating environments.
The following describes an exemplary implementation process of the scheme of the present invention with reference to the examples shown in fig. 3 and fig. 4.
FIG. 3 is a diagram illustrating state transition of a MESI multi-core cache coherency protocol. As shown in fig. 3, in the MESI multi-core cache coherency protocol, each cache line is marked as one of the following states:
State 11: modified state (M), indicating that the data in the cache line has been updated but not yet written back to memory, and that no valid copy of the same data is held in the other cores' caches.
State 12: exclusive state (E), indicating that the data in the cache line and in memory are synchronized, and that no valid copy of the same data is held in the other cores' caches.
State 13: shared state (S), indicating that the data in the cache line and in memory are synchronized, and that valid copies of the same data are held in the other cores' caches.
State 14: invalid state (I), indicating that the data in the cache line is invalid and must not be used.
Compared with the widely used MESI protocol shown in the example of fig. 3, the scheme of the present invention achieves a more efficient multi-core cache coherence method.
FIG. 4 is a diagram illustrating state transitions of an embodiment of a multi-core cache coherency protocol according to the invention. As shown in FIG. 4, in the scheme of the present invention, each cache line is marked as one of the following states:
State 21: modified state (M), indicating that the data in the cache line has been updated but not yet written back to memory, and that no valid copy of the same data is held in the other cores' caches.
State 22: modified and shared state (MS), indicating that the data in the cache line has been updated but not yet written back to memory, and that valid copies of the same data are held in the other cores' caches.
State 23: exclusive state (E), indicating that the data in the cache line and in memory are synchronized, and that no valid copy of the same data is held in the other cores' caches.
State 24: shared state (S), indicating that the data in the cache line and in memory are synchronized, and that valid copies of the same data are held in the other cores' caches.
State 25: invalid state (I), indicating that the data in the cache line is invalid and must not be used.
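The four valid states listed above differ in only two properties: whether the line is newer than memory, and whether other cores hold a valid copy. The following Python sketch is illustrative only; the names `LineState`, `Props`, `PROPS` and `state_for` are hypothetical and do not come from the patent. It tabulates those two properties and shows that the added MS state covers the one combination that plain MESI has no separate state for.

```python
from enum import Enum
from typing import NamedTuple


class LineState(Enum):
    """The five cache line states of the scheme shown in FIG. 4."""
    M = "modified"
    MS = "modified and shared"
    E = "exclusive"
    S = "shared"
    I = "invalid"


class Props(NamedTuple):
    dirty: bool   # updated but not yet written back to memory
    shared: bool  # other cores' caches hold a valid copy


# Properties taken from the definitions of states 21 to 24. The invalid
# state carries no usable data, so it has no entry here.
PROPS = {
    LineState.M: Props(dirty=True, shared=False),
    LineState.MS: Props(dirty=True, shared=True),
    LineState.E: Props(dirty=False, shared=False),
    LineState.S: Props(dirty=False, shared=True),
}


def state_for(dirty: bool, shared: bool) -> LineState:
    """Return the valid state that matches a (dirty, shared) combination."""
    for state, props in PROPS.items():
        if props == Props(dirty, shared):
            return state
    raise ValueError("no matching state")
```

For example, `state_for(True, True)` returns `LineState.MS`, the combination that the MESI states M, E and S leave uncovered.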
In the example shown in FIG. 4, in the state machine of the multicore processor:
A cache line in the modified state (M) remains in the modified state (M) under a local read operation or a local write operation. Under a bus write operation, a cache line in the modified state (M) issues a write-back signal and transitions to the invalid state (I).
A cache line in the modified and shared state (MS) remains in the modified and shared state (MS) under a local read operation or a bus read operation. It transitions to the invalid state (I) under a bus write operation or when a write-back signal is issued.
A cache line in the exclusive state (E) remains in the exclusive state (E) under a local read operation. It transitions to the shared state (S) under a bus read operation and to the invalid state (I) under a bus write operation.
A cache line in the shared state (S) remains in the shared state (S) under a local read operation or a bus read operation. It transitions to the invalid state (I) under a bus write operation.
A cache line in the invalid state (I) may transition to the shared state (S), the exclusive state (E) or the modified and shared state (MS) under a local read operation or when it issues a bus read signal. It transitions to the modified state (M) under a local write operation or when it issues a bus write signal.
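The transitions described above can be collected into a lookup table. The Python sketch below is a reading aid rather than the patented implementation. The operation names are hypothetical, the M-to-MS transition on a bus read is taken from the embodiments described elsewhere in this document, and issuing a bus read or bus write signal is folded into the corresponding local operation as a simplifying assumption. Where the text allows several target states without giving the condition, the table lists all of them; transitions the text does not mention are left out instead of being guessed.

```python
# (current state, operation) -> set of next states allowed by the description.
TRANSITIONS = {
    ("M", "local_read"): {"M"},
    ("M", "local_write"): {"M"},
    ("M", "bus_read"): {"MS"},      # from the embodiment on bus reads
    ("M", "bus_write"): {"I"},      # the line also issues a write-back signal
    ("MS", "local_read"): {"MS"},
    ("MS", "bus_read"): {"MS"},
    ("MS", "bus_write"): {"I"},
    ("MS", "write_back"): {"I"},
    ("E", "local_read"): {"E"},
    ("E", "bus_read"): {"S"},
    ("E", "bus_write"): {"I"},
    ("S", "local_read"): {"S"},
    ("S", "bus_read"): {"S"},
    ("S", "bus_write"): {"I"},
    ("I", "local_read"): {"S", "E", "MS"},  # selection condition not stated
    ("I", "local_write"): {"M"},
}


def next_states(state: str, operation: str) -> set:
    """Return the allowed next states, or an empty set if the text is silent."""
    return set(TRANSITIONS.get((state, operation), set()))
```

Calling `next_states("E", "local_write")` returns an empty set, which marks a transition that this part of the description does not specify.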
In some embodiments, a multi-core cache coherence method provided by the scheme of the present invention includes: establishing and maintaining a state machine based on the state and operation of each cache line. The state of a cache line can be read from its tag, and the operations on a cache line comprise local operations and bus operations. When the state machine is first established, every cache line is in the invalid state (I).
The state machine is a control center composed of a state register and a combinational logic circuit. Driven by control signals, it transitions between preset states, coordinates the related signals and completes specific operations.
The state machine of MESI is shown in fig. 3, and the state machine of the present invention is shown in fig. 4. The operations on a cache line in the state machine are divided into local operations and bus operations: a local operation is an operation on the line by the processor core that owns the cache line, and a bus operation is one in which another processor core accesses the cache line of this core and then sends a synchronization signal to this core over the bus.
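As a rough software analogy for the distinction drawn above, the Python sketch below models a cache line that starts in the invalid state and classifies an access as local or bus according to which core issues it. The `CacheLine` fields and the `classify` helper are hypothetical names chosen for illustration; real hardware keeps this information in tag bits and snoop logic, not in objects.

```python
from dataclasses import dataclass


@dataclass
class CacheLine:
    owner_core: int   # the core whose cache holds this line
    address: int      # stands in for the tag
    state: str = "I"  # every line starts invalid when the state machine is set up


def classify(line: CacheLine, requesting_core: int, is_write: bool) -> str:
    """Name the operation as seen by the cache that owns `line`."""
    # An access by the owning core is local; an access by any other core
    # reaches this cache as a bus operation.
    kind = "local" if requesting_core == line.owner_core else "bus"
    return f"{kind}_{'write' if is_write else 'read'}"
```

With a line owned by core 0, a read from core 0 is classified as `local_read`, while a write from core 1 is classified as `bus_write`.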
According to the scheme, a modified and shared state (MS) is added to the MESI protocol, and memory write-back operations are thereby effectively reduced. Referring to the example shown in fig. 4, a cache line in the modified state (M) no longer incurs the latency caused by a memory write-back under a bus read operation, so the memory write-back problem of a cache line in the modified state (M) under a bus read operation is solved.
Cache lines that are both modified and shared also arise during cache operation under the MESI protocol; the MESI protocol simply does not maintain this condition as a separate state. Although a related scheme also maintains such a state, its other states differ, so the whole state machine changes considerably and it is difficult to extend or upgrade an existing design. Moreover, several states in that related scheme are composite states similar to the modified and shared state (MS), so its hardware is more complex to design than that of the ordinary MESI protocol, its design and implementation take more time and cost, and performance is not necessarily improved. The invention adds maintenance of a single modified and shared state (MS) to the existing, mature MESI protocol and optimizes the state machine at the same time, so that the number of operation conditions requiring a memory write-back falls from 4 to 2. A substantial optimization is thus obtained without changing the overall design.
According to the scheme, the cache line corresponding to each processor core is tracked through bus monitoring, and memory write-back operations are effectively reduced. As a comparison between the examples shown in fig. 3 and fig. 4 shows, a cache line in the exclusive state (E) and a cache line in the shared state (S) no longer cause a memory write-back under a bus write operation, so the memory write-back problems of both states under a bus write operation are solved.
The scheme of the invention is improved on the basis of the MESI protocol, and solves the following three problems:
First problem: memory write-back of a cache line in the modified state (M) under a bus read operation. In the original MESI protocol, a cache line in the modified state (M) is written back to memory under a bus read operation, and the current cache line is then invalidated. In the scheme of the invention, with the modified and shared state (MS) added, a cache line in the modified state (M) transitions to the modified and shared state (MS) under a bus read operation, which avoids the performance cost of a memory write-back.
Second problem: memory write-back of a cache line in the exclusive state (E) under a bus write operation. In the original MESI protocol, a cache line in the exclusive state (E) is written back to memory under a bus write operation, and the current cache line is invalidated. In the scheme of the invention, a cache line in the exclusive state (E) no longer issues a memory write-back signal under a bus write operation. Instead, the line is sent directly over the bus to the cache line of the processor core that issued the bus write signal, and the latest data is then cached in the cache line of that processor core, which avoids the memory write-back.
Third problem: memory write-back of a cache line in the shared state (S) under a bus write operation. In the original MESI protocol, a cache line in the shared state (S) is written back to memory under a bus write operation, and the current cache line is invalidated. In the scheme of the invention, a cache line in the shared state (S) no longer issues a memory write-back signal under a bus write operation. Instead, the line is sent directly over the bus to the cache line of the processor core that issued the bus write signal, which again avoids the memory write-back.
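To make the three cases above concrete, the following Python sketch counts memory write-backs over a trace of snooped bus events. It encodes the baseline behaviour exactly as this document describes it for the original protocol, and it covers only the three cases listed; other sources of write-backs are ignored. The names `CASES`, `action` and `count_write_backs` are hypothetical.

```python
WRITE_BACK = "write back to memory"

# (line state, snooped bus operation) -> what the improved scheme does
# instead of the memory write-back described for the original protocol.
CASES = {
    ("M", "bus_read"): "transition to MS, no write-back",
    ("E", "bus_write"): "send line over the bus to the writing core",
    ("S", "bus_write"): "send line over the bus to the writing core",
}


def action(state: str, bus_op: str, improved: bool):
    """Return the action for one snoop event, or None if not one of the cases."""
    if (state, bus_op) not in CASES:
        return None
    return CASES[(state, bus_op)] if improved else WRITE_BACK


def count_write_backs(trace, improved: bool) -> int:
    """Count how many events in `trace` end in a memory write-back."""
    return sum(1 for state, op in trace if action(state, op, improved) == WRITE_BACK)
```

Running the same trace once with `improved=False` and once with `improved=True` gives a quick feel for how many write-backs the three changes remove under the stated assumptions.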
The above embodiment is one embodiment of the present invention, but the embodiment of the present invention is not limited to the above embodiment.
In some embodiments, in the multi-core cache coherence method provided by the scheme of the present invention, when a cache line in the modified state (M) or a cache line in the modified and shared state (MS) snoops a bus write operation, the memory write-back may be temporarily skipped and the line transmitted over the bus to the processor core that issued the bus write signal, which further improves the performance of the processor.
Since the processing and functions implemented by the multi-core processor of this embodiment substantially correspond to the embodiments, principles, and examples of the apparatus shown in fig. 2, details not covered in the description of this embodiment can be found in the related descriptions of the foregoing embodiments and are not repeated here.
Extensive testing shows that, with the technical scheme of the invention, adding a modified and shared state (MS) to the MESI protocol means that a cache line in the modified state (M) no longer incurs the latency caused by a memory write-back under a bus read operation, which effectively reduces memory write-back operations and improves the performance of the multi-core processor.
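The cache-to-cache transfer described in this embodiment can be sketched as follows. This is a simplified Python model under stated assumptions: each cache is a dictionary mapping an address to a `(state, data)` pair, the function name `snoop_bus_write` is hypothetical, and the writing core is assumed to end up in the modified state (M) while the snooping core invalidates its copy. The point of the sketch is that `memory` is never written during the transfer.

```python
def snoop_bus_write(holder: dict, requester: dict, memory: dict, address: int) -> None:
    """Handle a snooped bus write for a line that `holder` has in M or MS."""
    state, data = holder[address]
    if state not in ("M", "MS"):
        raise ValueError("this sketch only covers dirty lines (M or MS)")
    # Forward the dirty data over the bus to the core that issued the
    # bus write signal. The memory write-back is skipped, so `memory`
    # is deliberately left untouched here.
    requester[address] = ("M", data)
    # The snooping core's copy becomes invalid.
    holder[address] = ("I", None)
```

After the call, main memory still holds its old value and the only up-to-date copy lives in the writing core's cache.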
According to the embodiment of the present invention, a storage medium corresponding to an implementation method of cache coherence of a multi-core processor is further provided, where the storage medium includes a stored program, and when the program runs, a device where the storage medium is located is controlled to execute the implementation method of cache coherence of a multi-core processor.
Since the processing and functions implemented by the storage medium of this embodiment substantially correspond to the embodiments, principles, and examples of the method shown in fig. 1, reference may be made to the related descriptions in the foregoing embodiments for details which are not described in detail in the description of this embodiment, and thus no further description is given here.
Extensive testing shows that, with the technical scheme of the invention, tracking the cache line corresponding to each processor core through bus monitoring effectively reduces memory write-back operations and improves the performance of the multi-core processor.
Extensive testing also shows that, with the technical scheme of the invention, cache lines in the exclusive state (E) and in the shared state (S) no longer cause a memory write-back under a bus write operation, which improves the performance of the multi-core processor.
In summary, it is readily understood by those skilled in the art that the advantageous modes described above can be freely combined and superimposed without conflict.
The above description is only an example of the present invention, and is not intended to limit the present invention, and it is obvious to those skilled in the art that various modifications and variations can be made in the present invention. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the scope of the claims of the present invention.

Claims (14)

1. A method for implementing cache coherence of a multi-core processor is characterized by comprising the following steps:
under the condition that the cache coherence protocol of the multi-core processor is the MESI protocol, adding a modified and shared state on the basis of the MESI protocol, so that each cache line in the cache of each processor in the multi-core processor is marked as any one of a modified state, a modified and shared state, an exclusive state, a shared state and an invalid state; wherein the modified and shared state indicates that the data in the cache line has been updated but not written back to the memory, and that the same valid copy is held in the caches of other cores;
tracking the cache line corresponding to each processor core through bus monitoring to perform state conversion on the corresponding cache line according to different data access modes, so as to realize the cache consistency of the multi-core processor; the data access pattern includes: a local access mode and a bus access mode.
2. The method for implementing cache coherence of a multi-core processor according to claim 1, further comprising:
establishing and maintaining a state machine according to the state and operation of each cache line to perform state conversion on the corresponding cache line according to different data access modes, so as to realize the cache consistency of the multi-core processor; the operations, comprising: local operations and bus operations.
3. The method for implementing cache coherence of the multi-core processor according to claim 1 or 2, wherein performing state transition on the corresponding cache line according to different data access modes comprises:
the cache line in the modified state is converted to the modified and shared state in a bus read operation.
4. The method for implementing cache coherence of a multi-core processor according to claim 1 or 2, wherein the state transition is performed on the corresponding cache line according to different data access modes, further comprising:
under the bus write operation, the cache line in the exclusive state no longer sends out a memory write-back signal, but is sent directly through the bus to the cache line of the processor core that sent out the bus write signal, and the latest data is then cached in the cache line of the processor core that sent out the bus write signal.
5. The method for implementing cache coherence of a multi-core processor according to claim 1 or 2, wherein the state transition is performed on the corresponding cache line according to different data access modes, further comprising:
under the bus write operation, the cache line in the shared state no longer sends out a memory write-back signal, but is sent directly through the bus to the cache line of the processor core that sent out the bus write signal.
6. The method for implementing cache coherence of a multi-core processor according to claim 1 or 2, wherein the state transition is performed on the corresponding cache line according to different data access modes, further comprising:
when the cache line in the modified state or the cache line in the modified and shared state snoops the bus write operation, the memory write-back operation can be temporarily skipped, and the cache line is transmitted through the bus to the processor core that sent out the bus write signal.
7. An apparatus for implementing cache coherency of a multi-core processor, comprising:
the state configuration unit is configured to add a modified and shared state on the basis of the MESI protocol under the condition that the cache coherence protocol of the multi-core processor is the MESI protocol, so that each cache line in the cache of each processor in the multi-core processor is marked as any one of a modified state, a modified and shared state, an exclusive state, a shared state and an invalid state; wherein the modified and shared state indicates that the data in the cache line has been updated but not written back to the memory, and that the same valid copy is held in the caches of other cores;
the state conversion unit is configured to monitor and track the cache line corresponding to each processor core through a bus so as to perform state conversion on the corresponding cache line according to different data access modes and realize the cache consistency of the multi-core processor; the data access pattern includes: a local access mode and a bus access mode.
8. The apparatus for implementing cache coherence of a multi-core processor according to claim 7, further comprising:
the state conversion unit is also configured to establish and maintain a state machine according to the state and operation of each cache line, so as to perform state conversion on the corresponding cache line according to different data access modes and realize the cache consistency of the multi-core processor; the operations, comprising: local operations and bus operations.
9. The apparatus for implementing cache coherence of a multicore processor according to claim 7 or 8, wherein the state transition unit performs state transition on the corresponding cache line according to different data access modes, and includes:
the cache line in the modified state is converted to the modified and shared state in a bus read operation.
10. The apparatus according to claim 7 or 8, wherein the state transition unit performs state transition on the corresponding cache line according to different data access modes, and further includes:
under the bus write operation, the cache line in the exclusive state no longer sends out a memory write-back signal, but is sent directly through the bus to the cache line of the processor core that sent out the bus write signal, and the latest data is then cached in the cache line of the processor core that sent out the bus write signal.
11. The apparatus according to claim 7 or 8, wherein the state transition unit performs state transition on the corresponding cache line according to different data access modes, and further includes:
under the bus write operation, the cache line in the shared state no longer sends out a memory write-back signal, but is sent directly through the bus to the cache line of the processor core that sent out the bus write signal.
12. The apparatus according to claim 7 or 8, wherein the state transition unit performs state transition on the corresponding cache line according to different data access modes, and further includes:
when the cache line in the modified state or the cache line in the modified and shared state snoops the bus write operation, the memory write-back operation can be temporarily skipped, and the cache line is transmitted through the bus to the processor core that sent out the bus write signal.
13. A multi-core processor, comprising: the apparatus of any of claims 7 to 12 for implementing cache coherency for a multi-core processor.
14. A storage medium, characterized in that the storage medium includes a stored program, wherein, when the program runs, a device in which the storage medium is located is controlled to execute the implementation method of cache coherence of a multi-core processor according to any one of claims 1 to 6.
CN202011222430.XA 2020-11-05 2020-11-05 Multi-core processor, cache consistency realization method and device thereof and storage medium Active CN112416615B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011222430.XA CN112416615B (en) 2020-11-05 2020-11-05 Multi-core processor, cache consistency realization method and device thereof and storage medium

Publications (2)

Publication Number Publication Date
CN112416615A (en) 2021-02-26
CN112416615B (en) 2024-08-16

Family

ID=74828592

Country Status (1)

Country Link
CN (1) CN112416615B (en)

Cited By (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113342278A (en) * 2021-06-22 2021-09-03 海光信息技术股份有限公司 Processor and method for keeping cache data consistency
CN113342709A (en) * 2021-06-04 2021-09-03 海光信息技术股份有限公司 Method for accessing data in a multiprocessor system and multiprocessor system
CN114217809A (en) * 2021-04-14 2022-03-22 无锡江南计算技术研究所 Many-core simplified Cache protocol implementation method without transverse consistency
CN114416440A (en) * 2021-11-16 2022-04-29 广东赛昉科技有限公司 A method for realizing multi-core cache consistency verification
CN115373877A (en) * 2022-10-24 2022-11-22 北京智芯微电子科技有限公司 Heterogeneous multi-core processor control method and device for ensuring shared cache coherence
WO2023103767A1 (en) * 2021-12-06 2023-06-15 合肥杰发科技有限公司 Homogeneous multi-core-based multi-operating system, communication method, and chip
WO2024066613A1 (en) * 2022-09-28 2024-04-04 北京微核芯科技有限公司 Access method and apparatus and data storage method and apparatus for multi-level cache system
CN117971728A (en) * 2024-03-29 2024-05-03 北京象帝先计算技术有限公司 Buffer, buffer control method, integrated circuit system, electronic component and equipment
WO2024164977A1 (en) * 2023-02-06 2024-08-15 华为技术有限公司 Data processing method and apparatus, and chip and computer device
CN118838863A (en) * 2024-09-24 2024-10-25 山东云海国创云计算装备产业创新中心有限公司 Data processing method, server, product and medium of multi-core processor
CN119440881A (en) * 2025-01-07 2025-02-14 芯来智融半导体科技(上海)有限公司 A multi-core consistency processing method, system, device and storage medium
CN119645421A (en) * 2025-02-11 2025-03-18 北京麟卓信息科技有限公司 ARM many-core-oriented x86 instruction dynamic conversion cache consistency maintenance method

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5434993A (en) * 1992-11-09 1995-07-18 Sun Microsystems, Inc. Methods and apparatus for creating a pending write-back controller for a cache controller on a packet switched memory bus employing dual directories
US20040039880A1 (en) * 2002-08-23 2004-02-26 Vladimir Pentkovski Method and apparatus for shared cache coherency for a chip multiprocessor or multiprocessor system
US20050027946A1 (en) * 2003-07-30 2005-02-03 Desai Kiran R. Methods and apparatus for filtering a cache snoop
CN1609823A (en) * 2003-10-23 2005-04-27 英特尔公司 Method and equipment for maintenance of sharing consistency of cache memory
CN102929832A (en) * 2012-09-24 2013-02-13 杭州中天微系统有限公司 Cache-coherence multi-core processor data transmission system based on no-write allocation

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5434993A (en) * 1992-11-09 1995-07-18 Sun Microsystems, Inc. Methods and apparatus for creating a pending write-back controller for a cache controller on a packet switched memory bus employing dual directories
US20040039880A1 (en) * 2002-08-23 2004-02-26 Vladimir Pentkovski Method and apparatus for shared cache coherency for a chip multiprocessor or multiprocessor system
US20050027946A1 (en) * 2003-07-30 2005-02-03 Desai Kiran R. Methods and apparatus for filtering a cache snoop
CN1609823A (en) * 2003-10-23 2005-04-27 英特尔公司 Method and equipment for maintenance of sharing consistency of cache memory
CN102929832A (en) * 2012-09-24 2013-02-13 杭州中天微系统有限公司 Cache-coherence multi-core processor data transmission system based on no-write allocation

Cited By (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114217809A (en) * 2021-04-14 2022-03-22 无锡江南计算技术研究所 Many-core simplified Cache protocol implementation method without transverse consistency
CN114217809B (en) * 2021-04-14 2024-04-30 无锡江南计算技术研究所 Implementation method of many-core simplified Cache protocol without transverse consistency
CN113342709A (en) * 2021-06-04 2021-09-03 海光信息技术股份有限公司 Method for accessing data in a multiprocessor system and multiprocessor system
CN113342278A (en) * 2021-06-22 2021-09-03 海光信息技术股份有限公司 Processor and method for keeping cache data consistency
CN114416440A (en) * 2021-11-16 2022-04-29 广东赛昉科技有限公司 A method for realizing multi-core cache consistency verification
WO2023103767A1 (en) * 2021-12-06 2023-06-15 合肥杰发科技有限公司 Homogeneous multi-core-based multi-operating system, communication method, and chip
WO2024066613A1 (en) * 2022-09-28 2024-04-04 北京微核芯科技有限公司 Access method and apparatus and data storage method and apparatus for multi-level cache system
CN115373877A (en) * 2022-10-24 2022-11-22 北京智芯微电子科技有限公司 Heterogeneous multi-core processor control method and device for ensuring shared cache coherence
WO2024164977A1 (en) * 2023-02-06 2024-08-15 华为技术有限公司 Data processing method and apparatus, and chip and computer device
CN117971728A (en) * 2024-03-29 2024-05-03 北京象帝先计算技术有限公司 Buffer, buffer control method, integrated circuit system, electronic component and equipment
CN118838863A (en) * 2024-09-24 2024-10-25 山东云海国创云计算装备产业创新中心有限公司 Data processing method, server, product and medium of multi-core processor
CN119440881A (en) * 2025-01-07 2025-02-14 芯来智融半导体科技(上海)有限公司 A multi-core consistency processing method, system, device and storage medium
CN119645421A (en) * 2025-02-11 2025-03-18 北京麟卓信息科技有限公司 ARM many-core-oriented x86 instruction dynamic conversion cache consistency maintenance method
CN119645421B (en) * 2025-02-11 2025-05-13 北京麟卓信息科技有限公司 ARM many-core-oriented x86 instruction dynamic conversion cache consistency maintenance method

Also Published As

Publication number Publication date
CN112416615B (en) 2024-08-16

Similar Documents

Publication Publication Date Title
CN112416615B (en) Multi-core processor, cache consistency realization method and device thereof and storage medium
US5335335A (en) Multiprocessor cache snoop access protocol wherein snoop means performs snooping operations after host bus cycle completion and delays subsequent host bus cycles until snooping operations are completed
JP5367899B2 (en) Technology to save cached information during low power mode
JP5431525B2 (en) A low-cost cache coherency system for accelerators
US6976131B2 (en) Method and apparatus for shared cache coherency for a chip multiprocessor or multiprocessor system
US7590805B2 (en) Monitor implementation in a multicore processor with inclusive LLC
JP3627037B2 (en) Method and computer system for maintaining cache coherency
US5802577A (en) Multi-processing cache coherency protocol on a local bus
TWI299826B (en) System and method of coherent data transfer during processor idle states
CN102929832B (en) Cache-coherence multi-core processor data transmission system based on no-write allocation
US20050005073A1 (en) Power control within a coherent multi-processing system
US20030046495A1 (en) Streamlined cache coherency protocol system and method for a multiple processor single chip device
WO2012170719A1 (en) Systems, methods, and devices for cache block coherence
CN110402433A (en) Memory access monitoring
JPH10320283A (en) Method and device for providing cache coherent protocol for maintaining cache coherence in multiprocessor/data processing system
CN107122162B (en) Heterogeneous thousand-core high-throughput processing system based on CPU and GPU and its modification method
JP2532191B2 (en) A method of managing data transmission for use in a computing system having a dual bus architecture.
US5829027A (en) Removable processor board having first, second and third level cache system for use in a multiprocessor computer system
US20140229678A1 (en) Method and apparatus for accelerated shared data migration
US20040111563A1 (en) Method and apparatus for cache coherency between heterogeneous agents and limiting data transfers among symmetric processors
US20140297966A1 (en) Operation processing apparatus, information processing apparatus and method of controlling information processing apparatus
CN118113631B (en) Data processing system, method, device, medium and computer program product
US20030023794A1 (en) Cache coherent split transaction memory bus architecture and protocol for a multi processor chip device
US20030163745A1 (en) Method to reduce power in a computer system with bus master devices
EP2771796B1 (en) A three channel cache-coherency socket protocol

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant