CN118550868B - Method and device for determining adjustment strategy, storage medium and electronic device - Google Patents
- Publication number
- CN118550868B (application CN202411021331.3A)
- Authority
- CN
- China
- Prior art keywords
- instruction
- determining
- instructions
- effective address
- processing core
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F15/00—Digital computers in general; Data processing equipment in general
- G06F15/16—Combinations of two or more digital computers each having at least an arithmetic unit, a program unit and a register, e.g. for a simultaneous processing of several programs
- G06F15/163—Interprocessor communication
- G06F15/167—Interprocessor communication using a common memory, e.g. mailbox
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F12/00—Accessing, addressing or allocating within memory systems or architectures
- G06F12/02—Addressing or allocation; Relocation
- G06F12/08—Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
- G06F12/0802—Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
- G06F12/0806—Multiuser, multiprocessor or multiprocessing cache systems
- G06F12/0815—Cache consistency protocols
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F12/00—Accessing, addressing or allocating within memory systems or architectures
- G06F12/02—Addressing or allocation; Relocation
- G06F12/08—Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
- G06F12/0802—Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
- G06F12/0806—Multiuser, multiprocessor or multiprocessing cache systems
- G06F12/084—Multiuser, multiprocessor or multiprocessing cache systems with a shared cache
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/46—Multiprogramming arrangements
- G06F9/54—Interprogram communication
- G06F9/546—Message passing systems or structures, e.g. queues
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F2209/00—Indexing scheme relating to G06F9/00
- G06F2209/54—Indexing scheme relating to G06F9/54
- G06F2209/548—Queue
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Abstract
The embodiment of the application provides a method and a device for determining an adjustment strategy, a storage medium and an electronic device. The method comprises: determining instruction queues respectively corresponding to a plurality of processing cores of a target device, wherein each instruction queue comprises first instructions which have been issued by the corresponding processing core and are not yet executed; determining an instruction information table corresponding to the plurality of instruction queues, wherein the instruction information table at least comprises the effective addresses of second instructions, belonging to a target type, among the first instructions and the positions of the second instructions in the corresponding instruction queues; and determining an adjustment strategy for the sniffing operation of each processing core according to the effective addresses respectively corresponding to any two adjacent second instructions in the instruction queue and the positions respectively corresponding to the any two adjacent second instructions. The method solves the problem that the cache consistency control protocol adopted by a RISC-V multi-core processor is fixed, so that many invalid sniffing (snoop) operations exist in the cache consistency control flow of the RISC-V multi-core processor and the processing efficiency of the RISC-V multi-core processor is consequently low.
Description
Technical Field
The embodiment of the application relates to the technical field of computers, in particular to a method and a device for determining an adjustment strategy, a storage medium and electronic equipment.
Background
RISC-V (Reduced Instruction Set Computer - V), the fifth generation of the reduced instruction set processor, is a brand-new instruction set architecture. It is an open instruction set architecture established on the Reduced Instruction Set Computing (RISC) principle and can be used freely by any academic institution or commercial organization. In the related art, RISC-V multi-core processors are increasingly used as special-purpose processors (as opposed to general-purpose processors, the main programs/algorithms running on a special-purpose processor are fixed/unified). One application scenario is using a RISC-V multi-core processor for parallel computing, where each processing core (core) of the RISC-V multi-core processor runs a set of specific operations; the cores do communicate and exchange data, but the frequency of such interaction is low.
However, with the widespread use of RISC-V multi-core processors, more and more problems arise. For example, the cache consistency control protocol adopted by a RISC-V multi-core processor is fixed, so the corresponding state transitions and sniff (snoop) operation scheme for cache consistency control are also fixed, regardless of the specific application scenario of the application layer corresponding to the RISC-V multi-core processor. In a specific field or a specific application, however, many of the state transitions and snoop operations in cache consistency control are invalid, and the large number of invalid flows leaves the overall processing efficiency of the RISC-V multi-core processor low.
For the problem in the related art that the cache consistency control protocol adopted by a RISC-V multi-core processor is fixed, so that many invalid sniffing (snoop) operations exist in the cache consistency control flow of the RISC-V multi-core processor and the processing efficiency of the RISC-V multi-core processor is therefore low, no effective solution has yet been proposed.
Disclosure of Invention
The embodiment of the application provides a method and a device for determining an adjustment strategy, a storage medium and electronic equipment, which at least solve the problem that in the related art, a cache consistency control protocol adopted by a RISC-V multi-core processor is fixed, so that a plurality of invalid sniffing (snoop) operations exist in the cache consistency control flow of the RISC-V multi-core processor, and further the processing efficiency of the RISC-V multi-core processor is low.
According to one embodiment of the application, instruction queues respectively corresponding to a plurality of processing cores of a target device are determined, wherein each instruction queue comprises first instructions which have been issued by the corresponding processing core and are not yet executed; an instruction information table corresponding to the plurality of instruction queues is determined, wherein the instruction information table at least comprises the effective address of a second instruction, belonging to a target type, among the first instructions and the position of the second instruction in the corresponding instruction queue; and an adjustment strategy for the sniffing operation of each processing core is determined according to the effective addresses respectively corresponding to any two adjacent second instructions in the instruction queue and the positions respectively corresponding to the any two adjacent second instructions, wherein the sniffing operation is used for indicating the operations that each processing core allows the other processing cores to execute between the any two adjacent second instructions, so that each processing core keeps cache consistency with the other processing cores, and the other processing cores are the processing cores other than each processing core among the plurality of processing cores.
In an exemplary embodiment, determining instruction queues respectively corresponding to a plurality of processing cores of a target device includes determining a first level cache corresponding to each processing core, wherein the first instruction corresponding to each processing core is stored in the first level cache, and acquiring the first instruction from the first level cache to obtain the instruction queues, wherein the first level cache is a cache directly connected with each processing core.
In an exemplary embodiment, the first instruction is obtained from the first-level cache to obtain the instruction queue, and the method comprises the steps of determining a first quantity of the first instruction allowed to be obtained from the first-level cache, wherein the first quantity is determined through a second quantity of applications running on each processing core, obtaining the first quantity of the first instruction from the first-level cache, and sequentially storing the first quantity of the first instruction into a preset queue corresponding to each processing core to obtain the instruction queue.
In an exemplary embodiment, the first instruction is obtained from the first-level cache to obtain the instruction queue, and the method comprises the steps of determining a preset execution sequence of W first instructions contained in the first-level cache, wherein W is a positive integer, and determining the instruction queue according to the preset execution sequence.
In an exemplary embodiment, determining the instruction queue according to the preset execution sequence includes determining a third instruction in the W first instructions according to the preset execution sequence, where the third instruction is an nth instruction in the W first instructions, N is an integer greater than 1, and determining the instruction queue corresponding to each processing core according to the third instruction.
In an exemplary embodiment, the instruction queue corresponding to each processing core is determined through the third instruction as follows: an acquiring step, namely acquiring the third instruction into a preset queue corresponding to each processing core; an updating step, namely, when the third instruction is determined not to be a branch instruction, determining a fourth instruction executed after the third instruction through the preset execution sequence and updating the fourth instruction into the third instruction, and when the third instruction is determined to be a branch instruction, predicting an execution result of the third instruction to obtain a prediction result and updating a fifth instruction, indicated by the prediction result, in the W first instructions into the third instruction; and circularly executing the acquiring step and the updating step M times to obtain the instruction queue, wherein M is used for representing the first number of the first instructions allowed to be acquired from the first-level cache.
In an exemplary embodiment, determining instruction information tables corresponding to the plurality of instruction queues includes analyzing the first instructions included in the plurality of instruction queues to obtain analysis results, and determining the instruction information tables according to the analysis results.
In an exemplary embodiment, resolving the first instruction included in the plurality of instruction queues to obtain a resolving result, including resolving the first instruction according to an instruction architecture of an instruction of a target type to obtain an instruction type of the first instruction and the effective address of the first instruction, wherein the target type includes the instruction type, determining a position of the first instruction in an instruction queue to which the first instruction belongs, and determining the instruction type, the effective address and the position of the first instruction in the instruction queue to which the first instruction belongs as the resolving result.
In an exemplary embodiment, the first instruction is parsed according to an instruction architecture of an instruction of a target type to obtain an instruction type of the first instruction, and the method comprises the steps of parsing the first instruction according to the instruction architecture to obtain an operation code of the first instruction, determining that the instruction type of the first instruction is a first type when the operation code is a first value, and determining that the instruction type of the first instruction is a second type when the operation code is a second value.
In one exemplary embodiment, resolving the first instruction according to an instruction architecture of an instruction of a target type to obtain the effective address of the first instruction includes resolving the first instruction according to the instruction architecture to obtain an index of a first source register recorded in the first instruction, obtaining a K-bit immediate of a sign bit extension of the first instruction, wherein K is a positive integer, and determining a sum of the index and the K-bit immediate as the effective address.
In one exemplary embodiment, determining the instruction information table according to the analysis result comprises determining a filling position in the instruction information table according to the instruction type and a target processing core to which the first instruction belongs, wherein the processing cores comprise the target processing core, and filling the effective address and the position of the first instruction in an instruction queue to which the first instruction belongs into the filling position to obtain the instruction information table.
In an exemplary embodiment, determining an adjustment policy for the sniffing operation of each processing core according to the effective addresses respectively corresponding to any two adjacent second instructions in the instruction queue and the positions respectively corresponding to the any two adjacent second instructions comprises determining a sixth instruction and a seventh instruction in the any two adjacent second instructions through the instruction information table, determining a first effective address and a first position corresponding to the sixth instruction, determining a second effective address and a second position corresponding to the seventh instruction, determining an eighth instruction of the other processing cores, wherein a third position of the eighth instruction is between the first position and the second position, and determining the adjustment policy through the eighth instruction, the first effective address and the second effective address.
In one exemplary embodiment, determining a sixth instruction and a seventh instruction in the arbitrary two adjacent second instructions through the instruction information table comprises determining a query position corresponding to a target processing core corresponding to the instruction queue in the instruction information table, wherein the plurality of processing cores comprise the target processing core, the query position comprises a filling position, the filling position is determined through an instruction type of the first instruction in the target type and the target processing core to which the first instruction belongs, and acquiring the sixth instruction and the seventh instruction from the query position.
In one exemplary embodiment, determining an eighth instruction for the other processing core includes traversing remaining locations in the instruction information table other than the query location and determining the eighth instruction from the remaining locations.
In one exemplary embodiment, determining an eighth instruction of the other processing cores includes fetching all instructions located between the first location and the second location in other instruction queues, wherein the other instruction queues are other instruction queues of the plurality of instruction queues than the instruction queue, and determining an instruction of the target type of all instructions as the eighth instruction.
In one exemplary embodiment, determining the adjustment policy from the eighth instruction, the first effective address, and the second effective address includes determining a third effective address of the eighth instruction from the instruction information table, determining whether the third effective address is consistent with a fourth effective address, wherein the fourth effective address includes at least one of the first effective address, the second effective address, and determining the adjustment policy to allow the other processing cores to perform the sniff operation if the third effective address and the fourth effective address are consistent.
In one exemplary embodiment, after determining whether the third effective address is consistent with a fourth effective address, the method further includes determining that the adjustment policy is to prohibit each processing core from performing the snoop operation for the other processing cores if the third effective address and the fourth effective address are not consistent.
In an exemplary embodiment, after determining the adjustment strategy of the sniffing operation of each processing core according to the effective address respectively corresponding to any two adjacent second instructions in the instruction queue and the positions respectively corresponding to the any two adjacent second instructions, the method further comprises determining whether a branch instruction exists in any two adjacent second instructions when the any two adjacent second instructions are executed by each processing core, determining the execution result of the branch instruction when the branch instruction exists in any two adjacent second instructions, and determining the execution mode of the adjustment strategy according to the execution result.
In one exemplary embodiment, determining the execution mode of the adjustment policy according to the execution result includes determining a ninth instruction and a tenth instruction, wherein the ninth instruction is a first instruction of the arbitrary two adjacent second instructions, the tenth instruction is a second instruction of the arbitrary two adjacent second instructions, determining whether the tenth instruction is an eleventh instruction indicated by the execution result or an instruction allowed to be executed after the eleventh instruction in the case that the ninth instruction is a branch instruction, and determining that the adjustment policy is allowed to be executed in the case that the tenth instruction is the eleventh instruction or an instruction allowed to be executed after the eleventh instruction.
According to another embodiment of the application, a device for determining an adjustment strategy is provided, which comprises a first determining module, a second determining module and a third determining module. The first determining module is used for determining instruction queues respectively corresponding to a plurality of processing cores of a target device, wherein each instruction queue comprises first instructions which have been issued by the corresponding processing core and are not yet executed; the second determining module is used for determining an instruction information table corresponding to the plurality of instruction queues, wherein the instruction information table at least comprises the effective address of a second instruction, belonging to a target type, among the first instructions and the position of the second instruction in the corresponding instruction queue; and the third determining module is used for determining an adjustment strategy for the sniffing operation of each processing core according to the effective addresses respectively corresponding to any two adjacent second instructions in the instruction queue and the positions respectively corresponding to the any two adjacent second instructions, wherein the sniffing operation is used for indicating the operations that each processing core allows the other processing cores to execute between the any two adjacent second instructions, so that each processing core keeps cache consistency with the other processing cores, and the other processing cores are the processing cores other than each processing core among the plurality of processing cores.
According to a further embodiment of the application, there is also provided a computer readable storage medium having stored therein a computer program, wherein the computer program is arranged to perform the steps of any of the method embodiments described above when run.
According to a further embodiment of the application there is also provided an electronic device comprising a memory having stored therein a computer program and a processor arranged to run the computer program to perform the steps of any of the method embodiments described above.
According to a further embodiment of the application, there is also provided a computer program product comprising a computer program which, when executed by a processor, implements the steps of any of the method embodiments described above.
The method comprises the steps of determining an instruction queue corresponding to a plurality of processing cores of target equipment, wherein the instruction queue comprises a first instruction which is sent by each processing core and is not executed currently, determining an instruction information table corresponding to a plurality of the instruction queues, wherein the instruction information table at least comprises effective addresses of second instructions belonging to target types in the first instruction and positions of the second instructions in the corresponding instruction queue, and determining an adjustment strategy of sniffing operation of each processing core according to the effective addresses respectively corresponding to any two adjacent second instructions in the instruction queue and the positions respectively corresponding to the any two adjacent second instructions, wherein the sniffing operation is used for indicating each processing core to allow operation which is executed on other processing cores between the any two adjacent second instructions so that each processing core keeps cache consistency with the other processing cores, and the other processing cores are processing cores except for each processing core in the plurality of processing cores. Therefore, the problem that in the related art, a cache consistency control protocol adopted by the RISC-V multi-core processor is fixed, so that a plurality of invalid sniffing (snoop) operations exist in the cache consistency control flow of the RISC-V multi-core processor, and the processing efficiency of the RISC-V multi-core processor is low can be solved. The processing efficiency of the RISC-V multi-core processor is improved.
Drawings
FIG. 1 is a hardware block diagram of a computer terminal of a method for determining an adjustment policy according to an embodiment of the present application;
FIG. 2 is a flow chart of a method of determining an adjustment strategy according to an embodiment of the present application;
FIG. 3 is a diagram of a multi-core processor cache architecture of a method of determining an adjustment policy according to an embodiment of the application;
FIG. 4 is a schematic diagram of an instruction architecture corresponding to a RISC-V instruction set of a method for determining an adjustment policy according to an embodiment of the present application;
FIG. 5 is a state jump flow diagram of a MESI cache coherence protocol of a method for determining an adjustment policy according to an embodiment of the present application;
FIG. 6 is a schematic diagram of a snoop flow of a MESI cache coherence protocol of a method for determining an adjustment policy according to an embodiment of the present application;
FIG. 7 is a schematic diagram of an instruction processing optimization module of a method of determining an adjustment strategy according to an embodiment of the present application;
FIG. 8 is a schematic diagram of RISC-V Load-related instructions of a method of determining an adjustment policy according to an embodiment of the present application;
FIG. 9 is a schematic diagram of RISC-V Store-related instructions of a method of determining an adjustment policy according to an embodiment of the present application;
FIG. 10 is a block diagram of a structure of a device for determining an adjustment policy according to an embodiment of the present application.
Detailed Description
Embodiments of the present application will be described in detail below with reference to the accompanying drawings in conjunction with the embodiments.
It should be noted that the terms "first," "second," and the like in the description and the claims of the present application and the above figures are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order.
The method embodiments provided in the embodiments of the present application may be performed in a mobile terminal, a computer terminal or similar computing device. Taking the operation on a computer terminal as an example, fig. 1 is a block diagram of a hardware structure of a computer terminal according to a method for determining an adjustment policy according to an embodiment of the present application. As shown in fig. 1, the computer terminal may include one or more (only one is shown in fig. 1) processors 102 (the processor 102 may include, but is not limited to, a microprocessor MCU or a processing device such as a programmable logic device FPGA) and a memory 104 for storing data, wherein the computer terminal may further include a transmission device 106 for communication functions and an input-output device 108. It will be appreciated by those skilled in the art that the configuration shown in fig. 1 is merely illustrative and is not intended to limit the configuration of the computer terminal described above. For example, the computer terminal may also include more or fewer components than shown in FIG. 1, or have a different configuration than shown in FIG. 1.
The memory 104 may be used to store a computer program, for example, a software program of application software and a module, such as a computer program corresponding to the method for determining an adjustment policy in an embodiment of the present application, and the processor 102 executes the computer program stored in the memory 104 to perform various functional applications and data processing, that is, implement the above-mentioned method. Memory 104 may include high-speed random access memory, and may also include non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid-state memory. In some examples, the memory 104 may further include memory remotely located relative to the processor 102, which may be connected to the computer terminal via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The transmission device 106 is used to receive or transmit data via a network. Specific examples of the network described above may include a wireless network provided by a communication provider of a computer terminal. In one example, the transmission device 106 includes a network adapter (Network Interface Controller, simply referred to as a NIC) that can connect to other network devices through a base station to communicate with the internet. In one example, the transmission device 106 may be a Radio Frequency (RF) module, which is configured to communicate with the internet wirelessly.
FIG. 2 is a flowchart of a method for determining an adjustment policy according to an embodiment of the present application. As shown in FIG. 2, the flow includes the following steps:
step S202, determining instruction queues respectively corresponding to a plurality of processing cores of target equipment, wherein the instruction queues comprise first instructions which are sent by each processing core and are not executed currently;
Step S204, determining instruction information tables corresponding to a plurality of instruction queues, wherein the instruction information tables at least comprise effective addresses of second instructions belonging to a target type in the first instructions and positions of the second instructions in the instruction queues;
It should be noted that, the target type specifically refers to Load and Store instructions. The instructions issued by each processing core are not limited to Load and Store instructions, and therefore, it is necessary to determine a second instruction belonging to the target type from the instruction queue and obtain an instruction information table of the second instruction.
And step S206, determining an adjustment strategy of sniffing operation of each processing core according to the effective addresses respectively corresponding to any two adjacent second instructions in the instruction queue and the positions respectively corresponding to the any two adjacent second instructions, wherein the sniffing operation is used for indicating the operation of each processing core which allows execution of other processing cores between the any two adjacent second instructions so as to enable each processing core to keep cache consistency with the other processing cores, and the other processing cores are processing cores except for each processing core in the plurality of processing cores.
It should be noted that, the determined adjustment policy for the sniffing operation specifically refers to the adjustment policy for the sniffing operation between any two adjacent second instructions for each processing core.
It should be noted that the sniffing operation is the snoop operation: a CPU (or processing core) perceives the behavior of other CPUs (such as reading or writing a certain cache line) by sniffing (snooping) the request messages sent by those CPUs on the bus, and sometimes the CPU also needs to respond to certain request messages on the bus. This is called the "bus snooping mechanism".
The method comprises the steps of determining an instruction queue corresponding to a plurality of processing cores of target equipment, wherein the instruction queue comprises a first instruction which is sent by each processing core and is not executed currently, determining an instruction information table corresponding to a plurality of the instruction queues, wherein the instruction information table at least comprises an effective address of a second instruction belonging to a target type in the first instruction and a position of the second instruction in the corresponding instruction queue, and determining an adjustment strategy of sniffing operation of each processing core according to the effective address respectively corresponding to any two adjacent second instructions in the instruction queue and the position respectively corresponding to the any two adjacent second instructions, wherein the sniffing operation is used for indicating the operation which is allowed to be executed by each processing core between the any two adjacent second instructions so as to enable each processing core to keep cache consistency with the other processing cores, and the other processing cores are processing cores except for each processing core in the plurality of processing cores. Therefore, the problem that in the related art, a cache consistency control protocol adopted by the RISC-V multi-core processor is fixed, so that a plurality of invalid sniffing (snoop) operations exist in the cache consistency control flow of the RISC-V multi-core processor, and the processing efficiency of the RISC-V multi-core processor is low can be solved. The processing efficiency of the RISC-V multi-core processor is improved.
In an exemplary embodiment, determining instruction queues respectively corresponding to a plurality of processing cores of a target device includes determining a first level cache corresponding to each processing core, wherein the first instruction corresponding to each processing core is stored in the first level cache, and acquiring the first instruction from the first level cache to obtain the instruction queues, wherein the first level cache is a cache directly connected with each processing core.
It should be noted that the multiple processing cores in the embodiment of the present application may be the processing cores of a RISC-V multi-core processor. As shown in fig. 3, the cache architecture of the multi-core processor includes processing cores 0, 1, 2 and 3 in a central processing unit (CPU); each processing core has its own first-level cache (L1 Cache), the 4 processing cores share one second-level cache (L2 Cache), and the second-level cache is connected to a Double Data Rate synchronous dynamic random access memory (DDR) outside the CPU.
Optionally, in one case, the memory is DRAM (Dynamic RAM) and the cache is SRAM (Static RAM).
The first-level cache consists of an L1 I-Cache (first-level instruction cache) and an L1 D-Cache (first-level data cache). The closer a cache is to the CPU, the faster it is and the more expensive it is per unit of capacity. When a core queries the caches, it searches from near to far, starting with the first-level cache; the unit searched in a cache is called a cache line. The search ends when the data is found; otherwise the second-level cache is searched. If the data is not found in the second-level cache either, it is searched for in main memory (DDR), where the unit searched in memory is called a memory line.
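For illustration only (not part of the patent text), the near-to-far lookup described above can be sketched in C as follows; the lookup functions and their signatures are assumptions standing in for the hardware behaviour.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical placeholders for the hardware lookups: they return true on a
 * hit and copy the found cache line / memory line into line_out. */
bool l1_lookup(uint64_t addr, void *line_out);
bool l2_lookup(uint64_t addr, void *line_out);
void ddr_read(uint64_t addr, void *line_out);

/* Search from near to far: first-level cache, then second-level cache,
 * and finally main memory (DDR) on a miss in both caches. */
static void read_line(uint64_t addr, void *line_out) {
    if (l1_lookup(addr, line_out)) return;  /* hit on a cache line in L1 */
    if (l2_lookup(addr, line_out)) return;  /* hit on a cache line in L2 */
    ddr_read(addr, line_out);               /* fetch the memory line from DDR */
}
```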
In the embodiment of the application, the first instructions are acquired from the first-level instruction cache within the first-level cache corresponding to each processing core. Alternatively, the first instructions may be pushed into the instruction queue in the order in which they are fetched.
Optionally, the first instruction is obtained from the first-level cache to obtain the instruction queue, and the method comprises the steps of determining a first quantity of the first instruction allowed to be obtained from the first-level cache, wherein the first quantity is determined through a second quantity of applications running on each processing core, obtaining the first quantity of the first instruction from the first-level cache, and sequentially storing the first quantity of the first instruction into a preset queue corresponding to each processing core to obtain the instruction queue.
That is, in one implementation, the first instructions are fetched according to the first number of first instructions allowed to be fetched from the first-level cache. Alternatively, the fetched first number of first instructions may be stored in a preset queue in the order in which they are fetched, where the preset queue may be an empty queue preset for each processing core.
Optionally, the first instruction is obtained from the first-level cache to obtain the instruction queue, and the method comprises the steps of determining a preset execution sequence of W first instructions contained in the first-level cache, wherein W is a positive integer, and determining the instruction queue according to the preset execution sequence.
That is, in another implementation, the instruction queue is determined according to a preset execution order of W first instructions in the first level cache.
The method comprises the steps of determining a third instruction in W first instructions according to the preset execution sequence, wherein the third instruction is an Nth instruction in W first instructions, N is an integer greater than 1, and determining the instruction queue corresponding to each processing core according to the third instruction.
It will be appreciated that, since analyzing any two adjacent second instructions in the instruction queue takes time, fetching from the first-level cache cannot start from the first of the W first instructions; instead, fetching may start from the N-th of the W first instructions. In this way, before the adjustment strategy for the sniffing operation of each processing core is determined according to the effective addresses respectively corresponding to any two adjacent second instructions in the instruction queue and the positions respectively corresponding to the any two adjacent second instructions, the any two adjacent second instructions have not yet been executed, which further ensures that the adjustment strategy can be effectively implemented.
Optionally, in another implementation manner, a third instruction in the W first instructions is determined through the preset execution sequence, and starting from the third instruction, the first number of first instructions are acquired into the preset queue, so as to obtain the instruction queue.
Further, the instruction queue corresponding to each processing core is determined through the third instruction as follows: an acquiring step, namely acquiring the third instruction into the preset queue corresponding to each processing core; an updating step, namely, when the third instruction is determined not to be a branch instruction, determining a fourth instruction executed after the third instruction through the preset execution sequence and updating the fourth instruction into the third instruction, and when the third instruction is determined to be a branch instruction, predicting an execution result of the third instruction to obtain a prediction result and updating a fifth instruction, indicated by the prediction result, in the W first instructions into the third instruction; and circularly executing the acquiring step and the updating step M times to obtain the instruction queue, wherein M is used for representing the first number of the first instructions allowed to be acquired from the first-level cache.
In the process of fetching the first instruction into the preset queue, starting from the third instruction, it is required to determine whether the instruction is a branch instruction for each fetched instruction (such as the third instruction and the fourth instruction). When the instruction is a branch instruction, the execution result of the instruction needs to be predicted, and a fifth instruction corresponding to the predicted result is acquired into a preset queue. It should be noted that if the prediction result is inaccurate, the instruction from the fifth instruction in the instruction queue may be obtained in error, so that the snoop operation determined by the instruction from the fifth instruction is finally invalidated.
Wherein a branch instruction is an instruction in a computer program for selecting different execution paths according to different conditions. These instructions allow the program to jump to different code blocks during execution depending on whether conditions are met or not, thereby implementing branching logic in the program. Common branch instructions include conditional branch instructions, unconditional branch instructions, and the like. By using branch instructions, the program may execute different code according to different conditions, thereby implementing different processing logic in different situations.
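As a rough illustration of the acquiring/updating loop described above, the following C sketch builds one core's instruction queue; `is_branch` and `predict_branch_target` are crude placeholders for the core's own decode and branch-prediction logic, and all names are assumptions rather than the patent's implementation.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdbool.h>

/* Placeholder decode/prediction logic (assumptions): treat the RV32I
 * BRANCH/JAL/JALR opcodes as branches and "predict" simple fall-through. */
static bool is_branch(uint32_t instr) {
    uint32_t op = instr & 0x7Fu;
    return op == 0x63u || op == 0x6Fu || op == 0x67u; /* BRANCH, JAL, JALR */
}
static size_t predict_branch_target(size_t idx, uint32_t instr) {
    (void)instr;
    return idx + 1; /* placeholder prediction: index of the fifth instruction */
}

/* Acquiring/updating loop: build an instruction queue of at most M entries
 * from the W pending first instructions, starting at the N-th instruction. */
static size_t build_instruction_queue(const uint32_t *first_instrs, size_t w,
                                      size_t n, size_t m, uint32_t *queue) {
    size_t idx = n;        /* index of the current "third instruction" */
    size_t count = 0;
    while (count < m && idx < w) {
        queue[count++] = first_instrs[idx];                      /* acquiring step */
        if (!is_branch(first_instrs[idx]))
            idx = idx + 1;                                       /* fourth becomes third */
        else
            idx = predict_branch_target(idx, first_instrs[idx]); /* fifth becomes third */
    }
    return count;          /* number of instructions actually queued */
}
```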
In an exemplary embodiment, determining the instruction information table corresponding to the plurality of instruction queues includes analyzing the first instructions included in the plurality of instruction queues to obtain analysis results, and determining the instruction information table according to the analysis results. That is, after the instruction queue is obtained, the first instructions in the instruction queue are analyzed sequentially in the order in which they were pushed into the instruction queue (i.e., the fetch order) to obtain the analysis result, thereby obtaining the instruction information table.
Optionally, analyzing the first instruction included in the plurality of instruction queues to obtain an analysis result, including analyzing the first instruction according to an instruction architecture of an instruction of a target type to obtain an instruction type of the first instruction and the effective address of the first instruction, wherein the target type includes the instruction type, determining a position of the first instruction in an instruction queue to which the first instruction belongs, and determining the instruction type, the effective address and the position of the first instruction in the instruction queue to which the first instruction belongs as the analysis result.
Further, the first instruction is analyzed according to the instruction architecture of the target type instruction to obtain the instruction type of the first instruction, the method comprises the steps of analyzing the first instruction according to the instruction architecture to obtain an operation code of the first instruction, determining that the instruction type of the first instruction is the first type when the operation code is a first value, and determining that the instruction type of the first instruction is the second type when the operation code is a second value.
In the embodiment of the application, the target type instruction specifically refers to the Load and Store instructions. The Load and Store instructions both belong to the RISC-V instruction set, and the instruction architecture corresponding to the RISC-V instruction set is shown in FIG. 4, where the Load instruction is an I-type instruction (corresponding to the first type) and the Store instruction is an S-type instruction (corresponding to the second type). Specifically, in FIG. 4, opcode is the operation code, 7 bits wide, occupying bits 0-6 of the instruction. For an I-type Load instruction, opcode = 7'b0000011 (corresponding to the first value); for an S-type Store instruction, opcode = 7'b0100011 (corresponding to the second value).
Therefore, whether the first instruction is of the first type or the second type can be determined through the operation code of the first instruction, and therefore the instruction information table can be obtained quickly and efficiently.
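A minimal C sketch of this opcode check, assuming the standard RV32I encodings named above (the function names are illustrative only, not part of the patent):

```c
#include <stdint.h>
#include <stdbool.h>

/* RV32I base encodings: Load (I-type) uses opcode 7'b0000011,
 * Store (S-type) uses opcode 7'b0100011. */
#define OPCODE_LOAD  0x03u
#define OPCODE_STORE 0x23u

/* The opcode occupies bits 0..6 of a 32-bit RISC-V instruction word. */
static inline uint32_t opcode_of(uint32_t instr) { return instr & 0x7Fu; }

static inline bool is_load(uint32_t instr)  { return opcode_of(instr) == OPCODE_LOAD;  }
static inline bool is_store(uint32_t instr) { return opcode_of(instr) == OPCODE_STORE; }
```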
Further, the first instruction is analyzed according to the instruction architecture of the target type instruction to obtain the effective address of the first instruction, the first instruction is analyzed according to the instruction architecture to obtain an index of a first source register recorded in the first instruction, a K-bit immediate of sign bit extension of the first instruction is obtained, K is a positive integer, and the sum of the index and the K-bit immediate is determined to be the effective address.
"Imm [ ]" in FIG. 4 represents an immediate, so the K-bit immediate of a first instruction of the first type, i.e., the Load instruction, is imm [11:0], at 20-31 bits of the instruction, and the K-bit immediate of a first instruction of the second type, i.e., the Store instruction, is imm [4:0] +imm [11:5]. Alternatively, K is equal to 12. The first source register is referred to as the rs1 register.
Optionally, determining the instruction information table according to the analysis result includes determining a filling position in the instruction information table according to the instruction type and a target processing core to which the first instruction belongs, wherein the plurality of processing cores include the target processing core, and filling the effective address and the position of the first instruction in an instruction queue to which the first instruction belongs into the filling position to obtain the instruction information table.
Alternatively, the instruction information table may be as shown in Table 1. Each row corresponds to a processing core, and the columns respectively record the effective address (Load_addr) and the position (Load_instr_position) of the first instructions of the first type, and the effective address (Store_addr) and the position (Store_instr_position) of the first instructions of the second type. Optionally, a filling position is, for example, the cells in the row of Core0 used for filling Load_addr and Load_instr_position, or the cells in the row of Core0 used for filling Store_addr and Store_instr_position.
TABLE 1
|       | Load_addr | Load_instr_position | Store_addr | Store_instr_position |
|-------|-----------|---------------------|------------|----------------------|
| Core0 | Core0_Load_addr_0, Core0_Load_addr_1, Core0_Load_addr_2, ... | Core0_Load_instr_position_0, Core0_Load_instr_position_1, Core0_Load_instr_position_2, ... | Core0_Store_addr_0, Core0_Store_addr_1, Core0_Store_addr_2, ... | Core0_Store_instr_position_0, Core0_Store_instr_position_1, Core0_Store_instr_position_2, ... |
| Core1 | Core1_Load_addr_0, Core1_Load_addr_1, Core1_Load_addr_2, ... | Core1_Load_instr_position_0, Core1_Load_instr_position_1, Core1_Load_instr_position_2, ... | Core1_Store_addr_0, Core1_Store_addr_1, Core1_Store_addr_2, ... | Core1_Store_instr_position_0, Core1_Store_instr_position_1, Core1_Store_instr_position_2, ... |
| Core2 | Core2_Load_addr_0, Core2_Load_addr_1, Core2_Load_addr_2, ... | Core2_Load_instr_position_0, Core2_Load_instr_position_1, Core2_Load_instr_position_2, ... | Core2_Store_addr_0, Core2_Store_addr_1, Core2_Store_addr_2, ... | Core2_Store_instr_position_0, Core2_Store_instr_position_1, Core2_Store_instr_position_2, ... |
| Core3 | Core3_Load_addr_0, Core3_Load_addr_1, Core3_Load_addr_2, ... | Core3_Load_instr_position_0, Core3_Load_instr_position_1, Core3_Load_instr_position_2, ... | Core3_Store_addr_0, Core3_Store_addr_1, Core3_Store_addr_2, ... | Core3_Store_instr_position_0, Core3_Store_instr_position_1, Core3_Store_instr_position_2, ... |
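One possible in-memory layout for Table 1, shown only as a hedged C sketch; the structure names, the number of cores and the entry bounds are assumptions, not part of the patent.

```c
#include <stdint.h>
#include <stddef.h>
#include <stdbool.h>

#define NUM_CORES   4
#define MAX_ENTRIES 32   /* assumed upper bound on Load/Store entries per core */

/* One Load or Store entry: effective address plus position in the core's queue. */
typedef struct {
    uint32_t effective_addr;
    uint32_t queue_position;
} mem_entry_t;

/* Per-core row of the instruction information table (Table 1): separate
 * columns for Load entries and Store entries. */
typedef struct {
    mem_entry_t load_entries[MAX_ENTRIES];
    size_t      load_count;
    mem_entry_t store_entries[MAX_ENTRIES];
    size_t      store_count;
} core_row_t;

typedef struct {
    core_row_t rows[NUM_CORES];  /* row index = processing core id */
} instr_info_table_t;

/* Filling position: the row is chosen by the core id, the column by the
 * instruction type; the effective address and queue position are filled in. */
static void fill_entry(instr_info_table_t *t, unsigned core, bool is_store,
                       uint32_t addr, uint32_t pos) {
    core_row_t *row = &t->rows[core];
    mem_entry_t e = { addr, pos };
    if (is_store) {
        if (row->store_count < MAX_ENTRIES) row->store_entries[row->store_count++] = e;
    } else {
        if (row->load_count  < MAX_ENTRIES) row->load_entries[row->load_count++]   = e;
    }
}
```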
In an exemplary embodiment, determining an adjustment policy for the sniffing operation of each processing core according to the effective addresses respectively corresponding to any two adjacent second instructions in the instruction queue and the positions respectively corresponding to the any two adjacent second instructions comprises determining a sixth instruction and a seventh instruction in the any two adjacent second instructions through the instruction information table, determining a first effective address and a first position corresponding to the sixth instruction, determining a second effective address and a second position corresponding to the seventh instruction, determining an eighth instruction of the other processing cores, wherein a third position of the eighth instruction is between the first position and the second position, and determining the adjustment policy through the eighth instruction, the first effective address and the second effective address.
Optionally, determining a sixth instruction and a seventh instruction in the arbitrary two adjacent second instructions through the instruction information table comprises determining a query position corresponding to a target processing core corresponding to the instruction queue in the instruction information table, wherein the plurality of processing cores comprise the target processing core, the query position comprises a filling position, the filling position is determined through an instruction type of the first instruction in the target type and the target processing core to which the first instruction belongs, and the sixth instruction and the seventh instruction are acquired from the query position.
Illustratively, as shown in Table 1, if Core0 is the target processing core, the query position of Core0 is the row in Table 1 where Core0 is located. Any two adjacent second instructions may be two adjacent Load instructions, two adjacent Store instructions, an adjacent Load instruction and Store instruction, or an adjacent Store instruction and Load instruction. Specifically, since the positions and the effective addresses are recorded directly in the instruction information table, determining the sixth instruction and the seventh instruction by looking up the table can also be understood as simultaneously determining the first effective address and first position of the sixth instruction and the second effective address and second position of the seventh instruction.
In one exemplary embodiment, determining an eighth instruction for the other processing core includes traversing remaining locations in the instruction information table other than the query location and determining the eighth instruction from the remaining locations.
For example, if Core0 is the target processing core, the remaining positions are the rows in Table 1 where Core1, Core2 and Core3 are located.
Also for example, if Core0 is the target processing core and the any two adjacent second instructions of Core0 are a Load instruction and a Store instruction, the position of the Load instruction is Core0_Load_instr_position_0 = 3 and the position of the Store instruction is Core0_Store_instr_position_0 = 10.
If the position of the Load instruction of Core1 is Core1_Load_instr_position_0 = 5, then the Load instruction of Core1 is determined to be between the Load and Store instructions of Core0.
If the position of the Load instruction of Core1 is Core1_Load_instr_position_0 = 11, then the Load instruction of Core1 is determined not to be between the Load and Store instructions of Core0.
It should be noted that the first number of first instructions allowed to be contained in the instruction queue corresponding to each processing core is the same. Therefore, Core0_Load_instr_position_0 = 3 can be understood as being executed at the same time as the instruction at position 3 in the instruction queue corresponding to another processing core.
In one exemplary embodiment, determining an eighth instruction of the other processing cores includes fetching all instructions located between the first location and the second location in other instruction queues, wherein the other instruction queues are other instruction queues of the plurality of instruction queues than the instruction queue, and determining an instruction of the target type of all instructions as the eighth instruction.
For example, if Core0 is the target processing core and the any two adjacent second instructions of Core0 are a Load instruction and a Store instruction, the first position of the Load instruction is Core0_Load_instr_position_0 = 3 and the second position of the Store instruction is Core0_Store_instr_position_0 = 10;
then all instructions located between positions 3 and 10 in the other instruction queues are determined, and the instructions of the target type among them are determined as the eighth instructions.
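A hedged C sketch of this position check, collecting the candidate eighth instructions of one other core whose queue positions fall between the first position and the second position; the type and function name are illustrative assumptions only.

```c
#include <stddef.h>
#include <stdint.h>

/* A Load/Store entry taken from the instruction information table: effective
 * address plus position in the owning core's instruction queue (same layout
 * as the table sketch above). */
typedef struct { uint32_t effective_addr; uint32_t queue_position; } mem_entry_t;

/* Collect the "eighth instructions" of another core: target-type entries whose
 * queue position lies strictly between the first and second positions of the
 * target core's two adjacent second instructions. The caller provides an
 * output buffer large enough to hold n_other entries. */
static size_t find_eighth_instrs(const mem_entry_t *other_entries, size_t n_other,
                                 uint32_t first_pos, uint32_t second_pos,
                                 mem_entry_t *out) {
    size_t n = 0;
    for (size_t i = 0; i < n_other; i++) {
        uint32_t p = other_entries[i].queue_position;
        if (p > first_pos && p < second_pos)
            out[n++] = other_entries[i];
    }
    return n;
}
```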
In one exemplary embodiment, determining the adjustment policy from the eighth instruction, the first effective address, and the second effective address includes determining a third effective address of the eighth instruction from the instruction information table, determining whether the third effective address is consistent with a fourth effective address, wherein the fourth effective address includes at least one of the first effective address, the second effective address, and determining the adjustment policy to allow the other processing cores to perform the sniff operation if the third effective address and the fourth effective address are consistent.
Therefore, through the arrangement of the instruction information table, any two adjacent second instructions and an eighth instruction positioned between the any two adjacent second instructions can be simply and directly determined, so that the execution strategy of the sniffing operation of each processing core between the any two adjacent second instructions is determined.
In one exemplary embodiment, after determining whether the third effective address is consistent with the fourth effective address, the method further includes determining that the adjustment policy is to prohibit each processing core from performing the snoop operation for the other processing cores in the case that the third effective address and the fourth effective address are not consistent.
Thus, in the event that it is determined that the third effective address is inconsistent with the fourth effective address, it may be determined that each processing core need not perform a snoop operation between any two adjacent second instructions. Therefore, invalid sniffing operation is omitted, and the processing efficiency of the RISC-V multi-core processor is improved.
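The comparison of the third effective address against the first/second effective addresses can be sketched as follows; the enum and function name are assumptions, and the sketch treats the fourth effective address as either of the two adjacent second instructions' effective addresses.

```c
#include <stdint.h>
#include <stdbool.h>

typedef enum { SNOOP_PROHIBITED = 0, SNOOP_ALLOWED = 1 } snoop_policy_t;

/* Decide the adjustment policy for one eighth instruction: the snoop is only
 * useful when the other core touches the same effective address as either of
 * the two adjacent second instructions (the "fourth effective address");
 * otherwise the snoop operation between the two instructions is prohibited. */
static snoop_policy_t decide_policy(uint32_t third_addr,
                                    uint32_t first_addr, uint32_t second_addr) {
    bool hit = (third_addr == first_addr) || (third_addr == second_addr);
    return hit ? SNOOP_ALLOWED : SNOOP_PROHIBITED;
}
```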
In an exemplary embodiment, after determining the adjustment strategy of the sniffing operation of each processing core according to the effective address respectively corresponding to any two adjacent second instructions in the instruction queue and the positions respectively corresponding to the any two adjacent second instructions, the method further comprises determining whether a branch instruction exists in any two adjacent second instructions when the any two adjacent second instructions are executed by each processing core, determining the execution result of the branch instruction when the branch instruction exists in any two adjacent second instructions, and determining the execution mode of the adjustment strategy according to the execution result.
It will be appreciated that, after the adjustment policy is determined, each processing core verifies the actual execution result of the branch instruction, and the adjustment policy is allowed to be executed only if the actual result matches the predicted result.
In one exemplary embodiment, determining the execution mode of the adjustment policy according to the execution result includes determining a ninth instruction and a tenth instruction, wherein the ninth instruction is a first instruction of the arbitrary two adjacent second instructions, the tenth instruction is a second instruction of the arbitrary two adjacent second instructions, determining whether the tenth instruction is an eleventh instruction indicated by the execution result or an instruction allowed to be executed after the eleventh instruction in the case that the ninth instruction is a branch instruction, and determining that the adjustment policy is allowed to be executed in the case that the tenth instruction is the eleventh instruction or an instruction allowed to be executed after the eleventh instruction.
Optionally, determining to prohibit execution of the adjustment policy if it is determined that the tenth instruction is not the eleventh instruction indicated by the execution result or an instruction allowed to be executed after the eleventh instruction, instructing each processing core to suspend execution of the arbitrary two adjacent second instructions, reacquiring the instruction queue corresponding to each processing core, and updating the adjustment policy through the reacquired instruction queue.
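As a rough illustration of this verification step (not the embodiment itself), the sketch below assumes that instructions are identified by their queue positions and that an instruction allowed to execute after the predicted target has a larger position; policy_may_execute() is a hypothetical helper introduced here.

```c
#include <stdbool.h>

/* Hypothetical verification: once the branch (ninth instruction) has resolved,
 * the adjustment policy may be executed only if the instruction that actually
 * follows it (tenth) is the instruction indicated by the execution result
 * (eleventh) or an instruction allowed to execute after it. */
bool policy_may_execute(bool ninth_is_branch,
                        unsigned tenth_position, unsigned eleventh_position)
{
    if (!ninth_is_branch)
        return true;                             /* nothing to verify */
    return tenth_position >= eleventh_position;  /* prediction held: allow execution */
}
```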
From the description of the above embodiments, it will be clear to a person skilled in the art that the method according to the above embodiments may be implemented by means of software plus the necessary general hardware platform, but of course also by means of hardware, but in many cases the former is a preferred embodiment. Based on such understanding, the technical solution of the present application may be embodied essentially or in a part contributing to the prior art in the form of a software product stored in a storage medium (e.g. ROM/RAM, magnetic disk, optical disk) comprising instructions for causing a terminal device (which may be a mobile phone, a computer, a server, or a network device, etc.) to perform the method according to the embodiments of the present application.
In order to better understand the process of the method for determining the adjustment policy, the implementation flow of the method for determining the adjustment policy is described below in conjunction with the optional embodiment, but is not limited to the technical solution of the embodiment of the present application.
In the related art, in a multi-core processor as shown in fig. 3, multiple copies of the same data may exist in different caches. The presence of these copies may severely affect the correctness of program execution. For example, the same data in the L2 Cache may have copies in both the L1 Cache of core0 and the L1 Cache of core1, and the values of the data in these copies may differ. Thus, a cache coherency protocol (cache coherence protocol) is required to manage the multiple copies of shared data. A common cache coherency protocol is the MESI protocol.
Alternatively, if the same variable is operated on in a concurrent scenario (e.g., multithreading), it must be guaranteed that the variable cached in each core has the correct value, which involves the "cache coherency" protocols. The protocol most widely used in RISC-V processors is the MESI protocol.
The MESI protocol is an invalidate-based cache coherency protocol and is one of the most common protocols supporting write-back caches; it belongs to the CPU cache coherency protocols.
It should be noted that there is a Flag field in the information of the cache line, which indicates 4 states. The states are classified into the states corresponding to M, E, S and I as described below, namely: Modified (M), Exclusive (E), Shared (S) and Invalid (I).
Wherein the M state represents that the contents of the cache line have been modified and the cache line is cached only in this Core's exclusive level-one cache. The data in the cache line in this state differs from that in memory and will be written back to memory at some future time (when other cores are to read the contents of the cache line, or when other cores are to modify the contents of the memory to which the cache line corresponds).
Wherein the E state represents that the content in the memory corresponding to the cache line is cached only by this Core, and no other core caches it. The contents of the cache line in this state are consistent with the contents of memory. The cache line changes to the S state when any other Core reads the contents of the corresponding memory, or becomes the M state when the local Core (the Core occupying the cache line alone) writes to it.
Wherein the S state represents that the data is cached both in the local Core's cache and in other cores' caches. The data in this state is consistent with the data in memory. When another core modifies the content of the memory corresponding to the cache line, the cache line changes to the I state.
Wherein I state represents that the contents of the cache line are invalid.
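For readers less familiar with MESI, a minimal standalone C sketch of the Flag field and the four states just listed is given below; the type and field names are illustrative only and not part of the embodiment.

```c
/* The four states carried by the Flag field of a cache line. */
typedef enum {
    MESI_M,   /* Modified: dirty, held only by this core's L1 cache      */
    MESI_E,   /* Exclusive: clean, held only by this core's L1 cache     */
    MESI_S,   /* Shared: clean, also held by one or more other cores     */
    MESI_I    /* Invalid: the contents of the cache line are not valid   */
} mesi_state_t;

/* Illustrative cache-line record: address tag, Flag field and data block. */
typedef struct {
    unsigned int  tag;
    mesi_state_t  state;
    unsigned char data[64];
} cache_line_t;
```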
Optionally, in an alternative embodiment of the present application, the state transition flow of the MESI cache coherence protocol is shown in fig. 5, and the snoop flow of the MESI cache coherence protocol on the basis of these state transitions is shown in fig. 6.
In fig. 5 or 6, it should be noted that Local refers to operations of Core0 on Core0's level-one cache, and Remote refers to operations of other cores (excluding Core0) on Core0's level-one cache. For example, for a cache line of Core0's L1 Cache, read and write operations of Core0 on its own L1 Cache are called Local Read or Local Write, while operations that other cores initiate on Core0's L1 Cache through the L2 Cache are called Remote Read or Remote Write. In connection with FIG. 3, for a cache line of the L2 Cache, because it is shared by the 4 cores, all operations are Local operations and there are no Remote operations.
Based on fig. 6, the conventional MESI coherency protocol is processed as follows:
1. The initial state of the cache line is the I state, and the data is empty.
2. In the I state:
1) Local Read: a snoop operation is required.
If other caches do not have the data, the Cache fetches the data from memory, and the cache line state becomes E;
if another cache has the data and its state is M, that data is first updated to memory, then the Cache fetches the data from memory, and the cache line states of both caches become S;
if other caches have the data and the state is S or E, the Cache fetches the data from memory, and the cache line states of these caches become S.
2) Local Write:
Data is fetched from memory and modified in the Cache, and the state becomes M;
if another cache has the data and its state is M, that data is first updated to memory;
if other caches have the data, their cache line states become I.
3) Remote Read / Remote Write: the state remains I.
3. In the E state:
Local Read: data is fetched from the Cache, and the state is unchanged.
Local Write: data is modified in the Cache, and the state becomes M.
Remote Read: the data is shared with other cores, and the state becomes S.
Remote Write: the data is modified by another core, this cache line can no longer be used, and the state becomes I.
4. In the S state, a snoop operation is required.
Local Read: data is fetched from the Cache, and the state is unchanged.
Local Write: data is modified in the Cache, the state becomes M, and the cache line states of the copies held by other cores become I.
Remote Read: the state is unchanged.
Remote Write: the data is modified by another core, this cache line can no longer be used, and the state becomes I.
5. In the M state, a snoop operation is required.
Local Read: data is fetched from the Cache, and the state is unchanged.
Local Write: data is modified in the Cache, and the state is unchanged.
Remote Read: this line of data is written back to memory so that the other core can use the latest data, and the state becomes S.
Remote Write: this line of data is written back to memory so that the other core can use the latest data; since the other core will modify the line of data, the state becomes I.
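The transitions listed above can be condensed into a next-state function. The following standalone C sketch is illustrative only: the state and event names are assumptions, the write-back and broadcast side effects are omitted, and others_have_copy is treated as relevant only to a Local Read in the I state.

```c
#include <stdbool.h>

typedef enum { ST_M, ST_E, ST_S, ST_I } mesi_state_t;
typedef enum { LOCAL_READ, LOCAL_WRITE, REMOTE_READ, REMOTE_WRITE } cache_event_t;

/* Next-state function condensed from the MESI transitions listed above. */
mesi_state_t mesi_next(mesi_state_t s, cache_event_t e, bool others_have_copy)
{
    switch (s) {
    case ST_I:
        if (e == LOCAL_READ)  return others_have_copy ? ST_S : ST_E;
        if (e == LOCAL_WRITE) return ST_M;
        return ST_I;                          /* Remote Read/Write: stays I */
    case ST_E:
        if (e == LOCAL_WRITE)  return ST_M;
        if (e == REMOTE_READ)  return ST_S;
        if (e == REMOTE_WRITE) return ST_I;
        return ST_E;                          /* Local Read: unchanged */
    case ST_S:
        if (e == LOCAL_WRITE)  return ST_M;   /* other copies become I */
        if (e == REMOTE_WRITE) return ST_I;
        return ST_S;                          /* Local/Remote Read: unchanged */
    case ST_M:
        if (e == REMOTE_READ)  return ST_S;   /* after write-back to memory */
        if (e == REMOTE_WRITE) return ST_I;   /* after write-back to memory */
        return ST_M;                          /* Local Read/Write: unchanged */
    }
    return ST_I;
}
```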
Therefore, from the MESI state transitions it can be found that the Cache needs a large number of snoop operations in the I state, the S state and the M state to achieve cache consistency. However, in specific fields or specific applications, many of these state transitions and snoop operations in the cache consistency control are invalid, so that the overall processing efficiency of the multi-core processor is generally not high. In particular, in the field of parallel computing with existing RISC-V multi-core processors, application scenarios in which a group of specific operations runs on each core and communication and data interaction exist among the cores, but the interaction frequency is very low, are becoming more and more common, and the traditional MESI protocol cannot efficiently meet the requirements of such applications.
In summary, an alternative embodiment of the present application proposes a design method that optimizes the performance of a multi-core RISC-V CPU. Aiming at the defects of the traditional scheme, it optimizes the application scenario, in the parallel computing field of existing RISC-V multi-core processors, in which a group of specific operations runs on each core and communication and data interaction exist among the cores but the interaction frequency is very low.
According to the alternative embodiment of the application, an instruction collection function is first set: instructions which have not yet been executed in the L1 instruction caches of the cores are collected according to a preset collection width (corresponding to the first number), and Load and Store instructions are screened out and analyzed according to the instruction types, wherein Load corresponds to a read operation and Store corresponds to a write operation. The execution scenario of the Load and Store instructions of the cores (corresponding to each processing core) within a preset time period is then analyzed and pre-judged, and the L1/L2 Cache consistency (MESI) state processing is dynamically optimized according to the pre-judgment result, so that invalid Snoop operations are avoided. Meanwhile, in order to avoid errors, a monitoring and correcting function is also set to ensure timely correction when the pre-judgment is wrong and to ensure normal execution of the instruction functions, so that the performance of the multi-core RISC-V CPU in specific scenarios is greatly improved.
An alternative embodiment of the present application first adds an Instruction Processing Optimization module to the conventional solution, as shown in fig. 7. The module contains 5 sub-modules, respectively Mode_cfg (Mode configuration module), Instruction_Collect (Instruction collection module), Instruction_Analyse (Instruction analysis module), Snoop_Adjust (Snoop adjustment module) and Monitor_Correct (Monitor correction module).
The following describes 5 sub-modules included in the instruction processing optimization module.
1. Mode_cfg, i.e. the Mode configuration module: the user can configure whether the module selects the normal mode or the instruction optimization mode; the configuration register mode_cfg_mode=0 indicates that the user selects the normal mode, and mode_cfg_mode=1 indicates that the user selects the instruction optimization mode. It should be noted that the normal mode bypasses the instruction processing optimization module, while the instruction optimization mode enables the instruction processing optimization module, so that a multi-core RISC-V CPU using the alternative embodiment of the present application can, besides being used in the conventional mode, also be applied in the parallel computing field of RISC-V multi-core processors in which a group of specific operations runs on each core and communication and data interaction exist among the cores but the interaction frequency is very low.
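Purely for illustration, the mode selection can be modelled as a single memory-mapped configuration bit; the register address, macro names and helper functions below are assumptions introduced here and are not part of the embodiment.

```c
#include <stdint.h>

#define MODE_CFG_NORMAL    0u   /* bypass the instruction processing optimization module */
#define MODE_CFG_OPTIMIZE  1u   /* enable the instruction processing optimization module */

/* Hypothetical memory-mapped mode_cfg_mode register (address is assumed). */
static volatile uint32_t * const MODE_CFG_MODE = (volatile uint32_t *)0x40000000u;

static inline void select_mode(uint32_t mode)  { *MODE_CFG_MODE = mode & 1u; }
static inline int  optimization_enabled(void)  { return (*MODE_CFG_MODE & 1u) == MODE_CFG_OPTIMIZE; }
```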
2. Instruction_Collect, i.e. the instruction collection module: this module receives the collection width transmitted from the Mode_cfg module, where the collection width refers to the first number of instructions collected from each Core's L1 instruction cache. The width, i.e. the number, is determined by the user according to the application running on each Core (for example, it may be set to 20 or 30), because different applications use Load or Store instructions differently (only Load and Store instructions cause the cores to read and write the external L1/L2 caches, where Load corresponds to a read operation and Store corresponds to a write operation): in some applications the interval between Load or Store instructions is short, and in others it is long. Collection means that this module obtains instructions that have not yet been executed from the instruction caches of the cores; the L1 cache is divided into two parts, one for caching instructions and the other for caching data. Therefore, the module is provided with an AXI Master interface to read the L1 instruction cache of each Core. It should be noted that, when acquiring unexecuted instructions, this sub-module cannot start from the very first unexecuted instruction, because the subsequent analysis and judgment take time, and time must therefore be reserved: it must be ensured that the first instruction read by this sub-module has not yet been executed by the respective cores before the judgment result of the subsequent Snoop_Adjust (Snoop adjustment module) is available.
Therefore, the instruction collection module can dynamically adjust the quantity of the collected instructions according to different application programs operated by each Core, and the flexibility and the practicability of the design are enhanced.
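A possible software model of this collection step is sketched below; NUM_CORES, read_l1_icache() (standing in for the AXI Master read) and the skip/headroom parameter are hypothetical names introduced here for illustration only.

```c
#include <stdint.h>

#define NUM_CORES      4
#define MAX_COLLECTED 64

typedef struct {
    uint32_t raw[MAX_COLLECTED];   /* collected, not-yet-executed instruction words */
    unsigned count;
} instr_queue_t;

/* Hypothetical stand-in for the AXI Master read of a core's L1 instruction cache. */
extern uint32_t read_l1_icache(unsigned core, unsigned offset);

/* Collect `width` not-yet-executed instructions per core; `skip` reserves the
 * headroom mentioned above so that the first collected instruction is still
 * unexecuted when the Snoop_Adjust result becomes available. */
void collect_instructions(instr_queue_t q[NUM_CORES], unsigned width, unsigned skip)
{
    for (unsigned core = 0; core < NUM_CORES; ++core) {
        q[core].count = 0;
        for (unsigned i = 0; i < width && i < MAX_COLLECTED; ++i)
            q[core].raw[q[core].count++] = read_l1_icache(core, skip + i);
    }
}
```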
3. Instruction_Analyse, i.e. the instruction analysis module: this module receives the instruction information transmitted by the preceding Instruction_Collect module and analyzes the instructions to obtain the Load and Store instructions of each Core. In general, a Load instruction copies the value at an effective address in memory/a peripheral to the rd register, and a Store instruction copies the value in the rs2 register to an effective address in memory/a peripheral.
The instruction analysis module has the function of analyzing RISC-V instructions, and the analysis is completed according to the Opcode operation code of the instructions. The RISC-V instruction architecture is shown in FIG. 4.
It should be noted that the Load instruction is an I-type instruction, and the Store instruction is an S-type instruction.
Wherein, for an I-type instruction:
the opcode is the operation code, 7 bits wide, at bits 0-6 of the instruction;
rd (Destination Register) is the destination register, 5 bits wide, at bits 7-11 of the instruction;
funct3 is an operation field, 3 bits wide, at bits 12-14 of the instruction;
rs1 (Source Register 1) is the first source operand register, 5 bits wide, at bits 15-19 of the instruction;
imm[11:0] is a 12-bit immediate, at bits 20-31 of the instruction.
Wherein, for an S-type instruction:
the opcode is the operation code, 7 bits wide, at bits 0-6 of the instruction;
the immediate is split into two parts: imm[4:0], 5 bits wide, at bits 7-11 of the instruction, and imm[11:5], 7 bits wide, at bits 25-31 of the instruction;
funct3 is an operation field, 3 bits wide, at bits 12-14 of the instruction;
rs1 (Source Register 1) is the first source operand register, 5 bits wide, at bits 15-19 of the instruction;
rs2 (Source Register 2) is the second source operand register, 5 bits wide, at bits 20-24 of the instruction.
Further, the instruction set related to the RISC-V Load is shown in fig. 8, and it can be seen from fig. 8 that the opcode of the I-type Load instruction = 7'b0000011.
Further, the instruction set related to the RISC-V Store is shown in fig. 9, and it can be seen from fig. 9 that the opcode of the S-type Store instruction = 7'b0100011.
The instruction analysis module analyzes the Load/Store instruction types of each Core and resolves the corresponding effective address load_addr/store_addr of each Load/Store. The effective address of the instruction is obtained by adding the value in the rs1 register to the sign-extended 12-bit immediate (imm[11:0]; in a Store instruction the immediate is split into two parts, imm[4:0] and imm[11:5]). It is also necessary to resolve the position of each Core's Load/Store in the instruction queue, where the position refers to the position of the current Load/Store instruction in the instruction queue read in the instruction collection module, the position of the first instruction being 0.
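A minimal C sketch of this decoding and effective-address computation is given below, based on the standard RISC-V I-type/S-type encodings described above; the function name and the way the rs1 value is supplied are assumptions introduced here for illustration.

```c
#include <stdint.h>

#define OPCODE_LOAD  0x03u   /* 7'b0000011, I-type */
#define OPCODE_STORE 0x23u   /* 7'b0100011, S-type */

/* Sign-extend a 12-bit immediate. */
static int32_t sext12(uint32_t imm) { return (int32_t)(imm << 20) >> 20; }

/* Decode one 32-bit instruction word and, given the current rs1 value,
 * compute the effective address of a Load/Store; returns 0 for other types. */
int decode_effective_address(uint32_t instr, uint32_t rs1_value,
                             uint32_t *effective_addr)
{
    uint32_t opcode = instr & 0x7fu;           /* bits 0-6 */

    if (opcode == OPCODE_LOAD) {               /* imm[11:0] at bits 20-31 */
        uint32_t imm = instr >> 20;
        *effective_addr = rs1_value + (uint32_t)sext12(imm);
        return 1;
    }
    if (opcode == OPCODE_STORE) {              /* imm split: bits 7-11 and 25-31 */
        uint32_t imm = ((instr >> 7) & 0x1fu) | (((instr >> 25) & 0x7fu) << 5);
        *effective_addr = rs1_value + (uint32_t)sext12(imm);
        return 1;
    }
    return 0;
}
```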
An instruction information table of Load/Store instructions for a plurality of processing cores may be created by the instruction analysis module. The instruction information table may be as shown in table 1.
4. Snoop_Adjust, i.e. the Snoop adjustment module: the function of this sub-module is to carry out a comprehensive analysis according to the Load/Store instruction information table of each Core transmitted by the preceding stage, and then to judge and carry out the Snoop adjustment (namely, determine the adjustment strategy) according to the judgment result. The principle of the analysis is to check the address dependencies between the Load and Store instructions of the multiple cores.
The analysis method is as follows:
Optionally, between two executions of Load by each Core, it is checked whether other cores have Load or Store operations on the corresponding addresses. For example, Core1 executes Load operations on the Core1_Load_addr_1 and Core1_Load_addr_2 addresses, and Core0, Core2 and Core3 have no Load or Store operations on the Core1_Load_addr_1 and Core1_Load_addr_2 addresses; it is then determined that no Snoop operation is required for the Read operations of Core1, because during this time period the other cores will not read or write these addresses and only Core1 occupies the data at the Core1_Load_addr_1 and Core1_Load_addr_2 addresses.
Optionally, between two executions of Store by each Core, it is checked whether other cores have Load or Store operations on the corresponding addresses. For example, Core0 executes Store operations on the Core0_Store_addr_2 and Core0_Store_addr_3 addresses, and Core1, Core2 and Core3 have no Load or Store operations on the Core0_Store_addr_2 and Core0_Store_addr_3 addresses; it is then determined that no Snoop operation is required for the Write operations of Core0, because during this time period the other cores will not read or write these addresses and only Core0 occupies the Core0_Store_addr_2 and Core0_Store_addr_3 addresses.
Optionally, between each Core's execution of a Load and a Store, it is checked whether other cores perform Load or Store operations on the corresponding addresses. For example, Core2 executes a Load operation on the Core2_Load_addr_0 address and a Store operation on the Core2_Store_addr_1 address, and Core0, Core1 and Core3 perform no Load or Store operations on the Core2_Load_addr_0 and Core2_Store_addr_1 addresses; it is then determined that no Snoop operation is required for the Read and Write operations of Core2, because during this time period the other cores will not read or write these addresses and only Core2 occupies the Core2_Load_addr_0 and Core2_Store_addr_1 addresses.
Optionally, between each Core's execution of a Store and a Load, it is checked whether other cores perform Store or Load operations on the corresponding addresses. For example, Core3 executes a Store operation on the Core3_Store_addr_1 address and a Load operation on the Core3_Load_addr_1 address, and Core0, Core1 and Core2 perform no Store or Load operations on the Core3_Store_addr_1 and Core3_Load_addr_1 addresses; it is then determined that no Snoop operation is required for the Write and Read operations of Core3, because during this time period the other cores will not read or write these addresses and only Core3 occupies the Core3_Store_addr_1 and Core3_Load_addr_1 addresses.
It should be noted that, in the above descriptions, the basis for judging "between" is the position of each Core's Load or Store instruction in the instruction queue read by the instruction collection module, for example, Core0_load_instr_position_0, Core2_store_instr_position_2, and the like.
Illustratively, the position of the Load instruction of Core0 is Core0_load_instr_position_0=3 and the position of the Store instruction of Core0 is Core0_store_instr_position_0=10.
If the location of the Load instruction for Core1 is core1_load_instr_position_0=5, then the Load instruction for Core1 is determined to be between the Load and Store instructions for Core 0.
If the location of the Load instruction of Core1 is core1_load_instr_position_0=11, then it is determined that the Load instruction of Core1 is not between the Load and Store instructions of Core 0.
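A simplified software model of this position- and address-based judgment is sketched below; the ls_entry_t table layout and the helper name are hypothetical, and only exact address matches are considered in this sketch.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* One entry of the Load/Store instruction information table. */
typedef struct {
    unsigned core;        /* core that owns the entry */
    unsigned position;    /* position in that core's collected queue */
    uint32_t addr;        /* effective address of the Load/Store */
} ls_entry_t;

/* Between two adjacent Load/Store instructions of the target core (positions
 * pos_a < pos_b, addresses addr_a/addr_b), scan the entries of the other
 * cores; if none of them touches either address within that window, the
 * snoop operations for this window can be suppressed. */
bool snoop_can_be_skipped(unsigned target_core,
                          unsigned pos_a, unsigned pos_b,
                          uint32_t addr_a, uint32_t addr_b,
                          const ls_entry_t *table, size_t n)
{
    for (size_t i = 0; i < n; ++i) {
        if (table[i].core == target_core) continue;            /* only other cores */
        if (table[i].position <= pos_a || table[i].position >= pos_b) continue;
        if (table[i].addr == addr_a || table[i].addr == addr_b)
            return false;   /* another core touches the address: keep snooping */
    }
    return true;            /* no dependency found: snoop not required here */
}
```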
5. Monitor_Correct, i.e. the Monitor correction module: the function of this module is to ensure timely correction when the pre-judgment is wrong and to ensure normal execution of the instruction functions, mainly because of branch instructions. If the predicted result of a branch instruction is found to be wrong during execution, the instruction to be executed next by the Core will not be the next instruction in the instruction Cache and needs to be reloaded. Therefore, for this scenario, the instruction collection module needs to re-fetch the unexecuted instructions.
In summary, the optional embodiment of the present application optimizes the hardware implementation scheme of the cache coherency protocol by adding a hardware design (i.e. adding an instruction processing optimization module). The newly added hardware design realizes an instruction collection function, which can collect the instructions that have not yet been executed in the L1 instruction Cache of each Core according to a preset collection width, screens out and analyzes Load and Store instructions according to the instruction types, analyzes and pre-judges the execution scenario of each Core's Load and Store instructions within a preset time period, and dynamically optimizes the L1/L2 Cache consistency (MESI) state processing according to the pre-judgment result so as to avoid invalid Snoop operations. Meanwhile, in order to avoid errors, a monitoring and correcting function is also provided, which ensures timely correction when the pre-judgment of a branch instruction is wrong and thus ensures normal execution of the instruction functions, so that the performance of the multi-core RISC-V CPU in specific scenarios is greatly improved and the cache consistency protocol processing rate is also greatly optimized.
The embodiment also provides a device for determining an adjustment policy, which is used for implementing the foregoing embodiments and preferred embodiments, and is not described in detail. As used below, the term "module" may be a combination of software and/or hardware that implements a predetermined function. While the means described in the following embodiments are preferably implemented in software, implementation in hardware, or a combination of software and hardware, is also possible and contemplated.
Fig. 10 is a block diagram of a configuration of an adjustment policy determining apparatus according to an embodiment of the present application, as shown in fig. 10, including:
a first determining module 1002, configured to determine instruction queues respectively corresponding to a plurality of processing cores of a target device, where the instruction queues include a first instruction that is issued by each processing core and is not executed currently;
A second determining module 1004, configured to determine instruction information tables corresponding to the plurality of instruction queues, where the instruction information tables at least include an effective address of a second instruction belonging to a target type in the first instruction, and a position of the second instruction in the instruction queue to which the second instruction belongs;
And a third determining module 1006, configured to determine an adjustment policy for a sniffing operation of each processing core according to an effective address respectively corresponding to any two adjacent second instructions in the instruction queue and a position respectively corresponding to the any two adjacent second instructions, where the sniffing operation is used to instruct each processing core to allow an operation performed on other processing cores between the any two adjacent second instructions, so that each processing core maintains cache consistency with the other processing cores, and the other processing cores are processing cores except for each processing core in the plurality of processing cores.
The device determines the instruction queues corresponding to the processing cores of the target equipment respectively, wherein the instruction queues comprise first instructions which are sent by each processing core and are not executed currently, determines the instruction information tables corresponding to the instruction queues, wherein the instruction information tables at least comprise effective addresses of second instructions belonging to target types in the first instructions and positions of the second instructions in the corresponding instruction queues, and determines an adjustment strategy of sniffing operation on each processing core according to the effective addresses respectively corresponding to any two adjacent second instructions in the instruction queues and the positions respectively corresponding to the any two adjacent second instructions, wherein the sniffing operation is used for indicating that each processing core allows operation executed on other processing cores between the any two adjacent second instructions so that each processing core keeps cache consistency with the other processing cores, and the other processing cores are processing cores except for each processing core in the processing cores. Therefore, the problem that in the prior art, a cache consistency control protocol adopted by a RISC-V multi-core processor is fixed, so that a plurality of invalid sniffing (snoop) operations exist in a cache consistency control flow of the RISC-V multi-core processor, and the processing efficiency of the RISC-V multi-core processor is low is solved. The processing efficiency of the RISC-V multi-core processor is improved.
In an exemplary embodiment, a first determining module 1002 is configured to determine a first level cache corresponding to each processing core, where the first level cache stores the first instruction corresponding to each processing core, and obtain the first instruction from the first level cache to obtain the instruction queue, where the first level cache is a cache directly connected to each processing core.
In an exemplary embodiment, a first determining module 1002 is configured to determine a first number of the first instructions allowed to be obtained from the first level cache, where the first number is determined by a second number of applications running on each processing core, obtain the first number of the first instructions from the first level cache, and store the first number of the first instructions in a preset queue corresponding to each processing core in turn, so as to obtain the instruction queue.
In an exemplary embodiment, a first determining module 1002 is configured to determine a preset execution order of W first instructions included in the first level cache, where W is a positive integer, and determine the instruction queue according to the preset execution order.
In an exemplary embodiment, a first determining module 1002 is configured to determine, according to the preset execution sequence, a third instruction of the W first instructions, where the third instruction is an nth instruction of the W first instructions, and N is an integer greater than 1, and determine, according to the third instruction, the instruction queue corresponding to each processing core.
In an exemplary embodiment, the first determining module 1002 is configured to obtain the third instruction into a preset queue corresponding to each processing core, update the third instruction into a preset queue corresponding to each processing core, determine, if the third instruction is not a branch instruction, a fourth instruction executed after the third instruction according to the preset execution sequence, and update the fourth instruction into the third instruction, predict an execution result of the third instruction to obtain a prediction result if the third instruction is a branch instruction, update a fifth instruction indicated by the prediction result in W first instructions into the third instruction, and execute the obtaining step and the updating step in a loop for M times to obtain the instruction queue, where M is used to represent a first number of the first instructions allowed to be obtained from the primary cache.
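A minimal sketch of this acquisition/updating loop is given below; fetch(), is_branch() and predict_target() are hypothetical placeholders for the L1 instruction-cache read and the branch prediction described above, not actual module interfaces.

```c
#include <stdint.h>

extern uint32_t fetch(unsigned index);                           /* hypothetical */
extern int      is_branch(uint32_t instr);                       /* hypothetical */
extern unsigned predict_target(unsigned index, uint32_t instr);  /* hypothetical */

/* Build one core's instruction queue of M entries, following the preset
 * execution order and the branch prediction result for branch instructions. */
void build_queue(uint32_t queue[], unsigned m, unsigned start_index)
{
    unsigned idx = start_index;               /* current "third instruction" */
    for (unsigned i = 0; i < m; ++i) {        /* loop M times */
        uint32_t instr = fetch(idx);          /* obtaining step */
        queue[i] = instr;                     /* store into the preset queue */
        if (is_branch(instr))
            idx = predict_target(idx, instr); /* fifth instruction indicated by the prediction */
        else
            idx = idx + 1;                    /* fourth instruction: next in the preset order */
    }
}
```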
In an exemplary embodiment, the second determining module 1004 is configured to parse the first instructions included in the plurality of instruction queues to obtain a parsing result, and determine the instruction information table according to the parsing result.
In an exemplary embodiment, the second determining module 1004 is configured to parse the first instruction according to an instruction architecture of an instruction of a target type, to obtain an instruction type of the first instruction and the effective address of the first instruction, where the target type includes the instruction type, determine a location of the first instruction in an instruction queue to which the first instruction belongs, and determine the instruction type, the effective address, and a location of the first instruction in the instruction queue to which the first instruction belongs as the parsing result.
In an exemplary embodiment, the second determining module 1004 is configured to parse the first instruction according to the instruction architecture to obtain an operation code of the first instruction, determine that an instruction type of the first instruction is a first type if the operation code is a first value, and determine that the instruction type of the first instruction is a second type if the operation code is a second value.
In an exemplary embodiment, a second determining module 1004 is configured to parse the first instruction according to the instruction architecture, obtain an index of a first source register recorded in the first instruction, obtain a K-bit immediate of a sign bit extension of the first instruction, where K is a positive integer, and determine a sum of the index and the K-bit immediate as the effective address.
In an exemplary embodiment, a second determining module 1004 is configured to determine a padding position in the instruction information table according to the instruction type and a target processing core to which the first instruction belongs, where the multiple processing cores include the target processing core, and fill the effective address and a position of the first instruction in an instruction queue to which the first instruction belongs into the padding position to obtain the instruction information table.
In an exemplary embodiment, a third determining module 1006 is configured to determine a sixth instruction and a seventh instruction of the arbitrary two adjacent second instructions through the instruction information table, determine a first effective address and a first location corresponding to the sixth instruction, and determine a second effective address and a second location corresponding to the seventh instruction, determine an eighth instruction of the other processing core, wherein a third location of the eighth instruction is between the first location and the second location, and determine the adjustment policy through the eighth instruction, the first effective address, and the second effective address.
In an exemplary embodiment, a third determining module 1006 is configured to determine a query location corresponding to a target processing core corresponding to the instruction queue in the instruction information table, where the plurality of processing cores includes the target processing core, the query location includes a padding location, the padding location is determined by an instruction type of the first instruction in the target type and the target processing core to which the first instruction belongs, and the sixth instruction and the seventh instruction are acquired from the query location.
In an exemplary embodiment, a third determining module 1006 is configured to traverse the remaining locations in the instruction information table other than the query location and determine the eighth instruction from the remaining locations.
In an exemplary embodiment, a third determining module 1006 is configured to obtain all instructions located between the first location and the second location in other instruction queues, where the other instruction queues are other instruction queues of the plurality of instruction queues except the instruction queue, and determine, as the eighth instruction, an instruction belonging to the target type in all instructions.
In an exemplary embodiment, a third determining module 1006 is configured to determine a third effective address of the eighth instruction through the instruction information table, determine whether the third effective address is consistent with a fourth effective address, where the fourth effective address includes at least one of the first effective address, the second effective address, and determine the adjustment policy to allow the each processing core to perform the snoop operation if the third effective address and the fourth effective address are consistent.
In an exemplary embodiment, a third determining module 1006 is configured to determine that the adjustment policy is to prohibit the execution of the sniff operation by the other processing cores by the each processing core if the third effective address and the fourth effective address are not identical.
In an exemplary embodiment, after determining the adjustment policy for the sniffing operation of each processing core according to the effective addresses respectively corresponding to any two adjacent second instructions in the instruction queue and the positions respectively corresponding to the any two adjacent second instructions, the device further comprises a fourth determining module, configured to determine whether a branch instruction exists in any two adjacent second instructions when the any two adjacent second instructions are executed by each processing core, determine an execution result of the branch instruction when the branch instruction exists in any two adjacent second instructions, and determine an execution mode of the adjustment policy according to the execution result.
In an exemplary embodiment, a fourth determining module is configured to determine a ninth instruction and a tenth instruction, where the ninth instruction is a first instruction of the any two adjacent second instructions, the tenth instruction is a second instruction of the any two adjacent second instructions, determine whether the tenth instruction is an eleventh instruction indicated by the execution result or an instruction allowed to be executed after the eleventh instruction in a case where the ninth instruction is a branch instruction, and determine that the adjustment policy is allowed to be executed in a case where the tenth instruction is the eleventh instruction or an instruction allowed to be executed after the eleventh instruction.
Embodiments of the present application also provide a computer readable storage medium having a computer program stored therein, wherein the computer program is arranged to perform the steps of any of the method embodiments described above when run.
In an exemplary embodiment, the computer readable storage medium may include, but is not limited to, a USB flash drive, a Read-Only Memory (ROM), a Random Access Memory (RAM), a removable hard disk, a magnetic disk, an optical disk, or other media in which a computer program may be stored.
An embodiment of the application also provides an electronic device comprising a memory having stored therein a computer program and a processor arranged to run the computer program to perform the steps of any of the method embodiments described above.
In an exemplary embodiment, the electronic device may further include a transmission device connected to the processor, and an input/output device connected to the processor.
Embodiments of the application also provide a computer program product comprising a computer program which, when executed by a processor, implements the steps of any of the method embodiments described above.
Embodiments of the present application also provide another computer program product comprising a non-volatile computer readable storage medium storing a computer program which, when executed by a processor, implements the steps of any of the method embodiments described above.
Embodiments of the present application also provide a computer program comprising computer instructions stored in a computer readable storage medium, a processor of a computer device reading the computer instructions from the computer readable storage medium, the processor executing the computer instructions to cause the computer device to perform the steps of any of the method embodiments described above.
Specific examples in this embodiment may refer to the examples described in the foregoing embodiments and the exemplary implementation, and this embodiment is not described herein.
It will be appreciated by those skilled in the art that the modules or steps of the application described above may be implemented in a general purpose computing device, they may be concentrated on a single computing device, or distributed across a network of computing devices, they may be implemented in program code executable by computing devices, so that they may be stored in a storage device for execution by computing devices, and in some cases, the steps shown or described may be performed in a different order than that shown or described herein, or they may be separately fabricated into individual integrated circuit modules, or multiple modules or steps of them may be fabricated into a single integrated circuit module. Thus, the present application is not limited to any specific combination of hardware and software.
The above description is only of the preferred embodiments of the present application and is not intended to limit the present application, but various modifications and variations can be made to the present application by those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the principle of the present application should be included in the protection scope of the present application.
Claims (18)
1. A method for determining an adjustment strategy, comprising:
determining an instruction queue respectively corresponding to a plurality of processing cores of target equipment, wherein the instruction queue comprises a first instruction which is sent by each processing core and is not executed currently;
Determining instruction information tables corresponding to a plurality of instruction queues, wherein the instruction information tables at least comprise effective addresses of second instructions belonging to target types in the first instructions and positions of the second instructions in the corresponding instruction queues;
Determining an adjustment strategy of sniffing operation of each processing core according to effective addresses respectively corresponding to any two adjacent second instructions in the instruction queue and positions respectively corresponding to the any two adjacent second instructions, wherein the sniffing operation is used for indicating that each processing core allows operations executed on other processing cores between the any two adjacent second instructions so as to enable each processing core to keep cache consistency with the other processing cores, the other processing cores are processing cores except for each processing core in the plurality of processing cores,
Determining an adjustment strategy for the sniffing operation of each processing core according to the effective addresses respectively corresponding to any two adjacent second instructions in the instruction queue and the positions respectively corresponding to any two adjacent second instructions, wherein the adjustment strategy comprises the steps of determining a sixth instruction and a seventh instruction in the any two adjacent second instructions through the instruction information table, determining a first effective address and a first position corresponding to the sixth instruction, determining a second effective address and a second position corresponding to the seventh instruction, determining an eighth instruction of other processing cores, determining a third position of the eighth instruction between the first position and the second position, determining the adjustment strategy through the eighth instruction, the first effective address and the second effective address,
Determining a sixth instruction and a seventh instruction in the arbitrary two adjacent second instructions through the instruction information table, wherein the method comprises the steps of determining a corresponding query position of a target processing core corresponding to the instruction queue in the instruction information table, wherein the plurality of processing cores comprise the target processing core, the query position comprises a filling position, the filling position is determined through an instruction type of the first instruction in the target type and the target processing core to which the first instruction belongs, acquiring the sixth instruction and the seventh instruction from the query position,
Determining the adjustment policy by the eighth instruction, the first effective address and the second effective address comprises determining a third effective address of the eighth instruction by the instruction information table, determining whether the third effective address is consistent with a fourth effective address, wherein the fourth effective address comprises at least one of the first effective address, the second effective address, and determining the adjustment policy to allow the other processing cores to execute the sniff operation if the third effective address and the fourth effective address are consistent, wherein determining the eighth instruction of the other processing cores comprises traversing the rest positions except the query position in the instruction information table, and determining the eighth instruction from the rest positions.
2. The method of claim 1, wherein determining instruction queues respectively corresponding to the plurality of processing cores of the target device comprises:
determining a first level cache corresponding to each processing core, wherein the first level cache stores the first instruction corresponding to each processing core;
And acquiring the first instruction from the first-level cache to obtain the instruction queue, wherein the first-level cache is a cache directly connected with each processing core.
3. The method of claim 2, wherein retrieving the first instruction from the level one cache to obtain the instruction queue comprises:
Determining a first number of the first instructions allowed to be fetched from the level one cache, wherein the first number is determined by a second number of applications running on each of the processing cores;
and acquiring a first number of first instructions from the first-level cache, and sequentially storing the first number of first instructions into a preset queue corresponding to each processing core to obtain the instruction queue.
4. The method of claim 2, wherein retrieving the first instruction from the level one cache to obtain the instruction queue comprises:
Determining a preset execution sequence of W first instructions included in the first-level cache, wherein W is a positive integer;
and determining the instruction queue through the preset execution sequence.
5. The method of claim 4, wherein determining the instruction queue by the preset execution order comprises:
Determining a third instruction in the W first instructions according to the preset execution sequence, wherein the third instruction is an Nth instruction in the W first instructions, and N is an integer greater than 1;
and determining the instruction queue corresponding to each processing core through the third instruction.
6. The method of claim 5, wherein determining the instruction queue corresponding to each processing core by the third instruction comprises:
the third instruction is acquired into a preset queue corresponding to each processing core;
An updating step of determining a fourth instruction executed after the third instruction by the preset execution order and updating the fourth instruction to the third instruction, in the case where it is determined that the third instruction is not a branch instruction; under the condition that the third instruction is determined to be a branch instruction, predicting an execution result of the third instruction to obtain a prediction result;
And circularly executing the acquisition step and the updating step for M times to obtain the instruction queue, wherein M is used for representing a first number of the first instructions allowed to be acquired from the primary cache.
7. The method of claim 1, wherein determining instruction information tables corresponding to a plurality of the instruction queues comprises:
Analyzing the first instructions contained in the instruction queues to obtain analysis results;
and determining the instruction information table according to the analysis result.
8. The method of claim 7, wherein parsing the first instruction included in the plurality of instruction queues to obtain a parsed result comprises:
Analyzing the first instruction according to the instruction architecture of the target type instruction to obtain the instruction type of the first instruction and the effective address of the first instruction, wherein the target type comprises the instruction type;
and determining the position of the first instruction in an instruction queue to which the first instruction belongs;
and determining the instruction type, the effective address and the position of the first instruction in an instruction queue to which the first instruction belongs as the analysis result.
9. The method of claim 8, wherein resolving the first instruction according to the instruction architecture of the target type instruction to obtain the instruction type of the first instruction comprises:
Analyzing the first instruction according to the instruction architecture to obtain an operation code of the first instruction;
Determining that the instruction type of the first instruction is a first type if the opcode is a first value;
In the case where the opcode is a second value, it is determined that the instruction type of the first instruction is a second type.
10. The method of claim 8, wherein resolving the first instruction according to an instruction architecture of an instruction of a target type to obtain the effective address of the first instruction comprises:
Analyzing the first instruction according to the instruction architecture to obtain an index of a first source register recorded in the first instruction and obtain a K-bit immediate of sign bit expansion of the first instruction, wherein K is a positive integer;
and determining the sum value of the index and the K-bit immediate as the effective address.
11. The method of claim 8, wherein determining the instruction information table from the parsing result comprises:
Determining filling positions in the instruction information table according to the instruction type and a target processing core to which the first instruction belongs, wherein the plurality of processing cores comprise the target processing core;
and filling the effective address and the position of the first instruction in an instruction queue to which the first instruction belongs into the filling position to obtain the instruction information table.
12. The method of claim 1, wherein after determining whether the third effective address is consistent with a fourth effective address, the method further comprises:
And in the case that the third effective address and the fourth effective address are inconsistent, determining the adjustment policy to prohibit the execution of the sniff operation by the other processing cores by the each processing core.
13. The method according to claim 1, wherein after determining the adjustment policy for the sniff operation of each processing core according to the effective addresses respectively corresponding to any two adjacent second instructions in the instruction queue and the positions respectively corresponding to the any two adjacent second instructions, the method further comprises:
determining whether a branch instruction exists in any two adjacent second instructions under the condition that each processing core executes the any two adjacent second instructions;
determining an execution result of a branch instruction under the condition that the branch instruction exists in any two adjacent second instructions;
And determining the execution mode of the adjustment strategy according to the execution result.
14. The method of claim 13, wherein determining the manner of execution of the adjustment strategy from the execution result comprises:
Determining a ninth instruction and a tenth instruction, wherein the ninth instruction is a first instruction of the any two adjacent second instructions, and the tenth instruction is a second instruction of the any two adjacent second instructions;
in the case where the ninth instruction is a branch instruction, determining whether the tenth instruction is an eleventh instruction indicated by the execution result or an instruction permitted to be executed after the eleventh instruction;
In the case where the tenth instruction is the eleventh instruction or an instruction permitted to be executed after the eleventh instruction, it is determined that execution of the adjustment policy is permitted.
15. An adjustment strategy determining apparatus, comprising:
the first determining module is used for determining an instruction queue respectively corresponding to a plurality of processing cores of the target equipment, wherein the instruction queue comprises a first instruction which is sent by each processing core and is not executed currently;
The second determining module is used for determining instruction information tables corresponding to a plurality of instruction queues, wherein the instruction information tables at least comprise effective addresses of second instructions belonging to target types in the first instructions and positions of the second instructions in the instruction queues;
A third determining module, configured to determine an adjustment policy for a snoop operation of each processing core according to an effective address respectively corresponding to any two adjacent second instructions in the instruction queue and a position respectively corresponding to the any two adjacent second instructions, where the snoop operation is used to instruct each processing core to allow an operation performed on other processing cores between the any two adjacent second instructions, so that each processing core maintains cache consistency with the other processing cores, where the other processing cores are processing cores except for each processing core in the plurality of processing cores,
The third determining module is further configured to determine a sixth instruction and a seventh instruction in the arbitrary two adjacent second instructions through the instruction information table, determine a first effective address and a first location corresponding to the sixth instruction, and determine a second effective address and a second location corresponding to the seventh instruction, determine an eighth instruction of the other processing core, wherein a third location of the eighth instruction is between the first location and the second location, determine the adjustment policy through the eighth instruction, the first effective address, and the second effective address,
The third determining module is further configured to determine a query location corresponding to a target processing core corresponding to the instruction queue in the instruction information table, where the plurality of processing cores includes the target processing core, the query location includes a padding location, the padding location is determined by an instruction type of the first instruction in the target type and the target processing core to which the first instruction belongs, obtain the sixth instruction and the seventh instruction from the query location,
The third determining module is further configured to determine, according to the instruction information table, a third effective address of the eighth instruction, determine whether the third effective address is consistent with a fourth effective address, where the fourth effective address includes at least one of the first effective address and the second effective address, determine that the adjustment policy is to allow the other processing cores to execute the sniffing operation if the third effective address is consistent with the fourth effective address, and determine the eighth instruction from the rest positions.
16. A computer readable storage medium, characterized in that a computer program is stored in the computer readable storage medium, wherein the computer program, when being executed by a processor, implements the steps of the method according to any of the claims 1 to 14.
17. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the processor implements the steps of the method of any one of claims 1 to 14 when the computer program is executed.
18. A computer program product comprising a computer program, characterized in that the computer program, when being executed by a processor, implements the steps of the method as claimed in any one of claims 1 to 14.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202411021331.3A CN118550868B (en) | 2024-07-29 | 2024-07-29 | Method and device for determining adjustment strategy, storage medium and electronic device |
Publications (2)
Publication Number | Publication Date |
---|---|
CN118550868A CN118550868A (en) | 2024-08-27 |
CN118550868B true CN118550868B (en) | 2024-12-20 |
Family
ID=92444466
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202411021331.3A Active CN118550868B (en) | 2024-07-29 | 2024-07-29 | Method and device for determining adjustment strategy, storage medium and electronic device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN118550868B (en) |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108345547A (en) * | 2012-06-15 | 2018-07-31 | 英特尔公司 | Out of order load based on lock with based on synchronous method |
CN115373877A (en) * | 2022-10-24 | 2022-11-22 | 北京智芯微电子科技有限公司 | Heterogeneous multi-core processor control method and device for ensuring shared cache coherence |
Family Cites Families (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5588131A (en) * | 1994-03-09 | 1996-12-24 | Sun Microsystems, Inc. | System and method for a snooping and snarfing cache in a multiprocessor computer system |
US6601144B1 (en) * | 2000-10-26 | 2003-07-29 | International Business Machines Corporation | Dynamic cache management in a symmetric multiprocessor system via snoop operation sequence analysis |
US7143246B2 (en) * | 2004-01-16 | 2006-11-28 | International Business Machines Corporation | Method for supporting improved burst transfers on a coherent bus |
US8438335B2 (en) * | 2010-09-28 | 2013-05-07 | Intel Corporation | Probe speculative address file |
CN112685335B (en) * | 2020-12-28 | 2022-07-15 | 湖南博匠信息科技有限公司 | Data storage system |
KR20220149100A (en) * | 2021-04-30 | 2022-11-08 | 주식회사 멤레이 | Method for managing cache, method for balancing memory traffic, and memory controlling apparatus |
CN116049034A (en) * | 2022-04-29 | 2023-05-02 | 海光信息技术股份有限公司 | Verification method and device for cache consistency of multi-core processor system |
CN115563027B (en) * | 2022-11-22 | 2023-05-12 | 北京微核芯科技有限公司 | Method, system and device for executing stock instruction |
CN117608667B (en) * | 2024-01-23 | 2024-05-24 | 合芯科技(苏州)有限公司 | Instruction set processing system, method and electronic equipment |
2024-07-29: Application CN202411021331.3A filed in China (CN); granted as CN118550868B, status Active.
Also Published As
Publication number | Publication date |
---|---|
CN118550868A (en) | 2024-08-27 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US11086792B2 (en) | Cache replacing method and apparatus, heterogeneous multi-core system and cache managing method | |
US9911508B2 (en) | Cache memory diagnostic writeback | |
US11892949B2 (en) | Reducing cache transfer overhead in a system | |
CN104508639B (en) | Use the coherency management of coherency domains table | |
CN118838863B (en) | Data processing method, server, product and medium for multi-core processor | |
DE112016004303T5 (en) | Low administration hardware prediction element to reduce power inversion for core-to-core data transfer optimization commands | |
CN103729306B (en) | The method and data processing equipment of cache block invalidation | |
CN110959154B (en) | Private cache for thread local store data access | |
US11483260B2 (en) | Data processing network with flow compaction for streaming data transfer | |
JP5226010B2 (en) | Shared cache control device, shared cache control method, and integrated circuit | |
US12164426B2 (en) | Reconfigurable cache hierarchy framework for the storage of FPGA bitstreams | |
US8359433B2 (en) | Method and system of handling non-aligned memory accesses | |
CN104583974B (en) | Reduced expandable cache directory | |
CN118550868B (en) | Method and device for determining adjustment strategy, storage medium and electronic device | |
US20240232083A1 (en) | Partitioning a cache for application of a replacement policy | |
JP2010244327A (en) | Cache system | |
KR101303079B1 (en) | Apparatus and method for controlling cache coherence in virtualized environment based on multi-core | |
CN110955386B (en) | Method for managing the provision of information, such as instructions, to a microprocessor and corresponding system | |
CN119645421B (en) | ARM many-core-oriented x86 instruction dynamic conversion cache consistency maintenance method | |
CN119292764B (en) | Data processing method, device, electronic device, and computer-readable storage medium | |
US9430397B2 (en) | Processor and control method thereof | |
CN116303125B (en) | Request scheduling method, cache, device, computer equipment and storage medium | |
JP2020077075A (en) | Information processing device, information processing method, and program | |
CN119065991A (en) | A processing method, device, controller, electronic device and storage medium | |
WO2025130918A1 (en) | Service processing apparatus and method, and device and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||