CN115437691A - Physical register file allocation device for RISC-V vector and floating point register - Google Patents

Physical register file allocation device for RISC-V vector and floating point register

Info

Publication number
CN115437691A
Authority
CN
China
Prior art keywords
register file
physical register
vector
floating point
instruction
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202211397078.2A
Other languages
Chinese (zh)
Other versions
CN115437691B (en)
Inventor
罗嘉蕙
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Jindi Space Time Hangzhou Technology Co ltd
Original Assignee
Jindi Space Time Hangzhou Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Jindi Space Time Hangzhou Technology Co ltd
Priority to CN202211397078.2A
Publication of CN115437691A
Application granted
Publication of CN115437691B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30 Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/30098 Register arrangements
    • G06F9/3012 Organisation of register space, e.g. banked or distributed register file
    • G06F9/3013 Organisation of register space, e.g. banked or distributed register file according to data content, e.g. floating-point registers, address registers
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30 Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/30098 Register arrangements
    • G06F9/3012 Organisation of register space, e.g. banked or distributed register file
    • G06F9/30134 Register stacks; shift registers

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Advance Control (AREA)

Abstract

The invention discloses a physical register file allocation device for RISC-V vector and floating point registers. The device comprises physical register files divided into at least three groups: a floating point exclusive physical register file, a vector exclusive physical register file and a shared physical register file. The floating point exclusive physical register file is allocated only to floating point architectural registers, the vector exclusive physical register file is allocated only to vector architectural registers, and the shared physical register file can be allocated to both floating point and vector architectural registers. The exclusive physical register files provide the floating point registers and the vector registers, respectively, with the register file resources they need to store data, while the shared physical register file can be flexibly allocated to either. In addition, the monitoring return control logic allows shared entries to be released promptly for use by subsequent instructions, which effectively improves the utilization of the shared physical registers.

Description

Physical register file allocation device for RISC-V vector and floating point register
Technical Field
The invention relates to a physical register file allocation device for the vector and floating point registers of a RISC-V processor (CPU) with an out-of-order architecture.
Background
Out-of-order (OoO) execution is a common performance-enhancing technique in the microarchitecture of high-performance processors: the order in which instructions execute is determined by the availability of their data rather than by program order, which reduces the time the processor stalls while instructions wait for data. Take the following instruction sequence as an example. With in-order execution, instruction 1 must wait for the result of instruction 0, and because instruction 0 is a divide instruction, which typically has a long execution latency, the processor stalls waiting for that result. An out-of-order architecture detects that the operands of instructions 2 to 4 do not depend on the results of instruction 0 or instruction 1, and shortens the stall by executing instructions 2 to 4 early.
Instruction 0: div x2, x1, x0
Instruction 1: sub x4, x2, x3
Instruction 2: add x3, x5, x6
Instruction 3: add x2, x7, x8
Instruction 4: sub x5, x2, x3
......
However, out-of-order execution can produce incorrect results. In the sequence above, instruction 1 cannot yet execute while instructions 2 to 4 execute early. If the result of instruction 3 were written directly to the x2 register ahead of time, instruction 1 could read the x2 value written by instruction 3 instead of the result of instruction 0, causing an incorrect computation. An out-of-order design therefore needs an additional mechanism to avoid the problems caused by the change in execution order.
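The two observations above (instructions 2 to 4 do not depend on instructions 0 and 1, yet reordering can still corrupt x2) can be checked with a small sketch. The tuple encoding and the helper names `raw_producers` and `war_hazards` are illustrative assumptions, not part of the patent.

```python
# Each instruction: (opcode, destination, source1, source2), taken from the example above.
program = [
    ("div", "x2", "x1", "x0"),  # instruction 0
    ("sub", "x4", "x2", "x3"),  # instruction 1
    ("add", "x3", "x5", "x6"),  # instruction 2
    ("add", "x2", "x7", "x8"),  # instruction 3
    ("sub", "x5", "x2", "x3"),  # instruction 4
]

def raw_producers(program):
    """For each instruction, the earlier instructions whose results it reads."""
    last_writer = {}
    deps = []
    for i, (_, dst, *srcs) in enumerate(program):
        deps.append({last_writer[s] for s in srcs if s in last_writer})
        last_writer[dst] = i
    return deps

def war_hazards(program):
    """Pairs (reader, writer) where a later instruction overwrites a register
    that an earlier instruction still has to read."""
    pairs = []
    for w, (_, dst, *_srcs) in enumerate(program):
        for r in range(w):
            if dst in program[r][2:]:
                pairs.append((r, w))
    return pairs

deps = raw_producers(program)
hazards = war_hazards(program)
```

Here `deps[2]` and `deps[3]` are empty and `deps[4]` contains only instructions 2 and 3, so instructions 2 to 4 can run ahead of the divide. The pair `(1, 3)` in `hazards` is the x2 conflict described above: instruction 3 must not overwrite x2 before instruction 1 has read it.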
x0 to x8 in the above example are registers defined by the Instruction Set Architecture (ISA). The ISA introduces a number of registers to hold the operands and results of instructions, and the processor is responsible for implementing the instruction behaviour defined by the ISA. Different ISAs may have different numbers of registers: for example, the x86 instruction set has 8/16 integer registers, the ARM instruction set has 32 integer registers, and the RISC-V instruction set has 32 integer registers. Such ISA-defined registers are referred to as architectural registers.
A processor usually has to map the architectural registers onto physical registers in hardware in order to write and read register data. In a simple in-order architecture, the execution order of the instructions matches program order, so it is usually enough to use as many physical registers as there are architectural registers and map them one to one: each instruction reads its operands from the physical registers selected by the operand indices and writes its result to the physical register of the destination register when it completes.
In an out-of-order architecture, the problem of reading and writing back registers out of order is typically solved by register renaming. The processor introduces a rename table that records the physical register currently corresponding to each architectural register. When an instruction writes a register, that register is remapped to a free physical register, so the result of the instruction is written to the newly mapped physical register. If a subsequent instruction uses that result register as a source operand, it looks up the rename table to obtain the corresponding physical register and reads the data from it.
The same instruction sequence can be used to illustrate how the rename table works. Assume that there are 16 general architectural registers x0 to x15 and 32 physical registers p0 to p31, as shown in FIG. 1. Before instruction 0 executes, the rename table maps x0 to x8 to p0 to p8, respectively.
Operands x0 and x1 of instruction 0 correspond to p0 and p1; the instruction writes back x2, so a new mapping p27 is allocated for x2;
the mapping seen by instruction 1 has changed: operands x2 and x3 correspond to p27 and p3, respectively; the instruction writes back x4, so a new mapping p28 is allocated for x4;
the mapping seen by instruction 2 has changed: operands x5 and x6 correspond to p5 and p6, respectively; the instruction writes back x3, so a new mapping p29 is allocated for x3;
the mapping seen by instruction 3 has changed: operands x7 and x8 correspond to p7 and p8, respectively; the instruction writes back x2, so a new mapping p30 is allocated for x2.
It can be seen that, with this renaming mechanism, instruction 1 still obtains the correct x2 data even if instruction 3 is issued early and writes back x2, because the x2 of the two instructions maps to different physical register entries. The p2 entry corresponding to instruction 1 can be released for use by subsequent instructions after instruction 0 has written back and instruction 1 has read its data and issued. Because one architectural register may occupy several physical registers at the same time, this mechanism usually requires more physical registers than architectural registers; otherwise an instruction may stall because no free physical register can be allocated to it, and it must wait for one to be released. The underlying principles are described in "Computer Architecture: A Quantitative Approach" by John L. Hennessy and David A. Patterson.
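The renaming walk-through above can be reproduced with a minimal sketch. The function name `rename`, the tuple encoding and the choice of a free list that starts at p27 are assumptions made only so that the output matches the numbers in the example; the text does not say how free entries are ordered.

```python
from collections import deque

def rename(program, rename_table, free_list):
    """Translate sources through the rename table, then map the destination
    to a fresh physical register taken from the free list."""
    free = deque(free_list)
    renamed = []
    for op, dst, src1, src2 in program:
        # Sources are looked up BEFORE the destination is remapped.
        p1, p2 = rename_table[src1], rename_table[src2]
        pd = free.popleft()  # assumes a free physical register is available
        rename_table[dst] = pd
        renamed.append((op, pd, p1, p2))
    return renamed

# Initial mapping x0..x8 -> p0..p8, as stated in the text.
table = {f"x{i}": f"p{i}" for i in range(9)}
program = [
    ("div", "x2", "x1", "x0"),  # instruction 0
    ("sub", "x4", "x2", "x3"),  # instruction 1
    ("add", "x3", "x5", "x6"),  # instruction 2
    ("add", "x2", "x7", "x8"),  # instruction 3
]
renamed = rename(program, table, [f"p{i}" for i in range(27, 32)])
```

After renaming, instruction 1 reads p27 while instruction 3 writes p30, so the two uses of x2 no longer interfere.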
In addition to integer registers, most processor ISAs include floating point registers or vector registers for floating point and vector instructions, for example the x86 AVX and ARM Neon instruction sets. As with integer registers, renaming is also applicable to the out-of-order execution of floating point and vector instructions.
The invention is based on the floating point extension and the vector extension of the open source RISC-V architecture. The instruction set architecture specifications are available at:
https://github.com/riscv/riscv-isa-manual/releases/download/Ratified-IMAFDQC/riscv-spec-20191213.pdf
https://github.com/riscv/riscv-v-spec/releases/tag/v1.0
The floating point extension includes 32 floating point registers f0 to f31, each with a bit width of FLEN (FLEN = 32/64). The vector extension includes 32 vector registers v0 to v31, each with a bit width of VLEN (VLEN = 64/128/256 ... 2048).
In general, each type of architectural register has its own physical register file. For example, Intel's Skylake architecture has 180 integer physical registers and 168 vector physical registers; AMD's Zen 2 has 180 integer physical registers and 160 floating point physical registers (see https:// www. Hardware. Com/intel-sunny-cove-vs-AMD-Zen-2-core-architecture-10 th-gen-ice-lake-vs-ryzen-3000 /); the open source RISC-V processor BOOM has 128 integer physical registers and 128 floating point physical registers (see https://readthedocs.org/projects/riscv-boom/downloads/pdf/latest/); and the open source RISC-V Xiangshan processor has 192 integer physical registers and 192 floating point physical registers (see https://xiangshan-doc.readthedocs.io/zh_CN/latest/arch/).
Because the RISC-V architecture has independent floating point registers and vector registers, the design must provide physical register file resources to store the data of each. If out-of-order execution of vector and floating point instructions is to be supported, these physical register file resources must be increased further.
In the prior art, each type of architectural register usually has its own physical register file, and a high-performance processor often needs a deeper out-of-order window and a wider parallel issue width to meet its performance targets, so it needs far more physical registers than architectural registers, as the examples above show. Processors therefore commonly provide the floating point registers and the vector registers with physical registers far in excess of the number of architectural registers, in order to offer as much out-of-order capability as possible for high-performance floating point and vector processing. However, because floating point instructions and vector instructions serve different application scenarios, a processor at any given time tends to process either a large number of floating point instructions or a large number of vector instructions; a heavy mix of both is unlikely. As a result, while the floating point physical register file is occupied by a large number of floating point instructions, the vector physical register file may be idle or underutilized, and vice versa.
Although this architecture improves processor performance by providing a large amount of physical register file resources, those resources are often underutilized.
The relevant prior art can be found in the patent literature: CN1160622C, CN1983164A, CN100561461C, CN101290567A, CN102262525A, CN108845826A, CN109508206A, CN112346783A, CN112506468A, CN114020328A, WO2022120722A1.
Disclosure of Invention
To solve the above technical problem, the present invention provides a physical register file allocation device for RISC-V vector and floating point registers in which the floating point registers and the vector registers share a physical register file and, at the same time, a monitoring return control logic moves entry data from the shared physical register file to the exclusive physical register files, thereby addressing the underutilization of the large physical register files used in existing designs. The following technical scheme is adopted:
A physical register file allocation device for RISC-V vector and floating point registers, comprising:
decoding logic, which decodes an input instruction to obtain instruction information, the instruction information comprising at least an instruction type, a source operand index and a destination register index;
a renaming table, which records the mapping between the architectural registers and the physical register files;
physical register files divided into at least three groups: a floating point exclusive physical register file, a vector exclusive physical register file and a shared physical register file, wherein the floating point exclusive physical register file is allocated only to floating point architectural registers, the vector exclusive physical register file is allocated only to vector architectural registers, and the shared physical register file can be allocated to both floating point architectural registers and vector architectural registers;
register file allocation and release control logic, which is responsible for allocating and releasing entries of the physical register files;
monitoring return control logic, which monitors whether the shared physical register file holds valid entries that no longer need to be used, monitors whether the floating point exclusive physical register file and the vector exclusive physical register file have idle entries, and sends the corresponding entry data and a move request to the renaming allocation control logic;
renaming allocation control logic, which determines from the instruction information obtained by the decoding logic whether a physical register file entry must be allocated to store the instruction result and which physical register file it must be allocated from, and updates the renaming table according to the entry allocated by the register file allocation and release control logic; and which determines, according to feedback from the monitoring return control logic, whether entry data of the shared physical register file needs to be moved to an entry of an exclusive physical register file;
the renaming allocation control logic further accesses the renaming table according to the source operand index, obtains the physical register file index corresponding to the source operand and passes it to the issue logic.
The invention has the following beneficial technical effects:
The exclusive physical register files provide the floating point registers and the vector registers, respectively, with the register file resources they need to store data, while the shared physical register file can be flexibly allocated to either. In addition, through the monitoring return control logic, data that no longer needs to stay in the shared physical register file can be returned promptly to the corresponding exclusive physical register file, so that shared entries are released in time for subsequent instructions and the utilization of the shared physical registers is effectively improved. Furthermore, when the processor does not use one group of architectural registers (floating point or vector) for a long time, the registers of that group held in the shared physical register file can be moved to the corresponding exclusive physical registers, and the power consumption of that group can be reduced with a low-power technique such as turning off its clock.
Drawings
The invention is further described below with reference to the accompanying drawings:
FIG. 1 is a schematic diagram of a rename table operating mechanism;
FIG. 2 is a diagram of a physical register file allocation apparatus for RISC-V vector and floating-point registers according to the present invention;
FIG. 3 is a schematic diagram of entry moving in the present invention when FLEN = VLEN = DPLEN;
FIG. 4 is a schematic diagram of entry moving in the present invention when 2 × FLEN = VLEN = DPLEN;
FIG. 5 is a schematic diagram of entry moving in the present invention when 4 × FLEN = 2 × DPLEN = VLEN.
Detailed Description
The specific embodiment of the invention uses the following parameters and conditions:
the invention targets out-of-order issue processor architectures that implement the RISC-V vector extension instruction set and the RISC-V floating point instruction set;
the physical register file resources used in the invention are as follows. The floating point exclusive physical register file has Nfp entries, each as wide as a floating point register (FLEN), for a total of Nfp × FLEN bits. The vector exclusive physical register file has Nvec entries (Nvec is greater than or equal to the number of vector registers), each as wide as a vector register (VLEN), for a total of Nvec × VLEN bits; each entry can also be regarded as K banks of DPLEN bits, with K × DPLEN = VLEN, where DPLEN is the bit width that a vector instruction actually accesses and executes at a time. The shared physical register file has Nshare entries, each DPLEN × K bits wide, for a total of Nshare × DPLEN × K bits. This embodiment takes DPLEN = VLEN and K = 1 as an example;
the invention can process m (m ≥ 1) instructions at the same time; the number of instructions processed simultaneously does not affect the feasibility of the invention.
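As a quick check of the resource formulas above, the sketch below computes the storage of the three register files. The function name `register_file_bits` and all numeric values are illustrative assumptions; the patent does not give concrete entry counts.

```python
def register_file_bits(nfp, nvec, nshare, flen, vlen, dplen):
    """Storage in bits of the three physical register files.

    Follows the formulas in the text: Nfp x FLEN, Nvec x VLEN and
    Nshare x DPLEN x K, where K x DPLEN = VLEN.
    """
    if vlen % dplen != 0:
        raise ValueError("VLEN must be a whole multiple of DPLEN")
    k = vlen // dplen  # number of DPLEN-wide banks per vector entry
    return {
        "fp_exclusive": nfp * flen,
        "vec_exclusive": nvec * vlen,
        "shared": nshare * dplen * k,
    }

# Hypothetical configuration with DPLEN = VLEN (K = 1), as in the embodiment.
bits = register_file_bits(nfp=32, nvec=32, nshare=64, flen=64, vlen=128, dplen=128)
```

With these assumed values the floating point exclusive file holds 2048 bits, the vector exclusive file 4096 bits and the shared file 8192 bits.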
Referring to FIG. 2, the present invention provides a physical register file allocation apparatus for RISC-V vector and floating-point register, comprising:
(I) Decoding logic
The decoding logic decodes the received instruction to obtain the key instruction information, which comprises at least:
whether the instruction is a vector instruction;
whether the instruction is a floating point instruction;
the valid information and indices of the source operands and destination register of a vector instruction;
the valid information and indices of the source operands and destination register of a floating point instruction.
(II) Renaming allocation control logic, renaming table and physical register file
The renaming table records the mapping relationship between the architectural register and the physical register file.
The renaming table comprises a floating point renaming table and a vector renaming table, the floating point renaming table records the mapping relation among the floating point architectural registers, the floating point exclusive physical register file and the shared physical register file, and the vector renaming table records the mapping relation among the vector architectural registers, the vector exclusive physical register file and the shared physical register file.
The physical register files are divided into at least three groups, one group is a floating point exclusive physical register file, the other group is a vector exclusive physical register file, and the other group is a shared physical register file; wherein the floating point exclusive physical register file is allocated for use only with floating point architectural registers, wherein the vector exclusive physical register file is allocated for use only with vector architectural registers, and wherein the shared physical registers are allocable for use with both floating point architectural registers and vector architectural registers.
The renaming allocation control logic determines, from the instruction information obtained by the decoding logic, whether a physical register file entry must be allocated to store the instruction result and which physical register file it must be allocated from, and updates the renaming table according to the entry allocated by the register file allocation and release control logic. It also determines, according to feedback from the monitoring return control logic, whether entry data of the shared physical register file needs to be moved to an entry of an exclusive physical register file. Finally, it accesses the renaming table according to the source operand index, obtains the physical register file index corresponding to the source operand and passes it to the issue logic.
The renaming allocation control logic performs the following operations according to the instruction information and the feedback from the monitoring return control logic:
(1) Obtain the physical register index of each source operand from the floating point or vector rename table, according to the type and index of the source operand.
(2) Decide the allocation of a physical register file entry according to the type of the destination register and the physical register state information provided by the register file allocation and release control logic:
(2.1) if a floating point register is to be written back and an idle floating point exclusive physical register file entry exists, allocate that entry and update the floating point rename table;
(2.2) if a vector register is to be written back and an idle vector exclusive physical register file entry exists, allocate that entry and update the vector rename table;
(2.3) if a floating point register is to be written back but no idle floating point exclusive physical register file entry exists, or a vector register is to be written back but no idle vector exclusive physical register file entry exists, allocate a shared physical register file entry and update the corresponding rename table.
(3) Decide, according to the feedback from the monitoring return control logic, whether a particular shared physical register file entry needs to be moved to a particular exclusive physical register file entry. If the shared physical register file holds valid entries that no longer need to be used and the exclusive physical register file has idle entries, the timing of the move is determined by the current state of the renaming allocation control logic:
(3.1) if the renaming allocation control logic currently has no instruction to process, allocate the corresponding entry directly according to the feedback from the monitoring return control logic, update the corresponding rename table, and update the entry in the form of an instruction;
(3.2) if the renaming allocation control logic is currently processing a vector instruction and the register to be moved is a floating point register, allocate an entry from the idle floating point exclusive physical register file, update the floating point rename table, and update the entry in the form of an instruction;
(3.3) if the renaming allocation control logic is currently processing a floating point instruction and the register to be moved is a vector register, allocate an entry from the idle vector exclusive physical register file, update the vector rename table, and update the entry in the form of an instruction.
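Rules (2.1) to (2.3) and (3.1) to (3.3) can be summarised in a short sketch. The function names, the string labels `"fp"`, `"vec"` and `"shared"`, and the behaviour in cases the text does not list (stalling when nothing is free, deferring a move when the current instruction has the same type as the register to be moved) are assumptions for illustration only.

```python
def choose_file(dest_kind, free_counts):
    """Pick the physical register file for a destination register.

    dest_kind is "fp" or "vec"; free_counts maps "fp", "vec" and "shared"
    to the number of idle entries in each file.
    """
    if free_counts[dest_kind] > 0:   # (2.1) / (2.2): prefer the exclusive file
        return dest_kind
    if free_counts["shared"] > 0:    # (2.3): fall back to the shared file
        return "shared"
    return None                      # assumed: stall until an entry is released

def can_move_now(current_instr_kind, moved_reg_kind):
    """Decide whether a shared entry may be moved in the current cycle.

    current_instr_kind is None when no instruction is being processed,
    otherwise "fp" or "vec".
    """
    if current_instr_kind is None:               # (3.1): renaming logic is idle
        return True
    return current_instr_kind != moved_reg_kind  # (3.2) / (3.3): the other type is busy

free = {"fp": 0, "vec": 3, "shared": 2}
fp_choice = choose_file("fp", free)    # floating point exclusive file is full
vec_choice = choose_file("vec", free)  # vector exclusive file still has room
```

In this example the floating point destination spills into the shared file while the vector destination stays in its exclusive file.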
(III) Issue logic
After the rename table has been looked up and updated according to the instruction type and information, the corresponding instruction information and physical register file indices are sent to the issue logic, which issues the instruction; when the instruction is issued, the corresponding physical register file entries are read.
(IV) Execution logic
After the instruction has been issued and its data read, it is passed to the execution logic, which completes the execution and feeds the completion information and the data to be written back to the write-back and completion logic.
(V) Write-back and completion logic
The data is written back to the physical register file entry indicated by the physical register file index previously allocated to the instruction. At the same time, the completion information of the instruction is fed back to the register file allocation and release control logic.
(VI) Register file allocation and release control logic
The register file allocation and release control logic maintains state tables for the three physical register files, recording whether each physical register is occupied and after which instruction's completion it can be released. Based on the instruction completion information fed back by the write-back and completion logic, the control logic releases the physical register entries that no longer need to be occupied. In addition, it allocates idle entries and updates the corresponding state table in response to allocation requests from the renaming allocation control logic.
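One of the three state tables could be modelled roughly as follows. This is a sketch under assumptions: the class name `StateTable`, its method names and the lowest-index-first allocation order are hypothetical, and the text does not specify how the releasing instruction is identified.

```python
class StateTable:
    """State of one physical register file: occupancy plus the instruction
    whose completion allows each entry to be released."""

    def __init__(self, size):
        self.occupied = [False] * size
        self.release_on = [None] * size  # instruction id that frees the entry

    def allocate(self):
        """Return the index of an idle entry and mark it occupied, or None."""
        for idx, busy in enumerate(self.occupied):
            if not busy:
                self.occupied[idx] = True
                return idx
        return None

    def mark_release(self, idx, instr_id):
        """Record that entry idx may be released once instr_id completes."""
        self.release_on[idx] = instr_id

    def on_complete(self, instr_id):
        """Release every occupied entry that was waiting for instr_id."""
        freed = [i for i, waiting in enumerate(self.release_on)
                 if self.occupied[i] and waiting == instr_id]
        for i in freed:
            self.occupied[i] = False
            self.release_on[i] = None
        return freed

table = StateTable(2)
first = table.allocate()
second = table.allocate()
third = table.allocate()          # no idle entry is left
table.mark_release(first, instr_id=7)
freed = table.on_complete(7)      # completion of instruction 7 frees the entry
reused = table.allocate()
```

A full design would keep one such table per register file (floating point exclusive, vector exclusive and shared).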
(VII) Monitoring return control logic
The monitoring return control logic judges, from the state tables of the three groups of physical register files and the instruction stream information of the current processor, whether the shared physical register file holds valid data that no longer needs to be used and whether the corresponding exclusive physical register file has released entries. If both conditions are met, it sends the information of the shared physical register entry to be moved, together with a move request, to the renaming allocation control logic.
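The check performed by the monitoring return control logic might look like the sketch below. The `SharedEntry` fields, in particular the `pending_use` flag standing in for "valid data that no longer needs to be used", are one interpretation of the text and are labelled here as assumptions, as is the function name `move_requests`.

```python
from dataclasses import dataclass

@dataclass
class SharedEntry:
    index: int
    kind: str          # "fp" or "vec": which architectural register type it holds
    valid: bool        # the entry holds live data
    pending_use: bool  # hypothetical flag: in-flight instructions still need it here

def move_requests(shared_entries, free_exclusive):
    """Return (shared index, kind) pairs that can be moved right now.

    free_exclusive maps "fp" / "vec" to the number of idle exclusive entries.
    """
    free = dict(free_exclusive)  # local copy so the caller's counts are untouched
    requests = []
    for entry in shared_entries:
        if entry.valid and not entry.pending_use and free.get(entry.kind, 0) > 0:
            free[entry.kind] -= 1  # reserve one idle exclusive entry per request
            requests.append((entry.index, entry.kind))
    return requests

entries = [
    SharedEntry(0, "fp", True, False),
    SharedEntry(1, "fp", True, True),    # still needed, stays in the shared file
    SharedEntry(2, "vec", True, False),  # no idle vector exclusive entry
    SharedEntry(3, "fp", True, False),   # the only idle fp entry is already reserved
]
requests = move_requests(entries, {"fp": 1, "vec": 0})
```

Only entry 0 qualifies in this example, because both conditions (movable data and an idle exclusive entry of the matching type) must hold at once.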
The workflow of the invention is shown in FIG. 2. After receiving an instruction, the decoding logic decodes it to obtain the instruction type, the operand index information and the destination register index information. After receiving the decoded information, the renaming allocation control logic looks up the corresponding floating point rename table or vector rename table, obtains the mapping information of the operands and passes it to the issue logic, where it is used to read the correct operands from the three physical register files. At the same time, the renaming allocation control logic requests a new physical register entry from the register file allocation and release control logic according to the instruction type and the destination register index information. The register file allocation and release control logic checks the state tables of the three groups of physical register files for idle entries. If an idle entry exists, the information of the allocated entry is returned to the renaming allocation control logic; if not, allocation must wait until an entry is released, and the instruction stays in the renaming allocation control logic until the allocation completes. After the allocation completes, the renaming allocation control logic records the physical register entry allocated to the destination register in the floating point rename table or the vector rename table and sends the information to the issue logic. After obtaining the operands from the entries of the three physical register files, the issue logic issues the instruction to the floating point execution logic or the vector execution logic, according to the instruction type.
The write-back and completion logic writes the execution result to the physical register entry allocated to the instruction and informs the register file allocation and release control logic that the instruction has completed; the register file allocation and release control logic then updates the corresponding state table and releases the entries that no longer need to be used. Throughout this process, the monitoring return control logic collects, in real time, the state tables of the three groups of physical register files held in the register file allocation and release control logic together with the current instruction stream information, and judges whether a valid entry in the shared physical register file can be moved to an idle entry of the floating point or vector exclusive physical register file. Once it detects that an entry can be moved, the monitoring return control logic sends a move request and the move information to the renaming allocation control logic. On receiving them, the renaming allocation control logic updates the mapping and instructs the issue logic to read the data from the corresponding shared physical register file entry in the form of an operand read; the data is then written to the target exclusive physical register file through the write-back and completion logic.
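The move itself (read the shared entry like an operand, write it to the exclusive entry, update the mapping) can be sketched as follows. The function name `move_entry`, the dictionary layout and the use of `None` to mark an idle entry are assumptions for illustration; real hardware would route the data through the issue and write-back paths described above.

```python
def move_entry(rename_table, files, arch_reg, kind, target_index):
    """Move the value of arch_reg from the shared file to an idle exclusive entry.

    rename_table maps an architectural register to (file name, entry index);
    files maps "fp", "vec" and "shared" to lists of entry values.
    Returns the shared index that was released.
    """
    src_file, src_index = rename_table[arch_reg]
    if src_file != "shared":
        raise ValueError("register is not held in the shared file")
    if files[kind][target_index] is not None:
        raise ValueError("target exclusive entry is not idle")
    value = files["shared"][src_index]           # read, as an operand read would
    files[kind][target_index] = value            # write back to the exclusive file
    rename_table[arch_reg] = (kind, target_index)  # update the mapping
    files["shared"][src_index] = None            # shared entry is free again
    return src_index

files = {"fp": [1.5, None], "vec": [None], "shared": [2.5, None]}
rename_table = {"f0": ("fp", 0), "f1": ("shared", 0)}
released = move_entry(rename_table, files, "f1", "fp", 1)
```

After the call, f1 maps to the floating point exclusive file and shared entry 0 is available to subsequent instructions of either type.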
Take the mapping shown in FIG. 3 as an example. Suppose the processor first executes a floating-point program. Each floating-point instruction is decoded by the decoding logic to obtain its floating-point source operand indices and floating-point destination register index. The renaming allocation control logic accesses the renaming table with the source operand indices, obtains the physical register file indices of the source operands, and passes them to the issue logic for the subsequent data read; it also allocates a free floating-point exclusive physical register file entry to the floating-point destination register. As instructions execute, more and more floating-point exclusive physical register file entries are allocated. Because the number of these entries is limited and they may be released slowly, the floating-point exclusive physical register file can become fully allocated, with no free entry left. When the renaming allocation control logic detects that the floating-point exclusive register file is full, it maps the floating-point destination register of each new instruction to the shared physical register file instead. Thus, once all floating-point instructions of a program segment have executed, some floating-point architectural registers are mapped to floating-point exclusive physical register file entries (the hatched blocks among the floating-point exclusive physical register entries in FIG. 3), some are mapped to shared physical register file entries (the hatched blocks among the shared physical register entries in FIG. 3), and some floating-point exclusive physical register file entries that were slow to release have finally been released because the floating-point program has finished (the white blocks among the floating-point exclusive physical register entries in FIG. 3).
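The allocation order described in this example (exclusive file first, shared file as overflow, otherwise wait) can be sketched as follows. The function name `allocate_dest` and the free-list representation are assumptions made for illustration only.

```python
from typing import Dict, List, Optional, Tuple


def allocate_dest(kind: str,
                  free: Dict[str, List[int]]) -> Optional[Tuple[str, int]]:
    """Pick a physical entry for a destination register of type `kind`.

    `kind` is "fp" or "vec"; `free` maps each file name ("fp", "vec",
    "shared") to a list of free entry indices.
    """
    if free[kind]:                      # 1) prefer the type's exclusive file
        return kind, free[kind].pop(0)
    if free["shared"]:                  # 2) overflow into the shared file
        return "shared", free["shared"].pop(0)
    return None                         # 3) nothing free: wait for a release


# Usage: two exclusive floating-point entries and two shared entries.
free = {"fp": [0, 1], "vec": [0, 1], "shared": [0, 1]}
got = [allocate_dest("fp", free) for _ in range(5)]
```

In this run the first two requests land in the floating-point exclusive file, the next two overflow into the shared file, and the fifth returns `None`; the vector exclusive entries are never touched by floating-point requests.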
If the processor now starts to process a vector program then, as with the floating-point program, more and more vector exclusive physical register file entries are allocated over time. Because the number of these entries is limited and they may be released slowly, the vector exclusive physical register file can become fully allocated, with no free entry left (the gray blocks among the vector exclusive physical register entries in FIG. 3). As the vector program continues to execute, the renaming allocation control logic allocates the vector destination registers produced by the decoding logic to shared physical register file entries, and may allocate more and more of the shared physical register file to sustain out-of-order performance. If floating-point architectural registers still occupy the shared physical register file (the hatched blocks among the shared physical register entries in FIG. 3), the vector registers cannot make full use of the shared physical register file, and performance may suffer. However, because the processor is executing a large number of vector instructions, the floating-point architectural registers are quiescent: they generate no new allocation or release requests. The monitoring return control logic therefore monitors whether the shared physical register file holds valid entries that do not currently need to be used (the hatched blocks among the shared physical register entries in FIG. 3). If it does, the logic determines whether those entries are mapped to floating-point or vector architectural registers. For a floating-point architectural register, it further checks whether the floating-point exclusive physical register file has a free entry; if so, the register's data is moved from the shared physical register file entry to a free entry of the floating-point exclusive physical register file (the arrow in FIG. 3), and the mapping is changed accordingly. Likewise, for a vector architectural register, it checks whether the vector exclusive physical register file has a free entry and, if so, performs the same move and mapping change.
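The monitor-and-move procedure above can be sketched as a single pass over the renaming tables. The sketch assumes that "entries that do not need to be used" means entries belonging to a register type that is currently quiescent, passed in as `idle_kinds`; that parameter, the function name `monitor_and_move`, and the dictionary layout are all hypothetical.

```python
from typing import Dict, List, Set, Tuple

PhysEntry = Tuple[str, int]   # (file name, entry index)


def monitor_and_move(tables: Dict[str, Dict[int, PhysEntry]],
                     data: Dict[str, Dict[int, int]],
                     free: Dict[str, List[int]],
                     idle_kinds: Set[str]) -> List[Tuple[str, int]]:
    """Move values of idle register types out of the shared file.

    Returns the (kind, architectural index) pairs that were moved.
    """
    moved = []
    for kind in sorted(idle_kinds):                 # "fp" and/or "vec"
        for arch, (where, idx) in list(tables[kind].items()):
            if where != "shared":
                continue                            # already in exclusive file
            if not free[kind]:
                break                               # no free exclusive entry
            new_idx = free[kind].pop(0)
            # Read the value from the shared entry, write it to the new one.
            data[kind][new_idx] = data["shared"].pop(idx)
            tables[kind][arch] = (kind, new_idx)    # change the mapping
            free["shared"].append(idx)              # shared entry is released
            moved.append((kind, arch))
    return moved


# Usage: f5 and f6 overflowed into the shared file; one exclusive
# floating-point entry has since been released.
tables = {"fp": {5: ("shared", 0), 6: ("shared", 1)}, "vec": {}}
data = {"fp": {}, "vec": {}, "shared": {0: 111, 1: 222}}
free = {"fp": [3], "vec": [], "shared": []}
moved = monitor_and_move(tables, data, free, idle_kinds={"fp"})
```

Only one register is moved here because only one exclusive entry is free; the second register stays in the shared file until another exclusive entry is released.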
Monitoring and moving entries in this way releases shared physical register resources, so that the vector (or floating-point) program currently executing has more physical register file resources available and resource utilization improves. Moreover, because the write port of the floating-point (or vector) exclusive register file is idle at that time, the move does not conflict with the instructions the program is executing and does not disturb normal program operation.
Note that, because the bit widths of the vector registers and the floating-point registers are both configurable, the entries of their respective exclusive physical register files need not have the same width, nor need the entries of the shared physical register file match either of them. The following configurations are feasible.
(1) FLEN = VLEN = DPLEN. The simplest scheme is to give all three physical register files the same bit width and use a direct one-to-one mapping, as shown in FIG. 3. Because vector and floating-point architectural registers then have the same length, entries of all three physical register files are mapped so that one physical register entry corresponds to one architectural register.
(2) 2 × FLEN = VLEN = DPLEN. The bit width of the floating-point exclusive physical register file can be set to FLEN, and the bit widths of the vector exclusive physical register file and the shared physical register file can be set to VLEN, as shown in FIG. 4. A floating-point architectural register can be mapped to a floating-point exclusive physical register entry, or to the upper or lower half of a shared physical register file entry (the hatched blocks in the figure), while a vector architectural register can be mapped to a vector exclusive or a shared physical register file entry (the gray blocks in the figure). The advantage is that, although the larger VLEN means a vector architectural register occupies a wider slice of the shared physical register file, each shared physical register file entry can hold two floating-point architectural registers. When a floating-point program runs, twice as many physical register file entries are therefore effectively available, so sharing improves resource utilization even more for floating-point registers than for vector registers.
(3) 4 × FLEN = 2 × DPLEN = VLEN. The vector register bit width VLEN is larger here, but each VLEN-bit vector register is usually split into two banks, BANK0 and BANK1, each DPLEN bits wide. A vector instruction is likewise split into two micro-instructions that access the DPLEN-bit data of BANK0 and BANK1 respectively. The renaming table also maps vector architectural registers to the physical register files in units of banks, so a vector architectural register is mapped to the vector exclusive and shared physical register files bank by bank: BANK1 of the shared physical register file and BANK1 of the vector exclusive physical register file are allocated only to BANK1 of vector architectural registers, and BANK0 of the shared physical register file and BANK0 of the vector exclusive physical register file are allocated only to BANK0 of vector architectural registers. The shared physical register file can, however, be allocated in full to floating-point registers; for example, both BANK0 and BANK1 of the shared physical register file can be mapped to two floating-point architectural registers, as shown in FIG. 5.
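The arithmetic behind the three configurations can be checked with a short sketch. It assumes that the unit of shared storage is DPLEN bits wide in every case (a whole entry in cases (1) and (2), one bank in case (3)), which is a reading of the text rather than something it states outright, and it uses FLEN = 64 purely as an example value. The class name `Widths` and its methods are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Widths:
    flen: int    # floating-point register width in bits
    vlen: int    # vector register width in bits
    dplen: int   # data-path width in bits

    def fp_regs_per_shared_unit(self) -> int:
        # How many FLEN-wide values fit in one DPLEN-wide shared unit.
        return self.dplen // self.flen

    def units_per_vector_reg(self) -> int:
        # How many DPLEN-wide units (banks) one vector register spans.
        return self.vlen // self.dplen


# Example widths (assumed): FLEN = 64 in all three cases.
case1 = Widths(flen=64, vlen=64, dplen=64)     # FLEN = VLEN = DPLEN
case2 = Widths(flen=64, vlen=128, dplen=128)   # 2*FLEN = VLEN = DPLEN
case3 = Widths(flen=64, vlen=256, dplen=128)   # 4*FLEN = 2*DPLEN = VLEN

summary = [(c.fp_regs_per_shared_unit(), c.units_per_vector_reg())
           for c in (case1, case2, case3)]
```

Each pair in `summary` gives the number of FLEN-wide values that fit in one shared unit and the number of units a vector register spans, for cases (1) to (3) in order.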
In addition, to make the fullest possible use of the shared physical registers, the floating-point exclusive physical register file should have 32 entries, so that all 32 floating-point registers fit in it; when a large number of vector instructions are being processed, all floating-point architectural registers can then be mapped to the floating-point exclusive physical register file, leaving the shared physical registers entirely to the vector architectural registers. Similarly, the vector exclusive physical register file should have 32 entries, so that all 32 vector registers fit in it and the shared physical registers can be used in full when a large number of floating-point instructions are being processed. The number of shared physical register entries can be chosen according to the performance requirements of the processor.
The above is only a preferred embodiment of the present invention, and the scope of protection of the present invention is not limited to it. Any substitution of features that performs substantially the same function and achieves substantially the same effect by substantially the same means, and that a person of ordinary skill in the art could conceive without creative effort at the time an alleged infringement occurs, falls within the scope of protection of the present invention.

Claims (13)

1. A physical register file allocation apparatus for RISC-V vector and floating-point registers, comprising:
decoding logic for decoding an input instruction to obtain instruction information, the instruction information comprising at least an instruction type, a source operand index, and a destination register index;
a renaming table for recording the mapping between architectural registers and the physical register files;
physical register files divided into at least three groups: a floating-point exclusive physical register file, a vector exclusive physical register file, and a shared physical register file; wherein the floating-point exclusive physical register file is allocated only to floating-point architectural registers, the vector exclusive physical register file is allocated only to vector architectural registers, and the shared physical register file is available to both floating-point architectural registers and vector architectural registers;
register file allocation and release control logic responsible for allocating and releasing entries of the physical register files;
monitoring return control logic for monitoring whether the shared physical register file has valid entries that do not need to be used, monitoring whether the floating-point exclusive physical register file and the vector exclusive physical register file have free entries, and sending the corresponding entry data and a move request to renaming allocation control logic;
the renaming allocation control logic, which determines from the instruction information obtained by the decoding logic whether a physical register file entry needs to be allocated to store an instruction result, determines which physical register file the entry is to be allocated from, and updates the renaming table according to the entry allocated by the register file allocation and release control logic; and which determines, according to feedback from the monitoring return control logic, whether entry data of the shared physical register file needs to be moved to an entry of an exclusive physical register file;
wherein the renaming allocation control logic accesses the renaming table according to the source operand index, obtains the physical register file index corresponding to the source operand, and passes it to issue logic.
2. The physical register file allocation apparatus for RISC-V vector and floating-point registers according to claim 1, wherein the renaming table comprises a floating-point renaming table and a vector renaming table; the floating-point renaming table records the mapping between floating-point architectural registers and the floating-point exclusive physical register file and the shared physical register file, and the vector renaming table records the mapping between vector architectural registers and the vector exclusive physical register file and the shared physical register file.
3. The physical register file allocation apparatus for RISC-V vector and floating-point registers according to claim 2, wherein, when determining the physical register file allocation, the renaming allocation control logic allocates a corresponding entry and updates the floating-point renaming table if a floating-point register needs to be written back and a free floating-point exclusive physical register file entry exists.
4. The physical register file allocation apparatus for RISC-V vector and floating-point registers according to claim 2, wherein, when determining the physical register file allocation, the renaming allocation control logic allocates a corresponding entry and updates the vector renaming table if a vector register needs to be written back and a free vector exclusive physical register file entry exists.
5. The physical register file allocation apparatus for RISC-V vector and floating-point registers according to claim 2, wherein, when determining the physical register file allocation, the renaming allocation control logic allocates a shared physical register file entry and updates the corresponding renaming table if a floating-point register needs to be written back and no free floating-point exclusive physical register file entry exists, or if a vector register needs to be written back and no free vector exclusive physical register file entry exists.
6. The physical register file allocation apparatus for RISC-V vector and floating-point registers according to claim 1, wherein, if the shared physical register file has valid entries that do not need to be used and an exclusive physical register file has free entries, the move timing is determined according to the current state of the renaming allocation control logic.
7. The physical register file allocation apparatus for RISC-V vector and floating-point registers according to claim 6, wherein the move timing is: if the renaming allocation control logic currently has no instruction to process, the corresponding entry is allocated directly according to the feedback of the monitoring return control logic, the corresponding renaming table is updated, and the entry is updated in the form of an instruction.
8. The physical register file allocation apparatus for RISC-V vector and floating-point registers according to claim 6, wherein the move timing is: if the renaming allocation control logic is currently processing vector instructions and a floating-point register is to be moved, a corresponding entry is allocated from the free entries of the floating-point exclusive physical register file, the floating-point renaming table is updated, and the entry is updated in the form of an instruction.
9. The physical register file allocation apparatus for RISC-V vector and floating-point registers according to claim 6, wherein the move timing is: if the renaming allocation control logic is currently processing floating-point instructions and a vector register is to be moved, a corresponding entry is allocated from the free entries of the vector exclusive physical register file, the vector renaming table is updated, and the entry is updated in the form of an instruction.
10. The physical register file allocation apparatus for RISC-V vector and floating-point registers according to claim 1, wherein the register file allocation and release control logic maintains and updates state tables for the three groups of physical register files, each state table recording whether each physical register entry is occupied and whether it can be released after instruction execution completes.
11. The physical register file allocation apparatus for RISC-V vector and floating-point registers according to any one of claims 1 to 10, wherein the issue logic receives the instruction information, including the source operand index and the destination register index as looked up and updated through the renaming table, accesses the corresponding physical register file entries to read the source operand data, and issues it to execution logic.
12. The physical register file allocation apparatus for RISC-V vector and floating-point registers according to claim 11, wherein the execution logic comprises floating-point execution logic and vector execution logic; the floating-point execution logic receives floating-point instructions and operands, performs the corresponding operations, and feeds the results back to write-back and completion logic; the vector execution logic receives vector instructions and operands, performs the corresponding operations, and feeds the results back to the write-back and completion logic.
13. The physical register file allocation apparatus for RISC-V vector and floating-point registers according to claim 12, wherein the write-back and completion logic receives the completed-instruction information and instruction results fed back by the execution logic, writes the instruction results back to the corresponding physical register file, and returns the instruction information to the register file allocation and release control logic.
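As an illustration only, the move-timing conditions of claims 6 to 9 can be written as a small decision function. The claims do not say what happens when the renaming allocation control logic is processing instructions of the same type as the register to be moved; the sketch assumes the move is deferred in that case. The function name `can_move_now` and the string labels are hypothetical.

```python
from typing import Optional


def can_move_now(current_kind: Optional[str], reg_kind: str) -> bool:
    """Decide whether a shared-file entry may be moved right now.

    current_kind: type of instruction the rename logic is handling
                  ("fp" or "vec"), or None if it has nothing to process.
    reg_kind:     type of the architectural register to be moved.
    """
    if current_kind is None:
        return True                     # claim 7: rename logic is idle
    if current_kind == "vec" and reg_kind == "fp":
        return True                     # claim 8
    if current_kind == "fp" and reg_kind == "vec":
        return True                     # claim 9
    return False                        # assumed: otherwise defer the move


# Enumerate every combination of rename-logic state and register type.
decisions = {(cur, reg): can_move_now(cur, reg)
             for cur in (None, "fp", "vec") for reg in ("fp", "vec")}
```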
CN202211397078.2A 2022-11-09 2022-11-09 Physical register file allocation device for RISC-V vector and floating point register Active CN115437691B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211397078.2A CN115437691B (en) 2022-11-09 2022-11-09 Physical register file allocation device for RISC-V vector and floating point register

Publications (2)

Publication Number Publication Date
CN115437691A (en) 2022-12-06
CN115437691B (en) 2023-01-31

Family

ID=84252918

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211397078.2A Active CN115437691B (en) 2022-11-09 2022-11-09 Physical register file allocation device for RISC-V vector and floating point register

Country Status (1)

Country Link
CN (1) CN115437691B (en)

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6092175A (en) * 1998-04-02 2000-07-18 University Of Washington Shared register storage mechanisms for multithreaded computer systems with out-of-order execution
US20010004755A1 (en) * 1997-04-03 2001-06-21 Henry M Levy Mechanism for freeing registers on processors that perform dynamic out-of-order execution of instructions using renaming registers
CN1866237A (en) * 2005-05-19 2006-11-22 国际商业机器公司 Methods and apparatus for sharing processor resources
US20110087865A1 (en) * 2009-10-13 2011-04-14 International Business Machines Corporation Intermediate Register Mapper
CN108920188A (en) * 2018-07-03 2018-11-30 中国人民解放军国防科技大学 Method and device for expanding register file
CN110352403A (en) * 2016-09-30 2019-10-18 英特尔公司 Graphics processor register renaming mechanism
CN110520837A (en) * 2017-04-18 2019-11-29 国际商业机器公司 The register context reduction restored based on renaming register
CN112181494A (en) * 2020-09-28 2021-01-05 中国人民解放军国防科技大学 Method for realizing floating point physical register file
CN114356420A (en) * 2021-12-28 2022-04-15 海光信息技术股份有限公司 Instruction pipeline processing method and device, electronic device and storage medium
CN115248701A (en) * 2022-09-21 2022-10-28 进迭时空(杭州)科技有限公司 Zero-copy data transmission device and method between processor register files

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
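Fang Ying et al.: "Design of an inter-core data exchange structure for multi-core processors based on a configurable shared register file" (一种基于可配置共享寄存器堆的多核处理器核间数据交换结构设计), Microelectronics & Computer (《微电子学与计算机》) *
Wang Xiangqian et al.: "Research on vector register allocation strategies for clustered architectures" (分簇结构向量寄存器分配策略研究), Microcontrollers & Embedded Systems (《单片机与嵌入式系统应用》) *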

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116560729A (en) * 2023-05-11 2023-08-08 北京市合芯数字科技有限公司 Register multistage management method and system of multithreaded processor
CN116560729B (en) * 2023-05-11 2024-06-04 北京市合芯数字科技有限公司 Register multistage management method and system of multithreaded processor

Also Published As

Publication number Publication date
CN115437691B (en) 2023-01-31

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant