Vector data access method and system under the RISC-V V instruction set architecture
Technical Field
The invention relates to the field of processor technology, and in particular to a vector data access method and system under the RISC-V V instruction set architecture.
Background
The prior art has the following defects:
1. The vector datapath usually depends on the front end of the pipeline in its processing flow. Whenever the processor architecture changes, the vector datapath often has to change with it, so its independence and portability are poor;
2. One set of code can generate only one hardware configuration, so universality is poor;
3. Each VLEN configuration corresponds to its own vector datapath structure, so the application scenario is narrow and the cost of processor iteration is high;
4. When VLEN is large, the datapath is generally either non-pipelined, with a long execution latency (many beats) and low processor performance, or large in area, which is unfriendly to small-scale processors;
5. The design is complex, bugs converge slowly, and the processor design cycle is long.
The present application therefore targets the RISC-V V instruction set and implements a vector datapath that is flexibly configurable in (VLEN, DLEN). The relatively independent vector datapath it designs is applicable to different processor design scenarios, has a simple interface, and facilitates both code porting and processor iteration.
Disclosure of Invention
To address the defects of the prior art, the invention discloses a vector data access method and system under the RISC-V V instruction set architecture, which implement a vector datapath that is flexibly configurable in (VLEN, DLEN) for the RISC-V V instruction set. The relatively independent vector datapath it designs is applicable to different processor design scenarios, has a simple interface, and facilitates both code porting and processor iteration.
The invention is realized through the following technical solution:
In a first aspect, the invention discloses a vector data access method under the RISC-V V instruction set architecture, comprising the following steps:
S1: the processor issues instruction information to the VUD through the instruction issue unit ISS;
S2: the VUD receives the instruction information from the ISS and reads the source operands from the VRF;
S3: after the registers have been read, REISSUE splits the instruction into n uops and sends them to the VBOB to be executed in sequence;
S4: the PACKAGE stage collects the execution results of the VBOB, writes them back to the VRF, and reports the completion status to the ROB.
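The four steps can be illustrated with a small behavioural model. This is a sketch under stated assumptions, not the patented hardware. It uses the widths given in this document: the VRF, REISSUE and PACKAGE are VLEN wide, and the VBOB is DLEN wide. It also assumes n = VLEN/DLEN uops per instruction (LMUL = 1) and a 32-bit element width (SEW). The names `Instr`, `VUD` and `iss_issue`, and the single modelled operation `vadd`, are hypothetical.

```python
from dataclasses import dataclass

SEW = 32                 # assumed element width in bits (hypothetical)
MASK = (1 << SEW) - 1


@dataclass
class Instr:
    tag: int             # hypothetical ROB tag
    op: str              # only "vadd" is modelled here
    vd: int
    vs1: int
    vs2: int


class VUD:
    """Behavioural sketch of steps S2-S4; all names are hypothetical."""

    def __init__(self, vlen, dlen, vrf, rob):
        assert vlen % dlen == 0 and dlen % SEW == 0
        self.vlen, self.dlen = vlen, dlen
        self.vrf, self.rob = vrf, rob

    def vbob(self, op, a, b):
        # DLEN-wide execution unit: it only ever sees one uop slice.
        assert len(a) * SEW == self.dlen
        if op == "vadd":
            return [(x + y) & MASK for x, y in zip(a, b)]
        raise NotImplementedError(op)

    def execute(self, ins):
        # S2: read VLEN-wide source operands from the VRF.
        a, b = self.vrf[ins.vs1], self.vrf[ins.vs2]
        # S3: REISSUE splits the instruction into n = VLEN/DLEN uops.
        lanes = self.dlen // SEW
        n = self.vlen // self.dlen
        packed = []
        for i in range(n):
            lo, hi = i * lanes, (i + 1) * lanes
            # S4: PACKAGE collects each uop result as VBOB produces it.
            packed.extend(self.vbob(ins.op, a[lo:hi], b[lo:hi]))
        self.vrf[ins.vd] = packed            # write back the VLEN-wide result
        self.rob.append((ins.tag, "done"))   # report completion to the ROB
        return n


def iss_issue(vud, program):
    # S1: the ISS issues instructions to the VUD in order.
    return [vud.execute(ins) for ins in program]


# Usage: VLEN=128, DLEN=64 gives 4 elements per register and 2 uops.
vrf = {1: [1, 2, 3, 4], 2: [10, 20, 30, 40], 3: [0, 0, 0, 0]}
rob = []
print(iss_issue(VUD(128, 64, vrf, rob), [Instr(0, "vadd", 3, 1, 2)]))  # [2]
print(vrf[3])                                                          # [11, 22, 33, 44]
```

Because `vbob` only ever receives a DLEN-wide slice, changing VLEN or DLEN alters how many uops REISSUE produces, but the VBOB code stays the same.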
Further, in the method, the VRF, REISSUE and PACKAGE all have a data width of VLEN.
Further, VLEN is the vector register width.
Further, in the method, the VBOB has a data width of DLEN.
Further, DLEN is the vector datapath width.
Further, in the method, the PACKAGE stage overlaps in time with the VBOB.
Further, in the method, the VUD is the datapath for vector non-memory-access instructions.
Furthermore, in the method, the vector datapath is pipelined with a short execution latency (few beats), and DLEN is configurable to suit processor design scenarios of different scales.
Furthermore, in the (VLEN, DLEN) configuration, VLEN is greater than or equal to DLEN. When DLEN is configured equal to VLEN, execution is pipelined with a short execution latency, which suits scenarios that pursue high performance. When DLEN < VLEN, the datapath has a small area and low power consumption, which suits small-scale, low-power scenarios.
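The (VLEN, DLEN) trade-off above can be expressed as a small configuration check. The text states only that VLEN >= DLEN. This sketch additionally assumes that both widths are powers of two and that REISSUE emits VLEN/DLEN uops per instruction (LMUL = 1). The helper names are hypothetical.

```python
def uops_per_instruction(vlen: int, dlen: int) -> int:
    """Number of uops REISSUE would emit per instruction (assumed VLEN/DLEN)."""
    # VLEN >= DLEN comes from the text; power-of-two widths are assumed here.
    for name, w in (("VLEN", vlen), ("DLEN", dlen)):
        if w <= 0 or w & (w - 1):
            raise ValueError(f"{name}={w} is not a power of two")
    if dlen > vlen:
        raise ValueError("DLEN must not exceed VLEN")
    return vlen // dlen


def describe(vlen: int, dlen: int) -> str:
    n = uops_per_instruction(vlen, dlen)
    if n == 1:
        return f"VLEN={vlen}, DLEN={dlen}: 1 uop/instr, full-width VBOB (high performance)"
    return (f"VLEN={vlen}, DLEN={dlen}: {n} uops/instr, "
            f"VBOB is 1/{n} of VLEN wide (small area, low power)")


for cfg in [(256, 256), (256, 128), (256, 64)]:
    print(describe(*cfg))
```

The same datapath description covers all three configurations. Only the number of uops per instruction and the VBOB width change.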
In a second aspect, the present invention discloses a vector datapath system under the RISC-V V instruction set architecture. The system is configured to implement the method of the first aspect and includes an instruction issue unit ISS, a vector register file VRF, a reorder buffer unit ROB, and a vector non-memory-access instruction datapath VUD.
The beneficial effects of the invention are as follows:
1. The vector datapath is simple in design and relatively independent of the front end of the pipeline, so its portability is strong.
2. One set of code corresponds to multiple hardware configurations, giving strong universality.
3. VLEN is fully configurable, so application scenarios are flexible and the cost of processor iteration is low.
4. The vector datapath is pipelined with a short execution latency, giving high processor performance. DLEN is configurable, so the datapath can serve processor design scenarios of different scales.
5. The design is simple, bugs converge quickly, and the processor design cycle is short.
Drawings
To illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings used in describing the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present invention, and those skilled in the art can derive other drawings from them without creative effort.
FIG. 1 is a flow diagram of the steps of the vector datapath method under the RISC-V V instruction set architecture;
FIG. 2 is a schematic diagram of the vector datapath system under the RISC-V V instruction set architecture;
FIG. 3 is a schematic diagram of the execution beats of the VUD in an embodiment of the present invention.
Detailed Description
To make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in these embodiments are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments that a person skilled in the art can derive from the embodiments given herein without creative effort shall fall within the protection scope of the present invention.
Example 1
This embodiment discloses a vector datapath method under the RISC-V V instruction set architecture, as shown in FIG. 1, comprising the following steps:
S1: the processor issues instruction information to the VUD through the instruction issue unit ISS;
S2: the VUD receives the instruction information from the ISS and reads the source operands from the VRF;
S3: after the registers have been read, REISSUE splits the instruction into n uops and sends them to the VBOB to be executed in sequence;
S4: the PACKAGE stage collects the execution results of the VBOB, writes them back to the VRF, and reports the completion status to the ROB.
In this embodiment, the VRF, REISSUE and PACKAGE all have a data width of VLEN, where VLEN is the vector register width.
In this embodiment, the VBOB has a data width of DLEN, where DLEN is the vector datapath width.
In this embodiment, the PACKAGE stage overlaps in time with the VBOB.
In this embodiment, the VUD is the datapath for vector non-memory-access instructions.
In this embodiment, the vector datapath is pipelined with a short execution latency, and DLEN is configurable to suit processor design scenarios of different scales.
Example 2
This embodiment discloses a vector datapath system under the RISC-V V instruction set architecture, as shown in FIG. 2, which includes an instruction issue unit ISS, a vector register file VRF, a reorder buffer unit ROB, and a vector non-memory-access instruction datapath VUD.
In this embodiment, ISS (Instruction Issue) is the instruction issue unit. VRF (Vector Register File) is the vector register file, and each of its registers stores VLEN bits of data. ROB (Re-order Buffer) is the reorder buffer unit. VUD (Vector Unit Datapath) is the vector non-memory-access instruction datapath.
In this embodiment, VLEN (vector register length) is the vector register width, and DLEN (datapath length) is the vector datapath width. The VRF, REISSUE and PACKAGE have a data width of VLEN, and the VBOB has a data width of DLEN.
In this embodiment, the execution beats of the VUD are shown in FIG. 3: the VUD receives instruction information from the ISS unit and reads the source operands from the VRF.
In this embodiment, REISSUE splits the instruction into n uops and sends them to the VBOB in sequence for execution. The PACKAGE stage collects the execution results and writes them back to the VRF, and the completion status is reported to the ROB.
In this embodiment, the PACKAGE stage overlaps in time with the VBOB, and the VUD is pipelined, which gives higher performance. The interfaces between components are simple and the components are highly independent. The main functionality of each instruction is concentrated in the VBOB, while the correctness of the VLEN-related aspects of instruction functionality is mainly ensured by the other components. This structure facilitates unit-level verification: sub-modules can be verified in parallel during design, bugs converge quickly, and the processor design cycle is shortened.
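A toy timing model can show why overlapping PACKAGE with VBOB in a pipelined VUD helps. The timing parameters are assumptions for illustration only, not figures from this document. The model assumes that REISSUE sends one uop per beat, that each uop spends `depth` beats in the VBOB, and that PACKAGE captures a result in the same beat the VBOB produces it. The function name `completion_beats` is hypothetical.

```python
def completion_beats(num_instrs, n_uops, depth, pipelined=True):
    """Beat in which PACKAGE completes each instruction (hypothetical model).

    Assumptions: REISSUE sends one uop per beat, VBOB takes `depth` beats
    per uop, and PACKAGE captures a uop's result in VBOB's last beat for
    that uop (the time overlap described in the text).
    """
    if n_uops < 1 or depth < 1:
        raise ValueError("n_uops and depth must be >= 1")
    done, beat = [], 0
    for _ in range(num_instrs):
        for _ in range(n_uops):
            start = beat
            finish = start + depth - 1   # last VBOB beat; PACKAGE overlaps it
            # Pipelined: the next uop enters one beat later.
            # Non-pipelined: the next uop waits until this one leaves VBOB.
            beat = start + 1 if pipelined else finish + 1
        done.append(finish)
    return done


print(completion_beats(2, 2, 3))                   # [3, 5]
print(completion_beats(2, 2, 3, pipelined=False))  # [5, 11]
```

Under these assumptions, k instructions of n uops each finish in k*n + depth - 1 beats when pipelined. Without pipelining, each uop waits for the previous one to leave the VBOB, so they take k*n*depth beats.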
Therefore, in this embodiment VLEN is configurable, which accommodates the variable vector length and gives high flexibility and universality. The REISSUE stage splits the data before the instruction executes, execution proceeds at uop granularity, and the PACKAGE stage collects the results at uop granularity, so that issuing and collecting data are both controlled. DLEN is also configurable, which means the area of the vector datapath is configurable and its power consumption is controllable.
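The claim that one set of code serves many hardware configurations, with sub-modules verifiable on their own, can be illustrated with a simple check. For several (VLEN, DLEN) pairs, it confirms that REISSUE splitting, a DLEN-granular VBOB function and PACKAGE reassembly together reproduce a full-width reference result. This is a hypothetical software check, not the author's verification flow. The 8-bit element width, the element-wise add and all function names are assumptions.

```python
import random

SEW = 8  # assumed element width in bits (hypothetical)


def reissue(vec, vlen, dlen):
    """Split one VLEN-wide operand into VLEN/DLEN DLEN-wide uop slices."""
    lanes = dlen // SEW
    return [vec[i:i + lanes] for i in range(0, vlen // SEW, lanes)]


def package(slices):
    """Reassemble uop results, in uop order, into one VLEN-wide result."""
    return [e for s in slices for e in s]


def vbob_add(a, b):
    # Element-wise add modulo 2**SEW; the same code for every configuration.
    return [(x + y) % (1 << SEW) for x, y in zip(a, b)]


def check(vlen, dlen, trials=100, seed=0):
    """Compare split/execute/pack against a full-width reference."""
    rng = random.Random(seed)
    elems = vlen // SEW
    for _ in range(trials):
        a = [rng.randrange(1 << SEW) for _ in range(elems)]
        b = [rng.randrange(1 << SEW) for _ in range(elems)]
        uops = zip(reissue(a, vlen, dlen), reissue(b, vlen, dlen))
        got = package([vbob_add(x, y) for x, y in uops])
        if got != vbob_add(a, b):   # full-width reference result
            return False
    return True


for vlen, dlen in [(128, 128), (128, 64), (256, 64), (512, 32)]:
    print(vlen, dlen, check(vlen, dlen))   # each line ends in True
```

Because `vbob_add` is written once at DLEN granularity, each configuration differs only in the slicing done by `reissue` and `package`. These helpers can therefore be tested separately from the execution function.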
In conclusion, the vector datapath of the invention is simple in design and relatively independent of the front end of the pipeline, so its portability is strong. One set of code corresponds to multiple hardware configurations, giving strong universality. VLEN is fully configurable, so application scenarios are flexible and the cost of processor iteration is low. The vector datapath is pipelined with a short execution latency, giving high processor performance. DLEN is configurable, so the datapath can serve processor design scenarios of different scales. The design is simple, bugs converge quickly, and the processor design cycle is short.
The above examples are intended only to illustrate the technical solution of the present invention, not to limit it. Although the present invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art will understand that the technical solutions described in the foregoing embodiments may still be modified, or some of their technical features may be equivalently replaced, and such modifications or substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present invention.