WO2022067510A1 - Processor, processing method, and related device - Google Patents
Processor, processing method, and related device
- Publication number
- WO2022067510A1 (application PCT/CN2020/118836)
- Authority
- WO
- WIPO (PCT)
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F12/00—Accessing, addressing or allocating within memory systems or architectures
- G06F12/02—Addressing or allocation; Relocation
- G06F12/04—Addressing variable-length words or parts of words
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
Definitions
- the present invention relates to the field of computer technology, and in particular, to a processor, a processing method and related equipment.
- a common five-stage pipeline may include: an instruction fetch pipeline, an instruction decoding pipeline, a data selection pipeline, an execution pipeline, and a write return pipeline.
- In the data selection pipeline, data must be read from the corresponding general-purpose register (GPR) according to the source operand address of each instruction, so as to obtain the source operands required to execute the instruction.
- the calculation result of the instruction needs to be written back to the corresponding GPR.
- A subsequent instruction can then obtain the calculation result from the GPR; that is, the calculation result can serve as a source operand of the subsequent instruction.
- With a small-capacity GPR (such as a 32-byte GPR), data that cannot be written back to the GPR often has to be written temporarily to the data memory and then read back from the data memory into the GPR when a subsequent instruction needs it. This results in frequent swapping of data between the data memory and the GPR. If dependencies between instructions are involved, this causes pipeline stalls and reduces processor performance.
- If the GPR is simply expanded to solve this frequent swapping, for example by using the entire data memory as the GPR (expanding the GPR to, say, 512 bytes), the selection range grows greatly (from a 32-byte data range to a 512-byte data range), which imposes a large logic cost on operand selection.
- Embodiments of the present invention provide a processor, a processing method, and related equipment, which can reduce the cost of operand selection and ensure the processing efficiency of the processor.
- An embodiment of the present invention provides a processor, including a data selection unit, an instruction decoding unit connected to the data selection unit, and general-purpose register groups of M rows * N columns, wherein each general-purpose register group in the M-row * N-column general-purpose register groups includes K general-purpose registers; M, N, and K are integers greater than or equal to 1.
- The instruction decoding unit is configured to decode X input instructions to obtain at least one source operand address of each of the X instructions, Y source operand addresses in total, and to send the Y source operand addresses to the data selection unit. Each of the Y source operand addresses includes at least one column address bit, at least one row address bit, and at least one target address bit. The at least one column address bit indicates the general-purpose register column to which the source operand address belongs; the at least one row address bit indicates the general-purpose register row to which the source operand address belongs; the at least one target address bit indicates the correspondence between the source operand address and the t-th general-purpose register among the K general-purpose registers. X, Y, and t are integers greater than or equal to 1.
- The data selection unit is configured to access, according to the at least one column address bit, the at least one row address bit, and the at least one target address bit included in the i-th source operand address, the general-purpose register corresponding to the i-th source operand address in the M-row * N-column general-purpose register groups, to obtain the corresponding source operand. The i-th source operand address is one of the Y source operand addresses; i is an integer greater than or equal to 1 and less than or equal to Y.
- the embodiment of the present invention provides a processor, which realizes reducing the selection cost by narrowing the selection range of the operand, specifically including the division of the general register group and the design of the address of the source operand.
- the processor may include pre-divided M-row*N-column general-purpose register groups, and each general-purpose register group may include at least one general-purpose register.
- the source operand address in this embodiment of the present invention may include at least one column address bit, at least one row address bit, and at least one target address bit.
- The at least one column address bit can indicate the general-purpose register column to which the source operand address belongs (for example, if there are 4 general-purpose register columns in total, two binary column address bits suffice to indicate the column), the at least one row address bit can indicate the general-purpose register row to which the source operand address belongs, and the at least one target address bit can indicate which general-purpose register within a general-purpose register group the source operand address corresponds to. The processor can thus select the operand within the general-purpose register row and general-purpose register column indicated by the respective address bits of each source operand address.
- For example, the general-purpose register column to which each source operand address belongs can be determined from the column address bits, and then, within that column, the general-purpose register corresponding to the source operand address can be determined from the remaining row address bits and target address bits.
- Alternatively, the general-purpose register row to which each source operand address belongs can be determined from the row address bits, and then, within that row, the general-purpose register corresponding to the source operand address can be determined from the remaining column address bits and target address bits. The data in that register is then read, yielding the corresponding source operand.
- By performing operand selection within a designated general-purpose register column or a designated general-purpose register row, the embodiment of the present invention greatly narrows the selection range and reduces the logic cost of selection. In the prior art, by contrast, performing operand selection directly over the entire data range (that is, over all general-purpose registers) incurs a large selection cost, while extracting part of the data in advance and selecting operands within that part incurs additional hardware cost. Compared with these existing solutions, the embodiments of the present invention can, on the basis of the existing hardware structure, determine for each source operand address a smaller data range within the original larger data range and perform operand selection within that smaller range, which greatly reduces the selection cost and ensures the execution efficiency of instructions and the performance of the processor.
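The address decomposition described above can be illustrated with a minimal Python sketch. The geometry (M = 4, N = 4, K = 2), the field widths, the bit layout (column bits highest, target bits lowest), and all function names are assumptions made for illustration; the source does not specify them.

```python
from dataclasses import dataclass

# Assumed geometry for illustration: M = 4 rows, N = 4 columns, K = 2 registers per group.
M_ROWS, N_COLS, K_REGS = 4, 4, 2
# Assumed field widths and layout: [column | row | target], target in the low bits.
COL_BITS, ROW_BITS, TGT_BITS = 2, 2, 1


@dataclass(frozen=True)
class OperandAddress:
    column: int
    row: int
    target: int


def decode_address(addr: int) -> OperandAddress:
    """Split a packed operand address into its column, row, and target fields."""
    target = addr & ((1 << TGT_BITS) - 1)
    row = (addr >> TGT_BITS) & ((1 << ROW_BITS) - 1)
    column = (addr >> (TGT_BITS + ROW_BITS)) & ((1 << COL_BITS) - 1)
    return OperandAddress(column, row, target)


def make_register_file() -> list:
    """Build regs[row][column][target]; each value encodes its own position."""
    return [[[100 * r + 10 * c + t for t in range(K_REGS)]
             for c in range(N_COLS)]
            for r in range(M_ROWS)]


def select_operand(regs: list, addr: int) -> int:
    """Select within the column named by the column bits, then use row and target bits."""
    a = decode_address(addr)
    # Narrow the candidates to one column (M groups) before the final selection.
    column_groups = [regs[r][a.column] for r in range(M_ROWS)]
    return column_groups[a.row][a.target]


regs = make_register_file()
addr = 0b10_01_1  # column 2, row 1, target 1 under the assumed layout
print(decode_address(addr), select_operand(regs, addr))
```

In this sketch the final selection looks only at the M groups of a single column rather than at all M * N groups, which is the narrowing the text describes.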
- The processor further includes an execution unit connected to the data selection unit, and the execution unit includes at least one arithmetic logic unit; the at least one arithmetic logic unit is configured to execute the X instructions based on the source operands corresponding to the Y source operand addresses, to obtain respective calculation results of the X instructions.
- The processor may further include an execution unit, which may include one or more arithmetic logic units. Based on the acquired source operands, the one or more arithmetic logic units can carry out the computing tasks of the respective instructions, such as addition and subtraction, to obtain the calculation result of each instruction, which further improves the overall execution efficiency of the instructions.
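As a rough illustration of the execution step, the following Python sketch applies one operation per instruction to source operands that have already been selected. The opcode names, the tuple format, and the function name are hypothetical; the source names only addition and subtraction as example calculations.

```python
import operator

# Hypothetical opcode table; only addition and subtraction are named in the text.
ALU_OPS = {"add": operator.add, "sub": operator.sub}


def execute_parallel(instructions):
    """Compute one result per instruction.

    instructions: list of (opcode, src_a, src_b), where src_a and src_b are the
    source operands already fetched by the data selection step.
    """
    return [ALU_OPS[op](a, b) for op, a, b in instructions]


results = execute_parallel([("add", 3, 4), ("sub", 10, 6)])
print(results)
```

The list comprehension models X arithmetic logic units working on X instructions in the same cycle; it does not model timing.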
- The processor further includes a result write-back unit connected to the execution unit. The instruction decoding unit is further configured to acquire the destination operand address of each of the X instructions, X destination operand addresses in total. Each of the X destination operand addresses includes the at least one column address bit, the at least one row address bit, and the at least one target address bit; the at least one column address bit indicates the general-purpose register column to which the destination operand address belongs, the at least one row address bit indicates the general-purpose register row to which the destination operand address belongs, and the at least one target address bit indicates the correspondence between the destination operand address and the t-th general-purpose register among the K general-purpose registers.
- The result write-back unit is configured to access, according to the at least one column address bit, the at least one row address bit, and the at least one target address bit included in the j-th destination operand address, the general-purpose register corresponding to the j-th destination operand address in the M-row * N-column general-purpose register groups, and to write the calculation result of the j-th instruction back into that general-purpose register; j is an integer greater than or equal to 1 and less than or equal to X.
- the processor may further include a result write-back unit.
- the destination operand address in this embodiment of the present invention may include at least one column address bit, at least one row address bit, and at least one target address bit.
- The at least one column address bit can indicate the general-purpose register column to which the destination operand address belongs (for example, if there are 4 general-purpose register columns, two binary column address bits suffice to indicate the column), the at least one row address bit can indicate the general-purpose register row to which the destination operand address belongs, and the at least one target address bit can indicate which general-purpose register within a general-purpose register group the destination operand address corresponds to.
- The processor can thus select the general-purpose register within the general-purpose register row and general-purpose register column indicated by the respective address bits of each destination operand address.
- For example, the general-purpose register column to which each destination operand address belongs can be determined from the column address bits, and then, within that column, the general-purpose register corresponding to the destination operand address can be determined from the remaining row address bits and target address bits.
- Alternatively, the general-purpose register row to which each destination operand address belongs can be determined from the row address bits, and then, within that row, the general-purpose register corresponding to the destination operand address can be determined from the remaining column address bits and target address bits. The respective calculation results of the multiple instructions are then written back to the corresponding general-purpose registers. By performing this selection within a designated general-purpose register column or a designated general-purpose register row, the selection range is greatly narrowed and the logic cost of selection is reduced.
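The write-back path can be sketched in the same way as operand selection. This is an illustration only: the geometry, the field widths, the bit layout (column bits highest, target bits lowest), and the function names are assumptions, not details given in the source.

```python
# Assumed geometry and field layout, for illustration only.
M_ROWS, N_COLS, K_REGS = 4, 4, 2
COL_BITS, ROW_BITS, TGT_BITS = 2, 2, 1


def split_fields(addr):
    """Return (column, row, target) from a packed destination operand address."""
    target = addr & ((1 << TGT_BITS) - 1)
    row = (addr >> TGT_BITS) & ((1 << ROW_BITS) - 1)
    column = (addr >> (TGT_BITS + ROW_BITS)) & ((1 << COL_BITS) - 1)
    return column, row, target


def write_back(regs, dest_addr, result):
    """Write a calculation result into the register named by the destination address."""
    column, row, target = split_fields(dest_addr)
    regs[row][column][target] = result


# regs[row][column][target], initialised to zero.
regs = [[[0] * K_REGS for _ in range(N_COLS)] for _ in range(M_ROWS)]
write_back(regs, 0b11_00_0, 42)  # column 3, row 0, target 0 under the assumed layout
print(regs[0][3])
```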
- The data selection unit is specifically configured to: determine, according to the at least one column address bit included in the i-th source operand address, the i'-th general-purpose register column to which the i-th source operand address belongs, and, within the i'-th general-purpose register column, access the corresponding general-purpose register according to the at least one row address bit and the at least one target address bit included in the i-th source operand address, where i' is an integer greater than or equal to 1 and less than or equal to N; or determine, according to the at least one row address bit included in the i-th source operand address, the i"-th general-purpose register row to which the i-th source operand address belongs, and, within the i"-th general-purpose register row, access the corresponding general-purpose register according to the at least one column address bit and the at least one target address bit included in the i-th source operand address, where i" is an integer greater than or equal to 1 and less than or equal to M.
- The general-purpose register column to which each source operand address belongs may be determined from the column address bits, and then, within that column, the general-purpose register corresponding to the source operand address may be determined from the remaining row address bits and target address bits. For example, the at least one column address bit determines that the source operand address belongs to the 1st general-purpose register column; within that column, the row address bits and target address bits then determine that the source operand address corresponds to the 2nd general-purpose register in the general-purpose register group of the 1st row.
- Alternatively, the general-purpose register row to which each source operand address belongs can be determined from the row address bits, and then, within that row, the general-purpose register corresponding to the source operand address can be determined from the remaining column address bits and target address bits. For example, the at least one row address bit determines that the source operand address belongs to the 1st general-purpose register row; within that row, the column address bits and target address bits then determine that the source operand address corresponds to the 2nd general-purpose register in the general-purpose register group of the 1st column. In this way, operand selection in the M-row * N-column general-purpose register groups is completed according to the respective address bits of the source operand address, that is, the general-purpose register corresponding to the source operand address is determined, which greatly reduces the selection cost.
- The embodiments of the present application aim to narrow the selection range by dividing the general-purpose registers into rows and columns and selecting each operand within the row or column indicated by its address bits. The embodiments of the present application therefore do not specifically limit whether the selection range is narrowed by rows or by columns. It can be understood that, in some possible implementations, if each general-purpose register group includes only one general-purpose register, the source operand address may omit the target address bits.
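The two selection orders (column first or row first) can be compared in a short Python sketch. The geometry (M = 4, N = 4, K = 2) and the function names are assumptions, and the candidate counts are only a rough proxy for selection width, not a hardware cost model.

```python
# Assumed geometry for illustration: M = 4 rows, N = 4 columns, K = 2 registers per group.
M_ROWS, N_COLS, K_REGS = 4, 4, 2


def select_column_first(regs, column, row, target):
    """Narrow to one column (M groups), then pick by row and target."""
    candidates = [regs[r][column] for r in range(M_ROWS)]
    return candidates[row][target]


def select_row_first(regs, column, row, target):
    """Narrow to one row (N groups), then pick by column and target."""
    candidates = regs[row]
    return candidates[column][target]


def candidate_counts():
    """Number of registers the final selection has to choose among."""
    return {
        "all": M_ROWS * N_COLS * K_REGS,
        "column_first": M_ROWS * K_REGS,
        "row_first": N_COLS * K_REGS,
    }


# regs[row][column][target]; each value encodes its own position.
regs = [[[100 * r + 10 * c + t for t in range(K_REGS)]
         for c in range(N_COLS)]
        for r in range(M_ROWS)]
print(select_column_first(regs, 0, 0, 1), select_row_first(regs, 0, 0, 1))
print(candidate_counts())
```

Both orders reach the same register; they differ only in which dimension is used to narrow the range first.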
- The result write-back unit is specifically configured to: determine, according to the at least one column address bit included in the j-th destination operand address, the j'-th general-purpose register column to which the j-th destination operand address belongs, and, within the j'-th general-purpose register column, access the corresponding general-purpose register according to the at least one row address bit and the at least one target address bit included in the j-th destination operand address, where j' is an integer greater than or equal to 1 and less than or equal to N; or determine, according to the at least one row address bit included in the j-th destination operand address, the j"-th general-purpose register row to which the j-th destination operand address belongs, and, within the j"-th general-purpose register row, access the corresponding general-purpose register according to the at least one column address bit and the at least one target address bit included in the j-th destination operand address, where j" is an integer greater than or equal to 1 and less than or equal to M.
- The general-purpose register column to which each destination operand address belongs may be determined from the column address bits, and then, within that column, the general-purpose register corresponding to the destination operand address may be determined from the remaining row address bits and target address bits. For example, the at least one column address bit determines that the destination operand address belongs to the 1st general-purpose register column; within that column, the row address bits and target address bits then determine that the destination operand address corresponds to the 2nd general-purpose register in the general-purpose register group of the 1st row.
- Alternatively, the general-purpose register row to which each destination operand address belongs can be determined from the row address bits, and then, within that row, the general-purpose register corresponding to the destination operand address can be determined from the remaining column address bits and target address bits. For example, the at least one row address bit determines that the destination operand address belongs to the 1st general-purpose register row; within that row, the column address bits and target address bits then determine that the destination operand address corresponds to the 2nd general-purpose register in the general-purpose register group of the 1st column. In this way, selection in the M-row * N-column general-purpose register groups is completed according to the respective address bits of the destination operand address, that is, the general-purpose register corresponding to the destination operand address is determined, which greatly reduces the selection cost.
- Any two different source operand addresses among the Y source operand addresses belong to different general-purpose register columns; any two different destination operand addresses among the X destination operand addresses belong to different general-purpose register columns.
- That is, each general-purpose register column is constrained to select only one operand at a time. If the source operand addresses or destination operand addresses of multiple instructions corresponded to different general-purpose registers in the same column, the processor would have to access (that is, select) different general-purpose registers within one general-purpose register column to obtain the corresponding source operands or write the calculation results, which would increase the logic cost of selection, contrary to the technical problem the present invention aims to solve.
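The column constraint can be checked mechanically. The Python sketch below reports every column that is asked for more than one distinct address in the same cycle. The position and width of the column field and the function names are assumptions made for illustration.

```python
from collections import defaultdict

# Assumed layout: a 2-bit column field above 2 row bits and 1 target bit.
COLUMN_SHIFT = 3
COLUMN_MASK = 0b11


def column_of(addr):
    """Extract the column field from a packed operand address."""
    return (addr >> COLUMN_SHIFT) & COLUMN_MASK


def column_conflicts(addresses):
    """Map each column to its distinct addresses when there is more than one.

    Repeating the same address is allowed: the rule only forbids two different
    addresses in one column.
    """
    by_column = defaultdict(set)
    for addr in addresses:
        by_column[column_of(addr)].add(addr)
    return {col: sorted(addrs) for col, addrs in by_column.items() if len(addrs) > 1}


print(column_conflicts([0b00_00_0, 0b01_01_0, 0b10_01_1]))  # one address per column
print(column_conflicts([0b10_01_1, 0b10_10_1, 0b10_01_1]))  # two different addresses in column 2
```

An empty result means the set of addresses satisfies the constraint and can be selected in one cycle.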
- The processor further includes a temporary register corresponding to a target general-purpose register; the target general-purpose register is one of the general-purpose registers in the M-row * N-column general-purpose register groups, and the temporary register stores the data in the target general-purpose register. The Y source operand addresses include a plurality of first source operand addresses, whose number is greater than a first threshold, and a second source operand address; the general-purpose register corresponding to the first source operand addresses is the target general-purpose register, and the second source operand address belongs to the general-purpose register column in which the target general-purpose register is located. The data selection unit is further configured to access the corresponding temporary register according to the first source operand address to obtain the corresponding source operand.
- The constraint rule above, set to reduce the selection cost, can reduce instruction execution efficiency when data dependencies are handled. For example, if the source operands of multiple instructions are the same, and the source operands of other instructions processed in parallel in the same layer correspond to different general-purpose registers in the same general-purpose register column as the source operands of those multiple instructions, then under the constraint rule the multiple instructions cannot be processed in parallel with the other instructions in the same layer; empty instructions have to be inserted and the multiple instructions placed in the next layer for processing. This lowers instruction utilization and execution efficiency. Therefore, the compiler can replace a general-purpose register that is frequently referenced and updated with a temporary register.
- The data in the general-purpose register can be stored in the temporary register, and the processor can access the corresponding temporary register according to the source operand address to obtain the corresponding source operand. This access is not subject to the constraint rule, which ensures the execution efficiency of the instructions.
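One way to picture the temporary-register substitution is as a routing decision made per source operand address. The Python sketch below is a simplified model under stated assumptions: the column field position, the threshold value, and the function names are hypothetical, and the source attributes the actual replacement to the compiler without describing its algorithm.

```python
from collections import Counter

# Assumed layout: a 2-bit column field above 2 row bits and 1 target bit.
COLUMN_SHIFT = 3
COLUMN_MASK = 0b11


def column_of(addr):
    """Extract the column field from a packed operand address."""
    return (addr >> COLUMN_SHIFT) & COLUMN_MASK


def route_sources(addresses, threshold):
    """Route each source address to the temporary register or to the GPR array.

    An address goes to "temp" when it is referenced more than `threshold` times
    and some different address falls in the same column; otherwise it stays "gpr".
    """
    counts = Counter(addresses)
    routes = []
    for addr in addresses:
        shares_column = any(
            other != addr and column_of(other) == column_of(addr) for other in counts
        )
        if counts[addr] > threshold and shares_column:
            routes.append(("temp", addr))
        else:
            routes.append(("gpr", addr))
    return routes


# Address 0b10_01_0 is used three times; 0b10_10_1 lies in the same column.
print(route_sources([0b10_01_0, 0b10_01_0, 0b10_01_0, 0b10_10_1], threshold=2))
```

After routing, only one distinct address per column still targets the general-purpose register array, so the column constraint holds for the remaining accesses.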
- The processor further includes a temporary register corresponding to a target general-purpose register; the target general-purpose register is one of the general-purpose registers in the M-row * N-column general-purpose register groups, and the temporary register stores the data in the target general-purpose register. The X destination operand addresses include a plurality of first destination operand addresses, whose number is greater than the first threshold, and a second destination operand address; the general-purpose register corresponding to the first destination operand addresses is the target general-purpose register, and the second destination operand address belongs to the general-purpose register column in which the target general-purpose register is located. The result write-back unit is further configured to access the corresponding temporary register according to the first destination operand address and to write the corresponding calculation result back into the temporary register.
- The constraint rule above, set to reduce the selection cost, can reduce instruction execution efficiency when data dependencies are handled. For example, if the destination operands of multiple instructions are the same, and the destination operands of other instructions processed in parallel in the same layer correspond to different general-purpose registers in the same general-purpose register column as the destination operands of those multiple instructions, then under the constraint rule the multiple instructions cannot be processed in parallel with the other instructions in the same layer; empty instructions have to be inserted and the multiple instructions placed in the next layer for processing. This lowers instruction utilization and execution efficiency. Therefore, the compiler can replace a general-purpose register that is frequently referenced and updated with a temporary register.
- The data in the general-purpose register can be stored in the temporary register, and the processor can access the corresponding temporary register according to the destination operand address to write the corresponding calculation result back into the temporary register. This access is not subject to the constraint rule, which ensures the execution efficiency of the instructions.
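The effect of the constraint rule on instruction packing can be sketched with a simple in-order scheduler that pads a layer with empty instructions when the next instruction would put two different addresses in one column. The greedy in-order policy, the issue width, the column field position, and all names are assumptions for illustration; the source does not describe a scheduling algorithm. Each instruction is assumed to satisfy the constraint on its own.

```python
# Assumed layout: a 2-bit column field above 2 row bits and 1 target bit.
COLUMN_SHIFT = 3
COLUMN_MASK = 0b11


def column_of(addr):
    """Extract the column field from a packed operand address."""
    return (addr >> COLUMN_SHIFT) & COLUMN_MASK


def fits(layer_addrs, new_addrs):
    """True if the combined addresses keep at most one distinct address per column."""
    seen = {}
    for addr in list(layer_addrs) + list(new_addrs):
        if seen.setdefault(column_of(addr), addr) != addr:
            return False
    return True


def schedule(instructions, width):
    """Pack (name, [operand addresses]) pairs into layers of `width` slots.

    A layer is closed and padded with "nop" when it is full or when the next
    instruction would violate the column constraint.
    """
    layers, current, current_addrs = [], [], []
    for name, addrs in instructions:
        if current and (len(current) == width or not fits(current_addrs, addrs)):
            layers.append(current + ["nop"] * (width - len(current)))
            current, current_addrs = [], []
        current.append(name)
        current_addrs.extend(addrs)
    if current:
        layers.append(current + ["nop"] * (width - len(current)))
    return layers


# i0 and i1 use different registers in column 2, so i1 moves to the next layer.
print(schedule([("i0", [0b10_01_1]), ("i1", [0b10_10_1]), ("i2", [0b00_00_0])], width=2))
```

Every "nop" in the output is a wasted issue slot, which is the drop in instruction utilization that the temporary register is meant to avoid.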
- The processor further includes an instruction acquisition unit connected to the instruction decoding unit; the instruction acquisition unit is configured to acquire the X instructions to be executed and send the X instructions to the instruction decoding unit; the X instructions are instructions executed by the processor in parallel within one clock cycle.
- The processor may further include an instruction acquisition unit, through which multiple instructions to be executed can be acquired. These instructions may belong to the same layer; that is, they may be instructions that the processor executes in parallel within one clock cycle (that is, within one beat), which improves the execution efficiency of the instructions.
- the access types of the Y source operand addresses and the X destination operand addresses are both direct access types or indirect access types.
- With direct access, the operand address is known directly; with indirect access, the operand address is obtained as a base address plus a variable. The operand address finally obtained by indirect access is therefore very likely to differ from a directly accessed operand address while belonging to the same general-purpose register row. In that case, two different operands would have to be selected from the same general-purpose register row at one time, which would increase the logic cost of selection. If the accesses are all direct or all indirect, however, then under the constraint rule above, instructions whose operand addresses differ but belong to the same general-purpose register row can be processed in different layers, which avoids in advance the situation in which two different operands must be selected from one general-purpose register row and reduces the logic cost of operand selection.
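The difference between the two access types, and the check that a group of operand accesses uses only one of them, can be sketched as follows. The `Access` record, its field names, and the helper functions are hypothetical; the source states only that an indirect address is a base address plus a variable.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Access:
    kind: str          # "direct" or "indirect"
    base: int          # the address itself (direct) or the base address (indirect)
    variable: int = 0  # value added to the base for indirect access


def effective_address(access):
    """Resolve the operand address for one access."""
    if access.kind == "direct":
        return access.base
    if access.kind == "indirect":
        return access.base + access.variable
    raise ValueError(f"unknown access kind: {access.kind}")


def same_access_type(accesses):
    """True when all accesses are direct or all are indirect."""
    return len({a.kind for a in accesses}) <= 1


direct = Access("direct", 0b10_01_0)
indirect = Access("indirect", 0b10_00_0, variable=3)
print(effective_address(direct), effective_address(indirect))
print(same_access_type([direct, indirect]))
```

In this example the indirect address resolves to a different register from the direct one, which illustrates why a mix of the two types is harder to check in advance.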
- An embodiment of the present invention provides a processing method applied to a processor, where the processor includes a data selection unit, an instruction decoding unit connected to the data selection unit, and general-purpose register groups of M rows * N columns; each general-purpose register group in the M-row * N-column general-purpose register groups includes K general-purpose registers; M, N, and K are integers greater than or equal to 1; the method includes:
- decoding, by the instruction decoding unit, X input instructions to obtain at least one source operand address of each of the X instructions, Y source operand addresses in total, and sending the Y source operand addresses to the data selection unit; each of the Y source operand addresses includes at least one column address bit, at least one row address bit, and at least one target address bit; the at least one column address bit indicates the general-purpose register column to which the source operand address belongs, the at least one row address bit indicates the general-purpose register row to which the source operand address belongs, and the at least one target address bit indicates the correspondence between the source operand address and the t-th general-purpose register among the K general-purpose registers; X, Y, and t are integers greater than or equal to 1;
- accessing, by the data selection unit, according to the at least one column address bit, the at least one row address bit, and the at least one target address bit included in the i-th source operand address, the general-purpose register corresponding to the i-th source operand address in the M-row * N-column general-purpose register groups, to obtain the corresponding source operand; the i-th source operand address is one of the Y source operand addresses; i is an integer greater than or equal to 1 and less than or equal to Y.
- the processor further includes an execution unit connected to the data selection unit, the execution unit includes at least one arithmetic logic unit; the method further includes:
- the X instructions are executed based on the source operands corresponding to the Y source operand addresses, respectively, to obtain respective calculation results of the X instructions.
- the processor further includes a result write-back unit connected to the execution unit; the method further includes:
- acquiring, by the instruction decoding unit, the destination operand address of each of the X instructions, X destination operand addresses in total; each of the X destination operand addresses includes the at least one column address bit, the at least one row address bit, and the at least one target address bit; the at least one column address bit indicates the general-purpose register column to which the destination operand address belongs, the at least one row address bit indicates the general-purpose register row to which the destination operand address belongs, and the at least one target address bit indicates the correspondence between the destination operand address and the t-th general-purpose register among the K general-purpose registers;
- accessing, by the result write-back unit, according to the at least one column address bit, the at least one row address bit, and the at least one target address bit included in the j-th destination operand address, the general-purpose register corresponding to the j-th destination operand address in the M-row * N-column general-purpose register groups, and writing the calculation result of the j-th instruction back into the general-purpose register corresponding to the j-th destination operand address; j is an integer greater than or equal to 1 and less than or equal to X.
- the accessing, by the data selection unit according to the at least one column address bit, the at least one row address bit and the at least one target address bit included in the i-th source operand address, of the general-purpose register corresponding to the i-th source operand address in the M-row*N-column general-purpose register group includes:
- the data selection unit determines, according to the at least one column address bit included in the i-th source operand address, the i'-th general-purpose register column to which the i-th source operand address belongs, and, in the i'-th general-purpose register column, accesses the corresponding general-purpose register according to the at least one row address bit and the at least one target address bit included in the i-th source operand address, where i' is an integer greater than or equal to 1 and less than or equal to N; or,
- the data selection unit determines, according to the at least one row address bit included in the i-th source operand address, the i"-th general-purpose register row to which the i-th source operand address belongs, and, in the i"-th general-purpose register row, accesses the corresponding general-purpose register according to the at least one column address bit and the at least one target address bit included in the i-th source operand address, where i" is an integer greater than or equal to 1 and less than or equal to M.
- the accessing, by the result write-back unit according to the at least one column address bit, the at least one row address bit and the at least one target address bit included in the j-th destination operand address, of the general-purpose register corresponding to the j-th destination operand address in the M-row*N-column general-purpose register group includes:
- the result write-back unit determines, according to the at least one column address bit included in the j-th destination operand address, the j'-th general-purpose register column to which the j-th destination operand address belongs, and, in the j'-th general-purpose register column, accesses the corresponding general-purpose register according to the at least one row address bit and the at least one target address bit included in the j-th destination operand address, where j' is an integer greater than or equal to 1 and less than or equal to N; or,
- the result write-back unit determines, according to the at least one row address bit included in the j-th destination operand address, the j"-th general-purpose register row to which the j-th destination operand address belongs, and, in the j"-th general-purpose register row, accesses the corresponding general-purpose register according to the at least one column address bit and the at least one target address bit included in the j-th destination operand address, where j" is an integer greater than or equal to 1 and less than or equal to M.
- any two different source operand addresses among the Y source operand addresses belong to different general-purpose register columns; any two different destination operand addresses among the X destination operand addresses belong to different general-purpose register columns.
- the processor further includes a temporary register, and the temporary register corresponds to a target general-purpose register; the target general-purpose register is one of the general-purpose registers in the M-row*N-column general-purpose register group; the temporary register stores the data in the target general-purpose register; the Y source operand addresses include a plurality of first source operand addresses, whose number is greater than the first threshold, and second source operand addresses; the general-purpose register corresponding to the first source operand address is the target general-purpose register, and the second source operand address belongs to the general-purpose register row where the target general-purpose register is located; the method further includes:
- the processor further includes a temporary register, and the temporary register corresponds to a target general-purpose register; the target general-purpose register is one of the general-purpose registers in the M-row*N-column general-purpose register group; the temporary register stores the data in the target general-purpose register; the X destination operand addresses include a plurality of first destination operand addresses, whose number is greater than the first threshold, and second destination operand addresses; the general-purpose register corresponding to the first destination operand address is the target general-purpose register, and the second destination operand address belongs to the general-purpose register row where the target general-purpose register is located; the method further includes:
- the corresponding temporary register is accessed according to the target destination operand address, and the corresponding calculation result is written back into the temporary register.
- the processor further includes an instruction acquisition unit connected to the instruction decoding unit; the method further includes:
- the access types of the Y source operand addresses and the X destination operand addresses are either all direct access or all indirect access.
- the present invention provides a semiconductor chip, which may include the processor provided by any one of the implementation manners of the foregoing first aspect.
- the present invention provides a semiconductor chip, which may include: the processor provided by any one of the implementation manners of the first aspect, an internal memory coupled to the multi-core processor, and an external memory.
- the present invention provides a system-on-chip SoC chip, where the SoC chip includes the processor provided by any one of the implementation manners of the first aspect, an internal memory coupled to the processor, and an external memory.
- the SoC chip may be composed of chips, or may include chips and other discrete devices.
- the present invention provides a chip system, where the chip system includes the multi-core processor provided by any one of the implementation manners of the foregoing first aspect.
- the chip system further includes a memory, and the memory is used for saving necessary or related program instructions and data during the operation of the multi-core processor.
- the chip system may be composed of chips, or may include chips and other discrete devices.
- the present invention provides a processing device having a function of implementing any one of the processing methods in the second aspect.
- This function can be implemented by hardware or by executing corresponding software by hardware.
- the hardware or software includes one or more modules corresponding to the above functions.
- the present invention provides a terminal, where the terminal includes a processor, and the processor is the processor provided by any one of the implementation manners of the foregoing first aspect.
- the terminal may also include a memory, coupled to the processor, that holds the program instructions and data necessary for the terminal.
- the terminal may also include a communication interface for the terminal to communicate with other devices or a communication network.
- the present invention provides a computer-readable storage medium, where the computer-readable storage medium stores a computer program, and the computer program, when executed by a processor, implements the processing method flow of any one of the above-mentioned second aspect.
- an embodiment of the present invention provides a computer program, where the computer program includes instructions that, when the computer program is executed by a processor, enable the processor to execute the processing method flow described in any one of the second aspect above.
- FIG. 1 is a schematic diagram of an IE structure in the prior art.
- FIG. 2 is a schematic diagram of another IE structure in the prior art.
- FIG. 3 is a schematic diagram of a program split in the prior art.
- FIG. 4 is a schematic diagram of instruction processing based on PV extraction in the prior art.
- FIG. 5 is a schematic diagram of an FV-based instruction processing provided by an embodiment of the present invention.
- FIG. 6 is a schematic structural diagram of a processor according to an embodiment of the present invention.
- FIG. 7 is a schematic structural diagram of another processor according to an embodiment of the present invention.
- FIGS. 8a-8e are schematic diagrams of a division manner of a general-purpose register according to an embodiment of the present invention.
- FIG. 9 is a schematic diagram of a data selection provided by an embodiment of the present invention.
- FIG. 10 is a schematic diagram of a result write-back provided by an embodiment of the present invention.
- FIG. 11 is a schematic flowchart of a processing method provided by an embodiment of the present invention.
- the N-column general-purpose register group in this embodiment of the present application may correspond to N column vectors, and the M-row general-purpose register group may correspond to M row vectors, where each element in a vector may represent a general-purpose register group.
- the terms "comprising" and "having" and any variations thereof are intended to cover non-exclusive inclusion.
- a process, method, system, product or device comprising a series of steps or units is not limited to the listed steps or units, but optionally also includes unlisted steps or units, or optionally also includes other steps or units inherent to these processes, methods, products or devices.
- a component may be, but is not limited to, a process running on a processor, a processor, an object, an executable, a thread of execution, a program, and/or a computer.
- an application running on a computing device and the computing device may be components.
- One or more components may reside within a process and/or thread of execution, and a component may be localized on one computer and/or distributed between 2 or more computers.
- these components can execute from various computer readable media having various data structures stored thereon.
- a component may, for example, communicate through local and/or remote processes based on a signal having one or more data packets (e.g., data from two components interacting with another component in a local system, in a distributed system, and/or across a network, such as the Internet interacting with other systems via signals).
- the instruction pipeline improves the efficiency with which the processor executes instructions by dividing the operation of an instruction into multiple small steps, each completed by a dedicated circuit. For example, an instruction needs to go through 3 stages to execute: fetch, decode, and execute, and each stage takes one machine cycle. If pipeline technology is not used, this instruction needs 3 machine cycles to execute; if instruction pipeline technology is used, then when this instruction completes "instruction fetch" and enters "decoding", the next instruction can already begin "instruction fetch", which improves the execution efficiency of instructions. In general, the more pipeline stages there are, that is, the more small steps an instruction is divided into, the higher the execution efficiency of the instruction.
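- The cycle counts in the example above can be checked with a short sketch. It assumes an idealized pipeline in which every stage takes exactly one machine cycle and there are no stalls; the function names are illustrative and do not come from the patent.

```python
def cycles_without_pipeline(num_instructions: int, num_stages: int) -> int:
    """Without pipelining, each instruction runs all of its stages before the next starts."""
    return num_instructions * num_stages


def cycles_with_pipeline(num_instructions: int, num_stages: int) -> int:
    """With an ideal pipeline, the first instruction needs num_stages cycles to
    pass through, and every later instruction completes one cycle after its
    predecessor (assumption: no stalls or hazards)."""
    if num_instructions == 0:
        return 0
    return num_stages + (num_instructions - 1)


if __name__ == "__main__":
    # Three stages: fetch, decode, execute.
    print(cycles_without_pipeline(4, 3))  # 4 instructions, no overlap
    print(cycles_with_pipeline(4, 3))     # 4 instructions, overlapped
```

- For a single instruction both counts are 3 cycles; the benefit only appears once several instructions overlap in the pipeline.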
- FIG. 1 is a schematic structural diagram of an instruction processing engine (Instruction Engine, IE) in the prior art.
- the IE structure of the processor may be as shown in FIG. 1 .
- a processor with a five-stage pipeline is taken as an example.
- the life cycle of an instruction in the pipeline structure may include an instruction fetch pipeline → an instruction decoding pipeline → a data selection pipeline → an execution pipeline → a write-back pipeline.
- the execution process of an instruction is divided into at least five stages, in which the basic functions of each stage of pipeline are as follows:
- Instruction fetch pipeline: instruction fetching reads instructions from the instruction memory (Instruction Memory, IMEM) using the program counter (Program Counter, PC) as the address, and sends the instructions to the instruction decoder (Instruction Decoder, ID) after verification;
- Instruction decoding pipeline: the instruction decoder decodes the input instruction and checks its validity, and then extracts the command type, operand type, immediate value, source operand address and destination operand address of the instruction;
- Data selection pipeline: the data selector (Data Selector, DS) selects data from the general-purpose registers (General-Purpose Register, GPR) according to the source operand address obtained in the instruction decoding pipeline, and data correlation analysis is required at the same time; the data selector is sometimes called a multiplexer (Mux); "4×2×16 bit src operands (source operands)" in FIG. 1 indicates that the processor can process 4 instructions in parallel in one beat, and each instruction can include two source operands, that is, 8 source operands can be obtained from the general-purpose registers in one beat, and the size of each source operand can be 16 bits, that is, 2 bytes, and so on, which will not be repeated here;
- Execution pipeline: multiple arithmetic logic units perform the corresponding arithmetic, logic and shift operations according to the command type and the obtained operands, and output the operation results; instruction execution refers to the process of actually performing the operation of an instruction. For example, if the instruction is an addition instruction, the operands are added; if it is a subtraction instruction, the operands are subtracted, and so on, which will not be repeated here;
- Write-back pipeline: write-back refers to the process of writing the result of instruction execution back to the general-purpose register group, or writing it to the data memory, or reading data from the data memory and writing it back to the GPR, that is, performing read/write (load/store) as shown in FIG. 1; "4×16 bit dst operands (destination operands)" in FIG. 1 indicates that the processor can process 4 instructions in parallel in one beat, and each instruction can include one destination operand, that is, the calculation results of the 4 instructions can be written into the corresponding general-purpose registers in one beat, and the size of each destination operand (that is, the size of the result) can be 16 bits, that is, 2 bytes, and so on, which will not be repeated here;
- the above processor architecture and pipeline structure of the processor are only some exemplary implementations provided by the embodiments of the present invention, and the processor architecture and the pipeline structure of the processor in the embodiments of the present invention include but are not limited to the above implementations.
- the capacity of the GPR is only 32 bytes. Because the capacity of the GPR is small, in the above write-back pipeline, if a calculation result cannot be written back to the GPR, it is temporarily written into the data memory and, when a subsequent instruction needs it, read from the data memory back into the GPR. This results in frequent swapping of data between the data memory and the GPR, which reduces the efficiency of instruction execution and may even stall the pipeline. Therefore, for such cases, there is a solution that uses the entire original data memory as the GPR, that is, expands the capacity of the GPR to hold the full range vector (Full Vector, FV), which covers the full selection range of operands.
- FIG. 2 is a schematic diagram of another IE structure in the prior art.
- the capacity of the GPR in FIG. 2 has been expanded to 512 bytes, which ensures to a certain extent that all data can be stored in the GPR, without frequently swapping data in and out between the data memory and the GPR.
- directly expanding the capacity of the GPR means that, in the data selection pipeline, the processor needs to select within the 512-byte data range according to the source operand address obtained by decoding in order to obtain the corresponding source operand, which greatly increases the logic cost of selection.
- Scheme 1: extract a partial vector (PV) from the FV, use the PV as the selection range of operands, and select operands from the PV.
- VLIW Very Long Instruction Word
- FIG. 3 is a schematic diagram of program splitting in the prior art. As shown in Figure 3, each circle in the figure represents a program block, and the connection between the program blocks represents the program path.
- the compiler generates PV through partial vector construction (PV Builder, PVB) in advance for the source operands that need to be referenced and the destination operands that need to be written back in each instruction to be executed. Then, use PV as the selection range of the source operand and destination operand of IE, and then restore the PV to FV through the full-range vector construction (FV Builder, FVB) after the execution of the program block is completed.
- the extracted PV can be stored in a 128-byte general-purpose register group (for example, if a general-purpose register can store 1 byte of data, the general-purpose register group can include at least 128 general-purpose registers, etc.); in this way, the selection range is narrowed and the logic cost of selection is reduced.
- the access types of the source operand and the destination operand may include direct access and indirect access.
- PV extraction is likely to cause a stampede problem between indirect access and direct access to general-purpose registers;
- indirect addressing: in order to reduce the cost of PVB extraction and the cost of indirect addressing (the PVB aligns extraction segments to 4 bytes), indirect addressing is forced to support only operands with a granularity of 1 byte. As a result, an instruction with an operand size of 2 bytes is forcibly split into 2 instructions, which reduces the efficiency of instruction execution and increases the number of instructions, that is, the instruction space;
- the hardware needs to translate FV-based instruction operands (based on the full range vector) into PV-based instruction operands (based on the partial vector), which further increases the hardware cost.
- FIG. 5 is a schematic diagram of FV-based instruction processing provided by an embodiment of the present invention. As shown in FIG. 5, based on the original FV, and without adding PV extraction and PV restoration, the selection range of the source operands and the destination operands is constrained to reduce the logic cost of data selection;
- FIG. 6 is a schematic structural diagram of a processor according to an embodiment of the present invention.
- the processor 10 may be located in any electronic device, such as a computer, a mobile phone, a tablet, a personal digital assistant, a smart wearable device, a smart vehicle, or a smart home appliance.
- the processor 10 may specifically be a chip or a chip set or a circuit board on which the chip or the chip set is mounted.
- the chip or chip set or the circuit board on which the chip or chip set is mounted can be driven by necessary software.
- the processor 10 may include an instruction decoding unit 103, a data selection unit 104, and an M-row*N-column general-purpose register group 107 connected to the data selection unit; each general-purpose register group in the M-row*N-column general-purpose register group 107 may include K general-purpose registers, where M, N and K are integers greater than or equal to 1.
- the instruction decoding unit 103 is connected to the data selection unit 104, and operates in the instruction decoding pipeline stage of the processor 10 to decode the X instructions to be executed, extract at least one source operand address from each of the X instructions (Y source operand addresses in total), and send the Y source operand addresses to the data selection unit 104.
- X and Y are integers greater than or equal to 1.
- the data selection unit 104 operates in the data selection pipeline stage of the processor 10 to select the corresponding general-purpose register from the M row*N column register group 107 according to the source operand address, and obtain the corresponding source operand.
- each source operand address obtained by the instruction decoding unit 103 may include at least one column address bit, at least one row address bit and at least one target address bit.
- the at least one column address bit may be used to indicate the general-purpose register column to which each source operand address belongs, the at least one row address bit may be used to indicate the general-purpose register row to which each source operand address belongs, and the at least one target address bit may be used to indicate the correspondence between each source operand address and the t-th general-purpose register among the K general-purpose registers, where t is an integer greater than or equal to 1 and less than or equal to K.
- for example, M and N are 4, and K is 2, that is, there are 4 rows*4 columns of general-purpose register groups, and each general-purpose register group can include 2 general-purpose registers; each source operand address can then include 2 column address bits, 2 row address bits and 1 target address bit.
- the value of the 2 column address bits can be 00, 01, 10 or 11, where 00 can indicate that the source operand address belongs to the first general-purpose register column, 01 can indicate that the source operand address belongs to the second general-purpose register column, 10 can indicate that the source operand address belongs to the third general-purpose register column, 11 can indicate that the source operand address belongs to the fourth general-purpose register column, and so on, which will not be repeated here.
- for example, the i-th source operand address in the Y source operand addresses includes 2 column address bits, 2 row address bits and 1 target address bit, whose values are 10, 11 and 1 respectively.
- the data selection unit 104 can access the corresponding general-purpose register in the M-row*N-column general-purpose register group 107 according to the 2 column address bits, 2 row address bits and 1 target address bit included in the i-th source operand address. For example, the data selection unit 104 can, in the third general-purpose register column, access the second general-purpose register in the fourth general-purpose register row to obtain the corresponding source operand; or, in the fourth general-purpose register row, access the second general-purpose register in the third general-purpose register column to obtain the corresponding source operand.
- i is an integer greater than or equal to 1 and less than or equal to Y.
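- The address split described above can be sketched as follows for the 4-row*4-column, K = 2 example. This is a minimal sketch under one assumption that the text does not state for this case: the bits are packed as column bits, then row bits, then the target bit, from most to least significant. The names `RegisterLocation` and `decode_address` are hypothetical.

```python
from typing import NamedTuple


class RegisterLocation(NamedTuple):
    column: int    # 1-based general-purpose register column
    row: int       # 1-based general-purpose register row
    register: int  # 1-based index t within the general-purpose register group


def decode_address(address: int, column_bits: int = 2, row_bits: int = 2,
                   target_bits: int = 1) -> RegisterLocation:
    """Split an operand address into column, row and target fields.

    Assumed layout (most to least significant): column | row | target.
    """
    total_bits = column_bits + row_bits + target_bits
    if not 0 <= address < (1 << total_bits):
        raise ValueError(f"address must fit in {total_bits} bits")
    target = address & ((1 << target_bits) - 1)
    row = (address >> target_bits) & ((1 << row_bits) - 1)
    column = (address >> (target_bits + row_bits)) & ((1 << column_bits) - 1)
    # Field value 0 denotes the first column/row/register, so add 1.
    return RegisterLocation(column + 1, row + 1, target + 1)


if __name__ == "__main__":
    # Column bits 10, row bits 11, target bit 1: third column, fourth row,
    # second general-purpose register, as in the example in the text.
    print(decode_address(0b10111))
```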
- FIG. 7 is a schematic structural diagram of another processor according to an embodiment of the present invention.
- the processor 10 may further include an instruction memory 101, an instruction acquisition unit 102, an execution unit 105 and a result write-back unit 106.
- the instruction memory 101, the instruction acquisition unit 102, the instruction decoding unit 103, the data selection unit 104, the execution unit 105 and the result write-back unit 106 can be connected in sequence; the M-row*N-column general-purpose register group 107 is connected to the data selection unit 104 and the result write-back unit 106.
- the instruction acquisition unit 102 operates in the instruction fetch pipeline stage of the processor 10 to complete the acquisition of X instructions to be executed from the instruction memory 101, and can verify the X instructions.
- the instruction decoding unit 103 can decode the X instructions to be executed, extract at least one source operand address of each of the X instructions, and can also extract the respective destination operand addresses of the X instructions to be executed (a total of X destination operand addresses), as well as the command type, operand type, immediate data, and so on.
- the instruction decoding unit 103 may further perform validity checking on the X instructions, and so on.
- the destination operand address obtained through instruction decoding may include at least one column address bit, at least one row address bit and at least one target address bit.
- the at least one column address bit can be used to indicate the general-purpose register column to which the destination operand address belongs;
- the at least one row address bit can be used to indicate the general-purpose register row to which the destination operand address belongs;
- the at least one target address bit can be used to indicate the correspondence between the destination operand address and the t-th general-purpose register among the K general-purpose registers.
- the execution unit 105 may include multiple arithmetic logic units (Arithmetic Logic Units, ALUs) as shown in FIG. 7 (for example, the arithmetic logic unit 1051 and the arithmetic logic unit 1052, etc.), and optionally may also include other units for performing computing tasks, which is not specifically limited in this embodiment of the present invention.
- the execution unit 105 runs in the execution pipeline stage of the processor 10, and can complete the calculation task of the instruction through the arithmetic logic unit therein, and obtain the corresponding calculation result.
- the processor 10 can call multiple arithmetic logic units to execute the tasks of multiple instructions in parallel and obtain their calculation results, and the X instructions can be instructions that the processor executes in parallel within one beat. For example, if X is 4, that is, there are 4 instructions to be executed in a layer (stage), the tasks of the 4 instructions can be executed in parallel by 4 arithmetic logic units.
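- As a rough software model of one beat of the execution stage, the sketch below applies one ALU operation per instruction and returns the X results together. It is only illustrative: the operation names, the tuple format and the 16-bit result width (taken from the 16-bit operand size mentioned for FIG. 1) are assumptions, and the Python loop merely stands in for hardware ALUs that work in parallel.

```python
import operator

# Hypothetical command types mapped to ALU operations.
ALU_OPS = {
    "add": operator.add,
    "sub": operator.sub,
    "and": operator.and_,
    "or": operator.or_,
}

RESULT_MASK = 0xFFFF  # assumed 16-bit result width


def execute_beat(instructions):
    """Execute one layer of instructions, one per arithmetic logic unit.

    Each instruction is a tuple (command_type, source_a, source_b).
    Returns the calculation results in instruction order.
    """
    return [ALU_OPS[op](a, b) & RESULT_MASK for op, a, b in instructions]


if __name__ == "__main__":
    layer = [("add", 1, 2), ("sub", 5, 3), ("and", 0b1100, 0b1010), ("or", 0b1100, 0b1010)]
    print(execute_beat(layer))
```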
- the result write-back unit 106 operates in the write-back pipeline stage of the processor 10 to select the corresponding general-purpose register from the M-row*N-column general-purpose register group 107 according to the destination operand address, and write the calculation result back to the corresponding general-purpose register.
- for example, the j-th destination operand address in the X destination operand addresses includes 2 column address bits, 2 row address bits and 1 target address bit, whose values are 00, 01 and 0 respectively; the result write-back unit 106 can then access the corresponding general-purpose register in the M-row*N-column general-purpose register group 107 according to the 2 column address bits, 2 row address bits and 1 target address bit included in the j-th destination operand address.
- the result write-back unit 106 may, in the first general-purpose register column, access the first general-purpose register in the second general-purpose register row to write the corresponding calculation result into that general-purpose register; or, in the second general-purpose register row, access the first general-purpose register in the first general-purpose register column.
- j is an integer greater than or equal to 1 and less than or equal to X.
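- The write-back example above (column bits 00, row bits 01, target bit 0) can be sketched against a small software model of the 4-row*4-column register file with K = 2. The bit order (column, row, target from most to least significant), the nested-list layout and the 16-bit truncation are assumptions made for illustration only.

```python
M_ROWS, N_COLS, K_REGS = 4, 4, 2


def make_register_file():
    """Register file indexed as regfile[row][column][t], all zero-initialized."""
    return [[[0] * K_REGS for _ in range(N_COLS)] for _ in range(M_ROWS)]


def write_back(regfile, address: int, result: int) -> None:
    """Write a calculation result to the register named by a 5-bit address.

    Assumed layout: 2 column bits | 2 row bits | 1 target bit.
    """
    target = address & 0b1
    row = (address >> 1) & 0b11
    column = (address >> 3) & 0b11
    regfile[row][column][target] = result & 0xFFFF  # assumed 16-bit registers


if __name__ == "__main__":
    regs = make_register_file()
    # 00 | 01 | 0: first column, second row, first register of the group.
    write_back(regs, 0b00010, 0x1234)
    print(hex(regs[1][0][0]))
```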
- connection relationship shown in FIG. 7 does not limit the connection relationship between them.
- the pipeline structure may be different according to the structure of each processor. Therefore, the pipeline structure referred to in the present invention refers to the pipeline structure of the processor 10, and does not specifically limit the pipeline structure of other processors.
- each source operand address can include 2 column address bits and 3 row address bits.
- the value of the 3 row address bits can be 000, 001, 010, 011, 100, 101, 110 or 111, where 000 can indicate that the source operand address belongs to the first general-purpose register row, 001 can indicate that the source operand address belongs to the second general-purpose register row, and so on.
- each source operand address can include 1 target address bit, and the value of the 1 target address bit can be 0 or 1, where 0 can indicate that the source operand address corresponds to the first general-purpose register in the general-purpose register group, and 1 can indicate that the source operand address corresponds to the second general-purpose register in the general-purpose register group.
- the selection range can be narrowed, and the corresponding general-purpose register can be finally determined, so that the purpose of narrowing the selection range and reducing the selection cost can be achieved without additionally increasing the hardware cost of PV extraction and PV restoration.
- for example, the first source operand address of the Y source operand addresses includes a total of 6 address bits, namely 2 column address bits, 3 row address bits and 1 target address bit, whose values are 01, 100 and 0 respectively.
- the data selection unit 104 can, according to the 2 column address bits, 3 row address bits and 1 target address bit included in the first source operand address, access the first general-purpose register in the register group in the fifth general-purpose register row of the second general-purpose register column (that is, the source operand address 011000 corresponds to the first general-purpose register in the general-purpose register group in the second column and fifth row) to obtain the corresponding source operand, or access the first general-purpose register in the register group in the second general-purpose register column of the fifth general-purpose register row to obtain the corresponding source operand.
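- The two selection orders described above (column first, or row first) can be compared in a small sketch. It assumes an 8-row*4-column register file with K = 2 and the bit order implied by the address 011000 (2 column bits, 3 row bits, 1 target bit); the function names and the fill values of the register file are purely illustrative.

```python
M_ROWS, N_COLS, K_REGS = 8, 4, 2


def make_register_file():
    """regfile[row][column][t]; each register holds a value encoding its own position."""
    return [[[row * 100 + col * 10 + t for t in range(K_REGS)]
             for col in range(N_COLS)] for row in range(M_ROWS)]


def split_address(address: int):
    """Return 0-based (column, row, target) fields of a 6-bit operand address."""
    target = address & 0b1
    row = (address >> 1) & 0b111
    column = (address >> 4) & 0b11
    return column, row, target


def select_column_first(regfile, address: int) -> int:
    """Pick the register column first, then select by row and target bits inside it."""
    column, row, target = split_address(address)
    column_groups = [regfile[r][column] for r in range(M_ROWS)]
    return column_groups[row][target]


def select_row_first(regfile, address: int) -> int:
    """Pick the register row first, then select by column and target bits inside it."""
    column, row, target = split_address(address)
    row_groups = regfile[row]
    return row_groups[column][target]


if __name__ == "__main__":
    regs = make_register_file()
    # 01 | 100 | 0: second column, fifth row, first register of the group.
    print(split_address(0b011000))
    print(select_column_first(regs, 0b011000), select_row_first(regs, 0b011000))
```

- Both orders reach the same register; the difference lies only in which dimension is narrowed first, which is what bounds the selection range of each selector.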
- each general-purpose register group can include 2 general-purpose registers, and each destination operand address can then include 2 column address bits, 3 row address bits and 1 target address bit.
- the value of the 3 row address bits can be 000, 001, 010, 011, 100, 101, 110 or 111, where 000 can indicate that the destination operand address belongs to the first general-purpose register row, 001 can indicate that the destination operand address belongs to the second general-purpose register row, 010 can indicate that the destination operand address belongs to the third general-purpose register row, 111 can indicate that the destination operand address belongs to the eighth general-purpose register row, and so on, which will not be repeated here.
- the value of the 1 target address bit may be 0 or 1, where 0 may indicate that the destination operand address corresponds to the first general-purpose register in the general-purpose register group, and 1 may indicate that the destination operand address corresponds to the second general-purpose register in the general-purpose register group.
- the selection range can be narrowed, and the corresponding general-purpose register can be finally determined, so that the purpose of narrowing the selection range and reducing the selection cost can be achieved without additionally increasing the hardware cost of PV extraction and PV restoration.
- for example, the third destination operand address in the X destination operand addresses includes a total of 6 address bits, namely 2 column address bits, 3 row address bits and 1 target address bit, whose values are 01, 100 and 0 respectively.
- the result write-back unit 106 can, according to the 2 column address bits, 3 row address bits and 1 target address bit included in the third destination operand address, access the first general-purpose register in the register group in the fifth general-purpose register row of the second general-purpose register column (that is, the destination operand address 011000 corresponds to the first general-purpose register in the general-purpose register group in the second column and fifth row), or access the first general-purpose register in the register group in the second general-purpose register column of the fifth general-purpose register row, to write the corresponding calculation result into this general-purpose register.
- the compiler can place instructions whose different source operand addresses belong to the same general-purpose register row in different layers for processing, so as to avoid the conflict of selecting multiple different source operands from the same general-purpose register row at one time, thereby reducing the logic cost of selection.
- any two different destination operand addresses in the above X destination operand addresses also need to belong to different general-purpose register columns. If two different destination operand addresses belong to the same general-purpose register column, that general-purpose register column needs to select at least two different source operands at a time, that is, 4 bytes of data, which increases the logic cost of the selection, and so on, which will not be described further here.
- the access types of the Y source operand addresses and the X destination operand addresses may be restricted to be all direct access or all indirect access. It should be noted that direct access gives the operand address directly, whereas indirect access obtains the operand address from a base address plus a variable. As a result, an operand address obtained by indirect access is very likely to differ from an operand address for direct access while both belong to the same general-purpose register row. In that case, the same general-purpose register row would need to select two different general-purpose registers at a time, either to read two different source operands or to write the calculation results of two different instructions, which greatly increases the logic cost of selection.
- the compiler can place instructions that have different operand addresses but belong to the same general-purpose register row in different layers for processing according to the aforementioned constraint rules; or, if the operand addresses are all constrained to be indirect accesses, the compiler can place instructions that have different operand addresses but belong to the same general-purpose register row in different layers for processing according to the aforementioned constraint rules and the difference that always exists between the base addresses of the indirect accesses.
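As a rough illustration of the layering idea, the sketch below greedily places each instruction in the earliest layer where it does not need a different address from an already used group. The function `assign_layers`, the `(group, address)` encoding and the greedy strategy are assumptions made for illustration; a real compiler would also have to respect data dependencies, which are ignored here.

```python
def assign_layers(operands):
    """Greedy sketch: operands maps instruction name -> (group, address).

    'group' stands for the general-purpose register row (or column) that the
    constraint applies to. Two instructions may share a layer unless they need
    different addresses from the same group.
    """
    layers = []  # each entry: (dict group -> address in use, list of instruction names)
    for name, (group, address) in operands.items():
        for used, names in layers:
            # Free group, or the same address is already selected: no conflict.
            if used.get(group, address) == address:
                used[group] = address
                names.append(name)
                break
        else:
            layers.append(({group: address}, [name]))
    return [names for _, names in layers]


demo = {"i0": (0, 0), "i1": (2, 5), "i2": (0, 8), "i3": (0, 0)}
print(assign_layers(demo))  # [['i0', 'i1', 'i3'], ['i2']]
```

Here `i2` is pushed to a second layer because it needs a different address from the group already used by `i0`, while `i3` reuses the same address and can stay in the first layer.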
- the same operand in the same layer for example, src1 shown in Figure 9 below
- each general-purpose register group may include at least one general-purpose register. Therefore, based on at least one column address bit included in the source operand address and the destination operand address, the processor can select the corresponding general-purpose register from the general-purpose register column indicated by the at least one column address bit according to the remaining row address bits and target address bits; or, based on at least one row address bit included in the source operand address and the destination operand address, it can select the corresponding general-purpose register from the general-purpose register row indicated by the at least one row address bit according to the remaining column address bits and target address bits. For each source operand address and destination operand address, the selection range is narrowed and the selection cost is greatly reduced.
- FIG. 8a-FIG. 8e are schematic diagrams of a division manner of a general-purpose register provided by an embodiment of the present invention.
- FV that is, all general-purpose registers
- N banks that is, the above-mentioned division into N columns.
- the selection range of ALU is constrained based on the Bank dimension to reduce the selection cost.
- the interleaving granularity K is generally consistent with the size of the ALU operand, so that the processor can fetch the required source operand from a single general-purpose register group (for example, if the size of the source operand in the instruction is 2 bytes, the interleaving granularity K can be 2 bytes, that is, each general-purpose register group includes 2 general-purpose registers, where each general-purpose register usually stores 1 byte of data).
- the number of banks N can generally be consistent with the size of the VLIW (or with the number of instructions included in each layer), that is, with the number of instructions processed in parallel by the processor in one beat, so that the processor can, for the source operand of each of the multiple instructions, select the corresponding general-purpose register in a different Bank; for example, the source operand of instruction 1 can belong to Bank0, the source operand of instruction 2 can belong to Bank2, and so on, thereby reducing the selection conflicts caused by multiple instructions selecting from the same Bank.
- the processor can process 4 instructions in parallel in one beat (that is, one clock cycle), then N can be 4, that is, the general-purpose register is divided into 4 Banks.
- the interleaving granularity K may not be consistent with the size of the operand; for example, when the size of the operand is 2 bytes, the interleaving granularity K may also be 1, that is, each general-purpose register group may include 1 general-purpose register, or the interleaving granularity K may be 4, that is, each general-purpose register group may include 4 general-purpose registers, etc., which is not specifically limited in this embodiment of the present invention.
- the number of banks N may also be greater than the number of instructions processed in parallel by the processor in one beat.
- N may also be 8, that is, the general-purpose register is divided into 8 banks, so that operand selection conflicts within the same bank can be further reduced, and so on, which is not specifically limited in this embodiment of the present invention.
- FV 512byte FV
- R0-R511 512 general-purpose registers
- other registers, such as status registers, may also exist; the registers discussed here are the general-purpose registers.
- with reference to FIGS. 8a-8e, the division method of the general-purpose registers and the operand selection method involved in the embodiment of the present invention are further elaborated.
- the FV is multi-banked.
- the registers are interleaved in a "Z"-shaped pattern and divided into the corresponding Banks.
- general-purpose registers R0-R1, R8-R9, R16-R17...R504-R505 can be divided into Bank0; R2-R3, R10-R11, R18-R19...R506-R507 into Bank1; R4-R5, R12-R13, R20-R21...R508-R509 into Bank2; and R6-R7, R14-R15, R22-R23...R510-R511 into Bank3.
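The interleaved assignment listed above follows a simple arithmetic pattern. The sketch below reproduces it for K = 2 and N = 4; the helper name `locate` and the 0-based indexing are illustrative choices, not part of the patent.

```python
def locate(reg_index: int, k: int = 2, n: int = 4):
    """Return 0-based (bank, row, slot) of register R<reg_index>.

    k is the interleaving granularity (registers per group) and n the bank count.
    """
    group = reg_index // k          # which group of k consecutive registers
    bank = group % n                # groups are dealt out to banks in turn
    row = group // n                # one row is filled after every n groups
    slot = reg_index % k            # position inside the group
    return bank, row, slot


for r in (0, 6, 10, 505):
    print(f"R{r} ->", locate(r))
```

With these defaults R10 lands in Bank1 and R505 in the last row of Bank0, matching the listing.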
- each general-purpose register in R0 to R511 can store 1 byte of data. Optionally, R0 and the others can also be written as R0b, where b represents 8 bits, that is, 1 byte.
- both the source operand address and the destination operand address may include at least 9 address bits: 2 column address bits, which indicate the Bank to which the address belongs; 6 row address bits, which indicate the general-purpose register row to which it belongs; and 1 target address bit, which indicates which general-purpose register in the general-purpose register group the source operand address or the destination operand address corresponds to.
- the upper 2 bits of the source operand address may be the column address bits, the lowest 1 bit may be the target address bit, and the middle 6 bits may be the row address bits.
- the upper 6 bits of the source operand address can be the row address bits, the lowest 1 bit can be the target address bit, and the middle 2 bits can be the column address bits; or the highest 1 bit of the source operand address can be the target address bit, the lowest 2 bits can be the column address bits, and the middle 6 bits can be the row address bits, etc., which is not specifically limited in this embodiment of the present invention.
- the source operand address is 000000000
- the source operand address belongs to Bank0
- the first general-purpose register in the general-purpose register group of the first row that is, the general-purpose register R0 shown in FIG. 8a
- the corresponding source operand can be obtained.
- if the source operand address is 100000001, it can be determined that the source operand address belongs to Bank2, and the corresponding source operand can be obtained by accessing the second general-purpose register in the first-row general-purpose register group in Bank2 (that is, general-purpose register R5 shown in Figure 8a), and so on, which will not be repeated here.
- if the size of the source operand is consistent with the interleaving granularity K, that is, if the size of the source operand is also 2 bytes, the addresses of the 2 consecutive bytes belonging to the same Bank can be considered to be the same address.
- for example, source operand addresses 000000000 and 000000001 can be considered to be the same address, that is, R0 and R1 as shown in Figure 8a can be considered to be the same address.
- whether the processor obtains the source operand according to source operand address 000000000 or 000000001, it reads the data in both R0 and R1 to obtain a 2-byte source operand.
- R0h and R0, and R0h and R1, can also be considered to be the same address. Since h in R0h represents 16 bits, that is, 2 bytes, R0h is R0+R1, so in both cases the data in R0 and R1 is read, thereby obtaining a 2-byte source operand.
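A minimal sketch of this 2-byte read, assuming the FIG. 8a layout (K = 2, N = 4) and the bit order column, row, target: the target bit is simply ignored, so both addresses of a group return the same pair of bytes. The name `read_pair` and the flat list used to model the register file are illustrative assumptions, and the byte order of the combined operand is deliberately left open.

```python
K, N = 2, 4  # assumed interleaving granularity and bank count (FIG. 8a layout)


def read_pair(regs, addr: str):
    """Read the whole 2-register group addressed by a 9-bit address string."""
    col = int(addr[0:2], 2)      # upper 2 bits: bank (column)
    row = int(addr[2:8], 2)      # middle 6 bits: row
    base = (row * N + col) * K   # index of the first register in the group
    return tuple(regs[base:base + K])  # the lowest (target) bit is not used


regs = [i % 256 for i in range(512)]  # R<i> holds the byte value i mod 256
print(read_pair(regs, "000000000"), read_pair(regs, "000000001"))  # (0, 1) (0, 1)
```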
- the upper 2 bits in the destination operand address can also be the column address bits, the lowest 1 bit can be the target address bits, and the middle 6 bits can be the row address bits.
- the destination operand address is 111111110, it can be determined that the destination operand address belongs to Bank3.
- the first general-purpose register in the general-purpose register group of row 64 in Bank3 that is, general-purpose register R510 as shown in Figure 8a
- the corresponding calculation result can be written back to the general register R510.
- the destination operand address is 110000001
- it can be determined that the destination operand address belongs to Bank3, and by accessing the second general-purpose register in the first-row general-purpose register group in Bank3 (that is, general-purpose register R7 shown in Figure 8a), the corresponding calculation result can be written back into general-purpose register R7, and so on, which will not be repeated here.
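Going the other way, a tool that emits these addresses would need to turn a register number into its 9-bit form. The sketch below does this for the FIG. 8a layout, assuming the bit order column (2 bits), row (6 bits), target (1 bit); `encode_address` is a hypothetical helper name.

```python
def encode_address(reg_index: int, k: int = 2, n: int = 4) -> str:
    """Encode register R<reg_index> as a 9-bit address string (FIG. 8a layout)."""
    group, slot = divmod(reg_index, k)  # group of k registers, position inside it
    row, col = divmod(group, n)         # row and bank (column) of that group
    return f"{col:02b}{row:06b}{slot:01b}"


for r in (0, 5, 7, 510):
    print(f"R{r} -> {encode_address(r)}")
```

The four registers printed here are the ones used in the examples above (R0, R5, R7 and R510).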
- the FV is multi-banked.
- the registers are interleaved in a "Z"-shaped pattern and divided into the corresponding Banks.
- general-purpose registers R0-R3, R16-R19, R32-R35...R496-R499 can be divided into Bank0; R4-R7, R20-R23, R36-R39...R500-R503 into Bank1; R8-R11, R24-R27, R40-R43...R504-R507 into Bank2; and R12-R15, R28-R31, R44-R47...R508-R511 into Bank3.
- the 512 general-purpose registers are divided into 4 banks (ie, 4 columns), each bank includes 32 rows of general-purpose register banks, and each general-purpose register bank includes 4 general-purpose registers.
- both the source operand address and the destination operand address can include 9 address bits: 2 column address bits, which indicate the Bank to which the address belongs; 5 row address bits, which indicate the general-purpose register row to which it belongs; and 2 target address bits, which indicate which general-purpose register in the general-purpose register group the source operand address or the destination operand address corresponds to.
- the upper 2 bits of the source operand address may be column address bits
- the lower 2 bits may be target address bits
- the middle 5 bits may be row address bits.
- the upper 5 bits of the source operand address may be row address bits, the lowest 2 bits may be target address bits, the middle 2 bits may be column address bits, etc., which are not specifically limited in this embodiment of the present invention.
- taking the upper 2 bits as the column address bits, the lower 2 bits as the target address bits, and the middle 5 bits as the row address bits as an example, if the source operand address is 100000011, it can be determined that the source operand address belongs to Bank2, and the corresponding source operand can be obtained by accessing the fourth general-purpose register in the first-row general-purpose register group in Bank2 (that is, general-purpose register R11 shown in Figure 8b).
- if the source operand address is 100000110, it can be determined that the source operand address belongs to Bank2, and the corresponding source operand can be obtained by accessing the third general-purpose register in the second-row general-purpose register group in Bank2 (that is, general-purpose register R26 shown in Figure 8b).
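The same lookup can be written once for any power-of-two K and N of at least 2. The sketch below assumes the bit order column, row, target and an interleaved layout like the one in Figures 8a and 8b; `decode_register` is a hypothetical name.

```python
def decode_register(addr: str, k: int, n: int) -> int:
    """Map a binary address string to a register number R<index>.

    Assumes k and n are powers of two (>= 2) and the field order column, row, target.
    """
    col_bits = (n - 1).bit_length()
    tgt_bits = (k - 1).bit_length()
    col = int(addr[:col_bits], 2)
    row = int(addr[col_bits:len(addr) - tgt_bits], 2)
    slot = int(addr[len(addr) - tgt_bits:], 2)
    return (row * n + col) * k + slot


# K = 4, N = 4 as in Figure 8b
print(decode_register("100000011", k=4, n=4))  # 11
print(decode_register("100000110", k=4, n=4))  # 26
```

Passing `k=2, n=4` gives the Figure 8a layout with the same function.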
- R26 the third general-purpose register in the second row general-purpose register group in Bank2
- the corresponding source operand can be obtained.
- the FV is multi-banked.
- the registers are interleaved in a "Z"-shaped pattern and divided into the corresponding Banks.
- general-purpose registers R0-R1, R4-R5, R8-R9...R508-R509 can be divided into Bank0; R2-R3, R6-R7, R10-R11...R510-R511 can be divided into Bank1.
- the 512 general-purpose registers are divided into 2 banks (ie, 2 columns), each bank includes 128 rows of general-purpose register banks, and each general-purpose register bank includes 2 general-purpose registers.
- both the source operand address and the destination operand address can include 9 address bits: 1 column address bit, which indicates the Bank to which the address belongs; 7 row address bits, which indicate the general-purpose register row to which it belongs; and 1 target address bit, which indicates which general-purpose register in the general-purpose register group the source operand address or the destination operand address corresponds to.
- the highest 1 bit of the source operand address can be the column address bit, the lowest 1 bit can be the target address bit, and the middle 7 bits can be the row address bit.
- if the source operand address is 100000001, it can be determined that the source operand address belongs to Bank1.
- the second general-purpose register in the first row of general-purpose register group in Bank1 that is, general-purpose register R3 as shown in Figure 8c
- if the source operand address is 100000010, it can be determined that the source operand address belongs to Bank1, and the corresponding source operand can be obtained by accessing the first general-purpose register in the second-row general-purpose register group in Bank1 (that is, general-purpose register R6 shown in Figure 8c).
- R6 the first general-purpose register in the second row general-purpose register group in Bank1
- FIG. 8d and FIG. 8e for other possible division manners.
- if the FV is 1024 bytes, taking K equal to 2 and N equal to 4 as an example, the FV is multi-banked: the 1024 general-purpose registers can be divided into 4 banks (that is, 4 columns), each bank includes 128 rows of general-purpose register groups, and each general-purpose register group includes 2 general-purpose registers. Both the source operand address and the destination operand address may then include at least 10 address bits, namely 2 column address bits, 7 row address bits and 1 target address bit, and so on, which will not be repeated here.
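The address widths quoted for the different divisions follow directly from the FV size, K and N. A small sketch, assuming all three are powers of two and each general-purpose register holds 1 byte; `field_widths` is a hypothetical helper.

```python
def field_widths(total_regs: int, k: int, n: int):
    """Return (column_bits, row_bits, target_bits) for an M-row * N-column division."""
    rows = total_regs // (k * n)  # M: number of general-purpose register rows
    return (n - 1).bit_length(), (rows - 1).bit_length(), (k - 1).bit_length()


for total, k, n in ((512, 2, 4), (512, 4, 4), (512, 2, 2), (1024, 2, 4)):
    widths = field_widths(total, k, n)
    print(total, k, n, widths, "total bits:", sum(widths))
```

For the 1024-byte FV with K = 2 and N = 4 this gives 2 + 7 + 1 = 10 address bits, as stated above.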
- the way of dividing the Banks need not be limited to the "Z"-shaped interleaving shown in the above-mentioned Figures 8a-8e.
- consecutive addresses can be divided into each Bank; for example, taking an FV of 512 bytes and a number of Banks N equal to 4 as an example, R0-R127 can be divided into Bank0, R128-R255 into Bank1, R256-R383 into Bank2, R384-R511 into Bank3, etc., which is not specifically limited in this embodiment of the present invention.
- FIG. 9 is a schematic diagram of data selection provided by an embodiment of the present invention.
- the arithmetic logic unit (ALU) mainly performs arithmetic operations (addition, subtraction, multiplication and division), logical operations (AND, OR, NOT, XOR) and shift operations on binary data.
- Mathematical operations such as addition, subtraction, multiplication, division, and logical operations such as "OR, AND, ASL, ROL" instructions are performed in the arithmetic logic unit ALU.
- an instruction can include two source operands, for example, the arithmetic logic unit ALU0 can be used to execute the task of instruction 0, and can include two source operands src1 and src2 (for example, the two source operands of src1 and src2 can be
- ALU1 can be used to execute the task of instruction 1, which can include two source operands, src1 and src2
- ALU2 can be used to execute the task of instruction 2, which can include two source operands, src1 and src2.
- ALU3 can be used to execute the task of instruction 3, and can include two source operands, src1 and src2.
- instruction 0, instruction 1, instruction 2, and instruction 3 may be four instructions of the same layer under the VLIW architecture.
- the two source operands of each instruction (such as src1 and src2 of ALU0 shown in Figure 9, src1 and src2 of ALU1, src1 and src2 of ALU2, and src1 and src2 of ALU3) can be independent of each other and are not subject to the above constraint rules.
- src1 of ALU0 and src2 of ALU0 can access different general-purpose registers in the same Bank, such as R0 and R8 in Bank0 shown in Figure 8a, or src1 of ALU0 and src2 of ALU1 can also access different general-purpose registers in the same Bank, such as R10 and R11 in Bank1 shown in FIG. 8a, etc., which will not be repeated here. Because src1 and src2 are independent of each other and are not constrained by the above constraint rules, the loss of instruction execution efficiency caused by the constraint rules can be reduced to a certain extent.
- because instruction 0, instruction 1, instruction 2, and instruction 3 are 4 instructions of the same layer, they need to be constrained by the above-mentioned selection constraint rules, to avoid the increase in selection logic cost caused by selecting multiple different source operands in the same Bank at one time. It can be understood that the source operands of instructions in different layers are not constrained in their access to the Banks.
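One way to read this rule is as a per-slot check: within a layer, all src1 operands are checked against each other and all src2 operands are checked against each other, but src1 is never checked against src2. The sketch below implements that reading for the FIG. 8a layout; the function names, the dictionary encoding of an instruction and the treatment of a 2-register group as a single address are assumptions made for illustration.

```python
K, N = 2, 4  # assumed interleaving granularity and bank count (FIG. 8a layout)


def bank_and_group(reg: int):
    """Return (bank, group) of register R<reg>; a group is K consecutive registers."""
    group = reg // K
    return group % N, group


def layer_is_conflict_free(layer) -> bool:
    """layer: list of dicts such as {"src1": 0, "src2": 8} holding register numbers."""
    for slot in ("src1", "src2"):      # each operand slot is checked on its own
        selected = {}                  # bank -> group already selected for this slot
        for instr in layer:
            bank, group = bank_and_group(instr[slot])
            if selected.setdefault(bank, group) != group:
                return False           # two different groups needed from one bank
    return True


ok_layer = [{"src1": 0, "src2": 8}, {"src1": 2, "src2": 10},
            {"src1": 4, "src2": 4}, {"src1": 6, "src2": 6}]
bad_layer = [{"src1": 0, "src2": 2}, {"src1": 8, "src2": 4}]
print(layer_is_conflict_free(ok_layer), layer_is_conflict_free(bad_layer))  # True False
```

In `ok_layer`, the first instruction reads R0 and R8 from the same bank through different slots, which this reading allows; `bad_layer` fails because two src1 operands need different groups of Bank0.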
- instruction 4 is the instruction of the next layer, which is in a different layer from instruction 0, instruction 1, instruction 2 and instruction 3.
- src1 or src2 of instruction 4 will not conflict with those of the preceding instruction 0, instruction 1, instruction 2 and instruction 3.
- src2 may be indirect access, for example, src2 of ALU1 may be direct access, src1 of ALU3 may be indirect access, etc., which will not be repeated here.
- FIG. 10 is a schematic diagram of a result write-back provided by an embodiment of the present invention.
- each instruction can include a destination operand.
- ALU0, ALU1, ALU2, and ALU3 can write their respective calculation results back to the corresponding general-purpose registers according to their destination operand addresses.
- because instruction 0, instruction 1, instruction 2 and instruction 3 are 4 instructions of the same layer, they need to be constrained by the above selection constraint rules to avoid selecting multiple different destination operands in the same Bank at one time.
- the destination operands of instructions in different layers are not constrained in their access to the Banks.
- the access type of the destination operand is also not restricted. For details, reference may be made to the embodiment corresponding to FIG. 9 above, which will not be repeated here.
- the Write After Write (WAW) mode can be supported.
- the destination operand addresses of ALU0, ALU1, ALU2 and ALU3 are all R0, and ALU3 writes the calculation result of instruction 3 into R0 after ALU0, ALU1 and ALU2, then R0 finally saves the calculation result of instruction 3.
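The write-after-write behaviour can be mimicked by applying the write-backs in a fixed order so that the last writer wins. The sketch assumes the results are applied in ALU order (ALU0 first, ALU3 last); the function name and the example values are purely illustrative.

```python
def write_back_in_order(regs: dict, results):
    """Apply (destination register, value) pairs in order; later writes overwrite earlier ones."""
    for dest, value in results:
        regs[dest] = value
    return regs


# ALU0..ALU3 all target R0; the values 10..13 are made-up calculation results.
regs = write_back_in_order({}, [(0, 10), (0, 11), (0, 12), (0, 13)])
print(regs[0])  # 13, the result written by ALU3
```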
- the processor 10 in this embodiment of the present invention may further include one or more temporary registers (Temporary Register, T); the one or more temporary registers can be simultaneously visible to all ALUs (e.g., ALU0, ALU1, ALU2 and ALU3 shown in FIG. 9) without any constraint.
- temporary registers are only visible to the compiler (Compiler), but not to the programmer (Programmer), and can be used to improve instruction utilization.
- the compiler can replace general-purpose registers that need to be frequently referenced and updated with temporary registers.
- R8 can be replaced by a temporary register T0 in the layer above instruction 0, instruction 1, instruction 2 and instruction 3; T0 stores the data of R8 and corresponds to R8. In this way, when instruction 1, instruction 2 and instruction 3 select data according to the source operand address, they can directly access the corresponding T0 to obtain the source operand, and are thus not constrained by the constraint rules, which ensures instruction utilization and effectiveness.
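A compiler pass of this kind might look roughly like the following sketch: if one register is referenced by more instructions of a layer than some threshold, its references are rewritten to a temporary register, and the caller is told which register must be copied into that temporary in an earlier layer. The function name, the threshold value and the string encoding of registers are assumptions for illustration only.

```python
from collections import Counter


def substitute_temporary(layer, threshold: int = 1, temp: str = "T0"):
    """layer: list of source register names, one per instruction in the layer.

    Returns (rewritten layer, register to preload into temp or None).
    """
    if not layer:
        return layer, None
    reg, uses = Counter(layer).most_common(1)[0]  # most frequently referenced register
    if uses <= threshold:
        return layer, None                        # not referenced often enough
    return [temp if r == reg else r for r in layer], reg


print(substitute_temporary(["R8", "R8", "R8", "R2"]))  # (['T0', 'T0', 'T0', 'R2'], 'R8')
```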
- FIG. 11 is a schematic flowchart of a processing method according to an embodiment of the present invention.
- the processing method is applied to a processor, and the processor includes a data selection unit, an instruction decoding unit connected to the data selection unit, and M rows*N columns of general-purpose register groups; each general-purpose register group in the M rows*N columns of general-purpose register groups includes K general-purpose registers; M, N, and K are integers greater than or equal to 1; and the processing method is applicable to any one of the processors in the above-mentioned FIG. 1 to FIG. 3 and to a device (such as a mobile phone, a computer, a server, etc.) including the processor.
- the method may include the following steps S201-S202, wherein,
- Step S201: through the instruction decoding unit, decode the X input instructions to obtain at least one source operand address of each of the X instructions, Y source operand addresses in total;
- the Y source operand addresses are sent to the data selection unit;
- each of the Y source operand addresses includes at least one column address bit, at least one row address bit and at least one target address bit;
- the at least one column address bit is used to indicate the general-purpose register column to which each source operand address belongs, and the at least one row address bit is used to indicate the general-purpose register row to which each source operand address belongs;
- the at least one target address bit is used to indicate the correspondence between each source operand address and the t-th general-purpose register in the K general-purpose registers;
- X, Y, and t are integers greater than or equal to 1;
- Step S202: through the data selection unit, according to the at least one column address bit, the at least one row address bit and the at least one target address bit included in the i-th source operand address, access the general-purpose register corresponding to the i-th source operand address in the M rows*N columns of general-purpose register groups to obtain the corresponding source operand; the i-th source operand address is one of the Y source operand addresses; i is an integer greater than or equal to 1 and less than or equal to Y.
- the processor further includes an execution unit connected to the data selection unit, the execution unit includes at least one arithmetic logic unit; the method further includes:
- the X instructions are executed based on the source operands corresponding to the Y source operand addresses, respectively, to obtain respective calculation results of the X instructions.
- the processor further includes a result write-back unit connected to the execution unit; the method further includes:
- each destination operand address in the X destination operand addresses includes the at least one column address bit, the at least one row address bit and the at least one target address bit;
- the at least one column address bit is used to indicate the general-purpose register column to which each destination operand address belongs;
- the at least one row address bit is used to indicate the general-purpose register row to which each destination operand address belongs;
- the at least one target address bit is used to indicate the correspondence between each destination operand address and the t-th general-purpose register in the K general-purpose registers;
- through the result write-back unit, according to the at least one column address bit, the at least one row address bit and the at least one target address bit included in the j-th destination operand address, access the general-purpose register corresponding to the j-th destination operand address in the M rows*N columns of general-purpose register groups, and write the calculation result of the j-th instruction back into the general-purpose register corresponding to the j-th destination operand address; j is an integer greater than or equal to 1 and less than or equal to X.
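The decode, select, execute and write-back steps can be strung together in a toy model. The sketch below assumes the FIG. 8a layout (K = 2, N = 4, bit order column, row, target), 1-byte operands, and a made-up instruction tuple `(op, dst, src1, src2)`; none of these encodings come from the patent.

```python
import operator

K, N = 2, 4  # assumed layout of the general-purpose register groups
OPS = {"add": operator.add, "and": operator.and_}  # hypothetical opcodes


def reg_index(addr: str) -> int:
    """Column (2 bits), row (6 bits), target (1 bit) -> register number."""
    col, row, slot = int(addr[:2], 2), int(addr[2:8], 2), int(addr[8:], 2)
    return (row * N + col) * K + slot


def run_layer(regs, instructions):
    # Decode (S201): split each instruction into operation, destination, sources.
    decoded = [(OPS[op], dst, (s1, s2)) for op, dst, s1, s2 in instructions]
    # Select (S202): read every source operand before anything is written.
    operands = [tuple(regs[reg_index(a)] for a in srcs) for _, _, srcs in decoded]
    # Execute: one result per instruction, truncated to 1 byte.
    results = [fn(a, b) & 0xFF for (fn, _, _), (a, b) in zip(decoded, operands)]
    # Write back: store each result at its destination operand address.
    for (_, dst, _), value in zip(decoded, results):
        regs[reg_index(dst)] = value
    return regs


regs = [0] * 512
regs[0], regs[5] = 3, 4
run_layer(regs, [("add", "111111110", "000000000", "100000001")])
print(regs[510])  # 7, i.e. R0 + R5 written to R510
```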
- accessing, through the data selection unit, the general-purpose register corresponding to the i-th source operand address in the M rows*N columns of general-purpose register groups according to the at least one column address bit, the at least one row address bit and the at least one target address bit included in the i-th source operand address includes:
- through the data selection unit, determining the i'-th general-purpose register column to which the i-th source operand address belongs, and accessing, in the i'-th general-purpose register column, the corresponding general-purpose register according to the at least one row address bit and the at least one target address bit included in the i-th source operand address, where i' is an integer greater than or equal to 1 and less than or equal to N; or,
- through the data selection unit, determining the i"-th general-purpose register row to which the i-th source operand address belongs, and accessing, in the i"-th general-purpose register row, the corresponding general-purpose register according to the at least one column address bit and the at least one target address bit included in the i-th source operand address, where i" is an integer greater than or equal to 1 and less than or equal to M.
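The two alternatives, narrowing by column first or by row first, reach the same general-purpose register. The sketch below models the register file as a nested list `file[row][column][slot]` whose entries are register numbers; this data structure and the function names are illustrative assumptions, and indices are 0-based.

```python
def build_file(m: int, n: int, k: int):
    """Nested list file[row][col][slot] holding the interleaved register numbers."""
    return [[[(r * n + c) * k + s for s in range(k)] for c in range(n)] for r in range(m)]


def select_column_first(file, col: int, row: int, slot: int) -> int:
    column = [row_groups[col] for row_groups in file]  # narrow to one column (bank)
    return column[row][slot]                           # then pick by row and target


def select_row_first(file, col: int, row: int, slot: int) -> int:
    row_groups = file[row]                             # narrow to one row
    return row_groups[col][slot]                       # then pick by column and target


fv = build_file(m=64, n=4, k=2)
print(select_column_first(fv, 2, 0, 1), select_row_first(fv, 2, 0, 1))  # 5 5
```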
- accessing, through the result write-back unit, the general-purpose register corresponding to the j-th destination operand address in the M rows*N columns of general-purpose register groups according to the at least one column address bit, the at least one row address bit and the at least one target address bit included in the j-th destination operand address includes:
- determining the j'-th general-purpose register column to which the j-th destination operand address belongs, and accessing, in the j'-th general-purpose register column, the corresponding general-purpose register according to the at least one row address bit and the at least one target address bit included in the j-th destination operand address, where j' is an integer greater than or equal to 1 and less than or equal to N; or,
- determining the j"-th general-purpose register row to which the j-th destination operand address belongs, and accessing, in the j"-th general-purpose register row, the corresponding general-purpose register according to the at least one column address bit and the at least one target address bit included in the j-th destination operand address, where j" is an integer greater than or equal to 1 and less than or equal to M.
- any two different source operand addresses among the Y source operand addresses belong to different general-purpose register columns; any two different destination operand addresses among the X destination operand addresses belong to different general-purpose register columns.
- the processor further includes a temporary register, and the temporary register corresponds to a target general-purpose register; the target general-purpose register is one of the general-purpose registers in the M rows*N columns of general-purpose register groups; the temporary register stores the data in the target general-purpose register; the Y source operand addresses include a plurality of first source operand addresses, whose number is greater than a first threshold, and a second source operand address; the general-purpose register corresponding to the first source operand addresses is the target general-purpose register, and the second source operand address belongs to the general-purpose register row where the target general-purpose register is located; the method further includes:
- the processor further includes a temporary register, and the temporary register corresponds to a target general-purpose register; the target general-purpose register is one of the general-purpose registers in the M rows*N columns of general-purpose register groups; the temporary register stores the data in the target general-purpose register; the X destination operand addresses include a plurality of first destination operand addresses, whose number is greater than the first threshold, and a second destination operand address; the general-purpose register corresponding to the first destination operand addresses is the target general-purpose register, and the second destination operand address belongs to the general-purpose register row where the target general-purpose register is located; the method further includes:
- the corresponding temporary register is accessed according to the target destination operand address, and the corresponding calculation result is written back into the temporary register.
- the processor further includes an instruction acquisition unit connected to the instruction decoding unit; the method further includes:
- the access types of the Y source operand addresses and the X destination operand addresses are both direct access types or indirect access types.
- An embodiment of the present invention further provides a computer-readable storage medium, where the computer-readable storage medium may store a program, and when the program is executed by a processor, the processor may execute some or all of the steps of any one of the methods described in the foregoing method embodiments.
- Embodiments of the present invention further provide a computer program, where the computer program includes instructions, and when the computer program is executed by a multi-core processor, the processor can perform some or all of the steps of any one of the above method embodiments.
- the disclosed apparatus may be implemented in other manners.
- the apparatus embodiments described above are only illustrative; for example, the division of the above-mentioned units is only a logical function division, and other division methods may be used in actual implementation; for example, multiple units or components may be combined or integrated into another system, or some features can be ignored or not implemented.
- the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection of devices or units through some interfaces, and may be in electrical or other forms.
- the above-mentioned units described as separate components may or may not be physically separated, and components displayed as units may or may not be physical units, that is, may be located in one place, or may be distributed to multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution in this embodiment.
- each functional unit in each embodiment of the present invention may be integrated into one processing unit, or each unit may exist physically alone, or two or more units may be integrated into one unit.
- the above-mentioned integrated units may be implemented in the form of hardware, or may be implemented in the form of software functional units.
- if the integrated units are implemented in the form of software functional units and sold or used as independent products, they may be stored in a computer-readable storage medium.
- the technical solution of the present invention in essence, or the part that contributes to the prior art, or all or part of the technical solution, can be embodied in the form of a software product; the computer software product is stored in a storage medium and includes several instructions to cause a computer device (which may be a personal computer, a server, or a network device, etc., specifically a processor in the computer device) to execute all or part of the steps of the above methods in the various embodiments of the present invention.
- the aforementioned storage medium may include various media that can store program code, such as a USB flash drive, a removable hard disk, a magnetic disk, an optical disc, a read-only memory (ROM), or a random access memory (RAM).
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Executing Machine-Instructions (AREA)
Abstract
Disclosed in the present invention are a processor, a processing method, and a related device. The processor comprises a data selection unit, an instruction decoding unit connected to the data selection unit, and M rows*N columns of general-purpose register sets; the instruction decoding unit is used for decoding X input instructions, obtaining at least one source operand address of each of the X instructions, Y source operand addresses in total, and sending the Y source operand addresses to the data selection unit; each source operand address among the Y source operand addresses comprises at least one column address bit, at least one row address bit, and at least one target address bit; the data selection unit is used for accessing, according to the at least one column address bit, the at least one row address bit, and the at least one target address bit comprised in the i-th source operand address, the general-purpose register corresponding to the i-th source operand address in the M rows*N columns of general-purpose register sets, and obtaining a corresponding source operand. According to the present invention, the operand selection cost can be reduced.
Description
The present invention relates to the field of computer technology, and in particular, to a processor, a processing method, and a related device.
As the scale and complexity of data in every field keep growing, the demands on processor computing power keep rising. To improve the efficiency with which a processor executes instructions, the prior art mostly divides the operation of one instruction into multiple small steps, that is, an instruction pipeline. For example, a common five-stage pipeline may include an instruction fetch pipeline, an instruction decoding pipeline, a data selection pipeline, an execution pipeline, and a write-back pipeline. In general, the more pipeline stages there are, that is, the more small steps one instruction is divided into, the higher the instruction execution efficiency. In the data selection pipeline, data must be read from the corresponding general-purpose register (GPR) according to the source operand address of each instruction, to obtain the source operands that the instruction needs for its calculation. In the write-back pipeline, the calculation result of the instruction must be written back to the corresponding GPR; when a subsequent instruction needs the calculation result, it can obtain the result from that GPR, that is, the calculation result can serve as a source operand of the subsequent instruction, and so on.
In summary, a GPR with a small capacity (for example, a 32-byte GPR) often means that, when data cannot be written back to the GPR, the data is first written temporarily to the data memory and then read from the data memory back into the GPR when a subsequent instruction needs it. This causes data to be swapped frequently between the data memory and the GPR; if dependencies between instructions are involved, it causes a pipeline stall and degrades processor performance. However, if the GPR is simply enlarged to solve this frequent swapping, for example by using the entire data memory as the GPR (such as enlarging the GPR to 512 bytes), the operand selection range grows greatly (from the original 32-byte data range to a 512-byte data range), which brings a huge logic cost to the operand selection described above.
SUMMARY OF THE INVENTION
Embodiments of the present invention provide a processor, a processing method, and a related device, which can reduce the cost of operand selection and ensure the processing efficiency of the processor.
In a first aspect, an embodiment of the present invention provides a processor, including a data selection unit, an instruction decoding unit connected to the data selection unit, and M rows*N columns of general-purpose register groups, where each of the M rows*N columns of general-purpose register groups includes K general-purpose registers, and M, N, and K are integers greater than or equal to 1. The instruction decoding unit is configured to decode X input instructions, obtain at least one source operand address of each of the X instructions, Y source operand addresses in total, and send the Y source operand addresses to the data selection unit. Each of the Y source operand addresses includes at least one column address bit, at least one row address bit, and at least one target address bit; the at least one column address bit indicates the general-purpose register column to which the source operand address belongs, the at least one row address bit indicates the general-purpose register row to which the source operand address belongs, and the at least one target address bit indicates the correspondence between the source operand address and the t-th general-purpose register among the K general-purpose registers; X, Y, and t are integers greater than or equal to 1. The data selection unit is configured to access, according to the at least one column address bit, the at least one row address bit, and the at least one target address bit included in the i-th source operand address, the general-purpose register corresponding to the i-th source operand address in the M rows*N columns of general-purpose register groups, and obtain the corresponding source operand; the i-th source operand address is one of the Y source operand addresses, and i is an integer greater than or equal to 1 and less than or equal to Y.
An embodiment of the present invention provides a processor that reduces the operand selection cost by narrowing the operand selection range, specifically through the partitioning of the general-purpose register groups and the design of the source operand address. On the register side, the processor may include M rows*N columns of general-purpose register groups partitioned in advance, and each general-purpose register group may include at least one general-purpose register. On the address side, a source operand address in this embodiment may include at least one column address bit, at least one row address bit, and at least one target address bit. The at least one column address bit may indicate the general-purpose register column to which the source operand address belongs (for example, if the registers are divided into 4 columns, two binary column address bits can indicate the column), the at least one row address bit may indicate the general-purpose register row to which the source operand address belongs, and the at least one target address bit may indicate which general-purpose register within a given general-purpose register group the source operand address refers to. The processor can therefore perform operand selection within the general-purpose register row or general-purpose register column indicated by the address bits of each source operand address. For example, it can determine from the column address bits the general-purpose register column to which each source operand address belongs, and then, within that column, use the remaining row address bits and target address bits to determine the specific general-purpose register; or it can determine from the row address bits the general-purpose register row to which each source operand address belongs, and then, within that row, use the remaining column address bits and target address bits to determine the specific general-purpose register and read the data in it, that is, obtain the corresponding source operand. Taken together, by performing operand selection within a designated general-purpose register column or a designated general-purpose register row, this embodiment greatly narrows the selection range and lowers the logic cost of selection. By contrast, in the prior art, selecting operands directly over the entire data range (that is, over all general-purpose registers) incurs a very large selection cost, while extracting part of the data in advance and then selecting operands within that data range incurs additional hardware cost. Compared with those existing solutions, this embodiment can therefore, on the basis of the existing hardware structure, determine for each source operand address a smaller data range within the originally large data range and perform operand selection within that smaller range, which greatly reduces the selection cost and safeguards instruction execution efficiency and processor performance.
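The address split described above can be illustrated with a small software model. This is only a sketch under stated assumptions: the geometry (4 rows, 4 columns, 8 registers per group), the bit layout (row bits, then target bits, then column bits in the low positions), and the names `OperandAddress`, `decode`, and `read_operand` are all hypothetical and are not taken from the patent, which describes hardware rather than software.

```python
from dataclasses import dataclass

# Hypothetical geometry: 4 rows x 4 columns of register groups, 8 registers per group.
M_ROWS, N_COLS, K_REGS = 4, 4, 8
ROW_BITS, COL_BITS, TGT_BITS = 2, 2, 3  # log2 of the sizes above


@dataclass(frozen=True)
class OperandAddress:
    row: int      # which general-purpose register row
    col: int      # which general-purpose register column
    target: int   # which register inside the selected group


def decode(addr: int) -> OperandAddress:
    """Split a raw address into its fields (assumed layout: row | target | column)."""
    col = addr & ((1 << COL_BITS) - 1)
    target = (addr >> COL_BITS) & ((1 << TGT_BITS) - 1)
    row = (addr >> (COL_BITS + TGT_BITS)) & ((1 << ROW_BITS) - 1)
    return OperandAddress(row, col, target)


def read_operand(regs, addr: int) -> int:
    """Read regs[row][col][target], narrowing to one column before picking a register."""
    f = decode(addr)
    column = [regs[r][f.col] for r in range(M_ROWS)]  # only M*K candidates remain
    return column[f.row][f.target]


# Fill every register with a value that encodes its own position.
regs = [[[100 * r + 10 * c + k for k in range(K_REGS)] for c in range(N_COLS)]
        for r in range(M_ROWS)]
addr = (2 << 5) | (5 << 2) | 3  # row 2, target 5, column 3
print(decode(addr), read_operand(regs, addr))
```

In this model the column bits are consumed first, so the second step chooses among M*K registers instead of all M*N*K.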
In a possible implementation, the processor further includes an execution unit connected to the data selection unit, and the execution unit includes at least one arithmetic logic unit. The at least one arithmetic logic unit is configured to execute the X instructions based on the source operands corresponding to the Y source operand addresses, to obtain the respective calculation results of the X instructions.
In this embodiment of the present invention, the processor may further include an execution unit, and the execution unit includes one or more arithmetic logic units. Based on the obtained source operands, the one or more arithmetic logic units can execute the computing task of each of the multiple instructions, for example addition or subtraction, and obtain the respective calculation results of the multiple instructions, which further improves the overall execution efficiency of the instructions.
In a possible implementation, the processor further includes a result write-back unit connected to the execution unit. The instruction decoding unit is further configured to obtain the destination operand address of each of the X instructions, X destination operand addresses in total. Each of the X destination operand addresses includes the at least one column address bit, the at least one row address bit, and the at least one target address bit; the at least one column address bit indicates the general-purpose register column to which the destination operand address belongs, the at least one row address bit indicates the general-purpose register row to which the destination operand address belongs, and the at least one target address bit indicates the correspondence between the destination operand address and the t-th general-purpose register among the K general-purpose registers. The result write-back unit is configured to access, according to the at least one column address bit, the at least one row address bit, and the at least one target address bit included in the j-th destination operand address, the general-purpose register corresponding to the j-th destination operand address in the M rows*N columns of general-purpose register groups, and write the calculation result of the j-th instruction back to that general-purpose register; j is an integer greater than or equal to 1 and less than or equal to X.
In this embodiment of the present invention, the processor may further include a result write-back unit. In addition, a destination operand address in this embodiment may include at least one column address bit, at least one row address bit, and at least one target address bit. The at least one column address bit may indicate the general-purpose register column to which the destination operand address belongs (for example, if the registers are divided into 4 columns, two binary column address bits can indicate the column), the at least one row address bit may indicate the general-purpose register row to which the destination operand address belongs, and the at least one target address bit may indicate which general-purpose register within a given general-purpose register group the destination operand address refers to. The processor can therefore perform operand selection within the general-purpose register row or general-purpose register column indicated by the address bits of each destination operand address. For example, it can determine from the column address bits the general-purpose register column to which each destination operand address belongs, and then, within that column, use the remaining row address bits and target address bits to determine the specific general-purpose register; or it can determine from the row address bits the general-purpose register row to which each destination operand address belongs, and then, within that row, use the remaining column address bits and target address bits to determine the specific general-purpose register, and write the calculation result of each of the multiple instructions back to the corresponding general-purpose register. By performing operand selection within a designated general-purpose register column or a designated general-purpose register row in this way, the selection range is greatly narrowed and the logic cost of selection is lowered.
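The read, execute, and write-back steps described above can be modeled together for a single instruction. The following is a minimal sketch, assuming addresses are already split into `(row, col, target)` fields and assuming a hypothetical 2 rows x 4 columns register file with 2 registers per group; `make_regs`, `read`, `write_back`, and `execute_add` are illustrative names, not units defined by the patent.

```python
# Hypothetical register file: regs[row][col][k], 2 rows x 4 columns, 2 registers per group.
M_ROWS, N_COLS, K_REGS = 2, 4, 2


def make_regs():
    """Create a zero-initialised register file."""
    return [[[0] * K_REGS for _ in range(N_COLS)] for _ in range(M_ROWS)]


def read(regs, addr):
    """Data selection: fetch the source operand named by (row, col, target)."""
    row, col, target = addr
    return regs[row][col][target]


def write_back(regs, addr, value):
    """Result write-back: narrow to the destination column, then pick the register."""
    row, col, target = addr
    column = [regs[r][col] for r in range(M_ROWS)]  # references to the groups in this column
    column[row][target] = value


def execute_add(regs, src_a, src_b, dst):
    """Data selection, ALU addition, and result write-back for one instruction."""
    result = read(regs, src_a) + read(regs, src_b)
    write_back(regs, dst, result)
    return result


regs = make_regs()
write_back(regs, (0, 0, 1), 7)
write_back(regs, (1, 1, 0), 5)
execute_add(regs, (0, 0, 1), (1, 1, 0), (0, 2, 0))
print(regs[0][2][0])
```

The two sources and the destination in the usage lines sit in three different columns, so each column is asked for only one register.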
In a possible implementation, the data selection unit is specifically configured to: determine, according to the at least one column address bit included in the i-th source operand address, the i'-th general-purpose register column to which the i-th source operand address belongs, and, within the i'-th general-purpose register column, access the corresponding general-purpose register according to the at least one row address bit and the at least one target address bit included in the i-th source operand address, where i' is an integer greater than or equal to 1 and less than or equal to N; or determine, according to the at least one row address bit included in the i-th source operand address, the i''-th general-purpose register row to which the i-th source operand address belongs, and, within the i''-th general-purpose register row, access the corresponding general-purpose register according to the at least one column address bit and the at least one target address bit included in the i-th source operand address, where i'' is an integer greater than or equal to 1 and less than or equal to M.
In this embodiment of the present invention, the general-purpose register column to which each source operand address belongs can be determined from the column address bits, and then, within that column, the remaining row address bits and target address bits determine the specific general-purpose register. For example, the at least one column address bit determines that the source operand address belongs to column 1 of the general-purpose register groups; then, within column 1, the at least one row address bit and the at least one target address bit determine that the source operand address corresponds to the 2nd general-purpose register in the row-1 general-purpose register group. Alternatively, the general-purpose register row to which each source operand address belongs can be determined from the row address bits, and then, within that row, the remaining column address bits and target address bits determine the specific general-purpose register. For example, the at least one row address bit determines that the source operand address belongs to row 1 of the general-purpose register groups; then, within row 1, the at least one column address bit and the at least one target address bit determine that the source operand address corresponds to the 2nd general-purpose register in the column-1 general-purpose register group. This completes operand selection in the M rows*N columns of general-purpose register groups according to the address bits of the source operand address, that is, it determines the specific general-purpose register corresponding to the source operand address, and greatly reduces the selection cost. It should be noted that the embodiments of this application aim to partition the registers into rows and columns and, during operand selection, to select based on the row or column indicated by the address bits so as to narrow the selection range; the embodiments therefore do not specifically limit whether the selection range is narrowed by row or by column. It can be understood that, in some possible implementations, if each general-purpose register group includes only one general-purpose register, the source operand address may omit the target address bits.
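The effect of narrowing by column or by row can be made concrete with a count of candidate registers. This is a rough sketch, assuming a register file modeled as `regs[row][col][k]`; the candidate count is only a proxy for selection logic cost, not a hardware estimate, and the function names are hypothetical.

```python
def candidates(m_rows, n_cols, k_regs):
    """Number of registers a selector must choose among under each strategy."""
    return {
        "flat": m_rows * n_cols * k_regs,   # select over the whole register file
        "column_first": m_rows * k_regs,    # after the column bits fix one column
        "row_first": n_cols * k_regs,       # after the row bits fix one row
    }


def select_column_first(regs, row, col, target):
    """Fix the column, then use the row and target fields."""
    column = [regs[r][col] for r in range(len(regs))]
    return column[row][target]


def select_row_first(regs, row, col, target):
    """Fix the row, then use the column and target fields."""
    row_groups = regs[row]
    return row_groups[col][target]


print(candidates(4, 4, 8))
```

Both orders reach the same register; they differ only in which field is consumed first, which matches the statement above that the choice between rows and columns is not limited.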
In a possible implementation, the result write-back unit is specifically configured to: determine, according to the at least one column address bit included in the j-th destination operand address, the j'-th general-purpose register column to which the j-th destination operand address belongs, and, within the j'-th general-purpose register column, access the corresponding general-purpose register according to the at least one row address bit and the at least one target address bit included in the j-th destination operand address, where j' is an integer greater than or equal to 1 and less than or equal to N; or determine, according to the at least one row address bit included in the j-th destination operand address, the j''-th general-purpose register row to which the j-th destination operand address belongs, and, within the j''-th general-purpose register row, access the corresponding general-purpose register according to the at least one column address bit and the at least one target address bit included in the j-th destination operand address, where j'' is an integer greater than or equal to 1 and less than or equal to M.
In this embodiment of the present invention, the general-purpose register column to which each destination operand address belongs can be determined from the column address bits, and then, within that column, the remaining row address bits and target address bits determine the specific general-purpose register. For example, the at least one column address bit determines that the destination operand address belongs to column 1 of the general-purpose register groups; then, within column 1, the at least one row address bit and the at least one target address bit determine that the destination operand address corresponds to the 2nd general-purpose register in the row-1 general-purpose register group. Alternatively, the general-purpose register row to which each destination operand address belongs can be determined from the row address bits, and then, within that row, the remaining column address bits and target address bits determine the specific general-purpose register. For example, the at least one row address bit determines that the destination operand address belongs to row 1 of the general-purpose register groups; then, within row 1, the at least one column address bit and the at least one target address bit determine that the destination operand address corresponds to the 2nd general-purpose register in the column-1 general-purpose register group. This completes operand selection in the M rows*N columns of general-purpose register groups according to the address bits of the destination operand address, that is, it determines the specific general-purpose register corresponding to the destination operand address, and greatly reduces the selection cost.
In a possible implementation, any two different source operand addresses among the Y source operand addresses belong to different general-purpose register columns, and any two different destination operand addresses among the X destination operand addresses belong to different general-purpose register columns.
In this embodiment of the present invention, to further reduce the logic cost of selection, each general-purpose register column is constrained to output only one operand at a time. If the source operand addresses or destination operand addresses of multiple instructions corresponded to different general-purpose registers in the same column, the processor would have to access (that is, select) different general-purpose registers within one column and obtain the corresponding source operands or write the calculation results, which would increase the logic cost of selection and run directly counter to the technical problem that the present invention sets out to solve.
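The one-operand-per-column constraint can be checked with a few lines of code. The sketch below assumes addresses are given as `(row, col, target)` tuples for the operands handled in one cycle; `column_conflicts` is a hypothetical helper, not a unit described in the patent.

```python
from collections import defaultdict


def column_conflicts(addresses):
    """Return columns that would need more than one distinct register in the same cycle.

    Each address is a (row, col, target) tuple. Repeating the same address is
    allowed, because the column still outputs a single operand.
    """
    per_column = defaultdict(set)
    for row, col, target in addresses:
        per_column[col].add((row, target))
    return sorted(col for col, regs in per_column.items() if len(regs) > 1)


ok_batch = [(0, 0, 1), (1, 1, 0), (0, 0, 1)]              # column 0 is reused with the same register
bad_batch = [(0, 0, 1), (1, 0, 0), (0, 2, 0), (0, 2, 1)]  # columns 0 and 2 each need two registers
print(column_conflicts(ok_batch), column_conflicts(bad_batch))
```

An empty result means the batch respects the constraint; any listed column would need two selections at once.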
In a possible implementation, the processor further includes a temporary register corresponding to a target general-purpose register; the target general-purpose register is a general-purpose register in the M rows*N columns of general-purpose register groups, and the temporary register stores the data in the target general-purpose register. The Y source operand addresses include multiple identical first source operand addresses, whose number is greater than a first threshold, and a second source operand address; the general-purpose register corresponding to the first source operand address is the target general-purpose register, and the second source operand address belongs to the general-purpose register column in which the target general-purpose register is located. The data selection unit is further configured to access the corresponding temporary register according to the first source operand address and obtain the corresponding source operand.
In this embodiment of the present invention, the constraint rule set above to reduce the selection cost can easily lower instruction execution efficiency when data dependencies are handled. For example, if multiple instructions need the same source operand for their calculations, and the source operands of other instructions processed in parallel in the same layer (stage) correspond to different general-purpose registers in the same register column as the source operand of those instructions, then under the constraint rule the multiple instructions cannot be processed in parallel with the other instructions in that layer. The only option is to insert empty instructions and move the multiple instructions to the next layer, which lowers instruction utilization and instruction execution efficiency. The compiler can therefore replace a general-purpose register that is frequently referenced and updated with a temporary register that stores the data of that general-purpose register. The processor can access the corresponding temporary register according to the source operand address to obtain the corresponding source operand without being bound by the constraint rule, which safeguards instruction execution efficiency.
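One way a compiler might pick registers to mirror in a temporary register is sketched below. This is an assumption-laden illustration: the patent only states that an address occurring more often than a first threshold, with another address in the same column, is served from a temporary register; the function `temp_register_candidates` and the tuple representation `(row, col, target)` are hypothetical.

```python
from collections import Counter


def temp_register_candidates(addresses, threshold):
    """Pick addresses a compiler might map to a temporary register.

    An address qualifies when it appears more than `threshold` times and some
    other address in the batch falls in the same column.
    """
    counts = Counter(addresses)
    picked = []
    for addr, n in counts.items():
        shares_column = any(other != addr and other[1] == addr[1] for other in counts)
        if n > threshold and shares_column:
            picked.append(addr)
    return picked


# (0, 0, 1) is read three times, while (1, 0, 0) competes for the same column 0.
batch = [(0, 0, 1), (0, 0, 1), (0, 0, 1), (1, 0, 0), (0, 1, 0)]
print(temp_register_candidates(batch, threshold=2))
```

Reads of a picked address would then go to the temporary register, leaving the column free to serve the competing address in the same layer.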
In a possible implementation, the processor further includes a temporary register corresponding to a target general-purpose register; the target general-purpose register is a general-purpose register in the M rows*N columns of general-purpose register groups, and the temporary register stores the data in the target general-purpose register. The X destination operand addresses include multiple identical first destination operand addresses, whose number is greater than a first threshold, and a second destination operand address; the general-purpose register corresponding to the first destination operand address is the target general-purpose register, and the second destination operand address belongs to the general-purpose register column in which the target general-purpose register is located. The result write-back unit is further configured to access the corresponding temporary register according to the target destination operand address and write the corresponding calculation result back to the temporary register.
In this embodiment of the present invention, the constraint rule set above to reduce the selection cost can easily lower instruction execution efficiency when data dependencies are handled. For example, if multiple instructions need the same source operand for their calculations, and the destination operands of other instructions processed in parallel in the same layer correspond to different general-purpose registers in the same register column as the destination operand of those instructions, then under the constraint rule the multiple instructions cannot be processed in parallel with the other instructions in that layer. The only option is to insert empty instructions and move the multiple instructions to the next layer, which lowers instruction utilization and instruction execution efficiency. The compiler can therefore replace a general-purpose register that is frequently referenced and updated with a temporary register that stores the data of that general-purpose register. The processor can access the corresponding temporary register according to the destination operand address and write the corresponding calculation result back to the temporary register without being bound by the constraint rule, which safeguards instruction execution efficiency.
In a possible implementation, the processor further includes an instruction acquisition unit connected to the instruction decoding unit, where the instruction acquisition unit is configured to acquire the X instructions to be executed and send the X instructions to the instruction decoding unit; the X instructions are instructions that the processor executes in parallel within one clock cycle.
In this embodiment of the present invention, the processor may further include an instruction acquisition unit through which multiple instructions to be executed can be acquired. These instructions may belong to the same layer, that is, they may be instructions that the processor executes in parallel within one clock cycle (one beat), which improves instruction execution efficiency.
In a possible implementation, the access types of the Y source operand addresses and the X destination operand addresses are all the direct access type or all the indirect access type.
In this embodiment of the present invention, direct access yields the operand address immediately, whereas indirect access obtains the operand address only after a variable is added to a base address. As a result, the operand address finally obtained by indirect access is quite likely to differ from a directly accessed operand address while both belong to the same general-purpose register column. In that case, two different operands must be selected from the same general-purpose register column at once, which increases the logic cost of operand selection. If, however, all accesses are direct or all are indirect, instructions whose operand addresses differ but belong to the same general-purpose register column can be placed in different layers according to the constraint rule described above. This avoids in advance the need to select two different operands from the same general-purpose register column at once and reduces the logic cost of operand selection.
According to a second aspect, an embodiment of the present invention provides a processing method applied to a processor, where the processor includes a data selection unit, an instruction decoding unit connected to the data selection unit, and M-row*N-column general-purpose register groups; each of the M-row*N-column general-purpose register groups includes K general-purpose registers; M, N, and K are integers greater than or equal to 1; the method includes:
decoding, by the instruction decoding unit, X input instructions to obtain at least one source operand address of each of the X instructions, Y source operand addresses in total, and sending the Y source operand addresses to the data selection unit; each of the Y source operand addresses includes at least one column address bit, at least one row address bit, and at least one target address bit; the at least one column address bit indicates the general-purpose register column to which the source operand address belongs, and the at least one row address bit indicates the general-purpose register row to which the source operand address belongs; the at least one target address bit indicates the correspondence between the source operand address and the t-th general-purpose register of the K general-purpose registers; X, Y, and t are integers greater than or equal to 1;
accessing, by the data selection unit, the general-purpose register corresponding to the i-th source operand address in the M-row*N-column general-purpose register groups according to the at least one column address bit, the at least one row address bit, and the at least one target address bit included in the i-th source operand address, to obtain the corresponding source operand; the i-th source operand address is one of the Y source operand addresses; i is an integer greater than or equal to 1 and less than or equal to Y.
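The address decomposition in the two steps above can be illustrated with a minimal Python sketch. The sizes M, N, and K, the field widths, the order of the fields within an address (row, then target, then column), and the names `split_address` and `select_operand` are all assumptions made for this sketch; the embodiments only require that each address contain at least one bit of each kind.

```python
# Illustrative sketch only: sizes, field widths, and field order are assumptions.
M, N, K = 4, 8, 2                        # rows, columns, registers per group
COL_BITS, TGT_BITS, ROW_BITS = 3, 1, 2   # log2(N), log2(K), log2(M)


def split_address(addr):
    """Split an operand address into (row, column, target) fields.

    Assumed layout, most significant bits first: row | target | column.
    """
    col = addr & ((1 << COL_BITS) - 1)
    tgt = (addr >> COL_BITS) & ((1 << TGT_BITS) - 1)
    row = (addr >> (COL_BITS + TGT_BITS)) & ((1 << ROW_BITS) - 1)
    return row, col, tgt


# Register file: M rows x N columns of groups, each group holding K registers.
# Every register is preloaded with its own address so that reads are easy to check.
regfile = [[[0] * K for _ in range(N)] for _ in range(M)]
for a in range(M * N * K):
    r, c, t = split_address(a)
    regfile[r][c][t] = a


def select_operand(addr):
    """Model of the data selection step: decode the fields, then read the register."""
    row, col, tgt = split_address(addr)
    return regfile[row][col][tgt]
```

With this hypothetical layout, the address 0b101011 decodes to row 2, target 1, column 3, and every one of the M*N*K addresses maps to a distinct register.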
In a possible implementation, the processor further includes an execution unit connected to the data selection unit, and the execution unit includes at least one arithmetic logic unit; the method further includes:
executing, by the at least one arithmetic logic unit, the X instructions based on the source operands corresponding to the Y source operand addresses, to obtain a calculation result of each of the X instructions.
In a possible implementation, the processor further includes a result write-back unit connected to the execution unit; the method further includes:
obtaining, by the instruction decoding unit, the destination operand address of each of the X instructions, X destination operand addresses in total; each of the X destination operand addresses includes the at least one column address bit, the at least one row address bit, and the at least one target address bit; the at least one column address bit indicates the general-purpose register column to which the destination operand address belongs, and the at least one row address bit indicates the general-purpose register row to which the destination operand address belongs; the at least one target address bit indicates the correspondence between the destination operand address and the t-th general-purpose register of the K general-purpose registers;
accessing, by the result write-back unit, the general-purpose register corresponding to the j-th destination operand address in the M-row*N-column general-purpose register groups according to the at least one column address bit, the at least one row address bit, and the at least one target address bit included in the j-th destination operand address, and writing the calculation result of the j-th instruction back into the general-purpose register corresponding to the j-th destination operand address; j is an integer greater than or equal to 1 and less than or equal to X.
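As a rough illustration of how selection, execution, and write-back fit together within one cycle, the following Python sketch reads all source operands, computes all results, and only then writes them back. The instruction tuple format, the 16-bit result width, the address layout, and the names `run_cycle`, `read`, and `write` are assumptions made for this sketch and are not taken from the embodiments.

```python
import operator

# Illustrative sketch only: sizes, address layout, and instruction format are assumptions.
M, N, K = 2, 4, 2
regfile = [[[0] * K for _ in range(N)] for _ in range(M)]


def split_address(addr):
    """Assumed layout, most significant bits first: row | target | column."""
    return addr >> 3, addr & 0b11, (addr >> 2) & 0b1   # (row, column, target)


def read(addr):
    row, col, tgt = split_address(addr)
    return regfile[row][col][tgt]


def write(addr, value):
    row, col, tgt = split_address(addr)
    regfile[row][col][tgt] = value


def run_cycle(instructions):
    """Run X instructions in one cycle.

    Each instruction is a tuple (alu_op, src_a_addr, src_b_addr, dst_addr).
    All sources are read before any result is written back.
    """
    operands = [(read(a), read(b)) for _, a, b, _ in instructions]
    results = [op(x, y) & 0xFFFF                      # results kept to 16 bits
               for (op, _, _, _), (x, y) in zip(instructions, operands)]
    for (_, _, _, dst), value in zip(instructions, results):
        write(dst, value)
    return results


write(0, 5)
write(1, 7)
cycle_results = run_cycle([(operator.add, 0, 1, 2), (operator.sub, 1, 0, 3)])
```

Here the two hypothetical instructions use destination addresses 2 and 3, which fall in different columns under the assumed layout.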
In a possible implementation, the accessing, by the data selection unit, of the general-purpose register corresponding to the i-th source operand address in the M-row*N-column general-purpose register groups according to the at least one column address bit, the at least one row address bit, and the at least one target address bit included in the i-th source operand address includes:
determining, by the data selection unit according to the at least one column address bit included in the i-th source operand address, the i’-th general-purpose register column to which the i-th source operand address belongs, and accessing, in the i’-th general-purpose register column, the corresponding general-purpose register according to the at least one row address bit and the at least one target address bit included in the i-th source operand address, where i’ is an integer greater than or equal to 1 and less than or equal to N; or
determining, by the data selection unit according to the at least one row address bit included in the i-th source operand address, the i”-th general-purpose register row to which the i-th source operand address belongs, and accessing, in the i”-th general-purpose register row, the corresponding general-purpose register according to the at least one column address bit and the at least one target address bit included in the i-th source operand address, where i” is an integer greater than or equal to 1 and less than or equal to M.
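The two alternatives above differ only in which field narrows the search first. A minimal Python sketch of both orders follows; the register file sizes and the function names are assumptions made for illustration, and the fields are passed in already decoded.

```python
# Illustrative sketch only: sizes and names are assumptions.
M, N, K = 4, 4, 2

# Each register holds its own (row, column, target) coordinates for easy checking.
regfile = [[[(r, c, t) for t in range(K)] for c in range(N)] for r in range(M)]


def select_column_first(row, col, tgt):
    """Stage 1 narrows to one register column; stage 2 uses row and target bits."""
    column = [regfile[r][col] for r in range(M)]   # the M groups of this column
    return column[row][tgt]


def select_row_first(row, col, tgt):
    """Stage 1 narrows to one register row; stage 2 uses column and target bits."""
    row_groups = regfile[row]                      # the N groups of this row
    return row_groups[col][tgt]
```

Both orders reach the same register; which one is preferable in hardware would depend on how the selection logic is organised, which this sketch does not model.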
In a possible implementation, the accessing, by the result write-back unit, of the general-purpose register corresponding to the j-th destination operand address in the M-row*N-column general-purpose register groups according to the at least one column address bit, the at least one row address bit, and the at least one target address bit included in the j-th destination operand address includes:
determining, by the result write-back unit according to the at least one column address bit included in the j-th destination operand address, the j’-th general-purpose register column to which the j-th destination operand address belongs, and accessing, in the j’-th general-purpose register column, the corresponding general-purpose register according to the at least one row address bit and the at least one target address bit included in the j-th destination operand address, where j’ is an integer greater than or equal to 1 and less than or equal to N; or
determining, by the result write-back unit according to the at least one row address bit included in the j-th destination operand address, the j”-th general-purpose register row to which the j-th destination operand address belongs, and accessing, in the j”-th general-purpose register row, the corresponding general-purpose register according to the at least one column address bit and the at least one target address bit included in the j-th destination operand address, where j” is an integer greater than or equal to 1 and less than or equal to M.
In a possible implementation, any two different source operand addresses among the Y source operand addresses belong to different general-purpose register columns, and any two different destination operand addresses among the X destination operand addresses belong to different general-purpose register columns.
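A compiler or checker could test this constraint roughly as in the following Python sketch. It assumes, purely for illustration, that the column field occupies the low-order bits of an address; the function names are hypothetical. Note that the same address appearing several times does not violate the rule, because the rule concerns only different addresses.

```python
def column_of(addr, col_bits=3):
    """Column index of an address, assuming the column field is the low-order bits."""
    return addr & ((1 << col_bits) - 1)


def violates_column_rule(addresses, col_bits=3):
    """Return True if two different addresses fall in the same register column."""
    first_seen = {}                      # column index -> first address seen there
    for addr in addresses:
        col = column_of(addr, col_bits)
        if col in first_seen and first_seen[col] != addr:
            return True
        first_seen.setdefault(col, addr)
    return False
```

For example, the addresses 0b001011 and 0b010011 both decode to column 3 under this assumed layout and would therefore have to be placed in different layers.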
In a possible implementation, the processor further includes a temporary register corresponding to a target general-purpose register; the target general-purpose register is a general-purpose register in the M-row*N-column general-purpose register groups; the temporary register stores the data held in the target general-purpose register; the Y source operand addresses include a plurality of identical first source operand addresses, whose number is greater than a first threshold, and a second source operand address; the general-purpose register corresponding to the first source operand address is the target general-purpose register, and the second source operand address belongs to the general-purpose register column in which the target general-purpose register is located; the method further includes:
accessing, by the data selection unit, the corresponding temporary register according to the first source operand address, to obtain the corresponding source operand.
In a possible implementation, the processor further includes a temporary register corresponding to a target general-purpose register; the target general-purpose register is a general-purpose register in the M-row*N-column general-purpose register groups; the temporary register stores the data held in the target general-purpose register; the X destination operand addresses include a plurality of identical first destination operand addresses, whose number is greater than a first threshold, and a second destination operand address; the general-purpose register corresponding to the first destination operand address is the target general-purpose register, and the second destination operand address belongs to the general-purpose register column in which the target general-purpose register is located; the method further includes:
accessing, by the result write-back unit, the corresponding temporary register according to the target destination operand address, and writing the corresponding calculation result back into the temporary register.
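The role of the temporary register in the two implementations above can be sketched in Python as a shadow copy that intercepts reads and writes aimed at the target general-purpose register. The class and method names are hypothetical, the register file is flattened to a simple list, and the sketch deliberately leaves open when (or whether) the temporary value is copied back, since this passage does not say.

```python
class RegisterFileWithTemp:
    """Flat register file with one temporary register shadowing a hot register."""

    def __init__(self, size):
        self.gpr = [0] * size
        self.temp_addr = None    # address of the shadowed target register, if any
        self.temp = 0

    def attach_temp(self, addr):
        """Bind the temporary register to a target register and copy its data."""
        self.temp_addr = addr
        self.temp = self.gpr[addr]

    def read(self, addr):
        # Accesses to the target register are served by the temporary register,
        # so they do not compete for the target register's column.
        return self.temp if addr == self.temp_addr else self.gpr[addr]

    def write(self, addr, value):
        if addr == self.temp_addr:
            self.temp = value
        else:
            self.gpr[addr] = value


rf = RegisterFileWithTemp(16)
rf.gpr[5] = 42
rf.attach_temp(5)
value_before = rf.read(5)
rf.write(5, 7)
```

After the write, `rf.read(5)` returns 7 from the temporary register while `rf.gpr[5]` still holds 42 in this sketch.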
In a possible implementation, the processor further includes an instruction acquisition unit connected to the instruction decoding unit; the method further includes:
acquiring, by the instruction acquisition unit, the X instructions to be executed, and sending the X instructions to the instruction decoding unit; the X instructions are instructions that the processor executes in parallel within one clock cycle.
In a possible implementation, the access types of the Y source operand addresses and the X destination operand addresses are all the direct access type or all the indirect access type.
According to a third aspect, the present invention provides a semiconductor chip, which may include the processor provided by any implementation of the first aspect.
According to a fourth aspect, the present invention provides a semiconductor chip, which may include the processor provided by any implementation of the first aspect, an internal memory coupled to the multi-core processor, and an external memory.
According to a fifth aspect, the present invention provides a system-on-chip (SoC) chip, which includes the processor provided by any implementation of the first aspect, and an internal memory and an external memory coupled to the processor. The SoC chip may consist of a chip, or may include a chip and other discrete devices.
According to a sixth aspect, the present invention provides a chip system, which includes the multi-core processor provided by any implementation of the first aspect. In a possible design, the chip system further includes a memory configured to store the program instructions and data that are necessary for or related to the operation of the multi-core processor. The chip system may consist of a chip, or may include a chip and other discrete devices.
According to a seventh aspect, the present invention provides a processing apparatus that has the function of implementing any processing method of the second aspect. The function may be implemented by hardware, or by hardware executing corresponding software. The hardware or software includes one or more modules corresponding to the function.
According to an eighth aspect, the present invention provides a terminal that includes a processor, where the processor is the processor provided by any implementation of the first aspect. The terminal may further include a memory that is coupled to the processor and stores the program instructions and data necessary for the terminal. The terminal may further include a communication interface through which the terminal communicates with other devices or a communication network.
According to a ninth aspect, the present invention provides a computer-readable storage medium that stores a computer program; when the computer program is executed by a processor, the processing method procedure according to any item of the second aspect is implemented.
According to a tenth aspect, an embodiment of the present invention provides a computer program that includes instructions; when the computer program is executed by a processor, the processor is enabled to perform the processing method procedure according to any item of the second aspect.
FIG. 1 is a schematic diagram of an IE structure in the prior art.
FIG. 2 is a schematic diagram of another IE structure in the prior art.
FIG. 3 is a schematic diagram of program splitting in the prior art.
FIG. 4 is a schematic diagram of instruction processing based on PV extraction in the prior art.
FIG. 5 is a schematic diagram of FV-based instruction processing according to an embodiment of the present invention.
FIG. 6 is a schematic structural diagram of a processor according to an embodiment of the present invention.
FIG. 7 is a schematic structural diagram of another processor according to an embodiment of the present invention.
FIG. 8a to FIG. 8e are schematic diagrams of a manner of dividing general-purpose registers according to an embodiment of the present invention.
FIG. 9 is a schematic diagram of data selection according to an embodiment of the present invention.
FIG. 10 is a schematic diagram of result write-back according to an embodiment of the present invention.
FIG. 11 is a schematic flowchart of a processing method according to an embodiment of the present invention.
The embodiments of the present invention are described below with reference to the accompanying drawings in the embodiments of the present invention.
The terms "first", "second", "third", "fourth", and the like in the specification, the claims, and the accompanying drawings of the present invention are used to distinguish different objects rather than to describe a particular order. In addition, the "rows" and "columns" mentioned in the specification, claims, and drawings of this application are not necessarily physical rows and columns; that is, the M-row*N-column register groups in the embodiments of this application are not necessarily laid out in strictly perpendicular rows and columns in the actual circuit. In some possible implementations, "rows" and "columns" may be defined in terms of vectors. For example, the N columns of general-purpose register groups in the embodiments of this application may correspond to N column vectors, and the M rows of general-purpose register groups may correspond to M row vectors, where each element of a vector may represent one general-purpose register group. Furthermore, the terms "include" and "have" and any variants thereof are intended to cover non-exclusive inclusion. For example, a process, method, system, product, or device that includes a series of steps or units is not limited to the listed steps or units, but optionally further includes steps or units that are not listed, or optionally further includes other steps or units inherent to the process, method, product, or device.
Reference to an "embodiment" in this document means that a particular feature, structure, or characteristic described in connection with the embodiment may be included in at least one embodiment of the present invention. The appearances of this phrase in various places in the specification do not necessarily all refer to the same embodiment, nor to separate or alternative embodiments that are mutually exclusive of other embodiments. Persons skilled in the art understand, explicitly and implicitly, that the embodiments described herein may be combined with other embodiments.
The terms "component", "module", "system", and the like used in this specification denote a computer-related entity: hardware, firmware, a combination of hardware and software, software, or software in execution. For example, a component may be, but is not limited to, a process running on a processor, a processor, an object, an executable file, a thread of execution, a program, and/or a computer. By way of illustration, both an application running on a computing device and the computing device itself may be components. One or more components may reside within a process and/or a thread of execution, and a component may be located on one computer and/or distributed between two or more computers. In addition, these components can execute from various computer-readable media having various data structures stored thereon. The components may communicate by way of local and/or remote processes, for example according to a signal having one or more data packets (for example, data from two components interacting with another component in a local system, in a distributed system, and/or across a network such as the Internet, which interacts with other systems by way of the signal).
First, some terms used in the present invention are explained to help persons skilled in the art understand them.
(1) An instruction pipeline is a technique for improving the efficiency with which a processor executes instructions: the operation of one instruction is divided into multiple small steps, each completed by a dedicated circuit. For example, suppose an instruction passes through three stages to execute, namely instruction fetch, decode, and execute, and each stage takes one machine cycle. Without pipelining, executing the instruction takes three machine cycles. With pipelining, as soon as the instruction finishes "fetch" and enters "decode", the next instruction can begin its "fetch", which improves instruction execution efficiency. In general, the more pipeline stages there are, that is, the more small steps an instruction is divided into, the higher the instruction execution efficiency.
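The gain described in the three-stage example can be made concrete with a short Python calculation. It assumes an idealised pipeline, one machine cycle per stage and no stalls or hazards, which is a simplification; the function names are chosen here for illustration.

```python
def cycles_without_pipeline(num_instructions, stages=3):
    """Each instruction occupies the processor for all of its stages in turn."""
    return num_instructions * stages


def cycles_with_pipeline(num_instructions, stages=3):
    """Ideal pipeline: the first instruction takes `stages` cycles to complete,
    and each later instruction completes one cycle after the previous one."""
    if num_instructions == 0:
        return 0
    return stages + (num_instructions - 1)
```

Under these assumptions, ten instructions take 30 cycles without pipelining and 12 cycles with a three-stage pipeline; a single instruction takes 3 cycles either way.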
Refer to FIG. 1, which is a schematic diagram of an instruction processing engine (Instruction Engine, IE) structure in the prior art. The IE structure of a processor may be as shown in FIG. 1, which takes a processor with a five-stage pipeline as an example. In this pipeline structure, the life cycle of an instruction may include the instruction fetch pipeline → instruction decode pipeline → data selection pipeline → execute pipeline → write-back pipeline; that is, the pipeline structure divides the execution of one instruction into at least five stages. The basic function of each pipeline stage is as follows:
Instruction fetch pipeline: instruction fetching (Instruction Fetching, IF) reads an instruction from the instruction memory (Instruction Memory, IMEM) using the program counter (Program Counter, PC) as the address, and after verification sends the instruction to the instruction decoder (Instruction Decoder, ID);
Instruction decode pipeline: the instruction decoder decodes the input instruction and checks its validity, and then extracts the command type, operand type, immediate value, source operand addresses, and destination operand address of the instruction;
Data selection pipeline: the data selector (Data Selector, DS) selects data from the general-purpose registers (General-Purpose Register, GPR) according to the source operand addresses obtained in the instruction decode pipeline, and data dependency analysis must be performed at the same time; the data selector is sometimes also called a multiplexer (Mux); "4×2×16 bit src operands (source operands)" in FIG. 1 indicates that the processor can process 4 instructions in parallel per cycle and that each instruction may include two source operands, so that 8 source operands can be obtained from the general-purpose registers per cycle, where each source operand may be 16 bits, that is, 2 bytes, in size; details are not repeated here;
Execute pipeline (Execute, EX): multiple arithmetic logic units perform the corresponding arithmetic, logic, shift, and other operations according to the command type and the obtained operands, and output the operation results; instruction execution is the process of actually performing the operation of the instruction. For example, if the instruction is an addition instruction, the operands are added; if it is a subtraction instruction, the operands are subtracted; details are not repeated here;
Write-back pipeline: write-back (Write-Back, WB) is the process of writing the result of instruction execution back to the general-purpose register groups, or writing it into the data memory, or reading data from the data memory and writing it back to the GPR, that is, performing the load/store shown in FIG. 1; "4×16bit dst operands (destination operands)" in FIG. 1 indicates that the processor can process 4 instructions in parallel per cycle and that each instruction may include one destination operand, so that the calculation results of 4 instructions can be written into the corresponding general-purpose registers per cycle, where each destination operand (that is, each calculation result) may be 16 bits, that is, 2 bytes, in size; details are not repeated here;
It can be understood that the processor architecture and the processor pipeline structure described above are merely example implementations provided by the embodiments of the present invention; the processor architecture and the processor pipeline structure in the embodiments of the present invention include but are not limited to the foregoing implementations.
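The operand widths annotated in FIG. 1 imply a fixed amount of data moved per cycle, which the following short Python calculation works out. The function name is chosen here for illustration; the numbers 4, 2, 1, and 16 come from the FIG. 1 annotations quoted above.

```python
def operand_bytes_per_cycle(instructions, operands_per_instruction, bits_per_operand):
    """Total operand data handled in one cycle, in bytes."""
    return instructions * operands_per_instruction * bits_per_operand // 8


# "4 x 2 x 16 bit src operands": 8 source operands read per cycle.
src_bytes = operand_bytes_per_cycle(4, 2, 16)

# "4 x 16 bit dst operands": 4 results written back per cycle.
dst_bytes = operand_bytes_per_cycle(4, 1, 16)
```

This gives 16 bytes of source operands read and 8 bytes of results written per cycle for the configuration in FIG. 1.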
Clearly, as shown in FIG. 1, the GPR capacity is only 32 bytes. Because the capacity is small, in the write-back pipeline a calculation result that cannot be written back to the GPR is often first written temporarily into the data memory and is read from the data memory into the GPR when a subsequent instruction needs it. This causes data to be swapped frequently between the data memory and the GPR, which lowers instruction execution efficiency and may even stall the pipeline. One solution for this situation is to use the entire original data memory as the GPR, that is, to enlarge the GPR so that it holds the full vector (Full Vector, FV), which is the full range of operand selection. Refer to FIG. 2, which is a schematic diagram of another IE structure in the prior art. As shown in FIG. 2, compared with the GPR in FIG. 1, whose capacity is only 32 bytes, the GPR in FIG. 2 is enlarged to 512 bytes, which to some extent ensures that all data can be stored in the GPR without frequently swapping data between the GPR and the data memory. However, as shown in FIG. 2, directly enlarging the GPR means that in the data selection pipeline the processor must select within the 512-byte data range according to the decoded source operand address to obtain the corresponding source operand, which greatly increases the logic cost of operand selection.
In summary, to facilitate understanding of the embodiments of the present invention, the technical problems to be solved by the present invention are further analyzed and set out below. The prior art includes a variety of technical solutions for narrowing the selection range and reducing the logic cost of operand selection; one commonly used solution is described below as an example.
Scheme 1: Extract a partial vector (Partial Vector, PV) from the FV, use the PV as the operand selection range, and select operands from the PV.
To improve the ability to execute instructions in parallel, current processors often adopt a very long instruction word (Very Long Instruction Word, VLIW) architecture, in which multiple instructions are issued in parallel, to accelerate instruction execution. For an FV with a large bit width (for example, the 512-byte FV shown in FIG. 2, or possibly even a 1024-byte FV), the selection cost is then further amplified.
To solve the problem of the high selection cost incurred by using the entire FV as the operand selection range, a solution has been proposed in which a PV is extracted from the FV and used as the operand range for the VLIW. The general idea is that the compiler splits the program into a number of small, interconnected program blocks. While one program block is executing, the entry of the next program block is specified, and by repeating this the program follows one program path to complete the specified processing. Please refer to FIG. 3, which is a schematic diagram of program splitting in the prior art. As shown in FIG. 3, each circle represents a program block, and the lines between program blocks represent program paths. For the source operands to be referenced and the destination operands to be written back by each instruction to be executed, the compiler generates the PV in advance through partial vector construction (PV Builder, PVB). The PV is then used as the selection range for the source and destination operands of the IE, and after the program block has been executed the PV is restored to the FV through full-range vector construction (FV Builder, FVB). Please refer to FIG. 4, which is a schematic diagram of instruction processing based on PV extraction in the prior art. As shown in FIG. 4, the FV is 512 bytes and the extracted PV is only 128 bytes. Optionally, the extracted PV may be stored in a 128-byte general-purpose register group (for example, if one general-purpose register stores 1 byte of data, the group includes at least 128 general-purpose registers). In this way the selection range is narrowed and the logic cost of selection is reduced.
Disadvantages of Scheme 1:
1. A PVB must be added to extract the PV from the FV, and an FVB must be added to restore the PV to the FV, which significantly increases circuit area and power consumption;
2. The access types of the source and destination operands may include direct access and indirect access, so PV extraction easily causes indirect and direct accesses to trample on each other in the general-purpose registers;
3. To reduce the cost of PVB extraction and of indirect addressing (the PVB extracts segments aligned to 4 bytes), indirect addressing is forced to support only operands of 1-byte granularity. An instruction with a 2-byte operand therefore has to be split into two instructions, which lowers instruction execution efficiency and increases the number of instructions, that is, the instruction space;
4. To save instruction space and improve the cache hit rate, hardware is needed to translate FV-based instruction operands into PV-based instruction operands, which further increases hardware cost.
In summary, although Scheme 1 of the prior art narrows the selection range through PV extraction and reduces the selection cost to some extent, it also increases cost and introduces other problems, such as the increased instruction space and reduced instruction execution efficiency noted above. It does not truly solve the logic cost of operand selection, nor can it guarantee instruction execution efficiency. Therefore, to address the failure of current processing technology to meet actual service requirements, the technical problems to be solved by the present invention include the following:
1. Based on the original FV, narrow the selection range and reduce the selection cost, effectively avoiding the cost increase introduced by PV extraction and PV restoration. Please refer to FIG. 5, which is a schematic diagram of FV-based instruction processing provided by an embodiment of the present invention. As shown in FIG. 5, based on the original FV and without adding PV extraction and PV restoration, the selection range of the source and destination operands is constrained, reducing the logic cost of data selection;
2. Effectively solve the problem, caused by PV extraction, of indirect and direct accesses trampling on each other's registers;
3. Effectively save instruction space and improve the cache hit rate, and eliminate the overhead of hardware translation from FV-based instruction operands to PV-based instruction operands;
4. In Scheme 1, different program blocks (that is, instruction bundles) have different extracted PVs, so free jumps between instruction blocks are not possible. Jump instructions therefore need to be able to jump between different instruction blocks without constraint, which effectively removes the cost of requiring dedicated branch acceleration for program jumps.
Based on the above processor architecture and processor pipeline structure, the present invention provides a processor. Please refer to FIG. 6, which is a schematic structural diagram of a processor provided by an embodiment of the present invention. The processor 10 may be located in any electronic device, such as a computer, a mobile phone, a tablet, a personal digital assistant, a smart wearable device, a smart in-vehicle device, or a smart home appliance. The processor 10 may specifically be a chip, a chipset, or a circuit board carrying a chip or chipset. The chip, the chipset, or the circuit board carrying the chip or chipset can operate under the necessary software drivers.
Specifically, the processor 10 may include an instruction decoding unit 103, a data selection unit 104, and M rows * N columns of general-purpose register groups 107 connected to the data selection unit 104. Each general-purpose register group in the M rows * N columns of register groups 107 may include K general-purpose registers, where M, N and K are integers greater than or equal to 1. The instruction decoding unit 103 is connected to the data selection unit 104 and operates in the instruction decoding pipeline stage of the processor 10: it decodes the X instructions to be executed, extracts at least one source operand address from each of the X instructions (Y source operand addresses in total), and sends the Y source operand addresses to the data selection unit 104. X and Y are integers greater than or equal to 1. The data selection unit 104 operates in the data selection pipeline stage of the processor 10: according to each source operand address, it selects the corresponding general-purpose register from the M rows * N columns of register groups 107 and obtains the corresponding source operand. Specifically, each source operand address obtained by the instruction decoding unit 103 may include at least one column address bit, at least one row address bit and at least one target address bit. The at least one column address bit indicates the general-purpose register column to which the source operand address belongs, the at least one row address bit indicates the general-purpose register row to which it belongs, and the at least one target address bit indicates that the source operand address corresponds to the t-th of the K general-purpose registers in the group, where t is an integer greater than or equal to 1 and less than or equal to K. For example, if M and N are both 4 and K is 2, that is, there are 4 rows * 4 columns of general-purpose register groups and each group includes 2 general-purpose registers, then each source operand address may include 2 column address bits, 2 row address bits and 1 target address bit. The 2 column address bits may take the value 00, 01, 10 or 11, where 00 indicates that the source operand address belongs to the first general-purpose register column, 01 the second, 10 the third, and 11 the fourth; details are not repeated here. Optionally, if the i-th of the Y source operand addresses includes 2 column address bits, 2 row address bits and 1 target address bit with the values 10, 11 and 1 respectively, the data selection unit 104 may access the corresponding general-purpose register in the M rows * N columns of general-purpose register groups 107 according to these bits. For example, the data selection unit 104 may, within the third general-purpose register column, access the second general-purpose register in the fourth general-purpose register row to obtain the corresponding source operand; or, within the fourth general-purpose register row, access the second general-purpose register in the third general-purpose register column to obtain the corresponding source operand. Here i is an integer greater than or equal to 1 and less than or equal to Y. In this way, based on the pre-divided M rows * N columns of register groups and the column, row and target address bits included in each source operand address, the corresponding general-purpose register is selected either from the general-purpose register column indicated by the column address bits or from the general-purpose register row indicated by the row address bits, and the corresponding source operand is obtained. This narrows the selection range and reduces the logic cost of selection.
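The 4 rows * 4 columns, K = 2 example above can be sketched in a few lines of Python. This is only an illustrative software model under stated assumptions: the names `regs`, `select_by_column` and `select_by_row` are hypothetical, the register contents are placeholder strings, and nothing here describes the actual selection circuit of the data selection unit 104.

```python
# Hypothetical model of an M-row * N-column array of register groups,
# each group holding K general-purpose registers (M = N = 4, K = 2 here).
M, N, K = 4, 4, 2

# regs[row][col][t] holds a placeholder string naming its own position.
regs = [[[f"r{row}c{col}t{t}" for t in range(K)] for col in range(N)]
        for row in range(M)]


def select_by_column(col_bits: str, row_bits: str, target_bits: str) -> str:
    """Narrow to one register column first, then pick the row and register."""
    column = [regs[row][int(col_bits, 2)] for row in range(M)]
    return column[int(row_bits, 2)][int(target_bits, 2)]


def select_by_row(col_bits: str, row_bits: str, target_bits: str) -> str:
    """Narrow to one register row first, then pick the column and register."""
    row = regs[int(row_bits, 2)]
    return row[int(col_bits, 2)][int(target_bits, 2)]


# Column bits 10, row bits 11, target bit 1: third column, fourth row,
# second register of that group (indices are zero-based in this model).
picked = select_by_column("10", "11", "1")
```

Both functions reach the same register; they differ only in which field is used to narrow the candidates first, which mirrors the two alternatives given in the text.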
In a possible implementation, please refer to FIG. 7, which is a schematic structural diagram of another processor provided by an embodiment of the present invention. As shown in FIG. 7, the processor 10 may further include an instruction memory 101, an instruction fetch unit 102, an execution unit 105 and a result write-back unit 106. As shown in FIG. 7, the instruction memory 101, the instruction fetch unit 102, the instruction decoding unit 103, the data selection unit 104, the execution unit 105 and the result write-back unit 106 may be connected in sequence, and the M rows * N columns of register groups 107 are connected to the data selection unit 104 and to the result write-back unit 106. The instruction fetch unit 102 operates in the instruction fetch pipeline stage of the processor 10: it fetches the X instructions to be executed from the instruction memory 101 and may verify them. Optionally, in addition to decoding the X instructions to be executed and extracting at least one source operand address from each of them, the instruction decoding unit 103 may extract the destination operand address of each of the X instructions (X destination operand addresses in total), as well as the command type, operand type, immediate value, and so on. Optionally, the instruction decoding unit 103 may also perform a validity check on each of the X instructions. Optionally, as described above, a destination operand address obtained by instruction decoding may include at least one column address bit, at least one row address bit and at least one target address bit. The at least one column address bit indicates the general-purpose register column to which the destination operand address belongs, the at least one row address bit indicates the general-purpose register row to which it belongs, and the at least one target address bit indicates that the destination operand address corresponds to the t-th of the K general-purpose registers in the group. The execution unit 105 may include multiple arithmetic logic units (Arithmetic Logic Unit, ALU) as shown in FIG. 7 (for example, an arithmetic logic unit 1051 and an arithmetic logic unit 1052) and, optionally, other units for performing calculation tasks, which is not specifically limited in the embodiments of the present invention. The execution unit 105 operates in the execution pipeline stage of the processor 10 and completes the calculation task of each instruction through its arithmetic logic units to obtain the corresponding calculation result. Optionally, based on the VLIW architecture, the processor 10 may invoke multiple arithmetic logic units to execute the tasks of multiple instructions in parallel and obtain their calculation results; the X instructions may be instructions that the processor executes in parallel within one cycle. For example, if X is 4, that is, one layer (stage) contains 4 instructions to be executed, the tasks of the 4 instructions can be executed in parallel by 4 arithmetic logic units. The result write-back unit 106 operates in the write-back pipeline stage of the processor 10: according to each destination operand address, it selects the corresponding general-purpose register from the M rows * N columns of register groups 107 and writes the calculation result back to that register. Optionally, as described above, if for example the j-th of the X destination operand addresses includes 2 column address bits, 2 row address bits and 1 target address bit with the values 00, 01 and 0 respectively, the result write-back unit 106 may access the corresponding general-purpose register in the M rows * N columns of general-purpose register groups 107 according to the bits included in the j-th destination operand address. For example, the result write-back unit 106 may, within the first general-purpose register column, access the first general-purpose register in the second general-purpose register row and write the corresponding calculation result into it; or, within the second general-purpose register row, access the first general-purpose register in the first general-purpose register column. Here j is an integer greater than or equal to 1 and less than or equal to X. In this way, based on the pre-divided M rows * N columns of register groups and the column, row and target address bits included in each destination operand address, the corresponding general-purpose register is selected either from the general-purpose register column indicated by the column address bits or from the general-purpose register row indicated by the row address bits, and the corresponding calculation result is written into it. This narrows the selection range and reduces the logic cost of selection.
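The write-back step can be modelled in the same spirit. The sketch below is a hypothetical illustration only: `regs` and `write_back` are invented names, the register file is a nested Python list, and the stored value is arbitrary. It shows which position the fields 00, 01 and 0 from the example resolve to.

```python
M, N, K = 4, 4, 2  # rows, columns, registers per group (values from the example)

# Hypothetical register file: regs[row][col][t], every register starts at 0.
regs = [[[0] * K for _ in range(N)] for _ in range(M)]


def write_back(col_bits: str, row_bits: str, target_bits: str, result: int) -> None:
    """Store one calculation result in the register named by the three fields."""
    col, row, t = int(col_bits, 2), int(row_bits, 2), int(target_bits, 2)
    regs[row][col][t] = result


# Column bits 00, row bits 01, target bit 0: first column, second row,
# first register of that group. 0x1234 is an arbitrary 2-byte result.
write_back("00", "01", "0", 0x1234)
```

After the call, only the register at row index 1, column index 0, position 0 has changed.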
It should be noted that the connections illustrated in FIG. 7 do not limit how these units are connected to one another. In addition, pipeline structures differ with the structure of each processor. The pipeline structure referred to in the present invention is therefore that of the processor 10, and the pipeline structures of other processors are not specifically limited.
As described above, if for example N is 4, that is, there are 4 columns of general-purpose register groups, and M is 8, that is, there are 8 rows of general-purpose register groups, then each source operand address may include 2 column address bits and 3 row address bits. The 3 row address bits may take the value 000, 001, 010, 011, 100, 101, 110 or 111, where 000 indicates that the source operand address belongs to the first general-purpose register row, 001 the second, 010 the third, and 111 the eighth; details are not repeated here. As a further example, if K is 2, that is, each general-purpose register group includes 2 general-purpose registers, then each source operand address may include 1 target address bit, whose value may be 0 or 1: 0 indicates that the source operand address corresponds to the first general-purpose register in the group, and 1 indicates that it corresponds to the second. Thus, through the division into M rows * N columns of register groups and the column, row and target address bits included in the source operand address, the selection range can be narrowed and the corresponding general-purpose register finally determined, so that the selection range and selection cost are reduced without the additional hardware cost of PV extraction and PV restoration. For example, suppose the first of the Y source operand addresses includes 6 address bits in total, namely 2 column address bits, 3 row address bits and 1 target address bit, with the values 01, 100 and 0 respectively (that is, the first source operand address may be 011000). The data selection unit 104 may then, according to these bits, access the first general-purpose register of the register group in the fifth general-purpose register row of the second general-purpose register column (that is, the source operand address 011000 corresponds to the first general-purpose register in the register group at column 2, row 5) to obtain the corresponding source operand; or it may access the first general-purpose register of the register group in the second general-purpose register column of the fifth general-purpose register row to obtain the corresponding source operand.
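The 6-bit example can be checked with a short decoder. The field order used here (column bits most significant, then row bits, then the target bit) is inferred from the example 011000 = 01 | 100 | 0; the names `Fields` and `decode` are hypothetical and the sketch is not meant to describe the decoder hardware.

```python
from typing import NamedTuple

COL_BITS, ROW_BITS, TARGET_BITS = 2, 3, 1  # N = 4, M = 8, K = 2


class Fields(NamedTuple):
    col: int     # zero-based register column
    row: int     # zero-based register row
    target: int  # zero-based register inside the group


def decode(address: int) -> Fields:
    """Split an operand address into its column, row and target fields.

    Assumed layout, taken from the example 011000 = 01 | 100 | 0:
    column bits first (most significant), then row bits, then target bits.
    """
    target = address & ((1 << TARGET_BITS) - 1)
    row = (address >> TARGET_BITS) & ((1 << ROW_BITS) - 1)
    col = (address >> (TARGET_BITS + ROW_BITS)) & ((1 << COL_BITS) - 1)
    return Fields(col, row, target)


fields = decode(0b011000)
```

With zero-based indices, `fields` is column 1, row 4, target 0, which corresponds to the second column, the fifth row, and the first register of the group.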
Similarly, as described above, if for example N is 4, M is 8 and K is 2, that is, there are 8 rows * 4 columns of general-purpose register groups and each group includes 2 general-purpose registers, then each destination operand address may include 2 column address bits, 3 row address bits and 1 target address bit. The 3 row address bits may take the value 000, 001, 010, 011, 100, 101, 110 or 111, where 000 indicates that the destination operand address belongs to the first general-purpose register row, 001 the second, 010 the third, and 111 the eighth; details are not repeated here. The 1 target address bit may take the value 0 or 1: 0 indicates that the destination operand address corresponds to the first general-purpose register in the group, and 1 indicates that it corresponds to the second. Thus, through the division into M rows * N columns of register groups and the column, row and target address bits included in the destination operand address, the selection range can be narrowed and the corresponding general-purpose register finally determined, so that the selection range and selection cost are reduced without the additional hardware cost of PV extraction and PV restoration. For example, suppose the third of the X destination operand addresses includes 6 address bits in total, namely 2 column address bits, 3 row address bits and 1 target address bit, with the values 01, 100 and 0 respectively (that is, the third destination operand address may be 011000). The result write-back unit 106 may then, according to these bits, access the first general-purpose register of the register group in the fifth general-purpose register row of the second general-purpose register column (that is, the destination operand address 011000 corresponds to the first general-purpose register in the register group at column 2, row 5), or access the first general-purpose register of the register group in the second general-purpose register column of the fifth general-purpose register row, and write the corresponding calculation result into that general-purpose register.
Optionally, to further reduce the selection cost, each register column may be constrained to output only 2 bytes of data per selection (that is, only one 2-byte operand is selected). Any two different source operand addresses among the Y source operand addresses must therefore belong to different general-purpose register columns. Clearly, if two different source operand addresses belonged to the same general-purpose register column, that column would have to supply at least two different source operands at once, that is, 4 bytes of data, which increases the logic cost of selection. Optionally, the compiler may place instructions whose different source operand addresses belong to the same general-purpose register column in different layers, thereby avoiding the selection conflict in which one general-purpose register column has to supply several different source operands at once, and reducing the logic cost of selection. Similarly, any two different destination operand addresses among the X destination operand addresses must belong to different general-purpose register columns. If two different destination operand addresses belonged to the same general-purpose register column, that column would have to handle at least two different destination operands at once, that is, 4 bytes of data, which increases the logic cost of selection. Details are not repeated here.
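The column constraint can be expressed as a simple check, together with one possible way of spreading instructions over layers. The greedy placement below is an assumption made for illustration and is not the compiler algorithm of the embodiment; it ignores data dependencies, reuses the 6-bit address layout from the earlier example, and the names `column_of`, `has_column_conflict` and `schedule` are hypothetical.

```python
from typing import List, Sequence

ROW_BITS, TARGET_BITS = 3, 1  # assumed layout: column | row | target


def column_of(address: int) -> int:
    """Return the zero-based register column encoded in an operand address."""
    return address >> (ROW_BITS + TARGET_BITS)


def has_column_conflict(addresses: Sequence[int]) -> bool:
    """True if two *different* addresses fall in the same register column."""
    distinct = set(addresses)
    columns = {column_of(a) for a in distinct}
    return len(columns) < len(distinct)


def schedule(instr_srcs: Sequence[Sequence[int]]) -> List[List[int]]:
    """Greedily place instruction indices into layers without column conflicts.

    Each entry of instr_srcs lists the source operand addresses of one
    instruction. Data dependencies are ignored in this sketch.
    """
    layers: List[List[int]] = []
    for idx, srcs in enumerate(instr_srcs):
        for layer in layers:
            used = [a for i in layer for a in instr_srcs[i]]
            if not has_column_conflict(used + list(srcs)):
                layer.append(idx)
                break
        else:
            # No existing layer can take this instruction: open a new one.
            layers.append([idx])
    return layers


# Instructions 0 and 1 read different registers of column 01, so they
# cannot share a layer; instruction 2 reads column 10 and fits with 0.
layers = schedule([[0b011000], [0b010010], [0b100000]])
```

The same check applies to destination operand addresses. An instruction whose own operands already conflict with each other would still need separate handling, which this sketch does not attempt.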
Optionally, to further reduce the selection cost, the access types of the Y source operand addresses and the X destination operand addresses may be constrained to be all direct access or all indirect access. It should be noted that with direct access the operand address is known directly, whereas with indirect access the operand address is only obtained after a variable is added to a base address. As a result, the operand address finally produced by an indirect access is quite likely to differ from the operand address of a direct access while both belong to the same general-purpose register column. For that general-purpose register column, two different general-purpose registers would then have to be selected at once in order to read two different source operands or to write the calculation results of two different instructions, which greatly increases the logic cost of selection. However, if all operand addresses are constrained to be direct accesses, the compiler can, under the constraint rule described above, place instructions whose operand addresses differ but belong to the same general-purpose register column in different layers. Alternatively, if all operand addresses are constrained to be indirect accesses, the compiler can, under the same constraint rule, use the difference that always exists between the base addresses of the indirect accesses to place instructions whose operand addresses differ but belong to the same general-purpose register column in different layers. In summary, the same operand within one layer (for example, src1 shown in FIG. 9 below) may optionally be constrained to be all direct access or all indirect access, so that the compiler is never unable to identify a column conflict between a direct access and an indirect access. It can be understood that the compiler can map instructions appropriately according to their dependencies, and so resolve column conflicts during selection, only if it can identify those conflicts. The purpose of prohibiting the simultaneous use of direct access and indirect access for the same operand within one layer is therefore to allow the compiler to identify column conflicts. The case in which two different general-purpose registers must be selected at once in the same general-purpose register column can thus be avoided in advance, reducing the logic cost of selection.
As described above, to reduce the cost of data selection, the embodiments of the present invention divide the existing general-purpose registers into M rows * N columns of register groups 107, where each general-purpose register group may include at least one general-purpose register. Based on at least one column address bit included in a source operand address or a destination operand address, the processor can select the corresponding general-purpose register from the general-purpose register column indicated by the column address bit(s), using the remaining row address bits and target address bits. Alternatively, based on at least one row address bit included in a source operand address or a destination operand address, the processor can select the corresponding general-purpose register from the general-purpose register row indicated by the row address bit(s), using the remaining column address bits and target address bits. In this way, the selection range for each source operand address and each destination operand address is narrowed, and the selection cost is greatly reduced.
Optionally, refer to FIG. 8a to FIG. 8e, which are schematic diagrams of ways of dividing the general-purpose registers according to an embodiment of the present invention. As shown in FIG. 8a to FIG. 8e, the FV (that is, all the general-purpose registers) may be divided into N banks, that is, the N columns described above, by interleaving at a granularity of K bytes. The selection range of the ALUs is then constrained along the bank dimension to reduce the selection cost. The interleaving granularity K is generally equal to the ALU operand size, so that the processor can fetch a required source operand from a single general-purpose register group. For example, if the source operand of an instruction is 2 bytes, K may be 2 bytes, that is, each general-purpose register group includes 2 general-purpose registers, each of which typically stores 1 byte of data. The number of banks N may generally be equal to the VLIW width (that is, the number of instructions in each layer), in other words the number of instructions the processor processes in parallel in one beat (one clock cycle), so that the processor can select the general-purpose register for the source operand of each instruction from its own bank. For example, the source operand of instruction 1 may belong to Bank0 and the source operand of instruction 2 may belong to Bank2, which reduces the selection conflicts that arise when several instructions select from the same bank. For example, if the processor can process 4 instructions in parallel per clock cycle, N may be 4, that is, the general-purpose registers are divided into 4 banks. Optionally, in some possible implementations, K need not equal the operand size. For example, when the operand size is 2 bytes, K may be 1, so that each general-purpose register group includes 1 general-purpose register, or K may be 4, so that each general-purpose register group includes 4 general-purpose registers, and so on; this is not specifically limited in the embodiments of the present invention. Optionally, in some possible implementations, N may also be greater than the number of instructions the processor processes in parallel per clock cycle. For example, if the processor can process 4 instructions in parallel per clock cycle, N may be 8, that is, the general-purpose registers are divided into 8 banks, which further reduces selection conflicts within the same bank, and so on; this is not specifically limited in the embodiments of the present invention.
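The K-byte interleaving into N banks can be expressed as simple integer arithmetic. The following is a minimal sketch, assuming byte-wide registers numbered from R0 and zero-based bank, row, and offset indices; the helper names `locate` and `bank_contents` are illustrative only.

```python
def locate(reg_index, k, n):
    """Map register R<reg_index> to (bank, row, offset) under K-byte
    interleaving across N banks. All returned indices are zero-based."""
    group = reg_index // k          # which K-byte register group
    return group % n, group // n, reg_index % k


def bank_contents(bank, total_registers, k, n):
    """List the register indices that fall into the given bank."""
    return [r for r in range(total_registers) if locate(r, k, n)[0] == bank]


if __name__ == "__main__":
    # With K = 2 and N = 4, Bank0 starts with R0, R1, R8, R9, R16, R17.
    print(bank_contents(0, 512, 2, 4)[:6])
    print(locate(5, 2, 4))   # R5 is the second register of row 0 in Bank2
```

Choosing N at least as large as the number of instructions per layer gives each instruction a chance to draw its operand from a separate bank, which is the motivation stated above.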
Optionally, the division of the general-purpose registers and the selection method involved in the embodiments of the present invention are further described below with reference to FIG. 8a to FIG. 8e, taking a 512-byte FV, that is, 512 general-purpose registers in total (R0 to R511), as an example. (The FV consists mainly of general-purpose registers together with other registers, such as status registers; general-purpose registers are used as the example here, which is not specifically limited in the embodiments of the present invention.)
Optionally, as shown in FIG. 8a, the FV is divided into multiple banks with K equal to 2 and N equal to 4. This division has a period of K*N = 8 and distributes the 512 general-purpose registers to the banks in a "Z"-shaped interleaved pattern. As shown in FIG. 8a, general-purpose registers R0-R1, R8-R9, R16-R17, ..., R504-R505 are assigned to Bank0; R2-R3, R10-R11, R18-R19, ..., R506-R507 are assigned to Bank1; R4-R5, R12-R13, R20-R21, ..., R508-R509 are assigned to Bank2; and R6-R7, R14-R15, R22-R23, ..., R510-R511 are assigned to Bank3. Each of the general-purpose registers R0 to R511 stores 1 byte of data. Optionally, R0 and so on may also be written as R0b, where b denotes 8 bits, that is, 1 byte. The 512 general-purpose registers are thus divided into 4 banks (that is, 4 columns), each bank includes 64 rows of general-purpose register groups, and each general-purpose register group includes 2 general-purpose registers.

In this case, because 2^9 = 512, the source operand address and the destination operand address may each include at least 9 address bits: 2 column address bits, which indicate the bank; 6 row address bits, which indicate the general-purpose register row; and 1 target address bit, which indicates which general-purpose register within the general-purpose register group the address refers to. Taking the source operand address as an example, the upper 2 bits may be the column address bits, the lowest bit may be the target address bit, and the middle 6 bits may be the row address bits. Optionally, the upper 6 bits may be the row address bits, the lowest bit the target address bit, and the middle 2 bits the column address bits; or the highest bit may be the target address bit, the lowest 2 bits the column address bits, and the middle 6 bits the row address bits, and so on. This is not specifically limited in the embodiments of the present invention.

Optionally, take the layout in which the upper 2 bits are the column address bits, the lowest bit is the target address bit, and the middle 6 bits are the row address bits. If the source operand address is 000000000, the address belongs to Bank0, and the source operand is obtained by accessing the first general-purpose register of the first row of general-purpose register groups in Bank0 (general-purpose register R0 in FIG. 8a). If the source operand address is 100000001, the address belongs to Bank2, and the source operand is obtained by accessing the second general-purpose register of the first row of general-purpose register groups in Bank2 (general-purpose register R5 in FIG. 8a). Optionally, if the source operand size equals the interleaving granularity K, that is, the source operand is also 2 bytes, the 2 consecutive byte addresses within one group of a bank may be regarded as the same address. For example, source operand addresses 000000000 and 000000001, that is, R0 and R1 in FIG. 8a, may be regarded as the same address: whether the processor uses 000000000 or 000000001, it reads the data in both R0 and R1 to obtain the 2-byte source operand. Likewise, R0h and R0, or R0h and R1, may be regarded as the same address. Here h denotes 16 bits, that is, 2 bytes, so R0h covers R0 and R1, and in either case the data in R0 and R1 is read to obtain the 2-byte source operand.

Similarly, the upper 2 bits of the destination operand address may be the column address bits, the lowest bit the target address bit, and the middle 6 bits the row address bits. If the destination operand address is 111111110, the address belongs to Bank3, and the calculation result is written back to the first general-purpose register of the 64th row of general-purpose register groups in Bank3 (general-purpose register R510 in FIG. 8a). If the destination operand address is 110000001, the destination operand address belongs to Bank3, and the calculation result is written back to the second general-purpose register of the first row of general-purpose register groups in Bank3 (general-purpose register R7 in FIG. 8a). Details are not repeated here.
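The address examples for FIG. 8a can be checked with a short sketch. It assumes the layout in which the upper 2 bits are the column (bank) bits, the middle 6 bits are the row bits, and the lowest bit is the target bit; the function name `decode_fig8a` and the string representation of the address are illustrative choices, not part of the application.

```python
N_BANKS = 4      # N in the FIG. 8a example
GROUP_SIZE = 2   # K in the FIG. 8a example


def decode_fig8a(address_bits):
    """Decode a 9-bit operand address given as a string of '0'/'1'.

    Returns (bank, row, target, register_index), all zero-based, so the
    text's "first row" is row 0 and its "64th row" is row 63.
    """
    assert len(address_bits) == 9
    bank = int(address_bits[0:2], 2)     # upper 2 bits: column address
    row = int(address_bits[2:8], 2)      # middle 6 bits: row address
    target = int(address_bits[8:], 2)    # lowest bit: register within the group
    register_index = (row * N_BANKS + bank) * GROUP_SIZE + target
    return bank, row, target, register_index


if __name__ == "__main__":
    for bits in ("000000000", "100000001", "111111110", "110000001"):
        bank, row, target, reg = decode_fig8a(bits)
        print(f"{bits} -> Bank{bank}, row {row}, target {target}, R{reg}")
```

The four addresses decode to R0, R5, R510, and R7, which matches the registers named in the description.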
Optionally, as shown in FIG. 8b, the FV is divided into multiple banks with K equal to 4 and N equal to 4. This division has a period of K*N = 16 and distributes the 512 general-purpose registers to the banks in a "Z"-shaped interleaved pattern. As shown in FIG. 8b, general-purpose registers R0-R3, R16-R19, R32-R35, ..., R496-R499 are assigned to Bank0; R4-R7, R20-R23, R36-R39, ..., R500-R503 are assigned to Bank1; R8-R11, R24-R27, R40-R43, ..., R504-R507 are assigned to Bank2; and R12-R15, R28-R31, R44-R47, ..., R508-R511 are assigned to Bank3. The 512 general-purpose registers are thus divided into 4 banks (that is, 4 columns), each bank includes 32 rows of general-purpose register groups, and each general-purpose register group includes 4 general-purpose registers. Similarly, the source operand address and the destination operand address may each include 9 address bits: 2 column address bits, which indicate the bank; 5 row address bits, which indicate the general-purpose register row; and 2 target address bits, which indicate which general-purpose register within the general-purpose register group the address refers to. Taking the source operand address as an example, the upper 2 bits may be the column address bits, the lowest 2 bits the target address bits, and the middle 5 bits the row address bits. Optionally, the upper 5 bits may be the row address bits, the lowest 2 bits the target address bits, and the middle 2 bits the column address bits, and so on; this is not specifically limited in the embodiments of the present invention. Optionally, take the layout in which the upper 2 bits are the column address bits, the lowest 2 bits are the target address bits, and the middle 5 bits are the row address bits. If the source operand address is 100000011, the address belongs to Bank2, and the source operand is obtained by accessing the fourth general-purpose register of the first row of general-purpose register groups in Bank2 (general-purpose register R11 in FIG. 8b). If the source operand address is 100000110, the address belongs to Bank2, and the source operand is obtained by accessing the third general-purpose register of the second row of general-purpose register groups in Bank2 (general-purpose register R26 in FIG. 8b). For further details, refer to the embodiment corresponding to FIG. 8a; they are not repeated here.
Optionally, as shown in FIG. 8c, the FV is divided into multiple banks with K equal to 2 and N equal to 2. This division has a period of K*N = 4 and distributes the 512 general-purpose registers to the banks in a "Z"-shaped interleaved pattern. As shown in FIG. 8c, general-purpose registers R0-R1, R4-R5, R8-R9, ..., R508-R509 are assigned to Bank0, and R2-R3, R6-R7, R10-R11, ..., R510-R511 are assigned to Bank1. The 512 general-purpose registers are thus divided into 2 banks (that is, 2 columns), each bank includes 128 rows of general-purpose register groups, and each general-purpose register group includes 2 general-purpose registers. Similarly, the source operand address and the destination operand address may each include 9 address bits: 1 column address bit, which indicates the bank; 7 row address bits, which indicate the general-purpose register row; and 1 target address bit, which indicates which general-purpose register within the general-purpose register group the address refers to. Taking the source operand address as an example, the highest bit may be the column address bit, the lowest bit the target address bit, and the middle 7 bits the row address bits. If the source operand address is 100000001, the address belongs to Bank1, and the source operand is obtained by accessing the second general-purpose register of the first row of general-purpose register groups in Bank1 (general-purpose register R3 in FIG. 8c). If the source operand address is 100000010, the address belongs to Bank1, and the source operand is obtained by accessing the first general-purpose register of the second row of general-purpose register groups in Bank1 (general-purpose register R6 in FIG. 8c). For further details, refer to the embodiment corresponding to FIG. 8a; they are not repeated here.
Optionally, for other possible divisions, refer to FIG. 8d and FIG. 8e. As shown in FIG. 8d, with a period of K*N = 8, the 512 general-purpose registers are distributed in a "Z"-shaped interleaved pattern to 2 banks; each bank includes 64 rows of general-purpose register groups, and each general-purpose register group may include 4 general-purpose registers. As shown in FIG. 8e, with a period of K*N = 16, the 512 general-purpose registers are distributed in a "Z"-shaped interleaved pattern to 8 banks; each bank includes 32 rows of general-purpose register groups, and each general-purpose register group may include 2 general-purpose registers. Details are not repeated here.
Optionally, if the FV is 1024 bytes and is divided into multiple banks with K equal to 2 and N equal to 4, the 1024 general-purpose registers may be divided into 4 banks (that is, 4 columns), each bank includes 128 rows of general-purpose register groups, and each general-purpose register group includes 2 general-purpose registers. The source operand address and the destination operand address may then each include at least 10 address bits: 2 column address bits, 7 row address bits, and 1 target address bit. Details are not repeated here.
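The bit counts quoted for the different divisions follow from the register count, K, and N. The sketch below derives them and decodes an address for any such division, assuming that all three quantities are powers of two and that the address is laid out as column bits, then row bits, then target bits from high to low; the names `field_widths` and `decode` are hypothetical.

```python
def field_widths(total_registers, k, n):
    """Number of column, row, and target address bits for a division of
    total_registers byte registers into n banks of k-register groups.
    Assumption: total_registers, k, and n are powers of two."""
    rows = total_registers // (k * n)
    return {
        "column": (n - 1).bit_length(),
        "row": (rows - 1).bit_length(),
        "target": (k - 1).bit_length(),
    }


def _to_int(bits):
    """Parse a possibly empty bit string (an empty field means value 0)."""
    return int(bits, 2) if bits else 0


def decode(address_bits, total_registers, k, n):
    """Return the register index selected by a [column | row | target] address."""
    widths = field_widths(total_registers, k, n)
    assert len(address_bits) == sum(widths.values())
    col_end = widths["column"]
    row_end = col_end + widths["row"]
    column = _to_int(address_bits[:col_end])
    row = _to_int(address_bits[col_end:row_end])
    target = _to_int(address_bits[row_end:])
    return (row * n + column) * k + target


if __name__ == "__main__":
    print(field_widths(1024, 2, 4))           # 2 column, 7 row, 1 target bits
    print(decode("100000110", 512, 4, 4))     # FIG. 8b example
```

Under these assumptions the FIG. 8b addresses 100000011 and 100000110 select R11 and R26, and the FIG. 8c address 100000001 selects R3.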
Optionally, the way of dividing the banks is not limited to the "Z"-shaped interleaved division shown in FIG. 8a to FIG. 8e. In some possible implementations, consecutive addresses may be assigned to each bank. For example, with a 512-byte FV and N equal to 4, R0-R127 may be assigned to Bank0, R128-R255 to Bank1, R256-R383 to Bank2, and R384-R511 to Bank3, and so on; this is not specifically limited in the embodiments of the present invention.
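For comparison with the interleaved division, the contiguous division can be sketched as follows, under the assumption that the registers are split into N equal-sized ranges (512 / 4 = 128 registers per bank in the example); the function name `contiguous_bank` is illustrative.

```python
def contiguous_bank(reg_index, total_registers=512, n=4):
    """Bank of register R<reg_index> when consecutive registers are grouped
    into n equal-sized contiguous ranges (assumed: n divides total_registers)."""
    per_bank = total_registers // n
    return reg_index // per_bank


if __name__ == "__main__":
    for reg in (0, 127, 128, 255, 256, 511):
        print(f"R{reg} -> Bank{contiguous_bank(reg)}")
```

With this division neighbouring registers share a bank, so operands that a program allocates consecutively are more likely to collide than under the interleaved layout.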
Refer to FIG. 9, which is a schematic diagram of data selection according to an embodiment of the present invention. The arithmetic logic unit (ALU) mainly performs arithmetic operations (addition, subtraction, multiplication, and division), logical operations (AND, OR, NOT, and XOR), and shift operations on binary data. Instructions for arithmetic operations such as addition, subtraction, multiplication, and division, and for logical and shift operations such as OR, AND, ASL, and ROL, are all executed in the ALU. As shown in FIG. 9, an instruction may include two source operands. For example, ALU0 may execute instruction 0, which may include the two source operands src1 and src2 (for example, src1 and src2 may be added or subtracted). Likewise, ALU1 may execute instruction 1, ALU2 may execute instruction 2, and ALU3 may execute instruction 3, each of which may include the two source operands src1 and src2. Instruction 0, instruction 1, instruction 2, and instruction 3 may be four instructions of the same layer under the VLIW architecture. The two source operands of each instruction (for example, src1 and src2 of ALU0, src1 and src2 of ALU1, src1 and src2 of ALU2, and src1 and src2 of ALU3 shown in FIG. 9) may be independent of each other and are not subject to the constraint rules above with respect to one another. Thus, although the same source operand of different ALUs cannot access different general-purpose registers in the same bank (for example, with the division shown in FIG. 8a, src1 of ALU0 and src1 of ALU1 cannot access different general-purpose registers of Bank0, such as R0 and R8), there is no constraint on bank access between different source operands, whether of different ALUs or of the same ALU. For example, src1 of ALU0 and src2 of ALU0 may access different general-purpose registers in the same bank, such as R0 and R8 in Bank0 shown in FIG. 8a, and src1 of ALU0 and src2 of ALU1 may also access different general-purpose registers in the same bank, such as R10 and R11 in Bank1 shown in FIG. 8a. Because src1 and src2 are independent and not bound by the constraint rules with respect to each other, the loss of instruction execution efficiency caused by the constraint rules is reduced to some extent. Optionally, as described above, because instruction 0, instruction 1, instruction 2, and instruction 3 are four instructions of the same layer, they are subject to the selection constraint rules above, which avoids the increased selection logic cost of selecting several different source operands in the same bank at once. It follows that bank access by the source operands of instructions in different layers is not constrained. For example, instruction 4 belongs to the next layer and is therefore in a different layer from instruction 0, instruction 1, instruction 2, and instruction 3. When the processor processes the next layer, instruction 0, instruction 1, instruction 2, and instruction 3 of the previous layer have already completed their bank accesses and obtained the source operands required for calculation, so the bank access by src1 or src2 of instruction 4 does not conflict with instruction 0, instruction 1, instruction 2, or instruction 3. Similarly, because there is no constraint on bank access between different source operands of different ALUs or of the same ALU, such operands may also use any access type. For example, src1 of ALU0 may use direct access while src2 of ALU0 uses indirect access, or src2 of ALU1 may use direct access while src1 of ALU3 uses indirect access. Details are not repeated here.
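The per-operand scope of the constraint can be made concrete with a small checker. This is a sketch only: it assumes the FIG. 8a division (K = 2, N = 4), represents each instruction of a layer as a dictionary from operand slot name to register index, and uses the hypothetical function name `slot_conflicts`.

```python
from collections import defaultdict

K, N = 2, 4  # interleaving granularity and bank count of the FIG. 8a example


def slot_conflicts(layer, k=K, n=N):
    """Return (slot, bank, groups) for every operand slot that would need
    more than one distinct register group from the same bank in one layer.

    layer: list of instructions, each a dict such as {"src1": 0, "src2": 8}.
    Conflicts are only checked within the same slot (src1 against src1,
    src2 against src2), mirroring the independence of src1 and src2.
    """
    seen = defaultdict(set)  # (slot, bank) -> set of group indices
    for instruction in layer:
        for slot, reg in instruction.items():
            group = reg // k
            seen[(slot, group % n)].add(group)
    return [(slot, bank, sorted(groups))
            for (slot, bank), groups in sorted(seen.items())
            if len(groups) > 1]


if __name__ == "__main__":
    # src1 of ALU0 reads R0 and src1 of ALU1 reads R8: both in Bank0 -> conflict.
    print(slot_conflicts([{"src1": 0, "src2": 2}, {"src1": 8, "src2": 4}]))
    # src1 and src2 of the same ALU may both read Bank0 -> no conflict.
    print(slot_conflicts([{"src1": 0, "src2": 8}]))
```

A compiler pass could run such a check on each candidate layer and move an offending instruction to a later layer when the returned list is not empty.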
Refer to FIG. 10, which is a schematic diagram of result write-back according to an embodiment of the present invention. As shown in FIG. 10, each instruction may include one destination operand. In the write-back pipeline stage, ALU0, ALU1, ALU2, and ALU3 may each write their calculation result back to the general-purpose register indicated by their destination operand address. Similarly, because instruction 0, instruction 1, instruction 2, and instruction 3 are four instructions of the same layer, they are subject to the selection constraint rules above, which avoids the increased selection logic cost of selecting several different destination operands (that is, the general-purpose registers corresponding to several different destination operand addresses) in the same bank at once. However, bank access by the destination operands of instructions in different layers is not constrained, nor is the access type of the destination operands of instructions in different layers. For details, refer to the embodiment corresponding to FIG. 9; they are not repeated here. Optionally, if the destination operand addresses of different instructions are the same, write-after-write (WAW) mode may be supported when the calculation results are written back to the general-purpose registers: the destination operand of the logically later instruction is taken as the destination operand of the final operation, that is, a later write may overwrite an earlier one. For example, if the destination operand addresses of ALU0, ALU1, ALU2, and ALU3 are all R0, and ALU3 writes the calculation result of instruction 3 into R0 after ALU0, ALU1, and ALU2, then R0 finally holds the calculation result of instruction 3.
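The write-after-write behaviour amounts to applying the writes of a layer in logical instruction order. The sketch below models the register file as a dictionary and assumes the results are supplied in that logical order; the function name `write_back` is illustrative and says nothing about how the hardware actually arbitrates the writes.

```python
def write_back(registers, results):
    """Apply (destination, value) pairs in logical instruction order.

    When several instructions of a layer target the same register, the
    logically last write wins (write-after-write)."""
    for destination, value in results:
        registers[destination] = value
    return registers


if __name__ == "__main__":
    # ALU0..ALU3 all write R0; the result of instruction 3 is what remains.
    regs = write_back({}, [("R0", 10), ("R0", 20), ("R0", 30), ("R0", 40)])
    print(regs["R0"])
```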
Optionally, to mitigate the loss of instruction execution efficiency that the added constraint rules cause when data dependencies are handled, the processor 10 in this embodiment of the present invention may further include one or more temporary registers (Temporary Register, T) (not shown in FIG. 7). The one or more temporary registers are visible to all the ALUs simultaneously (for example, ALU0, ALU1, ALU2, and ALU3 shown in FIG. 9) and are not subject to any constraint rule. The temporary registers are visible only to the compiler and are invisible to the programmer. They can be used to improve instruction utilization: the compiler can use temporary registers in place of general-purpose registers that need to be referenced and updated frequently.
For example, with reference to FIG. 8a and FIG. 9, suppose that among the four instructions of one layer, src1 of ALU0 needs to access R0 in Bank0, while src1 of ALU1, src1 of ALU2, and src1 of ALU3 all need to access R8 in Bank0. Under the constraint rule that the same source operand (for example, src1) of different instructions in one layer cannot access different general-purpose registers of the same bank, instruction 1, instruction 2, and instruction 3 cannot be processed in the same layer as instruction 0. No-operation instructions must be inserted in this layer, and instruction 1, instruction 2, and instruction 3 must be moved to the next layer, which greatly reduces instruction utilization and instruction execution efficiency. In this case, that is, when R8 is known to be referenced frequently, R8 may be replaced with a temporary register T0 in the layer preceding instruction 0, instruction 1, instruction 2, and instruction 3; T0 stores the data of R8 and corresponds to R8. When instruction 1, instruction 2, and instruction 3 then select data according to their source operand addresses, they can each access T0 directly to obtain the source operand. They are therefore not bound by the constraint rules, and instruction utilization and execution efficiency are preserved.
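The substitution in this example can be sketched as a small compiler-style rewrite. The policy used here (in each conflicting bank, keep the least-referenced register group in the register file and move the others to temporaries) is an assumption made for illustration, as are the function names; the sketch also assumes the FIG. 8a division, 2-byte operands, and an unlimited supply of temporary registers.

```python
from collections import Counter, defaultdict


def bank_of(reg, k=2, n=4):
    """Bank of register R<reg> under K-byte interleaving across N banks."""
    return (reg // k) % n


def substitute_temporaries(src1_regs, k=2, n=4):
    """Rewrite the src1 operands of one layer so that no bank is asked for
    more than one distinct register group.

    Returns (operands, mapping): operands is a list of names such as "R0"
    or "T0"; mapping maps the base register of each replaced group to its
    temporary register name.
    """
    counts = defaultdict(Counter)  # bank -> Counter of group base registers
    for reg in src1_regs:
        counts[bank_of(reg, k, n)][reg - reg % k] += 1

    mapping = {}
    for bank in sorted(counts):
        groups = counts[bank].most_common()  # most referenced first
        for base, _ in groups[:-1]:          # all but the least referenced
            mapping[base] = f"T{len(mapping)}"

    operands = [mapping.get(reg - reg % k, f"R{reg}") for reg in src1_regs]
    return operands, mapping


if __name__ == "__main__":
    # ALU0 reads R0; ALU1, ALU2, and ALU3 read R8 (all in Bank0).
    print(substitute_temporaries([0, 8, 8, 8]))
```

For the layer in the example this yields the operands R0, T0, T0, T0, so the three instructions that referenced R8 no longer compete with instruction 0 for Bank0. Copying R8 into T0 in an earlier layer is left out of the sketch.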
As another example, taking FIG. 8a and FIG. 10 as an example, suppose that among the four instructions in the same layer, dst1 of ALU0 needs to access R2 in Bank1, while dst1 of ALU1, dst1 of ALU2 and dst1 of ALU3 each need to access R10 in Bank1. Likewise, when R10 is known to be referenced frequently, R10 can be replaced with a temporary register T1 in the layer preceding instruction 0, instruction 1, instruction 2 and instruction 3; T1 stores the data of R10 and corresponds to R10. Then, when instruction 1, instruction 2 and instruction 3 write back their calculation results according to their destination operand addresses, each can access the corresponding T1 directly and write its calculation result into T1. The constraint rule no longer applies, which preserves instruction utilization and execution efficiency.
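The substitution described in the two examples above can be sketched as a small compiler-side check. This is only an illustration under stated assumptions: the `BANK_OF` table lists just the four register-to-bank pairs named in the text, and the function names `conflicting_banks` and `use_temporary` are hypothetical, not taken from the embodiment.

```python
from collections import defaultdict

# Register-to-bank table. Only the four entries named in the text are listed;
# the full layout of FIG. 8a is not reproduced here.
BANK_OF = {"R0": 0, "R8": 0, "R2": 1, "R10": 1}


def conflicting_banks(operands):
    """Return the banks in which one operand slot names more than one register."""
    registers_by_bank = defaultdict(set)
    for reg in operands:
        if reg in BANK_OF:  # temporary registers (T0, T1, ...) belong to no bank
            registers_by_bank[BANK_OF[reg]].add(reg)
    return {bank for bank, regs in registers_by_bank.items() if len(regs) > 1}


def use_temporary(operands, reg, temp):
    """Replace every reference to `reg` with the temporary register `temp`."""
    return [temp if r == reg else r for r in operands]


# src1 of ALU0..ALU3 in one layer, as in the first example above.
layer_src1 = ["R0", "R8", "R8", "R8"]
assert conflicting_banks(layer_src1) == {0}

# After T0 is substituted for R8, the layer no longer violates the rule.
assert conflicting_banks(use_temporary(layer_src1, "R8", "T0")) == set()
```

In this sketch a temporary register is simply any name absent from the bank table, which mirrors the statement that temporary registers are not subject to the constraint rules.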
It should be noted that the embodiments of the present invention can be applied to any chip, device or apparatus that processes multiple instructions in parallel, regardless of whether the implementation is in hardware or software.
Referring to FIG. 11, FIG. 11 is a schematic flowchart of a processing method provided by an embodiment of the present invention. The processing method is applied to a processor that includes a data selection unit, an instruction decoding unit connected to the data selection unit, and M rows * N columns of general-purpose register groups, where each of the general-purpose register groups includes K general-purpose registers, and M, N and K are integers greater than or equal to 1. The processing method is applicable to any of the processors in FIG. 1 to FIG. 3 above and to devices that contain such a processor (such as mobile phones, computers and servers). The method may include the following steps S201 and S202:
Step S201: the instruction decoding unit decodes X input instructions to obtain at least one source operand address for each of the X instructions, Y source operand addresses in total, and sends the Y source operand addresses to the data selection unit. Each of the Y source operand addresses includes at least one column address bit, at least one row address bit and at least one target address bit. The at least one column address bit indicates the general-purpose register column to which the source operand address belongs; the at least one row address bit indicates the general-purpose register row to which the source operand address belongs; and the at least one target address bit indicates the correspondence between the source operand address and the t-th general-purpose register among the K general-purpose registers. X, Y and t are integers greater than or equal to 1.
Step S202: the data selection unit accesses, in the M rows * N columns of general-purpose register groups, the general-purpose register corresponding to the i-th source operand address according to the at least one column address bit, the at least one row address bit and the at least one target address bit included in the i-th source operand address, and obtains the corresponding source operand. The i-th source operand address is one of the Y source operand addresses, and i is an integer greater than or equal to 1 and less than or equal to Y.
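Steps S201 and S202 can be illustrated with a minimal sketch of how an operand address might be split into its three fields and used to select a register. The field widths, the field order (row, then column, then target) and the names `split_address` and `select_operand` are assumptions made for illustration; the text only requires at least one bit per field.

```python
from typing import NamedTuple

# Hypothetical field widths, giving M = 2 rows, N = 4 columns, K = 4 registers.
ROW_BITS, COL_BITS, TGT_BITS = 1, 2, 2
M, N, K = 1 << ROW_BITS, 1 << COL_BITS, 1 << TGT_BITS


class RegAddr(NamedTuple):
    row: int     # general-purpose register row
    col: int     # general-purpose register column
    target: int  # register inside the group (0-based here, t-th in the text)


def split_address(addr: int) -> RegAddr:
    """Split an operand address laid out as [row | column | target] (assumed order)."""
    target = addr & ((1 << TGT_BITS) - 1)
    col = (addr >> TGT_BITS) & ((1 << COL_BITS) - 1)
    row = (addr >> (TGT_BITS + COL_BITS)) & ((1 << ROW_BITS) - 1)
    return RegAddr(row, col, target)


def select_operand(groups, addr: int):
    """Model of the data selection unit: read the register named by `addr`."""
    f = split_address(addr)
    return groups[f.row][f.col][f.target]


# Fill every register with its own address so the lookup is easy to verify.
groups = [[[(r * N + c) * K + t for t in range(K)] for c in range(N)] for r in range(M)]
assert split_address(0b1_10_11) == RegAddr(row=1, col=2, target=3)
assert select_operand(groups, 0b1_10_11) == 0b1_10_11
```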
In a possible implementation, the processor further includes an execution unit connected to the data selection unit, and the execution unit includes at least one arithmetic logic unit. The method further includes:
executing, by the at least one arithmetic logic unit, the X instructions based on the source operands corresponding to the Y source operand addresses, to obtain a calculation result for each of the X instructions.
In a possible implementation, the processor further includes a result write-back unit connected to the execution unit. The method further includes:
obtaining, by the instruction decoding unit, the destination operand address of each of the X instructions, X destination operand addresses in total, where each of the X destination operand addresses includes the at least one column address bit, the at least one row address bit and the at least one target address bit; the at least one column address bit indicates the general-purpose register column to which the destination operand address belongs; the at least one row address bit indicates the general-purpose register row to which the destination operand address belongs; and the at least one target address bit indicates the correspondence between the destination operand address and the t-th general-purpose register among the K general-purpose registers; and
accessing, by the result write-back unit, in the M rows * N columns of general-purpose register groups, the general-purpose register corresponding to the j-th destination operand address according to the at least one column address bit, the at least one row address bit and the at least one target address bit included in the j-th destination operand address, and writing the calculation result of the j-th instruction back to the general-purpose register corresponding to the j-th destination operand address, where j is an integer greater than or equal to 1 and less than or equal to X.
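The write-back step can be sketched in the same spirit. The address layout (one row bit, two column bits, two target bits) and the names `fields` and `write_back` are hypothetical choices for this illustration, not details given by the embodiment.

```python
def fields(addr):
    """Split an address laid out as [row | 2 column bits | 2 target bits] (assumed)."""
    target = addr & 0b11
    col = (addr >> 2) & 0b11
    row = addr >> 4
    return row, col, target


def write_back(groups, dst_addrs, results):
    """Write the j-th calculation result to the register named by the j-th address."""
    if len(dst_addrs) != len(results):
        raise ValueError("one destination operand address is needed per result")
    for addr, value in zip(dst_addrs, results):
        row, col, target = fields(addr)
        groups[row][col][target] = value


# 2 rows * 4 columns of register groups, 4 registers per group, all cleared.
groups = [[[0] * 4 for _ in range(4)] for _ in range(2)]
write_back(groups, [0b0_00_01, 0b1_10_11], [7, 9])
assert groups[0][0][1] == 7
assert groups[1][2][3] == 9
```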
In a possible implementation, accessing, by the data selection unit, the general-purpose register corresponding to the i-th source operand address in the M rows * N columns of general-purpose register groups according to the at least one column address bit, the at least one row address bit and the at least one target address bit included in the i-th source operand address includes:
determining, by the data selection unit according to the at least one column address bit included in the i-th source operand address, the i'-th general-purpose register column to which the i-th source operand address belongs, and accessing, within the i'-th general-purpose register column, the corresponding general-purpose register according to the at least one row address bit and the at least one target address bit included in the i-th source operand address, where i' is an integer greater than or equal to 1 and less than or equal to N; or
determining, by the data selection unit according to the at least one row address bit included in the i-th source operand address, the i"-th general-purpose register row to which the i-th source operand address belongs, and accessing, within the i"-th general-purpose register row, the corresponding general-purpose register according to the at least one column address bit and the at least one target address bit included in the i-th source operand address, where i" is an integer greater than or equal to 1 and less than or equal to M.
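The two alternatives above differ only in which field is resolved first. A minimal sketch, assuming the register groups are held as a nested list indexed `[row][column][target]` with 0-based indices (the text counts from 1), shows that both orders reach the same register. The function names and sizes are hypothetical.

```python
def read_column_first(groups, row, col, target):
    """Select the column first, then use the row and target bits inside it."""
    column = [groups[r][col] for r in range(len(groups))]
    return column[row][target]


def read_row_first(groups, row, col, target):
    """Select the row first, then use the column and target bits inside it."""
    register_row = groups[row]
    return register_row[col][target]


M, N, K = 2, 4, 4  # hypothetical sizes
groups = [[[f"r{r}c{c}t{t}" for t in range(K)] for c in range(N)] for r in range(M)]
assert read_column_first(groups, 1, 2, 3) == read_row_first(groups, 1, 2, 3) == "r1c2t3"
```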
In a possible implementation, accessing, by the result write-back unit, the general-purpose register corresponding to the j-th destination operand address in the M rows * N columns of general-purpose register groups according to the at least one column address bit, the at least one row address bit and the at least one target address bit included in the j-th destination operand address includes:
determining, by the result write-back unit according to the at least one column address bit included in the j-th destination operand address, the j'-th general-purpose register column to which the j-th destination operand address belongs, and accessing, within the j'-th general-purpose register column, the corresponding general-purpose register according to the at least one row address bit and the at least one target address bit included in the j-th destination operand address, where j' is an integer greater than or equal to 1 and less than or equal to N; or
determining, by the result write-back unit according to the at least one row address bit included in the j-th destination operand address, the j"-th general-purpose register row to which the j-th destination operand address belongs, and accessing, within the j"-th general-purpose register row, the corresponding general-purpose register according to the at least one column address bit and the at least one target address bit included in the j-th destination operand address, where j" is an integer greater than or equal to 1 and less than or equal to M.
In a possible implementation, any two different source operand addresses among the Y source operand addresses belong to different general-purpose register columns, and any two different destination operand addresses among the X destination operand addresses belong to different general-purpose register columns.
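The column constraint in this implementation can be expressed as a short check. This is a sketch only: the bit layout used by `column_of` (two target bits below two column bits) and both function names are assumptions made for illustration.

```python
COL_BITS, TGT_BITS = 2, 2  # hypothetical field widths


def column_of(addr):
    """Extract the column field from an address (assumed layout)."""
    return (addr >> TGT_BITS) & ((1 << COL_BITS) - 1)


def columns_are_distinct(addrs):
    """True if any two different addresses fall in different register columns."""
    unique = set(addrs)  # repeated identical addresses are permitted
    return len({column_of(a) for a in unique}) == len(unique)


# Two references to the same register plus one register in another column: allowed.
assert columns_are_distinct([0b0_00_01, 0b0_01_01, 0b0_00_01])
# Two different registers in column 0 (rows 0 and 1): not allowed.
assert not columns_are_distinct([0b0_00_01, 0b1_00_01])
```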
In a possible implementation, the processor further includes a temporary register, and the temporary register corresponds to a target general-purpose register. The target general-purpose register is one of the general-purpose registers in the M rows * N columns of general-purpose register groups, and the temporary register stores the data held in the target general-purpose register. The Y source operand addresses include a plurality of identical first source operand addresses, the number of which is greater than a first threshold, and a second source operand address. The general-purpose register corresponding to the first source operand address is the target general-purpose register, and the second source operand address belongs to the general-purpose register column in which the target general-purpose register is located. The method further includes:
accessing, by the data selection unit, the corresponding temporary register according to the first source operand address, and obtaining the corresponding source operand.
In a possible implementation, the processor further includes a temporary register, and the temporary register corresponds to a target general-purpose register. The target general-purpose register is one of the general-purpose registers in the M rows * N columns of general-purpose register groups, and the temporary register stores the data held in the target general-purpose register. The X destination operand addresses include a plurality of identical first destination operand addresses, the number of which is greater than a first threshold, and a second destination operand address. The general-purpose register corresponding to the first destination operand address is the target general-purpose register, and the second destination operand address belongs to the general-purpose register column in which the target general-purpose register is located. The method further includes:
accessing, by the result write-back unit, the corresponding temporary register according to the first destination operand address, and writing the corresponding calculation result back to the temporary register.
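The condition under which an address is served by the temporary register can be sketched as follows. The value of the first threshold is not given in the text, so `threshold` is a free parameter here; the bit layout in `column_of` and the name `addresses_for_temporary` are likewise assumptions for illustration.

```python
from collections import Counter

COL_BITS, TGT_BITS = 2, 2  # hypothetical field widths


def column_of(addr):
    """Extract the column field from an address (assumed layout)."""
    return (addr >> TGT_BITS) & ((1 << COL_BITS) - 1)


def addresses_for_temporary(addrs, threshold):
    """Return addresses that repeat more than `threshold` times and share
    their register column with some other, different address."""
    counts = Counter(addrs)
    selected = set()
    for addr, count in counts.items():
        shares_column = any(
            other != addr and column_of(other) == column_of(addr) for other in counts
        )
        if count > threshold and shares_column:
            selected.add(addr)
    return selected


# One access to a register in column 0 and three accesses to another register
# in the same column (a different row).
layer = [0b0_00_00, 0b1_00_00, 0b1_00_00, 0b1_00_00]
assert addresses_for_temporary(layer, threshold=2) == {0b1_00_00}
assert addresses_for_temporary(layer, threshold=3) == set()
```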
In a possible implementation, the processor further includes an instruction acquisition unit connected to the instruction decoding unit. The method further includes:
acquiring, by the instruction acquisition unit, the X instructions to be executed, and sending the X instructions to the instruction decoding unit, where the X instructions are instructions that the processor executes in parallel within one clock cycle.
In a possible implementation, the access types of the Y source operand addresses and the X destination operand addresses are all direct access or all indirect access.
It should be noted that, for the specific flow of the processing method described in this embodiment of the present invention, reference may be made to the related descriptions in the embodiments of FIG. 1 to FIG. 10 above; details are not repeated here.
An embodiment of the present invention further provides a computer-readable storage medium. The computer-readable storage medium may store a program, and when the program is executed by a processor, the processor is enabled to perform some or all of the steps of any one of the methods described in the foregoing method embodiments.
An embodiment of the present invention further provides a computer program. The computer program includes instructions, and when the computer program is executed by a multi-core processor, the processor is enabled to perform some or all of the steps of any one of the methods described in the foregoing method embodiments.
In the foregoing embodiments, the description of each embodiment has its own emphasis. For parts that are not described in detail in one embodiment, reference may be made to the related descriptions of the other embodiments.
It should be noted that, for brevity, each of the foregoing method embodiments is described as a series of combined actions. Those skilled in the art should understand, however, that the present invention is not limited by the described order of actions, because according to the present invention some steps may be performed in another order or simultaneously. Those skilled in the art should also understand that the embodiments described in this specification are all preferred embodiments, and that the actions and modules involved are not necessarily required by the present invention.
In the several embodiments provided by the present invention, it should be understood that the disclosed apparatus may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative. The division into the above units is only a division by logical function, and other divisions are possible in an actual implementation: multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. In addition, the mutual couplings, direct couplings or communication connections shown or discussed may be indirect couplings or communication connections through some interfaces, apparatuses or units, and may be electrical or take other forms.
The units described above as separate components may or may not be physically separate, and components shown as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, the functional units in the embodiments of the present invention may be integrated into one processing unit, each unit may exist alone physically, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware or in the form of a software functional unit.
If the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention in essence, or the part that contributes to the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device or the like, and specifically may be a processor in the computer device) to perform all or some of the steps of the methods of the embodiments of the present invention. The storage medium may include any medium that can store program code, such as a USB flash drive, a removable hard disk, a magnetic disk, an optical disc, a read-only memory (Read-Only Memory, ROM) or a random access memory (Random Access Memory, RAM).
In conclusion, the foregoing embodiments are intended only to illustrate the technical solutions of the present invention, not to limit them. Although the present invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that they may still modify the technical solutions described in the foregoing embodiments or make equivalent replacements of some of their technical features, and that such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present invention.
Claims (22)
- A processor, characterized by comprising a data selection unit, an instruction decoding unit connected to the data selection unit, and M rows * N columns of general-purpose register groups, wherein each of the general-purpose register groups includes K general-purpose registers, and M, N and K are integers greater than or equal to 1; wherein
  the instruction decoding unit is configured to decode X input instructions to obtain at least one source operand address for each of the X instructions, Y source operand addresses in total, and to send the Y source operand addresses to the data selection unit; each of the Y source operand addresses includes at least one column address bit, at least one row address bit and at least one target address bit; the at least one column address bit indicates the general-purpose register column to which the source operand address belongs, the at least one row address bit indicates the general-purpose register row to which the source operand address belongs, and the at least one target address bit indicates the correspondence between the source operand address and the t-th general-purpose register among the K general-purpose registers; X, Y and t are integers greater than or equal to 1; and
  the data selection unit is configured to access, in the M rows * N columns of general-purpose register groups, the general-purpose register corresponding to the i-th source operand address according to the at least one column address bit, the at least one row address bit and the at least one target address bit included in the i-th source operand address, and to obtain the corresponding source operand; the i-th source operand address is one of the Y source operand addresses; i is an integer greater than or equal to 1 and less than or equal to Y.
- The processor according to claim 1, characterized in that the processor further comprises an execution unit connected to the data selection unit, and the execution unit includes at least one arithmetic logic unit;
  the at least one arithmetic logic unit is configured to execute the X instructions based on the source operands corresponding to the Y source operand addresses, to obtain a calculation result for each of the X instructions.
- The processor according to claim 2, characterized in that the processor further comprises a result write-back unit connected to the execution unit; wherein
  the instruction decoding unit is further configured to obtain the destination operand address of each of the X instructions, X destination operand addresses in total; each of the X destination operand addresses includes the at least one column address bit, the at least one row address bit and the at least one target address bit; the at least one column address bit indicates the general-purpose register column to which the destination operand address belongs, the at least one row address bit indicates the general-purpose register row to which the destination operand address belongs, and the at least one target address bit indicates the correspondence between the destination operand address and the t-th general-purpose register among the K general-purpose registers; and
  the result write-back unit is configured to access, in the M rows * N columns of general-purpose register groups, the general-purpose register corresponding to the j-th destination operand address according to the at least one column address bit, the at least one row address bit and the at least one target address bit included in the j-th destination operand address, and to write the calculation result of the j-th instruction back to the general-purpose register corresponding to the j-th destination operand address; j is an integer greater than or equal to 1 and less than or equal to X.
- The processor according to claim 1, characterized in that the data selection unit is specifically configured to:
  determine, according to the at least one column address bit included in the i-th source operand address, the i'-th general-purpose register column to which the i-th source operand address belongs, and access, within the i'-th general-purpose register column, the corresponding general-purpose register according to the at least one row address bit and the at least one target address bit included in the i-th source operand address, where i' is an integer greater than or equal to 1 and less than or equal to N; or
  determine, according to the at least one row address bit included in the i-th source operand address, the i"-th general-purpose register row to which the i-th source operand address belongs, and access, within the i"-th general-purpose register row, the corresponding general-purpose register according to the at least one column address bit and the at least one target address bit included in the i-th source operand address, where i" is an integer greater than or equal to 1 and less than or equal to M.
- The processor according to claim 3, characterized in that the result write-back unit is specifically configured to:
  determine, according to the at least one column address bit included in the j-th destination operand address, the j'-th general-purpose register column to which the j-th destination operand address belongs, and access, within the j'-th general-purpose register column, the corresponding general-purpose register according to the at least one row address bit and the at least one target address bit included in the j-th destination operand address, where j' is an integer greater than or equal to 1 and less than or equal to N; or
  determine, according to the at least one row address bit included in the j-th destination operand address, the j"-th general-purpose register row to which the j-th destination operand address belongs, and access, within the j"-th general-purpose register row, the corresponding general-purpose register according to the at least one column address bit and the at least one target address bit included in the j-th destination operand address, where j" is an integer greater than or equal to 1 and less than or equal to M.
- The processor according to claim 3, characterized in that any two different source operand addresses among the Y source operand addresses belong to different general-purpose register columns, and any two different destination operand addresses among the X destination operand addresses belong to different general-purpose register columns.
- The processor according to claim 3, characterized in that the processor further comprises a temporary register, and the temporary register corresponds to a target general-purpose register; the target general-purpose register is one of the general-purpose registers in the M rows * N columns of general-purpose register groups; the temporary register stores the data held in the target general-purpose register; the Y source operand addresses include a plurality of identical first source operand addresses, the number of which is greater than a first threshold, and a second source operand address; the general-purpose register corresponding to the first source operand address is the target general-purpose register, and the second source operand address belongs to the general-purpose register column in which the target general-purpose register is located;
  the data selection unit is further configured to access the corresponding temporary register according to the first source operand address, and to obtain the corresponding source operand.
- The processor according to claim 3, characterized in that the processor further comprises a temporary register, and the temporary register corresponds to a target general-purpose register; the target general-purpose register is one of the general-purpose registers in the M rows * N columns of general-purpose register groups; the temporary register stores the data held in the target general-purpose register; the X destination operand addresses include a plurality of identical first destination operand addresses, the number of which is greater than a first threshold, and a second destination operand address; the general-purpose register corresponding to the first destination operand address is the target general-purpose register, and the second destination operand address belongs to the general-purpose register column in which the target general-purpose register is located;
  the result write-back unit is further configured to access the corresponding temporary register according to the first destination operand address, and to write the corresponding calculation result back to the temporary register.
- The processor according to any one of claims 1-8, wherein the processor further comprises an instruction acquisition unit connected to the instruction decoding unit; the instruction acquisition unit is configured to obtain the X instructions to be executed and to send the X instructions to the instruction decoding unit; the X instructions are instructions that the processor executes in parallel within one clock cycle.
- The processor according to any one of claims 1-9, wherein the access types of the Y source operand addresses and the X destination operand addresses are all a direct access type or are all an indirect access type.
- A processing method, applied to a processor, wherein the processor comprises a data selection unit, an instruction decoding unit connected to the data selection unit, and M rows * N columns of general-purpose register groups, and each general-purpose register group in the M rows * N columns of general-purpose register groups comprises K general-purpose registers; M, N and K are integers greater than or equal to 1; the method comprises: decoding, by the instruction decoding unit, X input instructions to obtain at least one source operand address of each of the X instructions, Y source operand addresses in total, and sending the Y source operand addresses to the data selection unit; each of the Y source operand addresses comprises at least one column address bit, at least one row address bit and at least one target address bit; the at least one column address bit indicates the general-purpose register column to which the source operand address belongs, the at least one row address bit indicates the general-purpose register row to which the source operand address belongs, and the at least one target address bit indicates the correspondence between the source operand address and the t-th general-purpose register of the K general-purpose registers; X, Y and t are integers greater than or equal to 1; and accessing, by the data selection unit, according to the at least one column address bit, the at least one row address bit and the at least one target address bit comprised in an i-th source operand address, the general-purpose register corresponding to the i-th source operand address in the M rows * N columns of general-purpose register groups, to obtain the corresponding source operand; the i-th source operand address is one of the Y source operand addresses; i is an integer greater than or equal to 1 and less than or equal to Y.
- The method according to claim 11, wherein the processor further comprises an execution unit connected to the data selection unit, and the execution unit comprises at least one arithmetic logic unit; the method further comprises: executing, by the at least one arithmetic logic unit, the X instructions based on the source operands respectively corresponding to the Y source operand addresses, to obtain a calculation result of each of the X instructions.
- The method according to claim 12, wherein the processor further comprises a result write-back unit connected to the execution unit; the method further comprises: obtaining, by the instruction decoding unit, the destination operand address of each of the X instructions, X destination operand addresses in total; each of the X destination operand addresses comprises the at least one column address bit, the at least one row address bit and the at least one target address bit; the at least one column address bit indicates the general-purpose register column to which the destination operand address belongs, the at least one row address bit indicates the general-purpose register row to which the destination operand address belongs, and the at least one target address bit indicates the correspondence between the destination operand address and the t-th general-purpose register of the K general-purpose registers; and accessing, by the result write-back unit, according to the at least one column address bit, the at least one row address bit and the at least one target address bit comprised in a j-th destination operand address, the general-purpose register corresponding to the j-th destination operand address in the M rows * N columns of general-purpose register groups, and writing the calculation result of the j-th instruction back into the general-purpose register corresponding to the j-th destination operand address; j is an integer greater than or equal to 1 and less than or equal to X.
- The method according to claim 11, wherein the accessing, by the data selection unit, according to the at least one column address bit, the at least one row address bit and the at least one target address bit comprised in the i-th source operand address, the general-purpose register corresponding to the i-th source operand address in the M rows * N columns of general-purpose register groups comprises: determining, by the data selection unit, according to the at least one column address bit comprised in the i-th source operand address, the i'-th general-purpose register column to which the i-th source operand address belongs, and accessing, in the i'-th general-purpose register column, the corresponding general-purpose register according to the at least one row address bit and the at least one target address bit comprised in the i-th source operand address, where i' is an integer greater than or equal to 1 and less than or equal to N; or determining, by the data selection unit, according to the at least one row address bit comprised in the i-th source operand address, the i''-th general-purpose register row to which the i-th source operand address belongs, and accessing, in the i''-th general-purpose register row, the corresponding general-purpose register according to the at least one column address bit and the at least one target address bit comprised in the i-th source operand address, where i'' is an integer greater than or equal to 1 and less than or equal to M.
- The method according to claim 13, wherein the accessing, by the result write-back unit, according to the at least one column address bit, the at least one row address bit and the at least one target address bit comprised in the j-th destination operand address, the general-purpose register corresponding to the j-th destination operand address in the M rows * N columns of general-purpose register groups comprises: determining, by the result write-back unit, according to the at least one column address bit comprised in the j-th target operand address, the j'-th general-purpose register column to which the j-th target operand address belongs, and accessing, in the j'-th general-purpose register column, the corresponding general-purpose register according to the at least one row address bit and the at least one target address bit comprised in the j-th target operand address, where j' is an integer greater than or equal to 1 and less than or equal to N; or determining, by the result write-back unit, according to the at least one row address bit comprised in the j-th target operand address, the j''-th general-purpose register row to which the j-th target operand address belongs, and accessing, in the j''-th general-purpose register row, the corresponding general-purpose register according to the at least one column address bit and the at least one target address bit comprised in the j-th target operand address, where j'' is an integer greater than or equal to 1 and less than or equal to M.
- The method according to claim 13, wherein any two different source operand addresses among the Y source operand addresses belong to different general-purpose register columns, and any two different destination operand addresses among the X destination operand addresses belong to different general-purpose register columns.
- The method according to claim 13, wherein the processor further comprises a temporary register, and the temporary register corresponds to a target general-purpose register; the target general-purpose register is one of the general-purpose registers in the M rows * N columns of general-purpose register groups; the temporary register stores the data in the target general-purpose register; the Y source operand addresses comprise a plurality of identical first source operand addresses, the number of which is greater than a first threshold, and a second source operand address; the general-purpose register corresponding to the first source operand address is the target general-purpose register, and the second source operand address belongs to the general-purpose register column in which the target general-purpose register is located; the method further comprises: accessing, by the data selection unit, the corresponding temporary register according to the first source operand address, to obtain the corresponding source operand.
- The method according to claim 13, wherein the processor further comprises a temporary register, and the temporary register corresponds to a target general-purpose register; the target general-purpose register is one of the general-purpose registers in the M rows * N columns of general-purpose register groups; the temporary register stores the data in the target general-purpose register; the X destination operand addresses comprise a plurality of identical first destination operand addresses, the number of which is greater than a first threshold, and a second destination operand address; the general-purpose register corresponding to the first destination operand address is the target general-purpose register, and the second destination operand address belongs to the general-purpose register column in which the target general-purpose register is located; the method further comprises: accessing, by the result write-back unit, the corresponding temporary register according to the target destination operand address, and writing the corresponding calculation result back into the temporary register.
- The method according to any one of claims 11-18, wherein the processor further comprises an instruction acquisition unit connected to the instruction decoding unit; the method further comprises: obtaining, by the instruction acquisition unit, the X instructions to be executed, and sending the X instructions to the instruction decoding unit; the X instructions are instructions that the processor executes in parallel within one clock cycle.
- The method according to any one of claims 11-19, wherein the access types of the Y source operand addresses and the X destination operand addresses are all a direct access type or are all an indirect access type.
- A computer-readable storage medium, wherein the computer-readable storage medium stores a computer program, and the computer program, when executed by a processor, implements the method according to any one of claims 11-20.
- A computer program, wherein the computer-readable program comprises instructions, and the computer program, when executed by a processor, causes the processor to perform the method according to any one of claims 11-20.
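The addressing scheme recited in the method claims can be illustrated with a small software model. The following Python sketch is only an illustration under stated assumptions, not the claimed hardware: the bit layout (column bits highest, then row bits, then target bits), the field widths derived from N, M and K, the 0-based indices, and the names `RegisterFileLayout`, `RegisterFile`, `field_width` and `distinct_addresses_use_distinct_columns` are all hypothetical and do not appear in the claims. It models the column-first access of one alternative (select the column, then use the row and target bits) and the condition that different operand addresses fall in different register columns.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


def field_width(count: int) -> int:
    """Bits needed to index `count` items; at least 1, since each field has at least one bit."""
    return max(1, (count - 1).bit_length())


@dataclass(frozen=True)
class RegisterFileLayout:
    m_rows: int  # M: number of register-group rows
    n_cols: int  # N: number of register-group columns
    k_regs: int  # K: general-purpose registers per group

    def encode(self, column: int, row: int, target: int) -> int:
        """Pack 0-based (column, row, target) into one address (assumed layout)."""
        t_bits = field_width(self.k_regs)
        r_bits = field_width(self.m_rows)
        return (column << (t_bits + r_bits)) | (row << t_bits) | target

    def decode(self, address: int) -> Tuple[int, int, int]:
        """Split an operand address into 0-based (column, row, target) fields."""
        t_bits = field_width(self.k_regs)
        r_bits = field_width(self.m_rows)
        target = address & ((1 << t_bits) - 1)
        row = (address >> t_bits) & ((1 << r_bits) - 1)
        column = address >> (t_bits + r_bits)
        if column >= self.n_cols or row >= self.m_rows or target >= self.k_regs:
            raise ValueError(f"address {address} is outside the register file")
        return column, row, target


class RegisterFile:
    """Software model of M rows * N columns of groups with K registers each."""

    def __init__(self, layout: RegisterFileLayout) -> None:
        self.layout = layout
        # Stored column-first: columns[column][row][target]
        self.columns: List[List[List[int]]] = [
            [[0] * layout.k_regs for _ in range(layout.m_rows)]
            for _ in range(layout.n_cols)
        ]

    def read(self, address: int) -> int:
        """Data-selection step: pick the column, then use the row and target bits."""
        column, row, target = self.layout.decode(address)
        return self.columns[column][row][target]

    def write(self, address: int, value: int) -> None:
        """Write-back step: same decoding, then store the calculation result."""
        column, row, target = self.layout.decode(address)
        self.columns[column][row][target] = value


def distinct_addresses_use_distinct_columns(
    layout: RegisterFileLayout, addresses: Iterable[int]
) -> bool:
    """True if any two different addresses belong to different register columns."""
    seen_columns = set()
    for address in set(addresses):  # identical addresses are not a conflict
        column = layout.decode(address)[0]
        if column in seen_columns:
            return False
        seen_columns.add(column)
    return True
```

With the hypothetical layout M = 2, N = 4, K = 8, the target field takes 3 bits and the row field 1 bit, so the register at column 3, row 1, target 5 (0-based) is reached through address (3 << 4) | (1 << 3) | 5 = 61. Two operands at different rows of the same column fail the column check, which is the situation the temporary-register claims address for repeated accesses.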
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202080105555.6A CN116507999B (en) | 2020-09-29 | 2020-09-29 | A processor, a processing method and related equipment |
PCT/CN2020/118836 WO2022067510A1 (en) | 2020-09-29 | 2020-09-29 | Processor, processing method, and related device |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/CN2020/118836 WO2022067510A1 (en) | 2020-09-29 | 2020-09-29 | Processor, processing method, and related device |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2022067510A1 (en) | 2022-04-07 |
Family
ID=80949254
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/CN2020/118836 WO2022067510A1 (en) | 2020-09-29 | 2020-09-29 | Processor, processing method, and related device |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN116507999B (en) |
WO (1) | WO2022067510A1 (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN119003002A (en) * | 2024-07-19 | 2024-11-22 | 摩尔线程智能科技(北京)有限责任公司 | Processor, display card, computer device, and dependency release method |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN1334512A (en) * | 2000-07-18 | 2002-02-06 | 多思资讯(集团)有限公司 | Stack register file and its control method |
CN1601462A (en) * | 2003-09-27 | 2005-03-30 | 英特尔公司 | Extended register space device of processor and method thereof |
CN1766834A (en) * | 2005-01-20 | 2006-05-03 | 西安电子科技大学 | Dual ALU RISC 8-bit Microcontroller |
CN102262611A (en) * | 2010-05-25 | 2011-11-30 | 无锡华润矽科微电子有限公司 | 16-site RISC (Reduced Instruction-Set Computer) CUP (Central Processing Unit) system structure |
CN102314333A (en) * | 2010-06-22 | 2012-01-11 | 国际商业机器公司 | Be used to expand the method and system of the number of the general-purpose register that can be used for instructing |
Family Cites Families (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP3345050B2 (en) * | 1992-08-31 | 2002-11-18 | 株式会社日立製作所 | Two-dimensional array type memory system |
JP3739797B2 (en) * | 1995-10-06 | 2006-01-25 | パトリオット サイエンティフィック コーポレイション | Reduced instruction set computer microprocessor structure |
US5982696A (en) * | 1996-06-06 | 1999-11-09 | Cirrus Logic, Inc. | Memories with programmable address decoding and systems and methods using the same |
US6085282A (en) * | 1997-09-24 | 2000-07-04 | Motorola, Inc. | Method and apparatus for distinguishing register reads from memory reads in a flash memory |
2020
- 2020-09-29 CN CN202080105555.6A patent/CN116507999B/en active Active
- 2020-09-29 WO PCT/CN2020/118836 patent/WO2022067510A1/en active Application Filing
Also Published As
Publication number | Publication date |
---|---|
CN116507999A (en) | 2023-07-28 |
CN116507999B (en) | 2024-11-29 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US11977886B2 (en) | | Systems, methods, and apparatuses for tile store |
CN113762490B (en) | | Matrix multiplication speedup for sparse matrices using column folding and squeezing |
US11403071B2 (en) | | Systems and methods for performing instructions to transpose rectangular tiles |
EP3623941B1 (en) | | Systems and methods for performing instructions specifying ternary tile logic operations |
US8412917B2 (en) | | Data exchange and communication between execution units in a parallel processor |
EP4290371A2 (en) | | Systems and methods for performing instructions to transform matrices into row-interleaved format |
EP3757769B1 (en) | | Systems and methods to skip inconsequential matrix operations |
US11579883B2 (en) | | Systems and methods for performing horizontal tile operations |
US12282525B2 (en) | | Systems, methods, and apparatuses for matrix operations |
KR20170110686A (en) | | A vector processor configured to operate on variable length vectors using instructions to combine and divide vectors, |
CN111443948B (en) | | Instruction execution method, processor and electronic equipment |
US5053986A (en) | | Circuit for preservation of sign information in operations for comparison of the absolute value of operands |
EP4462249A2 (en) | | Matrix transpose and multiply |
US5119324A (en) | | Apparatus and method for performing arithmetic functions in a computer system |
CN104133748A (en) | | Method and system to combine corresponding half word units from multiple register units within a microprocessor |
EP1861775A2 (en) | | Processor and method of indirect register read and write operations |
US20140207838A1 (en) | | Method, apparatus and system for execution of a vector calculation instruction |
US7877571B2 (en) | | System and method of determining an address of an element within a table |
WO2022067510A1 (en) | | Processor, processing method, and related device |
CN108959180B (en) | | Data processing method and system |
CN111291320A (en) | | Double-precision floating-point complex matrix operation optimization method based on HXDDSP chip |
US6427200B1 (en) | | Multiple changeable addressing mapping circuit |
EP3929732A1 (en) | | Matrix data scatter and gather by row |
US7788471B2 (en) | | Data processor and methods thereof |
CN118426831A (en) | | Vector processor and operation method thereof |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| | 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 20955536; Country of ref document: EP; Kind code of ref document: A1 |
| | WWE | Wipo information: entry into national phase | Ref document number: 202080105555.6; Country of ref document: CN |
| | NENP | Non-entry into the national phase | Ref country code: DE |
| | 122 | Ep: pct application non-entry in european phase | Ref document number: 20955536; Country of ref document: EP; Kind code of ref document: A1 |