CN115617396B

CN115617396B - Register allocation method and device applied to novel artificial intelligence processor

Info

Publication number: CN115617396B
Application number: CN202211225754.8A
Authority: CN
Inventors: 官孝峰; 周浩; 许志鹏; 张忠军
Original assignee: Shanghai Enflame Technology Co ltd
Current assignee: Shanghai Suiyuan Technology Co ltd
Priority date: 2022-10-09
Filing date: 2022-10-09
Publication date: 2023-08-29
Anticipated expiration: 2042-10-09
Also published as: CN115617396A

Abstract

The invention discloses a register allocation method and device applied to a novel artificial intelligence processor. The method comprises the following steps: before compiler instruction scheduling, constructing an intra-block grouping graph according to the grouping restriction that the target chip imposes on each virtual register in an instruction, and analyzing the register grouping relationships according to the intra-block grouping graph so as to split the grouping of the registers; after instruction scheduling is completed, constructing a register block conflict graph according to the block restriction on each virtual register in an instruction, and performing block assignment on the registers according to the register block conflict graph; and, during register allocation, allocating and assigning each register according to its grouping restriction, block assignment, and lifetime conflict restriction. The technical solution of the embodiments of the invention guarantees program functional correctness under the two-stage "block-grouping" hardware register structure of an artificial intelligence chip, and balances the load of register resources while ensuring that the register allocation result is correct.

Description

Register allocation method and device applied to novel artificial intelligence processor

Technical Field

The embodiment of the invention relates to the technical field of computers, in particular to a register allocation method and a register allocation device applied to a novel artificial intelligence processor.

Background

In integrated circuit chip design, large register files may be implemented as static random-access memory (SRAM). Because of constraints such as complexity, power consumption, and clock period in processor design, such large register files are often implemented as multi-block (multi-bank) structures.

In a multi-block structure, each register block contains only a portion of the registers and has only one read port and one write port, or a single read/write port. A block conflict therefore occurs when an instruction reads or writes several operands in the same register block in the same clock cycle. FIG. 1 is a schematic diagram of instructions under a prior-art two-block register design. As shown in FIG. 1, block conflicts may occur during program execution: for instruction 0, operands v0 and v1 come from block 0 and block 1 respectively, so no conflict arises; for instruction 1, operands v0 and v2 both come from block 0, which causes a block conflict.
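The FIG. 1 situation can be illustrated with a small Python check. This is only a sketch: it assumes one read port per block, so two source operands conflict exactly when they map to the same block; the names `REG_BLOCK` and `has_block_conflict` are hypothetical.

```python
# Hypothetical block map matching the FIG. 1 description:
# v0 and v2 are in block 0, v1 is in block 1.
REG_BLOCK = {"v0": 0, "v1": 1, "v2": 0}


def has_block_conflict(source_regs, reg_block):
    """True if two operands read in the same cycle come from the same block.

    Assumes each block has a single read port, so at most one operand
    per block can be read in a given cycle.
    """
    blocks = [reg_block[r] for r in source_regs]
    return len(blocks) != len(set(blocks))


print(has_block_conflict(["v0", "v1"], REG_BLOCK))  # instruction 0: False
print(has_block_conflict(["v0", "v2"], REG_BLOCK))  # instruction 1: True
```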

Existing approaches handle register block conflicts either in hardware or in software. The simplest hardware approach is to extend the hardware control logic with a pipeline stall, so that the reads and writes of conflicting register blocks complete over extra cycles. Some GPU designs instead use register buffering or a register file cache (RFC) to reduce the probability or the cost of register block conflicts and thereby preserve pipeline performance. However, both approaches increase hardware design complexity, which in turn costs chip area, clock frequency, and power consumption.

By contrast, some register designs do not handle register block conflicts in hardware at all, and instead rely entirely on software to avoid any register block conflict in the instructions. In the software approach, while compiling the program source code, the compiler abstracts the register block conflicts as a register conflict graph (RCG), colors the register conflict graph to assign register blocks, and combines this with an existing register allocation method to complete register allocation under the register block restriction.

However, in such register designs the hardware does not protect against register block conflicts, so correct instruction execution depends entirely on the register allocation performed by the compiler: in the final executable file, the operands of each instruction must come from different register blocks. This leads to two significant problems:

Problem one: FIG. 2 is a schematic diagram of instructions under another prior-art two-block register design. For the instructions in FIG. 2, the register conflict graph cannot be two-colored, so with only two register blocks the compiler can avoid register block conflicts only by generating register spill instructions, and spilling often causes a noticeable runtime performance loss.

Problem two: guaranteeing, through compiler register allocation alone, that a program has no register block conflicts usually incurs unacceptable compile-time overhead in the prior art. Finding a bipartite subgraph of the register conflict graph with the smallest register spill cost has been found to be NP-complete, so for large problems the compilation overhead may become unacceptable.

In existing designs of novel artificial intelligence chips, the register design is usually simplified further in view of hardware factors such as power consumption and control logic area. In particular, omitting the crossbar from the register design further reduces the design complexity of the register module. FIG. 3 is a schematic diagram of the register block structure in a conventional novel artificial intelligence chip; as shown in FIG. 3, each register block is connected to the arithmetic logic unit through a crossbar. FIG. 4 is a schematic diagram of the two-stage "block-grouping" register structure in the design of a conventional novel artificial intelligence chip; as shown in FIG. 4, each register block contains a plurality of groups, and each group is connected directly to an arithmetic logic unit. Consequently, an arithmetic logic unit can operate only on register data within the same group across the different blocks.

For this chip design, removing the crossbar reduces hardware design complexity, which helps reduce hardware area and raise the operating frequency. In exchange, the design places more restrictions on register use: register reads and writes within the same clock cycle must satisfy not only the register block restriction (the registers must be in different blocks) but also the register grouping restriction (the registers must be in the same group). When a register block conflict occurs, the hardware guarantees only basic register functionality, by stalling the pipeline; the register grouping restriction must be guaranteed by the compiler software. Moreover, no similar hardware design has been disclosed in the prior art: existing register allocation methods for block-structured register designs neither rely on a hardware guarantee against register block conflicts nor consider the two-stage block-grouping hardware structure, so they are not suitable for register allocation on this novel artificial intelligence chip.

Disclosure of Invention

The embodiments of the invention provide a register allocation method and device applied to a novel artificial intelligence processor. They guarantee program functional correctness under the two-stage 'block-grouping' hardware register structure of a novel artificial intelligence chip, and distribute registers evenly across multiple register blocks while ensuring that the compilation result is functionally correct, thereby enabling a simpler hardware register design through hardware/software co-design.

In a first aspect, an embodiment of the present invention provides a method for allocating registers in a two-stage hardware register structure, where the method includes:

before compiler instruction scheduling, constructing an intra-block grouping graph according to the grouping restriction that the target chip imposes on each virtual register in an instruction, and analyzing the virtual register grouping relationships according to the intra-block grouping graph so as to split the grouping of the virtual registers;

after instruction scheduling is completed, constructing a register block conflict graph according to the block restriction on each virtual register in an instruction, and performing block assignment on the virtual registers according to the register block conflict graph; and

during register allocation, allocating and assigning each virtual register according to its grouping restriction, block assignment, and lifetime conflict restriction.

Optionally, constructing the intra-block grouping graph according to the grouping restriction of each virtual register of the target chip in the instructions includes:

acquiring the virtual register operands read by each instruction, and connecting all virtual registers of each instruction to one another to obtain the intra-block grouping graph of the virtual registers.

Optionally, analyzing the virtual register grouping relationships according to the intra-block grouping graph so as to split the grouping of the virtual registers includes:

acquiring a plurality of grouping sets and target registers in the intra-block grouping graph;

inserting, before all instructions that use a target register as a source operand, a copy instruction for the target register, and modifying those instructions according to the copy instruction; and

updating the intra-block grouping graph according to the modified instructions to obtain a new intra-block grouping graph.

Optionally, performing block assignment on the virtual registers according to the register block conflict graph includes:

assigning blocks to the virtual registers with a block-coloring algorithm based on a greedy algorithm.

Optionally, allocating and assigning each virtual register according to the grouping restriction, block assignment, and lifetime conflict restriction of the register includes:

allocating and assigning each virtual register according to the grouping restriction generated from the new intra-block grouping graph, the block assignment, and the lifetime conflict restriction of the register.

In a second aspect, an embodiment of the present invention further provides a register allocation apparatus, including:

a grouping graph construction module, configured to construct, before compiler instruction scheduling, an intra-block grouping graph according to the grouping restriction of each virtual register of the target chip in the instructions, and to analyze the virtual register grouping relationships according to the intra-block grouping graph so as to split the grouping of the virtual registers;

a conflict graph construction module, configured to construct, after instruction scheduling is completed, a register block conflict graph according to the block restriction of each virtual register in the instructions, and to perform block assignment on the virtual registers according to the register block conflict graph; and

an allocation module, configured to allocate and assign each virtual register during register allocation according to the register's grouping restriction, block assignment, and lifetime conflict restriction.

In a third aspect, an embodiment of the present invention further provides an electronic device, including:

one or more processors;

A storage means for storing one or more programs;

wherein, when the one or more programs are executed by the one or more processors, the one or more processors implement the register allocation method provided by any embodiment of the present invention.

In a fourth aspect, embodiments of the present invention further provide a computer readable storage medium storing a computer program, which when executed by a processor implements the register allocation method provided in any of the embodiments of the present invention.

In a fifth aspect, embodiments of the present invention also provide a computer program product comprising a computer program which, when executed by a processor, implements the register allocation method provided by any of the embodiments of the present invention.

In the technical solution of the embodiments of the invention, before compiler instruction scheduling, an intra-block grouping graph is constructed according to the grouping restriction of each virtual register of the target chip in the instructions, and the virtual register grouping relationships are analyzed according to that graph so as to split the grouping of the virtual registers; after instruction scheduling is completed, a register block conflict graph is constructed according to the block restriction of each virtual register in the instructions, and block assignment is performed on the virtual registers according to that graph; and during register allocation, each virtual register is allocated and assigned according to its grouping restriction, block assignment, and lifetime conflict restriction. These technical means guarantee program functional correctness under the two-stage hardware register structure of the artificial intelligence chip and ensure correct register allocation results while balancing the load of register resources, thereby reducing the complexity of the hardware register design.

It should be understood that the description in this section is not intended to identify key or critical features of the embodiments of the invention or to delineate the scope of the invention. Other features of the present invention will become apparent from the description that follows.

Drawings

In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings required for the description of the embodiments will be briefly described below, and it is apparent that the drawings in the following description are only some embodiments of the present invention, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.

FIG. 1 is a schematic diagram of instructions under a prior-art two-block register design;

FIG. 2 is a schematic diagram of instructions under another prior-art two-block register design;

FIG. 3 is a schematic diagram of the prior-art register block structure in a novel artificial intelligence chip;

FIG. 4 is a schematic diagram of the prior-art two-stage block-grouping register structure in the design of a novel artificial intelligence chip;

FIG. 5a is a flowchart of a register allocation method according to a first embodiment of the present invention;

FIG. 5b is an intra-block grouping graph according to the first embodiment of the present invention;

FIG. 5c is a block conflict graph provided in accordance with a first embodiment of the present invention;

FIG. 6a is a flowchart of a register allocation method according to a second embodiment of the present invention;

FIG. 6b is an intra-block grouping graph according to the second embodiment of the present invention;

FIG. 6c is a new intra-block grouping graph according to the second embodiment of the present invention;

FIG. 7 is a schematic diagram of a register allocation apparatus according to a third embodiment of the present invention;

FIG. 8 is a schematic structural diagram of an electronic device implementing a register allocation method according to an embodiment of the present invention.

Detailed Description

The invention is described in further detail below with reference to the drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the invention and are not limiting thereof. It should be further noted that, for convenience of description, only some, but not all of the structures related to the present invention are shown in the drawings.

Example 1

FIG. 5a is a flowchart of a register allocation method according to the first embodiment of the present invention. The method may be performed by a register allocation device and is applicable to allocating registers in a novel artificial intelligence chip. The register allocation device may be implemented in software and/or hardware and may generally be integrated in an electronic device with data processing capability. The method specifically includes the following steps:

Step 110: before compiler instruction scheduling, construct an intra-block grouping graph according to the grouping restriction of each virtual register of the target chip in the instructions, and analyze the virtual register grouping relationships according to the intra-block grouping graph so as to split the grouping of the virtual registers.

In this embodiment, the target chip may be a novel artificial intelligence chip. To use the registers of the target chip, a developer can have the compiler generate, according to the functional requirements of the target chip, instructions (i.e., an intermediate representation, IR) that take virtual registers as operands. Before the virtual registers are allocated (i.e., mapped to physical registers), every instruction that uses a virtual register can be collected, the grouping relationships (i.e., grouping restrictions) among the virtual registers of each instruction can be determined, and the virtual registers that have a grouping relationship can be connected instruction by instruction to obtain the intra-block grouping graph. Specifically, whenever an instruction's operands include two or more input virtual registers together with an output virtual register, the resulting grouping restriction must be reflected in the intra-block grouping graph. For example, if an instruction uses virtual registers x and y as input operands and virtual register z as its output operand, then x, y, and z have a grouping relationship and must be connected to one another in the intra-block grouping graph.

In one implementation of this embodiment, constructing the intra-block grouping graph includes: acquiring the virtual register operands read by each instruction, and connecting the virtual registers of each instruction to one another to obtain the intra-block grouping graph of the virtual registers.

Specifically, the virtual registers in all operands of each instruction can be collected, and the intra-block grouping graph of the code is built with virtual register names as nodes and grouping relationships as undirected edges.

In a specific embodiment, assume that the instructions in the target chip are:

b = *
c = *
d = *
g = *
a = vadd b, c
e = vsub b, a
f = vmul g, d

From the above instructions, virtual register a has a grouping relationship with virtual registers b and c, virtual register e has a grouping relationship with virtual registers b and a, and virtual register f has a grouping relationship with virtual registers g and d. The intra-block grouping graph obtained from these relationships is shown in FIG. 5b, where an edge between two virtual registers indicates that they have a grouping relationship.
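A minimal Python sketch of building the intra-block grouping graph for this example and extracting its subgraphs (connected components). The tuple encoding of instructions and the helper names are hypothetical and not taken from any particular compiler.

```python
# Example program from the text as (dest, opcode, sources); "*" defines a value.
PROGRAM = [
    ("b", "*", []),
    ("c", "*", []),
    ("d", "*", []),
    ("g", "*", []),
    ("a", "vadd", ["b", "c"]),
    ("e", "vsub", ["b", "a"]),
    ("f", "vmul", ["g", "d"]),
]


def build_grouping_graph(program):
    """Nodes are virtual registers; all registers of one instruction are pairwise connected."""
    graph = {}
    for dest, _op, srcs in program:
        regs = [dest, *srcs]
        for r in regs:
            graph.setdefault(r, set())
        for i, x in enumerate(regs):
            for y in regs[i + 1:]:
                if x != y:
                    graph[x].add(y)
                    graph[y].add(x)
    return graph


def connected_components(graph):
    """Each component must later be mapped into a single physical register group."""
    seen, components = set(), []
    for start in sorted(graph):
        if start in seen:
            continue
        stack, comp = [start], set()
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            comp.add(node)
            stack.extend(graph[node] - seen)
        components.append(comp)
    return components


graph = build_grouping_graph(PROGRAM)
for comp in connected_components(graph):
    print(sorted(comp))  # ['a', 'b', 'c', 'e'] then ['d', 'f', 'g']
```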

As FIG. 5b shows, the intra-block grouping graph of this code consists of two subgraphs: nodes {a, b, c, e} and their edges form the first subgraph, and nodes {g, d, f} and their edges form the second subgraph. The nodes of a subgraph are bound by the grouping relationship and must be allocated physical registers from the same group. Taking the 2-4 "block-grouping" hardware configuration of FIG. 4 (2 blocks, 4 groups) as an example, virtual registers {a, b, c, e} may be allocated any 4 distinct registers from physical register group 0, i.e., {v0, v4, v8, …}, from group 1, i.e., {v1, v5, v9, …}, from group 2, i.e., {v2, v6, v10, …}, or from group 3, i.e., {v3, v7, v11, …}, and then be assigned those registers. Likewise, virtual registers {g, d, f} may be allocated registers from any one of these four physical register groups. In this example, one register assignment (i.e., virtual-to-physical register mapping) that satisfies the register grouping restriction is {a, b, c, e} -> {v0, v4, v8, v12}, {g, d, f} -> {v1, v5, v9}. Moreover, because the group assignments of the virtual registers of the first and second subgraphs need not be mutually exclusive, {a, b, c, e} -> {v0, v4, v8, v12}, {g, d, f} -> {v16, v20, v4} is also a register assignment that satisfies the register grouping restriction.
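The grouping check can be sketched in Python as follows. The rule that physical register vN belongs to group N mod 4 follows the group listing above (group k = {vk, vk+4, vk+8, …}); the function and variable names are hypothetical.

```python
NUM_GROUPS = 4  # 2-4 "block-grouping" configuration of FIG. 4


def group_of(preg):
    """Group of physical register 'vN': group k = {vk, vk+4, vk+8, ...}."""
    return int(preg[1:]) % NUM_GROUPS


def respects_grouping(components, mapping):
    """Each subgraph of the grouping graph must use registers of one group only."""
    return all(len({group_of(mapping[v]) for v in comp}) == 1 for comp in components)


COMPONENTS = [{"a", "b", "c", "e"}, {"g", "d", "f"}]
mapping = dict(zip("abce", ["v0", "v4", "v8", "v12"]))
mapping.update(zip("gdf", ["v1", "v5", "v9"]))
print(respects_grouping(COMPONENTS, mapping))  # True
```

Replacing v9 with v7 (group 3) makes the check fail, because the second subgraph would then span two groups.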

Step 120: after instruction scheduling is completed, construct a register block conflict graph according to the block restriction of each virtual register in the instructions, and perform block assignment on the virtual registers according to the register block conflict graph.

In this embodiment, after the intra-block grouping graph has been constructed, the instructions can be scheduled, and a block conflict graph of the virtual registers can then be constructed according to the block restriction on each virtual register in the instructions. Specifically, if the input operands of the same instruction include both virtual register x and virtual register y, then x and y have a block conflict relationship, i.e., they must be allocated registers from different register blocks.

In one implementation of this embodiment, constructing the register block conflict graph according to the block restriction of each virtual register in the instructions includes: acquiring the operands read by each instruction, and constructing the block conflict graph with virtual register operand names as nodes and block conflict relationships as undirected edges.

In a specific embodiment, assume that the instructions in the target chip are:

b = *
c = *
d = *
g = *
a = vadd b, c
e = vsub b, a
f = vmul g, d

From the above instructions, virtual registers b and c have a block conflict relationship, virtual registers b and a have a block conflict relationship, and virtual registers g and d have a block conflict relationship. The block conflict graph obtained from these relationships is shown in FIG. 5c, where an edge between two virtual registers indicates that they have a block conflict relationship.
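A hedged Python sketch of constructing the block conflict graph: every pair of source operands read by the same instruction gets an undirected edge, and every virtual register appears as a node. The instruction encoding and function name are illustrative assumptions.

```python
from itertools import combinations

# Example program after scheduling, as (dest, opcode, sources).
PROGRAM = [
    ("b", "*", []),
    ("c", "*", []),
    ("d", "*", []),
    ("g", "*", []),
    ("a", "vadd", ["b", "c"]),
    ("e", "vsub", ["b", "a"]),
    ("f", "vmul", ["g", "d"]),
]


def build_block_conflict_graph(program):
    """Source operands read by the same instruction must sit in different blocks."""
    graph = {}
    for dest, _op, srcs in program:
        for r in [dest, *srcs]:
            graph.setdefault(r, set())  # every virtual register is a node
        for x, y in combinations(srcs, 2):
            if x != y:
                graph[x].add(y)
                graph[y].add(x)
    return graph


conflicts = build_block_conflict_graph(PROGRAM)
edges = sorted((x, y) for x in conflicts for y in conflicts[x] if x < y)
print(edges)  # [('a', 'b'), ('b', 'c'), ('d', 'g')]
```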

Block assignment of the virtual registers can then be performed according to the block conflict graph. In one implementation of this embodiment, performing block assignment on the virtual registers according to the register block conflict graph includes: assigning blocks to the virtual registers with a greedy block-coloring algorithm. The algorithm starts coloring from the node of highest degree and greedily looks for a color that differs from the colors of all its neighbors. If such a color exists, the node is given that color; if the neighbors already use as many colors as there are register blocks, the node is given an arbitrary color, and the remaining conflict is left to the hardware pipeline stall. Once a node has been colored, its uncolored neighbors are colored next. When one subgraph has been fully colored, the same process is repeated for the next subgraph.
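The greedy block-coloring procedure described above might look roughly like this in Python. It is a sketch, not the patented implementation: ties in degree are broken by name for determinism, and when no color is free it simply picks block 0 as the "arbitrary" color.

```python
from collections import deque

# Block conflict graph of FIG. 5c (e and f have no conflicts).
CONFLICT_GRAPH = {
    "a": {"b"}, "b": {"a", "c"}, "c": {"b"},
    "d": {"g"}, "g": {"d"}, "e": set(), "f": set(),
}


def greedy_block_coloring(graph, num_blocks):
    """Greedy block assignment: highest-degree node first, then its uncolored neighbors."""
    color = {}
    # Visit roots by descending degree; names break ties so the result is deterministic.
    for root in sorted(graph, key=lambda n: (-len(graph[n]), n)):
        if root in color:
            continue
        queue = deque([root])  # start coloring a new subgraph
        while queue:
            node = queue.popleft()
            if node in color:
                continue
            used = {color[m] for m in graph[node] if m in color}
            free = [c for c in range(num_blocks) if c not in used]
            # No free block: pick one arbitrarily (block 0 here); the hardware
            # then resolves the remaining conflict with a pipeline stall.
            color[node] = free[0] if free else 0
            queue.extend(m for m in graph[node] if m not in color)
    return color


blocks = greedy_block_coloring(CONFLICT_GRAPH, num_blocks=2)
print(sorted(blocks.items()))
# [('a', 1), ('b', 0), ('c', 1), ('d', 0), ('e', 0), ('f', 0), ('g', 1)]
```

This result differs from the designation given in the text only in the labeling of the first subgraph and in the block chosen for the isolated node e; both are valid 2-colorings, and a real implementation would presumably use the simulated register pressure to make such free choices.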

In this embodiment, taking the 2-4 "block-grouping" hardware configuration of FIG. 4 as an example, the hardware has two blocks, so FIG. 5c is 2-colored. One block designation produced by the above algorithm is {a, c, d, f} -> block 0, {b, g, e} -> block 1. During this process, the register pressure of block 0 and block 1 can be estimated by simulation so that registers are distributed evenly across the blocks.
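To make the "simulated register pressure" concrete, the sketch below checks the block designation above against the conflict edges of FIG. 5c and computes the peak number of simultaneously live values per block. The half-open live ranges are an assumption for illustration (a value's register is treated as free at its last use, so that instruction's result may reuse it); the text does not specify how pressure is simulated.

```python
# Block conflict edges of FIG. 5c and the block designation given in the text.
CONFLICT_EDGES = [("b", "c"), ("b", "a"), ("g", "d")]
BLOCK_OF = {"a": 0, "c": 0, "d": 0, "f": 0, "b": 1, "g": 1, "e": 1}

# Assumed half-open live ranges [def, last_use) over instruction indices 0..6
# of the example program; e and f have no later use, so they occupy a
# register only at their defining instruction.
LIVE = {"b": (0, 5), "c": (1, 4), "d": (2, 6), "g": (3, 6),
        "a": (4, 5), "e": (5, 6), "f": (6, 7)}


def is_valid(edges, block_of):
    """Every block conflict edge must join registers in different blocks."""
    return all(block_of[x] != block_of[y] for x, y in edges)


def peak_pressure(live, block_of, num_blocks=2):
    """Peak number of simultaneously live values in each block."""
    end = max(stop for _, stop in live.values())
    peaks = [0] * num_blocks
    for p in range(end):
        counts = [0] * num_blocks
        for reg, (start, stop) in live.items():
            if start <= p < stop:
                counts[block_of[reg]] += 1
        peaks = [max(old, new) for old, new in zip(peaks, counts)]
    return peaks


print(is_valid(CONFLICT_EDGES, BLOCK_OF))  # True
print(peak_pressure(LIVE, BLOCK_OF))       # [2, 2]
```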

Step 130: during register allocation, allocate and assign each virtual register according to its grouping restriction, block designation, and lifetime conflict restriction.

In a specific embodiment, assume that the instructions in the target chip are:

b = *
c = *
d = *
g = *
a = vadd b, c
e = vsub b, a
f = vmul g, d

The intra-block grouping graph and the block conflict graph are shown in FIG. 5b and FIG. 5c respectively. Taking the 2-4 "block-grouping" hardware configuration of FIG. 4 as an example, one register assignment that satisfies both the register grouping restriction and the register block restriction is: a -> v0 (block 0, group 0), b -> v4 (block 1, group 0), c -> v8 (block 0, group 0), d -> v1 (block 0, group 1), e -> v12 (block 1, group 0), f -> v9 (block 0, group 1), g -> v5 (block 1, group 1).
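The combined grouping and block check for this assignment can be sketched as follows. The register numbering is an assumption inferred from the labels above: vN is taken to lie in group N mod 4 and block (N // 4) mod 2, which agrees with every (block, group) pair listed; the helper names are hypothetical.

```python
NUM_GROUPS, NUM_BLOCKS = 4, 2


def group_of(preg):  # assumed: vN belongs to group N mod 4
    return int(preg[1:]) % NUM_GROUPS


def block_of(preg):  # assumed: vN belongs to block (N // 4) mod 2
    return (int(preg[1:]) // NUM_GROUPS) % NUM_BLOCKS


GROUP_COMPONENTS = [{"a", "b", "c", "e"}, {"d", "f", "g"}]                    # FIG. 5b
BLOCK_DESIGNATION = {"a": 0, "c": 0, "d": 0, "f": 0, "b": 1, "g": 1, "e": 1}  # FIG. 5c
ALLOC = {"a": "v0", "b": "v4", "c": "v8", "d": "v1", "e": "v12", "f": "v9", "g": "v5"}


def check(alloc):
    """True if each subgraph stays in one group and each register is in its designated block."""
    same_group = all(len({group_of(alloc[v]) for v in comp}) == 1
                     for comp in GROUP_COMPONENTS)
    right_block = all(block_of(alloc[v]) == b for v, b in BLOCK_DESIGNATION.items())
    return same_group and right_block


print(check(ALLOC))  # True
```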

During register allocation, existing register lifetime analysis techniques can additionally be used to reuse registers and thereby reduce the number of registers used. For example, with a -> v0 (block 0, group 0), b -> v4 (block 1, group 0), c -> v0 (block 0, group 0), d -> v1 (block 0, group 1), e -> v4 (block 1, group 0), f -> v1 (block 0, group 1), g -> v5 (block 1, group 1), the lifetimes of virtual registers a and c, b and e, and d and f do not overlap, so each pair shares one physical register and register usage drops from 7 to 4.
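A small lifetime check illustrates why this reuse is legal. It uses assumed half-open live ranges [def, last_use) for the example program (instruction indices 0 to 6); two virtual registers may share a physical register only if their ranges do not overlap. All names are hypothetical.

```python
# Assumed half-open live ranges [def, last_use) for the example program.
LIVE = {"b": (0, 5), "c": (1, 4), "d": (2, 6), "g": (3, 6),
        "a": (4, 5), "e": (5, 6), "f": (6, 7)}
ALLOC = {"a": "v0", "b": "v4", "c": "v0", "d": "v1", "e": "v4", "f": "v1", "g": "v5"}


def overlaps(r1, r2):
    """Half-open intervals overlap if each starts before the other ends."""
    (s1, e1), (s2, e2) = r1, r2
    return s1 < e2 and s2 < e1


def lifetime_ok(alloc, live):
    """No two virtual registers sharing a physical register may be live at once."""
    regs = sorted(alloc)
    for i, x in enumerate(regs):
        for y in regs[i + 1:]:
            if alloc[x] == alloc[y] and overlaps(live[x], live[y]):
                return False
    return True


print(lifetime_ok(ALLOC, LIVE))   # True
print(len(set(ALLOC.values())))  # 4
```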

Compared with the prior art, this embodiment uses a well-ordered execution flow to solve, step by step, the problem of guaranteeing program functionality under the "block-grouping" register hardware design of a novel artificial intelligence chip, and differs from other technical solutions that rely on the compiler to avoid generating any code with register block conflicts. First, by constructing the intra-block grouping graph and the block conflict graph and allocating registers under both the intra-block grouping restriction and the block designation, the register allocation result is guaranteed to be correct under the "block-grouping" register hardware design. Second, by constructing the block conflict graph after instruction scheduling is completed, the register pressure in each block can be estimated by simulation, which prevents excessive pressure in any single register block and yields a reasonable register block designation. Finally, completing register allocation in three separate phases, instead of handling all constraints in a single phase, keeps the software design complexity manageable while ensuring correct allocation results, improves the efficiency of each register allocation component, and ultimately reduces hardware complexity through hardware/software co-design.

In the technical solution of the embodiments of the invention, before compiler instruction scheduling, an intra-block grouping graph is constructed according to the grouping restriction of each virtual register of the target chip in the instructions, and the virtual register grouping relationships are analyzed according to that graph so as to split the grouping of the virtual registers; after instruction scheduling is completed, a register block conflict graph is constructed according to the block restriction of each virtual register in the instructions, and block assignment is performed on the virtual registers according to that graph; and during register allocation, each virtual register is allocated and assigned according to its grouping restriction, block assignment, and lifetime conflict restriction. These technical means guarantee correct register allocation results under the register hardware "block-grouping" design, relieve register pressure, and reduce hardware design complexity.

Example 2

This embodiment further refines the foregoing embodiment; terms that are the same as or correspond to those of the foregoing embodiment have been explained there and are not repeated here. FIG. 6a is a flowchart of the register allocation method provided in the second embodiment. The technical solution of this embodiment may be combined with one or more methods of the foregoing embodiments. As shown in FIG. 6a, the method provided in this embodiment may further include:

Step 201: before compiler instruction scheduling, construct an intra-block grouping graph according to the grouping restriction of each virtual register of the target chip in the instructions.

Step 202: acquire a plurality of grouping sets and target registers in the intra-block grouping graph.

In this embodiment, the novel artificial intelligence chip adopts a 2-4 "block-grouping" register design. Although the chip contains as many as 1024 physical registers in total, a single block-group contains only 128 of them (1024 / (2 × 4) = 128). For large-scale computational code, the intra-block grouping graph easily forms subgraphs with too many nodes. When a subgraph contains more than 128 virtual register nodes whose lifetimes conflict with one another, the registers of a single block-group become over-pressured. At the same time, the physical registers of other block-groups may be underused, i.e., starved; this situation may also be called block-group load imbalance.

To solve this problem, this embodiment provides a method of value splitting on the intra-block grouping graph, which balances the load across block-groups and avoids excessive register pressure.

In a specific embodiment, assume that the instructions in the target chip are:

b = *
c = *
d = *
a = vadd b, c
e = vsub b, d
f = vmul c, d

The intra-block grouping graph constructed in step 201 is shown in FIG. 6b. All virtual registers {a, b, c, d, e, f} lie in the same subgraph and must therefore be mapped into the same group, which places heavy register pressure on that single group.

In this step, registers shared between groups are obtained from the intra-block grouping graph as target registers. Taking the intra-block grouping graph in Fig. 6b as an example, suppose group 1 contains registers a, b and c; group 2 contains registers e, d and b; and group 3 contains registers f, d and c. The target registers are then b (shared by groups 1 and 2), d (shared by groups 2 and 3) and c (shared by groups 1 and 3).
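Identifying the target registers amounts to finding registers that appear in more than one group. A minimal sketch, with the three groups of the example written out by hand (the group names and `shared_registers` are hypothetical):

```python
from collections import defaultdict
from typing import Dict, List, Set


def shared_registers(groups: Dict[str, Set[str]]) -> Dict[str, List[str]]:
    """Map each register that appears in more than one group to those groups."""
    owners: Dict[str, List[str]] = defaultdict(list)
    for name, regs in groups.items():
        for reg in regs:
            owners[reg].append(name)
    return {reg: sorted(names) for reg, names in owners.items() if len(names) > 1}


groups = {
    "group1": {"a", "b", "c"},
    "group2": {"e", "d", "b"},
    "group3": {"f", "d", "c"},
}
targets = shared_registers(groups)
# targets == {"b": ["group1", "group2"], "c": ["group1", "group3"], "d": ["group2", "group3"]}
```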

Step 203, inserting a copy instruction for the target register before all instructions that use the target register as a source operand, and modifying those instructions according to the copy instruction.

For example, if the target register is b, the corresponding copy instruction may be b' = b. Applying this to the target registers obtained in step 202, the modified instructions may be:

b = *

c = *

d = *

b' = b

a = vadd b', c

d' = d

e = vsub b, d'

c' = c

f = vmul c', d
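Step 203 can be sketched as an instruction rewrite. The placement rule here is an assumption: the sketch keeps the original register for its first reader and inserts a fresh copy before each later reader, so it puts the copies in different places than the example above (b' before `vsub` rather than `vadd`, and so on), while still leaving each target register read by only one grouped instruction. The IR tuple shape and `split_shared` are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Set, Tuple

Instr = Tuple[str, str, List[str]]  # hypothetical (destination, opcode, sources)


def split_shared(instrs: List[Instr], targets: Set[str]) -> List[Instr]:
    """Insert copies so each target register feeds only one instruction directly.

    Assumed policy: the first instruction that reads a target keeps the original;
    every later reader gets its own fresh copy inserted right before it.
    """
    seen: Set[str] = set()        # targets whose first reader was already emitted
    copies: Counter = Counter()   # copies created so far, per target
    out: List[Instr] = []
    for dest, op, srcs in instrs:
        renamed: Dict[str, str] = {}
        for src in dict.fromkeys(srcs):           # unique sources, in order
            if src in targets and src in seen:
                copies[src] += 1
                name = src + "'" * copies[src]     # b', b'', ...
                out.append((name, "copy", [src]))  # e.g. b' = b
                renamed[src] = name
        seen.update(s for s in srcs if s in targets)
        out.append((dest, op, [renamed.get(s, s) for s in srcs]))
    return out


program = [
    ("b", "*", []), ("c", "*", []), ("d", "*", []),
    ("a", "vadd", ["b", "c"]),
    ("e", "vsub", ["b", "d"]),
    ("f", "vmul", ["c", "d"]),
]
rewritten = split_shared(program, {"b", "c", "d"})
```

On the example this inserts b' = b before `vsub`, and c' = c and d' = d before `vmul`. Like the hand-written version, it uses three copies and yields three independent groups, provided copy instructions are exempt from the grouping limit.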

Step 204, updating the intra-block grouping graph according to the modified instructions to obtain a new intra-block grouping graph.

In this example, the new intra-block grouping graph obtained from the modified instructions is shown in Fig. 6c.
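Step 204 rebuilds the grouping graph from the rewritten instructions. The sketch below assumes that copy instructions are exempt from the grouping limit (otherwise b' = b would reconnect b' and b and undo the split); this section does not state that rule explicitly. Run on the modified instructions above, the single six-register subgraph becomes three subgraphs {a, b', c}, {e, b, d'} and {f, c', d}. `rebuild_grouping_graph` and its `exempt_ops` parameter are hypothetical.

```python
from typing import AbstractSet, Dict, List, Set, Tuple

Instr = Tuple[str, str, List[str]]  # hypothetical (destination, opcode, sources)


def rebuild_grouping_graph(instrs: List[Instr],
                           exempt_ops: AbstractSet[str] = frozenset({"copy"})
                           ) -> List[Set[str]]:
    """Connected subgraphs of the grouping graph, ignoring exempt opcodes."""
    parent: Dict[str, str] = {}

    def find(x: str) -> str:
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for dest, op, srcs in instrs:
        find(dest)
        if op in exempt_ops:
            continue  # assumed: a copy may move a value across groups
        for src in srcs:
            parent[find(dest)] = find(src)

    subgraphs: Dict[str, Set[str]] = {}
    for reg in list(parent):
        subgraphs.setdefault(find(reg), set()).add(reg)
    return list(subgraphs.values())


# The modified instructions from the example above.
modified = [
    ("b", "*", []), ("c", "*", []), ("d", "*", []),
    ("b'", "copy", ["b"]), ("a", "vadd", ["b'", "c"]),
    ("d'", "copy", ["d"]), ("e", "vsub", ["b", "d'"]),
    ("c'", "copy", ["c"]), ("f", "vmul", ["c'", "d"]),
]
new_graph = rebuild_grouping_graph(modified)
```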

Step 205, after instruction scheduling is completed, constructing a register block conflict graph according to the block limit of each virtual register in the instructions, and assigning blocks to the virtual registers according to the register block conflict graph.
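The block assignment of step 205 is performed with a greedy block-coloring algorithm, according to the register block designating unit of the third embodiment. The sketch below takes the register block conflict graph as given input, because this section does not spell out which block limits produce its edges. It also assumes two blocks (one reading of the 2-4 design), a highest-degree-first visiting order, and a least-loaded tie-break to keep the blocks balanced. All names are hypothetical.

```python
from typing import Dict, List, Set, Tuple


def assign_blocks(conflicts: Dict[str, Set[str]],
                  num_blocks: int = 2) -> Tuple[Dict[str, int], List[str]]:
    """Greedy block coloring of a register block conflict graph.

    conflicts[v] holds the registers that must not share v's block; the
    relation is assumed symmetric. Returns (assignment, uncolorable registers).
    """
    block: Dict[str, int] = {}
    load = [0] * num_blocks
    failed: List[str] = []
    # Most-constrained registers first (highest degree), ties broken by name.
    for vreg in sorted(conflicts, key=lambda r: (-len(conflicts[r]), r)):
        taken = {block[u] for u in conflicts[vreg] if u in block}
        free = [b for b in range(num_blocks) if b not in taken]
        if not free:
            failed.append(vreg)  # would need splitting or spilling, not modeled here
            continue
        chosen = min(free, key=lambda b: (load[b], b))  # least-loaded free block
        block[vreg] = chosen
        load[chosen] += 1
    return block, failed
```

A path x-y-z plus an isolated w colors cleanly with two blocks; a three-register triangle cannot, and the sketch reports the last register it visits as uncolorable.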

Step 206, during register allocation, allocating and assigning each virtual register according to its grouping limit, its block assignment and its lifetime conflict limit.
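Step 206 can be pictured as an interval allocation inside each block-group. In this hypothetical sketch, the (block, group) of each virtual register is assumed to be already fixed by the grouping graph and step 205, lifetimes are half-open intervals, and registers are visited in order of their start points, so a physical register is reused once its previous occupant's lifetime has ended. Spilling is not modeled: a register that finds no free slot among the 128 in its block-group is reported as `None`.

```python
from typing import Dict, Optional, Tuple

Interval = Tuple[int, int]      # assumed half-open live range [start, end)
Placement = Tuple[int, int]     # (block, group) fixed by earlier steps
PhysReg = Tuple[int, int, int]  # (block, group, index inside the block-group)


def allocate(placement: Dict[str, Placement],
             lifetimes: Dict[str, Interval],
             capacity: int = 128) -> Dict[str, Optional[PhysReg]]:
    """Give each virtual register a physical register in its own block-group."""
    busy_until: Dict[PhysReg, int] = {}  # end of the current occupant's lifetime
    result: Dict[str, Optional[PhysReg]] = {}
    for vreg in sorted(placement, key=lambda r: (lifetimes[r][0], r)):
        start, end = lifetimes[vreg]
        block, group = placement[vreg]
        chosen: Optional[PhysReg] = None
        for index in range(capacity):
            reg = (block, group, index)
            # Free if never used, or if its last occupant died at or before `start`.
            if busy_until.get(reg, start) <= start:
                chosen = reg
                break
        if chosen is not None:
            busy_until[chosen] = end
        result[vreg] = chosen  # None: would need spilling, not modeled here
    return result
```

Here c reuses a's physical register because a's lifetime ends exactly where c's begins, while b, which overlaps a, gets the next index in the same block-group.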

In this embodiment, value splitting is performed on the intra-block grouping graph. On the one hand, this keeps the register load balanced across block-groups and relieves the pressure on any single block-group; on the other hand, the updated intra-block grouping graph fits into the existing register allocation flow, which ensures the accuracy of the register allocation result.

According to the technical solution of this embodiment, before compiler instruction scheduling, an intra-block grouping graph is constructed according to the grouping limit of each virtual register of the target chip in the instructions; a plurality of groups and the target registers are obtained from the graph; a copy instruction for each target register is inserted before all instructions that use it as a source operand, and the instructions are modified accordingly; and the intra-block grouping graph is updated from the modified instructions to obtain a new intra-block grouping graph. After instruction scheduling is completed, a register block conflict graph is constructed according to the block limit of each virtual register in the instructions, and blocks are assigned to the virtual registers according to this graph. During register allocation, each virtual register is allocated according to its grouping limit, block assignment and lifetime conflict limit. This ensures the accuracy of the register allocation result, relieves the register pressure on any single block-group, and reduces the complexity of the hardware design.

Example III

Fig. 7 is a schematic structural diagram of a register allocation apparatus according to the third embodiment of the present invention. As shown in Fig. 7, the apparatus includes: a grouping graph construction module 310, a conflict graph construction module 320 and an allocation module 330.

The grouping graph construction module 310 is configured to, before compiler instruction scheduling, construct an intra-block grouping graph according to the grouping limit of each virtual register of the target chip in the instructions, and to analyze the grouping relationships of the virtual registers according to the intra-block grouping graph so as to split the groupings of the virtual registers;

the conflict graph construction module 320 is configured to construct a register block conflict graph according to the block restriction of each virtual register in the instruction after the instruction scheduling is completed, and perform block assignment on the virtual registers according to the register block conflict graph;

the allocation module 330 is configured to, during register allocation, allocate and assign each virtual register according to its grouping limit, block assignment and lifetime conflict limit.

In the technical solution provided by this embodiment, before compiler instruction scheduling, an intra-block grouping graph is constructed according to the grouping limit of each virtual register of the target chip in the instructions, and the grouping relationships of the virtual registers are analyzed from this graph so as to split their groupings. After instruction scheduling is completed, a register block conflict graph is constructed according to the block limit of each virtual register in the instructions, and blocks are assigned to the virtual registers according to this graph. During register allocation, each virtual register is allocated and assigned according to its grouping limit, block assignment and lifetime conflict limit. Under the block-group hardware register design, this ensures the accuracy of the register allocation result, relieves register pressure and reduces the complexity of the hardware design.

On the basis of the above embodiment, the grouping graph construction module 310 includes:

a register connection unit, configured to obtain the virtual register operands read by each instruction and to connect all virtual registers of each instruction, so as to obtain the intra-block grouping graph over the virtual registers;

a target register obtaining unit, configured to obtain a plurality of groups and the target registers from the intra-block grouping graph;

a copy instruction inserting unit, configured to insert a copy instruction for the target register before all instructions that use the target register as a source operand, and to modify those instructions according to the copy instruction;

and a grouping graph updating unit, configured to update the intra-block grouping graph according to the modified instructions to obtain a new intra-block grouping graph.

The conflict graph construction module 320 includes:

a conflict register obtaining unit, configured to obtain a plurality of conflict registers having a connection relationship in the block conflict graph;

and a register block designating unit, configured to assign blocks to the virtual registers using a greedy block-coloring algorithm.

The allocation module 330 includes:

an allocation unit, configured to allocate and assign each virtual register according to the grouping limit derived from the new intra-block grouping graph, the block assignment and the lifetime conflict limit of the register.

The device can execute the method provided by all the embodiments of the invention, and has the corresponding functional modules and beneficial effects of executing the method. Technical details not described in detail in the embodiments of the present invention can be found in the methods provided in all the foregoing embodiments of the present invention.

Example IV

Fig. 8 shows a schematic diagram of the structure of an electronic device 10 that may be used to implement an embodiment of the invention. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. Electronic equipment may also represent various forms of mobile devices, such as personal digital processing, cellular telephones, smartphones, wearable devices (e.g., helmets, glasses, watches, etc.), and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be exemplary only, and are not meant to limit implementations of the inventions described and/or claimed herein.

As shown in Fig. 8, the electronic device 10 includes at least one processor 11 and a memory communicatively connected to the at least one processor 11, such as a Read Only Memory (ROM) 12 and a Random Access Memory (RAM) 13. The memory stores a computer program executable by the at least one processor, and the processor 11 may perform various appropriate actions and processes according to the computer program stored in the ROM 12 or the computer program loaded from the storage unit 18 into the RAM 13. The RAM 13 may also store various programs and data required for the operation of the electronic device 10. The processor 11, the ROM 12 and the RAM 13 are connected to each other via a bus 14. An input/output (I/O) interface 15 is also connected to the bus 14.

Various components in the electronic device 10 are connected to the I/O interface 15, including: an input unit 16 such as a keyboard, a mouse, etc.; an output unit 17 such as various types of displays, speakers, and the like; a storage unit 18 such as a magnetic disk, an optical disk, or the like; and a communication unit 19 such as a network card, modem, wireless communication transceiver, etc. The communication unit 19 allows the electronic device 10 to exchange information/data with other devices via a computer network, such as the internet, and/or various telecommunication networks.

The processor 11 may be a variety of general and/or special purpose processing components having processing and computing capabilities. Some examples of processor 11 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various specialized Artificial Intelligence (AI) computing chips, various processors running machine learning model algorithms, Digital Signal Processors (DSPs), and any suitable processor, controller, microcontroller, etc. The processor 11 performs the respective methods and processes described above, such as a register allocation method.

In some embodiments, the register allocation method may be implemented as a computer program tangibly embodied on a computer-readable storage medium, such as storage unit 18. In some embodiments, part or all of the computer program may be loaded and/or installed onto the electronic device 10 via the ROM 12 and/or the communication unit 19. When the computer program is loaded into the RAM 13 and executed by the processor 11, one or more steps of the register allocation method described above may be performed. Alternatively, in other embodiments, the processor 11 may be configured to perform the register allocation method in any other suitable manner (e.g., by means of firmware).

Various implementations of the systems and techniques described herein may be implemented in digital electronic circuitry, integrated circuit systems, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), Systems On Chip (SOCs), Complex Programmable Logic Devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various implementations may include implementation in one or more computer programs that may be executed and/or interpreted on a programmable system including at least one programmable processor, which may be a special purpose or general-purpose programmable processor and which may receive data and instructions from, and transmit data and instructions to, a storage system, at least one input device, and at least one output device.

A computer program for carrying out methods of the present invention may be written in any combination of one or more programming languages. These computer programs may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus, such that the computer programs, when executed by the processor, cause the functions/acts specified in the flowchart and/or block diagram block or blocks to be implemented. The computer program may execute entirely on the machine, partly on the machine, as a stand-alone software package, partly on the machine and partly on a remote machine or entirely on the remote machine or server.

In the context of the present invention, a computer-readable storage medium may be a tangible medium that can contain, or store a computer program for use by or in connection with an instruction execution system, apparatus, or device. The computer readable storage medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. Alternatively, the computer readable storage medium may be a machine readable signal medium. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.

To provide for interaction with a user, the systems and techniques described here can be implemented on an electronic device having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) through which a user can provide input to the electronic device. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user may be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic input, speech input, or tactile input.

The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: Local Area Networks (LANs), Wide Area Networks (WANs), blockchain networks, and the Internet.

The computing system may include clients and servers. The client and server are typically remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server can be a cloud server, also called a cloud computing server or a cloud host, and is a host product in a cloud computing service system, so that the defects of high management difficulty and weak service expansibility in the traditional physical hosts and VPS service are overcome.

It should be understood that steps may be reordered, added, or deleted using the various forms of flows shown above. For example, the steps described in the present invention may be performed in parallel, sequentially, or in a different order, as long as the desired result of the technical solution of the present invention can be achieved; no limitation is imposed herein.

The above embodiments do not limit the scope of the present invention. It will be apparent to those skilled in the art that various modifications, combinations, sub-combinations and alternatives are possible, depending on design requirements and other factors. Any modifications, equivalent substitutions and improvements made within the spirit and principles of the present invention should be included in the scope of the present invention.

Claims

1. A method of register allocation, the method comprising:

before compiler instruction scheduling, constructing an intra-block grouping graph according to the grouping limit of each virtual register of the target chip in the instructions, and analyzing the grouping relationships of the virtual registers according to the intra-block grouping graph so as to split the groupings of the virtual registers;

2. The method of claim 1, wherein constructing the intra-block grouping graph according to the grouping limit of each virtual register of the target chip in the instructions comprises:

3. The method of claim 1, wherein analyzing the grouping relationships of the virtual registers according to the intra-block grouping graph to split the groupings of the virtual registers comprises:

4. The method according to claim 3, wherein assigning blocks to the virtual registers according to the register block conflict graph comprises:

5. The method of claim 4, wherein allocating and assigning the virtual registers according to the grouping limit, the block assignment and the lifetime conflict limit of the registers comprises:

allocating and assigning each virtual register according to the grouping limit derived from the new intra-block grouping graph, the block assignment and the lifetime conflict limit of the register.

6. A register allocation apparatus, the apparatus comprising:

a grouping graph construction module, configured to, before compiler instruction scheduling, construct an intra-block grouping graph according to the grouping limit of each virtual register of the target chip in the instructions, and to analyze the grouping relationships of the virtual registers according to the intra-block grouping graph so as to split the groupings of the virtual registers;

7. An electronic device, the electronic device comprising:

one or more processors;

a storage means for storing one or more programs;

wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the register allocation method of any one of claims 1-5.

8. A computer readable storage medium having stored thereon a computer program, which when executed by a processor implements a register allocation method as claimed in any one of claims 1 to 5.