Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention more apparent, the embodiments of the present invention are further described in detail below with reference to the accompanying drawings. The exemplary embodiments and descriptions of the present invention are provided to explain the present invention, but not to limit the present invention.
In the description of the present specification, the terms "comprising," "including," "having," "containing," and the like are used in an open-ended fashion, i.e., to mean including, but not limited to. Reference to the description of the terms "one embodiment," "a particular embodiment," "some embodiments," "for example," etc., means that a particular feature, structure, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the application. In this specification, the schematic representations of the terms used above do not necessarily refer to the same embodiment or example. Furthermore, the particular features, structures, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. The sequence of steps involved in the embodiments is for illustrative purposes to illustrate the implementation of the present application, and the sequence of steps is not limited and can be adjusted as needed.
The invention discloses a reconfigurable processor configuration loading scheme supporting prefetching and dynamic switching, which adopts a configuration storage scheme of distributed storage to ensure the integrity of configuration and the parallelism of instruction fetching-execution and reduce the resource consumption and delay of the whole architecture.
Fig. 1 is a schematic diagram of a configuration loading system of a reconfigurable processor according to an embodiment of the present invention. As shown in Fig. 1, the system includes:
a configuration controller 11 and a processing element array PEA 12, wherein the PEA 12 includes a PEA controller 121 and a plurality of processing elements PE 122;
the configuration controller 11 is configured to: obtain the length of the configuration data and the plurality of configuration addresses required by a configuration task of the PEA; fetch a plurality of configuration packets according to the configuration addresses and send them to the PEA controller 121 until the number of configuration packets fetched so far equals the length of the configuration data; and determine whether a PEA_CP_Finish signal sent by the PEA controller 121 has been received, and if so, prefetch the configuration data of the next configuration task by repeating the above steps; the configuration data comprises a plurality of configuration packets, each configuration packet corresponds to one configuration address, and the length of the configuration data is the number of configuration packets it contains;
the PEA controller 121 is configured to parse the top-level configuration information from each configuration packet, determine, based on that information, the PE corresponding to each packet, and send the packet to the corresponding PE 122; and, after receiving the PE_CP_Finish signals sent by all PEs 122 involved in the current configuration task, to send a PEA_CP_Finish signal to the configuration controller;
each PE 122 is configured to send a PE_CP_Finish signal to the PEA controller 121 after executing each configuration packet.
Therefore, in the embodiment of the present invention, the configuration controller determines whether a PEA_CP_Finish signal sent by the PEA controller has been received and, if so, prefetches the configuration data of the next configuration task. This realizes prefetching and dynamic switching of the configuration data, reducing the configuration time of the PEA, i.e., achieving lower latency.
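The fetch-and-prefetch behavior described above can be sketched as a small software model. This is an illustrative sketch only; the function and field names (`load_task`, `fetch_packet`, `send_to_pea`, and so on) are assumptions for this sketch and are not part of the invention:

```python
# Simplified software model of the configuration controller: fetch one
# configuration packet per configuration address until the number of
# fetched packets equals the configuration-data length (CP_Len), forward
# each packet to the PEA controller, then wait for PEA_CP_Finish before
# prefetching the next task. All names here are illustrative.

def load_task(task, fetch_packet, send_to_pea):
    """Fetch all configuration packets of one task and forward them.

    task: dict with 'length' (CP_Len, the number of packets) and
    'addresses' (one configuration address CP_Addr per packet).
    """
    fetched = 0
    while fetched < task["length"]:            # stop when count == CP_Len
        packet = fetch_packet(task["addresses"][fetched])
        send_to_pea(packet)                    # forward CP_Data to the PEA controller
        fetched += 1


def controller_loop(tasks, fetch_packet, send_to_pea, wait_pea_cp_finish):
    """Load each task, then prefetch the next one as soon as the
    PEA_CP_Finish signal for the current task arrives."""
    for task in tasks:
        load_task(task, fetch_packet, send_to_pea)
        wait_pea_cp_finish()  # PEA controller reports PEA_CP_Finish
        # the next iteration then prefetches the next task's configuration data
```

The key property of the loop is that the controller never counts packets itself beyond comparing against the declared length, matching the "until the number of fetched packets equals the length" condition above.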
In a specific implementation, Fig. 2 is a schematic diagram of the configuration loading system of the reconfigurable processor performing configuration loading in an embodiment of the present invention, and Fig. 3 is a flowchart of prefetching a configuration packet in an embodiment of the present invention. In an embodiment, the configuration controller (Context Controller) is specifically configured to: communicate with the master controller (typically a RISC-V processor core) through a Coprocessor Interface, obtain the length CP_Len of the configuration data and the plurality of configuration addresses CP_Addr required by a configuration task of the PEA, and return an acknowledgement signal CP_Ld_Ok to the Coprocessor Interface.
In an embodiment, the configuration controller is specifically configured to: sequentially send the configuration address CP_Addr corresponding to each configuration packet CP_Data to the level-1 configuration memory (L1 Cache) through an AHB-Master bus and an AHB-64 bus, and obtain each configuration packet CP_Data. The configuration controller then sends the configuration packets CP_Data to the PEA controller, and the PEA is configured dynamically with the configuration packet of a PE as the unit.
It should be noted that, when the configuration controller accesses the L1 Cache, it sends one configuration address at a time to obtain the corresponding configuration packet, and keeps sending configuration addresses until the number of configuration packets obtained so far equals the length of the configuration data.
After receiving each configuration packet, the PEA controller parses the top-level configuration information, specifically the PE_Index, from the packet, determines the PE corresponding to the packet based on this PE_Index, and sends the packet to that PE. Note that, for a given configuration task, not every PE necessarily receives a configuration packet: if a PEA includes 64 PEs, for example, only some of them may receive packets, and among those, some may receive one configuration packet while others receive several.
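The PE_Index-based routing can be sketched as follows. The field names in this model (`pe_index`, the inbox dictionary) are assumptions made for illustration; the patent only specifies that the top-level configuration information identifies the destination PE:

```python
# Illustrative model of the PEA controller's dispatch step: the
# top-level configuration information of each packet carries a
# PE_Index, which selects the destination PE. Field names are
# hypothetical.

def dispatch(packets, num_pes=64):
    """Route each configuration packet to the inbox of the PE named by
    its PE_Index; returns a dict mapping PE index -> received packets."""
    inboxes = {i: [] for i in range(num_pes)}
    for cp in packets:
        inboxes[cp["pe_index"]].append(cp)  # send to the corresponding PE
    return inboxes
```

With three packets addressed to PE 3, PE 3, and PE 7, only those two PEs receive anything: PE 3 receives two packets and PE 7 one, while the remaining 62 PEs receive none, matching the observation above.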
In the flow shown in Fig. 3, in contrast with the static acquisition mode of FPGA configuration (where each configuration requires a power-on reset and re-download), here, after the configuration data (multiple configuration packets) required by one configuration task has been obtained, the processing element array PEA (typically 64 PEs per PEA) can prefetch the configuration data required by the next configuration task. Dynamic switching is thus realized, and the configuration information does not need to be re-downloaded after a power-on reset. Each PE in the PEA can also obtain its next configuration packet while executing the current one, that is, while performing computation, thereby improving configuration efficiency.
Fig. 4 is a flowchart illustrating a PE executing a configuration packet according to an embodiment of the present invention. In an embodiment, the PE includes a PE controller and a configuration memory. The PE controller is configured to send each configuration packet to the configuration memory after receiving it; to execute the configuration packets in sequence; and, after executing each configuration packet, to send a PE_CP_Finish signal to the PEA controller. The configuration memory is used to store the configuration packets; it is a dual-port configuration memory with two ports, one of which supports read operations while the other supports write operations.
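A minimal software model of the dual-port configuration memory is sketched below. The class and method names are illustrative assumptions; the essential point, taken from the text, is that reads and writes use separate ports, so the next packet can be written while the current one is read out for execution:

```python
# Minimal sketch of the dual-port configuration memory: one port only
# reads and the other only writes, so the PE controller can load the
# next configuration packet through the write port while the current
# packet is read out for execution. Names are hypothetical.

class DualPortConfigMemory:
    def __init__(self, depth=16):
        self.cells = [None] * depth  # e.g. 16 configuration-info slots

    def write(self, addr, word):     # write port: used when loading a packet
        self.cells[addr] = word

    def read(self, addr):            # read port: used when executing a packet
        return self.cells[addr]
```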
After receiving the PE_CP_Finish signals sent by all PEs 122 involved in the current configuration task, the PEA controller sends a PEA_CP_Finish signal to the configuration controller.
In one embodiment, the PE controller is further configured to: after executing all configuration packets in the PE, send a PE_Task_Finish signal to the PEA controller;
the PEA controller is further configured to: after receiving the PE_Task_Finish signals sent by the PE controllers of all PEs involved in the current configuration task, send a PEA_Task_Finish signal to the configuration controller;
the configuration controller is further configured to: after receiving the PEA_Task_Finish signal, stop prefetching the configuration data of the next configuration task.
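The task-finish signal chain amounts to an all-of aggregation, which can be sketched as follows (the helper name and set-based representation are assumptions for this illustration):

```python
# Illustrative sketch of the finish-signal chain: the PEA controller
# raises PEA_Task_Finish only after every PE involved in the current
# task has reported PE_Task_Finish; on receiving PEA_Task_Finish, the
# configuration controller stops prefetching. Names are hypothetical.

def pea_task_finished(involved_pes, finished_pes):
    """True (i.e., emit PEA_Task_Finish) once every involved PE has
    sent its PE_Task_Finish signal."""
    return set(involved_pes) <= set(finished_pes)
```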
In an embodiment, the PE controller is specifically configured to: when executing the configuration packets in sequence, read them from the configuration memory in a ping-pong reading mode.
In one embodiment, the number of addresses in each configuration packet is no greater than half of the number of pieces of configuration information that a configuration packet can hold.
In the above embodiment, the start address of each configuration packet is either Addr0 or Addr8, with Addr0 as the default. Taking a configuration packet size of 16 × 64 as an example (i.e., a configuration packet stores at most 16 pieces of configuration information), the requirement that the number of addresses in each configuration packet be no greater than half of the number of pieces of configuration information means that each packet may contain at most 8 addresses. Only when this condition is satisfied can the configuration data required by each configuration task be prefetched.
Taking a configuration packet that stores at most 16 pieces of configuration information as an example, Fig. 5 shows the relationship between reading and execution of configuration packets without prefetching, and Fig. 6 shows the same relationship with prefetching, in an embodiment of the present invention. In Fig. 5, when the number of addresses in a configuration packet is greater than 8 and at most 16, prefetching and dynamic switching of configuration packets are impossible: each configuration packet must first be read and then executed, one after another. In Fig. 6, when the number of addresses in a configuration packet is at most 8, prefetching and dynamic switching are possible: the next packet is prefetched while the current one executes, hiding the read time of the configuration packet.
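The feasibility condition just described reduces to a simple half-capacity check, sketched here under the 16-entry example above (the helper name is an assumption for illustration):

```python
# A packet can be prefetched only if its addresses occupy at most half
# of the configuration memory (<= 8 of 16 slots in the example), so
# that the other half is free to receive the next packet. Illustrative
# helper; the name is hypothetical.

def can_prefetch(num_addresses, slots=16):
    """True if a packet with this many addresses leaves half of the
    configuration memory free for the next packet."""
    return num_addresses <= slots // 2
```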
In addition, when the PE controller executes the configuration packets in sequence, it reads them from the configuration memory in a ping-pong reading mode. Taking a configuration packet that stores at most 16 pieces of configuration information as an example, Fig. 7 is a schematic diagram of reading configuration packets from the configuration memory in ping-pong mode in an embodiment of the present invention. As shown in Fig. 7: (1) when prefetching is not supported, the addresses in a configuration packet start from 0 and never from Addr8, the start address of the upper half of the configuration memory; the 16 pieces of configuration information include the top-level configuration information, and each new configuration packet CP is by default read starting from address 0 of the configuration memory CM inside the PE. (2) When prefetching is supported, the lower 8 addresses and the upper 8 addresses of the configuration memory CM are used in a ping-pong arrangement: a single processing element PE holds at most 2 different configuration packets (CPs), and a single configuration packet CP contains at most 8 pieces of configuration information, including the top-level configuration information. The ping-pong reading works as follows: if the first configuration packet CP1 occupies the lower 8 addresses Addr0-7 of the configuration memory, then while CP1 is being executed, the next configuration packet CP2 is read into addresses Addr8-15; if CP2 occupies Addr8-15, then while CP2 is being executed, the next configuration packet CP3 is read into Addr0-7; and so on.
By hiding the configuration time in this way, the ping-pong reading mode significantly improves the configuration speed of the whole PEA array.
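The alternation between the two halves of the configuration memory can be sketched as follows (an illustrative model; the helper name and constants are assumptions):

```python
# Sketch of the ping-pong address pattern: successive configuration
# packets alternate between the lower half (Addr0-7) and the upper half
# (Addr8-15) of the configuration memory, so the next packet is written
# into one half while the current packet executes from the other.
# Names are hypothetical.

LOW_BASE, HIGH_BASE = 0, 8  # start addresses of the two halves

def ping_pong_bases(num_packets):
    """Base address of each successive packet: CP1 at Addr0, CP2 at
    Addr8, CP3 at Addr0 again, and so on."""
    return [LOW_BASE if i % 2 == 0 else HIGH_BASE for i in range(num_packets)]
```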
In summary, the configuration loading system of the reconfigurable processor according to the embodiment of the present invention includes: a configuration controller and a processing element array PEA, wherein the PEA comprises a PEA controller and a plurality of processing elements PE; the configuration controller is used for acquiring the length of the configuration data and the plurality of configuration addresses required by a configuration task of the PEA; fetching a plurality of configuration packets according to the configuration addresses and sending them to the PEA controller until the number of configuration packets fetched so far equals the length of the configuration data; and determining whether a PEA_CP_Finish signal sent by the PEA controller has been received, and if so, prefetching the configuration data of the next configuration task by repeating the above steps; the configuration data comprises a plurality of configuration packets, each configuration packet corresponds to one configuration address, and the length of the configuration data is the number of configuration packets it contains; the PEA controller is used for parsing the top-level configuration information from each configuration packet, determining the PE corresponding to each packet based on that information, and sending the packet to the corresponding PE; and, after receiving the PE_CP_Finish signals sent by all PEs involved in the current configuration task, sending a PEA_CP_Finish signal to the configuration controller; and each PE is used for sending a PE_CP_Finish signal to the PEA controller after executing each configuration packet.
In this system, the configuration controller determines whether a PEA_CP_Finish signal sent by the PEA controller has been received and, if so, prefetches the configuration data of the next configuration task. This realizes prefetching and dynamic switching of the configuration data, reduces the configuration time of the PEA, i.e., achieves lower latency, and efficiently implements a configuration-execution pipeline, hiding the instruction-fetch stage, which here is the stage of reading the configuration packet. With configuration prefetching in place, the distributed configuration-memory storage structure of the PEs effectively hides the configuration time and significantly improves the configuration speed of the whole PEA.
Based on the same inventive concept, the embodiment of the present invention further provides a configuration loading method for a reconfigurable processor, as described in the following embodiments. Because the principle by which the method solves the problem is similar to that of the configuration loading system of the reconfigurable processor, the implementation of the method may refer to the implementation of the system; repeated details are omitted.
Fig. 8 is a flowchart of a configuration loading method for a reconfigurable processor according to an embodiment of the present invention. As shown in Fig. 8, the method includes:
Step 801: acquiring the length of the configuration data and the plurality of configuration addresses required by a configuration task of a PEA; the configuration data comprises a plurality of configuration packets, each configuration packet corresponds to one configuration address, and the length of the configuration data is the number of configuration packets it contains;
Step 802: obtaining a plurality of configuration packets according to the configuration addresses and sending them to the PEA controller of the processing element array PEA until the number of configuration packets obtained so far equals the length of the configuration data; the PEA controller parses the top-level configuration information from each configuration packet, determines the processing element PE corresponding to each packet based on that information, sends the packet to the corresponding PE, and, after receiving the PE_CP_Finish signals sent by all PEs involved in the current configuration task, sends a PEA_CP_Finish signal to the configuration controller;
Step 803: determining whether the PEA_CP_Finish signal sent by the PEA controller has been received; if so, prefetching the configuration data of the next configuration task by repeating the above steps.
In an embodiment, obtaining the length of the configuration data and the plurality of configuration addresses required by the configuration task of the PEA includes:
communicating with the master controller (typically a RISC-V processor core) through the coprocessor interface, acquiring the length of the configuration data and the plurality of configuration addresses required by the configuration task of the PEA, and returning an acknowledgement signal to the coprocessor interface.
In one embodiment, obtaining a plurality of configuration packets according to a plurality of configuration addresses of configuration data includes:
sequentially sending the configuration address corresponding to each configuration packet to the level-1 configuration memory (L1 Cache) through an AHB-Master bus and an AHB-64 bus, and obtaining each configuration packet.
In one embodiment, the PE includes a PE controller and a configuration memory, wherein,
the PE controller is used for sending each configuration packet to the configuration memory after receiving it; executing the configuration packets in sequence; and sending a PE_CP_Finish signal to the PEA controller after executing each configuration packet;
the configuration memory is used for storing the configuration packets.
In one embodiment, the PE controller is further configured to: after executing all configuration packets in the PE, send a PE_Task_Finish signal to the PEA controller;
the PEA controller is further configured to: after receiving the PE_Task_Finish signals sent by the PE controllers of all PEs involved in the current configuration task, send a PEA_Task_Finish signal to the configuration controller;
the method further comprises: after receiving the PEA_Task_Finish signal, stopping prefetching the configuration data of the next configuration task.
In an embodiment, the PE controller is specifically configured to:
when executing the configuration packets in sequence, reading them from the configuration memory in a ping-pong reading mode.
In one embodiment, the number of addresses in each configuration packet is no greater than half of the number of pieces of configuration information in each configuration packet.
In summary, in the method provided by the embodiment of the present invention, the length of the configuration data and the plurality of configuration addresses required by a configuration task of the PEA are acquired; the configuration data comprises a plurality of configuration packets, each configuration packet corresponds to one configuration address, and the length of the configuration data is the number of configuration packets it contains; a plurality of configuration packets are obtained according to the configuration addresses and sent to the PEA controller of the processing element array PEA until the number of configuration packets obtained so far equals the length of the configuration data; the PEA controller parses the top-level configuration information from each configuration packet, determines the processing element PE corresponding to each packet based on that information, sends the packet to the corresponding PE, and, after receiving the PE_CP_Finish signals sent by all PEs involved in the current configuration task, sends a PEA_CP_Finish signal to the configuration controller; it is then determined whether the PEA_CP_Finish signal sent by the PEA controller has been received, and if so, the configuration data of the next configuration task is prefetched by repeating the above steps. In this way, prefetching and dynamic switching of the configuration data are realized, the configuration time of the PEA is reduced, i.e., lower latency is achieved, the "configuration-execution" pipeline is efficiently implemented, and the instruction-fetch stage is hidden.
Under the condition that configuration prefetching exists, the configuration storage structure of the distributed configuration memory of the PE can effectively hide configuration time, and the configuration speed of the whole PEA can be obviously improved.
An embodiment of the present application further provides a computer device. Fig. 9 is a schematic diagram of the computer device in an embodiment of the present invention. The computer device can implement all steps of the configuration loading method of the reconfigurable processor in the foregoing embodiments and specifically includes the following:
a processor (processor)901, a memory (memory)902, a communication Interface (Communications Interface)903, and a communication bus 904;
the processor 901, the memory 902 and the communication interface 903 complete mutual communication through the communication bus 904; the communication interface 903 is used for realizing information transmission among related devices such as server-side devices, detection devices, user-side devices and the like;
the processor 901 is configured to call the computer program in the memory 902, and when the processor executes the computer program, the processor implements all the steps in the configuration loading method of the reconfigurable processor in the above embodiments.
An embodiment of the present application further provides a computer-readable storage medium, which can implement all the steps in the configuration loading method for the reconfigurable processor in the above embodiment, where the computer-readable storage medium stores a computer program, and when the computer program is executed by the processor, the computer program implements all the steps of the configuration loading method for the reconfigurable processor in the above embodiment.
As will be appreciated by one skilled in the art, embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
The above-mentioned embodiments are intended to illustrate the objects, technical solutions and advantages of the present invention in further detail, and it should be understood that the above-mentioned embodiments are only exemplary embodiments of the present invention, and are not intended to limit the scope of the present invention, and any modifications, equivalent substitutions, improvements and the like made within the spirit and principle of the present invention should be included in the scope of the present invention.