CN113313171A

CN113313171A - Method, device, equipment and storage medium for realizing image recognition based on FPGA (field-programmable gate array)

Info

Publication number: CN113313171A
Application number: CN202110594603.9A
Authority: CN
Inventors: 徐子扬; 王昱; 龚辞; 陈俊康; 卢意; 章馨予; 贺泷玉; 黄子浩; 周建
Original assignee: Wuhan University of Technology WUT
Current assignee: Wuhan University of Technology WUT
Priority date: 2021-05-28
Filing date: 2021-05-28
Publication date: 2021-08-27

Abstract

The invention discloses a method, device, equipment and storage medium for realizing image recognition based on an FPGA. The method includes: acquiring network model parameters and adjusting a convolution network model according to the network model parameters, where the network model parameters at least include the number of convolution layers and the convolution parameters of each convolution layer; obtaining a convolution operation instruction, performing convolution calculation on the input image according to the convolution parameters of the current convolution layer, and outputting the convolution operation result; and determining, according to the number of convolution layers and the convolution parameters of the current convolution layer, whether the current convolution layer is the last convolution layer. If so, the image recognition result is output according to the convolution operation result; otherwise, the convolution operation result is passed to the next convolution layer for convolution, and once the last convolution layer has finished, the image recognition result is output according to the output of that last layer. The invention solves the technical problem that current FPGA-based convolutional neural networks lack reconfigurability and generality when performing image processing.

Description

Method, device, equipment and storage medium for realizing image recognition based on FPGA (field-programmable gate array)

Technical Field

The invention relates to the technical field of image processing, in particular to a method, a device, equipment and a storage medium for realizing image recognition based on an FPGA.

Background

With the advent of the big data era, the application environments of convolutional neural networks have become more complex, which requires networks with stronger feature learning and feature expression capability. Deeper models and more complicated hierarchies meet this requirement but introduce a new problem: general-purpose processors struggle to handle the ever-increasing amount of computation. The calculation process of a convolutional neural network contains a large number of convolution, pooling, activation and other steps, and generates a large number of intermediate results and weight parameters along the way. A large amount of storage space must therefore be allocated, which makes such networks ill-suited to application scenarios that emphasize real-time performance.

The CPU and the GPU are the common general-purpose processors at present. The CPU, as the operation and control center of a computer, has ample cache and ample digital logic units, and is good at handling computation steps with data dependences and complex computing tasks. Compared with the CPU, the GPU has a relatively small cache per core and fewer digital logic units per core, but a much larger throughput. The GPU originally arose from, and served, the popularity of video games; around 2003 its distinctive computing power began to attract attention and to be tried for general-purpose computing. CUDA, OpenCL and similar frameworks are now widely used in the general-purpose computing field.

Because of the shortcomings of these two general-purpose processors in scalability and flexibility, hardware platforms such as the Field Programmable Gate Array (FPGA) are also receiving attention from researchers. Compared with a traditional digital circuit system, the FPGA offers greater parallelism, which is achieved through two techniques: concurrency and pipelining. In addition, within the limits of the available resources, users can implement their own logic circuits. The FPGA is also reconfigurable: its internal logic can be changed as requirements change.

High-Level Synthesis (HLS) and Hardware Description Language (HDL) design are currently the two ways to implement convolutional neural networks on FPGAs. An HDL lets the user describe and control registers directly, whereas high-level languages such as C, C++ and OpenCL do not expose the logic resources and their interconnections, and instead describe the desired function at the algorithm level. An HLS design must nevertheless be translated into an HDL description before logic synthesis can begin. Compared with HLS, HDL development has a longer cycle, is harder to write and is less convenient to maintain, and most convolutional neural networks implemented in HDL have fixed structures. However, HDL is more favorable for resource scheduling and timing design, and the resulting code offers better confidentiality.

Convolutional networks implemented in HDL usually have a fixed structure; that is, a hardware system corresponding to one trained, fixed network is customized on the FPGA. If the network structure changes because of subsequent training adjustments, a change of image recognition target or similar reasons, the hardware part has to be modified. For a program described in HDL, a large change often entails a heavy workload of code modification and system timing simulation. If the different layers of the convolutional neural network can be modularized, within the limits of the hardware resources and the real-time requirements of the system, so that they can be invoked according to the actual structure of the network, the generality of the system can be greatly improved.

Taking into account the flexibility required by the application scenario, code autonomy, cost performance, power consumption and similar factors, a convolutional neural network can be developed on an FPGA platform using HDL. How to make full use of the parallelism and pipelining of the FPGA to improve network efficiency, while also making the constructed convolutional neural network reconfigurable and general, remains a problem to be solved.

Disclosure of Invention

The invention aims to overcome the above technical defects by providing a method, a device, equipment and a storage medium for realizing image recognition based on an FPGA (field-programmable gate array), and thereby to solve the technical problem that, in the prior art, a convolutional neural network built on an FPGA lacks reconfigurability and generality when performing image processing.

In order to achieve the technical purpose, the invention adopts the following technical scheme:

In a first aspect, the present invention provides a method for implementing image recognition based on an FPGA, including the following steps:

S1, obtaining network model parameters, and adjusting the convolution network model according to the network model parameters, wherein the network model parameters at least comprise the number of convolution layers and the convolution parameters of each convolution layer;

S2, acquiring a convolution operation instruction, performing convolution calculation on the input image according to the convolution parameters of the current convolution layer, and outputting a convolution operation result;

S3, determining whether the current convolution layer is the last convolution layer according to the number of convolution layers and the convolution parameters of the current convolution layer; if so, outputting an image recognition result according to the convolution operation result; otherwise, outputting the convolution operation result to the next convolution layer for convolution, and, after the operation of the last convolution layer is finished, outputting the image recognition result according to the result output by the last convolution layer.

Preferably, in the method for realizing image recognition based on FPGA, the convolution parameters of each convolutional layer at least include the current layer number, the size and number of convolution kernels of each convolutional layer, the size of the final output result, and the pooling operability (that is, whether the layer requires a pooling operation).

Preferably, in the method for implementing image recognition based on FPGA, step S2 specifically includes:

S21, acquiring a convolution operation instruction, calculating the input image with the current convolution kernel according to the convolution parameters of the current convolution layer, and outputting an initial calculation result;

S22, determining from the pooling operability of the current convolution layer whether a pooling operation is needed; if not, accumulating the initial calculation result, and taking the accumulated initial calculation result as the output result of the current convolution kernel;

S23, if pooling is needed, performing the pooling operation on the initial calculation result, accumulating the pooled initial calculation result, and, once the accumulation is complete, taking the accumulated result as the output result of the current convolution kernel;

S24, repeating steps S21 to S23 to obtain the output result of each convolution kernel of the current convolution layer, and, after all convolution kernels of the current convolution layer have been processed, taking the output results of all convolution kernels of the current convolution layer as the convolution operation result of the current convolution layer.

Preferably, in the method for realizing image recognition based on the FPGA, a mean pooling mode is adopted for pooling operation.

Preferably, in the method for realizing image recognition based on FPGA, a convolution kernel may process a plurality of input data simultaneously.

In a second aspect, the present invention further provides a device for implementing image recognition based on an FPGA, including:

a model adjusting module, used for acquiring network model parameters and adjusting the convolutional network model according to the network model parameters, wherein the network model parameters at least comprise the number of convolutional layers and the convolution parameters of each convolutional layer;

a convolution calculation module, used for acquiring a convolution operation instruction, performing convolution calculation on the input image according to the convolution parameters of the current convolutional layer, and outputting a convolution operation result;

and an output module, used for determining whether the current convolutional layer is the last convolutional layer according to the number of convolutional layers and the convolution parameters of the current convolutional layer; if so, outputting an image recognition result according to the convolution operation result; otherwise, outputting the convolution operation result to the next convolutional layer for convolution, and, after the operation of the last convolutional layer is finished, outputting the image recognition result according to the result output by the last convolutional layer.

Preferably, in the apparatus for implementing image recognition based on FPGA, the convolution parameters of each convolutional layer at least include the current layer number, the size and number of convolution kernels of each convolutional layer, the size of the final output result, and the pooling operability.

Preferably, in the device for implementing image recognition based on FPGA, the convolution calculation module is specifically configured for:

acquiring a convolution operation instruction, calculating the input image with the current convolution kernel according to the convolution parameters of the current convolutional layer, and outputting an initial calculation result;

determining from the pooling operability of the current convolutional layer whether a pooling operation is needed; if not, accumulating the initial calculation result, and taking the accumulated initial calculation result as the output result of the current convolution kernel;

if so, performing the pooling operation on the initial calculation result, accumulating the pooled initial calculation result, and, once the accumulation is complete, taking the accumulated result as the output result of the current convolution kernel;

and acquiring the output result of each convolution kernel of the current convolutional layer, and, after all convolution kernels of the current convolutional layer have been processed, taking the output results of all convolution kernels of the current convolutional layer as the convolution operation result of the current convolutional layer.

In a third aspect, the present invention further provides equipment for implementing image recognition based on an FPGA, including: a processor and a memory;

the memory stores a computer readable program executable by the processor;

the processor, when executing the computer readable program, implements the steps in the method for implementing image recognition based on an FPGA as described above.

In a fourth aspect, the present invention also provides a computer readable storage medium storing one or more programs, which are executable by one or more processors to implement the steps in the method for implementing image recognition based on FPGA as described above.

Compared with the prior art, the method, device, equipment and storage medium for realizing image recognition based on an FPGA provided by the invention set network model parameters. When convolution calculation is performed on an input image, the required convolution network model is obtained by loading the network model parameters that correspond to the specific use scenario and recognition target, and the convolutional neural network does not need to be rebuilt each time. The FPGA-based convolutional neural network is therefore reconfigurable and general, and can be conveniently reused.

Drawings

FIG. 1 is a flow chart of a preferred embodiment of the method for implementing image recognition based on FPGA according to the present invention;

FIG. 2 is a diagram illustrating a convolution layer performing a convolution operation according to a preferred embodiment of the present invention;

FIG. 3 is a timing diagram illustrating a preferred embodiment of convolution operations performed by convolutional layers according to the present invention;

FIG. 4 is a diagram illustrating a preferred embodiment of the convolution kernel convolution operation of the present invention;

FIG. 5 is a timing diagram illustrating a preferred embodiment of the convolution kernel convolution operation of the present invention;

FIG. 6 is a schematic diagram of a preferred embodiment of the pooling operation of the present invention.

Detailed Description

In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.

Referring to FIG. 1, the present invention provides a method for implementing image recognition based on FPGA, which includes the following steps:

S1, obtaining network model parameters, and adjusting the convolution network model according to the network model parameters, wherein the network model parameters at least comprise the number of convolution layers and the convolution parameters of each convolution layer.

Most CNN systems implemented in HDL are designed for a fixed network structure, so the storage units to be used and the timing of the various operations in the network can be planned in advance. By using a pipeline structure, the operating efficiency of the whole network can be improved within the limits of the available resources. However, the network structure must be determined first, and the training performed on the host computer must have produced a fixed result. A convolutional neural network built on an FPGA in this way is neither reconfigurable nor general, and must be configured again for each different scene or task, such as identifying automobiles, pedestrians or lane lines in an image. The embodiment of the invention instead provides network model parameters: before use, the required convolutional network model is obtained by loading the network model parameters that correspond to the specific use scenario, without rebuilding the convolutional neural network each time, so that the FPGA-based convolutional neural network is reconfigurable and general and can be conveniently reused.

In a specific implementation, the network model parameters at least include the input image size, the number of convolution layers, and the convolution parameters of each convolution layer, where the convolution parameters of each convolution layer at least include the current layer number, the size and number of convolution kernels of the layer, the size of the final output result, and the pooling operability. The pooling operability indicates whether the convolution layer needs to be pooled. For different scenes, the input image size, the number of convolution layers and the convolution parameters of each layer can be modified directly to obtain the corresponding convolution network model, so that the image can be processed quickly and the required result obtained.
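By way of illustration only, the network model parameters listed above could be held in a small parameter table such as the following Python sketch. The class names, field names and the example values are hypothetical and do not come from the invention. The consistency check additionally assumes stride-1 convolution without padding and non-overlapping 2 × 2 pooling, which the text does not state.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ConvLayerParams:
    layer_number: int   # current layer number, counted from 1
    kernel_size: int    # side length of each square kernel, e.g. 3 or 5
    num_kernels: int    # number of convolution kernels in this layer
    output_size: int    # side length of this layer's final output map
    pooling: bool       # "pooling operability": True if pooling is needed


@dataclass(frozen=True)
class NetworkModelParams:
    input_size: int                 # side length of the square input image
    num_layers: int                 # number of convolution layers
    layers: List[ConvLayerParams]   # one entry per convolution layer


def sizes_consistent(params: NetworkModelParams) -> bool:
    """Check the declared sizes, ASSUMING stride-1 convolution without
    padding and non-overlapping 2x2 pooling (assumptions of this sketch)."""
    if params.num_layers != len(params.layers):
        return False
    size = params.input_size
    for expected_number, layer in enumerate(params.layers, start=1):
        if layer.layer_number != expected_number:
            return False
        size = size - layer.kernel_size + 1   # valid convolution
        if layer.pooling:
            size //= 2                        # 2x2 pooling halves each side
        if size != layer.output_size:
            return False
    return True


# Hypothetical parameter set for one scene; another scene would load
# a different table without changing the code that consumes it.
lane_params = NetworkModelParams(
    input_size=32,
    num_layers=2,
    layers=[
        ConvLayerParams(1, 5, 6, 14, True),    # 32 -> 28 -> pooled to 14
        ConvLayerParams(2, 3, 6, 12, False),   # 14 -> 12, no pooling
    ],
)
```

Switching to a different recognition target would then amount to loading a different `NetworkModelParams` instance, which mirrors the reconfiguration idea described in this embodiment.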

S2, acquiring a convolution operation instruction, performing convolution calculation on the input image according to the convolution parameters of the current convolution layer, and outputting a convolution operation result.

In this embodiment, after the adjustment of the convolutional network model is completed, convolution calculation can be performed using the convolutional network model. Specifically, step S2 comprises the sub-steps S21 to S24 set out above.

In this embodiment, before each layer is calculated, the convolution parameters of the current convolutional layer are read from the ROM, including the size and number of convolution kernels of the current layer, the size of the feature map after calculation, whether pooling is performed, and so on. Convolution calculation is then carried out according to these parameters to obtain the convolution operation result. As shown in FIG. 2, the number of input and output feature maps of the current layer is set to 6 for illustration. F1 to F6 are 6 RAM storage units that hold the calculation results of the previous layer, that is, the group of feature maps to be read out for convolution; F7 to F12 are the group of storage units for the calculation results of the current layer; and C1 to C6 represent 6 convolution kernels. The procedure is as follows:

Step one: when the current layer is layer 1, copy the input original picture into F1 to F6;

Step two: starting from i = 1 (i = 1, 2, …, 6), convolve the data in F1 to F6 with C_i to obtain 6 calculation results T_j (j = 1, 2, …, 6);

Step three: if pooling is needed, perform the pooling operation on the results T_j obtained in step two; otherwise, go to step four;

Step four: accumulate the 6 result maps at corresponding pixel positions and write the sum into F_(i+6);

Step five: determine whether the last convolution kernel C6 has been calculated; if so, end; otherwise, return to step two for the next convolution kernel.
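Steps one to five can be sketched in software as follows. This is a minimal Python model under stated assumptions, not the hardware implementation: feature maps are plain nested lists of integers, the convolution is stride-1 without padding and without kernel flipping, pooling is a non-overlapping 2 × 2 mean, and the function names are hypothetical. The list comprehension over the inputs stands in for the operations that the FPGA performs in parallel.

```python
def conv2d_valid(image, kernel):
    """Stride-1 convolution without padding (no kernel flip)."""
    k = len(kernel)
    rows, cols = len(image) - k + 1, len(image[0]) - k + 1
    return [[sum(image[r + i][c + j] * kernel[i][j]
                 for i in range(k) for j in range(k))
             for c in range(cols)]
            for r in range(rows)]


def mean_pool_2x2(fmap):
    """Non-overlapping 2x2 mean pooling (sum of 4 values, then >> 2)."""
    return [[(fmap[r][c] + fmap[r][c + 1]
              + fmap[r + 1][c] + fmap[r + 1][c + 1]) >> 2
             for c in range(0, len(fmap[0]) - 1, 2)]
            for r in range(0, len(fmap) - 1, 2)]


def compute_layer(inputs, kernels, pooling):
    """inputs plays the role of F1..F6, kernels of C1..C6;
    the returned list plays the role of F7..F12."""
    outputs = []
    for kernel in kernels:                        # kernels are handled serially
        # Step two: one result T_j per input feature map.
        results = [conv2d_valid(fmap, kernel) for fmap in inputs]
        # Step three: optional pooling of every T_j.
        if pooling:
            results = [mean_pool_2x2(t) for t in results]
        # Step four: accumulate the results pixel by pixel.
        rows, cols = len(results[0]), len(results[0][0])
        accumulated = [[sum(t[r][c] for t in results) for c in range(cols)]
                       for r in range(rows)]
        outputs.append(accumulated)               # written to F_(i+6)
    return outputs                                # step five: all kernels done
```

For example, two 4 × 4 inputs filled with ones and a single 3 × 3 kernel of ones give one 2 × 2 output map whose entries are all 18 when pooling is disabled.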

In other words, the embodiment of the present invention uses a memory scheduling mechanism to implement "ping-pong storage" of the feature maps, which is essentially a reuse of the RAM that stores the feature maps. In terms of timing, the calculations of successive layers, and of the different feature maps within each layer, are serial, whereas during convolution the operations of a single convolution kernel on multiple inputs run in parallel. As shown in FIG. 3, Level_1_en is the operation control enable of the first layer of the network, Feature_11 to Feature_16 represent the inputs of the first layer, and Core_11 and Core_12 represent the first 2 of the 6 convolution kernels to be calculated in the current layer. In the first-layer calculation, it can be seen that while the first convolution kernel Core_11 is being calculated, the convolution operations on the 6 input images run in parallel, and after they finish, the corresponding operation for the second convolution kernel is performed. Finally, after all operations of the layer are finished, the next layer begins.
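The ping-pong reuse of the feature-map RAM can be sketched as two banks that swap roles after every layer. The following Python model is only a sketch: the function name, the `layer_fns` callables and the use of small integers as stand-ins for feature maps are assumptions made for illustration.

```python
def run_layers_ping_pong(image, layer_fns, bank_size):
    """Run the layers using only two banks of feature-map storage.

    layer_fns is a hypothetical list of callables, one per layer, each
    taking (read_bank, i) and returning output feature map number i.
    """
    bank_a = [image] * bank_size      # layer 1: the input picture is copied in
    bank_b = [None] * bank_size
    read_bank, write_bank = bank_a, bank_b
    for layer_fn in layer_fns:        # layers run one after another
        for i in range(bank_size):    # output maps are produced serially
            write_bank[i] = layer_fn(read_bank, i)
        # Swap roles: this layer's outputs become the next layer's inputs,
        # and the old input bank is overwritten by the next layer.
        read_bank, write_bank = write_bank, read_bank
    return read_bank                  # bank holding the last layer's outputs
```

No storage beyond the two banks is allocated, however many layers are configured, which is the point of the scheme.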

In the preferred embodiment, the convolution kernel used for the convolution operation is 5 × 5 or 3 × 3; 3 × 3 is taken as the example in the following description. Because the convolution operation module has multiple inputs and multiple outputs, the convolution can be implemented by exploiting the pipelining of the FPGA. As shown in FIG. 4, one 3 × 3 convolution can be performed every clock by using 2 shift registers. Before the convolution calculation starts, the shift registers are filled with data, and the convolution begins at the 2nd clock after the 1st image pixel is output from the 2nd shift register. FIG. 5 is a timing chart of the convolution process, in which the detection window size is 32 × 32 and the shift register depth is 32; pixel data is shifted out of the second register 64 clocks after entering the first register, and this operation is controlled by shift_en. Meanwhile, during the first 9 clocks, the weight data of the corresponding convolution kernel is read into the corresponding weight registers. The conv_en signal then goes high and the convolution operation starts.
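The two-shift-register arrangement can be modelled in software as two line buffers feeding a 3 × 3 window, with one pixel entering per simulated clock. The sketch below is a behavioural model only, under the assumptions that each shift register is as deep as the image is wide and that outputs are produced for fully covered window positions; the function name is hypothetical and the control signals shift_en and conv_en are not modelled. With a depth of 32, a pixel takes 2 × 32 = 64 clocks to pass through both registers, which agrees with the figure quoted above.

```python
from collections import deque


def stream_conv3x3(image, kernel):
    """Consume pixels in raster order, one per loop iteration ("clock")."""
    height, width = len(image), len(image[0])
    line1 = deque([0] * width, maxlen=width)   # delays pixels by one row
    line2 = deque([0] * width, maxlen=width)   # delays pixels by two rows
    window = [[0, 0, 0] for _ in range(3)]     # 3x3 register window
    output = []
    for r in range(height):
        out_row = []
        for c in range(width):
            pixel = image[r][c]
            above = line1[0]      # pixel at (r-1, c), leaving register 1
            above2 = line2[0]     # pixel at (r-2, c), leaving register 2
            line2.append(above)   # register 1 output feeds register 2
            line1.append(pixel)   # maxlen drops the oldest entry
            # Shift the window left and load the new right-hand column.
            for row, value in zip(window, (above2, above, pixel)):
                row.pop(0)
                row.append(value)
            # The window is fully valid from the third row and column on.
            if r >= 2 and c >= 2:
                out_row.append(sum(window[i][j] * kernel[i][j]
                                   for i in range(3) for j in range(3)))
        if out_row:
            output.append(out_row)
    return output
```

Once the buffers are filled, each further pixel yields one window position, which corresponds to the one-convolution-per-clock behaviour described for FIG. 4.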

In a preferred embodiment, in order to reduce the feature dimension and the number of parameters, a pooling operation is typically performed after the convolution calculation is completed. Common pooling modes include maximum pooling and mean pooling; the embodiment of the invention adopts mean pooling. As shown in FIG. 6, in the FPGA implementation, the invention uses a shift register to average the data in each 2 × 2 region: the 4 data values are added, and the sum is then shifted right to obtain the average value.
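The add-then-shift averaging can be sketched for integer data as follows. Dividing a sum of 4 values by 4 corresponds to a right shift by 2 bits, so the result is rounded down. The function name and the choice of non-overlapping 2 × 2 regions are assumptions of this sketch.

```python
def mean_pool_2x2(feature_map):
    """Mean-pool non-overlapping 2x2 regions of an integer feature map."""
    pooled = []
    for r in range(0, len(feature_map) - 1, 2):
        row = []
        for c in range(0, len(feature_map[0]) - 1, 2):
            total = (feature_map[r][c] + feature_map[r][c + 1]
                     + feature_map[r + 1][c] + feature_map[r + 1][c + 1])
            row.append(total >> 2)   # right shift by 2 bits divides by 4
        pooled.append(row)
    return pooled
```

A shift avoids a hardware divider; the cost is that any fractional part of the average is discarded.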

In a preferred embodiment, a convolution kernel may process a plurality of input data simultaneously; in other words, the convolution operations on the plurality of input data run in parallel, and the corresponding operation for the next convolution kernel is performed after they are completed.

In this embodiment, after the calculation of one convolutional layer is completed, its output is used as the input of the next convolutional layer, which then performs its own convolution calculation as described in steps S1 and S2; the details are not repeated here. After all convolutional layers have been calculated, the image recognition result is output according to the output of the last convolutional layer, thereby completing the convolution calculation of the data. The image recognition result is the specific target to be recognized in the image, such as a vehicle, a pedestrian or a lane line; for different scenes, only the model parameters need to be modified for the required recognition result to be output.
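The layer-by-layer control flow, including the check for the last convolution layer, can be sketched as below. This is an illustrative Python outline only: `recognize`, `compute_layer` and `classify` are hypothetical names, each layer is represented by a plain dictionary, and the stand-in functions in the usage note operate on numbers rather than real feature maps.

```python
def recognize(image, layers, compute_layer, classify):
    """Run every configured layer in turn and classify the last output.

    layers        -- per-layer parameter dictionaries with an "index" key
    compute_layer -- hypothetical callable (feature_maps, layer) -> feature_maps
    classify      -- hypothetical callable mapping the final maps to a result
    """
    num_layers = len(layers)              # number of convolution layers
    feature_maps = [image]
    for layer in layers:
        feature_maps = compute_layer(feature_maps, layer)
        if layer["index"] == num_layers:  # is this the last layer?
            return classify(feature_maps)
        # Otherwise the output simply becomes the next layer's input.
    raise ValueError("no convolution layers configured")
```

With stand-ins such as `compute_layer = lambda maps, layer: [m * layer["gain"] for m in maps]`, the same `recognize` function handles a one-layer or a two-layer parameter list unchanged, which reflects the intent that only the parameters differ between scenes.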

The method for realizing image recognition based on the FPGA provided by the embodiment of the invention is suitable for a variety of application scenarios. Only the network model parameters need to be modified for the scenario in question, for example detecting people and vehicles on roads, recognizing lane lines and traffic marks, recognizing and tracking ships, detecting targets in gray scenes, detecting parks and benches, or detecting ultra-large objects (such as space shuttles).

In this method, network model parameters are set, and during convolution calculation the required convolutional network model is obtained by loading the network model parameters that correspond to the specific use scenario. The convolutional neural network does not need to be rebuilt each time, so the FPGA-based convolutional neural network is reconfigurable and general and can be conveniently reused.

Based on the above method for realizing image recognition based on the FPGA, the invention correspondingly provides a device for realizing image recognition based on the FPGA, comprising the model adjusting module, the convolution calculation module and the output module described in the second aspect above.

In a preferred embodiment, the convolution parameters of each convolutional layer at least include the current layer number, the size and number of convolution kernels of each convolutional layer, the size of the final output result, and the pooling operability.

In a preferred embodiment, the convolution calculation module is specifically configured to perform the operations listed for it in the second aspect above.

Since the method for implementing image recognition based on the FPGA has been described in detail above, it is not described herein again.

Based on the above method for realizing image recognition based on the FPGA, the invention correspondingly provides equipment for realizing image recognition based on the FPGA, comprising: a processor and a memory;

the processor, when executing the computer readable program, implements the steps in the method for implementing image recognition based on FPGA as described in the embodiments above.

Based on the above method for realizing image recognition based on the FPGA, the present invention also provides a computer readable storage medium storing one or more programs, which can be executed by one or more processors to implement the steps of the method for realizing image recognition based on the FPGA according to the embodiments above.

In summary, in the method, device, equipment and storage medium for realizing image recognition based on the FPGA provided by the present invention, network model parameters are set, and during convolution calculation the required convolutional network model is obtained by loading the network model parameters that correspond to the specific use scenario. The convolutional neural network does not need to be rebuilt each time, so the FPGA-based convolutional neural network is reconfigurable and general and can be conveniently reused.

The above-described embodiments of the present invention should not be construed as limiting the scope of the present invention. Any other corresponding changes and modifications made according to the technical idea of the present invention should be included in the protection scope of the claims of the present invention.

Claims

1. A method for realizing image recognition based on FPGA is characterized by comprising the following steps:

2. The method of claim 1, wherein the convolution parameters of each convolutional layer at least include the current number of layers, the size and number of convolution kernels of each convolutional layer, the size of the final output result, and the pooling operability.

3. The method for realizing image recognition based on FPGA according to claim 2, wherein the step S2 specifically includes:

4. The FPGA-based image recognition method of claim 3, wherein a mean pooling is employed.

5. The FPGA-based image recognition method of claim 3, wherein the convolution kernel can simultaneously compute a plurality of input images.

6. An apparatus for realizing image recognition based on FPGA, comprising:

7. The apparatus of claim 6, wherein the convolution parameters of each convolutional layer at least include the current layer number, the size and number of convolution kernels of each convolutional layer, the size of the final output result, and the pooling operability.

8. The apparatus according to claim 7, wherein the convolution calculation module is specifically configured to:

9. An apparatus for realizing image recognition based on FPGA, comprising: a processor and a memory;

the processor, when executing the computer readable program, implements the steps in the method for implementing image recognition based on FPGA of any one of claims 1-5.

10. A computer readable storage medium storing one or more programs, the one or more programs being executable by one or more processors to perform the steps of the method for implementing image recognition based on FPGA of any one of claims 1-5.