Disclosure of Invention
The invention aims to overcome the above technical defects by providing a method, a device, equipment and a storage medium for realizing image recognition based on an FPGA (field programmable gate array), thereby solving the technical problem that a convolutional neural network built on an FPGA in the prior art lacks reconfigurability and universality during image processing.
In order to achieve the technical purpose, the invention adopts the following technical scheme:
in a first aspect, the present invention provides a method for implementing image recognition based on an FPGA, including the following steps:
S1, obtaining network model parameters, and adjusting the convolutional network model according to the network model parameters, wherein the network model parameters at least comprise the number of convolutional layers and the convolution parameters of each convolutional layer;
S2, acquiring a convolution operation instruction, performing convolution calculation on the input image according to the convolution parameters of the current convolutional layer, and outputting a convolution operation result;
S3, judging, according to the number of convolutional layers and the convolution parameters of the current convolutional layer, whether the current convolutional layer is the last convolutional layer; if so, outputting an image recognition result according to the convolution operation result; otherwise, outputting the convolution operation result to the next convolutional layer for convolution operation, and, after the operation of the last convolutional layer is finished, outputting the image recognition result according to the result output by the last convolutional layer.
Preferably, in the method for realizing image recognition based on FPGA, the convolution parameters of each convolutional layer at least include the current number of layers, the size and number of convolution kernels of each convolutional layer, the size of the final output result, and the pooling operability.
Preferably, in the method for implementing image recognition based on FPGA, the step S2 specifically includes:
S21, acquiring a convolution operation instruction, performing calculation on the input image with the current convolution kernel according to the convolution parameters of the current convolutional layer, and outputting an initial calculation result;
S22, judging whether the pooling operability of the current convolutional layer indicates that a pooling operation is required; if not, accumulating the initial calculation results, and taking the accumulated result as the output result of the current convolution kernel;
S23, if so, performing the pooling operation on the initial calculation results, accumulating the pooled results, and taking the accumulated result as the output result of the current convolution kernel;
and S24, repeating steps S21 to S23 to obtain the output result of each convolution kernel of the current convolutional layer, and, after the operations of all convolution kernels of the current convolutional layer are completed, taking the output results of all the convolution kernels as the convolution operation result of the current convolutional layer.
Preferably, in the method for realizing image recognition based on the FPGA, a mean pooling mode is adopted for pooling operation.
Preferably, in the method for realizing image recognition based on FPGA, the convolution kernel may calculate a plurality of input data simultaneously.
In a second aspect, the present invention further provides an apparatus for implementing image recognition based on an FPGA, including:
the model adjusting module is used for acquiring network model parameters and adjusting the convolutional network model according to the network model parameters, wherein the network model parameters at least comprise the number of convolutional layers and convolutional parameters of each convolutional layer;
the convolution calculation module is used for acquiring a convolution operation instruction, performing convolution calculation on the input image according to the convolution parameters of the current convolution layer and outputting a convolution operation result;
and the output module is used for judging whether the current convolutional layer is the last convolutional layer according to the number of the convolutional layers and the convolutional parameters of the current convolutional layer, if so, outputting an image identification result according to the convolutional operation result, otherwise, outputting the convolutional operation result to the next convolutional layer for convolutional operation, and outputting the image identification result according to the result output by the last convolutional layer after the last convolutional layer operation is finished.
Preferably, in the apparatus for implementing image recognition based on FPGA, the convolution parameters of each convolutional layer at least include the current layer number, the size and number of convolution kernels of each convolutional layer, the size of the final output result, and the pooling operability.
Preferably, in the apparatus for implementing image recognition based on FPGA, the convolution calculation module is specifically configured to:
acquiring a convolution operation instruction, calculating an input image by adopting a current convolution kernel according to the convolution parameters of the current convolution layer, and outputting an initial calculation result;
judging whether the pooling operability of the current convolutional layer indicates that a pooling operation is required; if not, accumulating the initial calculation results, and taking the accumulated result as the output result of the current convolution kernel;
if so, performing the pooling operation on the initial calculation results, accumulating the pooled results, and taking the accumulated result as the output result of the current convolution kernel;
and acquiring the output result of each convolution kernel of the current convolution layer, and taking the output result of all convolution kernels of the current convolution layer as the convolution operation result of the current convolution layer after all convolution kernels of the current convolution layer are operated.
In a third aspect, the present invention further provides an apparatus for implementing image recognition based on an FPGA, including: a processor and a memory;
the memory has stored thereon a computer readable program executable by the processor;
the processor, when executing the computer readable program, implements the steps in the method for implementing image recognition based on an FPGA as described above.
In a fourth aspect, the present invention also provides a computer readable storage medium storing one or more programs, which are executable by one or more processors to implement the steps in the method for implementing image recognition based on FPGA as described above.
Compared with the prior art, the method, device, equipment and storage medium for realizing image recognition based on an FPGA provided by the invention set network model parameters, so that when convolution calculation is performed on an input image, the required convolutional network model is obtained by loading the network model parameters corresponding to the specific use scene and recognition target. The convolutional neural network does not need to be rebuilt each time, so the FPGA-based convolutional neural network has reconfigurability and universality and can be conveniently reused.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
Referring to fig. 1, the present invention provides a method for implementing image recognition based on FPGA, which includes the following steps:
S1, obtaining network model parameters, and adjusting the convolutional network model according to the network model parameters, wherein the network model parameters at least comprise the number of convolutional layers and the convolution parameters of each convolutional layer.
In existing approaches, most CNN systems implemented in HDL are designed for a fixed network structure, so that the storage units to be used and the timing of the various operations in the network can be planned in advance. By using a pipeline structure, the operating efficiency of the whole network can be improved within the limits of the available resources. However, the network structure must be determined first, and the training part on the host computer must have produced a fixed result. Consequently, a convolutional neural network built on an FPGA in this way lacks reconfigurability and universality, and the convolutional neural network must be configured anew for each different scene or situation, such as recognizing automobiles, pedestrians or lane lines in an image. The embodiment of the invention instead provides network model parameters: before use, the required convolutional network model is obtained by loading the network model parameters corresponding to the specific use scene, without rebuilding the convolutional neural network each time, so that the FPGA-based convolutional neural network has reconfigurability and universality and can be conveniently reused.
In a specific implementation, the network model parameters at least include the input image size, the number of convolutional layers, and the convolution parameters of each convolutional layer, where the convolution parameters of each convolutional layer at least include the current layer number, the size and number of convolution kernels of the layer, the size of the final output result, and the pooling operability. For different scenes, the input image size, the number of convolutional layers and the convolution parameters of each convolutional layer can be modified directly to obtain the corresponding convolutional network model, so that the image can be calculated rapidly and the required result obtained. The pooling operability indicates whether the convolutional layer needs to be pooled.
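As a non-limiting illustration, the network model parameters described above can be pictured as a small configuration record. The following Python sketch is an assumption made for explanation only: the names ConvLayerParams, NetworkModelParams and validate are hypothetical, the convolution is assumed to be unpadded with stride 1, and pooling is assumed to act on 2 × 2 regions. The actual parameter encoding stored on the FPGA is not specified here.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ConvLayerParams:
    layer_index: int   # current layer number (1-based)
    kernel_size: int   # side length of each convolution kernel, e.g. 3 or 5
    kernel_count: int  # number of convolution kernels in this layer
    output_size: int   # side length of the layer's final output
    pooling: bool      # pooling operability: True if the layer is pooled


@dataclass(frozen=True)
class NetworkModelParams:
    input_size: int                # side length of the input image
    layers: List[ConvLayerParams]  # one entry per convolutional layer

    @property
    def layer_count(self) -> int:
        return len(self.layers)


def validate(model: NetworkModelParams) -> None:
    """Check that each layer's declared output size follows from its input.

    Assumes unpadded stride-1 convolution and optional 2 x 2 pooling.
    """
    size = model.input_size
    for expected_index, layer in enumerate(model.layers, start=1):
        if layer.layer_index != expected_index:
            raise ValueError(f"layer {expected_index} has index {layer.layer_index}")
        computed = size - layer.kernel_size + 1
        if layer.pooling:
            computed //= 2
        if computed != layer.output_size:
            raise ValueError(f"layer {expected_index}: expected size {computed}")
        size = layer.output_size


# Hypothetical two-layer model for a 32 x 32 input; switching scenes means
# loading a different record rather than rebuilding the network.
example = NetworkModelParams(
    input_size=32,
    layers=[
        ConvLayerParams(1, kernel_size=5, kernel_count=6, output_size=14, pooling=True),
        ConvLayerParams(2, kernel_size=5, kernel_count=6, output_size=5, pooling=True),
    ],
)
validate(example)
```

In this sketch, a different recognition scene would correspond to a different NetworkModelParams record, while the code that consumes the record stays unchanged.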
And S2, acquiring a convolution operation instruction, performing convolution calculation on the input image according to the convolution parameters of the current convolution layer, and outputting a convolution operation result.
In this embodiment, after the adjustment of the convolutional network model is completed, the convolutional calculation can be performed by using the convolutional network model. Specifically, the step S2 specifically includes:
S21, acquiring a convolution operation instruction, performing calculation on the input image with the current convolution kernel according to the convolution parameters of the current convolutional layer, and outputting an initial calculation result;
S22, judging whether the pooling operability of the current convolutional layer indicates that a pooling operation is required; if not, accumulating the initial calculation results, and taking the accumulated result as the output result of the current convolution kernel;
S23, if so, performing the pooling operation on the initial calculation results, accumulating the pooled results, and taking the accumulated result as the output result of the current convolution kernel;
and S24, repeating steps S21 to S23 to obtain the output result of each convolution kernel of the current convolutional layer, and, after the operations of all convolution kernels of the current convolutional layer are completed, taking the output results of all the convolution kernels as the convolution operation result of the current convolutional layer.
In this embodiment, before each layer is calculated, the convolution parameters of the current convolutional layer, including the size and number of convolution kernels of the current layer, the size of the feature map after calculation, whether pooling is performed, and the like, are read from the ROM. Convolution calculation is then carried out according to these convolution parameters to obtain the convolution operation result. As shown in fig. 2, the number of input feature maps and the number of output feature maps of the current layer are both taken to be 6 for illustration. F1 to F6 are 6 RAM storage units that store the calculation results of the previous layer, that is, the group of feature maps to be read out for the convolution operation; F7 to F12 are the storage units for the calculation results of the current layer; and C1 to C6 represent 6 convolution kernels. The calculation proceeds as follows:
Step one: when the current layer is the 1st layer, copy the input original picture into F1 to F6;
Step two: starting from i = 1 (i = 1, 2, …, 6), perform the convolution operation on the data in F1 to F6 with Ci to obtain 6 calculation results Tj (j = 1, 2, …, 6);
Step three: if pooling is needed, perform the pooling operation on the results Tj obtained in step two; otherwise, go to step four;
Step four: accumulate the 6 result maps at corresponding pixel positions, and write the accumulated result into F(i+6);
Step five: judge whether the last convolution kernel C6 has been calculated; if so, end; otherwise, return to step two.
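The five steps above can be sketched in software as follows. This is a minimal behavioural model only, not the FPGA implementation: the function names conv2d_valid, mean_pool_2x2 and run_layer are hypothetical, pixel data are assumed to be integers, the convolution is assumed to be an unpadded sliding-window multiply-accumulate without kernel flipping, and pooling is assumed to act on 2 × 2 regions.

```python
def conv2d_valid(image, kernel):
    """Unpadded sliding-window multiply-accumulate of one map with one kernel."""
    k = len(kernel)
    rows = len(image) - k + 1
    cols = len(image[0]) - k + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j] for i in range(k) for j in range(k))
            for c in range(cols)
        ]
        for r in range(rows)
    ]


def mean_pool_2x2(fmap):
    """Average each 2 x 2 region: add 4 values, then right-shift by 2 bits."""
    return [
        [
            (fmap[r][c] + fmap[r][c + 1] + fmap[r + 1][c] + fmap[r + 1][c + 1]) >> 2
            for c in range(0, len(fmap[0]) - 1, 2)
        ]
        for r in range(0, len(fmap) - 1, 2)
    ]


def run_layer(inputs, kernels, pooling):
    """Model of steps two to five for one layer.

    inputs  : feature maps held in F1..F6
    kernels : convolution kernels C1..C6
    returns : one accumulated map per kernel, i.e. the contents of F7..F12
    """
    outputs = []
    for kernel in kernels:                                   # step five: loop over Ci
        results = [conv2d_valid(f, kernel) for f in inputs]  # step two: Tj
        if pooling:                                          # step three
            results = [mean_pool_2x2(t) for t in results]
        rows, cols = len(results[0]), len(results[0][0])
        accumulated = [                                      # step four
            [sum(t[r][c] for t in results) for c in range(cols)]
            for r in range(rows)
        ]
        outputs.append(accumulated)
    return outputs


# Step one for the 1st layer: the input picture is copied into all 6 units.
picture = [[1] * 4 for _ in range(4)]
f1_to_f6 = [picture] * 6
c1_to_c6 = [[[1] * 3 for _ in range(3)] for _ in range(6)]
f7_to_f12 = run_layer(f1_to_f6, c1_to_c6, pooling=False)
```

In hardware the list comprehension in step two corresponds to the 6 convolutions that run in parallel for a single kernel, while the outer loop over kernels is serial.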
In other words, the embodiment of the present invention uses a memory scheduling mechanism to implement "ping-pong storage" of the feature maps, which is essentially a reuse of the RAM that stores the feature maps. In terms of timing, the calculations of successive layers, and of the different feature maps within each layer, are serial, whereas during the convolution operation a single convolution kernel operates on a plurality of inputs in parallel. As shown in fig. 3, Level_1_en is the operation control enable of the first layer of the network, Feature_11 to Feature_16 represent the inputs of the first layer, and Core_11 and Core_12 represent the first 2 of the 6 convolution kernels to be calculated in the current layer. In the calculation of the first layer, it can be seen that when the first convolution kernel Core_11 is calculated, the convolution operations on the 6 input images run in parallel, and after they are finished, the corresponding operations of the second convolution kernel are performed. Finally, after all operations of the layer are finished, the next layer is entered.
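The "ping-pong storage" described above can be illustrated with two banks of storage units whose read and write roles swap after every layer. The sketch below is a simplified assumption for explanation: run_layers and its arguments are hypothetical names, each layer is represented by an arbitrary Python function, and plain integers stand in for feature maps.

```python
def run_layers(image, layer_functions, bank_size=6):
    """Run the layers serially while reusing two banks of storage units.

    Bank 0 plays the role of F1..F6 and bank 1 the role of F7..F12 for the
    first layer; the roles are exchanged for every following layer.
    """
    banks = [[image] * bank_size, [None] * bank_size]
    read = 0
    trace = []  # (read bank, write bank) used by each layer
    for compute in layer_functions:
        write = 1 - read
        banks[write] = compute(banks[read])  # results overwrite the idle bank
        trace.append((read, write))
        read = write                         # the written bank feeds the next layer
    return banks[read], trace


# Toy layers operating on integers instead of real feature maps.
toy_layers = [
    lambda maps: [m + 1 for m in maps],
    lambda maps: [m * 2 for m in maps],
    lambda maps: [m - 3 for m in maps],
]
final_maps, bank_trace = run_layers(1, toy_layers)
```

The trace shows that no third bank is ever needed: the bank that held the inputs of one layer becomes the output bank of the next.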
In a preferred embodiment, the convolution kernel used in the convolution operation is 5 × 5 or 3 × 3, and 3 × 3 is taken as an example in the following description. Because the convolution operation module has multiple inputs and multiple outputs, the convolution operation can be realized by exploiting the pipelining capability of the FPGA. As shown in fig. 4, one 3 × 3 convolution operation can be performed every clock by using 2 shift registers. Before the convolution calculation starts, data are buffered in the shift registers, and the convolution operation starts at the 2nd clock after the 1st image pixel is output from the 2nd shift register. Fig. 5 is a timing chart of the convolution operation process, where the detection window size is 32 × 32 and the shift register depth is 32, so that pixel Data are shifted out of the second register 64 clocks after entering the first register; this operation is controlled by shift_en. Meanwhile, during the first 9 clocks, the corresponding convolution kernel Weight data are read into the corresponding Weight registers. Then the conv_en signal goes high and the convolution operation starts.
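The role of the 2 shift registers can be modelled in software as two row-length delay lines feeding a 3 × 3 window. The following sketch is an illustrative assumption rather than the circuit of fig. 4: stream_conv3x3 is a hypothetical name, one loop iteration stands for one clock, pixels arrive in raster order, and the shift register depth equals the image width (32 in the example above, 4 in the small usage below).

```python
from collections import deque


def stream_conv3x3(pixels, width, kernel):
    """Clock-by-clock model of a 3 x 3 convolution fed by two shift registers."""
    line1 = deque([0] * width, maxlen=width)  # output lags its input by one row
    line2 = deque([0] * width, maxlen=width)  # chained: lags by two rows in total
    window = [[0] * 3 for _ in range(3)]      # window[0] is the oldest row
    outputs = []
    for n, pixel in enumerate(pixels):
        tap1 = line1[0]       # pixel one row above the incoming pixel
        tap2 = line2[0]       # pixel two rows above the incoming pixel
        line2.append(tap1)    # maxlen discards the oldest entry automatically
        line1.append(pixel)
        # Shift every window row left by one column and insert the new column.
        for row, new_value in zip(window, (tap2, tap1, pixel)):
            row.pop(0)
            row.append(new_value)
        r, c = divmod(n, width)
        if r >= 2 and c >= 2:  # the window is fully inside the image
            outputs.append(
                sum(window[i][j] * kernel[i][j] for i in range(3) for j in range(3))
            )
    return outputs


# Usage on a 4 x 4 image whose pixels are 0..15 in raster order.
sums = stream_conv3x3(list(range(16)), 4, [[1] * 3 for _ in range(3)])
```

Once both delay lines are filled, each new pixel completes one window, which mirrors the statement that one 3 × 3 convolution can be performed every clock.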
In a preferred embodiment, in order to reduce the feature dimension and the number of parameters, a pooling operation is typically performed after the convolution calculation is completed. Common pooling modes include maximum pooling and mean pooling; the embodiment of the invention adopts the mean pooling mode. As shown in fig. 6, in the implementation on the FPGA platform, the invention uses a shift register to average the data in each 2 × 2 region: the 4 data values are added, and the sum is right-shifted (by 2 bits, which is equivalent to dividing by 4) to obtain the average value.
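The add-and-shift averaging can be illustrated with a streaming model that uses a single row-length shift register. This is a sketch under stated assumptions, not the circuit of fig. 6: stream_mean_pool2x2 is a hypothetical name, pixels are assumed to be non-negative integers arriving in raster order, and the image width and height are assumed to be even.

```python
from collections import deque


def stream_mean_pool2x2(pixels, width):
    """Mean pooling over 2 x 2 regions using one row-length shift register."""
    line = deque([0] * width, maxlen=width)  # delays the stream by one row
    prev = 0        # previous pixel in the current row
    prev_above = 0  # previous pixel in the row above
    outputs = []
    for n, pixel in enumerate(pixels):
        above = line[0]   # pixel directly above the incoming pixel
        line.append(pixel)
        r, c = divmod(n, width)
        if r % 2 == 1 and c % 2 == 1:
            # The incoming pixel is the bottom-right corner of a 2 x 2 region:
            # add the 4 values and right-shift by 2 bits (divide by 4).
            outputs.append((prev_above + above + prev + pixel) >> 2)
        prev, prev_above = pixel, above
    return outputs


# Usage on a 4 x 4 image whose pixels are 0..15 in raster order.
pooled = stream_mean_pool2x2(list(range(16)), 4)
```

The right shift truncates the fractional part, so the result is the floor of the true mean; this avoids a divider in hardware at the cost of a small rounding error.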
In a preferred embodiment, a convolution kernel may perform computation on a plurality of input data simultaneously; in other words, the convolution operations of one kernel on the plurality of input data run in parallel, and the corresponding operations of the next convolution kernel are performed after these convolution operations are completed.
S3, judging, according to the number of convolutional layers and the convolution parameters of the current convolutional layer, whether the current convolutional layer is the last convolutional layer; if so, outputting an image recognition result according to the convolution operation result; otherwise, outputting the convolution operation result to the next convolutional layer for convolution operation, and, after the operation of the last convolutional layer is finished, outputting the image recognition result according to the result output by the last convolutional layer.
In this embodiment, after the calculation of one convolutional layer is completed, the output result of the current convolutional layer is used as the input of the next convolutional layer, and the next convolutional layer performs its convolution calculation; the convolution calculation process is as described in steps S1 and S2 and is not repeated here. After all the convolutional layers have been calculated, the image recognition result is output according to the output result of the last convolutional layer, thereby completing the convolution calculation on the data. The image recognition result is the specific target to be recognized in the image, such as a vehicle, a pedestrian or a lane line; for different scenes, the required recognition result can be output in a targeted manner simply by modifying the model parameters.
The method for realizing image recognition based on the FPGA provided by the embodiment of the invention can be suitable for various different application scenes, and can be used for detecting people and vehicles in roads, recognizing lane lines and traffic marks, recognizing and tracking ships, detecting targets in gray scenes, detecting parks and benches, detecting ultra-large objects (such as space shuttles) and the like only by modifying network model parameters according to the corresponding scenes.
According to the method, the network model parameters are set, and the required convolutional network model is obtained by loading the corresponding network model parameters according to a specific use scene during convolutional calculation, so that the convolutional neural network based on the FPGA has reconfigurability and universality and can be conveniently reused without reconstructing the convolutional neural network each time.
Based on the above method for realizing image recognition based on the FPGA, the invention also correspondingly provides a device for realizing image recognition based on the FPGA, which includes:
the model adjusting module is used for acquiring network model parameters and adjusting the convolutional network model according to the network model parameters, wherein the network model parameters at least comprise the number of convolutional layers and convolutional parameters of each convolutional layer;
the convolution calculation module is used for acquiring a convolution operation instruction, performing convolution calculation on the input image according to the convolution parameters of the current convolution layer and outputting a convolution operation result;
and the output module is used for judging whether the current convolutional layer is the last convolutional layer according to the number of the convolutional layers and the convolutional parameters of the current convolutional layer, if so, outputting an image identification result according to the convolutional operation result, otherwise, outputting the convolutional operation result to the next convolutional layer for convolutional operation, and outputting the image identification result according to the result output by the last convolutional layer after the last convolutional layer operation is finished.
In a preferred embodiment, the convolution parameters of each convolutional layer at least include the current layer number, the size and number of convolution kernels of each convolutional layer, the size of the final output result, and the pooling operability.
In a preferred embodiment, the convolution calculation module is specifically configured to:
acquiring a convolution operation instruction, calculating an input image by adopting a current convolution kernel according to the convolution parameters of the current convolution layer, and outputting an initial calculation result;
judging whether the pooling operability of the current convolutional layer indicates that a pooling operation is required; if not, accumulating the initial calculation results, and taking the accumulated result as the output result of the current convolution kernel;
if so, performing the pooling operation on the initial calculation results, accumulating the pooled results, and taking the accumulated result as the output result of the current convolution kernel;
and acquiring the output result of each convolution kernel of the current convolution layer, and taking the output result of all convolution kernels of the current convolution layer as the convolution operation result of the current convolution layer after all convolution kernels of the current convolution layer are operated.
Since the method for implementing image recognition based on the FPGA has been described in detail above, it is not described herein again.
Based on the above method for realizing image recognition based on the FPGA, the invention also correspondingly provides equipment for realizing image recognition based on the FPGA, which includes: a processor and a memory;
the memory has stored thereon a computer readable program executable by the processor;
the processor, when executing the computer readable program, implements the steps in the method for implementing image recognition based on FPGA as described in the embodiments above.
Since the method for implementing image recognition based on the FPGA has been described in detail above, it is not described herein again.
Based on the method for realizing image recognition based on the FPGA, the present invention also provides a computer readable storage medium, where one or more programs are stored, and the one or more programs can be executed by one or more processors to realize the steps in the method for realizing image recognition based on the FPGA according to the embodiments.
Since the method for implementing image recognition based on the FPGA has been described in detail above, it is not described herein again.
In summary, according to the method, the device, the equipment and the storage medium for realizing image recognition based on the FPGA provided by the present invention, by setting the network model parameters, the required convolutional network model is obtained by loading the corresponding network model parameters in accordance with the specific use scenario during the convolutional calculation, and the convolutional neural network does not need to be reconstructed each time, so that the convolutional neural network based on the FPGA has reconstructability and universality, and can be conveniently reused.
The above-described embodiments of the present invention should not be construed as limiting the scope of the present invention. Any other corresponding changes and modifications made according to the technical idea of the present invention should be included in the protection scope of the claims of the present invention.