

Method, device and equipment for realizing image recognition based on FPGA (Field Programmable Gate Array) and storage medium

Info

Publication number
CN113313171A
CN113313171A
Authority
CN
China
Prior art keywords
convolution
result
layer
current
convolutional
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110594603.9A
Other languages
Chinese (zh)
Inventor
徐子扬
王昱
龚辞
陈俊康
卢意
章馨予
贺泷玉
黄子浩
周建
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Wuhan University of Technology WUT
Original Assignee
Wuhan University of Technology WUT
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Wuhan University of Technology WUT filed Critical Wuhan University of Technology WUT
Priority to CN202110594603.9A priority Critical patent/CN113313171A/en
Publication of CN113313171A publication Critical patent/CN113313171A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/24 Classification techniques
    • G06F 18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/06 Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G06N 3/063 Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Evolutionary Computation (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computing Systems (AREA)
  • Molecular Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Computational Linguistics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Neurology (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a method, device, equipment and storage medium for realizing image recognition based on an FPGA. The method includes: acquiring network model parameters and adjusting a convolutional network model according to the network model parameters, the network model parameters including at least the number of convolutional layers and the convolution parameters of each convolutional layer; acquiring a convolution operation instruction, performing convolution calculation on the input image according to the convolution parameters of the current convolutional layer, and outputting a convolution operation result; judging, according to the number of convolutional layers and the convolution parameters of the current convolutional layer, whether the current convolutional layer is the last convolutional layer; if so, outputting the image recognition result according to the convolution operation result; otherwise, outputting the convolution operation result to the next convolutional layer for convolution operation until the last convolutional layer has finished, and then outputting the image recognition result according to the result output by the last convolutional layer. The invention solves the technical problem that convolutional neural networks currently built on FPGAs lack reconfigurability and generality when performing image processing.

Description

Method, device and equipment for realizing image recognition based on FPGA (Field Programmable Gate Array) and storage medium
Technical Field
The invention relates to the technical field of image processing, in particular to a method, a device, equipment and a storage medium for realizing image recognition based on an FPGA.
Background
With the advent of the big-data era, the application environments of convolutional neural networks have become more complex, which requires networks with stronger feature learning and feature expression capability. Deepening the model and complicating its hierarchy solves this problem but introduces new ones: general-purpose processors can hardly keep up with the ever-increasing amount of computation. The calculation process of a convolutional neural network involves a large number of convolution, pooling, activation and other steps, and generates a large number of intermediate results and weight parameters along the way, so a large amount of storage space must be allocated; this is unfriendly to application scenarios that emphasize real-time performance.
The CPU and the GPU are the common general-purpose processors at present. The CPU, as the operation and control center of a computer, has ample cache and digital logic units and is good at handling control-heavy calculation steps and computation tasks with complex data dependence. Compared with the CPU, the GPU has a relatively small per-core cache, fewer digital logic units, and much larger throughput. It originally arose to serve video games; around 2003, the distinctive computing power of the GPU attracted attention and began to be applied to general-purpose computing. Frameworks such as CUDA and OpenCL are now widely used in the general-purpose computing field.
Due to the shortcomings of these two general-purpose processors in scalability and flexibility, hardware platforms such as the Field Programmable Gate Array (FPGA) have also attracted researchers' attention. Compared with traditional digital circuit systems, the FPGA offers greater parallelism, realized through concurrency and pipelining; within the limits of its resources, it allows users to implement their own logic circuits; and it is reconfigurable, so its internal logic can be changed as required.
High-Level Synthesis (HLS) and Hardware Description Languages (HDL) are the two current ways to implement convolutional neural networks on FPGAs. An HDL lets the user describe and control registers directly, whereas high-level languages such as C, C++ and OpenCL do not expose the logic resources and their interconnections but instead describe the desired functionality at the algorithm level, and must ultimately be translated into an HDL description before logic synthesis can begin. Compared with HLS, HDL development has a longer cycle, is harder to describe and less convenient to maintain, and most convolutional neural networks implemented with HDL have fixed structures; on the other hand, HDL is more favorable for resource scheduling and timing design, and the code offers better confidentiality.
Convolutional networks implemented in an HDL usually have fixed structures; that is, a hardware system corresponding to a trained, fixed network is customized on the FPGA. If the network structure later changes, for example because of further training adjustments or a change of recognition target, the hardware part has to be modified. For a program described in an HDL, a large change often means extensive code modification and a huge workload of system timing simulation. If, within the limits of hardware resources and real-time constraints, the different layers of the convolutional neural network could be modularized so that they can be invoked according to the actual structure of the network, the generality of the system would be greatly improved.
Considering factors such as flexibility, code autonomy, cost-effectiveness and power consumption of the application scenario, the convolutional neural network can be developed on an FPGA platform in an HDL. How to make full use of the parallelism and pipelining of the FPGA to improve network efficiency, while also giving the constructed convolutional neural network reconfigurability and generality, remains a problem to be solved.
Disclosure of Invention
The invention aims to overcome the above technical defects by providing a method, a device, equipment and a storage medium for realizing image recognition based on an FPGA, solving the technical problem that convolutional neural networks built on FPGAs in the prior art lack reconfigurability and generality during image processing.
In order to achieve the technical purpose, the invention adopts the following technical scheme:
in a first aspect, the present invention provides a method for implementing image recognition based on an FPGA, including the following steps:
s1, obtaining network model parameters, and adjusting the convolution network model according to the network model parameters, wherein the network model parameters at least comprise the number of convolution layers and convolution parameters of each convolution layer;
s2, acquiring a convolution operation instruction, performing convolution calculation on the input image according to the convolution parameters of the current convolution layer, and outputting a convolution operation result;
s3, judging whether the current convolutional layer is the last convolutional layer according to the number of convolutional layers and convolutional parameters of the current convolutional layer, if so, outputting an image identification result according to the convolutional operation result, otherwise, outputting the convolutional operation result to the next convolutional layer for convolutional operation, and outputting the image identification result according to the result output by the last convolutional layer after the last convolutional layer operation is finished.
Preferably, in the method for realizing image recognition based on the FPGA, the convolution parameters of each convolutional layer include at least the current layer index, the size and number of convolution kernels of the layer, the size of the final output result, and a pooling flag indicating whether pooling is performed.
Preferably, in the method for implementing image recognition based on FPGA, the step S2 specifically includes:
s21, acquiring a convolution operation instruction, calculating an input image by adopting the current convolution kernel according to the convolution parameters of the current convolution layer, and outputting an initial calculation result;
s22, judging, according to the pooling flag of the current convolutional layer, whether a pooling operation is needed; if not, accumulating the initial calculation result and taking the accumulated result as the output result of the current convolution kernel;
s23, if a pooling operation is needed, performing the pooling operation on the initial calculation result, accumulating the pooled result, and taking the accumulated result as the output result of the current convolution kernel;
and S24, repeating the steps S21 to S23, obtaining the output result of each convolution kernel of the current convolution layer, and taking the output result of all convolution kernels of the current convolution layer as the convolution operation result of the current convolution layer after all convolution kernel operations of the current convolution layer are completed.
Preferably, in the method for realizing image recognition based on the FPGA, the pooling operation adopts mean pooling.
Preferably, in the method for realizing image recognition based on the FPGA, a convolution kernel can operate on a plurality of input data simultaneously.
In a second aspect, the present invention further provides an apparatus for implementing image recognition based on an FPGA, including:
the model adjusting module is used for acquiring network model parameters and adjusting the convolutional network model according to the network model parameters, wherein the network model parameters at least comprise the number of convolutional layers and convolutional parameters of each convolutional layer;
the convolution calculation module is used for acquiring a convolution operation instruction, performing convolution calculation on the input image according to the convolution parameters of the current convolution layer and outputting a convolution operation result;
and the output module is used for judging whether the current convolutional layer is the last convolutional layer according to the number of the convolutional layers and the convolutional parameters of the current convolutional layer, if so, outputting an image identification result according to the convolutional operation result, otherwise, outputting the convolutional operation result to the next convolutional layer for convolutional operation, and outputting the image identification result according to the result output by the last convolutional layer after the last convolutional layer operation is finished.
Preferably, in the apparatus for realizing image recognition based on the FPGA, the convolution parameters of each convolutional layer include at least the current layer index, the size and number of convolution kernels of the layer, the size of the final output result, and a pooling flag indicating whether pooling is performed.
Preferably, in the apparatus for implementing image recognition based on FPGA, the convolution calculation module is specifically configured to:
acquiring a convolution operation instruction, calculating an input image by adopting a current convolution kernel according to the convolution parameters of the current convolution layer, and outputting an initial calculation result;
judging, according to the pooling flag of the current convolutional layer, whether a pooling operation is needed; if not, accumulating the initial calculation result and taking the accumulated result as the output result of the current convolution kernel;
if a pooling operation is needed, performing the pooling operation on the initial calculation result, accumulating the pooled result, and taking the accumulated result as the output result of the current convolution kernel;
and acquiring the output result of each convolution kernel of the current convolution layer, and taking the output result of all convolution kernels of the current convolution layer as the convolution operation result of the current convolution layer after all convolution kernels of the current convolution layer are operated.
In a third aspect, the present invention further provides an apparatus for implementing image recognition based on an FPGA, including: a processor and a memory;
the memory has stored thereon a computer readable program executable by the processor;
the processor, when executing the computer readable program, implements the steps in the method for implementing image recognition based on an FPGA as described above.
In a fourth aspect, the present invention also provides a computer readable storage medium storing one or more programs, which are executable by one or more processors to implement the steps in the method for implementing image recognition based on FPGA as described above.
Compared with the prior art, the method, device, equipment and storage medium for realizing image recognition based on the FPGA provided by the invention work by setting network model parameters: when convolution calculation is to be performed on an input image, the required convolutional network model is obtained simply by loading the network model parameters corresponding to the specific use scenario and recognition target, without rebuilding the convolutional neural network each time. The FPGA-based convolutional neural network therefore has reconfigurability and generality and can be conveniently reused.
Drawings
FIG. 1 is a flow chart of a preferred embodiment of the method for implementing image recognition based on FPGA according to the present invention;
FIG. 2 is a diagram illustrating a convolution layer performing a convolution operation according to a preferred embodiment of the present invention;
FIG. 3 is a timing diagram illustrating a preferred embodiment of convolution operations performed by convolutional layers according to the present invention;
FIG. 4 is a diagram illustrating a preferred embodiment of the convolution kernel convolution operation of the present invention;
FIG. 5 is a timing diagram illustrating a preferred embodiment of the convolution kernel convolution operation of the present invention;
FIG. 6 is a schematic diagram of a preferred embodiment of the pooling operation of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
Referring to fig. 1, the present invention provides a method for implementing image recognition based on FPGA, which includes the following steps:
s1, obtaining network model parameters, and adjusting the convolution network model according to the network model parameters, wherein the network model parameters at least comprise convolution layer numbers and convolution parameters of each convolution layer.
In this embodiment, it is noted that most CNN systems implemented in the HDL manner are designed for a fixed network structure, so the storage units to be used and the timing of the various operations in the network can be planned in advance. By using a pipeline structure, the operating efficiency of the whole network can be improved within the range allowed by the resources. However, the network structure must be determined first and the host-side training must already have produced a definite result, so a convolutional neural network built on the FPGA in this way lacks reconfigurability and generality and has to be rebuilt for different scenes and tasks, such as recognizing cars, pedestrians or lane lines in an image. The embodiment of the invention instead defines network model parameters: before use, the required convolutional network model is obtained by loading the network model parameters corresponding to the specific use scenario, without reconstructing the convolutional neural network each time, so that the FPGA-based convolutional neural network has reconfigurability and generality and can be conveniently reused.
In a specific implementation, the network model parameters include at least the input image size, the number of convolutional layers, and the convolution parameters of each convolutional layer, where the convolution parameters of each convolutional layer include at least the current layer index, the size and number of convolution kernels of the layer, the size of the final output result, and a pooling flag. For different scenes, the input image size, the number of convolutional layers and the convolution parameters of each layer can be modified directly to obtain the corresponding convolutional network model, so that the image can be calculated rapidly and the required result obtained. The pooling flag indicates whether the convolutional layer performs pooling.
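For orientation only, the network model parameters described above can be pictured as a small parameter table read by the logic before each layer is processed. The following Python sketch is purely a behavioral illustration; the field names (kernel_size, num_kernels, out_size, pool) and the example values are assumptions, not taken from the patent.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class ConvLayerParams:
    layer_index: int    # current layer number
    kernel_size: int    # convolution kernel size, e.g. 3 or 5
    num_kernels: int    # number of convolution kernels in this layer
    out_size: int       # side length of the feature map output by this layer
    pool: bool          # pooling flag: whether this layer performs pooling

@dataclass
class NetworkModelParams:
    input_size: int                # side length of the input image
    layers: List[ConvLayerParams]  # one entry per convolutional layer

# Example: a small two-layer configuration (illustrative values only).
model = NetworkModelParams(
    input_size=32,
    layers=[
        ConvLayerParams(layer_index=1, kernel_size=3, num_kernels=6, out_size=15, pool=True),
        ConvLayerParams(layer_index=2, kernel_size=3, num_kernels=6, out_size=13, pool=False),
    ],
)
```

Changing only this table, rather than the hardware description, is what gives the system its reconfigurability.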
And S2, acquiring a convolution operation instruction, performing convolution calculation on the input image according to the convolution parameters of the current convolution layer, and outputting a convolution operation result.
In this embodiment, after the adjustment of the convolutional network model is completed, the convolutional calculation can be performed by using the convolutional network model. Specifically, the step S2 specifically includes:
s21, acquiring a convolution operation instruction, calculating an input image by adopting the current convolution kernel according to the convolution parameters of the current convolution layer, and outputting an initial calculation result;
s22, judging, according to the pooling flag of the current convolutional layer, whether a pooling operation is needed; if not, accumulating the initial calculation result and taking the accumulated result as the output result of the current convolution kernel;
s23, if a pooling operation is needed, performing the pooling operation on the initial calculation result, accumulating the pooled result, and taking the accumulated result as the output result of the current convolution kernel;
and S24, repeating the steps S21 to S23, obtaining the output result of each convolution kernel of the current convolution layer, and taking the output result of all convolution kernels of the current convolution layer as the convolution operation result of the current convolution layer after all convolution kernel operations of the current convolution layer are completed.
In this embodiment, before each layer is calculated, the convolution parameters of the current convolutional layer, including the size and number of convolution kernels of the layer, the size of the feature map after calculation, whether pooling is performed, and so on, are read from the ROM. Convolution calculation is then carried out according to these parameters to obtain the convolution operation result. As shown in fig. 2, the number of input and output feature maps of the current layer is set to 6 for illustration. F1 to F6 are 6 RAM storage units that hold the calculation results of the previous layer, that is, the group of feature maps to be read out for the convolution operation; F7 to F12 are the storage units for the calculation results of the current layer; and C1 to C6 represent the 6 convolution kernels. The procedure comprises the following steps:
Step one: when the current layer is layer 1, copy the input original picture into F1 to F6;
Step two: starting from i = 1 (i = 1, 2, ..., 6), convolve the data in F1 to F6 with Ci to obtain 6 calculation results Tj (j = 1, 2, ..., 6);
Step three: if pooling is needed, perform the pooling operation on the results Tj obtained in step two; otherwise go to step four;
Step four: accumulate the 6 result maps at corresponding pixel positions and write the result into F(i+6);
Step five: judge whether the last convolution kernel C6 has been reached; if so, end; otherwise return to step two (a behavioral sketch of this per-layer loop is given below).
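The five steps above can be summarized by the following behavioral Python sketch of one layer's computation. It models only the dataflow described here, not the HDL implementation; the function names, the use of floating-point NumPy arrays, and the 2 × 2 mean-pooling helper are illustrative assumptions.

```python
import numpy as np

def conv2d_valid(img: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """Plain 'valid' 2-D convolution: a behavioral stand-in for the FPGA window logic."""
    kh, kw = kernel.shape
    oh, ow = img.shape[0] - kh + 1, img.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for r in range(oh):
        for c in range(ow):
            out[r, c] = np.sum(img[r:r + kh, c:c + kw] * kernel)
    return out

def mean_pool_2x2(fmap: np.ndarray) -> np.ndarray:
    """2x2 mean pooling (the hardware adds 4 values and right-shifts the sum)."""
    h, w = fmap.shape[0] // 2 * 2, fmap.shape[1] // 2 * 2
    f = fmap[:h, :w]
    return (f[0::2, 0::2] + f[0::2, 1::2] + f[1::2, 0::2] + f[1::2, 1::2]) / 4

def run_layer(inputs, kernels, pool: bool):
    """One layer: for each kernel Ci (serial, steps two to five), convolve every
    input map (parallel in hardware), optionally pool, then accumulate the partial
    results pixel-wise into the output map that plays the role of F(i+6)."""
    outputs = []
    for ci in kernels:
        partials = [conv2d_valid(f, ci) for f in inputs]     # step two
        if pool:
            partials = [mean_pool_2x2(t) for t in partials]  # step three
        outputs.append(np.sum(partials, axis=0))             # step four
    return outputs  # becomes the next layer's inputs (ping-pong reuse of the RAM)
```

In hardware, the six convolutions collected in partials run in parallel while the loop over the kernels is serial, which matches the timing described next.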
In other words, the embodiment of the invention uses a memory scheduling mechanism to realize "ping-pong storage" of the feature maps, which is essentially a reuse of the RAM that stores them. In terms of timing, the calculations of different feature maps between layers and within each layer are serial, while during the convolution operation the processing of the multiple inputs by a single convolution kernel is parallel. As shown in fig. 3, Level_1_en is the operation-control enable of the first layer of the network, Feature_11 to Feature_16 represent the inputs of the first layer, and Core_11 and Core_12 represent the first 2 of the 6 convolution kernels to be calculated in the current layer. In the first layer's calculation it can be seen that while the first convolution kernel Core_11 is being computed, the convolution operations over the 6 input images run in parallel; after they finish, the corresponding operation of the second convolution kernel begins. Finally, after the whole layer's operation is complete, processing moves on to the next layer.
In the preferred embodiment, the convolution kernel is 5 × 5 or 3 × 3; 3 × 3 is taken as an example below. Because the convolution operation module has a multiple-input, multiple-output character, the convolution can be realized using the pipelining capability of the FPGA. As shown in fig. 4, one 3 × 3 convolution operation can be completed every clock by using 2 shift registers. Before the convolution calculation starts, data are buffered in the shift registers, and the convolution operation starts at the 2nd clock after the 1st image pixel has been shifted out of the 2nd shift register. Fig. 5 is a timing diagram of the convolution operation process: the detection window size is 32 × 32, the shift register depth is 32, and the pixel data are shifted out of the second register 64 clocks after entering the first register; this operation is controlled by shift_en. Meanwhile, during the first 9 clocks, the corresponding convolution kernel weight data are read into the corresponding weight registers. Then the conv_en signal goes high and the convolution operation starts.
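As a rough behavioral model of this shift-register (line-buffer) scheme, the sketch below streams pixels one per "clock" through two row buffers whose depth equals the image width, so that each new pixel completes one 3 × 3 window and one multiply-accumulate result. The names and the pure-Python structure are assumptions for illustration; the actual design is written in HDL with shift_en/conv_en control as described above.

```python
from collections import deque

def stream_conv3x3(pixels, width, kernel):
    """Behavioral model of the line-buffer scheme: pixels arrive one per 'clock',
    two row buffers of depth `width` supply the two rows above the current pixel,
    and a 3x3 window register file slides one column along each clock."""
    line1 = deque([0] * width, maxlen=width)   # holds row y-1
    line2 = deque([0] * width, maxlen=width)   # holds row y-2
    window = [[0] * 3 for _ in range(3)]       # 3x3 window registers
    results = []
    for n, p in enumerate(pixels):             # one pixel per clock
        top, mid = line2[0], line1[0]          # pixels two rows and one row above p
        line2.append(line1[0])                 # oldest of line1 moves up into line2
        line1.append(p)                        # current pixel enters line1
        for row in window:                     # shift the window left by one column
            row[0], row[1] = row[1], row[2]
        window[0][2], window[1][2], window[2][2] = top, mid, p  # load the new column
        x, y = n % width, n // width
        if x >= 2 and y >= 2:                  # a full 3x3 window is available
            acc = sum(window[r][c] * kernel[r][c]
                      for r in range(3) for c in range(3))      # multiply-accumulate
            results.append(acc)
    return results

# Example: a 5x5 ramp image streamed through a 3x3 averaging kernel yields the
# 3x3 'valid' output, one result per clock once the window is full.
img = [float(v) for v in range(25)]
avg = [[1.0 / 9] * 3 for _ in range(3)]
print(stream_conv3x3(img, width=5, kernel=avg))
```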
In a preferred embodiment, to reduce the feature dimension and the number of parameters, a pooling operation is usually performed after the convolution calculation is completed. Common pooling modes include maximum pooling and mean pooling; the embodiment of the invention uses mean pooling. As shown in fig. 6, on the FPGA platform the invention uses shift registers to average the data in each 2 × 2 region: the 4 values are added and the sum is right-shifted to obtain the average.
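A minimal sketch of this 2 × 2 mean pooling on integer data: the four values of each region are added and the sum is shifted right by two bits. The function name and the truncating behaviour of the shift are illustrative assumptions.

```python
def mean_pool_2x2_shift(fmap):
    """2x2 mean pooling on integer data: sum the four neighbours, then shift the
    sum right by 2 bits, mirroring the add-then-right-shift done on the FPGA."""
    rows, cols = len(fmap), len(fmap[0])
    pooled = []
    for r in range(0, rows - 1, 2):
        out_row = []
        for c in range(0, cols - 1, 2):
            s = fmap[r][c] + fmap[r][c + 1] + fmap[r + 1][c] + fmap[r + 1][c + 1]
            out_row.append(s >> 2)   # divide by 4 via right shift (truncating)
        pooled.append(out_row)
    return pooled

# Example: a 4x4 map reduces to a 2x2 map.
print(mean_pool_2x2_shift([[1, 3, 5, 7],
                           [1, 3, 5, 7],
                           [2, 2, 6, 6],
                           [2, 2, 6, 6]]))   # [[2, 6], [2, 6]]
```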
In a preferred embodiment, a convolution kernel can operate on a plurality of input data simultaneously; in other words, the convolution operations over the multiple inputs run in parallel, and the corresponding operation of the second convolution kernel is performed after they are completed.
S3, judging whether the current convolutional layer is the last convolutional layer according to the number of convolutional layers and convolutional parameters of the current convolutional layer, if so, outputting an image identification result according to the convolutional operation result, otherwise, outputting the convolutional operation result to the next convolutional layer for convolutional operation, and outputting the image identification result according to the result output by the last convolutional layer after the last convolutional layer operation is finished.
In this embodiment, after the calculation of one convolutional layer is completed, the output result of the current convolutional layer is used as the input of the next convolutional layer, which then performs its convolution calculation following the process of steps S1 and S2, not repeated here. After all convolutional layers have been calculated, the image recognition result is output according to the output of the last convolutional layer, thereby completing the convolution calculation on the data. The image recognition result is the specific target to be recognized in the image, such as a vehicle, a pedestrian or a lane line; for different scenes, only the model parameters need to be modified to output the required recognition result in a targeted manner.
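Putting the pieces together, the end-to-end flow of steps S1 to S3 can be modelled as below. The per-layer routine is assumed to be something like the run_layer sketch given earlier, and reading the recognition result out of the last layer with a simple argmax over the feature-map responses is purely an illustrative assumption; the patent does not fix that step.

```python
import numpy as np

def recognize(image, layer_params, layer_kernels, run_layer):
    """Behavioral model of steps S1 to S3. `layer_params` carries each layer's
    parameters (including its pooling flag), `layer_kernels` its kernels, and
    `run_layer` is the per-layer routine (e.g. the sketch shown earlier).
    Deriving the result via argmax over feature-map responses is an assumption."""
    # Step one of the per-layer flow: replicate the input picture as the first inputs.
    feature_maps = [np.asarray(image, dtype=float)] * len(layer_kernels[0])
    for params, kernels in zip(layer_params, layer_kernels):      # step S2, per layer
        feature_maps = run_layer(feature_maps, kernels, params.pool)
    scores = [float(f.sum()) for f in feature_maps]               # step S3 read-out
    return int(np.argmax(scores))
```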
The method for realizing image recognition based on the FPGA provided by the embodiment of the invention is applicable to a variety of application scenarios: detecting people and vehicles on roads, recognizing lane lines and traffic signs, recognizing and tracking ships, detecting targets in grayscale scenes, detecting parks and benches, detecting very large objects (such as space shuttles), and so on, simply by modifying the network model parameters according to the corresponding scene.
By setting network model parameters and loading, at convolution-calculation time, the parameters corresponding to the specific use scenario to obtain the required convolutional network model, the method gives the FPGA-based convolutional neural network reconfigurability and generality and allows it to be conveniently reused, without reconstructing the convolutional neural network each time.
Based on the method for realizing image recognition based on the FPGA, the invention also correspondingly provides a device for realizing image recognition based on the FPGA, which comprises the following steps:
the model adjusting module is used for acquiring network model parameters and adjusting the convolutional network model according to the network model parameters, wherein the network model parameters at least comprise the number of convolutional layers and convolutional parameters of each convolutional layer;
the convolution calculation module is used for acquiring a convolution operation instruction, performing convolution calculation on the input image according to the convolution parameters of the current convolution layer and outputting a convolution operation result;
and the output module is used for judging whether the current convolutional layer is the last convolutional layer according to the number of the convolutional layers and the convolutional parameters of the current convolutional layer, if so, outputting an image identification result according to the convolutional operation result, otherwise, outputting the convolutional operation result to the next convolutional layer for convolutional operation, and outputting the image identification result according to the result output by the last convolutional layer after the last convolutional layer operation is finished.
In a preferred embodiment, the convolution parameters of each convolutional layer include at least the current layer index, the size and number of convolution kernels of the layer, the size of the final output result, and a pooling flag indicating whether pooling is performed.
In a preferred embodiment, the convolution calculation module is specifically configured to:
acquiring a convolution operation instruction, calculating an input image by adopting a current convolution kernel according to the convolution parameters of the current convolution layer, and outputting an initial calculation result;
judging, according to the pooling flag of the current convolutional layer, whether a pooling operation is needed; if not, accumulating the initial calculation result and taking the accumulated result as the output result of the current convolution kernel;
if a pooling operation is needed, performing the pooling operation on the initial calculation result, accumulating the pooled result, and taking the accumulated result as the output result of the current convolution kernel;
and acquiring the output result of each convolution kernel of the current convolution layer, and taking the output result of all convolution kernels of the current convolution layer as the convolution operation result of the current convolution layer after all convolution kernels of the current convolution layer are operated.
Since the method for implementing image recognition based on the FPGA has been described in detail above, it is not described herein again.
Based on the method for realizing image recognition based on the FPGA, the invention also correspondingly provides equipment for realizing image recognition based on the FPGA, which comprises the following steps: a processor and a memory;
the memory has stored thereon a computer readable program executable by the processor;
the processor, when executing the computer readable program, implements the steps in the method for implementing image recognition based on FPGA as described in the embodiments above.
Since the method for implementing image recognition based on the FPGA has been described in detail above, it is not described herein again.
Based on the method for realizing image recognition based on the FPGA, the present invention also provides a computer readable storage medium, where one or more programs are stored, and the one or more programs can be executed by one or more processors to realize the steps in the method for realizing image recognition based on the FPGA according to the embodiments.
Since the method for implementing image recognition based on the FPGA has been described in detail above, it is not described herein again.
In summary, the method, device, equipment and storage medium for realizing image recognition based on the FPGA provided by the invention set network model parameters so that, at convolution-calculation time, the required convolutional network model is obtained by loading the parameters corresponding to the specific use scenario, without reconstructing the convolutional neural network each time; the FPGA-based convolutional neural network thus has reconfigurability and generality and can be conveniently reused.
The above-described embodiments of the present invention should not be construed as limiting the scope of the present invention. Any other corresponding changes and modifications made according to the technical idea of the present invention should be included in the protection scope of the claims of the present invention.

Claims (10)

1. A method for realizing image recognition based on FPGA is characterized by comprising the following steps:
s1, obtaining network model parameters, and adjusting the convolution network model according to the network model parameters, wherein the network model parameters at least comprise the number of convolution layers and convolution parameters of each convolution layer;
s2, acquiring a convolution operation instruction, performing convolution calculation on the input image according to the convolution parameters of the current convolution layer, and outputting a convolution operation result;
s3, judging whether the current convolutional layer is the last convolutional layer according to the number of convolutional layers and convolutional parameters of the current convolutional layer, if so, outputting an image identification result according to the convolutional operation result, otherwise, outputting the convolutional operation result to the next convolutional layer for convolutional operation, and outputting the image identification result according to the result output by the last convolutional layer after the last convolutional layer operation is finished.
2. The method for realizing image recognition based on FPGA according to claim 1, wherein the convolution parameters of each convolutional layer include at least the current layer index, the size and number of convolution kernels of the layer, the size of the final output result, and a pooling flag indicating whether pooling is performed.
3. The method for realizing image recognition based on FPGA according to claim 2, wherein the step S2 specifically includes:
s21, acquiring a convolution operation instruction, calculating an input image by adopting the current convolution kernel according to the convolution parameters of the current convolution layer, and outputting an initial calculation result;
s22, judging, according to the pooling flag of the current convolutional layer, whether a pooling operation is needed; if not, accumulating the initial calculation result and taking the accumulated result as the output result of the current convolution kernel;
s23, if a pooling operation is needed, performing the pooling operation on the initial calculation result, accumulating the pooled result, and taking the accumulated result as the output result of the current convolution kernel;
and S24, repeating the steps S21 to S23, obtaining the output result of each convolution kernel of the current convolution layer, and taking the output result of all convolution kernels of the current convolution layer as the convolution operation result of the current convolution layer after all convolution kernel operations of the current convolution layer are completed.
4. The method for realizing image recognition based on FPGA according to claim 3, wherein mean pooling is employed for the pooling operation.
5. The FPGA-based image recognition method of claim 3, wherein the convolution kernel can simultaneously compute a plurality of input images.
6. An apparatus for realizing image recognition based on FPGA, comprising:
the model adjusting module is used for acquiring network model parameters and adjusting the convolutional network model according to the network model parameters, wherein the network model parameters at least comprise the number of convolutional layers and convolutional parameters of each convolutional layer;
the convolution calculation module is used for acquiring a convolution operation instruction, performing convolution calculation on the input image according to the convolution parameters of the current convolution layer and outputting a convolution operation result;
and the output module is used for judging whether the current convolutional layer is the last convolutional layer according to the number of the convolutional layers and the convolutional parameters of the current convolutional layer, if so, outputting an image identification result according to the convolutional operation result, otherwise, outputting the convolutional operation result to the next convolutional layer for convolutional operation, and outputting the image identification result according to the result output by the last convolutional layer after the last convolutional layer operation is finished.
7. The apparatus for realizing image recognition based on FPGA according to claim 6, wherein the convolution parameters of each convolutional layer include at least the current layer index, the size and number of convolution kernels of the layer, the size of the final output result, and a pooling flag indicating whether pooling is performed.
8. The apparatus according to claim 7, wherein the convolution calculation module is specifically configured to:
acquiring a convolution operation instruction, calculating an input image by adopting a current convolution kernel according to the convolution parameters of the current convolution layer, and outputting an initial calculation result;
judging, according to the pooling flag of the current convolutional layer, whether a pooling operation is needed; if not, accumulating the initial calculation result and taking the accumulated result as the output result of the current convolution kernel;
if a pooling operation is needed, performing the pooling operation on the initial calculation result, accumulating the pooled result, and taking the accumulated result as the output result of the current convolution kernel;
and acquiring the output result of each convolution kernel of the current convolution layer, and taking the output result of all convolution kernels of the current convolution layer as the convolution operation result of the current convolution layer after all convolution kernels of the current convolution layer are operated.
9. An apparatus for realizing image recognition based on FPGA, comprising: a processor and a memory;
the memory has stored thereon a computer readable program executable by the processor;
the processor, when executing the computer readable program, implements the steps in the method for implementing image recognition based on FPGA of any one of claims 1-5.
10. A computer readable storage medium storing one or more programs, the one or more programs being executable by one or more processors to perform the steps of the method for implementing image recognition based on FPGA of any one of claims 1-5.
CN202110594603.9A 2021-05-28 2021-05-28 Method, device and equipment for realizing image recognition based on FPGA (field programmable Gate array) and storage medium Pending CN113313171A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110594603.9A CN113313171A (en) 2021-05-28 2021-05-28 Method, device and equipment for realizing image recognition based on FPGA (field programmable Gate array) and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110594603.9A CN113313171A (en) 2021-05-28 2021-05-28 Method, device and equipment for realizing image recognition based on FPGA (field programmable Gate array) and storage medium

Publications (1)

Publication Number Publication Date
CN113313171A true CN113313171A (en) 2021-08-27

Family

ID=77376357

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110594603.9A Pending CN113313171A (en) 2021-05-28 2021-05-28 Method, device and equipment for realizing image recognition based on FPGA (field programmable Gate array) and storage medium

Country Status (1)

Country Link
CN (1) CN113313171A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114707649A (en) * 2022-03-28 2022-07-05 北京理工大学 A general convolution operation device
CN114997386A (en) * 2022-06-29 2022-09-02 桂林电子科技大学 CNN neural network acceleration design method based on multi-FPGA heterogeneous architecture
CN115456860A (en) * 2022-11-09 2022-12-09 深圳市唯特视科技有限公司 Image enhancement method and device based on FPGA, helmet, equipment and medium

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110097174A (en) * 2019-04-22 2019-08-06 西安交通大学 Implementation method, system and device of convolutional neural network based on FPGA and line output priority
CN111416743A (en) * 2020-03-19 2020-07-14 华中科技大学 Convolutional network accelerator, configuration method and computer readable storage medium
CN111797982A (en) * 2020-07-31 2020-10-20 北京润科通用技术有限公司 Image processing system based on convolutional neural network

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110097174A (en) * 2019-04-22 2019-08-06 西安交通大学 Implementation method, system and device of convolutional neural network based on FPGA and line output priority
CN111416743A (en) * 2020-03-19 2020-07-14 华中科技大学 Convolutional network accelerator, configuration method and computer readable storage medium
CN111797982A (en) * 2020-07-31 2020-10-20 北京润科通用技术有限公司 Image processing system based on convolutional neural network

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
严军 et al.: 信息论与编码简明教程 (A Concise Tutorial on Information Theory and Coding), China University of Geosciences Press, pp. 172-175 *

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114707649A (en) * 2022-03-28 2022-07-05 北京理工大学 A general convolution operation device
CN114707649B (en) * 2022-03-28 2024-07-02 北京理工大学 A universal convolution operation device
CN114997386A (en) * 2022-06-29 2022-09-02 桂林电子科技大学 CNN neural network acceleration design method based on multi-FPGA heterogeneous architecture
CN114997386B (en) * 2022-06-29 2024-03-22 桂林电子科技大学 A CNN neural network acceleration design method based on multi-FPGA heterogeneous architecture
CN115456860A (en) * 2022-11-09 2022-12-09 深圳市唯特视科技有限公司 Image enhancement method and device based on FPGA, helmet, equipment and medium

Similar Documents

Publication Publication Date Title
CN111967468B (en) Implementation method of lightweight target detection neural network based on FPGA
US12223288B2 (en) Neural network processing unit including approximate multiplier and system on chip including the same
US11934949B2 (en) Composite binary decomposition network
CN116580241B (en) Image processing method and system based on dual-branch multi-scale semantic segmentation network
US20180260710A1 (en) Calculating device and method for a sparsely connected artificial neural network
CN113313171A (en) Method, device and equipment for realizing image recognition based on FPGA (field programmable Gate array) and storage medium
US20180164866A1 (en) Low-power architecture for sparse neural network
CN115147598B (en) Target detection segmentation method and device, intelligent terminal and storage medium
WO2020073211A1 (en) Operation accelerator, processing method, and related device
Zhao et al. Bilateral U‐Net semantic segmentation with spatial attention mechanism
CN116432736A (en) Neural network model optimization method, device and computing equipment
CN108171328A (en) A kind of convolution algorithm method and the neural network processor based on this method
CN118196544A (en) UAV small target detection method and system based on information enhancement and feature fusion
EP3987448A1 (en) Optimizing machine learning model performance
Zhang et al. DSP-based traffic target detection for intelligent transportation
CN110969104B (en) Method, system and storage medium for detecting drivable area based on binary network
Duggal et al. Shallow SqueezeNext: An Efficient & Shallow DNN
CN118247323A (en) Scene depth estimation model training method, scene depth estimation method and device
Zhang et al. Repmono: a lightweight self-supervised monocular depth estimation architecture for high-speed inference
Chen et al. Research on hardware acceleration of traffic sign recognition based on spiking neural network and FPGA platform
Xiao et al. An improved detection method of traffic prohibition sign for intelligent vehicles based on YOLOV5s
Yuan et al. A real-time image processing hardware acceleration method based on fpga
Peng et al. Motorcycle detection based on deep learning implemented on FPGA.
Fang et al. Embedded image recognition system for lightweight convolutional Neural Networks
CN120179295B (en) Parallel operation method of operator stream, computer device and readable storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20210827