
CN112598062A - Image identification method and device - Google Patents


Info

Publication number
CN112598062A
CN112598062A (application CN202011553934.XA)
Authority
CN
China
Prior art keywords
image
classification
network
confidence
result
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011553934.XA
Other languages
Chinese (zh)
Inventor
黄高
王语霖
吕康晨
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tsinghua University
Original Assignee
Tsinghua University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tsinghua University filed Critical Tsinghua University
Priority to CN202011553934.XA priority Critical patent/CN112598062A/en
Publication of CN112598062A publication Critical patent/CN112598062A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Image Analysis (AREA)

Abstract


Figure 202011553934

Embodiments of the present application provide an image recognition method and device. The method includes: acquiring an image to be recognized; randomly cropping an image block of a preset image size from the image to be recognized; inputting the image block into a trained neural network classification model to obtain a classification result for the image block; determining a classification confidence according to the classification result; determining, according to the classification confidence, whether to use the current classification result as the final image recognition result; and, when the current classification result cannot be used as the final image recognition result, iteratively obtaining the next image block according to the feature map and a positioning policy network, and obtaining the next classification confidence from that image block, until the obtained classification confidence indicates that the current classification result can be used as the final image recognition result. On the basis of ensuring the accuracy of the image classification result, this scheme achieves a better neural network acceleration effect and greatly improves the operating efficiency of the system.


Description

Image identification method and device
Technical Field
The present disclosure relates to neural network acceleration technologies, and more particularly, to an image recognition method and apparatus.
Background
Neural network acceleration reduces the computational overhead and inference latency of neural networks through methods such as network pruning and weight quantization, speeding up their operation, and has very wide application in the practical deployment of neural network models. In recent years, with the rapid development of artificial intelligence technology, deep convolutional neural networks have grown ever larger. Large-scale convolutional networks can handle more complex tasks, but they also consume enormous computing and storage resources. For mobile and wearable devices with limited computing resources, such as mobile phones and wristbands, a huge computational load means higher power consumption and operating latency, which hinders deploying neural networks on such hardware and restricts their application. Neural network acceleration techniques allow neural network models to be applied more widely across scenarios.
Adaptive inference is an effective solution to the high computational overhead of neural network models. It adaptively distinguishes easy samples from hard ones, spending less computation on easily recognized samples and more computation on difficult ones, thereby saving model computational resources overall. Although some existing adaptive inference techniques have been applied to neural network acceleration, these methods often sacrifice too much classification accuracy, and their acceleration effect is not ideal.
Disclosure of Invention
The embodiment of the application provides an image identification method and device, which can achieve a better neural network acceleration effect on the basis of ensuring the accuracy of an image classification result and greatly improve the operation efficiency of a system.
The application provides an image recognition method, which can comprise the following steps:
acquiring an image to be identified;
randomly cutting out image blocks with a preset image size from the image to be recognized;
inputting the image blocks into a pre-trained neural network classification model to obtain classification results of the image blocks; the classification result refers to that the image blocks are classified into one or more preset image types;
determining a classification confidence according to the classification result; the classification confidence refers to the probability that the image block is classified into each image type;
determining whether the current classification result is used as the final image recognition result of the corresponding image to be recognized or not according to the classification confidence; and when the current classification result cannot be used as the final image recognition result, obtaining a next image block again according to the feature map and a pre-established and trained positioning strategy network in an iterative calculation mode, and obtaining a next classification confidence coefficient according to the next image block until the current classification result is determined to be used as the final image recognition result of the corresponding image to be recognized according to the obtained classification confidence coefficient.
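The decision loop of the steps above can be sketched as follows. This is a minimal illustration, not the patent's implementation: `classify`, `locate_next`, and `crop` are hypothetical stand-ins for the trained classification model, the positioning policy network, and the cropping step.

```python
# Hedged sketch of the confidence-thresholded inference loop described above.
# `classify`, `locate_next`, and `crop` are hypothetical stubs standing in for
# the trained classification model, positioning policy network, and cropper.

def recognize(image, classify, locate_next, crop, eta=0.9, max_iters=5):
    """Iterate patch -> classify -> confidence check until it reaches eta."""
    patch = crop(image, None)               # initial patch: random crop
    result = None
    for _ in range(max_iters):
        result, confidence, feature_map = classify(patch)
        if confidence >= eta:               # confident enough: early exit
            return result
        coords = locate_next(feature_map)   # normalized coords of next patch
        patch = crop(image, coords)
    return result                           # fall back to the last result
```

With stub components, easy samples exit after one pass while hard ones trigger further crops, which is the source of the claimed acceleration.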
In an exemplary embodiment of the present application, the neural network classification model may include: a feature extraction network and a full connectivity layer;
the inputting the image block into a pre-trained neural network classification model, and obtaining the classification result of the image block may include:
inputting the image blocks into a pre-established and trained feature extraction network to obtain a feature map, inputting the feature map into a pre-established and trained full-connection layer, and obtaining the classification results of the image blocks.
In an exemplary embodiment of the present application, the determining whether to use the current classification result as the final image recognition result of the corresponding image to be recognized according to the classification confidence may include:
when the classification confidence coefficient is larger than or equal to a preset threshold value, determining that the current classification result is used as a final image recognition result of the image to be recognized;
and when the classification confidence is smaller than the preset threshold, determining that the current classification result cannot be used as the final image recognition result of the image to be recognized.
In an exemplary embodiment of the present application, the obtaining a next image block again according to the feature map and a pre-established and trained positioning policy network in an iterative computation manner, and obtaining a next classification confidence according to the next image block until determining, according to the obtained classification confidence, that a current classification result is used as a final image recognition result of a corresponding image to be recognized may include:
41. inputting the feature map obtained last time into the pre-established positioning policy network to obtain the position-normalized coordinates of the image block to be cropped next; cropping the next image block according to the position-normalized coordinates;
42. inputting the image blocks into a pre-established and trained feature extraction network to obtain a feature map, inputting the feature map into a pre-established and trained full-connection layer, and obtaining classification results of the image blocks; determining a classification confidence according to the classification result;
43. determining whether the current classification result is used as a final image recognition result or not according to the classification confidence; if yes, go to step 44; if not, returning to the step 41;
44. and outputting the current classification result.
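Cropping the next image block from position-normalized coordinates, as in steps 41-44 above, might look like the following sketch; interpreting the network output as the normalized center of the crop is an assumption, since the text does not fix the coordinate convention.

```python
def crop_at(image, cy, cx, ph, pw):
    """Crop a ph x pw block whose center is at normalized coords (cy, cx).

    `image` is a channel-first nested list of shape (A, H, W); cy and cx are
    in [0, 1]. The top-left corner is clamped so the block stays inside the
    image. The center-based convention is an illustrative assumption.
    """
    h, w = len(image[0]), len(image[0][0])
    top = min(max(int(cy * h - ph / 2), 0), h - ph)
    left = min(max(int(cx * w - pw / 2), 0), w - pw)
    return [[row[left:left + pw] for row in chan[top:top + ph]]
            for chan in image]
```

Clamping keeps coordinates near the border valid, so the policy network output never produces an out-of-bounds crop.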
In an exemplary embodiment of the present application, the method may further include: when the classification confidence obtained after N iteration cycles is still smaller than the preset threshold, taking the classification result obtained at the N-th iteration as the final image recognition result, where N is a positive integer and a preset iteration threshold.
In an exemplary embodiment of the present application, the feature extraction network may include: a plurality of functional layers arranged according to the ResNet (residual neural network) rule or the DenseNet (densely connected neural network) rule; and/or,
the positioning policy network may include: a plurality of convolutional layers and a fully connected layer, arranged in sequence.
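A deliberately simplified numpy sketch of such a positioning policy network follows. The 1x1 convolution, global average pooling, and two-coordinate sigmoid output are assumptions chosen for illustration, as the text does not fix layer sizes or the output head.

```python
import numpy as np

def policy_forward(f, w_conv, w_fc, b_fc):
    """Simplified positioning-policy forward pass (illustrative assumption):
    a 1x1 convolution over the feature map f of shape (Af, Hf, Wf), ReLU,
    global average pooling, then a fully connected layer squashed to [0, 1]
    as position-normalized (row, col) coordinates."""
    conv = np.einsum('oc,chw->ohw', w_conv, f)   # 1x1 convolution
    conv = np.maximum(conv, 0.0)                 # ReLU
    pooled = conv.mean(axis=(1, 2))              # global average pooling
    logits = w_fc @ pooled + b_fc                # fully connected layer
    return 1.0 / (1.0 + np.exp(-logits))         # sigmoid -> coords in (0, 1)

# Example: an 8-channel 7x7 feature map mapped to 2 normalized coordinates.
rng = np.random.default_rng(0)
f = rng.standard_normal((8, 7, 7))
coords = policy_forward(f, rng.standard_normal((16, 8)),
                        rng.standard_normal((2, 16)), np.zeros(2))
```

The sigmoid guarantees the output lies in (0, 1), matching the position-normalized coordinates the text requires.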
In an exemplary embodiment of the present application, the method may further include: training the parameter Θ_g of the feature extraction network and the parameter Θ_m of the fully connected layer according to the following first calculation formula:

$$(\hat{\Theta}_g, \hat{\Theta}_m) = \arg\min_{\Theta_g, \Theta_m} \; -\sum_i \log\!\left[ m\!\left(g(x_i, \Theta_g), \Theta_m\right)_{y_i} \right]$$

where log[·] denotes the logarithm function; argmin denotes the values of Θ_g and Θ_m at which the function attains its minimum; g(x_i, Θ_g) denotes the feature map obtained by feeding an arbitrary i-th image x_i into the feature extraction network g(x, Θ_g) with parameter Θ_g; m(g(x_i, Θ_g), Θ_m)_{y_i} denotes the y_i-th element of the classification result m(g(x_i, Θ_g), Θ_m) for image x_i, with y_i the category label defined for image x_i; (Θ̂_g, Θ̂_m) denote the finally obtained optimized parameters; i is a positive integer.
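The first calculation formula is the standard cross-entropy objective. As a small numeric illustration, the quantity being minimized over (Θ_g, Θ_m) is the sum over images of the negative log of the probability the model assigns to the true class:

```python
import math

def first_formula_loss(probs, labels):
    """Sum over images of -log of the predicted probability of the true
    class, i.e. the quantity minimized in the first calculation formula.
    `probs[i]` stands in for the K-vector m(g(x_i, Theta_g), Theta_m);
    labels are 1-based (y_i between 1 and K), as in the text."""
    return sum(-math.log(p[y - 1]) for p, y in zip(probs, labels))

# Two images, K = 3 classes: the true classes get probability 0.7 and 0.5.
loss = first_formula_loss([[0.7, 0.2, 0.1], [0.25, 0.5, 0.25]], [1, 2])
```

The loss shrinks toward zero as the true-class probabilities approach one, which is what drives the joint optimization of g and m.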
In an exemplary embodiment of the present application, training the positioning policy network may include:
acquiring the image data required for training to form a training set D = {x_i}, and labeling each image in the training set D with a corresponding category label y_i;
obtaining, by iterative calculation, a classification confidence series {s_{i,0}, s_{i,1}, ..., s_{i,N}} for each image x_i according to the training set D, the pre-established and trained feature extraction network, and the pre-established and trained fully connected layer;
calculating, according to the classification confidence series {s_{i,0}, s_{i,1}, ..., s_{i,N}}, the confidence increment Δs_{i,t+1} between two adjacent iterations, where Δs_{i,t+1} = s_{i,t+1} - s_{i,t};
training the parameter Θ_p of the positioning policy network according to the confidence increment Δs_{i,t+1} and a preset second calculation formula.
In an exemplary embodiment of the present application, obtaining, by iterative calculation, the classification confidence series {s_{i,0}, s_{i,1}, ..., s_{i,N}} of image x_i according to the training set D, the pre-established and trained feature extraction network, and the pre-established and trained fully connected layer may include:
81. randomly cropping an image block x̃_{i,0} of the preset image size from image x_i of the training set D; where i refers to the i-th image and is a positive integer, i ≤ X, with X the total number of images in the training set D; j refers to the j-th iteration and is an integer, j ≤ N, with N the preset iteration threshold;
82. inputting the image block x̃_{i,j} into the pre-established and trained feature extraction network to obtain a feature map f_{i,j}, and inputting the feature map f_{i,j} into the pre-established and trained fully connected layer to obtain the classification result c_{i,j} of the image block x̃_{i,j}; determining the classification confidence s_{i,j} according to the classification result c_{i,j};
83. detecting whether j = N holds; when j = N, the classification confidence series {s_{i,0}, s_{i,1}, ..., s_{i,N}} has been obtained; when j ≠ N, proceeding to step 84;
84. inputting the feature map f_{i,j} obtained in the previous step into the positioning policy network p to obtain the position-normalized coordinates of the image block required for the (j+1)-th iteration; cropping the image block x̃_{i,j+1} for the (j+1)-th iteration from the original image x_i according to the position-normalized coordinates; updating the current image block with x̃_{i,j+1}, and returning to step 82.
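Steps 81-84 can be sketched as a loop that records one confidence per iteration; `feature_net`, `fc_layer`, `policy_net`, and `crop` below are hypothetical stubs for the trained components, not the patent's implementation.

```python
def confidence_series(image, first_block, feature_net, fc_layer, policy_net,
                      crop, n_iters):
    """Collect the series {s_i0, ..., s_iN} for one training image.

    Stubs assumed: feature_net(block) -> feature map; fc_layer(fmap) ->
    class probability vector; policy_net(fmap) -> normalized coordinates of
    the next block; crop(image, coords) -> next image block.
    """
    series = []
    block = first_block                   # step 81: random initial crop
    for j in range(n_iters + 1):          # j = 0 .. N
        fmap = feature_net(block)         # step 82: features and confidence
        probs = fc_layer(fmap)
        series.append(max(probs))         # confidence s_ij
        if j < n_iters:                   # steps 83/84: locate next block
            block = crop(image, policy_net(fmap))
    return series
```

Note that, unlike inference, no threshold check happens here: training always runs all N iterations so that every Δs increment is observed.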
In an exemplary embodiment of the present application, the second calculation formula may include:

$$\hat{\Theta}_p = \arg\min_{\Theta_p} \, E\!\left[ -\sum_{t=1}^{N} \gamma^{t-1} \, \Delta s_{i,t} \right]$$

where Θ_p denotes the parameter of the positioning policy network p; argmin denotes the value of Θ_p at which the function attains its minimum; E(·) denotes the mathematical expectation; γ is a predefined discount-rate parameter with a value between 0 and 1; t is an integer, t ≤ N, with N the preset iteration threshold; Δs_{i,t} = s_{i,t} - s_{i,t-1} refers to the difference between the t-th classification confidence and the (t-1)-th classification confidence in the series {s_{i,0}, s_{i,1}, ..., s_{i,N}} corresponding to image x_i; Θ̂_p denotes the resulting policy network parameters.
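The inner sum of the second calculation formula is a discounted total of confidence increments; the sketch below computes it for one image's confidence series. The discount exponent γ^(t-1) is an assumption where the rendered formula was lost.

```python
def discounted_confidence_gain(series, gamma):
    """Sum over t = 1..N of gamma**(t-1) * (s_t - s_{t-1}) for one image's
    confidence series {s_0, ..., s_N}. The second calculation formula
    minimizes the expectation of the negative of this quantity over Theta_p,
    i.e. it rewards crops that raise confidence early."""
    return sum(gamma ** (t - 1) * (series[t] - series[t - 1])
               for t in range(1, len(series)))

# A series whose confidence rises from 0.3 to 0.9 over two iterations:
gain = discounted_confidence_gain([0.3, 0.5, 0.9], gamma=0.9)
```

Because γ < 1, an increment earned at an early iteration contributes more than the same increment earned later, pushing the policy toward informative crops as soon as possible.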
The embodiment of the application also provides an image recognition device, which may include a processor and a computer-readable storage medium, where instructions are stored in the computer-readable storage medium, and when the instructions are executed by the processor, the image recognition device implements the image recognition method described in any one of the above items.
Compared with the related art, the embodiment of the application can comprise the following steps: acquiring an image to be identified; randomly cutting out image blocks with a preset image size from the image to be recognized; inputting the image blocks into a pre-trained neural network classification model to obtain classification results of the image blocks; the classification result refers to that the image blocks are classified into one or more preset image types; determining a classification confidence according to the classification result; the classification confidence refers to the probability that the image block is classified into each image type; determining whether the current classification result is used as the final image recognition result of the corresponding image to be recognized or not according to the classification confidence; and when the current classification result cannot be used as the final image recognition result, obtaining a next image block again according to the feature map and a pre-established and trained positioning strategy network in an iterative calculation mode, and obtaining a next classification confidence coefficient according to the next image block until the current classification result is determined to be used as the final image recognition result of the corresponding image to be recognized according to the obtained classification confidence coefficient. Through the scheme of the embodiment, a better neural network acceleration effect is achieved on the basis of ensuring the accuracy of the image classification result, and the operation efficiency of the system is greatly improved.
Additional features and advantages of the application will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by the practice of the application. Other advantages of the present application may be realized and attained by the instrumentalities and combinations particularly pointed out in the specification and the drawings.
Drawings
The accompanying drawings are included to provide an understanding of the present disclosure and are incorporated in and constitute a part of this specification, illustrate embodiments of the disclosure and together with the examples serve to explain the principles of the disclosure and not to limit the disclosure.
FIG. 1 is a flowchart of an image recognition method according to an embodiment of the present application;
fig. 2 is a block diagram of an image recognition apparatus according to an embodiment of the present application.
Detailed Description
The present application describes embodiments, but the description is illustrative rather than limiting and it will be apparent to those of ordinary skill in the art that many more embodiments and implementations are possible within the scope of the embodiments described herein. Although many possible combinations of features are shown in the drawings and discussed in the detailed description, many other combinations of the disclosed features are possible. Any feature or element of any embodiment may be used in combination with or instead of any other feature or element in any other embodiment, unless expressly limited otherwise.
The present application includes and contemplates combinations of features and elements known to those of ordinary skill in the art. The embodiments, features and elements disclosed in this application may also be combined with any conventional features or elements to form a unique inventive concept as defined by the claims. Any feature or element of any embodiment may also be combined with features or elements from other inventive aspects to form yet another unique inventive aspect, as defined by the claims. Thus, it should be understood that any of the features shown and/or discussed in this application may be implemented alone or in any suitable combination. Accordingly, the embodiments are not limited except as by the appended claims and their equivalents. Furthermore, various modifications and changes may be made within the scope of the appended claims.
Further, in describing representative embodiments, the specification may have presented the method and/or process as a particular sequence of steps. However, to the extent that the method or process does not rely on the particular order of steps set forth herein, the method or process should not be limited to the particular sequence of steps described. Other orders of steps are possible as will be understood by those of ordinary skill in the art. Therefore, the particular order of the steps set forth in the specification should not be construed as limitations on the claims. Further, the claims directed to the method and/or process should not be limited to the performance of their steps in the order written, and one skilled in the art can readily appreciate that the sequences may be varied and still remain within the spirit and scope of the embodiments of the present application.
The present application provides an image recognition method, as shown in fig. 1, the method may include steps S101-S105:
s101, acquiring an image to be identified;
s102, randomly cutting out image blocks with preset image sizes from the image to be recognized;
s103, inputting the image blocks into a pre-trained neural network classification model to obtain classification results of the image blocks; the classification result refers to that the image blocks are classified into one or more preset image types;
s104, determining a classification confidence coefficient according to the classification result; the classification confidence refers to the probability that the image block is classified into each image type;
s105, determining whether the current classification result is used as the final image recognition result of the corresponding image to be recognized according to the classification confidence; and when the current classification result cannot be used as the final image recognition result, obtaining a next image block again according to the feature map and a pre-established and trained positioning strategy network in an iterative calculation mode, and obtaining a next classification confidence coefficient according to the next image block until the current classification result is determined to be used as the final image recognition result of the corresponding image to be recognized according to the obtained classification confidence coefficient.
In an exemplary embodiment of the application, a neural network acceleration method based on a visual attention mechanism is provided, which may include inputting an original image to be recognized into a neural network (which may include a neural network classification model and a positioning policy network) after being randomly cropped, extracting image features, and generating a classification result and a classification confidence according to the image features; determining whether the current classification result is used as a final image recognition result or not according to the classification confidence, if the current classification result cannot be used as the final image recognition result, determining the central position of the next cut image according to the feature map of the image, and iteratively generating a classification result and a classification confidence until a high-confidence classification result is obtained; finally, the neural network is deployed for image automatic recognition.
In the exemplary embodiment of the application, the scheme of the embodiment of the application effectively solves the problems of large calculation amount, high test time delay, difficulty in efficient deployment on mobile and embedded devices with limited resources and the like of an automatic image classification method based on deep learning, so that a neural network can obtain a correct image classification result with smaller calculation amount and lower time delay, the inference process of the neural network on a mobile platform is accelerated, and the operation efficiency of a system is greatly improved. Compared with the traditional self-adaptive reasoning method, the method can obtain better acceleration effect, does not modify the structure of the neural network, and has wider applicability.
In an exemplary embodiment of the present application, the neural network classification model may include: a feature extraction network and a full connectivity layer;
the inputting the image block into a pre-trained neural network classification model, and obtaining the classification result of the image block may include:
inputting the image blocks into a pre-established and trained feature extraction network to obtain a feature map, inputting the feature map into a pre-established and trained full-connection layer, and obtaining the classification results of the image blocks.
In an exemplary embodiment of the present application, the obtaining a next image block again according to the feature map and a pre-established and trained positioning policy network in an iterative computation manner, and obtaining a next classification confidence according to the next image block until determining that the current classification result is used as a final image recognition result of the corresponding image to be recognized according to the obtained classification confidence may include steps a 1-D1:
a1, inputting the feature map obtained last time into a pre-established positioning strategy network, and obtaining the position normalization coordinates of the image block to be cut in the next step; cutting the next image block according to the position normalized coordinates of the image blocks;
b1, inputting the image blocks into a pre-established and trained feature extraction network to obtain a feature map, inputting the feature map into a pre-established and trained full connection layer to obtain the classification results of the image blocks; determining a classification confidence according to the classification result;
c1, determining whether the current classification result is used as a final image recognition result according to the classification confidence; if yes, go to step D1; if not, returning to the step A1;
d1, outputting the current classification result.
In an exemplary embodiment of the present application, the determining whether to use the current classification result as the final image recognition result of the corresponding image to be recognized according to the classification confidence may include:
when the classification confidence coefficient is larger than or equal to a preset threshold value, determining that the current classification result is used as a final image recognition result of the image to be recognized;
and when the classification confidence is smaller than the preset threshold, determining that the current classification result cannot be used as the final image recognition result of the image to be recognized.
In an exemplary embodiment of the present application, based on the above embodiment scheme, the detailed image automatic identification method may include:
1. For each test image x, randomly crop an image block x̃_0 of size H'×W'; obtain a feature map f_0 and a classification result c_0 through the feature extraction network g and the fully connected layer m. Let the classification confidence be s_0 = max c_0; when s_0 ≥ η, directly output c_0 as the recognition result, where η is a preset threshold with a value between 0 and 1.
2. When s_i < η, where i = 0, 1, ..., N-1, obtain the position of the image block to be processed in step i+1 through the positioning policy network p, crop out the image block x̃_{i+1}, and obtain the classification result c_{i+1} and the classification confidence s_{i+1} through the feature extraction network g and the fully connected layer m. When s_{i+1} ≥ η, output c_{i+1} as the recognition result; otherwise, repeat step 2.
In an exemplary embodiment of the present application, the method may further include: when the classification confidence obtained after N iteration cycles is still smaller than the preset threshold, taking the classification result obtained at the N-th iteration as the final image recognition result, where N is a positive integer and a preset iteration threshold.
In an exemplary embodiment of the present application, when the classification confidence s_N obtained at the N-th iteration satisfies s_N < η, c_N is output as the recognition result.
In the exemplary embodiment of the present application, before the image recognition by the above embodiment scheme, a neural network (a feature extraction network, a full connection layer, and a positioning strategy network) for classification may be constructed and trained in advance.
In an exemplary embodiment of the present application, the image data required for training may first be acquired as a training set D = {x_i}, where each element x_i of the training set may be a three-dimensional matrix of size A×H×W, each element representing a pixel value of an image; A represents the number of channels of the image, and H and W represent the height and width of the image, respectively. Each image x_i corresponds to a category label y_i, where y_i is an integer between 1 and K (assuming there are K classification categories in total, i.e., K possible classification recognition results) that marks the class to which x_i belongs; the label y_i may be given by manual annotation.
In an exemplary embodiment of the present application, a feature extraction network g and a full connectivity layer m may be established.
In an exemplary embodiment of the present application, the feature extraction network may include: a plurality of functional layers arranged according to the ResNet (residual neural network) rule or the DenseNet (densely connected neural network) rule. The parameter of the feature extraction network may be set to Θ_g, with f = g(x, Θ_g) representing the feature map obtained by feeding image x into the feature extraction network with parameter Θ_g; the feature map f is a three-dimensional matrix of size A_f×H_f×W_f, where A_f is the number of channels of the feature map, and H_f and W_f are the height and width of the feature map, respectively.
In an exemplary embodiment of the present application, the parameters of the fully connected layer may be denoted Θ_m, with c = m(f, Θ_m) representing the output obtained by inputting the feature map f into the fully connected layer with parameters Θ_m. The output c is a K × 1 vector, each element of which takes a value between 0 and 1, K being the total number of classification categories defined above.
In an exemplary embodiment of the present application, an image block x̃_i^(0) of size H′ × W′ may be randomly cropped from each image x_i in the training set D_train (the dimension of x_i may be A × H × W as defined above), where H′ < H and W′ < W; that is, random integers h_i0 and w_i0 are generated from the intervals [0, H − H′] and [0, W − W′], respectively.
In an exemplary embodiment of the present application, the original image x_i may be cropped according to the cropping formula to obtain an image block x̃_i^(j) of size H′ × W′, the vertical and horizontal coordinates of whose top-left corner are h_ij and w_ij, with values in the intervals [0, H − H′] and [0, W − W′], respectively.
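The random cropping step above can be sketched as follows. This is a minimal pure-Python illustration only; the helper name `random_crop` and the nested-list image layout are assumptions made for the example, not part of the patent:

```python
import random

def random_crop(image, crop_h, crop_w):
    """Randomly crop an A x H x W image (nested lists) to A x crop_h x crop_w.

    The top-left corner (h0, w0) is drawn uniformly from
    [0, H - crop_h] x [0, W - crop_w], so the H' x W' block
    always lies fully inside the image.
    """
    height, width = len(image[0]), len(image[0][0])
    h0 = random.randint(0, height - crop_h)
    w0 = random.randint(0, width - crop_w)
    return [[row[w0:w0 + crop_w] for row in channel[h0:h0 + crop_h]]
            for channel in image]

# A = 3 channels, H = W = 8, pixel value = row index
img = [[[r for _ in range(8)] for r in range(8)] for _ in range(3)]
patch = random_crop(img, 5, 5)
```

The same helper serves both the initial random block x̃_i^(0) and, with a fixed corner instead of a random one, the policy-directed crops of later iterations.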
In an exemplary embodiment of the present application, the method may further include: training the parameters Θ_g of the feature extraction network and the parameters Θ_m of the fully connected layer according to a first calculation formula:

(Θ̂_g, Θ̂_m) = argmin_{Θ_g, Θ_m} Σ_i −log[ m(g(x_i, Θ_g), Θ_m)_{y_i} ]

where log[·] denotes the logarithm function; argmin_{Θ_g, Θ_m} denotes the values of Θ_g and Θ_m at which the function attains its minimum; g(x_i, Θ_g) denotes the feature map obtained by inputting an arbitrary ith image x_i into the feature extraction network g(x, Θ_g) with parameters Θ_g; m(g(x_i, Θ_g), Θ_m)_{y_i} denotes the y_i-th element of the corresponding classification result m(g(x_i, Θ_g), Θ_m), y_i being the category label defined for image x_i; Θ̂_g and Θ̂_m denote the finally obtained optimized parameters; i is a positive integer.
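The per-image term of the first calculation formula is an ordinary negative log-likelihood: −log of the classification-result component at the true label. A minimal sketch, assuming raw scores are normalized by a softmax (the function names here are illustrative, not from the patent):

```python
import math

def softmax(logits):
    """Normalize raw scores to a K-vector with entries in (0, 1)."""
    top = max(logits)
    exps = [math.exp(v - top) for v in logits]
    total = sum(exps)
    return [v / total for v in exps]

def first_formula_loss(class_probs, label):
    """Per-image term of the first formula: -log of the component of the
    classification result at the true category label y_i (1-based).
    class_probs plays the role of m(g(x_i, Θg), Θm)."""
    return -math.log(class_probs[label - 1])

c = softmax([2.0, 0.5, 0.1])      # K = 3 classification categories
loss = first_formula_loss(c, 1)   # label y_i = 1
```

Summing this loss over the training set and minimizing over (Θ_g, Θ_m) with gradient descent recovers the first formula.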
In an exemplary embodiment of the present application, a three-dimensional matrix of dimension A × H′ × W′, i.e., the image block x̃_i^(0), may be input into the feature extraction network defined above to obtain the feature map f_i0 output by the feature extraction network, and the feature map f_i0 may then be input into the fully connected layer defined above to obtain the classification result output by the fully connected layer. The parameters Θ_g of the feature extraction network g and the parameters Θ_m of the fully connected layer m are trained according to the first calculation formula.
In an exemplary embodiment of the present application, the positioning policy network may include a plurality of convolutional layers and a fully connected layer, arranged in sequence.

In an exemplary embodiment of the present application, a positioning policy network p may be established, formed by a plurality of convolutional layers followed by a fully connected layer. Its parameters may be denoted Θ_p. The input of the positioning policy network p is the feature map f obtained by the feature extraction network defined above, of dimension A_f × H_f × W_f; the output of p is the normalized coordinates (h, w) of the position of the next image block to be cropped, a 2 × 1 vector in which each element takes a value between 0 and 1 and represents the position of the block's top-left corner as a proportion of the whole image.
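Turning the policy network's normalized output (h, w) into a pixel-level crop corner can be sketched as below. The scaling convention (mapping onto [0, H − H′] × [0, W − W′] so the crop stays inside the image) is an assumption for illustration; the patent states only that each element lies between 0 and 1:

```python
def denormalize_corner(h_norm, w_norm, full_h, full_w, crop_h, crop_w):
    """Map the 2 x 1 policy-network output (h, w), each in [0, 1], to the
    pixel coordinates of the next image block's top-left corner.

    Assumed convention: the normalized value scales over [0, H - H']
    (resp. [0, W - W']) so an H' x W' crop never leaves the image.
    """
    h0 = round(h_norm * (full_h - crop_h))
    w0 = round(w_norm * (full_w - crop_w))
    return h0, w0

corner = denormalize_corner(0.5, 1.0, full_h=224, full_w=224, crop_h=96, crop_w=96)
```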
In an exemplary embodiment of the present application, training the positioning strategy network may comprise steps A2-D2:
A2, acquiring the image data required for training to form a training set D_train, and labelling each image x_i in the training set D_train with its corresponding category label y_i.

B2, obtaining, through iterative calculation, the classification confidence sequence {s_i0, s_i1, ..., s_iN} of each image x_i, based on the training set D_train, the pre-established and trained feature extraction network, and the pre-established and trained fully connected layer.
In an exemplary embodiment of the present application, obtaining, through iterative calculation, the classification confidence sequence {s_i0, s_i1, ..., s_iN} of an image x_i based on the training set D_train, the pre-established and trained feature extraction network, and the pre-established and trained fully connected layer may comprise steps A3-D3:

A3, randomly cropping an image block x̃_i^(0) of the preset image size from an image x_i in the training set D_train, where i refers to the ith image, i being a positive integer no greater than the total number of images in the training set D_train; j refers to the jth iteration, j being an integer no greater than N, N being a preset iteration threshold.
B3, inputting the image block x̃_i^(j) into the pre-established and trained feature extraction network to obtain a feature map f_ij, and inputting the feature map f_ij into the pre-established and trained fully connected layer to obtain the classification result c_ij of the image block x̃_i^(j); determining the classification confidence s_ij according to the classification result c_ij.

C3, detecting whether j is equal to N. When j is equal to N, the classification confidence sequence {s_i0, s_i1, ..., s_iN} is obtained; when j is not equal to N, proceed to step D3.

D3, inputting the feature map f_ij obtained in the previous step into the positioning policy network p to obtain the normalized position coordinates of the image block required for the (j+1)th iteration; cropping from the original image x_i according to the normalized position coordinates to obtain the image block x̃_i^(j+1) to be processed in the (j+1)th iteration; updating the current image block with the image block x̃_i^(j+1); and returning to step C3.
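The iterative loop of steps A3-D3 (and of the recognition procedure itself) can be sketched as follows. Here `classify`, `locate`, and `crop` are stand-ins for the trained feature-extraction-plus-fully-connected networks, the positioning policy network p, and the cropping step; the names and the max-probability confidence rule are assumptions made for this example:

```python
import random

def adaptive_inference(image, classify, locate, crop, eta, max_iters):
    """Classify a randomly placed first block, then let the positioning
    policy pick each subsequent block, stopping early once the
    classification confidence reaches eta or max_iters blocks are used.
    Returns the predicted class index and the confidence sequence."""
    h, w = random.random(), random.random()   # first block: random position
    confidences = []
    for _ in range(max_iters):
        feature, probs = classify(crop(image, h, w))
        conf = max(probs)                     # classification confidence s_ij
        confidences.append(conf)
        if conf >= eta:                       # confident enough: stop early
            break
        h, w = locate(feature)                # policy net proposes next block
    return probs.index(max(probs)), confidences

# Toy stubs: confidence grows as more blocks are seen
steps = iter([[0.4, 0.6], [0.1, 0.9]])
label, seq = adaptive_inference(
    image=None,
    classify=lambda block: ("feature", next(steps)),
    locate=lambda feature: (0.5, 0.5),
    crop=lambda img, h, w: img,
    eta=0.8,
    max_iters=5,
)
```

With these stubs the loop exits after two blocks, since the second confidence (0.9) exceeds η = 0.8.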
C2, calculating, according to the classification confidence sequence {s_i0, s_i1, ..., s_iN}, the confidence increment Δs_{i,t+1} between two adjacent iterations, where Δs_{i,t+1} = s_{i,t+1} − s_{i,t}.

D2, training the parameters Θ_p of the positioning policy network according to the confidence increment Δs_{i,t+1} and a preset second calculation formula.
In an exemplary embodiment of the present application, the second calculation formula may include:

Θ̂_p = argmin_{Θ_p} E[ −Σ_{t=1}^{N} γ^{t−1} Δs_{i,t} ]

where Θ_p denotes the parameters of the policy network p; argmin_{Θ_p} denotes the value of Θ_p at which the function attains its minimum; E(·) denotes the mathematical expectation; γ is a predefined discount rate parameter taking a value between 0 and 1; t is an integer no greater than N, N being the preset iteration threshold; Δs_{i,t} = s_{i,t} − s_{i,t−1} refers to the difference between the tth and (t−1)th classification confidences in the classification confidence sequence {s_i0, s_i1, ..., s_iN} corresponding to image x_i; Θ̂_p denotes the resulting policy network parameters.
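For a single image, the quantity minimized by the second calculation formula is the negative discounted sum of confidence increments. A minimal sketch; placing the discount as γ^(t−1) on Δs_t is an assumption of this illustration, since the patent states only that γ lies between 0 and 1:

```python
def policy_objective(confidences, gamma):
    """Per-image value minimized by the second formula: the negative
    discounted sum of confidence increments Δs_t = s_t - s_{t-1},
    t = 1..N, with discount gamma ** (t-1) (assumed placement)."""
    increments = [b - a for a, b in zip(confidences, confidences[1:])]
    return -sum(gamma ** t * delta for t, delta in enumerate(increments))

obj = policy_objective([0.2, 0.5, 0.9], gamma=0.5)   # Δs = [0.3, 0.4]
```

Minimizing this objective rewards the policy network for choosing image blocks that raise the classification confidence quickly, with earlier gains weighted more heavily.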
In an exemplary embodiment of the present application, for each image x_i in the training set D_train with corresponding category label y_i, steps A3 and B3 are performed to obtain the corresponding feature map f_i0 and classification result c_i0. The classification confidence s_i0 is taken as the component of c_i0 corresponding to the category label y_i, i.e., s_i0 = (c_i0)_{y_i}.
In an exemplary embodiment of the present application, f_{i,j−1} may denote the feature map obtained from the feature extraction network g in the previous step. The feature map f_{i,j−1} is input into the positioning policy network p to obtain the normalized position coordinates of the image block required in step j: (h_ij, w_ij) = p(f_{i,j−1}, Θ_p). According to the normalized coordinates (h_ij, w_ij), the image block x̃_i^(j) to be processed in step j is cropped from the original image x_i; it is of size H′ × W′, and the vertical and horizontal coordinates of its top-left corner are h_ij and w_ij, respectively. The image block x̃_i^(j) may then be input into the feature extraction network g and the fully connected layer m to obtain the feature map and classification result: f_ij = g(x̃_i^(j), Θ_g), c_ij = m(f_ij, Θ_m); the classification confidence is s_ij = (c_ij)_{y_i}.
In an exemplary embodiment of the present application, the previous step is repeated for N rounds, where N is a pre-specified parameter that may typically be set to 5, so as to obtain the classification confidence sequence {s_i0, s_i1, ..., s_iN}.
In an exemplary embodiment of the present application, the confidence increment between two adjacent steps may be Δs_{i,t+1} = s_{i,t+1} − s_{i,t}. The positioning policy network p is trained on the training set D_train by solving the following problem:

Θ̂_p = argmin_{Θ_p} E[ −Σ_{t=1}^{N} γ^{t−1} Δs_{i,t} ]

where γ is a predefined discount rate parameter taking a value between 0 and 1. The optimal positioning policy network parameters Θ̂_p are obtained by minimizing the above formula.
In an exemplary embodiment of the present application, the trained feature extraction network, fully connected layer, and positioning policy network are thus obtained; based on these trained neural networks, automatic image recognition can be realized according to steps S101 to S105.
The embodiments of the present application include at least the following advantages:
1. The solution of the embodiments of the present application effectively addresses the large computation cost, high inference latency, and difficulty of efficient deployment on resource-limited mobile and embedded devices that affect deep-learning-based automatic image classification, so that the neural network can obtain a correct classification result with less computation and lower latency, accelerating inference of the neural network model on mobile platforms and greatly improving the operating efficiency of the system. In addition, compared with other adaptive inference methods, this method does not modify the structure or parameters of the neural network, is compatible with other network pruning, knowledge distillation, and weight quantization methods, and is efficient and easy to use.
2. The huge computing resource consumption of deep convolutional neural networks hinders deployment of network models on practical systems such as mobile devices. In view of this, the embodiments of the present application use an adaptive inference method to dynamically determine how much input the network processes for each image (i.e., the number of image blocks), and thus adaptively determine the computation spent on each input image, achieving the same accuracy with smaller computational overhead and greatly reducing the deployment cost of deep convolutional neural networks on devices with limited computing resources.
3. Other adaptive inference methods for neural network acceleration tend to modify the network model directly, for example by embedding a multi-stage classifier in the network. The embodiments of the present application introduce an adaptive inference method based on a visual attention mechanism, which takes smaller image blocks cropped from the original image as input and realizes adaptive inference by letting a decision network select the number of image blocks, and thereby the amount of computation, without modifying the structure of the neural network. The embodiments of the present application are also compatible with other neural network compression and acceleration methods, such as network pruning and knowledge distillation, further improving the acceleration effect of the network model.
The embodiment of the present application further provides an image recognition apparatus 1, as shown in fig. 2, which may include a processor 11 and a computer-readable storage medium 12, where the computer-readable storage medium 12 stores instructions, and when the instructions are executed by the processor 11, the image recognition method described in any one of the above is implemented.
It will be understood by those of ordinary skill in the art that all or some of the steps of the methods, systems, functional modules/units in the devices disclosed above may be implemented as software, firmware, hardware, and suitable combinations thereof. In a hardware implementation, the division between functional modules/units mentioned in the above description does not necessarily correspond to the division of physical components; for example, one physical component may have multiple functions, or one function or step may be performed by several physical components in cooperation. Some or all of the components may be implemented as software executed by a processor, such as a digital signal processor or microprocessor, or as hardware, or as an integrated circuit, such as an application specific integrated circuit. Such software may be distributed on computer readable media, which may include computer storage media (or non-transitory media) and communication media (or transitory media). The term computer storage media includes volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as computer readable instructions, data structures, program modules or other data, as is well known to those of ordinary skill in the art. Computer storage media includes, but is not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, Digital Versatile Disks (DVD) or other optical disk storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to store the desired information and which can be accessed by a computer. In addition, communication media typically embodies computer readable instructions, data structures, program modules or other data in a modulated data signal such as a carrier wave or other transport mechanism and includes any information delivery media as known to those skilled in the art.

Claims (10)

1. An image recognition method, characterized in that the method comprises:
acquiring an image to be identified;
randomly cutting out image blocks with a preset image size from the image to be recognized;
inputting the image blocks into a pre-trained neural network classification model to obtain classification results of the image blocks; the classification result refers to that the image blocks are classified into one or more preset image types;
determining a classification confidence according to the classification result; the classification confidence refers to the probability that the image block is classified into each image type;
determining whether the current classification result is used as the final image recognition result of the corresponding image to be recognized or not according to the classification confidence; and when the current classification result cannot be used as the final image recognition result, obtaining a next image block again according to the feature map and a pre-established and trained positioning strategy network in an iterative calculation mode, and obtaining a next classification confidence coefficient according to the next image block until the current classification result is determined to be used as the final image recognition result of the corresponding image to be recognized according to the obtained classification confidence coefficient.
2. The image recognition method of claim 1, wherein the neural network classification model comprises: a feature extraction network and a full connectivity layer;
the inputting the image blocks into a pre-trained neural network classification model, and the obtaining of the classification results of the image blocks comprises:
inputting the image blocks into a pre-established and trained feature extraction network to obtain a feature map, inputting the feature map into a pre-established and trained full-connection layer, and obtaining the classification results of the image blocks.
3. The image recognition method according to claim 1, wherein the determining whether to take the current classification result as the final image recognition result of the corresponding image to be recognized according to the classification confidence comprises:
when the classification confidence coefficient is larger than or equal to a preset threshold value, determining that the current classification result is used as a final image recognition result of the image to be recognized;
and when the classification confidence is smaller than the preset threshold, determining that the current classification result cannot be used as the final image recognition result of the image to be recognized.
4. The image recognition method according to claim 2, wherein the obtaining a next image block again according to the feature map and a pre-established and trained positioning strategy network in an iterative computation manner, and obtaining a next classification confidence according to the next image block until determining that a current classification result is a final image recognition result of a corresponding image to be recognized according to the obtained classification confidence comprises:
41. inputting the feature map obtained last time into a pre-established positioning strategy network, and obtaining the position normalization coordinates of the image block to be cut in the next step; cutting the next image block according to the position normalized coordinates of the image blocks;
42. inputting the image blocks into a pre-established and trained feature extraction network to obtain a feature map, inputting the feature map into a pre-established and trained full-connection layer, and obtaining classification results of the image blocks; determining a classification confidence according to the classification result;
43. determining whether the current classification result is used as a final image recognition result or not according to the classification confidence; if yes, go to step 44; if not, returning to the step 41;
44. and outputting the current classification result.
5. The image recognition method according to claim 2 or 4,
the feature extraction network includes: a plurality of function layers arranged according to a residual neural network ResNet rule or a closely connected neural network DenseNet rule; and/or the presence of a gas in the gas,
the positioning policy network comprises: a plurality of convolutional layers and a fully-connected layer, the convolutional layers and the fully-connected layer being sequentially arranged.
6. The image recognition method of claim 5, further comprising: training the parameters Θ_g of the feature extraction network and the parameters Θ_m of the fully connected layer according to a first calculation formula:

(Θ̂_g, Θ̂_m) = argmin_{Θ_g, Θ_m} Σ_i −log[ m(g(x_i, Θ_g), Θ_m)_{y_i} ]

wherein log[·] denotes the logarithm function; argmin_{Θ_g, Θ_m} denotes the values of Θ_g and Θ_m at which the function attains its minimum; g(x_i, Θ_g) denotes the feature map obtained by inputting an arbitrary ith image x_i into the feature extraction network g(x, Θ_g) with parameters Θ_g; m(g(x_i, Θ_g), Θ_m)_{y_i} denotes the y_i-th element of the corresponding classification result m(g(x_i, Θ_g), Θ_m), y_i being the category label defined for image x_i; Θ̂_g and Θ̂_m denote the finally obtained optimized parameters; and i is a positive integer.
7. The image recognition method of claim 5, wherein training the positioning strategy network comprises:

acquiring the image data required for training to form a training set D_train, and labelling each image x_i in the training set D_train with its corresponding category label y_i;

obtaining, through iterative calculation, the classification confidence sequence {s_i0, s_i1, ..., s_iN} of each image x_i, based on the training set D_train, the pre-established and trained feature extraction network, and the pre-established and trained fully connected layer;

calculating, according to the classification confidence sequence {s_i0, s_i1, ..., s_iN}, the confidence increment Δs_{i,t+1} between two adjacent iterations, where Δs_{i,t+1} = s_{i,t+1} − s_{i,t}; and

training the parameters Θ_p of the positioning strategy network according to the confidence increment Δs_{i,t+1} and a preset second calculation formula.
8. The image recognition method of claim 7, wherein obtaining, through iterative calculation, the classification confidence sequence {s_i0, s_i1, ..., s_iN} of an image x_i based on the training set D_train, the pre-established and trained feature extraction network, and the pre-established and trained fully connected layer comprises:

81. randomly cropping an image block x̃_i^(0) of the preset image size from an image x_i in the training set D_train, where i refers to the ith image, i being a positive integer no greater than the total number of images in the training set D_train; j refers to the jth iteration, j being an integer no greater than N, N being a preset iteration threshold;

82. inputting the image block x̃_i^(j) into the pre-established and trained feature extraction network to obtain a feature map f_ij, and inputting the feature map f_ij into the pre-established and trained fully connected layer to obtain the classification result c_ij of the image block x̃_i^(j); determining the classification confidence s_ij according to the classification result c_ij;

83. detecting whether j is equal to N; when j is equal to N, obtaining the classification confidence sequence {s_i0, s_i1, ..., s_iN}; when j is not equal to N, proceeding to step 84;

84. inputting the feature map f_ij obtained in the previous step into the positioning policy network p to obtain the normalized position coordinates of the image block required for the (j+1)th iteration; cropping from the original image x_i according to the normalized position coordinates to obtain the image block x̃_i^(j+1) to be processed in the (j+1)th iteration; updating the current image block with the image block x̃_i^(j+1); and returning to step 83.
9. The image recognition method according to claim 7, wherein the second calculation formula comprises:

Θ̂_p = argmin_{Θ_p} E[ −Σ_{t=1}^{N} γ^{t−1} Δs_{i,t} ]

wherein Θ_p denotes the parameters of the policy network p; argmin_{Θ_p} denotes the value of Θ_p at which the function attains its minimum; E(·) denotes the mathematical expectation; γ is a predefined discount rate parameter taking a value between 0 and 1; t is an integer no greater than N, N being a preset iteration threshold; Δs_{i,t} = s_{i,t} − s_{i,t−1} refers to the difference between the tth and (t−1)th classification confidences in the classification confidence sequence {s_i0, s_i1, ..., s_iN} corresponding to image x_i; and Θ̂_p denotes the resulting policy network parameters.
10. An image recognition apparatus comprising a processor and a computer-readable storage medium having instructions stored therein, wherein the instructions, when executed by the processor, implement the image recognition method of any one of claims 1-9.
CN202011553934.XA 2020-12-24 2020-12-24 Image identification method and device Pending CN112598062A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011553934.XA CN112598062A (en) 2020-12-24 2020-12-24 Image identification method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011553934.XA CN112598062A (en) 2020-12-24 2020-12-24 Image identification method and device

Publications (1)

Publication Number Publication Date
CN112598062A true CN112598062A (en) 2021-04-02

Family

ID=75202581

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011553934.XA Pending CN112598062A (en) 2020-12-24 2020-12-24 Image identification method and device

Country Status (1)

Country Link
CN (1) CN112598062A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113095307A (en) * 2021-06-09 2021-07-09 国网浙江省电力有限公司 Automatic identification method for financial voucher information
CN115546672A (en) * 2022-11-30 2022-12-30 广州天地林业有限公司 Forest picture processing method and system based on image processing
CN116758359A (en) * 2023-08-16 2023-09-15 腾讯科技(深圳)有限公司 Image recognition method, device and electronic equipment

Citations (2)

Publication number Priority date Publication date Assignee Title
CN111027605A (en) * 2019-11-28 2020-04-17 北京影谱科技股份有限公司 Fine-grained image recognition method and device based on deep learning
CN111460862A (en) * 2019-01-21 2020-07-28 中科星图股份有限公司 Neural network-based remote sensing image ground object extraction method and system

Patent Citations (2)

Publication number Priority date Publication date Assignee Title
CN111460862A (en) * 2019-01-21 2020-07-28 中科星图股份有限公司 Neural network-based remote sensing image ground object extraction method and system
CN111027605A (en) * 2019-11-28 2020-04-17 北京影谱科技股份有限公司 Fine-grained image recognition method and device based on deep learning

Non-Patent Citations (2)

Title
YULIN WANG ET AL.: "Glance and Focus: a Dynamic Approach to Reducing Spatial Redundancy in Image Classification", 《ARXIV》 *
LIU DONG ET AL.: "A Survey of Deep Learning and Its Application in Image Object Classification and Detection", 《COMPUTER SCIENCE》 *

Cited By (4)

Publication number Priority date Publication date Assignee Title
CN113095307A (en) * 2021-06-09 2021-07-09 国网浙江省电力有限公司 Automatic identification method for financial voucher information
CN115546672A (en) * 2022-11-30 2022-12-30 广州天地林业有限公司 Forest picture processing method and system based on image processing
CN115546672B (en) * 2022-11-30 2023-03-24 广州天地林业有限公司 Forest picture processing method and system based on image processing
CN116758359A (en) * 2023-08-16 2023-09-15 腾讯科技(深圳)有限公司 Image recognition method, device and electronic equipment

Similar Documents

Publication Publication Date Title
CN109978142B (en) Neural network model compression method and device
CN106599900B (en) A method and apparatus for identifying character strings in images
WO2022006919A1 (en) Activation fixed-point fitting-based method and system for post-training quantization of convolutional neural network
WO2022252455A1 (en) Methods and systems for training graph neural network using supervised contrastive learning
CN109086722B (en) Hybrid license plate recognition method and device and electronic equipment
CN113128478B (en) Model training method, pedestrian analysis method, device, equipment and storage medium
CN109840531A (en) The method and apparatus of training multi-tag disaggregated model
CN115426671A (en) Method, system and equipment for graph neural network training and wireless cell fault prediction
CN107292458B (en) Prediction method and prediction device applied to neural network chip
CN112183742A (en) Neural network hybrid quantization method based on progressive quantization and Hessian information
CN112598062A (en) Image identification method and device
CN113825148B (en) Method, device and computing equipment for determining network node alarm level
CN115018039A (en) Neural network distillation method, target detection method and device
CN110298394B (en) Image recognition method and related device
CN113516163B (en) Vehicle classification model compression method, device and storage medium based on network pruning
CN114332500A (en) Image processing model training method and device, computer equipment and storage medium
CN117421657B (en) A noisy labeled image sample screening learning method and system based on oversampling strategy
CN113627537A (en) Image identification method and device, storage medium and equipment
CN112464057A (en) Network data classification method, device, equipment and readable storage medium
JP6107531B2 (en) Feature extraction program and information processing apparatus
CN112132167A (en) Image generation and neural network training method, apparatus, device, and medium
CN118334323B (en) Insulator detection method and system based on ultraviolet image
CN116188878A (en) Image classification method, device and storage medium based on fine-tuning of neural network structure
CN111178039A (en) Model training method and device, and method and device for realizing text processing
CN114492783B (en) A pruning method and device for a multi-task neural network model

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication
RJ01 Rejection of invention patent application after publication

Application publication date: 20210402