
CN112598062A - Image identification method and device - Google Patents


Info

Publication number
CN112598062A
CN112598062A (application CN202011553934.XA)
Authority
CN
China
Prior art keywords
image
classification
network
confidence
result
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011553934.XA
Other languages
Chinese (zh)
Inventor
黄高
王语霖
吕康晨
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tsinghua University
Original Assignee
Tsinghua University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tsinghua University filed Critical Tsinghua University
Priority to CN202011553934.XA priority Critical patent/CN112598062A/en
Publication of CN112598062A publication Critical patent/CN112598062A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Image Analysis (AREA)

Abstract


Figure 202011553934

Embodiments of the present application provide an image recognition method and device. The method includes: acquiring an image to be recognized; randomly cropping an image block of a preset image size from the image to be recognized; inputting the image block into a trained neural network classification model to obtain a classification result for the image block; determining a classification confidence according to the classification result; determining, according to the classification confidence, whether to use the current classification result as the final image recognition result; and, when the current classification result cannot be used as the final image recognition result, iteratively obtaining the next image block according to the feature map and a positioning policy network, and obtaining the next classification confidence from that image block, until the obtained classification confidence indicates that the current classification result can be used as the final image recognition result. On the basis of ensuring the accuracy of the image classification result, this scheme achieves a better neural network acceleration effect and greatly improves the operating efficiency of the system.


Description

Image identification method and device
Technical Field
The present disclosure relates to neural network acceleration technologies, and more particularly, to an image recognition method and apparatus.
Background
Neural network acceleration reduces the computational overhead and inference latency of neural networks through methods such as network pruning and weight quantization, speeding up their operation, and has very wide application in the practical deployment of neural network models. In recent years, with the rapid development of artificial intelligence technology, deep convolutional neural networks have grown ever larger. Large-scale convolutional networks can handle more complex tasks, but they also consume enormous computing and storage resources. For mobile and wearable devices with limited computing resources, such as mobile phones and wristbands, a huge computational load means higher power consumption and operating latency, which hinders deploying neural networks on such hardware and restricts their application. Neural network acceleration techniques allow neural network models to be applied more widely across scenarios.
Adaptive inference is an effective solution to the high computational overhead of neural network models. It adaptively distinguishes easy samples from hard ones, spending less computation on easily recognized samples and more computation on difficult ones, thereby saving model computational resources overall. Although some existing adaptive inference techniques have been applied to neural network acceleration, these methods often sacrifice too much classification accuracy, and their acceleration effect is not ideal.
Disclosure of Invention
The embodiment of the application provides an image identification method and device, which can achieve a better neural network acceleration effect on the basis of ensuring the accuracy of an image classification result and greatly improve the operation efficiency of a system.
The application provides an image recognition method, which can comprise the following steps:
acquiring an image to be identified;
randomly cutting out image blocks with a preset image size from the image to be recognized;
inputting the image blocks into a pre-trained neural network classification model to obtain classification results of the image blocks; the classification result refers to that the image blocks are classified into one or more preset image types;
determining a classification confidence according to the classification result; the classification confidence refers to the probability that the image block is classified into each image type;
determining whether the current classification result is used as the final image recognition result of the corresponding image to be recognized or not according to the classification confidence; and when the current classification result cannot be used as the final image recognition result, obtaining a next image block again according to the feature map and a pre-established and trained positioning strategy network in an iterative calculation mode, and obtaining a next classification confidence coefficient according to the next image block until the current classification result is determined to be used as the final image recognition result of the corresponding image to be recognized according to the obtained classification confidence coefficient.
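The decision loop of the steps above can be sketched as follows. This is a minimal illustration, not the patent's implementation: `classify`, `locate_next`, and `crop` are hypothetical stand-ins for the trained classification model, the positioning policy network, and the cropping step.

```python
# Hedged sketch of the confidence-thresholded inference loop described above.
# `classify`, `locate_next`, and `crop` are hypothetical stubs standing in for
# the trained classification model, positioning policy network, and cropper.

def recognize(image, classify, locate_next, crop, eta=0.9, max_iters=5):
    """Iterate patch -> classify -> confidence check until it reaches eta."""
    patch = crop(image, None)               # initial patch: random crop
    result = None
    for _ in range(max_iters):
        result, confidence, feature_map = classify(patch)
        if confidence >= eta:               # confident enough: early exit
            return result
        coords = locate_next(feature_map)   # normalized coords of next patch
        patch = crop(image, coords)
    return result                           # fall back to the last result
```

With stub components, easy samples exit after one pass while hard ones trigger further crops, which is the source of the claimed acceleration.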
In an exemplary embodiment of the present application, the neural network classification model may include: a feature extraction network and a full connectivity layer;
the inputting the image block into a pre-trained neural network classification model, and obtaining the classification result of the image block may include:
inputting the image blocks into a pre-established and trained feature extraction network to obtain a feature map, inputting the feature map into a pre-established and trained full-connection layer, and obtaining the classification results of the image blocks.
In an exemplary embodiment of the present application, the determining whether to use the current classification result as the final image recognition result of the corresponding image to be recognized according to the classification confidence may include:
when the classification confidence coefficient is larger than or equal to a preset threshold value, determining that the current classification result is used as a final image recognition result of the image to be recognized;
and when the classification confidence is smaller than the preset threshold, determining that the current classification result cannot be used as the final image recognition result of the image to be recognized.
In an exemplary embodiment of the present application, the obtaining a next image block again according to the feature map and a pre-established and trained positioning policy network in an iterative computation manner, and obtaining a next classification confidence according to the next image block until determining, according to the obtained classification confidence, that a current classification result is used as a final image recognition result of a corresponding image to be recognized may include:
41. inputting the feature map obtained last time into the pre-established positioning policy network to obtain the position-normalized coordinates of the image block to be cropped next; cropping the next image block according to the position-normalized coordinates;
42. inputting the image blocks into a pre-established and trained feature extraction network to obtain a feature map, inputting the feature map into a pre-established and trained full-connection layer, and obtaining classification results of the image blocks; determining a classification confidence according to the classification result;
43. determining whether the current classification result is used as a final image recognition result or not according to the classification confidence; if yes, go to step 44; if not, returning to the step 41;
44. and outputting the current classification result.
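Cropping the next image block from position-normalized coordinates, as in steps 41-44 above, might look like the following sketch; interpreting the network output as the normalized center of the crop is an assumption, since the text does not fix the coordinate convention.

```python
def crop_at(image, cy, cx, ph, pw):
    """Crop a ph x pw block whose center is at normalized coords (cy, cx).

    `image` is a channel-first nested list of shape (A, H, W); cy and cx are
    in [0, 1]. The top-left corner is clamped so the block stays inside the
    image. The center-based convention is an illustrative assumption.
    """
    h, w = len(image[0]), len(image[0][0])
    top = min(max(int(cy * h - ph / 2), 0), h - ph)
    left = min(max(int(cx * w - pw / 2), 0), w - pw)
    return [[row[left:left + pw] for row in chan[top:top + ph]]
            for chan in image]
```

Clamping keeps coordinates near the border valid, so the policy network output never produces an out-of-bounds crop.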
In an exemplary embodiment of the present application, the method may further include: when the classification confidence obtained after N iteration cycles is still smaller than the preset threshold, taking the classification result obtained at the N-th iteration as the final image recognition result, where N is a positive integer and a preset iteration threshold.
In an exemplary embodiment of the present application, the feature extraction network may include: a plurality of functional layers arranged according to the ResNet (residual neural network) rule or the DenseNet (densely connected neural network) rule; and/or,
the positioning policy network may include: a plurality of convolutional layers and a fully connected layer, arranged in sequence.
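A deliberately simplified numpy sketch of such a positioning policy network follows. The 1x1 convolution, global average pooling, and two-coordinate sigmoid output are assumptions chosen for illustration, as the text does not fix layer sizes or the output head.

```python
import numpy as np

def policy_forward(f, w_conv, w_fc, b_fc):
    """Simplified positioning-policy forward pass (illustrative assumption):
    a 1x1 convolution over the feature map f of shape (Af, Hf, Wf), ReLU,
    global average pooling, then a fully connected layer squashed to [0, 1]
    as position-normalized (row, col) coordinates."""
    conv = np.einsum('oc,chw->ohw', w_conv, f)   # 1x1 convolution
    conv = np.maximum(conv, 0.0)                 # ReLU
    pooled = conv.mean(axis=(1, 2))              # global average pooling
    logits = w_fc @ pooled + b_fc                # fully connected layer
    return 1.0 / (1.0 + np.exp(-logits))         # sigmoid -> coords in (0, 1)

# Example: an 8-channel 7x7 feature map mapped to 2 normalized coordinates.
rng = np.random.default_rng(0)
f = rng.standard_normal((8, 7, 7))
coords = policy_forward(f, rng.standard_normal((16, 8)),
                        rng.standard_normal((2, 16)), np.zeros(2))
```

The sigmoid guarantees the output lies in (0, 1), matching the position-normalized coordinates the text requires.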
In an exemplary embodiment of the present application, the method may further include: training the parameter Θ_g of the feature extraction network and the parameter Θ_m of the fully connected layer according to the following first calculation formula:

$$(\hat{\Theta}_g, \hat{\Theta}_m) = \arg\min_{\Theta_g, \Theta_m} \; -\sum_i \log\!\left[ m\!\left(g(x_i, \Theta_g), \Theta_m\right)_{y_i} \right]$$

where log[·] denotes the logarithm function; argmin denotes the values of Θ_g and Θ_m at which the function attains its minimum; g(x_i, Θ_g) denotes the feature map obtained by feeding an arbitrary i-th image x_i into the feature extraction network g(x, Θ_g) with parameter Θ_g; m(g(x_i, Θ_g), Θ_m)_{y_i} denotes the y_i-th element of the classification result m(g(x_i, Θ_g), Θ_m) for image x_i, with y_i the category label defined for image x_i; (Θ̂_g, Θ̂_m) denote the finally obtained optimized parameters; i is a positive integer.
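The first calculation formula is the standard cross-entropy objective. As a small numeric illustration, the quantity being minimized over (Θ_g, Θ_m) is the sum over images of the negative log of the probability the model assigns to the true class:

```python
import math

def first_formula_loss(probs, labels):
    """Sum over images of -log of the predicted probability of the true
    class, i.e. the quantity minimized in the first calculation formula.
    `probs[i]` stands in for the K-vector m(g(x_i, Theta_g), Theta_m);
    labels are 1-based (y_i between 1 and K), as in the text."""
    return sum(-math.log(p[y - 1]) for p, y in zip(probs, labels))

# Two images, K = 3 classes: the true classes get probability 0.7 and 0.5.
loss = first_formula_loss([[0.7, 0.2, 0.1], [0.25, 0.5, 0.25]], [1, 2])
```

The loss shrinks toward zero as the true-class probabilities approach one, which is what drives the joint optimization of g and m.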
In an exemplary embodiment of the present application, training the positioning policy network may include:
acquiring the image data required for training to form a training set D = {x_i}, and labeling each image in the training set D with a corresponding category label y_i;
obtaining, by iterative calculation, a classification confidence series {s_{i,0}, s_{i,1}, ..., s_{i,N}} for each image x_i according to the training set D, the pre-established and trained feature extraction network, and the pre-established and trained fully connected layer;
calculating, according to the classification confidence series {s_{i,0}, s_{i,1}, ..., s_{i,N}}, the confidence increment Δs_{i,t+1} between two adjacent iterations, where Δs_{i,t+1} = s_{i,t+1} - s_{i,t};
training the parameter Θ_p of the positioning policy network according to the confidence increment Δs_{i,t+1} and a preset second calculation formula.
In an exemplary embodiment of the present application, obtaining, by iterative calculation, the classification confidence series {s_{i,0}, s_{i,1}, ..., s_{i,N}} of image x_i according to the training set D, the pre-established and trained feature extraction network, and the pre-established and trained fully connected layer may include:
81. randomly cropping an image block x̃_{i,0} of the preset image size from image x_i of the training set D; where i refers to the i-th image and is a positive integer, i ≤ X, with X the total number of images in the training set D; j refers to the j-th iteration and is an integer, j ≤ N, with N the preset iteration threshold;
82. inputting the image block x̃_{i,j} into the pre-established and trained feature extraction network to obtain a feature map f_{i,j}, and inputting the feature map f_{i,j} into the pre-established and trained fully connected layer to obtain the classification result c_{i,j} of the image block x̃_{i,j}; determining the classification confidence s_{i,j} according to the classification result c_{i,j};
83. detecting whether j = N holds; when j = N, the classification confidence series {s_{i,0}, s_{i,1}, ..., s_{i,N}} has been obtained; when j ≠ N, proceeding to step 84;
84. inputting the feature map f_{i,j} obtained in the previous step into the positioning policy network p to obtain the position-normalized coordinates of the image block required for the (j+1)-th iteration; cropping the image block x̃_{i,j+1} for the (j+1)-th iteration from the original image x_i according to the position-normalized coordinates; updating the current image block with x̃_{i,j+1}, and returning to step 82.
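Steps 81-84 can be sketched as a loop that records one confidence per iteration; `feature_net`, `fc_layer`, `policy_net`, and `crop` below are hypothetical stubs for the trained components, not the patent's implementation.

```python
def confidence_series(image, first_block, feature_net, fc_layer, policy_net,
                      crop, n_iters):
    """Collect the series {s_i0, ..., s_iN} for one training image.

    Stubs assumed: feature_net(block) -> feature map; fc_layer(fmap) ->
    class probability vector; policy_net(fmap) -> normalized coordinates of
    the next block; crop(image, coords) -> next image block.
    """
    series = []
    block = first_block                   # step 81: random initial crop
    for j in range(n_iters + 1):          # j = 0 .. N
        fmap = feature_net(block)         # step 82: features and confidence
        probs = fc_layer(fmap)
        series.append(max(probs))         # confidence s_ij
        if j < n_iters:                   # steps 83/84: locate next block
            block = crop(image, policy_net(fmap))
    return series
```

Note that, unlike inference, no threshold check happens here: training always runs all N iterations so that every Δs increment is observed.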
In an exemplary embodiment of the present application, the second calculation formula may include:

$$\hat{\Theta}_p = \arg\min_{\Theta_p} \, E\!\left[ -\sum_{t=1}^{N} \gamma^{t-1} \, \Delta s_{i,t} \right]$$

where Θ_p denotes the parameter of the positioning policy network p; argmin denotes the value of Θ_p at which the function attains its minimum; E(·) denotes the mathematical expectation; γ is a predefined discount-rate parameter with a value between 0 and 1; t is an integer, t ≤ N, with N the preset iteration threshold; Δs_{i,t} = s_{i,t} - s_{i,t-1} refers to the difference between the t-th classification confidence and the (t-1)-th classification confidence in the series {s_{i,0}, s_{i,1}, ..., s_{i,N}} corresponding to image x_i; Θ̂_p denotes the resulting policy network parameters.
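The inner sum of the second calculation formula is a discounted total of confidence increments; the sketch below computes it for one image's confidence series. The discount exponent γ^(t-1) is an assumption where the rendered formula was lost.

```python
def discounted_confidence_gain(series, gamma):
    """Sum over t = 1..N of gamma**(t-1) * (s_t - s_{t-1}) for one image's
    confidence series {s_0, ..., s_N}. The second calculation formula
    minimizes the expectation of the negative of this quantity over Theta_p,
    i.e. it rewards crops that raise confidence early."""
    return sum(gamma ** (t - 1) * (series[t] - series[t - 1])
               for t in range(1, len(series)))

# A series whose confidence rises from 0.3 to 0.9 over two iterations:
gain = discounted_confidence_gain([0.3, 0.5, 0.9], gamma=0.9)
```

Because γ < 1, an increment earned at an early iteration contributes more than the same increment earned later, pushing the policy toward informative crops as soon as possible.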
The embodiment of the application also provides an image recognition device, which may include a processor and a computer-readable storage medium, where instructions are stored in the computer-readable storage medium, and when the instructions are executed by the processor, the image recognition device implements the image recognition method described in any one of the above items.
Compared with the related art, the embodiment of the application can comprise the following steps: acquiring an image to be identified; randomly cutting out image blocks with a preset image size from the image to be recognized; inputting the image blocks into a pre-trained neural network classification model to obtain classification results of the image blocks; the classification result refers to that the image blocks are classified into one or more preset image types; determining a classification confidence according to the classification result; the classification confidence refers to the probability that the image block is classified into each image type; determining whether the current classification result is used as the final image recognition result of the corresponding image to be recognized or not according to the classification confidence; and when the current classification result cannot be used as the final image recognition result, obtaining a next image block again according to the feature map and a pre-established and trained positioning strategy network in an iterative calculation mode, and obtaining a next classification confidence coefficient according to the next image block until the current classification result is determined to be used as the final image recognition result of the corresponding image to be recognized according to the obtained classification confidence coefficient. Through the scheme of the embodiment, a better neural network acceleration effect is achieved on the basis of ensuring the accuracy of the image classification result, and the operation efficiency of the system is greatly improved.
Additional features and advantages of the application will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by the practice of the application. Other advantages of the present application may be realized and attained by the instrumentalities and combinations particularly pointed out in the specification and the drawings.
Drawings
The accompanying drawings are included to provide an understanding of the present disclosure and are incorporated in and constitute a part of this specification, illustrate embodiments of the disclosure and together with the examples serve to explain the principles of the disclosure and not to limit the disclosure.
FIG. 1 is a flowchart of an image recognition method according to an embodiment of the present application;
fig. 2 is a block diagram of an image recognition apparatus according to an embodiment of the present application.
Detailed Description
The present application describes embodiments, but the description is illustrative rather than limiting and it will be apparent to those of ordinary skill in the art that many more embodiments and implementations are possible within the scope of the embodiments described herein. Although many possible combinations of features are shown in the drawings and discussed in the detailed description, many other combinations of the disclosed features are possible. Any feature or element of any embodiment may be used in combination with or instead of any other feature or element in any other embodiment, unless expressly limited otherwise.
The present application includes and contemplates combinations of features and elements known to those of ordinary skill in the art. The embodiments, features and elements disclosed in this application may also be combined with any conventional features or elements to form a unique inventive concept as defined by the claims. Any feature or element of any embodiment may also be combined with features or elements from other inventive aspects to form yet another unique inventive aspect, as defined by the claims. Thus, it should be understood that any of the features shown and/or discussed in this application may be implemented alone or in any suitable combination. Accordingly, the embodiments are not limited except as by the appended claims and their equivalents. Furthermore, various modifications and changes may be made within the scope of the appended claims.
Further, in describing representative embodiments, the specification may have presented the method and/or process as a particular sequence of steps. However, to the extent that the method or process does not rely on the particular order of steps set forth herein, the method or process should not be limited to the particular sequence of steps described. Other orders of steps are possible as will be understood by those of ordinary skill in the art. Therefore, the particular order of the steps set forth in the specification should not be construed as limitations on the claims. Further, the claims directed to the method and/or process should not be limited to the performance of their steps in the order written, and one skilled in the art can readily appreciate that the sequences may be varied and still remain within the spirit and scope of the embodiments of the present application.
The present application provides an image recognition method, as shown in fig. 1, the method may include steps S101-S105:
s101, acquiring an image to be identified;
s102, randomly cutting out image blocks with preset image sizes from the image to be recognized;
s103, inputting the image blocks into a pre-trained neural network classification model to obtain classification results of the image blocks; the classification result refers to that the image blocks are classified into one or more preset image types;
s104, determining a classification confidence coefficient according to the classification result; the classification confidence refers to the probability that the image block is classified into each image type;
s105, determining whether the current classification result is used as the final image recognition result of the corresponding image to be recognized according to the classification confidence; and when the current classification result cannot be used as the final image recognition result, obtaining a next image block again according to the feature map and a pre-established and trained positioning strategy network in an iterative calculation mode, and obtaining a next classification confidence coefficient according to the next image block until the current classification result is determined to be used as the final image recognition result of the corresponding image to be recognized according to the obtained classification confidence coefficient.
In an exemplary embodiment of the application, a neural network acceleration method based on a visual attention mechanism is provided, which may include inputting an original image to be recognized into a neural network (which may include a neural network classification model and a positioning policy network) after being randomly cropped, extracting image features, and generating a classification result and a classification confidence according to the image features; determining whether the current classification result is used as a final image recognition result or not according to the classification confidence, if the current classification result cannot be used as the final image recognition result, determining the central position of the next cut image according to the feature map of the image, and iteratively generating a classification result and a classification confidence until a high-confidence classification result is obtained; finally, the neural network is deployed for image automatic recognition.
In the exemplary embodiment of the application, the scheme of the embodiment of the application effectively solves the problems of large calculation amount, high test time delay, difficulty in efficient deployment on mobile and embedded devices with limited resources and the like of an automatic image classification method based on deep learning, so that a neural network can obtain a correct image classification result with smaller calculation amount and lower time delay, the inference process of the neural network on a mobile platform is accelerated, and the operation efficiency of a system is greatly improved. Compared with the traditional self-adaptive reasoning method, the method can obtain better acceleration effect, does not modify the structure of the neural network, and has wider applicability.
In an exemplary embodiment of the present application, the neural network classification model may include: a feature extraction network and a full connectivity layer;
the inputting the image block into a pre-trained neural network classification model, and obtaining the classification result of the image block may include:
inputting the image blocks into a pre-established and trained feature extraction network to obtain a feature map, inputting the feature map into a pre-established and trained full-connection layer, and obtaining the classification results of the image blocks.
In an exemplary embodiment of the present application, the obtaining a next image block again according to the feature map and a pre-established and trained positioning policy network in an iterative computation manner, and obtaining a next classification confidence according to the next image block until determining that the current classification result is used as a final image recognition result of the corresponding image to be recognized according to the obtained classification confidence may include steps a 1-D1:
a1, inputting the feature map obtained last time into a pre-established positioning strategy network, and obtaining the position normalization coordinates of the image block to be cut in the next step; cutting the next image block according to the position normalized coordinates of the image blocks;
b1, inputting the image blocks into a pre-established and trained feature extraction network to obtain a feature map, inputting the feature map into a pre-established and trained full connection layer to obtain the classification results of the image blocks; determining a classification confidence according to the classification result;
c1, determining whether the current classification result is used as a final image recognition result according to the classification confidence; if yes, go to step D1; if not, returning to the step A1;
d1, outputting the current classification result.
In an exemplary embodiment of the present application, the determining whether to use the current classification result as the final image recognition result of the corresponding image to be recognized according to the classification confidence may include:
when the classification confidence coefficient is larger than or equal to a preset threshold value, determining that the current classification result is used as a final image recognition result of the image to be recognized;
and when the classification confidence is smaller than the preset threshold, determining that the current classification result cannot be used as the final image recognition result of the image to be recognized.
In an exemplary embodiment of the present application, based on the above embodiment scheme, the detailed image automatic identification method may include:
1. For each test image x, randomly crop an image block x̃_0 of size H'×W'; obtain a feature map f_0 and a classification result c_0 through the feature extraction network g and the fully connected layer m. Let the classification confidence be s_0 = max c_0; when s_0 ≥ η, directly output c_0 as the recognition result, where η is a preset threshold with a value between 0 and 1.
2. When s_i < η, where i = 0, 1, ..., N-1, obtain the position of the image block to be processed in step i+1 through the positioning policy network p, crop out the image block x̃_{i+1}, and obtain the classification result c_{i+1} and the classification confidence s_{i+1} through the feature extraction network g and the fully connected layer m. When s_{i+1} ≥ η, output c_{i+1} as the recognition result; otherwise, repeat step 2.
In an exemplary embodiment of the present application, the method may further include: when the classification confidence obtained after N iteration cycles is still smaller than the preset threshold, taking the classification result obtained at the N-th iteration as the final image recognition result, where N is a positive integer and a preset iteration threshold.
In an exemplary embodiment of the present application, when the classification confidence s_N obtained at the N-th iteration satisfies s_N < η, c_N is output as the recognition result.
In the exemplary embodiment of the present application, before the image recognition by the above embodiment scheme, a neural network (a feature extraction network, a full connection layer, and a positioning strategy network) for classification may be constructed and trained in advance.
In an exemplary embodiment of the present application, the image data required for training may first be acquired as a training set D = {x_i}, where each element x_i of the training set may be a three-dimensional matrix of size A×H×W, each element representing a pixel value of an image; A represents the number of channels of the image, and H and W represent the height and width of the image, respectively. Each image x_i corresponds to a category label y_i, where y_i is an integer between 1 and K (assuming there are K classification categories in total, i.e., K possible classification recognition results) that marks the class to which x_i belongs; the label y_i may be given by manual annotation.
In an exemplary embodiment of the present application, a feature extraction network g and a full connectivity layer m may be established.
In an exemplary embodiment of the present application, the feature extraction network may include: a plurality of functional layers arranged according to the ResNet (residual neural network) rule or the DenseNet (densely connected neural network) rule. The parameter of the feature extraction network may be set to Θ_g, with f = g(x, Θ_g) representing the feature map obtained by feeding image x into the feature extraction network with parameter Θ_g; the feature map f is a three-dimensional matrix of size A_f×H_f×W_f, where A_f is the number of channels of the feature map, and H_f and W_f are the height and width of the feature map, respectively.
In an exemplary embodiment of the present application, the parameters of the fully connected layer may be denoted Θ_m, with c = m(f, Θ_m) representing the output obtained by inputting the feature map f into the fully connected layer with parameters Θ_m. The output c is a K × 1 vector, each element of which takes a value between 0 and 1, K being the total number of classification categories defined above.
In an exemplary embodiment of the present application, an image block x̃_i^(0) of size H′ × W′ may be randomly cropped from each image x_i in the training set D_train (the dimension of x_i may be A × H × W as defined above), where H′ < H and W′ < W; that is, random integers h_i0 and w_i0 are generated from the intervals [0, H − H′] and [0, W − W′], respectively.
In an exemplary embodiment of the present application, the original image x_i may be cropped according to the cropping formula to obtain an image block x̃_i^(j) of size H′ × W′, the vertical and horizontal coordinates of whose top-left corner are h_ij and w_ij, with values in the intervals [0, H − H′] and [0, W − W′], respectively.
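The random cropping step above can be sketched as follows. This is a minimal pure-Python illustration only; the helper name `random_crop` and the nested-list image layout are assumptions made for the example, not part of the patent:

```python
import random

def random_crop(image, crop_h, crop_w):
    """Randomly crop an A x H x W image (nested lists) to A x crop_h x crop_w.

    The top-left corner (h0, w0) is drawn uniformly from
    [0, H - crop_h] x [0, W - crop_w], so the H' x W' block
    always lies fully inside the image.
    """
    height, width = len(image[0]), len(image[0][0])
    h0 = random.randint(0, height - crop_h)
    w0 = random.randint(0, width - crop_w)
    return [[row[w0:w0 + crop_w] for row in channel[h0:h0 + crop_h]]
            for channel in image]

# A = 3 channels, H = W = 8, pixel value = row index
img = [[[r for _ in range(8)] for r in range(8)] for _ in range(3)]
patch = random_crop(img, 5, 5)
```

The same helper serves both the initial random block x̃_i^(0) and, with a fixed corner instead of a random one, the policy-directed crops of later iterations.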
In an exemplary embodiment of the present application, the method may further include: training the parameters Θ_g of the feature extraction network and the parameters Θ_m of the fully connected layer according to a first calculation formula:

(Θ̂_g, Θ̂_m) = argmin_{Θ_g, Θ_m} Σ_i −log[ m(g(x_i, Θ_g), Θ_m)_{y_i} ]

where log[·] denotes the logarithm function; argmin_{Θ_g, Θ_m} denotes the values of Θ_g and Θ_m at which the function attains its minimum; g(x_i, Θ_g) denotes the feature map obtained by inputting an arbitrary ith image x_i into the feature extraction network g(x, Θ_g) with parameters Θ_g; m(g(x_i, Θ_g), Θ_m)_{y_i} denotes the y_i-th element of the corresponding classification result m(g(x_i, Θ_g), Θ_m), y_i being the category label defined for image x_i; Θ̂_g and Θ̂_m denote the finally obtained optimized parameters; i is a positive integer.
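The per-image term of the first calculation formula is an ordinary negative log-likelihood: −log of the classification-result component at the true label. A minimal sketch, assuming raw scores are normalized by a softmax (the function names here are illustrative, not from the patent):

```python
import math

def softmax(logits):
    """Normalize raw scores to a K-vector with entries in (0, 1)."""
    top = max(logits)
    exps = [math.exp(v - top) for v in logits]
    total = sum(exps)
    return [v / total for v in exps]

def first_formula_loss(class_probs, label):
    """Per-image term of the first formula: -log of the component of the
    classification result at the true category label y_i (1-based).
    class_probs plays the role of m(g(x_i, Θg), Θm)."""
    return -math.log(class_probs[label - 1])

c = softmax([2.0, 0.5, 0.1])      # K = 3 classification categories
loss = first_formula_loss(c, 1)   # label y_i = 1
```

Summing this loss over the training set and minimizing over (Θ_g, Θ_m) with gradient descent recovers the first formula.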
In an exemplary embodiment of the present application, a three-dimensional matrix of dimension A × H′ × W′, i.e., the image block x̃_i^(0), may be input into the feature extraction network defined above to obtain the feature map f_i0 output by the feature extraction network, and the feature map f_i0 may then be input into the fully connected layer defined above to obtain the classification result output by the fully connected layer. The parameters Θ_g of the feature extraction network g and the parameters Θ_m of the fully connected layer m are trained according to the first calculation formula.
In an exemplary embodiment of the present application, the positioning policy network may include a plurality of convolutional layers and a fully connected layer, arranged in sequence.

In an exemplary embodiment of the present application, a positioning policy network p may be established, formed by a plurality of convolutional layers followed by a fully connected layer. Its parameters may be denoted Θ_p. The input of the positioning policy network p is the feature map f obtained by the feature extraction network defined above, of dimension A_f × H_f × W_f; the output of p is the normalized coordinates (h, w) of the position of the next image block to be cropped, a 2 × 1 vector in which each element takes a value between 0 and 1 and represents the position of the block's top-left corner as a proportion of the whole image.
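Turning the policy network's normalized output (h, w) into a pixel-level crop corner can be sketched as below. The scaling convention (mapping onto [0, H − H′] × [0, W − W′] so the crop stays inside the image) is an assumption for illustration; the patent states only that each element lies between 0 and 1:

```python
def denormalize_corner(h_norm, w_norm, full_h, full_w, crop_h, crop_w):
    """Map the 2 x 1 policy-network output (h, w), each in [0, 1], to the
    pixel coordinates of the next image block's top-left corner.

    Assumed convention: the normalized value scales over [0, H - H']
    (resp. [0, W - W']) so an H' x W' crop never leaves the image.
    """
    h0 = round(h_norm * (full_h - crop_h))
    w0 = round(w_norm * (full_w - crop_w))
    return h0, w0

corner = denormalize_corner(0.5, 1.0, full_h=224, full_w=224, crop_h=96, crop_w=96)
```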
In an exemplary embodiment of the present application, training the positioning strategy network may comprise steps A2-D2:
A2, acquiring the image data required for training to form a training set D_train, and labelling each image x_i in the training set D_train with its corresponding category label y_i.

B2, obtaining, through iterative calculation, the classification confidence sequence {s_i0, s_i1, ..., s_iN} of each image x_i, based on the training set D_train, the pre-established and trained feature extraction network, and the pre-established and trained fully connected layer.
In an exemplary embodiment of the present application, obtaining, through iterative calculation, the classification confidence sequence {s_i0, s_i1, ..., s_iN} of an image x_i based on the training set D_train, the pre-established and trained feature extraction network, and the pre-established and trained fully connected layer may comprise steps A3-D3:

A3, randomly cropping an image block x̃_i^(0) of the preset image size from an image x_i in the training set D_train, where i refers to the ith image, i being a positive integer no greater than the total number of images in the training set D_train; j refers to the jth iteration, j being an integer no greater than N, N being a preset iteration threshold.
B3, inputting the image block x̃_i^(j) into the pre-established and trained feature extraction network to obtain a feature map f_ij, and inputting the feature map f_ij into the pre-established and trained fully connected layer to obtain the classification result c_ij of the image block x̃_i^(j); determining the classification confidence s_ij according to the classification result c_ij.

C3, detecting whether j is equal to N. When j is equal to N, the classification confidence sequence {s_i0, s_i1, ..., s_iN} is obtained; when j is not equal to N, proceed to step D3.

D3, inputting the feature map f_ij obtained in the previous step into the positioning policy network p to obtain the normalized position coordinates of the image block required for the (j+1)th iteration; cropping from the original image x_i according to the normalized position coordinates to obtain the image block x̃_i^(j+1) to be processed in the (j+1)th iteration; updating the current image block with the image block x̃_i^(j+1); and returning to step C3.
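The iterative loop of steps A3-D3 (and of the recognition procedure itself) can be sketched as follows. Here `classify`, `locate`, and `crop` are stand-ins for the trained feature-extraction-plus-fully-connected networks, the positioning policy network p, and the cropping step; the names and the max-probability confidence rule are assumptions made for this example:

```python
import random

def adaptive_inference(image, classify, locate, crop, eta, max_iters):
    """Classify a randomly placed first block, then let the positioning
    policy pick each subsequent block, stopping early once the
    classification confidence reaches eta or max_iters blocks are used.
    Returns the predicted class index and the confidence sequence."""
    h, w = random.random(), random.random()   # first block: random position
    confidences = []
    for _ in range(max_iters):
        feature, probs = classify(crop(image, h, w))
        conf = max(probs)                     # classification confidence s_ij
        confidences.append(conf)
        if conf >= eta:                       # confident enough: stop early
            break
        h, w = locate(feature)                # policy net proposes next block
    return probs.index(max(probs)), confidences

# Toy stubs: confidence grows as more blocks are seen
steps = iter([[0.4, 0.6], [0.1, 0.9]])
label, seq = adaptive_inference(
    image=None,
    classify=lambda block: ("feature", next(steps)),
    locate=lambda feature: (0.5, 0.5),
    crop=lambda img, h, w: img,
    eta=0.8,
    max_iters=5,
)
```

With these stubs the loop exits after two blocks, since the second confidence (0.9) exceeds η = 0.8.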
C2, calculating, according to the classification confidence sequence {s_i0, s_i1, ..., s_iN}, the confidence increment Δs_{i,t+1} between two adjacent iterations, where Δs_{i,t+1} = s_{i,t+1} − s_{i,t}.

D2, training the parameters Θ_p of the positioning policy network according to the confidence increment Δs_{i,t+1} and a preset second calculation formula.
In an exemplary embodiment of the present application, the second calculation formula may include:

Θ̂_p = argmin_{Θ_p} E[ −Σ_{t=1}^{N} γ^{t−1} Δs_{i,t} ]

where Θ_p denotes the parameters of the policy network p; argmin_{Θ_p} denotes the value of Θ_p at which the function attains its minimum; E(·) denotes the mathematical expectation; γ is a predefined discount rate parameter taking a value between 0 and 1; t is an integer no greater than N, N being the preset iteration threshold; Δs_{i,t} = s_{i,t} − s_{i,t−1} refers to the difference between the tth and (t−1)th classification confidences in the classification confidence sequence {s_i0, s_i1, ..., s_iN} corresponding to image x_i; Θ̂_p denotes the resulting policy network parameters.
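For a single image, the quantity minimized by the second calculation formula is the negative discounted sum of confidence increments. A minimal sketch; placing the discount as γ^(t−1) on Δs_t is an assumption of this illustration, since the patent states only that γ lies between 0 and 1:

```python
def policy_objective(confidences, gamma):
    """Per-image value minimized by the second formula: the negative
    discounted sum of confidence increments Δs_t = s_t - s_{t-1},
    t = 1..N, with discount gamma ** (t-1) (assumed placement)."""
    increments = [b - a for a, b in zip(confidences, confidences[1:])]
    return -sum(gamma ** t * delta for t, delta in enumerate(increments))

obj = policy_objective([0.2, 0.5, 0.9], gamma=0.5)   # Δs = [0.3, 0.4]
```

Minimizing this objective rewards the policy network for choosing image blocks that raise the classification confidence quickly, with earlier gains weighted more heavily.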
In an exemplary embodiment of the present application, for each image x_i in the training set D_train with corresponding category label y_i, steps A3 and B3 are performed to obtain the corresponding feature map f_i0 and classification result c_i0. The classification confidence s_i0 is taken as the component of c_i0 corresponding to the category label y_i, i.e., s_i0 = (c_i0)_{y_i}.
In an exemplary embodiment of the present application, f_{i,j−1} may denote the feature map obtained from the feature extraction network g in the previous step. The feature map f_{i,j−1} is input into the positioning policy network p to obtain the normalized position coordinates of the image block required in step j: (h_ij, w_ij) = p(f_{i,j−1}, Θ_p). According to the normalized coordinates (h_ij, w_ij), the image block x̃_i^(j) to be processed in step j is cropped from the original image x_i; it is of size H′ × W′, and the vertical and horizontal coordinates of its top-left corner are h_ij and w_ij, respectively. The image block x̃_i^(j) may then be input into the feature extraction network g and the fully connected layer m to obtain the feature map and classification result: f_ij = g(x̃_i^(j), Θ_g), c_ij = m(f_ij, Θ_m); the classification confidence is s_ij = (c_ij)_{y_i}.
In an exemplary embodiment of the present application, the previous step is repeated for N rounds, where N is a pre-specified parameter that may typically be set to 5, so as to obtain the classification confidence sequence {s_i0, s_i1, ..., s_iN}.
In an exemplary embodiment of the present application, the confidence increment between two adjacent steps may be Δs_{i,t+1} = s_{i,t+1} − s_{i,t}. The positioning policy network p is trained on the training set D_train by solving the following problem:

Θ̂_p = argmin_{Θ_p} E[ −Σ_{t=1}^{N} γ^{t−1} Δs_{i,t} ]

where γ is a predefined discount rate parameter taking a value between 0 and 1. The optimal positioning policy network parameters Θ̂_p are obtained by minimizing the above formula.
In an exemplary embodiment of the present application, the trained feature extraction network, fully connected layer, and positioning policy network are thus obtained; based on these trained neural networks, automatic image recognition can be realized according to steps S101 to S105.
The embodiments of the present application include at least the following advantages:
1. The solution of the embodiments of the present application effectively addresses the large computation cost, high inference latency, and difficulty of efficient deployment on resource-limited mobile and embedded devices that affect deep-learning-based automatic image classification, so that the neural network can obtain a correct classification result with less computation and lower latency, accelerating inference of the neural network model on mobile platforms and greatly improving the operating efficiency of the system. In addition, compared with other adaptive inference methods, this method does not modify the structure or parameters of the neural network, is compatible with other network pruning, knowledge distillation, and weight quantization methods, and is efficient and easy to use.
2. The huge computing resource consumption of deep convolutional neural networks hinders deployment of network models on practical systems such as mobile devices. In view of this, the embodiments of the present application use an adaptive inference method to dynamically determine how much input the network processes for each image (i.e., the number of image blocks), and thus adaptively determine the computation spent on each input image, achieving the same accuracy with smaller computational overhead and greatly reducing the deployment cost of deep convolutional neural networks on devices with limited computing resources.
3. Other adaptive inference methods for neural network acceleration tend to modify the network model directly, for example by embedding a multi-stage classifier in the network. The embodiments of the present application introduce an adaptive inference method based on a visual attention mechanism, which takes smaller image blocks cropped from the original image as input and realizes adaptive inference by letting a decision network select the number of image blocks, and thereby the amount of computation, without modifying the structure of the neural network. The embodiments of the present application are also compatible with other neural network compression and acceleration methods, such as network pruning and knowledge distillation, further improving the acceleration effect of the network model.
The embodiment of the present application further provides an image recognition apparatus 1, as shown in fig. 2, which may include a processor 11 and a computer-readable storage medium 12, where the computer-readable storage medium 12 stores instructions, and when the instructions are executed by the processor 11, the image recognition method described in any one of the above is implemented.
It will be understood by those of ordinary skill in the art that all or some of the steps of the methods, systems, functional modules/units in the devices disclosed above may be implemented as software, firmware, hardware, and suitable combinations thereof. In a hardware implementation, the division between functional modules/units mentioned in the above description does not necessarily correspond to the division of physical components; for example, one physical component may have multiple functions, or one function or step may be performed by several physical components in cooperation. Some or all of the components may be implemented as software executed by a processor, such as a digital signal processor or microprocessor, or as hardware, or as an integrated circuit, such as an application specific integrated circuit. Such software may be distributed on computer readable media, which may include computer storage media (or non-transitory media) and communication media (or transitory media). The term computer storage media includes volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as computer readable instructions, data structures, program modules or other data, as is well known to those of ordinary skill in the art. Computer storage media includes, but is not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, Digital Versatile Disks (DVD) or other optical disk storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to store the desired information and which can be accessed by a computer. In addition, communication media typically embodies computer readable instructions, data structures, program modules or other data in a modulated data signal such as a carrier wave or other transport mechanism and includes any information delivery media as known to those skilled in the art.

Claims (10)

1. An image recognition method, characterized in that the method comprises:
acquiring an image to be identified;
randomly cutting out image blocks with a preset image size from the image to be recognized;
inputting the image blocks into a pre-trained neural network classification model to obtain classification results of the image blocks; the classification result refers to that the image blocks are classified into one or more preset image types;
determining a classification confidence according to the classification result; the classification confidence refers to the probability that the image block is classified into each image type;
determining whether the current classification result is used as the final image recognition result of the corresponding image to be recognized or not according to the classification confidence; and when the current classification result cannot be used as the final image recognition result, obtaining a next image block again according to the feature map and a pre-established and trained positioning strategy network in an iterative calculation mode, and obtaining a next classification confidence coefficient according to the next image block until the current classification result is determined to be used as the final image recognition result of the corresponding image to be recognized according to the obtained classification confidence coefficient.
2. The image recognition method of claim 1, wherein the neural network classification model comprises: a feature extraction network and a full connectivity layer;
the inputting the image blocks into a pre-trained neural network classification model, and the obtaining of the classification results of the image blocks comprises:
inputting the image blocks into a pre-established and trained feature extraction network to obtain a feature map, inputting the feature map into a pre-established and trained full-connection layer, and obtaining the classification results of the image blocks.
3. The image recognition method according to claim 1, wherein the determining whether to take the current classification result as the final image recognition result of the corresponding image to be recognized according to the classification confidence comprises:
when the classification confidence coefficient is larger than or equal to a preset threshold value, determining that the current classification result is used as a final image recognition result of the image to be recognized;
and when the classification confidence is smaller than the preset threshold, determining that the current classification result cannot be used as the final image recognition result of the image to be recognized.
4. The image recognition method according to claim 2, wherein the obtaining a next image block again according to the feature map and a pre-established and trained positioning strategy network in an iterative computation manner, and obtaining a next classification confidence according to the next image block until determining that a current classification result is a final image recognition result of a corresponding image to be recognized according to the obtained classification confidence comprises:
41. inputting the feature map obtained last time into a pre-established positioning strategy network, and obtaining the position normalization coordinates of the image block to be cut in the next step; cutting the next image block according to the position normalized coordinates of the image blocks;
42. inputting the image blocks into a pre-established and trained feature extraction network to obtain a feature map, inputting the feature map into a pre-established and trained full-connection layer, and obtaining classification results of the image blocks; determining a classification confidence according to the classification result;
43. determining whether the current classification result is used as a final image recognition result or not according to the classification confidence; if yes, go to step 44; if not, returning to the step 41;
44. and outputting the current classification result.
5. The image recognition method according to claim 2 or 4,
the feature extraction network includes: a plurality of function layers arranged according to a residual neural network ResNet rule or a closely connected neural network DenseNet rule; and/or the presence of a gas in the gas,
the positioning policy network comprises: a plurality of convolutional layers and a fully-connected layer, the convolutional layers and the fully-connected layer being sequentially arranged.
6. The image recognition method of claim 5, further comprising: training the parameters Θ_g of the feature extraction network and the parameters Θ_m of the fully connected layer according to a first calculation formula:

(Θ̂_g, Θ̂_m) = argmin_{Θ_g, Θ_m} Σ_i −log[ m(g(x_i, Θ_g), Θ_m)_{y_i} ]

wherein log[·] denotes the logarithm function; argmin_{Θ_g, Θ_m} denotes the values of Θ_g and Θ_m at which the function attains its minimum; g(x_i, Θ_g) denotes the feature map obtained by inputting an arbitrary ith image x_i into the feature extraction network g(x, Θ_g) with parameters Θ_g; m(g(x_i, Θ_g), Θ_m)_{y_i} denotes the y_i-th element of the corresponding classification result m(g(x_i, Θ_g), Θ_m), y_i being the category label defined for image x_i; Θ̂_g and Θ̂_m denote the finally obtained optimized parameters; and i is a positive integer.
7. The image recognition method of claim 5, wherein training the positioning strategy network comprises:

acquiring the image data required for training to form a training set D_train, and labelling each image x_i in the training set D_train with its corresponding category label y_i;

obtaining, through iterative calculation, the classification confidence sequence {s_i0, s_i1, ..., s_iN} of each image x_i, based on the training set D_train, the pre-established and trained feature extraction network, and the pre-established and trained fully connected layer;

calculating, according to the classification confidence sequence {s_i0, s_i1, ..., s_iN}, the confidence increment Δs_{i,t+1} between two adjacent iterations, where Δs_{i,t+1} = s_{i,t+1} − s_{i,t}; and

training the parameters Θ_p of the positioning strategy network according to the confidence increment Δs_{i,t+1} and a preset second calculation formula.
8. The image recognition method of claim 7, wherein obtaining, through iterative calculation, the classification confidence sequence {s_i0, s_i1, ..., s_iN} of an image x_i based on the training set D_train, the pre-established and trained feature extraction network, and the pre-established and trained fully connected layer comprises:

81. randomly cropping an image block x̃_i^(0) of the preset image size from an image x_i in the training set D_train, where i refers to the ith image, i being a positive integer no greater than the total number of images in the training set D_train; j refers to the jth iteration, j being an integer no greater than N, N being a preset iteration threshold;

82. inputting the image block x̃_i^(j) into the pre-established and trained feature extraction network to obtain a feature map f_ij, and inputting the feature map f_ij into the pre-established and trained fully connected layer to obtain the classification result c_ij of the image block x̃_i^(j); determining the classification confidence s_ij according to the classification result c_ij;

83. detecting whether j is equal to N; when j is equal to N, obtaining the classification confidence sequence {s_i0, s_i1, ..., s_iN}; when j is not equal to N, proceeding to step 84;

84. inputting the feature map f_ij obtained in the previous step into the positioning policy network p to obtain the normalized position coordinates of the image block required for the (j+1)th iteration; cropping from the original image x_i according to the normalized position coordinates to obtain the image block x̃_i^(j+1) to be processed in the (j+1)th iteration; updating the current image block with the image block x̃_i^(j+1); and returning to step 83.
9. The image recognition method according to claim 7, wherein the second calculation formula comprises:

Θ̂_p = argmin_{Θ_p} E[ −Σ_{t=1}^{N} γ^{t−1} Δs_{i,t} ]

wherein Θ_p denotes the parameters of the policy network p; argmin_{Θ_p} denotes the value of Θ_p at which the function attains its minimum; E(·) denotes the mathematical expectation; γ is a predefined discount rate parameter taking a value between 0 and 1; t is an integer no greater than N, N being a preset iteration threshold; Δs_{i,t} = s_{i,t} − s_{i,t−1} refers to the difference between the tth and (t−1)th classification confidences in the classification confidence sequence {s_i0, s_i1, ..., s_iN} corresponding to image x_i; and Θ̂_p denotes the resulting policy network parameters.
10. An image recognition apparatus comprising a processor and a computer-readable storage medium having instructions stored therein, wherein the instructions, when executed by the processor, implement the image recognition method of any one of claims 1-9.
CN202011553934.XA 2020-12-24 2020-12-24 Image identification method and device Pending CN112598062A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011553934.XA CN112598062A (en) 2020-12-24 2020-12-24 Image identification method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011553934.XA CN112598062A (en) 2020-12-24 2020-12-24 Image identification method and device

Publications (1)

Publication Number Publication Date
CN112598062A true CN112598062A (en) 2021-04-02

Family

ID=75202581

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011553934.XA Pending CN112598062A (en) 2020-12-24 2020-12-24 Image identification method and device

Country Status (1)

Country Link
CN (1) CN112598062A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113095307A (en) * 2021-06-09 2021-07-09 国网浙江省电力有限公司 Automatic identification method for financial voucher information
CN115546672A (en) * 2022-11-30 2022-12-30 广州天地林业有限公司 Forest picture processing method and system based on image processing
CN116758359A (en) * 2023-08-16 2023-09-15 腾讯科技(深圳)有限公司 Image recognition method, device and electronic equipment

Citations (2)

Publication number Priority date Publication date Assignee Title
CN111027605A (en) * 2019-11-28 2020-04-17 北京影谱科技股份有限公司 Fine-grained image recognition method and device based on deep learning
CN111460862A (en) * 2019-01-21 2020-07-28 中科星图股份有限公司 Neural network-based remote sensing image ground object extraction method and system

Patent Citations (2)

Publication number Priority date Publication date Assignee Title
CN111460862A (en) * 2019-01-21 2020-07-28 中科星图股份有限公司 Neural network-based remote sensing image ground object extraction method and system
CN111027605A (en) * 2019-11-28 2020-04-17 北京影谱科技股份有限公司 Fine-grained image recognition method and device based on deep learning

Non-Patent Citations (2)

Title
YULIN WANG ET AL.: "Glance and Focus: a Dynamic Approach to Reducing Spatial Redundancy in Image Classification", 《ARXIV》 *
LIU DONG ET AL.: "A Survey of Deep Learning and Its Application in Image Object Classification and Detection", 《COMPUTER SCIENCE》 *

Cited By (4)

Publication number Priority date Publication date Assignee Title
CN113095307A (en) * 2021-06-09 2021-07-09 国网浙江省电力有限公司 Automatic identification method for financial voucher information
CN115546672A (en) * 2022-11-30 2022-12-30 广州天地林业有限公司 Forest picture processing method and system based on image processing
CN115546672B (en) * 2022-11-30 2023-03-24 广州天地林业有限公司 Forest picture processing method and system based on image processing
CN116758359A (en) * 2023-08-16 2023-09-15 腾讯科技(深圳)有限公司 Image recognition method, device and electronic equipment

Similar Documents

Publication Publication Date Title
CN109978142B (en) Neural network model compression method and device
CN106599900B (en) A method and apparatus for identifying character strings in images
WO2022006919A1 (en) Activation fixed-point fitting-based method and system for post-training quantization of convolutional neural network
WO2022252455A1 (en) Methods and systems for training graph neural network using supervised contrastive learning
CN109086722B (en) Hybrid license plate recognition method and device and electronic equipment
CN113128478B (en) Model training method, pedestrian analysis method, device, equipment and storage medium
CN109840531A (en) The method and apparatus of training multi-tag disaggregated model
CN115426671A (en) Method, system and equipment for graph neural network training and wireless cell fault prediction
CN107292458B (en) Prediction method and prediction device applied to neural network chip
CN112183742A (en) Neural network hybrid quantization method based on progressive quantization and Hessian information
CN112598062A (en) Image identification method and device
CN113825148B (en) Method, device and computing equipment for determining network node alarm level
CN115018039A (en) Neural network distillation method, target detection method and device
CN110298394B (en) Image recognition method and related device
CN113516163B (en) Vehicle classification model compression method, device and storage medium based on network pruning
CN114332500A (en) Image processing model training method and device, computer equipment and storage medium
CN117421657B (en) A noisy labeled image sample screening learning method and system based on oversampling strategy
CN113627537A (en) Image identification method and device, storage medium and equipment
CN112464057A (en) Network data classification method, device, equipment and readable storage medium
JP6107531B2 (en) Feature extraction program and information processing apparatus
CN112132167A (en) Image generation and neural network training method, apparatus, device, and medium
CN118334323B (en) Insulator detection method and system based on ultraviolet image
CN116188878A (en) Image classification method, device and storage medium based on fine-tuning of neural network structure
CN111178039A (en) Model training method and device, and method and device for realizing text processing
CN114492783B (en) A pruning method and device for a multi-task neural network model

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication
RJ01 Rejection of invention patent application after publication

Application publication date: 20210402