CN112396123A

CN112396123A - Image recognition method, system, terminal and medium based on convolutional neural network

Info

Publication number: CN112396123A
Application number: CN202011382932.9A
Authority: CN
Inventors: 方堃; 杨杰
Original assignee: Shanghai Jiao Tong University
Current assignee: Shanghai Jiao Tong University
Priority date: 2020-11-30
Filing date: 2020-11-30
Publication date: 2021-02-23

Abstract

The invention discloses an image recognition method, system, terminal and medium based on a convolutional neural network. The method comprises: training, with training images, a convolutional neural network model that performs an image recognition task; and inputting an image to be recognized into the convolutional neural network model and outputting an image recognition result. The convolutional neural network model comprises a convolutional neural network in which an orthogonal multi-path block is embedded. The orthogonal multi-path block comprises a plurality of paths whose parameters are mutually orthogonal, which improves the robustness of the convolutional neural network. The method addresses the fragile robustness of common neural networks in image recognition tasks, achieving high model robustness while maintaining high recognition accuracy.

Description

Image recognition method, system, terminal and medium based on convolutional neural network

Technical Field

The invention belongs to the technical field of image processing and pattern recognition, and particularly relates to an image recognition method, an image recognition system, a terminal and a medium based on a convolutional neural network.

Background

In the field of image processing and pattern recognition, one of the most common tasks is image recognition. A classical image recognition dataset such as CIFAR10 contains images from 10 categories: airplanes, cars, birds, cats, deer, dogs, frogs, horses, ships and trucks, while larger datasets such as IMAGENET contain up to 2000 categories and 15 million images in total. Image recognition is essentially a classification task: researchers need an effective classifier that assigns each image to the true category to which it belongs. Early research on image recognition relied on simple classical image processing methods such as Gaussian blur and feature pyramid extraction; even when these methods were combined with prior knowledge, the resulting image recognition methods had only limited performance.

In recent years, with the advent of large-scale datasets and the growing computing power of graphics processing units, neural network models have been applied ever more widely in scientific research fields such as computer vision, natural language processing and recommendation systems, owing to their powerful learning capabilities. Since neural network models were introduced, image recognition has also developed rapidly: network structures for image recognition have evolved from the early multi-layer perceptron (MLP) to cascaded convolutional neural networks (CNN) and then to residual networks (ResNet) with residual connections, and network depth has grown from shallow 5-layer networks to residual networks as deep as 152 layers. On CIFAR10 and IMAGENET, researchers keep developing novel structures and deeper networks, repeatedly setting new recognition-accuracy records on these datasets.

At present, in engineering practice, it is not complicated to train a well-performing image classifier based on a convolutional neural network model. However, researchers have found that the generalization performance of neural networks is very fragile in certain situations. Take image recognition as an example: a fully trained network already generalizes well, i.e., it achieves a high recognition rate on the training data and good recognition accuracy on unseen test data. Yet if carefully designed modifications are made to images in the training or test data, for example adding a small amount of noise or even changing individual pixels, the modified images remain visually indistinguishable from the originals and humans still classify them correctly, while the neural network assigns them wrong categories with very high confidence. Such modified images are called adversarial examples, and the process of generating them is called an adversarial attack. How well a neural network recognizes adversarial examples has motivated research on neural network robustness, which also helps to explore the nature of neural networks and is therefore of great significance.
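To make the idea of an adversarial attack concrete, the following sketch applies the fast gradient sign method (FGSM), a well-known attack that this document does not itself specify, to a toy linear softmax classifier. The weights, the input and the step size `eps` are hypothetical, and pixel values are assumed to lie in [0, 1]. Each input component moves by at most `eps` in the direction that increases the cross-entropy loss of the true label.

```python
import math


def softmax(z):
    """Numerically stable softmax of a list of logits."""
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def logits(W, b, x):
    """Toy linear classifier: z_k = W[k] . x + b[k]."""
    return [sum(w * xj for w, xj in zip(row, x)) + bk for row, bk in zip(W, b)]


def sign(g):
    return (g > 0) - (g < 0)


def fgsm(W, b, x, y, eps, lo=0.0, hi=1.0):
    """One FGSM step: move x by eps along the sign of d(cross-entropy)/dx."""
    p = softmax(logits(W, b, x))
    # For softmax cross-entropy, d loss / d z_k = p_k - 1[k == y].
    dz = [pk - (1.0 if k == y else 0.0) for k, pk in enumerate(p)]
    # Chain rule through the linear layer: d loss / d x_j = sum_k dz_k * W[k][j].
    grad = [sum(dz[k] * W[k][j] for k in range(len(W))) for j in range(len(x))]
    # Perturb, then clip back into the valid pixel range [lo, hi].
    return [min(hi, max(lo, xj + eps * sign(gj))) for xj, gj in zip(x, grad)]
```

For example, with identity weights, zero bias and x = [0.6, 0.4] (predicted class 0), eps = 0.15 moves x to about [0.45, 0.55], which the same classifier assigns to class 1.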

Disclosure of Invention

Aiming at the problem that convolutional neural networks are generally fragile (lack robustness) in image recognition tasks, the invention provides an image recognition method, system, terminal and medium based on a convolutional neural network.

In a first aspect of the present invention, an image recognition method based on a convolutional neural network is provided, including:

training a convolutional neural network model for executing an image recognition task by adopting a training image;

inputting an image to be identified into the convolutional neural network model, and outputting an image identification result;

wherein the convolutional neural network model comprises a convolutional neural network in which an orthogonal multi-path block is embedded; the orthogonal multi-path block comprises a plurality of paths whose parameters are mutually orthogonal, thereby improving the robustness of the convolutional neural network.

Optionally, training the convolutional neural network model that performs the image recognition task includes:

S11, acquiring a batch of training images with category labels;

S12, initializing a convolutional neural network and embedding an orthogonal multi-path block in it to increase its robustness;

S13, randomly selecting a mini-batch of images from all the images acquired in S11 and inputting it into the convolutional neural network, wherein each path of the orthogonal multi-path block outputs a predicted image category for the images;

S14, for each path, calculating the difference between the predicted image category and the true category of the images, and taking a weighted average of the differences over all paths;

S15, updating the network parameters by gradient descent according to the calculated average difference;

S16, repeating S13 to S15 until the average difference converges, or until a preset, sufficiently large number of repetitions is reached, thereby obtaining a trained neural network model.
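Step S14 leaves the "difference" and the weights open. The sketch below assumes the difference is the softmax cross-entropy averaged over the mini-batch and, by default, gives every path the same weight; `cross_entropy`, `multipath_loss` and the nested-list layout of `path_logits` are illustrative choices, not part of the original method.

```python
import math


def cross_entropy(logits, y):
    """Softmax cross-entropy of one logit vector against integer label y."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum - logits[y]


def multipath_loss(path_logits, labels, weights=None):
    """Weighted average over paths of each path's mean mini-batch loss.

    path_logits[i][n] is the logit vector produced by path i for sample n.
    """
    num_paths = len(path_logits)
    if weights is None:
        weights = [1.0 / num_paths] * num_paths  # assumption: uniform weights
    per_path = []
    for logits_i in path_logits:
        batch = [cross_entropy(z, y) for z, y in zip(logits_i, labels)]
        per_path.append(sum(batch) / len(batch))
    # Normalise by the weight sum so any non-negative weights form an average.
    return sum(w * l for w, l in zip(weights, per_path)) / sum(weights)
```

The returned scalar is what S15 would differentiate; with a deep-learning framework the same averaging would be written on tensors so that autograd supplies the gradients.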

Optionally, the orthogonal multi-path block may be embedded at any position in the convolutional neural network; the specific position is determined by the requirements of the application.

Optionally, the orthogonal multi-path block is embedded at the last linear layer of the convolutional neural network; each path in the block is then a linear layer, the linear-layer parameters on the paths are mutually orthogonal, and the linear layers share the preceding layers of the network.
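A minimal sketch of the last-linear-layer variant, using plain Python lists: every path is a weight matrix `W` and bias `b` applied to the same shared feature vector `h` produced by the preceding layers. Treating each path's weight matrix as one flattened vector and measuring orthogonality with the Frobenius inner product is an assumption; the document only states that the parameters on the paths are mutually orthogonal. All function names are hypothetical.

```python
def linear(W, b, h):
    """One path g^i: a linear layer applied to the shared features h."""
    return [sum(w * v for w, v in zip(row, h)) + bk for row, bk in zip(W, b)]


def argmax(v):
    """Index of the largest entry (the first one on ties)."""
    return max(range(len(v)), key=v.__getitem__)


def omp_head_forward(h, paths):
    """Run every path of the block on the same features h.

    paths is a list of (W, b) pairs, one per path; returns the per-path
    logits and the per-path predicted class indices.
    """
    logits = [linear(W, b, h) for W, b in paths]
    preds = [argmax(z) for z in logits]
    return logits, preds


def frobenius_inner(Wa, Wb):
    """Inner product of two weight matrices viewed as flattened vectors."""
    return sum(a * b for ra, rb in zip(Wa, Wb) for a, b in zip(ra, rb))
```

Because all paths read the same `h`, the layers before the block receive gradients from every path at once, which is how the rest of the network is pushed to suit several mutually orthogonal classifiers.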

Optionally, the orthogonal multi-path block is embedded at a convolutional layer of the convolutional neural network; each path in the block is then a convolutional layer, the convolutional-layer parameters on the paths are mutually orthogonal, and the convolutional layers share the rest of the network.
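For the convolutional variant, one plausible reading, assumed here, is that each path's kernel tensor (output channels × input channels × height × width) is flattened into a single vector and the vectors of any two paths must have zero inner product. The sketch uses nested lists and hypothetical helper names to check that condition.

```python
from itertools import combinations


def flatten(t):
    """Recursively flatten a nested list such as an out x in x k x k kernel."""
    if isinstance(t, (int, float)):
        return [float(t)]
    return [v for sub in t for v in flatten(sub)]


def pairwise_inner_products(path_kernels):
    """Inner product of the flattened kernels for every unordered pair of paths."""
    vecs = [flatten(k) for k in path_kernels]
    return {
        (i, j): sum(a * b for a, b in zip(vecs[i], vecs[j]))
        for i, j in combinations(range(len(vecs)), 2)
    }


def are_orthogonal(path_kernels, tol=1e-8):
    """True when every pair of paths has (numerically) zero inner product."""
    return all(abs(v) <= tol for v in pairwise_inner_products(path_kernels).values())
```

In training, the orthogonality is encouraged by a penalty rather than checked exactly, so a tolerance such as `tol` is needed when inspecting a trained block.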

Optionally, inputting the image to be recognized into the convolutional neural network model and outputting the image recognition result includes:

S21, deploying the convolutional neural network model on a business machine;

S22, inputting the image to be recognized into the convolutional neural network model, wherein each path in the model outputs a prediction result for the image;

S23, taking the prediction result that occurs most often among the path predictions as the final prediction result for the image.
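Step S23 is a majority vote over the path predictions. A minimal sketch using Python's `collections.Counter` follows; the document does not say how ties are broken, so this sketch assumes that, among tied classes, the one that appears first in the list of path predictions wins.

```python
from collections import Counter


def majority_vote(path_predictions):
    """Return the most frequent class among the per-path predictions.

    Ties go to the class encountered first, because Counter.most_common
    orders elements with equal counts by first occurrence.
    """
    if not path_predictions:
        raise ValueError("need at least one path prediction")
    return Counter(path_predictions).most_common(1)[0][0]
```

For instance, predictions `[3, 1, 3, 2]` from four paths yield class 3.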

Optionally, the method further comprises performing preprocessing and/or image enhancement operations on the training images and the images to be recognized before training and recognition, wherein:

the preprocessing comprises scaling all images to the same size and normalizing the image pixel values;

the image enhancement operation comprises padding the image edges with zero-valued pixels, then cropping the image, and randomly flipping it horizontally.
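A minimal sketch of these operations on a single-channel image stored as a list of rows. Resizing to a common size is omitted (the input is assumed to be already that size). Dividing by 255 as the normalization, padding 4 pixels by default and cropping back to the original size are assumptions typical of CIFAR10-style training, not values given in the document.

```python
import random


def normalize(img, scale=255.0):
    """Map 8-bit pixel values into [0, 1]."""
    return [[p / scale for p in row] for row in img]


def pad_zeros(img, pad):
    """Surround the image with `pad` rows/columns of zero-valued pixels."""
    width = len(img[0]) + 2 * pad
    top = [[0.0] * width for _ in range(pad)]
    body = [[0.0] * pad + list(row) + [0.0] * pad for row in img]
    bottom = [[0.0] * width for _ in range(pad)]
    return top + body + bottom


def random_crop(img, out_h, out_w, rng):
    """Cut an out_h x out_w window at a random offset."""
    top = rng.randint(0, len(img) - out_h)
    left = rng.randint(0, len(img[0]) - out_w)
    return [row[left:left + out_w] for row in img[top:top + out_h]]


def random_hflip(img, rng, p=0.5):
    """Mirror every row with probability p."""
    return [row[::-1] for row in img] if rng.random() < p else img


def augment(img, pad=4, rng=None):
    """Zero-pad, crop back to the original size, then maybe flip horizontally."""
    rng = random.Random() if rng is None else rng
    h, w = len(img), len(img[0])
    cropped = random_crop(pad_zeros(img, pad), h, w, rng)
    return random_hflip(cropped, rng)
```

Consistent with the description, images to be recognized would go through `normalize` only; `augment` is applied to training images.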

In a second aspect of the present invention, there is provided an image recognition system based on a convolutional neural network, comprising:

a training module for training a convolutional neural network model for executing an image recognition task by using a training image;

a recognition module for inputting an image to be recognized into the convolutional neural network model and outputting an image recognition result.

In a third aspect of the present invention, there is provided an electronic terminal comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor executes the computer program to perform the image recognition method.

In a fourth aspect of the invention, a computer-readable storage medium is provided, on which a computer program is stored, which program, when being executed by a processor, is adapted to carry out the above-mentioned image recognition method.

Compared with the prior art, the embodiment of the invention has at least one of the following beneficial effects:

the embodiment of the invention addresses the fragile robustness of common neural networks in image recognition tasks, achieving high model robustness while maintaining high image recognition accuracy;

in the embodiment of the invention, an orthogonality constraint is applied to the parameters on each path of the orthogonal multi-path block, so the rest of the neural network must adapt simultaneously to mutually orthogonal paths. The convolutional neural network therefore learns more stable features and maintains higher recognition accuracy on maliciously modified images, which enhances its robustness;

the embodiment of the invention studies how placing the orthogonal multi-path block at different positions in a convolutional neural network affects network robustness. Different positions yield different robustness characteristics, which can guide the deployment of the convolutional neural network model for image recognition under the requirements of different business scenarios.

Drawings

Other features, objects and advantages of the invention will become more apparent upon reading of the detailed description of non-limiting embodiments with reference to the following drawings:

FIG. 1 is a flow chart of a method according to an embodiment of the present invention;

FIG. 2 is a flow chart of a training process according to an embodiment of the present invention.

FIG. 3 is a flow chart of a testing process according to an embodiment of the present invention.

Fig. 4 is a partial comparison diagram of a conventional network and a network in which orthogonal multipath blocks are embedded in the present invention.

Fig. 5a, 5b, and 5c are schematic diagrams illustrating the embedding of orthogonal multipath blocks at different positions in a neural network according to an embodiment of the present invention.

FIG. 6 is a flowchart illustrating a method implementation of an embodiment of the invention.

Fig. 7 is a schematic diagram of a deployment manner of a specific application scenario according to an embodiment of the present invention.

Detailed Description

The present invention will be described in detail with reference to specific examples. The following examples will help those skilled in the art to further understand the invention but do not limit it in any way. It should be noted that those skilled in the art can make variations and modifications without departing from the concept of the invention, and all such variations and modifications fall within the scope of protection of the present invention.

Referring to fig. 1, in the image recognition method in the embodiment of the present invention, a convolutional neural network model is used as an image classifier, an image to be classified is input, and a category of the image is output. Specifically, an image recognition method based on a convolutional neural network includes:

s100, training a convolutional neural network model for executing an image recognition task by adopting a training image;

s200, inputting the image to be identified into a convolutional neural network model, and outputting an image identification result;

the convolutional neural network model comprises a convolutional neural network, an orthogonal multi-path block is embedded in the convolutional neural network, the orthogonal multi-path block structure comprises a plurality of paths, and parameters on each path are orthogonal to each other, so that the robustness of the convolutional neural network is improved.

In another preferred embodiment, the image recognition method based on the convolutional neural network comprises a training phase and a testing phase. First, a convolutional neural network is trained on labeled image data, and an orthogonal multi-path block structure is embedded in the network to greatly enhance its robustness; the trained network is then deployed in actual services to perform the image recognition task on images that need to be classified.

Specifically, referring to fig. 2, the training phase in the preferred embodiment may include the following steps:

Step one, acquire a batch of training image data with category labels;

after the training image data are obtained, preprocessing and image enhancement operations may be performed on them: the preprocessing scales all images to the same size and normalizes the pixel values, and the image enhancement pads the image edges with zero-valued pixels, crops the image, and randomly flips it horizontally;

Step two, initialize a convolutional neural network and embed an orthogonal multi-path block structure in it according to the requirements of the application;

Step three, randomly take a mini-batch of images from all image data and input it into the convolutional neural network; each path of the orthogonal multi-path block outputs a predicted image category for the images;

Step four, for each path, calculate the difference between the predicted category and the true category of the images, and take a weighted average of the differences over all paths;

Step five, update the network parameters by gradient descent according to the calculated average difference;

Step six, repeat steps three to five until the average difference converges, or until a preset, sufficiently large number of repetitions is reached, thereby obtaining a trained convolutional neural network model.
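The stopping rule of the last step can be sketched as a generic loop. The document leaves the convergence test open; this sketch assumes training stops when two consecutive average losses differ by less than `tol`, with `max_iters` as the preset repetition limit. `train_until_converged` and `step_fn` are hypothetical names.

```python
def train_until_converged(step_fn, tol=1e-4, max_iters=10_000):
    """Call step_fn until the loss stops changing or max_iters is reached.

    step_fn() is assumed to run one mini-batch update (sample a batch,
    compute the averaged multi-path loss, take a gradient-descent step)
    and return that batch's average loss. Returns (iterations, last_loss).
    """
    if max_iters < 1:
        raise ValueError("max_iters must be at least 1")
    prev = None
    for it in range(1, max_iters + 1):
        loss = step_fn()
        # Assumed convergence test: consecutive losses differ by less than tol.
        if prev is not None and abs(prev - loss) < tol:
            return it, loss
        prev = loss
    return max_iters, loss
```

Mini-batch losses are noisy, so in practice a smoothed loss (for example an average over the last few batches) or a validation metric is usually a more reliable convergence signal than two raw consecutive values.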

Referring to fig. 3, the testing phase in the preferred embodiment includes the following steps:

Step one, deploy the convolutional neural network model obtained in the training phase on a business machine;

Step two, acquire each image whose category needs to be recognized; these images may undergo the same preprocessing as in step one of the training phase;

Step three, load the convolutional neural network model trained in the training phase and input the preprocessed image to be recognized into it; each path in the model outputs a prediction result for the image;

Step four, take the mode of the path predictions, i.e., the prediction result that occurs most often, as the final prediction result for the image.

Referring to fig. 3, based on the above embodiment, the convolutional neural network structure containing an Orthogonal Multi-Path block (OMP block) used in step three of the training phase is preferably as follows: an orthogonal multi-path block is embedded in a classical convolutional neural network structure, and the parameters on the paths of the block are constrained to be mutually orthogonal. The block can be embedded at any position in the network. For example, if it is embedded at the last linear layer of the network, each path in the block is a linear layer, the linear-layer parameters on the paths are mutually orthogonal, and the linear layers share the preceding layers of the network; if it is embedded at a convolutional layer, each path in the block is a convolutional layer, the convolutional-layer parameters on the paths are mutually orthogonal, and the convolutional layers share the rest of the network. The specific position is selected according to actual business requirements and determines the final structure of the neural network model. Applying the orthogonality constraint to the parameters on each path forces the rest of the network to adapt simultaneously to mutually orthogonal paths, so the network learns more stable features, maintains higher recognition accuracy on maliciously modified images, and becomes more robust.

Referring to fig. 4, which shows a partial comparison of a conventional network and a network with an embedded orthogonal multi-path block: the block comprises multiple paths, and the parameters on the paths are constrained to be mutually orthogonal.

Figs. 5a, 5b and 5c show the network structure after the orthogonal multi-path block is embedded at three different positions in the convolutional neural network: from top to bottom, at the first convolutional layer, at a middle convolutional layer, and at the last linear layer.

In another preferred embodiment, the case where the orthogonal multi-path block is placed at the last layer of the network is described in detail; placing it at other positions uses a similar training method, which is not described again here. First, some notation: denote the last linear classification layer by $g(\cdot)$ and the rest of the network by $h(\cdot)$, so that the entire network can be written as $g(h(\cdot)): \mathbb{R}^d \to \mathbb{R}^K$, where $d$ and $K$ are the dimensions of the network input and output, respectively. With the orthogonal multi-path block at the last layer, the block consists of $L$ linear layers $g^1, \dots, g^L$ that all take $h(\cdot)$ as input.

Referring to FIG. 6, a flowchart of an embodiment is shown, which includes data preparation, model training, and model testing (deployment). The data preparation mainly refers to the collection, labeling, preprocessing and data enhancement of training data, model training is to obtain a convolutional neural network model for image recognition, and model testing is the actual deployment application of the convolutional neural network. Fig. 7 is a schematic diagram of a deployment in a specific application scenario. Specifically, the model training and model testing in this embodiment are described in detail below.

In this embodiment, the model training includes:

S101, take a batch of image samples from the training image set each time, denoted $(x, y)$;

S102, input the batch of images into the convolutional neural network whose last layer is the orthogonal multi-path block, perform forward propagation through the model, and calculate the loss function required for training:

$$\text{loss} = l_c + \lambda \cdot l_o, \qquad l_c = \sum_{i=1}^{L} \ell\big(g^i(h(x)), y\big), \qquad l_o = \sum_{1 \le i < j \le L} \langle W^i, W^j \rangle^2$$

where $\ell(\cdot,\cdot)$ is a loss function measuring the difference between the class predicted by the network, $g(h(x))$, and the true class $y$ of image $x$; $L$ is the number of paths; $W^i$ denotes the parameters of the linear layer $g^i$ on the $i$-th path; and $\lambda$ is a weighting coefficient. $l_c$ is the sum of the network's loss functions over the paths; in practice a weighted average may be used instead of the simple sum in the formula. $l_o$ is the sum of the squared inner products of the parameters on any two paths. Orthogonal parameters have an inner product of 0, so optimizing $l_o$ as part of the objective constrains the parameters on any two paths to be orthogonal.

S103, compute the gradient of the loss function with respect to the parameters according to the stochastic gradient descent algorithm, and update the parameters:

$$\theta \leftarrow \theta - \eta \nabla_{\theta}\, \text{loss}$$

where $\theta$ represents all parameters in the network and $\eta$ is the learning rate of the stochastic gradient descent algorithm.

S104, if adversarial training is required, generate a batch of corresponding adversarial examples $(x^{adv}, y)$ based on the current network;

S105, calculate the loss function on the adversarial examples:

$$\text{loss}_{adv} = l_c^{adv} + \lambda \cdot l_o$$

where $l_c^{adv}$ is computed in the same way as $l_c$ but on $x^{adv}$;

S106, compute the gradient of the loss on the adversarial examples with respect to the parameters according to the stochastic gradient descent algorithm, and update the parameters again:

$$\theta \leftarrow \theta - \eta \nabla_{\theta}\, \text{loss}_{adv}$$

S107, repeat S101 to S106 multiple times until a trained convolutional neural network model M is obtained.
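S102 and S103 combine a per-path classification loss with an orthogonality term, the sum of squared inner products between the parameters of any two paths, and update all parameters by stochastic gradient descent. The sketch below isolates the orthogonality part on toy parameter vectors: the penalty, its analytic gradient, the combined loss for given per-path losses, and the plain update $\theta \leftarrow \theta - \eta \nabla_\theta$. It assumes each path's parameters are flattened into one vector and sums over unordered pairs of paths (summing over ordered pairs would only double the value, which the weighting coefficient can absorb). It leaves out the gradient of the classification term and the adversarial examples of S104 to S106; all names are hypothetical.

```python
from itertools import combinations


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def orthogonality_penalty(path_params):
    """l_o: sum of squared inner products over all unordered pairs of paths."""
    return sum(dot(path_params[i], path_params[j]) ** 2
               for i, j in combinations(range(len(path_params)), 2))


def orthogonality_grad(path_params):
    """Gradient of l_o for each path: d l_o / d w_i = sum_{j != i} 2 <w_i, w_j> w_j."""
    grads = []
    for i, wi in enumerate(path_params):
        g = [0.0] * len(wi)
        for j, wj in enumerate(path_params):
            if j != i:
                c = 2.0 * dot(wi, wj)
                g = [gk + c * wjk for gk, wjk in zip(g, wj)]
        grads.append(g)
    return grads


def total_loss(path_losses, path_params, lam):
    """loss = l_c + lambda * l_o, with l_c the plain sum of per-path losses."""
    return sum(path_losses) + lam * orthogonality_penalty(path_params)


def sgd_step(path_params, grads, lr):
    """theta <- theta - eta * grad, applied path by path."""
    return [[w - lr * g for w, g in zip(wi, gi)] for wi, gi in zip(path_params, grads)]
```

Repeatedly applying `sgd_step` with `orthogonality_grad` alone to two non-orthogonal vectors such as `[1.0, 0.5]` and `[0.5, 1.0]` shrinks their inner product toward zero; during real training this pull is added to the classification gradient, scaled by the weighting coefficient.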

After the convolutional neural network model M is obtained, model testing is performed, as shown in fig. 3:

S201, deploy the trained convolutional neural network model M on an image recognition service platform;

S202, when an image whose category needs to be recognized is received, first apply the same preprocessing as in step one of the training phase, without any image enhancement operation;

S203, load model M and input the preprocessed image to be recognized, $x_{new}$, into M; each path in M outputs a prediction result $y^i = g^i(h(x_{new})),\ i = 1, \dots, L$;

S204, take the category that occurs most often among $y^1, y^2, \dots, y^L$ as the final recognition result of the image to be recognized.

In the embodiment of the invention, embedding the orthogonal multi-path block structure in the network greatly enhances its robustness; the trained network is then deployed in actual services to perform the image recognition task on images to be classified. This addresses the fragile robustness of common neural networks in image recognition tasks, achieving high model robustness while maintaining high recognition accuracy.

Based on the image recognition method, in another embodiment of the present invention, there is provided an image recognition system based on a convolutional neural network, the system including:

a training module for training a convolutional neural network model for executing an image recognition task by using a training image; the convolutional neural network model comprises a convolutional neural network, an orthogonal multi-path block is embedded in the convolutional neural network, the orthogonal multi-path block structure comprises a plurality of paths, and parameters on each path are orthogonal to each other, so that the robustness of the convolutional neural network is improved;

and the identification module inputs the image to be identified into the convolutional neural network model and outputs an image identification result.

In the above embodiments of the present invention, the convolutional neural network model achieves high recognition accuracy on the training image data and also performs well on unseen test image data.

The implementation of the modules in the above embodiments may specifically refer to the corresponding steps in the above embodiments of the image recognition method, and is not described herein again.

In another embodiment of the present invention, an electronic terminal is further provided, which includes a memory, a processor, and a computer program stored in the memory and capable of running on the processor, and the processor executes the computer program to perform the image recognition method.

In another embodiment of the present invention, there is also provided a computer-readable storage medium having stored thereon a computer program for executing the above-mentioned image recognition method when executed by a processor.

It should be noted that, the steps in the method provided by the present invention may be implemented by using corresponding modules, devices, units, and the like in the system, and those skilled in the art may refer to the technical solution of the system to implement the step flow of the method, that is, the embodiment in the system may be understood as a preferred example for implementing the method, and details are not described herein.

Those skilled in the art will appreciate that, in addition to implementing the system and its devices provided by the present invention purely as computer-readable program code, the method steps can be logically programmed so that the system and its devices achieve the same functions in the form of logic gates, switches, application-specific integrated circuits, programmable logic controllers, embedded microcontrollers and the like. Therefore, the system and its devices provided by the present invention can be regarded as a hardware component, and the devices included therein for implementing various functions can be regarded as structures within that hardware component; the devices for implementing the functions can also be regarded both as software modules implementing the method and as structures within the hardware component.

While the present invention has been described in detail with reference to the preferred embodiments, it should be understood that the above description should not be taken as limiting the invention. Various modifications and alterations to this invention will become apparent to those skilled in the art upon reading the foregoing description. Accordingly, the scope of the invention should be determined from the following claims.

Claims

1. An image recognition method based on a convolutional neural network, characterized by comprising:

using training images to train a convolutional neural network model that performs an image recognition task;

inputting an image to be recognized into the convolutional neural network model and outputting an image recognition result;

wherein the convolutional neural network model comprises a convolutional neural network in which an orthogonal multi-path block is embedded; the orthogonal multi-path block comprises multiple paths whose parameters are mutually orthogonal, increasing the robustness of the convolutional neural network.

2. The image recognition method based on a convolutional neural network according to claim 1, characterized in that training the convolutional neural network model that performs the image recognition task comprises:

S11, obtain a batch of training images with category labels;

S12, initialize a convolutional neural network, and embed an orthogonal multi-path block in the convolutional neural network to increase the robustness of the convolutional neural network;

S13, randomly select a mini-batch of images from all the images in S11 and input them into the convolutional neural network, wherein each path of the orthogonal multi-path block in the network outputs a predicted image category for the images;

S14, for each path, calculate the difference between the output predicted image category and the real category of the batch of images respectively, and take a weighted average of the calculated differences for all paths;

S15, update network parameters by gradient descent method according to the calculated average difference;

S16, repeat S13 to S15 until the average difference converges, or set a sufficient number of repetitions, and stop training when the number of repetitions is reached, thereby obtaining a trained neural network model.

3. The image recognition method based on a convolutional neural network according to claim 2, characterized in that the orthogonal multi-path block is embedded at any position of the convolutional neural network, the specific embedding position being determined by actual business needs.

4. The image recognition method based on a convolutional neural network according to claim 3, characterized in that, when the orthogonal multi-path block is embedded at the last linear layer of the convolutional neural network, each path in the block is a linear layer, the linear-layer parameters on these paths are mutually orthogonal, and these linear layers share the preceding layers of the network.

5. The image recognition method based on a convolutional neural network according to claim 3, characterized in that, when the orthogonal multi-path block is embedded at a convolutional layer of the convolutional neural network, each path in the block is a convolutional layer, the convolutional-layer parameters on these paths are mutually orthogonal, and these convolutional layers share the rest of the network.

6. The image recognition method based on a convolutional neural network according to claim 1, characterized in that inputting the image to be recognized into the convolutional neural network model and outputting the image recognition result comprises:

S21, deploying the convolutional neural network model on a business machine;

S22, input the image to be identified into the convolutional neural network model, and each path in the convolutional neural network model will output the prediction result of the image;

S23, take the prediction result with the largest number of occurrences among the prediction results of these paths as the final prediction result of the image.

7. The image recognition method based on a convolutional neural network according to claim 1, characterized by further comprising: before training and recognition, performing preprocessing and/or image enhancement operations on the training images and the images to be recognized, wherein:

the preprocessing includes scaling all images to the same size and normalizing the image pixel values;

the image enhancement operation includes padding the image edges with zero-valued pixels, then cropping, and randomly flipping the image horizontally.

8. An image recognition system based on convolutional neural network, characterized in that, comprising:

A training module, which uses training images to train a convolutional neural network model for image recognition tasks;

a recognition module, which inputs the image to be recognized into the convolutional neural network model and outputs the image recognition result.

9. An electronic terminal, comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the processor, when executing the program, performs the method of any one of claims 1-7.

10. A computer-readable storage medium on which a computer program is stored, characterized in that, when the program is executed by a processor, the program is used to execute the method of any one of claims 1-7.