
CN107480773B - Method and device for training convolutional neural network model and storage medium - Google Patents


Info

Publication number: CN107480773B
Application number: CN201710675297.5A
Authority: CN (China)
Prior art keywords: hidden, hidden layer, probability, layer, neural network
Legal status: Active
Other languages: Chinese (zh)
Other versions: CN107480773A (en)
Inventor: Wan Shaohua (万韶华)
Current Assignee: Beijing Xiaomi Mobile Software Co Ltd
Original Assignee: Beijing Xiaomi Mobile Software Co Ltd
Application filed by Beijing Xiaomi Mobile Software Co Ltd
Priority to CN201710675297.5A
Publication of CN107480773A
Application granted
Publication of CN107480773B


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G06N3/047 Probabilistic or stochastic networks
    • G06N3/08 Learning methods


Abstract

The disclosure relates to a method, an apparatus, and a storage medium for training a convolutional neural network model, in the technical field of deep learning. The method includes: for each hidden layer of a plurality of hidden layers included in a convolutional neural network model, selecting target nodes from the plurality of nodes included in that hidden layer based on the hidden layer's hiding probability, where the hiding probabilities of the hidden layers differ from one another; and training the convolutional neural network model based on the target nodes selected from the plurality of hidden layers. Because different hidden layers correspond to different input values, selecting target nodes with a different hiding probability per hidden layer can effectively improve the image recognition accuracy of the convolutional neural network model, compared with the prior art in which all hidden layers are trained with the same hiding probability.

Description

Method and device for training convolutional neural network model and storage medium
Technical Field
The present disclosure relates to the field of deep learning technologies, and in particular, to a method and an apparatus for training a convolutional neural network model, and a storage medium.
Background
In recent years, deep learning techniques have been widely used in the field of image recognition and classification. The convolutional neural network models adopted in deep learning are generally multilayer convolutional networks. When a convolutional neural network model is trained, a small number of samples in the training set easily causes overfitting, which in turn reduces image recognition accuracy. To solve this problem, the Dropout algorithm may be used to train the convolutional neural network model.
In the related art, the convolutional neural network model may include an input layer, an output layer, and a plurality of hidden layers, where the plurality of hidden layers are located between the input layer and the output layer, the input layer is connected to a first hidden layer, an output value of the first hidden layer is used as an input value of a next hidden layer adjacent to the first hidden layer, and an output value of the last hidden layer is used as an input value of the output layer. When the Dropout algorithm is used for training the convolutional neural network model, for each hidden layer in the convolutional neural network model, a target node may be selected from a plurality of nodes included in the hidden layer according to a preset probability, and the convolutional neural network model may be trained according to the target node selected from the plurality of hidden layers.
Disclosure of Invention
In order to overcome the problem in the related art that the image recognition accuracy of a convolutional neural network model is low when all of its hidden layers are trained with the same preset probability, the disclosure provides a method, an apparatus, and a storage medium for training a convolutional neural network model.
According to a first aspect of embodiments of the present disclosure, there is provided a method of training a convolutional neural network model, including:
for each hidden layer in a plurality of hidden layers included in a convolutional neural network model, selecting a target node from a plurality of nodes included in the hidden layer based on the hidden probability of the hidden layer, wherein the hidden probabilities of the hidden layers are different;
training the convolutional neural network model based on target nodes selected from the plurality of hidden layers.
Optionally, before the selecting the target node from the plurality of nodes included in the hidden layer based on the hiding probability of the hidden layer, the method further includes:
determining the hiding probability of each hidden layer in the plurality of hidden layers included in the convolutional neural network model, wherein the hiding probabilities of the hidden layers increase sequentially in order of the abstraction degrees of their output values from high to low.
Optionally, the determining the hidden probability of each hidden layer in the plurality of hidden layers included in the convolutional neural network model includes:
for each hidden layer in a plurality of hidden layers included in the convolutional neural network model, acquiring an output value of the hidden layer;
determining the hiding probability of the hidden layer based on the output value of the hidden layer.
Optionally, the determining the hidden probability of the hidden layer based on the output value of the hidden layer includes:
performing singular value decomposition on the output value of the hidden layer to obtain N singular values, wherein N is a positive integer greater than 1;
calculating the square sum of the N singular values, and calculating the product of the square sum of the N singular values and a preset proportion to obtain a target square sum;
sorting the N singular values from large to small to obtain a sorting result;
determining an Mth singular value in the sorting result, wherein the square sum of the first M singular values in the sorting result is larger than the target square sum, the square sum of the first M-1 singular values in the sorting result is smaller than the target square sum, and M is a positive integer larger than or equal to 1;
and determining the ratio of M to N as the hiding probability of the hidden layer.
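A minimal sketch of the singular-value steps above, assuming the hidden layer's output values form a matrix and using 0.9 as a stand-in for the preset proportion (both assumptions, not values from the disclosure):

```python
import numpy as np

def hiding_probability(layer_output, preset_ratio=0.9):
    """Derive a hiding probability from a hidden layer's output matrix:
    SVD -> target square sum -> smallest M whose leading squared singular
    values exceed the target -> ratio M/N."""
    s = np.linalg.svd(layer_output, compute_uv=False)  # N singular values
    sq = np.sort(s)[::-1] ** 2                         # squares, large to small
    target = preset_ratio * sq.sum()                   # target square sum
    m = int(np.searchsorted(np.cumsum(sq), target)) + 1
    return m / len(sq)
```

For example, with singular values 3, 2, 1 and ratio 0.9, the target square sum is 12.6; the first two squared values (9 + 4 = 13) exceed it while the first one (9) does not, so M = 2 and the hiding probability is 2/3.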
Optionally, the determining the hidden probability of each hidden layer in the plurality of hidden layers included in the convolutional neural network model includes:
acquiring an output value of a first hidden layer and an output value of a second hidden layer, wherein the first hidden layer is the hidden layer whose output value has the lowest abstraction degree among the plurality of hidden layers, and the second hidden layer is the hidden layer whose output value has the highest abstraction degree among the plurality of hidden layers;
determining the hiding probability of the first hidden layer based on the output value of the first hidden layer, and determining the hiding probability of the second hidden layer based on the output value of the second hidden layer;
determining the hiding probabilities of the hidden layers other than the first hidden layer and the second hidden layer based on the probability difference between the hiding probability of the first hidden layer and the hiding probability of the second hidden layer.
Optionally, the first hidden layer is a first hidden layer connected to the input layer, and the second hidden layer is a last hidden layer connected to the output layer;
the determining, based on the probability difference between the hiding probability of the first hidden layer and the hiding probability of the second hidden layer, the hiding probabilities of the other hidden layers of the plurality of hidden layers except the first hidden layer and the second hidden layer includes:
determining the hiding probability of each hidden layer located between the first hidden layer and the last hidden layer based on the number of the plurality of hidden layers, the probability difference, the hiding probability of the first hidden layer, and the hiding probability of the last hidden layer.
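The disclosure specifies only the inputs to this step (the layer count, the probability difference, and the two endpoint probabilities); evenly spaced linear interpolation is one plausible reading, sketched here as an assumption:

```python
def interpolate_hiding_probabilities(p_first, p_last, num_hidden_layers):
    """Evenly space hiding probabilities from the first hidden layer
    (highest probability, lowest abstraction) down to the last hidden
    layer (lowest probability, highest abstraction)."""
    diff = p_first - p_last                      # the probability difference
    step = diff / (num_hidden_layers - 1)        # per-layer decrement
    return [p_first - i * step for i in range(num_hidden_layers)]
```

With a first-layer probability of 0.8, a last-layer probability of 0.2, and four hidden layers, this yields 0.8, 0.6, 0.4, 0.2, matching the required monotonic decrease toward the more abstract layers.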
Optionally, the selecting a target node from a plurality of nodes included in the hidden layer based on the hidden probability of the hidden layer includes:
generating a random probability for each node in a plurality of nodes included in the hidden layer according to a preset rule;
and when the random probability is smaller than the hiding probability, determining the node as a target node.
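A sketch of this selection step; the "preset rule" is taken here to be a uniform draw in [0, 1), which is an assumption not stated in the disclosure:

```python
import numpy as np

def select_target_nodes(num_nodes, hiding_prob, rng=None):
    """Generate one random probability per node and keep the nodes whose
    draw is smaller than the layer's hiding probability."""
    rng = np.random.default_rng() if rng is None else rng
    draws = rng.random(num_nodes)                 # random probability per node
    return [i for i, p in enumerate(draws) if p < hiding_prob]
```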
According to a second aspect of embodiments of the present disclosure, there is provided an apparatus for training a convolutional neural network model, the apparatus comprising:
a selection module, configured to select, for each hidden layer of a plurality of hidden layers included in a convolutional neural network model, a target node from a plurality of nodes included in the hidden layer based on the hiding probability of the hidden layer, where the hiding probabilities of the plurality of hidden layers are different;
a training module to train the convolutional neural network model based on target nodes selected from the plurality of hidden layers.
Optionally, the apparatus further comprises:
a determining module, configured to determine a hiding probability of each hidden layer in a plurality of hidden layers included in the convolutional neural network model, where the hiding probabilities of the plurality of hidden layers sequentially increase according to a sequence from high to low of abstraction degrees of output values of the plurality of hidden layers.
Optionally, the determining module includes:
a first obtaining sub-module, configured to obtain, for each hidden layer of a plurality of hidden layers included in the convolutional neural network model, an output value of the hidden layer;
a first determining submodule, configured to determine a hiding probability of the hidden layer based on an output value of the hidden layer.
Optionally, the first determining submodule is configured to:
performing singular value decomposition on the output value of the hidden layer to obtain N singular values, wherein N is a positive integer greater than 1;
calculating the square sum of the N singular values, and calculating the product of the square sum of the N singular values and a preset proportion to obtain a target square sum;
sorting the N singular values from large to small to obtain a sorting result;
determining an Mth singular value in the sorting result, wherein the square sum of the first M singular values in the sorting result is larger than the target square sum, the square sum of the first M-1 singular values in the sorting result is smaller than the target square sum, and M is a positive integer larger than or equal to 1;
and determining the ratio of M to N as the hiding probability of the hidden layer.
Optionally, the determining module includes:
a second obtaining sub-module, configured to obtain an output value of a first hidden layer and an output value of a second hidden layer, where the first hidden layer is a hidden layer with a lowest abstraction degree of an output value in the multiple hidden layers, and the second hidden layer is a hidden layer with a highest abstraction degree of an output value in the multiple hidden layers;
a second determining sub-module, configured to determine a hiding probability of the first hidden layer based on the output value of the first hidden layer, and determine a hiding probability of the second hidden layer based on the output value of the second hidden layer;
a third determining sub-module, configured to determine, based on a probability difference between the hiding probability of the first hidden layer and the hiding probability of the second hidden layer, the hiding probabilities of other hidden layers of the multiple hidden layers except for the first hidden layer and the second hidden layer.
Optionally, the first hidden layer is a first hidden layer connected to the input layer, and the second hidden layer is a last hidden layer connected to the output layer;
the third determination submodule is configured to:
determining the hiding probability of each hidden layer located between the first hidden layer and the last hidden layer based on the number of the plurality of hidden layers, the probability difference, the hiding probability of the first hidden layer, and the hiding probability of the last hidden layer.
Optionally, the selection module comprises:
a fourth determining submodule, configured to generate a random probability for each node in the plurality of nodes included in the hidden layer according to a preset rule;
and the fifth determining submodule is used for determining the node as a target node when the random probability is smaller than the hiding probability.
According to a third aspect of embodiments of the present disclosure, there is provided an apparatus for training a convolutional neural network model, the apparatus comprising:
a processor;
a memory for storing processor-executable instructions;
wherein the processor is configured to perform the steps of any one of the methods of the first aspect.
According to a fourth aspect of embodiments of the present disclosure, there is provided a computer-readable storage medium having stored thereon instructions which, when executed by a processor, implement the steps of any one of the methods of the first aspect described above.
The technical solution provided by the embodiments of the disclosure can have the following beneficial effects: for each hidden layer of the plurality of hidden layers included in the convolutional neural network model, target nodes are selected from the plurality of nodes included in that hidden layer according to the hidden layer's hiding probability, and the convolutional neural network model is trained according to the target nodes selected from the plurality of hidden layers, where the hiding probabilities of the hidden layers differ from one another. Because different hidden layers correspond to different input values, selecting target nodes with a different hiding probability per hidden layer can effectively improve the image recognition accuracy of the convolutional neural network model, compared with the prior art in which all hidden layers are trained with the same hiding probability.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the invention and together with the description, serve to explain the principles of the invention.
FIG. 1 is an architectural diagram illustrating a convolutional neural network model, according to an exemplary embodiment.
FIG. 2 is a flow diagram illustrating a method of training a convolutional neural network model in accordance with an exemplary embodiment.
FIG. 3 is a flow diagram illustrating a method of training a convolutional neural network model in accordance with an exemplary embodiment.
FIG. 4 is a flow diagram illustrating a method of training a convolutional neural network model in accordance with an exemplary embodiment.
FIG. 5A is a block diagram illustrating an apparatus for training a convolutional neural network model, according to an example embodiment.
FIG. 5B is a block diagram illustrating an apparatus for training a convolutional neural network model, according to an example embodiment.
FIG. 5C is a block diagram illustrating a determination module in accordance with an exemplary embodiment.
FIG. 5D is a block diagram illustrating another determination module in accordance with an exemplary embodiment.
FIG. 6 is a block diagram illustrating an apparatus for training a convolutional neural network model in accordance with an exemplary embodiment.
FIG. 7 is a block diagram illustrating an apparatus for training a convolutional neural network model in accordance with an exemplary embodiment.
Detailed Description
To make the objects, technical solutions and advantages of the present disclosure more apparent, embodiments of the present disclosure will be described in detail with reference to the accompanying drawings.
Before explaining the embodiments of the present disclosure in detail, an application scenario related to the embodiments of the present disclosure will be described.
A convolutional neural network model is a neural network developed from the traditional multilayer neural network for image classification and recognition; compared with the traditional multilayer neural network, it introduces convolution and pooling operations. Before images are classified and recognized with a convolutional neural network model, the model needs to be trained. During training, a plurality of samples in the training sample set can be input into the model for forward calculation to obtain an output value of each layer of the model. Backward calculation can then be performed from the output value of the last layer and the output values of the other layers to update the parameters of the nodes of each layer.
In the training process, if the training sample set contains too few samples, the trained convolutional neural network model may fit the training samples very accurately yet fit test data very poorly; this phenomenon is referred to as overfitting. If the convolutional neural network model is overfitted, the image recognition accuracy achieved with it is also greatly reduced.
Currently, in order to prevent the above-mentioned overfitting phenomenon, the Dropout technique may be used to train the convolutional neural network model. The Dropout technique was originally proposed by Geoffrey Hinton. When a convolutional neural network model is trained with Dropout, for each hidden layer in the plurality of hidden layers included in the model, a plurality of target nodes in the hidden layer can be selected with a preset probability in each training iteration, and the parameters of the selected target nodes are updated. The remaining unselected nodes in the hidden layer are considered hidden during that iteration; that is, their parameters are temporarily not updated. When iterative training is performed again with other training samples, target nodes are selected again with the preset probability. Because the selected nodes differ in each iteration, the convolutional neural network model obtained in each iteration also differs. Because nodes in each hidden layer are randomly kept or discarded when training with Dropout, the coupling effect among nodes within a hidden layer is reduced, overfitting is reduced, and image recognition accuracy is improved. The method for training a convolutional neural network model provided by the embodiments of the disclosure can be used in the process of training a convolutional neural network model with the Dropout technique.
After the application scenario of the embodiment of the present disclosure is introduced, a basic architecture of a convolutional neural network model related to the embodiment of the present disclosure is introduced next.
FIG. 1 shows an architecture of a convolutional neural network model provided in an embodiment of the present disclosure. As shown in FIG. 1, the model comprises an input layer 101, hidden layers 102 to 105, and an output layer 106; the input layer 101 is connected to the hidden layer 102, the hidden layers 102, 103, 104, and 105 are connected in sequence, and the hidden layer 105 is connected to the output layer 106. The input layer may include one node or a plurality of nodes, any of the hidden layers 102 to 105 may include a plurality of nodes, and the output layer 106 may include one node or a plurality of nodes. In the example of FIG. 1, the input layer includes only one node, each of the hidden layers 102 to 105 includes four nodes, and the output layer 106 includes one node.
The input layer is configured to determine the pixel values of all pixel points of an input image and transmit them to the hidden layer 102, which may be a convolutional layer. After receiving the pixel values, the hidden layer 102 performs a first convolution operation on them to obtain the pixel points after the first convolution, and transmits these to the hidden layer 103, which at this point may be a convolutional layer, a sampling layer, or a pooling layer. When the hidden layer 103 is a pooling layer, it may perform a first pooling operation on the pixel values after the first convolution to obtain the pixel points after the first pooling, and transmit these to the next hidden layer 104. The hidden layer 104 may then be a convolutional layer that performs a second convolution operation on the pixel points after the first pooling, obtaining the pixel points after the second convolution, which it transmits to the next hidden layer 105. When the hidden layer 104 is a convolutional layer, the hidden layer 105 may be a pooling layer: it performs a second pooling operation on the received pixel points after the second convolution and transmits the result to the output layer 106.
Generally, the output layer 106 is a fully connected layer, and the probability that the image belongs to each of the preset multiple categories can be determined according to the received pixel values of the processed pixel points, so as to obtain the classification of the image.
It should be noted that the hidden layer in the convolutional neural network model may be a convolutional layer, a sampling layer, a pooling layer, or a fully-connected layer. The architecture of the convolutional neural network model provided above is only one possible architecture of the convolutional neural network model provided in the embodiments of the present disclosure, and does not constitute a limitation to the embodiments of the present disclosure.
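The convolution and pooling processing described above can be sketched for a single-channel image; the "valid" padding and the 2x2 pooling window are illustrative choices, not details taken from the disclosure:

```python
import numpy as np

def conv2d(image, kernel):
    """'Valid' 2-D convolution (cross-correlation, as in most CNN
    libraries) of a single-channel image with one kernel."""
    kh, kw = kernel.shape
    h, w = image.shape
    out = np.empty((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def max_pool(x, size=2):
    """Non-overlapping max pooling over size x size windows."""
    h, w = x.shape
    return x[:h - h % size, :w - w % size].reshape(
        h // size, size, w // size, size).max(axis=(1, 3))
```

Stacking these operations in sequence (convolution, pooling, convolution, pooling) mirrors the hidden layers 102 to 105 of FIG. 1.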
After the application scenarios of the embodiments of the present disclosure and the related architecture of the convolutional neural network model are introduced, the following describes in detail the implementation of the training convolutional neural network model provided by the embodiments of the present disclosure.
FIG. 2 is a flowchart illustrating a method for training a convolutional neural network model according to an exemplary embodiment. The method may be used in a terminal or a server and, as shown in FIG. 2, includes the following steps:
in step 201, for each hidden layer in a plurality of hidden layers included in a convolutional neural network model, a target node is selected from a plurality of nodes included in the hidden layer based on a hidden probability of the hidden layer, and the hidden probabilities of the plurality of hidden layers are not the same.
In step 202, the convolutional neural network model is trained based on target nodes selected from the plurality of hidden layers.
In the embodiment of the disclosure, for each hidden layer of the plurality of hidden layers included in the convolutional neural network model, target nodes are selected from the plurality of nodes included in that hidden layer according to the hidden layer's hiding probability, and the convolutional neural network model is trained according to the target nodes selected from the plurality of hidden layers, where the hiding probabilities of the hidden layers differ from one another. Because different hidden layers correspond to different input values, selecting target nodes with a different hiding probability per hidden layer can effectively improve the image recognition accuracy of the convolutional neural network model, compared with the prior art in which all hidden layers are trained with the same hiding probability.
Optionally, before selecting the target node from the plurality of nodes included in the hidden layer based on the hidden probability of the hidden layer, the method further includes:
and determining the hiding probability of each hidden layer in a plurality of hidden layers included in the convolutional neural network model, wherein the hiding probabilities of the hidden layers are sequentially increased according to the sequence of the abstract degrees of output values of the hidden layers from high to low.
Optionally, determining the hidden probability of each hidden layer in a plurality of hidden layers included in the convolutional neural network model comprises:
for each hidden layer in a plurality of hidden layers included in the convolutional neural network model, acquiring an output value of the hidden layer;
based on the output value of the hidden layer, the hiding probability of the hidden layer is determined.
Optionally, determining the hidden probability of the hidden layer based on the output value of the hidden layer includes:
performing singular value decomposition on the output value of the hidden layer to obtain N singular values, wherein N is a positive integer greater than 1;
calculating the square sum of the N singular values, and calculating the product of the square sum of the N singular values and a preset proportion to obtain a target square sum;
sequencing the N singular values from large to small to obtain a sequencing result;
determining the Mth singular value in the sequencing result, wherein the square sum of the first M singular values in the sequencing result is greater than the target square sum, the square sum of the first M-1 singular values in the sequencing result is less than the target square sum, and M is a positive integer greater than or equal to 1;
the ratio of M and N is determined as the concealment probability of the hidden layer.
Optionally, determining the hidden probability of each hidden layer in a plurality of hidden layers included in the convolutional neural network model comprises:
acquiring an output value of a first hidden layer and an output value of a second hidden layer, wherein the first hidden layer is a hidden layer with the lowest abstract degree of the output values in the plurality of hidden layers, and the second hidden layer is a hidden layer with the highest abstract degree of the output values in the plurality of hidden layers;
determining the hiding probability of the first hidden layer based on the output value of the first hidden layer, and determining the hiding probability of the second hidden layer based on the output value of the second hidden layer;
determining the hiding probability of other hidden layers except the first hidden layer and the second hidden layer in the plurality of hidden layers based on the probability difference value between the hiding probability of the first hidden layer and the hiding probability of the second hidden layer.
Optionally, the first hidden layer is a first hidden layer connected to the input layer, and the second hidden layer is a last hidden layer connected to the output layer;
determining the hiding probability of other hidden layers except the first hidden layer and the second hidden layer in the plurality of hidden layers based on the probability difference value between the hiding probability of the first hidden layer and the hiding probability of the second hidden layer, comprising:
and determining the hiding probability of each hidden layer positioned between the first hidden layer and the last hidden layer based on the number of the hidden layers, the probability difference value, the hiding probability of the first hidden layer and the hiding probability of the last hidden layer.
Optionally, selecting the target node from a plurality of nodes included in the hidden layer based on the hidden probability of the hidden layer includes:
generating a random probability for each node in a plurality of nodes included in the hidden layer according to a preset rule;
and when the random probability is smaller than the hidden probability, determining the node as a target node.
All the above optional technical solutions can be combined arbitrarily to form optional embodiments of the present disclosure, which are not described in detail here.
When training the convolutional neural network model, for a plurality of hidden layers included in the convolutional neural network model, different hidden probabilities may be used to select a target node from a plurality of nodes included in each hidden layer, and the convolutional neural network model may be trained according to the selected target node. And prior to selecting the target node, a hiding probability for each of the plurality of hidden layers may be determined.
In the convolutional neural network model, the output values of the hidden layers differ in abstraction degree. Generally, in the order from the input layer to the output layer, the closer a hidden layer is to the output layer, the closer its output value is to the class information, i.e., the higher the abstraction degree; the closer a hidden layer is to the input layer, the closer its output value is to the morphological information, i.e., the lower the abstraction degree. Hidden layers with a high abstraction degree can be trained with a smaller hiding probability, and hidden layers with a lower abstraction degree can be trained with a larger hiding probability.
Based on the above description, in the embodiment of the present disclosure, for the plurality of hidden layers, the hiding probability of each hidden layer in the plurality of hidden layers may be determined according to the level of abstraction of the output values of the plurality of hidden layers, and according to a principle that the hiding probability of a hidden layer decreases with increasing level of abstraction. Specifically, the hidden probability of each hidden layer in the plurality of hidden layers can be determined by two different methods, and then the convolutional neural network model is trained according to the hidden probability. Next, a first method for training a convolutional neural network model provided in an embodiment of the present disclosure will be explained in detail with reference to fig. 3.
FIG. 3 is a flow diagram illustrating a method of training a convolutional neural network model according to an exemplary embodiment. The method may be used in a terminal or a server; in the embodiment of the present disclosure, the terminal is taken as the execution subject for explanation. When the execution subject is a server, the convolutional neural network model can still be trained through the implementation process in the following embodiments. As shown in fig. 3, the method comprises the following steps:
in step 301, for each hidden layer of a plurality of hidden layers included in the convolutional neural network model, an output value of the hidden layer is obtained.
In the embodiment of the present disclosure, when training the convolutional neural network model, the training images in the training sample set may be propagated forward, that is, a training image is input at the input layer of the convolutional neural network model and, after calculation by the intermediate hidden layers, the recognition result of the training image is finally output from the output layer of the convolutional neural network model.
In the process of performing forward calculation on the training image, according to the sequence from the input layer to the output layer, the output value of the input layer is used as the input value of the first hidden layer, and the output value of the last hidden layer is used as the input value of the output layer. And for two adjacent hidden layers, the output value of the previous hidden layer is used as the input value of the next hidden layer. When the training image is transmitted from the input layer to the output layer, for each of the plurality of hidden layers of the convolutional neural network model, the terminal may obtain an output value of each hidden layer.
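A minimal sketch of this forward propagation, collecting the output value of each hidden layer along the way, is shown below (plain NumPy; the layer sizes, random weights, and ReLU activation are illustrative assumptions, not taken from the patent):

```python
import numpy as np

def forward_collect(x, weights):
    """Propagate an input forward through successive hidden layers,
    recording each hidden layer's output value along the way."""
    outputs = []
    activation = x
    for w in weights:
        # ReLU is an illustrative choice of activation function
        activation = np.maximum(w @ activation, 0.0)
        outputs.append(activation)
    return outputs

rng = np.random.default_rng(0)
# Three hypothetical hidden layers (sizes 8, 6, 4) for a 10-dimensional input
weights = [rng.standard_normal((8, 10)) * 0.1,
           rng.standard_normal((6, 8)) * 0.1,
           rng.standard_normal((4, 6)) * 0.1]
hidden_outputs = forward_collect(rng.standard_normal(10), weights)
print([o.shape for o in hidden_outputs])  # [(8,), (6,), (4,)]
```

Each element of `hidden_outputs` is the output value of one hidden layer, which is what the terminal obtains in this step.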
In step 302, the hiding probability of each hidden layer is determined based on the output value of that hidden layer, where the hiding probabilities of the hidden layers are different from one another and increase in the order of the abstraction degree of the output values from high to low.
After the output value of each hidden layer is obtained, the terminal can determine the hiding probability of each hidden layer from that output value according to the following method. The hiding probability is the basis for subsequently selecting target nodes from the plurality of nodes included in the hidden layer.
Wherein, for each hidden layer in the plurality of hidden layers, based on the output value of the hidden layer, the operation of determining the hidden probability of the hidden layer may be: performing singular value decomposition on the output value of the hidden layer to obtain N singular values, wherein N is a positive integer greater than 1; calculating the square sum of the N singular values, and calculating the product of the square sum of the N singular values and a preset proportion to obtain a target square sum; sequencing the N singular values from large to small to obtain a sequencing result; determining the Mth singular value in the sequencing result, wherein the square sum of the first M singular values in the sequencing result is greater than the target square sum, the square sum of the first M-1 singular values in the sequencing result is less than the target square sum, and M is a positive integer greater than or equal to 1; the ratio of M and N is determined as the concealment probability of the hidden layer.
It should be noted that after the output value of the hidden layer is obtained, singular value decomposition may be performed on the output value to obtain N singular values. Then, the sum of squares of the N singular values may be calculated, and the target sum of squares obtained by multiplying it by a preset ratio; for example, if the sum of squares of the N singular values is R and the preset ratio is k, the target sum of squares is W = k × R. The preset ratio may be 80%, 70%, or another value.
After the target sum of squares is determined, the terminal may sort the obtained N singular values from large to small to obtain a sorting result. Starting from the first singular value in the sorting result, the terminal calculates the square of that singular value and judges whether it is greater than or equal to the target sum of squares. If it is less than the target sum of squares, the terminal adds the square of the second singular value in the sorting result to obtain the sum of squares of the first two singular values, and judges whether this sum is greater than or equal to the target sum of squares. The terminal continues calculating and judging in this way until the sum of squares of the first M singular values in the sorting result is greater than or equal to the target sum of squares; the calculation then stops, and the ratio of M to N is used as the hiding probability of the hidden layer.
For example, assume that singular value decomposition of the output values of the hidden layer yields 10 singular values, that is, N = 10. The sum of squares of the 10 singular values is R, and the preset ratio is 80%, so the target sum of squares is W = 80% × R. Sorting the 10 singular values from large to small gives the sorting result: (n1, n2, n3, n4, n5, n6, n7, n8, n9, n10). The terminal first calculates n1² and judges whether n1² is greater than or equal to W. If n1² ≥ W, then M = 1, and the hiding probability of the hidden layer is M/N = 0.1. If n1² < W, the terminal calculates n1² + n2² and again judges whether the sum is greater than or equal to W. If n1² + n2² < W, the terminal continues with n1² + n2² + n3², and so on, stopping the calculation as soon as the calculated sum of squares is greater than or equal to W. Suppose that when the terminal calculates n1² + n2² + … + n6², it determines that the sum is greater than or equal to the target sum of squares; then M = 6, and the hiding probability of the hidden layer is M/N = 0.6.
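The iteration above is equivalent to taking a cumulative sum over the sorted squared singular values. A minimal sketch, assuming the layer output is available as a 2-D matrix (function and variable names are illustrative, not from the patent):

```python
import numpy as np

def hiding_probability(layer_output, preset_ratio=0.8):
    """Return M/N, where M is the smallest number of the largest singular
    values whose sum of squares reaches preset_ratio of the total."""
    # np.linalg.svd already returns singular values sorted from large to small
    s = np.linalg.svd(layer_output, compute_uv=False)
    n = len(s)
    target = preset_ratio * np.sum(s ** 2)
    cumulative = np.cumsum(s ** 2)
    # first index whose cumulative sum of squares reaches the target, plus one
    m = int(np.searchsorted(cumulative, target)) + 1
    return m / n

# Singular values of diag(3, 2, 1) are 3, 2, 1: total sum of squares is 14,
# target is 11.2, reached after the first two values (9 + 4 = 13), so M = 2
print(hiding_probability(np.diag([3.0, 2.0, 1.0])))  # 2/3 ≈ 0.667
```

With `np.searchsorted` the linear scan from the text collapses to one call, since the cumulative sums are non-decreasing.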
In step 303, a target node is selected from a plurality of nodes comprised by the hidden layer based on the hidden probability of the hidden layer.
After determining the hidden probability of the hidden layer, for each node in a plurality of nodes included in the hidden layer, the terminal may generate a random probability for the node according to a preset rule, and when the random probability is smaller than the hidden probability, determine the node as the target node.
The terminal can generate a random probability for the node through a Bernoulli function or a binomial distribution function, and then compare the random probability with the hiding probability of the hidden layer, so as to judge whether the parameters of the node are to be updated in the subsequent training process.
For example, assuming that the hidden layer includes 4 nodes, the determined hiding probability of the hidden layer is 0.6. For the node 1, it is assumed that the random probability of the node determined by the bernoulli function or the binomial distribution function is 0.4, and since 0.4 is less than 0.6, at this time, the node 1 may be determined as a target node, that is, in the subsequent training process, the parameter of the node 1 needs to be updated. For the node 2, assuming that the random probability is 0.7, since 0.7 is greater than 0.6, at this time, the node 2 cannot be used as a target node, that is, in the subsequent training process, the node 2 will be temporarily hidden, and the parameters of the node 2 are not updated. For the remaining two nodes, whether to use it as the target node can be determined by the method described above.
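The selection rule above can be sketched as follows (plain Python; comparing a uniform random probability with the hiding probability is equivalent to drawing a Bernoulli sample whose success probability equals the hiding probability):

```python
def select_target_nodes(random_probs, hiding_prob):
    """A node is a target node when its random probability is
    smaller than the hiding probability of its layer."""
    return [p < hiding_prob for p in random_probs]

# The four-node example from the text, with hiding probability 0.6:
# node 1 (0.4) becomes a target node, node 2 (0.7) is temporarily hidden
mask = select_target_nodes([0.4, 0.7, 0.2, 0.9], 0.6)
print(mask)  # [True, False, True, False]
```

In practice the random probabilities would be drawn per node in each training iteration, so a different subset of nodes is updated each time.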
For each hidden layer, the terminal may select at least one target node from the hidden layer by the method described above, and after determining the target nodes corresponding to each hidden layer, the terminal may train the convolutional neural network model by the method in step 304.
In step 304, the convolutional neural network model is trained based on target nodes selected from the plurality of hidden layers.
For each hidden layer, the terminal can select a target node from the hidden layers by the method, and after the terminal determines the target nodes corresponding to the multiple hidden layers respectively, the convolutional neural network model can be trained by the selected target nodes.
After the terminal determines the target nodes of each hidden layer and the forward calculation in the training process is completed, the terminal can start the backward calculation. During the backward calculation, the parameters of the selected target nodes can be updated according to the output value of the output layer obtained in the forward calculation, thereby completing the training of the convolutional neural network model.
In the embodiment of the disclosure, the terminal may determine the hiding probability of each hidden layer in the plurality of hidden layers included in the convolutional neural network model, where the hiding probabilities of the plurality of hidden layers are different and decrease in the order of the abstraction degree of the output values from low to high. Because the abstraction degree of the output values differs from layer to layer in the order from the input layer to the output layer, the terminal can determine a different hiding probability for each hidden layer according to its abstraction degree. And because different hidden layers correspond to different input values, training the convolutional neural network model with different hiding probabilities can effectively improve the image recognition accuracy of the model, compared with the prior art in which all hidden layers are trained with the same hiding probability.
It should be noted that experiments prove that, by determining the hiding probability according to the method provided in the embodiment of the present disclosure and training the convolutional neural network model accordingly, image recognition with the trained model increases the recognition accuracy from 0.86 to 0.92 compared with a model trained with the fixed preset probability of 0.5 used in the related art, a significant improvement.
A first method for training a convolutional neural network model is described in the foregoing embodiment, and a second method for training a convolutional neural network model provided in the embodiment of the present disclosure will be described with reference to fig. 4.
Fig. 4 is a flowchart illustrating a method for training a convolutional neural network model according to an exemplary embodiment. The method may be used in a terminal or a server; in this embodiment of the present disclosure, the terminal is taken as the execution subject for explanation. In practical applications, when the execution subject is a server, the convolutional neural network model can still be trained through the implementation process in the following embodiments. As shown in fig. 4, the method comprises the following steps:
in step 401, an output value of a first hidden layer and an output value of a second hidden layer are obtained, where the first hidden layer is a hidden layer with the lowest abstract degree of the output values in the plurality of hidden layers, and the second hidden layer is a hidden layer with the highest abstract degree of the output values in the plurality of hidden layers.
In the embodiment of the present disclosure, in order to reduce the calculation amount, the terminal may analyze only the output value of the first hidden layer, whose output value has the lowest abstraction degree, and the output value of the second hidden layer, whose output value has the highest abstraction degree, to obtain their respective hiding probabilities; for the other hidden layers, the output values need not be analyzed, and the hiding probabilities can be determined by a simpler algorithm. Therefore, in this step, the terminal may obtain only the output value of the first hidden layer and the output value of the second hidden layer.
It should be noted that, in current convolutional neural network models, the abstraction degree of the output value of the hidden layer connected to the input layer is generally the lowest, and the abstraction degree of the output value of the hidden layer connected to the output layer is usually the highest. Therefore, the terminal can directly use the output value of the hidden layer connected to the input layer as the output value of the first hidden layer, and the output value of the hidden layer connected to the output layer as the output value of the second hidden layer.
In step 402, the hiding probability of the first hidden layer is determined based on the output value of the first hidden layer, and the hiding probability of the second hidden layer is determined based on the output value of the second hidden layer.
After obtaining the output value of the first hidden layer, the terminal may determine the hiding probability of the first hidden layer in the manner described in step 302 of the foregoing embodiment. Similarly, the terminal may determine the hiding probability of the second hidden layer in the same manner. The disclosed embodiments are not described in detail herein.
In step 403, based on a probability difference between the hiding probability of the first hidden layer and the hiding probability of the second hidden layer, the hiding probabilities of other hidden layers of the plurality of hidden layers except the first hidden layer and the second hidden layer are determined.
After determining the hiding probabilities of the first hidden layer and the second hidden layer, the terminal can calculate the probability difference between the hiding probability of the first hidden layer and the hiding probability of the second hidden layer; the terminal may then determine the hiding probabilities of the remaining hidden layers based on this probability difference.
The terminal can sort the plurality of hidden layers by the abstraction degree of their output values from low to high to obtain a ranking result; the first layer in the ranking result is then the first hidden layer, and the last layer is the second hidden layer. The terminal may then determine the hiding probability of each hidden layer located between them in the ranking result based on the number of hidden layers, the probability difference, the hiding probability of the first hidden layer, and the hiding probability of the second hidden layer.
After calculating the probability difference between the hiding probability of the first hidden layer and that of the second hidden layer, the terminal can determine the number K of hidden layers included in the convolutional neural network model and calculate the ratio of the probability difference to K − 1. The terminal may then subtract this ratio from the hiding probability of the first hidden layer to obtain the hiding probability of the second layer in the ranking result, and so on. That is, for each hidden layer located between the first hidden layer and the second hidden layer in the ranking result, the ratio is subtracted from the hiding probability of that hidden layer to obtain the hiding probability of the next hidden layer after it in the ranking result.
It should be noted that, in the current convolutional neural network model, the abstraction levels of the output values of the plurality of hidden layers are usually sequentially increased according to the sequence from the input layer to the output layer, that is, the abstraction level of the output value of the first hidden layer connected to the input layer is the lowest, the abstraction level of the output value of the next hidden layer connected to the first hidden layer is higher than that of the first hidden layer, and so on, the abstraction level of the output value of the last hidden layer connected to the output layer is the highest. Thus, for such convolutional neural network models, the terminal may no longer have to order the various hidden layers by the abstraction level of the output values. That is, the terminal may directly determine the hidden probability of the first hidden layer connected to the input layer and the hidden probability of the last hidden layer connected to the output layer, calculate a probability difference between the first hidden layer and the last hidden layer, and then calculate the hidden probability of each hidden layer located between the first hidden layer and the last hidden layer in the convolutional neural network model based on the number of the plurality of hidden layers, the probability difference, the hidden probability of the first hidden layer, and the hidden probability of the last hidden layer.
For example, assume that the hiding probability of the first hidden layer connected to the input layer in the convolutional neural network model is a, the hiding probability of the last hidden layer connected to the output layer is b, and the number of hidden layers included in the convolutional neural network model is K = 10. The terminal may calculate the probability difference a − b between the hiding probability a of the first hidden layer and the hiding probability b of the last hidden layer, and calculate the ratio (a − b)/(K − 1) = (a − b)/9. Then, for the next hidden layer connected to the first hidden layer, the hiding probability is a − (a − b)/(K − 1); the hiding probability of the hidden layer after that is a − 2 × (a − b)/(K − 1); and so on, until the hiding probability of every hidden layer between the first hidden layer and the last hidden layer has been calculated.
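This linear interpolation can be sketched as follows (plain Python; the values a = 0.9, b = 0.45, and K = 10 are illustrative assumptions, not from the patent):

```python
def layer_hiding_probs(a, b, k):
    """Hiding probabilities of k hidden layers, decreasing in equal
    steps of (a - b)/(k - 1) from the first layer (a) to the last (b)."""
    step = (a - b) / (k - 1)
    return [a - i * step for i in range(k)]

probs = layer_hiding_probs(0.9, 0.45, 10)
print(len(probs), probs[0], round(probs[-1], 2))  # 10 0.9 0.45
```

Only the endpoint probabilities a and b require an SVD analysis of layer outputs; the interior layers come from this closed-form interpolation, which is what saves computation in the second method.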
In step 404, for each hidden layer, a target node is selected from a plurality of nodes included in the hidden layer based on the hidden probability of the hidden layer.
After determining the hidden probability of each hidden layer, for each hidden layer, the terminal may refer to step 303 in the foregoing embodiment, and select a target node from a plurality of nodes included in the hidden layer according to the hidden probability of the hidden layer. The disclosed embodiments are not described in detail herein.
In step 405, the convolutional neural network model is trained based on target nodes selected from a plurality of hidden layers.
In this step, reference may be made to the implementation manner in step 304 in the foregoing embodiment, and details of this embodiment of the disclosure are not repeated.
In the embodiment of the disclosure, the terminal may determine the hiding probability of each hidden layer in the plurality of hidden layers included in the convolutional neural network model, where the hiding probabilities of the plurality of hidden layers are different and decrease in the order of the abstraction degree of the output values from low to high. Because the abstraction degree of the output values differs from layer to layer in the order from the input layer to the output layer, the terminal can determine a different hiding probability for each hidden layer according to its abstraction degree. And because different hidden layers correspond to different input values, training the convolutional neural network model with different hiding probabilities can effectively improve the image recognition accuracy of the model, compared with the prior art in which all hidden layers are trained with the same hiding probability.
In addition, in this embodiment, the terminal analyzes only the output value of the hidden layer with the highest abstraction degree and the output value of the hidden layer with the lowest abstraction degree; the hiding probabilities of the remaining hidden layers are determined according to the principle that the hiding probability decreases as the abstraction degree increases, without any further analysis of their output values, thereby reducing the calculation amount of the terminal.
After the method for training the convolutional neural network model provided by the embodiment of the present disclosure is introduced, an apparatus for training the convolutional neural network model provided by the embodiment of the present disclosure is introduced next.
FIG. 5A is a block diagram illustrating an apparatus for training a convolutional neural network model, according to an example embodiment. Referring to fig. 5A, the apparatus includes a selection module 501 and a training module 502.
A selection module 501, configured to select, for each hidden layer in a plurality of hidden layers included in a convolutional neural network model, a target node from a plurality of nodes included in the hidden layer based on the hiding probability of the hidden layer, where the hiding probabilities of the plurality of hidden layers are different;
a training module 502 for training the convolutional neural network model based on target nodes selected from the plurality of hidden layers.
Optionally, referring to fig. 5B, the apparatus further comprises:
a determining module 503, configured to determine a hiding probability of each hidden layer in a plurality of hidden layers included in the convolutional neural network model, where the hiding probabilities of the plurality of hidden layers sequentially increase according to an order from high to low of abstraction degrees of output values of the plurality of hidden layers.
Optionally, referring to fig. 5C, the determining module 503 includes:
a first obtaining sub-module 5031, configured to obtain, for each hidden layer of a plurality of hidden layers included in the convolutional neural network model, an output value of the hidden layer;
a first determining sub-module 5032 for determining the hiding probability of the hidden layer based on the output value of the hidden layer.
Optionally, the first determining sub-module is configured to:
performing singular value decomposition on the output value of the hidden layer to obtain N singular values, wherein N is a positive integer greater than 1;
calculating the square sum of the N singular values, and calculating the product of the square sum of the N singular values and a preset proportion to obtain a target square sum;
sequencing the N singular values from large to small to obtain a sequencing result;
determining the Mth singular value in the sequencing result, wherein the square sum of the first M singular values in the sequencing result is greater than the target square sum, the square sum of the first M-1 singular values in the sequencing result is less than the target square sum, and M is a positive integer greater than or equal to 1;
and determining the ratio of M to N as the hiding probability of the hidden layer.
Optionally, referring to fig. 5D, the determining module 503 includes:
a second obtaining sub-module 5033, configured to obtain an output value of a first hidden layer and an output value of a second hidden layer, where the first hidden layer is a hidden layer with a lowest abstraction level of output values in the multiple hidden layers, and the second hidden layer is a hidden layer with a highest abstraction level of output values in the multiple hidden layers;
a second determining sub-module 5034, configured to determine a hiding probability of the first hidden layer based on the output value of the first hidden layer, and determine a hiding probability of the second hidden layer based on the output value of the second hidden layer;
a third determining sub-module 5035 configured to determine the hiding probabilities of the hidden layers other than the first hidden layer and the second hidden layer in the plurality of hidden layers based on a probability difference between the hiding probability of the first hidden layer and the hiding probability of the second hidden layer.
Optionally, the first hidden layer is a first hidden layer connected to the input layer, and the second hidden layer is a last hidden layer connected to the output layer;
the third determination submodule is configured to:
and determining the hiding probability of each hidden layer positioned between the first hidden layer and the last hidden layer based on the number of the hidden layers, the probability difference value, the hiding probability of the first hidden layer and the hiding probability of the last hidden layer.
Optionally, the selection module 501 comprises:
the fourth determining submodule is used for generating a random probability for each node in the plurality of nodes included in the hidden layer according to a preset rule;
and the fifth determining submodule is used for determining the node as the target node when the random probability is smaller than the hiding probability.
In the embodiment of the disclosure, the terminal may determine the hiding probability of each hidden layer in the plurality of hidden layers included in the convolutional neural network model, where the hiding probabilities of the plurality of hidden layers are different and decrease in the order of the abstraction degree of the output values from low to high. Because the abstraction degree of the output values differs from layer to layer in the order from the input layer to the output layer, the terminal can determine a different hiding probability for each hidden layer according to its abstraction degree. And because different hidden layers correspond to different input values, training the convolutional neural network model with different hiding probabilities can effectively improve the image recognition accuracy of the model, compared with the prior art in which all hidden layers are trained with the same hiding probability.
With regard to the apparatus in the above-described embodiment, the specific manner in which each module performs the operation has been described in detail in the embodiment related to the method, and will not be elaborated here.
FIG. 6 is a block diagram illustrating an apparatus 600 for training a convolutional neural network model in accordance with an exemplary embodiment. For example, the apparatus 600 may be a mobile phone, a computer, a digital broadcast terminal, a messaging device, a game console, a tablet device, a medical device, an exercise device, a personal digital assistant, and the like.
Referring to fig. 6, apparatus 600 may include one or more of the following components: processing component 602, memory 604, power component 606, multimedia component 608, audio component 610, input/output (I/O) interface 612, sensor component 614, and communication component 616.
The processing component 602 generally controls overall operation of the device 600, such as operations associated with display, telephone calls, data communications, camera operations, and recording operations. The processing component 602 may include one or more processors 620 to execute instructions to perform all or a portion of the steps of the methods described above. Further, the processing component 602 can include one or more modules that facilitate interaction between the processing component 602 and other components. For example, the processing component 602 can include a multimedia module to facilitate interaction between the multimedia component 608 and the processing component 602.
The memory 604 is configured to store various types of data to support operations at the apparatus 600. Examples of such data include instructions for any application or method operating on device 600, contact data, phonebook data, messages, pictures, videos, and so forth. The memory 604 may be implemented by any type or combination of volatile or non-volatile memory devices such as Static Random Access Memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic or optical disks.
The power component 606 provides power to the various components of the device 600. The power component 606 may include a power management system, one or more power supplies, and any other components associated with generating, managing, and distributing power for the device 600.
The multimedia component 608 includes a screen that provides an output interface between the device 600 and a user. In some embodiments, the screen may include a Liquid Crystal Display (LCD) and a Touch Panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive input signals from the user. The touch panel includes one or more touch sensors to sense touches, swipes, and gestures on the touch panel. The touch sensors may not only sense the boundary of a touch or swipe action, but also detect the duration and pressure associated with the touch or swipe operation. In some embodiments, the multimedia component 608 includes a front camera and/or a rear camera. The front camera and/or the rear camera may receive external multimedia data when the device 600 is in an operating mode, such as a shooting mode or a video mode. Each of the front camera and the rear camera may be a fixed optical lens system or have focus and optical zoom capability.
The audio component 610 is configured to output and/or input audio signals. For example, audio component 610 includes a Microphone (MIC) configured to receive external audio signals when apparatus 600 is in an operational mode, such as a call mode, a recording mode, and a voice recognition mode. The received audio signal may further be stored in the memory 604 or transmitted via the communication component 616. In some embodiments, audio component 610 further includes a speaker for outputting audio signals.
The I/O interface 612 provides an interface between the processing component 602 and peripheral interface modules, which may be keyboards, click wheels, buttons, etc. These buttons may include, but are not limited to: a home button, a volume button, a start button, and a lock button.
The sensor component 614 includes one or more sensors for providing status assessments of various aspects of the device 600. For example, the sensor component 614 may detect an open/closed state of the device 600 and the relative positioning of components, such as the display and keypad of the device 600. The sensor component 614 may also detect a change in position of the device 600 or a component of the device 600, the presence or absence of user contact with the device 600, the orientation or acceleration/deceleration of the device 600, and a change in temperature of the device 600. The sensor component 614 may include a proximity sensor configured to detect the presence of nearby objects without any physical contact. The sensor component 614 may also include a light sensor, such as a CMOS or CCD image sensor, for use in imaging applications. In some embodiments, the sensor component 614 may also include an acceleration sensor, a gyroscope sensor, a magnetic sensor, a pressure sensor, or a temperature sensor.
The communication component 616 is configured to facilitate communications between the apparatus 600 and other devices in a wired or wireless manner. The apparatus 600 may access a wireless network based on a communication standard, such as WiFi, 2G or 3G, or a combination thereof. In an exemplary embodiment, the communication component 616 receives broadcast signals or broadcast related information from an external broadcast management system via a broadcast channel. In an exemplary embodiment, the communication component 616 further includes a Near Field Communication (NFC) module to facilitate short-range communications. For example, the NFC module may be implemented based on Radio Frequency Identification (RFID) technology, infrared data association (IrDA) technology, Ultra Wideband (UWB) technology, Bluetooth (BT) technology, and other technologies.
In an exemplary embodiment, the apparatus 600 may be implemented by one or more Application Specific Integrated Circuits (ASICs), Digital Signal Processors (DSPs), Digital Signal Processing Devices (DSPDs), Programmable Logic Devices (PLDs), Field Programmable Gate Arrays (FPGAs), controllers, micro-controllers, microprocessors or other electronic components for performing the methods provided by the embodiments illustrated in fig. 2-4 and described above.
In an exemplary embodiment, a non-transitory computer readable storage medium comprising instructions, such as the memory 604 comprising instructions, executable by the processor 620 of the apparatus 600 to perform the above-described method is also provided. For example, the non-transitory computer readable storage medium may be a ROM, a Random Access Memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, and the like.
A non-transitory computer readable storage medium, wherein instructions of the storage medium, when executed by a processor of a terminal, enable the terminal to perform a method of training a convolutional neural network model provided by the embodiments shown in fig. 2, fig. 3 and fig. 4 described above.
FIG. 7 is a block diagram illustrating an apparatus 700 for training a convolutional neural network model in accordance with an exemplary embodiment. For example, the apparatus 700 may be provided as a server. Referring to fig. 7, the apparatus 700 includes a processor 722 (which may represent one or more processors) and memory resources, represented by a memory 732, for storing instructions, such as application programs, that are executable by the processor 722. The application programs stored in the memory 732 may include one or more modules, each of which corresponds to a set of instructions. Further, the processor 722 is configured to execute the instructions to perform the methods provided by the embodiments illustrated in fig. 2-4 and described above.
The apparatus 700 may also include a power component 726 configured to perform power management of the apparatus 700, a wired or wireless network interface 750 configured to connect the apparatus 700 to a network, and an input/output (I/O) interface 758. The apparatus 700 may operate based on an operating system stored in the memory 732, such as Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™, or the like.
In an exemplary embodiment, a non-transitory computer readable storage medium is also provided that includes instructions, such as the memory 732 that includes instructions, which are executable by the processor 722 of the device 700 to perform the above-described method. For example, the non-transitory computer readable storage medium may be a ROM, a Random Access Memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, and the like.
A non-transitory computer readable storage medium, wherein instructions of the storage medium, when executed by a processor of a server, enable the server to perform the method of training a convolutional neural network model provided by the embodiments shown in fig. 2, fig. 3 and fig. 4 described above.
Other embodiments of the invention will be apparent to those skilled in the art from consideration of the specification and practice of the invention disclosed herein. This application is intended to cover any variations, uses, or adaptations of the invention following, in general, the principles of the invention and including such departures from the present disclosure as come within known or customary practice within the art to which the invention pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the invention being indicated by the following claims.
It will be understood that the invention is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the invention is limited only by the appended claims.

Claims (14)

1. A method of training a convolutional neural network model, the method comprising:
determining the hiding probability of each hidden layer in a plurality of hidden layers included in a convolutional neural network model, wherein the hiding probabilities of the plurality of hidden layers increase sequentially in order of the abstraction degree of the output values of the hidden layers from high to low;
for each hidden layer of the plurality of hidden layers included in the convolutional neural network model, selecting a target node from a plurality of nodes included in the hidden layer based on the hiding probability of the hidden layer, the hiding probabilities of the plurality of hidden layers being different from one another;
training the convolutional neural network model based on the target nodes selected from the plurality of hidden layers, wherein an input layer of the trained convolutional neural network model is configured to determine pixel values of pixel points included in an input image and to transmit the pixel values of the pixel points of the image to the hidden layer connected to the input layer, and an output layer of the trained convolutional neural network model is configured to determine, according to the received processed pixel values of the pixel points, a probability that the image belongs to each of a plurality of preset categories, so as to output a recognition result of the image.
2. The method of claim 1, wherein determining the hiding probability of each hidden layer in the plurality of hidden layers included in the convolutional neural network model comprises:
for each hidden layer in the plurality of hidden layers included in the convolutional neural network model, acquiring an output value of the hidden layer;
determining a hiding probability of the hidden layer based on the output value of the hidden layer.
3. The method of claim 2, wherein determining the hiding probability of the hidden layer based on the output value of the hidden layer comprises:
performing singular value decomposition on the output value of the hidden layer to obtain N singular values, wherein N is a positive integer greater than 1;
calculating the square sum of the N singular values, and calculating the product of the square sum of the N singular values and a preset proportion to obtain a target square sum;
sequencing the N singular values from large to small to obtain a sequencing result;
determining an Mth singular value in the sorting result, wherein the square sum of the first M singular values in the sorting result is larger than the target square sum, the square sum of the first M-1 singular values in the sorting result is smaller than the target square sum, and M is a positive integer larger than or equal to 1;
and determining the ratio of the M to the N as the hiding probability of the hidden layer.
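By way of illustration only (not part of the claimed subject matter), the singular-value steps of claim 3 can be sketched in Python. The use of NumPy and the example value of the preset proportion are assumptions; the claim fixes neither.

```python
import numpy as np

def hiding_probability_from_svd(output_values, preset_ratio):
    """Compute a hidden layer's hiding probability from its output values,
    following the steps of claim 3. `preset_ratio` is the preset proportion."""
    # Singular value decomposition of the hidden layer's output values;
    # NumPy returns the N singular values already sorted from large to small.
    singular_values = np.linalg.svd(np.asarray(output_values), compute_uv=False)
    n = len(singular_values)
    # Target square sum: the square sum of all N singular values times the ratio.
    squares = singular_values ** 2
    target_square_sum = preset_ratio * squares.sum()
    # Find the M-th singular value: the smallest M whose leading square sum
    # reaches the target while the square sum of the first M-1 falls short.
    cumulative = np.cumsum(squares)
    m = int(np.searchsorted(cumulative, target_square_sum)) + 1
    # The ratio of M to N is the hiding probability of the hidden layer.
    return m / n
```

For example, a layer output with singular values 3, 2, 1 and a preset proportion of 0.9 gives a target square sum of 12.6; the first two squares (9 + 4 = 13) exceed it while the first alone (9) does not, so M = 2 and the hiding probability is 2/3.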
4. The method of claim 1, wherein determining the hiding probability of each hidden layer in the plurality of hidden layers included in the convolutional neural network model comprises:
acquiring an output value of a first hidden layer and an output value of a second hidden layer, wherein the first hidden layer is a hidden layer with the lowest abstract degree of the output values in the hidden layers, and the second hidden layer is a hidden layer with the highest abstract degree of the output values in the hidden layers;
determining a hiding probability of the first hidden layer based on the output value of the first hidden layer, and determining a hiding probability of the second hidden layer based on the output value of the second hidden layer;
determining hiding probabilities of other hidden layers of the plurality of hidden layers except the first hidden layer and the second hidden layer based on a probability difference between the hiding probability of the first hidden layer and the hiding probability of the second hidden layer.
5. The method of claim 4, wherein the first hidden layer is the hidden layer connected to an input layer, and the second hidden layer is the last hidden layer, connected to an output layer;
the determining, based on the probability difference between the hiding probability of the first hidden layer and the hiding probability of the second hidden layer, the hiding probabilities of the other hidden layers of the plurality of hidden layers except the first hidden layer and the second hidden layer comprises:
determining a hiding probability of each hidden layer located between the first hidden layer and the last hidden layer based on the number of the plurality of hidden layers, the probability difference, the hiding probability of the first hidden layer, and the hiding probability of the last hidden layer.
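A minimal sketch of this claim, assuming an evenly spaced (linear) progression between the two endpoint probabilities; this spacing is an assumption, since the claim only requires the intermediate probabilities to be derived from the layer count and the probability difference.

```python
def interpolated_hiding_probabilities(num_hidden_layers, p_first, p_last):
    """Hiding probabilities for all hidden layers, from the first hidden
    layer (connected to the input layer) to the last hidden layer
    (connected to the output layer), linearly interpolated."""
    # Probability difference between the endpoint hidden layers,
    # divided into equal steps across the intermediate layers.
    step = (p_last - p_first) / (num_hidden_layers - 1)
    return [p_first + i * step for i in range(num_hidden_layers)]
```

With five hidden layers and endpoint probabilities 0.9 and 0.5, this yields 0.9, 0.8, 0.7, 0.6, 0.5, consistent with claim 1's requirement that the probabilities change monotonically with the abstraction degree of the layers.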
6. The method according to any one of claims 1-5, wherein the selecting a target node from a plurality of nodes included in the hidden layer based on the hiding probability of the hidden layer comprises:
generating a random probability for each node in a plurality of nodes included in the hidden layer according to a preset rule;
and when the random probability is smaller than the hiding probability, determining the node as a target node.
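As an illustrative sketch of this selection step, taking a uniform draw on [0, 1) as the "preset rule" for generating each node's random probability — an assumption, since the claim does not fix the rule:

```python
import random

def select_target_nodes(nodes, hiding_probability, rng=None):
    """Select target nodes from a hidden layer's nodes per claim 6: a node
    becomes a target node when its freshly generated random probability is
    smaller than the layer's hiding probability."""
    rng = rng or random.Random()
    targets = []
    for node in nodes:
        random_probability = rng.random()  # uniform draw in [0, 1)
        if random_probability < hiding_probability:
            targets.append(node)
    return targets
```

A hiding probability of 1.0 keeps every node as a target node, and 0.0 keeps none, so in expectation a fraction equal to the hiding probability of each layer's nodes participates in a given training pass.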
7. An apparatus for training a convolutional neural network model, the apparatus comprising:
a determining module, configured to determine a hiding probability of each hidden layer in a plurality of hidden layers included in the convolutional neural network model, wherein the hiding probabilities of the plurality of hidden layers increase sequentially in order of the abstraction degree of the output values of the hidden layers from high to low;
a selection module, configured to select, for each hidden layer of the plurality of hidden layers included in the convolutional neural network model, a target node from a plurality of nodes included in the hidden layer based on the hiding probability of the hidden layer, wherein the hiding probabilities of the plurality of hidden layers are different from one another;
a training module, configured to train the convolutional neural network model based on the target nodes selected from the plurality of hidden layers, wherein an input layer of the trained convolutional neural network model is configured to determine pixel values of pixel points included in an input image and to transmit the pixel values of the pixel points of the image to the hidden layer connected to the input layer, and an output layer of the trained convolutional neural network model is configured to determine, according to the received processed pixel values of the pixel points, a probability that the image belongs to each of a plurality of preset categories, so as to output a recognition result of the image.
8. The apparatus of claim 7, wherein the determining module comprises:
a first obtaining sub-module, configured to obtain, for each hidden layer of a plurality of hidden layers included in the convolutional neural network model, an output value of the hidden layer;
a first determining submodule, configured to determine a hiding probability of the hidden layer based on an output value of the hidden layer.
9. The apparatus of claim 8, wherein the first determination submodule is configured to:
performing singular value decomposition on the output value of the hidden layer to obtain N singular values, wherein N is a positive integer greater than 1;
calculating the square sum of the N singular values, and calculating the product of the square sum of the N singular values and a preset proportion to obtain a target square sum;
sequencing the N singular values from large to small to obtain a sequencing result;
determining an Mth singular value in the sorting result, wherein the square sum of the first M singular values in the sorting result is larger than the target square sum, the square sum of the first M-1 singular values in the sorting result is smaller than the target square sum, and M is a positive integer larger than or equal to 1;
and determining the ratio of the M to the N as the hiding probability of the hidden layer.
10. The apparatus of claim 7, wherein the determining module comprises:
a second obtaining sub-module, configured to obtain an output value of a first hidden layer and an output value of a second hidden layer, where the first hidden layer is a hidden layer with a lowest abstraction degree of an output value in the multiple hidden layers, and the second hidden layer is a hidden layer with a highest abstraction degree of an output value in the multiple hidden layers;
a second determining sub-module, configured to determine a hiding probability of the first hidden layer based on the output value of the first hidden layer, and determine a hiding probability of the second hidden layer based on the output value of the second hidden layer;
a third determining sub-module, configured to determine, based on a probability difference between the hiding probability of the first hidden layer and the hiding probability of the second hidden layer, the hiding probabilities of other hidden layers of the multiple hidden layers except for the first hidden layer and the second hidden layer.
11. The apparatus of claim 10, wherein the first hidden layer is the hidden layer connected to an input layer, and the second hidden layer is the last hidden layer, connected to an output layer;
the third determining sub-module is configured to:
determine a hiding probability of each hidden layer located between the first hidden layer and the last hidden layer based on the number of the plurality of hidden layers, the probability difference, the hiding probability of the first hidden layer, and the hiding probability of the last hidden layer.
12. The apparatus according to any of claims 7-11, wherein the selection module comprises:
a fourth determining submodule, configured to generate a random probability for each node in the plurality of nodes included in the hidden layer according to a preset rule;
and the fifth determining submodule is used for determining the node as a target node when the random probability is smaller than the hiding probability.
13. An apparatus for training a convolutional neural network model, the apparatus comprising:
a processor;
a memory for storing processor-executable instructions;
wherein the processor is configured to perform the steps of any one of the methods of claims 1-6.
14. A computer-readable storage medium having instructions stored thereon, wherein the instructions, when executed by a processor, implement the steps of any of the methods of claims 1-6.
CN201710675297.5A 2017-08-09 2017-08-09 Method and device for training convolutional neural network model and storage medium Active CN107480773B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710675297.5A CN107480773B (en) 2017-08-09 2017-08-09 Method and device for training convolutional neural network model and storage medium

Publications (2)

Publication Number Publication Date
CN107480773A CN107480773A (en) 2017-12-15
CN107480773B (en) 2020-11-13

Family

ID=60598970

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710675297.5A Active CN107480773B (en) 2017-08-09 2017-08-09 Method and device for training convolutional neural network model and storage medium

Country Status (1)

Country Link
CN (1) CN107480773B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109766493B (en) * 2018-12-24 2022-08-02 哈尔滨工程大学 A Cross-Domain Recommendation Method Combining Personality Features Under Neural Networks
CN110188789A (en) * 2019-04-16 2019-08-30 浙江工业大学 A small sample medical image classification method based on preprocessing model
CN113496282B (en) * 2020-04-02 2024-06-28 北京金山数字娱乐科技有限公司 Model training method and device

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104517122A (en) * 2014-12-12 2015-04-15 浙江大学 Image target recognition method based on optimized convolution architecture
CN104751842A (en) * 2013-12-31 2015-07-01 安徽科大讯飞信息科技股份有限公司 Method and system for optimizing deep neural network
CN104850836A (en) * 2015-05-15 2015-08-19 浙江大学 Automatic insect image identification method based on depth convolutional neural network
CN104850845A (en) * 2015-05-30 2015-08-19 大连理工大学 Traffic sign recognition method based on asymmetric convolution neural network
CN105512676A (en) * 2015-11-30 2016-04-20 华南理工大学 Food recognition method at intelligent terminal
CN106250921A (en) * 2016-07-26 2016-12-21 北京小米移动软件有限公司 Image processing method and device
CN106250911A (en) * 2016-07-20 2016-12-21 南京邮电大学 A kind of picture classification method based on convolutional neural networks
CN106548201A (en) * 2016-10-31 2017-03-29 北京小米移动软件有限公司 The training method of convolutional neural networks, image-recognizing method and device
CN106951848A (en) * 2017-03-13 2017-07-14 平安科技(深圳)有限公司 The method and system of picture recognition

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Mohamed Elleuch et al., "A New Design Based-SVM of the CNN Classifier Architecture with Dropout for Offline Arabic Handwritten Recognition", Procedia Computer Science, vol. 80, Dec. 31, 2016, pp. 1712-1723 *
Huang Bin et al., "Object Recognition Algorithm Based on Deep Convolutional Neural Networks" (in Chinese), Journal of Computer Applications, vol. 36, no. 12, Dec. 10, 2016, pp. 3333-3340, 3346 *


Legal Events

Code Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant