Detailed Description
To make the objects, technical solutions and advantages of the embodiments of the present invention more apparent, the technical solutions of the present invention will be described clearly and completely below with reference to the accompanying drawings. It is apparent that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments obtained by those skilled in the art based on the embodiments of the invention without inventive effort fall within the scope of the invention.
For example, for some samples the prediction result of the student network may already be better than that of the teacher network, yet the prediction results of the student network and the teacher network are still pulled together, so that the prediction accuracy of the student network on those samples is reduced. Current research lacks a distillation method based on knowledge filtering, in which the teacher network transmits only "positive knowledge" to the student network during distillation. The invention mainly solves the problem that a teacher network may transmit negative knowledge to a student network during model distillation.
The purpose of metric learning is to shorten the distance between similar samples and lengthen the distance between heterogeneous samples. Therefore, in the model distillation process, for the same input sample pair, the prediction result of the teacher network can be transmitted to the student network as knowledge; that is, the prediction results of the student network and the teacher network for the same pair of samples are pulled together. This distillation method can be widely applied without being limited by the feature dimensions of the teacher network and the student network. However, during distillation the teacher network may output some "negative knowledge": for some samples the prediction results of the student network may be more accurate, and if the prediction results of the student network and the teacher network are forced together, the prediction accuracy of the student network on these samples may be reduced. Based on this, the network model training method, device and electronic equipment provided by the embodiments of the invention delete the sample pairs that the second network model (student network) has already learned well, so that the second network model concentrates on learning the samples on which it performs worse than the first network model (teacher network) and is not biased by samples it has already learned well, thereby improving the accuracy of the second network model to a certain extent. The following is a description by way of examples.
The second network model in the embodiments of the invention can be applied to various application scenarios such as target detection and target recognition: for example, the second network model may be applied to recognize pedestrians or vehicles, to track pedestrians or vehicles, or to recognize human body parts or vehicle components (such as license plates or vehicle logos).
As shown in fig. 1, an electronic device 100 includes one or more processors 102, one or more memories 104, an input device 106, an output device 108, and one or more image capture devices 110, which are interconnected by a bus system 112 and/or other forms of connection mechanisms (not shown). It should be noted that the components and structures of the electronic device 100 shown in fig. 1 are exemplary only and not limiting, as electronic devices may have other components and structures as desired.
The processor 102 may be a server, a smart terminal, or a device including a Central Processing Unit (CPU) or another form of processing unit with data processing and/or instruction execution capabilities; it may process data from other components in the electronic device 100 and may control other components in the electronic device 100 to perform the network model training functions.
Memory 104 may include one or more computer program products, which may include various forms of computer-readable storage media, such as volatile memory and/or non-volatile memory. Volatile memory can include, for example, Random Access Memory (RAM) and/or cache memory. Non-volatile memory can include, for example, Read-Only Memory (ROM), hard disks, flash memory, and the like. One or more computer program instructions may be stored on the computer-readable storage medium and executed by the processor 102 to perform the functions of the embodiments of the present invention described below (implemented by a processing device) and/or other desired functions. Various applications and various data, such as visible and infrared video sequences, as well as various data used and/or generated by the applications, may also be stored in the computer-readable storage medium.
The input device 106 may be a device used by a user to input instructions and may include one or more of a keyboard, mouse, microphone, touch screen, and the like.
The output device 108 may output various information (e.g., images or sounds) to the outside (e.g., a user), and may include one or more of a display, a speaker, and the like.
The image acquisition device 110 may acquire a set of sample pairs and store the acquired images or video sequences in the memory 104 for use by other components.
Illustratively, the devices in the electronic apparatus for implementing the network model training method, the image object recognition method and the apparatus according to the embodiments of the present invention may be integrally disposed, or may be separately disposed, such as integrally disposing the processor 102, the memory 104, the input device 106 and the output device 108, and disposing the image capturing device 110 at a designated location where the video frame may be captured. When the devices in the above electronic apparatus are integrally provided, the electronic apparatus may be implemented as an intelligent terminal such as a camera, a smart phone, a tablet computer, a vehicle-mounted terminal, or the like.
The embodiment provides a network model training method, wherein the method is applied to a server, and referring to a flowchart of the network model training method shown in fig. 2, the method specifically comprises the following steps:
Step S202, training a second network model by applying the same sample pair set as the first network model, wherein the calculation amount of the first network model is larger than that of the second network model; the first network model is the teacher network and the second network model is the student network.
The samples in the above-described sample pair set all come in pairs; for example, (a1, b1) is a sample pair and (a2, b2) is a sample pair. In this embodiment, the number of sample pairs included in the sample pair set is not limited.
Step S204, when the second network model completes the iterative training of the present round, a first prediction result of the first network model for the first sample pair and a second prediction result of the second network model for the first sample pair are obtained;
When the training of the second network model reaches a preset duration or a preset number of iterations, the current round of iterative training is completed; the preset duration or number of iterations can be set according to actual needs and is not limited here.
In general, the samples in the sample pair set can be divided into two classes: similar samples and heterogeneous samples. For example, a picture sample pair containing only men is regarded as a similar sample, while a picture sample pair containing a man and a woman is regarded as a heterogeneous sample. In order to train the network model with more picture features, the similar or heterogeneous samples can be further divided into several subclasses; for example, among the similar samples, picture samples of men wearing hats form one subclass and picture samples of men wearing suits form another. The division into similar and heterogeneous samples can be set according to the practical application scenario and is not limited here; a minimal sketch of a labeled sample pair set is shown below.
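For illustration only, such a sample pair set might be organized as in the following Python sketch; the 1/0 labels follow the class identifiers described later in this embodiment, while the file names and tuple layout are assumptions, not part of the disclosure:

```python
# Illustrative sketch of a labeled sample pair set; file names and layout
# are assumptions for demonstration, not part of the disclosure.
pair_set = [
    # (sample_a, sample_b, label): 1 = similar pair, 0 = heterogeneous pair
    ("man_hat_1.jpg",  "man_hat_2.jpg",  1),   # subclass: men wearing hats
    ("man_suit_1.jpg", "man_suit_2.jpg", 1),   # subclass: men wearing suits
    ("man_1.jpg",      "woman_1.jpg",    0),   # heterogeneous pair
]
```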
The first sample pair may be a similar sample comprising at least one sample pair, or a heterogeneous sample comprising at least one sample pair. The first sample pair is predicted both by the second network model trained in the current round and by the first network model trained in advance on the sample pair set, so as to obtain a first prediction result from the first network model and a second prediction result from the second network model.
Step S206, if the second predicted result is better than the first predicted result, deleting the first sample pair from the sample pair set to obtain a sample pair update set;
If the obtained second prediction result is better than the first prediction result, this indicates that the first sample pair is a good sample that the second network model has already learned. To avoid the poor training effect caused by repeatedly training the second network model on the first sample pair during knowledge distillation, in this embodiment the first sample pair, or the whole subclass to which it belongs, may be deleted from the sample pair set to obtain a sample pair update set. For example, suppose the first sample pair consists of 10 sample pairs of men wearing hats; if the predictions of the second network model on these 10 pairs are better than those of the first network model, then these 10 pairs, or all sample pairs of men wearing hats (say, 20 pairs), may be deleted from the sample pair set.
If the obtained second prediction result is inferior to the first prediction result, this indicates that the second network model has poorer prediction capability on the first sample pair than the first network model, so the knowledge of the first sample pair is exactly what the second network model needs; the first sample pair is therefore kept to train the second network model.
Step S208, determining a total loss function value of the second network model based on the sample pair update set;
The determined total loss function value is the convergence parameter by which the second network model achieves its final training objective.
Step S210, if the total loss function value is larger than the preset value, updating the parameters of the second network model by applying the total loss function value, and continuing the next round of iterative training on the updated second network model by applying the sample pair set, until the total loss function value converges to the preset value, so as to obtain a trained second network model.
When the determined total loss function value is greater than the preset value, the second network model trained in this round has not reached the preset convergence. After the parameters of the second network model are updated by applying the total loss function value, a first sample pair can be selected again from the sample pair set, and steps S204 to S210 are executed again until the determined total loss function value is not greater than the preset value, which indicates that the updated second network model has reached the preset convergence effect, so that the output of the trained second network model approaches the output of the first network model.
In the next round of training, the original sample pair set can still be used, so that the sample pair set used by the second network model is always the same as that used by the first network model. Determining the total loss function value of the second network model based on the update set of the current round, while controlling the number of training rounds, keeps the number of training rounds reasonable and makes the performance of the trained second network model basically consistent with that of the first network model.
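To make the flow of steps S202 to S210 concrete, the following is a minimal sketch in Python/PyTorch. The toy linear models, the simplified comparison rule (the per-class normalization of steps S306 to S310 in the next embodiment is skipped), and all names are illustrative assumptions, not the disclosed implementation:

```python
import torch

torch.manual_seed(0)
teacher = torch.nn.Linear(8, 4)          # first network model (pre-trained, frozen here)
student = torch.nn.Linear(8, 2)          # second network model (smaller, to be trained)
optimizer = torch.optim.SGD(student.parameters(), lr=0.1)

# sample pair set shared by teacher and student: (sample_a, sample_b, is_similar)
pair_set = [(torch.randn(8), torch.randn(8), i % 2 == 0) for i in range(32)]

def feature_distance(model, a, b):
    # Euclidean distance between the model's features of the two samples
    return torch.dist(model(a), model(b))

preset_value = 1e-3
for _ in range(200):                      # one pass = one round of iterative training
    # steps S204/S206: build the sample pair update set by dropping pairs the
    # student already predicts better than the teacher
    update_set = []
    for a, b, is_similar in pair_set:
        with torch.no_grad():
            d_t = feature_distance(teacher, a, b)
            d_s = feature_distance(student, a, b)
        student_better = d_s < d_t if is_similar else d_s > d_t
        if not student_better:
            update_set.append((a, b))
    if not update_set:                    # nothing left to learn from this round
        break
    # step S208: distillation loss pulls student distances toward teacher distances
    loss = torch.stack([(feature_distance(student, a, b)
                         - feature_distance(teacher, a, b).detach()) ** 2
                        for a, b in update_set]).mean()
    if loss.item() <= preset_value:       # step S210: converged to the preset value
        break
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

The teacher's distances are detached so that only the parameters of the second network model are updated, mirroring the one-way flow of knowledge from the first network model to the second. Note also that the teacher and student feature dimensions differ (4 versus 2), illustrating that this pair-distance distillation does not constrain the feature dimension.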
The embodiment of the application provides a network model training method: a second network model is trained by applying the same sample pair set as the first network model; when the second network model completes the current round of iterative training, a first prediction result of the first network model for the first sample pair and a second prediction result of the second network model for the first sample pair are obtained; if the second prediction result is better than the first prediction result, the first sample pair is deleted from the sample pair set to obtain a sample pair update set; a total loss function value of the second network model is determined based on the sample pair update set; and if the total loss function value is greater than a preset value, the parameters of the second network model are updated by applying the total loss function value, so that the prediction results of the second network model approach those of the first network model, and the next round of iterative training of the updated second network model continues with the sample pair set until the total loss function value converges to the preset value, yielding the trained second network model. By removing the good samples that the second network model has already learned, the method makes the second network model concentrate on learning the samples on which it performs worse than the first network model, without being biased by samples it has already learned well; this improves training efficiency, ensures the performance of the trained second network model, and makes target recognition and detection with the second network model more accurate.
The present embodiment provides another network model training method, which is implemented on the basis of the foregoing embodiment, and focuses on a specific implementation manner of acquiring a first prediction result of a first network model for a first sample pair and a second prediction result of a second network model for the first sample pair. As shown in fig. 3, the method for training a network model in this embodiment includes the following steps:
Step S302, training a second network model by applying the same sample pair set as the first network model;
Step S304, when the second network model completes the iterative training of the present round, calculating a first feature distance of the first network model for the first sample pair, and calculating a second feature distance of the second network model for the first sample pair;
The first feature distance and the second feature distance are euclidean distances or cosine similarities corresponding to features of the first sample pair, or other metrics, and are not limited herein.
In this embodiment, taking the Euclidean distance of the features of the first sample pair as an example, if the first sample pair belongs to the similar samples, the first feature distance of the first sample pair calculated by using the first network model is:
D_intra1[i] = Euclidean_Distance(a[i], b[i]), where (a[i], b[i]) represents the i-th first sample pair and Euclidean_Distance(a[i], b[i]) represents the Euclidean distance of the i-th first sample pair.
If the first sample pair belongs to a heterogeneous sample, the first feature distance of the first sample pair calculated by using the first network model is as follows:
D_inter1[j] = Euclidean_Distance(a[j], b[j]), where Euclidean_Distance(a[j], b[j]) represents the Euclidean distance of the j-th first sample pair.
If the first sample pair belongs to the similar samples, the second feature distance of the first sample pair calculated by using the second network model is D_intra2[i] = Euclidean_Distance(a[i], b[i]).
If the first sample pair belongs to the heterogeneous samples, the second feature distance of the first sample pair calculated by using the second network model is D_inter2[j] = Euclidean_Distance(a[j], b[j]).
In the embodiment of the present invention, the Euclidean distance is calculated in the standard way and will not be described in detail here.
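As a small worked example of step S304 (a sketch assuming NumPy; the feature values are made up for illustration):

```python
import numpy as np

def euclidean_distance(a, b):
    # standard Euclidean distance between two feature vectors, as in
    # D_intra1[i] = Euclidean_Distance(a[i], b[i]) above
    return float(np.linalg.norm(np.asarray(a) - np.asarray(b)))

# illustrative features of the i-th first sample pair
a_teacher, b_teacher = np.array([0.2, 0.9]), np.array([0.1, 0.8])  # first network model
a_student, b_student = np.array([0.3, 0.7]), np.array([0.2, 0.9])  # second network model
D_intra1_i = euclidean_distance(a_teacher, b_teacher)  # first feature distance
D_intra2_i = euclidean_distance(a_student, b_student)  # second feature distance
```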
Step S306, calculating a first feature distance mean of the sample class to which the first sample pair belongs based on the first network model, and calculating a second feature distance mean of that sample class based on the second network model, wherein the sample class is either the similar samples or the heterogeneous samples;
In practical use, in order to distinguish the sample class to which a sample pair belongs, each sample pair in the sample pair set may carry a sample class identifier: sample pairs belonging to the similar samples may be identified as 1 and heterogeneous samples as 0, so that when the first sample pair is acquired, the sample class to which it belongs can be determined from its identifier.
The feature distance mean refers to the average of the feature distances of all sample pairs in the sample class to which the first sample pair belongs; the first and second feature distance means of that class can be calculated by using the first network model and the second network model respectively.
The first feature distance mean is calculated as follows: input all sample pairs of the class to which the first sample pair belongs into the first network model one by one to obtain the feature distance of each sample pair, sum the feature distances, and divide the sum by the total number of sample pairs. The result is the first feature distance mean, mean1(D_intra) or mean1(D_inter), where mean1(D_intra) denotes the first feature distance mean when the first sample pair belongs to the similar samples and mean1(D_inter) denotes the first feature distance mean when it belongs to the heterogeneous samples.
The second feature distance mean calculated based on the second network model is mean2(D_intra) or mean2(D_inter), where mean2(D_intra) denotes the second feature distance mean when the first sample pair belongs to the similar samples and mean2(D_inter) denotes it when the pair belongs to the heterogeneous samples. Since the second feature distance mean is calculated based on the second network model in the same way as the first feature distance mean is calculated based on the first network model, the details are omitted.
Step S308, carrying out normalization processing on the first feature distance by applying the first feature distance mean value, and taking the first feature distance normalization result as the first prediction result of the first network model for the first sample pair;
In this embodiment, the normalization method is to divide the first feature distance by the first feature distance mean; the obtained normalized result is taken as the first prediction result of the first network model for the first sample pair.
If the first sample pair belongs to the similar samples, the normalized result of the first feature distance is: D_intra1[i] / mean1(D_intra).
If the first sample pair belongs to the heterogeneous samples, the normalized result of the first feature distance is: D_inter1[j] / mean1(D_inter).
Step S310, carrying out normalization processing on the second feature distance by applying the second feature distance mean value, and taking the second feature distance normalization result as the second prediction result of the second network model for the first sample pair;
The purpose and method of normalizing the second feature distance are the same as those of step S308, and will not be described here again.
If the first sample pair belongs to the similar samples, the normalized result of the second feature distance is: D_intra2[i] / mean2(D_intra).
If the first sample pair belongs to the heterogeneous samples, the normalized result of the second feature distance is: D_inter2[j] / mean2(D_inter).
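A minimal sketch of steps S306 to S310, assuming NumPy; the distance values are made up for illustration:

```python
import numpy as np

def normalize_by_class_mean(distances):
    # steps S306-S310: divide each feature distance by the mean feature
    # distance of its class, e.g. D_intra1[i] / mean1(D_intra)
    d = np.asarray(distances, dtype=float)
    return d / d.mean()

# illustrative similar-class distances under the first network model (teacher)
first_prediction = normalize_by_class_mean([0.8, 1.2, 1.0])
# and under the second network model (student)
second_prediction = normalize_by_class_mean([0.6, 1.4, 1.0])
```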
Step S312, based on the first sample pair, judging whether the second predicted result is better than the first predicted result by comparing the first predicted result and the second predicted result;
The process of step S312 may be implemented by steps A1 to A2:
Step A1, if the first sample pair belongs to the similar samples, subtract the second prediction result from the first prediction result to obtain a first difference; if the first difference is greater than 0, or greater than a preset first positive value, determine that the second prediction result is better than the first prediction result;
A first difference greater than 0, or greater than the preset first positive value, indicates that the first prediction result is greater than the second prediction result. Continuing with the Euclidean distance of the first sample pair's features as an example: among similar samples, if the second prediction result of the second network model is smaller than the first prediction result of the first network model, the prediction capability of the second network model on the first sample pair is better than that of the first network model, so the knowledge of this sample is filtered out during distillation.
If the second prediction result of the second network model is greater than or equal to the first prediction result of the first network model, the prediction capability of the second network model on this part of the samples is worse than that of the first network model, so the knowledge of this part of the samples is exactly what the second network model needs to train on.
Step A2, if the first sample pair belongs to the heterogeneous samples, subtract the second prediction result from the first prediction result to obtain a second difference; if the second difference is less than 0, or less than a preset second negative value, determine that the second prediction result is better than the first prediction result.
A second difference less than 0, or less than the preset second negative value, indicates that the first prediction result is not larger than the second prediction result. Taking the Euclidean distance as the feature distance metric: for heterogeneous samples, if the second prediction result of the second network model is larger than the first prediction result of the first network model, the prediction capability of the second network model on this part of the samples is better than that of the first network model, so the knowledge of this part of the samples is "negative knowledge" for the student network during distillation and should not be learned.
If the second prediction result of the second network model is less than or equal to the first prediction result of the first network model, the second network model has more room for optimization than the first network model on this part of the samples, so the learning of this part of the samples is strengthened, i.e., the second network model is trained with the first sample pair during distillation.
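The comparison rules of steps A1 and A2 can be summarized in a short sketch (plain Python; the zero thresholds stand in for the preset first positive value and second negative value, which in practice are set as needed):

```python
def student_is_better(first_pred, second_pred, is_similar,
                      first_positive=0.0, second_negative=0.0):
    # Steps A1/A2: decide whether the second (student) prediction result
    # beats the first (teacher) one; threshold defaults are illustrative.
    diff = first_pred - second_pred
    if is_similar:
        # A1: for similar samples a smaller normalized distance is better,
        # so a positive difference means the student wins
        return diff > first_positive
    # A2: for heterogeneous samples a larger normalized distance is better,
    # so a negative difference means the student wins
    return diff < second_negative

assert student_is_better(1.2, 0.9, is_similar=True)    # student pulls pair closer
assert student_is_better(0.8, 1.1, is_similar=False)   # student pushes pair farther
```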
Step S314, if the second predicted result is better than the first predicted result, deleting the first sample pair from the sample pair set to obtain a sample pair update set;
Whether the first sample pair is a similar sample or a heterogeneous sample, if the second prediction result of the second network model is better than the first prediction result of the first network model, the knowledge of this part of the samples need not be learned, and these pairs can be deleted from the sample pair set.
Step S316, determining a total loss function value of the second network model based on the sample pair update set;
Step S318, if the total loss function value is larger than the preset value, updating the parameters of the second network model by using the total loss function value, and continuing the next round of iterative training of the updated second network model by using the sample pair set until the total loss function value converges to the preset value, thereby obtaining the trained second network model.
According to the network model training method provided by this embodiment of the invention, a first prediction result is obtained based on the first network model from the calculated first feature distance mean and first feature distance, and a second prediction result is obtained based on the second network model from the calculated second feature distance mean and second feature distance of the first sample pair. When the second prediction result is judged to be better than the first prediction result, the first sample pair is deleted from the sample pair set, and the total loss function value of the second network model is determined using the sample pair update set. If the total loss function value is greater than the preset value, the parameters of the second network model are updated by applying the total loss function value so that the prediction results of the second network model approach those of the first network model, and the next round of iterative training continues with the sample pair set until the total loss function value converges to the preset value, yielding the trained second network model.
The present embodiment provides another network model training method, which is implemented on the basis of the foregoing embodiment, and focuses on a specific implementation manner of determining the total loss function value of the second network model based on the sample pair update set. As shown in fig. 4, the method for training a network model in this embodiment includes the following steps:
step S402, training a second network model by applying the same sample pair set as the first network model;
Step S404, when the second network model completes the iterative training of the present round, a first prediction result of the first network model for the first sample pair and a second prediction result of the second network model for the first sample pair are obtained;
Step S406, if the second predicted result is better than the first predicted result, deleting the first sample pair from the sample pair set to obtain a sample pair update set;
Step S408, determining a similar sample distillation loss value and a heterogeneous sample distillation loss value of the second network model based on the sample pair update set;
The process of step S408 may be implemented by steps B1 to B4:
Step B1, calculating the square sum of effective similar distance differences of the first network model and the second network model based on the similar sample pairs in the sample pair update set;
In this embodiment, the square sum of effective similar distance differences of the similar sample pairs can be calculated with the formula W_intra = Σ_i ((D_intra_student[i] − D_intra_teacher[i]) × mask[i])², where D_intra_student[i] denotes the vector of sample distances of the similar sample pairs under the second network model, D_intra_teacher[i] denotes the vector of sample distances of the similar sample pairs under the first network model, 1 ≤ i ≤ N_intra, N_intra is the number of similar sample pairs, and mask[i] is the mask of the similar sample pairs.
Step B2, dividing the square sum of the effective similar distance differences by the number of similar sample pairs to obtain similar sample distillation loss values;
The similar sample distillation loss value is: L_intra = W_intra / N_intra.
Step B3, calculating the square sum of effective heterogeneous distance differences of the first network model and the second network model based on heterogeneous sample pairs in the sample pair update set;
In this embodiment, the square sum of effective heterogeneous distance differences of the heterogeneous sample pairs can be calculated with the formula W_inter = Σ_j ((D_inter_student[j] − D_inter_teacher[j]) × mask[j])², where D_inter_student[j] denotes the vector of sample distances of the heterogeneous sample pairs under the second network model, D_inter_teacher[j] denotes the vector of sample distances of the heterogeneous sample pairs under the first network model, 1 ≤ j ≤ N_inter, N_inter is the number of heterogeneous sample pairs, and mask[j] is the mask of the heterogeneous sample pairs.
And B4, dividing the square sum of the effective heterogeneous distance differences by the number of the heterogeneous sample pairs to obtain a heterogeneous sample distillation loss value.
The heterogeneous sample distillation loss value is: L_inter = W_inter / N_inter.
Step S410, determining a total loss function value of the second network model based on the similar sample distillation loss value and the heterogeneous sample distillation loss value;
The total loss function value can be expressed by the following formula:
L_distill = α·L_intra + β·L_inter, where α denotes the weight corresponding to the similar sample distillation loss value and β denotes the weight corresponding to the heterogeneous sample distillation loss value. The two weights can be adjusted according to actual needs to reflect the relative importance of the similar and heterogeneous sample distillation loss values in the total loss function value.
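A minimal sketch of steps B1 to B4 and the total loss, assuming NumPy; the distances, masks and weights below are made up for illustration:

```python
import numpy as np

def class_distillation_loss(d_student, d_teacher, mask):
    # steps B1/B2 (and B3/B4): sum of squared masked distance differences,
    # divided by the number of sample pairs of the class
    d_student, d_teacher, mask = map(np.asarray, (d_student, d_teacher, mask))
    w = np.sum(((d_student - d_teacher) * mask) ** 2)  # W_intra or W_inter
    return w / d_student.size                          # L_intra or L_inter

# illustrative per-pair distances; mask = 1 keeps a pair, 0 filters it out
L_intra = class_distillation_loss([0.9, 1.1, 1.3], [1.0, 1.0, 1.0], [1, 1, 0])
L_inter = class_distillation_loss([1.8, 2.3],      [2.0, 2.0],      [1, 1])
alpha, beta = 0.5, 0.5                         # weights, adjustable as described
L_distill = alpha * L_intra + beta * L_inter   # total loss function value
```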
Step S412, if the total loss function value is larger than the preset value, updating the parameters of the second network model by using the total loss function value, and continuing the next round of iterative training of the updated second network model by using the sample pair set until the total loss function value converges to the preset value, thereby obtaining the trained second network model.
According to the network model training method provided by this embodiment of the invention, the first prediction result of the first network model for the first sample pair and the second prediction result of the second network model for the first sample pair are obtained; when the second prediction result is better than the first prediction result, the first sample pair is deleted from the sample pair set to obtain the sample pair update set. Based on the similar and heterogeneous sample pairs in the update set, the sums of squares of effective similar and heterogeneous distance differences between the first and second network models are calculated, and the similar and heterogeneous sample distillation loss values are obtained from the numbers of similar and heterogeneous sample pairs, from which the total loss function value of the second network model is determined; training of the second network model stops when the total loss function value converges to the preset value. By filtering out the sample pairs the second network model has already learned, the method lets the second network model concentrate on the sample pairs it still predicts poorly and needs to learn, so that the output of the trained second network model approaches that of the first network model, achieving a good training effect.
Further, in order to fully understand the above network model training method, fig. 5 shows a flowchart of another network model training method, and as shown in fig. 5, the network model training method includes the following steps:
Step S500, extracting first sample features of a first similar sample pair by using the first network model, and extracting second sample features of the first similar sample pair by using the second network model;
Step S501, calculating the Euclidean distance of the first similar sample pair based on the first sample features and normalizing it to obtain a first prediction result;
Step S502, calculating the Euclidean distance of the first similar sample pair based on the second sample features and normalizing it to obtain a second prediction result;
step S503, if the second predicted result is better than the first predicted result, filtering the same kind of samples in the sample pair set to obtain a sample pair update set;
Step S504, determining similar sample distillation loss values based on the sample pair update set;
Step S505, extracting third sample characteristics of the first heterogeneous sample pair by using the first network model, and extracting fourth sample characteristics of the first heterogeneous sample pair by using the second network model;
Step S506, calculating the Euclidean distance of the first heterogeneous sample pair based on the third sample characteristic and carrying out normalization processing to obtain a first prediction result;
Step S507, calculating the Euclidean distance of the first heterogeneous sample pair based on the fourth sample characteristic and carrying out normalization processing to obtain a second prediction result;
Step S508, if the second predicted result is better than the first predicted result, filtering heterogeneous samples in the sample pair set to obtain a sample pair update set;
step S509, determining a heterogeneous sample distillation loss value based on the sample pair update set;
Step S510, determining a total loss function value of the second network model based on the similar sample distillation loss value and the heterogeneous sample distillation loss value;
Step S511, if the total loss function value is larger than the preset value, updating the parameters of the second network model by using the total loss function value, and continuing the next round of iterative training of the updated second network model by using the sample pair set until the total loss function value converges to the preset value, thereby obtaining the trained second network model.
The steps S500 to S504 are processes for updating the similar sample set and calculating the loss value, and the steps S505 to S509 are processes for updating the heterogeneous sample set and calculating the loss value, and thus the execution order of the two processes may be exchanged or executed in parallel, which is not limited herein.
According to the network model training method provided by this embodiment of the invention, whether the first sample pair is a first similar sample or a first heterogeneous sample, when the second prediction result of the second network model is better than the first prediction result of the first network model, such sample pairs are filtered out of the sample pair set, and the second network model continues to be trained on the remaining sample pairs, on which its predictions are worse than those of the first network model, until the total loss function value converges to the preset value and the iterative training of the second network model stops. This method of filtering out negative knowledge lets the second network model better learn the samples it predicts worse than the first network model, thereby improving the accuracy of the second network model.
The second network model can be used for processing such as target recognition and detection in images, and the training described above improves its recognition and detection accuracy to a certain extent. Based on this, the embodiment of the invention also provides an image target recognition method applied to the electronic device; referring to the flowchart of the image target recognition method shown in fig. 8, the method comprises the following steps:
Step S802, receiving an image to be identified;
The image to be identified in this embodiment may be an image acquired by an image acquisition device (such as a camera); the image acquisition device may be a camera or video camera disposed in a public place, or one disposed in a specific place.
Besides the image acquisition device, the image to be identified can also be obtained from a third-party device; the third-party device may provide the acquired original image to the electronic device, or may provide the image after filtering or screening.
Step S804, processing the image to be identified by using the second network model and outputting a target recognition result, wherein the second network model is obtained in advance by training with the network model training method provided by the above embodiments.
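For illustration, inference with the trained second network model might look like the following sketch (PyTorch; the placeholder linear model and the preprocessed-image tensor are assumptions standing in for the real student network and input pipeline):

```python
import torch

# a placeholder stand-in for the trained second network model; in practice this
# would be the student network obtained by the training method above
student = torch.nn.Linear(8, 2)
student.eval()

image_to_identify = torch.randn(8)     # stand-in for a preprocessed image (step S802)
with torch.no_grad():
    target_recognition_result = student(image_to_identify)   # step S804
print(target_recognition_result)
```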
In the above image target recognition method, the second network model trained as described above is applied to perform target recognition and obtain a target recognition result. During training, the second network model is trained with the same sample pair set as the first network model; each time the second network model completes a round of iterative training, the first prediction result of the first network model for the first sample pair and the second prediction result of the second network model for the first sample pair are obtained; if the second prediction result is better than the first prediction result, the first sample pair is deleted from the sample pair set to obtain a sample pair update set; the total loss function value of the second network model is determined based on the sample pair update set; and if the total loss function value is greater than the preset value, the parameters of the second network model are updated by applying the total loss function value, and the next round of iterative training of the updated second network model continues with the sample pair set until the total loss function value converges to the preset value, yielding the trained second network model. By removing the good samples the second network model has already learned, the method makes the second network model concentrate on learning the samples on which it performs worse than the first network model without being biased by already-learned samples, ensuring the performance of the trained second network model and making target recognition and detection with it more accurate.
Corresponding to the above embodiment of the network model training method, the embodiment of the present invention provides a network model training device applied to a server; fig. 6 shows a schematic structural diagram of the network model training device, and as shown in fig. 6, the device includes:
A training module 602, configured to train a second network model by applying the same set of sample pairs as the first network model, where the computation amount of the first network model is greater than the computation amount of the second network model;
An obtaining module 604, configured to obtain, when the second network model completes the iterative training of the present round, a first prediction result of the first network model for the first sample pair and a second prediction result of the second network model for the first sample pair;
A deleting module 606, configured to delete the first sample pair from the sample pair set if the second predicted result is better than the first predicted result, to obtain a sample pair update set;
a determination module 608 for determining an overall loss function value for the second network model based on the sample pair update set;
And the iteration module 610 is configured to apply the total loss function value to update parameters of the second network model if the total loss function value is greater than the preset value, and apply the sample pair set to continue the next iteration training on the updated second network model until the total loss function value converges to the preset value, so as to obtain a trained second network model.
The embodiment of the application provides a network model training device: a second network model is trained by applying the same sample pair set as the first network model; when the second network model completes the current round of iterative training, a first prediction result of the first network model for the first sample pair and a second prediction result of the second network model for the first sample pair are obtained; if the second prediction result is better than the first prediction result, the first sample pair is deleted from the sample pair set to obtain a sample pair update set; the total loss function value of the second network model is determined based on the sample pair update set; and if the total loss function value is greater than a preset value, the parameters of the second network model are updated by applying the total loss function value, and the next round of iterative training of the updated second network model continues with the sample pair set until the total loss function value converges to the preset value, yielding the trained second network model. By removing the good samples the second network model has already learned, the device makes the second network model concentrate on learning the samples on which it performs worse than the first network model without being biased by already-learned samples, improving the accuracy of the second network model to a certain extent.
The obtaining module 604 is further configured to: calculate a first feature distance of the first network model for the first sample pair and a second feature distance of the second network model for the first sample pair; calculate a first feature distance mean of the sample class to which the first sample pair belongs based on the first network model and a second feature distance mean of that class based on the second network model, the sample class being either the similar samples or the heterogeneous samples; normalize the first feature distance by the first feature distance mean and take the normalized result as the first prediction result of the first network model for the first sample pair; and normalize the second feature distance by the second feature distance mean and take the normalized result as the second prediction result of the second network model for the first sample pair.
The feature distance is Euclidean distance or cosine similarity corresponding to the features of the first sample pair.
Based on the above network model training device, the embodiment of the present invention further provides another network model training device; referring to the schematic structural diagram of the network model training device shown in fig. 7, the device further includes a comparison module 702, connected to both the acquisition module 604 and the deletion module 606, configured to judge, based on the first sample pair, whether the second prediction result is better than the first prediction result by comparing the first prediction result with the second prediction result.
The comparison module 702 is further configured to: if the first sample pair belongs to the similar samples, subtract the second prediction result from the first prediction result to obtain a first difference, and if the first difference is greater than 0 or greater than a preset first positive value, determine that the second prediction result is better than the first prediction result.
The comparison module 702 is further configured to: if the first sample pair belongs to the heterogeneous samples, subtract the second prediction result from the first prediction result to obtain a second difference, and if the second difference is less than 0 or less than a preset second negative value, determine that the second prediction result is better than the first prediction result.
The determining module 608 is further configured to determine a homogeneous sample distillation loss value and a heterogeneous sample distillation loss value of the second network model based on the updated set of sample pairs, and determine a total loss function value of the second network model based on the homogeneous sample distillation loss value and the heterogeneous sample distillation loss value.
The determining module 608 is further configured to calculate the sum of squares of effective similar distance differences between the first and second network models based on the similar sample pairs in the sample pair update set, divide it by the number of similar sample pairs to obtain the similar sample distillation loss value, calculate the sum of squares of effective heterogeneous distance differences between the first and second network models based on the heterogeneous sample pairs in the update set, and divide it by the number of heterogeneous sample pairs to obtain the heterogeneous sample distillation loss value.
The determining module 608 is further configured to calculate the sum of squares of effective similar distance differences between the first and second network models, W_intra = Σ_i ((D_intra_student[i] − D_intra_teacher[i]) × mask[i])², where D_intra_student[i] denotes the vector of sample distances of the similar sample pairs under the second network model, D_intra_teacher[i] denotes the vector of sample distances of the similar sample pairs under the first network model, 1 ≤ i ≤ N_intra, N_intra is the number of similar sample pairs, and mask[i] is the mask of the similar sample pairs.
The determining module 608 is further configured to calculate the sum of squares of effective heterogeneous distance differences between the first and second network models, W_inter = Σ_j ((D_inter_student[j] − D_inter_teacher[j]) × mask[j])², where D_inter_student[j] denotes the vector of sample distances of the heterogeneous sample pairs under the second network model, D_inter_teacher[j] denotes the vector of sample distances of the heterogeneous sample pairs under the first network model, 1 ≤ j ≤ N_inter, N_inter is the number of heterogeneous sample pairs, and mask[j] is the mask of the heterogeneous sample pairs.
The network model training device provided by the embodiment of the invention has the same technical characteristics as the network model training method provided by the embodiment, so that the same technical problems can be solved, and the same technical effects can be achieved.
Corresponding to the above image target recognition method, the embodiment of the invention also provides an image target recognition device applied to electronic equipment, shown in the schematic diagram of the image target recognition device in fig. 9. The device comprises an image receiving module 92 for receiving an image to be recognized, and an image processing module 94 for processing the image to be recognized by using a second network model and outputting a target recognition result, where the second network model is obtained in advance by training with the above network model training method.
In the above image target recognition device, the second network model trained as described above is applied to perform target recognition and obtain a target recognition result. The training process is the same as that described for the image target recognition method above: the second network model is trained with the same sample pair set as the first network model; sample pairs on which the second prediction result is better than the first prediction result are deleted to obtain a sample pair update set; the total loss function value of the second network model is determined based on the update set; and the parameters of the second network model are updated until the total loss function value converges to the preset value. By removing the good samples the second network model has already learned, the device makes the second network model concentrate on learning the samples on which it performs worse than the first network model, ensuring the performance of the trained second network model and making target recognition and detection with it more accurate.
The present embodiment also provides a computer-readable storage medium, on which a computer program is stored, which when executed by a processing device performs the above-described network model training method or performs the above-described image object recognition method.
It will be clearly understood by those skilled in the art that, for convenience and brevity of description, specific working procedures of the electronic devices, apparatuses and units described above may refer to corresponding procedures in the foregoing method embodiments, which are not described herein again.
The embodiments of the invention provide a computer program product of the network model training method, the image target recognition method, the devices, and the electronic equipment, including a computer-readable storage medium storing program code; the instructions included in the program code can be used to execute the methods described in the foregoing method embodiments, and specific implementations can be found in the method embodiments and are not repeated here.
In the several embodiments provided by the present invention, it should be understood that the disclosed systems, devices, and methods may be implemented in other manners. The above-described apparatus embodiments are merely illustrative; for example, the division of the units is merely a logical function division, and there may be other divisions in actual implementation; for example, multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. In addition, the mutual coupling or direct coupling or communication connection shown or discussed may be indirect coupling or communication connection through some communication interfaces, devices or units, and may be in electrical, mechanical or other form.
The units described as separate units may or may not be physically separate, and units shown as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional unit in the embodiments of the present invention may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit.
If the functions are implemented in the form of software functional units and sold or used as a stand-alone product, they may be stored in a non-volatile computer-readable storage medium executable by a processor. Based on this understanding, the technical solution of the present invention, in essence, or the part contributing to the prior art, or a part of the technical solution, may be embodied in the form of a software product stored in a storage medium, comprising several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to perform all or part of the steps of the methods according to the embodiments of the present invention. The storage medium includes a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, an optical disk, or other media capable of storing program code.
It should be noted that the foregoing embodiments are merely specific embodiments of the present invention, used to illustrate the technical solutions of the invention rather than to limit them, and the scope of the invention is not limited thereto. Although the present invention has been described in detail with reference to the foregoing embodiments, it should be understood by those skilled in the art that any person familiar with the technical field may, within the technical scope disclosed by the invention, still modify, vary or substitute some of the technical features of the foregoing embodiments, and such modifications, variations and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the invention. Therefore, the protection scope of the invention is subject to the protection scope of the claims.