CN118365990A - Model training method and device applied to contraband detection and electronic equipment
- Publication number: CN118365990A
- Application number: CN202410788771.5A
- Authority: CN (China)
- Legal status: Granted
Classifications
- G06V10/774: Image or video recognition or understanding using pattern recognition or machine learning; generating sets of training patterns; bootstrap methods, e.g. bagging or boosting
- G06V10/803: Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level, of input or preprocessed data
- G06V20/60: Scenes; scene-specific elements; type of objects
- G06V2201/07: Indexing scheme relating to image or video recognition or understanding; target detection
Abstract
The invention discloses a model training method and apparatus applied to contraband detection, and an electronic device, relating to the technical field of target detection. The method comprises the following steps: performing target detection on contraband images and background images respectively, based on an initial detection model to be trained, to obtain a prediction result for each contraband image and each background image; determining the recognition difficulty of each contraband image according to its prediction result and labeling result, and determining the recognition difficulty of each background image according to its prediction result and labeling result; fusing contraband images and background images of different recognition difficulties to obtain a sample image, and fusing the labeling results of the contraband images and the labeling results of the background images to obtain the labeling result of the sample image; and training the initial detection model with the sample images and their labeling results to obtain the contraband detection model. The model trains quickly and has a low false detection rate.
Description
Technical Field
The present invention relates to the field of target detection technologies, and in particular, to a method and an apparatus for training a model applied to contraband detection, and an electronic device.
Background
With the rapid development of the express delivery industry, ensuring public safety requires detecting whether express packages contain contraband, such as guns and knives.
In the related art, an image to be detected is processed by a target detection model to obtain a detection frame, a category, and a confidence for each detected object. During training, an image is typically input into a preset model to obtain a prediction frame, a prediction category, and a confidence; a loss value is determined from the prediction frame, the annotation frame, the prediction category, the annotation category, and the confidence; and the preset model's parameters are adjusted according to the loss value to obtain the target detection model.
However, the target detection model has a high false-positive rate and a slow model training speed.
Disclosure of Invention
The invention provides a model training method and apparatus applied to contraband detection, and an electronic device, aiming to solve the problems of a high false detection rate and slow model training in the related art.
According to an aspect of the present invention, there is provided a model training method applied to contraband detection, comprising:
Based on an initial detection model to be trained, respectively carrying out target detection on the contraband image and the background image to obtain a prediction result of the contraband image and a prediction result of the background image;
Determining the recognition difficulty of the contraband image according to the prediction result of the contraband image and the labeling result of the contraband image, and determining the recognition difficulty of the background image according to the prediction result of the background image and the labeling result of the background image;
Fusing contraband images and background images of different recognition difficulties to obtain a sample image, and fusing the labeling result of the contraband images and the labeling result of the background images to obtain the labeling result of the sample image;
Training an initial detection model by adopting a sample image and a labeling result of the sample image to obtain a contraband detection model;
Wherein the initial detection model is determined according to the following manner:
Performing target detection on the contraband image based on the original detection model to obtain a prediction result of the contraband image; determining the recognition difficulty of the contraband image according to the prediction result of the contraband image and the labeling result of the contraband image;
fusing contraband images of different recognition difficulties to obtain an original sample image, and fusing the labeling results of the contraband images to obtain the labeling result of the original sample image;
And training the original detection model by adopting the original sample image and the labeling result of the original sample image to obtain an initial detection model.
According to another aspect of the present invention, there is provided a model training apparatus applied to contraband detection, comprising:
The prediction unit is used for respectively carrying out target detection on the contraband image and the background image based on an initial detection model to be trained to obtain a prediction result of the contraband image and a prediction result of the background image;
the difficulty determining unit is used for determining the recognition difficulty of the contraband image according to the prediction result of the contraband image and the labeling result of the contraband image, and determining the recognition difficulty of the background image according to the prediction result of the background image and the labeling result of the background image;
The sample determining unit is used for fusing contraband images and background images of different recognition difficulties to obtain sample images, and fusing the labeling results of the contraband images and the labeling results of the background images to obtain the labeling results of the sample images;
The training unit is used for training the initial detection model by adopting the sample image and the labeling result of the sample image to obtain a contraband detection model;
Wherein the initial detection model is determined according to the following manner:
Performing target detection on the contraband image based on the original detection model to obtain a prediction result of the contraband image; determining the recognition difficulty of the contraband image according to the prediction result of the contraband image and the labeling result of the contraband image;
fusing contraband images of different recognition difficulties to obtain an original sample image, and fusing the labeling results of the contraband images to obtain the labeling result of the original sample image;
And training the original detection model by adopting the original sample image and the labeling result of the original sample image to obtain an initial detection model.
According to another aspect of the present invention, there is provided an electronic apparatus including:
at least one processor; and
A memory communicatively coupled to the at least one processor; wherein,
The memory stores a computer program executable by the at least one processor to enable the at least one processor to perform the model training method for contraband detection of any of the embodiments of the present invention.
According to another aspect of the present invention, there is provided a computer readable storage medium storing computer instructions for causing a processor to implement the model training method for contraband detection according to any of the embodiments of the present invention when executed.
According to another aspect of the invention, a computer program product is provided, comprising a computer program which, when executed by a processor, implements the model training method for contraband detection according to any of the embodiments of the invention.
According to the technical scheme, model training is divided into two stages. In the first stage, only contraband images are used for training, which accelerates training and improves detection performance; in the second stage, contraband images and background images are mixed for training, guiding the model to learn background knowledge and effectively reducing false detections. Moreover, using sample images fused according to recognition difficulty for training reduces the difficulty gap between sample images, keeps the model's loss value stable, prevents the model from oscillating, and accelerates convergence.
It should be understood that the description in this section is not intended to identify key or critical features of the embodiments of the invention or to delineate the scope of the invention. Other features of the present invention will become apparent from the description that follows.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings required for the description of the embodiments will be briefly described below, and it is apparent that the drawings in the following description are only some embodiments of the present invention, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flow chart of a model training method for contraband detection according to a first embodiment of the present invention;
FIG. 2 is a flow chart of another model training method for contraband detection according to a second embodiment of the present invention;
fig. 3 is a schematic structural diagram of a model training device for contraband detection according to a third embodiment of the present invention;
fig. 4 is a schematic structural diagram of an electronic device implementing a model training method applied to contraband detection according to an embodiment of the present invention.
Detailed Description
In order that those skilled in the art will better understand the present invention, a technical solution in the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in which it is apparent that the described embodiments are only some embodiments of the present invention, not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the present invention without making any inventive effort, shall fall within the scope of the present invention.
It should be noted that the terms "target," "original," "first," "second," "third," and the like in the description and claims of the present invention and in the above figures are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that the embodiments of the invention described herein may be implemented in sequences other than those illustrated or otherwise described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
Example 1
Fig. 1 is a flowchart of a model training method applied to contraband detection according to an embodiment of the present invention, where the embodiment is applicable to a model training scenario of contraband detection of express packages, and the method may be performed by a model training apparatus applied to contraband detection, and the apparatus may be implemented by software and/or hardware, and specifically configured in an electronic device. As shown in fig. 1, the method includes:
Step 101, performing target detection on the contraband image and the background image respectively, based on an initial detection model to be trained, to obtain a prediction result of the contraband image and a prediction result of the background image.
Here, a contraband image refers to an image containing contraband; it may include at least one type of contraband as well as other items.
A background image refers to an image that does not contain contraband; it may include items other than contraband.
The prediction result may include the prediction category and prediction frame of each detected target object.
Step 102, determining the recognition difficulty of the contraband image according to the prediction result and the labeling result of the contraband image, and determining the recognition difficulty of the background image according to the prediction result and the labeling result of the background image.
The less similar the prediction result is to the labeling result, the greater the difficulty the model has in recognizing the image; the more similar they are, the lower the difficulty. Based on this idea, the recognition difficulty of the contraband image can be determined according to its prediction result and labeling result, and the recognition difficulty of the background image according to its prediction result and labeling result.
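The text does not fix a concrete similarity measure. As a minimal sketch under that assumption, box-level similarity can be measured with intersection-over-union (IoU), a conventional choice in detection:

```python
# Minimal IoU sketch for measuring prediction/label box similarity.
# The patent does not specify the metric; IoU is an assumed stand-in.
def box_iou(box_a, box_b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0
```

A high IoU together with a matching class then indicates low recognition difficulty; a low IoU or a mismatched class indicates high difficulty.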
Step 103, fusing contraband images and background images of different recognition difficulties to obtain sample images, and fusing the labeling results of the contraband images and the labeling results of the background images to obtain the labeling results of the sample images.
In particular, the recognition difficulty may be divided into a plurality of levels. Contraband images and background images of different recognition difficulties can be fused to obtain a sample image, and their labeling results can be fused to obtain the labeling result of the sample image, as in the sketch below.
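The text does not specify the fusion scheme. The following sketch assumes a 2x2 mosaic of four equally sized images, a common choice in detection augmentation; names and shapes are illustrative:

```python
import numpy as np

# Sketch: fuse four images of differing recognition difficulty into one
# sample image and merge their labels. A 2x2 mosaic layout is assumed;
# all images are pre-resized to the same (h, w, 3) shape.
def mosaic_fuse(images, labels_per_image):
    """labels_per_image: per image, a list of (x1, y1, x2, y2, cls)."""
    h, w = images[0].shape[:2]
    canvas = np.zeros((2 * h, 2 * w, 3), dtype=images[0].dtype)
    offsets = [(0, 0), (0, w), (h, 0), (h, w)]  # top-left corner of each tile
    fused_labels = []
    for img, labels, (dy, dx) in zip(images, labels_per_image, offsets):
        canvas[dy:dy + h, dx:dx + w] = img        # paste the tile
        for x1, y1, x2, y2, cls in labels:        # shift boxes with the tile
            fused_labels.append((x1 + dx, y1 + dy, x2 + dx, y2 + dy, cls))
    return canvas, fused_labels
```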
Step 104, training the initial detection model with the sample images and their labeling results to obtain the contraband detection model; wherein the initial detection model is determined as follows: performing target detection on the contraband images based on the original detection model to obtain their prediction results; determining the recognition difficulty of each contraband image according to its prediction result and labeling result; fusing contraband images of different recognition difficulties to obtain original sample images, and fusing their labeling results to obtain the labeling results of the original sample images; and training the original detection model with the original sample images and their labeling results to obtain the initial detection model.
The original detection model may be a model which has been trained to a certain extent.
The initial detection model is determined in a manner similar to the contraband detection model, which is not repeated here.
According to the technical scheme provided by this embodiment of the invention, model training is divided into two stages. In the first stage, only contraband images are used for training, which accelerates training and improves detection performance; in the second stage, contraband images and background images are mixed for training, guiding the model to learn background knowledge and effectively reducing false detections. Moreover, using sample images fused according to recognition difficulty for training reduces the difficulty gap between sample images, keeps the model's loss value stable, prevents the model from oscillating, and accelerates convergence.
Example 2
Fig. 2 is a flowchart of another model training method applied to contraband detection according to the second embodiment of the present invention, in which steps 102 and 104 in the first embodiment are refined. As shown in fig. 2, the method includes:
Step 201, based on an initial detection model to be trained, respectively performing target detection on the contraband image and the background image to obtain a prediction result of the contraband image and a prediction result of the background image.
The principle and implementation of step 201 are similar to those of step 101 and will not be described again.
After step 201, either step 202 or step 204 may be performed.
Step 202, determining the recognition difficulty of the third predicted target object based on the second prediction frame, the third prediction category, the second labeling frame, and the third labeling category of the third predicted target object, where the second prediction frame and the third prediction category of the third predicted target object are obtained by processing the contraband image with the initial detection model.
The less similar the second prediction frame is to the second labeling frame, the greater the initial detection model's difficulty in recognizing the third predicted target object; the more similar they are, the lower the difficulty. Likewise, the less similar the third prediction category is to the third labeling category, the greater the difficulty; the more similar, the lower. Based on this idea, the recognition difficulty of the third predicted target object can be determined from its second prediction frame, third prediction category, second labeling frame, and third labeling category.
In one implementation, determining a second coordinate confidence level according to a second prediction frame and a second labeling frame of the third predicted target object; determining a third category confidence level according to a third prediction category and a third labeling category of a third prediction target object; and determining the recognition difficulty of the third predicted target object based on the second coordinate confidence and the third category confidence.
The second coordinate confidence is used for representing the similarity between a second prediction frame and a second labeling frame of the third prediction target object.
The third category confidence is used for representing the similarity between a third prediction category and a third labeling category of the third prediction target object.
Specifically, the greater the second coordinate confidence, the lower the initial detection model's difficulty in recognizing the third predicted target object, and vice versa. Likewise, the greater the third category confidence, the lower the difficulty, and vice versa. Based on this idea, the recognition difficulty of the third predicted target object can be determined from the second coordinate confidence and the third category confidence.
In this manner, the recognition difficulty of the third predicted target object can be determined conveniently and accurately from the second prediction frame, the third prediction category, the second labeling frame, and the third labeling category.
Optionally, if the second coordinate confidence coefficient is greater than the fourth confidence threshold, determining the recognition difficulty of the third predicted target object based on the second coordinate confidence coefficient and the third category confidence coefficient; and if the second coordinate confidence coefficient is smaller than or equal to the fourth confidence threshold value, determining the recognition difficulty of the third predicted target object based on the third category confidence coefficient.
The fourth confidence threshold is an empirically preset value; for example, it may be set to 0.2. If the second coordinate confidence is greater than the fourth confidence threshold, the third predicted target object may be considered contraband; if it is less than or equal to the threshold, the third predicted target object may be considered not to be contraband, in which case the second labeling frame is empty.
Specifically, if the second coordinate confidence is greater than the fourth confidence threshold and the third prediction category is the same as the third labeling category, the difference between 1 and the second coordinate confidence and the difference between 1 and the third category confidence can be computed, and the larger of the two differences taken as the recognition difficulty of the third predicted target object.
If the second coordinate confidence is greater than the fourth confidence threshold and the third prediction category differs from the third labeling category, the larger of the following two values is taken as the recognition difficulty of the third predicted target object: the difference between 1 and the second coordinate confidence, and the third category confidence.
If the second coordinate confidence is less than or equal to the fourth confidence threshold, the third category confidence can be taken as the recognition difficulty of the third predicted target object.
In this way, the recognition difficulty of the third predicted target object can be determined accurately, as the following sketch illustrates.
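The branching above can be written compactly as follows; the 0.2 threshold follows the example in the text, and the function signature is illustrative:

```python
# Sketch of the difficulty rules for one predicted target object.
# coord_conf: similarity of prediction frame and labeling frame;
# class_conf: confidence of the predicted class; both in [0, 1].
def target_difficulty(coord_conf, class_conf, classes_match, threshold=0.2):
    if coord_conf > threshold:
        if classes_match:
            # both branches predict well: difficulty is the larger shortfall
            return max(1.0 - coord_conf, 1.0 - class_conf)
        # box is right but class is wrong: a confident wrong class is hard
        return max(1.0 - coord_conf, class_conf)
    # box matches no annotation: difficulty is the (wrong) class confidence
    return class_conf
```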
Step 203, determining the recognition difficulty of the contraband image based on the recognition difficulty of the third predicted target object.
Step 203 may be followed by step 206.
Specifically, the third predicted target objects may be sorted by recognition difficulty and graded based on the sorting result. For example, the top 30% may be treated as difficult targets, the bottom 40% as simple targets, and the rest as moderately difficult targets.
The number of difficult targets contained in each contraband image is then counted, and the images are sorted by that count: the more difficult targets an image contains, the greater its recognition difficulty. When two images contain the same number of difficult targets, the numbers of moderately difficult targets may be compared. On this basis, the recognition difficulty of the contraband images can be graded; for example, into two levels.
In this way, the recognition difficulty of a contraband image can be determined accurately and conveniently from its prediction result and labeling result; a sketch follows.
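The grading and per-image ranking can be sketched as below; the 30%/40% split and the two tie-breaking counts come from the text, while the function names are illustrative:

```python
# Grade targets by sorted difficulty: top 30% difficult, bottom 40%
# simple, the rest moderately difficult.
def grade_targets(difficulties):
    order = sorted(range(len(difficulties)),
                   key=lambda i: difficulties[i], reverse=True)
    grades = [''] * len(order)
    for rank, i in enumerate(order):
        if rank < 0.3 * len(order):
            grades[i] = 'difficult'
        elif rank >= 0.6 * len(order):   # bottom 40%
            grades[i] = 'simple'
        else:
            grades[i] = 'moderate'
    return grades

# Sort key for images: more difficult targets first, then, on ties,
# more moderately difficult targets. Usage:
#   images.sort(key=lambda im: image_key(im.grades), reverse=True)
def image_key(target_grades):
    return (target_grades.count('difficult'), target_grades.count('moderate'))
```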
Step 204, determining the recognition difficulty of the fourth predicted target object based on the third prediction category and the third labeling category of the fourth predicted target object, where the third prediction frame and the third prediction category of the fourth predicted target object are obtained by processing the background image with the initial detection model.
The background image has no contraband, so that the labeling frame of the fourth prediction target object is empty. Therefore, the recognition difficulty of the fourth predicted target object can be determined based on the third predicted category and the third labeling category of the fourth predicted target object.
The more similar the third prediction category is to the third labeling category, the lower the recognition difficulty of the fourth predicted target object; the less similar, the higher. Based on this idea, a similarity between the third prediction category and the third labeling category may be calculated, and the recognition difficulty of the fourth predicted target object determined from it.
Step 205, determining the recognition difficulty of the background image based on the recognition difficulty of the fourth prediction target object.
Specifically, the fourth predicted target objects may be sorted by recognition difficulty and graded based on the sorting result. For example, the top 30% may be treated as difficult targets, the bottom 40% as simple targets, and the rest as moderately difficult targets.
The number of difficult targets contained in each background image is then counted, and the images are sorted by that count: the more difficult targets an image contains, the greater its recognition difficulty. When two images contain the same number of difficult targets, the numbers of moderately difficult targets may be compared. On this basis, the recognition difficulty of the background images can be graded; for example, into two levels.
Specifically, the method can accurately and conveniently determine the recognition difficulty of the background image according to the prediction result of the background image and the labeling result of the background image.
Step 206, fusing contraband images and background images of different recognition difficulties to obtain sample images, and fusing the labeling results of the contraband images and the labeling results of the background images to obtain the labeling results of the sample images.
Step 206 is similar to the implementation of step 103 and will not be described again.
Step 207, determining a first loss value of the contraband image in the sample image based on the labeling result of the contraband image in the sample image.
Specifically, the first loss value of the contraband image in the sample image can be calculated according to the prediction result and the labeling result of the contraband image in the sample image.
In one implementation, a first target object loss value of the first predicted target object is determined according to the first prediction category, first prediction frame, first labeling category, and first labeling frame of the first predicted target object, where the first prediction category and first prediction frame are obtained by processing the contraband image in the sample image with the initial detection model; the first loss value of the contraband image in the sample image is then determined from the first target object loss values.
Specifically, the first target object loss values of all the first predicted target objects included in the contraband image may be added to obtain the first loss value of the contraband image.
In this way, the first loss value of the contraband image can be determined conveniently.
Optionally, if the first coordinate confidence coefficient is greater than the first confidence threshold and the first prediction category is the same as the first labeling category, determining the target confidence coefficient according to the first category confidence coefficient and the first coordinate confidence coefficient; the first category confidence is determined according to the first prediction category and the first labeling category; the first coordinate confidence is determined according to the first prediction frame and the first annotation frame.
The first confidence threshold is an empirically preset value; for example, it may be set to 0.7. A first coordinate confidence greater than the first confidence threshold indicates that the first prediction frame is relatively accurate.
Specifically, the target confidence may be determined by a formula (not reproduced in the text) that combines the first category confidence and the first coordinate confidence, where s and t are preset hyperparameters.
If the identification accuracy of the first predicted target object meets the preset accuracy condition based on the target confidence, determining a first target object loss value of the first predicted target object based on the target confidence, the first coordinate loss and the first class loss; the first coordinate loss is determined according to the first prediction frame and the first annotation frame; the first class loss is determined from the first prediction class and the first annotation class.
Specifically, the higher the target confidence, the more accurate the model's prediction. In practice, manual annotation errors can yield a low target confidence; such cases can be excluded, since annotation errors are not worth learning from.
Specifically, the target confidences can be sorted, the top k% retained, and the rest discarded, as sketched below.
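A minimal sketch of this filtering step, with k left as a preset fraction since the text does not fix its value:

```python
# Keep the indices of the top-k% target confidences; the rest are
# treated as likely annotation errors and excluded from learning.
def top_k_percent_indices(confidences, k=80.0):
    order = sorted(range(len(confidences)),
                   key=lambda i: confidences[i], reverse=True)
    keep = max(1, int(len(order) * k / 100.0))
    return set(order[:keep])
```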
The first target object loss value, for the case where the first coordinate confidence is greater than the first confidence threshold and the first prediction category is the same as the first labeling category, may then be determined by a formula (not reproduced in the text) combining the target confidence, the first class loss, and the first coordinate loss. The first class loss may be a cross-entropy loss; the first coordinate loss may be a regression loss.
Optionally, if the first coordinate confidence coefficient is greater than the first confidence threshold and the first prediction category is different from the first labeling category, determining a category reinforcement learning parameter according to the first category confidence coefficient and the confidence coefficient corresponding to the first labeling category; the first category confidence is determined according to the first prediction category and the first labeling category; the first coordinate confidence is determined according to the first prediction frame and the first annotation frame. Determining a first target class loss according to the class reinforcement learning parameter and the first class confidence; determining a first target object loss value of the first predicted target object according to the first target class loss and the second coordinate loss; the second coordinate loss is determined according to the first prediction frame and the first annotation frame.
Specifically, if the first coordinate confidence is greater than the first confidence threshold and the first prediction category differs from the first labeling category, the model's coordinate prediction branch is performing well but its classification branch is off, so learning of the classification branch needs to be strengthened.
Specifically, the first target object loss value for this case may be determined by a formula (not reproduced in the text) involving the first category confidence, the confidence that the model outputs for the class matching the first labeling category (i.e., the confidence corresponding to the first labeling category), and the coordinate loss.
Optionally, if the first coordinate confidence level is greater than the second confidence threshold and less than the first confidence threshold, and the first prediction category is the same as the first labeling category, determining a category weakening learning parameter according to the first category confidence level and a preset parameter; wherein the first confidence threshold is greater than the second confidence threshold; the first category confidence is determined according to the first prediction category and the first labeling category; the first coordinate confidence is determined according to the first prediction frame and the first annotation frame;
determining a second target class loss according to the class weakening learning parameter and the first class confidence;
determining a first target object loss value of the first predicted target object according to the second target class loss and the third coordinate loss; the third coordinate loss is determined according to the first prediction frame and the first annotation frame.
Specifically, the second confidence threshold is an empirically preset value; for example, it may be set to 0.3. If the first coordinate confidence is greater than the second confidence threshold and less than the first confidence threshold, and the first prediction category is the same as the first labeling category, the model's coordinate prediction branch is performing poorly while its classification branch is performing well. Because such cases are common in actual training, amplifying the learning of the coordinate branch would make the model focus on it excessively; instead, learning of the classification branch can be weakened.
Specifically, the first target object loss value for this case may be determined by a formula (not reproduced in the text) involving the first category confidence, a preset parameter, and the third coordinate loss.
Optionally, if the first coordinate confidence is smaller than the second confidence threshold, determining the recognition difficulty of the first predicted target object based on the first class confidence;
and determining a first target object loss value of the first predicted target object according to the identification difficulty of the first predicted target object and the first class confidence.
Specifically, if the first coordinate confidence is less than the second confidence threshold, the first prediction frame matches no labeling frame: the first labeling frame is empty and the first predicted target object is not contraband. In this case, the first target object loss value may be determined based on the first category confidence; the specific implementation is similar to that described in step 208 and is not repeated.
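Because the loss formulas themselves are not reproduced in the text, the following sketch shows only the branching structure of the positive-sample loss. The concrete terms are conventional stand-ins and should be read as assumptions: -log(p) for the class loss, 1 - coord_conf for the coordinate loss, and simple forms for the target confidence and the reinforcement/weakening parameters; the top-k% filtering step is omitted for brevity:

```python
import math

# Sketch of the branch structure for the first target object loss value.
# Thresholds 0.7 and 0.3 follow the examples in the text; everything
# marked "assumed" is not specified by the source.
def first_target_loss(coord_conf, class_conf, label_class_conf,
                      classes_match, t1=0.7, t2=0.3):
    class_loss = -math.log(max(class_conf, 1e-12))  # assumed class loss
    coord_loss = 1.0 - coord_conf                   # assumed coordinate loss
    if coord_conf > t1 and classes_match:
        # accurate box, correct class: weight the loss by a target
        # confidence built from both confidences (assumed combination)
        target_conf = min(class_conf, coord_conf)
        return target_conf * (class_loss + coord_loss)
    if coord_conf > t1:
        # accurate box, wrong class: reinforce classification learning;
        # the parameter grows as the wrong class outscores the right one
        reinforce = 1.0 + max(0.0, class_conf - label_class_conf)  # assumed
        return reinforce * class_loss + coord_loss
    if coord_conf > t2 and classes_match:
        weaken = 0.5   # assumed preset parameter weakening class learning
        return weaken * class_loss + coord_loss
    # box matches no annotation (and remaining cases the text leaves
    # undetailed): penalize confidence in a non-existent object,
    # negative-sample style
    return class_conf * -math.log(max(1.0 - class_conf, 1e-12))
```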
Step 208, determining a second loss value of the background image in the sample image based on the labeling result of the background image in the sample image.
Specifically, the second loss value of the background image in the sample image may be calculated according to the prediction result and the labeling result of the background image in the sample image.
In one implementation, a second class confidence is determined based on a second predicted class and a second labeled class of a second predicted target; determining the recognition difficulty of the second predicted target object according to the second category confidence; the second prediction category of the second prediction target object is obtained by processing a background image in the sample image according to the initial detection model; determining a second target object loss value of the second predicted target object according to the recognition difficulty of the second predicted target object and the second class confidence; and determining a second loss value of the background image in the sample image according to the second target object loss value.
Specifically, the second predicted target objects may be sorted by recognition difficulty and graded based on the sorting result. For example, the top 40% may be taken as difficult negative samples and the bottom 60% as other negative samples.
The second loss value may be determined by a formula (not reproduced in the text) in which p denotes the second category confidence and a preset hyperparameter controls the weighting.
Specifically, through this formula, the loss of a difficult negative sample can be made larger than that of other negative samples, thereby strengthening the learning of difficult negatives.
Specifically, the second target object loss values of all second predicted target objects contained in the background image can be summed to obtain the second loss value.
In this way, the second loss value can be determined accurately and conveniently; a sketch follows.
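Since the formula is not reproduced in the text, the sketch below substitutes a focal-style weighting, named plainly as an assumption: the higher the class confidence p on a background region, the larger the loss, and difficult negatives are weighted up further:

```python
import math

# Sketch of a negative-sample loss where difficult negatives incur a
# larger loss. `gamma` plays the role of the preset hyperparameter and
# `hard` marks negatives graded difficult by the 40%/60% split above;
# the focal-style form is an assumed stand-in for the source formula.
def negative_sample_loss(p, hard, gamma=2.0):
    base = -math.log(max(1.0 - p, 1e-12))  # penalize confidence on background
    weight = p ** gamma                    # grows with confidence -> harder
    return (2.0 if hard else 1.0) * weight * base
```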
Step 209, determining a target loss value according to the first loss value and the second loss value, and training an initial detection model based on the target loss value to obtain a contraband detection model; wherein, the initial detection model is determined according to the following mode: performing target detection on the contraband image based on the original detection model to obtain a prediction result of the contraband image; determining the recognition difficulty of the contraband image according to the prediction result of the contraband image and the labeling result of the contraband image; fusing the prohibited article images with different identification difficulties to obtain an original sample image, and fusing the labeling results of the prohibited article images to obtain the labeling results of the original sample image; and training the original detection model by adopting the original sample image and the labeling result of the original sample image to obtain an initial detection model.
Specifically, an average value of the first loss value and the second loss value may be calculated, and the average value may be taken as the target loss value.
Example 3
Fig. 3 is a schematic structural diagram of a model training device for contraband detection according to a third embodiment of the present invention. As shown in fig. 3, the apparatus 300 includes:
the prediction unit 310 is configured to perform target detection on the contraband image and the background image based on an initial detection model to be trained, so as to obtain a prediction result of the contraband image and a prediction result of the background image;
The difficulty determining unit 320 is configured to determine a difficulty of recognizing the contraband image according to the prediction result of the contraband image and the labeling result of the contraband image, and determine a difficulty of recognizing the background image according to the prediction result of the background image and the labeling result of the background image;
The sample determining unit 330 is configured to fuse the contraband images and the background images with different recognition difficulties to obtain a sample image, and fuse the labeling result of the contraband images and the labeling result of the background images to obtain the labeling result of the sample image;
The training unit 340 is configured to train the initial detection model by using the sample image and the labeling result of the sample image, so as to obtain a contraband detection model;
Wherein, the initial detection model is determined according to the following mode:
Performing target detection on the contraband image based on the original detection model to obtain a prediction result of the contraband image; determining the recognition difficulty of the contraband image according to the prediction result of the contraband image and the labeling result of the contraband image;
fusing contraband images of different recognition difficulties to obtain an original sample image, and fusing the labeling results of the contraband images to obtain the labeling result of the original sample image;
And training the original detection model by adopting the original sample image and the labeling result of the original sample image to obtain an initial detection model.
The training unit 340 is specifically configured to: determine a first loss value of the contraband image in the sample image based on the labeling result of the contraband image in the sample image; determine a second loss value of the background image in the sample image based on the labeling result of the background image in the sample image;
and determine a target loss value according to the first loss value and the second loss value, and train the initial detection model based on the target loss value to obtain the contraband detection model.
The training unit 340 is specifically configured to determine a first target object loss value of the first predicted target object according to the first prediction category, the first prediction frame, the first labeling category, and the first labeling frame of the first predicted target object; the method comprises the steps that a first prediction type and a first prediction frame of a first prediction target object are obtained by processing a contraband image in a sample image according to an initial detection model;
And determining a first loss value of the contraband image in the sample image according to the first target object loss value.
The training unit 340 is specifically configured to determine the target confidence level according to the first category confidence level and the first coordinate confidence level if the first coordinate confidence level is greater than the first confidence threshold value and the first prediction category is the same as the first labeling category; the first category confidence is determined according to the first prediction category and the first labeling category; the first coordinate confidence is determined according to the first prediction frame and the first annotation frame;
If the identification accuracy of the first predicted target object meets the preset accuracy condition based on the target confidence, determining a first target object loss value of the first predicted target object based on the target confidence, the first coordinate loss and the first class loss; the first coordinate loss is determined according to the first prediction frame and the first annotation frame; the first class loss is determined from the first prediction class and the first annotation class.
The training unit 340 is specifically configured to determine a category reinforcement learning parameter according to the first category confidence level and the confidence level corresponding to the first labeling category if the first coordinate confidence level is greater than the first confidence threshold and the first prediction category is different from the first labeling category; the first category confidence is determined according to the first prediction category and the first labeling category; the first coordinate confidence is determined according to the first prediction frame and the first annotation frame;
Determining a first target class loss according to the class reinforcement learning parameter and the first class confidence; determining a first target object loss value of the first predicted target object according to the first target class loss and the second coordinate loss; the second coordinate loss is determined according to the first prediction frame and the first annotation frame.
The training unit 340 is specifically configured to determine a category weakening learning parameter according to the first category confidence level and a preset parameter if the first coordinate confidence level is greater than the second confidence threshold and less than the first confidence threshold, and the first prediction category is the same as the first labeling category; wherein the first confidence threshold is greater than the second confidence threshold; the first category confidence is determined according to the first prediction category and the first labeling category; the first coordinate confidence is determined according to the first prediction frame and the first annotation frame;
determining a second target class loss according to the class weakening learning parameter and the first class confidence;
determining a first target object loss value of the first predicted target object according to the second target class loss and the third coordinate loss; the third coordinate loss is determined according to the first prediction frame and the first annotation frame.
The training unit 340 is specifically configured to determine, based on the first class confidence, a difficulty of identifying the first predicted target object if the first coordinate confidence is less than the second confidence threshold;
and determining a first target object loss value of the first predicted target object according to the identification difficulty of the first predicted target object and the first class confidence.
The training unit 340 is specifically configured to determine a second class confidence according to a second prediction class and a second labeling class of the second predicted target object; determining the recognition difficulty of the second predicted target object according to the second category confidence; the second prediction category of the second prediction target object is obtained by processing a background image in the sample image according to the initial detection model;
Determining a second target object loss value of the second predicted target object according to the recognition difficulty of the second predicted target object and the second class confidence;
And determining a second loss value of the background image in the sample image according to the second target object loss value.
The difficulty determining unit 320 is specifically configured to determine the recognition difficulty of the third predicted target object based on the second prediction frame, the third prediction category, the second labeling frame, and the third labeling category of the third predicted target object; the second prediction frame and the third prediction category of the third prediction target object are obtained by processing the contraband image according to the initial detection model;
And determining the recognition difficulty of the contraband image based on the recognition difficulty of the third prediction target object.
The difficulty determining unit 320 is specifically configured to determine a second coordinate confidence according to the second prediction frame and the second labeling frame of the third predicted target object;
determining a third category confidence level according to a third prediction category and a third labeling category of a third prediction target object;
And determining the recognition difficulty of the third predicted target object based on the second coordinate confidence and the third category confidence.
The difficulty determining unit 320 is specifically configured to determine, if the second coordinate confidence coefficient is greater than the fourth confidence threshold, a recognition difficulty of the third predicted target object based on the second coordinate confidence coefficient and the third category confidence coefficient;
and if the second coordinate confidence coefficient is smaller than or equal to the fourth confidence threshold value, determining the recognition difficulty of the third predicted target object based on the third category confidence coefficient.
The difficulty determining unit 320 is specifically configured to determine the recognition difficulty of the fourth predicted target object based on the third predicted category and the third labeling category of the fourth predicted target object; the background image is processed according to the initial detection model to obtain a third prediction frame and a third prediction category of a fourth prediction target object;
And determining the recognition difficulty of the background image based on the recognition difficulty of the fourth predicted target object.
The model training device applied to contraband detection provided by the embodiment of the invention can execute the model training method applied to contraband detection provided by any embodiment of the invention, and has the corresponding functional modules and beneficial effects of executing the model training method applied to contraband detection.
Example 4
Fig. 4 shows a schematic diagram of the structure of an electronic device 10 that may be used to implement an embodiment of the invention. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. Electronic equipment may also represent various forms of mobile devices, such as personal digital processing, cellular telephones, smartphones, wearable devices (e.g., helmets, glasses, watches, etc.), and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be exemplary only, and are not meant to limit implementations of the inventions described and/or claimed herein.
As shown in Fig. 4, the electronic device 10 includes at least one processor 11 and a memory communicatively connected to the at least one processor 11, such as a read-only memory (ROM) 12 and a random access memory (RAM) 13. The memory stores a computer program executable by the at least one processor, and the processor 11 may perform various appropriate actions and processes according to the computer program stored in the ROM 12 or loaded from the storage unit 18 into the RAM 13. The RAM 13 may also store various programs and data required for the operation of the electronic device 10. The processor 11, the ROM 12 and the RAM 13 are connected to one another via a bus 14, and an input/output (I/O) interface 15 is also connected to the bus 14.
Various components in the electronic device 10 are connected to the I/O interface 15, including: an input unit 16 such as a keyboard, a mouse, etc.; an output unit 17 such as various types of displays, speakers, and the like; a storage unit 18 such as a magnetic disk, an optical disk, or the like; and a communication unit 19 such as a network card, modem, wireless communication transceiver, etc. The communication unit 19 allows the electronic device 10 to exchange information/data with other devices via a computer network, such as the internet, and/or various telecommunication networks.
The processor 11 may be any of various general-purpose and/or special-purpose processing components having processing and computing capabilities. Some examples of the processor 11 include, but are not limited to, a central processing unit (CPU), a graphics processing unit (GPU), various dedicated artificial intelligence (AI) computing chips, various processors running machine learning model algorithms, digital signal processors (DSPs), and any suitable processor, controller, or microcontroller. The processor 11 performs the various methods and processes described above, such as the model training method applied to contraband detection.
In some embodiments, any of the above-described model training methods applied to contraband detection may be implemented as a computer program tangibly embodied on a computer-readable storage medium, such as storage unit 18. In some embodiments, part or all of the computer program may be loaded and/or installed onto the electronic device 10 via the ROM 12 and/or the communication unit 19. When the computer program is loaded into RAM 13 and executed by processor 11, one or more steps of any of the model training methods described above as being applied to contraband detection may be performed. Alternatively, in other embodiments, processor 11 may be configured by any other suitable means (e.g., by means of firmware) to perform any of the model training methods described above as being applied to contraband detection.
Various implementations of the systems and techniques described above may be realized in digital electronic circuitry, integrated circuit systems, field-programmable gate arrays (FPGAs), application-specific integrated circuits (ASICs), application-specific standard products (ASSPs), systems on chip (SOCs), complex programmable logic devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be a special-purpose or general-purpose programmable processor that can receive data and instructions from, and transmit data and instructions to, a storage system, at least one input device, and at least one output device.
A computer program for carrying out methods of the present invention may be written in any combination of one or more programming languages. These computer programs may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus, such that the computer programs, when executed by the processor, cause the functions/acts specified in the flowchart and/or block diagram block or blocks to be implemented. The computer program may execute entirely on the machine, partly on the machine, as a stand-alone software package, partly on the machine and partly on a remote machine or entirely on the remote machine or server.
In the context of the present invention, a computer-readable storage medium may be a tangible medium that can contain, or store a computer program for use by or in connection with an instruction execution system, apparatus, or device. The computer readable storage medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. Alternatively, the computer readable storage medium may be a machine readable signal medium. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on an electronic device having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) through which a user can provide input to the electronic device. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user may be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic input, speech input, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include local area networks (LANs), wide area networks (WANs), blockchain networks, and the Internet.
The computing system may include clients and servers. A client and a server are generally remote from each other and typically interact through a communication network; their relationship arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server may be a cloud server, also called a cloud computing server or cloud host, a host product in a cloud computing service system that overcomes the drawbacks of difficult management and weak service scalability found in traditional physical hosts and VPS services.
It should be appreciated that steps may be reordered, added, or deleted using the various forms of flow shown above. For example, the steps described in the present invention may be performed in parallel, sequentially, or in a different order, so long as the desired results of the technical solution of the present invention are achieved; no limitation is imposed herein.
The above embodiments do not limit the scope of the present invention. It will be apparent to those skilled in the art that various modifications, combinations, sub-combinations and alternatives are possible, depending on design requirements and other factors. Any modifications, equivalent substitutions and improvements made within the spirit and principles of the present invention should be included in the scope of the present invention.
Claims (16)
1. A model training method for contraband detection, comprising:
performing target detection on a contraband image and a background image, respectively, based on an initial detection model to be trained, to obtain a prediction result of the contraband image and a prediction result of the background image;
determining the recognition difficulty of the contraband image according to the prediction result of the contraband image and the labeling result of the contraband image, and determining the recognition difficulty of the background image according to the prediction result of the background image and the labeling result of the background image;
fusing contraband images and background images of different recognition difficulties to obtain a sample image, and fusing the labeling results of the contraband images and the labeling results of the background images to obtain the labeling result of the sample image;
training the initial detection model with the sample image and the labeling result of the sample image to obtain a contraband detection model;
wherein the initial detection model is determined as follows:
performing target detection on the contraband image based on an original detection model to obtain a prediction result of the contraband image, and determining the recognition difficulty of the contraband image according to the prediction result of the contraband image and the labeling result of the contraband image;
fusing contraband images of different recognition difficulties to obtain an original sample image, and fusing the labeling results of the contraband images to obtain the labeling result of the original sample image;
and training the original detection model with the original sample image and the labeling result of the original sample image to obtain the initial detection model.
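To clarify how the two training stages of claim 1 relate, the following Python outline traces the data flow. All helper names (`detect`, `difficulty`, `fuse`, `train`) are hypothetical stand-ins for the operations named in the claim, not an implementation of the patented method.

```python
def build_contraband_detector(original_model, contraband_imgs, background_imgs,
                              detect, difficulty, fuse, train):
    """Two-stage outline of claim 1 (hypothetical helpers throughout).

    detect(model, img)      -> prediction result for img
    difficulty(pred, label) -> recognition difficulty score
    fuse(scored_imgs)       -> fused sample images with fused labeling results
    train(model, samples)   -> model trained on the fused samples
    """
    # Stage 1: fuse contraband images of different difficulties and
    # train the original model to obtain the initial detection model.
    scored = [(img, difficulty(detect(original_model, img), label))
              for img, label in contraband_imgs]
    initial_model = train(original_model, fuse(scored))

    # Stage 2: score both contraband and background images with the
    # initial model, fuse across difficulty levels, and train again.
    scored = [(img, difficulty(detect(initial_model, img), label))
              for img, label in contraband_imgs + background_imgs]
    return train(initial_model, fuse(scored))
```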
2. The method of claim 1, wherein training the initial detection model using the sample image and the labeling result of the sample image to obtain the contraband detection model comprises:
determining a first loss value of the contraband image in the sample image based on the labeling result of the contraband image in the sample image, and determining a second loss value of the background image in the sample image based on the labeling result of the background image in the sample image;
and determining a target loss value according to the first loss value and the second loss value, and training the initial detection model based on the target loss value to obtain the contraband detection model.
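Claim 2 leaves open how the two loss values combine into the target loss; a weighted sum is the simplest consistent reading. The sketch below is an assumption, with `background_weight` as a hypothetical balancing factor.

```python
def target_loss(first_loss: float, second_loss: float,
                background_weight: float = 1.0) -> float:
    """Combine the contraband-image loss and the background-image loss.

    A plain weighted sum (assumption): raising `background_weight`
    penalizes false alarms on background regions more heavily.
    """
    return first_loss + background_weight * second_loss
```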
3. The method of claim 2, wherein determining the first loss value of the contraband image in the sample image based on the labeling result of the contraband image in the sample image comprises:
determining a first target object loss value of a first predicted target object according to a first prediction category, a first prediction frame, a first labeling category and a first labeling frame of the first predicted target object, wherein the first prediction category and the first prediction frame of the first predicted target object are obtained by processing the contraband image in the sample image with the initial detection model;
and determining the first loss value of the contraband image in the sample image according to the first target object loss value.
4. The method of claim 3, wherein determining the first target object loss value of the first predicted target object according to the first prediction category, the first prediction frame, the first labeling category and the first labeling frame of the first predicted target object comprises:
if a first coordinate confidence is greater than a first confidence threshold and the first prediction category is the same as the first labeling category, determining a target confidence according to a first category confidence and the first coordinate confidence, wherein the first category confidence is determined according to the first prediction category and the first labeling category, and the first coordinate confidence is determined according to the first prediction frame and the first labeling frame;
and if it is determined, based on the target confidence, that the recognition accuracy of the first predicted target object meets a preset accuracy condition, determining the first target object loss value of the first predicted target object based on the target confidence, a first coordinate loss and a first category loss, wherein the first coordinate loss is determined according to the first prediction frame and the first labeling frame, and the first category loss is determined according to the first prediction category and the first labeling category.
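As a hedged illustration of the claim-4 branch (well-localized box, correct category), the sketch below combines the two confidences by multiplication and uses a fixed threshold as the preset accuracy condition; both choices, the function name, and the down-weighting scheme are assumptions.

```python
def loss_box_good_category_match(category_conf: float, coord_conf: float,
                                 coord_loss: float, category_loss: float,
                                 accuracy_threshold: float = 0.9) -> float:
    """Claim-4 branch sketch: coordinate confidence above the first
    threshold and predicted category equal to the labeled category.

    The target confidence is taken as the product of the two
    confidences (assumption); an object that is already recognized
    accurately has its loss scaled down so training focuses on harder
    objects (assumption).
    """
    target_conf = category_conf * coord_conf
    if target_conf >= accuracy_threshold:  # preset accuracy condition (assumed)
        return (1.0 - target_conf) * (coord_loss + category_loss)
    return coord_loss + category_loss
```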
5. The method of claim 3, wherein determining the first target object loss value of the first predicted target object according to the first prediction category, the first prediction frame, the first labeling category and the first labeling frame of the first predicted target object comprises:
if the first coordinate confidence is greater than the first confidence threshold and the first prediction category is different from the first labeling category, determining a category reinforcement learning parameter according to the first category confidence and the confidence corresponding to the first labeling category, wherein the first category confidence is determined according to the first prediction category and the first labeling category, and the first coordinate confidence is determined according to the first prediction frame and the first labeling frame;
determining a first target category loss according to the category reinforcement learning parameter and the first category confidence, and determining the first target object loss value of the first predicted target object according to the first target category loss and a second coordinate loss, wherein the second coordinate loss is determined according to the first prediction frame and the first labeling frame.
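One plausible reading of the claim-5 "category reinforcement learning parameter" is a factor that grows when the model is confident in the wrong category relative to the labeled one, amplifying the category loss. Everything in the sketch beyond the claimed inputs and outputs is an assumption.

```python
import math

def loss_box_good_category_wrong(category_conf: float,
                                 label_category_conf: float,
                                 second_coord_loss: float) -> float:
    """Claim-5 branch sketch: well-localized box, wrong category.

    `category_conf` is the first category confidence and
    `label_category_conf` the confidence assigned to the labeled
    category. The reinforcement parameter and the cross-entropy-style
    category loss are illustrative assumptions.
    """
    reinforce = 1.0 + max(0.0, category_conf - label_category_conf)
    first_target_category_loss = (
        reinforce * -math.log(max(label_category_conf, 1e-7)))
    return first_target_category_loss + second_coord_loss
```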
6. The method of claim 3, wherein determining the first target object loss value of the first predicted target object according to the first prediction category, the first prediction frame, the first labeling category and the first labeling frame of the first predicted target object comprises:
if the first coordinate confidence is greater than a second confidence threshold and less than the first confidence threshold, and the first prediction category is the same as the first labeling category, determining a category weakening learning parameter according to the first category confidence and a preset parameter, wherein the first confidence threshold is greater than the second confidence threshold, the first category confidence is determined according to the first prediction category and the first labeling category, and the first coordinate confidence is determined according to the first prediction frame and the first labeling frame;
determining a second target category loss according to the category weakening learning parameter and the first category confidence;
and determining the first target object loss value of the first predicted target object according to the second target category loss and a third coordinate loss, wherein the third coordinate loss is determined according to the first prediction frame and the first labeling frame.
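The claim-6 "category weakening learning parameter" can be read, conversely, as a factor below one that shrinks the category loss when the category is already correct but localization is only moderate, steering gradients toward the box. The preset parameter and loss form below are assumptions.

```python
import math

def loss_box_medium_category_match(category_conf: float,
                                   third_coord_loss: float,
                                   preset_parameter: float = 0.5) -> float:
    """Claim-6 branch sketch: coordinate confidence between the second
    and first thresholds, predicted category equal to the labeled one.

    The weakening parameter scales the category loss down
    (assumption), leaving the coordinate loss untouched.
    """
    weaken = preset_parameter * category_conf  # assumed weakening parameter
    second_target_category_loss = weaken * -math.log(max(category_conf, 1e-7))
    return second_target_category_loss + third_coord_loss
```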
7. The method of claim 3, wherein determining the first target object loss value of the first predicted target object according to the first prediction category, the first prediction frame, the first labeling category and the first labeling frame of the first predicted target object comprises:
if the first coordinate confidence is less than the second confidence threshold, determining the recognition difficulty of the first predicted target object based on the first category confidence;
and determining the first target object loss value of the first predicted target object according to the recognition difficulty of the first predicted target object and the first category confidence.
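For the claim-7 branch (poorly localized box), a focal-loss-style reading is natural: derive the difficulty from the category confidence and let it scale the loss. The particular difficulty measure and loss form are assumptions.

```python
import math

def loss_box_poor(category_conf: float) -> float:
    """Claim-7 branch sketch: coordinate confidence below the second
    threshold, so only the category signal is trusted.

    Difficulty is taken as 1 - confidence and used as a multiplicative
    weight, in the spirit of focal loss (assumption).
    """
    difficulty = 1.0 - category_conf
    return difficulty * -math.log(max(category_conf, 1e-7))
```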
8. The method of claim 2, wherein determining the second loss value of the background image in the sample image based on the labeling result of the background image in the sample image comprises:
determining a second category confidence according to a second prediction category and a second labeling category of a second predicted target object, and determining the recognition difficulty of the second predicted target object according to the second category confidence, wherein the second prediction category of the second predicted target object is obtained by processing the background image in the sample image with the initial detection model;
determining a second target object loss value of the second predicted target object according to the recognition difficulty of the second predicted target object and the second category confidence;
and determining the second loss value of the background image in the sample image according to the second target object loss value.
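On a pure background image every detection is a false positive, so claim 8's second labeling category is effectively "background". The sketch below treats confident false positives as hard examples; the difficulty measure and the summation into the second loss value are assumptions.

```python
import math
from typing import List

def second_loss_background(second_category_confs: List[float]) -> float:
    """Claim-8 sketch: loss for spurious detections on background.

    Each confidence belongs to one second predicted target object; the
    more confident the false positive, the harder the example and the
    larger its contribution (assumption).
    """
    loss = 0.0
    for conf in second_category_confs:
        difficulty = conf  # confident false positive = hard example
        loss += difficulty * -math.log(max(1.0 - conf, 1e-7))
    return loss
```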
9. The method of claim 1, wherein determining the recognition difficulty of the contraband image according to the prediction result of the contraband image and the labeling result of the contraband image comprises:
determining the recognition difficulty of a third predicted target object based on a second prediction frame, a third prediction category, a second labeling frame and a third labeling category of the third predicted target object, wherein the second prediction frame and the third prediction category of the third predicted target object are obtained by processing the contraband image with the initial detection model;
and determining the recognition difficulty of the contraband image based on the recognition difficulty of the third predicted target object.
10. The method of claim 9, wherein determining the recognition difficulty of the third predicted target object based on the second prediction frame, the third prediction category, the second labeling frame and the third labeling category of the third predicted target object comprises:
determining a second coordinate confidence according to the second prediction frame and the second labeling frame of the third predicted target object;
determining a third category confidence according to the third prediction category and the third labeling category of the third predicted target object;
and determining the recognition difficulty of the third predicted target object based on the second coordinate confidence and the third category confidence.
11. The method of claim 10, wherein determining the recognition difficulty of the third predicted target object based on the second coordinate confidence and the third category confidence comprises:
if the second coordinate confidence is greater than a fourth confidence threshold, determining the recognition difficulty of the third predicted target object based on the second coordinate confidence and the third category confidence;
and if the second coordinate confidence is less than or equal to the fourth confidence threshold, determining the recognition difficulty of the third predicted target object based on the third category confidence.
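Claims 10 and 11 together describe a per-object difficulty built from a coordinate confidence and a category confidence, with a branch on the fourth threshold. The sketch below uses intersection over union (IoU) as one natural coordinate confidence and averages the two confidences; both choices, the function names, and the threshold value are assumptions.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)

def iou(pred_box: Box, label_box: Box) -> float:
    """IoU of the prediction and labeling frames — one plausible
    second coordinate confidence (assumption)."""
    ix1, iy1 = max(pred_box[0], label_box[0]), max(pred_box[1], label_box[1])
    ix2, iy2 = min(pred_box[2], label_box[2]), min(pred_box[3], label_box[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_p = (pred_box[2] - pred_box[0]) * (pred_box[3] - pred_box[1])
    area_l = (label_box[2] - label_box[0]) * (label_box[3] - label_box[1])
    return inter / max(area_p + area_l - inter, 1e-7)

def third_object_difficulty(pred_box: Box, label_box: Box,
                            third_category_conf: float,
                            fourth_threshold: float = 0.5) -> float:
    """Claims 10-11 sketch: difficulty of a third predicted target object."""
    second_coord_conf = iou(pred_box, label_box)
    if second_coord_conf > fourth_threshold:  # claim-11 first branch
        return 1.0 - (second_coord_conf + third_category_conf) / 2.0
    return 1.0 - third_category_conf          # claim-11 second branch
```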
12. The method of claim 1, wherein determining the recognition difficulty of the background image according to the prediction result of the background image and the labeling result of the background image comprises:
determining the recognition difficulty of a fourth predicted target object based on a third prediction category and a third labeling category of the fourth predicted target object, wherein the third prediction frame and the third prediction category of the fourth predicted target object are obtained by processing the background image with the initial detection model;
and determining the recognition difficulty of the background image based on the recognition difficulty of the fourth predicted target object.
13. A model training device for contraband detection, comprising:
a prediction unit configured to perform target detection on a contraband image and a background image, respectively, based on an initial detection model to be trained, to obtain a prediction result of the contraband image and a prediction result of the background image;
a difficulty determining unit configured to determine the recognition difficulty of the contraband image according to the prediction result of the contraband image and the labeling result of the contraband image, and to determine the recognition difficulty of the background image according to the prediction result of the background image and the labeling result of the background image;
a sample determining unit configured to fuse contraband images and background images of different recognition difficulties to obtain a sample image, and to fuse the labeling results of the contraband images and the labeling results of the background images to obtain the labeling result of the sample image;
and a training unit configured to train the initial detection model with the sample image and the labeling result of the sample image to obtain a contraband detection model;
wherein the initial detection model is determined as follows:
performing target detection on the contraband image based on an original detection model to obtain a prediction result of the contraband image, and determining the recognition difficulty of the contraband image according to the prediction result of the contraband image and the labeling result of the contraband image;
fusing contraband images of different recognition difficulties to obtain an original sample image, and fusing the labeling results of the contraband images to obtain the labeling result of the original sample image;
and training the original detection model with the original sample image and the labeling result of the original sample image to obtain the initial detection model.
14. An electronic device, comprising:
at least one processor; and
A memory communicatively coupled to the at least one processor; wherein,
the memory stores a computer program executable by the at least one processor to enable the at least one processor to perform the model training method applied to contraband detection of any one of claims 1 to 12.
15. A computer-readable storage medium storing computer instructions which, when executed, cause a processor to implement the model training method applied to contraband detection of any one of claims 1 to 12.
16. A computer program product comprising a computer program which, when executed by a processor, implements the model training method applied to contraband detection of any one of claims 1 to 12.
Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
---|---|---|---
CN202410788771.5A | 2024-06-19 | 2024-06-19 | Model training method and device applied to contraband detection and electronic equipment
Publications (2)

Publication Number | Publication Date
---|---
CN118365990A | 2024-07-19
CN118365990B | 2024-08-30
Family

ID=91885139

Family Applications (1)

Application Number | Title | Priority Date | Filing Date
---|---|---|---
CN202410788771.5A | Model training method and device applied to contraband detection and electronic equipment | 2024-06-19 | 2024-06-19
Legal Events

Code | Title
---|---
PB01 | Publication
SE01 | Entry into force of request for substantive examination
GR01 | Patent grant