Disclosure of Invention
In order to solve the problems in the background art, the invention provides a cultural relic authenticity identification method based on knowledge distillation, which has the advantages of high detection speed and high accuracy.
The technical scheme for solving the problems is as follows: the cultural relic authenticity identification method based on knowledge distillation is characterized by comprising the following steps of:
step 1: before the cultural relics are displayed, fingerprint area images are collected, and a data set is made;
step 2: configuring a YOLOV3 network as a teacher network and configuring a YOLOV3-Tiny network as a student network;
step 3: training YOLOV3;
step 4: training YOLOV3-Tiny based on knowledge distillation;
step 5: after the cultural relics are recovered, the fingerprint area images are collected again to make a test set;
step 6: identifying the authenticity of the cultural relic by using the trained YOLOV3-Tiny.
Further, step 1 specifically includes: before the cultural relics are displayed, selecting an area on each cultural relic as a fingerprint area, acquiring RGB images of the fingerprint area from multiple angles under different illumination conditions using a high-precision camera, marking the fingerprint area in the images with an annotation tool to build a data set, randomly selecting a portion of the images and their annotation files as the training set, and using the remaining images as the verification set.
Further, in step 2, the yolo layer of YOLOV3 with prediction scale 52 × 52 is deleted, and only the 13 × 13 and 26 × 26 predictions are retained; for the training set obtained in step 1, 6 anchors are calculated using the K-Means algorithm, and the original anchors of YOLOV3 and YOLOV3-Tiny are replaced with them.
Further, step 3 specifically includes: training the YOLOV3 network model on the data set obtained in step 1, and saving the weight file.
Further, in step 4, the trained YOLOV3 network is used as the teacher network, and the YOLOV3-Tiny network is used as the student network; the two networks perform forward propagation on the input image in turn, producing outputs of scale 13 × 13 × c and 26 × 26 × c, denoted out_t (teacher) and out_s (student), respectively; the error of YOLOV3-Tiny is calculated according to formulas (1) to (3):
LOSS = α·T²·loss_soft + (1-α)·loss_hard    (1)
loss_hard = crossentropy(out_s, Target)    (3)
In the formulas, loss_soft is the soft target error and loss_hard is the hard target error (i.e., the original YOLOV3 loss); α is the coefficient that adjusts the weighting between loss_soft and loss_hard, and T is the distillation temperature; Target denotes the original label of the data set, namely the hard target; softmax() and crossentropy() denote the softmax function and the cross-entropy value, respectively; to balance the magnitudes of loss_soft and loss_hard, a coefficient β1 is introduced; in formula (2), the position, confidence, and class prediction values of the student and teacher networks are softened, and the relative entropy between them is taken as the soft target error.
Further, step 5 specifically includes: after the cultural relics are recovered, acquiring a plurality of images of the fingerprint areas again to be used as the test set.
Further, step 6 specifically includes: running inference with the trained YOLOV3-Tiny on the test set obtained in step 5 to obtain confidence values for the fingerprint regions, and computing the average confidence value of each fingerprint region; if the average value is greater than a set threshold, the fingerprint region is judged to be unchanged, and if none of the fingerprint regions has changed, the cultural relic is judged to be genuine.
The invention has the advantages that:
According to the cultural relic authenticity identification method based on knowledge distillation, YOLOV3, which has high accuracy but low speed, is used as the teacher network, and YOLOV3-Tiny, which has lower accuracy but high speed, is used as the student network; knowledge distillation is performed, and the softened targets are used to supervise the learning of the student network. While the originally high detection speed of YOLOV3-Tiny is maintained, the accuracy is greatly improved, the hardware resources occupied during cultural relic identification are reduced, the detection efficiency is improved, and the identification cost is saved.
Detailed Description
In order to make the objects, technical solutions, and advantages of the embodiments of the present invention more apparent, the technical solutions of the embodiments will be described clearly and completely with reference to the accompanying drawings; it is obvious that the described embodiments are some, but not all, of the embodiments of the present invention. Thus, the following detailed description of the embodiments of the present invention, presented in the figures, is not intended to limit the scope of the invention as claimed, but is merely representative of selected embodiments of the invention. All other embodiments obtained by a person skilled in the art without inventive effort based on these embodiments fall within the scope of the present invention.
Referring to fig. 1, a cultural relic authenticity identification method based on knowledge distillation comprises the following steps:
step 1: before the cultural relics are displayed, fingerprint area images are collected, and a data set is made;
step 2: configuring a YOLOV3 network as a teacher network and configuring a YOLOV3-Tiny network as a student network;
step 3: training YOLOV3 on the training set;
step 4: training YOLOV3-Tiny based on knowledge distillation;
step 5: after the cultural relics are recovered, the fingerprint area images are collected again to make a test set;
step 6: identifying the authenticity of the cultural relic by using the trained YOLOV3-Tiny.
Further, step 1 specifically includes: before the cultural relics are displayed, selecting an area on each cultural relic as a fingerprint area, acquiring RGB images of the fingerprint area from multiple angles under different illumination conditions using a high-precision camera, marking the fingerprint area in the images with an annotation tool to build a data set, randomly selecting a portion of the images and their annotation files as the training set, and using the remaining images as the verification set.
Further, in step 2, the yolo layer of YOLOV3 with prediction scale 52 × 52 is deleted, and only the 13 × 13 and 26 × 26 predictions are retained; the network model after deletion is shown in fig. 2. The modified YOLOV3 network has m classes, the number of convolution kernel channels of each yolo layer is (m+1+4) × 3, denoted c, and the output sizes of the modified yolo layers are 13 × 13 × c and 26 × 26 × c. For the training set obtained in step 1, 6 anchors are calculated using the K-Means algorithm and replace the original anchors of YOLOV3 and YOLOV3-Tiny. The YOLOV3-Tiny network model is shown in fig. 3.
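As a minimal sketch of the anchor computation described above, the following Python snippet clusters the annotated box sizes of the training set into 6 anchors. The IoU-based (1 − IoU) distance used here is the variant commonly applied to YOLO anchors; the exact clustering details, the directory layout, and the function names are assumptions for illustration, not the definitive implementation.

```python
# Hedged sketch: computing 6 anchors from the training-set annotations with K-Means.
# Paths, XML layout (labelimg / Pascal VOC) and the IoU-based distance are assumptions.
import glob
import xml.etree.ElementTree as ET
import numpy as np

def load_box_sizes(xml_dir):
    """Collect (width, height) of every annotated fingerprint box, in pixels."""
    sizes = []
    for path in glob.glob(f"{xml_dir}/*.xml"):
        root = ET.parse(path).getroot()
        for obj in root.iter("object"):
            b = obj.find("bndbox")
            w = float(b.find("xmax").text) - float(b.find("xmin").text)
            h = float(b.find("ymax").text) - float(b.find("ymin").text)
            sizes.append((w, h))
    return np.array(sizes)

def iou_wh(boxes, anchors):
    """IoU between boxes and anchors when both are centered at the origin."""
    inter = np.minimum(boxes[:, None, 0], anchors[None, :, 0]) * \
            np.minimum(boxes[:, None, 1], anchors[None, :, 1])
    union = boxes[:, 0:1] * boxes[:, 1:2] + anchors[None, :, 0] * anchors[None, :, 1] - inter
    return inter / union

def kmeans_anchors(boxes, k=6, iters=100, seed=0):
    rng = np.random.default_rng(seed)
    anchors = boxes[rng.choice(len(boxes), k, replace=False)]
    for _ in range(iters):
        assign = np.argmax(iou_wh(boxes, anchors), axis=1)   # nearest anchor by IoU
        new = np.array([boxes[assign == i].mean(axis=0) if np.any(assign == i)
                        else anchors[i] for i in range(k)])
        if np.allclose(new, anchors):
            break
        anchors = new
    return anchors[np.argsort(anchors.prod(axis=1))]         # sorted by area

# Example (illustrative path): anchors = kmeans_anchors(load_box_sizes("dataset/train_annotations"))
```

In practice the resulting widths and heights would still be rescaled to the network input resolution before being written into the cfg files.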
Further, in step 3, the initial hyperparameters of the YOLOV3 network are set, the maximum number of iterations epochmax and the batch size batch are set, and training is performed on the training set; after each epoch, the precision, recall, and mAP of YOLOV3 on the verification set are calculated and stored in a training log. After each training run (i.e., after reaching the maximum number of iterations epochmax), the coefficients of the position error function, confidence error function, and classification error function are adjusted according to the training log; a training result with high precision, recall, and mAP is finally obtained, and the weight file is saved.
Further, in step 4, the trained YOLOV3 network is used as the teacher network, and the YOLOV3-Tiny network is used as the student network. The two networks perform forward propagation on the input image in turn, producing outputs of scale 13 × 13 × c and 26 × 26 × c, denoted out_t (teacher) and out_s (student), respectively, as shown in fig. 4. Back propagation and weight updates are performed only for the YOLOV3-Tiny network; the YOLOV3 network does not update its weights and only performs forward inference. The back propagation error comprises two components, as shown in formulas (1) to (3).
LOSS = α·T²·loss_soft + (1-α)·loss_hard    (1)
loss_hard = crossentropy(out_s, Target)    (3)
In the formulas, loss_soft is the soft target error and loss_hard is the hard target error (i.e., the original YOLOV3 loss). α is the coefficient that adjusts the weighting between loss_soft and loss_hard, and T is the distillation temperature. Target denotes the original annotation of the data set, i.e., the hard target. softmax() and crossentropy() denote the softmax function and the cross-entropy value, respectively. To balance the magnitudes of loss_soft and loss_hard, a coefficient β1 is introduced. In formula (2), the position, confidence, and class prediction values of the student and teacher networks are softened, and the relative entropy between them is taken as the soft target error.
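The following PyTorch sketch illustrates how the combined loss of formulas (1) and (3) could be assembled. Since formula (2) is not reproduced in this text, the soft target term is assumed here to be β1 times the relative entropy (KL divergence) between the temperature-softened student and teacher predictions, consistent with the description above; the flattened tensor shape and the default parameter values (taken from the later embodiment) are illustrative assumptions.

```python
# Hedged sketch of the distillation loss in formulas (1) and (3).
# Formula (2) is assumed to be loss_soft = beta1 * KL(softened student || softened teacher).
import torch
import torch.nn.functional as F

def distillation_loss(out_s, out_t, hard_loss, T=4.0, alpha=0.6, beta1=0.0003):
    """
    out_s, out_t: raw student/teacher predictions for one scale, flattened to
                  shape (N, c) so that position, confidence and class values
                  are softened together (an illustrative choice).
    hard_loss:    loss_hard of formula (3), i.e. the original YOLO-style loss of
                  the student with respect to the labels, computed elsewhere.
    """
    p_student = F.log_softmax(out_s / T, dim=-1)       # softened student output
    p_teacher = F.softmax(out_t.detach() / T, dim=-1)  # teacher is never updated
    loss_soft = beta1 * F.kl_div(p_student, p_teacher, reduction="batchmean")
    return alpha * (T ** 2) * loss_soft + (1.0 - alpha) * hard_loss   # formula (1)
```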
The maximum number of iterations and the batch size are set, and training is performed on the training set; after each epoch, the precision, recall, and mAP of YOLOV3-Tiny on the verification set are calculated and stored in the training log. After each training run, the hyperparameters of YOLOV3-Tiny are adjusted according to the training log; a training result with higher precision, recall, and mAP is finally obtained, and the weight file is saved.
Further, in step 5, after the cultural relics are recovered, a plurality of images of the fingerprint areas at the m positions are collected again to form a test set for identifying the authenticity of the cultural relics.
Further, in step 6, the YOLOV3-Tiny network trained in step 4 is used to run inference on the test set, and a confidence value is obtained for the fingerprint region of each image. The confidence values of the fingerprint areas at the same position are collected and averaged; if all m average values are greater than the set threshold, the cultural relic is judged to be genuine.
Example:
Step 1: before the cultural relics are displayed, fingerprint area images are collected and a data set is made.
In this embodiment, 5 fingerprint areas are designated before the cultural relics are displayed, each 5 mm × 5 mm in size. Under different illumination conditions, 500 RGB images are collected from each of the 5 fingerprint areas at multiple angles using an EOS 7D Mark II camera with an MP-E 65mm f/2.8 1-5X lens, at a resolution of 5472 × 3648 pixels, giving 2500 images in total. The fingerprint area in each image is annotated with the labeling tool labelimg, producing a corresponding XML file for each image. 2300 images and their annotation files are randomly selected as the training set, and the rest are used as the verification set.
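A minimal sketch of the random 2300/200 split is given below, keeping each image paired with its labelimg XML file. The directory names, file extension, and random seed are illustrative assumptions.

```python
# Hedged sketch: random split of the 2500 labelled images into a 2300-image
# training set and a 200-image verification set, copying each image together
# with its labelimg XML annotation. Directory names are illustrative.
import random
import shutil
from pathlib import Path

def split_dataset(image_dir, train_dir, val_dir, n_train=2300, seed=42):
    images = sorted(Path(image_dir).glob("*.jpg"))
    random.Random(seed).shuffle(images)
    for i, img in enumerate(images):
        dst = Path(train_dir if i < n_train else val_dir)
        dst.mkdir(parents=True, exist_ok=True)
        shutil.copy(img, dst / img.name)
        xml = img.with_suffix(".xml")              # annotation produced by labelimg
        shutil.copy(xml, dst / xml.name)

# Example: split_dataset("dataset/images", "dataset/train", "dataset/val")
```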
Step 2: configuring a YOLOV3 network as a teacher network and configuring a YOLOV3-Tiny network as a student network; modifying the network model and parameters;
In this embodiment, a deep learning development environment needs to be configured: the CPU is an Intel i7-9700K, the GPU is an NVIDIA GeForce RTX 2080, the operating system is Ubuntu 16.04 LTS with CUDA 10.0, and the deep learning framework is PyTorch.
Modifying the YOLOV3 network structure, with the following specific contents: 1) deleting the yolo layer with scale 52 × 52, and deleting the corresponding convolutional layer, upsampling layer, and route layer in the cfg configuration file; 2) modifying the number of anchors to 6, setting num=6 in the cfg configuration file, recalculating 6 anchors on the training set using the K-Means algorithm and replacing the original values; 3) modifying the number of classes to 5, setting classes=5 in the cfg configuration file; the number of channels of each yolo layer is (5+1+4) × 3, that is, 30, and the modified yolo layer output sizes are 13 × 13 × 30 and 26 × 26 × 30.
Modifying the YOLOV3-Tiny network parameters, with the following specific contents: 1) replacing the original anchor values with the 6 anchors computed above; 2) modifying the number of classes to 5, setting classes=5 in the cfg configuration file; the number of channels of each yolo layer is (5+1+4) × 3, that is, 30, and the modified yolo layer output sizes are 13 × 13 × 30 and 26 × 26 × 30. The YOLOV3 and YOLOV3-Tiny network models are then built.
Step 3: training the YOLOV3 network and saving the weight file.
In this embodiment, the YOLOV3 network model is trained. With epochmax set to 250 and batch set to 8, precision, recall, and mAP are calculated on the verification set after each epoch and saved to the training log. After each training run, the coefficients of the position error function, confidence error function, and classification error function are adjusted according to the training log; a network model with better performance is obtained through multiple training runs, and the weight file is saved.
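The training schedule just described can be outlined as follows. This is only a sketch under stated assumptions: `model`, `train_loader`, `val_loader`, and `evaluate()` are placeholders for the actual YOLOV3 implementation and metric code, and the choice of optimizer and best-weight selection is illustrative.

```python
# Hedged outline of the schedule above: epochmax = 250, batch = 8, with precision,
# recall and mAP computed on the verification set after every epoch and logged.
import csv
import torch

def train_yolov3(model, train_loader, val_loader, evaluate, epoch_max=250,
                 lr=1e-3, log_path="train_log.csv", weight_path="yolov3_best.pt"):
    optimizer = torch.optim.SGD(model.parameters(), lr=lr, momentum=0.9)
    best_map = 0.0
    with open(log_path, "w", newline="") as f:
        log = csv.writer(f)
        log.writerow(["epoch", "precision", "recall", "mAP"])
        for epoch in range(epoch_max):
            model.train()
            for images, targets in train_loader:          # batch size 8
                loss = model(images, targets)             # assumed to return the YOLO loss
                optimizer.zero_grad()
                loss.backward()
                optimizer.step()
            precision, recall, map50 = evaluate(model, val_loader)
            log.writerow([epoch, precision, recall, map50])
            if map50 > best_map:                          # keep the best weight file
                best_map = map50
                torch.save(model.state_dict(), weight_path)
```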
Step 4: training YOLOV3-Tiny based on knowledge distillation.
In this embodiment, the YOLOV3 network loads its weight file, and the YOLOV3-Tiny network is trained from scratch. The two networks perform forward propagation on the input image in turn, producing outputs of scale 13 × 13 × 30 and 26 × 26 × 30, denoted out_t (teacher) and out_s (student), respectively. The YOLOV3-Tiny network error is then calculated according to formulas (1) to (3), with the relevant parameters set as follows: T=4, α=0.6, β1=0.0003.
The maximum number of iterations and the batch size are set; during training, the error and gradient are calculated, and the weights are updated, only for the YOLOV3-Tiny network, while the YOLOV3 network only performs forward inference and does not calculate errors or gradients. After each epoch, the precision, recall, and mAP of YOLOV3-Tiny are calculated on the verification set and saved to the training log. After each training run, the hyperparameters of YOLOV3-Tiny are adjusted according to the training log; a network model with better performance is obtained through multiple training runs, and the weight file is saved.
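A single distillation epoch under this arrangement might look like the sketch below: the frozen teacher only performs forward inference, and gradients are computed for the student alone. `student`, `teacher`, `yolo_hard_loss`, and the data loader are placeholders; `distillation_loss()` refers to the hedged sketch given with formulas (1) to (3).

```python
# Hedged sketch of one distillation epoch: only the YOLOV3-Tiny student computes
# gradients and updates weights; the YOLOV3 teacher runs forward inference only.
import torch

def distill_epoch(student, teacher, train_loader, optimizer,
                  T=4.0, alpha=0.6, beta1=0.0003):
    teacher.eval()                                   # teacher weights stay fixed
    student.train()
    for images, targets in train_loader:
        with torch.no_grad():                        # no gradients for the teacher
            out_t = teacher(images)
        out_s = student(images)
        hard = yolo_hard_loss(out_s, targets)        # loss_hard, formula (3) (placeholder)
        loss = distillation_loss(out_s, out_t, hard, T=T, alpha=alpha, beta1=beta1)
        optimizer.zero_grad()
        loss.backward()                              # updates the student only
        optimizer.step()
```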
Step 5: after the cultural relics are recovered, the fingerprint area images are collected again.
In this embodiment, after the cultural relics are recovered, 100 images of each of the 5 fingerprint areas are collected again with the same camera at a resolution of 5472 × 3648 pixels, giving a total of 500 images that form the test set.
Step 6: identifying the authenticity of the cultural relic by using the trained YOLOV3-Tiny.
The trained YOLOV3-Tiny network is used to run inference on the test set, and a confidence value of the fingerprint region is output for each image. The 100 confidence values of each fingerprint area are averaged; if the average value is greater than the set threshold of 0.95, the fingerprint area is judged to be unchanged, and if none of the 5 fingerprint areas has changed, the cultural relic is judged to be genuine.
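The decision rule of this step reduces to a simple per-region averaging and thresholding, sketched below. The input format (a mapping from fingerprint-area id to the list of per-image confidences) is an assumption for illustration.

```python
# Hedged sketch of the step-6 decision rule: average the 100 confidence values
# of each of the 5 fingerprint areas and declare the relic genuine only if
# every average exceeds the 0.95 threshold.
def is_genuine(confidences_per_region, threshold=0.95):
    """
    confidences_per_region: dict mapping each fingerprint-area id to the list of
    confidence values produced by YOLOV3-Tiny on that area's test images.
    """
    for region, scores in confidences_per_region.items():
        mean_conf = sum(scores) / len(scores)
        if mean_conf <= threshold:        # this fingerprint area is judged changed
            return False
    return True

# Example: is_genuine({1: confs_1, 2: confs_2, 3: confs_3, 4: confs_4, 5: confs_5})
```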
The above description is only an embodiment of the present invention, and not intended to limit the scope of the present invention, and all equivalent structures or equivalent flow transformations made by using the contents of the specification and the drawings, or applied directly or indirectly to other related systems, are included in the scope of the present invention.