Disclosure of Invention
In order to solve the problems in the background art, the invention provides a cultural relic authenticity identification method based on knowledge distillation, which has the advantages of high detection speed and high accuracy.
The technical scheme for solving the problems is as follows: the cultural relic authenticity identification method based on knowledge distillation is characterized by comprising the following steps of:
step 1: before the cultural relics are displayed, fingerprint area images are collected, and a data set is made;
step 2: configuring a YOLOV3 network as a teacher network and configuring a YOLOV3-Tiny network as a student network;
step 3: training YOLOV3;
step 4: training YOLOV3-Tiny based on knowledge distillation;
step 5: after the cultural relics are recovered, the fingerprint area images are collected again to make a test set;
step 6: identifying the authenticity of the cultural relic by using the trained YOLOV3-Tiny.
Further, step 1 specifically includes: before the cultural relics are displayed, selecting an area on each cultural relic as a fingerprint area, acquiring RGB images of the fingerprint area from multiple angles under different illumination conditions using a high-precision camera, marking the fingerprint area in the images with an annotation tool to build a data set, randomly selecting a portion of the images and their annotation files as the training set, and using the remaining images as the verification set.
Further, in step 2, the yolo layer of YOLOV3 with prediction scale 52 × 52 is deleted, and only the 13 × 13 and 26 × 26 predictions are retained; for the training set obtained in step 1, 6 anchors are calculated using the K-Means algorithm, and the original anchors of YOLOV3 and YOLOV3-Tiny are replaced with them.
Further, step 3 specifically includes: training the YOLOV3 network model on the data set obtained in step 1, and saving the weight file.
Further, in step 4, the trained YOLOV3 network is used as the teacher network, and the YOLOV3-Tiny network is used as the student network; the two networks perform forward propagation on the input image in turn, producing outputs of scale 13 × 13 × c and 26 × 26 × c, denoted out_t (teacher) and out_s (student), respectively; the error of YOLOV3-Tiny is calculated according to formulas (1) to (3):
LOSS = α·T²·loss_soft + (1-α)·loss_hard    (1)
loss_hard = crossentropy(out_s, Target)    (3)
In the formulas, loss_soft is the soft target error and loss_hard is the hard target error (i.e., the original YOLOV3 loss); α is the coefficient that adjusts the weighting between loss_soft and loss_hard, and T is the distillation temperature; Target denotes the original label of the data set, namely the hard target; softmax() and crossentropy() denote the softmax function and the cross-entropy value, respectively; to balance the magnitudes of loss_soft and loss_hard, a coefficient β1 is introduced; in formula (2), the position, confidence, and class prediction values of the student and teacher networks are softened, and the relative entropy between them is taken as the soft target error.
Further, step 5 specifically includes: after the cultural relics are recovered, acquiring a plurality of images of the fingerprint areas again to be used as the test set.
Further, step 6 specifically includes: running inference with the trained YOLOV3-Tiny on the test set obtained in step 5 to obtain confidence values for the fingerprint regions, and computing the average confidence value of each fingerprint region; if the average value is greater than a set threshold, the fingerprint region is judged to be unchanged, and if none of the fingerprint regions has changed, the cultural relic is judged to be genuine.
The invention has the advantages that:
According to the cultural relic authenticity identification method based on knowledge distillation, YOLOV3, which has high accuracy but low speed, is used as the teacher network, and YOLOV3-Tiny, which has lower accuracy but high speed, is used as the student network; knowledge distillation is performed, and the softened targets are used to supervise the learning of the student network. While the originally high detection speed of YOLOV3-Tiny is maintained, the accuracy is greatly improved, the hardware resources occupied during cultural relic identification are reduced, the detection efficiency is improved, and the identification cost is saved.
Detailed Description
In order to make the objects, technical solutions, and advantages of the embodiments of the present invention more apparent, the technical solutions of the embodiments will be described clearly and completely with reference to the accompanying drawings; it is obvious that the described embodiments are some, but not all, of the embodiments of the present invention. Thus, the following detailed description of the embodiments of the present invention, presented in the figures, is not intended to limit the scope of the invention as claimed, but is merely representative of selected embodiments of the invention. All other embodiments obtained by a person skilled in the art without inventive effort based on these embodiments fall within the scope of the present invention.
Referring to fig. 1, a cultural relic authenticity identification method based on knowledge distillation comprises the following steps:
step 1: before the cultural relics are displayed, fingerprint area images are collected, and a data set is made;
step 2: configuring a YOLOV3 network as a teacher network and configuring a YOLOV3-Tiny network as a student network;
step 3: training YOLOV3 on the training set;
step 4: training YOLOV3-Tiny based on knowledge distillation;
step 5: after the cultural relics are recovered, the fingerprint area images are collected again to make a test set;
step 6: identifying the authenticity of the cultural relic by using the trained YOLOV3-Tiny.
Further, step 1 specifically includes: before the cultural relics are displayed, selecting an area on each cultural relic as a fingerprint area, acquiring RGB images of the fingerprint area from multiple angles under different illumination conditions using a high-precision camera, marking the fingerprint area in the images with an annotation tool to build a data set, randomly selecting a portion of the images and their annotation files as the training set, and using the remaining images as the verification set.
Further, in step 2, the yolo layer of YOLOV3 with prediction scale 52 × 52 is deleted, and only the 13 × 13 and 26 × 26 predictions are retained; the network model after deletion is shown in fig. 2. The modified YOLOV3 network has m classes, the number of convolution kernel channels of each yolo layer is (m+1+4) × 3, denoted c, and the output sizes of the modified yolo layers are 13 × 13 × c and 26 × 26 × c. For the training set obtained in step 1, 6 anchors are calculated using the K-Means algorithm and replace the original anchors of YOLOV3 and YOLOV3-Tiny. The YOLOV3-Tiny network model is shown in fig. 3.
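As a minimal sketch of the anchor computation described above, the following Python snippet clusters the annotated box sizes of the training set into 6 anchors. The IoU-based (1 − IoU) distance used here is the variant commonly applied to YOLO anchors; the exact clustering details, the directory layout, and the function names are assumptions for illustration, not the definitive implementation.

```python
# Hedged sketch: computing 6 anchors from the training-set annotations with K-Means.
# Paths, XML layout (labelimg / Pascal VOC) and the IoU-based distance are assumptions.
import glob
import xml.etree.ElementTree as ET
import numpy as np

def load_box_sizes(xml_dir):
    """Collect (width, height) of every annotated fingerprint box, in pixels."""
    sizes = []
    for path in glob.glob(f"{xml_dir}/*.xml"):
        root = ET.parse(path).getroot()
        for obj in root.iter("object"):
            b = obj.find("bndbox")
            w = float(b.find("xmax").text) - float(b.find("xmin").text)
            h = float(b.find("ymax").text) - float(b.find("ymin").text)
            sizes.append((w, h))
    return np.array(sizes)

def iou_wh(boxes, anchors):
    """IoU between boxes and anchors when both are centered at the origin."""
    inter = np.minimum(boxes[:, None, 0], anchors[None, :, 0]) * \
            np.minimum(boxes[:, None, 1], anchors[None, :, 1])
    union = boxes[:, 0:1] * boxes[:, 1:2] + anchors[None, :, 0] * anchors[None, :, 1] - inter
    return inter / union

def kmeans_anchors(boxes, k=6, iters=100, seed=0):
    rng = np.random.default_rng(seed)
    anchors = boxes[rng.choice(len(boxes), k, replace=False)]
    for _ in range(iters):
        assign = np.argmax(iou_wh(boxes, anchors), axis=1)   # nearest anchor by IoU
        new = np.array([boxes[assign == i].mean(axis=0) if np.any(assign == i)
                        else anchors[i] for i in range(k)])
        if np.allclose(new, anchors):
            break
        anchors = new
    return anchors[np.argsort(anchors.prod(axis=1))]         # sorted by area

# Example (illustrative path): anchors = kmeans_anchors(load_box_sizes("dataset/train_annotations"))
```

In practice the resulting widths and heights would still be rescaled to the network input resolution before being written into the cfg files.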
Further, in step 3, the initial hyperparameters of the YOLOV3 network are set, the maximum number of iterations epochmax and the batch size batch are set, and training is performed on the training set; after each epoch, the precision, recall, and mAP of YOLOV3 on the verification set are calculated and stored in a training log. After each training run (i.e., after reaching the maximum number of iterations epochmax), the coefficients of the position error function, confidence error function, and classification error function are adjusted according to the training log; a training result with high precision, recall, and mAP is finally obtained, and the weight file is saved.
Further, in step 4, the trained YOLOV3 network is used as the teacher network, and the YOLOV3-Tiny network is used as the student network. The two networks perform forward propagation on the input image in turn, producing outputs of scale 13 × 13 × c and 26 × 26 × c, denoted out_t (teacher) and out_s (student), respectively, as shown in fig. 4. Back propagation and weight updates are performed only for the YOLOV3-Tiny network; the YOLOV3 network does not update its weights and only performs forward inference. The back propagation error comprises two components, as shown in formulas (1) to (3).
LOSS = α·T²·loss_soft + (1-α)·loss_hard    (1)
loss_hard = crossentropy(out_s, Target)    (3)
In the formulas, loss_soft is the soft target error and loss_hard is the hard target error (i.e., the original YOLOV3 loss). α is the coefficient that adjusts the weighting between loss_soft and loss_hard, and T is the distillation temperature. Target denotes the original annotation of the data set, i.e., the hard target. softmax() and crossentropy() denote the softmax function and the cross-entropy value, respectively. To balance the magnitudes of loss_soft and loss_hard, a coefficient β1 is introduced. In formula (2), the position, confidence, and class prediction values of the student and teacher networks are softened, and the relative entropy between them is taken as the soft target error.
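The following PyTorch sketch illustrates how the combined loss of formulas (1) and (3) could be assembled. Since formula (2) is not reproduced in this text, the soft target term is assumed here to be β1 times the relative entropy (KL divergence) between the temperature-softened student and teacher predictions, consistent with the description above; the flattened tensor shape and the default parameter values (taken from the later embodiment) are illustrative assumptions.

```python
# Hedged sketch of the distillation loss in formulas (1) and (3).
# Formula (2) is assumed to be loss_soft = beta1 * KL(softened student || softened teacher).
import torch
import torch.nn.functional as F

def distillation_loss(out_s, out_t, hard_loss, T=4.0, alpha=0.6, beta1=0.0003):
    """
    out_s, out_t: raw student/teacher predictions for one scale, flattened to
                  shape (N, c) so that position, confidence and class values
                  are softened together (an illustrative choice).
    hard_loss:    loss_hard of formula (3), i.e. the original YOLO-style loss of
                  the student with respect to the labels, computed elsewhere.
    """
    p_student = F.log_softmax(out_s / T, dim=-1)       # softened student output
    p_teacher = F.softmax(out_t.detach() / T, dim=-1)  # teacher is never updated
    loss_soft = beta1 * F.kl_div(p_student, p_teacher, reduction="batchmean")
    return alpha * (T ** 2) * loss_soft + (1.0 - alpha) * hard_loss   # formula (1)
```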
The maximum number of iterations and the batch size are set, and training is performed on the training set; after each epoch, the precision, recall, and mAP of YOLOV3-Tiny on the verification set are calculated and stored in the training log. After each training run, the hyperparameters of YOLOV3-Tiny are adjusted according to the training log; a training result with higher precision, recall, and mAP is finally obtained, and the weight file is saved.
Further, in step 5, after the cultural relics are recovered, a plurality of images of the fingerprint areas at the m positions are collected again to form a test set for identifying the authenticity of the cultural relics.
Further, in step 6, the YOLOV3-Tiny network trained in step 4 is used to run inference on the test set, and a confidence value is obtained for the fingerprint region of each image. The confidence values of the fingerprint areas at the same position are collected and averaged; if all m average values are greater than the set threshold, the cultural relic is judged to be genuine.
Example:
Step 1: before the cultural relics are displayed, fingerprint area images are collected and a data set is made.
In this embodiment, 5 fingerprint areas are designated before the cultural relics are displayed, each 5 mm × 5 mm in size. Under different illumination conditions, 500 RGB images are collected from each of the 5 fingerprint areas at multiple angles using an EOS 7D Mark II camera with an MP-E 65mm f/2.8 1-5X lens, at a resolution of 5472 × 3648 pixels, giving 2500 images in total. The fingerprint area in each image is annotated with the labeling tool labelimg, producing a corresponding XML file for each image. 2300 images and their annotation files are randomly selected as the training set, and the rest are used as the verification set.
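A minimal sketch of the random 2300/200 split is given below, keeping each image paired with its labelimg XML file. The directory names, file extension, and random seed are illustrative assumptions.

```python
# Hedged sketch: random split of the 2500 labelled images into a 2300-image
# training set and a 200-image verification set, copying each image together
# with its labelimg XML annotation. Directory names are illustrative.
import random
import shutil
from pathlib import Path

def split_dataset(image_dir, train_dir, val_dir, n_train=2300, seed=42):
    images = sorted(Path(image_dir).glob("*.jpg"))
    random.Random(seed).shuffle(images)
    for i, img in enumerate(images):
        dst = Path(train_dir if i < n_train else val_dir)
        dst.mkdir(parents=True, exist_ok=True)
        shutil.copy(img, dst / img.name)
        xml = img.with_suffix(".xml")              # annotation produced by labelimg
        shutil.copy(xml, dst / xml.name)

# Example: split_dataset("dataset/images", "dataset/train", "dataset/val")
```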
Step 2: configuring a YOLOV3 network as a teacher network and configuring a YOLOV3-Tiny network as a student network; modifying the network model and parameters;
In this embodiment, a deep learning development environment needs to be configured: the CPU is an Intel i7-9700K, the GPU is an NVIDIA GeForce RTX 2080, the operating system is Ubuntu 16.04 LTS with CUDA 10.0, and the deep learning framework is PyTorch.
Modifying the YOLOV3 network structure, with the following specific contents: 1) deleting the yolo layer with scale 52 × 52, and deleting the corresponding convolutional layer, upsampling layer, and route layer in the cfg configuration file; 2) modifying the number of anchors to 6, setting num=6 in the cfg configuration file, recalculating 6 anchors on the training set using the K-Means algorithm and replacing the original values; 3) modifying the number of classes to 5, setting classes=5 in the cfg configuration file; the number of channels of each yolo layer is (5+1+4) × 3, that is, 30, and the modified yolo layer output sizes are 13 × 13 × 30 and 26 × 26 × 30.
Modifying the YOLOV3-Tiny network parameters, with the following specific contents: 1) replacing the original anchor values with the 6 anchors computed above; 2) modifying the number of classes to 5, setting classes=5 in the cfg configuration file; the number of channels of each yolo layer is (5+1+4) × 3, that is, 30, and the modified yolo layer output sizes are 13 × 13 × 30 and 26 × 26 × 30. The YOLOV3 and YOLOV3-Tiny network models are then built.
Step 3: training the YOLOV3 network and saving the weight file.
In this embodiment, the YOLOV3 network model is trained. With epochmax set to 250 and batch set to 8, precision, recall, and mAP are calculated on the verification set after each epoch and saved to the training log. After each training run, the coefficients of the position error function, confidence error function, and classification error function are adjusted according to the training log; a network model with better performance is obtained through multiple training runs, and the weight file is saved.
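The training schedule just described can be outlined as follows. This is only a sketch under stated assumptions: `model`, `train_loader`, `val_loader`, and `evaluate()` are placeholders for the actual YOLOV3 implementation and metric code, and the choice of optimizer and best-weight selection is illustrative.

```python
# Hedged outline of the schedule above: epochmax = 250, batch = 8, with precision,
# recall and mAP computed on the verification set after every epoch and logged.
import csv
import torch

def train_yolov3(model, train_loader, val_loader, evaluate, epoch_max=250,
                 lr=1e-3, log_path="train_log.csv", weight_path="yolov3_best.pt"):
    optimizer = torch.optim.SGD(model.parameters(), lr=lr, momentum=0.9)
    best_map = 0.0
    with open(log_path, "w", newline="") as f:
        log = csv.writer(f)
        log.writerow(["epoch", "precision", "recall", "mAP"])
        for epoch in range(epoch_max):
            model.train()
            for images, targets in train_loader:          # batch size 8
                loss = model(images, targets)             # assumed to return the YOLO loss
                optimizer.zero_grad()
                loss.backward()
                optimizer.step()
            precision, recall, map50 = evaluate(model, val_loader)
            log.writerow([epoch, precision, recall, map50])
            if map50 > best_map:                          # keep the best weight file
                best_map = map50
                torch.save(model.state_dict(), weight_path)
```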
Step 4: training YOLOV3-Tiny based on knowledge distillation.
In this embodiment, the YOLOV3 network loads its weight file, and the YOLOV3-Tiny network is trained from scratch. The two networks perform forward propagation on the input image in turn, producing outputs of scale 13 × 13 × 30 and 26 × 26 × 30, denoted out_t (teacher) and out_s (student), respectively. The YOLOV3-Tiny network error is then calculated according to formulas (1) to (3), with the relevant parameters set as follows: T=4, α=0.6, β1=0.0003.
The maximum number of iterations and the batch size are set; during training, the error and gradient are calculated, and the weights are updated, only for the YOLOV3-Tiny network, while the YOLOV3 network only performs forward inference and does not calculate errors or gradients. After each epoch, the precision, recall, and mAP of YOLOV3-Tiny are calculated on the verification set and saved to the training log. After each training run, the hyperparameters of YOLOV3-Tiny are adjusted according to the training log; a network model with better performance is obtained through multiple training runs, and the weight file is saved.
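A single distillation epoch under this arrangement might look like the sketch below: the frozen teacher only performs forward inference, and gradients are computed for the student alone. `student`, `teacher`, `yolo_hard_loss`, and the data loader are placeholders; `distillation_loss()` refers to the hedged sketch given with formulas (1) to (3).

```python
# Hedged sketch of one distillation epoch: only the YOLOV3-Tiny student computes
# gradients and updates weights; the YOLOV3 teacher runs forward inference only.
import torch

def distill_epoch(student, teacher, train_loader, optimizer,
                  T=4.0, alpha=0.6, beta1=0.0003):
    teacher.eval()                                   # teacher weights stay fixed
    student.train()
    for images, targets in train_loader:
        with torch.no_grad():                        # no gradients for the teacher
            out_t = teacher(images)
        out_s = student(images)
        hard = yolo_hard_loss(out_s, targets)        # loss_hard, formula (3) (placeholder)
        loss = distillation_loss(out_s, out_t, hard, T=T, alpha=alpha, beta1=beta1)
        optimizer.zero_grad()
        loss.backward()                              # updates the student only
        optimizer.step()
```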
Step 5: after the cultural relics are recovered, the fingerprint area images are collected again.
In this embodiment, after the cultural relics are recovered, 100 images of each of the 5 fingerprint areas are collected again with the same camera at a resolution of 5472 × 3648 pixels, giving a total of 500 images that form the test set.
Step 6: identifying the authenticity of the cultural relic by using the trained YOLOV3-Tiny.
The trained YOLOV3-Tiny network is used to run inference on the test set, and a confidence value of the fingerprint region is output for each image. The 100 confidence values of each fingerprint area are averaged; if the average value is greater than the set threshold of 0.95, the fingerprint area is judged to be unchanged, and if none of the 5 fingerprint areas has changed, the cultural relic is judged to be genuine.
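The decision rule of this step reduces to a simple per-region averaging and thresholding, sketched below. The input format (a mapping from fingerprint-area id to the list of per-image confidences) is an assumption for illustration.

```python
# Hedged sketch of the step-6 decision rule: average the 100 confidence values
# of each of the 5 fingerprint areas and declare the relic genuine only if
# every average exceeds the 0.95 threshold.
def is_genuine(confidences_per_region, threshold=0.95):
    """
    confidences_per_region: dict mapping each fingerprint-area id to the list of
    confidence values produced by YOLOV3-Tiny on that area's test images.
    """
    for region, scores in confidences_per_region.items():
        mean_conf = sum(scores) / len(scores)
        if mean_conf <= threshold:        # this fingerprint area is judged changed
            return False
    return True

# Example: is_genuine({1: confs_1, 2: confs_2, 3: confs_3, 4: confs_4, 5: confs_5})
```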
The above description is only an embodiment of the present invention, and not intended to limit the scope of the present invention, and all equivalent structures or equivalent flow transformations made by using the contents of the specification and the drawings, or applied directly or indirectly to other related systems, are included in the scope of the present invention.