CN115310555A

CN115310555A - Image anomaly detection method based on local perception knowledge distillation network

Info

Publication number: CN115310555A
Application number: CN202211046771.5A
Authority: CN
Inventors: 宋亚楠; 刘贤斐; 鲁鹏; 沈卫明
Original assignee: Institute Of Computer Innovation Technology Zhejiang University
Current assignee: Institute Of Computer Innovation Technology Zhejiang University
Priority date: 2022-08-30
Filing date: 2022-08-30
Publication date: 2022-11-08

Abstract

The invention discloses an image anomaly detection method based on a local perception knowledge distillation network. An anomaly detection network with a feature local perception module and a hard-case perception loss function is constructed and trained on anomaly-free images; a real-time test image is then input into the trained network to obtain an anomaly map, from which anomalies are judged. The anomaly detection network comprises a teacher network and a student network, and the feature local perception module is used to obtain corresponding feature maps containing local information. The student network is trained under a multi-level hard-case perception feature loss while the teacher network is kept unchanged, and the positions where the feature loss between the two networks is large are the anomalous positions. The invention can fully perceive contextual feature information and local spatial information, improves the network's ability to extract structural features and the role of structural information in the anomaly detection task, enhances the network's sensitivity to tiny anomalous regions, its adaptability to noise and its detection accuracy for anomalous regions, and also effectively improves the generalization performance of the network.

Description

Image anomaly detection method based on local perception knowledge distillation network

Technical Field

The invention relates to an image detection method in the field of computer vision, in particular to an image anomaly detection method based on a local perception knowledge distillation network.

Background

The main task of anomaly detection is to identify cases that deviate from normal patterns; it is widely applied in fields such as video surveillance, product quality control and medical diagnosis. Traditional anomaly detection methods rely on hand-crafted features and cannot be applied effectively to complex and varied detection scenes. Deep learning methods can autonomously extract high-dimensional features of defect images and have been widely explored in recent years. However, because anomaly classes are diverse and anomalous samples are scarce, supervised deep learning methods cannot handle anomaly detection scenarios effectively. A knowledge distillation model can be trained in an unsupervised manner on positive (normal) samples alone, but it relies on the feature maps extracted by a teacher network and a student network and judges anomalous positions only from the difference between the two feature maps at corresponding single pixel positions; it therefore cannot make effective use of the context information of the image, which limits the network's ability to extract structural features. In addition, such a model builds its loss function only from the distance between corresponding pixels, which blurs the difference between abnormal and normal regions, fails to exploit the effect of hard samples on network optimization, and is unfavourable to improving the localization accuracy of anomaly detection.

Disclosure of Invention

The invention aims to provide an anomaly detection method based on a local perception knowledge distillation network that solves the problems described in the background. In addition, the constructed hard-case perception loss function enlarges the gap between the loss in abnormal regions and the loss in normal regions, and increases the contribution of hard samples to the network loss.

To achieve this purpose, the invention adopts the following technical scheme:

step 1: constructing an anomaly detection network with a feature local perception module and a hard-case perception loss function;

step 2: inputting historical images that contain no anomaly into the anomaly detection network for training;

step 3: acquiring a real-time image of the scene to be detected, inputting it into the trained anomaly detection network, which outputs an anomaly map, and judging whether an anomaly is present according to the anomaly map.

The historical images and the real-time image are all two-dimensional images.

An anomaly refers to a situation that differs from what is observed most of the time, and is defined according to the specific requirements.

The anomaly detection network is shown in FIG. 4 and comprises a teacher network, a student network, a plurality of feature local perception modules and a plurality of hard-case perception loss functions;

the teacher network is mainly formed by sequentially connecting four consecutive convolution modules Tconv1 to Tconv4, and the student network is mainly formed by sequentially connecting four consecutive convolution modules Sconv1 to Sconv4, the convolution modules Tconv1 to Tconv4 corresponding to the convolution modules Sconv1 to Sconv4 respectively; each convolution module is formed by sequentially connecting a plurality of convolution blocks, and each convolution block is formed by sequentially performing a plurality of convolution operations;

the input image is input into both the teacher network and the student network; the high-dimensional feature maps output by the second convolution module Tconv2 of the teacher network and the second convolution module Sconv2 of the student network pass through their respective feature local perception modules to obtain respective local perception feature maps, which are input into a first hard-case perception loss function, and the first hard-case perception loss function outputs a first feature distance map O2;

the high-dimensional feature maps output by the third convolution module Tconv3 of the teacher network and the third convolution module Sconv3 of the student network pass through their respective feature local perception modules to obtain respective local perception feature maps, which are input into a second hard-case perception loss function, and the second hard-case perception loss function outputs a second feature distance map O3;

the high-dimensional feature maps output by the fourth convolution module Tconv4 of the teacher network and the fourth convolution module Sconv4 of the student network pass through their respective feature local perception modules to obtain respective local perception feature maps, which are input into a third hard-case perception loss function, and the third hard-case perception loss function outputs a third feature distance map O4;

and the three feature distance maps O2, O3 and O4 are multiplied element-wise to obtain the anomaly map O.

Judging whether an anomaly is present according to the anomaly map specifically comprises: traversing each feature position in the anomaly map O, and if the value at any feature position is larger than a preset anomaly threshold, determining that an anomaly exists in the anomaly map O, that is, that the real-time image contains an anomaly.
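The threshold decision described above can be sketched as follows. This is a minimal illustration, assuming the anomaly map is held as a two-dimensional NumPy array; the function names `has_anomaly` and `anomalous_positions` and the example threshold are hypothetical and do not come from the original text.

```python
import numpy as np

def has_anomaly(anomaly_map: np.ndarray, threshold: float) -> bool:
    """Return True if the value at any feature position exceeds the threshold."""
    return bool((anomaly_map > threshold).any())

def anomalous_positions(anomaly_map: np.ndarray, threshold: float):
    """List the (i, j) feature positions whose value exceeds the threshold."""
    return [tuple(int(v) for v in p) for p in np.argwhere(anomaly_map > threshold)]

# Example with an arbitrary 2 x 2 anomaly map and an arbitrary threshold.
O = np.array([[0.10, 0.20],
              [0.05, 0.90]])
image_is_anomalous = has_anomaly(O, threshold=0.5)
positions = anomalous_positions(O, threshold=0.5)
```

The preset threshold is application-specific; the value used here is only a placeholder.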

In step 2, during training, a hard-case perception loss is calculated from each feature distance map, the total loss is obtained by directly adding the three hard-case perception losses corresponding to the three feature distance maps, and the anomaly detection network is trained with the aim of minimizing the total loss, thereby optimizing its parameters.

The feature local perception module operates as follows:

S1, taking each channel of the high-dimensional feature map as a feature channel map, and traversing each feature channel map in the high-dimensional feature map;

S2, traversing each feature position (i, j) in the feature channel map and performing the following processing at each feature position (i, j): establishing a search range centered on the feature position (i, j) to search for neighborhood points, the search range having height h and width w, and taking the maximum value of all neighborhood points within the search range as the local perception feature of the feature position (i, j);

S3, repeating step S2 for every feature position (i, j) in the feature channel map, thereby obtaining the local perception feature of every feature position in that feature channel map;

S4, repeating steps S2 and S3 for every feature channel map in the high-dimensional feature map, thereby obtaining the local perception feature of every feature position (i, j) of every feature channel map, which together form the local perception feature map of the high-dimensional feature map.

When the feature position (i, j) is located at an edge, pixels of value 0 are added at the positions within the search range where no feature position exists, so that the same size of search range can be used over the entire feature map.
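Steps S1 to S4, together with the zero filling at the edges, can be sketched with explicit loops as below. This is only an illustrative sketch: it assumes the feature map is an H × W × C NumPy array and that h and w are odd so that the search range can be centered on (i, j); the function names are hypothetical.

```python
import numpy as np

def local_perception_channel(channel: np.ndarray, h: int = 3, w: int = 3) -> np.ndarray:
    """Steps S2-S3 for one feature channel map of shape (H, W)."""
    H, W = channel.shape
    ph, pw = h // 2, w // 2                      # half-sizes of the search range (h, w assumed odd)
    # Add 0-valued pixels around the map so edge positions use the same search range.
    padded = np.pad(channel, ((ph, ph), (pw, pw)), mode="constant", constant_values=0)
    out = np.empty_like(channel)
    for i in range(H):
        for j in range(W):
            # The window padded[i:i+h, j:j+w] is centered on original position (i, j).
            out[i, j] = padded[i:i + h, j:j + w].max()
    return out

def local_perception(feature_map: np.ndarray, h: int = 3, w: int = 3) -> np.ndarray:
    """Steps S1 and S4: process every channel of an (H, W, C) feature map."""
    channels = [local_perception_channel(feature_map[:, :, c], h, w)
                for c in range(feature_map.shape[2])]
    return np.stack(channels, axis=2)

# Small example: one 3 x 3 channel.
x = np.arange(1, 10, dtype=float).reshape(3, 3)
y = local_perception_channel(x)
```

Because the padding value is 0, an edge position whose real neighbours are all negative receives 0 rather than the largest real neighbour; this follows the text's zero-filling rule literally.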

The first, second and third hard-case perception loss functions output the first, second and third feature distance maps by the same process, which is as follows:

S1, regularizing the feature vectors of the high-dimensional feature maps obtained by the teacher network and the student network at each feature position (i, j), to obtain the regularized vectors at the feature position (i, j) corresponding to the teacher network and the student network [formulas not reproduced];

wherein $F_t(I)$ denotes the high-dimensional feature map obtained by the teacher network, $F_s(I)$ denotes the high-dimensional feature map obtained by the student network, $F_t(I)_{ij}$ denotes the teacher feature vector of the high-dimensional feature map obtained by the teacher network at the feature position (i, j), $F_s(I)_{ij}$ denotes the student feature vector of the high-dimensional feature map obtained by the student network at the feature position (i, j), and I denotes the input image;

S2, from the regularized vectors at each feature position (i, j) corresponding to the teacher network and the student network, calculating the feature distance $dis(I)_{ij}$ at each feature position (i, j) [formula not reproduced], and then obtaining the feature distance map from the feature distances $dis(I)_{ij}$ at all feature positions (i, j);

wherein $dis(I)_{ij}$ denotes the feature distance at the feature position (i, j), and $\lVert\cdot\rVert$ denotes the L2 norm.
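Since the formulas for the regularized vectors and the feature distance are not reproduced in this text, the following is only a sketch under stated assumptions: "regularizing" is taken to mean dividing each C-dimensional feature vector by its L2 norm, and the feature distance is taken to be the L2 norm of the difference between the two resulting vectors. The function name `feature_distance_map` and the small constant `eps` are hypothetical.

```python
import numpy as np

def feature_distance_map(f_t: np.ndarray, f_s: np.ndarray, eps: float = 1e-12) -> np.ndarray:
    """Distance map between teacher and student features of shape (H, W, C).

    ASSUMPTIONS (not confirmed by the source text):
      * regularization = L2 normalization of each feature vector along the channel axis;
      * dis(I)_ij = L2 norm of the difference of the two normalized vectors.
    """
    t_hat = f_t / (np.linalg.norm(f_t, axis=2, keepdims=True) + eps)
    s_hat = f_s / (np.linalg.norm(f_s, axis=2, keepdims=True) + eps)
    return np.linalg.norm(t_hat - s_hat, axis=2)   # shape (H, W)

# Example: a 1 x 2 map with 2 channels.
f_t = np.array([[[1.0, 0.0], [3.0, 0.0]]])
f_s = np.array([[[2.0, 0.0], [0.0, 5.0]]])
dist = feature_distance_map(f_t, f_s)
```

Under these assumptions the distance ignores the magnitude of the feature vectors and depends only on their direction, so it lies between 0 (same direction) and 2 (opposite directions).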

Calculating a hard-case perception loss from each feature distance map specifically comprises:

S1, traversing each feature position (i, j) on the feature distance map, and using the distance $dis(I)_{ij}$ between the teacher network and the student network at the same feature position (i, j) to calculate the hard-case perception loss value $l(I)_{ij}$ at the feature position (i, j) [formula not reproduced];

wherein e denotes the natural constant;

S2, from the hard-case perception loss values $l(I)_{ij}$ at all feature positions (i, j), calculating the hard-case perception loss $l(I)$ of the feature distance map generated between the teacher network and the student network [formula not reproduced];

wherein H denotes the height of the high-dimensional feature map and W denotes the width of the high-dimensional feature map.
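The two formulas of this passage are not reproduced in this text, so the sketch below only illustrates the general idea of a loss that gives more weight to positions with a large teacher-student distance. Both the per-position form (distance multiplied by an exponential weight) and the averaging over the H × W positions are hypothetical stand-ins chosen for illustration; they should not be read as the formulas of the invention. The weighting is passed in as a parameter so that the actual formula can be substituted.

```python
import numpy as np

def hard_case_loss(dist_map, weight_fn=np.exp):
    """Aggregate a feature distance map of shape (H, W) into one scalar loss.

    HYPOTHETICAL form, for illustration only:
      l_ij = weight_fn(dis_ij) * dis_ij   (default weight: e ** dis_ij)
      l    = mean of l_ij over the H x W positions
    """
    dist_map = np.asarray(dist_map, dtype=float)
    per_position = weight_fn(dist_map) * dist_map
    return per_position, float(per_position.mean())

# Example: three easy positions and one hard position.
d = np.array([[0.1, 0.1],
              [0.1, 1.0]])
per_position, total = hard_case_loss(d)
share_plain = d[1, 1] / d.sum()                          # share of the hard position without weighting
share_weighted = per_position[1, 1] / per_position.sum() # share of the hard position with weighting
```

With this illustrative weighting the hard position accounts for a larger share of the loss than it would under a plain average of distances, which is the qualitative behaviour the text attributes to the hard-case perception loss.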

The method can be applied to images in the fields of video monitoring, product quality control, medical diagnosis and the like.

When applied to video surveillance images, the abnormal behavior of pedestrians or abnormal movement of moving objects in the images can be detected.

When applied to a product quality control image, defects of a product in the image can be detected.

When applied to medical diagnostic images, diseases in the images can be detected.

The invention designs a local perception module for the high-dimensional features of an image. A teacher network with a complex structure and a student network with a simple structure are constructed; corresponding feature maps of the teacher network and the student network are established, and the designed feature local perception module is applied to these feature maps to obtain corresponding feature maps containing local information; a hard-case perception feature loss between the two networks is constructed on the basis of the multi-level corresponding feature maps; the student network is trained under the guidance of the multi-level feature loss while the teacher network model is kept unchanged; and the positions where the feature loss between the two networks is large are the anomalous positions in the image.

The beneficial effects of the invention are as follows:

The method can fully perceive contextual feature information during feature construction, improves the network's ability to extract structural features, and enhances the sensitivity of the anomaly detection network to tiny anomalous regions.

The method can effectively use the local spatial information in a two-dimensional image, strengthens the role of structural information in anomaly detection, and improves the network's adaptability to noise and its detection accuracy for anomalous regions.

Meanwhile, the network loss constructed by the invention can perceive hard samples online and increases their weight in the network loss, which helps the network detect anomalous regions of the same class but different sizes and effectively improves its generalization performance.

Drawings

FIG. 1 is a schematic diagram of high-dimensional features of an image;

FIG. 2 is a schematic diagram of a local perceptual feature map construction process at location (i, j);

FIG. 3 is a schematic diagram of a feature map with 0 pixels added at the outer edge positions;

FIG. 4 is an anomaly detection network based on local perception knowledge distillation;

FIG. 5 is a flow chart of the anomaly detection method of the present invention.

Detailed Description

The invention is further described with reference to the accompanying drawings and the detailed description.

It is to be understood that the embodiments described herein are exemplary and that the specific parameters used in the description of the embodiments are for the purpose of describing the invention only and are not intended to be limiting.

As shown in fig. 5, an embodiment of the present invention includes the steps of:

Step 1: construct the feature local perception module.

The feature local perception module operates on the high-dimensional feature maps extracted by the network. As shown in fig. 1, the dimension of a high-dimensional feature map of an image is H × W × C, where H, W and C denote the height, width and number of channels respectively. The feature local perception module is applied to each channel of the high-dimensional feature map separately, extracts context information for each feature position, and produces a local perception feature map. The specific steps are as follows:

(1) Traverse the feature channel maps and select one.

(2) Establish a search range around the feature position (i, j) of the selected feature channel map to search for neighborhood points, the search range having height h and width w.

(3) Take the maximum value of all neighborhood points within the search range as the local perception feature of the feature position (i, j).

(4) Repeat steps (2) to (3) for each feature position on the selected feature channel map to obtain the local perception feature of each position.

(5) Perform steps (1) to (4) for each channel of the high-dimensional feature map to obtain the local perception feature map of the high-dimensional features.

Taking the feature position in row i and column j of a certain channel as an example, the construction process of steps (2) to (3) is illustrated in fig. 2. The black dot marks the position whose local perception feature is to be constructed, and the height h and width w of the neighborhood search range are both 3 pixel positions. Note that when computing the local perception feature at an edge position of the feature channel map, pixels of value 0 must be added outside the edge so that the same size of search range can be used over the whole feature map, as shown in fig. 3.

With reference to fig. 2 and fig. 3, local perception feature maps are constructed for the high-dimensional feature maps TF2, TF3, TF4 and SF2, SF3, SF4; the corresponding local perception feature maps are TL2, TL3, TL4 and SL2, SL3, SL4. In the specific implementation, the height h and width w of the neighborhood search range are both 3 pixel positions, and pixels of value 0 are added at the outer edge of the feature map.
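For the 3 × 3 search range with zero filling used in this embodiment, the whole H × W × C feature map can also be processed at once without explicit loops. The sketch below is one possible vectorized formulation, assuming NumPy 1.20 or later (for `sliding_window_view`) and odd h and w; the function name `local_perception_map` and the random stand-in for TF2 are hypothetical.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def local_perception_map(features: np.ndarray, h: int = 3, w: int = 3) -> np.ndarray:
    """Neighborhood maximum per channel for an (H, W, C) feature map, with 0-valued edge pixels."""
    ph, pw = h // 2, w // 2
    # Pad only the two spatial axes with zeros; the channel axis is left untouched.
    padded = np.pad(features, ((ph, ph), (pw, pw), (0, 0)), mode="constant", constant_values=0)
    # windows has shape (H, W, C, h, w): one h x w search range per position and channel.
    windows = sliding_window_view(padded, (h, w), axis=(0, 1))
    return windows.max(axis=(-2, -1))

# Stand-in for a teacher feature map such as TF2 (shape chosen arbitrarily for the example).
TF2 = np.random.rand(8, 8, 4)
TL2 = local_perception_map(TF2)          # same shape as TF2
```

This zero filling differs from the max-pooling layers of some frameworks; PyTorch's `max_pool2d`, for instance, documents implicit negative-infinity padding, which gives different edge values when features can be negative.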

Step 2: construct the hard-case perception loss function.

During training, the main goal of the network is to learn the feature distribution of normal samples by matching the high-dimensional image features extracted by the teacher network and the student network. Given an input image I, the teacher network obtains a feature map $F_t(I) \in \mathbb{R}^{H \times W \times C}$ and the student network obtains the corresponding feature map $F_s(I) \in \mathbb{R}^{H \times W \times C}$, where $\mathbb{R}^{H \times W \times C}$ denotes the real number domain of dimension H × W × C. The teacher feature vector and the student feature vector at the feature position (i, j) are $F_t(I)_{ij} \in \mathbb{R}^{C}$ and $F_s(I)_{ij} \in \mathbb{R}^{C}$ respectively, where $\mathbb{R}^{C}$ denotes the real number domain of dimension C. To construct the loss at the feature position (i, j), the feature vectors $F_t(I)_{ij}$ and $F_s(I)_{ij}$ are first regularized, and a loss function is then constructed from the L2 distance between the two regularized vectors. Taking the teacher network feature map $F_t(I)$ and the student network feature map $F_s(I)$ as an example, the hard-case perception loss function for this pair of feature maps is constructed by the following steps:

(1) Construct the regularized vectors at the feature position (i, j) corresponding to the teacher network and the student network.

(2) Calculate the L2 distance $dis(I)_{ij}$ between the two regularized vectors obtained in step (1).

(3) Calculate the hard-case perception loss $l(I)_{ij}$ at the feature position (i, j) corresponding to the teacher network and the student network.

(4) Calculate the hard-case perception loss $l(I)$ of the corresponding feature maps generated by the two networks.

Hard-case perception losses L2, L3 and L4 are constructed according to formulas (1) to (5), and the three losses are summed to obtain the total loss L of the network.
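Written compactly, and recalling that only the student network is trained while the teacher network is kept unchanged, the summation can be expressed as follows. The symbol $\theta_s$ for the student network parameters is introduced here only for illustration and does not appear in the original text:

$$L = L_2 + L_3 + L_4, \qquad \theta_s^{*} = \arg\min_{\theta_s} L$$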

Step 3: construct the anomaly detection network.

The constructed anomaly detection network based on local perception knowledge distillation is shown in fig. 4. Tconv1, Tconv2, Tconv3 and Tconv4 denote the convolution modules at different stages of the teacher network. Sconv1, Sconv2, Sconv3 and Sconv4 denote the convolution modules at different stages of the student network, and correspond to the convolution modules of the teacher network.

The anomaly detection network takes a two-dimensional image as input. The convolution modules Tconv1, Tconv2, Tconv3 and Tconv4 of the teacher network extract the teacher features TF1, TF2, TF3 and TF4 from the image, and the convolution modules Sconv1, Sconv2, Sconv3 and Sconv4 of the student network extract the student features SF1, SF2, SF3 and SF4 from the image. Corresponding teacher features and student features have the same dimensions. The teacher features TF2, TF3 and TF4 pass through the feature local perception module to give the local perception feature maps TL2, TL3 and TL4, and the corresponding student features SF2, SF3 and SF4 pass through the feature local perception module to give the local perception feature maps SL2, SL3 and SL4.

In the specific implementation, the input image size is 256 × 256.

In the teacher network, Tconv1 corresponds to the operation (7 × 7, 64) × 1, Tconv2 to [formula not reproduced], Tconv3 to [formula not reproduced], and Tconv4 to [formula not reproduced].

In the student network, Sconv1 corresponds to the operation (7 × 7, 64) × 1, Sconv2 to [formula not reproduced], Sconv3 to [formula not reproduced], and Sconv4 to [formula not reproduced].

Taking Tconv2 as an example, the content inside the brackets of its operation denotes one convolution block, and the number outside the brackets denotes the number of stacked convolution blocks, here 3. Within the convolution block, 1 × 1 and 3 × 3 denote convolution kernel sizes, and 64 and 256 denote numbers of convolution kernels.

Step 4: train the constructed anomaly detection network.

The local perception feature maps obtained by the teacher network and by the student network are passed through the hard-case perception loss module to obtain the corresponding hard-case perception losses L2, L3 and L4. The total loss L of the network is obtained by summing the three hard-case perception losses. The anomaly detection network completes its training under the guidance of the total loss L.

In the specific implementation, the MVTec anomaly detection data set is used to train the constructed anomaly detection network. The data set contains more than 5000 high-resolution images covering 15 different object types. The initial learning rate is set to 0.4, the training batch size is 32, and the maximum number of training epochs is 200.

Step 5: the trained anomaly detection network directly predicts the anomalous regions of the input image.

After the anomaly detection network has been trained, anomaly detection tests can be run on input images. The main objective of the test phase is to obtain an anomaly map O of size Rh × Rw. The specific steps are:

(1) Input the test image into the teacher network and the student network to obtain the local perception feature maps TL2, TL3 and TL4 of the teacher network and the local perception feature maps SL2, SL3 and SL4 of the student network.

(2) Calculate, according to formulas (1) to (3), the distance at each corresponding position (i, j) for the three pairs of local perception feature maps TL2 and SL2, TL3 and SL3, and TL4 and SL4, where (i, j) ranges over every position of the local perception feature map. Once the distance has been calculated at every position, a distance map is obtained for each pair of features. The feature distance map between TL2 and SL2 is denoted O2, that between TL3 and SL3 is denoted O3, and that between TL4 and SL4 is denoted O4.

(3) Multiply the feature distance maps O2, O3 and O4 calculated in step (2) element-wise to obtain the anomaly map O of size Rh × Rw, expressed as:

O = O2 × O3 × O4

Note that when the size of the feature distance maps O2, O3 and O4 is not Rh × Rw, they are first resized to Rh × Rw by bilinear interpolation and then multiplied element-wise.
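The resizing and element-wise multiplication of the test phase can be sketched as follows. This is a minimal NumPy illustration, not the implementation of the invention: the pixel-center sampling convention used for the bilinear interpolation is an assumption, and the function names `resize_bilinear` and `anomaly_map` as well as the example sizes are hypothetical.

```python
import numpy as np

def resize_bilinear(m: np.ndarray, out_h: int, out_w: int) -> np.ndarray:
    """Resize a 2-D map to (out_h, out_w) by bilinear interpolation.

    Sampling convention (an assumption): output pixel centers are mapped to
    input coordinates and clamped to the valid range at the borders.
    """
    in_h, in_w = m.shape
    ys = np.clip((np.arange(out_h) + 0.5) * in_h / out_h - 0.5, 0, in_h - 1)
    xs = np.clip((np.arange(out_w) + 0.5) * in_w / out_w - 0.5, 0, in_w - 1)
    y0 = np.floor(ys).astype(int)
    x0 = np.floor(xs).astype(int)
    y1 = np.minimum(y0 + 1, in_h - 1)
    x1 = np.minimum(x0 + 1, in_w - 1)
    wy = (ys - y0)[:, None]                      # vertical interpolation weights
    wx = (xs - x0)[None, :]                      # horizontal interpolation weights
    top = m[y0][:, x0] * (1 - wx) + m[y0][:, x1] * wx
    bottom = m[y1][:, x0] * (1 - wx) + m[y1][:, x1] * wx
    return top * (1 - wy) + bottom * wy

def anomaly_map(distance_maps, out_h: int, out_w: int) -> np.ndarray:
    """Resize each feature distance map to (out_h, out_w) and multiply them element-wise."""
    result = np.ones((out_h, out_w))
    for d in distance_maps:
        result = result * resize_bilinear(np.asarray(d, dtype=float), out_h, out_w)
    return result

# Example with constant maps of different (arbitrary) sizes.
O2 = np.full((4, 4), 2.0)
O3 = np.full((2, 2), 3.0)
O4 = np.full((1, 1), 0.5)
O = anomaly_map([O2, O3, O4], out_h=4, out_w=4)
```

Because the maps are multiplied, a position receives a high value in O only if the teacher and student disagree there at all three feature levels.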

Compared with the prior art, the method constructs a feature local perception module for capturing feature context information and a hard-case perception loss function for hard-example mining. The proposed method can effectively use the unstructured features of the image, improves the network's perception of both local and global features, copes effectively with complex conditions such as noise and unknown anomalies, mines hard samples in the data set, and improves the generalization performance of the model. On the MVTec anomaly detection data set, the method achieves a pixel-level anomaly detection accuracy of 97.2% and an image-level anomaly detection accuracy of 91.3%. Its anomaly detection performance exceeds that of comparable anomaly detection methods such as SSIM-AE, AnoGAN, CNN-Dict, CutPaste, Patch-SVDD, PaDiM-R18 and SPADE.

The above is only a preferred embodiment of the present invention, but the scope of protection of the invention is not limited to it; any equivalent substitution or change made by a person skilled in the art, within the technical scope disclosed by the invention and according to its technical solution and inventive concept, shall fall within the scope of protection of the invention.

Claims

1. An anomaly detection method based on a local perception knowledge distillation network, characterized by comprising the following steps:

step 1: constructing an anomaly detection network with a feature local perception module and a hard-case perception loss function;

step 2: inputting historical images that contain no anomaly into the anomaly detection network for training;

step 3: acquiring a real-time image of the scene to be detected, inputting it into the trained anomaly detection network, which outputs an anomaly map, and judging whether an anomaly is present according to the anomaly map.

2. The anomaly detection method based on the local perception knowledge distillation network as claimed in claim 1, wherein: the anomaly detection network comprises a teacher network, a student network, a plurality of feature local perception modules and a plurality of hard-case perception loss functions;

the teacher network is mainly formed by sequentially connecting four consecutive convolution modules Tconv1 to Tconv4, the student network is mainly formed by sequentially connecting four consecutive convolution modules Sconv1 to Sconv4, each convolution module is formed by sequentially connecting a plurality of convolution blocks, and each convolution block is formed by sequentially performing a plurality of convolution operations;

the input image is input into both the teacher network and the student network; the high-dimensional feature maps output by the second convolution module Tconv2 of the teacher network and the second convolution module Sconv2 of the student network pass through their respective feature local perception modules to obtain respective local perception feature maps, which are input into a first hard-case perception loss function, and the first hard-case perception loss function outputs a first feature distance map;

the high-dimensional feature maps output by the third convolution module Tconv3 of the teacher network and the third convolution module Sconv3 of the student network pass through their respective feature local perception modules to obtain respective local perception feature maps, which are input into a second hard-case perception loss function, and the second hard-case perception loss function outputs a second feature distance map;

the high-dimensional feature maps output by the fourth convolution module Tconv4 of the teacher network and the fourth convolution module Sconv4 of the student network pass through their respective feature local perception modules to obtain respective local perception feature maps, which are input into a third hard-case perception loss function, and the third hard-case perception loss function outputs a third feature distance map;

and the three feature distance maps O2, O3 and O4 are multiplied to obtain the anomaly map O.

3. The anomaly detection method based on the local perception knowledge distillation network as claimed in claim 2, wherein: judging whether an anomaly is present according to the anomaly map specifically comprises: traversing each feature position in the anomaly map O, and if the value at any feature position is larger than a preset anomaly threshold, determining that an anomaly exists in the anomaly map O, that is, that the real-time image contains an anomaly.

4. The anomaly detection method based on the local perception knowledge distillation network as claimed in claim 1, wherein: in step 2, during training, a hard-case perception loss is calculated from each feature distance map, the total loss is obtained by adding the three hard-case perception losses corresponding to the three feature distance maps, and the anomaly detection network is trained with the aim of minimizing the total loss, thereby optimizing its parameters.

5. The anomaly detection method based on the local perception knowledge distillation network as claimed in claim 1, wherein the feature local perception module specifically operates as follows:

S1, taking each channel of the high-dimensional feature map as a feature channel map;

S2, performing the following processing at each feature position (i, j): establishing a search range centered on the feature position (i, j) to search for neighborhood points, the search range having height h and width w, and taking the maximum value of all neighborhood points within the search range as the local perception feature of the feature position (i, j);

S3, repeating step S2 to traverse each feature position (i, j) in the feature channel map, thereby obtaining the local perception feature of every feature position (i, j) in that feature channel map;

S4, repeating steps S2 and S3 to traverse each feature channel map in the high-dimensional feature map, thereby obtaining the local perception feature of every feature position (i, j) of every feature channel map, which together form the local perception feature map of the high-dimensional feature map.

6. The anomaly detection method based on the local perception knowledge distillation network as claimed in claim 5, wherein: when the feature position (i, j) is located at an edge, pixels of value 0 are added at the positions within the search range where no feature position exists.

7. The anomaly detection method based on the local perception knowledge distillation network as claimed in claim 2, wherein the hard-case perception loss function outputs the feature distance map specifically as follows:

S1, regularizing the feature vectors of the high-dimensional feature maps obtained by the teacher network and the student network at each feature position (i, j), to obtain the regularized vectors at the feature position (i, j) corresponding to the teacher network and the student network [formulas not reproduced];

wherein $F_t(I)$ denotes the high-dimensional feature map obtained by the teacher network, $F_s(I)$ denotes the high-dimensional feature map obtained by the student network, $F_t(I)_{ij}$ denotes the teacher feature vector of the high-dimensional feature map obtained by the teacher network at the feature position (i, j), $F_s(I)_{ij}$ denotes the student feature vector of the high-dimensional feature map obtained by the student network at the feature position (i, j), and I denotes the input image;

S2, from the regularized vectors at each feature position (i, j) corresponding to the teacher network and the student network, calculating the feature distance $dis(I)_{ij}$ at each feature position (i, j) [formula not reproduced], and then obtaining the feature distance map from the feature distances $dis(I)_{ij}$ at all feature positions (i, j);

wherein $dis(I)_{ij}$ denotes the feature distance at the feature position (i, j), and $\lVert\cdot\rVert$ denotes the L2 norm.

8. The anomaly detection method based on the local perception knowledge distillation network as claimed in claim 4, wherein calculating a hard-case perception loss from each feature distance map specifically comprises:

S1, traversing each feature position (i, j) on the feature distance map, and using the distance $dis(I)_{ij}$ between the teacher network and the student network at the same feature position (i, j) to calculate the hard-case perception loss value $l(I)_{ij}$ at the feature position (i, j) [formula not reproduced];

wherein e denotes the natural constant;

S2, from the hard-case perception loss values $l(I)_{ij}$ at all feature positions (i, j), calculating the hard-case perception loss $l(I)$ of the feature distance map generated between the teacher network and the student network [formula not reproduced];

wherein H denotes the height of the high-dimensional feature map and W denotes the width of the high-dimensional feature map.