CN114170599B

CN114170599B - Abnormal object segmentation method based on distillation comparison

Info

Publication number: CN114170599B
Application number: CN202111523499.0A
Authority: CN
Inventors: 周瑜; 周欢; 龚石; 白翔; 郑增强; 刘荣华
Original assignee: Huazhong University of Science and Technology
Current assignee: Huazhong University of Science and Technology
Priority date: 2021-12-14
Filing date: 2021-12-14
Publication date: 2024-08-23
Anticipated expiration: 2041-12-14
Also published as: CN114170599A

Abstract

The present invention discloses an abnormal object segmentation method based on distillation comparison. A semantic segmentation network is trained on an anomaly-free training set and, with its semantic classification head removed, serves as the teacher branch. With the teacher branch's parameters fixed, a student branch of similar structure is obtained by semantic feature distribution distillation. The outputs of the two branches agree on normal classes and disagree on abnormal classes. At test time, an image containing anomalies is input, and each branch performs multi-scale feature extraction and aggregation on it. The extracted semantic features are compared position by position to obtain an anomaly score map, which is bilinearly interpolated and thresholded to classify every pixel of the image as normal or abnormal. The method introduces a new, simple and flexible distillation comparison network for abnormal object segmentation. Because the output of the semantic classification head is not used at inference, misjudgment of normal-class pixels that the semantic segmentation misclassifies is greatly reduced, yielding more accurate abnormal object segmentation.

Description

A method for abnormal object segmentation based on distillation comparison

Technical Field

The present invention belongs to the technical field of computer vision and, more specifically, relates to an abnormal object segmentation method based on distillation comparison.

Background Art

In recent years, abnormal object segmentation has become a research hotspot in safety-critical fields such as autonomous driving and medical image analysis. Deep convolutional neural networks have achieved major breakthroughs in semantic segmentation, whose goal is to assign each pixel a label from a predefined set of semantic categories. In real-world open scenes, however, a pixel may belong to an unknown category, i.e., an abnormal category, and assigning it to one of the predefined normal categories instead of the abnormal category is dangerous. The task of abnormal object segmentation is to segment out the pixels of abnormal objects in a test image that do not belong to any of the normal categories predefined in the training images.

Existing abnormal object segmentation methods first obtain a semantic segmentation prediction from a semantic segmentation network (such as PSPNet) and then apply different strategies to derive an anomaly score map from that prediction. Uncertainty-estimation methods post-process the semantic segmentation prediction in various ways, for example with the softmax function, a CRF (Conditional Random Field) algorithm, or an ensemble of semantic segmentation predictions, to obtain the anomaly score map. Image-reconstruction methods use the semantic segmentation prediction to resynthesize the original image and compare the resynthesized image with the original to obtain the anomaly score map. Semantic segmentation prediction is therefore a core step in the inference stage of existing anomaly segmentation methods.
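The softmax-based post-processing mentioned above can be illustrated with one widely used instance, the maximum softmax probability score, in which a pixel's anomaly score is one minus its highest class probability. The following is a minimal background sketch in pure Python for a single pixel's logits; it is not part of the claimed method, and the function names are hypothetical. It makes the dependence on the classification head explicit: any pixel on which the classifier is unsure, including a normal pixel it is about to misclassify, receives a high anomaly score.

```python
import math

def softmax(logits):
    """Numerically stable softmax over one pixel's class logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def msp_anomaly_score(logits):
    """Anomaly score = 1 - maximum softmax probability (higher = more anomalous)."""
    return 1.0 - max(softmax(logits))

confident = [8.0, 0.5, 0.1]   # one class clearly dominates -> low score
uncertain = [1.0, 0.9, 1.1]   # near-uniform probabilities -> high score
print(msp_anomaly_score(confident), msp_anomaly_score(uncertain))
```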

Most existing abnormal object segmentation methods rely heavily on the semantic segmentation prediction, but it is observed that pixels misclassified by the semantic segmentation are very likely to be falsely detected as anomalies. This phenomenon seriously degrades the accuracy of abnormal object segmentation, yet it is rarely discussed in existing methods.

Summary of the Invention

To address the above defects or improvement needs of the prior art, the present invention provides a more accurate abnormal object segmentation method based on distillation comparison. It prevents semantic segmentation errors from adversely affecting the result, greatly reduces the misjudgment of normal-class pixels that the semantic segmentation misclassifies, and achieves more accurate segmentation of abnormal objects in images.

To achieve the above object, the present invention provides an abnormal object segmentation method based on distillation comparison, comprising the following steps:

Step S1: train a semantic segmentation network using training images without abnormal objects and their pixel-level semantic labels; remove the classifier of the trained network and retain only the feature extraction and aggregation parts as the teacher branch;

Step S2: fix the parameters of the teacher branch obtained in step S1, and train, on training images without abnormal objects, a student branch with a structure similar to the teacher branch using semantic feature distribution distillation; the semantic feature distribution distillation ensures that the outputs of the two branches are consistent on normal classes and inconsistent on abnormal classes;

Step S3: input a test image containing abnormal objects, and use the teacher branch obtained in step S1 and the student branch obtained in step S2 to each perform multi-scale feature extraction and aggregation on the image, obtaining high-level semantic features of the image;

Step S4: compare the high-level semantic features of the teacher branch and the student branch obtained in step S3 position by position, using a scoring function to compute the difference between the two branches' semantic features at each position as the anomaly score of that position, thereby obtaining an anomaly score map; bilinearly upsample the anomaly score map to the original image size; finally, set a suitable threshold to divide all pixels of the image into normal and abnormal categories.
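Step S4 only requires "a suitable threshold" and does not say how it is chosen. One common practical choice, assumed here rather than taken from the source, is to calibrate on anomaly-free validation pixels so that a chosen fraction of their scores fall at or below the threshold; that value then turns the upsampled score map into a binary normal/abnormal mask. The helper names `threshold_from_normal_scores` and `segment` are hypothetical.

```python
import math

def threshold_from_normal_scores(normal_scores, keep_fraction=0.95):
    """Hypothetical calibration rule: the smallest observed score t such that
    at least `keep_fraction` of the anomaly-free validation scores are <= t."""
    ordered = sorted(normal_scores)
    index = max(math.ceil(keep_fraction * len(ordered)) - 1, 0)
    return ordered[index]

def segment(score_map, threshold):
    """Binary mask over an H x W score map: 1 = abnormal (score above threshold), 0 = normal."""
    return [[1 if s > threshold else 0 for s in row] for row in score_map]

# Usage with toy numbers: 3 of 4 normal scores must lie at or below the threshold.
t = threshold_from_normal_scores([0.4, 0.1, 0.3, 0.2], keep_fraction=0.75)
mask = segment([[0.2, 0.5], [0.3, 0.9]], t)
```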

In one embodiment of the present invention, the semantic segmentation network in step S1 may be any existing semantic segmentation network.

In one embodiment of the present invention, the classifier part is removed from the semantic segmentation network to form the teacher branch, which comprises an image feature extraction module and a feature aggregation module. When the semantic segmentation network is trained, the downsampling operations in the high-level stages of the feature extraction module are removed and dilated (atrous) convolutions replace the ordinary convolutions in those stages, so that the feature map output by the teacher branch is 1/8 the size of the input image and has C channels, where C is a preset value. Finally, each channel is standardized separately to obtain the final output features of the teacher branch.
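The 1/8 output size follows from stride arithmetic. The sketch below assumes the standard ResNet layout (conv1 with stride 2, a stride-2 max-pool, then conv2_x to conv5_x, where conv3_x, conv4_x and conv5_x each normally downsample by 2) and shows how keeping stride 1 in the last two stages gives an output stride of 8 instead of 32, and how dilation rates of 2 and 4 (the rates used in the detailed embodiment) widen a 3×3 kernel's footprint without adding weights. The function and constant names are illustrative.

```python
def output_size(input_size, strides):
    """Spatial size after a chain of stages with the given strides.
    Assumes 'same'-style padding, so each stride-s stage yields ceil(size / s)."""
    size = input_size
    for s in strides:
        size = -(-size // s)  # ceiling division
    return size

def effective_kernel(k, dilation):
    """Span, in input pixels, covered by one k x k kernel at the given dilation rate."""
    return k + (k - 1) * (dilation - 1)

# Assumed ResNet layout: conv1, max-pool, conv2_x, conv3_x, conv4_x, conv5_x.
STANDARD_STRIDES = [2, 2, 1, 2, 2, 2]  # output stride 32
# conv4_x and conv5_x keep stride 1 and use dilation 2 and 4 instead: output stride 8.
DILATED_STRIDES = [2, 2, 1, 2, 1, 1]

print(output_size(512, STANDARD_STRIDES), output_size(512, DILATED_STRIDES))  # 16 64
print(effective_kernel(3, 2), effective_kernel(3, 4))  # 5 9
```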

In one embodiment of the present invention, the student branch in step S2 adopts a structure similar to that of the teacher branch in step S1.

In one embodiment of the present invention, the student branch in step S2 is specifically as follows: the student branch comprises an image feature extraction module and a feature aggregation module; the downsampling operations in the high-level stages of the feature extraction module are removed and dilated (atrous) convolutions replace the ordinary convolutions in those stages, so that the feature map output by the student branch is 1/8 the size of the input image, the same as that of the teacher branch, and has C channels, where C is a preset value equal to the number of output feature channels of the teacher branch. Finally, each channel is standardized separately to obtain the final output features of the student branch.
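The per-channel standardization applied at the end of both branches can be sketched as follows. The source does not say over which axis the statistics are taken; this sketch assumes each channel is shifted to zero mean and scaled to unit variance over its own H×W positions, with a small `eps` (an assumed value) guarding against division by zero. One plausible benefit, offered as an interpretation rather than a claim of the source, is that it puts every channel on a common scale so the later teacher-student comparison is not dominated by a few high-magnitude channels.

```python
import math

def standardize_channels(feat, eps=1e-5):
    """feat: list of C channels, each an H x W list of lists.
    Returns each channel standardized over its own spatial positions
    (assumed normalization axis)."""
    out = []
    for channel in feat:
        values = [v for row in channel for v in row]
        mean = sum(values) / len(values)
        var = sum((v - mean) ** 2 for v in values) / len(values)
        scale = 1.0 / math.sqrt(var + eps)
        out.append([[(v - mean) * scale for v in row] for row in channel])
    return out

# Usage: two channels with very different magnitudes end up on the same scale.
example = standardize_channels([[[1.0, 2.0], [3.0, 4.0]],
                                [[10.0, 10.0], [10.0, 30.0]]])
```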

In one embodiment of the present invention, in step S2 the student branch is trained by semantic feature distribution distillation on training images without abnormal objects, and the objective function to be optimized is: [formula not reproduced in the source text], where M is the batch size, C is the number of channels, i, j are position indices in the feature map, c is the channel index in the feature map, and the difference term (whose symbol is likewise not reproduced) is the difference between the feature values of the teacher branch and student branch output feature maps at the same channel and position.
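Because the objective function itself is missing from the text, the following is only a sketch under an explicit assumption: squared teacher-student differences, averaged over the batch (M) and channels (C) and summed over positions (i, j), since M and C are the only normalizers the definitions name. Feature maps are nested Python lists (C × H × W per image); the function name is hypothetical.

```python
def distillation_loss(teacher_batch, student_batch):
    """teacher_batch, student_batch: M feature maps, each C x H x W (nested lists).
    Assumed form: (1/M) * sum_m sum_{i,j} (1/C) * sum_c (t - s)^2."""
    M = len(teacher_batch)
    total = 0.0
    for t_map, s_map in zip(teacher_batch, student_batch):
        C = len(t_map)
        H, W = len(t_map[0]), len(t_map[0][0])
        for i in range(H):
            for j in range(W):
                sq = sum((t_map[c][i][j] - s_map[c][i][j]) ** 2 for c in range(C))
                total += sq / C
    return total / M
```

Only the student's parameters would receive gradients from this loss; in a framework such as PyTorch, the frozen teacher's forward pass would typically run under `torch.no_grad()`.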

In one embodiment of the present invention, during the training phases of both the teacher branch and the student branch, the input training images contain no abnormal objects and include only the predefined normal categories.

In one embodiment of the present invention, the scoring function in step S4 is: [formula not reproduced in the source text], where C is the number of channels, i, j are position indices in the feature map, c is the channel index in the feature map, and the difference term (whose symbol is likewise not reproduced) is the difference between the feature values of the teacher branch and student branch output feature maps at the same channel and position.
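Since the scoring formula is also missing, this sketch assumes the natural counterpart of the assumed loss: at each position (i, j), the squared teacher-student difference averaged over the C channels. It takes one image's two feature maps (C × H × W nested lists) and returns an H × W anomaly score map at feature resolution. The function name is hypothetical.

```python
def anomaly_score_map(teacher_feat, student_feat):
    """teacher_feat, student_feat: C x H x W nested lists for one image.
    Assumed scoring: s_ij = (1/C) * sum_c (t_cij - s_cij)^2."""
    C = len(teacher_feat)
    H, W = len(teacher_feat[0]), len(teacher_feat[0][0])
    return [[sum((teacher_feat[c][i][j] - student_feat[c][i][j]) ** 2
                 for c in range(C)) / C
             for j in range(W)]
            for i in range(H)]

# Usage: the branches agree at position (0, 0) and disagree on one channel at (0, 1).
scores = anomaly_score_map([[[0.0, 1.0]], [[0.0, 1.0]]],
                           [[[0.0, 1.0]], [[0.0, -1.0]]])
```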

In one embodiment of the present invention, PSPNet with a ResNet-50 backbone is used as the semantic segmentation network.

In one embodiment of the present invention, M = 4 and C = 512.

Overall, compared with the prior art, the above technical solution conceived by the present invention has the following beneficial effects:

(1) The present invention proposes a novel abnormal object segmentation method based on distillation comparison. The training phase requires only normal images without anomalies. In the test phase, anomalies are detected more effectively by comparing, position by position, the high-level semantic features extracted by the two branches, and the prediction of the semantic segmentation network's classification head is not used as an intermediate step. This prevents semantic segmentation errors from degrading the result, greatly reduces the misjudgment of normal-class pixels that the semantic segmentation misclassifies, and achieves more accurate segmentation of abnormal objects in images;

(2) The present invention proposes a simple and flexible abnormal object segmentation method based on distillation comparison: the teacher branch can be any semantic segmentation network with its classifier removed, and the student branch need only be structurally similar to the teacher branch, so the model size can be chosen according to practical requirements, making the method more general.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is an overall flowchart of the abnormal object segmentation method based on distillation comparison provided by the present invention.

DETAILED DESCRIPTION

To make the objects, technical solutions and advantages of the present invention clearer, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here serve only to explain the present invention and are not intended to limit it. In addition, the technical features involved in the embodiments described below may be combined with one another as long as they do not conflict.

The present invention provides an abnormal object segmentation method based on distillation comparison which, as shown in FIG. 1, comprises the following steps:

Step S1: train a semantic segmentation network using anomaly-free training images (1) and their pixel-level semantic labels; remove the classifier of the trained network and retain only the feature extraction and aggregation parts as the teacher branch (2);

Step S2: fix the parameters of the teacher branch obtained in step S1, and train, on the anomaly-free training images (1), a student branch (3) with a structure similar to the teacher branch using semantic feature distribution distillation; the semantic feature distribution distillation ensures that the outputs of the two branches are consistent on normal classes and inconsistent on abnormal classes;

Step S3: input a test image containing abnormal objects (5), and use the teacher branch obtained in step S1 and the student branch obtained in step S2 to each perform multi-scale feature extraction and aggregation on the image, obtaining high-level semantic features of the image;

Step S4: compare the high-level semantic features of the teacher branch and the student branch obtained in step S3 position by position, using the scoring function (6) to compute the difference between the two branches' semantic features at each position as the anomaly score of that position, thereby obtaining an anomaly score map; bilinearly upsample the anomaly score map to the original image size; finally, set a suitable threshold to divide all pixels of the image into normal and abnormal categories.
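The bilinear upsampling in step S4 restores the 1/8-resolution score map to the input size (a factor of 8 per side). The source only says "bilinear"; this sketch assumes the half-pixel-centre convention with clamped edges, the behaviour of, for example, PyTorch's `torch.nn.functional.interpolate` with `mode="bilinear"` and `align_corners=False`. It operates on a plain H × W list of lists, and the function name is hypothetical.

```python
def bilinear_upsample(grid, out_h, out_w):
    """Resize an H x W list of lists to out_h x out_w by bilinear interpolation,
    using half-pixel centres and clamping source coordinates to the grid."""
    in_h, in_w = len(grid), len(grid[0])

    def source_coord(dst, in_size, out_size):
        src = (dst + 0.5) * in_size / out_size - 0.5
        return min(max(src, 0.0), in_size - 1)

    out = []
    for y in range(out_h):
        sy = source_coord(y, in_h, out_h)
        y0 = int(sy)  # floor, since sy >= 0
        y1 = min(y0 + 1, in_h - 1)
        wy = sy - y0
        row = []
        for x in range(out_w):
            sx = source_coord(x, in_w, out_w)
            x0 = int(sx)
            x1 = min(x0 + 1, in_w - 1)
            wx = sx - x0
            top = grid[y0][x0] * (1 - wx) + grid[y0][x1] * wx
            bottom = grid[y1][x0] * (1 - wx) + grid[y1][x1] * wx
            row.append(top * (1 - wy) + bottom * wy)
        out.append(row)
    return out

# Usage: a 1/8-resolution score map is restored to full size with factor 8.
coarse = [[0.0, 8.0], [8.0, 0.0]]
full = bilinear_upsample(coarse, 8 * len(coarse), 8 * len(coarse[0]))
```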

The implementation consists of three main parts: 1) the teacher branch; 2) the student branch; 3) the objective function and the scoring function. Each is described in detail below.

1. Teacher branch

This embodiment of the present invention uses PSPNet with a ResNet-50 backbone as the semantic segmentation network; PSPNet is trained on the anomaly-free training set under the conventional semantic segmentation settings. The teacher branch (2) of the embodiment is the trained PSPNet with its classifier removed: ResNet-50 is retained as the feature extraction module, and a pyramid pooling module serves as the feature aggregation module for multi-scale aggregation of high-level semantic features. To preserve resolution and avoid losing information about small objects, the downsampling operations of the last two convolution stages of ResNet-50 are removed; to enlarge the receptive field and capture more context, the fourth convolution stage of ResNet-50 is replaced with a dilated convolution stage with a dilation rate of 2 and the fifth with one with a dilation rate of 4, so that the output feature map is 1/8 the size of the input image. The output features of the last ResNet-50 stage are pooled at four levels (1×1, 2×2, 3×3 and 6×6); the pooled features are bilinearly upsampled to 1/8 of the input image size and concatenated with the output features of the last ResNet-50 stage, and a 1×1 convolution changes the number of channels of the concatenated features to C. Finally, each channel is standardized separately to obtain the final output features of the teacher branch (2). In this embodiment, C is set to 512.
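The four pooling levels of the pyramid pooling module can be sketched as adaptive average pooling of one channel into 1×1, 2×2, 3×3 and 6×6 bins. This pure-Python sketch assumes the bin boundaries used by PyTorch's `nn.AdaptiveAvgPool2d` (bin k spans floor(k·H/n) to ceil((k+1)·H/n), so bins may overlap when H is not divisible by n); the function names are hypothetical.

```python
PYRAMID_BINS = (1, 2, 3, 6)  # pooling levels named in the text

def adaptive_avg_pool(channel, bins):
    """Average-pool an H x W grid into a bins x bins grid."""
    H, W = len(channel), len(channel[0])
    pooled = []
    for bi in range(bins):
        r0 = (bi * H) // bins
        r1 = -(-((bi + 1) * H) // bins)  # ceiling division
        row = []
        for bj in range(bins):
            c0 = (bj * W) // bins
            c1 = -(-((bj + 1) * W) // bins)
            cells = [channel[r][c] for r in range(r0, r1) for c in range(c0, c1)]
            row.append(sum(cells) / len(cells))
        pooled.append(row)
    return pooled

def pyramid_pool(channel):
    """Pooled grids for one channel at every pyramid level."""
    return {n: adaptive_avg_pool(channel, n) for n in PYRAMID_BINS}

# Usage on a 6 x 6 toy channel whose values are 6 * row + column.
grid = [[float(6 * r + c) for c in range(6)] for r in range(6)]
levels = pyramid_pool(grid)
```

Each pooled grid would then be bilinearly upsampled back to the 1/8 resolution and concatenated with the backbone features, as described above; those steps are not repeated here.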

2. Student branch

The student branch (3) used in this embodiment is structurally similar to the teacher branch (2). ResNet-34 serves as the feature extraction module, and a pyramid pooling module serves as the feature aggregation module for multi-scale aggregation of high-level semantic features. To preserve resolution and avoid losing information about small objects, the downsampling operations of the last two convolution stages of ResNet-34 are removed; to enlarge the receptive field and capture more context, the fourth convolution stage of ResNet-34 is replaced with a dilated convolution stage with a dilation rate of 2 and the fifth with one with a dilation rate of 4, so that the output feature map is 1/8 the size of the input image. The output features of the last ResNet-34 stage are pooled at four levels (1×1, 2×2, 3×3 and 6×6); the pooled features are bilinearly upsampled to 1/8 of the input image size and concatenated with the output features of the last ResNet-34 stage, and a 1×1 convolution changes the number of channels of the concatenated features to C. Finally, each channel is standardized separately to obtain the final output features of the student branch. In this embodiment, C is set to 512.
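The final 1×1 convolution is what lets two different backbones produce comparable features: the last stage of ResNet-50 outputs 2048 channels and that of ResNet-34 outputs 512, yet both branches end with C = 512. A 1×1 convolution is a per-position linear map across channels, sketched below in pure Python. The channel bookkeeping helper assumes, as in the original PSPNet design rather than anything stated in this text, that each of the four pooled levels is first reduced to one quarter of the backbone channels; both function names are hypothetical.

```python
def conv1x1(feat, weights, bias):
    """1x1 convolution: out[o][i][j] = bias[o] + sum_k weights[o][k] * feat[k][i][j].
    feat: K x H x W nested lists; weights: C_out x K; bias: length C_out."""
    H, W = len(feat[0]), len(feat[0][0])
    return [[[bias[o] + sum(w_ok * feat[k][i][j] for k, w_ok in enumerate(weights[o]))
              for j in range(W)]
             for i in range(H)]
            for o in range(len(weights))]

def concat_channels(backbone_channels, num_levels=4, reduction=4):
    """Channels entering the 1x1 projection, assuming each pooled level is reduced
    to backbone_channels // reduction before concatenation (assumption)."""
    return backbone_channels + num_levels * (backbone_channels // reduction)

# Under this assumption the teacher (ResNet-50) and student (ResNet-34) feed
# different widths into their projections, and both are mapped to C = 512.
teacher_in, student_in = concat_channels(2048), concat_channels(512)
```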

3. Objective function and scoring function

In the student-branch training stage, this embodiment fixes the parameters of the teacher branch and trains the student branch by semantic feature distribution distillation on the training set without abnormal objects. The objective function (4) to be optimized is:

[formula not reproduced in the source text]

where M is the batch size, C is the number of channels, i, j are position indices in the feature map, c is the channel index in the feature map, and the difference term (whose symbol is likewise not reproduced) is the difference between the feature values of the teacher branch and student branch output feature maps at the same channel and position. In this embodiment, M = 4 and C = 512.
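The training procedure itself (teacher frozen, only the student updated) can be illustrated with a toy in pure Python. Here the "teacher" is a fixed function from a scalar input to a 2-channel feature, the "student" is a linear model with trainable weights, and gradient descent minimizes the same assumed mean-over-batch, mean-over-channel squared difference used in the sketch of the loss; the toy batch holds 5 inputs rather than the embodiment's M = 4. Everything here, including the learning rate and step count, is illustrative and not the patent's networks or settings.

```python
def teacher(x):
    """Toy frozen teacher: a fixed map from a scalar input to a 2-channel feature.
    It has no trainable parameters, mirroring the fixed teacher branch."""
    return [2.0 * x + 1.0, -x]

def distill_loss(w, b, xs):
    """Mean over the batch and channels of the squared teacher-student difference."""
    C = len(w)
    return sum((w[c] * x + b[c] - teacher(x)[c]) ** 2
               for x in xs for c in range(C)) / (len(xs) * C)

def train_student(xs, lr=0.5, steps=2000):
    """Gradient descent on distill_loss with respect to the student's w and b only."""
    C = len(teacher(0.0))
    w, b = [0.0] * C, [0.0] * C
    M = len(xs)
    for _ in range(steps):
        grad_w, grad_b = [0.0] * C, [0.0] * C
        for x in xs:
            t = teacher(x)  # read-only target; the teacher is never updated
            for c in range(C):
                err = w[c] * x + b[c] - t[c]
                grad_w[c] += 2.0 * err * x / (M * C)
                grad_b[c] += 2.0 * err / (M * C)
        for c in range(C):
            w[c] -= lr * grad_w[c]
            b[c] -= lr * grad_b[c]
    return w, b

normal_batch = [0.0, 0.25, 0.5, 0.75, 1.0]  # stands in for anomaly-free training data
w_student, b_student = train_student(normal_batch)
```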

In the test phase of this embodiment, a test image containing abnormal objects (5) is input; the teacher branch (2) and the student branch (3) each extract and aggregate its high-level semantic features, which are then compared position by position. The scoring function (6) computes the difference between the two branches' semantic features at each position as the anomaly score of that position, yielding an anomaly score map. The scoring function (6) is:

[formula not reproduced in the source text]

where C is the number of channels, i, j are position indices in the feature map, c is the channel index in the feature map, and the difference term (whose symbol is likewise not reproduced) is the difference between the feature values of the teacher branch and student branch output feature maps at the same channel and position. In this embodiment, C = 512. Semantic feature distribution distillation keeps the outputs of the two branches consistent on normal classes because training is performed only on images that contain no abnormal objects and only the predefined normal categories. For abnormal-class pixels, which were never seen during training, the semantic feature distributions extracted by the two branches are arbitrary, so the two branches' output features disagree on abnormal classes. Consequently, the scoring function yields small anomaly scores for normal-class pixels and large anomaly scores for abnormal-class pixels.
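The reasoning above (agreement where the student was trained, arbitrary disagreement elsewhere) can be seen in a one-dimensional toy. Scalar inputs stand in for pixels, the "teacher" has a nonlinear channel, and the "student" is a deliberately different linear model fitted by least squares only on inputs in [0, 1], which play the role of normal classes; an input of 5.0 plays an unseen abnormal class. Scores use the assumed mean-over-channels squared difference. This is purely illustrative and not the patent's networks.

```python
def teacher(x):
    """Toy frozen teacher: a 2-channel feature whose first channel is nonlinear."""
    return [x * x, x]

def fit_student(xs):
    """Least-squares linear student s(x) = [a*x + b, x], fitted only on xs."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(teacher(x)[0] for x in xs) / n
    cov = sum((x - mx) * (teacher(x)[0] - my) for x in xs) / n
    var = sum((x - mx) ** 2 for x in xs) / n
    a = cov / var
    b = my - a * mx
    return lambda x: [a * x + b, x]

def score(student, x):
    """Mean over channels of the squared teacher-student difference."""
    t, s = teacher(x), student(x)
    return sum((tc - sc) ** 2 for tc, sc in zip(t, s)) / len(t)

normal_inputs = [0.0, 0.25, 0.5, 0.75, 1.0]  # the only data the student sees
student = fit_student(normal_inputs)
max_normal = max(score(student, x) for x in normal_inputs)  # small
abnormal = score(student, 5.0)  # far outside the training range: large
```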

The present invention proposes an abnormal object segmentation method based on distillation comparison. The distillation comparison network consists of a teacher branch and a student branch. On a training set without abnormal objects, the teacher branch is trained under the conventional semantic segmentation settings, and the student branch is obtained by distilling the semantic feature distribution of the teacher branch. Semantic feature distribution distillation keeps the outputs of the two branches consistent on normal classes and inconsistent on abnormal classes, so in the test phase the difference between the two branches' semantic features is used to detect anomalies effectively. The distillation comparison network is simple and flexible, and because the prediction of the semantic segmentation network's classification head is not used as an intermediate step in the test phase, semantic segmentation errors do not degrade the result; misjudgment of normal-class pixels that the semantic segmentation misclassifies is greatly reduced, and abnormal objects in images are segmented more accurately.

Those skilled in the art will readily understand that the above describes only preferred embodiments of the present invention and is not intended to limit it. Any modification, equivalent substitution, improvement, etc. made within the spirit and principles of the present invention shall fall within its scope of protection.

Claims

1. A method for segmenting abnormal objects based on distillation comparison, characterized in that it comprises the following steps:

Step S1, training a semantic segmentation network using training images without abnormal objects and their pixel-level semantic labels, removing the classifier of the trained semantic segmentation network and retaining only the feature extraction and aggregation parts as the teacher branch;

Step S2, fixing the parameters of the teacher branch obtained in step S1, and training, on training images without abnormal objects, a student branch with a structure similar to the teacher branch by semantic feature distribution distillation, wherein the semantic feature distribution distillation ensures that the outputs of the two branches are consistent on normal classes and inconsistent on abnormal classes;

Step S3, inputting a test image containing abnormal objects, and performing, with the teacher branch obtained in step S1 and the student branch obtained in step S2 respectively, multi-scale feature extraction and aggregation on the image to obtain high-level semantic features of the image;

Step S4, comparing the high-level semantic features of the teacher branch and the student branch obtained in step S3 position by position, computing with a scoring function the difference between the semantic features of the two branches as the anomaly score of each position to obtain an anomaly score map, bilinearly upsampling the anomaly score map to the original image size, and finally setting a suitable threshold to divide all pixels of the image into normal and abnormal categories.

2. The abnormal object segmentation method based on distillation comparison according to claim 1, characterized in that the semantic segmentation network in step S1 is any existing semantic segmentation network.

3. The abnormal object segmentation method based on distillation comparison according to claim 1 or 2, characterized in that the teacher branch is the semantic segmentation network with its classifier part removed and comprises an image feature extraction module and a feature aggregation module; when the semantic segmentation network is trained, the downsampling operations in the high-level stages of the feature extraction module are removed and dilated (atrous) convolutions replace the ordinary convolutions in those stages, so that the feature map output by the teacher branch is 1/8 the size of the input image and has C channels, C being a preset value; finally, each channel is standardized separately to obtain the final output features of the teacher branch.

4. The abnormal object segmentation method based on distillation comparison according to claim 1 or 2, characterized in that the student branch in step S2 adopts a structure similar to that of the teacher branch in step S1.

5. The abnormal object segmentation method based on distillation comparison according to claim 4, characterized in that the student branch in step S2 is specifically as follows:

The student branch comprises an image feature extraction module and a feature aggregation module; the downsampling operations in the high-level stages of the feature extraction module are removed and atrous (dilated) convolutions replace the ordinary convolutions in those stages, so that the feature map output by the student branch is 1/8 the size of the input image, consistent with the output feature map size of the teacher branch, and has C channels, C being a preset value consistent with the number of output feature channels of the teacher branch; finally, each channel is standardized separately to obtain the final output features of the student branch.

6. The abnormal object segmentation method based on distillation comparison according to claim 1 or 2, characterized in that in step S2 the student branch is trained by semantic feature distribution distillation on training images without abnormal objects, and the objective function to be optimized is:

[formula not reproduced in the source text]

where M is the batch size, C is the number of channels, i, j are position indices in the feature map, c is the channel index in the feature map, and the difference term (whose symbol is likewise not reproduced) is the difference between the feature values of the teacher branch and student branch output feature maps at the same channel and position.

7. The abnormal object segmentation method based on distillation comparison according to claim 1 or 2, characterized in that, in the training stages of the teacher branch and the student branch, the input training images contain no abnormal objects and include only predefined normal categories.

8. The abnormal object segmentation method based on distillation comparison according to claim 1 or 2, characterized in that the scoring function in step S4 is:

[formula not reproduced in the source text]

where C is the number of channels, i, j are position indices in the feature map, c is the channel index in the feature map, and the difference term (whose symbol is likewise not reproduced) is the difference between the feature values of the teacher branch and student branch output feature maps at the same channel and position.

9. The abnormal object segmentation method based on distillation comparison according to claim 1 or 2, characterized in that PSPNet with a ResNet-50 backbone is used as the semantic segmentation network.

10. The abnormal object segmentation method based on distillation comparison according to claim 6, characterized in that M = 4 and C = 512.