CN111598854A - Complex texture small defect segmentation method based on rich robust convolution characteristic model - Google Patents
Complex texture small defect segmentation method based on rich robust convolution characteristic model
- Publication number
- CN111598854A (application CN202010368806.1A)
- Authority
- CN
- China
- Prior art keywords
- layer
- stage
- convolution
- output
- feature
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/0002—Inspection of images, e.g. flaw detection
- G06T7/0004—Industrial image inspection
-
- G—PHYSICS
- G01—MEASURING; TESTING
- G01N—INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
- G01N21/00—Investigating or analysing materials by the use of optical means, i.e. using sub-millimetre waves, infrared, visible or ultraviolet light
- G01N21/84—Systems specially adapted for particular applications
- G01N21/88—Investigating the presence of flaws or contamination
- G01N21/8851—Scan or image signal processing specially adapted therefor, e.g. for scan signal adjustment, for detecting different kinds of defects, for compensating for structures, markings, edges
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/25—Fusion techniques
- G06F18/253—Fusion techniques of extracted features
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/10—Segmentation; Edge detection
-
- G—PHYSICS
- G01—MEASURING; TESTING
- G01N—INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
- G01N21/00—Investigating or analysing materials by the use of optical means, i.e. using sub-millimetre waves, infrared, visible or ultraviolet light
- G01N21/84—Systems specially adapted for particular applications
- G01N21/88—Investigating the presence of flaws or contamination
- G01N21/8851—Scan or image signal processing specially adapted therefor, e.g. for scan signal adjustment, for detecting different kinds of defects, for compensating for structures, markings, edges
- G01N2021/8887—Scan or image signal processing specially adapted therefor, e.g. for scan signal adjustment, for detecting different kinds of defects, for compensating for structures, markings, edges based on image processing techniques
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/10—Image acquisition modality
- G06T2207/10004—Still image; Photographic image
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20081—Training; Learning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20084—Artificial neural networks [ANN]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/30—Subject of image; Context of image processing
- G06T2207/30108—Industrial image inspection
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02P—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
- Y02P90/00—Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
- Y02P90/30—Computing systems specially adapted for manufacturing
Abstract
The invention discloses a method for segmenting small defects in complex textures based on a rich and robust convolutional feature model. The method comprises: acquiring an image containing an object to be segmented, and performing feature reorganization on the image with the rich and robust convolutional feature model to obtain a feature map for each side output layer; connecting the feature map of each side output layer in turn to a deconvolution layer and a side-output-layer precise loss function to obtain the predicted feature map of the side output layer of each stage; and, at the same time, adding a fusion layer to the model, fusing the deconvolved feature maps of all side output layers, and connecting a fusion-layer precise loss function to obtain the final prediction map, thereby achieving defect segmentation. The method overcomes the inaccurate predictions caused by the unbalanced ratio of target pixels to background pixels and can predict fine targets.
Description
Technical Field
The present invention relates to the technical field of lithium battery surface defect detection, and in particular to a method for segmenting small defects in complex textures based on a rich and robust convolutional feature model.
Background Art
Surface defect detection has become an important technical means of lithium battery surface quality control: good surface quality not only extends the service life of battery components but also improves the power generation efficiency of the lithium battery.
Crack segmentation methods based on convolutional neural networks (CNNs) usually face two problems: cracks are frequently missed or falsely detected, and the predicted crack segmentation results are coarse, requiring complex post-processing to obtain fine cracks. The complex, non-uniform textured background of the lithium battery surface is one of the main causes of these problems; another is the extreme imbalance between crack pixels and background pixels in the defect image. For example, a defect image may contain one million pixels while the crack occupies only a few dozen, or even a dozen or so, pixels.
The information captured by successive convolutional layers becomes coarser and more "global" as the network deepens. Low-level convolutional layers contain the complex random textured background together with target detail; at this depth targets and background are still poorly distinguished, and the network learns only non-discriminative cues such as shapes and corners. Important target information is preserved in the higher convolutional layers, while the intermediate convolutional layers contain indispensable target detail. However, a conventional convolutional neural network model uses only the output features of the last convolutional layer, or of the convolutional layer preceding the pooling layer of each stage, and ignores the target detail contained in the intermediate convolutional layers. For crack segmentation, the key difficulty is the high similarity between background and target information, so excessive fusion inevitably causes serious false detections.
Although CNN-based segmentation methods are good at predicting semantically rich features such as contours and edges, analysis shows that crack segmentation results predicted directly by a CNN are much thicker than the cracks annotated in the ground-truth labels, so crack pixels cannot be located accurately. The problem of overly thick predicted cracks, edges, contours or lines is rarely discussed in the literature. One possible reason is that these methods usually apply thinning post-processing after generating the initial prediction so that the result approaches the ground-truth label; the width of the processed prediction then appears to have no influence on the result, while in fact it reduces prediction accuracy. Such methods therefore cannot satisfy detection tasks that demand precise pixel-level localization.
A loss function evaluates the degree of inconsistency between the model's predictions and the ground truth: the smaller the loss, the more robust the model, and the loss function guides model learning. In lithium battery crack-defect images the proportion of crack pixels to background pixels is extremely unbalanced, so negative samples (background pixels) account for a large share of the model loss. This drives learning into a local minimum of the loss function, biases predictions toward background pixels, and prevents the trained model from detecting cracks, which are "rare events". The influence of this pixel imbalance on the loss function is therefore the root cause of coarse crack segmentation results.
Summary of the Invention
In view of the deficiencies of the prior art, the technical problem to be solved by the present invention is to provide a method for segmenting small defects in complex textures based on a rich and robust convolutional feature model.
The technical solution adopted by the present invention to solve this technical problem is as follows:
A method for segmenting small defects in complex textures based on a rich and robust convolutional feature model, characterized in that the method comprises: acquiring an image containing an object to be segmented, and performing feature reorganization on that image with the rich and robust convolutional feature model to obtain a feature map for each side output layer; connecting the feature map of each side output layer in turn to a deconvolution layer and a side-output-layer precise loss function to obtain the predicted feature map of the side output layer of each stage;
at the same time, adding a fusion layer to the model, fusing the deconvolved feature maps of all side output layers, and then connecting a fusion-layer precise loss function to obtain the final prediction map, thereby achieving defect segmentation;
wherein the side-output-layer precise loss function satisfies formula (1):
L_side^(k)(W, w^(k)) = L(W, w^(k)) + L^(k)(P_side, G)   (1)
P_side = σ(A_side),  A_side = {a_j, j = 1, …, |Y|}   (2)
where L^(k)(P_side, G) denotes the distance loss function of the k-th stage; L(W, w^(k)) denotes the weighted cross-entropy loss function of the k-th stage; P_side denotes the predicted feature map of the side output layer of the k-th stage; σ is the sigmoid activation function; A_side denotes the set of activation values at all pixels of the predicted feature map of the k-th-stage side output layer; a_j denotes the activation value at any pixel j of that predicted feature map; and Y denotes the sum of the defective and non-defective pixels in the image;
the fusion-layer precise loss function is obtained from the following formula:
L_fuse(W, w) = L_c(P_fuse, G)   (3)
where L_c denotes the standard cross-entropy loss function; P_fuse denotes the fusion of the predicted feature maps of the side output layers of the K stages, weighted by the fusion-layer weights; and K denotes the total number of stages;
the fusion-layer precise loss function and the side-output-layer precise loss functions of all stages are combined with the argmin function to obtain the objective function L, expressed by formula (5):
L = argmin( Σ_{k=1}^{K} L_side^(k)(W, w^(k)) + L_fuse(W, w) )   (5)
finally, the objective function is optimized to obtain the weights of the side-output-layer precise loss functions and of the fusion-layer precise loss function.
The specific process of feature reorganization with the rich and robust convolutional feature model is as follows:
On the basis of the original ResNet40 network, the fully connected layer and the fifth-stage pooling layer are removed; a convolutional layer is laterally connected to the identity block layer of the first stage and to the identity block layer of the second stage of the original ResNet40 network, respectively, to obtain the feature maps of the side output layers of the first and second stages;
a convolutional layer is laterally connected after each block layer of the third, fourth and fifth stages of the original ResNet40 network to obtain the convolved feature map of each block layer, and the convolved feature maps of all block layers of the same stage are then added element-wise to obtain the feature map of the side output layer of the corresponding stage.
The convolutional layers laterally connected to the identity block layers of the first and second stages have a kernel size of 1×1, a stride of 1 and 1 channel; the convolutional layers laterally connected after each block layer of the third, fourth and fifth stages have a kernel size of 1×1, a stride of 1 and 21 channels.
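By way of illustration only, these lateral connections can be sketched with a Keras-style API; the function below is a schematic assumption rather than the patent's reference implementation, and `stage_blocks` (the list of block-layer output tensors of one stage) is an assumed name:

```python
from tensorflow.keras.layers import Add, Conv2D

def side_output(stage_blocks, stage_index):
    """Build the side-output feature map of one stage from its block-layer outputs.

    stage_blocks: list of 4-D tensors, the outputs of every block layer of the stage
                  (one element is used for stages 1-2, three for stages 3-5).
    """
    if stage_index in (1, 2):
        # Stages 1 and 2: a single 1x1, 1-channel convolution on the identity-block output.
        return Conv2D(1, (1, 1), strides=1, padding='same')(stage_blocks[-1])
    # Stages 3-5: a 1x1, 21-channel convolution after every block layer,
    # followed by element-wise addition of the convolved feature maps.
    mapped = [Conv2D(21, (1, 1), strides=1, padding='same')(block) for block in stage_blocks]
    return Add()(mapped)
```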
The original ResNet40 network comprises 40 convolutional layers and a fully connected layer as the last layer of the network, and is divided into 5 stages, each containing one convolutional block layer and one or more identity block layers: the first and second stages each contain one convolutional block layer and one identity block layer, the third, fourth and fifth stages each contain one convolutional block layer and two identity block layers, and every convolutional block layer and identity block layer contains multiple convolutional layers. In each stage, a pooling layer with a pooling window of 2×2 and a stride of 1 is added after all identity block layers.
The specific structure of the original ResNet40 network is as follows: the input target image first passes through a convolution with a kernel size of 5×5, a stride of 1 and 32 channels and then a max-pooling layer with a kernel size of 2×2 and a stride of 2, giving the input features of the first stage. The first-stage input features pass through three convolutions with kernel sizes of 1×1, 3×3 and 1×1, each with a stride of 1 and 32 channels, together with a residual connection with a kernel size of 1×1, a stride of 1 and 32 channels, giving the output features of the first-stage convolutional block layer. These features then pass through three convolutions with kernel sizes of 1×1, 3×3 and 1×1, each with a stride of 1 and 32 channels, giving the output features of the first-stage identity block layer, which pass through a pooling layer with a kernel size of 2×2 and a stride of 2 to give the output features of the first stage.
The first-stage output features pass through three convolutions with kernel sizes of 1×1, 3×3 and 1×1, each with a stride of 1 and 64 channels, together with a residual connection with a kernel size of 1×1, a stride of 1 and 64 channels, giving the output features of the second-stage convolutional block layer. These features then pass through three convolutions with kernel sizes of 1×1, 3×3 and 1×1, each with a stride of 1 and 64 channels, giving the output features of the second-stage identity block layer, which pass through a pooling layer with a kernel size of 2×2 and a stride of 2 to give the output features of the second stage.
The second-stage output features pass through three convolutions with kernel sizes of 1×1, 3×3 and 1×1, each with a stride of 1 and 256 channels, together with a residual connection with a kernel size of 1×1, a stride of 1 and 256 channels, giving the output features of the third-stage convolutional block layer. These features pass through three convolutions with kernel sizes of 1×1, 3×3 and 1×1, each with a stride of 1 and 256 channels, giving the output features of the first identity block layer of the third stage, and then through another three convolutions with the same kernel sizes, stride and channel number, giving the output features of the second identity block layer of the third stage, which pass through a pooling layer with a kernel size of 2×2 and a stride of 2 to give the output features of the third stage.
The fourth stage follows the same procedure as the third stage: the output features of the third stage are processed by repeating the operations of the third stage to give the output features of the fourth stage.
The fifth stage follows the same procedure as the convolutional block layer and the two identity block layers of the fourth stage: the output features of the fourth stage are processed by repeating those operations to give the output features of the fifth stage.
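As a rough illustration of the bottleneck structure described above, the two building blocks can be sketched with a Keras-style API; this is a schematic assumption, and the ReLU placement and the omission of batch normalization are simplifications not taken from the patent:

```python
from tensorflow.keras.layers import Activation, Add, Conv2D

def conv_block(x, filters):
    """Convolutional block: three 1x1 / 3x3 / 1x1 convolutions plus a 1x1 convolutional shortcut."""
    y = Conv2D(filters, (1, 1), strides=1, padding='same', activation='relu')(x)
    y = Conv2D(filters, (3, 3), strides=1, padding='same', activation='relu')(y)
    y = Conv2D(filters, (1, 1), strides=1, padding='same')(y)
    shortcut = Conv2D(filters, (1, 1), strides=1, padding='same')(x)
    return Activation('relu')(Add()([y, shortcut]))

def identity_block(x, filters):
    """Identity block: three 1x1 / 3x3 / 1x1 convolutions added to the unchanged input."""
    y = Conv2D(filters, (1, 1), strides=1, padding='same', activation='relu')(x)
    y = Conv2D(filters, (3, 3), strides=1, padding='same', activation='relu')(y)
    y = Conv2D(filters, (1, 1), strides=1, padding='same')(y)
    return Activation('relu')(Add()([y, x]))
```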
A method for segmenting small defects in complex textures based on a rich and robust convolutional feature model, the specific steps of which are:
S1 Image preprocessing
Images containing the defects to be segmented are collected and normalized to 1024×1024 pixels; pixel-level labels are added to the normalized images, and these labelled images are the target images; the target images are divided into different sample sets according to a given ratio.
S2 Construction of the rich and robust convolutional feature model
Based on the original ResNet40 network, a convolutional layer is laterally connected to the identity block layer of the first stage and to the identity block layer of the second stage to obtain the feature maps of the side output layers of the first and second stages;
a convolutional layer is laterally connected after each block layer of the third, fourth and fifth stages of the original ResNet40 network to obtain the convolved feature map of each block layer, and the convolved feature maps of all block layers of the same stage are then added element-wise to obtain the feature map of the side output layer of the corresponding stage;
the feature maps of the side output layers of the above five stages are each connected to a deconvolution layer (deconv) for upsampling, giving the deconvolved feature maps of the respective stages, and each stage's deconvolved feature map is connected to a side-output-layer precise loss function for pixel-wise classification, giving the predicted feature map of the side output layer of each stage;
the deconvolved feature maps of all stages are concatenated and then fused by a convolutional layer with a kernel size of 1×1 and a stride of 1, giving the fusion-layer feature map; finally, the fusion-layer feature map is connected to a fusion-layer precise loss function, giving the final predicted feature map.
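The side-output and fusion head of this step can be sketched as follows; this is a minimal illustration, and the upsampling factors, the single-channel deconvolution outputs and the layer names are assumptions, not values given in the patent:

```python
from tensorflow.keras.layers import Activation, Concatenate, Conv2D, Conv2DTranspose

def rrcf_head(side_features, upsample_factors):
    """Upsample each side-output feature map to the input resolution and fuse them.

    side_features: the five side-output tensors (stages 1-5).
    upsample_factors: assumed per-stage factors, e.g. [2, 4, 8, 16, 32] for a 1024x1024 input.
    """
    upsampled, side_predictions = [], []
    for feat, s in zip(side_features, upsample_factors):
        up = Conv2DTranspose(1, (2 * s, 2 * s), strides=s, padding='same')(feat)  # deconv layer
        upsampled.append(up)
        # Each upsampled map is supervised by its own side-output precise loss (sigmoid output).
        side_predictions.append(
            Activation('sigmoid', name='side_stage_%d' % (len(side_predictions) + 1))(up))
    fused = Concatenate()(upsampled)
    fused = Conv2D(1, (1, 1), strides=1)(fused)                  # 1x1 fusion convolution
    fuse_prediction = Activation('sigmoid', name='fuse')(fused)  # supervised by the fusion-layer loss
    return side_predictions, fuse_prediction
```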
S3 Model training and testing
The model parameters are initialized, and the target images for training and their corresponding pixel-level labels are input. During training the loss is propagated to the weights of every convolutional layer by stochastic gradient descent and the weight values are updated; the momentum of stochastic gradient descent is 0.9 and the weight decay is 0.0005. One image is randomly sampled in each training step, and training stops when the number of epochs reaches 100, completing the training of the model;
the target images for testing are resized to 1024×1024 pixels and input into the trained model; the test time for a single image is 0.1 s. The model operations are repeated to complete the model test.
The objects to be segmented are cracks, edges or other line-like structures.
Compared with the prior art, the beneficial effects of the present invention are as follows:
Starting from the rational use of convolutional features and the design of the loss function, the present invention aims to make the model learn defect features that are as rich and complete as possible and to predict robust, fine defects without any post-processing, so that the learned defect features produce predicted feature maps that are as similar as possible to the ground-truth labels. To this end, the present invention builds a rich and robust convolutional feature model on the basis of the original ResNet40 network and performs end-to-end deep learning under the Keras 1.13 deep learning framework. The model adopts a multi-scale, multi-level feature network structure that fuses more of the high-level features (stages three, four and five) and less of the low-level features (stages one and two), and in each stage convolutions with a kernel size of 1×1 are used for superposition and fusion, so that all convolutional features are encapsulated into a richer and more robust representation, improving the expressive power of the features. Moreover, the output feature maps of the intermediate layers of every stage are exploited, overcoming the drawback of conventional convolutional neural network models, which use only the output features of the last convolutional layer or of the convolutional layer preceding the pooling layer of each stage and thus ignore the target detail contained in the intermediate layers.
To address the inaccurate predictions caused by the imbalance between crack pixels and non-crack pixels, and in order to predict fine cracks, the present invention introduces a side-output-layer precise loss function at the side output layer of each stage and a fusion-layer precise loss function at the fusion layer of the model. The side-output-layer precise loss function combines a weighted cross-entropy loss function with a distance loss function, refines the defect features in the predicted feature map of the side output layer, and during training optimizes the weights of the convolutional layers of the side output layer. The fusion-layer precise loss function in turn incorporates the side-output-layer precise loss functions, refines the defect features in the final predicted feature map, and during training optimizes the weights of the convolutional layers of the fusion layer, thereby achieving crack prediction from the global level to the local level.
The experimental results show that, compared with the traditional filter-based segmentation method and a conventional convolutional neural network, the rich and robust convolutional feature model predicts finer cracks, and the crack segmentation accuracy reaches 79.64%.
The method can also provide ideas for segmenting other targets that, like cracks, have an extreme aspect ratio and place high demands on fineness.
Brief Description of the Drawings
Fig. 1 is the network structure diagram of the rich and robust convolutional feature model of the present invention;
Fig. 2 shows the crack segmentation results of different segmentation methods;
Fig. 3 compares the evaluation results of the different segmentation methods.
Detailed Description of the Embodiments
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. Based on the embodiments of the present invention, all other embodiments obtained by those of ordinary skill in the art without creative effort fall within the protection scope of the present invention.
The method of the present application is described in detail below, taking the detection of surface crack defects on lithium batteries as an example.
The present invention provides a method (hereinafter "the method") for segmenting small defects in complex textures based on a rich and robust convolutional feature model, comprising the following steps:
S1 Image preprocessing
S1-1 Image acquisition
Lithium battery images are acquired with a 1.4-megapixel near-infrared camera. The lithium batteries imaged in the present invention have an actual size of 165 mm × 165 mm, and the acquired images are normalized to 1024×1024 pixels as the original images. No complex preprocessing of the original images is required in the present application; it is sufficient that, after size normalization, they can be used as model input. This image size is kept almost the same as the size captured by the camera, which better preserves the original image information; because no complex processing is involved, the algorithm runs faster and satisfies the real-time requirements of production-line inspection. The original images include images that contain objects to be segmented and images that do not.
S1-2 Image labelling
All original images from step S1-1 that contain objects to be segmented are manually annotated with the Labelimg software, adding pixel-level labels that contain the area and spatial position of the defects; these labelled images are the target images used for model training, testing and validation.
S1-3 Sample set construction
The target images from step S1-2 are grouped: 20% of them (the default value) are randomly selected as the test sample set, and the remaining target images are randomly divided into a training sample set and a validation sample set at a ratio of 4:1.
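A small utility matching this split could look like the sketch below; it is illustrative only, and `image_paths` (a list of labelled target-image paths) is an assumed input:

```python
import random

def split_dataset(image_paths, test_ratio=0.2, train_to_val=4):
    """Randomly split labelled target images into training, validation and test sets
    (20% test; the remainder split 4:1 into training and validation)."""
    paths = list(image_paths)
    random.shuffle(paths)
    n_test = int(len(paths) * test_ratio)
    test, remaining = paths[:n_test], paths[n_test:]
    n_train = int(len(remaining) * train_to_val / (train_to_val + 1))
    return remaining[:n_train], remaining[n_train:], test
```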
S2 Construction of the rich and robust convolutional feature (RRCF) model
S2-1 Original ResNet40 network
The present invention is an improvement on the original ResNet40 network. The original ResNet40 network comprises 40 convolutional layers (Conv) and a fully connected layer at the end of the network, and is divided into 5 stages. Each stage contains one convolutional block layer (Conv Block) and one or more identity block layers (Identity Block), denoted stagek_blockm, where k is the stage index and m is the block index within the stage. The first and second stages each contain one convolutional block layer and one identity block layer; the third, fourth and fifth stages each contain one convolutional block layer and two identity block layers; every convolutional block layer and identity block layer contains multiple convolutional layers. In each stage, a pooling layer with a pooling window of 2×2 and a stride of 1 is added after all identity block layers. The specific parameters of each convolutional layer of the original ResNet40 network are listed in Table 1.
The input target image first passes through a convolution with a kernel size of 5×5, a stride of 1 and 32 channels and then a max-pooling layer (Maxpool) with a kernel size of 2×2 and a stride of 2, giving the input features of the first stage. The first-stage input features pass through three convolutions with kernel sizes of 1×1, 3×3 and 1×1, each with a stride of 1 and 32 channels, together with a residual connection (Shortcut) with a kernel size of 1×1, a stride of 1 and 32 channels, giving the output features of the first-stage convolutional block layer. These features then pass through three convolutions with kernel sizes of 1×1, 3×3 and 1×1, each with a stride of 1 and 32 channels, giving the output features of the first-stage identity block layer, which pass through a pooling layer with a kernel size of 2×2 and a stride of 2 to give the output features of the first stage.
The first-stage output features pass through three convolutions with kernel sizes of 1×1, 3×3 and 1×1, each with a stride of 1 and 64 channels, together with a residual connection with a kernel size of 1×1, a stride of 1 and 64 channels, giving the output features of the second-stage convolutional block layer. These features then pass through three convolutions with kernel sizes of 1×1, 3×3 and 1×1, each with a stride of 1 and 64 channels, giving the output features of the second-stage identity block layer, which pass through a pooling layer with a kernel size of 2×2 and a stride of 2 to give the output features of the second stage.
The second-stage output features pass through three convolutions with kernel sizes of 1×1, 3×3 and 1×1, each with a stride of 1 and 256 channels, together with a residual connection with a kernel size of 1×1, a stride of 1 and 256 channels, giving the output features of the third-stage convolutional block layer. These features pass through three convolutions with kernel sizes of 1×1, 3×3 and 1×1, each with a stride of 1 and 256 channels, giving the output features of the first identity block layer of the third stage, and then through another three convolutions with the same kernel sizes, stride and channel number, giving the output features of the second identity block layer of the third stage, which pass through a pooling layer with a kernel size of 2×2 and a stride of 2 to give the output features of the third stage.
The fourth stage follows the same procedure as the third stage: the output features of the third stage are processed by repeating the operations of the third stage to give the output features of the fourth stage.
The fifth stage follows the same procedure as the convolutional block layer and the two identity block layers of the fourth stage: the output features of the fourth stage are processed by repeating those operations to give the output features of the fifth stage.
The input target image is computed layer by layer to extract features. For example, for a target image with a height and width of 1024, the output size after the convolution with a kernel size of 5×5, a stride of 1 and 32 channels is 1024×1024×32; after the max pooling with a kernel size of 2×2 and a stride of 2, the output size is 512×512×32, i.e., the input features of the first stage have a size of 512×512×32. After the above operations, the output feature size of the first stage is 256×256×32, of the second stage 128×128×64, of the third stage 64×64×256, of the fourth stage 32×32×256 and of the fifth stage 16×16×256.
Table 1 Specific parameters of the original ResNet40 network
In the table, Identity Block×2 indicates that the Identity Block operation is performed twice.
S2-2 Feature map reorganization of the rich and robust convolutional feature model
On the basis of the original ResNet40 network built in step S2-1, the fully connected layer and the fifth-stage pooling layer are removed. On the one hand, removing the fully connected layer provides a fully convolutional network with image-to-image prediction and reduces the computational complexity of the model; on the other hand, the fifth-stage pooling layer would double the stride and impair defect localization. Although every pooling layer affects localization, pooling is retained in the first four stages mainly to speed up training.
A convolutional layer with a kernel size of 1×1, a stride of 1 and 1 channel is laterally connected to the identity block layer of the first stage (stage1_block2) and to the identity block layer of the second stage (stage2_block2) of the original ResNet40 network, respectively, reducing the channel dimension; this gives the feature maps of the side output layers of the first and second stages and integrates the feature information.
A convolutional layer with a kernel size of 1×1, a stride of 1 and 21 channels is laterally connected after each block layer of the third, fourth and fifth stages of the original ResNet40 network, namely stage3_block1, stage3_block2, stage3_block3, stage4_block1, stage4_block2, stage4_block3, stage5_block1, stage5_block2 and stage5_block3, giving the convolved feature map of each block layer; the convolved feature maps of all block layers of the same stage are then added element-wise to give the feature map of the side output layer of the corresponding stage.
S2-3 Construction of the predicted feature maps of the rich and robust convolutional feature model
The feature maps of the side output layers of the above five stages are each connected to a deconvolution layer (deconv) for upsampling, so as to predict feature maps with the same scale as the target image, giving the deconvolved feature maps of the respective stages; these deconvolved feature maps retain the spatial position information of the defects in the target image.
Each stage's deconvolved feature map is connected to a side-output-layer precise loss function for pixel-wise classification, giving the predicted feature map of the side output layer of each stage; that is, every pixel of the deconvolved feature map is classified, the defect features in the predicted feature map of the side output layer are refined, and during training the side-output-layer precise loss function optimizes the weights of the convolutional layers of the side output layer.
To make direct use of the predicted feature maps of the side output layers of all stages, a fusion layer is added to the model and its weights are learned during training: the deconvolved feature maps of all stages are concatenated and then fused by a convolutional layer with a kernel size of 1×1 and a stride of 1, giving the fusion-layer feature map. Finally, the fusion-layer feature map is connected to a fusion-layer precise loss function, giving the final predicted feature map, whose defect features are thereby refined; during training the fusion-layer precise loss function optimizes the weights of the convolutional layers of the fusion layer. The final predicted feature map is the prediction of the rich and robust convolutional feature model.
S3 Design of the precise loss functions
S3-1 Weighted cross-entropy loss function
In defect images with a complex textured background, small defects are distributed very unevenly over the pixels: most pixels are randomly distributed non-defect pixels, i.e., background (for example, non-crack pixels as opposed to crack pixels), so a plain cross-entropy loss function cannot accurately separate defect pixels from non-defect pixels. The weighted cross-entropy loss introduces a class-balance weight coefficient β to offset the imbalance between defect and non-defect pixels; the loss over the pixels satisfies formula (1):
L(W, w^(k)) = -β Σ_{j∈Y+} log Pr(y_j = 1 | X; W, w^(k)) - (1-β) Σ_{j∈Y-} log Pr(y_j = 0 | X; W, w^(k))   (1)
β = |Y-| / |Y|,  1 - β = |Y+| / |Y|   (2)
where X denotes the target image; W denotes the set of all network-layer parameters; w^(k) denotes the weight of the predicted feature map of the side output layer of the k-th stage; Y+ and Y- denote the defect pixels and non-defect pixels, respectively; β is the class-balance weight coefficient; Y is the sum of Y+ and Y-; y_j denotes any pixel of the target image; and Pr(y_j = 1 | X; W, w^(k)) is the class score computed at pixel y_j with the sigmoid activation function, with Pr ∈ [0, 1].
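A minimal sketch of this class-balanced cross-entropy in Keras backend code is given below, under the assumption that `y_true` is a binary defect mask and `y_pred` a sigmoid output; it is an illustration, not the patent's own code:

```python
from tensorflow.keras import backend as K

def weighted_cross_entropy(y_true, y_pred):
    """Class-balanced cross-entropy: beta = |Y-|/|Y| up-weights the rare defect pixels."""
    y_pred = K.clip(y_pred, K.epsilon(), 1.0 - K.epsilon())
    beta = K.sum(1.0 - y_true) / (K.sum(K.ones_like(y_true)) + K.epsilon())
    loss = -(beta * y_true * K.log(y_pred)
             + (1.0 - beta) * (1.0 - y_true) * K.log(1.0 - y_pred))
    return K.sum(loss)
```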
S3-2 Distance loss function (Dice loss)
Given a target image X with ground-truth label G, and denoting the predicted image of X by P, the distance loss function (Dice loss) compares the similarity between the prediction P and the ground truth G and minimizes the distance between the two. The Dice loss Dist(P, G) is given by formula (3):
Dist(P, G) = 1 - 2 Σ_{j=1}^{N} p_j g_j / ( Σ_{j=1}^{N} p_j^2 + Σ_{j=1}^{N} g_j^2 )   (3)
where p_j ∈ P is any pixel of the predicted image P; g_j ∈ G is any pixel of the ground-truth label G; and N is the total number of pixels in the target image.
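One common realisation of such a Dice-style distance is sketched below; the exact denominator form is an assumption, since the patent text does not spell it out:

```python
from tensorflow.keras import backend as K

def dice_distance(y_true, y_pred):
    """Dice-style distance between the prediction P and the ground truth G (smaller = more similar)."""
    intersection = K.sum(y_true * y_pred)
    return 1.0 - (2.0 * intersection + K.epsilon()) / (
        K.sum(K.square(y_true)) + K.sum(K.square(y_pred)) + K.epsilon())
```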
S3-3 Design of the precise loss functions
To obtain better defect prediction performance, a precise loss function that combines the weighted cross-entropy loss function and the Dice loss function is proposed. The Dice loss is regarded as an image-level loss that focuses on the similarity between two sets of image pixels; it reduces redundant information and, in the present application, is the key to generating fine cracks, but it is prone to incomplete predictions in which part of the target is lost, for example a predicted crack missing a section. The weighted cross-entropy loss focuses on pixel-level differences, since it is the sum of the distances between every pair of corresponding pixels in the prediction and the ground-truth label; its predictions are comprehensive and do not lose the target, but it easily introduces more background information, making the prediction inaccurate. Combining the two therefore minimizes the distance from the image level down to the pixel level and realizes prediction from the global to the local.
To obtain a finer predicted feature map at the side output layer of each stage, a side-output-layer precise loss function is proposed that satisfies the following formulas:
L_side^(k)(W, w^(k)) = L(W, w^(k)) + L^(k)(P_side, G)   (4)
P_side = σ(A_side),  A_side = {a_j, j = 1, …, |Y|}   (5)
where L^(k)(P_side, G) denotes the distance loss function of the k-th stage; L(W, w^(k)) denotes the weighted cross-entropy loss function of the k-th stage; P_side denotes the predicted feature map of the side output layer of the k-th stage; σ is the sigmoid activation function; A_side denotes the set of activation values at all pixels of the predicted feature map of the k-th-stage side output layer; and a_j denotes the activation value at any pixel j of that predicted feature map.
The fusion-layer precise loss function is obtained from the following formula:
L_fuse(W, w) = L_c(P_fuse, G)   (6)
where L_c denotes the standard cross-entropy loss function; P_fuse denotes the fusion of the predicted feature maps of the side output layers of the K stages, weighted by the fusion-layer weights; and K denotes the total number of stages.
The argmin function (the minimum-argument function, i.e., the function returning the variable values at which the objective function attains its minimum) is used to combine the fusion-layer precise loss function and the side-output-layer precise loss functions of all stages into the objective function, as shown in formula (8); the objective function is optimized by standard stochastic gradient descent, which in turn optimizes the weights of the precise loss function of every side output layer and the weights of the fusion-layer precise loss function:
(W, w)* = argmin( Σ_{k=1}^{K} L_side^(k)(W, w^(k)) + L_fuse(W, w) )   (8)
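Putting the pieces together, the overall objective can be sketched as the sum of the per-stage precise losses and a cross-entropy term on the fused map; this is schematic, equal weighting of the terms is an assumption, and `weighted_cross_entropy` / `dice_distance` refer to the sketches above:

```python
from tensorflow.keras import backend as K

def side_precise_loss(y_true, y_pred):
    """Side-output precise loss: weighted cross-entropy plus the Dice-style distance."""
    return weighted_cross_entropy(y_true, y_pred) + dice_distance(y_true, y_pred)

def total_objective(y_true, side_predictions, fuse_prediction):
    """Objective summed over the K side outputs plus standard cross-entropy on the fused map."""
    loss = sum(side_precise_loss(y_true, p) for p in side_predictions)
    loss += K.sum(K.binary_crossentropy(y_true, fuse_prediction))
    return loss
```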
S4 Model training and testing
S4-1 Model parameter initialization: all weight values, bias values and batch-normalization scale factors are initialized, and the initialized parameters are loaded into the rich and robust convolutional feature model built in step S2. The initial learning rate of the model is set to λ = 0.001; the standard deviation of the weights of the convolutional layers of stages one to five is initialized to 0.01 and their bias to 0; the standard deviation of the weights of all convolutional layers of the fusion layer is initialized to 0.2 and their bias to 0.
S4-2 Model training: the target images of the training sample set and their corresponding pixel-level labels are input into the rich and robust convolutional feature model initialized in step S4-1. During training, the loss is propagated to the weights of every convolutional layer by stochastic gradient descent (SGD) and the weight values are updated; the momentum of SGD is 0.9 and the weight decay is 0.0005. One image is randomly sampled in each training step, and training stops when the number of epochs reaches 100, completing the training of the rich and robust convolutional feature model. All the above operations were carried out under Windows 10 on a computer with an Intel Core i7 CPU, 32 GB of memory and an NVIDIA GeForce GTX 2080 Ti graphics card; the training of the model was implemented with the Keras 1.13 deep learning framework.
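The training configuration of this step could look roughly as follows; it is a sketch only, `rrcf_model`, `train_images` and `train_labels` are placeholders, the binary cross-entropy stands in for the precise losses, and the optimizer's `decay` argument only approximates the stated weight decay:

```python
from tensorflow.keras.optimizers import SGD

def train_rrcf(rrcf_model, train_images, train_labels):
    """Train with the hyper-parameters stated in steps S4-1 and S4-2 (sketch only)."""
    sgd = SGD(lr=0.001, momentum=0.9, decay=0.0005)  # decay only approximates the stated weight decay
    rrcf_model.compile(optimizer=sgd, loss='binary_crossentropy')  # stand-in for the precise losses
    rrcf_model.fit(train_images, train_labels, batch_size=1, epochs=100)
```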
S4-3 Model testing: scale the target images in the test sample set to 1024x1024 pixels and input the scaled images into the rich robust convolution feature model trained in step S4-2; the test time for a single image is 0.1 s, which meets production-efficiency requirements; repeat the operation of step S4-2 to complete the model test.
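An illustrative inference step for S4-3 is sketched below: resize the image to 1024x1024, run the trained model, and threshold the fused probability map. The 0.5 threshold and the input normalization are assumptions not stated in the text.

```python
import numpy as np
import cv2

def segment_defects(model, image, threshold=0.5):
    resized = cv2.resize(image, (1024, 1024))
    batch = np.expand_dims(resized.astype(np.float32) / 255.0, axis=0)
    prob_map = model.predict(batch)[0, ..., 0]      # fused prediction map
    return (prob_map > threshold).astype(np.uint8)  # binary defect mask
```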
To verify the effectiveness of the method, experiments were carried out with it on lithium battery images containing crack defects, and it was compared with a traditional segmentation method (the Gabor filter method) and a commonly used convolutional neural network method (UNet, U-shaped network); the comparison results are shown in Figure 2, where (a1) is the original image containing crack defects and (a5) its corresponding ground-truth label; (a2) is the feature extraction result of the Gabor filter method; (a3) is the feature extraction result of the UNet model (U-shaped network); (a4) is the feature extraction result of the rich robust convolution feature model (RRCF) proposed by this method;
As can be seen from Figure 2, because the proposed RRCF model stacks and fuses the convolution features at each stage, it learns more crack information, overcoming the tendency of the Gabor filter method to falsely detect grid-line structures that resemble cracks, and the tendency of the UNet model to falsely detect grain-occluded regions as crack defects; the crack lines predicted by the proposed RRCF model are thinner and closer to the ground-truth label, showing that the two fine loss functions of this method help to predict fine cracks, improve on the insufficiently fine predictions of the Gabor filter and the UNet model, and yield higher prediction accuracy.
To quantitatively evaluate the performance of each method, three indicators are used: cpt (completeness), crt (accuracy), and the F-measure, where the F-measure is computed from cpt and crt; the higher the F-measure, the more effective the method; the expressions are given in equations (9)-(11);
where Lg is the number of crack pixels in the manually annotated ground-truth label; Lt is the number of pixels extracted by the detection method; and L is the number of pixels in the detection result that match the ground-truth label;
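Since equations (9)-(11) are not reproduced in this excerpt, the sketch below assumes the standard crack-evaluation definitions cpt = L / Lg, crt = L / Lt, and F-measure = 2·cpt·crt / (cpt + crt), with the pixel match approximated as direct overlap of the binary masks.

```python
import numpy as np

def crack_metrics(pred_mask, gt_mask):
    L = int(np.logical_and(pred_mask > 0, gt_mask > 0).sum())  # matched pixels
    Lt = int((pred_mask > 0).sum())                            # extracted pixels
    Lg = int((gt_mask > 0).sum())                              # ground-truth pixels
    cpt = L / Lg if Lg else 0.0                                # completeness
    crt = L / Lt if Lt else 0.0                                # accuracy
    f_measure = 2 * cpt * crt / (cpt + crt) if (cpt + crt) else 0.0
    return cpt, crt, f_measure
```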
The indicator values of the three methods are shown in Figure 3. Both the UNet model and the rich robust convolution feature model show high completeness (cpt), reflecting the advantage of convolutional neural networks for crack detection under complex background interference; the F-measure of the rich robust convolution feature model is 85.81%, outperforming the other two methods; its completeness and accuracy are 93.02% and 79.64% respectively. On the one hand, the multi-level fusion of the network helps improve the completeness of crack segmentation; on the other hand, the two fine loss functions, designed for the extreme aspect ratio of cracks, reduce the interference of background information and improve accuracy, so the recognition accuracy is significantly improved, crack features are not easily lost in this process, and cracks are not missed. The UNet model has the lowest accuracy (69.5%) because it is strongly affected by background interference, introduces too much background information, and cannot achieve fine segmentation of cracks. In summary, this method achieves the highest completeness and accuracy of crack segmentation, the best segmentation effect, and fine segmentation of cracks.
Matters not described in the present invention are applicable to the prior art.
Claims (7)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010368806.1A CN111598854B (en) | 2020-05-01 | 2020-05-01 | A Segmentation Method for Small Defects in Complex Textures Based on Rich and Robust Convolutional Feature Models |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111598854A true CN111598854A (en) | 2020-08-28 |
CN111598854B CN111598854B (en) | 2023-04-28 |
Family
ID=72186940
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010368806.1A Active CN111598854B (en) | 2020-05-01 | 2020-05-01 | A Segmentation Method for Small Defects in Complex Textures Based on Rich and Robust Convolutional Feature Models |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111598854B (en) |
Patent Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2018028255A1 (en) * | 2016-08-11 | 2018-02-15 | 深圳市未来媒体技术研究院 | Image saliency detection method based on adversarial network |
CN110110692A (en) * | 2019-05-17 | 2019-08-09 | 南京大学 | A kind of realtime graphic semantic segmentation method based on the full convolutional neural networks of lightweight |
Non-Patent Citations (2)
Title |
---|
CHEN HAIYONG et al.: "Detection of Cracks in Electroluminescence Images by Fusing Deep Learning and Structural Decoupling" *
ZHOU YING; MAO LI; ZHANG YAN; CHEN HAIYONG: "Application of Improved CNN in Solar Panel Defect Detection" *
Cited By (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112164082A (en) * | 2020-10-09 | 2021-01-01 | 深圳市铱硙医疗科技有限公司 | Method for segmenting multi-modal MR brain image based on 3D convolutional neural network |
CN112215819A (en) * | 2020-10-13 | 2021-01-12 | 中国民航大学 | Airport pavement crack detection method based on deep feature fusion |
CN112215819B (en) * | 2020-10-13 | 2023-06-30 | 中国民航大学 | Airport pavement crack detection method based on depth feature fusion |
CN112489023A (en) * | 2020-12-02 | 2021-03-12 | 重庆邮电大学 | Pavement crack detection method based on multiple scales and multiple layers |
CN114372958A (en) * | 2021-12-15 | 2022-04-19 | 西安铂力特增材技术股份有限公司 | Scanning defect identification method based on deep learning |
CN114496228A (en) * | 2022-01-27 | 2022-05-13 | 复旦大学 | Computing resource-limited sugar network disease auxiliary diagnosis method and device |
CN114397306A (en) * | 2022-03-25 | 2022-04-26 | 南方电网数字电网研究院有限公司 | Power grid grading ring hypercomplex category defect multi-stage model joint detection method |
Also Published As
Publication number | Publication date |
---|---|
CN111598854B (en) | 2023-04-28 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN111598854A (en) | Complex texture small defect segmentation method based on rich robust convolution characteristic model | |
CN109919108B (en) | Fast Object Detection Method for Remote Sensing Image Based on Deep Hash Assisted Network | |
CN109272500B (en) | Fabric classification method based on adaptive convolutional neural network | |
CN110136154A (en) | Semantic Segmentation Method of Remote Sensing Image Based on Fully Convolutional Network and Morphological Processing | |
CN111950488B (en) | Improved Faster-RCNN remote sensing image target detection method | |
CN108846835A (en) | The image change detection method of convolutional network is separated based on depth | |
CN105184265A (en) | Self-learning-based handwritten form numeric character string rapid recognition method | |
CN109635726B (en) | Landslide identification method based on combination of symmetric deep network and multi-scale pooling | |
CN108932712A (en) | A kind of rotor windings quality detecting system and method | |
CN111222545B (en) | Image classification method based on linear programming incremental learning | |
CN111351860A (en) | Wood internal defect detection method based on Faster R-CNN | |
CN112364974B (en) | YOLOv3 algorithm based on activation function improvement | |
Liu et al. | A shadow detection algorithm based on multiscale spatial attention mechanism for aerial remote sensing images | |
CN111914902A (en) | A method for Chinese medicine identification and surface defect detection based on deep neural network | |
CN117576079A (en) | An industrial product surface anomaly detection method, device and system | |
CN116052110B (en) | Intelligent positioning method and system for pavement marking defects | |
CN118469952A (en) | A texture image defect detection method | |
CN118470440B (en) | An early tumor recognition system based on deep learning and hyperspectral images | |
CN113657196B (en) | SAR image target detection method, SAR image target detection device, electronic equipment and storage medium | |
CN115171074A (en) | Vehicle target identification method based on multi-scale yolo algorithm | |
CN112014804B (en) | Radar signal sorting method based on bionic pattern recognition algorithm of ball covering | |
CN116030266A (en) | Pavement crack detection and classification method in natural scenes based on improved YOLOv3 | |
CN111179278B (en) | Image detection method, device, equipment and storage medium | |
CN118247813A (en) | A person re-identification method based on adaptive optimization network structure | |
CN118314388A (en) | Aircraft skin damage detection method and system based on FC-YOLO network model |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
OL01 | Intention to license declared | ||