CN107133960A

CN107133960A - Image crack dividing method based on depth convolutional neural networks

Info

Publication number: CN107133960A
Application number: CN201710267789.0A
Authority: CN
Inventors: 姚剑; 刘亚辉; 赵娇; 谢仁平
Original assignee: Wuhan University WHU
Current assignee: Wuhan University WHU
Priority date: 2017-04-21
Filing date: 2017-04-21
Publication date: 2017-09-05

Abstract

The invention discloses an image crack segmentation method based on a deep convolutional neural network, comprising: inputting an original image into a deep convolutional neural network, learning features through convolution, pooling and activation layers, and obtaining a feature map; The feature map with the same size as the original image is obtained by upsampling; the softmax prediction is performed on the feature map with the same size as the original image, and the category of the corresponding position is obtained, so as to realize the segmentation of the crack area. The invention can learn multi-level features from low to high, can quickly realize high-precision crack region segmentation, and is especially suitable for bridge structure crack detection.

Description

Image crack segmentation method based on deep convolutional neural network

技术领域technical field

本发明涉及裂缝检测技术领域，特别涉及一种基于深度卷积神经网络的图像裂缝分割方法。The invention relates to the technical field of crack detection, in particular to an image crack segmentation method based on a deep convolutional neural network.

背景技术Background technique

随着我国经济发展、城市化进程加快、高铁等国家工程飞速发展，在公路、铁路或是城市农村水利建设中，修建的跨越障碍的各类桥梁数目日益激增，桥梁在国民经济发展中起着举足轻重的作用，同时也是我国综合实力的一种体现。由于桥梁的普遍存在性，桥体结构的安全性和持久性不容忽视。裂缝作为一种主要的桥体结构病害特征，对桥体结构的耐久性和安全性产生的危害最大，因此，裂缝是其健康状况的主要评价指标之一。With the rapid development of national projects such as my country's economic development, urbanization, and high-speed rail, the number of bridges built to cross obstacles in the construction of roads, railways, or urban and rural water conservancy has increased rapidly. Bridges play an important role in the development of the national economy. It plays a pivotal role and is also a manifestation of our country's comprehensive strength. Due to the ubiquity of bridges, the safety and durability of bridge structures cannot be ignored. Cracks, as a major bridge structure disease feature, are the most harmful to the durability and safety of the bridge structure. Therefore, cracks are one of the main evaluation indicators of its health status.

目前的检测方法仍然以人工检测为主，存在很多不足之处：The current detection method is still mainly manual detection, which has many shortcomings:

(1)检测效率低：耗时，需要安装或拆卸手架等设备；(1) Low detection efficiency: time-consuming, need to install or disassemble handstand and other equipment;

(2)检测精度低：主要以人眼进行观察检测，容易受到人的主观因素的影响；(2) Low detection accuracy: it is mainly observed and detected by human eyes, which is easily affected by human subjective factors;

(3)劳动强度大：桥梁多，检测工作量大，单纯依靠人工完成，强度很大；(3) High labor intensity: there are many bridges, and the inspection workload is heavy, and it is only done manually, which is very intensive;

(4)安全性低：检测人员需要下到桥梁底下进行检测，安全没有保障；(4) Low safety: inspectors need to go down to the bottom of the bridge for inspection, and safety is not guaranteed;

(5)成本高：使用大量的人力、物力进行检测，花费高；(5) High cost: use a lot of manpower and material resources for testing, and the cost is high;

(6)信息化程度低：无法精确建立桥梁裂缝历史数据，不便于危险桥梁的管理和维护，亦无法给政府管理部门提供决策支撑信息。(6) Low degree of informatization: It is impossible to accurately establish historical data of bridge cracks, it is not convenient for the management and maintenance of dangerous bridges, and it is also unable to provide decision-making support information for government management departments.

上述不足导致目前的检测现状完全不能适应当下的桥梁建设与发展。The above deficiencies lead to the fact that the current detection status cannot adapt to the current bridge construction and development at all.

近几年，基于图像视觉方法检测盒提取道路裂缝的算法相继被提出，这使得道路裂缝的自动化、智能化检测上有了较大的发展。桥梁结构裂缝与道路路面裂缝检测相类似，但前者更为复杂，主要表现为两个方面：第一，桥梁结构的复杂性导致基于视觉图像方法在获取数据时难度极大增加，桥梁结构上表面与道路基本一致，数据相对容易获取，目前也有实用化的系统投入生产，但是，至今都未出现有效的数据获取与处理系统针对其下表面。第二，道路路面纹理特征相对简单、单一，裂缝特征一般具有一致性，而桥梁底部表面纹理相对复杂，存在大量的斑点、污迹、水渍、检测标志线等大量“噪声”，裂缝检测与提取的难度更大。这两点极大限制了桥梁结构裂缝自动化检测与智能化结构安全监测。In recent years, algorithms for detecting and extracting road cracks based on image vision methods have been proposed one after another, which has made great progress in the automatic and intelligent detection of road cracks. Bridge structure crack detection is similar to road pavement crack detection, but the former is more complicated, mainly in two aspects: first, the complexity of the bridge structure makes it extremely difficult to obtain data based on visual image methods, and the upper surface of the bridge structure It is basically consistent with the road, and the data is relatively easy to obtain. At present, there are practical systems put into production. However, there has been no effective data acquisition and processing system for its lower surface so far. Second, the pavement texture features of the road are relatively simple and single, and the crack features are generally consistent, while the surface texture of the bridge bottom is relatively complex, and there are a lot of "noise" such as spots, stains, water stains, and detection marking lines. Extraction is more difficult. These two points greatly limit the automatic detection of bridge structural cracks and intelligent structural safety monitoring.

对裂缝特征的不同理解，使得人们提出的裂缝检测方法也各种各样，大部分算法利用的基本特征是一致的，而且算法的流程也大致相同：预处理，裂缝区域检测与分割，后处理与特征描述。裂缝作为一种看似简单，却因为其背景及本身结构特征而具有多变性和复杂性的目标，现有的道路裂缝检测算法仍存在较多缺陷，远不能满足其需求。Different understandings of crack characteristics lead to a variety of crack detection methods proposed by people. Most of the algorithms use the same basic features, and the algorithm flow is roughly the same: preprocessing, crack region detection and segmentation, postprocessing and feature descriptions. As a seemingly simple object, cracks have variability and complexity due to their background and structural characteristics. The existing road crack detection algorithms still have many defects, which are far from meeting their needs.

简言之，用于检测裂缝的特征多种多样，但是简单而又高效的检测还是一个难点，如何将纷杂多样的裂缝与背景特征较好的分割开，如何快速提取裂缝特征快速重建裂缝结构特征都是非常具有挑战性的问题。In short, there are a variety of features used to detect cracks, but simple and efficient detection is still a difficulty. How to better separate the diverse cracks from background features, how to quickly extract crack features and quickly reconstruct crack structure features These are very challenging questions.

发明内容Contents of the invention

针对现有技术中存在的不足，本发明提供了一种基于深度卷积神经网络的图像裂缝分割方法，该方法基于深度学习法，利用深度卷积神经网络学习到的由低到高的多层次特征，可实现高效和高精度的裂缝区域分割。Aiming at the deficiencies in the prior art, the present invention provides an image crack segmentation method based on a deep convolutional neural network. The method is based on a deep learning method and utilizes the low-to-high multi-layer features for efficient and high-precision fracture region segmentation.

本发明思路为：The idea of the present invention is:

本发明基于全卷积神经网络(Fully Convolutional Networks，FCN)和深度监督网络(Deeply-Supervised Nets，DSN)，提供了一种学习层次特征的深度卷积神经网络，能够自动地学习多种尺度、多种级别的多层次裂缝特征，从而实现端到端、像素级的预测。由于深度卷积神经网络整体、直接地监督学习策略，因此可应用于处理多种场景和尺度的裂缝图像。Based on Fully Convolutional Networks (FCN) and Deeply-Supervised Nets (DSN), the present invention provides a deep convolutional neural network for learning hierarchical features, which can automatically learn multiple scales, Multiple levels of multi-level fracture features to achieve end-to-end, pixel-level prediction. Since the deep convolutional neural network holistically and directly supervises the learning strategy, it can be applied to process crack images of various scenes and scales.

为解决上述技术问题，本发明采用如下技术方案：In order to solve the problems of the technologies described above, the present invention adopts the following technical solutions:

基于深度卷积神经网络的图像裂缝分割方法，包括：Image crack segmentation method based on deep convolutional neural network, including:

所采用的深度卷积神经网络包括5个卷积阶段，其中，第一个卷积阶段包括卷积层Conv1_1和卷积层Conv1_2，卷积层Conv1_1和卷积层Conv1_2分别使用64个卷积核；第二个卷积阶段包括卷积层Conv2_1和卷积层Conv2_2，卷积层Conv2_1和卷积层Conv2_2分别使用128个卷积核；第三个卷积阶段包括卷积层Conv3_1、卷积层Conv3_2和卷积层Conv3_3，卷积层Conv3_1、卷积层Conv3_2和卷积层Conv3_3分别使用256个卷积核；第四个卷积阶段包括卷积层conv4_1、卷积层conv4_2和卷积层conv4_3，卷积层conv4_1、卷积层Conv4_2和卷积层Conv4_3分别使用512个卷积核；第五个卷积阶段包括卷积层Conv5_1、卷积层Conv5_2和卷积层Conv5_3，卷积层conv5_1、卷积层conv5_2和卷积层conv5_3分别使用512个卷积核；卷积层conv1_2、卷积层conv2_2、卷积层conv3_3和卷积层conv4_3后均依次紧跟批标准化层、非线性激活层、池化层；其他卷积层后均依次紧跟批标准化层、非线性激活层；The deep convolutional neural network used includes 5 convolutional stages, among which the first convolutional stage includes convolutional layer Conv1_1 and convolutional layer Conv1_2, and convolutional layer Conv1_1 and convolutional layer Conv1_2 use 64 convolution kernels respectively ; The second convolution stage includes convolutional layer Conv2_1 and convolutional layer Conv2_2, convolutional layer Conv2_1 and convolutional layer Conv2_2 use 128 convolution kernels respectively; the third convolutional stage includes convolutional layer Conv3_1, convolutional layer Conv3_2 and convolutional layer Conv3_3, convolutional layer Conv3_1, convolutional layer Conv3_2 and convolutional layer Conv3_3 use 256 convolution kernels respectively; the fourth convolution stage includes convolutional layer conv4_1, convolutional layer conv4_2 and convolutional layer conv4_3 , convolutional layer conv4_1, convolutional layer Conv4_2, and convolutional layer Conv4_3 use 512 convolution kernels respectively; the fifth convolution stage includes convolutional layer Conv5_1, convolutional layer Conv5_2, and convolutional layer Conv5_3, convolutional layer conv5_1, Convolution layer conv5_2 and convolution layer conv5_3 use 512 convolution kernels respectively; convolution layer conv1_2, convolution layer conv2_2, convolution layer conv3_3 and convolution layer conv4_3 are followed by batch normalization layer, nonlinear activation layer, Pooling layer; other convolutional layers are followed by batch normalization layer and nonlinear activation layer in turn;

将卷积层conv1_1后紧跟的批标准化层、非线性激活层记为第一批标准化层、第一非线性激活层；将卷积层conv1_2后紧跟的批标准化层、非线性激活层、池化层记为第二批标准化层、第二非线性激活层、第一池化层；The batch normalization layer and nonlinear activation layer following the convolutional layer conv1_1 are recorded as the first batch of normalization layers and the first nonlinear activation layer; the batch normalization layer, nonlinear activation layer, and The pooling layer is recorded as the second batch of normalization layer, the second nonlinear activation layer, and the first pooling layer;

将卷积层conv2_1后紧跟的批标准化层、非线性激活层记为第三批标准化层、第三非线性激活层；将卷积层conv2_2后紧跟的批标准化层、非线性激活层、池化层记为第四批标准化层、第四非线性激活层、第二池化层；The batch normalization layer and nonlinear activation layer following the convolutional layer conv2_1 are recorded as the third batch normalization layer and the third nonlinear activation layer; the batch normalization layer, nonlinear activation layer, and The pooling layer is recorded as the fourth batch of normalization layer, the fourth nonlinear activation layer, and the second pooling layer;

将卷积层conv3_1后紧跟的批标准化层、非线性激活层记为第五批标准化层、第五非线性激活层；将卷积层conv3_2后紧跟的批标准化层、非线性激活层记为第六批标准化层、第六非线性激活层；将卷积层conv3_3后紧跟的批标准化层、非线性激活层、池化层记为第七批标准化层、第七非线性激活层、第三池化层；The batch normalization layer and nonlinear activation layer following the convolutional layer conv3_1 are denoted as the fifth batch normalization layer and the fifth nonlinear activation layer; the batch normalization layer and nonlinear activation layer immediately following the convolutional layer conv3_2 are denoted as It is the sixth batch of normalization layer and the sixth non-linear activation layer; the batch normalization layer, non-linear activation layer and pooling layer following the convolutional layer conv3_3 are recorded as the seventh batch of normalization layer, the seventh non-linear activation layer, The third pooling layer;

将卷积层conv4_1后紧跟的批标准化层、非线性激活层记为第八批标准化层、第八非线性激活层；将卷积层conv4_2后紧跟的批标准化层、非线性激活层记为第九批标准化层、第九非线性激活层；将卷积层conv4_3后紧跟的批标准化层、非线性激活层、池化层记为第十批标准化层、第十非线性激活层、第四池化层；The batch normalization layer and nonlinear activation layer following the convolutional layer conv4_1 are denoted as the eighth batch normalization layer and the eighth nonlinear activation layer; the batch normalization layer and nonlinear activation layer immediately following the convolutional layer conv4_2 are denoted as It is the ninth batch of normalization layer and the ninth non-linear activation layer; the batch normalization layer, non-linear activation layer and pooling layer following the convolutional layer conv4_3 are recorded as the tenth batch of normalization layer, the tenth non-linear activation layer, The fourth pooling layer;

将卷积层conv5_1后紧跟的批标准化层、非线性激活层记为第十一批标准化层、第十一非线性激活层；将卷积层conv5_2后紧跟的批标准化层、非线性激活层记为第十二批标准化层、第十二非线性激活层；将卷积层conv5_3后紧跟的批标准化层、非线性激活层、池化层记为第十三批标准化层、第十三非线性激活层；The batch normalization layer and nonlinear activation layer immediately following the convolutional layer conv5_1 are recorded as the eleventh batch normalization layer and the eleventh nonlinear activation layer; the batch normalization layer and nonlinear activation layer immediately following the convolutional layer conv5_2 The layer is marked as the twelfth batch normalization layer and the twelfth nonlinear activation layer; the batch normalization layer, nonlinear activation layer, and pooling layer immediately following the convolutional layer conv5_3 are recorded as the thirteenth batch normalization layer and the tenth batch normalization layer. Three non-linear activation layers;

原始图像从卷积层conv1_1输入；第一个卷积阶段中，原始图像顺次经卷积层conv1_1、第一批标准化层、第一非线性激活层、卷积层conv1_2、第二批标准化层、第二非线性激活层输出第一特征图；第一特征图经第一池化层后输入第二个卷积阶段，顺次经卷积层conv2_1、第三批标准化层、第三非线性激活层、卷积层conv2_2、第四批标准化层、第四非线性激活层输出第二特征图；第二特征图经第二池化层后输入第三个卷积阶段，顺次经卷积层conv3_1、第五批标准化层、第五非线性激活层、卷积层conv3_2、第六批标准化层、第六非线性激活层、卷积层conv3_3、第七批标准化层、第七非线性激活层输出第三特征图；第三特征图经第三池化层后输入第四个卷积阶段，顺次经卷积层conv4_1、第八批标准化层、第八非线性激活层、卷积层conv4_2、第九批标准化层、第九非线性激活层、卷积层conv4_3、第十批标准化层、第十非线性激活层输出第四特征图；第四特征图经第四池化层后输入第四个卷积阶段，顺次经卷积层conv5_1、第十一批标准化层、第十一非线性激活层、卷积层conv5_2、第十二批标准化层、第十二非线性激活层、卷积层conv5_3、第十三批标准化层、第十三非线性激活层输出第五特征图；The original image is input from the convolutional layer conv1_1; in the first convolution stage, the original image is sequentially passed through the convolutional layer conv1_1, the first batch of normalization layers, the first nonlinear activation layer, the convolutional layer conv1_2, and the second batch of normalization layers , The second nonlinear activation layer outputs the first feature map; the first feature map is input to the second convolution stage after the first pooling layer, and then passes through the convolution layer conv2_1, the third batch of normalization layers, and the third nonlinear The activation layer, convolutional layer conv2_2, the fourth batch of normalization layers, and the fourth nonlinear activation layer output the second feature map; the second feature map is input to the third convolution stage after the second pooling layer, and is convolved in sequence Layer conv3_1, fifth batch normalization layer, fifth nonlinear activation layer, convolutional layer conv3_2, sixth batch normalization layer, sixth nonlinear activation layer, convolutional layer conv3_3, seventh batch normalization layer, seventh nonlinear activation Layer outputs the third feature map; the third feature map is input to the fourth convolution stage after passing through the third pooling layer, and then passes through the convolution layer conv4_1, the eighth batch normalization layer, the eighth nonlinear activation layer, and the convolution layer conv4_2, the ninth batch of normalization layer, the ninth non-linear activation layer, the convolutional layer conv4_3, the tenth batch of normalization layer, the tenth non-linear activation layer output the fourth feature map; the fourth feature map is input after the fourth pooling layer The fourth convolution stage, sequentially through the convolutional layer conv5_1, the eleventh batch of normalization layer, the eleventh nonlinear activation layer, the convolutional layer conv5_2, the twelfth batch of normalization layer, the twelfth nonlinear activation layer, The convolutional layer conv5_3, the thirteenth batch normalization layer, and the thirteenth non-linear activation layer output the fifth feature map;

对第一特征图、第二特征图、第三特征图、第四特征图、第五特征图分别经过卷积层Conv1、卷积层Conv2、卷积层Conv3、卷积层Conv4、卷积层Conv5，Conv1、卷积层Conv2、卷积层Conv3、卷积层Conv4、卷积层Conv5的输出特征图分别经去卷积层Deconv2、Deconv3、Deconv4、Deconv5依次进行去卷积、上采样，获得与原始图像大小相同的特征图；将五个与原始图像大小相同的特征图通过连接层融合，连接层的输出依次通过卷积层Conv-1-2降维、通过softmax函数获得各像素对应的裂缝和非裂缝的预测概率，记为融合预测概率，从而实现原始图像的图像裂缝分割；The first feature map, the second feature map, the third feature map, the fourth feature map, and the fifth feature map respectively go through the convolutional layer Conv1, the convolutional layer Conv2, the convolutional layer Conv3, the convolutional layer Conv4, and the convolutional layer The output feature maps of Conv5, Conv1, convolutional layer Conv2, convolutional layer Conv3, convolutional layer Conv4, and convolutional layer Conv5 are respectively deconvoluted and upsampled by deconvolution layers Deconv2, Deconv3, Deconv4, and Deconv5 to obtain The feature map with the same size as the original image; five feature maps with the same size as the original image are fused through the connection layer, and the output of the connection layer is sequentially reduced through the convolutional layer Conv-1-2, and the corresponding pixel value is obtained through the softmax function. The prediction probability of cracks and non-cracks is recorded as the fusion prediction probability, so as to realize the image crack segmentation of the original image;

将第二非线性激活层、去卷积层Deconv2、Deconv3、Deconv4、Deconv5的输出记为侧边输出；Record the output of the second nonlinear activation layer, deconvolution layer Deconv2, Deconv3, Deconv4, Deconv5 as side output;

所述的深度卷积神经网络采用如下方法进行样本训练：The described depth convolutional neural network adopts the following method to carry out sample training:

采集训练样本集I_i表示第i张训练样本，G_i表示I_i对应的人工标注，分别统计各训练样本中非裂缝像素和裂缝像素的数量，分别记为C₀和C₁；Collect training sample set I _i represents the i-th training sample, G _i represents the manual label corresponding to I _i , count the number of non-crack pixels and crack pixels in each training sample, and record them as C ₀ and C ₁ respectively;

以损失代价函数L(I,G,Q)＝L_side(I,G,Q,q)+L_fuse(I,G,Q)最小，优化深度卷积神经网络的参数；其中：With the loss cost function L(I,G,Q)=L _side (I,G,Q,q)+L _fuse (I,G,Q) minimum, optimize the parameters of the deep convolutional neural network; where:

L_side(I,G,W,w)表示侧边损失代价， L _side (I,G,W,w) represents the side loss cost,

ω₀和ω₁是类平衡权值，ω₀＝1，ω₁＝C₀/C₁；I(k)和G(k)是分别指I和G的第k个像素对应的值，logPr(*)指softmax函数； ω ₀ and ω ₁ are class balance weights, ω ₀ =1, ω ₁ =C ₀ /C ₁ ; I(k) and G(k) refer to the values corresponding to the kth pixel of I and G respectively, logPr (*) refers to the softmax function;

L_fuse(I,G,Q)表示融合损失代价，L _fuse (I,G,Q) represents the fusion loss cost,

进一步的，所有卷积层均使用大小为3×3的卷积核窗口。Further, all convolutional layers use a convolutional kernel window with a size of 3×3.

进一步的，所有的池化层采用大小为2×2的卷积核窗口进行最大值池化。Further, all pooling layers use a convolution kernel window of size 2×2 for maximum pooling.

进一步的，所有的批标准化层均采用公式对卷积层处理后的特征图进行批标准化操作，其中，BN_γ,β(x)表示批标准化操作的结果；β＝E[x]，ε为一个趋近于0的常数；Var[·]表示求方差，E[·]表示求均值。Further, all batch normalization layers use the formula Perform a batch normalization operation on the feature map processed by the convolutional layer, where BN _{γ, β} (x) represents the result of the batch normalization operation; β=E[x], ε is a constant approaching to 0; Var[·] means variance, E[·] means mean value.

进一步的，所述的非线性激活层采用非线性函数f(x)＝max(0,x)。Further, the nonlinear activation layer adopts a nonlinear function f(x)=max(0,x).

和现有技术相比，本发明具有以下优点和有益效果：Compared with the prior art, the present invention has the following advantages and beneficial effects:

本发明可学习由低到高的多层次特征，可快速实现高精度的裂缝区域分割，尤其适用于桥梁结构裂缝检测。The invention can learn multi-level features from low to high, can quickly realize high-precision crack region segmentation, and is especially suitable for bridge structure crack detection.

附图说明Description of drawings

图1为本发明深度卷积神经网络示意图；Fig. 1 is a schematic diagram of a deep convolutional neural network of the present invention;

图2为本发明深度卷积神经网络的输出示意图；Fig. 2 is the output schematic diagram of deep convolutional neural network of the present invention;

图3为卷积核窗口示意图，其中，图(a)所示为图像外围不补0的情况，图(b)所示为图像外围补0的情况。Figure 3 is a schematic diagram of the convolution kernel window, where Figure (a) shows the case where the periphery of the image is not filled with 0, and Figure (b) shows the situation where the periphery of the image is filled with 0.

具体实施方式detailed description

下面将结合附图对本发明技术方案进行详细说明。The technical solutions of the present invention will be described in detail below in conjunction with the accompanying drawings.

将本发明深度卷积神经网络简记为DeepCrack，其为一种学习层次特征的深度卷积神经网络，基于全卷积神经网络(Fully Convolutional Networks,FCN)和深度监督网络(Deeply-Supervised Nets,DSN)实现，在训练阶段可对裂缝和非裂缝的预测结果的损失代价进行平衡。The deep convolutional neural network of the present invention is abbreviated as DeepCrack, which is a deep convolutional neural network for learning hierarchical features, based on fully convolutional neural networks (Fully Convolutional Networks, FCN) and deep supervision networks (Deeply-Supervised Nets, DSN) implementation, the loss cost of crack and non-crack prediction results can be balanced in the training phase.

本发明包括：将原始图像输入深层卷积神经网络，经卷积、池化和激活层学习特征，获得特征图；对特征图进行上采样得到与原始图像大小相同的特征图；对与原始图像大小相同的特征图进行softmax预测，获得对应位置所属类别，从而实现裂缝区域分割。本发明的特征学习和裂缝区域分割，都是通过深度卷积神经网络自主实现，见图1。The invention includes: inputting the original image into a deep convolutional neural network, learning features through convolution, pooling and activation layers, and obtaining a feature map; performing up-sampling on the feature map to obtain a feature map with the same size as the original image; The feature maps of the same size are predicted by softmax, and the category of the corresponding position is obtained, so as to realize the segmentation of the crack area. The feature learning and fracture region segmentation of the present invention are both independently realized by a deep convolutional neural network, as shown in FIG. 1 .

全卷积神经网络是指利用深度卷积网络学习有效的图像语义特征，最终生成与原始图像大小相同的预测图。深度监督网络是指通过监督各卷积阶段学习到的特征和各卷积阶段融合后的特征，见图2所示的侧边输出，统计学习到的特征进行语义预测的损失代价，从而形成整体监督。本发明中原始图像经过深度卷积神经网络直接得到预测结果，预测结果与原始图像上各像素点对应。这里预测结果即同时得到各像素点属于裂缝和非裂缝的概率值。Fully convolutional neural network refers to the use of deep convolutional network to learn effective image semantic features, and finally generate a prediction map with the same size as the original image. The deep supervision network refers to the features learned by supervising each convolution stage and the fused features of each convolution stage, see the side output shown in Figure 2, and the loss cost of semantic prediction for statistically learned features, thus forming an overall supervision. In the present invention, the original image directly obtains the prediction result through the deep convolutional neural network, and the prediction result corresponds to each pixel on the original image. The prediction result here is to obtain the probability value of each pixel belonging to cracks and non-cracks at the same time.

本发明学习层次特征的深度卷积神经网络，具有多层卷积层(Convolutionallayer)，各层卷积层后都紧跟着批标准化层(Batch Normalization，BN)和非线性激活层(Rectified Linear Units，ReLu)。卷积分为5个卷积阶段，第一卷积阶段包括conv1_1卷积层和conv1_2卷积层；第二卷积阶段包括conv2_1卷积层和conv2_2卷积层；第三卷积阶段包括conv3_1卷积层、conv3_2卷积层和conv3_3卷积层；第四卷积阶段包括conv4_1卷积层、conv4_2卷积层和conv4_3卷积层，第五卷积阶段包括conv5_1卷积层、conv5_2卷积层和conv5_3卷积层。各卷积阶段均使用3×3大小的卷积核窗口，卷积核窗口的移动步距为1。在前4个卷积阶段的最后，即conv1_2卷积层、conv2_2卷积层、conv3_3卷积层和conv4_3卷积层后，除了紧接着BN批标准化层、ReLu非线性激活层外，还有池化层(Pooling)。池化层使用2×2的卷积核窗口对卷积阶段的特征图进行降采样。对各卷积阶段的最后一层学习到的特征，即conv1_2卷积层、conv2_2卷积层、conv3_3卷积层、conv4_3卷积层和conv5_3卷积层，分别上采样0、2、4、8、16倍，得到与原始图像大小一样的特征图。再分别通过一层卷积层降低特征图维度，将得到的五个特征图通过连接层Concat层融合在一起，再通过一层卷积层降维，并通过Softmax后得到每个像素对应的裂缝和非裂缝的预测概率。The deep convolutional neural network for learning hierarchical features of the present invention has a multi-layer convolutional layer (Convolutional layer), and each layer of convolutional layer is followed by a batch normalization layer (Batch Normalization, BN) and a nonlinear activation layer (Rectified Linear Units). , ReLu). Convolution is divided into 5 convolution stages, the first convolution stage includes conv1_1 convolution layer and conv1_2 convolution layer; the second convolution stage includes conv2_1 convolution layer and conv2_2 convolution layer; the third convolution stage includes conv3_1 convolution layer, conv3_2 convolution layer and conv3_3 convolution layer; the fourth convolution stage includes conv4_1 convolution layer, conv4_2 convolution layer and conv4_3 convolution layer, and the fifth convolution stage includes conv5_1 convolution layer, conv5_2 convolution layer and conv5_3 convolutional layer. Each convolution stage uses a 3×3 convolution kernel window, and the movement step of the convolution kernel window is 1. At the end of the first 4 convolutional stages, after the conv1_2 convolutional layer, conv2_2 convolutional layer, conv3_3 convolutional layer, and conv4_3 convolutional layer, in addition to the BN batch normalization layer and the ReLu nonlinear activation layer, there is also pooling Layer (Pooling). The pooling layer uses a 2×2 convolution kernel window to downsample the feature map of the convolution stage. For the features learned in the last layer of each convolution stage, namely conv1_2 convolutional layer, conv2_2 convolutional layer, conv3_3 convolutional layer, conv4_3 convolutional layer and conv5_3 convolutional layer, upsample 0, 2, 4, 8 respectively , 16 times to get a feature map with the same size as the original image. Then reduce the dimension of the feature map through a convolutional layer, fuse the obtained five feature maps through the connection layer Concat layer, and then reduce the dimensionality through a layer of convolutional layer, and get the crack corresponding to each pixel after passing through Softmax and the predicted probability of non-crack.

在卷积阶段通过池化层，使得在不同卷积阶段卷积窗口对应在原始图像上的感受野大小呈现逐级增大的特点，conv1_2卷积层、conv2_2卷积层、conv3_3卷积层、conv4_3卷积层和conv5_3对应在原始图像上的感受野分别为5、14、40、92、196。这一特点使得不同卷积阶段学习得到的图像特征呈现由浅到深、由低级到抽象的特点，即可自动学习的多尺度、多种级别的裂缝特征。In the convolution stage, the pooling layer is used to make the receptive field size of the convolution window corresponding to the original image in different convolution stages increase step by step, conv1_2 convolution layer, conv2_2 convolution layer, conv3_3 convolution layer, The conv4_3 convolutional layer and conv5_3 correspond to receptive fields on the original image of 5, 14, 40, 92, and 196, respectively. This feature makes the image features learned in different convolution stages show the characteristics of from shallow to deep, from low-level to abstract, which can automatically learn multi-scale and multi-level crack features.

下面将结合表1描述深度卷积神经网络的数据处理过程。见表1，Conv<a_b>-<c>-<d>中，a表示卷积阶段，b表示卷积阶段的卷积层编号，c表示卷积核大小，d表示卷积核数量。例如，Conv1_1-3-64表示卷积层conv1_1，其使用64个卷积核，所使用的卷积核大小为3×3(即卷积窗口大小为3×3)。BN表示批标准化层，ReLU表示非线性激活层，Pool1、Pool2、Pool3、Pool4表示池化层。Conv<a>-<c>-<d>其中a,c,d与前面的字符含义一样，例如，Conv1-3-1表示使用1个大小为3×3的卷积核窗口的卷积层。Deconv<a>-<e>表示去卷积层，e表示上采样的倍数，Crop表示对上采样结果进行裁剪，使得输出的特征图尺寸保持一致。Conv1表示卷积层Conv1-3-1的输出，Deconv2、Deconv3、Deconv4、Deconv5分别表示去卷积层Deconv2-2、Deconv3-4、Deconv4-8、Deconv5-16的输出。The following will describe the data processing process of the deep convolutional neural network in conjunction with Table 1. See Table 1, Conv<a_b>-<c>-<d>, a represents the convolution stage, b represents the convolution layer number of the convolution stage, c represents the size of the convolution kernel, and d represents the number of convolution kernels. For example, Conv1_1-3-64 represents the convolutional layer conv1_1, which uses 64 convolution kernels, and the size of the convolution kernel used is 3×3 (that is, the convolution window size is 3×3). BN represents the batch normalization layer, ReLU represents the nonlinear activation layer, and Pool1, Pool2, Pool3, and Pool4 represent the pooling layer. Conv<a>-<c>-<d> where a, c, and d have the same meaning as the previous characters, for example, Conv1-3-1 means a convolution layer using a convolution kernel window with a size of 3×3 . Deconv<a>-<e> represents the deconvolution layer, e represents the multiple of upsampling, and Crop represents the cropping of the upsampling result so that the output feature map size remains consistent. Conv1 represents the output of the convolutional layer Conv1-3-1, and Deconv2, Deconv3, Deconv4, and Deconv5 represent the output of the deconvolutional layers Deconv2-2, Deconv3-4, Deconv4-8, and Deconv5-16, respectively.

本发明中，卷积阶段均使用大小为3×3的卷积核，在图像外围补充一圈0，见图3。其中，灰色表示卷积核窗口，在图像外围不补0的情况下，见图3(a)，卷积核窗口中心不能位于图像边缘像素位置，以左上角边缘像素为例，起始卷积核窗口中心只能达到(1，1)，若输入图像的高度和宽度分别是M和N，则输出图像数据的高度和宽度只能为M-2和N-2。在图像外围补0的情况下，见图3(b)，卷积核窗口中心能位于图像边缘像素位置。仍然以左上角为例，起始卷积核窗口中心能达到(0，0)，若输入图像的高度和宽度分别是M和N，则输出图像的高度和宽度也分别为M和N。In the present invention, a convolution kernel with a size of 3×3 is used in the convolution stage, and a circle of 0 is added to the periphery of the image, as shown in FIG. 3 . Among them, the gray color represents the convolution kernel window. In the case where the periphery of the image is not filled with 0, see Figure 3(a). The center of the convolution kernel window cannot be located at the edge pixel position of the image. Taking the edge pixel in the upper left corner as an example, the initial convolution The center of the kernel window can only reach (1, 1). If the height and width of the input image are M and N respectively, the height and width of the output image data can only be M-2 and N-2. In the case of adding 0 to the periphery of the image, see Figure 3(b), the center of the convolution kernel window can be located at the edge pixel position of the image. Still taking the upper left corner as an example, the center of the initial convolution kernel window can reach (0, 0). If the height and width of the input image are M and N respectively, the height and width of the output image are also M and N respectively.

表1深度卷积神经网络的数据处理过程示意表Table 1 Schematic diagram of the data processing process of the deep convolutional neural network

本发明包括训练阶段和测试阶段，训练阶段，对每个softmax的输出进行损失统计，其中Conv1、Deconv2、…DeConv5部分统计的损失代价为侧边输出的损失代价，Conv-1-2对应得softmax是融合后的损失代价。测试阶段，去掉侧边输边的softmax，其余部分保留。The present invention includes a training phase and a testing phase. In the training phase, loss statistics are performed on the output of each softmax, wherein the loss cost of the Conv1, Deconv2, ... DeConv5 part statistics is the loss cost of the side output, and Conv-1-2 corresponds to the softmax is the loss cost after fusion. In the test phase, the softmax on the side input side is removed, and the rest is reserved.

本发明中，对于大小为N×N的卷积核窗口，其权值矩阵记为设卷积核窗口滑动到某一位置时对应的图像像素特征集为x，则卷积操作结果y为：In the present invention, for a convolution kernel window whose size is N×N, its weight matrix is denoted as Let the corresponding image pixel feature set when the convolution kernel window slides to a certain position be x, then the convolution operation result y is:

式(1)中：In formula (1):

x_i,j表示卷积核窗内第i行第j列像素的特征值；x _{i, j} represent the eigenvalues of the i-th row and j-th column pixel in the convolution kernel window;

w_i,j表示权值矩阵w中第i行第j列的元素，即x_i,j对应的权值；w _{i, j} represents the element in the i-th row and j-column in the weight matrix w, that is, the weight corresponding to x _{i, j} ;

b为偏置量。b is the offset.

本发明中，w和b为待优化参数。In the present invention, w and b are parameters to be optimized.

本发明中，批标准化层用来进行批标准化操作，批标准化操作是一种归一化方法，记每次输入深度卷积神经网络的K张图像为{p_k|k＝1,2,...K}，对第k张图像像素位置(i,j)上卷积层输出的特征值x_i,j,k进行批标准化操作：In the present invention, the batch normalization layer is used for batch normalization operation, which is a normalization method, and the K images input to the deep convolutional neural network are recorded as {p _k |k=1,2,. ..K}, perform batch normalization on the eigenvalues x _i,j,k output by the convolutional layer at the pixel position (i,j) of the k-th image:

式(2)中：In formula (2):

BN_γ,β(x_i,j,k)表示批标准化操作结果；BN _γ,β ( _xi,j,k ) represents the result of the batch normalization operation;

γ和β的初始值分别为：β₀＝E[x_i,j]，训练过程中，采用梯度下降法对γ和β进行优化；The initial values of γ and β are respectively: β ₀ ＝E[ _xi,j ], during the training process, the gradient descent method is used to optimize γ and β;

ε为一个趋近于0的常数，一般在(1e-10,1e-3)范围取值； ε is a constant close to 0, generally in the range of (1e-10,1e-3);

E[·]表示求均值， E[·] represents mean value,

Var[·]表示求方差， Var[ ] means to find the variance,

本实施例中，非线性激活层ReLu采用非线性函数进行非线性激活处理，所采用的非线性函数f(x)＝max(0,x)。In this embodiment, the nonlinear activation layer ReLu uses a nonlinear function to perform nonlinear activation processing, and the adopted nonlinear function f(x)=max(0,x).

本实施例中，池化层采用最值池化，对于大小为N×N的卷积核窗口中的像素特征值，仅保留图像像素特征集合x中的最大值，即：In this embodiment, the pooling layer adopts the most value pooling, and for the pixel feature values in the convolution kernel window with a size of N×N, only the maximum value in the image pixel feature set x is retained, namely:

f(x)＝max(x) (3)f(x)=max(x) (3)

式(3)中，f(x)表示池化操作结果。In formula (3), f(x) represents the pooling operation result.

本发明中，连接层用来合并数据，用来将特征矩阵进行融合，连接层的操作公式如下：In the present invention, the connection layer is used to combine data, and is used to convert the feature matrix For fusion, the operation formula of the connection layer is as follows:

式(4)中，W、H、c分别指特征矩阵的宽度、高度和通道数。In formula (4), W, H, and c refer to the width, height, and number of channels of the feature matrix, respectively.

本发明中，采用softmax方法计算各像素所属类别的概率值。对D维的特征向量x和权值向量Q，则特征向量x属于类别a的预测概率为：In the present invention, the softmax method is used to calculate the probability value of the category to which each pixel belongs. For D-dimensional feature vector x and weight vector Q, the predicted probability that feature vector x belongs to category a is:

本发明中类别只有两类，非裂缝类和裂缝类，即a＝0或1。There are only two classes in the present invention, non-crack class and crack class, ie a=0 or 1.

根据监督各个卷积阶段学习到的特征，统计各个卷积阶段的预测结果的损失代价和融合后的预测结果的损失代价。本发明中对裂缝和非裂缝预测结果的损失代价进行平衡，首先，采集训练样本集其中，R表示训练样本数，I_r指输入训练的第r张原始图像，即第r个训练样本；G_r是I_r对应的人工标注。将深度卷积神经网络的所有参数记为W，W包括深度卷积神经网络中所有卷积层的权值矩阵和偏置量；深度监督部分的参数为w＝{(w⁽¹⁾,w⁽²⁾,...,w^(M))}，M为索引总数，m表示索引号，w^(m)分别对应侧边输出部分的网络参数，因此，侧边深度监督部分的损失代价函数为：According to the features learned in each convolution stage of supervision, the loss cost of the prediction results of each convolution stage and the loss cost of the fusion prediction results are counted. In the present invention, the loss cost of crack and non-crack prediction results is balanced. First, the training sample set is collected Among them, R represents the number of training samples, I _r refers to the r-th original image input for training, that is, the r-th training sample; G _r is the manual label corresponding to I _r . Record all the parameters of the deep convolutional neural network as W, W includes the weight matrix and bias of all convolutional layers in the deep convolutional neural network; the parameters of the deep supervision part are w={(w ⁽¹⁾ ,w ⁽²⁾ ,...,w ^(M) )}, M is the total number of indexes, m represents the index number, w ^(m) respectively correspond to the network parameters of the side output part, therefore, the loss cost function of the side depth supervision part for:

式(6)中：In formula (6):

l_side(I,G,W,w^(m))表示深度监督部分的图像级损失函数；l _side (I,G,W,w ^(m) ) represents the image-level loss function of the deep supervision part;

表示深度监督部分输出的预测结果DSN-side，即裂缝与非裂缝的概率图； Indicates the prediction result DSN-side output by the deep supervision part, that is, the probability map of cracks and non-cracks;

a_m为常数系数，默认值为1.0；a _m is a constant coefficient, the default value is 1.0;

Δ(·)表示交叉熵损失函数，考虑到一张图像中90％以上都是非裂缝像素，因此，Δ(·)被定义为：Δ( ) represents the cross-entropy loss function. Considering that more than 90% of an image is non-crack pixels, Δ( ) is defined as:

式(7)中：In formula (7):

ω₀和ω₁是类平衡权值；ω ₀ and ω ₁ are the class balance weights;

|I|表示训练样本图像I的像素数；|I| represents the number of pixels of the training sample image I;

I(k)和G(k)是分别指I和G中第k个像素对应的值；I(k) and G(k) refer to the value corresponding to the kth pixel in I and G respectively;

logPr(*)指softmax函数Pr(*)的结果取对数。logPr(*) refers to the logarithm of the result of the softmax function Pr(*).

对于一张训练样本，统计其中的非裂缝像素与裂缝像素的数量，分别记为C₀和C₁，本发明中，ω₀＝1，ω₁＝C₀/C₁。For a training sample, count the number of non-crack pixels and crack pixels therein, which are recorded as C ₀ and C ₁ respectively. In the present invention, ω ₀ =1, ω ₁ =C ₀ /C ₁ .

除了侧边输出部分的监督外，最终的融合预测部分在训练过程中也会统计损失，记该部分的损失函数L_fuse(I,G,W)为：In addition to the supervision of the side output part, the final fusion prediction part will also count the loss during the training process. The loss function L _fuse (I, G, W) of this part is recorded as:

因此，整体损失代价函数L(I,G,W)记为：Therefore, the overall loss cost function L(I,G,W) is written as:

L(I,G,W)＝L_side(I,G,W,w)+L_fuse(I,G,W) (9)L(I,G,W)=L _side (I,G,W,w)+L _fuse (I,G,W) (9)

本发明的整体化优化目标即获取使整体损失代价值L(I,G,W)最小的参数：The overall optimization goal of the present invention is to obtain the parameters that minimize the overall loss cost value L (I, G, W):

Claims

1. the image crack dividing method based on depth convolutional neural networks, it is characterized in that：

The depth convolutional neural networks used include 5 convolution stages, wherein, first convolution stage includes convolutional layer Conv1_1 and convolutional layer Conv1_2, convolutional layer Conv1_1 and convolutional layer Conv1_2 are respectively using 64 convolution kernels；Second The convolution stage includes convolutional layer Conv2_1 and convolutional layer Conv2_2, and convolutional layer Conv2_1 and convolutional layer Conv2_2 are used respectively 128 convolution kernels；3rd convolution stage includes convolutional layer Conv3_1, convolutional layer Conv3_2 and convolutional layer Conv3_3, convolution Layer Conv3_1, convolutional layer Conv3_2 and convolutional layer Conv3_3 are respectively using 256 convolution kernels；4th convolution stage includes Convolutional layer conv4_1, convolutional layer conv4_2 and convolutional layer conv4_3, convolutional layer conv4_1, convolutional layer Conv4_2 and convolution Layer Conv4_3 is respectively using 512 convolution kernels；5th convolution stage include convolutional layer Conv5_1, convolutional layer Conv5_2 and Convolutional layer Conv5_3, convolutional layer conv5_1, convolutional layer conv5_2 and convolutional layer conv5_3 are respectively using 512 convolution kernels； Successively immediately following batch standardization after convolutional layer conv1_2, convolutional layer conv2_2, convolutional layer conv3_3 and convolutional layer conv4_3 Layer, nonlinear activation layer, pond layer；Successively immediately following batch normalization layer, nonlinear activation layer after other convolutional layers；

Batch normalization layer closelyed follow after convolutional layer conv1_1, nonlinear activation layer are designated as first normalization layer, the first non-thread Property active coating；Batch normalization layer closelyed follow after convolutional layer conv1_2, nonlinear activation layer, pond layer are designated as second batch standard Change layer, the second nonlinear activation layer, the first pond layer；

Batch normalization layer closelyed follow after convolutional layer conv2_1, nonlinear activation layer are designated as the 3rd batch of normalization layer, the 3rd non-thread Property active coating；Batch normalization layer closelyed follow after convolutional layer conv2_2, nonlinear activation layer, pond layer are designated as the 4th batch of standard Change layer, the 4th nonlinear activation layer, the second pond layer；

Batch normalization layer closelyed follow after convolutional layer conv3_1, nonlinear activation layer are designated as the 5th batch of normalization layer, the 5th non-thread Property active coating；Batch normalization layer closelyed follow after convolutional layer conv3_2, nonlinear activation layer are designated as the 6th batch of normalization layer, the Six nonlinear activations layer；Batch normalization layer closelyed follow after convolutional layer conv3_3, nonlinear activation layer, pond layer are designated as the 7th Criticize normalization layer, the 7th nonlinear activation layer, the 3rd pond layer；

Batch normalization layer closelyed follow after convolutional layer conv4_1, nonlinear activation layer are designated as the 8th batch of normalization layer, the 8th non-thread Property active coating；Batch normalization layer closelyed follow after convolutional layer conv4_2, nonlinear activation layer are designated as the 9th batch of normalization layer, the Nine nonlinear activations layer；Batch normalization layer closelyed follow after convolutional layer conv4_3, nonlinear activation layer, pond layer are designated as the tenth Criticize normalization layer, the tenth nonlinear activation layer, the 4th pond layer；

Batch normalization layer closelyed follow after convolutional layer conv5_1, nonlinear activation layer are designated as the tenth a collection of normalization layer, the 11st Nonlinear activation layer；Batch normalization layer closelyed follow after convolutional layer conv5_2, nonlinear activation layer are designated as the 12nd batch of standardization Layer, the 12nd nonlinear activation layer；By batch normalization layer closelyed follow after convolutional layer conv5_3, nonlinear activation layer, pond layer note For the 13rd batch of normalization layer, the 13rd nonlinear activation layer；

Original image is inputted from convolutional layer conv1_1；In first convolution stage, original image sequentially through convolutional layer conv1_1, First normalization layer, the first nonlinear activation layer, convolutional layer conv1_2, second batch normalization layer, the second nonlinear activation layer Export fisrt feature figure；Fisrt feature figure inputs second convolution stage after the first pond layer, sequentially through convolutional layer conv2_ 1st, the 3rd batch of normalization layer, the 3rd nonlinear activation layer, convolutional layer conv2_2, the 4th batch of normalization layer, the 4th nonlinear activation Layer output second feature figure；Second feature figure inputs the 3rd convolution stage after the second pond layer, sequentially through convolutional layer Conv3_1, the 5th batch of normalization layer, the 5th nonlinear activation layer, convolutional layer conv3_2, the 6th batch of normalization layer, the 6th non-thread Property active coating, convolutional layer conv3_3, the 7th batch of normalization layer, the 7th nonlinear activation layer output third feature figure；Third feature Figure inputs the 4th convolution stage after the 3rd pond layer, sequentially through convolutional layer conv4_1, the 8th batch of normalization layer, the 8th non- Linear active coating, convolutional layer conv4_2, the 9th batch of normalization layer, the 9th nonlinear activation layer, convolutional layer conv4_3, the tenth batch Normalization layer, the tenth nonlinear activation layer output fourth feature figure；Fourth feature figure inputs the 4th volume after the 4th pond layer Product the stage, sequentially through convolutional layer conv5_1, the tenth a collection of normalization layer, the 11st nonlinear activation layer, convolutional layer conv5_2, 12nd batch of normalization layer, the 12nd nonlinear activation layer, convolutional layer conv5_3, the 13rd batch of normalization layer, the 13rd non-thread Property active coating output fifth feature figure；

To fisrt feature figure, second feature figure, third feature figure, fourth feature figure, fifth feature figure respectively through convolutional layer Conv1, convolutional layer Conv2, convolutional layer Conv3, convolutional layer Conv4, convolutional layer Conv5, Conv1, convolutional layer Conv2, convolution Layer Conv3, convolutional layer Conv4, convolutional layer Conv5 output characteristic figure respectively through layer Deconv2 that deconvolute, Deconv3, Deconv4, Deconv5 are deconvoluted, up-sampled successively, are obtained and original image size identical characteristic pattern；By five with Original image size identical characteristic pattern is merged by articulamentum, and the output of articulamentum passes sequentially through convolutional layer Conv-1-2 drops Dimension, the prediction probability by softmax functions each corresponding crack of pixel of acquisition and non-crack, are designated as fusion forecasting probability, from And realize the image crack segmentation of original image；

Second nonlinear activation layer, deconvolute layer Deconv2, Deconv3, Deconv4, Deconv5 output are designated as side Output；

Described depth convolutional neural networks are adopted to be trained with the following method：

Gather training sample setTraining sample set is inputted into depth convolutional neural networks, wherein, I_rRepresent the R training samples, G_rFor I_rCorresponding artificial mark, counts the quantity of non-crack pixel and crack pixel in each training sample, point C is not designated as it₀And C₁；

To lose the minimum parameter as objective optimization depth convolutional neural networks of cost value, described loss cost function is：

L (I, G, W)=L_side(I,G,W,w)+L_fuse(I,G,W)

Wherein：

L_side(I, G, W, w) represent side loss cost,

<mrow> <mi>&Delta;</mi> <mo>=</mo> <mo>-</mo> <mfrac> <mn>1</mn> <mrow> <mo>|</mo> <mi>I</mi> <mo>|</mo> </mrow> </mfrac> <munderover> <mo>&Sigma;</mo> <mrow> <mi>k</mi> <mo>=</mo> <mn>1</mn> </mrow> <mrow> <mo>|</mo> <mi>I</mi> <mo>|</mo> </mrow> </munderover> <mo>{</mo> <msub> <mi>&omega;</mi> <mn>0</mn> </msub> <mi>log</mi> <mi> </mi> <mi>Pr</mi> <mrow> <mo>(</mo> <mi>G</mi> <mo>(</mo> <mi>k</mi> <mo>)</mo> <mo>=</mo> <mn>0</mn> <mo>|</mo> <mi>I</mi> <mo>(</mo> <mi>k</mi> <mo>)</mo> <mo>;</mo> <mi>W</mi> <mo>,</mo> <msup> <mi>w</mi> <mrow> <mo>(</mo> <mi>m</mi> <mo>)</mo> </mrow> </msup> <mo>)</mo> </mrow> <mo>}</mo> <mo>+</mo> <msub> <mi>&omega;</mi> <mn>1</mn> </msub> <mi>log</mi> <mi> </mi> <mi>Pr</mi> <mrow> <mo>(</mo> <mi>G</mi> <mo>(</mo> <mi>k</mi> <mo>)</mo> <mo>=</mo> <mn>1</mn> <mo>|</mo> <mi>I</mi> <mo>(</mo> <mi>k</mi> <mo>)</mo> <mo>;</mo> <mi>W</mi> <mo>,</mo> <msup> <mi>w</mi> <mrow> <mo>(</mo> <mi>m</mi> <mo>)</mo> </mrow> </msup> <mo>)</mo> </mrow> <mo>}</mo> </mrow>

ω₀And ω₁It is class balance weight, ω₀=1, ω₁=C₀/C₁；

I (k) and G (k) are the corresponding values of k-th of pixel for referring to I and G respectively, and logPr (*) refers to softmax functions；

L_fuse(I, G, Q) represents fusion loss cost,

<mrow> <msub> <mi>L</mi> <mrow> <mi>f</mi> <mi>u</mi> <mi>s</mi> <mi>e</mi> </mrow> </msub> <mrow> <mo>(</mo> <mi>I</mi> <mo>,</mo> <mi>G</mi> <mo>,</mo> <mi>W</mi> <mo>)</mo> </mrow> <mo>=</mo> <mo>-</mo> <mfrac> <mn>1</mn> <mrow> <mo>|</mo> <mi>I</mi> <mo>|</mo> </mrow> </mfrac> <munderover> <mo>&Sigma;</mo> <mrow> <mi>k</mi> <mo>=</mo> <mn>1</mn> </mrow> <mrow> <mo>|</mo> <mi>I</mi> <mo>|</mo> </mrow> </munderover> <mo>{</mo> <msub> <mi>&omega;</mi> <mn>0</mn> </msub> <mi>log</mi> <mi> </mi> <mi>Pr</mi> <mrow> <mo>(</mo> <mi>G</mi> <mo>(</mo> <mi>k</mi> <mo>)</mo> <mo>=</mo> <mn>0</mn> <mo>|</mo> <mi>I</mi> <mo>(</mo> <mi>k</mi> <mo>)</mo> <mo>;</mo> <mi>W</mi> <mo>)</mo> </mrow> <mo>}</mo> <mo>+</mo> <msub> <mi>&omega;</mi> <mn>1</mn> </msub> <mi>log</mi> <mi> </mi> <mi>Pr</mi> <mrow> <mo>(</mo> <mi>G</mi> <mo>(</mo> <mi>k</mi> <mo>)</mo> <mo>=</mo> <mn>1</mn> <mo>|</mo> <mi>I</mi> <mo>(</mo> <mi>k</mi> <mo>)</mo> <mo>;</mo> <mi>W</mi> <mo>)</mo> </mrow> <mo>}</mo> <mo>.</mo> </mrow>

2. the image crack dividing method as claimed in claim 1 based on depth convolutional neural networks, it is characterized in that：

Convolutional layer conv1_1 and convolutional layer conv1_2 are respectively using 64 convolution kernels, convolutional layer conv2_1 and convolutional layer Conv2_2 is respectively using 128 convolution kernels, and convolutional layer conv3_1, convolutional layer conv3_2 and convolutional layer conv3_3 are used respectively 256 convolution kernels, convolutional layer conv4_1, convolutional layer conv4_2 and convolutional layer conv4_3 are respectively using 512 convolution kernels, volume Lamination conv5_1, convolutional layer conv5_2 and convolutional layer conv5_3 are respectively using 512 convolution kernels.

3. the image crack dividing method as claimed in claim 1 based on depth convolutional neural networks, it is characterized in that：

All convolutional layers are using the convolution kernel window that size is 3 × 3.

4. the image crack dividing method as claimed in claim 1 based on depth convolutional neural networks, it is characterized in that：

All pond layers use size to carry out maximum pond for 2 × 2 convolution kernel window.

5. the image crack dividing method as claimed in claim 1 based on depth convolutional neural networks, it is characterized in that：

All batch normalization layers use formulaCharacteristic pattern after convolutional layer processing is criticized Normalizing operation, wherein, BN_γ,β(x_i,j,k) represent to criticize the result of normalizing operation；ε is levels off to 0 Constant,β=E [x],Note is each The image of convolutional layer has K, and convolutional layer refers to the convolutional layer before batch normalization layer, x here_i,j,kRepresent the kth of convolutional layer output The characteristic value of image pixel positions (i, j).

6. the image crack dividing method as claimed in claim 1 based on depth convolutional neural networks, it is characterized in that：

Described nonlinear activation layer using nonlinear function f (x)=max (0, x), wherein, x represents the defeated of batch normalization layer Go out.