
CN112733756B - Remote sensing image semantic segmentation method based on W divergence countermeasure network - Google Patents

Info

Publication number: CN112733756B
Application number: CN202110053047.4A
Authority: CN (China)
Prior art keywords: remote sensing, network, generator, sensing image, divergence
Other versions: CN112733756A (application publication)
Other languages: Chinese (zh)
Inventors: 刘昶 (Liu Chang), 曹峡 (Cao Xia), 赵卫东 (Zhao Weidong), 鄢涛 (Yan Tao), 刘永红 (Liu Yonghong)
Current and original assignee: Chengdu University
Application filed by Chengdu University; priority to CN202110053047.4A
Legal status: Active (granted)

Classifications

    • G PHYSICS > G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING > G06V 20/00 Scenes; Scene-specific elements > G06V 20/10 Terrestrial scenes > G06V 20/13 Satellite images
    • G06F ELECTRIC DIGITAL DATA PROCESSING > G06F 18/00 Pattern recognition > G06F 18/20 Analysing > G06F 18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation > G06F 18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06F ELECTRIC DIGITAL DATA PROCESSING > G06F 18/00 Pattern recognition > G06F 18/20 Analysing > G06F 18/22 Matching criteria, e.g. proximity measures
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS > G06N 3/00 Computing arrangements based on biological models > G06N 3/02 Neural networks > G06N 3/08 Learning methods > G06N 3/084 Backpropagation, e.g. using gradient descent
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING > G06V 10/00 Arrangements for image or video recognition or understanding > G06V 10/20 Image preprocessing > G06V 10/26 Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion > G06V 10/267 Segmentation by performing operations on regions, e.g. growing, shrinking or watersheds
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING > G06V 10/00 Arrangements for image or video recognition or understanding > G06V 10/40 Extraction of image or video features


Abstract

The invention relates to a remote sensing image semantic segmentation method based on a W-divergence adversarial network, which introduces an adversarial training mechanism on top of the traditional U-net architecture to address the problem that conventional convolutional neural networks rely excessively on single-pixel accuracy losses and ignore contextual relationships. By improving the network structure of the model and adding a deconvolution layer, the invention restores the spatial resolution of the predicted target distribution; by introducing the Wasserstein divergence, it resolves the problem of the spatially discontinuous joint distribution of data from different sources and guarantees the continuity of the training gradient, thereby improving both the segmentation accuracy of the model and the stability of model training. All training and debugging work of the proposed semantic segmentation method is completed during the training stage, and after training the model can be used for fast prediction on new online samples.

Description

A remote sensing image semantic segmentation method based on a W-divergence adversarial network

Technical Field

The present invention relates to the field of image processing, and in particular to a remote sensing image semantic segmentation method based on a W-divergence (Wasserstein divergence) adversarial network.

Background Art

The generative adversarial network (GAN) is a recent class of deep learning model that performs well in applications such as natural image synthesis, image-to-image translation and style transfer. The GAN framework consists of a generator network and a discriminator network: the generator tries to fool the discriminator with forged data, while the discriminator must improve its own discrimination ability in order to separate the generated fake data from the real data. Through this adversarial training, the generator and the discriminator learn the true distribution of the sample data. Because GANs fit data distributions well, many studies have tried to introduce the GAN approach into traditional CNN semantic segmentation networks. These methods take the original CNN segmentation network as the generator of the adversarial network and set up a second, convolutional discriminator that compares the predicted segmentation map produced by the generator with the ground-truth label and returns the probability that the predicted segmentation map is real; this probability is used as the loss for adversarial training of the generator and the discriminator. Since the labels in a semantic segmentation task are discrete while the prediction maps produced by the generator are usually continuous, this structural difference makes the original GAN model perform poorly when fitting the mapping between continuous segmentation results and discrete labels: the JS divergence it uses suffers from vanishing gradients when the two distributions are orthogonal or overlap little, so that training cannot proceed normally.

Recently, Pix2pix combined the CGAN (Conditional GAN) framework with a U-net network to translate urban remote sensing images into maps. Unfortunately, on semantic segmentation evaluation metrics, the scores obtained by Pix2pix with the GAN framework are inferior to those of the basic U-net network.

The existing remote sensing image segmentation technology has the following deficiencies:

1. The original GAN model, which uses the JS divergence as its loss function, requires the learning abilities of the generator and the discriminator to be carefully balanced during training, otherwise one side suppresses the other and the model collapses. Current GAN segmentation methods such as Pix2pix still use the original GAN framework; when the generator performs a complex semantic segmentation task, the balance between generator and discriminator becomes even more delicate, which makes the model difficult to train stably. In addition, the JS divergence often suffers from vanishing gradients when dealing with the discrete labels of semantic segmentation tasks, so that training cannot proceed normally.

2. The discriminator of Pix2pix lacks the necessary up-sampling layer, which lowers the spatial resolution of the predicted distribution map and thereby loses location information of the segmentation targets. In addition, the discriminator loss uses a probability score to measure the realism of the predicted distribution map, which distorts the data to some extent and reduces segmentation accuracy.

Therefore, how to improve the segmentation accuracy of remote sensing images is an urgent problem in the field of remote sensing image processing.

Summary of the Invention

In view of the deficiencies of the prior art, the present invention proposes a remote sensing image semantic segmentation method based on a Wasserstein-divergence adversarial network, the method comprising:

Step 1: Establish a remote sensing image data set;

Step 2: Establish a remote sensing image semantic segmentation model, the semantic segmentation model comprising a generator network and a discriminator network;

Step 3: Train the remote sensing image semantic segmentation model; Fig. 4 is the training flowchart of the remote sensing image segmentation method of the present invention, and this step specifically comprises:

Step 31: Pair the RGB remote sensing images to be segmented in the training set with the corresponding binary-map labels, input one group of RGB remote sensing images and corresponding binary-map labels at a time into the input layer of the generator network, randomly crop them to the rated input size of the generator convolutional network, and keep the crops position-aligned;

Specifically, the RGB remote sensing image is randomly cropped to the rated input size of the generator network (256×256 by default), and the same operation is applied to the corresponding position of the binary-map label.

Step 32: The generator convolution stack performs feature extraction on the input RGB remote sensing image to be segmented and, once feature extraction is complete, outputs a first predicted segmentation map;

Step 33: Compare the first predicted segmentation map with the binary-map label and compute the L1 loss of the predicted distribution, expressed mathematically as:

$$\mathcal{L}_{L1}(G)=\mathbb{E}_{x,y}\big[\lVert y-G(x)\rVert_{1}\big]$$

where $\mathcal{L}_{L1}(G)$ denotes the L1 distance loss, y denotes the binary-map label, G(x) denotes the predicted segmentation map, and $\mathbb{E}$ denotes the average loss over one iteration.

The input unit of an iteration is a batch, and one batch may contain several images, so the range over which $\mathbb{E}$ is computed may differ between models.

Step 34: Stack the first predicted segmentation map generated by the generator with the corresponding RGB remote sensing image to be segmented, and likewise stack the binary-map label with the RGB remote sensing image to be segmented; input the two groups of stacked data in turn into the discriminator network, complete the discrimination, and output the Wasserstein divergence between the first predicted segmentation map and the binary-map label. The Wasserstein divergence is expressed mathematically as:

$$\mathcal{L}_{W}^{div}=\mathbb{E}_{x,y}\big[D(x,y)\big]-\mathbb{E}_{x}\big[D(x,G(x))\big]+k\,\mathbb{E}_{\hat{u}}\big[\lVert\nabla_{\hat{u}}D(\hat{u})\rVert^{p}\big]$$

where $\mathcal{L}_{W}^{div}$ denotes the Wasserstein divergence loss, x denotes the RGB remote sensing image to be segmented, y denotes the binary-map label, G(·) denotes the generator network output, D(·) denotes the discriminator network output, $\hat{u}$ denotes the samples on which the gradient-norm regularization term is evaluated, and k and p are the coefficient and the exponent of the regularization term, respectively. They are hyper-parameters; this method uses k = 0.001 and p = 3 by default.

Step 35: Take the Wasserstein divergence as the discriminator loss function and, through gradient-descent back-propagation, update the discriminator weight vector once;

Step 36: Negate the generator loss term of the Wasserstein divergence polynomial and form its weighted sum with the L1 loss (default weighting 1:100) as the generator loss function; through gradient-descent back-propagation, update the generator weight vector once. The optimal generator G* is described mathematically as:

$$G^{*}=\arg\min_{G}\max_{D}\ \mathcal{L}_{W}^{div}(G,D)+\lambda\,\mathcal{L}_{L1}(G)$$

where λ denotes the weight parameter.

Step 37: Repeat step 31 to step 36 until every sample of the training set has participated in one round of training;

Step 38: Repeat step 31 to step 37 until the number of iterations reaches the upper limit;

Step 39: Model pre-training is complete; save the pre-trained model.

Step 4: Test the remote sensing image semantic segmentation model.

According to a preferred embodiment, step 1 comprises:

Step 11: Evenly cut the high-resolution remote sensing images and the corresponding binary-map labels into sub-images whose size fits the available computing resources, and filter out samples in which the numbers of target and background pixels are unbalanced;

Step 12: Divide the cut sub-images into a training set and a validation set according to a set ratio, ensuring that the training set and the validation set do not overlap and share the same distribution.

According to a preferred embodiment, step 2 specifically comprises:

Step 21: Establish a generator network based on the U-net architecture, comprising an input layer and a convolution stack; the generator network is used to generate the predicted segmentation map.

Step 22: Establish a discriminator network based on the FCN architecture, comprising a convolution stack; the discriminator network is used for discrimination;

Step 23: Initialize the weight vectors of the generator network and the discriminator network.

According to a preferred embodiment, the method of testing the remote sensing image semantic segmentation model comprises:

Step 41: Input the RGB remote sensing images to be segmented in the validation set into the generator network of the pre-trained model to obtain second pre-segmented images;

Step 42: Input the binary-map labels corresponding to the RGB remote sensing images to be segmented into the generator network, and calculate the segmentation accuracy on the validation set by comparing the differences between the second pre-segmented images and the binary-map labels; the differences can be quantitatively estimated with parameters such as the DICE similarity coefficient and the IoU (intersection over union);

Step 43: If the segmentation accuracy on the validation set does not reach the set value, perform step 2 to step 4 to adjust parameters and improve the segmentation accuracy.

The beneficial effects of the present invention are:

1. The present invention uses the Wasserstein distance in divergence form in place of the JS divergence of traditional GAN methods. On the one hand, this solves the vanishing-gradient problem caused by the JS divergence; on the other hand, it avoids the Lipschitz constraint that the optimization of the traditional Wasserstein distance has to satisfy. It thereby solves the problem of the spatially discontinuous joint distribution of different-source data (the remote sensing images and the binary-map labels) and guarantees the stability of the training gradient.

2. The present invention adds a deconvolution layer after the down-sampling convolution stack of the discriminator. Unlike the JS divergence, which reflects prediction accuracy by returning a probability score or probability distribution map obtained by pooling over many pixels, the Wasserstein distance is computed with real values, so our method does not need to compress the spatial resolution of the discriminator output. The deconvolution layer is therefore added to restore the spatial resolution of the segmented target distribution and reduce the loss of location information.

Brief Description of the Drawings

Fig. 1 is the flowchart of the remote sensing image segmentation method of the present invention;

Fig. 2 is the network structure diagram of the generator of the present invention;

Fig. 3 is the network structure diagram of the discriminator of the present invention;

Fig. 4 is the training flowchart of the remote sensing image segmentation method of the present invention;

Fig. 5 is an effect comparison diagram of one example of the present invention; and

Fig. 6 is an effect comparison diagram of another example of the present invention.

Detailed Description of the Embodiments

To make the purpose, technical solution and advantages of the present invention clearer, the present invention is described in further detail below in combination with specific embodiments and with reference to the accompanying drawings. It should be understood that these descriptions are exemplary only and are not intended to limit the scope of the present invention. Furthermore, in the following description, descriptions of well-known structures and techniques are omitted to avoid unnecessarily obscuring the concepts of the present invention.

A detailed description is given below in conjunction with the accompanying drawings.

The invention mainly addresses the lack of contextual relationships in traditional convolutional neural network (CNN) methods for remote sensing image semantic segmentation. High-resolution remote sensing images contain rich ground information; compared with the semantic segmentation of natural images, the segmentation targets of remote sensing images usually show larger intra-class differences and more severe inter-class interference, which places higher demands on the semantic understanding ability of the neural network. Traditional CNN methods rely too heavily on the classification of individual pixels and ignore the internal integrity of target instances and the relationships between classes, which leads to mis-segmentation and mis-recognition. The traditional PatchGAN obtains contextual consistency inside a pixel block by compressing the spatial resolution, summarizing the pixel values of a block whose diameter is smaller than the segmentation target into one probability score; in practice this is found to blur the segmentation result. After replacing the JS divergence with the Wasserstein divergence, the present invention obtains better results in experiments by using a deconvolution layer to restore the spatial resolution.

In order to introduce higher-order contextual relationships into the CNN network and improve its segmentation performance, the present invention proposes a segmentation method based on a generative adversarial network (GAN) with Wasserstein divergence.

Fig. 1 is the flowchart of the segmentation method of the present invention. The method of the present invention is now described in detail with reference to Fig. 1. The segmentation method of the present invention comprises:

Step 1: Establish a remote sensing image data set, which specifically comprises:

Step 11: Evenly cut the high-resolution remote sensing images and the corresponding binary-map labels into sub-images whose size fits the available computing resources (600×600 by default), and filter out samples in which the numbers of target and background pixels are unbalanced. The purpose of the filtering is to prevent the pixel classification result of the model from being biased toward the class that occupies the larger share of pixels.

Step 12: Divide the cut sub-images into a training set and a validation set according to a set ratio, ensuring that the training set and the validation set do not overlap and share the same distribution. The ratio of training set to validation set is generally 3:1.
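
A minimal data-preparation sketch for Steps 11 and 12 is given below (the helper and parameter names are illustrative assumptions, not taken from the patent): it tiles a large remote sensing image and its binary label into 600×600 sub-images, drops tiles whose target/background pixel ratio is too unbalanced, and splits the remainder roughly 3:1.

import random
import numpy as np

def build_dataset(image, label, tile=600, min_fg=0.05, max_fg=0.95, train_ratio=0.75):
    """image: HxWx3 uint8 array; label: HxW array with 0 = background, 1 = target."""
    h, w = label.shape
    samples = []
    for top in range(0, h - tile + 1, tile):         # even, non-overlapping tiling
        for left in range(0, w - tile + 1, tile):
            img_t = image[top:top + tile, left:left + tile]
            lab_t = label[top:top + tile, left:left + tile]
            fg = float(lab_t.mean())                  # fraction of target pixels
            if min_fg <= fg <= max_fg:                # filter unbalanced samples
                samples.append((img_t, lab_t))
    random.shuffle(samples)
    n_train = int(len(samples) * train_ratio)         # train : val roughly 3 : 1
    return samples[:n_train], samples[n_train:]       # non-overlapping split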

Step 2: Establish a remote sensing image semantic segmentation model, the semantic segmentation model comprising a generator network and a discriminator network, which specifically comprises:

Step 21: Establish a generator network based on the U-net architecture, comprising an input layer and a convolution stack; the generator network is used to generate the predicted segmentation map. Fig. 2 is the network structure diagram of the generator of the present invention, and the specific structure is shown in Fig. 2. The convolution stack consists of convolution layers and the corresponding deconvolution layers executed in sequence; the dotted lines between the convolution layers and the deconvolution layers denote skip-connection layers, whose role is to fuse the features extracted by different convolution layers. Skip connections are currently used in two ways, addition and stacking. The present invention connects the mirrored layers by stacking (channel concatenation), mainly to strengthen spatial alignment and to prevent vanishing gradients.
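
A minimal PyTorch sketch of a U-net-style generator with stacked (concatenated) skip connections follows; the depth, channel widths and activations are illustrative assumptions rather than the exact configuration of the patent.

import torch
import torch.nn as nn

class UNetGenerator(nn.Module):
    def __init__(self, in_ch=3, out_ch=1, base=64):
        super().__init__()
        self.down1 = nn.Sequential(nn.Conv2d(in_ch, base, 4, 2, 1), nn.LeakyReLU(0.2))
        self.down2 = nn.Sequential(nn.Conv2d(base, base * 2, 4, 2, 1),
                                   nn.BatchNorm2d(base * 2), nn.LeakyReLU(0.2))
        self.down3 = nn.Sequential(nn.Conv2d(base * 2, base * 4, 4, 2, 1),
                                   nn.BatchNorm2d(base * 4), nn.LeakyReLU(0.2))
        self.up3 = nn.Sequential(nn.ConvTranspose2d(base * 4, base * 2, 4, 2, 1),
                                 nn.BatchNorm2d(base * 2), nn.ReLU())
        self.up2 = nn.Sequential(nn.ConvTranspose2d(base * 4, base, 4, 2, 1),
                                 nn.BatchNorm2d(base), nn.ReLU())
        self.up1 = nn.Sequential(nn.ConvTranspose2d(base * 2, out_ch, 4, 2, 1), nn.Sigmoid())

    def forward(self, x):
        d1 = self.down1(x)                     # 256 -> 128
        d2 = self.down2(d1)                    # 128 -> 64
        d3 = self.down3(d2)                    # 64  -> 32
        u3 = self.up3(d3)                      # 32  -> 64
        u2 = self.up2(torch.cat([u3, d2], 1))  # skip connection by channel stacking
        return self.up1(torch.cat([u2, d1], 1))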

Step 22: Establish a discriminator network based on the FCN architecture, comprising a convolution stack; the discriminator network is used to compute the Wasserstein divergence loss of the predicted segmentation map generated by the generator. Fig. 3 is the network structure diagram of the discriminator of the present invention; as shown in Fig. 3, the discriminator consists of several convolution layers executed in sequence followed by one deconvolution layer. The present invention adds a deconvolution layer after the down-sampling convolution stack. Unlike the JS divergence, which reflects prediction accuracy through the probability score or probability distribution map of a pooled pixel block, the Wasserstein distance is computed with the real value of each individual pixel, so our method does not need to compress the spatial resolution of the discriminator output; the deconvolution layer is therefore added to restore the spatial resolution of the segmented target distribution and reduce the loss of location information.
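
A rough PyTorch sketch of such an FCN-style discriminator follows: a down-sampling convolution stack followed by a single deconvolution layer, with a real-valued output map and no sigmoid, as the Wasserstein-divergence loss requires. The input has four channels because the RGB image is stacked with the label or the predicted map; depths and widths are assumptions.

import torch
import torch.nn as nn

class WDivDiscriminator(nn.Module):
    def __init__(self, in_ch=4, base=64):     # 3 RGB channels + 1 label/prediction channel
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(in_ch, base, 4, 2, 1), nn.LeakyReLU(0.2),
            nn.Conv2d(base, base * 2, 4, 2, 1), nn.InstanceNorm2d(base * 2), nn.LeakyReLU(0.2),
            nn.Conv2d(base * 2, base * 4, 4, 2, 1), nn.InstanceNorm2d(base * 4), nn.LeakyReLU(0.2),
        )
        # single deconvolution layer that restores part of the spatial resolution
        self.deconv = nn.ConvTranspose2d(base * 4, 1, 4, 2, 1)

    def forward(self, image, mask):
        x = torch.cat([image, mask], dim=1)    # stack RGB image with label or prediction
        return self.deconv(self.features(x))  # real-valued score map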

Step 23: Initialize the weight vectors of the generator network and the discriminator network;

Step 3: Train the remote sensing image semantic segmentation model; Fig. 4 is the training flowchart of the remote sensing image segmentation method of the present invention, and this step specifically comprises:

Step 31: Pair the RGB remote sensing images to be segmented in the training set with the corresponding binary-map labels, input one group of RGB remote sensing images and corresponding binary-map labels at a time into the input layer of the generator network, randomly crop them to the rated input size of the generator convolutional network, and keep the crops position-aligned;

Specifically, the RGB remote sensing image is randomly cropped to the rated input size of the generator network (256×256 by default), and the same operation is applied to the corresponding position of the binary-map label.
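
A short sketch of this paired random crop follows (the function name is illustrative): the identical crop window is applied to both the image and the label so the pair stays position-aligned.

import random

def paired_random_crop(image, label, size=256):
    """image: HxWx3 array, label: HxW array with the same H and W."""
    h, w = label.shape[:2]
    top = random.randint(0, h - size)
    left = random.randint(0, w - size)
    return (image[top:top + size, left:left + size],
            label[top:top + size, left:left + size])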

Step 32: The generator convolution stack performs feature extraction on the input RGB remote sensing image to be segmented and, once feature extraction is complete, outputs a first predicted segmentation map;

Step 33: Compare the first predicted segmentation map with the binary-map label and compute the L1 loss of the predicted distribution, expressed mathematically as:

$$\mathcal{L}_{L1}(G)=\mathbb{E}_{x,y}\big[\lVert y-G(x)\rVert_{1}\big]$$

where $\mathcal{L}_{L1}(G)$ denotes the L1 distance loss, y denotes the binary-map label, G(x) denotes the predicted segmentation map, and $\mathbb{E}$ denotes the average loss over one iteration.

The input unit of an iteration is a batch, and one batch may contain several images, so the range over which $\mathbb{E}$ is computed may differ between models.
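
A one-line PyTorch sketch of this L1 term for one batch follows; the mean is taken over all pixels of the batch, matching the "average loss per iteration" above.

import torch

def l1_loss(pred, target):
    """pred: generator output G(x), target: binary-map label y, both N x 1 x H x W."""
    return torch.mean(torch.abs(target - pred))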

Step 34: Stack the first predicted segmentation map generated by the generator with the corresponding RGB remote sensing image to be segmented, and likewise stack the binary-map label with the RGB remote sensing image to be segmented; input the two groups of stacked data in turn into the discriminator network, complete the discrimination, and output the Wasserstein divergence between the first predicted segmentation map and the binary-map label. The Wasserstein divergence is expressed mathematically as:

$$\mathcal{L}_{W}^{div}=\mathbb{E}_{x,y}\big[D(x,y)\big]-\mathbb{E}_{x}\big[D(x,G(x))\big]+k\,\mathbb{E}_{\hat{u}}\big[\lVert\nabla_{\hat{u}}D(\hat{u})\rVert^{p}\big]$$

where $\mathcal{L}_{W}^{div}$ denotes the Wasserstein divergence loss, x denotes the RGB remote sensing image to be segmented, y denotes the binary-map label, G(·) denotes the generator network output, D(·) denotes the discriminator network output, $\hat{u}$ denotes the samples on which the gradient-norm regularization term is evaluated, and k and p are the coefficient and the exponent of the regularization term, respectively. They are hyper-parameters; this method uses k = 0.001 and p = 3 by default.
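
A PyTorch sketch of the corresponding discriminator loss follows. The two adversarial terms follow the formula above; for the gradient-norm regularizer, the sketch assumes the common WGAN-div choice of evaluating the gradient of D on the real and generated stacked pairs, which the patent text does not pin down, and it uses the stated defaults k = 0.001 and p = 3.

import torch

def grad_norm(D, samples):
    samples = samples.detach().requires_grad_(True)
    grads = torch.autograd.grad(D(samples).sum(), samples, create_graph=True)[0]
    return grads.flatten(1).norm(2, dim=1)            # per-sample gradient L2 norm

def wdiv_d_loss(D, x, y, g_out, k=0.001, p=3):
    real = torch.cat([x, y], dim=1)                   # stack image with label
    fake = torch.cat([x, g_out.detach()], dim=1)      # stack image with prediction
    adversarial = D(fake).mean() - D(real).mean()     # push real pairs up, fake pairs down
    penalty = k * (grad_norm(D, real).pow(p).mean()
                   + grad_norm(D, fake).pow(p).mean()) / 2
    return adversarial + penalty                      # minimized by the discriminator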

Step 35: Take the Wasserstein divergence as the discriminator loss function and, through gradient-descent back-propagation, update the discriminator weight vector once;

Step 36: Negate the generator loss term of the Wasserstein divergence polynomial and form its weighted sum with the L1 loss (default weighting 1:100) as the generator loss function; through gradient-descent back-propagation, update the generator weight vector once. The optimal generator G* is described mathematically as:

$$G^{*}=\arg\min_{G}\max_{D}\ \mathcal{L}_{W}^{div}(G,D)+\lambda\,\mathcal{L}_{L1}(G)$$

where λ denotes the weight parameter.
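
A sketch of the generator loss follows: the generator term of the Wasserstein divergence is negated and combined with the L1 loss at the stated default weighting of 1:100 (i.e. lambda = 100). It reuses the l1_loss helper sketched above; the function name is an assumption.

import torch

def wdiv_g_loss(D, x, y, g_out, lam=100.0):
    fake = torch.cat([x, g_out], dim=1)      # no detach here: gradients must reach G
    adversarial = -D(fake).mean()            # negated generator term of the divergence
    return adversarial + lam * l1_loss(g_out, y)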

Step 37: Repeat step 31 to step 36 until every sample of the training set has participated in one round of training;

Step 38: Repeat step 31 to step 37 until the number of iterations reaches the upper limit;

Step 39: Model pre-training is complete; save the pre-trained model.
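
A skeleton of the alternating updates of Steps 35 to 39 follows, using the loss sketches above. The optimizer type and its hyper-parameters are assumptions and are not specified by the patent.

import torch

def train(G, D, loader, epochs, device="cuda"):
    opt_g = torch.optim.Adam(G.parameters(), lr=2e-4, betas=(0.5, 0.999))
    opt_d = torch.optim.Adam(D.parameters(), lr=2e-4, betas=(0.5, 0.999))
    for _ in range(epochs):                           # Step 38: up to the iteration limit
        for x, y in loader:                           # Step 37: every training sample once
            x, y = x.to(device), y.to(device)
            g_out = G(x)
            opt_d.zero_grad()                         # Step 35: one discriminator update
            wdiv_d_loss(D, x, y, g_out).backward()
            opt_d.step()
            opt_g.zero_grad()                         # Step 36: one generator update
            wdiv_g_loss(D, x, y, g_out).backward()
            opt_g.step()
    torch.save(G.state_dict(), "pretrained_generator.pth")   # Step 39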

Step 4: Test the remote sensing image semantic segmentation model, which specifically comprises:

Step 41: Input the RGB remote sensing images to be segmented in the validation set into the generator network of the pre-trained model to obtain second pre-segmented images;

Step 42: Input the binary-map labels corresponding to the RGB remote sensing images to be segmented into the generator network, and calculate the segmentation accuracy on the validation set by comparing the differences between the second pre-segmented images and the binary-map labels; the differences can be quantitatively estimated with parameters such as the DICE similarity coefficient and the IoU (intersection over union);
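
A short sketch of this quantitative evaluation follows: DICE similarity coefficient and IoU between the binarized prediction and the label (the 0.5 threshold is an assumption).

import numpy as np

def dice_and_iou(pred, label, thr=0.5):
    """pred: predicted map in [0, 1], label: binary ground truth, same shape."""
    p = (pred >= thr).astype(np.float64)
    t = (label > 0).astype(np.float64)
    inter = (p * t).sum()
    dice = 2 * inter / (p.sum() + t.sum() + 1e-8)
    iou = inter / (p.sum() + t.sum() - inter + 1e-8)
    return dice, iou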

Step 43: If the segmentation accuracy on the validation set does not reach the set value, perform step 2 to step 4 to adjust parameters and improve the segmentation accuracy.

The method of the present invention further comprises: when the segmentation accuracy on the validation set reaches the set value, the method is applied online, that is, new samples to be segmented are input into the pre-trained generator network and the predicted segmentation results are output.

The parameter-tuning work of the present invention falls into two aspects: 1. improving accuracy by adjusting the network architecture, replacing the loss function and adjusting the network structure; 2. observing the convergence of the loss curve, together with the visual quality of the pre-segmented images and the validation accuracy, in order to adjust hyper-parameters, normalization, standardization, activation-function settings and other operations and to eliminate anomalies.

The purpose of the present invention is to improve the performance of the segmentation model by introducing an adversarial network architecture into a traditional convolutional network. To verify the feasibility of the proposed method, the following comparative experiments were carried out with U-net as an example.

Two groups of experiments were set up: the basic U-net segmentation network and U-net + WGAN-div (the method of the present invention). The parameters, the training and validation sets and the number of training iterations were identical in both experiments. Compared with the basic U-net, the proposed method not only achieves a considerable improvement in the accuracy metrics but also shows better visual results. The objective evaluation metrics used in the present invention are the DICE similarity coefficient and the IoU. The experimental results are as follows: the DICE value of the proposed method is 0.844, the IoU value is 0.729 and the per-pixel accuracy is 0.951, while the DICE value of the basic U-net segmentation result is 0.796, the IoU value is 0.663 and the per-pixel accuracy is 0.933.

In Fig. 5 and Fig. 6, from left to right, are the original RGB remote sensing image, the basic U-net segmentation result, the segmentation result of the method of the present invention and the ground-truth binary-map label. Fig. 5 clearly shows that the GAN method has better intra-class consistency in the segmentation of individual instances, whereas the basic U-net misses regions when building surfaces are not smooth. Fig. 6 shows that the basic U-net is easily disturbed by open areas such as parking lots when segmenting buildings, which leads to mis-recognition, while the method of the present invention is comparatively more robust.

The main purpose of the present invention is to remedy the lack of contextual relations in semantic segmentation by the convolutional neural networks represented by the traditional U-net. A traditional convolutional network turns the semantic segmentation task into a pixel-by-pixel classification task, so that the classification of a single pixel is detached from its semantic environment. In fact, segmentation targets are continuous: if a pixel is judged to be a target, the probability that its neighborhood also belongs to the target should be higher than for other regions. Most current improvements to semantic segmentation revolve around this problem of contextual relations, and adversarial networks are among the less-used approaches. Traditional adversarial networks based on the JS divergence mainly improve contextual consistency by compressing the spatial resolution of the image; for example, with three down-sampling convolution layers, one probability score can represent the pixel values inside an 8×8 region, but a side effect is that the high-frequency parts of the segmentation result (such as boundary contours) may become blurred. In our experiments the traditional adversarial network method handled the mapping from continuous RGB images to discrete binarized labels poorly and suffered from vanishing gradients, so its segmentation results were not good; this is one reason why adversarial networks are used less often. The innovation of this work is to replace the JS divergence with the Wasserstein divergence to avoid the vanishing-gradient problem of traditional adversarial network methods, and to introduce a deconvolution layer that retains more spatial resolution and improves segmentation accuracy.

It should be noted that the above specific embodiments are exemplary, and that those skilled in the art can devise various solutions inspired by the disclosure of the present invention; such solutions also belong to the scope of the disclosure and fall within the scope of protection of the present invention. Those skilled in the art should understand that the description and the drawings of the present invention are illustrative and do not limit the claims. The scope of protection of the present invention is defined by the claims and their equivalents.

Claims (3)

1. A remote sensing image semantic segmentation method based on a W-divergence adversarial network, characterized in that the method comprises:

Step 1: establishing a remote sensing image data set;

Step 11: evenly cutting high-resolution remote sensing images and the corresponding binary-map labels into sub-images whose size fits the available computing resources, and filtering out samples in which the numbers of target and background pixels are unbalanced;

Step 12: dividing the cut sub-images into a training set and a validation set according to a set ratio, ensuring that the training set and the validation set do not overlap and share the same distribution;

Step 2: establishing a remote sensing image semantic segmentation model, the semantic segmentation model comprising a generator network and a discriminator network;

Step 3: training the remote sensing image semantic segmentation model, specifically comprising:

Step 31: pairing the RGB remote sensing images to be segmented in the training set with the corresponding binary-map labels, inputting one group of RGB remote sensing images and corresponding binary-map labels at a time into the input layer of the generator network, randomly cropping them to the rated input size of the generator convolutional network, and keeping the crops position-aligned;

Step 32: performing, by the generator convolution stack, feature extraction on the input RGB remote sensing image to be segmented, and outputting a first predicted segmentation map once feature extraction is complete;

Step 33: comparing the first predicted segmentation map with the binary-map label and computing the L1 loss of the predicted distribution, expressed mathematically as:

$$\mathcal{L}_{L1}(G)=\mathbb{E}_{x,y}\big[\lVert y-G(x)\rVert_{1}\big]$$

wherein $\mathcal{L}_{L1}(G)$ denotes the L1 distance loss, y denotes the binary-map label, G(x) denotes the predicted segmentation map, and $\mathbb{E}$ denotes the average loss over one iteration;

Step 34: stacking the first predicted segmentation map generated by the generator with the corresponding RGB remote sensing image to be segmented, likewise stacking the binary-map label with the RGB remote sensing image to be segmented, inputting the two groups of stacked data in turn into the discriminator network, completing the discrimination and outputting the Wasserstein divergence of the first predicted segmentation map and the binary-map label, the Wasserstein divergence being expressed as:

$$\mathcal{L}_{W}^{div}=\mathbb{E}_{x,y}\big[D(x,y)\big]-\mathbb{E}_{x}\big[D(x,G(x))\big]+k\,\mathbb{E}_{\hat{u}}\big[\lVert\nabla_{\hat{u}}D(\hat{u})\rVert^{p}\big]$$

wherein $\mathcal{L}_{W}^{div}$ denotes the Wasserstein divergence loss, x denotes the RGB remote sensing image to be segmented, y denotes the binary-map label, G(·) denotes the generator network output, D(·) denotes the discriminator network output, $\hat{u}$ denotes the samples on which the gradient-norm regularization term is evaluated, and k and p are the coefficient and the exponent of the regularization term, respectively;

Step 35: taking the Wasserstein divergence as the discriminator loss function and updating the discriminator weight vector once through gradient-descent back-propagation;

Step 36: negating the generator loss term of the Wasserstein divergence polynomial and forming its weighted sum with the L1 loss as the generator loss function, and updating the generator weight vector once through gradient-descent back-propagation, the optimal generator G* being described as:

$$G^{*}=\arg\min_{G}\max_{D}\ \mathcal{L}_{W}^{div}(G,D)+\lambda\,\mathcal{L}_{L1}(G)$$

wherein λ denotes the weight parameter;

Step 37: repeating step 31 to step 36 until every sample of the training set has participated in one round of training;

Step 38: repeating step 31 to step 37 until the number of iterations reaches the upper limit;

Step 39: completing model pre-training and saving the pre-trained model;

Step 4: testing the remote sensing image semantic segmentation model.

2. The remote sensing image semantic segmentation method according to claim 1, characterized in that step 2 specifically comprises:

Step 21: establishing a generator network based on the U-net architecture, comprising an input layer and a convolution stack, the generator network being used to generate the first predicted segmentation map;

Step 22: establishing a discriminator network based on the FCN architecture, comprising a convolution stack, the discriminator network being used to compute the Wasserstein divergence loss of the predicted segmentation map generated by the generator;

Step 23: initializing the weight vectors of the generator network and the discriminator network.

3. The remote sensing image semantic segmentation method according to claim 1, characterized in that the method of testing the remote sensing image semantic segmentation model comprises:

Step 41: inputting the RGB remote sensing images to be segmented in the validation set into the generator network of the pre-trained model to obtain second pre-segmented images;

Step 42: inputting the binary-map labels corresponding to the RGB remote sensing images to be segmented into the generator network, and calculating the segmentation accuracy on the validation set by comparing the differences between the second pre-segmented images and the binary-map labels, the differences being quantitatively estimated with the DICE similarity coefficient and the IoU (intersection over union) parameters;

Step 43: if the segmentation accuracy on the validation set does not reach the set value, performing step 2 to step 4 to adjust parameters and improve the segmentation accuracy.
CN202110053047.4A 2021-01-15 2021-01-15 Remote sensing image semantic segmentation method based on W divergence countermeasure network Active CN112733756B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110053047.4A CN112733756B (en) 2021-01-15 2021-01-15 Remote sensing image semantic segmentation method based on W divergence countermeasure network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110053047.4A CN112733756B (en) 2021-01-15 2021-01-15 Remote sensing image semantic segmentation method based on W divergence countermeasure network

Publications (2)

Publication Number Publication Date
CN112733756A CN112733756A (en) 2021-04-30
CN112733756B true CN112733756B (en) 2023-01-20

Family

ID=75593294

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110053047.4A Active CN112733756B (en) 2021-01-15 2021-01-15 Remote sensing image semantic segmentation method based on W divergence countermeasure network

Country Status (1)

Country Link
CN (1) CN112733756B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114331859B (en) * 2021-07-15 2024-12-20 西安科技大学 Image deblurring method based on improved generative adversarial network
CN113657538B (en) * 2021-08-24 2024-09-10 北京百度网讯科技有限公司 Model training and data classification method, device, equipment, storage medium and product
CN114359120B (en) * 2022-03-21 2022-06-21 深圳市华付信息技术有限公司 Remote sensing image processing method, device, equipment and storage medium


Family Cites Families (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10825219B2 (en) * 2018-03-22 2020-11-03 Northeastern University Segmentation guided image generation with adversarial networks
CN108830209B (en) * 2018-06-08 2021-12-17 西安电子科技大学 Remote sensing image road extraction method based on generation countermeasure network
CN109120652A (en) * 2018-11-09 2019-01-01 重庆邮电大学 It is predicted based on difference WGAN network safety situation
US20200342361A1 (en) * 2019-04-29 2020-10-29 International Business Machines Corporation Wasserstein barycenter model ensembling
US11048974B2 (en) * 2019-05-06 2021-06-29 Agora Lab, Inc. Effective structure keeping for generative adversarial networks for single image super resolution
US11636332B2 (en) * 2019-07-09 2023-04-25 Baidu Usa Llc Systems and methods for defense against adversarial attacks using feature scattering-based adversarial training
CN110827297A (en) * 2019-11-04 2020-02-21 中国科学院自动化研究所 Insulator segmentation method for generating countermeasure network based on improved conditions
CN111985532B (en) * 2020-07-10 2021-11-09 西安理工大学 Scene-level context-aware emotion recognition deep network method
CN112070209B (en) * 2020-08-13 2022-07-22 河北大学 Stable controllable image generation model training method based on W distance
CN112163401B (en) * 2020-10-22 2023-05-30 大连民族大学 Chinese character font generation method based on compression and excitation GAN network

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109635748A (en) * 2018-12-14 2019-04-16 中国公路工程咨询集团有限公司 The extracting method of roadway characteristic in high resolution image
CN111080645A (en) * 2019-11-12 2020-04-28 中国矿业大学 Semi-supervised semantic segmentation of remote sensing images based on generative adversarial networks
CN111080723A (en) * 2019-12-17 2020-04-28 易诚高科(大连)科技有限公司 Image element segmentation method based on Unet network
CN111783782A (en) * 2020-05-29 2020-10-16 河海大学 Fusion and improvement of UNet and SegNet for semantic segmentation of remote sensing images
CN111898507A (en) * 2020-07-22 2020-11-06 武汉大学 A deep learning method for predicting land cover categories in unlabeled remote sensing images

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Remote Sensing Image Segmentation based on Generative Adversarial Network with Wasserstein divergence; Xia Cao et al.; 2020 3rd International Conference on Algorithms, Computing and Artificial Intelligence (ACAI 2020); 2020-12-24; pages 2-5, sections 2-3 and figures 1-4 *
Research on SAR Image Generation Based on GAN; Wang Leilei (王雷雷); China Master's Theses Full-text Database, Information Science and Technology Series; 2019-11-15 (No. 11); page A006-724 *

Also Published As

Publication number Publication date
CN112733756A (en) 2021-04-30


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant