CN113505792B

CN113505792B - Multi-scale semantic segmentation method and model for unbalanced remote sensing image

Info

Publication number: CN113505792B
Application number: CN202110739174.XA
Authority: CN
Inventors: 聂婕; 王成龙; 魏志强; 时津津; 叶敏; 陈昊
Original assignee: Ocean University of China
Current assignee: Ocean University of China
Priority date: 2021-06-30
Filing date: 2021-06-30
Publication date: 2023-10-27
Anticipated expiration: 2041-06-30
Also published as: CN113505792A

Abstract

The invention discloses a multi-scale semantic segmentation method and a multi-scale semantic segmentation model for unbalanced remote sensing images. The model adopts a multi-level semantic segmentation network that learns fine-grained local features, retaining small-class information, and also learns global contextual semantic features, retaining large-scale information. The network architecture is divided into three levels; each level uses a different network structure to extract features at a different scale and outputs a segmentation map at a different resolution. These outputs are merged at the same level by Bayesian post-fusion of the images, which fuses the multi-scale segmentation information and lets the levels complement each other's missing information. The method adopts an optimization algorithm that separates pixels of different classes further and aggregates pixels of the same class more tightly, so that the semantic segmentation network model achieves uniform segmentation on class-imbalanced data.

Description

Multi-scale semantic segmentation method and model for unbalanced remote sensing images

Technical field

The invention belongs to the technical field of remote sensing image processing, and in particular relates to a multi-scale semantic segmentation method and model for unbalanced remote sensing images.

Background

With the development of earth observation technology and advances in image acquisition, remote sensing images provide massive research data for earth observation and discovery. Analyzing the content of remote sensing images with image processing and artificial intelligence techniques is an effective way to fully mine this data. The main approaches include scene classification, target recognition, and semantic segmentation. Among them, semantic segmentation is one of the key techniques for content analysis of remote sensing images: it segments the targets and regions in an image by inferring the semantic class of each pixel.

Current image semantic segmentation methods are based on deep learning. Classic deep learning segmentation networks include the fully convolutional network (FCN), SegNet, and U-Net. By using only convolutional layers, FCN was the first end-to-end segmentation network; it accepts input images of any size and outputs segmentation maps of the same size. However, it does not fully combine contextual information with pixel-to-pixel correlations, so its segmentation accuracy is insufficient. U-Net concatenates features along the channel dimension and is suitable for segmenting multi-scale and large images, but it demands considerable computing power and is relatively slow. SegNet reuses, in the decoding stage, the max-pooling indices saved in the encoding stage, which improves memory utilization and segmentation efficiency; however, pooling low-resolution feature maps ignores information from neighboring pixels and loses accuracy. The Pyramid Scene Parsing Network (PSPNet) learns multi-level features with a pyramid pooling module and can exploit global context information, but it does not make full use of the information in the entire scene.

Because remote sensing images differ significantly from ordinary images in resolution, spatial structure, and semantics, traditional methods and ordinary neural network methods struggle to segment them efficiently. Semantic segmentation of remote sensing images still faces the following challenges:

First, the semantic classes handled by existing natural-image segmentation methods are relatively balanced; these methods do not account for the fact that, in remote sensing images, a single class can occupy a large proportion of the image. Because real physical entities are distributed unevenly over the earth's surface, the foreground and background of remote sensing images are unbalanced, and objects of different classes differ greatly in scale. Second, deep learning segmentation models designed for natural images are insensitive to changes in object scale, so applying them directly to remote sensing images loses pixel accuracy. Compared with classes such as land and lakes, classes such as vehicles occupy an almost negligible area in remote sensing images, so object scales vary widely. Previous semantic segmentation methods are therefore not suitable for direct application to remote sensing images, and segmentation algorithms need to be designed around the characteristics of such images.

Because remote sensing images are acquired in diverse ways and the data differ in nature from natural images, their semantic segmentation cannot be solved with a single earlier approach. Objects in remote sensing images vary across multiple scales; large-scale objects dominate the segmentation and suppress the learning of small-scale objects, making small classes hard to identify. In addition, because remote sensing images are high-resolution, the information they contain is usually very dense, which leads to an unbalanced class distribution.

Summary of the invention

In view of the shortcomings of the prior art, the present invention provides a multi-scale semantic segmentation method and model for unbalanced remote sensing images. The technical problems addressed are: (1) the large differences in object scale in remote sensing images; (2) the unbalanced class distribution of remote sensing images. For the first problem, the invention proposes a multi-scale semantic segmentation model for unbalanced remote sensing images. It designs a multi-level semantic segmentation network that extracts features at different scales and fuses them at the same level so that missing information is complemented. This makes full use of global context information while retaining local image detail, overcomes the mutual interference of objects at different scales, and improves the robustness and accuracy of remote sensing image segmentation. For the second problem, the invention designs the algorithm from two aspects: 1) an inter-class loss function that maximizes the distance between samples of different classes; 2) a class-weight balanced distribution loss function that addresses the imbalance between positive and negative samples across all classes.

To solve the above technical problems, the technical solutions adopted by the invention are described in detail as follows:

First, the invention provides a multi-scale semantic segmentation model for unbalanced remote sensing images. It adopts a multi-level semantic segmentation network that learns fine-grained local features, retaining small-class information, and also learns global contextual semantic features, retaining large-scale information. The network architecture is divided into three levels; each level uses a different network structure to extract features at a different scale and outputs a segmentation map at a different resolution. These outputs are merged at the same level by Bayesian post-fusion of the images, which fuses the multi-scale segmentation information and complements missing information.

Further, the first level (Level 1) of the multi-level semantic segmentation network model uses data at the original resolution, the second level (Level 2) uses data downsampled by a factor of 2, and the third level (Level 3) uses data downsampled by a factor of 4;

The backbone network of the multi-level semantic segmentation network model is the SegNet semantic segmentation network. The left side of the network is the encoder, which consists of five convolution-and-pooling stages: each of the first two stages contains two convolutional layers, and each of the last three stages contains three convolutional layers;

The right side of the network is the decoder, which consists of five upsampling-and-convolution stages: the first and fourth stages each have one upsampling layer followed by two convolutional layers, the second and third stages each have one upsampling layer followed by three convolutional layers, and the fifth stage has one upsampling layer followed by two convolutional layers and a final Softmax layer.
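The stage layout described above can be written down as a small configuration. The sketch below only enumerates the layers in order and counts them; the 2x2 pooling window with stride 2 (and hence the overall factor of 32) is an assumption taken from the standard SegNet design, and the names `ENCODER_CONVS`, `DECODER_CONVS`, and `layer_sequence` are hypothetical.

```python
# Convolution counts per stage, as stated in the text.
ENCODER_CONVS = [2, 2, 3, 3, 3]   # five conv + max-pool stages
DECODER_CONVS = [2, 3, 3, 2, 2]   # five upsample + conv stages


def layer_sequence():
    """Return the ordered list of layer names for one encoder-decoder network."""
    layers = []
    for stage, n_convs in enumerate(ENCODER_CONVS, start=1):
        layers += [f"enc{stage}_conv{k}" for k in range(1, n_convs + 1)]
        layers.append(f"enc{stage}_pool")
    for stage, n_convs in enumerate(DECODER_CONVS, start=1):
        layers.append(f"dec{stage}_upsample")
        layers += [f"dec{stage}_conv{k}" for k in range(1, n_convs + 1)]
    layers.append("softmax")
    return layers


LAYERS = layer_sequence()
NUM_CONVS = sum("conv" in name for name in LAYERS)
# Assumption: each pooling stage halves the spatial size (2x2 window, stride 2).
TOTAL_DOWNSAMPLING = 2 ** len(ENCODER_CONVS)
```

Each `encN_pool` entry pairs with the `decM_upsample` entry at the mirrored position, which is where the saved pooling indices would be reused.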

Further, each stage of the encoder corresponds one-to-one with a stage of the decoder. Each upsampling layer in the decoder corresponds to the max-pooling layer of the encoder at the same level, and it upsamples the feature map using the indices saved during max pooling. This reproduces the features learned for classification in the encoding stage and produces dense feature maps. Finally, these feature maps are restored to the size of the original image and classified by the softmax layer, which yields the final segmentation map.

Further, the three levels of the multi-level semantic segmentation network output segmentation maps O_1, O_2 and O_3. When these maps are post-fused, the map output by any one level is chosen as the prior O_i (i = 1, 2, 3), and any other map O_j (j ≠ i, j ∈ {1, 2, 3}) is used to compute the likelihood; the posterior probability is then computed from this prior and likelihood by Bayes' rule.

Here n denotes the class of the current pixel and m the number of classes; F_ni and B_ni denote the foreground and background regions for class n; O_ni (i = 1, 2, 3) serves as the prior and O_nj (j ≠ i, j ∈ {1, 2, 3}) is used to compute the likelihood. In each region, the likelihood is computed by comparing O_ni and O_nj over the foreground and background of each class.

Preferably, when the segmentation maps output by the three levels are post-fused, the outputs of the first two levels are fused first, and the result is then fused with the output of the third level. Specifically:

First, the segmentation map output by the first level is used as the prior and the segmentation map output by the second level is used to compute the likelihood; the information of the two maps is then merged with Bayes' formula;

Then the roles are swapped: the segmentation map output by the second level is used as the prior, the segmentation map output by the first level is used to compute the likelihood, and the two are again combined with Bayes' formula;

Finally, the fused map of the first two levels and the segmentation map of the third level are fused in the same way to obtain the final integrated segmentation map.

The invention also provides a multi-scale semantic segmentation method for unbalanced remote sensing images, which includes the following steps:

I. Improve the semantic segmentation network architecture: build a multi-level semantic segmentation network model in which each level outputs a segmentation map at a different scale, and fuse the multi-scale segmentation information with the Bayesian fusion method;

II. Equalize the loss function: use an optimization algorithm that separates pixels of different classes further and aggregates pixels of the same class more tightly, as follows:

1) Build a class-weight balanced distribution loss function based on the focal loss, to address the imbalance between positive and negative samples across all classes;

2) Introduce the hinge loss to build an inter-class loss function that maximizes the distance between samples of different classes;

3) Balanced loss function: build the overall loss function.

Further, in the class-weight balanced distribution loss function, p_t is the probability that a sample of class t is positive, M is the number of sample classes, t denotes a particular class, γ is a hyperparameter, and -log(p_t) is the original cross-entropy loss. λ is an adjustable hyperparameter with 0 < λ < 1; its purpose is to make the classification accuracy on different samples more tunable by reducing the penalty weight of complex samples and increasing the penalty contribution of well-classified samples.

Further, the inter-class loss function is

$$\mathrm{Hinge} = \max\bigl(0,\ 1 + \max w_{wrong} - w_{correct}\bigr) \quad (11)$$

where w_wrong is the number of misclassified samples and w_correct is the number of correctly classified samples; taking the maximum over w_wrong selects the class with the most misclassified samples.
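Formula (11) can be transcribed directly. In the sketch below, `hinge_penalty` evaluates the expression as written, and `hinge_from_labels` shows one possible way to obtain the counts from predicted and true labels. Grouping the misclassified samples by their true class and taking w_correct as the total number of correctly classified samples are assumptions, since the text does not spell them out; both function names are hypothetical.

```python
import numpy as np


def hinge_penalty(w_wrong, w_correct):
    """Evaluate max(0, 1 + max(w_wrong) - w_correct) as in formula (11)."""
    return max(0, 1 + int(np.max(w_wrong)) - int(w_correct))


def hinge_from_labels(pred, target, num_classes):
    """Derive the counts from integer label arrays (assumed interpretation)."""
    pred = np.asarray(pred)
    target = np.asarray(target)
    wrong = pred != target
    # Misclassified samples counted per true class (assumption).
    w_wrong = np.bincount(target[wrong], minlength=num_classes)
    # Total number of correctly classified samples (assumption).
    w_correct = int((~wrong).sum())
    return hinge_penalty(w_wrong, w_correct)
```

The same expression evaluated on class scores instead of counts is the familiar multi-class hinge loss, which may be a more practical choice for gradient-based training.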

Further, in the overall loss function, β is a hyperparameter that controls the contribution of the hinge loss penalty term, with β > 0.

Compared with the prior art, the advantages of the invention are:

(1) The invention proposes a multi-level semantic segmentation network that extracts features at different scales and fuses them at the same level so that missing information is complemented. It learns fine-grained local features, retaining small-class information, and also learns the semantic features of the global context, making full use of global context information and retaining large-scale information. While preserving local image detail, it overcomes the mutual interference of objects at different scales and improves the robustness and accuracy of remote sensing image segmentation.

(2) The invention also provides a Bayesian multi-scale post-fusion semantic segmentation method. To address the scale dependence of remote sensing images, it models the results at different scales as prior and likelihood respectively, uses Bayes' principle to make the optimal decision, and has been verified to improve segmentation accuracy.

With the Bayesian fusion method of the invention, the semantic information of objects is identified better: object outlines and the class distribution are clearer, the boundaries between objects of different classes are more distinct, and the quality of the segmentation maps output by the network is improved.

(3) The invention designs an equalized loss function for semantic segmentation of unbalanced remote sensing images. To address the uneven semantic distribution of remote sensing images, in particular the foreground-background imbalance caused by spatial differences, it follows the idea of the focal loss to balance the loss weights of easy and hard training samples, reducing the weight of easily learned classes and increasing the weight of classes that are hard to learn, which makes training more stable. It also introduces the hinge loss to enlarge the inter-class distance and make the boundaries between samples of different classes more distinct, which improves the accuracy of the segmentation and the clarity with which local information is classified.

With the equalized loss algorithm of the invention, the semantic segmentation network model achieves uniform segmentation even on class-imbalanced data.

Brief description of the drawings

To explain the technical solutions of the embodiments more clearly, the drawings used in the description of the embodiments are briefly introduced below. The drawings described below show only some embodiments of the invention; those of ordinary skill in the art can obtain other drawings from them without creative effort.

Figure 1 is the architecture diagram of the multi-level semantic segmentation network of the invention;

Figure 2 shows the encoder-decoder structure of the first-level segmentation network of the invention;

Figure 3 shows the encoder-decoder structure of the second-level segmentation network of the invention;

Figure 4 shows the encoder-decoder structure of the third-level segmentation network of the invention;

Figure 5 is a schematic diagram of the Bayesian image fusion algorithm of the invention.

Detailed description

The invention is further described below with reference to the drawings and specific embodiments.

Embodiment 1

This embodiment designs a multi-level deep neural network model; specifically, it proposes a multi-scale semantic segmentation model for unbalanced remote sensing images. The model is a multi-level semantic segmentation network that learns fine-grained local features, retaining small-class information, and also learns global contextual semantic features, retaining large-scale information. The network architecture is shown in Figure 1 and is divided into three levels: the first corresponds to Level 1 in Figure 1 and uses data at the original resolution; the second corresponds to Level 2 and uses data downsampled by a factor of 2; the third corresponds to Level 3 and uses data downsampled by a factor of 4. Downsampling the remote sensing image twice yields more local and global information.
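A minimal sketch of how the three input resolutions could be prepared is shown below. Average pooling is used as the downsampling operator purely as an assumption (the text does not name one), and `downsample` and `build_levels` are hypothetical helper names.

```python
import numpy as np


def downsample(image, factor):
    """Average-pool an (H, W, C) image by an integer factor (assumed operator)."""
    h, w, c = image.shape
    h2, w2 = h // factor, w // factor
    # Crop so that height and width are exact multiples of the factor.
    cropped = image[: h2 * factor, : w2 * factor]
    return cropped.reshape(h2, factor, w2, factor, c).mean(axis=(1, 3))


def build_levels(image):
    """Return the inputs for Level 1 (original), Level 2 (1/2) and Level 3 (1/4)."""
    return {
        "level1": image,
        "level2": downsample(image, 2),
        "level3": downsample(image, 4),
    }
```

For an 8x8 input this yields 8x8, 4x4 and 2x2 arrays, one per level.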

To handle multi-scale information, each level uses a different network structure to extract features at a different scale, retaining as much visual information as possible during feature extraction, and outputs a segmentation map at a different resolution. These outputs are then merged at the same level by Bayesian post-fusion of the images, which fuses the multi-scale segmentation information, complements missing information, and enables precise segmentation of remote sensing images.

The advantage of the multi-level network model of the invention over other classic deep neural network segmentation models is that it retains local detail well while also better preserving global semantic information.

1. The multi-level network architecture of this embodiment is described in detail below:

The backbone network used in this embodiment is the SegNet semantic segmentation network, shown in Figures 2, 3 and 4, which correspond to Level 1, Level 2 and Level 3 in Figure 1 respectively. The left side of the network is the encoder, which consists of five convolution-and-pooling stages: each of the first two stages contains two convolutional layers, and each of the last three stages contains three convolutional layers. The convolutional layers extract image features, and the pooling layers then shrink the feature maps and enlarge the receptive field. The pooling layers use max pooling to achieve spatial invariance under small spatial shifts, at the cost of localization accuracy and spatial detail.

Each stage of the encoder corresponds one-to-one with a stage of the decoder, similar to the U-shaped structure of U-Net. Each upsampling layer in the decoder corresponds to the max-pooling layer of the encoder at the same level, and it upsamples the feature map using the indices saved during max pooling. The upsampling layers reproduce the features learned for classification in the encoding stage and produce dense feature maps. Finally, these feature maps are restored to the size of the original image and classified by the softmax layer, which yields the final segmentation map. SegNet has few trainable parameters and a small memory footprint while maintaining segmentation accuracy, which makes it well suited to semantic segmentation of high-resolution remote sensing images.
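The index-based upsampling can be illustrated on a single feature map. The NumPy sketch below pools over non-overlapping 2x2 windows, remembers where each maximum came from, and then places the pooled values back at those positions. The function names and the 2x2 window are assumptions for illustration.

```python
import numpy as np


def max_pool_with_indices(x, k=2):
    """Max-pool a 2-D map over non-overlapping k x k windows.

    Returns the pooled map and, for each output cell, the flat index into `x`
    of the element that produced the maximum.
    """
    h, w = x.shape
    hp, wp = h // k, w // k
    windows = (
        x[: hp * k, : wp * k]
        .reshape(hp, k, wp, k)
        .transpose(0, 2, 1, 3)
        .reshape(hp, wp, k * k)
    )
    local = windows.argmax(axis=2)           # position inside each window
    pooled = windows.max(axis=2)
    rows = np.arange(hp)[:, None] * k + local // k
    cols = np.arange(wp)[None, :] * k + local % k
    return pooled, rows * w + cols


def max_unpool(pooled, indices, shape):
    """Scatter pooled values back to their saved positions; other cells stay zero."""
    out = np.zeros(shape, dtype=pooled.dtype)
    out.flat[indices.ravel()] = pooled.ravel()
    return out
```

The unpooled map is sparse; in the decoder, the convolutional layers that follow each upsampling layer are what turn it into a dense feature map.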

2. The image post-fusion method of this embodiment is described in detail below:

The multi-scale network outputs segmentation maps at different resolutions, and the segmentation quality differs with resolution. This patent proposes a multi-scale post-fusion semantic segmentation method based on Bayes' principle. In saliency detection tasks, the posterior probability is computed by Bayesian integration of saliency maps.

S_1 and S_2 are both saliency maps: one serves as the prior probability S_i (i ∈ {1, 2}) and the other, S_j (j ≠ i, j ∈ {1, 2}), is used to compute the likelihood. F_i and B_i denote the foreground and background regions respectively, and the likelihood in each region is estimated from pixel counts.

Here $N_{F_i}$ denotes the number of pixels in the foreground and $N_{bF_i(S_j(z))}$ the number of pixels whose color features fall into the foreground bin containing feature $S_j(z)$; $N_{B_i}$ denotes the number of pixels in the background and $N_{bB_i(S_j(z))}$ the number of pixels whose color features fall into the background bin containing feature $S_j(z)$.
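For reference, the Bayesian integration of two saliency maps is commonly written in the form below. This is a sketch consistent with the definitions in the text, not a copy of the original formulas; the exact notation, in particular the bin-count symbols, is an assumption.

```latex
p(F_i \mid S_j(z)) =
  \frac{S_i(z)\, p(S_j(z) \mid F_i)}
       {S_i(z)\, p(S_j(z) \mid F_i) + \bigl(1 - S_i(z)\bigr)\, p(S_j(z) \mid B_i)}

p(S_j(z) \mid F_i) = \frac{N_{bF_i(S_j(z))}}{N_{F_i}}, \qquad
p(S_j(z) \mid B_i) = \frac{N_{bB_i(S_j(z))}}{N_{B_i}}
```

In this form the denominators are the total pixel counts of the foreground and background regions, and the numerators count the pixels that fall into the bin containing $S_j(z)$, so each likelihood is a normalized histogram value.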

The three levels of the multi-level semantic segmentation network of the invention output segmentation maps O_1, O_2 and O_3. When these maps are post-fused, the map output by any one level is chosen as the prior O_i (i = 1, 2, 3), and any other map O_j (j ≠ i, j ∈ {1, 2, 3}) is used to compute the likelihood; the posterior probability p(F_ni | O_nj(z)) is then computed from this prior and likelihood by Bayes' rule.

Here n denotes the class of the current pixel and m the number of classes; F_ni and B_ni denote the foreground and background regions for class n; O_ni (i = 1, 2, 3) serves as the prior and O_nj (j ≠ i, j ∈ {1, 2, 3}) is used to compute the likelihood. In each region, the likelihood is computed by comparing O_ni and O_nj over the foreground and background of each class.

$N_{F_{ni}}$ denotes the number of foreground pixels of the n-th class, and $N_{bF_{ni}(O_{nj}(z))}$ is the number of pixels whose color features fall into the foreground bin containing feature $O_{nj}(z)$. A posterior probability is likewise computed once with O_nj as the prior.

As a preferred embodiment, when the segmentation maps output by the three levels are post-fused, the outputs of the first two levels are fused first, and the result is then fused with the output of the third level, as shown in (6) and (7) respectively.

$$O_{n4}(z) = O_B\bigl(O_{n1}(z), O_{n2}(z)\bigr) = p(F_{n1} \mid O_{n2}(z)) + p(F_{n2} \mid O_{n1}(z)) \quad (6)$$

$$O(z) = O_B\bigl(O_{n3}(z), O_{n4}(z)\bigr) = p(F_{n3} \mid O_{n4}(z)) + p(F_{n4} \mid O_{n3}(z)) \quad (7)$$

The procedure is illustrated in Figure 5.

Because the Bayesian fusion repeatedly forces different output segmentation maps to serve as the prior, it merges the useful information in segmentation maps of different resolutions and improves segmentation accuracy.
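The cascade in formulas (6) and (7) could be sketched as follows for the probability map of a single class. Several choices here are assumptions rather than details given in the text: the three maps are taken to be already resampled to a common size with values in [0, 1], the foreground is thresholded at the mean of the prior map, likelihoods are estimated from histograms of the map values (instead of color features), and the two posteriors are averaged rather than summed so that the result can serve as a prior in the next step. All function names are hypothetical.

```python
import numpy as np


def posterior(prior, other, bins=16, eps=1e-8):
    """p(F | other(z)) with `prior` as the prior and `other` supplying the likelihood."""
    fg = prior >= prior.mean()               # assumed foreground rule
    bg = ~fg
    # Histogram bin of other(z) at every pixel.
    idx = np.minimum((other * bins).astype(int), bins - 1)
    fg_hist = np.bincount(idx[fg], minlength=bins) / max(fg.sum(), 1)
    bg_hist = np.bincount(idx[bg], minlength=bins) / max(bg.sum(), 1)
    like_fg = fg_hist[idx]                   # p(other(z) | F)
    like_bg = bg_hist[idx]                   # p(other(z) | B)
    num = prior * like_fg
    return num / (num + (1.0 - prior) * like_bg + eps)


def fuse_pair(a, b):
    """Symmetric fusion: each map serves once as prior (averaged, see lead-in)."""
    return 0.5 * (posterior(a, b) + posterior(b, a))


def fuse_three(o1, o2, o3):
    """Fuse Level 1 with Level 2, then fuse the result with Level 3."""
    o4 = fuse_pair(o1, o2)
    return fuse_pair(o3, o4)
```

Running `fuse_three` once per class and taking the per-pixel argmax over classes would give a label map, although the text does not state how the per-class results are combined.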

Embodiment 2

This embodiment provides a multi-scale semantic segmentation method for unbalanced remote sensing images, which includes the following steps:

I. Improve the semantic segmentation network architecture

Build a multi-level semantic segmentation network model in which each level outputs a segmentation map at a different scale, and fuse the multi-scale segmentation information with the Bayesian fusion method.

The semantic segmentation network may be a classic semantic segmentation network. As a preferred embodiment, the multi-level semantic segmentation network model can directly adopt the model of Embodiment 1; see the description of Embodiment 1 for details, which are not repeated here.

II. Equalize the loss function

Use an optimization algorithm that separates pixels of different classes further and aggregates pixels of the same class more tightly, as follows:

1) Build a class-weight balanced distribution loss function based on the focal loss, to address the imbalance between positive and negative samples across all classes.

2) Introduce the hinge loss to build an inter-class loss function that maximizes the distance between samples of different classes.

3) Balanced loss function: build the overall loss function.

These are described in turn below:

1. Balanced distribution of class weights

在多分类问题的解决上，数据集的样本类别分布不均，负样本的数量太大，并且大多数样本区分度较大，这经常导致训练过程中学习无效。在对遥感图像的分割问题中，这一问题尤其明显。使用Focal loss损失函数的目的主要是针对目标检测情景的背景与前景之间存在的极端不平衡问题。它是在交叉熵损失(可直接参考现有技术)的基础上增加调制因子改进而来，具体的公式如下：In solving multi-classification problems, the sample categories of the data set are unevenly distributed, the number of negative samples is too large, and most samples are highly differentiated, which often leads to ineffective learning during the training process. This problem is especially obvious in the segmentation problem of remote sensing images. The purpose of using the Focal loss loss function is mainly to address the extreme imbalance problem between the background and foreground of the target detection scenario. It is improved by adding a modulation factor on the basis of cross-entropy loss (you can directly refer to the existing technology). The specific formula is as follows:

In this formula, p_t is the probability that a sample of category t is positive, M is the number of sample categories, t denotes a particular category, and -log(p_t) is the original cross-entropy loss function; γ≥0 is an adjustable hyperparameter, and (1-p_t)^γ is the modulation factor. For easy samples, p_t is close to 1 and the modulation factor tends to zero. For hard and misclassified samples, p_t is small and the modulation factor increases accordingly, which offsets the ineffective training caused by the sample distribution. This property effectively addresses the training failure caused by category imbalance in remote sensing images. Therefore, for the multi-class problem of remote sensing image segmentation, the inter-class loss function formula is adjusted as follows:

where λ is an adjustable hyperparameter with 0<λ<1. Its purpose is to make the treatment of samples of differing classification accuracy more adjustable: relative to the original formula (8), it further reduces the weight of the penalty term for complex samples in training and increases the weight of the penalty term for simple samples, so that samples are balanced consistently across classes. For example, when p_t is high, confidence in that class of sample is high; if λ=1, its penalty weight is small and its contribution to training is reduced. Likewise, when p_t is small, the sample is hard to classify and is a complex sample; if λ=1, its training weight is large and its contribution to training is high. Setting 0<λ<1 reduces the penalty weight of complex samples and increases the penalty contribution of good samples, which improves the classification accuracy of benign samples. An effective setting of λ can therefore strike a good balance between complex and easy samples and thereby improve overall classification accuracy.
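
The effect of the modulation factor described above can be illustrated numerically. The following is a minimal sketch that assumes the standard single-sample focal loss -(1-p_t)^γ·log(p_t); the function names and the sample probabilities are hypothetical, and the λ-adjusted variant of formula (9) is deliberately not implemented because that formula is not reproduced in this text.

```python
import math


def cross_entropy(p_t: float) -> float:
    """Cross-entropy term for one sample: -log(p_t)."""
    return -math.log(p_t)


def focal_loss(p_t: float, gamma: float = 2.0) -> float:
    """Standard focal loss (assumed form): -(1 - p_t)^gamma * log(p_t)."""
    if not 0.0 < p_t <= 1.0:
        raise ValueError("p_t must be in (0, 1]")
    if gamma < 0:
        raise ValueError("gamma must be >= 0")
    modulation = (1.0 - p_t) ** gamma  # tends to 0 as p_t approaches 1
    return -modulation * math.log(p_t)


# Hypothetical probabilities: an easy sample and a hard sample.
for name, p in (("easy", 0.95), ("hard", 0.10)):
    ce = cross_entropy(p)
    fl = focal_loss(p)
    print(f"{name}: p_t={p:.2f}  CE={ce:.4f}  FL={fl:.4f}  FL/CE={fl / ce:.4f}")
```

The ratio FL/CE equals the modulation factor, so the easy sample is down-weighted far more strongly than the hard one.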

2. Inter-class balance

Because adjacent samples in remote sensing images differ only slightly, enlarging the difference between samples of different classes is also a problem to be solved. This embodiment introduces the hinge loss (HL) function. HL is commonly used in the maximum-margin classification task of support vector machines (SVM), where it reduces intra-class distance and increases inter-class distance to achieve the maximum margin. For binary classification, its formula is as follows:

Hinge = max(0, 1 - y*y_pre) (10)

In the above formula, y is the true label of a sample and can only take the value -1 or 1; y_pre is the predicted value. When the absolute value of the prediction is greater than or equal to 1, the distance between the sample and the decision boundary is at least 1. No additional reward is given in this case, because the probability that the sample is classified correctly is already quite high. Its multi-class form is as follows:

Hinge = max(0, 1 + max w_wrong - w_correct) (11)

where w_wrong is the number of misclassified samples and w_correct is the number of correctly classified samples. Taking the maximum over w_wrong selects the category with the most misclassified samples. When many samples are misclassified, formula (11) imposes a larger penalty on the training data, driving training to continue; only when few samples are misclassified and many are classified correctly does the penalty term decrease automatically, allowing training to finish sooner. This encourages samples within a class to become consistent, widens the margin between classes, and improves classification accuracy.
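
Formulas (10) and (11) can be sketched as follows. The binary function follows formula (10) directly. The multi-class function has the same algebraic shape as formula (11), but as an assumption it reads the w terms as per-class scores (the usual score-based multi-class hinge), not as the sample counts described in the text; the function names and inputs are hypothetical.

```python
from typing import Sequence


def binary_hinge(y: int, y_pre: float) -> float:
    """Formula (10): max(0, 1 - y * y_pre), with y in {-1, +1}."""
    if y not in (-1, 1):
        raise ValueError("y must be -1 or 1")
    return max(0.0, 1.0 - y * y_pre)


def multiclass_hinge(scores: Sequence[float], correct: int) -> float:
    """Shape of formula (11): max(0, 1 + max_wrong - correct_score).

    Assumption: the terms are per-class scores, not sample counts.
    """
    wrong = [s for k, s in enumerate(scores) if k != correct]
    if not wrong:
        raise ValueError("need at least two classes")
    return max(0.0, 1.0 + max(wrong) - scores[correct])


print(binary_hinge(1, 2.5))    # correct and outside the margin: no penalty
print(binary_hinge(1, 0.3))    # correct side but inside the margin
print(binary_hinge(-1, 0.3))   # wrong side of the boundary
print(multiclass_hinge([2.0, 0.5, -1.0], correct=0))  # correct class leads by more than 1
print(multiclass_hinge([0.2, 0.5, -1.0], correct=0))  # a wrong class scores highest
```

In both forms the loss is zero only once the correct answer wins by a margin of at least 1, which is what pushes classes apart.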

3. Balance algorithm

Finally, to address category imbalance while increasing the margin between samples of different categories and improving segmentation accuracy, the overall loss function of the present invention is as follows:

where β is a hyperparameter that controls the contribution of the hinge loss penalty term, with β>0. Finally, the present invention optimizes the model parameters with the classic gradient descent method to obtain the optimal parameters, and then tests on the training samples.
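
The overall loss formula is not reproduced in this text. The sketch below therefore assumes the simplest combination consistent with the description of β, namely a focal term plus β times a hinge term, evaluated for one pixel. The additive form, the use of raw class scores (logits) for the hinge term, the standard focal loss in place of the λ-adjusted formula (9), and all names are assumptions made for illustration.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over class scores."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def balanced_loss(logits: Sequence[float], correct: int,
                  gamma: float = 2.0, beta: float = 0.5) -> float:
    """Hypothetical per-pixel loss: focal term + beta * hinge term."""
    if beta <= 0:
        raise ValueError("beta must be > 0")
    p_t = softmax(logits)[correct]
    # Focal term (assumed standard form, without the lambda adjustment).
    focal = -((1.0 - p_t) ** gamma) * math.log(p_t)
    # Hinge term on raw scores (assumed score-based reading of formula (11)).
    max_wrong = max(z for k, z in enumerate(logits) if k != correct)
    hinge = max(0.0, 1.0 + max_wrong - logits[correct])
    return focal + beta * hinge


print(balanced_loss([4.0, 0.0, -1.0], correct=0))  # confident, correct pixel
print(balanced_loss([0.0, 4.0, -1.0], correct=0))  # misclassified pixel
```

A larger β makes margin violations dominate the total, while β close to zero leaves the loss governed mainly by the focal term.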

In summary, the present invention improves the semantic segmentation network structure and designs a loss function equalization method. A spatial multi-scale, parallel, post-fusion framework characterizes scale differences, preserving local detail well while also retaining global semantic information. The loss function is designed for the imbalanced pixel-level sample distribution: drawing on the focal loss idea, it balances the loss weights of hard and easy training samples and improves training stability, and the hinge loss is introduced to enlarge the inter-class distance and improve semantic segmentation accuracy. Combining the equalized loss function with a classic semantic segmentation network improves the accuracy of the segmentation structure and the clarity of local information classification.

Of course, the above description does not limit the present invention, and the present invention is not limited to the above examples. Changes, modifications, additions or substitutions made by those of ordinary skill in the art within the essential scope of the present invention shall also fall within the protection scope of the present invention.

Claims

1. A method for constructing a multi-scale semantic segmentation model for unbalanced remote sensing images, characterized by adopting a multi-level semantic segmentation network which can both learn fine-grained local features and retain small-class information, and learn global context semantic features and retain large-scale information; the overall architecture of the multi-level semantic segmentation network is divided into three layers, each layer adopts a different network structure to extract features of a different scale and outputs a segmented image of a different resolution, and the features are fused within the same level by post-image fusion using a Bayesian fusion method, thereby fusing the multi-scale segmented image information and complementing missing information; the three layers of the multi-level semantic segmentation network output segmentation maps O₁, O₂ and O₃; when the segmentation maps output by the three layers are subjected to post-image fusion, the segmentation map output by any one layer is selected as the prior O_i, i = 1, 2, 3, and any other segmentation map O_j, j ≠ i, j ∈ {1, 2, 3}, is used to calculate the likelihood, so that the posterior probability is calculated as:

wherein n denotes the class of the current pixel and m denotes the number of classes; F_ni and B_ni respectively denote the foreground region and the background region when the class is n; O_ni serves as the prior, i = 1, 2, 3, and O_nj is used to calculate the likelihood, j ≠ i, j ∈ {1, 2, 3}; in each region, the likelihood at the foreground and background of each class is calculated by comparing O_ni and O_nj;

when the segmentation maps output by the three layers are subjected to post-image fusion, the segmentation maps output by the first two layers are fused first; the segmentation map obtained from that fusion is then fused with the segmentation map output by the third layer, specifically:

firstly, using a segmentation map output by a first layer of network as a priori, calculating likelihood by using a segmentation map output by a second layer of network, and then merging information of the two segmentation maps based on a Bayesian formula;

then the two are exchanged: the segmentation map output by the second layer is taken as the prior, the likelihood is calculated using the segmentation map output by the first layer, and the two are integrated based on the Bayesian formula;

finally, the segmentation maps of the first two network layers and the third network layer are fused in the same way to obtain the final integrated segmentation map.

2. The method for constructing the multi-scale semantic segmentation model for the unbalanced remote sensing image according to claim 1, wherein the first level (Level 1) of the multi-level semantic segmentation network model uses data at the original resolution, the second level (Level 2) uses data downsampled by a factor of 2, and the third level (Level 3) uses data downsampled by a factor of 4;

the backbone network adopted by the multi-level semantic segmentation network model is a SegNet semantic segmentation network; the left side of the network is an encoder composed of 5 convolution-pooling stages, of which each of the first two stages comprises two convolution layers and each of the last three stages comprises three convolution layers;

the right side of the network is a decoder composed of 5 upsampling-and-convolution stages: the first and fourth stages on the right each consist of one upsampling layer and two convolution layers, the second and third stages each consist of one upsampling layer and three convolution layers, and the fifth stage consists of one upsampling layer and two convolution layers, after which a Softmax layer is appended.

3. The method for constructing the multi-scale semantic segmentation model for the unbalanced remote sensing image according to claim 2, wherein each layer of the encoding stage corresponds to each layer of the decoding stage one by one; each up-sampling layer of the decoder corresponds to the maximum pooling layer of the same-level encoder, the up-sampling layer of the decoder uses the index reserved in the maximum pooling process to up-sample the feature map, so that the features of the image classification in the encoding stage are reproduced, dense feature maps are generated, and finally the feature maps are restored to the same size as the original image, and classified by the softmax layer, namely the final segmentation map is generated.

4. The multi-scale semantic segmentation method for the unbalanced remote sensing image is characterized by comprising the following steps of:

1. improving semantic segmentation network architecture: constructing a multi-level semantic segmentation network model, outputting different scale segmentation graphs by each layer of network, and fusing multi-scale segmentation image information by adopting a Bayesian fusion method;

2. equalizing the loss function: the optimization algorithm which can separate pixels of different categories and aggregate pixels of the same category is adopted, and the optimization algorithm is specifically as follows:

1) Constructing class weight balanced distribution loss functions based on Focal loss functions, and solving the problem of unbalance of positive and negative samples of all classes; the class weight balanced distribution loss function is as follows:

wherein p_t is the probability that a sample of class t is positive, M is the number of sample classes, t denotes a class, γ is a hyperparameter, and -log(p_t) is the initial cross-entropy loss function; λ is an adjustable hyperparameter set to 0<λ<1 in order to increase the adjustability with respect to the classification accuracy of different samples, reducing the penalty weight of complex samples and increasing the penalty contribution of good samples;

2) The hinge loss function (Hinge loss) is introduced to construct an inter-class loss function, so that the class spacing between samples of different classes is maximized; the inter-class loss function is:

Hinge = max(0, 1 + max w_wrong - w_correct) (11)

wherein w_wrong is the number of misclassified samples and w_correct is the number of correctly classified samples; taking the maximum of w_wrong selects the category with the most misclassified samples;

3) Equalizing the loss function: constructing an overall loss function.

5. The unbalanced remote sensing image oriented multi-scale semantic segmentation method of claim 4, wherein the overall loss function is as follows:

wherein β is a hyperparameter controlling the contribution rate of the Hinge loss penalty term, and β > 0.