CN115937704A

CN115937704A - Remote sensing image road segmentation method based on topology perception neural network

Info

Publication number: CN115937704A
Application number: CN202211575990.2A
Authority: CN
Inventors: 郭学俊; 周瑞森; 刘伟琳; 李龙
Original assignee: Taiyuan University of Technology
Current assignee: Taiyuan University of Technology
Priority date: 2022-12-09
Filing date: 2022-12-09
Publication date: 2023-04-07
Anticipated expiration: 2042-12-09
Also published as: CN115937704B

Abstract

The invention belongs to the technical field of semantic segmentation of remote sensing images, and in particular relates to a remote sensing image road segmentation method based on a topology-aware neural network. The method includes the following steps: 1) divide the high-resolution remote sensing dataset into a training set and a test set, and preprocess the remote sensing images in each; 2) build the image semantic segmentation network model; 3) input the training set into the semantic segmentation network, first initializing the network and then updating the model parameters to optimize the loss function until convergence; 4) input the test set data into the trained segmentation network to obtain high-precision semantic segmentation results. The invention addresses the following problems of existing techniques for automatic road extraction from remote sensing images: insufficient feature expression ability, inability to accurately perceive the topological structure of roads, context information limited to a single training sample, large loss of edge and detail information during semantic segmentation, and the large number of training samples required.

Description

Road segmentation method for remote sensing images based on topology-aware neural network

Technical Field

The invention belongs to the technical field of semantic segmentation of remote sensing images, and specifically relates to a remote sensing image road segmentation method based on a topology-aware neural network.

Background Art

With the continuous launch of high-resolution remote sensing satellites, the increasingly widespread use of drone technology, and the steady improvement in the spatial resolution of remote sensing images, earth observation has entered the era of high-resolution remote sensing big data. Within such data, road network information not only directly reflects the economic and social development level of a country or region, but also has important application value in traffic management, urban planning, automatic navigation, and other fields.

Traditional road segmentation of high-resolution remote sensing images relies mainly on manual annotation, which is time-consuming, laborious, and highly subjective. To improve detection efficiency, various object-based semantic segmentation methods have been proposed and have achieved some success in application. However, these techniques rely heavily on hand-crafted low-level semantic features based on texture, geometry, and edges, and cannot be applied well to scenes as complex as high-resolution remote sensing images. As a result, they generalize poorly and are easily affected by noise.

Fully convolutional neural networks based on deep learning can automatically obtain high-level semantic features from remote sensing images through multi-layer network structures and nonlinear transformations. In addition, their end-to-end, pixel-to-pixel design provides pixel-level road recognition and localization, making them currently the most promising approach to road extraction from remote sensing images. However, the fully convolutional networks currently used for this task focus mainly on pixel classification accuracy. They cannot accurately perceive the topological structure of roads, they lose a large amount of edge and detail information during semantic segmentation, and they are easily affected by occlusion and shadows, all of which seriously degrade the accuracy and completeness of the results. Fully convolutional models also tend to depend either on a large number of high-precision training samples or on models pre-trained on large collections of everyday scene images. Pixel-level annotation of remote sensing images, however, is time-consuming and laborious, and because everyday scene images differ greatly from remote sensing images, the segmentation accuracy of models pre-trained on them is often unsatisfactory. Finally, the very large number of parameters in these complex models places high demands on storage and computing hardware, and both training and applying the models are time-consuming. These defects greatly limit the practical application of these methods.

Summary of the Invention

To comprehensively solve the problems of existing high-resolution remote sensing image road segmentation methods, namely that they cannot accurately perceive topological structure, lose a large amount of edge and detail information during semantic segmentation, are easily disturbed by occlusion and shadows, and are inefficient and difficult to train, the present invention provides a remote sensing image road segmentation method based on a topology-aware neural network.

The present invention adopts the following technical solution: a high-resolution remote sensing image semantic segmentation network comprising a downsampling path, a spatial context module, and an upsampling path. The downsampling path includes a deformable convolution layer and 5 consecutive downsampling units: the input data first passes through the deformable convolution layer to obtain a semantic feature map, then passes through the 5 consecutive downsampling units for feature extraction and downsampling, and a feature map is finally output. The upsampling path consists, in order, of 5 consecutive upsampling units, an A aggregation operation, a convolution layer, and a Softmax layer, and finally generates the segmentation prediction map. The 5 downsampling units and the 5 upsampling units correspond one to one and are connected by skip connections adjusted by the attention of a channel attention module. The spatial context module slices and fuses the feature map output by the downsampling path and passes the result to the upsampling path.

In some embodiments, the deformable convolution layer is a deformable convolution with a kernel size of 3×3 and a stride of 1, which adds a learnable offset and a learnable weight coefficient to the position of each sampling point in the 3×3 kernel.

In some embodiments, the downsampling unit includes a one-shot aggregation module, an aggregation operation, and a down-conversion module. The one-shot aggregation module extracts features; the aggregation operation then aggregates the input and output of the one-shot aggregation module, and the resulting feature map is passed to both the down-conversion module and the channel attention module. The output of the channel attention module is multiplied element-wise with its input to obtain the skip connection features adjusted by channel attention, which are passed to the corresponding part of the upsampling path.

In some embodiments, the upsampling unit consists of an up-conversion module, a B aggregation operation, and a one-shot aggregation module. The up-conversion module restores the spatial resolution of the feature map through upsampling, the B aggregation operation aggregates the skip connection feature map adjusted by the channel attention module with the feature map produced by the up-conversion module, and the one-shot aggregation module extracts features from the result of the B aggregation operation.

In some embodiments, the structure of the one-shot aggregation module is as follows: after a feature map is input into the one-shot aggregation module, it first passes through n convolution modules, producing n new feature maps, where the first two convolution modules are deformable convolution modules; the n feature maps are then stacked along the channel dimension to obtain the channel-stacked feature map. Each convolution module consists, in order, of a batch normalization layer, a ReLU activation layer, a 3×3 convolution layer, and a dropout layer. Each deformable convolution module consists, in order, of a batch normalization layer, a ReLU activation layer, a 3×3 deformable convolution layer, and a dropout layer.

In some embodiments, the structure of the channel attention module is as follows: an input feature map with C channels undergoes global max pooling and global average pooling, yielding two feature maps that each hold one value per channel. Each of these is fed into a shared multilayer perceptron whose first layer has C/r neurons (r is the reduction rate, r = 16) with ReLU activation and whose second layer has C neurons. The two output features are added and passed through a Sigmoid activation to generate the final channel attention, which is multiplied element-wise with the input features.

In some embodiments, the spatial context module includes two paths, a "height direction" path and a "width direction" path.

In the "height direction" path, the feature map from the downsampling path, with C channels, height H, and width W, is divided into H slices in the vertical direction. Each slice is linearly projected by a convolution operation, and the projected slice is fed into the first linear layer to obtain an attention map. Softmax and L1 normalization are then applied to the attention map along the feature-map-size dimension and the channel dimension, respectively. The normalized attention is fed into the second linear layer to obtain a new slice, and finally all new slices are aggregated along the height direction into a single feature map.

In the "width direction" path, the feature map from the downsampling path is divided into W slices along the width direction, and the slices are updated and aggregated into a feature map in the same way.

The feature maps from the "height direction" and "width direction" paths, which have the same size, are fused by addition.

A remote sensing image road segmentation method based on a topology-aware neural network includes the following steps:

S100: divide the high-resolution remote sensing dataset into a training set and a test set, and preprocess the images in the training set and the test set respectively;

S200: build the high-resolution remote sensing image semantic segmentation network;

S300: input the training set images preprocessed in S100 into the semantic segmentation network built in S200 for training; first initialize the network using the He Uniform method, then update the model parameters to optimize the loss function until convergence;

S400: input the preprocessed test set remote sensing images into the segmentation network trained in S300, and output the semantic segmentation results of the high-resolution remote sensing images.

In step S100, the preprocessing includes manual image annotation, image cropping, and data augmentation.

Manual image annotation: the roads in the high-resolution images are manually annotated at the pixel level in ArcGIS software to obtain labeled road images.

Image cropping: the labeled high-resolution remote sensing images are randomly cropped into sub-images of 512 × 512 pixels.

Data augmentation: random scale transformation, rotation by a random angle, and vertical and horizontal flipping are applied to the sub-images to obtain augmented high-resolution remote sensing images.
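The cropping and flipping steps can be illustrated with a minimal standard-library sketch. It is not taken from the patent: the function names, the toy 6×8 image, the 4×4 crop size (512×512 in the method), and the 0.5 flip probability are all assumptions, and the random scaling and rotation steps are left out. The point it shows is that the image and its label mask must receive the same crop window and the same flips so that pixels and labels stay aligned.

```python
import random


def random_crop(image, label, size, rng):
    """Cut the same size x size window out of an image and its label mask."""
    height, width = len(image), len(image[0])
    top = rng.randrange(height - size + 1)
    left = rng.randrange(width - size + 1)

    def crop(grid):
        return [row[left:left + size] for row in grid[top:top + size]]

    return crop(image), crop(label)


def flip_horizontal(grid):
    """Mirror each row (left-right flip)."""
    return [row[::-1] for row in grid]


def flip_vertical(grid):
    """Reverse the row order (up-down flip)."""
    return grid[::-1]


def augment(image, label, rng):
    """Apply identical random flips to image and label (assumed probability 0.5)."""
    if rng.random() < 0.5:
        image, label = flip_horizontal(image), flip_horizontal(label)
    if rng.random() < 0.5:
        image, label = flip_vertical(image), flip_vertical(label)
    return image, label


rng = random.Random(0)
image = [[r * 8 + c for c in range(8)] for r in range(6)]    # toy 6x8 "image"
label = [[(r + c) % 2 for c in range(8)] for r in range(6)]  # toy road mask
patch, mask = random_crop(image, label, size=4, rng=rng)
patch, mask = augment(patch, mask, rng)
```

Because each toy pixel value encodes its original position, the alignment between `patch` and `mask` can be checked directly after augmentation.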

In step S300, the loss function is the Dice Loss, specifically:

$$L_{Dice} = 1 - \frac{2\,|X \cap Y|}{|X| + |Y|}$$

where X is the set of predicted values for all pixels in the prediction map, Y is the set of values for all pixels in the label image, $|X \cap Y|$ is the intersection of X and Y, and $|X|$ and $|Y|$ are the numbers of elements in X and Y, respectively.
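As a hedged illustration of the Dice Loss described above, the sketch below uses the common "soft" form, 1 - 2|X∩Y| / (|X| + |Y|), in which the intersection is approximated by the sum of element-wise products of predicted probabilities and binary labels. The function name `dice_loss` and the small smoothing constant `eps` are assumptions added for numerical safety; they are not specified in the patent.

```python
def dice_loss(pred, target, eps=1e-7):
    """Soft Dice loss on flat lists of values in [0, 1].

    pred   -- predicted road probabilities, one per pixel
    target -- ground-truth labels (1 = road, 0 = background), one per pixel
    eps    -- assumed smoothing term that avoids division by zero
    """
    intersection = sum(p * t for p, t in zip(pred, target))
    total = sum(pred) + sum(target)
    return 1.0 - (2.0 * intersection + eps) / (total + eps)


perfect = dice_loss([1, 0, 1, 0], [1, 0, 1, 0])   # prediction equals label
disjoint = dice_loss([1, 0], [0, 1])              # no overlap at all
partial = dice_loss([1, 1, 0, 0], [1, 0, 0, 0])   # one of two predicted pixels is correct
```

The loss is close to 0 for a perfect prediction and close to 1 when prediction and label do not overlap. Because it depends on overlap rather than per-pixel accuracy, it is less dominated by the large background class, which is one common reason for choosing it when roads occupy few pixels.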

Compared with the prior art, the present invention has the following beneficial effects:

1) The present invention adopts a spatial context module, which converts the traditional layer-by-layer connection of convolutions into a slice-by-slice form within the feature map, so that information can be passed between pixel rows and columns. Therefore, even when a road is occluded or other appearance cues are poor, the spatial context module remains suitable for detecting road targets with long, continuous shapes thanks to its strong perception of topological form.

2) The spatial context module uses two linear layers with a memory function in place of the Key and Value vectors of the traditional self-attention mechanism, which makes it possible to obtain global context information from all samples in the entire training set while also reducing computational complexity.

3) The down-conversion and up-conversion modules of the network use the wavelet transform and the inverse wavelet transform to implement downsampling and upsampling, which further avoids the loss of detail and edge information in the model and thereby improves model performance.

4) In the downsampling and upsampling paths, one-shot aggregation modules are used to extract road features. This design lets the model reuse features more effectively and reduces redundant computation, so the model is easier to train, needs no pre-trained model, and requires fewer training samples. The one-shot aggregation module also incorporates deformable convolution, which helps capture accurate road geometry and strengthens topology perception.

5) The skip connections use a channel attention module, which helps select suitable channel information for fusion and avoid interference, thereby improving the semantic segmentation accuracy of the model.

Through the above beneficial effects, the present invention comprehensively solves the problems of existing road extraction methods for high-resolution remote sensing images: the inability to accurately perceive the topological structure of roads, global context information limited to a single sample, large loss of edge and detail information during semantic segmentation, and the large number of samples required for training.

Brief Description of the Drawings

FIG. 1 is a schematic diagram of the overall structure of the high-resolution remote sensing image semantic segmentation network constructed in the method of the present invention;

FIG. 2 is a schematic diagram of the structure of the one-shot aggregation module combined with deformable convolution in the high-resolution remote sensing image semantic segmentation network constructed in the method of the present invention;

FIG. 3 is a schematic diagram of the structure of the channel attention module in the high-resolution remote sensing image semantic segmentation network constructed in the method of the present invention;

FIG. 4 is a schematic diagram of the structure of the spatial context module in the high-resolution remote sensing image semantic segmentation network constructed in the method of the present invention.

Detailed Description

Specific embodiments of the present invention are described below with reference to the accompanying drawings so that those skilled in the art can better understand the present invention.

The present invention relates to a remote sensing image road segmentation method based on a topology-aware neural network, comprising the following steps:

Step 1: Divide the high-resolution remote sensing dataset into a training set and a test set, and preprocess the images in the training set and the test set respectively.

The preprocessing includes manual image annotation, image cropping, and data augmentation. For manual annotation, the roads in the high-resolution images are manually annotated at the pixel level in ArcGIS software to obtain labeled road images.

For image cropping, the labeled high-resolution remote sensing images are randomly cropped into sub-images of 512 × 512 pixels.

Step 2: Build the high-resolution remote sensing image semantic segmentation network. As shown in FIG. 1, the network consists, in order, of a downsampling path, a spatial context module, and an upsampling path. The downsampling path and the upsampling path are connected by skip connections adjusted by channel attention.

Downsampling path: the input data first passes through a deformable convolution layer with a kernel size of 3×3 and a stride of 1 to obtain a semantic feature map with 48 channels, and then passes through 5 consecutive downsampling units for feature extraction and downsampling.

Deformable convolution adds a learnable offset and a learnable weight coefficient to the position of each sampling point in a conventional convolution kernel. With these variables the kernel is no longer restricted to the regular grid of a conventional convolution: it can sample freely around the current position and suppress the interference of irrelevant context. Deformable convolution therefore helps extract the complex geometric features of roads accurately.
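The sampling rule described above can be sketched for a single output position of a single-channel 3×3 deformable convolution. This is an illustrative toy under stated assumptions, not the patent's implementation: the names `bilinear_sample` and `deformable_conv_at` are hypothetical, fractional sampling positions are read with bilinear interpolation (the usual choice for deformable convolution), positions outside the image contribute zero, and the offsets and weight coefficients are passed in directly, whereas in a trained network they would be predicted by an additional learned layer.

```python
import math


def bilinear_sample(image, y, x):
    """Bilinearly interpolate image at fractional (y, x); zero outside the image."""
    height, width = len(image), len(image[0])
    y0, x0 = math.floor(y), math.floor(x)
    value = 0.0
    for yy in (y0, y0 + 1):
        for xx in (x0, x0 + 1):
            if 0 <= yy < height and 0 <= xx < width:
                weight = (1 - abs(y - yy)) * (1 - abs(x - xx))
                value += weight * image[yy][xx]
    return value


# The nine regular sampling points of a 3x3 kernel, relative to the centre.
GRID = [(dy, dx) for dy in (-1, 0, 1) for dx in (-1, 0, 1)]


def deformable_conv_at(image, row, col, weights, offsets, modulation):
    """Output at (row, col): each tap is shifted by its offset and scaled
    by its kernel weight and its weight coefficient (modulation)."""
    total = 0.0
    for (dy, dx), w, (oy, ox), m in zip(GRID, weights, offsets, modulation):
        total += w * m * bilinear_sample(image, row + dy + oy, col + dx + ox)
    return total


image = [[float(r * 4 + c) for c in range(4)] for r in range(4)]
ones = [1.0] * 9
# Zero offsets and unit coefficients reduce to an ordinary 3x3 convolution.
regular = deformable_conv_at(image, 1, 1, ones, [(0.0, 0.0)] * 9, ones)
# Shifting every tap half a pixel to the right samples between grid points.
shifted = deformable_conv_at(image, 1, 1, ones, [(0.0, 0.5)] * 9, ones)
```

With zero offsets and unit coefficients the result equals the plain sum over the 3×3 neighbourhood, which makes it easy to see that the deformable form generalises the regular grid rather than replacing it.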

Each downsampling unit consists, in order, of a one-shot aggregation module, an aggregation operation, and a down-conversion module. The one-shot aggregation module extracts features. The aggregation operation then aggregates the input and output of the one-shot aggregation module, and the resulting feature map is passed to both the down-conversion module and the channel attention module. The output of the channel attention module (the channel attention) is multiplied element-wise with its input to obtain the skip connection features adjusted by channel attention, which are passed to the corresponding part of the upsampling path.

As shown in FIG. 2, the structure of the one-shot aggregation module is as follows: after a feature map is input into the one-shot aggregation module, it first passes through n convolution modules, producing n new feature maps (the first two convolution modules are deformable convolution modules); the n feature maps are then stacked along the channel dimension to obtain the channel-stacked feature map. The one-shot aggregation module aggregates the feature maps all at once, which both reuses the features extracted by the network and avoids redundant computation, and thus effectively reduces the difficulty of model training and the number of samples required.

In the downsampling path, the total number n of conventional and deformable convolution modules in the one-shot aggregation module of each unit is 4, 5, 7, 10, and 12, respectively, and the corresponding number of output feature map channels m is 112, 192, 304, 464, and 656. In the upsampling units, the total number of conventional and deformable convolution modules in the one-shot aggregation module is 12, 10, 7, 5, and 4, respectively, and the number of feature map channels output to the next up-conversion module is 192, 160, 112, 80, and 64.
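The channel counts listed above can be checked with a few lines of arithmetic. The sketch assumes an interpretation that is consistent with the figures but not spelled out in one place in the text: each convolution module contributes 16 new channels, the aggregation operation in a downsampling unit concatenates the unit's input with the n new feature maps, the down-conversion module leaves the channel count of the feature map it outputs unchanged, and the one-shot aggregation module in an upsampling unit outputs only its n stacked feature maps. The constant and function names are hypothetical.

```python
GROWTH = 16          # channels produced by each convolution module
STEM_CHANNELS = 48   # channels after the initial deformable convolution layer


def down_path_channels(modules_per_unit, in_channels=STEM_CHANNELS):
    """Channel count after each downsampling unit, assuming the unit's
    input is concatenated with the n new 16-channel feature maps."""
    channels = []
    for n in modules_per_unit:
        in_channels += n * GROWTH
        channels.append(in_channels)
    return channels


def up_path_channels(modules_per_unit):
    """Channels output by each upsampling unit's one-shot aggregation module."""
    return [n * GROWTH for n in modules_per_unit]


print(down_path_channels([4, 5, 7, 10, 12]))  # [112, 192, 304, 464, 656]
print(up_path_channels([12, 10, 7, 5, 4]))    # [192, 160, 112, 80, 64]
```

For example, the first downsampling unit gives 48 + 4 × 16 = 112 channels and the last gives 464 + 12 × 16 = 656, which matches the value C = 656 quoted for the spatial context module.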

Each convolution module or deformable convolution module in the one-shot aggregation module consists, in order, of a batch normalization layer, a ReLU activation layer, a 3×3 convolution layer or 3×3 deformable convolution layer (16 channels), and a dropout layer (dropout probability p = 0.2).

As shown in FIG. 3, the structure of the channel attention module is as follows: an input feature map with C channels undergoes global max pooling and global average pooling, yielding two feature maps that each hold one value per channel. Each of these is fed into a shared multilayer perceptron whose first layer has C/r neurons (r is the reduction rate, r = 16) with ReLU activation and whose second layer has C neurons. The two output features are added and passed through a Sigmoid activation to generate the final channel attention, which is multiplied element-wise with the input feature map.
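The channel attention computation can be traced with a small standard-library sketch. It is a toy under stated assumptions rather than the patent's implementation: the names `mlp` and `channel_attention` are hypothetical, bias terms are omitted, the weights are arbitrary fixed numbers instead of learned parameters, and the example uses C = 4 channels with r = 2 (the method uses r = 16) so that the hidden layer has C/r = 2 neurons.

```python
import math


def mlp(vector, w1, w2):
    """Shared two-layer perceptron: C -> C/r with ReLU, then C/r -> C."""
    hidden = [max(0.0, sum(w * v for w, v in zip(row, vector))) for row in w1]
    return [sum(w * h for w, h in zip(row, hidden)) for row in w2]


def channel_attention(features, w1, w2):
    """features: list of C channels, each an H x W nested list."""
    flat = [[v for row in channel for v in row] for channel in features]
    max_desc = [max(values) for values in flat]                 # global max pooling
    avg_desc = [sum(values) / len(values) for values in flat]   # global average pooling
    summed = [a + b for a, b in zip(mlp(max_desc, w1, w2), mlp(avg_desc, w1, w2))]
    attention = [1.0 / (1.0 + math.exp(-s)) for s in summed]    # Sigmoid
    scaled = [[[a * v for v in row] for row in channel]
              for a, channel in zip(attention, features)]       # element-wise product
    return attention, scaled


features = [
    [[1.0, 2.0], [3.0, 4.0]],
    [[0.0, 0.0], [0.0, 0.0]],
    [[-1.0, 1.0], [2.0, -2.0]],
    [[5.0, 5.0], [5.0, 5.0]],
]
w1 = [[0.1, 0.0, -0.2, 0.1], [0.0, 0.3, 0.1, -0.1]]         # (C/r) x C
w2 = [[0.5, -0.5], [0.2, 0.1], [-0.3, 0.4], [0.1, 0.1]]     # C x (C/r)
attention, scaled = channel_attention(features, w1, w2)
```

Each entry of `attention` lies strictly between 0 and 1 and rescales one whole channel, so the skip connection passes informative channels through almost unchanged and damps the others.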

The down-conversion module uses the Daubechies wavelet as the wavelet basis. One wavelet transform decomposes each channel of the image into four sub-images, each with half the original height and width. The sub-images are identified by two-letter subscripts in which the first and second letters refer to the horizontal and vertical directions, respectively, G denotes high-frequency information, and D denotes low-frequency information; for example, the subscript GG means high-frequency information in both the horizontal and vertical directions. The low-frequency sub-image (subscript DD) is output, while the remaining sub-images are passed to the corresponding module of the upsampling path.
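To make the wavelet-based downsampling and its inverse concrete, here is a minimal sketch of a one-level 2-D transform and its reconstruction. The text names the Daubechies family without giving its order, so the sketch assumes the simplest member, the Haar wavelet (Daubechies order 1); higher orders use longer filters but follow the same split into four half-size sub-images. The function names and the dictionary keys DD, GD, DG, and GG (first letter horizontal, second letter vertical, G high frequency, D low frequency) are illustrative choices.

```python
def haar_dwt2(channel):
    """One-level 2-D Haar transform of an H x W channel (H and W even)."""
    rows, cols = len(channel) // 2, len(channel[0]) // 2
    bands = {name: [[0.0] * cols for _ in range(rows)]
             for name in ("DD", "GD", "DG", "GG")}
    for i in range(rows):
        for j in range(cols):
            a, b = channel[2 * i][2 * j], channel[2 * i][2 * j + 1]
            c, d = channel[2 * i + 1][2 * j], channel[2 * i + 1][2 * j + 1]
            bands["DD"][i][j] = (a + b + c + d) / 2   # low / low
            bands["GD"][i][j] = (a - b + c - d) / 2   # horizontal high, vertical low
            bands["DG"][i][j] = (a + b - c - d) / 2   # horizontal low, vertical high
            bands["GG"][i][j] = (a - b - c + d) / 2   # high / high
    return bands


def haar_idwt2(bands):
    """Inverse transform: rebuild the full-resolution channel from four sub-images."""
    rows, cols = len(bands["DD"]), len(bands["DD"][0])
    out = [[0.0] * (2 * cols) for _ in range(2 * rows)]
    for i in range(rows):
        for j in range(cols):
            dd, gd = bands["DD"][i][j], bands["GD"][i][j]
            dg, gg = bands["DG"][i][j], bands["GG"][i][j]
            out[2 * i][2 * j] = (dd + gd + dg + gg) / 2
            out[2 * i][2 * j + 1] = (dd - gd + dg - gg) / 2
            out[2 * i + 1][2 * j] = (dd + gd - dg - gg) / 2
            out[2 * i + 1][2 * j + 1] = (dd - gd - dg + gg) / 2
    return out


channel = [[float((3 * r + 5 * c) % 7) for c in range(4)] for r in range(4)]
bands = haar_dwt2(channel)
restored = haar_idwt2(bands)
```

Unlike max pooling or strided convolution, this form of downsampling is invertible: the three high-frequency sub-images keep exactly the edge and detail information that the low-frequency sub-image discards, which is why they can be reused later for upsampling.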

As shown in FIG. 4, the spatial context module consists of two paths, a "height direction" path and a "width direction" path. In the "height direction" path, the feature map from the downsampling path (with C channels, height H, and width W) is divided into H slices in the vertical direction. Each slice is first linearly projected by a convolution operation, and the projected slice is then fed into the first linear layer (where S = 64 and C = 656) to obtain an attention map. Softmax and L1 normalization are applied to the attention map along the feature-map-size dimension and the channel dimension, respectively, and the normalized attention is fed into the second linear layer (where C = 656 and S = 64) to obtain a new slice. Finally, all new slices are aggregated along the height direction into a single feature map. In the "width direction" path, the feature map from the downsampling path is divided into W slices along the width direction, and the slices are updated and aggregated in the same way. The feature maps from the two paths, which have the same size, are fused by addition.

The spatial context module converts the traditional layer-by-layer connection of convolutions into slice-by-slice self-attention of linear complexity within the feature map, so that information can be passed between pixel rows and columns. At the same time, linear layers with a memory function replace the Key and Value vectors of traditional self-attention. Therefore, even when a road in the remote sensing image is occluded or covered by shadow, the spatial context module remains suitable for detecting road targets with long, continuous shapes, and it can extract global context information distributed over the entire training set while reducing computational complexity.
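The slice update in the "height direction" path can be sketched as follows. This is a hedged toy, not the patent's code: the two linear layers are represented by plain matrices with the hypothetical names `memory_key` (C × S) and `memory_value` (S × C), the initial convolutional projection of each slice is omitted, the Softmax is taken over the positions within a slice and the L1 normalization over the S memory units, and the toy sizes (H = 2, W = 3, C = 4, S = 2) stand in for the much larger C = 656 and S = 64.

```python
import math


def matmul(a, b):
    """Multiply an (n x k) by a (k x m) nested-list matrix."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def double_normalise(attention):
    """Softmax over positions (per column), then L1-normalise each row."""
    columns = []
    for column in zip(*attention):
        peak = max(column)
        exps = [math.exp(v - peak) for v in column]
        total = sum(exps)
        columns.append([e / total for e in exps])
    rows = [list(row) for row in zip(*columns)]
    return [[v / (1e-9 + sum(row)) for v in row] for row in rows]


def update_slice(slice_features, memory_key, memory_value):
    """slice_features: N x C; returns the updated N x C slice."""
    attention = double_normalise(matmul(slice_features, memory_key))  # N x S
    return matmul(attention, memory_value)                            # N x C


def height_path(feature_map, memory_key, memory_value):
    """feature_map: H slices of W x C; all slices share the same two matrices."""
    return [update_slice(s, memory_key, memory_value) for s in feature_map]


feature_map = [[[0.1 * (h + w + c) for c in range(4)] for w in range(3)] for h in range(2)]
memory_key = [[0.2 * (c - s) for s in range(2)] for c in range(4)]
memory_value = [[0.1 * (s + c) for c in range(4)] for s in range(2)]
updated = height_path(feature_map, memory_key, memory_value)
```

Because the two matrices are parameters shared by every slice of every training image, rather than Key and Value vectors computed from the current image, they can accumulate context from the whole training set, and the cost grows linearly with the number of positions in a slice. The "width direction" path would apply the same update to column slices before the two results are added.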

The upsampling path consists, in order, of 5 consecutive upsampling units, an aggregation operation (denoted the A aggregation operation), a convolution layer with a kernel size of 1×1 and a stride of 1, and a Softmax layer, and finally generates the segmentation prediction map.

Each upsampling unit consists of an up-conversion module, an aggregation operation (denoted the B aggregation operation), and a one-shot aggregation module. The up-conversion module restores the spatial resolution of the feature map through upsampling, the B aggregation operation aggregates the skip connection feature map adjusted by the channel attention module with the feature map produced by the up-conversion module, and the one-shot aggregation module extracts features from the result of the B aggregation operation.

The up-conversion module takes the high-frequency feature maps (the three sub-images containing high-frequency information) output by the corresponding down-conversion module in the downsampling path, raises their number of channels with a 1×1 convolution, combines them with the new features output by the spatial context module or by the preceding one-shot aggregation module, and applies an inverse wavelet transform with the Daubechies wavelet as the wavelet basis, thereby performing upsampling.

The A aggregation operation in the upsampling path aggregates the result of the B aggregation operation in the last upsampling unit with the output of that unit's one-shot aggregation module.

The convolution layer with a kernel size of 1×1 and a stride of 1 converts the number of channels to 2.
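The last two layers can be illustrated for a single pixel: a 1×1 convolution is a per-pixel linear map from the feature channels to 2 class scores, and the Softmax turns those scores into probabilities. The sketch below is an assumption-laden toy: the function name, the three input features, and the weights are made up, and treating channel 0 as background and channel 1 as road is an assumed convention that the text does not state.

```python
import math


def classify_pixel(features, weights, biases):
    """1x1 convolution to 2 channels followed by Softmax, for one pixel."""
    logits = [sum(w * f for w, f in zip(row, features)) + b
              for row, b in zip(weights, biases)]
    peak = max(logits)                       # subtract the max for numerical stability
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


features = [1.0, 2.0, 3.0]                          # toy feature vector of one pixel
weights = [[0.2, -0.1, 0.0], [-0.3, 0.4, 0.5]]      # 2 output channels x 3 input channels
biases = [0.0, 0.1]
probabilities = classify_pixel(features, weights, biases)
is_road = probabilities[1] > probabilities[0]       # assumed: channel 1 = road
```

Applying the same map at every pixel yields the segmentation prediction map; taking the larger of the two probabilities per pixel gives the binary road mask.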

Step 3: Input the training set images preprocessed in Step 1 into the high-resolution remote sensing image semantic segmentation network built in Step 2 for training. First initialize the network using the He Uniform method, then update the model parameters to optimize the loss function until convergence.

The loss function is the Dice Loss, in which X is the set of predicted values for all pixels in the prediction map, Y is the set of values for all pixels in the label image, $|X \cap Y|$ is the intersection of X and Y, and $|X|$ and $|Y|$ are the numbers of elements in X and Y, respectively.
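For the He Uniform initialization mentioned in Step 3, a minimal sketch is given below. He Uniform draws each weight from a uniform distribution on [-limit, limit] with limit = sqrt(6 / fan_in), where fan_in is the number of inputs feeding one output unit. The function name, the random seed, and the choice of the first 3×3 layer with 48 input channels and 16 output channels as the example are illustrative assumptions.

```python
import math
import random


def he_uniform(fan_in, fan_out, rng):
    """Return a fan_out x fan_in weight matrix drawn from U(-limit, limit)."""
    limit = math.sqrt(6.0 / fan_in)
    return [[rng.uniform(-limit, limit) for _ in range(fan_in)]
            for _ in range(fan_out)]


rng = random.Random(0)
# A 3x3 convolution over 48 input channels has fan_in = 3 * 3 * 48 = 432.
fan_in = 3 * 3 * 48
kernel = he_uniform(fan_in=fan_in, fan_out=16, rng=rng)
limit = math.sqrt(6.0 / fan_in)
```

Scaling the range by the fan-in keeps the variance of activations roughly constant through ReLU layers at the start of training, which matters for a deep network that is trained from scratch without a pre-trained model.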

Step 4: Input the preprocessed test set remote sensing images into the segmentation network trained in Step 3, and output the semantic segmentation results of the high-resolution remote sensing images.

Regarding the specific structure of the present invention, it should be noted that the connections between the component modules used are definite and realizable. Except where specifically stated in the embodiments, these specific connections bring about the corresponding technical effects and solve the technical problem posed by the invention without depending on the execution of corresponding software programs. Unless otherwise specified, the models and connection methods of the components, modules and specific parts appearing in the invention belong to the prior art available to those skilled in the art before the filing date, such as published patents, published journal papers or common general knowledge, and are not described further. The technical solution provided here is therefore clear, complete and realizable, and the corresponding physical product can be reproduced or obtained according to these technical means.

Finally, it should be noted that the above embodiments only illustrate the technical solutions of the present invention and do not limit them. Although the invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that the technical solutions described in those embodiments may still be modified, and that some or all of their technical features may be replaced by equivalents; such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the scope of the technical solutions of the embodiments of the invention.

Claims

1. A high-resolution remote sensing image semantic segmentation network, characterized in that it comprises a down-sampling path, a spatial context module and an up-sampling path;

the down-sampling path comprises a deformable convolution layer and 5 consecutive down-sampling units; input data first passes through the deformable convolution layer to obtain a semantic feature map, feature extraction and down-sampling are then carried out by the 5 consecutive down-sampling units, and a feature map is finally output;

the up-sampling path consists, in sequence, of 5 consecutive up-sampling units, an A aggregation operation, a convolution layer and a Softmax layer, and finally generates a segmentation prediction map;

the 5 down-sampling units and the 5 up-sampling units are in one-to-one correspondence and are connected by skip connections adjusted by the channel attention module;

and the spatial context module slices and fuses the feature map output by the down-sampling path and outputs the result to the up-sampling path.

2. The high-resolution remote sensing image semantic segmentation network according to claim 1, characterized in that: the deformable convolution layer is a deformable convolution with a kernel size of 3×3 and a stride of 1, which adds a learnable offset and a learnable weight coefficient to the position of each sampling point in the 3×3 convolution kernel.

3. The high-resolution remote sensing image semantic segmentation network according to claim 2, characterized in that: the down-sampling unit comprises a primary aggregation module, an aggregation operation and a down-conversion module; the primary aggregation module extracts features, the aggregation operation then aggregates the input and the output of the primary aggregation module, and the resulting feature map is passed to both the down-conversion module and the channel attention module; the output of the channel attention module is multiplied element-wise with its input to obtain the skip-connection features adjusted by channel attention, which are input to the corresponding part of the up-sampling path.

4. The high-resolution remote sensing image semantic segmentation network according to claim 1, characterized in that: the up-sampling unit comprises a group of up-conversion modules, a B aggregation operation and a primary aggregation module, wherein the up-conversion modules restore the spatial resolution of the feature map through up-sampling, the B aggregation operation aggregates the skip-connection feature map adjusted by the channel attention module with the feature map obtained from the up-conversion modules, and the primary aggregation module extracts features from the result of the B aggregation operation.

5. The high-resolution remote sensing image semantic segmentation network according to claim 3 or 4, characterized in that the structure of the primary aggregation module is as follows: after a feature map is input into the primary aggregation module, it first passes through a series of convolution modules, each of which yields a new feature map, wherein the first two convolution modules are deformable convolution modules; the resulting feature maps are then stacked along the channel dimension to obtain the channel-stacked feature map;

the convolution module consists, in sequence, of a batch normalization layer, a ReLU activation function layer, a 3×3 convolution layer and a dropout layer; the deformable convolution module consists, in sequence, of a batch normalization layer, a ReLU activation function layer, a 3×3 deformable convolution layer and a dropout layer.

6. The high-resolution remote sensing image semantic segmentation network according to claim 5, characterized in that the structure of the channel attention module is as follows: the input feature map, with C channels, is subjected to global max pooling and global average pooling, respectively, to obtain two pooled feature maps with one value per channel;

the two pooled feature maps are then each sent to a shared multilayer perceptron, in which the number of neurons in the first layer is $C/r$, where r is the reduction rate and r = 16, the activation function is ReLU, and the number of neurons in the second layer is $C$;

the two output features are added, and the final channel attention is generated through a Sigmoid activation; the channel attention is then multiplied element-wise with the input features.
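The channel attention computation described in this claim can be sketched as follows. This is only an illustration: biases are omitted, the weights are passed in as hypothetical arguments `w1` and `w2`, and the example uses 4 channels with a reduction rate of 2 for brevity rather than r = 16.

```python
import math


def channel_attention(x, w1, w2):
    """Channel attention on x of shape [C][H][W].

    w1: [C // r][C] and w2: [C][C // r] are the two layers of the
    shared multilayer perceptron (biases omitted in this sketch).
    Returns (attention, reweighted feature map).
    """
    def mlp(v):
        hidden = [max(0.0, sum(w * a for w, a in zip(row, v))) for row in w1]  # ReLU
        return [sum(w * a for w, a in zip(row, hidden)) for row in w2]

    flat = [[p for row in ch for p in row] for ch in x]
    max_desc = [max(ch) for ch in flat]               # global max pooling
    avg_desc = [sum(ch) / len(ch) for ch in flat]     # global average pooling
    summed = [a + b for a, b in zip(mlp(max_desc), mlp(avg_desc))]
    att = [1.0 / (1.0 + math.exp(-s)) for s in summed]  # Sigmoid
    out = [[[att[c] * p for p in row] for row in x[c]] for c in range(len(x))]
    return att, out


# Hypothetical input: 4 channels of size 2x2; zero weights give attention 0.5.
x = [[[float(c + i + j) for j in range(2)] for i in range(2)] for c in range(4)]
w1 = [[0.0] * 4 for _ in range(2)]
w2 = [[0.0] * 2 for _ in range(4)]
att, out = channel_attention(x, w1, w2)
```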

7. The high-resolution remote sensing image semantic segmentation network according to claim 6, characterized in that: the spatial context module comprises two paths, one in the height direction and one in the width direction;

in the height-direction path, the feature map from the down-sampling path, which has C channels, height H and width W, is divided into H slices along the vertical direction; for each slice, a convolution operation performs a linear projection, the projected slice is input into the first linear layer to obtain an attention map, the attention map is normalized using Softmax and L1 normalization, applied in turn along the feature-size dimension and the channel dimension respectively, and the normalized attention is then input into the second linear layer to obtain a new slice; finally, all new slices are combined along the height direction into a feature map;

in the width-direction path, the feature map from the down-sampling path is divided into W slices along the width direction, and the slices are updated and aggregated into a feature map in the same way;

the feature maps from the height-direction and width-direction paths, which have the same size, are fused by addition.
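The two-stage normalization of the attention map in this claim can be sketched as below. The axis layout is an assumption of the sketch: the attention is stored as a list of positions, each holding one value per channel, Softmax is taken across positions and the L1 normalization across channels. The name `double_normalize` and the smoothing term `eps` are hypothetical.

```python
import math


def double_normalize(att, eps=1e-9):
    """Softmax over the position axis, then L1 normalization over channels.

    att: [N][K] nested list (N positions in a slice, K channels).
    """
    n, k = len(att), len(att[0])
    columns = []
    for j in range(k):
        peak = max(att[i][j] for i in range(n))      # numerical stability
        exps = [math.exp(att[i][j] - peak) for i in range(n)]
        total = sum(exps)
        columns.append([v / total for v in exps])    # each column sums to 1
    soft = [[columns[j][i] for j in range(k)] for i in range(n)]
    # All entries are positive after Softmax, so the L1 norm is the plain sum.
    return [[v / (eps + sum(row)) for v in row] for row in soft]


attention = [[0.2, 1.5, -0.3], [1.0, 0.0, 0.7]]      # hypothetical 2x3 map
normalized = double_normalize(attention)
```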

8. A remote sensing image road segmentation method based on a topology-aware neural network, characterized by comprising the following steps:

S100: dividing the high-resolution remote sensing data set into a training set and a test set, and preprocessing the images in the training set and the test set respectively;

S200: building a high-resolution remote sensing image semantic segmentation network;

S300: inputting the training set images preprocessed in S100 into the high-resolution remote sensing image semantic segmentation network built in S200 for training, initializing the network with the He Uniform method, then updating the parameters of the model and optimizing the loss function until convergence;

S400: inputting the preprocessed test set remote sensing images into the segmentation network trained in S300, and outputting the semantic segmentation results of the high-resolution remote sensing images.

9. The remote sensing image road segmentation method based on the topology-aware neural network according to claim 8, characterized in that: in step S100, the preprocessing comprises manual image labeling, image cropping and data augmentation;

the manual image labeling specifically comprises: manually carrying out pixel-level semantic annotation of the roads in the high-resolution image in ArcGIS software to obtain a labeled image;

the image cropping specifically comprises: randomly cropping the labeled high-resolution remote sensing image into sub-images of 512 × 512 pixels;

the data augmentation comprises: applying random scale transformation, rotation by a random angle, and vertical and horizontal flipping to the sub-images to obtain high-resolution remote sensing images.
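The cropping and flipping steps of this claim can be sketched as follows; random scale transformation and rotation are left out for brevity. The sketch operates on nested lists, applies every transformation identically to the image and its label so that they stay aligned, and uses hypothetical function names. The crop size is a parameter, with 512 being the value given in the claim.

```python
import random


def random_crop(image, label, size, rng):
    """Cut the same randomly placed size x size window from image and label."""
    h, w = len(image), len(image[0])
    top = rng.randint(0, h - size)
    left = rng.randint(0, w - size)

    def crop(a):
        return [row[left:left + size] for row in a[top:top + size]]

    return crop(image), crop(label)


def random_flip(image, label, rng):
    """Flip image and label together, each direction with probability 0.5."""
    if rng.random() < 0.5:                       # horizontal flip
        image = [row[::-1] for row in image]
        label = [row[::-1] for row in label]
    if rng.random() < 0.5:                       # vertical flip
        image = image[::-1]
        label = label[::-1]
    return image, label


rng = random.Random(0)                           # fixed seed for reproducibility
image = [[r * 8 + c for c in range(8)] for r in range(8)]   # toy 8x8 image
label = [row[:] for row in image]                # toy label aligned with image
sub_image, sub_label = random_crop(image, label, 4, rng)
aug_image, aug_label = random_flip(sub_image, sub_label, rng)
```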

10. The remote sensing image road segmentation method based on the topology-aware neural network according to claim 8, characterized in that: in step S300, the loss function is the Dice Loss, specifically:

$$L_{Dice} = 1 - \frac{2\,|X \cap Y|}{|X| + |Y|}$$

wherein X represents the set of predicted values corresponding to all pixels in the prediction map, Y represents the set of values corresponding to all pixels in the label image, $X \cap Y$ represents the intersection of X and Y, and $|X|$ and $|Y|$ represent the number of elements in X and Y, respectively.