CN114596503A

CN114596503A - Road extraction method based on remote sensing satellite image

Info

Publication number: CN114596503A
Application number: CN202210208048.6A
Authority: CN
Inventors: 殷玮伶; 王立君; 戚金清
Original assignee: Dalian University of Technology
Current assignee: Dalian University of Technology
Priority date: 2022-03-03
Filing date: 2022-03-03
Publication date: 2022-06-07
Anticipated expiration: 2042-03-03
Also published as: CN114596503B

Abstract

The invention belongs to the field of computer road extraction, and provides a road extraction method based on remote sensing satellite images. The road extraction method based on remote sensing satellite images includes a feature encoder, an iterative feature enhancement sub-network and a multi-task decoder. The iterative feature enhancement sub-network unit includes a semantic-guided feature enhancement module, a direction-aware feature aggregation module and two side branches; the semantic-guided feature enhancement module includes feature selection and feature fusion; the direction-aware feature aggregation module includes the residuals of multiple branches structure, each branch includes a direction-aware deformable convolution and a ReLU unit; the proposed semantic-guided feature enhancement module and direction-aware feature aggregation module and direction-aware deformable convolution make better use of road directions to automatically align convolutions Nuclear receptive field and road area; it solves the connectivity problem of automated road extraction tasks in remote sensing satellite images, and the results output by this method are highly accurate and efficient.

Description

A road extraction method based on remote sensing satellite images

技术领域technical field

本发明涉及计算机视觉领域中的道路提取(road extraction)领域，具体涉及一种基于遥感卫星影像的道路提取方法。The invention relates to the field of road extraction in the field of computer vision, in particular to a road extraction method based on remote sensing satellite images.

背景技术Background technique

图像语义分割任务旨在帮助计算机理解真实环境中包含的物体类别及位置，根据用户定义的物体类别识别图像中存在的内容及相应的位置。基于语义分割技术的遥感卫星影像道路提取任务旨在预测二值的道路分割图，定位遥感卫星影像中的道路位置。近年来，随着基于深度神经网络的机器学习模型在图像分类、目标检测、语义分割等许多计算机视觉任务中取得巨大的成功，在一些大型的公开数据集上，基于深度学习的方法显着提高了遥感卫星影像道路提取的质量。The image semantic segmentation task aims to help the computer understand the object categories and positions contained in the real environment, and identify the content and corresponding positions in the image according to the user-defined object categories. The remote sensing satellite image road extraction task based on semantic segmentation technology aims to predict the binary road segmentation map and locate the road position in the remote sensing satellite image. In recent years, with the great success of deep neural network-based machine learning models in many computer vision tasks such as image classification, object detection, and semantic segmentation, deep learning-based methods have significantly improved on some large public datasets. quality of road extraction from remote sensing satellite images.

然而，近期研究发现，遥感卫星影像道路提取任务相比于图像语义分割任务存在更多的技术挑战。在遥感卫星影像中，道路区域经常被云层或高层建筑及其阴影遮挡，道路区域的外观受天气、光照强度及拍摄角度等客观因素的影响较大，并且道路区域与临近的非道路区域具有高度相似的纹理特点。因此，直接将图像语义分割技术应用于遥感卫星影像道路提取任务通常会导致提取出的道路网较为破碎，远远落后于工业级应用的要求。同时，研究人员尝试通过改进神经网络结构来获得更好的道路提取结果。AbhishekChaurasia等人提出一种加强信息传递的新网络结构LinkNet，增加了编码器和解码器之间的信息流动，将空间信息直接从编码器输入给解码器。Lichen Zhou等人基于前人的成果，在编码器后设计了一个扩张卷积的模块，扩张模块部分包含级联模式和并行模式的空洞卷积。每个路径的感受野都不同，因此，网络可以捕获多尺度特征从而获得更好的检测结果。Yao Wei等人利用道路边界作为先验信息以降低沿着道路边缘方向的偏差。Anil Batra等人提出了基于多任务学习的遥感影像道路提取方法StackHourglass，利用道路方向作为监督信息，初步证明了道路方向信息可以为道路提取任务带来增益。这些方法在道路连贯性上获得了一定的改善，但是并没有充分利用道路的方向信息来辅助道路提取任务。道路拓扑的建立始终遵循着道路方向的指导，合理利用道路方向可以促进更加精确的道路提取结果。道路方向与道路提取任务具有内在的相关性。However, recent studies have found that the task of road extraction from remote sensing satellite images has more technical challenges than the task of image semantic segmentation. In remote sensing satellite images, the road area is often blocked by clouds or high-rise buildings and their shadows. The appearance of the road area is greatly affected by objective factors such as weather, light intensity and shooting angle, and the road area and the adjacent non-road areas have a high degree of Similar texture features. Therefore, directly applying image semantic segmentation technology to the task of road extraction from remote sensing satellite images usually results in a relatively fragmented road network, which is far behind the requirements of industrial applications. At the same time, researchers try to obtain better road extraction results by improving the neural network structure. Abhishek Chaurasia et al. proposed a new network structure LinkNet to enhance information transfer, increase the information flow between the encoder and the decoder, and input the spatial information directly from the encoder to the decoder. Based on previous results, Lichen Zhou et al. designed a dilated convolution module after the encoder, and the dilated module part contains convolutional convolution in cascade mode and parallel mode. The receptive field of each path is different, so the network can capture multi-scale features for better detection results. Yao Wei et al. used the road boundary as prior information to reduce the bias along the road edge direction. Anil Batra et al. proposed a multi-task learning-based method for road extraction from remote sensing images, StackHourglass, which uses road direction as supervision information and preliminarily proves that road direction information can bring benefits to road extraction tasks. These methods achieve some improvement in road coherence, but do not fully utilize the direction information of the road to assist the road extraction task. The establishment of road topology always follows the guidance of road direction, and reasonable use of road direction can promote more accurate road extraction results. Road directions are inherently related to the road extraction task.

本发明基于对现有遥感影像道路提取方法StackHourglass的调查与分析，针对遥感卫星影像的外观特点和道路要素的几何特性，提出了一种基于遥感卫星影像的道路提取方法。Based on the investigation and analysis of the existing remote sensing image road extraction method StackHourglass, the present invention proposes a road extraction method based on remote sensing satellite images according to the appearance characteristics of remote sensing satellite images and the geometric characteristics of road elements.

发明内容SUMMARY OF THE INVENTION

本发明要解决的技术问题是克服现有方法StackHourglass中对道路方向信息利用不充分的缺陷，提供一种新的充分利用道路方向信息辅助道路提取的方法。本发明目的是在给定单张RGB图像下，通过提出的方法给出一个表示道路像素的二值掩码，本发明适用于各种遥感卫星影像。The technical problem to be solved by the present invention is to overcome the defect of insufficient utilization of road direction information in the existing method StackHourglass, and to provide a new method for fully utilizing road direction information to assist road extraction. The purpose of the present invention is to give a binary mask representing road pixels by the proposed method under a given single RGB image, and the present invention is suitable for various remote sensing satellite images.

本发明为一种基于遥感卫星影像的道路提取方法，基于卷积神经网络实现。本发明将道路方向作为先验知识，更准确地推理道路区域，尤其是在闭塞和噪声区域，从而产生更加完整和一致的道路提取结果。同时，本发明利用道路提取任务辅助道路方向预测的任务。本发明提出的网络由三个组件组成：特征编码器、迭代特征增强子网络和多任务解码器。The present invention is a road extraction method based on remote sensing satellite images, which is realized based on a convolutional neural network. The present invention uses road directions as prior knowledge to more accurately infer road areas, especially in occluded and noisy areas, thereby producing more complete and consistent road extraction results. At the same time, the present invention utilizes the task of road extraction to assist the task of road direction prediction. The proposed network consists of three components: a feature encoder, an iterative feature enhancement sub-network and a multi-task decoder.

本发明的技术方案如下：一种基于遥感卫星影像的道路提取方法(IterNet)，包括特征编码器、迭代特征增强子网络和多任务解码器；具体提取步骤如下；The technical scheme of the present invention is as follows: a road extraction method (IterNet) based on remote sensing satellite images, comprising a feature encoder, an iterative feature enhancement sub-network and a multi-task decoder; the specific extraction steps are as follows;

步骤1:遥感卫星影像转化为RGB图像输入至特征编码器进行特征提取，输出分辨率为原始输入图像下采样4倍大小的特征图，提取道路语义特征和道路方向特征；Step 1: the remote sensing satellite image is converted into RGB image and input to the feature encoder to carry out feature extraction, and the output resolution is the feature map of the original input image down-sampling 4 times of size, extracts road semantic feature and road direction feature;

步骤2：步骤1提取的特征送入迭代特征增强子网络；迭代特征增强子网络包括多个特征增强单元，每个特征增强单元中包含一语义引导特征加强模块 (SGFE)、一方向感知特征聚合模块(OAFA)和两个侧边分支；Step 2: The features extracted in step 1 are sent to the iterative feature enhancement sub-network; the iterative feature enhancement sub-network includes a plurality of feature enhancement units, each feature enhancement unit includes a semantically guided feature enhancement module (SGFE), a direction-aware feature aggregation module (OAFA) and two side branches;

输入道路语义特征和道路方向特征至语义引导特征加强模块，获得增强的道路方向特征；增强的道路方向特征分为两支路，一支路用于特征融合，另一支路输入至一侧边分支获得初步道路方向预测结果，该侧边分支由两个卷积层组成；Input road semantic features and road direction features to the semantic guidance feature enhancement module to obtain enhanced road direction features; the enhanced road direction features are divided into two branches, one branch is used for feature fusion, and the other branch is input to one side The branch obtains the preliminary road direction prediction results, and the side branch consists of two convolutional layers;

语义引导特征增强模块包括特征选择部分和特征融合部分；特征选择部分包括卷积层、激活函数层和沙漏形子网络；The semantic-guided feature enhancement module includes a feature selection part and a feature fusion part; the feature selection part includes a convolution layer, an activation function layer and an hourglass-shaped sub-network;

特征选择部分中，采用级联方式合并输入道路语义特征和道路方向特征，合并后的特征送入卷积层和激活函数层以获得单通道特征置信度；In the feature selection part, the input road semantic features and road direction features are combined in a cascade manner, and the combined features are sent to the convolution layer and the activation function layer to obtain the single-channel feature confidence;

B＝Sigmoid(Ψ₁(cat(F_e，F_o)，Θ₁)) (1)B=Sigmoid(Ψ ₁ (cat(F _e , F _o ), Θ ₁ )) (1)

其中，B为单通道特征置信度，cat(·，·)表示沿着通道方向级联，Ψ₁(·，Θ₁)表示参数为Θ₁的卷积层；F_e表示道路语义特征；F_o表示道路方向特征；Among them, B is the single-channel feature confidence, cat( , ) represents the cascade along the channel direction, Ψ ₁ ( , Θ ₁ ) represents the convolutional layer with parameter Θ ₁ ; F _e represents the road semantic feature; F _o represents the road direction feature;

生成的单通道特征置信度分别对道路语义特征和道路方向特征数值域中进行特征选择，输入至沙漏形子网络；沙漏形子网络对特征选择后的特征先进行下采样，再进行上采样，有利于扩大感受野以更好地捕获全局的上下文信息；The generated single-channel feature confidence is used for feature selection in the numerical domains of road semantic features and road direction features, respectively, and input to the hourglass-shaped sub-network; the hourglass-shaped sub-network first downsamples the features after feature selection, and then upsamples them. It is beneficial to expand the receptive field to better capture the global context information;

F_a＝Ψ₂((1-B)·F_e+B·F_o，Θ₂) (2)F _a =Ψ ₂ ((1-B)·F _e +B·F _o ,Θ ₂ ) (2)

其中Ψ₂(·，Θ₂)表示参数为Θ₂的卷积层；F_a为经过特征选择所生成的特征；Wherein Ψ ₂ ( , Θ ₂ ) represents a convolutional layer with a parameter of Θ ₂ ; F _a is the feature generated through feature selection;

特征融合部分利用融合掩模，将输入的道路语义特征、道路方向特征和经特征选择部分的结果进行加权特征融合，获得增强的道路方向特征；The feature fusion part uses a fusion mask to perform weighted feature fusion of the input road semantic features, road direction features and the results of the feature selection part to obtain enhanced road direction features;

F′_o＝(1+M)·F_o+(1+M)·F_e， (3)F′ _o =(1+M)·F _o +(1+M)·F _e , (3)

M＝Sigmoid(D(E(F_a)))M=Sigmoid(D(E(F _a )))

其中，D(·)和E(·)分别代表沙漏形子网络中的编码部分和解码部分，F′_o表示经增强的道路方向特征，M表示融合掩模；Among them, D( ) and E( ) represent the encoding part and decoding part in the hourglass-shaped sub-network, respectively, F′ _o represents the enhanced road direction feature, and M represents the fusion mask;

初步道路方向预测结果同道路语义特征输入方向感知特征聚合模块，生成加强的道路语义特征；加强的道路语义特征分为两支路，一支路用于特征融合，另一支路输入至另一个侧边分支，产生初步的道路分割结果，用于监督网络学习；Preliminary road direction prediction results and road semantic features are input to the direction-aware feature aggregation module to generate enhanced road semantic features; the enhanced road semantic features are divided into two branches, one branch is used for feature fusion, and the other branch is input to the other Side branches, which produce preliminary road segmentation results for supervised network learning;

方向感知特征聚合模块根据初步道路方向预测结果调整感受野的形状和方向，聚焦于用于道路提取的信息区域；方向感知特征聚合模块包括多个分支的残差结构，每个分支包括一个方向感知可形变卷积(OD-Conv)和一个ReLU单元；不同的分支采用不同大小和形状的卷积核；所有分支得到的特征与输入的道路语义特征加权组合输出经加强后的道路语义特征；The direction-aware feature aggregation module adjusts the shape and direction of the receptive field according to the preliminary road direction prediction results, focusing on the information area for road extraction; the direction-aware feature aggregation module includes a residual structure of multiple branches, each branch includes a direction-aware Deformable convolution (OD-Conv) and a ReLU unit; different branches use convolution kernels of different sizes and shapes; the features obtained by all branches and the input road semantic features are weighted and combined to output the enhanced road semantic features;

F′_e＝F_e+λ·∑_iΦ₀(F_e，α，θ_i) (4)F′ _e = F _e +λ·∑ _i Φ ₀ (F _e , α, θ _i ) (4)

其中，F_e表示输入的道路语义特征，F′_e表示经加强后的道路语义特征；α表示侧边分支由道路方向特征得到的道路方向，i表示分支的序号，Φ₀表示参数为θ_i的方向感知可形变卷积层，λ为残差结构的超参数，用于平衡聚合特征和原始特征的权重；Among them, F _e represents the input road semantic feature, F′ _e represents the enhanced road semantic feature; α represents the road direction obtained by the road direction feature of the side branch, i represents the sequence number of the branch, Φ ₀ represents the parameter θ _i The direction-aware deformable convolutional layer of λ is the hyperparameter of the residual structure, which is used to balance the weight of the aggregated features and the original features;

W来表示常规的2D卷积层参数，常规卷积层的输出可以通过式(5)计算。W represents the parameters of the conventional 2D convolutional layer, and the output of the conventional convolutional layer can be calculated by formula (5).

其中X表示输入特征图，Y(p)表示位置p处的输出值。R表示表示2D卷积的感受野。以3×3卷积核为例，二维网格R可以定义如式(6)。where X represents the input feature map and Y(p) represents the output value at position p. R denotes the receptive field representing the 2D convolution. Taking the 3×3 convolution kernel as an example, the two-dimensional grid R can be defined as Equation (6).

R＝{(-1，-1)，(-1，0)，，...，(0，1)，(1，1)} (6)R={(-1,-1),(-1,0),,...,(0,1),(1,1)} (6)

方向感知可形变卷积的卷积核根据初步道路方向预测结果自动调整感受野；方向感知可形变卷积层的输出为；The convolution kernel of the direction-aware deformable convolution automatically adjusts the receptive field according to the preliminary road direction prediction results; the output of the direction-aware deformable convolution layer is;

其中X表示输入特征图；Y(p)表示位置p处的输出值；R表示感受野；α(p)表示位置p处的预测道路方向角，r(p，α)表示应用于坐标p＝[p^x，p^y]处的旋转平移；W(p_n)卷积核感受野的权重；where X represents the input feature map; Y(p) represents the output value at position p; R represents the receptive field; α(p) represents the predicted road direction angle at position p, r(p, α) represents the applied coordinate p= Rotation translation at [p ^x , p ^y ]; W(p _n ) weight of convolution kernel receptive field;

融合增强的方向特征、加强的道路语义特征、原始道路语义特征和原始道路方向特征，融合后的特征输入下一个特征增强单元；Fuse the enhanced direction feature, enhanced road semantic feature, original road semantic feature and original road direction feature, and the fused feature is input to the next feature enhancement unit;

步骤3：最终子网络的输出特征送入至多任务解码器，获得道路提取结果和道路方向预测结果。Step 3: The output features of the final sub-network are sent to the multi-task decoder to obtain road extraction results and road direction prediction results.

所述不同大小和形状的卷积核分别为5×1、1×5和3×3。The convolution kernels of different sizes and shapes are 5×1, 1×5 and 3×3, respectively.

所述r(p，α)为非整数时，采用双线性插值对输入特征图X进行采样。When the r(p, α) is a non-integer, bilinear interpolation is used to sample the input feature map X.

所述特征编码器选用ResNet50；The feature encoder selects ResNet50;

本发明的有益效果：本发明解决了遥感卫星图像中自动化道路提取任务的连通性问题，经该方法输出的结果具有高度精确性和高效性；本发明能够充分利用道路方向所蕴含的先验信息，辅助提高道路提取任务的结果精确性；提出的语义引导特征增强模块可以利用道路语义信息对道路方向信息进行补充和增强；提出的OD-Conv能够更好的利用道路方向来自动对齐卷积核感受野和道路区域。Beneficial effects of the present invention: the present invention solves the connectivity problem of automated road extraction tasks in remote sensing satellite images, and the results output by the method are highly accurate and efficient; the present invention can fully utilize the prior information contained in the road direction , to help improve the result accuracy of the road extraction task; the proposed semantic-guided feature enhancement module can supplement and enhance the road direction information by using the road semantic information; the proposed OD-Conv can better utilize the road direction to automatically align the convolution kernel Feel the wild and road areas.

附图说明Description of drawings

图1为本发明的整体网络结构。FIG. 1 is the overall network structure of the present invention.

图2为本发明提出的语义引导特征增强模块SGFE示意图。FIG. 2 is a schematic diagram of the semantic-guided feature enhancement module SGFE proposed by the present invention.

图3为本发明提出的方向感知可形变卷积示意图。左侧为输入图像上的感受野，右侧为提出的方向感知可形变卷积。FIG. 3 is a schematic diagram of the direction-aware deformable convolution proposed by the present invention. The left is the receptive field on the input image, and the right is the proposed direction-aware deformable convolution.

图4为本发明提出的方向感知特征聚合模块OAFA示意图。FIG. 4 is a schematic diagram of the direction-aware feature aggregation module OAFA proposed by the present invention.

图5为本发明提出的迭代特征增强子网络。Fig. 5 is the iterative feature enhancement sub-network proposed by the present invention.

图6为本发明的实验结果，从左向右分别为(a)遥感卫星图像、(b)道路提取真值、(c)仅采用多任务学习的结果、(d)采用SGFE模块的结果、(e) 采用OAFA模块的结果以及(f)本发明方法的结果。6 is the experimental results of the present invention, from left to right are (a) remote sensing satellite images, (b) road extraction true values, (c) results using only multi-task learning, (d) results using SGFE modules, (e) Results using the OAFA module and (f) results from the method of the present invention.

具体实施方式Detailed ways

下面结合附图和技术方案，进一步说明本发明的具体实施方式。The specific embodiments of the present invention will be further described below with reference to the accompanying drawings and technical solutions.

图1为本发明所提出的IterNet的整体结构，IterNet由遥感卫星影像特征编码器、迭代特征增强子网络以及多任务解码器构成。遥感卫星影像转化为RGB 图像后被送入本发明所提出的特征编码器，进行特征提取，提取到的遥感影像特征送入迭代特征增强子网络。迭代特征增强子网络由多个特征增强单元构成，每个特征增强单元中包含一个SGFE模块、一个OAFA模块和两个侧边分支。在每个特征增强单元中，输入特征通过卷积操作分别提取道路的道路语义特征和道路方向特征，经过本发明提出的SGFE模块和OAFA模块进行特征增强，输出融合后的特征并送入下一个特征单元。FIG. 1 is the overall structure of IterNet proposed by the present invention. IterNet consists of a remote sensing satellite image feature encoder, an iterative feature enhancement sub-network and a multi-task decoder. After the remote sensing satellite images are converted into RGB images, they are sent to the feature encoder proposed by the present invention for feature extraction, and the extracted remote sensing image features are sent to the iterative feature enhancement sub-network. The iterative feature enhancement sub-network consists of multiple feature enhancement units, and each feature enhancement unit includes a SGFE module, an OAFA module and two side branches. In each feature enhancement unit, the input feature extracts the road semantic feature and the road direction feature of the road respectively through the convolution operation, the feature enhancement is performed by the SGFE module and the OAFA module proposed by the present invention, and the fused feature is output and sent to the next feature unit.

图2为本发明提出的SGFE模块，能够利用道路语义信息对道路方向特征进行特征加强，图4为本发明提出的OAFA模块，能够利用初步道路方向预测结果调整OD-Conv感受野的形状和方向，自动聚焦于用于道路提取的信息区域。图3 为本发明提出的OD-Conv，能够自适应的聚焦于道路提取的信息区域，聚合道路面及其周围环境的信息，使道路提取结果更加准确。图5为本发明提出的迭代特征加强子网络，上一个特征加强单元的输出，作为下一单元的输入，第一个特征加强单元以特征编码器的输出为输入特征，进行特征的加强。最后一个特征加强单元将其加强后的特征输入两分支的多任务解码器中，经过一系列卷积和上采样操作还原图像的分辨率并输出最终的预测结果。Figure 2 is the SGFE module proposed by the present invention, which can use road semantic information to enhance the feature of the road direction. Figure 4 is the OAFA module proposed by the present invention, which can adjust the shape and direction of the OD-Conv receptive field by using the preliminary road direction prediction result , which automatically focuses on the information area for road extraction. Figure 3 shows the OD-Conv proposed by the present invention, which can adaptively focus on the information area extracted by the road, aggregate the information of the road surface and its surrounding environment, and make the road extraction result more accurate. Figure 5 is the iterative feature enhancement sub-network proposed by the present invention. The output of the previous feature enhancement unit is used as the input of the next unit, and the first feature enhancement unit uses the output of the feature encoder as the input feature to enhance the feature. The last feature enhancement unit inputs its enhanced features into a two-branch multi-task decoder, restores the resolution of the image through a series of convolution and upsampling operations, and outputs the final prediction result.

本发明使用DeepGlobe数据集和Massachusetts数据集进行方法的训练和测试。DeepGlobe数据集包括来自泰国、印度尼西亚和印度三个不同地区的图像，总面积为2220平方公里，该数据集包含城市和郊区的卫星图像以及相应的像素级真值。遥感图像的地面分辨率为50cm，像素分辨率为1024×1024。本发明使用4696幅图像进行训练，使用1530幅图像进行测试。Massachusetts数据集由马萨诸塞州的1171张图像组成。每幅图像的大小为1500×1500像素，空间分辨率为每像素1m。该数据集是从航拍图像中收集的，包含城市和郊区的卫星图像以及相应的像素级真值，本发明使用该数据集的验证集进行测试。The present invention uses the DeepGlobe data set and the Massachusetts data set to train and test the method. The DeepGlobe dataset includes images from three different regions of Thailand, Indonesia, and India with a total area of 2,220 square kilometers. The dataset contains satellite images of cities and suburbs and the corresponding pixel-level ground-truths. The ground resolution of the remote sensing images is 50cm and the pixel resolution is 1024×1024. The present invention uses 4696 images for training and 1530 images for testing. The Massachusetts dataset consists of 1171 images of Massachusetts. The size of each image is 1500 × 1500 pixels, and the spatial resolution is 1m per pixel. This dataset is collected from aerial imagery and contains satellite images of cities and suburbs and the corresponding pixel-level ground-truths, and the present invention uses the validation set of this dataset for testing.

本发明主要分为数据处理和网络训练两个子任务。在数据处理上，本发明遵循Anil Batra等人的设置，利用道路面真值生成道路方向预测任务的真值。首先对像素级注释进行骨架化操作以生成道路中心线真值，然后利用David H Douglas等人提出的平滑算法方法对道路中心线真值进行平滑处理。对于道路中心线中的每一段，本发明利用插值的放大生成一系列关键点，对每两个相邻关键点之间的方向角度进行计算。为了简化方向估计任务，本发明将区间大小设置为10°对方向进行聚类。本发明将骨架化的道路方向真值向外侧扩展20个像素以生成像素级的道路方向真值用于道路方向预测任务的监督。最后本发明使用36维方向类在道路区域内注释道路面的方向，非道路区域被标记为背景类，将道路方向预测问题转化为将道路方向转化为36类角度、1类背景的像素级分类问题。The present invention is mainly divided into two subtasks of data processing and network training. In terms of data processing, the present invention follows the setting of Anil Batra et al., and uses the true value of the road surface to generate the true value of the road direction prediction task. First, the pixel-level annotation is skeletonized to generate the ground truth of the road centerline, and then the ground truth value of the road centerline is smoothed using the smoothing algorithm proposed by David H Douglas et al. For each section of the road centerline, the present invention generates a series of key points by means of interpolation and magnification, and calculates the direction angle between every two adjacent key points. In order to simplify the direction estimation task, the present invention sets the interval size to 10° to cluster the directions. The present invention expands the skeletonized road direction ground truth by 20 pixels to the outside to generate pixel-level road direction ground truth for the supervision of the road direction prediction task. Finally, the present invention uses the 36-dimensional direction class to annotate the direction of the road surface in the road area, the non-road area is marked as the background class, and the road direction prediction problem is transformed into a pixel-level classification that converts the road direction into 36 types of angles and 1 type of background. question.

在网络训练上，本发明的所有实验均在具有16GB RAM和两个NVIDIA 2080Ti GPU(11GB)的服务器上，使用PyTorch框架实现训练和测试。在训练期间，本发明在尺寸为512×512的图像上使用随机裁剪操作，生成256×256大小的图像用作IterNet的输入。此外，本发明采用了随机水平翻转、镜像和旋转等数据增强操作以避免过拟合。本发明使用随机梯度下降方法优化整个网络，其动量值为0.9，批量大小为32，总共训练160轮。初始学习率为2e-2，在第 60轮后下降10倍，在第90轮后设置为5e-4，在第120轮后下降10倍。在推理过程中，Deepglobe数据集图像大小为512×512，Massachusetts数据集图像大小为320×320，图像直接输入网络，无需进一步裁剪操作。For network training, all experiments of the present invention are performed on a server with 16GB RAM and two NVIDIA 2080Ti GPUs (11GB), using the PyTorch framework to implement training and testing. During training, the present invention uses random cropping operations on images of size 512 × 512 to generate images of size 256 × 256 as input to IterNet. In addition, the present invention employs data augmentation operations such as random horizontal flipping, mirroring, and rotation to avoid overfitting. The present invention uses the stochastic gradient descent method to optimize the entire network with a momentum value of 0.9, a batch size of 32, and a total of 160 rounds of training. The initial learning rate is 2e-2, which decreases by a factor of 10 after the 60th epoch, and is set to 5e-4 after the 90th epoch, and decreases by a factor of 10 after the 120th epoch. In the inference process, the image size of Deepglobe dataset is 512×512, and the image size of Massachusetts dataset is 320×320. The images are directly input to the network without further cropping operations.

本发明在具体实施时选取的对比方法是StckHourglass，StckHourglass方法是简单利用道路方向信息进行监督以获得更好的道路提取结果，为了进行公平的比较，StckHourglass方法使用其公开的代码以及其建议的参数设置，并且均使用相同的预训练网络，在相同的测试集上进行测试。从最终的实验结果来看，本发明提出的IterNet方法在所有不同实验设置中指标IoU、F1、Precision 和Recall均获得了最佳性能。IoU、F1、Precision和Recall等指标越高证明方法的精度越高。Deepglobe数据集上具体实验结果如下表1所示：The comparison method selected in the present invention is StckHourglass. The StckHourglass method simply uses the road direction information to supervise to obtain better road extraction results. For a fair comparison, the StckHourglass method uses its public code and its suggested parameters. settings, and both use the same pretrained network and test on the same test set. From the final experimental results, the IterNet method proposed in the present invention achieves the best performance in all different experimental settings for the indicators IoU, F1, Precision and Recall. The higher the indicators such as IoU, F1, Precision and Recall, the higher the accuracy of the method. The specific experimental results on the Deepglobe dataset are shown in Table 1 below:

表1 Deepglobe数据集上具体实验结果Table 1 Specific experimental results on the Deepglobe dataset

Massachusetts数据集上具体实验结果如下表2所示：The specific experimental results on the Massachusetts dataset are shown in Table 2 below:

表2 Massachusetts数据集上具体实验结果Table 2 Specific experimental results on the Massachusetts dataset

Claims

1. A road extraction method based on remote sensing satellite images is characterized by comprising a feature encoder, an iterative feature enhancer network and a multitask decoder; the specific extraction steps are as follows;

step 1: converting the remote sensing satellite image into an RGB image, inputting the RGB image into a feature encoder for feature extraction, outputting a feature map with the resolution being 4 times of the down-sampling size of the original input image, and acquiring road semantic features and road direction features;

step 2: the features extracted in the step 1 are sent into an iterative feature enhancement sub-network; the iterative feature enhancer network comprises a plurality of feature enhancement units, wherein each feature enhancement unit comprises a semantic guide feature enhancement module, a direction perception feature aggregation module and two side branches;

inputting road semantic features and road direction features to a semantic guide feature enhancing module to obtain enhanced road direction features; the enhanced road direction characteristics are divided into two branches, one branch is used for characteristic fusion, the other branch is input into a side branch to obtain a preliminary road direction prediction result, and the side branch is composed of two convolution layers;

the semantic guide feature enhancement module comprises a feature selection part and a feature fusion part; the characteristic selection part comprises a convolution layer, an activation function layer and an hourglass-shaped sub-network;

in the feature selection part, the semantic features and the direction features of the input road are merged in a cascading mode, and the merged features are sent to a convolution layer and an activation function layer to obtain a single-channel feature confidence coefficient;

B＝Sigmoid(Ψ₁(cat(F_e,F_o),Θ₁)) (1)

where B is the confidence of the single-channel feature, cat (·,) represents the cascade in the channel direction, Ψ₁(·,Θ₁) The expression parameter is theta₁The convolutional layer of (1); f_eRepresenting road semantic features; f_oRepresenting road direction characteristics;

the generated single-channel feature confidence coefficient respectively selects the features in the road semantic feature and road direction feature numerical value domains and inputs the feature into an hourglass-shaped sub-network; the hourglass-shaped sub-network firstly carries out down-sampling and then carries out up-sampling on the selected features;

F_a＝Ψ₂((1-B)·F_e+B·F_o,Θ₂) (2)

therein Ψ₂(·,Θ₂) The expression parameter is theta₂The convolutional layer of (1); f_aSelecting the generated features for the passed features;

the feature fusion part performs weighted feature fusion on the input road semantic features, road direction features and the results of the feature selection part by using a fusion mask to obtain enhanced road direction features;

F′_o＝(1+M)·F_o+(1+M)·F_e, (3)

M＝Sigmoid(D(E(F_a)))

wherein D (-) and E (-) represent the encoding and decoding parts, F ', respectively, in an hourglass shaped sub-network'_oRepresents an enhanced road direction feature, M represents a fusion mask;

inputting the preliminary road direction prediction result and road semantic features into a direction perception feature aggregation module to generate reinforced road semantic features; the reinforced road semantic features are divided into two branches, one branch is used for feature fusion, the other branch is input to the other side branch, and a preliminary road segmentation result is generated and used for supervising network learning;

the direction perception characteristic aggregation module adjusts the shape and the direction of a receptive field according to the preliminary road direction prediction result and focuses on an information area for road extraction; the direction perception feature aggregation module comprises a residual error structure of a plurality of branches, and each branch comprises a direction perception deformable convolution and a ReLU unit; different branches adopt convolution kernels with different sizes and shapes; weighted combination of the features obtained by all branches and the input road semantic features is carried out to output the reinforced road semantic features;

F′_e＝F_e+λ·∑_iΦ₀(F_e,α,θ_i) (4)

wherein, F_eRepresenting road semantic features, F'_eRepresenting the reinforced road semantic features; alpha represents the road direction of the side branch obtained from the road direction characteristics, and i represents the pointNumber of branches,. phi₀With the expression parameter theta_iLambda is a hyper-parameter of a residual structure, and is used for balancing the weight of the aggregation characteristic and the original characteristic;

a convolution kernel of the direction perception deformable convolution automatically adjusts the receptive field according to the preliminary road direction prediction result; the direction-sensing deformable convolution layer outputs;

wherein X represents an input feature map; y (p) represents an output value at position p; r represents receptive field; α (p) denotes a predicted road direction angle at a position p, and r (p, α) denotes a direction applied to a coordinate p ═ p [ p ]^x,p^y]Rotational translation of (c); w (p)_n) Weights of the convolution kernel receptive fields;

fusing the enhanced direction features, the enhanced road semantic features, the original road semantic features and the original road direction features, and inputting the fused features into a next feature enhancement unit;

and step 3: and finally, the output characteristics of the sub-network are sent to a multitask decoder to obtain a road extraction result and a road direction prediction result.

2. The method for extracting a road based on remote sensing satellite images as claimed in claim 1, wherein the convolution kernels with different sizes and shapes are respectively 5 x 1, 1 x 5 and 3 x 3.

3. The method for extracting a road based on remote sensing satellite images as claimed in claim 1 or 2, wherein when r (p, α) is a non-integer, bilinear interpolation is adopted to sample the input feature map X.