CN117649600A - Multi-category road automatic extraction method and system combining direction and semantic features - Google Patents
- Publication number: CN117649600A
- Application number: CN202311555876.8A
- Authority: CN (China)
- Prior art keywords: road, category, feature, semantic, map
- Prior art date: 2023-11-17
- Legal status: Granted (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- G06V20/182: Network patterns, e.g. roads or rivers (terrestrial scenes)
- G06N3/0464: Convolutional networks [CNN, ConvNet]
- G06N3/048: Activation functions
- G06N3/09: Supervised learning
- G06N3/098: Distributed learning, e.g. federated learning
- G06V10/454: Integrating the filters into a hierarchical structure, e.g. convolutional neural networks [CNN]
- G06V10/462: Salient features, e.g. scale invariant feature transforms [SIFT]
- G06V10/764: Recognition using classification, e.g. of video objects
- G06V10/774: Generating sets of training patterns; bootstrap methods, e.g. bagging or boosting
- G06V10/806: Fusion of extracted features at the sensor, preprocessing, feature extraction or classification level
- G06V10/82: Recognition using pattern recognition or machine learning using neural networks
- Y02T10/40: Engine management systems (climate change mitigation technologies related to transportation)
Abstract
The invention discloses a multi-category road automatic extraction method and system combining direction and semantic features. Remote sensing image data is first acquired and augmented; the augmented remote sensing image data is then fed into a multi-category road extraction network, which outputs an accurate multi-category road extraction result map. The multi-category road extraction network comprises a dense feature-sharing encoder, a direction-guided feature stacking module, a semantic feature enhancement decoder, and a direction feature decoder. The invention significantly alleviates fragmentation and category confusion in road extraction, can be effectively applied to large-scale, multi-category road extraction tasks, significantly improves the update efficiency of all kinds of road networks, and extracts railways, highways, paths, and bridges from remote sensing images with high quality.
Description
Technical Field
The invention relates to the technical fields of remote sensing image information processing and machine learning, and to a multi-category road automatic extraction method and system; it specifically relates to a multi-category road automatic extraction method and system combining direction and semantic features, applicable to extracting and updating multiple types of road network information in real time from high-spatial-resolution remote sensing images.
Background
Road information extraction and updating is an important task in the field of surveying and mapping, of great significance to smart-city construction, autonomous driving, urban planning, and other fields. Traditional road data are produced mainly through manual visual interpretation and annotation, which is time-consuming, inefficient, and subject to large subjective errors, falling far short of today's requirement for real-time updates of road network information. With the development of high-resolution Earth observation technology and machine learning, extracting road information from high-spatial-resolution remote sensing images can greatly reduce labor costs while achieving higher extraction accuracy.
Existing road extraction methods fall broadly into methods based on expert knowledge and methods based on deep learning. Expert-knowledge methods extract road information mainly from texture information and morphological features; they can handle remote sensing images with simple backgrounds and clear road networks, but are severely limited in road extraction tasks for complex scenes. Deep-learning methods, a research hotspot in recent years, automatically learn both surface texture features and deep abstract features of the imagery through deep neural networks, and usually achieve higher extraction accuracy.
As research has deepened, the above methods have shown limitations in practice. First, road extraction is usually treated as a pixel-level classification task, and fine morphological detail is easily lost in common encoder structures, leading to poor connectivity in the results. Moreover, existing studies only coarsely separate the background and roads in an image; no existing study handles real-time extraction of multiple different road types simultaneously, and existing methods suffer from severe category confusion when handling multi-class road extraction tasks. How to improve the morphological mining and semantic-understanding capabilities of existing models when extracting road targets therefore remains a difficult problem facing many challenges.
Summary of the Invention
To overcome the shortcomings of the prior art, the invention proposes a multi-category road automatic extraction method and system combining direction and semantic features, which simultaneously extracts road network information such as railways, highways, paths, and bridges in complex scenes while ensuring the category accuracy and road connectivity of the extraction results.
The technical solution adopted by the method of the invention is a multi-category road automatic extraction method combining direction and semantic features, comprising the following steps:
Step 1: Acquire remote sensing image data and perform data augmentation.
Step 2: Feed the augmented remote sensing image data into a multi-category road extraction network and output an accurate multi-category road extraction result map.
The multi-category road extraction network comprises a dense feature-sharing encoder, a direction-guided feature stacking module, a semantic feature enhancement decoder, and a direction feature decoder.
The dense feature-sharing encoder is used to obtain multi-scale refined road feature maps; the direction-guided feature stacking module is used to introduce road direction features and fully fuse them with road semantic features, yielding direction-fused feature maps, which are then decoded in two branches by the semantic feature enhancement decoder and the direction feature decoder respectively.
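At the structural level, the data flow just described can be sketched in PyTorch. This is a minimal skeleton under stated assumptions, not the disclosed implementation: the class name, the stand-in stages, and the counts of 5 categories and 37 direction bins are illustrative, not values fixed by the disclosure.

```python
import torch
import torch.nn as nn

def stage(cin, cout):
    # stand-in block; the real encoder and stacking module are detailed below
    return nn.Sequential(nn.Conv2d(cin, cout, 3, padding=1),
                         nn.BatchNorm2d(cout), nn.ReLU(inplace=True))

class MultiCategoryRoadNet(nn.Module):
    """Shared encoder -> direction-guided stacking -> two decoding branches."""
    def __init__(self, in_ch=3, num_classes=5, num_dir_bins=37):
        super().__init__()
        self.encoder = stage(in_ch, 64)                 # dense feature-sharing encoder (stand-in)
        self.stack = stage(64, 64)                      # direction-guided feature stacking (stand-in)
        self.sem_head = nn.Conv2d(64, num_classes, 1)   # semantic feature enhancement decoder
        self.dir_head = nn.Conv2d(64, num_dir_bins, 1)  # direction feature decoder

    def forward(self, x):
        f = self.stack(self.encoder(x))
        return self.sem_head(f), self.dir_head(f)       # category map, direction map

sem_map, dir_map = MultiCategoryRoadNet()(torch.randn(1, 3, 256, 256))
```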
Preferably, in step 1, the data augmentation operations include random flipping, random rotation, and random erasing, yielding the augmented data.
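By way of illustration, such augmentations might be composed with torchvision as below; the probabilities and rotation range are assumptions, and for segmentation the identical geometric transform must also be applied to the label maps (the synchronized augmentation noted in step 2.1).

```python
import torchvision.transforms as T

# image-side pipeline only; label maps need the same flips/rotations applied in sync
augment = T.Compose([
    T.RandomHorizontalFlip(p=0.5),   # random flipping
    T.RandomVerticalFlip(p=0.5),
    T.RandomRotation(degrees=90),    # random rotation
    T.ToTensor(),
    T.RandomErasing(p=0.3),          # random erasing (operates on tensors)
])
```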
Preferably, in step 2, the dense feature-sharing encoder consists of one initial convolution layer, two densely connected blocks, and one max-pooling layer; each densely connected block is composed of several intermediate layers, each consisting of a normalization layer, a nonlinear activation layer, and a convolution layer. The input of each intermediate layer comes from the outputs of all preceding intermediate layers; its relationship with the preceding feature maps is given by Eq. (1):
$X_N = H_N([X_0, X_1, \ldots, X_{N-1}])$ (1)
where $[X_0, X_1, \ldots, X_{N-1}]$ denotes the channel-wise concatenation of the output feature maps of layers 0 through N-1, and $H_N$ denotes the N-th intermediate layer.
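A minimal PyTorch sketch of such a densely connected block, consistent with Eq. (1); the growth rate and layer count are assumptions, as the disclosure does not fix them.

```python
import torch
import torch.nn as nn

class DenseBlock(nn.Module):
    def __init__(self, in_ch, growth=32, n_layers=4):
        super().__init__()
        self.layers = nn.ModuleList()
        ch = in_ch
        for _ in range(n_layers):
            # each intermediate layer H_N: normalization -> activation -> convolution
            self.layers.append(nn.Sequential(
                nn.BatchNorm2d(ch),
                nn.ReLU(inplace=True),
                nn.Conv2d(ch, growth, kernel_size=3, padding=1, bias=False)))
            ch += growth  # the next layer sees all earlier outputs

    def forward(self, x):
        feats = [x]
        for layer in self.layers:
            feats.append(layer(torch.cat(feats, dim=1)))  # [X0, X1, ..., X_{N-1}]
        return torch.cat(feats, dim=1)

out = DenseBlock(64)(torch.randn(1, 64, 64, 64))  # -> (1, 64 + 4*32, 64, 64)
```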
Preferably, in step 2, the direction-guided feature stacking module is built from two dual-branch attention modules and one fusion module.
The dual-branch attention module contains two structurally identical network branches, a semantic branch and a direction branch, which learn road semantic information and road direction information respectively. The two branches share the downsampling part and the skip-connection part of the module, and have separate network propagation paths in the upsampling part.
The downsampling part comprises three groups of max-pooling layers and residual blocks arranged in series; the max-pooling layer of each group uses a 2×2 kernel to reduce redundant information in the feature map, after which the residual block extracts road information at the corresponding scale.
The upsampling part takes as input the high-dimensional road features finally produced by the downsampling part, and progressively restores the road features to the original size through a series of residual blocks and nearest-neighbor interpolation upsampling operations.
The skip-connection part contains three parallel groups of coordinate attention modules and residual blocks. The coordinate attention module takes as input a feature map with C channels, width W, and height H, performs average pooling along the height and width dimensions separately, and then concatenates the pooled maps; the concatenated feature map passes through a convolution layer, a batch-normalization layer, and a sigmoid layer in turn to reduce the feature dimensionality. The feature maps are then re-separated, and the two maps embedding direction-specific information are each encoded into an attention map by a convolution layer and a sigmoid layer; each attention map captures long-range dependencies of the feature map along the horizontal or vertical direction. The two directional attention maps are then applied to the input feature map by multiplication to emphasize the spatial feature representation of road regions. Finally, a residual block and a convolution layer restore the feature map to the original resolution.
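A PyTorch sketch of this coordinate attention computation; the channel-reduction ratio r is an assumption, and the layer placement follows the description above.

```python
import torch
import torch.nn as nn

class CoordinateAttention(nn.Module):
    def __init__(self, channels, r=16):
        super().__init__()
        mid = max(8, channels // r)
        self.squeeze = nn.Sequential(  # conv -> batch norm -> sigmoid (dimensionality reduction)
            nn.Conv2d(channels, mid, 1), nn.BatchNorm2d(mid), nn.Sigmoid())
        self.attn_h = nn.Sequential(nn.Conv2d(mid, channels, 1), nn.Sigmoid())
        self.attn_w = nn.Sequential(nn.Conv2d(mid, channels, 1), nn.Sigmoid())

    def forward(self, x):
        b, c, h, w = x.shape
        ph = x.mean(dim=3, keepdim=True)                      # average pool along width  -> (b,c,h,1)
        pw = x.mean(dim=2, keepdim=True).permute(0, 1, 3, 2)  # average pool along height -> (b,c,w,1)
        y = self.squeeze(torch.cat([ph, pw], dim=2))          # concatenate, then reduce
        yh, yw = torch.split(y, [h, w], dim=2)                # re-separate the two maps
        ah = self.attn_h(yh)                                  # vertical attention   (b,c,h,1)
        aw = self.attn_w(yw).permute(0, 1, 3, 2)              # horizontal attention (b,c,1,w)
        return x * ah * aw                                    # apply both directional maps

out = CoordinateAttention(64)(torch.randn(2, 64, 32, 32))
```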
The fusion module comprises an intermediate supervision part and a branch fusion part. After the feature maps of the semantic branch and the direction branch are restored to the original resolution in the upsampling part, the intermediate supervision part produces a coarse road category prediction map or road direction prediction map through a classification convolution layer and computes the loss directly against the ground-truth map of the corresponding scale, realizing intermediate supervision. In the branch fusion part, for both the semantic-branch and direction-branch outputs, a convolution layer restores the coarse prediction map to the initial channel count, and this feature is preliminarily fused with the output feature of the convolution layer. Within the first dual-branch attention module, the direction feature is fused with the semantic branch feature, ultimately realizing inter-task information flow sharing.
Preferably, in step 2, the semantic feature enhancement decoder comprises a deconvolution-based decoder, a deep supervision module, and a category attention module.
The deconvolution-based decoder progressively restores the feature map to the input size through a series of deconvolutions and transition convolutions.
The category attention module first passes the input feature map through a 1×1 convolution layer, adjusting the channel count to C′; the feature map is then transposed and reshaped into a two-dimensional feature map of size HW×C′. The input coarse prediction map is reshaped into a category attention matrix P of size N×HW, and multiplying P by the two-dimensional feature map yields the N×C′ class centers. The class centers are transposed into a C′×N matrix and multiplied by the N×HW category attention matrix, and reshaping yields a feature map of size C′×H×W. Finally, this feature map passes through another 1×1 convolution layer, outputting category-optimized features of size C×H×W.
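The matrix pipeline just described can be sketched in PyTorch as follows; normalizing the coarse prediction with a softmax to form the attention matrix P is an assumption, since the description mentions only matrix reshaping.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CategoryAttention(nn.Module):
    def __init__(self, c, c_mid, num_classes):
        super().__init__()
        self.reduce = nn.Conv2d(c, c_mid, kernel_size=1)  # C  -> C'
        self.expand = nn.Conv2d(c_mid, c, kernel_size=1)  # C' -> C

    def forward(self, feat, coarse_pred):
        b, _, h, w = feat.shape
        f = self.reduce(feat).flatten(2).transpose(1, 2)  # (b, HW, C')
        p = F.softmax(coarse_pred, dim=1).flatten(2)      # (b, N, HW): attention matrix P
        centers = torch.bmm(p, f)                         # (b, N, C'): class centers
        out = torch.bmm(centers.transpose(1, 2), p)       # (b, C', HW)
        out = out.view(b, -1, h, w)                       # (b, C', H, W)
        return self.expand(out)                           # (b, C, H, W): category-optimized features

feat = torch.randn(2, 64, 32, 32)
coarse = torch.randn(2, 5, 32, 32)  # coarse prediction over N=5 classes
out = CategoryAttention(64, 32, 5)(feat, coarse)
```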
The category-optimized features produced by the category attention module are fused with the features in the deconvolution-based decoder, and a classification layer finally yields the refined road category extraction result.
Preferably, in step 2, the direction feature decoder is a deconvolution-based decoder that progressively restores the feature map to the input size through a series of deconvolutions and transition convolutions.
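A minimal sketch of one such deconvolution-based decoder; depth and channel widths are assumptions.

```python
import torch
import torch.nn as nn

def up_stage(cin, cout):
    return nn.Sequential(
        nn.ConvTranspose2d(cin, cout, kernel_size=2, stride=2),  # deconvolution: 2x upsampling
        nn.Conv2d(cout, cout, kernel_size=3, padding=1),         # transition convolution
        nn.BatchNorm2d(cout), nn.ReLU(inplace=True))

decoder = nn.Sequential(up_stage(256, 128), up_stage(128, 64), up_stage(64, 32))
out = decoder(torch.randn(1, 256, 32, 32))  # -> (1, 32, 256, 256)
```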
Preferably, the multi-category road extraction network is a trained network.
The training process comprises the following sub-steps:
Step 2.1: Build a multi-category road dataset.
High-resolution remote sensing image data containing roads are randomly divided into a training set, a validation set, and a test set according to preset proportions, and the road pixels of the corresponding regions are manually labeled into four categories (railway, highway, path, and bridge), yielding the road category labels. Road direction labels are then generated from the binarized road category labels using a road direction calculation method. Finally, data augmentation is applied to the remote sensing images, road category labels, and road direction labels of the training set, completing the dataset.
Step 2.2: Feed the training set into the multi-category road extraction network and train the network.
During training, parameters are updated and optimized with an SGD optimizer and learning-rate adjustment; given the road category labels and road direction labels of the training set, the network is trained with a joint loss function. After training, the optimal network parameters are saved.
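A hedged sketch of this training procedure follows; train_loader, val_loader, class_weights, and evaluate_miou are hypothetical placeholders, the step schedule is an assumption, and joint_loss is sketched after the loss description below.

```python
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"
model = MultiCategoryRoadNet().to(device)  # skeleton class from the earlier sketch
opt = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9, weight_decay=1e-4)
sched = torch.optim.lr_scheduler.StepLR(opt, step_size=30, gamma=0.1)  # assumed schedule

best_miou = 0.0
for epoch in range(100):
    model.train()
    for img, cls_gt, dir_gt in train_loader:      # hypothetical DataLoader
        sem, dire = model(img.to(device))
        loss = joint_loss(sem, dire, cls_gt.to(device), dir_gt.to(device),
                          class_weights)          # joint loss, sketched below
        opt.zero_grad()
        loss.backward()
        opt.step()
    sched.step()
    if (epoch + 1) % 2 == 0:                      # validate every 2 epochs
        miou = evaluate_miou(model, val_loader)   # hypothetical mIoU helper
        if miou > best_miou:                      # keep the best parameters
            best_miou = miou
            torch.save(model.state_dict(), "best.pth")
```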
Preferably, in step 2.1, the road direction labels are generated from the binarized road category labels as follows (a hedged code sketch is given after this list):
(1) Extract the road centerline and its key points with a skeleton-thinning algorithm, and arrange the key-point coordinates from top to bottom and from left to right.
(2) Connect the key points $(p_1, p_2, \ldots, p_i)$ in sequence, compute the unit direction vector formed by connecting adjacent key points, and convert the direction vector to polar coordinates to obtain the corresponding road direction angle.
(3) Connect each pixel M to its nearest key point $p_i$ to form a connection vector, and compute its projection length on the unit direction vector; if the projection length is within the automatically computed road radial-width threshold, the pixel is assigned the same direction value as the road unit direction vector.
(4) Traverse all pixels in the image to obtain the corresponding road direction labels.
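A simplified Python sketch of this procedure using scikit-image skeletonization; the nearest-skeleton-pixel search stands in for the key-point extraction, and the fixed width threshold stands in for the automatically computed one.

```python
import numpy as np
from skimage.morphology import skeletonize

def direction_labels(road_mask, width_thresh=8.0, non_road=-1.0):
    """Binary HxW road mask -> per-pixel road direction angle in degrees."""
    skel = skeletonize(road_mask.astype(bool))
    ys, xs = np.nonzero(skel)                        # row-major: top-to-bottom, left-to-right
    pts = np.stack([ys, xs], axis=1).astype(float)   # skeleton points as key points
    labels = np.full(road_mask.shape, non_road, dtype=np.float32)
    if len(pts) < 2:
        return labels
    for y, x in zip(*np.nonzero(road_mask)):
        i = int(np.argmin(((pts - (y, x)) ** 2).sum(axis=1)))   # nearest key point p_i
        a, b = (i - 1, i) if i == len(pts) - 1 else (i, i + 1)  # adjacent key-point pair
        u = pts[b] - pts[a]
        u = u / (np.linalg.norm(u) + 1e-8)           # unit direction vector
        m = np.array([y, x], dtype=float) - pts[i]   # connection vector from p_i to pixel M
        if abs(m @ u) <= width_thresh:               # projection on the unit direction vector
            labels[y, x] = np.degrees(np.arctan2(u[0], u[1])) % 180.0
    return labels
```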
Preferably, in step 2.2, the joint loss function consists of two parts, a road direction loss and a road semantic loss; the road direction loss is computed with a cross-entropy loss, and the road semantic loss with a class-balanced SoftIoU loss (the loss formulas are rendered as images in the source).
In the semantic loss, s denotes the scale and takes the values {(H, W), (H/2, W/2), (H/4, W/4)}, where H and W are the height and width of the feature map; the predicted and ground-truth road category results enter the SoftIoU term, and the class-balancing weight is computed from $E_i$, the effective number of samples of class i, and $E_{all}$, the total number of all samples.
Finally, the total loss function $L_{total}$ is defined as the combination of the road direction loss and the multi-scale road semantic losses (formula rendered as an image in the source; a hedged sketch follows).
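Since the loss formulas appear only as images in the source, the following PyTorch sketch is a hedged reconstruction consistent with the description: single-scale for brevity (the description evaluates the semantic loss over the three scales s), with an assumed class weighting and an assumed branch weight alpha.

```python
import torch
import torch.nn.functional as F

def soft_iou_loss(logits, target, class_weights):
    """logits: (B,N,H,W); target: (B,H,W) int64; class_weights: (N,)."""
    probs = F.softmax(logits, dim=1)
    onehot = F.one_hot(target, logits.shape[1]).permute(0, 3, 1, 2).float()
    inter = (probs * onehot).sum(dim=(0, 2, 3))                   # soft intersection per class
    union = (probs + onehot - probs * onehot).sum(dim=(0, 2, 3))  # soft union per class
    iou = (inter + 1e-6) / (union + 1e-6)
    return (class_weights * (1.0 - iou)).sum() / class_weights.sum()

def joint_loss(sem_logits, dir_logits, cls_gt, dir_gt, class_weights, alpha=1.0):
    l_sem = soft_iou_loss(sem_logits, cls_gt, class_weights)  # class-balanced SoftIoU
    l_dir = F.cross_entropy(dir_logits, dir_gt)               # direction cross-entropy
    return l_sem + alpha * l_dir                               # assumed combination
```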
The technical solution adopted by the system of the invention is a multi-category road automatic extraction system combining direction and semantic features, comprising:
one or more processors; and
a storage device for storing one or more programs which, when executed by the one or more processors, cause the one or more processors to implement the multi-category road automatic extraction method combining direction and semantic features described above.
Compared with the prior art, the invention has the following advantages:
① Unlike other road network extraction methods that can extract only a single road class, the invention simultaneously extracts multiple road network classes, including railways, highways, paths, and bridges, with high precision; it has strong fine-grained classification capability and adapts to road extraction in a variety of complex scenes such as cities, rural areas, water areas, and mountainous regions.
② The invention adopts a multi-task joint learning mode and introduces road direction prior information, which effectively improves the connectivity of the road extraction results compared with other semantic segmentation methods.
③ The invention constructs a multi-category road dataset that can provide reliable training data for other road extraction methods.
④ The invention adopts a joint loss function comprising a direction loss and a class-balanced SoftIoU loss, which effectively alleviates the imbalance in pixel counts among different road categories in real-world training data and further improves the road classification capability of the network model.
Brief Description of the Drawings
The technical solution herein is further illustrated below through embodiments and specific implementations, and some drawings are used in the process. Those skilled in the art can obtain other drawings, as well as the intent of the invention, from these drawings without creative effort.
Figure 1 is the overall structure diagram of the network model of an embodiment of the invention;
Figure 2 is the structure diagram of the direction-guided stacking module of an embodiment of the invention;
Figure 3 is the structure diagram of the semantic enhancement branch of an embodiment of the invention;
Figure 4 is the network training flow chart of an embodiment of the invention;
Figure 5 shows sample remote sensing images and road category labels from the dataset of an embodiment of the invention;
Figure 6 is the flow chart of road direction label generation for the dataset of an embodiment of the invention;
Figure 7 shows visual comparison results of the method of the invention and other methods on the remote sensing image road extraction task.
Detailed Description of the Embodiments
To help those of ordinary skill in the art understand and implement the invention, it is described in further detail below with reference to the drawings and embodiments. It should be understood that the implementation examples described here are intended only to illustrate and explain the invention, not to limit it.
The multi-category road automatic extraction method combining direction and semantic features provided by this embodiment comprises the following steps:
Step 1: Acquire remote sensing image data and perform data augmentation.
In one implementation, the data augmentation operations include random flipping, random rotation, and random erasing, yielding the augmented data.
Step 2: Feed the augmented remote sensing image data into the multi-category road extraction network and output an accurate multi-category road extraction result map.
In one implementation, referring to Figure 1, the multi-category road extraction network comprises a dense feature-sharing encoder, a direction-guided feature stacking module, a semantic feature enhancement decoder, and a direction feature decoder.
The dense feature-sharing encoder is used to obtain multi-scale refined road feature maps; the direction-guided feature stacking module introduces road direction features and fully fuses them with road semantic features, yielding direction-fused feature maps, which are decoded in two branches by the semantic feature enhancement decoder and the direction feature decoder respectively.
In one implementation, referring to Figure 1, the dense feature-sharing encoder consists of one initial convolution layer, two densely connected blocks, and one max-pooling layer; each densely connected block is composed of several intermediate layers, each consisting of a normalization layer, a nonlinear activation layer, and a convolution layer. The input of each intermediate layer comes from the outputs of all preceding intermediate layers; its relationship with the preceding feature maps is given by Eq. (1):
$X_N = H_N([X_0, X_1, \ldots, X_{N-1}])$ (1)
where $[X_0, X_1, \ldots, X_{N-1}]$ denotes the channel-wise concatenation of the output feature maps of layers 0 through N-1, and $H_N$ denotes the N-th intermediate layer (normalization, linear rectification, convolution). This architecture increases inter-layer information interaction, improves the information flow and gradient flow of feature extraction, and enables thorough mining of fine road features.
In one implementation, referring to Figure 2, the direction-guided feature stacking module is built from two dual-branch attention modules and one fusion module.
The dual-branch attention module contains two structurally identical network branches, a semantic branch and a direction branch, which learn road semantic information and road direction information respectively. The two branches share the downsampling part and the skip-connection part of the module, and have separate network propagation paths in the upsampling part.
The downsampling part comprises three groups of max-pooling layers and residual blocks arranged in series. The max-pooling layer of each group uses a 2×2 kernel to reduce redundant information in the feature map, after which the residual block extracts road information at the corresponding scale.
The upsampling part takes as input the high-dimensional road features finally produced by the downsampling part, and progressively restores the road features to the original size through a series of residual blocks and nearest-neighbor interpolation upsampling operations.
The skip-connection part preserves the detailed spatial information of the original features that is lost during downsampling; it contains three parallel groups of coordinate attention modules and residual blocks. The coordinate attention module further highlights important spatial features. Its input is a feature map with C channels, width W, and height H; average pooling is performed along the height and width dimensions separately, which avoids the loss of positional information caused by conventional two-dimensional global pooling. A channel concatenation then follows, and the concatenated feature map passes through a convolution layer, a batch-normalization layer, and a sigmoid layer in turn to reduce the feature dimensionality. The feature maps are then re-separated, and the two maps embedding direction-specific information are each encoded into an attention map by a convolution layer and a sigmoid layer; each attention map captures long-range dependencies of the feature map along the horizontal or vertical direction. The two directional attention maps are applied to the input feature map by multiplication to emphasize the spatial feature representation of road regions. Skip connections add the multi-scale features passing through each residual module to the same-size feature maps of each task branch in the upsampling part; finally, a residual block and a convolution layer restore the feature map to the original resolution.
The fusion module comprises an intermediate supervision part and a branch fusion part. The purpose of intermediate supervision is to let the intermediate layers of the road extraction network obtain real semantic or direction information directly from the ground-truth maps during training, making training more stable. After the feature maps of the semantic and direction branches are restored to the original resolution in the upsampling part, the intermediate supervision part produces a coarse road category prediction map or road direction prediction map through a classification convolution layer and computes the loss directly against the ground-truth map of the corresponding scale. In the branch fusion part, taking the semantic branch as an example, a convolution layer restores the coarse road category prediction map to the initial channel count, and this feature is preliminarily fused with the output feature of the convolution layer; the direction branch features are handled identically. Within the first dual-branch attention module, the direction feature is fused with the semantic branch feature, ultimately realizing inter-task information flow sharing and helping to improve road connectivity.
In one implementation, referring to Figure 3, the semantic feature enhancement decoder comprises a deconvolution-based decoder, a deep supervision module, and a category attention module. The deconvolution-based decoder progressively restores the feature map to the input size through a series of deconvolutions and transition convolutions. The deep supervision module performs supervision using the feature maps of different scales in the deconvolution-based decoder together with the category-label ground truth at the corresponding scales, and generates coarse classification maps.
The category attention module enhances the ability to discriminate among road categories: by checking the consistency of each pixel with the computed class centers, the network can understand the overall appearance of each class from a global perspective. The module first passes the input feature map through a 1×1 convolution layer, adjusting the channel count to C′. The feature map is then transposed and reshaped into a two-dimensional feature map of size HW×C′. The input coarse prediction map is reshaped into a category attention matrix P of size N×HW, and multiplying P by the two-dimensional feature map yields the N×C′ class centers. The class centers are transposed into a C′×N matrix and multiplied by the N×HW category attention matrix, and reshaping yields a feature map of size C′×H×W. Finally, this feature map passes through another 1×1 convolution layer, outputting category-optimized features of size C×H×W. These category-optimized features are fused with the features in the deconvolution-based decoder, and a classification layer finally yields the refined road category extraction result.
In one implementation, the direction feature decoder is a deconvolution-based decoder that progressively restores the feature map to the input size through a series of deconvolutions and transition convolutions.
In one implementation, referring to Figure 4, the multi-category road extraction network is a trained network; the training process comprises the following sub-steps:
Step 2.1: Build the dataset.
High-resolution remote sensing image data containing roads are acquired and, after preprocessing such as cropping, divided into a training set (80%), a validation set (10%), and a test set (10%). The road pixel categories of each image block are then manually labeled into four categories (railway, highway, path, and bridge), yielding the corresponding road category labels shown in Figure 5. The binarized road category labels, i.e. the road mask, are then used to generate the road direction labels; the calculation steps are shown in Figure 6: ① obtain the road centerline and its key points (inflection points, endpoints, etc.) with a skeleton-thinning algorithm; ② connect the key points $(p_1, p_2, \ldots, p_i)$ in sequence, compute the unit direction vector formed by connecting adjacent key points, and convert the direction vector to polar coordinates to obtain the corresponding road direction angle ∠θ; ③ connect each pixel M to its nearest key point $p_i$ to form a connection vector and compute its projection length on the unit road direction vector using formula (2) (rendered as an image in the source); if the projection length is within the given road radial-width threshold $\lambda_{Road}$, the pixel is assigned the same direction value $O_r$ as the road unit direction vector, and all other pixels are assigned the non-road direction value $O_b$; ④ traverse all pixels in the image to complete the above operations and obtain the corresponding road direction labels.
Finally, synchronized data augmentation is applied to the remote sensing images, road category labels, and road direction labels in the training set, completing the dataset.
Step 2.2: Train the network model and save the optimal parameters.
The remote sensing road image training data from step 1 are fed into the multi-category road extraction network model, which is then trained; the joint loss function is used for loss calculation, and the SGD optimizer realizes parameter updating and optimization. Every 2 epochs during training, the mean intersection-over-union accuracy is computed on the validation set; if the accuracy improves, the current network parameters are saved. The training process continues until the network converges, yielding the optimal network model parameters.
The joint loss function consists of two parts, a road direction loss and a road semantic loss; the road direction loss is computed with a cross-entropy loss. The road semantic loss is computed with a class-balanced SoftIoU loss, given as Eq. (3) in the original (rendered as an image there), which effectively addresses the imbalance in the number of samples per category in the dataset.
In Eq. (3), s denotes the scale and takes the values {(H, W), (H/2, W/2), (H/4, W/4)}, where H and W are the height and width of the feature map; the remaining symbols are the predicted road category result, the ground-truth road category result, $E_i$, the effective number of samples of class i, and $E_{all}$, the total number of all samples.
Finally, the total loss function $L_{total}$ is defined as the combination of these losses (formula rendered as an image in the source).
In this embodiment, given a remote sensing image containing roads, the network extracts the road regions and the corresponding direction angles simultaneously in an end-to-end manner. The shared encoder uses densely connected convolution blocks as the main feature extractor, strengthening feature reuse through dense connections and thereby facilitating the mining and extraction of deep road features. The direction-guided stacking module is built mainly from the dual-branch attention modules; it resembles a miniature encoder-decoder structure that strengthens multi-scale capture of spatial context while using the coordinate attention module to compute horizontal and vertical spatial attention weights for the input features at each scale, which are then fed into residual blocks to generate refined feature maps, realizing road feature correction and coarse prediction. The fusion layers in the two dual-branch attention modules realize feature interaction between direction and semantics. Besides restoring the output of the semantic branch to the original image size through deconvolution, the semantic enhancement decoding branch also employs a deep supervision module and a category attention module.
The deep supervision module consists of transposed convolution layers, convolution layers, and a classification layer; it uses the feature maps of different scales in the original decoder to generate the coarse predictions and the fused features required by the category attention module. In addition, deep supervision lets the intermediate layers at different scales learn road features directly from the category ground-truth labels, mitigating the adverse effects of unstable gradients. The category attention module effectively increases the feature separability of different road classes; by computing class centers, it lets the network optimize features from a global, class-level perspective. In the actual computation, given a feature map with C channels, height H, and width W, a 1×1 convolution layer first reduces the channel count to C′. The feature map is then transposed and reshaped and multiplied with the reshaped coarse prediction map to obtain the class centers. Inspired by the attention mechanism, this embodiment treats the reshaped coarse prediction map as an attention map and performs a matrix multiplication with the transposed class centers, yielding the class-specific attention feature of each pixel (the calculation formula is rendered as an image in the source).
The computed category attention features are further adjusted to size C×H×W through another 1×1 convolution layer and then fused with the features of the semantic decoder. Finally, classification yields the refined road segmentation result.
To further verify the effectiveness and feasibility of the method, the following experiments were conducted in this embodiment.
The experiments use PyTorch 1.10 as the deep-learning framework, with Python 3.6 as the development environment. A self-built multi-category road dataset is used; to collect sufficient data for all road types, its remote sensing images come from Google Earth and three public road datasets: the DeepGlobe dataset, the RoadNet dataset, and the CHN6-CUG dataset. Extraction results are evaluated with common semantic segmentation accuracy metrics: intersection-over-union (IoU), mean IoU, and frequency-weighted IoU.
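These three metrics follow their standard confusion-matrix definitions; a NumPy sketch is given below.

```python
import numpy as np

def confusion_matrix(pred, gt, n):
    idx = gt.astype(int) * n + pred.astype(int)
    return np.bincount(idx.ravel(), minlength=n * n).reshape(n, n)  # rows: gt, cols: pred

def iou_metrics(cm):
    inter = np.diag(cm).astype(float)
    union = cm.sum(0) + cm.sum(1) - inter
    iou = inter / np.maximum(union, 1)          # per-class IoU
    freq = cm.sum(1) / cm.sum()                 # class pixel frequency
    return iou, iou.mean(), (freq * iou).sum()  # IoU, mean IoU, frequency-weighted IoU

cm = confusion_matrix(np.random.randint(0, 5, (64, 64)),
                      np.random.randint(0, 5, (64, 64)), 5)
per_class_iou, miou, fwiou = iou_metrics(cm)
```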
Table 1 gives the quantitative comparison between the method of the invention and other methods on the remote sensing image road extraction task, and Figure 7 shows the visual comparison results.
Table 1. Comparison of road extraction accuracy between the method of the invention and other methods (table rendered as an image in the source).
As the results in Table 1 and Figure 7 show, the method of the invention is remarkably effective for multi-category road extraction: it not only achieves the best overall extraction accuracy across the four road types, but also holds clear advantages over other methods in road connectivity and classification accuracy.
It should be understood that the above description of the preferred embodiments is relatively detailed and should not therefore be regarded as limiting the scope of patent protection of the invention. Under the inspiration of the invention, and without departing from the scope protected by the claims, those of ordinary skill in the art may make substitutions or modifications, all of which fall within the protection scope of the invention; the scope of protection claimed shall be determined by the appended claims.
Claims (10)
Priority Applications (1)

| Application Number | Priority Date | Filing Date | Title |
| --- | --- | --- | --- |
| CN202311555876.8A (granted as CN117649600B) | 2023-11-17 | 2023-11-17 | Multi-category road automatic extraction method and system combining direction and semantic features |
Publications (2)

| Publication Number | Publication Date |
| --- | --- |
| CN117649600A | 2024-03-05 |
| CN117649600B | 2025-01-17 |
Family ID: 90042504
Family Applications (1)

| Application Number | Title | Priority Date | Filing Date |
| --- | --- | --- | --- |
| CN202311555876.8A (active, granted as CN117649600B) | Multi-category road automatic extraction method and system combining direction and semantic features | 2023-11-17 | 2023-11-17 |
Patent Citations (7)

| Publication number | Priority date | Publication date | Title |
| --- | --- | --- | --- |
| CN102043958A (en) * | 2010-11-26 | 2011-05-04 | High-definition remote sensing image multi-class target detection and identification method |
| CN107688780A (en) * | 2017-08-22 | 2018-02-13 | A kind of Hyperspectral Remote Sensing Imagery Classification method |
| CN108846334A (en) * | 2018-05-30 | 2018-11-20 | Cloud category automatic identification method and system |
| US20200065968A1 (en) * | 2018-08-24 | 2020-02-27 | Joint Deep Learning for Land Cover and Land Use Classification |
| US11189034B1 (en) * | 2020-07-22 | 2021-11-30 | Semantic segmentation method and system for high-resolution remote sensing image based on random blocks |
| CN114596503A (en) * | 2022-03-03 | 2022-06-07 | Road extraction method based on remote sensing satellite image |
| CN114821340A (en) * | 2022-06-01 | 2022-07-29 | A land use classification method and system |
Non-Patent Citations (4)

- Huiqin Gao et al., "Remote Sensing Road Extraction by Refining Road Topology", Proceedings of the 6th China High Resolution Earth Observation Conference, vol. 657, 26 June 2020, pp. 187-197. *
- Youhao Ni et al., "Damage detection and localization of bridge deck pavement based on deep learning", Sensors, vol. 23, no. 11, 28 May 2023, pp. 1-18. *
- Zhang Dandan, High-Resolution Remote Sensing Image Processing and Applications Based on Deep Neural Network Technology (基于深度神经网络技术的高分遥感图像处理及应用), China Astronautic Publishing House, 31 August 2020, p. 106. *
- Fang Lina et al., "Automatic classification and vectorization of road markings from vehicle-borne laser point clouds" (车载激光点云中交通标线自动分类与矢量化), Acta Geodaetica et Cartographica Sinica (测绘学报), vol. 50, no. 9, 30 September 2021, pp. 1251-1265. *
Also Published As

| Publication number | Publication date |
| --- | --- |
| CN117649600B (en) | 2025-01-17 |
Legal Events

- PB01: Publication
- SE01: Entry into force of request for substantive examination
- GR01: Patent grant