
CN115731461B - A method for extracting buildings from optical remote sensing images with multi-layer feature decoupling - Google Patents

A method for extracting buildings from optical remote sensing images with multi-layer feature decoupling

Info

Publication number
CN115731461B
CN115731461B (Application CN202211377963.4A)
Authority
CN
China
Prior art keywords
feature
semantic
features
deep
building
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202211377963.4A
Other languages
Chinese (zh)
Other versions
CN115731461A (en)
Inventor
庄胤
李健昊
董珊
陈禾
陈亮
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Institute of Technology BIT
Original Assignee
Beijing Institute of Technology BIT
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Institute of Technology BIT filed Critical Beijing Institute of Technology BIT
Priority to CN202211377963.4A priority Critical patent/CN115731461B/en
Publication of CN115731461A publication Critical patent/CN115731461A/en
Application granted granted Critical
Publication of CN115731461B publication Critical patent/CN115731461B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Landscapes

  • Image Analysis (AREA)

Abstract


This invention discloses a multi-layer feature decoupling method for extracting buildings from optical remote sensing images. First, a multi-layer feature decoupling network is used to extract and decompose multi-scale features of buildings, obtaining more stable semantic subject features and uncertain semantic boundary features. Then, based on the differences between the semantic subject features and uncertain semantic boundary features, a dual-stream semantic feature description network is used to gradually fuse them in different ways, deepening the semantic representation in deep features in strong semantic regions and preserving more detailed information in weak semantic regions. Finally, a multi-task supervision method is used to improve the accuracy of building edges while ensuring the integrity of the main body of the building, achieving high-performance extraction of buildings from high-resolution optical remote sensing images. This invention can significantly improve the extraction effect of buildings from high-resolution optical remote sensing images and can accurately extract buildings of different scales and spatial distributions in complex environments.

Description

A multi-layer feature decoupling method for extracting buildings from optical remote sensing images
Technical Field
The invention relates to the technical fields of remote sensing image processing and building extraction, and in particular to a multi-layer feature decoupling method for extracting buildings from optical remote sensing images.
Background
Building extraction plays an important role in city planning, illegal-construction monitoring, geographic information surveying, and related applications. With the rapid development of high-resolution optical remote sensing imagery, more and more data are available for building extraction. However, manually extracting buildings from massive amounts of optical remote sensing data is time-consuming and labor-intensive, which calls for automatic and efficient building extraction algorithms. Buildings exhibit different morphologies and spatial distributions in different environments; at the same time, limited by imaging conditions, buildings may have low contrast with their surroundings, which poses challenges for accurate extraction. Faced with increasingly complex building extraction tasks, conventional methods that rely on features of the building itself, such as spectrum, shape, color, texture, and shadow, no longer meet practical needs. With the continuous development of deep learning, convolutional neural networks, with their strong feature extraction and generalization capabilities, are increasingly applied to building extraction tasks and have remarkably improved the performance of automatic building extraction algorithms.
To extract buildings from complex environments more accurately, many studies have improved extraction performance by strengthening semantic feature description, for example by introducing a spatial pyramid pooling module into the encoding-decoding structure to enlarge the receptive field, or by enhancing feature descriptors with different convolution forms such as asymmetric convolution and dense upsampling convolution. Other studies have focused on improving building edge accuracy by introducing attention mechanisms into the U-Net network for better feature fusion, or by introducing semantic edge information during feature fusion to further refine the boundaries of irregular buildings. Still other studies further optimize extraction for the large differences in building size: they accurately extract buildings of different scales by designing independent shape prediction branches, optimize the multi-scale feature fusion mechanism with a multi-scale attention model, or apply various forms of image post-processing to improve the completeness and accuracy of extraction. However, as high-resolution optical remote sensing imagery continues to develop, the detail contained in the images keeps increasing, inter-class differences among buildings keep shrinking, and intra-class differences keep growing; as a result, these methods produce more over-extraction and under-extraction errors and their extraction accuracy degrades.
Disclosure of Invention
In view of the above, the invention provides a multi-layer feature decoupling optical remote sensing image building extraction method, which constructs powerful semantic feature expression capability in a multi-layer feature decoupling mode, improves the edge accuracy of a building while guaranteeing the integrity of a building main body, and improves the building extraction performance.
The invention relates to a multi-layer characteristic decoupling optical remote sensing image building extraction method, which comprises the following steps:
Firstly, carrying out multi-scale feature extraction on an optical remote sensing image to obtain multi-layer feature images with different scales;
step two, carrying out feature decomposition on the feature map obtained in the step one, namely calculating the offset of each feature in the adjacent deep feature map by using the feature flow field by taking the adjacent shallow feature map as a reference; correcting the deep feature map based on the offset to obtain more stable semantic main body features in a strong semantic region representing the main body part of the building, and then obtaining uncertain semantic boundary features in a weak semantic region representing the edge part of the building in the deep feature map by utilizing subtraction operation;
step three, respectively fusing the more stable semantic main body features and the uncertain semantic boundary features obtained in the step two layer by layer to obtain a plurality of pixel-level prediction graphs, wherein the strong semantic regions are fused from deep to shallow, and the weak semantic regions are fused from shallow to deep;
And step four, respectively performing supervised learning on the plurality of pixel-level prediction graphs obtained in the step three based on the multi-task joint loss function by using a multi-task supervision method, and completing building extraction.
Preferably, in the first step, a AlexNet, VGGNet, resNet, resNeXt or DenseNet network is used to perform feature extraction on the optical remote sensing image.
Preferably, the second step specifically includes the following sub-steps:
S2.1, feature preprocessing: denote by F and F′ the relatively shallow and relatively deep features of two adjacent feature layers; the deep feature is resized to the same size as the shallow feature F and denoted F″;
S2.2, generating a feature flow field: the deep feature F″ obtained in S2.1 is concatenated with the shallow feature F and convolved to obtain the flow field δ;
S2.3, generating the more stable semantic main-body features in the strong semantic region: the offset corresponding to each feature in the deep feature map F″ is obtained from the flow field δ, and the deep feature map F″ is corrected based on the offset to obtain the more stable semantic main-body features F′MainBody in the strong semantic region;
S2.4, generating the uncertain semantic boundary features in the weak semantic region: the uncertain semantic boundary features F′UncertainBoundary are obtained from the deep feature map F″ by a subtraction operation;
S2.5, adjacent feature maps of the multi-layer feature maps from step one are combined in pairs, and S2.1-S2.4 are repeated to generate N-1 groups of more stable semantic bodies and uncertain semantic boundaries, where N is the total number of feature layers obtained in step one.
Preferably, in S2.1, the deep features are changed to the same size as the shallow features using a 1×1 convolution and upsampling operation.
Preferably, a feature flow warping operation is used to correct the deep feature map.
Preferably, in the third step, when the features of the adjacent feature layers are fused, the attention mechanism is used to give weight to the features of the adjacent feature layers.
Preferably, the Sigmoid function is used as a gate to weight the features of the adjacent feature layers.
Preferably, in the third step, the following manner is adopted for fusion:
where X̂ and Ŷ respectively represent the feature layers after the attention mechanism is applied, Z represents the feature fusion result, and GX and GY respectively represent the attention coefficients obtained from the selection gates,
Gx=Sigmoid(conv1×1(X))
Gy=Sigmoid(conv1×1(Y))
Where Sigmoid(·) represents the Sigmoid operation and conv1×1(·) represents a 1×1 convolution.
Preferably, in the fourth step, each subtask is supervised and learned by using a cross entropy loss function.
The beneficial effects are that:
(1) The method first extracts multi-scale features of buildings with a multi-layer feature decoupling network and decomposes them, obtaining more stable semantic main-body features and uncertain semantic boundary features. Then, based on the difference between the two, a dual-stream semantic feature description network fuses them gradually in different ways, deepening the semantic representation of deep features in strong semantic regions while retaining more detail information in weak semantic regions. Finally, a multi-task supervision method improves the accuracy of building edges while guaranteeing the integrity of the building main body, achieving high-performance extraction of buildings from high-resolution optical remote sensing images. Compared with the prior art, the method markedly improves the extraction of buildings from high-resolution optical remote sensing imagery. In particular, when facing buildings of different scales and different spatial distributions in complex environments, the method preserves both the integrity of the building main body and the accuracy of its edges, and reduces over-extraction and under-extraction. Built on an encoding-decoding framework, the method uses multi-layer feature decoupling and a dual-stream semantic feature description network to improve semantic description capability, greatly improving building extraction performance, and has good practical application value.
(2) According to the invention, the offset of the deep feature map features is obtained based on the feature flow field, then the deep feature map is corrected by utilizing the feature flow distortion operation, the deep feature can be adaptively adjusted and aligned, the feature positioning capability is improved, and the more stable semantic main feature is obtained.
(3) When the features of the adjacent feature layers are fused, the complementarity of the adjacent feature layers is considered, the attention mechanism is utilized to select the features of the adjacent feature layers and guide the complementation information fusion, the fusion of invalid feature information can be obviously reduced, and the fusion can be more efficient and reasonable.
Drawings
FIG. 1 is a flow chart of the method of the present invention.
Fig. 2 is a schematic diagram of the feature decomposition module structure of the invention.
FIG. 3 is a dual stream semantic feature description framework of the present invention.
Fig. 4 is a schematic diagram of a component fusion module.
Detailed Description
The invention will now be described in detail by way of example with reference to the accompanying drawings.
The invention provides a multi-layer characteristic decoupling optical remote sensing image building extraction method. The flow chart of the method is shown in fig. 1, and specifically comprises the following steps:
Step one, multi-scale feature extraction is carried out on the high-resolution optical remote sensing image, and multi-layer feature diagrams with different scales are obtained.
The step can adopt AlexNet, VGGNet, resNet, resNeXt, denseNet and other feature extraction networks to realize multi-scale feature extraction.
In this embodiment, a ResNet backbone network performs feature extraction on the input high-resolution optical remote sensing image, as shown in fig. 1 (a). Specifically, the original optical image is fed into the ResNet feature extraction network, deep feature information is obtained through repeated convolution and pooling operations, and the resulting 4-layer feature maps are denoted F1, F2, F3, F4 from shallow to deep.
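As an illustrative sketch only (the patent does not fix the exact backbone configuration), the four-level feature pyramid F1-F4 can be produced by a stack of stride-2 stages; the toy stages below stand in for real ResNet stages, and the channel counts and 256×256 input size are assumptions:

```python
import torch
import torch.nn as nn

def stage(c_in, c_out):
    # One downsampling stage: stride-2 conv + BN + ReLU (stand-in for a ResNet stage).
    return nn.Sequential(nn.Conv2d(c_in, c_out, 3, stride=2, padding=1),
                         nn.BatchNorm2d(c_out), nn.ReLU(inplace=True))

class Backbone(nn.Module):
    def __init__(self):
        super().__init__()
        self.stem = stage(3, 64)    # stride 2
        self.s1 = stage(64, 64)     # F1, overall stride 4 (shallowest)
        self.s2 = stage(64, 128)    # F2, stride 8
        self.s3 = stage(128, 256)   # F3, stride 16
        self.s4 = stage(256, 512)   # F4, stride 32 (deepest)

    def forward(self, x):
        x = self.stem(x)
        f1 = self.s1(x)
        f2 = self.s2(f1)
        f3 = self.s3(f2)
        f4 = self.s4(f3)
        return f1, f2, f3, f4

net = Backbone().eval()
with torch.no_grad():
    f1, f2, f3, f4 = net(torch.randn(1, 3, 256, 256))
# spatial size halves and channel count doubles at each stage
```

With a 256×256 input, the spatial sizes of F1-F4 come out as 64, 32, 16 and 8, matching the shallow-to-deep ordering described above.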
And step two, decomposing the feature map obtained in the step one to obtain a more stable semantic main body representing the main body part of the building and an uncertain semantic boundary in a weak semantic region representing the edge part of the building.
The successive convolution operations in step one, while stabilizing deep semantic features, lose building detail information and suffer from feature misalignment. Therefore, a feature flow field is introduced: taking the adjacent lower-layer features as reference, the offset corresponding to each feature of the deep feature map is obtained from the feature flow field, and the deep feature map is corrected so that the deep features are aligned, yielding the more stable semantic body in the strong semantic region.
As shown in fig. 1 (b), the method can be specifically divided into the following 5 sub-steps:
S2.1, feature preprocessing: as shown in FIG. 2, denote by F and F′ the relatively shallow and relatively deep features of two adjacent feature layers. F′ is a deep feature with a larger number of channels and a smaller spatial size. To decompose F′ using the shallow feature F as reference, the deep feature is first changed to the same size as the shallow feature F, denoted F″, using a 1×1 convolution and an upsampling operation, as shown in the following equation:
F′′=Up(conv1×1(F′))
Where Up(·) represents the upsampling operation and conv1×1(·) represents a 1×1 convolution.
S2.2, generating a feature flow field: the feature flow field is introduced so that the network automatically learns feature misalignment information. First, the deep feature F″ obtained in S2.1 is concatenated with the shallow feature F, and then a 3×3 convolution is applied to obtain the flow field δ. The flow field δ has two channels, representing the offset direction of each feature point in the flow field.
δ=conv3×3(cat(F,F′′))
Where cat(·) represents the concatenation operation and conv3×3(·) represents a 3×3 convolution.
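A minimal sketch of sub-steps S2.1 and S2.2, assuming illustrative channel counts (64 shallow, 128 deep) and freshly initialised convolutions:

```python
import torch
import torch.nn as nn
import torch.nn.functional as Fn

# Assumed channel sizes for two adjacent layers (illustrative only).
c_shallow, c_deep = 64, 128
F  = torch.randn(1, c_shallow, 64, 64)   # shallow feature F
Fp = torch.randn(1, c_deep,   32, 32)    # deep feature F'

# S2.1: a 1x1 conv plus bilinear upsampling brings F' to F's channels and size.
reduce_ch = nn.Conv2d(c_deep, c_shallow, kernel_size=1)
Fpp = Fn.interpolate(reduce_ch(Fp), size=F.shape[-2:],
                     mode='bilinear', align_corners=False)   # F''

# S2.2: concatenate F and F'', then a 3x3 conv predicts the 2-channel flow field.
flow_conv = nn.Conv2d(2 * c_shallow, 2, kernel_size=3, padding=1)
delta = flow_conv(torch.cat([F, Fpp], dim=1))   # per-pixel (dx, dy) offsets
```

The two output channels of `delta` carry the horizontal and vertical offsets used in the warping of S2.3.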
S2.3, generating the more stable semantic main-body features in the strong semantic region: the offset corresponding to each feature in the relatively deep feature map F″ is obtained from the flow field δ, and the deep feature map F″ is corrected by the feature flow warping operation to obtain the more stable semantic main-body features F′MainBody in the strong semantic region:
F′MainBody = ψ(F″, δ),  F′MainBody(ρ) = Σρ′∈N(ρ+δ(ρ)) ωρ′ · F″(ρ′)
Where ψ(·) represents the feature flow warping operation, ρ is a feature point in the deep feature map F″, N(ρ+δ(ρ)) represents the feature points surrounding the warped sampling position ρ+δ(ρ), and ωρ′ is the bilinear interpolation weight corresponding to ρ′.
S2.4, generating the uncertain semantic boundary features in the weak semantic region: the uncertain semantic boundary features F′UncertainBoundary are obtained from the deep feature map F″ by a subtraction operation:
F′UncertainBoundary = F″ - F′MainBody
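The warping ψ(·) of S2.3 is commonly realised with bilinear grid sampling; the sketch below is one plausible realisation rather than the patent's exact implementation (here a zero flow field reduces the warp to the identity), and it also shows the S2.4 subtraction:

```python
import torch
import torch.nn.functional as Fn

def flow_warp(feat, flow):
    # Warp `feat` by per-pixel offsets `flow` (B, 2, H, W): a bilinear
    # grid_sample realisation of the feature flow warping operation psi(.).
    b, _, h, w = feat.shape
    ys, xs = torch.meshgrid(torch.arange(h), torch.arange(w), indexing='ij')
    base = torch.stack((xs, ys), dim=-1).float()          # (H, W, 2) pixel coords
    grid = base.unsqueeze(0) + flow.permute(0, 2, 3, 1)   # add (dx, dy) offsets
    # Normalise to [-1, 1], the coordinate range grid_sample expects.
    gx = 2.0 * grid[..., 0] / max(w - 1, 1) - 1.0
    gy = 2.0 * grid[..., 1] / max(h - 1, 1) - 1.0
    return Fn.grid_sample(feat, torch.stack((gx, gy), dim=-1),
                          mode='bilinear', align_corners=True)

Fpp = torch.randn(1, 64, 64, 64)        # resized deep feature F''
delta = torch.zeros(1, 2, 64, 64)       # zero flow field -> identity warp
F_body = flow_warp(Fpp, delta)          # F'_MainBody (semantic body)
F_boundary = Fpp - F_body               # F'_UncertainBoundary (S2.4 subtraction)
```

With a learned, non-zero flow field the same routine shifts each deep feature toward its aligned position; the boundary features are whatever the warped body does not account for.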
And S2.5, adjacent feature maps of the 4-layer feature maps obtained in step one are combined in pairs, and sub-steps S2.1-S2.4 are repeated to generate three groups of more stable semantic bodies and uncertain semantic boundaries.
And thirdly, based on a component fusion module, describing a network structure by utilizing double-flow semantic features, and respectively fusing the three groups of more stable semantic bodies and the uncertain semantic boundaries obtained in the second step layer by layer. Features belonging to a strong semantic region and a weak semantic region are respectively integrated into two parallel branches, feature fusion is carried out from deep to shallow in the strong semantic region, semantic representation in deep features is deepened, fusion is carried out from shallow to deep in the weak semantic region, and more detail information is reserved.
The specific fusion procedure is as follows. First, operations such as upsampling and channel compression are applied so that feature layers of different depths have the same number of channels and the same spatial size. The processed adjacent feature layers are then fused together by a concatenation operation, with deep-to-shallow feature fusion in the strong semantic region and shallow-to-deep fusion in the weak semantic region.
Furthermore, considering the complementarity of adjacent feature layers, the invention introduces an attention mechanism into the fusion process to select and guide complementary information fusion, which markedly reduces the fusion of invalid feature information and makes the fusion more efficient and reasonable.
As shown in fig. 3, the present embodiment is specifically divided into the following 6 substeps:
S3.1, before layer-by-layer feature fusion, operations such as up-sampling, down-sampling and channel compression are used to adjust the feature maps with different channel numbers and spatial sizes obtained in step two, so that they finally have the same number of channels and the same spatial size.
S3.2, the designed component feature fusion module is used to efficiently fuse the features of the strong and weak semantic regions. As shown in fig. 4, X and Y are the adjacent feature layers adjusted in substep S3.1 and serve as inputs to the component feature fusion module. Considering the complementarity of adjacent feature layers, the mutual fusion of complementary information is selected and guided by an attention mechanism that uses the Sigmoid function as a gate; the process can be represented by the following formula:
Where X̂ and Ŷ represent the optimized feature layers, Z represents the output of the component feature fusion module, and GX and GY respectively represent the attention coefficients from the selection gates, which can be obtained by the following equations:
Gx=Sigmoid(conv1×1(X))
Gy=Sigmoid(conv1×1(Y))
Where Sigmoid(·) represents the Sigmoid operation and conv1×1(·) represents a 1×1 convolution.
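Since the fusion equation itself is not reproduced above, the module below is only a plausible cross-gated reading of the description (each branch weighted by the gate computed from the other branch); the residual form `x + x * gy` is an assumption, not the patent's exact formula:

```python
import torch
import torch.nn as nn

class ComponentFusion(nn.Module):
    # Sketch of the component feature fusion module. The gates follow
    # G_X = Sigmoid(conv1x1(X)) and G_Y = Sigmoid(conv1x1(Y)) as stated;
    # the cross-gated combination is an illustrative assumption.
    def __init__(self, channels):
        super().__init__()
        self.gate_x = nn.Conv2d(channels, channels, kernel_size=1)
        self.gate_y = nn.Conv2d(channels, channels, kernel_size=1)

    def forward(self, x, y):
        gx = torch.sigmoid(self.gate_x(x))   # attention coefficients G_X
        gy = torch.sigmoid(self.gate_y(y))   # attention coefficients G_Y
        x_hat = x + x * gy                   # X guided by Y's complementary gate
        y_hat = y + y * gx                   # Y guided by X's complementary gate
        return x_hat + y_hat                 # fused result Z

fuse = ComponentFusion(64)
z = fuse(torch.randn(1, 64, 32, 32), torch.randn(1, 64, 32, 32))
```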
And S3.3, substep S3.2 is repeated to sequentially fuse the features of each layer obtained in step two, yielding the fused features of the strong semantic region and of the weak semantic region; the more stable semantic main-body features in the strong semantic region are fused in top-down order, while the uncertain semantic boundary features in the weak semantic region are fused in bottom-up order. The spatial size of the fused features is 1/4 that of the input feature map, and the number of channels is 2 times that of the input feature map.
And S3.4, the fused strong semantic region and weak semantic region features obtained in substep S3.3 are processed by depthwise, parallel multi-rate atrous (dilated) convolutions with dilation rates of 1, 2 and 5, respectively. The fused features are then refined with a 1×1 convolution, producing the more stable semantic body and the uncertain semantic boundary.
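Sub-step S3.4 can be sketched as three parallel depthwise dilated 3×3 convolutions (rates 1, 2, 5) whose concatenated responses are refined by a 1×1 convolution; the channel count is illustrative:

```python
import torch
import torch.nn as nn

class MultiRateAtrous(nn.Module):
    # Parallel depthwise atrous 3x3 branches (rates 1, 2, 5) + 1x1 refinement.
    def __init__(self, channels):
        super().__init__()
        self.branches = nn.ModuleList(
            nn.Conv2d(channels, channels, 3, padding=r, dilation=r,
                      groups=channels)            # depthwise: one filter per channel
            for r in (1, 2, 5))
        self.refine = nn.Conv2d(3 * channels, channels, kernel_size=1)

    def forward(self, x):
        # padding == dilation keeps every branch at the input spatial size,
        # so the three responses can be concatenated along channels.
        return self.refine(torch.cat([b(x) for b in self.branches], dim=1))

m = MultiRateAtrous(64)
out = m(torch.randn(1, 64, 32, 32))
```

Setting the padding equal to the dilation rate is what lets the three differently-dilated branches stay spatially aligned before the 1×1 refinement.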
And S3.5, the more stable semantic body and the uncertain semantic boundary obtained in substep S3.4 are combined by a point-wise addition operation to obtain the complete building feature.
And S3.6, respectively obtaining three groups of pixel-level prediction graphs by using the prediction structure module shown in FIG. 3. The structure can be represented by the following formula:
P=Up(ReLU(BN(conv1×1(ReLU(BN(conv3×3(F)))))))
Where F represents the final feature, P represents the prediction map produced by the prediction structure module, Up(·) represents the upsampling operation, ReLU(·) represents the ReLU activation function, conv1×1(·) represents a 1×1 convolution, conv3×3(·) represents a 3×3 convolution, and BN(·) represents the batch normalization operation.
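A direct transcription of the prediction structure P = Up(ReLU(BN(conv1×1(ReLU(BN(conv3×3(F))))))); the single output class and the ×4 upsampling factor are assumptions for illustration:

```python
import torch
import torch.nn as nn
import torch.nn.functional as Fn

class PredictionHead(nn.Module):
    # conv3x3 -> BN -> ReLU -> conv1x1 -> BN -> ReLU -> Up, as in the formula.
    def __init__(self, c_in, n_classes=1, up_factor=4):
        super().__init__()
        self.conv3 = nn.Conv2d(c_in, c_in, 3, padding=1)
        self.bn3 = nn.BatchNorm2d(c_in)
        self.conv1 = nn.Conv2d(c_in, n_classes, kernel_size=1)
        self.bn1 = nn.BatchNorm2d(n_classes)
        self.up = up_factor

    def forward(self, f):
        x = torch.relu(self.bn3(self.conv3(f)))
        x = torch.relu(self.bn1(self.conv1(x)))
        return Fn.interpolate(x, scale_factor=self.up, mode='bilinear',
                              align_corners=False)

head = PredictionHead(64).eval()
with torch.no_grad():
    p = head(torch.randn(1, 64, 64, 64))   # upsampled pixel-level prediction map
```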
And step four, utilizing a multitask supervision method to respectively supervise and optimize the more stable semantic body, the uncertain semantic boundary and the three groups of predictive graphs of the building generated in the step three based on the cross entropy loss function, as shown in fig. 1 (d). The supervision of the more stable semantic subjects and the uncertain semantic boundaries is auxiliary supervision, so that the more complete building subjects can be generated, and meanwhile, the accuracy of the building boundaries is improved.
The method comprises the following steps of:
S4.1, using morphological image operations such as erosion, the building main-body and edge label maps corresponding to the more stable semantic body and the uncertain semantic boundary of step three are generated.
S4.2, supervised learning is performed on each subtask using the cross-entropy loss function, which for a pixel-level prediction map takes the standard binary form:
L = -(1/N) Σi=1..N [ yi·log pi + (1 - yi)·log(1 - pi) ]
Where LS, LB and LE denote the complete building segmentation loss, the more stable semantic body loss in the strong semantic region and the uncertain semantic boundary loss in the weak semantic region, respectively, each computed with the above cross-entropy form; N represents the number of pixels in the picture, yi ∈ {0,1} indicates whether pixel i belongs to a building, i is the pixel index, and pi ∈ [0,1] is the prediction probability of pixel i.
S4.3, optimizing the network by using the multi-task joint loss function, and improving the extraction performance of the building, wherein the extraction performance can be represented by the following formula:
Ltotal=λ1·LS2·LB3·LE
where Ltotal is the total multi-task loss, and λ1, λ2 and λ3 are the loss weights corresponding to each task, set to 1,20 in the method.
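The multi-task joint loss can be sketched as a weighted sum of per-task binary cross-entropy terms; the weight values below are placeholders, not the patent's settings:

```python
import torch
import torch.nn.functional as Fn

def multitask_loss(p_seg, p_body, p_edge, y_seg, y_body, y_edge,
                   weights=(1.0, 1.0, 1.0)):
    # L_total = lambda1*L_S + lambda2*L_B + lambda3*L_E, each term a binary
    # cross-entropy over pixel-level predictions (placeholder weights).
    l_s = Fn.binary_cross_entropy(p_seg, y_seg)    # complete segmentation loss
    l_b = Fn.binary_cross_entropy(p_body, y_body)  # semantic body loss
    l_e = Fn.binary_cross_entropy(p_edge, y_edge)  # semantic boundary loss
    w1, w2, w3 = weights
    return w1 * l_s + w2 * l_b + w3 * l_e

# Random stand-ins for the three prediction maps and their label maps.
probs = [torch.rand(1, 1, 32, 32) * 0.98 + 0.01 for _ in range(3)]
labels = [torch.randint(0, 2, (1, 1, 32, 32)).float() for _ in range(3)]
loss = multitask_loss(*probs, *labels)
```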
And finally obtaining the building extraction result with high accuracy and low false alarm rate through the first step to the fourth step.
In summary, the above embodiments are only preferred embodiments of the present invention, and are not intended to limit the scope of the present invention. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (9)

1. A method for extracting buildings from optical remote sensing images with multi-layer feature decoupling, characterized in that it includes:
Step 1: extracting multi-scale features from the optical remote sensing image to obtain multi-layer feature maps at different scales;
Step 2: performing feature decomposition on the feature maps obtained in Step 1, specifically: using adjacent shallow feature maps as reference, calculating the offset of each feature in the adjacent deep feature map from a feature flow field; correcting the deep feature map based on the offset to obtain more stable semantic main-body features in the strong semantic region representing the main part of the building; then using a subtraction operation to obtain uncertain semantic boundary features in the weak semantic region of the deep feature map representing the edge part of the building;
Step 3: fusing, layer by layer, the multiple sets of more stable semantic main-body features and uncertain semantic boundary features obtained in Step 2 to obtain multiple pixel-level prediction maps, wherein strong semantic regions are fused from deep to shallow and weak semantic regions are fused from shallow to deep;
Step 4: using a multi-task supervision method, performing supervised learning on the multiple pixel-level prediction maps obtained in Step 3 based on a multi-task joint loss function, completing the building extraction.

2. The method according to claim 1, wherein in Step 1, an AlexNet, VGGNet, ResNet, ResNeXt or DenseNet network is used to extract features from the optical remote sensing image.

3. The method according to claim 1, wherein Step 2 specifically includes the following sub-steps:
S2.1, feature preprocessing: let F and F′ be the relatively shallow and relatively deep features of two adjacent feature layers; transform the deep feature to the same size as the shallow feature F, denoted F″;
S2.2, generating the feature flow field: concatenate the deep feature F″ obtained in S2.1 with the shallow feature F and perform a convolution to obtain the flow field δ;
S2.3, generating the more stable semantic main-body features in the strong semantic region: from the flow field δ, obtain the offset corresponding to each feature in the deep feature map F″; correct the deep feature map F″ based on the offset to obtain the more stable semantic main-body features F′MainBody constituting the strong semantic region;
S2.4, generating the uncertain semantic boundary features in the weak semantic region: using a subtraction operation, obtain the uncertain semantic boundary features F′UncertainBoundary from the deep feature map F″;
S2.5, combine adjacent feature maps of the multi-layer feature maps from Step 1 in pairs and repeat S2.1-S2.4 to generate N-1 sets of more stable semantic bodies and uncertain semantic boundaries, where N is the total number of feature layers obtained in Step 1.

4. The method according to claim 3, wherein in S2.1 a 1×1 convolution and an upsampling operation are used to transform the deep feature to the same size as the shallow feature.

5. The method according to claim 1 or 3, wherein a feature flow warping operation is used to correct the deep feature map.

6. The method according to claim 1, wherein in Step 3, when the features of adjacent feature layers are fused, an attention mechanism is used to assign weights to the features of the adjacent feature layers.

7. The method according to claim 6, wherein a Sigmoid function is used as a gate to assign weights to the features of the adjacent feature layers.

8. The method according to claim 7, wherein in Step 3 the fusion is performed as follows:
where X̂ and Ŷ respectively represent the feature layers after applying the attention mechanism, Z represents the feature fusion result, and GX and GY respectively represent the attention coefficients obtained from the selection gates,
Gx = Sigmoid(conv1×1(X))
Gy = Sigmoid(conv1×1(Y))
where Sigmoid(·) represents the Sigmoid operation and conv1×1(·) represents a 1×1 convolution.

9. The method according to claim 1, wherein in Step 4, supervised learning is performed on each subtask using the cross-entropy loss function.
CN202211377963.4A 2022-11-04 2022-11-04 A method for extracting buildings from optical remote sensing images with multi-layer feature decoupling Active CN115731461B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211377963.4A CN115731461B (en) 2022-11-04 2022-11-04 A method for extracting buildings from optical remote sensing images with multi-layer feature decoupling

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211377963.4A CN115731461B (en) 2022-11-04 2022-11-04 A method for extracting buildings from optical remote sensing images with multi-layer feature decoupling

Publications (2)

Publication Number Publication Date
CN115731461A CN115731461A (en) 2023-03-03
CN115731461B true CN115731461B (en) 2025-11-14

Family

ID=85294523

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211377963.4A Active CN115731461B (en) 2022-11-04 2022-11-04 A method for extracting buildings from optical remote sensing images with multi-layer feature decoupling

Country Status (1)

Country Link
CN (1) CN115731461B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117671487B (en) * 2023-11-16 2025-05-06 中国科学院空天信息创新研究院 High-resolution building extraction method and device based on feature decoupling and re-coupling
CN120147320B (en) * 2025-05-16 2025-07-22 哈尔滨工业大学(威海) Borescope image damage detection method based on dual-path decoupling and gated memory fusion

Citations (2)

Publication number Priority date Publication date Assignee Title
CN110119728A (en) * 2019-05-23 2019-08-13 哈尔滨工业大学 Remote sensing images cloud detection method of optic based on Multiscale Fusion semantic segmentation network
CN112052783A (en) * 2020-09-02 2020-12-08 中南大学 High-resolution image weak supervision building extraction method combining pixel semantic association and boundary attention

Family Cites Families (4)

Publication number Priority date Publication date Assignee Title
CN109801293B (en) * 2019-01-08 2023-07-14 平安科技(深圳)有限公司 Remote sensing image segmentation method and device, storage medium and server
CN114387521B (en) * 2022-01-14 2024-05-28 中国人民解放军国防科技大学 Building extraction method for remote sensing images based on attention mechanism and boundary loss
CN114387523B (en) * 2022-03-23 2022-06-03 成都理工大学 Building extraction method from remote sensing images based on DCNN boundary guidance
CN114821069B (en) * 2022-05-27 2024-04-26 昆明理工大学 A dual-branch network remote sensing image building semantic segmentation method integrating scale-rich features

Also Published As

Publication number Publication date
CN115731461A (en) 2023-03-03

Similar Documents

Publication Publication Date Title
US11551333B2 (en) Image reconstruction method and device
CN112767251B (en) Image super-resolution method based on multi-scale detail feature fusion neural network
CN112132959B (en) Digital rock core image processing method and device, computer equipment and storage medium
Zhang et al. Dense haze removal based on dynamic collaborative inference learning for remote sensing images
CN113591968A (en) Infrared weak and small target detection method based on asymmetric attention feature fusion
CN116524307B (en) A self-supervised pre-training method based on diffusion model
CN113554032B (en) Remote sensing image segmentation method based on multi-path parallel network of high perception
CN108960261B (en) Salient object detection method based on attention mechanism
CN113378933A (en) Thyroid ultrasound image classification and segmentation network, training method, device and medium
CN114048822A (en) An Image Attention Mechanism Feature Fusion Segmentation Method
CN109035267B (en) A deep learning-based image target extraction method
CN115049921B (en) Salient object detection method in optical remote sensing images based on Transformer boundary perception
CN114781499B (en) Method for constructing ViT model-based intensive prediction task adapter
CN114359626B (en) Visible light-thermal infrared salient target detection method based on conditional generative adversarial network
CN115731461B (en) A method for extracting buildings from optical remote sensing images with multi-layer feature decoupling
CN113436198A (en) Remote sensing image semantic segmentation method for collaborative image super-resolution reconstruction
Zhang et al. NHNet: A non‐local hierarchical network for image denoising
CN112164078B (en) RGB-D multi-scale semantic segmentation method based on encoder-decoder
CN120765532B (en) Online detection system and grading method for resistor disc defects based on machine vision
Kavitha et al. Convolutional Neural Networks Based Video Reconstruction and Computation in Digital Twins.
Wang et al. Optimized UNet framework with a joint loss function for underwater image enhancement
CN110633706B (en) Semantic segmentation method based on pyramid network
CN119295752B (en) A remote sensing image road segmentation method combining bidirectional multi-level road feature dynamic fusion and dual-context dynamic extraction.
CN118762040B (en) Intestinal ultrasound image segmentation method, system, device and medium based on diffusion network and deep metric learning
Fan et al. EGFNet: Efficient guided feature fusion network for skin cancer lesion segmentation

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant