Detailed Description
To make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions of the embodiments are described clearly and completely below with reference to the accompanying drawings. The described embodiments are evidently some, but not all, of the embodiments of the present invention. All other embodiments obtained by those skilled in the art on the basis of these embodiments without inventive effort fall within the scope of protection of the invention. In addition, the technical features of the embodiments, or of a single embodiment, may be combined freely to form a feasible technical solution, and such a combination is not limited by the order of steps or by the structural composition; it must, however, be one that a person of ordinary skill in the art can implement. Where a combination is contradictory or cannot be implemented, it is deemed not to exist and is outside the scope of protection claimed by the invention.
The traditional semantic segmentation method needs secondary processing, has a complex operation flow (it cannot work end to end), a high computational cost and low accuracy. To solve these problems, the invention constructs an end-to-end building vector contour extraction method in which a deep learning network obtains the vector polygonal contour of a building directly from the input remote sensing image. Compared with a semantic segmentation approach that needs secondary processing, the operation flow is simpler; compared with a pixel-by-pixel dense computation model, the method only needs to apply convolution to the points on the contour, so the computational cost is lower and execution is more efficient.
Fig. 1 is a flow chart of the building vector contour extraction method provided by the invention. As shown in Fig. 1, the method comprises the following steps:
Step 1: a contour extraction model and a vertex classification model are constructed. The contour extraction model is used to extract the vector polygonal contour of a building target in a remote sensing image. The vertex classification model is used to classify the vertices of the vector polygonal contour extracted by the contour extraction model, so that the final vector polygonal contour of the building target is formed from the valid vertices. Together, the contour extraction model and the vertex classification model form the vector polygonal contour extraction model.
The contour extraction model comprises a feature extraction module, a contour initialization module and a contour evolution module. The feature extraction module uses a deep residual network with a multi-level fusion structure (the DLA-34 model) to extract the feature matrix F of the remote sensing image and the rectangular detection frame $P^{box} = \{(x_i^{box}, y_i^{box}) \mid i = 1, \dots, 4\}$ of each building target, where $x_i^{box}$ represents the abscissa of the i-th vertex of the rectangular detection frame and $y_i^{box}$ represents the ordinate of the i-th vertex of the rectangular detection frame.
The contour initialization module comprises one vertex offset network and is used to obtain the circumscribed polygonal contour of the building target. The value of the circumscribed polygonal contour inferred by the model is denoted $\hat{P}^{init} = \{(\hat{x}_i^{init}, \hat{y}_i^{init}) \mid i = 1, \dots, N^{init}\}$, and the actual value of the circumscribed polygonal contour of the building target is denoted $P^{init} = \{(x_i^{init}, y_i^{init}) \mid i = 1, \dots, N^{init}\}$, where i is an integer and $N^{init}$ is the number of vertices of the circumscribed polygonal contour of the building target.
The contour evolution module comprises three vertex offset networks connected in series and is used to obtain the vector polygonal contour of the building target. The value of the vector polygonal contour inferred by the model is denoted $\hat{P}^{poly} = \{(\hat{x}_i^{poly}, \hat{y}_i^{poly}) \mid i = 1, \dots, N^{poly}\}$, and the actual value of the vector polygonal contour of the building target is denoted $P^{poly} = \{(x_i^{poly}, y_i^{poly}) \mid i = 1, \dots, N^{poly}\}$, where i is an integer and $N^{poly}$ is the number of vertices of the vector polygonal contour of the building target.
The vertex offset network is shown in Fig. 2. Its structure comprises 7 circular convolution layers connected in series, 1 single convolution layer and 2 convolution layers connected in series. The 7 circular convolution layers extract detail, local and global information of the remote sensing image; the single convolution layer fuses the information extracted by the circular convolution layers; and the 2 serial convolution layers produce the inference result.
The circular convolution layer, shown as CirConv in Fig. 2, is a special convolution operation. Compared with a conventional convolution layer, CirConv copies adj features from the head and adj features from the tail of the input feature map, appends the copied head features to the tail and prepends the copied tail features to the head, so that the head and tail are joined and the feature map behaves like a closed ring. The input of the vertex offset network is the array of polygon vertex coordinates and the corresponding feature matrix, and the output is the offset of the vertex coordinates.
In the invention, adj is set to 4. In the vertex offset network structure shown in Fig. 2, CirConv denotes circular convolution, ReLU denotes the ReLU activation function, BN denotes batch normalization and Conv1d denotes one-dimensional convolution. In the figure, d denotes the dilation parameter of a convolution layer and k denotes its kernel size; CirConv layers shown without a k parameter all have a kernel size of 9.
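As a non-limiting illustration, the head-and-tail copying performed by the circular convolution can be sketched as follows. The sketch assumes a single-channel feature sequence, a dilation of 1 and no bias; the function names `circular_pad` and `cir_conv1d` are hypothetical and do not appear in the figures. With adj = 4 and a kernel size of 9, the output has the same length as the input.

```python
def circular_pad(features, adj=4):
    """Prepend the last `adj` items and append the first `adj` items,
    so that the sequence behaves like a closed ring."""
    if adj == 0:
        return list(features)
    return list(features[-adj:]) + list(features) + list(features[:adj])


def cir_conv1d(features, kernel, adj=4):
    """1-D convolution (cross-correlation form) over the circularly padded
    sequence. Single channel, dilation 1, no bias: simplifying assumptions."""
    padded = circular_pad(features, adj)
    k = len(kernel)
    return [
        sum(kernel[j] * padded[i + j] for j in range(k))
        for i in range(len(padded) - k + 1)
    ]


# A kernel of size 9 that is 1 at its centre returns the input unchanged,
# which confirms that the padding keeps every vertex aligned.
identity_kernel = [0.0] * 4 + [1.0] + [0.0] * 4
ring = [float(v) for v in range(1, 11)]
same = cir_conv1d(ring, identity_kernel, adj=4)
```

Because the padding wraps around, the first and last vertices of a closed contour see each other as neighbours, which a zero-padded convolution would not provide.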
As an embodiment, the training process of the contour extraction model includes: constructing a first training data set of the contour initialization module and a second training data set of the contour evolution module; training the contour initialization module based on the first training data set, and training the contour evolution module based on the second training data set.
The process of constructing the first training data set of the contour initialization module and the second training data set of the contour evolution module mainly comprises the following steps:
Acquiring a remote sensing image and cutting it into a plurality of tile images, and labeling on each tile image the rectangular detection frame $P^{box}$ of each building target and the target vector polygonal contour $P^{gt} = \{(x_i^{gt}, y_i^{gt}) \mid i = 1, \dots, N^{gt}\}$ of the building target, wherein $N^{gt}$ is the number of vertices of the target vector polygonal contour;
Acquiring the circumscribed polygonal contour $P^{ex}$ of the building target from the target vector polygonal contour $P^{gt}$;
Applying uniform interpolation to the circumscribed polygonal contour $P^{ex}$ of the building target to expand its number of vertices to $N^{init}$, thereby obtaining the actual circumscribed polygonal contour $P^{init}$ of the building target and constructing the first training data set of the contour initialization module;
Applying uniform interpolation to the target vector polygonal contour $P^{gt}$ of the building target to expand its number of vertices to $N^{poly}$, thereby obtaining the actual vector polygonal contour $P^{poly}$ of the building target and constructing the second training data set of the contour evolution module.
Specifically, the method for acquiring the first training data set and the second training data set comprises the following steps:
(1) A remote sensing image is acquired and cut into tile images of width W and height H, and the rectangular detection frame $P^{box}$ of each building target and the target vector polygonal contour $P^{gt} = \{(x_i^{gt}, y_i^{gt}) \mid i = 1, \dots, N^{gt}\}$ of the building target are labeled on each tile image, wherein i is an integer and $N^{gt}$ is the number of vertices of the target vector polygonal contour.
(2) The circumscribed polygonal contour $P^{ex}$ of the building target is obtained from $P^{gt}$, specifically as follows:
First, the four extreme values of $P^{gt}$ are obtained: the minimum x-coordinate $x_{min}$, the minimum y-coordinate $y_{min}$, the maximum x-coordinate $x_{max}$ and the maximum y-coordinate $y_{max}$.
For the extreme value $x_{min}$, the set $S_{x_{min}}$ of vertices of $P^{gt}$ whose x-coordinate differs from $x_{min}$ by no more than t is obtained. The two vertices with the largest and the smallest y-coordinate in $S_{x_{min}}$ are taken as the end points of the extreme edge of $x_{min}$.
For the extreme value $y_{min}$, the set $S_{y_{min}}$ of vertices of $P^{gt}$ whose y-coordinate differs from $y_{min}$ by no more than t is obtained. The two vertices with the largest and the smallest x-coordinate in $S_{y_{min}}$ are taken as the end points of the extreme edge of $y_{min}$.
The extreme edges of the extreme values $x_{max}$ and $y_{max}$ are obtained in the same way. Finally, duplicates are removed from the eight vertices on the four extreme edges to obtain the circumscribed polygonal contour $P^{ex} = \{(x_i^{ex}, y_i^{ex}) \mid i = 1, \dots, N^{ex}\}$ of the building target, wherein i is an integer and $N^{ex}$ is the number of vertices of the circumscribed polygonal contour of the building target. Circumscribed polygonal contours for different building shapes are shown in Fig. 3. In the invention, t is set to 0.01, and $N^{ex}$ takes a value from 4 to 8.
(3) The actual value $P^{init}$ of the circumscribed polygonal contour of the building target is obtained from the circumscribed polygonal contour $P^{ex}$. Specifically, uniform interpolation is used to expand the number of vertices of $P^{ex}$ to $N^{init}$, which yields $P^{init}$ and thereby the first training data set. The invention sets $N^{init}$ to 40.
(4) The actual value $P^{poly}$ of the vector polygonal contour of the building target is obtained from the labeled target vector polygonal contour $P^{gt}$. Specifically, uniform interpolation is used to expand the number of vertices of $P^{gt}$ to $N^{poly}$, which yields $P^{poly}$ and thereby the second training data set. The invention sets $N^{poly}$ to 128.
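Steps (2) to (4) can be illustrated by the following minimal sketch. It is not the only possible implementation: the order in which the eight extreme-edge vertices are visited, and the reading of "uniform interpolation" as equal arc-length spacing along the closed contour, are assumptions not stated in the description, and the function names `circumscribed_polygon` and `resample_closed` are hypothetical.

```python
import math


def circumscribed_polygon(p_gt, t=0.01):
    """Collect the end points of the four extreme edges of p_gt and
    remove duplicates, giving a polygon with 4 to 8 vertices."""
    x_min = min(x for x, _ in p_gt)
    x_max = max(x for x, _ in p_gt)
    y_min = min(y for _, y in p_gt)
    y_max = max(y for _, y in p_gt)
    at_x_min = [p for p in p_gt if abs(p[0] - x_min) <= t]
    at_x_max = [p for p in p_gt if abs(p[0] - x_max) <= t]
    at_y_min = [p for p in p_gt if abs(p[1] - y_min) <= t]
    at_y_max = [p for p in p_gt if abs(p[1] - y_max) <= t]
    # Assumed visiting order: walk once around the bounding box.
    candidates = [
        max(at_x_min, key=lambda p: p[1]), min(at_x_min, key=lambda p: p[1]),
        min(at_y_min, key=lambda p: p[0]), max(at_y_min, key=lambda p: p[0]),
        min(at_x_max, key=lambda p: p[1]), max(at_x_max, key=lambda p: p[1]),
        max(at_y_max, key=lambda p: p[0]), min(at_y_max, key=lambda p: p[0]),
    ]
    p_ex = []
    for p in candidates:
        if p not in p_ex:  # de-duplication of the eight vertices
            p_ex.append(p)
    return p_ex


def resample_closed(poly, n):
    """Return n points equally spaced by arc length along the closed
    polygon, starting at its first vertex (assumed interpolation rule)."""
    m = len(poly)
    seg = [math.dist(poly[i], poly[(i + 1) % m]) for i in range(m)]
    perimeter = sum(seg)
    out, i, start = [], 0, 0.0  # current edge index and its start arc length
    for k in range(n):
        s = perimeter * k / n
        while i < m - 1 and s >= start + seg[i]:
            start += seg[i]
            i += 1
        a, b = poly[i], poly[(i + 1) % m]
        r = (s - start) / seg[i] if seg[i] > 0 else 0.0
        out.append((a[0] + r * (b[0] - a[0]), a[1] + r * (b[1] - a[1])))
    return out


# Example: an octagonal footprint yields eight extreme-edge vertices,
# which are then expanded to N_init = 40 points.
p_gt = [(1, 0), (2, 0), (3, 1), (3, 2), (2, 3), (1, 3), (0, 2), (0, 1)]
p_ex = circumscribed_polygon(p_gt)
p_init = resample_closed(p_ex, 40)
```

The same `resample_closed` helper would be applied to the labeled contour with n = 128 to obtain the second training data set.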
After the first training data set of the contour initialization module and the second training data set of the contour evolution module are constructed, the contour initialization module and the contour evolution module of the contour extraction model are trained separately. The training process of the contour initialization module is as follows:
Augmenting the remote sensing tile images by rotation, scaling and the like, inputting the remote sensing images in shuffled order into the feature extraction module, and outputting the remote sensing image feature matrix F and the rectangular detection frames $P^{box}$ of a plurality of building targets;
Expanding the number of vertices of the rectangular detection frame $P^{box}$ of each building target to $N^{init}$ by uniform interpolation;
Extracting the feature matrix $F^{box}$ of the rectangular detection frame $P^{box}$ of the building target from the remote sensing image feature matrix F according to the vertex coordinates of $P^{box}$;
Inputting the combination $(F^{box}, P^{box})$ of $F^{box}$ and $P^{box}$ into the contour initialization module to obtain an offset $O^{box}$, and calculating the inferred value $\hat{P}^{init}$ of the circumscribed polygonal contour of the building target as follows:
$$\hat{P}^{init} = P^{box} + O^{box};$$
Calculating, based on the smooth loss function, a first error between the inferred value $\hat{P}^{init}$ of the circumscribed polygonal contour of the building target and the actual circumscribed polygonal contour $P^{init}$ of the building target, and obtaining the trained contour initialization module when the first error is minimized.
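The offset step and the first error can be sketched as follows. The sketch assumes that the "smooth loss function" refers to the widely used smooth L1 loss; the threshold `beta = 1.0`, the averaging over all coordinates and the function names are likewise assumptions, since the description does not fix them.

```python
def apply_offsets(contour, offsets):
    """Add the predicted per-vertex offsets to the contour vertices."""
    return [(x + dx, y + dy) for (x, y), (dx, dy) in zip(contour, offsets)]


def smooth_l1(pred, target, beta=1.0):
    """Smooth L1 loss averaged over all coordinates (assumed form):
    quadratic for small differences, linear for large ones."""
    total = 0.0
    for (px, py), (tx, ty) in zip(pred, target):
        for diff in (abs(px - tx), abs(py - ty)):
            if diff < beta:
                total += 0.5 * diff * diff / beta
            else:
                total += diff - 0.5 * beta
    return total / (2 * len(pred))


# Two box vertices shifted by hypothetical offsets, compared with targets.
p_box = [(0.0, 0.0), (2.0, 0.0)]
o_box = [(0.5, 0.0), (0.0, 3.0)]
p_init_pred = apply_offsets(p_box, o_box)
first_error = smooth_l1(p_init_pred, [(0.0, 0.0), (2.0, 0.0)])
```

The quadratic region keeps gradients small near the target, while the linear region limits the influence of vertices that are still far from it.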
The training process for the contour evolution module is as follows:
Expanding the number of vertices of the inferred value $\hat{P}^{init}$ of the circumscribed polygonal contour of the building target to $N^{poly}$ by uniform interpolation;
Extracting the feature matrix $F^{init}$ of the inferred value $\hat{P}^{init}$ of the circumscribed polygonal contour of the building target from the remote sensing image feature matrix F according to the vertex coordinates of $\hat{P}^{init}$;
Inputting the combination $(F^{init}, \hat{P}^{init})$ of $F^{init}$ and $\hat{P}^{init}$ into the contour evolution module to obtain an offset $O^{init}$, and calculating the inferred value $\hat{P}^{poly}$ of the vector polygonal contour of the building target as follows:
$$\hat{P}^{poly} = \hat{P}^{init} + O^{init};$$
Calculating, based on the bidirectional nearest loss function, a second error between the inferred value $\hat{P}^{poly}$ of the vector polygonal contour of the building target and the actual vector polygonal contour $P^{poly}$ of the building target, and obtaining the trained contour evolution module when the second error converges.
Wherein calculating a second error between the inferred value of the vector polygonal contour of the building target and the actual vector polygonal contour P poly of the building target based on the bi-directional nearest loss function comprises:
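Let the i-th point of the inferred vector polygonal contour $\hat{P}^{poly}$ be $\hat{p}_i$, let $p_{n(i)}$ be the point of the actual contour $P^{poly}$ nearest to $\hat{p}_i$ in Manhattan ($L_1$) distance, and let $d_i^{pre}$ be the $L_1$ distance between $\hat{p}_i$ and $p_{n(i)}$; then:
$$d_i^{pre} = \|\hat{p}_i - p_{n(i)}\|_1 = \min_{1 \le j \le N^{poly}} \|\hat{p}_i - p_j\|_1;$$
The average nearest distance from the inferred value of the vector polygonal contour of the building target to the actual vector polygonal contour $P^{poly}$ of the building target is denoted $L^{pre}$; then:
$$L^{pre} = \frac{1}{N^{poly}} \sum_{i=1}^{N^{poly}} d_i^{pre};$$
Let the j-th point of $P^{poly}$ be $p_j$, let $\hat{p}_{m(j)}$ be the point of $\hat{P}^{poly}$ nearest to $p_j$ in $L_1$ distance, and let $d_j^{gt}$ be the $L_1$ distance between $p_j$ and $\hat{p}_{m(j)}$; then:
$$d_j^{gt} = \|p_j - \hat{p}_{m(j)}\|_1 = \min_{1 \le i \le N^{poly}} \|p_j - \hat{p}_i\|_1;$$
The average nearest distance from the actual vector polygonal contour $P^{poly}$ of the building target to the inferred value of the vector polygonal contour of the building target is denoted $L^{gt}$; then:
$$L^{gt} = \frac{1}{N^{poly}} \sum_{j=1}^{N^{poly}} d_j^{gt};$$
The bidirectional nearest loss function is defined as the average of $L^{pre}$ and $L^{gt}$, denoted $L^{bi}$:
$$L^{bi} = \frac{1}{2}\left(L^{pre} + L^{gt}\right).$$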
To illustrate the calculation of the bidirectional nearest loss function, the invention provides an example in which $N^{poly}$ is set to 4.
Assuming that the model inferred value of the vector polygonal contour is {(0, 0), (-2, 2), (1, 1), (1, 0)} and the actual value is {(0, 0), (0, 1), (1, 1), (1, 0)}, the average nearest distance from the inferred value to the actual value is 3/4 = 0.75, the average nearest distance from the actual value to the inferred value is 1/4 = 0.25, and the bidirectional nearest loss calculated by the above formula is therefore (0.75 + 0.25)/2 = 0.5.
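The worked example can be checked with the following minimal sketch. It assumes that the actual contour consists of the four points (0, 0), (0, 1), (1, 1), (1, 0), consistent with a vertex number of 4, and that each directional term is a plain mean of nearest Manhattan distances; the function names are hypothetical.

```python
def l1(p, q):
    """Manhattan (L1) distance between two 2-D points."""
    return abs(p[0] - q[0]) + abs(p[1] - q[1])


def mean_nearest(src, dst):
    """Average, over the points of src, of the L1 distance to the
    nearest point of dst."""
    return sum(min(l1(p, q) for q in dst) for p in src) / len(src)


def bidirectional_nearest_loss(pred, gt):
    """Average of the two directional mean nearest distances."""
    return 0.5 * (mean_nearest(pred, gt) + mean_nearest(gt, pred))


pred = [(0, 0), (-2, 2), (1, 1), (1, 0)]  # model inferred value
gt = [(0, 0), (0, 1), (1, 1), (1, 0)]     # actual value (assumed four points)

pred_to_gt = mean_nearest(pred, gt)            # 0.75
gt_to_pred = mean_nearest(gt, pred)            # 0.25
loss = bidirectional_nearest_loss(pred, gt)    # 0.5
```

Only the point (-2, 2) contributes to the first term (distance 3 to (0, 1)), and only (0, 1) contributes to the second term (distance 1), which gives 3/4 and 1/4 respectively.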
After the contour extraction model is constructed, the vertex classification model is constructed. The vertex classification model comprises a vertex offset network and a sigmoid activation layer. Its output is a classification score for each vertex in the range 0 to 1, where 0 indicates that the vertex is invalid and 1 indicates that the vertex is valid.
A vertex classification annotation set for training the vertex classification model is then obtained. Specifically, the vector polygonal contour $\hat{P}^{poly}$ of a building target in the remote sensing image, as extracted by the contour extraction model, is obtained, and each vertex of $\hat{P}^{poly}$ is labeled as valid or invalid. The actual classification score of an invalid vertex is 0 and that of a valid vertex is 1, which yields the vertex classification annotation set Score gt of the vertex classification model.
Wherein labeling the type of each vertex in the vector polygonal contour comprises:
Obtaining the vector polygonal contour $\hat{P}^{poly}$ of the building target extracted by the contour evolution module, wherein the number of vertices of $\hat{P}^{poly}$ is $N^{poly}$, the number of vertices of the labeled target vector polygonal contour $P^{gt}$ of the building target is $N^{gt}$, and $N^{poly}$ is larger than $N^{gt}$;
Optimally matching the vertices of $\hat{P}^{poly}$ with the vertices of $P^{gt}$ using a weighted bipartite graph optimal matching algorithm, with the Euclidean distance between two vertices as the weight, so that the sum of the Euclidean distances over the matching result is minimized. The matching result is a set of vertex pairs between $P^{match}$ and $P^{gt}$, where $P^{match}$ is a subset of $\hat{P}^{poly}$ with the same number of vertices as $P^{gt}$. The vertices in $P^{match}$ are labeled 1, representing valid vertices; the remaining vertices of $\hat{P}^{poly}$, those not in $P^{match}$, are labeled 0, representing invalid vertices;
The constructed vertex classification label set is denoted Score gt.
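The labeling rule can be illustrated with the sketch below. For brevity it finds the minimum-cost matching by exhaustive search over assignments, which is practical only for a handful of vertices; it stands in for, and is not, a weighted bipartite graph optimal matching algorithm such as Kuhn-Munkres, although both return a matching with the same minimum total distance. The function name `label_vertices` is hypothetical.

```python
import math
from itertools import permutations


def label_vertices(pred, gt):
    """Return a 0/1 label for every vertex of pred.

    Each gt vertex is matched to a distinct pred vertex so that the sum of
    Euclidean distances is minimal; matched pred vertices get label 1.
    Exhaustive search: illustration only, assumes len(pred) >= len(gt).
    """
    best_cost, best_assign = math.inf, ()
    for assign in permutations(range(len(pred)), len(gt)):
        cost = sum(math.dist(pred[i], g) for i, g in zip(assign, gt))
        if cost < best_cost:
            best_cost, best_assign = cost, assign
    matched = set(best_assign)
    return [1 if i in matched else 0 for i in range(len(pred))]


# Six inferred vertices around a unit square whose labeled contour has
# four corners; the two mid-edge vertices are redundant.
pred = [(0, 0), (0.5, 0.1), (1, 0), (1, 1), (0.5, 1.05), (0, 1)]
gt = [(0, 0), (1, 0), (1, 1), (0, 1)]
score_gt = label_vertices(pred, gt)
```

For contours with 128 inferred vertices a polynomial-time solver is needed; SciPy's `scipy.optimize.linear_sum_assignment`, which accepts a rectangular cost matrix, is one readily available option.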
Training the vertex classification model based on a third training data set, wherein the specific training process comprises the following steps:
Extracting the feature matrix F of the remote sensing image with the feature extraction module;
Extracting the feature matrix $F^{poly}$ of the vector polygonal contour $\hat{P}^{poly}$ of the building target from the feature matrix F according to the vertex coordinates of $\hat{P}^{poly}$;
Inputting the combination $(F^{poly}, \hat{P}^{poly})$ of $F^{poly}$ and $\hat{P}^{poly}$ into the vertex classification model to obtain the inferred classification score Score pre of each vertex of $\hat{P}^{poly}$;
Calculating, based on the Focal Loss function, a third error between the inferred classification score Score pre and the actual classification score Score gt in the vertex classification annotation set;
Obtaining the trained vertex classification model when the third error is minimized.
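For reference, a binary Focal Loss over the vertex scores can be sketched as follows. The weighting factor `alpha = 0.25` and the focusing parameter `gamma = 2.0` are commonly used values and are assumptions here, as is the mean reduction; the description does not specify them.

```python
import math


def focal_loss(scores, labels, alpha=0.25, gamma=2.0, eps=1e-7):
    """Mean binary focal loss between predicted scores in (0, 1) and
    0/1 labels. alpha, gamma and eps are assumed hyperparameters."""
    total = 0.0
    for p, y in zip(scores, labels):
        p = min(max(p, eps), 1.0 - eps)  # avoid log(0)
        if y == 1:
            total += -alpha * (1.0 - p) ** gamma * math.log(p)
        else:
            total += -(1.0 - alpha) * p ** gamma * math.log(1.0 - p)
    return total / len(scores)


# Confident, correct scores are penalised far less than confident,
# wrong ones, which focuses training on hard vertices.
easy = focal_loss([0.9, 0.1], [1, 0])
hard = focal_loss([0.1, 0.9], [1, 0])
```

Because most of the inferred vertices are redundant and labeled 0, a loss that down-weights easy examples is a natural fit for this imbalanced classification task.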
Step 2: the remote sensing image is input into the vector polygonal contour extraction model to obtain the vector polygonal contour of the building target in the remote sensing image.
It can be understood that in step 1, a contour extraction model and a vertex classification model are respectively constructed, and the trained contour extraction model and the trained vertex classification model are connected to form a remote sensing image vector polygon contour extraction model.
During inference, the input remote sensing image is first passed through the contour extraction model to obtain the vector polygonal contour of the building target. This contour is then input into the vertex classification model to obtain a classification score for each vertex. Vertices whose score is smaller than a preset score threshold are regarded as invalid and deleted, which yields the final vector polygonal contour of the building target. In the invention, the preset score threshold is set to 0.85.
Fig. 4 is a schematic diagram of the contour extraction steps of the building vector contour extraction method provided by the invention. First, the remote sensing image is input into the feature extraction module of the contour extraction model, which outputs the rectangular detection frame of the building target. The contour initialization module then outputs the circumscribed polygonal contour of the building target, and the contour evolution module outputs the vector polygonal contour of the building target. Finally, the vertex classification model classifies the vertices of the vector polygonal contour as valid or invalid, and the final vector polygonal contour of the building target is formed from the valid vertices.
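The vertex filtering performed at inference time can be sketched as follows; the function name `simplify_contour` is hypothetical, and the scores in the example are made-up values for illustration only.

```python
def simplify_contour(contour, scores, threshold=0.85):
    """Keep the vertices whose classification score is not smaller than
    the threshold; the vertex order of the contour is preserved."""
    return [p for p, s in zip(contour, scores) if s >= threshold]


# Hypothetical scores for six inferred vertices of a square building.
contour = [(0, 0), (5, 0), (10, 0), (10, 10), (5, 10), (0, 10)]
scores = [0.97, 0.12, 0.91, 0.88, 0.40, 0.85]
final_contour = simplify_contour(contour, scores)
```

A score exactly equal to the threshold is kept, since only scores smaller than the threshold are treated as invalid.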
The invention provides a building vector contour extraction method, which has the following advantages:
(1) The invention constructs a deep learning network structure that realizes an end-to-end process for extracting building vector contours from remote sensing images. Building vector contour data are obtained directly from a single input remote sensing image, secondary processing is avoided, the computational cost is lower and the accuracy is higher.
(2) By initializing with a circumscribed polygon, the method wraps the shape of the building target more closely and improves accuracy in the initialization stage. In the contour evolution stage, the bidirectional nearest loss function drives each point on the contour toward the nearest vertex of the target contour, which speeds up convergence. In the contour simplification stage, a weighted bipartite graph optimal matching algorithm is used to construct the training data set, so that the trained model can distinguish whether a vertex is redundant. This further improves contour accuracy and allows the output vector polygons to express geographic information more faithfully.
In the foregoing embodiments, each embodiment is described with its own emphasis; for portions not described in detail in one embodiment, reference may be made to the related descriptions of other embodiments.
It will be appreciated by those skilled in the art that embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
While preferred embodiments of the present invention have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. It is therefore intended that the following claims be interpreted as including the preferred embodiments and all such alterations and modifications as fall within the scope of the invention.
It will be apparent to those skilled in the art that various modifications and variations can be made to the present invention without departing from the spirit or scope of the invention. Thus, it is intended that the present invention also include such modifications and alterations insofar as they come within the scope of the appended claims or the equivalents thereof.