Disclosure of Invention
Based on this, in view of the inaccuracy of conventional non-rigid registration, there is a need to provide a method and apparatus for automatic registration of remote sensing images based on a non-rigid bidirectional registration network.
A remote sensing image automatic registration method based on a non-rigid bidirectional registration network comprises the following steps:
constructing a first multi-layer motion flow field prediction network, and inputting a first motion image and a first fixed image to be registered into the first multi-layer motion flow field prediction network to obtain a first predicted image; the motion flow field prediction networks are connected in cascade, and the input of a first motion flow field prediction network is a first motion image and a first fixed image of a remote sensing image to be registered; the input of other motion flow field prediction networks is a first fixed image and a predicted image output by a previous stage motion flow field prediction network;
a first predicted image output by the first multi-layer motion flow field prediction network is taken as a second motion image, a similar fixed image corresponding to the first fixed image is taken as a second fixed image, and both are input into a pre-constructed second multi-layer motion flow field prediction network, which outputs a conversion image; wherein each motion flow field prediction network in the second multi-layer motion flow field prediction network is connected in cascade, and the second fixed image meets style consistency with the first fixed image and corresponds one-to-one to the pixel positions of the first motion image.
In one embodiment, the motion flow field prediction network comprises: correlation layer, mutual matching layer, neighborhood correlation layer, mutual matching layer, decoding layer and resampling layer.
In one embodiment, the method further comprises:
calculating a first feature correlation map between the first moving image and a first fixed image through the correlation layer;
processing the feature correlation map through the mutual matching layer, the neighborhood correlation layer and the mutual matching layer to obtain a second feature map;
decoding the second feature map through the decoding layer to obtain a predicted flow field;
and changing the predicted flow field through the resampling layer to obtain a first predicted image.
In one embodiment, the style consistency between the second fixed image and the first fixed image is satisfied by the similar fixed image $\tilde{I}_B$, which keeps the style of the first fixed image while its pixel positions correspond one-to-one to those of the first motion image.
In one embodiment, the method further comprises: training the two-way registration network formed by the first multi-layer motion flow field prediction network and the second multi-layer motion flow field prediction network according to a pre-constructed loss function to obtain network parameters;
in one embodiment, the loss function includes: EPE penalty, IOU penalty, content feature penalty, and normalized cross-correlation penalty;
the EPE loss is:
$$L_{EPE} = \frac{1}{m} \sum_{x} \left\| F_{A\to B}(x) - F_{A\to B}^{GT}(x) \right\|_2$$
wherein $m$ represents the number of pixels of the image, $F_{A\to B}$ represents the motion flow field of the first moving image $I_A$ relative to the first fixed image $I_B$, and $F_{A\to B}^{GT}$ represents the corresponding ground-truth motion flow field;
the IOU loss is:
$$L_{IOU} = 1 - \frac{\left| T(L_A) \cap L_B \right|}{\left| T(L_A) \cup L_B \right|}$$
wherein $L_B$ represents the building semantic label corresponding to the first fixed image $I_B$, and $T(L_A)$ represents the building semantic label corresponding to the predicted image output by the motion flow field prediction network;
the content feature loss is:
$$L_{content} = \left\| \mathrm{Feat}(T(I_A)) - \mathrm{Feat}(I_B) \right\|_1$$
wherein $\mathrm{Feat}(T(I_A))$ represents the content features extracted, with a VGG-16 model pre-trained on ImageNet, from the predicted image output by the motion flow field prediction network, and $\mathrm{Feat}(I_B)$ represents the content features extracted from the first fixed image $I_B$ with the same pre-trained VGG-16 model;
the normalized cross-correlation loss is:
$$NCC(I,J) = \sum_{x} \frac{\left( \sum_{x_i \in \Omega} \left( I(x_i) - \bar{I}(x) \right) \left( J(x_i) - \bar{J}(x) \right) \right)^2}{\sum_{x_i \in \Omega} \left( I(x_i) - \bar{I}(x) \right)^2 \sum_{x_i \in \Omega} \left( J(x_i) - \bar{J}(x) \right)^2}$$
wherein $I$ and $J$ are two input image blocks, $\bar{I}(x)$ and $\bar{J}(x)$ are respectively the local means of $I$ and $J$ at position $x$, and $\Omega$ represents all pixels in the image block.
A remote sensing image automatic registration device based on a non-rigid bi-directional registration network, the device comprising:
the first unidirectional registration module is used for constructing a first multilayer motion flow field prediction network, and inputting a first motion image and a first fixed image to be registered into the first multilayer motion flow field prediction network to obtain a first predicted image; the motion flow field prediction networks are connected in cascade, and the input of a first motion flow field prediction network is a first motion image and a first fixed image of a remote sensing image to be registered; the input of other motion flow field prediction networks is a first fixed image and a predicted image output by a previous stage motion flow field prediction network;
the second unidirectional registration module is used for taking a first predicted image output by the first multi-layer motion flow field prediction network as a second motion image and a similar fixed image corresponding to the first fixed image as a second fixed image, inputting both into a pre-constructed second multi-layer motion flow field prediction network, and outputting a conversion image; and each motion flow field prediction network in the second multi-layer motion flow field prediction network is connected in cascade, and the second fixed image meets style consistency with the first fixed image and corresponds one-to-one to the pixel positions of the first motion image.
According to the remote sensing image automatic registration method and device based on the non-rigid bidirectional registration network, registration reversibility and geometric consistency are enhanced through bidirectional registration.
Detailed Description
For a better understanding of the objects, technical solutions and technical effects of the present invention, the present invention will be further explained below with reference to the drawings and examples. It should be noted that the embodiments described below are only for explaining the present invention and are not intended to limit it.
In one embodiment, as shown in fig. 1, a schematic flow chart of a remote sensing image automatic registration method based on a non-rigid bidirectional registration network is provided, comprising:
step 102, a first multi-layer motion flow field prediction network is constructed, and a first motion image and a first fixed image to be registered are input into the first multi-layer motion flow field prediction network to obtain a first prediction image.
The motion flow field prediction networks are connected in cascade, and the input of the first motion flow field prediction network is a first motion image and a first fixed image of the remote sensing image to be registered; the input of other motion flow field prediction networks is a first fixed image and a predicted image output by a previous stage motion flow field prediction network.
Step 104: taking the first predicted image output by the first multi-layer motion flow field prediction network as a second motion image and a similar fixed image corresponding to the first fixed image as a second fixed image, inputting both into a pre-constructed second multi-layer motion flow field prediction network, and outputting a conversion image.
Each motion flow field prediction network in the second multi-layer motion flow field prediction network is connected in cascade, and the second fixed image meets style consistency with the first fixed image and corresponds one-to-one to the pixel positions of the first motion image.
According to the remote sensing image automatic registration method based on the non-rigid bidirectional registration network, registration reversibility and geometric consistency are enhanced through bidirectional registration.
In one embodiment, a motion flow field prediction network comprises: correlation layer, mutual matching layer, neighborhood correlation layer, mutual matching layer, decoding layer and resampling layer.
In one embodiment, a first feature correlation map between a first moving image and a first fixed image is calculated by a correlation layer; processing the feature correlation map through the mutual matching layer, the neighborhood correlation layer and the mutual matching layer to obtain a second feature map; decoding the second feature map through a decoding layer to obtain a predicted flow field; and changing the predicted flow field through a resampling layer to obtain a first predicted image.
Specifically, the correlation layer calculates the similarity between positions of two feature vectors to evaluate the matching probability. It takes the feature maps $f_A$ and $f_B$ as input and outputs the feature correlation map
$$c_{ijkl} = f_B(k,l)^{T} f_A(i,j)$$
where $(i,j)$ and $(k,l)$ represent the locations of individual features in the feature maps of the two images.
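As a concrete sketch, the correlation computation above can be written in PyTorch. The function name `correlation_map` and the (C, H, W) feature layout are illustrative assumptions, not part of the patent:

```python
import torch

def correlation_map(f_a: torch.Tensor, f_b: torch.Tensor) -> torch.Tensor:
    """Dense feature correlation: c[i, j, k, l] = f_b(k, l) . f_a(i, j).

    f_a, f_b: feature maps of shape (C, H, W) from images A and B.
    Returns a 4-D correlation volume of shape (H, W, H, W).
    """
    c, h, w = f_a.shape
    # All pairwise dot products between feature columns of the two maps.
    corr = f_b.reshape(c, h * w).t() @ f_a.reshape(c, h * w)  # (H*W, H*W), rows: (k, l)
    return corr.reshape(h, w, h, w).permute(2, 3, 0, 1)       # index as [i, j, k, l]
```

Indexed as `c[i, j, k, l]`, each entry is the dot product between the feature of image B at position (k, l) and the feature of image A at position (i, j).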
Because remote sensing images have large distortions and complex backgrounds, assigning matching relations by nearest neighbors alone can produce a large number of false matches. Initial inconsistent matches are therefore eliminated through a mutual matching layer and a neighborhood correlation layer. The mutual matching layer is
$$\hat{c}_{ijkl} = r_A \cdot r_B \cdot c_{ijkl}, \qquad r_A = \frac{c_{ijkl}}{\max_{ab} c_{abkl}}, \qquad r_B = \frac{c_{ijkl}}{\max_{cd} c_{ijcd}}$$
where $r_A$ and $r_B$ represent the ratio of a specific match $c_{ijkl}$ to the largest score across the dimension pair $ab$ or $cd$ corresponding to image A or B.
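A minimal sketch of this NC-Net-style mutual re-weighting of the correlation volume; the helper name `mutual_matching` and the small epsilon added for numerical stability are assumptions:

```python
import torch

def mutual_matching(corr: torch.Tensor, eps: float = 1e-8) -> torch.Tensor:
    """Re-weight a (H, W, H, W) correlation volume indexed [i, j, k, l] so
    that only mutually consistent matches keep high scores: each score is
    scaled by its ratio to the best score over image A's positions (i, j)
    and over image B's positions (k, l).
    """
    h, w = corr.shape[:2]
    flat = corr.reshape(h * w, h * w)                   # rows: (i, j), cols: (k, l)
    best_over_a = flat.max(dim=0).values.reshape(h, w)  # best match in A per (k, l)
    best_over_b = flat.max(dim=1).values.reshape(h, w)  # best match in B per (i, j)
    ratio_a = corr / (best_over_a + eps)                   # broadcasts over [k, l]
    ratio_b = corr / (best_over_b[..., None, None] + eps)  # broadcasts over [i, j]
    return corr * ratio_a * ratio_b
```

For non-negative scores both ratios lie in [0, 1], so scores that are not close to the best match in either direction are suppressed.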
The neighborhood correlation layer is
$$c' = N(\hat{c})$$
where $N(\cdot)$ consists of a series of convolutional layers and activation layers.
The decoding module obtains a normalized predicted flow field through a series of convolution layers, BN layers and ReLU activation layers, and the resampling layer adopts the grid_sample function in PyTorch to transform the moving image according to the predicted flow field.
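The resampling step can be sketched with `torch.nn.functional.grid_sample`. The pixel-unit flow convention and the helper name `warp` are assumptions made for illustration:

```python
import torch
import torch.nn.functional as F

def warp(image: torch.Tensor, flow: torch.Tensor) -> torch.Tensor:
    """Warp `image` (N, C, H, W) with a dense flow field (N, 2, H, W).

    The flow is assumed to be in pixel units; it is converted to the
    normalized [-1, 1] grid that grid_sample expects.
    """
    n, _, h, w = image.shape
    # Identity grid of pixel coordinates, channel 0 = x, channel 1 = y.
    ys, xs = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
    base = torch.stack((xs, ys), dim=0).float()       # (2, H, W)
    coords = base.unsqueeze(0) + flow                 # sampling locations in pixels
    # Normalize pixel coordinates to [-1, 1] for grid_sample.
    coords_x = 2.0 * coords[:, 0] / max(w - 1, 1) - 1.0
    coords_y = 2.0 * coords[:, 1] / max(h - 1, 1) - 1.0
    grid = torch.stack((coords_x, coords_y), dim=-1)  # (N, H, W, 2)
    return F.grid_sample(image, grid, mode="bilinear", align_corners=True)
```

With a zero flow field this reduces to the identity mapping, which is a convenient sanity check for the coordinate normalization.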
In another embodiment, the multi-layer motion flow field prediction network constructed by the present invention essentially iterates the image transformation, as shown in fig. 2, taking three iterations as an example:
The decoder adopts a coarse-to-fine strategy, and the output motion flow field estimates have resolutions of 32×32, 64×64, 128×128 and 256×256 respectively; the first-round transformed image $T_1(I_A)$ is obtained according to the highest-resolution motion flow field $F_{A\to B}^{1}$. The second round of transformation takes the fixed image $I_B$ and the first-round transformed image $T_1(I_A)$ as input, predicts the second-round motion flow field $\Delta F_{A\to B}^{2}$, and obtains the warped image $T_2(I_A)$. The flow field predicted after the second round equals the first-round predicted flow field $F_{A\to B}^{1}$ plus the second-round residual flow field $\Delta F_{A\to B}^{2}$, and so on. The motion flow field finally predicted by the third iteration is $F_{A\to B}$, and the transformed image is $T_3(I_A)$. Bidirectional registration then continues with $T_3(I_A)$ as the moving image and the similar fixed image $\tilde{I}_B$ as the fixed image; repeating the above steps yields the predicted motion flow field $F_{B\to A}$ and the transformed image $T'(T(I_A))$.
In one embodiment, the style consistency of the fixed image and the similar fixed image is satisfied by $\tilde{I}_B$, the similar fixed image, which maintains the style of the first fixed image through the transformation described above.
In one embodiment, training the bidirectional registration network formed by the first multi-layer motion flow field prediction network and the second multi-layer motion flow field prediction network according to a pre-constructed loss function to obtain network parameters.
In the training process, a moving image $I_A$ and a fixed image $I_B$ are input into the network, and the moving image is converted into the image $T(I_A)$ according to the motion flow field $F_{A\to B}$ predicted by the network; bidirectional registration then takes the image $T(I_A)$ as the moving image and the image $\tilde{I}_B$ as the fixed image, and converts the moving image into the image $T'(T(I_A))$ according to the motion flow field $F_{B\to A}$ predicted by the network. In the test process, only the first half is retained: the moving image $I_A$ and the fixed image $I_B$ are input into the network, and the moving image is converted into the image $T(I_A)$ according to the motion flow field $F_{A\to B}$ predicted by the network. The image $T(I_A)$ is aligned with the image $I_B$, and the images $I_A$, $T'(T(I_A))$ and $\tilde{I}_B$ are aligned with one another pixel by pixel. The image $\tilde{I}_B$ is of the same modality as $I_B$, while the other images are of the same modality as the image $I_A$.
In the training process, a pair consisting of a moving image and a fixed image is taken as the network input, and the predicted motion flow field is output. In each iteration, the loss function is calculated between the output of the network and the ground-truth flow field label, minimizing the loss function is taken as the objective, and the parameters of the deep convolutional neural network are continuously optimized with the Adam optimizer, with the learning rate set to 2e-4 and the decay rate set to 4e-4. When the loss value no longer decreases, the network parameters at that moment are saved as the final network model parameters.
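A minimal training-loop sketch using the stated optimizer settings (Adam, learning rate 2e-4, decay rate 4e-4, read here as weight decay). The one-layer `model` and the squared-flow loss are placeholders, not the patent's network or composite loss:

```python
import torch

# Placeholder for the real registration network: maps the concatenated
# (moving, fixed) pair to a 2-channel flow field.
model = torch.nn.Conv2d(2, 2, 3, padding=1)
optimizer = torch.optim.Adam(model.parameters(), lr=2e-4, weight_decay=4e-4)

best = float("inf")
for step in range(5):                      # a real run trains for many epochs
    moving = torch.randn(1, 1, 8, 8)
    fixed = torch.randn(1, 1, 8, 8)
    pred_flow = model(torch.cat((moving, fixed), dim=1))
    loss = pred_flow.pow(2).mean()         # stand-in for the composite loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    if loss.item() < best:                 # keep parameters at the best loss
        best = loss.item()
        state = {k: v.detach().clone() for k, v in model.state_dict().items()}
```

Saving a copy of the state dict whenever the loss improves mirrors the text's rule of keeping the parameters from the point where the loss stops decreasing.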
In one embodiment, the loss function includes: an EPE loss, an IOU loss, a content feature loss, and a normalized cross-correlation loss;
The EPE loss is:
$$L_{EPE} = \frac{1}{m} \sum_{x} \left\| F_{A\to B}(x) - F_{A\to B}^{GT}(x) \right\|_2$$
where $m$ represents the number of pixels of the image ($m = 256 \times 256$ for a $256 \times 256$ image), $F_{A\to B}$ represents the motion flow field of the first moving image $I_A$ relative to the first fixed image $I_B$, and $F_{A\to B}^{GT}$ represents the corresponding ground-truth motion flow field;
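A short sketch of average endpoint error over per-pixel flow vectors; the (N, 2, H, W) tensor layout is an assumption:

```python
import torch

def epe_loss(flow_pred: torch.Tensor, flow_gt: torch.Tensor) -> torch.Tensor:
    """Average endpoint error: mean over the m pixels of the Euclidean
    distance between predicted and ground-truth flow vectors.
    Flows have shape (N, 2, H, W); m = H * W per image."""
    return torch.norm(flow_pred - flow_gt, p=2, dim=1).mean()
```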
the IOU loss is:
$$L_{IOU} = 1 - \frac{\left| T(L_A) \cap L_B \right|}{\left| T(L_A) \cup L_B \right|}$$
wherein $L_B$ represents the building semantic label corresponding to the first fixed image $I_B$, and $T(L_A)$ represents the building semantic label corresponding to the predicted image output by the motion flow field prediction network;
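A differentiable (soft) IoU sketch for the building-mask term; treating the labels as masks with values in [0, 1] is an assumption:

```python
import torch

def iou_loss(pred_mask: torch.Tensor, gt_mask: torch.Tensor,
             eps: float = 1e-6) -> torch.Tensor:
    """Soft IoU loss between the warped building label T(L_A) and the fixed
    image's building label L_B: 1 - |intersection| / |union|."""
    inter = (pred_mask * gt_mask).sum()
    union = pred_mask.sum() + gt_mask.sum() - inter
    return 1.0 - inter / (union + eps)
```

Identical masks give a loss near 0, fully disjoint masks a loss of 1, so minimizing it pulls the warped building footprint onto the fixed image's footprint.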
the content feature loss is:
$$L_{content} = \left\| \mathrm{Feat}(T(I_A)) - \mathrm{Feat}(I_B) \right\|_1$$
wherein $\mathrm{Feat}(T(I_A))$ represents the content features extracted, with a VGG-16 model pre-trained on ImageNet, from the predicted image output by the motion flow field prediction network, and $\mathrm{Feat}(I_B)$ represents the content features extracted from the first fixed image $I_B$ with the same pre-trained VGG-16 model;
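A sketch of the content feature loss. The extractor is passed in as an argument so that any frozen feature network can stand in for the pretrained VGG-16 features (in practice something like `torchvision.models.vgg16(...).features`); the L1 distance is an assumption:

```python
import torch

def content_loss(feat_extractor, warped: torch.Tensor,
                 fixed: torch.Tensor) -> torch.Tensor:
    """Mean absolute difference between content features of the warped image
    T(I_A) and the fixed image I_B, both passed through the same frozen
    feature extractor (a pretrained VGG-16 in the patent's description)."""
    f_warp = feat_extractor(warped)
    f_fix = feat_extractor(fixed).detach()  # fixed-image features carry no grad
    return (f_warp - f_fix).abs().mean()
```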
The normalized cross-correlation loss is:
$$NCC(I,J) = \sum_{x} \frac{\left( \sum_{x_i \in \Omega} \left( I(x_i) - \bar{I}(x) \right) \left( J(x_i) - \bar{J}(x) \right) \right)^2}{\sum_{x_i \in \Omega} \left( I(x_i) - \bar{I}(x) \right)^2 \sum_{x_i \in \Omega} \left( J(x_i) - \bar{J}(x) \right)^2}$$
wherein $I$ and $J$ are two input image blocks, $\bar{I}(x)$ and $\bar{J}(x)$ are respectively the local means of $I$ and $J$ at position $x$, and $\Omega$ represents all pixels in the image block.
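A windowed normalized cross-correlation sketch computed with box-filter convolutions; the window size, the squared-correlation form, and `1 - mean` as the loss are assumptions:

```python
import torch
import torch.nn.functional as F

def ncc_loss(i: torch.Tensor, j: torch.Tensor, win: int = 9,
             eps: float = 1e-5) -> torch.Tensor:
    """Local normalized cross-correlation loss for (N, 1, H, W) images.
    A win x win window around each position plays the role of Omega; local
    sums are computed with a box-filter convolution."""
    pad = win // 2
    kernel = torch.ones(1, 1, win, win)
    n = win * win
    sum_i = F.conv2d(i, kernel, padding=pad)
    sum_j = F.conv2d(j, kernel, padding=pad)
    sum_ii = F.conv2d(i * i, kernel, padding=pad)
    sum_jj = F.conv2d(j * j, kernel, padding=pad)
    sum_ij = F.conv2d(i * j, kernel, padding=pad)
    cross = sum_ij - sum_i * sum_j / n     # windowed covariance term
    var_i = sum_ii - sum_i * sum_i / n     # windowed variance of i
    var_j = sum_jj - sum_j * sum_j / n     # windowed variance of j
    ncc = cross * cross / (var_i * var_j + eps)
    return 1.0 - ncc.mean()
```

Because the squared correlation is bounded by 1, the loss approaches 0 when the two blocks are locally linearly related, e.g. when an image is compared with itself.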
Specifically, the EPE loss measures the error between the predicted flow field $F$ and the true motion flow field $F^{GT}$, and the IOU loss measures the error between the warped building label $T(L_A)$ and the building label $L_B$ corresponding to the fixed image. To avoid distortion of the content of the warped image, constraints are imposed through the content feature loss and the normalized cross-correlation (NCC) loss. The content feature loss uses a VGG-16 model pre-trained on ImageNet to extract features from the warped moving image $T(I_A)$ and the fixed image $I_B$.
Finally, the total loss function is expressed as:
$$L_{total} = \sum_{i=1}^{2} \sum_{k=1}^{3} \left( L_{EPE}^{i,k} + L_{IOU}^{i,k} + \sum_{j=1}^{5} L_{content}^{i,j,k} + L_{NCC}^{i,k} \right)$$
where $i$ indexes the bidirectional registration tasks, $j$ indexes the five feature layers output in the content feature loss, and $k$ indexes the 3 iterations.
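A sketch of summing the per-term losses over the two registration directions, the three iterations, and the five content-feature layers; equal (unit) weights per term are an assumption, since the text does not state the weights:

```python
import torch

def total_loss(epe, iou, content, ncc):
    """Combine losses into one scalar.
    epe, iou, ncc: nested lists indexed [direction][iteration];
    content: nested lists indexed [direction][iteration][layer]."""
    total = torch.zeros(())
    for i in range(2):            # forward and backward registration
        for k in range(3):        # three cascade iterations
            total = total + epe[i][k] + iou[i][k] + ncc[i][k]
            for j in range(5):    # five feature layers of the content loss
                total = total + content[i][k][j]
    return total
```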
A pair of optical remote sensing images to be registered that the network has not learned is input into the model saved during training, and the transformed registration result $T(I_A)$ is obtained using only unidirectional registration from the moving image A to the fixed image B.
It should be understood that, although the steps in the flowchart of fig. 1 are shown in sequence as indicated by the arrows, these steps are not necessarily performed in that order. Unless explicitly stated herein, the order of execution of these steps is not strictly limited, and the steps may be performed in other orders. Moreover, at least some of the steps in fig. 1 may include multiple sub-steps or stages that are not necessarily performed at the same moment but may be performed at different moments, and these sub-steps or stages are not necessarily performed in sequence but may be performed in turn or alternately with at least a portion of other steps or of the sub-steps or stages of other steps.
In one embodiment, as shown in fig. 3, there is provided a remote sensing image automatic registration device based on a non-rigid bidirectional registration network, including: a first unidirectional registration module 302 and a second unidirectional registration module 304, wherein:
the first unidirectional registration module 302 is configured to construct a first multi-layer motion flow field prediction network, and input a first motion image and a first fixed image to be registered into the first multi-layer motion flow field prediction network to obtain a first predicted image; the motion flow field prediction networks are connected in cascade, and the input of a first motion flow field prediction network is a first motion image and a first fixed image of a remote sensing image to be registered; the input of other motion flow field prediction networks is a first fixed image and a predicted image output by a previous stage motion flow field prediction network;
the second unidirectional registration module 304 is configured to take the first predicted image output by the first multi-layer motion flow field prediction network as a second motion image and a similar fixed image corresponding to the first fixed image as a second fixed image, input both into a pre-constructed second multi-layer motion flow field prediction network, and output a conversion image; and each motion flow field prediction network in the second multi-layer motion flow field prediction network is connected in cascade, and the second fixed image meets style consistency with the first fixed image and corresponds one-to-one to the pixel positions of the first motion image.
In one embodiment, the motion flow field prediction network comprises: correlation layer, mutual matching layer, neighborhood correlation layer, mutual matching layer, decoding layer and resampling layer.
In one embodiment, the unidirectional registration module 302 is further configured to calculate a first feature correlation map between the first moving image and the first fixed image through the correlation layer; processing the feature correlation map through the mutual matching layer, the neighborhood correlation layer and the mutual matching layer to obtain a second feature map; decoding the second feature map through the decoding layer to obtain a predicted flow field; and changing the predicted flow field through the resampling layer to obtain a first predicted image.
In one embodiment, the style consistency between the second fixed image and the first fixed image is satisfied by the similar fixed image $\tilde{I}_B$, which keeps the style of the first fixed image while its pixel positions correspond one-to-one to those of the first motion image.
In one embodiment, a training module is configured to train the bidirectional registration network formed by the first multi-layer motion flow field prediction network and the second multi-layer motion flow field prediction network according to a pre-constructed loss function to obtain network parameters.
In one embodiment, the loss function includes: an EPE loss, an IOU loss, a content feature loss, and a normalized cross-correlation loss;
the EPE loss is:
$$L_{EPE} = \frac{1}{m} \sum_{x} \left\| F_{A\to B}(x) - F_{A\to B}^{GT}(x) \right\|_2$$
wherein $m$ represents the number of pixels of the image, $F_{A\to B}$ represents the motion flow field of the first moving image $I_A$ relative to the first fixed image $I_B$, and $F_{A\to B}^{GT}$ represents the corresponding ground-truth motion flow field;
the IOU loss is:
$$L_{IOU} = 1 - \frac{\left| T(L_A) \cap L_B \right|}{\left| T(L_A) \cup L_B \right|}$$
wherein $L_B$ represents the building semantic label corresponding to the first fixed image $I_B$, and $T(L_A)$ represents the building semantic label corresponding to the predicted image output by the motion flow field prediction network;
the content feature loss is:
$$L_{content} = \left\| \mathrm{Feat}(T(I_A)) - \mathrm{Feat}(I_B) \right\|_1$$
wherein $\mathrm{Feat}(T(I_A))$ represents the content features extracted, with a VGG-16 model pre-trained on ImageNet, from the predicted image output by the motion flow field prediction network, and $\mathrm{Feat}(I_B)$ represents the content features extracted from the first fixed image $I_B$ with the same pre-trained VGG-16 model;
the normalized cross-correlation loss is:
$$NCC(I,J) = \sum_{x} \frac{\left( \sum_{x_i \in \Omega} \left( I(x_i) - \bar{I}(x) \right) \left( J(x_i) - \bar{J}(x) \right) \right)^2}{\sum_{x_i \in \Omega} \left( I(x_i) - \bar{I}(x) \right)^2 \sum_{x_i \in \Omega} \left( J(x_i) - \bar{J}(x) \right)^2}$$
wherein $I$ and $J$ are two input image blocks, $\bar{I}(x)$ and $\bar{J}(x)$ are respectively the local means of $I$ and $J$ at position $x$, and $\Omega$ represents all pixels in the image block.
In one embodiment, a computer device is provided, which may be a terminal, and the internal structure of which may be as shown in fig. 4. The computer device includes a processor, a memory, a network interface, a display screen, and an input device connected by a system bus. Wherein the processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system and a computer program. The internal memory provides an environment for the operation of the operating system and computer programs in the non-volatile storage media. The network interface of the computer device is used for communicating with an external terminal through a network connection. The computer program when executed by the processor is used for realizing a remote sensing image automatic registration method based on a non-rigid bidirectional registration network. The display screen of the computer equipment can be a liquid crystal display screen or an electronic ink display screen, and the input device of the computer equipment can be a touch layer covered on the display screen, can also be keys, a track ball or a touch pad arranged on the shell of the computer equipment, and can also be an external keyboard, a touch pad or a mouse and the like.
Those skilled in the art will appreciate that the structure shown in FIG. 4 is only a block diagram of part of the structure related to the present solution and does not constitute a limitation on the computer device to which the present solution is applied; a particular computer device may include more or fewer components than shown, combine certain components, or have a different arrangement of components.
In an embodiment, a computer device is provided comprising a memory storing a computer program and a processor implementing the steps of the method of the above embodiments when the computer program is executed.
In an embodiment, a computer readable storage medium is provided, on which a computer program is stored which, when being executed by a processor, implements the steps of the method of the above embodiments.
Those skilled in the art will appreciate that implementing all or part of the above described methods may be accomplished by way of a computer program stored on a non-transitory computer readable storage medium, which when executed, may comprise the steps of the embodiments of the methods described above. Any reference to memory, storage, database, or other medium used in the various embodiments provided herein may include non-volatile and/or volatile memory. The nonvolatile memory can include Read Only Memory (ROM), programmable ROM (PROM), electrically Programmable ROM (EPROM), electrically Erasable Programmable ROM (EEPROM), or flash memory. Volatile memory can include Random Access Memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms such as Static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double Data Rate SDRAM (DDRSDRAM), enhanced SDRAM (ESDRAM), synchronous Link DRAM (SLDRAM), memory bus direct RAM (RDRAM), direct memory bus dynamic RAM (DRDRAM), and memory bus dynamic RAM (RDRAM), among others.
The technical features of the above embodiments may be arbitrarily combined, and all possible combinations of the technical features in the above embodiments are not described for brevity of description, however, as long as there is no contradiction between the combinations of the technical features, they should be considered as the scope of the description.
The above examples illustrate only a few embodiments of the invention, which are described in detail and are not to be construed as limiting the scope of the invention. It should be noted that it will be apparent to those skilled in the art that several variations and modifications can be made without departing from the spirit of the invention, which are all within the scope of the invention. Accordingly, the scope of protection of the present invention is to be determined by the appended claims.