
CN119205638A - Remote sensing image change detection method and device based on iterative Mamba architecture - Google Patents


Info

Publication number
CN119205638A
CN119205638A (application CN202411198502.XA)
Authority
CN
China
Prior art keywords
dimensional
feature map
remote sensing
sensing image
feature
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202411198502.XA
Other languages
Chinese (zh)
Other versions
CN119205638B (en)
Inventor
李杰
文翊涵
鄢小虎
李峰
毛亮
戴明
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Vocational And Technical University
Original Assignee
Shenzhen Vocational And Technical University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Vocational And Technical University filed Critical Shenzhen Vocational And Technical University
Priority to CN202411198502.XA priority Critical patent/CN119205638B/en
Publication of CN119205638A publication Critical patent/CN119205638A/en
Application granted granted Critical
Publication of CN119205638B publication Critical patent/CN119205638B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/0002Inspection of images, e.g. flaw detection
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/0464Convolutional networks [CNN, ConvNet]
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/20Image preprocessing
    • G06V10/26Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/80Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G06V10/806Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10032Satellite or aerial image; Remote sensing
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20084Artificial neural networks [ANN]

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computing Systems (AREA)
  • Artificial Intelligence (AREA)
  • Multimedia (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Medical Informatics (AREA)
  • Databases & Information Systems (AREA)
  • Molecular Biology (AREA)
  • Mathematical Physics (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Data Mining & Analysis (AREA)
  • Quality & Reliability (AREA)
  • Image Processing (AREA)

Abstract

The invention relates to a remote sensing image change detection method and device based on an iterative Mamba architecture. The method comprises: inputting a first remote sensing image and a second remote sensing image respectively into a Mamba feature extractor to extract a multi-dimensional feature map set of each image; inputting the multi-dimensional feature map sets into a state space change detection module and obtaining, through state space modeling, a multi-dimensional long-frequency change feature map set of the two images; performing feature fusion on the multi-dimensional long-frequency change feature map set to obtain a low-dimensional fusion change feature map; inputting the high-dimensional change feature map and the low-dimensional fusion change feature map into a global mixed attention module for feature fusion to generate a fusion output feature map; and performing noise correction on the fusion output feature map through an iterative diffusion model to generate a remote sensing image change detection map. The method significantly improves the accuracy and efficiency of remote sensing image change detection.

Description

Remote sensing image change detection method and device based on iterative Mamba architecture
Technical Field
The invention relates to the technical field of computer vision, and in particular to a remote sensing image change detection method and device based on an iterative Mamba architecture.
Background
Remote sensing image change detection (CD) is the process of analyzing remote sensing images captured at different times to identify changes in ground features. It is widely applied in land use management, environmental monitoring, resource assessment, disaster assessment, and other fields. Existing change detection methods fall mainly into two categories: methods based on traditional image processing techniques and methods based on deep learning.
Conventional change detection methods generally rely on differential computation at the pixel, feature, and decision levels, but these approaches often suffer from insufficient accuracy and noise interference when processing high-resolution remote sensing images. Pixel-level methods detect changes by comparing gray or color values of the images; they are simple and intuitive, but sensitive to noise and poorly suited to complex scenes. Feature-level methods perform change detection by extracting features such as textures and shapes, which reduces the influence of noise to some extent, but detail information is easily lost in high-resolution images. Decision-level methods integrate multiple detection results and generate a final result through voting or other strategies, but they have higher computational complexity and are easily affected by individual erroneous detections.
In recent years, deep learning techniques, particularly convolutional neural networks (Convolutional Neural Networks, CNNs) and Transformer models, have excelled in the field of image processing, but these methods also have limitations in change detection tasks.
Convolutional neural networks perform well at capturing local features, but fall short at capturing long-range dependencies and global information, which limits their performance in complex change detection tasks. In particular, when processing large-scale images, CNNs have a limited receptive field and struggle to exploit the global information of the image, so change detection results are not accurate enough. Furthermore, CNNs require a large amount of annotated data during training, which is a significant challenge in the remote sensing field.
The Transformer model captures global features better, but its computational complexity is high, making it difficult to run efficiently in resource-constrained environments. Transformers model each pixel of the image through a self-attention mechanism and therefore capture long-range dependencies well, but the computation grows quadratically with image size, so processing high-resolution images consumes enormous computational resources. In addition, Transformer models depend strongly on training data and require a large amount of annotated data.
Common image information fusion methods easily cause information loss or redundancy when processing multi-temporal images, affecting change detection accuracy. Information fusion of multi-temporal images is a key step in change detection, but existing methods often introduce redundant information or lose important change information during extraction and fusion, degrading the detection result. For example, simple image differencing easily introduces noise, while feature-extraction-based methods may overlook subtle changes.
Therefore, a new remote sensing image change detection method is needed.
Disclosure of Invention
The invention mainly aims to provide a remote sensing image change detection method and device based on an iterative Mamba architecture, in order to solve the problems of low accuracy and efficiency of remote sensing image change detection in the prior art.
To achieve the above object, a first aspect of the present invention provides a remote sensing image change detection method based on an iterative Mamba architecture, the method including:
inputting a first remote sensing image and a second remote sensing image respectively into a Mamba feature extractor to extract a multi-dimensional feature map set of the first remote sensing image and a multi-dimensional feature map set of the second remote sensing image, wherein each multi-dimensional feature map set includes at least a high-dimensional feature map and a low-dimensional feature map;
inputting the multi-dimensional feature map set of the first remote sensing image and the multi-dimensional feature map set of the second remote sensing image into a state space change detection module, and obtaining a multi-dimensional long-frequency change feature map set of the first and second remote sensing images through state space modeling, wherein the multi-dimensional long-frequency change feature map set includes at least a high-dimensional change feature map and a low-dimensional change feature map;
performing feature fusion on the multi-dimensional long-frequency change feature map set to obtain a low-dimensional fusion change feature map;
inputting the high-dimensional change feature map and the low-dimensional fusion change feature map into a global mixed attention module for feature fusion to generate a fusion output feature map; and
performing noise correction on the fusion output feature map through an iterative diffusion model to generate a remote sensing image change detection map.
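The data flow of the five claimed steps can be sketched as a single function; every component below is injected as a callable and all component names are hypothetical stand-ins, not the patented implementations:

```python
import numpy as np

def change_detection_pipeline(img_a, img_b, extractor, ssm_cd, fuse, gha, diffusion):
    """Data flow of the five steps; component names are illustrative only."""
    feats_a = extractor(img_a)                  # step 1: multi-dimensional feature sets
    feats_b = extractor(img_b)
    change_set = ssm_cd(feats_a, feats_b)       # step 2: long-frequency change features
    low_fused = fuse(change_set)                # step 3: low-dimensional fusion map
    fused_out = gha(change_set[-1], low_fused)  # step 4: global mixed attention fusion
    return diffusion(fused_out)                 # step 5: iterative noise correction

# trivial stand-in components, just to exercise the data flow
extractor = lambda x: [x, x[::2, ::2]]                  # "low" and "high" level features
ssm_cd = lambda fa, fb: [a - b for a, b in zip(fa, fb)]
fuse = lambda cs: cs[0]
gha = lambda high, low: low
diffusion = lambda x: np.abs(x)

a, b = np.ones((8, 8)), np.zeros((8, 8))
det = change_detection_pipeline(a, b, extractor, ssm_cd, fuse, gha, diffusion)
print(det.shape)  # (8, 8)
```

The point of the sketch is only the ordering and wiring of the five modules, not their internals.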
Further, the Mamba feature extractor includes a linear embedding layer and N encoder layers, each encoder layer including a VSS block and a patch merging layer, and the step of inputting the first remote sensing image into the Mamba feature extractor to extract the multi-dimensional feature map set of the first remote sensing image includes:
partitioning the first remote sensing image into non-overlapping patches through the linear embedding layer, linearly embedding each patch into a preset feature space, and forming an initial feature map from the initial feature vectors of all patches;
inputting the initial feature map into the first encoder layer for encoder layer processing, in which the feature map processed by the VSS block is input to the patch merging layer, a preset number of adjacent patches in the VSS-processed feature map are linearly transformed and spliced to form super-patches, and the feature vectors of all super-patches form a new feature map whose feature dimension is higher than that of the initial feature map;
inputting the new feature map into the next encoder layer and repeating the encoder layer processing step until all N encoder layers have been processed, the feature maps output by the encoder layers together forming the multi-dimensional feature map set of the first remote sensing image, which includes at least a high-dimensional feature map and a low-dimensional feature map.
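The encoder loop above can be illustrated structurally. For brevity the VSS block and patch merging are stubbed out by a single 2×2 average pooling — an assumption made purely for illustration; the real layers are learned and also change the channel dimension:

```python
import numpy as np

def avg_pool2(f):
    """Stand-in for one encoder layer: halve the spatial resolution of an
    (H, W, C) feature map by 2x2 average pooling."""
    h, w, c = f.shape
    return f.reshape(h // 2, 2, w // 2, 2, c).mean(axis=(1, 3))

def mamba_feature_pyramid(f0, n_layers):
    """Sketch of the N-layer encoder loop: every layer's output joins the
    multi-dimensional feature map set; earlier outputs are the low-dimensional
    maps, the last output is the high-dimensional map."""
    feature_set, f = [], f0
    for _ in range(n_layers):
        f = avg_pool2(f)          # stand-in for VSS block + patch merging
        feature_set.append(f)
    return feature_set

f0 = np.ones((32, 32, 16))        # initial feature map from the linear embedding
fmaps = mamba_feature_pyramid(f0, n_layers=3)
print([f.shape for f in fmaps])  # [(16, 16, 16), (8, 8, 16), (4, 4, 16)]
```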
Further, the step of inputting the multi-dimensional feature map set of the first remote sensing image and the multi-dimensional feature map set of the second remote sensing image into the state space change detection module and obtaining the multi-dimensional long-frequency change feature maps of the first and second remote sensing images through state space modeling includes:
inputting the feature map of the first remote sensing image output by the i-th encoder layer and the feature map of the second remote sensing image output by the i-th encoder layer into the state space change detection module, and processing them through a state space model to output the captured i-th layer long-frequency change feature map, wherein 0 < i ≤ N;
taking the collection of the long-frequency change feature maps of all layers as the multi-dimensional long-frequency change feature map set, taking the long-frequency change output produced from the high-dimensional feature maps of the first and second remote sensing images as the high-dimensional change feature map, and taking the long-frequency change output produced from the low-dimensional feature maps as the low-dimensional change feature map.
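As a minimal sketch of what state-space processing of a feature map pair could look like, the following scans the flattened difference of two co-registered maps with a plain discrete linear SSM. This is a deliberate simplification of the input-dependent selective scan used in Mamba-style models, and all matrices here are illustrative, not learned:

```python
import numpy as np

def ssm_change_features(f_a, f_b, A, B, C):
    """Scan the difference of two (C, H, W) feature maps with a linear
    state-space model h_k = A h_{k-1} + B u_k, y_k = C h_k, accumulating
    long-range context along the flattened pixel sequence."""
    c, h, w = f_a.shape
    diff = (f_a - f_b).reshape(c, -1).T        # (L, C) sequence of differences
    state = np.zeros(A.shape[0])
    out = np.empty_like(diff)
    for k, u in enumerate(diff):
        state = A @ state + B @ u              # state update
        out[k] = C @ state                     # readout
    return out.T.reshape(c, h, w)              # back to a change feature map

rng = np.random.default_rng(0)
fa = rng.standard_normal((4, 8, 8))
fb = rng.standard_normal((4, 8, 8))
n = 16                                         # state dimension (arbitrary)
A = np.eye(n) * 0.9                            # stable recurrence
B = rng.standard_normal((n, 4)) * 0.1
C = rng.standard_normal((4, n)) * 0.1
change = ssm_change_features(fa, fb, A, B, C)
print(change.shape)  # (4, 8, 8)
```

Because the state carries information across the whole sequence, each output position depends on all earlier positions, which is the property the patent exploits for long-frequency change features.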
Further, the step of performing feature fusion on the multi-dimensional long-frequency change feature map set to obtain a low-dimensional fusion change feature map includes:
upsampling the high-dimensional change feature map so that its spatial resolution matches the resolution of the low-dimensional change feature map;
fusing the upsampled high-dimensional change feature map with the low-dimensional change feature map by element-wise addition to obtain the low-dimensional fusion change feature map.
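These two operations reduce to an upsampling followed by an element-wise sum. A minimal sketch — nearest-neighbour interpolation is an assumption (the text does not specify the interpolation), and the channel counts are assumed to already match:

```python
import numpy as np

def upsample_nearest(x, factor):
    """Nearest-neighbour upsampling of a (C, H, W) feature map."""
    return x.repeat(factor, axis=1).repeat(factor, axis=2)

def fuse_change_maps(high_dim, low_dim):
    """Upsample the coarse high-dimensional change map to the resolution of
    the low-dimensional one and add element-wise."""
    factor = low_dim.shape[1] // high_dim.shape[1]
    return upsample_nearest(high_dim, factor) + low_dim

high = np.ones((8, 16, 16))   # coarse change features
low = np.zeros((8, 64, 64))   # fine change features
fused = fuse_change_maps(high, low)
print(fused.shape)  # (8, 64, 64)
```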
Further, the step of inputting the high-dimensional change feature map and the low-dimensional fusion change feature map to a global mixed attention module to perform feature fusion and generating a fusion output feature map includes:
respectively carrying out channel expansion on the high-dimensional change feature map and the low-dimensional fusion change feature map, and correspondingly respectively obtaining an expanded high-dimensional change feature map and an expanded low-dimensional fusion change feature map;
Performing feature fusion on the expanded high-dimensional change feature map and the expanded low-dimensional fusion change feature map to generate a fusion feature map;
applying a 1 × 1 convolution and a Sigmoid activation function to the fusion feature map to generate a global attention feature map;
dividing the global attention profile into a high-dimensional attention profile and a low-dimensional attention profile;
Multiplying the high-dimensional attention feature map and the low-dimensional attention feature map with the initial feature map element by element respectively to generate a weighted high-dimensional feature map and a weighted low-dimensional feature map correspondingly;
And reconstructing the weighted high-dimensional feature map and the weighted low-dimensional feature map to generate the final fusion output feature map.
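The six sub-steps above can be sketched as follows, under simplifying assumptions stated in the comments: both inputs are taken to share one spatial resolution, and the "channel expansion" and "reconstruction" steps are modelled as 1×1 projections with random, untrained weights:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def conv1x1(x, w):
    """1x1 convolution on a (C, H, W) map: a per-pixel linear map over channels."""
    return np.einsum('oc,chw->ohw', w, x)

def global_mixed_attention(f_high, f_low, rng):
    """Sketch of the global mixed attention fusion (illustrative weights)."""
    c = f_high.shape[0]
    w_exp = rng.standard_normal((2 * c, c)) * 0.1
    exp_high = conv1x1(f_high, w_exp)             # channel-expanded high-dim map
    exp_low = conv1x1(f_low, w_exp)               # channel-expanded low-dim map
    fused = exp_high + exp_low                    # feature fusion
    w_att = rng.standard_normal((2 * c, 2 * c)) * 0.1
    att = sigmoid(conv1x1(fused, w_att))          # 1x1 conv + Sigmoid -> (0, 1)
    att_high, att_low = np.split(att, 2, axis=0)  # divide into two attention maps
    weighted_high = att_high * f_high             # element-wise weighting
    weighted_low = att_low * f_low
    w_out = rng.standard_normal((c, 2 * c)) * 0.1
    # reconstruction: concatenate the weighted maps and project back
    return conv1x1(np.concatenate([weighted_high, weighted_low], axis=0), w_out)

rng = np.random.default_rng(0)
f_high = rng.standard_normal((8, 32, 32))
f_low = rng.standard_normal((8, 32, 32))
out = global_mixed_attention(f_high, f_low, rng)
print(out.shape)  # (8, 32, 32)
```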
Further, the step of generating a remote sensing image change detection map by performing noise correction on the fused output feature map through an iterative diffusion model includes:
Acquiring initialized input noise characteristics as an iteration starting point;
performing iterative optimization on the fusion output feature map for a plurality of times through an iterative diffusion model to finally generate a remote sensing image change detection map, wherein each iterative optimization comprises a forward process and a reverse process,
In the forward process, the input noise features are gradually added to the change feature map fed into the current iteration, until a pure noise map is obtained;
In the reverse process, the noise estimation model performs noise estimation on the image fed into the reverse process to generate a noise estimate, and the denoised image is computed according to the reverse denoising formula.
Further, the noise estimation value is:
εθ(xt, Ia, Ib, t) = D(GHAT(E(mLC + P(xt), t), mHC), t)
where t is the iteration step, Ia and Ib are the change features, E and D are the encoder and decoder of the noise estimation model, GHAT denotes the global mixed attention mechanism, mLC and mHC denote the low-dimensional and high-dimensional change features respectively, and P(xt) is the input noise feature.
Further, the inverse denoising formula is:
where t is the number of iterations, z t represents noise, alpha t and For the diffusion model parameters, ε θ(xt,Ia,Ib, t) is the noise estimate, σ t is the mean square error of the noise estimate, and x t-1 is the denoised image after the t-1 th iteration.
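This update has the form of the standard DDPM reverse step, and can be sketched directly (the parameter values below are arbitrary illustrations, not values from the patent):

```python
import numpy as np

def reverse_denoise_step(x_t, eps_hat, alpha_t, alpha_bar_t, sigma_t, z_t):
    """One reverse-diffusion update: subtract the scaled noise estimate
    eps_hat from x_t, rescale by 1/sqrt(alpha_t), and re-inject noise z_t."""
    mean = (x_t - (1.0 - alpha_t) / np.sqrt(1.0 - alpha_bar_t) * eps_hat) / np.sqrt(alpha_t)
    return mean + sigma_t * z_t

rng = np.random.default_rng(0)
x_t = rng.standard_normal((1, 64, 64))
eps_hat = rng.standard_normal((1, 64, 64))   # stand-in for the model's noise estimate
z_t = rng.standard_normal((1, 64, 64))
x_prev = reverse_denoise_step(x_t, eps_hat, 0.98, 0.5, 0.01, z_t)
print(x_prev.shape)  # (1, 64, 64)
```

As a sanity check, with a zero noise estimate, alpha_t = 1 and sigma_t = 0 the step is the identity, as expected from the formula.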
Further, the noise estimation model adopts a UNet structure including a second encoder and a second decoder.
The second encoder receives input features, including the input noise features and the input change feature map, and extracts and generates output feature maps layer by layer, each layer's output feature map containing different dimensional information of the input features.
The second decoder is connected to the second encoder through skip connections; the output feature map of the corresponding layer in the second encoder is passed to the corresponding layer in the second decoder, and the resolution of the output feature map is restored layer by layer until the difference between its resolution and that of the input change feature map is smaller than a preset threshold, yielding a preliminary change detection map.
Through an iterative optimization mechanism, the preliminary change detection map is taken as input and repeatedly processed using the multi-dimensional features and the global mixed attention mechanism, gradually correcting the change detection result until a preset iteration stop condition is met, whereupon the final remote sensing image change detection map is output.
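The skip-connected encoder/decoder shape can be illustrated structurally. The pooling and upsampling choices here are assumptions for brevity; the real model uses learned convolutional layers at every level:

```python
import numpy as np

def downsample(x):
    """Average-pool a (C, H, W) map by 2 (one encoder level)."""
    c, h, w = x.shape
    return x.reshape(c, h // 2, 2, w // 2, 2).mean(axis=(2, 4))

def upsample(x):
    """Nearest-neighbour upsample by 2 (one decoder level)."""
    return x.repeat(2, axis=1).repeat(2, axis=2)

def unet_skeleton(x, depth=3):
    """Structural sketch: the encoder halves resolution per layer, the decoder
    restores it and merges the skip-connected encoder feature of the matching
    layer (merged here by addition)."""
    skips = []
    for _ in range(depth):
        skips.append(x)                  # feature passed over the skip connection
        x = downsample(x)
    for _ in range(depth):
        x = upsample(x) + skips.pop()    # restore resolution + skip connection
    return x

rng = np.random.default_rng(0)
x = rng.standard_normal((1, 32, 32))
y = unet_skeleton(x)
print(y.shape)  # (1, 32, 32)
```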
A second aspect of the present invention provides a remote sensing image change detection apparatus based on an iterative Mamba architecture, the apparatus comprising:
The feature extraction module is used for respectively inputting the first remote sensing image and the second remote sensing image into a Mamba feature extractor so as to respectively extract a multi-dimensional feature map set of the first remote sensing image and a multi-dimensional feature map set of the second remote sensing image, wherein the multi-dimensional feature map set at least comprises a high-dimensional feature map and a low-dimensional feature map;
The state space change detection module is used for modeling the multi-dimensional feature map set of the first remote sensing image and the multi-dimensional feature map set of the second remote sensing image through state space modeling to obtain the multi-dimensional long-frequency change feature map sets of the first and second remote sensing images, wherein each multi-dimensional long-frequency change feature map set includes at least a high-dimensional change feature map and a low-dimensional change feature map;
The low-dimensional feature generation module is used for performing feature fusion on the multi-dimensional long-frequency change feature map set to obtain a low-dimensional fusion change feature map;
the global mixed attention module is used for carrying out feature fusion on the high-dimensional change feature map and the low-dimensional fusion change feature map to generate a fusion output feature map;
and the iterative diffusion module is used for carrying out noise correction on the fused output characteristic images through the iterative diffusion model to generate a remote sensing image change detection image.
According to the remote sensing image change detection method and device based on the iterative Mamba architecture, the Mamba feature extractor is introduced to extract high-dimensional and low-dimensional feature map sets of the remote sensing images, so that the multi-dimensional information in the images is fully utilized and the accuracy of change detection is significantly improved. The state space change detection module captures and extracts the multi-dimensional long-frequency change features between the images, and the feature fusion step merges the multi-dimensional long-frequency change feature maps into a low-dimensional fusion change feature map; this process effectively integrates the complementary information of features of different dimensions and reduces the influence of redundancy and noise. The global mixed attention module fuses the high-dimensional change feature map with the low-dimensional fusion change feature map, preserving the integrity of important change information and improving the fidelity of change detection. The design of the iterative Mamba architecture progressively optimizes the feature extraction and change detection process in an iterative manner while maintaining high performance, effectively reducing computational complexity; the consumption of computing resources can thus be greatly reduced without sacrificing detection accuracy, improving the efficiency of change detection and making the method suitable for resource-constrained environments.
Drawings
FIG. 1 is a flow chart of a remote sensing image change detection method based on an iterative Mamba architecture according to an embodiment of the present invention;
FIG. 2 is a schematic block diagram of a remote sensing image change detection device based on an iterative Mamba architecture according to an embodiment of the present invention;
Fig. 3 is a block diagram schematically illustrating a structure of a computer device according to an embodiment of the present invention.
The achievement of the objects, functional features and advantages of the present invention will be further described with reference to the accompanying drawings, in conjunction with the embodiments.
Detailed Description
The present invention will be described in further detail with reference to the drawings and examples, in order to make the objects, technical solutions and advantages of the present invention more apparent. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention.
Referring to fig. 1, an embodiment of the invention discloses a remote sensing image change detection method based on an iterative Mamba architecture, which comprises the following steps:
S1, inputting a first remote sensing image and a second remote sensing image respectively into a Mamba feature extractor to extract a multi-dimensional feature map set of the first remote sensing image and a multi-dimensional feature map set of the second remote sensing image, wherein each multi-dimensional feature map set includes at least a high-dimensional feature map and a low-dimensional feature map;
S2, inputting the multi-dimensional feature map set of the first remote sensing image and the multi-dimensional feature map set of the second remote sensing image into a state space change detection module, and obtaining a multi-dimensional long-frequency change feature map set of the first and second remote sensing images through state space modeling, wherein the multi-dimensional long-frequency change feature map set includes at least a high-dimensional change feature map and a low-dimensional change feature map;
S3, performing feature fusion on the multi-dimensional long-frequency change feature map set to obtain a low-dimensional fusion change feature map;
S4, inputting the high-dimensional change feature map and the low-dimensional fusion change feature map to a global mixed attention module for feature fusion, and generating a fusion output feature map;
And S5, carrying out noise correction on the fusion output characteristic map through an iterative diffusion model to generate a remote sensing image change detection map.
In this embodiment, in step S1, the first remote sensing image and the second remote sensing image are remote sensing images of the same region captured at different times or under different conditions. The two images are input separately into a pre-trained Mamba feature extractor. The Mamba feature extractor is a model based on a convolutional neural network (CNN) or another deep learning architecture, and can automatically learn and extract low-dimensional and high-dimensional features of images. The input remote sensing images undergo operations such as multi-layer convolution and pooling in the Mamba feature extractor, and the multi-dimensional features of the images are progressively abstracted.
In step S2, the differences between the two images in the feature space are analyzed through state space modeling, and significant changes over long time scales are identified, yielding the multi-dimensional long-frequency change feature map set. The high-dimensional change feature map reveals macroscopic changes such as ground object types, while the low-dimensional change feature map captures microscopic changes such as edges and textures.
In the step S3, feature fusion is performed on the multi-dimensional long-frequency change feature map set obtained in the step S2, and long-frequency change feature maps with different dimensions are combined to form a low-dimensional fusion change feature map containing more abundant information.
In the step S4, the global mixed attention module not only considers the importance of the local features, but also integrates global context information, so that the fused output feature map is more comprehensive and accurate.
In the step S5, the iterative diffusion model gradually smoothes the noise region in the image by simulating the natural diffusion process of the pixel values in the image, while maintaining the edge and detail information of the image.
The fusion output characteristic diagram after noise correction is clearer and more accurate, and the actual change condition in the remote sensing image can be reflected better. The finally generated remote sensing image change detection graph can be used in multiple fields such as environment monitoring, city planning, disaster assessment and the like.
According to this embodiment, the Mamba feature extractor is introduced to extract high-dimensional and low-dimensional feature map sets of the remote sensing images, so that the multi-dimensional information in the images is fully utilized and the accuracy of change detection is significantly improved. The state space change detection module captures and extracts the multi-dimensional long-frequency change features between the images, and the feature fusion step merges the multi-dimensional long-frequency change feature maps into a low-dimensional fusion change feature map, effectively integrating the complementary information of features of different dimensions and reducing the influence of redundancy and noise. The global mixed attention module fuses the high-dimensional change feature map with the low-dimensional fusion change feature map, preserving the integrity of important change information and improving the fidelity of change detection. The design of the iterative Mamba architecture progressively optimizes the feature extraction and change detection process in an iterative manner while maintaining high performance, effectively reducing computational complexity, greatly reducing the consumption of computing resources without sacrificing detection accuracy, improving the efficiency of change detection, and making the method suitable for resource-constrained environments.
In a specific embodiment, the Mamba feature extractor includes a linear embedding layer and an N-layer encoder layer, the encoder layer includes a VSS block and a patch merging layer, and the step S1 of inputting the first remote sensing image to the Mamba feature extractor to extract a multi-dimensional feature map set of the first remote sensing image includes:
s101, performing blocking processing on the first remote sensing image through the linear embedding layer, dividing the first remote sensing image into non-overlapping patches, linearly embedding each patch into a preset feature space, and forming an initial feature map by an initial feature vector set of all patches;
Specifically, the input first remote sensing image I is divided into non-overlapping patches, for example each patch having a size of 4 × 4. The linear embedding layer then embeds each patch into a predetermined (e.g., high-dimensional) feature space to form an initial feature map F0:

F0 = LinearEmbedding(I)
S102, inputting the initial feature map into the first layer of the encoder layers for encoder layer processing. The encoder layer processing comprises: processing the input feature map through the VSS block; inputting the VSS-block-processed feature map to the patch merging layer, where a preset number of adjacent patches in the VSS-block-processed feature map are linearly transformed and spliced to form super patches; the feature vectors of all super patches form a new feature map, whose feature dimension is higher than that of the initial feature map;
the VSS block includes a depth convolution, siLU activation functions.
F′i=SiLU(DepthwiseConv(Fi))
To reduce the spatial resolution of the feature map layer by layer and increase the number of channels, a patch merging layer is provided at the end of each encoder layer. This layer downsamples the features by splicing adjacent patches and applying a linear transformation.
Fi+1=PatchMerging(F′i)
Wherein the patch merging operation can be expressed as:

Fi+1 = Concat(Linear(F′i[:, 0::2, 0::2, :]), Linear(F′i[:, 1::2, 0::2, :]), Linear(F′i[:, 0::2, 1::2, :]), Linear(F′i[:, 1::2, 1::2, :]))
Through the steps, mamba feature extractor can extract multidimensional features from the input image and maintain efficient computing performance while capturing long-distance dependency.
S103, inputting the new feature map to the next encoder layer and repeating the encoder layer processing step until all N encoder layers have been processed; the set of feature maps output by the encoder layers is taken as the multi-dimensional feature map set of the first remote sensing image, which at least comprises a high-dimensional feature map and a low-dimensional feature map. After each encoder layer, the dimension of the feature map gradually increases while its spatial resolution gradually decreases. The high-dimensional feature map is the feature map output by the last encoder layer; relatively, the lower-dimensional feature maps output by the other layers serve as low-dimensional feature maps.
In a specific embodiment, the step of inputting the second remote sensing image to the Mamba feature extractor to extract the multi-dimensional feature map set of the second remote sensing image is the same as steps S101 to S103, and includes:
S111, partitioning the second remote sensing image into non-overlapping patches through the linear embedding layer, linearly embedding each patch into a preset feature space, and forming an initial feature map by an initial feature vector set of all patches;
S112, inputting the initial feature map into the encoder layer of the first layer for encoder layer processing; the method comprises the steps of processing an encoder layer, inputting the feature map after VSS block processing to a patch merging layer, linearly transforming a preset number of adjacent patches in the feature map after VSS block processing, and splicing to form super patches, wherein the feature vector sets of all the super patches form a new feature map, and the feature dimension of the new feature map is higher than that of the initial feature map;
S113, inputting the new feature map to the next encoder layer and repeating the encoder layer processing step until all N encoder layers have been processed; the set of feature maps output by the encoder layers is taken as the multi-dimensional feature map set of the second remote sensing image, which at least comprises a high-dimensional feature map and a low-dimensional feature map.
In a specific embodiment, the step S2 of inputting the multi-dimensional feature map set of the first remote sensing image and the multi-dimensional feature map set of the second remote sensing image to a state space change detection (VSS-CD) module and obtaining the multi-dimensional long-frequency change feature maps of the first remote sensing image and the second remote sensing image through state space modeling includes:
S201, inputting the feature map of the first remote sensing image output by the i-th encoder layer and the feature map of the second remote sensing image output by the i-th encoder layer into the state space change detection module, and processing them through the state space model to output a captured i-th layer long-frequency change feature map, wherein 0 < i ≤ N;
S202, taking the set of the long-frequency change feature maps of all layers as the multi-dimensional long-frequency change feature map set; the long-frequency change features output for the high-dimensional feature maps of the first and second remote sensing images form the high-dimensional change feature map, and the long-frequency change features output for the low-dimensional feature maps of the first and second remote sensing images form the low-dimensional change feature map.
In this embodiment, the VSS-CD module performs state space modeling through a linear time-invariant (LTI) system, described by the following equations:

h′(t) = A h(t) + B x(t)
y(t) = C h(t) + D x(t)

wherein h(t) ∈ R^N represents the hidden state, the features of the first remote sensing image and the second remote sensing image at the i-th layer form the input x(t), and A ∈ R^{N×N}, B ∈ R^N, C ∈ R^N and D ∈ R are parameters of the state space model.
To implement the model in discrete time, it is discretized by zero-order hold (ZOH), resulting in:

ht = Ā ht−1 + B̄ xt
yt = C ht

wherein Ā and B̄ are the discretized system matrices, specifically calculated as follows (Δ being the discretization step size):

Ā = exp(ΔA)
B̄ = (ΔA)^{−1} (exp(ΔA) − I) · ΔB
In a practical implementation, B̄ is approximated by a first-order Taylor expansion: B̄ = (ΔA)^{−1}(exp(ΔA) − I) · ΔB ≈ ΔB.
In the VSS-CD module, the input features of the first and second remote sensing images first pass through a linear embedding layer, then through a depthwise convolution and a SiLU activation function, and finally enter the state space model for processing. The final output y(t) represents the captured change features.
Through the steps, the VSS-CD module can effectively capture the long-frequency change characteristics in the front-back change image, and the accuracy and the fidelity of change detection are improved.
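To make the discretization concrete, the following sketch assumes a diagonal state matrix A, as is common in Mamba-style implementations (an assumption, not stated above), for which the ZOH formulas reduce to element-wise operations; the discrete recurrence ht = Ā ht−1 + B̄ xt, yt = C·ht is then run over a toy input sequence.

```python
import numpy as np

def discretize_zoh_diag(a, b, delta):
    """ZOH discretization for diagonal A: A_bar = exp(delta*a) and
    B_bar = (delta*a)^{-1} (exp(delta*a) - 1) * delta*b = (exp(delta*a) - 1)/a * b."""
    a_bar = np.exp(delta * a)
    b_bar = (a_bar - 1.0) / a * b
    return a_bar, b_bar

def ssm_scan(a_bar, b_bar, c, x):
    """Discrete recurrence h_t = A_bar h_{t-1} + B_bar x_t, output y_t = <C, h_t>."""
    h = np.zeros_like(a_bar)
    ys = []
    for xt in x:
        h = a_bar * h + b_bar * xt
        ys.append(np.dot(c, h))
    return np.array(ys)

a = -np.ones(4)              # stable diagonal dynamics (hidden state size N = 4)
b = np.ones(4)
c = np.ones(4) / 4.0
a_bar, b_bar = discretize_zoh_diag(a, b, delta=0.1)
y = ssm_scan(a_bar, b_bar, c, x=np.ones(8))
print(y.shape)               # (8,); y rises monotonically toward its steady state
```

For a constant input, the output approaches a steady state, illustrating how the recurrence accumulates long-range context across the sequence.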
In a specific embodiment, the step S3 of performing feature fusion on the multi-dimensional long-frequency variation feature set to obtain a low-dimensional fusion variation feature map includes:
S301, upsampling the high-dimensional change feature map so that its spatial resolution matches the resolution of the low-dimensional change feature map;
S302, fusing the up-sampled high-dimensional change feature map with the low-dimensional change feature map in an element addition mode to obtain a low-dimensional fusion change feature map.
In this embodiment, in step S301, the high-dimensional change feature map is upsampled to increase its number of pixels, raising its spatial resolution to match that of the low-dimensional change feature map so that feature fusion can be performed subsequently.
In step S302, the upsampled high-dimensional change feature map and the low-dimensional change feature map are fused by element-wise addition; this directly combines the information of the two feature maps at each position and generates a low-dimensional fusion change feature map containing richer features.
The low-dimensional fusion change feature map not only maintains the detailed information of the low-dimensional change feature map, but also integrates the semantic information of the high-dimensional change feature map, thereby being beneficial to improving the precision of subsequent change detection.
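Steps S301 and S302 can be sketched as follows; nearest-neighbor upsampling and the already-matching channel counts are illustrative assumptions (in practice, the interpolation method and any channel alignment, e.g. a 1 × 1 convolution, are implementation choices).

```python
import numpy as np

def upsample_nearest(x, factor):
    """Nearest-neighbor upsampling of an (H, W, C) feature map."""
    return np.repeat(np.repeat(x, factor, axis=0), factor, axis=1)

def fuse_add(high, low):
    """S301/S302: upsample the coarser high-dimensional change feature map to the
    low-dimensional map's spatial resolution, then fuse by element-wise addition."""
    factor = low.shape[0] // high.shape[0]
    return upsample_nearest(high, factor) + low

high = np.ones((4, 4, 8))        # coarse, semantically rich change features
low = np.full((16, 16, 8), 0.5)  # fine-grained change features
fused = fuse_add(high, low)
print(fused.shape)               # (16, 16, 8); every element is 1.0 + 0.5
```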
In a specific embodiment, the step S4 of inputting the high-dimensional change feature map and the low-dimensional fusion change feature map to a global mixed attention module for feature fusion to generate a fusion output feature map includes:
S401, respectively performing channel expansion on the high-dimensional change feature map and the low-dimensional fusion change feature map, correspondingly obtaining an expanded high-dimensional change feature map and an expanded low-dimensional fusion change feature map;
Let the input high-dimensional change feature be mHC and the low-dimensional change feature be mLC. The number of channels is first expanded by a 1 × 1 convolution:

MH = Conv1×1(mHC),  ML = Conv1×1(mLC)

yielding the expanded high-dimensional change feature MH and the expanded low-dimensional change feature ML.
S402, carrying out feature fusion on the expanded high-dimensional change feature map and the expanded low-dimensional fusion change feature map to generate a fusion feature map.
The expanded high-dimensional and low-dimensional features are spliced in the channel dimension to obtain Mconcat = [MH, ML], and a global feature map Mglobal is derived from the spliced feature map. The global feature map and the spliced feature map are then spliced to generate the fusion feature map:

Mfuse = [Mconcat, Mglobal]
S403, applying a 1 × 1 convolution and a Sigmoid activation function to the fusion feature map to generate a global attention feature map:

Matt = σ(Conv1×1(Mfuse))
S404, dividing the global attention feature map into a high-dimensional attention feature map and a low-dimensional attention feature map;

wherein the high-dimensional attention feature map Matt^H and the low-dimensional attention feature map Matt^L are obtained by splitting Matt along the channel dimension.
S405, multiplying the high-dimensional attention feature map and the low-dimensional attention feature map element by element with the corresponding initial feature maps, correspondingly generating a weighted high-dimensional feature map and a weighted low-dimensional feature map:

M′H = Matt^H ⊙ mHC,  M′L = Matt^L ⊙ mLC
S406, reconstructing the weighted high-dimensional feature map and the weighted low-dimensional feature map to generate the final fusion output feature map.
In this embodiment, the GHAT module realizes the effective fusion of the high-dimensional and low-dimensional features through the steps, and improves the accuracy and the fineness of the change detection. Through the cross-attention mechanism GHAT, interactions between features can be captured in a global scope, generating a more accurate and reliable change feature map.
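A minimal sketch of steps S401–S406 follows. Each 1 × 1 convolution is treated as a per-pixel matrix multiplication, global average pooling stands in for the global feature map, and the attention is applied to the expanded features; these operational details are assumptions, since the exact operations are not fully specified above.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def conv1x1(x, w):
    # A 1x1 convolution acts as a per-pixel linear map over channels
    return x @ w

def ghat(m_hc, m_lc, w_h, w_l, w_att):
    mh = conv1x1(m_hc, w_h)                               # S401: channel expansion
    ml = conv1x1(m_lc, w_l)
    m_concat = np.concatenate([mh, ml], axis=-1)          # S402: splice in channel dim
    m_global = m_concat.mean(axis=(0, 1), keepdims=True)  # assumed global average pooling
    m_fuse = np.concatenate(
        [m_concat, np.broadcast_to(m_global, m_concat.shape)], axis=-1)
    m_att = sigmoid(conv1x1(m_fuse, w_att))               # S403: global attention map
    c = mh.shape[-1]
    att_h, att_l = m_att[..., :c], m_att[..., c:]         # S404: split high/low attention
    return att_h * mh, att_l * ml                         # S405: element-wise weighting

H, W, C = 8, 8, 4
rng = np.random.default_rng(1)
m_hc = rng.standard_normal((H, W, C))          # high-dimensional change feature
m_lc = rng.standard_normal((H, W, C))          # low-dimensional fusion change feature
w_h = rng.standard_normal((C, 2 * C))
w_l = rng.standard_normal((C, 2 * C))
w_att = rng.standard_normal((8 * C, 4 * C))    # M_fuse channels -> split attention maps
out_h, out_l = ghat(m_hc, m_lc, w_h, w_l, w_att)
print(out_h.shape, out_l.shape)                # (8, 8, 8) (8, 8, 8)
```

Because the attention values lie in (0, 1), each weighted feature is a soft, globally informed rescaling of the corresponding expanded feature.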
The step S5 of performing noise correction on the fused output feature map through the iterative diffusion model to generate a remote sensing image change detection map includes:
s501, acquiring initialized input noise characteristics as an iteration starting point;
The input noise feature is typically a randomly generated matrix with the same size as the fused output feature map, whose element values follow a preset distribution (such as a Gaussian or uniform distribution) to simulate initial noise in the image. This noise feature serves as the starting point of the iterative diffusion model for the subsequent forward and reverse iterative processes.
S502, carrying out repeated iterative optimization on the fusion output characteristic diagram through an iterative diffusion model to finally generate a remote sensing image change detection diagram, wherein each iterative optimization comprises a forward process and a reverse process,
In the forward process, gradually adding the input noise characteristics into a change characteristic diagram of an input iterative forward process until a pure noise diagram is added;
In the reverse process, the noise estimation model carries out noise estimation on the image input into the reverse process to generate a noise estimation value, and the denoised image is obtained through calculation according to a reverse denoising formula.
Through the forward and reverse iterative processes, the iterative diffusion model decomposes the complex image generation problem into a series of simple denoising sub-problems by gradually adding and removing noise, and effectively corrects the noise in the fusion output characteristic diagram. In each iteration, the model generates a denoised image according to the current image state and noise characteristics. The reverse process gradually removes the noise in the image through a noise estimation and reverse denoising formula, and recovers a clear remote sensing image change detection diagram, thereby improving the signal-to-noise ratio of the remote sensing image change detection diagram and retaining the detailed information in the remote sensing image change detection diagram.
In one embodiment, the initialized input noise feature is set to xT ~ N(0, I), and the final change detection map x0 is obtained after T iterations. Each iteration can be expressed as:

xt−1 = (1/√αt) (xt − ((1 − αt)/√(1 − ᾱt)) εθ(xt, Ia, Ib, t)) + σt zt

wherein zt represents noise, αt and ᾱt are diffusion model parameters, and εθ(xt, Ia, Ib, t) is the noise estimate.
In the forward process, noise is gradually added to the image so that it transitions from the original image to pure noise. Let the original image be x0; after T iterations the final noise image xT is obtained. The forward process can be expressed as:

q(xt | xt−1) = N(xt; √(1 − βt) xt−1, βt I)

where βt is the noise schedule parameter. Through the accumulated noising process, the noisy image at any time step can be obtained directly:

xt = √ᾱt x0 + √(1 − ᾱt) ε,  ε ~ N(0, I)

wherein αt = 1 − βt and ᾱt = ∏_{s=1}^{t} αs.
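The closed-form forward noising can be sketched numerically; the linear β schedule, shapes, and seed are illustrative assumptions rather than the patent's actual settings.

```python
import numpy as np

def forward_diffusion(x0, betas, t, rng):
    """x_t = sqrt(abar_t) * x0 + sqrt(1 - abar_t) * eps, with abar_t = prod(1 - beta_s)."""
    abar_t = np.cumprod(1.0 - betas)[t]
    eps = rng.standard_normal(x0.shape)
    return np.sqrt(abar_t) * x0 + np.sqrt(1.0 - abar_t) * eps, eps

rng = np.random.default_rng(0)
x0 = np.ones((4, 4))                       # stand-in for the fused change feature map
betas = np.linspace(1e-4, 0.02, 1000)      # assumed linear noise schedule
xt, eps = forward_diffusion(x0, betas, t=999, rng=rng)
print(xt.shape)                            # (4, 4); by t = 999 the image is nearly pure noise
```

At the final time step ᾱt is close to zero, so xt is dominated by the noise term, matching the "pure noise" endpoint described above.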
In one embodiment, during the reverse process, noise is removed step by step and the noisy image is restored to the original image. The reverse process is represented by the following formula:

pθ(xt−1 | xt) = N(xt−1; μθ(xt, t), σt² I)

wherein μθ(xt, t) is the estimated mean and σt² is the estimated variance.
In the reverse process, the key is an accurate estimate of the noise. Let the image at the current time step be xt; the model generates a noise estimate through the noise estimation network εθ:
εθ(xt, Ia, Ib, t) = D(GHAT(E(mLC + P(xt), t), mHC), t)
Where E and D represent encoder and decoder, respectively, GHAT represent global mixed attention mechanisms, m LC and m HC represent low and high dimensional change features, respectively, and P (x t) represents noise features.
According to the noise estimate, the denoised image of the reverse process is calculated from the estimated mean:

μθ(xt, t) = (1/√αt) (xt − ((1 − αt)/√(1 − ᾱt)) εθ(xt, Ia, Ib, t))

The final reverse denoising formula is:

xt−1 = μθ(xt, t) + σt zt
where t is the number of iterations, zt represents noise, αt and ᾱt are diffusion model parameters, εθ(xt, Ia, Ib, t) is the noise estimate, σt is the mean square error of the noise estimate, and xt−1 is the denoised image after the t−1-th iteration.
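The reverse denoising step can be sketched and sanity-checked as follows: for a single forward step with the true noise and σt = 0, the formula recovers x0 exactly. The shapes and the α value are illustrative assumptions.

```python
import numpy as np

def ddpm_reverse_step(xt, eps_hat, alpha_t, abar_t, sigma_t, z):
    """One reverse iteration x_{t-1} = mu_theta(x_t, t) + sigma_t * z, with
    mu_theta = (x_t - (1 - alpha_t)/sqrt(1 - abar_t) * eps_hat) / sqrt(alpha_t)."""
    mu = (xt - (1.0 - alpha_t) / np.sqrt(1.0 - abar_t) * eps_hat) / np.sqrt(alpha_t)
    return mu + sigma_t * z

# Sanity check: x_1 = sqrt(alpha_1) x_0 + sqrt(1 - alpha_1) eps (so abar_1 = alpha_1);
# reversing with the true noise and sigma = 0 returns x_0 exactly.
rng = np.random.default_rng(3)
x0 = rng.standard_normal((4, 4))
alpha1 = 0.99
eps = rng.standard_normal((4, 4))
x1 = np.sqrt(alpha1) * x0 + np.sqrt(1.0 - alpha1) * eps
x0_rec = ddpm_reverse_step(x1, eps, alpha1, alpha1, 0.0, np.zeros((4, 4)))
print(np.allclose(x0_rec, x0))     # True
```

In the full method, eps_hat would come from the noise estimation network εθ rather than the true noise, and the step is repeated T times.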
In one embodiment, the noise estimation model adopts Unet structure, including a second encoder and a second decoder,
The second encoder receives input features, wherein the input features comprise input noise features and input change feature graphs, and the output feature graphs of each layer are extracted and generated layer by layer, wherein the output feature graphs of each layer contain different dimensional information of the input features;
The second decoder is in jump connection with the second encoder, the output characteristic diagram of the corresponding layer in the second encoder is transmitted to the corresponding layer in the second decoder, the resolution of the output characteristic diagram is restored layer by layer until the difference value between the resolution of the output characteristic diagram and the resolution of the input change characteristic diagram is smaller than a preset threshold value, and a preliminary change detection diagram is obtained;
And taking the preliminary change detection graph as input through an iterative optimization mechanism, performing repeated iterative processing on the preliminary change detection graph through a multi-dimensional feature and a global mixed attention mechanism, gradually correcting the change detection result until a preset iteration stop condition is met, and outputting the final remote sensing image change detection graph.
Specifically, in the noise estimation model, the key to noise estimation is encoding and decoding the input noise feature and the change feature to generate a high-precision change detection map. Let the input noise feature be P(xt) and the change feature be mLC. First, the input feature map is processed by the second encoder to generate a multi-dimensional feature representation:
E0=P(xt)+mLC
Ei+1=MaxPool(ReLU(Conv(Ei)))
Wherein E i represents an output feature map of the i-th layer of the second encoder.
In the second decoder section, the multi-dimensional features of the second encoder are gradually restored to the original resolution by upsampling and jump-connecting:
D0=En
Di+1=UpSample(Di)
Di+1=Concat(Di+1,En-i)
Di+1=ReLU(Conv(Di+1))
Finally, the second decoder outputs a high-precision remote sensing image change detection chart:
O=ReLU(Conv(Dn))
In order to further improve the accuracy of the change detection, NEUNet gradually corrects and refines the change detection result through an iterative optimization mechanism. In each iteration, the change detection result is corrected step by step through the multidimensional feature and the global mixed attention mechanism, and finally a high-precision remote sensing image change detection diagram is generated.
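A toy version of the second encoder/decoder with skip connections can be sketched as follows; per-pixel 1 × 1 projections, average pooling, and nearest-neighbor upsampling stand in for the Conv/MaxPool/UpSample operations above, and all shapes and weights are illustrative assumptions.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def conv1x1(x, w):
    return x @ w

def down(x):
    # 2x2 average pooling stands in for MaxPool downsampling
    h, w, c = x.shape
    return x.reshape(h // 2, 2, w // 2, 2, c).mean(axis=(1, 3))

def up(x):
    return np.repeat(np.repeat(x, 2, axis=0), 2, axis=1)

def tiny_unet(x, enc_w, dec_w):
    """Two-level encoder-decoder with skip connections (E and D in the text)."""
    skips = []
    for w in enc_w:                                  # encoder: conv -> relu -> pool
        x = relu(conv1x1(x, w))
        skips.append(x)
        x = down(x)
    for w, skip in zip(dec_w, reversed(skips)):      # decoder: upsample -> concat skip -> conv
        x = up(x)
        x = np.concatenate([x, skip], axis=-1)
        x = relu(conv1x1(x, w))
    return x

rng = np.random.default_rng(2)
x = rng.standard_normal((16, 16, 8))                 # stand-in for P(x_t) + m_LC
enc_w = [rng.standard_normal((8, 16)), rng.standard_normal((16, 32))]
dec_w = [rng.standard_normal((64, 16)), rng.standard_normal((32, 8))]
out = tiny_unet(x, enc_w, dec_w)
print(out.shape)                                     # (16, 16, 8): input resolution restored
```

The jump connections concatenate each encoder level's feature map into the matching decoder level, so fine spatial detail survives the bottleneck, which is the property the text relies on for high-precision change maps.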
Referring to fig. 2, an embodiment of the present invention further provides a remote sensing image change detection apparatus based on an iteration Mamba architecture, where the apparatus includes:
The feature extraction module 10 is configured to input a first remote sensing image and a second remote sensing image to a Mamba feature extractor, respectively, so as to extract a multi-dimensional feature map set of the first remote sensing image and a multi-dimensional feature map set of the second remote sensing image, where the multi-dimensional feature map set includes at least a high-dimensional feature map and a low-dimensional feature map;
The state space change detection module 20 is configured to model the multi-dimensional feature map set of the first remote sensing image and the multi-dimensional feature map set of the second remote sensing image through a state space, so as to obtain multi-dimensional long-frequency change feature map sets of the first remote sensing image and the second remote sensing image, where the multi-dimensional long-frequency change feature map feature sets at least include a high-dimensional change feature map and a low-dimensional change feature map;
the low-dimensional feature generation module 30 is configured to perform feature fusion on the feature set of the multi-dimensional long-frequency variation graph to obtain a low-dimensional fusion variation feature graph;
The global mixed attention module 40 is configured to perform feature fusion on the high-dimensional change feature map and the low-dimensional fusion change feature map to generate a fusion output feature map;
the iterative diffusion module 50 is configured to perform noise correction on the fused output feature map through an iterative diffusion model, so as to generate a remote sensing image change detection map.
In this embodiment, for specific implementation of each module in the above embodiment of the apparatus, please refer to the description in the above embodiment of the method, and no further description is given here.
Referring to fig. 3, in an embodiment of the present invention, there is further provided a computer device, which may be a server, and whose internal structure may be as shown in fig. 3. The computer device includes a processor, a memory, a display screen, an input device, a network interface, and a database connected by a system bus. The processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, computer programs, and a database. The internal memory provides an environment for the operation of the operating system and computer programs in the non-volatile storage medium. The database of the computer device is used to store the corresponding data in this embodiment. The network interface of the computer device is used for communicating with an external terminal through a network connection. The computer program, when executed by a processor, implements the method for remote sensing image change detection based on the iterative Mamba architecture.
It will be appreciated by those skilled in the art that the architecture shown in fig. 3 is merely a block diagram of a portion of the architecture in connection with the present inventive arrangements and is not intended to limit the computer devices to which the present inventive arrangements are applicable.
An embodiment of the present invention also provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the above method. It is understood that the computer readable storage medium in this embodiment may be a volatile readable storage medium or a nonvolatile readable storage medium.
Those skilled in the art will appreciate that implementing all or part of the above described methods may be accomplished by way of a computer program stored on a non-volatile computer readable storage medium, which when executed, may comprise the steps of the embodiments of the methods described above. Any reference to memory, storage, database, or other medium provided by the present invention and used in embodiments may include non-volatile and/or volatile memory. The non-volatile memory can include read only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory can include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, apparatus, article, or method that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, apparatus, article, or method. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other like elements in a process, apparatus, article, or method that comprises that element.
The foregoing description is only of the preferred embodiments of the present invention and is not intended to limit the scope of the invention, and all equivalent structures or equivalent processes using the descriptions and drawings of the present invention or direct or indirect application in other related technical fields are included in the scope of the present invention.

Claims (10)

1. A remote sensing image change detection method based on an iterative Mamba architecture, the method comprising:
Respectively inputting a first remote sensing image and a second remote sensing image into Mamba feature extractors to respectively extract a multi-dimensional feature map set of the first remote sensing image and a multi-dimensional feature map set of the second remote sensing image, wherein the multi-dimensional feature map set at least comprises a high-dimensional feature map and a low-dimensional feature map;
inputting a multi-dimensional feature map set of the first remote sensing image and a multi-dimensional feature map set of the second remote sensing image to a state space change detection module, and obtaining a multi-dimensional long-frequency change feature map set of the first remote sensing image and the second remote sensing image through state space modeling, wherein the multi-dimensional long-frequency change feature map set at least comprises a high-dimensional change feature map and a low-dimensional change feature map;
feature fusion is carried out on the multi-dimensional long-frequency change graph feature set, and a low-dimensional fusion change feature graph is obtained;
inputting the high-dimensional change feature map and the low-dimensional fusion change feature map to a global mixed attention module for feature fusion, and generating a fusion output feature map;
And carrying out noise correction on the fused output feature images through an iterative diffusion model to generate a remote sensing image change detection image.
2. The method of claim 1, wherein the Mamba feature extractor comprises a linear embedding layer and an N-layer encoder layer, the encoder layer comprising a VSS block and a patch merging layer, the step of inputting a first remote sensing image to the Mamba feature extractor to extract a set of multi-dimensional feature maps of the first remote sensing image, comprising:
The first remote sensing image is partitioned into non-overlapping patches through the linear embedding layer, each patch is linearly embedded into a preset feature space, and initial feature vector sets of all patches form an initial feature map;
Inputting the initial characteristic diagram into a first layer of the encoder layer for encoder layer processing; the method comprises the steps of processing an encoder layer, inputting the feature map after VSS block processing to a patch merging layer, linearly transforming a preset number of adjacent patches in the feature map after VSS block processing, and splicing to form super patches, wherein the feature vector sets of all the super patches form a new feature map, and the feature dimension of the new feature map is higher than that of the initial feature map;
And inputting the new feature map to the next layer of encoder layer, and repeating the encoder layer processing step until the encoder processing of the N layers of encoder layers is completed, wherein the feature map output by each layer of encoder layer is used as a multi-dimensional feature map set of the first remote sensing image, and the multi-dimensional feature map set at least comprises a high-dimensional feature map and a low-dimensional feature map.
3. The method for detecting a change in a remote sensing image based on an iterative Mamba architecture according to claim 2, wherein the steps of inputting the set of multi-dimensional feature maps of the first remote sensing image and the set of multi-dimensional feature maps of the second remote sensing image to a state space change detection module, and obtaining the multi-dimensional long-frequency change feature maps of the first remote sensing image and the second remote sensing image through state space modeling include:
Inputting the feature map of the first remote sensing image output by the i-th encoder layer and the feature map of the second remote sensing image output by the i-th encoder layer into the state space change detection module, and processing them through a state space model to output a captured i-th layer long-frequency change feature map, wherein 0 < i ≤ N;
And taking the collection of the long-frequency change characteristic graphs of each layer as the multi-dimensional long-frequency change characteristic collection graph, taking the long-frequency change characteristic output of the high-dimensional characteristic graph input as the first remote sensing image and the high-dimensional characteristic graph input as the second remote sensing image as the high-dimensional long-frequency change characteristic graph, and taking the long-frequency change characteristic output of the low-dimensional characteristic graph input as the first remote sensing image and the low-dimensional characteristic graph input as the low-dimensional long-frequency change characteristic graph.
4. The remote sensing image change detection method based on iterative Mamba architecture according to claim 3, wherein the step of performing feature fusion on the multi-dimensional long-frequency change feature set to obtain a low-dimensional fusion change feature map includes:
Upsampling the high-dimensional change feature map such that the spatial resolution of the high-dimensional change feature map matches the resolution of the low-dimensional change feature map;
And fusing the up-sampled high-dimensional change feature map with the low-dimensional change feature map in an element addition mode to obtain a low-dimensional fusion change feature map.
5. The remote sensing image change detection method based on the iterative Mamba architecture according to claim 2, wherein the step of inputting the high-dimensional change feature map and the low-dimensional fusion change feature map to a global mixed attention module for feature fusion and generating a fusion output feature map includes:
respectively carrying out channel expansion on the high-dimensional change feature map and the low-dimensional fusion change feature map, and correspondingly respectively obtaining an expanded high-dimensional change feature map and an expanded low-dimensional fusion change feature map;
Performing feature fusion on the expanded high-dimensional change feature map and the expanded low-dimensional fusion change feature map to generate a fusion feature map;
applying a 1 × 1 convolution and a Sigmoid activation function to the fusion feature map to generate a global attention feature map;
dividing the global attention profile into a high-dimensional attention profile and a low-dimensional attention profile;
Multiplying the high-dimensional attention feature map and the low-dimensional attention feature map with the initial feature map element by element respectively to generate a weighted high-dimensional feature map and a weighted low-dimensional feature map correspondingly;
And reconstructing the weighted high-dimensional feature map and the weighted low-dimensional feature map to generate the final fusion output feature map.
6. The method for detecting remote sensing image change based on iterative Mamba architecture according to claim 1, wherein the step of generating a remote sensing image change detection map by performing noise correction on the fused output feature map through an iterative diffusion model includes:
Acquiring initialized input noise characteristics as an iteration starting point;
performing iterative optimization on the fusion output feature map for a plurality of times through an iterative diffusion model to finally generate a remote sensing image change detection map, wherein each iterative optimization comprises a forward process and a reverse process,
In the forward process, gradually adding the input noise characteristics into a change characteristic diagram of an input iterative forward process until a pure noise diagram is added;
In the reverse process, the noise estimation model carries out noise estimation on the image input into the reverse process to generate a noise estimation value, and the denoised image is obtained through calculation according to a reverse denoising formula.
7. The remote sensing image change detection method based on the iterative Mamba architecture of claim 6, wherein the noise estimate is:
ε_θ(x_t, I_a, I_b, t) = D(GHAT(E(m_LC + P(x_t), t), m_HC), t)
where t is the iteration number, I_a and I_b are the change features, D denotes the decoder of the noise estimation model, E denotes its encoder, GHAT denotes the global mixed attention mechanism, m_LC and m_HC denote the low-dimensional and high-dimensional change features respectively, and P(x_t) is the input noise feature.
8. The method for detecting a change in a remote sensing image based on an iterative Mamba architecture according to claim 6, wherein the inverse denoising formula is:
x_{t-1} = (1/√α_t) · (x_t − ((1 − α_t)/√(1 − ᾱ_t)) · ε_θ(x_t, I_a, I_b, t)) + σ_t · z_t
wherein t is the iteration number, z_t denotes noise sampled from a standard normal distribution, α_t and ᾱ_t are the diffusion model parameters, ε_θ(x_t, I_a, I_b, t) is the noise estimate, σ_t is the standard deviation of the noise, and x_{t-1} is the denoised image after the (t−1)-th iteration.
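A minimal sketch of one reverse (denoising) update, assuming the standard DDPM form of the inverse denoising formula; `eps_theta` stands in for the output of the noise estimation model:

```python
import numpy as np

def reverse_denoise_step(x_t, eps_theta, alpha_t, alpha_bar_t, sigma_t, rng):
    # Standard DDPM reverse update:
    #   x_{t-1} = (x_t - (1 - alpha_t)/sqrt(1 - alpha_bar_t) * eps_theta)
    #             / sqrt(alpha_t) + sigma_t * z_t,  with z_t ~ N(0, I).
    z_t = rng.standard_normal(x_t.shape)
    mean = (x_t - (1.0 - alpha_t) / np.sqrt(1.0 - alpha_bar_t) * eps_theta) \
           / np.sqrt(alpha_t)
    return mean + sigma_t * z_t
```

Note that with `sigma_t = 0` the step becomes deterministic, which is how the final iteration is often run in DDPM-style samplers.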
9. The method for detecting changes in a remote sensing image based on the iterative Mamba architecture of claim 6, wherein the noise estimation model adopts a U-Net structure comprising a second encoder and a second decoder,
the second encoder receives input features, where the input features comprise the input noise features and the input change feature map, and extracts and generates output feature maps layer by layer, where the output feature map of each layer contains different-dimensional information of the input features;
the second decoder is connected to the second encoder through skip connections; the output feature map of the corresponding layer in the second encoder is passed to the corresponding layer in the second decoder, and the resolution of the output feature map is restored layer by layer until the difference between the resolution of the output feature map and the resolution of the input change feature map is smaller than a preset threshold, yielding a preliminary change detection map;
and through an iterative optimization mechanism, the preliminary change detection map is taken as input and repeatedly processed using the multi-dimensional features and the global mixed attention mechanism, gradually correcting the change detection result until a preset iteration stop condition is met, whereupon the final remote sensing image change detection map is output.
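The encoder–decoder flow with skip connections described in this claim can be illustrated structurally as follows; learned convolutions are replaced by average pooling and nearest-neighbour upsampling, purely for shape bookkeeping, so this is not the patented network:

```python
import numpy as np

def down(x):
    # 2x average pooling over spatial dims; x has shape (C, H, W)
    c, h, w = x.shape
    return x.reshape(c, h // 2, 2, w // 2, 2).mean(axis=(2, 4))

def up(x):
    # Nearest-neighbour 2x upsampling
    return x.repeat(2, axis=1).repeat(2, axis=2)

def unet_like(x, depth=2):
    skips = []
    # Encoder: extract per-level feature maps, shrinking resolution
    for _ in range(depth):
        skips.append(x)
        x = down(x)
    # Decoder: restore resolution level by level, fusing the
    # encoder output of the corresponding level via skip connection
    for skip in reversed(skips):
        x = up(x)
        x = np.concatenate([x, skip], axis=0)
    return x
```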
10. A remote sensing image change detection device based on an iterative Mamba architecture, the device comprising:
The feature extraction module is used for respectively inputting the first remote sensing image and the second remote sensing image into a Mamba feature extractor so as to respectively extract a multi-dimensional feature map set of the first remote sensing image and a multi-dimensional feature map set of the second remote sensing image, wherein the multi-dimensional feature map set at least comprises a high-dimensional feature map and a low-dimensional feature map;
The state space change detection module is used for modeling the multi-dimensional feature map set of the first remote sensing image and the multi-dimensional feature map set of the second remote sensing image through a state space to obtain multi-dimensional long-frequency change feature map sets of the first remote sensing image and the second remote sensing image, wherein the multi-dimensional long-frequency change feature map set at least comprises a high-dimensional change feature map and a low-dimensional change feature map;
The low-dimensional feature generation module is used for performing feature fusion on the multi-dimensional long-frequency change feature map set to obtain a low-dimensional fusion change feature map;
the global mixed attention module is used for carrying out feature fusion on the high-dimensional change feature map and the low-dimensional fusion change feature map to generate a fusion output feature map;
and the iterative diffusion module is used for carrying out noise correction on the fused output characteristic images through the iterative diffusion model to generate a remote sensing image change detection image.
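Purely as an orientation aid, the five modules of the device can be wired together as below; every callable here is a hypothetical placeholder for the corresponding module, not the claimed implementation:

```python
import numpy as np

def detect_changes(img_a, img_b, extract, state_space_model, fuse_low,
                   global_attention, iterative_diffusion):
    # Feature extraction module: multi-dimensional feature map sets
    feats_a = extract(img_a)
    feats_b = extract(img_b)
    # State space change detection module: high-dimensional change
    # feature map plus the set of lower-dimensional change features
    high, low_set = state_space_model(feats_a, feats_b)
    # Low-dimensional feature generation module: fuse the set
    low = fuse_low(low_set)
    # Global mixed attention module: fusion output feature map
    fused = global_attention(high, low)
    # Iterative diffusion module: noise-corrected detection map
    return iterative_diffusion(fused)
```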
CN202411198502.XA 2024-08-29 2024-08-29 Remote sensing image change detection method and device based on iteration Mamba architecture Active CN119205638B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202411198502.XA CN119205638B (en) 2024-08-29 2024-08-29 Remote sensing image change detection method and device based on iteration Mamba architecture


Publications (2)

Publication Number Publication Date
CN119205638A true CN119205638A (en) 2024-12-27
CN119205638B CN119205638B (en) 2025-05-09

Family

ID=94077617

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202411198502.XA Active CN119205638B (en) 2024-08-29 2024-08-29 Remote sensing image change detection method and device based on iteration Mamba architecture

Country Status (1)

Country Link
CN (1) CN119205638B (en)


Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113420662A (en) * 2021-06-23 2021-09-21 西安电子科技大学 Remote sensing image change detection method based on twin multi-scale difference feature fusion
WO2021226977A1 (en) * 2020-05-15 2021-11-18 安徽中科智能感知产业技术研究院有限责任公司 Method and platform for dynamically monitoring typical ground features in mining on the basis of multi-source remote sensing data fusion and deep neural network
CN117789046A (en) * 2023-11-16 2024-03-29 北京杰月科技有限公司 Remote sensing image change detection method, device, electronic equipment and storage medium
CN118212532A (en) * 2024-04-28 2024-06-18 西安电子科技大学 A method for extracting building change areas in dual-temporal remote sensing images based on twin hybrid attention mechanism and multi-scale feature fusion
CN118485927A (en) * 2024-05-09 2024-08-13 电子科技大学 A high-resolution remote sensing image target detection method based on multi-scale network


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
LIU Pengfei et al.: "Research on Hyperspectral Remote Sensing Image Classification with a Low-Power Heterogeneous Computing Architecture", Computer Engineering, 12 December 2022 (2022-12-12), pages 1 - 8 *

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN120182827A (en) * 2025-03-24 2025-06-20 中山大学 A remote sensing image change detection method based on fusion state space model
CN120182788A (en) * 2025-04-29 2025-06-20 深圳市锐明像素科技有限公司 Abnormal weather identification method, device, electronic equipment and program product based on deep learning
CN120182788B (en) * 2025-04-29 2025-09-12 深圳市锐明像素科技有限公司 Abnormal weather identification method and device based on deep learning, electronic equipment and program product
CN120612496A (en) * 2025-08-11 2025-09-09 之江实验室 Image change detection method, device, equipment and medium based on state space
CN120747038A (en) * 2025-08-18 2025-10-03 无锡莱姆顿科技股份有限公司 Cross-color space information missing resolution method and system for structural damage detection

Also Published As

Publication number Publication date
CN119205638B (en) 2025-05-09

Similar Documents

Publication Publication Date Title
CN119205638B (en) Remote sensing image change detection method and device based on iteration Mamba architecture
CN110738090B (en) Systems and methods for end-to-end handwritten text recognition using neural networks
CN115345866B (en) Building extraction method in remote sensing image, electronic equipment and storage medium
US11488283B1 (en) Point cloud reconstruction method and apparatus based on pyramid transformer, device, and medium
CN115601549A (en) River and lake remote sensing image segmentation method based on deformable convolution and self-attention model
CN114565528B (en) A remote sensing image denoising method and system based on multi-scale and attention mechanism
CN108734210B (en) An object detection method based on cross-modal multi-scale feature fusion
JP2023533907A (en) Image processing using self-attention-based neural networks
CN111666931B (en) Mixed convolution text image recognition method, device, equipment and storage medium
CN111914654B (en) Text layout analysis method, device, equipment and medium
CN114708436B (en) Training method of semantic segmentation model, semantic segmentation method, semantic segmentation device and semantic segmentation medium
CN115410059B (en) Remote sensing image part supervision change detection method and device based on contrast loss
CN111709415A (en) Object detection method, apparatus, computer equipment and storage medium
CN113096133A (en) Method for constructing semantic segmentation network based on attention mechanism
CN115115691B (en) Monocular three-dimensional plane restoration method, monocular three-dimensional plane restoration device, and storage medium
Zhu et al. Semantic image segmentation with shared decomposition convolution and boundary reinforcement structure
CN117934254A (en) Watermark processing model training method, watermark processing method, device and equipment
CN117218344A (en) Mars surface target segmentation method, device and equipment based on deep neural network
CN111325697B (en) Color image restoration method based on tensor eigen transformation
CN117809321A (en) Table restoration method, model, device, equipment and storage medium
CN119415848B (en) Missing modal data processing method based on mutual information constraint and countermeasure learning
CN118072164B (en) A method, device, equipment and medium for identifying historical architectural styles
CN118628366A (en) Hyperspectral and multispectral image fusion method and system based on self-learning coupled diffusion posterior sampling
CN117853739A (en) Remote sensing image feature extraction model pre-training method and device based on feature transformation
CN117710763A (en) Image noise recognition model training method, image noise recognition method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
OL01 Intention to license declared