Remote sensing image land coverage evolution time-space diagram analysis method for self-adaptive migration training sample
Technical Field
The invention relates to the technical field of remote sensing image processing, in particular to a remote sensing image land coverage evolution time-space diagram analysis method of a self-adaptive migration training sample.
Background
The remote sensing image is image data of the earth surface and surrounding environment obtained by a remote sensing technology, and comprises visible light, infrared and microwave band data obtained by satellites, airplanes and unmanned aerial vehicles of different types. It is difficult to obtain high quality, continuous, long time series of training samples, influenced by climatic, geographical location factors. The self-adaptive migration training samples can automatically select proper training samples, so that the tedious processes of manually marking and selecting samples are avoided, and the training samples which are wider and suitable for different time periods are provided.
The self-adaptive migration sample is an automatic cross-time-period training sample migration method, and the method utilizes an optimization algorithm to automatically determine a judgment threshold value by calculating SAD and ED between a source domain spectrum and a target domain spectrum, and automatically identifies and selects pixels which remain stable and have no change in different time periods as training samples, so that stable sample pixels can be utilized to train a model, and the model is suitable for dynamic change of land coverage. Thus, even if there are differences and variations in the time series, the method can automatically select an appropriate training sample to represent the land cover conditions for different time periods. In the conventional method, the threshold value of the migration sample is manually determined through visual analysis, and the threshold value is inaccurate and can affect the accuracy of the training sample.
In classical image time series analysis, land cover classification is often regarded as a static problem, that is, images in different periods are regarded as independent samples, each period is classified, effective association and fusion between data in adjacent periods are lacking, dynamic changes of land cover types are difficult to capture, and in classical methods, in land cover classification, pixel-level features such as colors and textures are often focused, and description of land feature structures is ignored. The invention designs a method for adaptively determining a threshold value to migrate training samples, which combines migrated training samples with a long-time sequence, can classify and map in space-time dimension, and further researches the dynamic change trend and mode of the structural features of land coverage classification.
Disclosure of Invention
The invention aims to solve the problems of lack of long-time sequence training samples and neglected space-time correlation and structure description in classical image time sequence analysis, and designs a method for analyzing a remote sensing image land coverage evolution space-time diagram of a self-adaptive migration training sample.
In order to achieve the above object, the technical scheme of the present invention is as follows:
a remote sensing image land coverage evolution time-space diagram analysis method of a self-adaptive migration training sample comprises the following steps:
11 Preparation work of a remote sensing image land coverage evolution space-time diagram analysis method of the self-adaptive migration training sample: acquiring ground surface reflectivity data of a Source domain remote sensing satellite image Source and a Target domain remote sensing satellite image Target, and a land coverage classification tag image data set; preprocessing Source and Target images; calculating an index feature set based on spectral features of the ground surface reflectivity data set, wherein the index feature set comprises a normalized vegetation index (NDVI), a normalized water index (NDWI), a normalized building index (NDBI) and a Differential Vegetation Index (DVI), and the first main component and the second main component are obtained through main component analysis (PCA) to construct a texture feature set;
12 A process of adaptively migrating training samples to generate remote sensing long-time sequence data and using a model to conduct land cover classification mapping: selecting reference years of a source domain, selecting adjacent years as a target domain, randomly and uniformly selecting sample points, and extracting a reference spectrum corresponding to each training sample pixel; for images with target fields containing given training sample pixel positions, selecting an image closest to the source field time as a target image, and taking a spectrum of a sample in the target image as a target spectrum; calculating a Spectrum Angle Difference (SAD) and a Euclidean Distance (ED) between a reference spectrum and a target spectrum, and comparing the measured value with a threshold value determined by a particle swarm optimization algorithm to obtain an unchanged sample as a sample capable of migrating to a target domain; performing fine adjustment on the trained model in the source domain according to the selected feature set, and performing land coverage classification prediction and drawing for the target domain;
13 A process of land coverage evolution time-space diagram analysis based on remote sensing long-time sequence data: constructing a space-time diagram according to a time-series land cover classification map to connect each Land Cover Object (LCO) in the spatial and temporal domains, including constructing a spatial adjacency diagram and connecting LCOs by a time connection rule; calculating and distributing the attribute of each node and each edge, and deducing the structural characteristics from the space-time topology of the graph; the spatial and temporal dynamics of LCO are described on the graph level, each node and each edge are summarized, a time-space graph of a long time sequence is constructed, and the details and the structure of the land cover evolution are analyzed.
The preparation work of the remote sensing image land coverage evolution time-space diagram analysis method of the self-adaptive migration training sample comprises the following steps:
21 Acquiring ground surface reflectivity data of a Source domain remote sensing satellite image Source and a Target domain remote sensing satellite image Target, and a land coverage classification label image dataset;
22 Performing cloud removal and cloud shadow treatment on all images, performing normalization treatment on source domain label pixel values, performing non-overlapping cutting on all images, and cutting to a fixed size of 256 multiplied by 256 pixels;
23 Based on the spectral characteristics of the ground surface reflectivity dataset, calculating to obtain an index characteristic set, wherein the index characteristic set comprises a normalized vegetation index (NDVI), a normalized water index (NDWI), a normalized building index (NDBI) and a Differential Vegetation Index (DVI), the first main component and the second main component are obtained by carrying out main component analysis (PCA) on an original image, and then a gray level co-occurrence matrix is utilized to extract texture characteristic factors to construct the texture characteristic set.
The specific steps of generating remote sensing long-time sequence data by the self-adaptive migration training sample and using the model to carry out land coverage classification mapping are as follows:
31 The specific steps for generating remote sensing long-time sequence data by the adaptive migration training sample are as follows:
311 Selecting remote sensing satellite types, determining a time span of a long time sequence, selecting reference years as a source domain at intervals of five years, selecting sample points according to a random and uniform principle, and extracting a reference spectrum of a corresponding remote sensing image from each training sample pixel;
312 For a given training sample pixel position, in the image of the target domain, selecting an image close to the source domain time as a target image, taking the spectrum of a corresponding pixel in the target image as a target spectrum, and calculating SAD and ED between the reference spectrum and the target spectrum according to the following formula:
wherein N represents the number of sample points in the pixel, X i(t1) Representing t 1 The value of the ith spectral band in the time reference spectral band, Y i(t2) Representing t 2 The value of the ith spectral band in the target spectral band at the moment;
wherein θ represents an included angle, cos (θ) represents t 1 ,t 2 Angular distance between time sample points;
313 The specific steps of the particle swarm optimization algorithm to determine SAD and ED thresholds are as follows:
3131 Selecting a proper sample of the source domain data set, dividing the sample into four seasons according to the time sequence, and using ED and SAD threshold valuesFor the particle variable (x 1 ,x 2 ) And the threshold value range is [0,1 ]]Initializing the position and speed of particles under the constraint condition, migrating training samples to a target domain under different threshold combinations, inputting sample characteristics into a model trained in a source domain, and calculating land coverage classification prediction accuracy as a target function according to the following formula:
in the formula, accuracy represents land coverage classification prediction Accuracy and N r Indicating the number of sample points for which prediction is correct, N w A number of sample points representing prediction errors;
3132 Setting the maximum iteration Number (max_iter) and the Number of particles (Number), and continuously updating the particle position, the speed and the inertia coefficient before reaching the max_iter according to the following formula to change the particle position to the optimal solution direction:
wherein i represents a particle number, d represents a particle dimension number, k represents the number of iterations, w represents an inertial weight, and c 1 Representing individual learning factors, c 2 Represents a population learning factor, r 1 ,r 2 The random number is represented by a number,a velocity vector representing the d-th dimension of particle i in the kth iteration,/v>A position vector representing the d-th dimension of particle i in the kth iteration,/v>Representing the historical optimal position of particle i in the kth iteration in the d-th dimension,/o>Representing a historical optimal position of the population in a d-th dimension in a k-th iteration;
wherein w is max Represents the maximum inertial weight, w min Representing a minimum inertial weight;
3133 Up to the maximum iteration number, outputting the accurate threshold value of SAD and ED;
314 Comparing the measured value with a threshold value determined by a particle swarm optimization algorithm according toObtaining an unchanged sample as a sample that can migrate to the target domain;
315 Fine-tuning the trained model in the source domain for land coverage classification prediction results and long-time sequence classification mapping of the target domain according to the selected feature set.
The specific steps of the land coverage evolution time-space diagram analysis based on the remote sensing long-time sequence data are as follows:
41 Constructing a graph structure G (V, E) according to a land cover classification graph of a long time sequence, wherein a node V represents LCO, LCO represents a plaque of a specific land cover type, a node is defined as the area, perimeter and land cover classification type of all continuous units of the same land cover type, an edge E represents the space or time interaction between a pair of LCOs and represents a time edge or a space edge, the space edge is a specific space connection between LCOs at a certain time point, and the time edge is a time connection between LCOs which are partially or completely overlapped in the geographic range of the LCOs at adjacent time points;
42 The construction idea according to the space adjacency graph of the long time sequence is specifically as follows:
421 Using adjacency as a spatial connection between LCOs, defining that two LCOs are in contact with each other, i.e. share a part of their boundary, they are spatially connected, becoming neighbors of each other, the spatial map representing all nodes of the LCO and all spatial edges formed by their spatial adjacency at the same time;
422 Nodes of the spatial map represent only LCO, edges in the spatial adjacency map are undirected (i.e., symmetrical or bi-directional), the spatial adjacency map is connected, for any two nodes, there is at least one path from one node to the other, any pair of LCO's can be correlated by spatial adjacency;
43 The construction idea according to the time transition diagram of the long time sequence is specifically as follows:
431 Time connection (transition diagram) describes the change in LCO over time, small changes in LCO at the boundary may create small overlap areas, the overlap is removed according to the following equation:
wherein P is i t And P i t+1 LCO at times t and t +1,represents the overlapping Area, area (P) i t ) Indicating the area size of LCO at time t, TH lap Representing an area ratio threshold, defined as 0.98;
the time connection is directional, from the current moment to the next moment, i.e. from the source to the destination, the LCO at the current moment being the destination of the LCO at the previous moment and also being the source of the LCO at the next moment;
432 Seven types of transition are defined, and the transition types are continued, expanded, contracted, combined, split, created and dissipated according to the following judgment:
continuing:
expansion: area (P) i t )<Area(P i t+1 )(i,j=1)
Shrinkage: area (P) i t )≥Area(P i t+1 )(i,j=1)
Combining:
splitting:
creating:
dissipation:
433 Constructing a time transition diagram: the time series related LCOs are all connected to form an evolution track, the time uncorrelated LCOs cannot be connected by a time edge, each LCO only belongs to a certain evolution track, each evolution track is independent, two created and dissipated transitions do not occur in the form of a time edge, but can be recorded in a graph structure by tracking the origin and the destination of a node, when the LCO at the current moment has no source, the LCO time transition is created at the previous moment, and when the LCO has no target, the LCO time transition is dissipated at the next moment;
44 Constructing a long time series space-time diagram according to the space adjacency diagram and the time transition diagram for analysis: the node meaning is LCO area, perimeter and land coverage classification category, the edge meaning is overlap area, transition type and adjacent boundary length, the constructed structure is converted into role, namely LCO, adjacent type and evolution process; by using long-term continuous observation data, the constructed point-edge graph structure can reveal the space-time correlation and structure description which are ignored in classical image time sequence analysis, and display the spatial adjacency pattern of land cover classification and the consistency and spatial distribution of the dynamic and land cover type conversion characteristics and evolution behaviors of the spatial adjacency pattern.
Advantageous effects
Compared with the prior art, the method for analyzing the remote sensing image land coverage evolution time-space diagram of the self-adaptive migration training sample automatically determines the judgment threshold value by calculating SAD and ED between the source domain spectrum and the target domain spectrum and utilizing a particle swarm optimization algorithm, improves the accuracy of the training sample, solves the uncertainty of manually determining the threshold value, automatically identifies and selects pixels which remain stable in different time periods as the training sample, avoids the complicated process of manually marking and selecting the sample, improves the working efficiency, and enables the model to adapt to the dynamic change of land coverage. In addition, combining the migration samples with long time series, introducing a space-time diagram structure, revealing spatio-temporal correlations and structural descriptions that were ignored in classical image time series analysis, exhibiting spatial adjacency patterns of the earth coverage classification and its dynamic, earth coverage type conversion features and temporal and spatial distributions of evolution behavior, cutting into the course of the earth coverage classification evolution uniquely from the structural level.
Drawings
FIG. 1 is a process sequence diagram of the present invention;
FIG. 2 is a schematic diagram of an adaptive migration sample flow according to the present invention;
FIG. 3 is a schematic diagram of a process for adaptively determining a threshold according to the particle swarm algorithm of the present invention;
FIG. 4 is a schematic diagram of a process of analysis of a space-time diagram of the earth coverage evolution according to the present invention;
FIG. 5 is a schematic illustration of the experimental results of the method according to the present invention.
Detailed Description
For a further understanding and appreciation of the structural features and advantages achieved by the present invention, the following description is provided in connection with the accompanying drawings, which are presently preferred embodiments and are incorporated in the accompanying drawings, in which:
as shown in fig. 1, the remote sensing image land coverage evolution time-space diagram analysis method of the self-adaptive migration training sample comprises the following steps:
firstly, preparing a remote sensing image land coverage evolution time-space diagram analysis method of a self-adaptive migration training sample:
acquiring ground surface reflectivity data of a Source domain remote sensing satellite image Source and a Target domain remote sensing satellite image Target, and a land coverage classification tag image data set; preprocessing Source and Target images; based on the spectral characteristics of the ground surface reflectivity dataset, an index characteristic set is obtained through calculation, wherein the index characteristic set comprises a normalized vegetation index (NDVI), a normalized water index (NDWI), a normalized building index (NDBI) and a Differential Vegetation Index (DVI), and a first principal component and a second principal component are obtained through Principal Component Analysis (PCA), so as to construct the texture characteristic set.
The method comprises the following specific steps:
(1) Acquiring ground surface reflectivity data of a Source domain remote sensing satellite image Source and a Target domain remote sensing satellite image Target, and a land coverage classification tag image data set;
(2) Performing cloud removal and cloud shadow treatment on all images, performing normalization treatment on source domain label pixel values, performing non-overlapping cutting on all images, and cutting to a fixed size of 256 multiplied by 256 pixels;
(3) Based on the spectral characteristics of the ground surface reflectivity dataset, calculating to obtain an index characteristic set, wherein the index characteristic set comprises a normalized vegetation index (NDVI), a normalized water index (NDWI), a normalized building index (NDBI) and a Differential Vegetation Index (DVI), a first main component and a second main component are obtained by carrying out main component analysis (PCA) on an original image, and then a gray level co-occurrence matrix is utilized to extract texture characteristic factors to construct the texture characteristic set.
Secondly, the adaptive migration training sample is used for generating remote sensing long-time sequence data and using a model to carry out the process of land coverage classification mapping:
as shown in fig. 2, selecting a reference year of a source domain, selecting adjacent years as a target domain, randomly and uniformly selecting sample points, and extracting a reference spectrum corresponding to each training sample pixel; for images with target fields containing given training sample pixel positions, selecting an image closest to the source field time as a target image, and taking a spectrum of a sample in the target image as a target spectrum; calculating a Spectral Angular Difference (SAD) and a Euclidean Distance (ED) between a reference spectrum and a target spectrum, and comparing the measured value with a threshold value determined by a particle swarm optimization algorithm shown in FIG. 3 to obtain an unchanged sample as a sample capable of migrating to a target domain; and according to the selected feature set, performing fine adjustment on the model trained in the source domain, and performing land coverage classification prediction and mapping for the target domain.
The method comprises the following specific steps:
(1) The specific steps for generating remote sensing long-time sequence data by self-adaptive migration training samples are as follows:
(1-1) selecting remote sensing satellite types, determining a time span of a long time sequence, selecting reference years as a source domain every five years of time intervals, selecting sample points according to a random and uniform principle, and extracting a reference spectrum of a corresponding remote sensing image from each training sample pixel;
(1-2) for a given training sample pixel position, in the image of the target domain, we choose the image close in time to the source domain as the target image, and take the spectrum of the corresponding pixel in the target image as the target spectrum, and calculate SAD and ED between the reference spectrum and the target spectrum according to the following formula:
where N represents the number of sample points in the pixel,representing t 1 The value of the ith spectral band in the temporal reference spectral band,/->Representing t 2 The value of the ith spectral band in the target spectral band at the moment;
wherein θ represents an included angle, cos (θ) represents t 1 ,t 2 Angular distance between time sample points;
(2) The particle swarm optimization algorithm determines SAD and ED thresholds as follows:
(2-1) selecting a source domain data set, randomly sampling a proper sample, dividing the sample into four seasons according to a time sequence, and measuring ED and SAD threshold valuesFor the particle variable (x 1 ,x 2 ) And the threshold value range is [0,1 ]]Initializing the position and speed of particles under the constraint condition, migrating training samples to a target domain under different threshold combinations, inputting sample characteristics into a model trained in a source domain, and calculating land coverage classification prediction accuracy as a target function according to the following formula:
in the formula, accuracy represents land coverage classification prediction Accuracy and N r Indicating the number of sample points for which prediction is correct, N w A number of sample points representing prediction errors;
(2-2) setting a maximum iteration Number (max_iter) and a Number of particles (Number), and continuously updating the positions, the speeds and the inertia coefficients of the particles before the max_iter is reached, so that the positions of the particles are changed to an optimal solution direction according to the following formula:
wherein i represents a particle number, d represents a particle dimension number, k represents the number of iterations, w represents an inertial weight, and c 1 Representing individual learning factors, c 2 Represents a population learning factor, r 1 ,r 2 The random number is represented by a number,a velocity vector representing the d-th dimension of particle i in the kth iteration,/v>A position vector representing the d-th dimension of particle i in the kth iteration,/v>Representing the historical optimal position of particle i in the kth iteration in the d-th dimension,/o>Representing a historical optimal position of the population in a d-th dimension in a k-th iteration;
wherein w is max Represents the maximum inertial weight, w min Representing a minimum inertial weight;
(2-3) reaching the maximum iteration number, and outputting a threshold value of SAD and ED accuracy;
(3) Comparing the measured value with a threshold value determined by a particle swarm optimization algorithm according toObtaining an unchanged sample as a sample that can migrate to the target domain;
(4) And fine-tuning a trained model in the source domain according to the selected feature set, wherein the model is used for land coverage classification prediction results and long-time sequence classification mapping of the target domain.
Thirdly, a land coverage evolution time-space diagram analysis process based on remote sensing long-time sequence data:
constructing a space-time diagram according to the time-series land cover classification map to connect each Land Cover Object (LCO) in the spatial and temporal domains, as shown in fig. 4, constructing a spatial adjacency diagram and connecting LCO by a time connection rule; calculating and distributing the attribute of each node and each edge, and deducing the structural characteristics from the space-time topology of the graph; the spatial and temporal dynamics of LCO are described on the graph level, each node and each edge are summarized, a time-space graph of a long time sequence is constructed, and the details and the structure of the land cover evolution are analyzed.
The method comprises the following specific steps:
(1) Constructing a graph structure G (V, E) according to a land cover classification graph of a long-time sequence, wherein a node V represents LCO, LCO represents patches of a specific land cover type, a node is defined as the area, perimeter and land cover classification type of all continuous units of the same land cover type, an edge E represents space or time interaction between a pair of LCOs and represents a time edge or a space edge, the space edge is a specific space connection between LCOs at a certain time point, and the time edge is a time connection between LCOs which are partially or completely overlapped at geographic ranges of adjacent time points;
(2) The construction idea according to the space adjacency graph of the long time sequence is specifically as follows:
(2-1) defining two LCOs in contact with each other, i.e., sharing a portion of their boundaries, using adjacency as a spatial connection between the LCOs, which are spatially connected, becoming neighbors of each other, the spatial map representing all nodes of the LCO and all spatial edges formed by their spatial adjacencies at the same time;
(2-2) nodes of the spatial map representing only LCO, edges in the spatial adjacency map being undirected (i.e., symmetrical or bi-directional), the spatial adjacency map being connected, for any two nodes, at least one path from one node to the other, any pair of LCO's being associable by spatial adjacency;
(3) The construction idea according to the time transition diagram of the long time sequence is specifically as follows:
(3-1) time connection (transition diagram) describes the change in LCO over time, small changes in LCO at the boundary may create small overlap regions, the overlap is removed according to the following equation:
wherein P is i t And P i t+1 LCO at times t and t +1,represents the overlapping Area, area (P) i t ) Indicating the area size of LCO at time t, TH lap Representing an area ratio threshold, defined as 0.98;
the time connection is directional, from the current moment to the next moment, i.e. from the source to the destination, the LCO at the current moment being the destination of the LCO at the previous moment and also being the source of the LCO at the next moment;
(3-2) defining transition types in seven ways, namely continuing, expanding, contracting, merging, splitting, creating and dissipating, and judging according to the following steps:
continuing:
expansion: area (P) i t )<Area(P i t+1 )(i,j=1)
Shrinkage: area (P) i t )≥Area(P i t+1 )(i,j=1)
Combining:
splitting:
creating:
dissipation:
(3-3) constructing a time transition diagram: the time series related LCOs are all connected to form an evolution track, the time uncorrelated LCOs cannot be connected by a time edge, each LCO only belongs to a certain evolution track, each evolution track is independent, two created and dissipated transitions do not occur in the form of a time edge, but can be recorded in a graph structure by tracking the origin and the destination of a node, when the LCO at the current moment has no source, the LCO time transition is created at the previous moment, and when the LCO has no target, the LCO time transition is dissipated at the next moment;
(4) Constructing a long-time sequence space-time diagram according to the space adjacency diagram and the time transition diagram for analysis: the node meaning is LCO area, perimeter and land coverage classification category, the edge meaning is overlap area, transition type and adjacent boundary length, the constructed structure is converted into role, namely LCO, adjacent type and evolution process; by using long-term continuous observation data, the constructed point-edge graph structure can reveal the space-time correlation and structure description which are ignored in classical image time sequence analysis, and display the spatial adjacency pattern of land cover classification and the consistency and spatial distribution of the dynamic and land cover type conversion characteristics and evolution behaviors of the spatial adjacency pattern.
The method of the present invention is described below by way of example with respect to a first full season sample set (FAST), a remote sensing satellite Landsat-8 dataset, and a National Land Cover Database (NLCD) dataset:
the FAST data set is taken as a source domain data set, the reference time of the data set is 2015, the FAST contains 343237 training sample units and 144334 verification sample units in total, a center pixel is extracted from the training sample units to serve as a training sample pixel, and the resolution is 30 m. The Landsat-8 data set is taken as a target domain data set, 11 wave bands are total, the spatial resolution of 1-7 wave bands and 9-11 wave bands is 30m, the 8 th wave band is a full-color wave band of 15m, and the condition without data is screened in the image selection process. The label dataset is NLCD, the label is a high quality pixel-level label, and 20 categories are all: open waters, perennial ice and snow, developed open spaces, developed low strength, developed medium strength, developed high strength, barren lands (rock/sand/clay), deciduous forests, evergreen forests, hybrid forests, bushes, shrubs, grasslands, hedges/herbs, mosses, pastures, cultivated crops, woody wetlands, emerging herbaceous wetlands. In order to realize self-adaptive training sample migration, the FA determines a reasonable threshold for judging spectrum change, 100 sample pixels are randomly selected as each land coverage type in each source domain and target domain, ED and SAD are calculated, 9 groups of training sample pixels with unchanged target domain (target year) in 2019 are obtained from different threshold sets, the classification accuracy is compared, the optimal threshold combination is further selected through a particle swarm optimization algorithm, uncertainty of manually determining the threshold is avoided to a certain extent, and the sample accuracy is improved. Finally, constructing a land cover classification evolution process on a long-time sequence, wherein a space-time diagram is combined by a space adjacency diagram and a time transition diagram, reveals space-time correlation and structure description which are ignored in classical image time sequence analysis, and shows the space adjacency pattern of the land cover classification and dynamic and land cover type conversion characteristics and evolution behaviors thereof. As shown in fig. 5, a local area land cover classification map of san francisco in the united states is selected in combination with time series analysis, and the land cover classification map between 2016-2019 is expanded by an adaptive migration learning sample, and a space adjacency map and a time transition map are constructed.
The foregoing has shown and described the basic principles, principal features and advantages of the invention. It will be understood by those skilled in the art that the present invention is not limited to the embodiments described above, and that the above embodiments and descriptions are merely illustrative of the principles of the present invention, and various changes and modifications may be made therein without departing from the spirit and scope of the invention, which is defined by the appended claims. The scope of the invention is defined by the appended claims and equivalents thereof.