1. Introduction
Landslides are a world-wide occurring natural hazard leading to severe loss of life and infrastructure. A global tendency towards steadily increasing landslide risk can be observed, because of the spreading of settlements in unfavorable regions and the consequences of climate change [
1,
2]. Against this background, improved understanding of landslide processes in space and time is of great importance, requiring multi-temporal landslide inventories [
3–
5]. So far, they have been largely missing for most parts of the world, because of their time and labor intense preparation using conventional mapping methods [
5–
7]. In this context, the increasing availability of optical satellite remote sensing data has opened up new opportunities for spatiotemporal analysis of landslide occurrence covering large areas [
5,
8–
10].
The completeness and quality of remote sensing-based landslide inventories depend on the used multi-temporal image database, whereas a high temporal repetition rate over the longest possible time period of data availability is required in order to perform longer term analysis of landslide occurrence, which is necessary for objective landslide hazard assessment [
3–
5]. For this purpose, the global Landsat archive is of key importance, providing free access to the longest available time series of medium-resolution optical satellite remote sensing data [
11]. However, in order to achieve the best possible temporal data coverage, multi-sensor data have to be used, resulting in a heterogeneous database of varying spatial and temporal resolution.
Despite this variability, precise image-to-image co-registration has to be ensured for all multi-temporal and multi-sensor datasets, because insufficient spatial fit leads to various ambiguities, resulting in the detection of artifact changes [
12,
13], as well as incorrect spatial delineation of landslides. The creation of longer term inventories requires maintaining the geometric stability of the image database over several decades, taking into account seasonal and inter-annual landscape changes. Furthermore, the resulting multi-temporal remote sensing database has to be of sufficient absolute positional accuracy related to an external spatial reference system, allowing the combination of information derived from remote sensing analysis with other spatial data, such as GPS-based field measurements within a GIS environment in order to perform subsequent process and hazard analysis.
The overall goal of the presented study has been the development and application of a methodology for automated image-to-image co-registration in order to create an image database that is suitable for longer term automated landslide detection within a 12,000 km
2 study area in Southern Kyrgyzstan (Central Asia) strongly affected by landslides [
8]. The original image database for this area comprises almost 600 datasets acquired by the multispectral Landsat-(E)TM, SPOT, ASTER and RapidEye satellite systems during the last 26 years. Most of these images were obtained in the form of orthorectified standard data products from the respective satellite data providers. Initial evaluation of the relative spatial fit between these higher-level data products has revealed that significant spatial offsets occur between most of them, including data acquired by the same sensor.
Against this background, the objective has been the development of a co-registration methodology that is suitable to correct for the spatial offsets between large amounts of orthorectified standard data products comprising longer term multi-sensor time series. Thus, the approach has to be able to handle various multi-sensor effects, such as differences between the spatial, spectral and radiometric properties of the image data, as well as multi-temporal effects, such as varying atmospheric, solar and land cover conditions, resulting from seasonal and long-term variability between the image datasets [
14–
16]. Despite the large number of existing methods for automated co-registration, which are comprehensively discussed in Le Moigne
et al., 2011 [
14], Dawn
et al., 2010 [
17] and Zitova and Flusser, 2003 [
18], only a few of these methods are capable of dealing with multi-sensor and multi-temporal effects at the same time.
In general, the existing co-registration methods are classified into two main categories comprising feature-based and area-based techniques [
18]. For accommodating multi-sensor effects during co-registration, feature-based techniques, such as scale-invariant feature transform (SIFT) [
19] and speeded-up robust features (SURF) [
20], are considered to be more suitable, because these techniques use salient features, such as edges, corners, intersections of linear structures and centroids of distinct geometric objects. These features are expected to be geometrically stable despite the sensor-related variability of the image data [
21–
24]. However, in rural mountainous areas, like Southern Kyrgyzstan, such distinct time-invariant features are often scarce and unevenly distributed, which largely increases the likelihood for significant co-registration errors [
21,
25]. For such environments, area-based methods are considered to be more suitable, because co-registration is based on identifying distinctive properties for image matching using intensity information rather than local features [
21,
25]. Hence, area-based methods aim at identifying image areas that are similar in intensity, whereas the commonly used similarity measures are cross-correlation and sequential similarity detection [
18,
26].
Independent of the used co-registration method, most of the already existing approaches have not been developed for fully automated and efficient processing of big amounts of multi-sensor and multi-temporal image data covering large areas over longer periods of time. Therefore, the practical usability of these methods is often limited, because of the high methodological complexity, the big computational effort, as well as additional requirements specific to the analyzed datasets [
15,
17,
22]. The presented study aims at the development of a robust and globally applicable methodology for automated co-registration, which is suitable for efficient correction of spatial offsets between orthorectified standard data products representing multi-sensor time series.
In this context, a spatially and temporally consistent spatial reference system is required, allowing spatial alignment of all datasets with sufficient relative and absolute accuracy. For this purpose, globally available Landsat Level 1T time series data have been selected as a common spatial reference. They are characterized by sub-pixel image-to-image co-registration accuracy throughout the whole time series [
27–
29], whereas the absolute accuracy of the global Landsat Level 1T database has been estimated to 15 m [
27]. Both accuracies are considered to be sufficient for landslide detection at a regional scale. Moreover, Landsat data represent the only source of spatial reference information consistently and repeatedly covering the whole study area, allowing consistent spatial alignment of all time-series datasets, which, in part, are irregularly and patchily distributed over the large study area.
The developed co-registration approach is described in Section 3. The results of spatial alignment are presented in Section 4, comprising sensor-specific analysis for the complete database. In Section 5, the relative and absolute accuracy of the achieved co-registration is analyzed for the whole database, including its influence on the multi-temporal delineation of landslides. The developed methodology is comprehensively discussed in Section 6, focusing on achievable accuracy and overall applicability.
3. Co-Registration of Multi-Temporal and Multi-Sensor Optical Satellite Data
3.1. Overall Approach
Image-to-image co-registration aims at the spatial alignment of the whole database (Section 2.2) to a common spatial reference represented by the Landsat Level 1T data (Section 2.3.1). The developed co-registration approach (Section 3.2) is based on the assumption that the orthorectified standard data products of the various sensors only differ by constant spatial offsets, which can be corrected by applying image-specific shifts. Checking the fulfillment of this condition for each dataset is an integral part of the co-registration approach, which is depicted in its overall structure in
Figure 4.
In order to accommodate the needs originating from the diversity of datasets contained in the comprehensive satellite remote sensing database, two modes have been implemented. The first one enables co-registration of single datasets to the Landsat reference representing the standard case. This mode gets applied if datasets of the same sensor have very small or non-existing spatial overlaps and, thus, cannot be reliably co-registered to each other before aligning them to the Landsat reference. The second mode is applied to data stacks of high temporal and spatial resolution acquired by the same sensor. This two-step mode starts with sensor-internal co-registration of the datasets before they are co-registered en bloc to the Landsat reference using the average of all shifts calculated for the single images of the sensor-internal data stack by the same procedure as in Mode 1. This way, high-accuracy spatial fit between datasets of the same sensor can be achieved or maintained avoiding the uncertainties that get introduced by the co-registration of individual images to a spatial reference of a largely differing spatial resolution. In this study, the second mode has been applied to the RapidEye data, since they are characterized by much higher temporal and spatial resolutions than the Landsat reference and the other medium-resolution satellite remote sensing data, which are co-registered using the first mode.
3.2. Co-Registration to Landsat Reference
Co-registration builds on area-based cross-correlation [
17,
18,
26] that requires the same spatial resolution for the reference image and the warp image. Thus, spatial resampling is a critical step for the performance of the correlation process. In order to get comparable results, downsampling the higher resolution dataset to the lower resolution one is applied, since upsampling of the lower resolution image to the higher resolution one does not allow the reconstruction of spectral details, which are only present in the higher resolution image and, thus, not suitable for correlation purposes. In contrast, downsampling enables the simulation of the spectral signatures of lower resolution data by mixing the spectral information of the higher resolution image. In this study, the warp images of higher spatial resolution are resampled to realistic Landsat pixels by applying a Gaussian filter kernel, which takes into account the spatial resolution of both sensors [
34]. The used approach defines the full width at half maximum (FWHM) of the Gaussian kernel as the ratio between the pixel size of the Landsat reference and the pixel size of the warp image. Once both images have the same spatial resolution, the warp image is shifted to the spatial grid of the Landsat reference as a basis for the following correlation.
Using the cross-correlation method, the warp image is co-registered to the reference image by correlating the intensity values within corresponding subsets of the images defined by a moving window. The subset of maximum correlation corresponds to the displacement that is stored in a tie point. In the presented approach, the red and the near infrared (NIR) bands of the input images are used simultaneously, providing a combined overall correlation value. The selection of these bands is performed in an automated way, as long as respective wavelength information is contained in the header files of the warp and reference images. Since the combination of these two bands reacts very sensitively to changes in vegetation cover, high correlation values can only be obtained for temporarily stable vegetated and non-vegetated areas. Moreover, for the correlation window, a relatively large size (51 × 51 pixels) is selected in order to minimize local ambiguities and further increase the robustness of the approach.
The tie point generation process iteratively selects random pixel positions for centering the correlation windows. The correlation coefficient is calculated for each pixel position within a predefined search range, which, by default, is constrained to five pixels, making the approach more robust and computationally less intense. However, this range can be changed depending on the expected offset. If a correlation coefficient is higher than the empirically determined threshold of 0.93, the offset value is stored as the tie point. This process is repeated, until 100 tie points are identified per image pair or 10% of all image pixels have been checked.
In order to validate the identified tie points and to exclude potential outliers, an affine transformation (translation, rotation, scaling) between both images is assumed, because global translation cannot be introduced as an a priori hypothesis. The biggest outliers in regard to the affine model are excluded in an iterative process, until the RMSE is less than one pixel. In the next step, the obtained optimized affine model allows for validating the initial assumption of a global translation. If the scaling or rotation factors of the affine model are negligibly small, i.e., additional offsets at the image corners are less than 1.5 pixels, the global translation transformation is used for co-registration.
If the validation process fails (e.g., due to an unfavorable tie point constellation) or less than 10 tie points per scene remain after the removal of outliers, one of the two other Landsat reference mosaics (Section 2.3) is selected, starting with the one that is seasonally closer to the warp image acquisition date (
Figure 4). If none of the reference images meets these quality criteria, the warp image is excluded from the automated processing chain and has to be checked by the user with the option of interactively choosing an affine transformation for image wrapping based on the selected tie points.
In the case of a successful correlation, a final sub-pixel optimization is performed for images of significantly higher spatial resolution than the Landsat reference. In the initial simulation of the 30-m warp image, only one possible centering of the Gaussian filter kernel has been used for resampling to the 30-m resolution warp image. However, in principle, the number of possibilities amounts to the number of original warp image pixels fitting within a single reference pixel, e.g., a 10-m resolution SPOT image results in nine different possibilities for centering the Gaussian filter kernel. Hence, the Gaussian filter kernel is moved in single pixel steps over the original warp image grid around the first centering position of the initial resampling step in order to derive all spatial variations of the 30-m warp image. Then, the correlation process is repeated for all of the resampled 30-m warp images at the position of the already identified tie points. The image characterized by maximized overall correlation represents the sub-pixel optimized co-registered warp image at 30-m resolution.
The final shift comprises the sum of the shifts used to align the warp image grid to the Landsat grid, the 30-m pixel shift resulting from the initial correlation and the original resolution pixel shift originating from the sub-pixel optimization. In the last step, this shift is used to co-register the warp image using a global translation. As a result, two co-registered warp images are produced: one in the spatial resolution of the Landsat reference (the best correlation result) and one in the spatial resolution of the original image data. In the case of the original resolution warp image, the shift is used to update the coordinate reference point, and thus, the image is corrected without any resampling, which maintains the original spectral information of the image after co-registration. Both images are aligned to the Landsat reference grid. The simulated 30-m warp image has exactly the same spatial grid as the reference, whereas in the case of the original warp image, the upper left coordinate is aligned to the reference grid. The achievable accuracy of the approach is determined by the spatial resolution of the original data. If the original datasets have the same spatial resolution as the reference, the steps for simulating the reference resolution and sub-pixel optimization are omitted.
3.3. Sensor-Internal Co-Registration
In Mode 2 of the developed approach, sensor-internal co-registration is performed as the first step before the whole data stack is co-registered en bloc to the Landsat reference. For this purpose, a single dataset is selected from the data stack representing the sensor-internal spatial reference. All of the remaining images of the data stack are co-registered to this reference using the image-to-image area-based correlation algorithm implemented in the first mode without performing the simulation of 30-m data and the following sub-pixel optimization. If a dataset cannot be co-registered (due to a failed validation process or less than 10 identified tie points), it is iteratively correlated with already co-registered images of former iterations, until a good co-registration result can be achieved (
Figure 4). This iterative approach allows the co-registration of seasonally differing datasets, resulting in a sensor-internal geometrically-consistent data stack, which is then co-registered en bloc to the Landsat reference. For this purpose, the procedure of Mode 1 is applied to each of the datasets contained in the stack in order to determine the average values for all shifts, which then are used for the en bloc co-registration of the whole data stack.
6. Discussion
6.1. Applicability of Approach
Validation as part of the co-registration approach has revealed that the developed procedure of image-to-image co-registration using image-specific global shifts in the X and Y directions could be applied to all of the 592 datasets contained in the database, showing the high internal geometric stability of the orthorectified standard data products. The application of the global shift method results in the preservation of the original spectral properties of the standard data products, since there is no need for performing another resampling step.
Automated spatial alignment has mostly resulted in shifts of several tens of meters, whereas maximum offsets have been obtained in the case of SPOT-1, amounting to more than 400 m. These results show that the developed approach is capable of handling a wide range of offsets occurring in images of various spatial resolutions ranging between 5 m for RapidEye and 30 m for Landsat data. The successful application of the approach to all datasets also proves its robustness against the variability of image data caused by different multi-sensor and multi-temporal effects, which have the potential for impeding the applicability of co-registration, as well as reducing the achievable accuracy [
14–
16].
Sensor-specific analysis of the applied shifts (Section 4) allows for evaluating the sensor-internal spatial fit between standard data products generated by external providers. In the case of Landsat data, no integer pixel shifts have been applied (Section 4.1), confirming the sub-pixel image-to-image accuracy stated in the literature [
27–
29]. For ASTER and SPOT-5 data, the standard deviations of the applied shifts are less than the respective pixel sizes, and the largest spatial offsets amount to approximately three pixels of the original resolutions (Section 4.2). Sensor-internal RapidEye co-registration (
Figure 6) has revealed a steadily improving spatial fit between the datasets since the start of operational data acquisition in 2009, cumulating in offsets of one pixel or less for most of the images acquired in 2011 and 2012. These results are in accordance with the findings of a study assessing the geometric accuracy of the RapidEye constellation [
32].
6.2. Accuracy Assessment
Assessment of the relative image-to-image co-registration accuracy based on time-invariant check points (CPs) has resulted in an overall accuracy of 17 m (RMSE) and the maximum remaining offsets to the Landsat reference amounting to 20 m (Section 5.1). Taking into account the 30-m resolution of the Landsat reference, these results indicate the sub-pixel relative accuracy of the whole multi-sensor database. Sensor-specific analysis of the achieved relative accuracy shows high sensor-internal spatial fit for the SPOT-5 and RapidEye datasets, which exceeds the accuracy obtained in relation to the Landsat reference. In the case of RapidEye, the results show that implementation of Mode 2 allows for generating (Step 1) and maintaining (Step 2) high image-to-image accuracies within sensor-internal data stacks during multi-sensor co-registration.
Moreover, it is noticeable that after co-registration, the majority of images still show a small northwestern shift in regard to the Landsat reference (
Figure 8b), implying a systematic offset, which is of a significantly lesser amount than one Landsat pixel (mean values of Δx
IM: −8 m, Δy
IM: 12 m). Such a bias could not be observed for the co-registration of the Landsat time series (Section 4.1) and also not for sensor-internal RapidEye co-registration using Step 1 of Mode 2 (
Figure 6). Therefore, it is assumed that this small bias is caused by a systematic offset between the Landsat reference and the CPs digitized in panchromatic SPOT-5 datasets, which have been manually co-registered to the Landsat reference (Section 2.3.2). In this case, the detected bias most likely originates from the manual co-registration step and, thus, is not a result of the co-registration approach itself. Direct determination of this offset has not been possible, because of the big difference in spatial resolution between the Landsat reference (30 m) and the panchromatic SPOT-5 datasets (2.5 m and 5 m).
Overall, the achieved relative image-to-image accuracies are comparable or, in parts, even better than the accuracies obtained by other studies dealing with the co-registration of optical time series data. The approach proposed by Gianinetto for automatic co-registration of Level 1A ASTER time series data [
16] has resulted in RMSE values of less than two pixels. Liu and Chen [
35] have co-registered multi-temporal Formosat-2 Level 1A images (8-m resolution), achieving a RMSE of approximately 1.5 pixels in flat terrain and 2.2 pixels in mountainous areas. Barazetti
et al. [
36] automatically co-registered 13 Landsat TM datasets acquired over a 30-year period mainly by correcting sub-pixel translation errors, resulting in a relative accuracy of sub-pixel RMSE values. Although, accuracy requirements for co-registration depend on the targets and methods used for change detection [
12,
37], in general, accuracies (RMSE) of less than one pixel are considered suitable for subsequent change detection [
38].
The absolute accuracy of the whole co-registered database, which has been assessed based on DGPS point measurements (Section 2.3.3), amounts to an RMSE of 23 m and a maximum position error of 29 m, whereas a clear systematic error of 22 m in the western and 5 m in the northern direction could be observed. These results indicate that the Landsat reference is systematically offset to the high accuracy DGPS points. This assumption is further supported by the low standard deviation of the derived absolute offsets (X: 3 m, Y: 5 m), implying a high relative accuracy between images, which is of primary importance for subsequent multi-temporal landslide detection.
Due to the availability of the field measured DGPS points and the high geometric stability of the multi-temporal Landsat reference, it is possible to correct for this systematic offset by applying a constant spatial shift. This procedure has resulted in remaining maximum absolute errors of approximately 6 m in the X and 10 m in the Y direction (Section 5.2). However, even the uncorrected absolute offsets are considered to be negligibly small, taking into account the near global availability of the Landsat reference, allowing for the world-wide application of the developed approach without requiring any ground control information. This is especially the case for large area analysis, such as landslide detection at a regional scale.
6.3. Accuracy of Multi-Temporal Landslide Delineation
Multi-temporal digitization of three stationary landslides within all datasets covering the landslides before and after co-registration (Section 5.3) has revealed a significant improvement in the relative spatial fit of landslide delineation, reducing the maximum offset from 59 m before co-registration to 9 m after co-registration (
Figure 10). These relative accuracies correspond to the maximum absolute offsets, which can be derived for the whole database after correcting for the systematic error introduced by the Landsat reference. These findings indicate that the relative accuracy improvements, which can be observed for all three landslides after co-registration are valid for the whole study area. The remaining relative uncertainty of about 10 m forms a suitable basis for reliable multi-temporal landslide detection, as well as the identification of changes within already existing landslides.
Besides the discussed relative accuracy of landslide delineation, sufficient absolute accuracy is also important for the integration of information derived from remote sensing analysis into a GIS environment for further landslide hazard and risk analysis. Comparing the achieved absolute accuracy of 23 m (RMSE) with the United States National Map Accuracy Standards [
39] has shown that the approach meets the requirements for a mapping scale of 1:50,000 and smaller, which is suitable for landslide analysis at a regional scale.
6.4. Methodological Aspects
It could be shown that the developed co-registration approach is suitable for the efficient spatial alignment of a large database containing numerous multi-temporal and multi-sensor standard data products. Incorporation of three seasonally differing Landsat reference datasets has allowed for successful matching of images characterized by high multi-temporal variability. The implementation of a special resampling procedure [
34] transforming the spatial resolution of the warp image to the one of the Landsat reference enables the application of area-based cross-correlation to images of varying spatial resolution acquired by different optical sensors.
The methodological constraints of the developed co-registration approach are related to the applied area-based cross-correlation, which is restricted to matching images of only slight affine distortions [
18], as well as by the implemented geometric transformation only supporting co-registration based on image-specific two-dimensional shifts. These constraints are attributed to the goal of developing a robust and efficient co-registration approach that can be applied in a fully automatic way to a large number of higher level standard data products, which, in general, are characterized by high internal geometric stability. This assumption could be confirmed for all of the 592 analyzed datasets by an initial validation procedure as part of the co-registration approach. The automatically detected tie points forming the basis for calculating the two-dimensional shifts could also be used in the frame of higher-degree transformation methods, which would allow for correcting more complex local distortions. However, it also needs to be taken into account that higher-degree transformations tend to produce local errors, depending on the spatial distribution and the number of tie points, which has to be larger in order to solve these transformation functions. Furthermore, the use of the two-dimensional shift transformation is more robust against localization errors related to the detected tie points.
In order to preserve the spectral information of the original image datasets for subsequent spectral image analysis, the co-registration approach aligns the original images to the Landsat pixel grid without any further resampling. Thus, the achievable accuracy of the approach is determined by the spatial resolution of the original warp image, allowing for sub-pixel accuracy related to the spatial resolution of the Landsat reference (e.g., 0.16 pixels for 5-m RapidEye data). The implementation of sub-pixel image matching techniques would result in the need for resampling the spectral information of the original warp images and lead to much longer processing times, which would impede the efficient usability of the approach for larger amounts of data.
Compared to other approaches, such as AROP [
15], TARA [
40] and COSI-CORR [
41] aiming at the precise correction of complex geometric distortions, the developed co-registration approach represents a less sophisticated, yet robust and efficient, methodology, which can be applied in a fully automated way to large amounts of multi-sensor time series data, resulting in high relative and absolute accuracy.
7. Conclusions and Outlook
In this study, a new methodology for fully automated co-registration of optical satellite remote sensing data has been developed, allowing for the efficient and robust spatial alignment of big amounts of orthorectified standard data products acquired during the last 26 years for Southern Kyrgyzstan. The co-registration approach is capable of accommodating high image data variability resulting from varying spatial resolutions, as well as seasonal and inter-annual land cover variability. Applying co-registration to the whole database of 592 datasets from five different sensors has resulted in image-specific shifts ranging between 5 m and more than 400 m, showing the robustness of the approach and its suitability for the evaluation of relative spatial fit between standard data products. Moreover, spatial alignment is performed without any further resampling of the initial datasets, maintaining their original spectral information, which is advantageous for subsequent automated image analysis.
Due to the use of freely and globally available Landsat Level 1T data as the spatial reference, the developed methodology is independent of local geometric reference information and can be used in any part of the world covered by suitable Landsat Level 1T data. In this context, the launch of the Landsat-8 Operational Land Imager (OLI) on 11 February 2013, as well as the future Sentinel-2 mission [
42] will ensure its future applicability. The overall relative accuracy of 17 m, as well as the absolute accuracy of 23 m (RMSE) represent sub-pixel accuracy in regard to the 30-m resolution of the Landsat reference. These achieved accuracies make the co-registered database suitable for subsequent multi-temporal change detection and for combination with other spatial data within a GIS environment.
The analysis of co-registration accuracy in relation to multi-temporal landslide delineation has revealed maximum relative spatial offsets of six to 9 m between the otherwise unchanged landslides within the multi-temporal database. These offsets correspond to the minimal size of detectable landslide-related changes. However, since this size is also determined by the coarsest resolution of the used datasets, amounting to 30 m, only changes with an extent of more than 900 m2 can reliably be detected. This is more than sufficient for a region dominated by medium-sized to large failures, such as Southern Kyrgyzstan. Achievable relative image-to-image accuracies of the developed co-registration approach could be further improved by using only higher resolution data (e.g., SPOT-5: 10 m and RapidEye: 5 m). Hence, it would be possible to reliably analyze even smaller changes mostly related to the reactivation of already existing landslides.
Altogether, these findings show that the developed methodology is suitable for robust and efficient co-registration of multi-sensor standard orthorectified data products acquired during longer periods of time. The resulting co-registered datasets of high and medium spatial resolution allow for automated landslide detection at a regional scale. Thus, they have the potential for being used for long-term spatiotemporal analysis, as well as for the monitoring of ongoing landslide activity, both contributing to more complete landslide inventories. However, the developed approach cannot only be used for database generation for landslide detection, but also for spatial alignment of any suitable satellite remote sensing time series data in order to perform subsequent analysis of long-term land cover changes in many parts of the world. This way, the developed co-registration methodology supports remote sensing-based analysis of Earth surface processes, which is important for many applied tasks, such as hazard assessment, environmental monitoring and land-use management.