Abstract
Long-term, daily, and gap-free Normalized Difference Vegetation Index (NDVI) is of great significance for a better Earth system observation. However, gaps and contamination are quite severe in current daily NDVI datasets. This study developed a daily 0.05° gap-free NDVI dataset from 1981–2023 in China by combining valid data identification and spatiotemporal sequence gap-filling techniques based on the National Oceanic and Atmospheric Administration daily NDVI dataset. The generated NDVI in more than 99.91% of the study area showed an absolute percent bias (|PB|) smaller than 1% compared with the original valid data, with an overall R2 and root mean square error (RMSE) of 0.79 and 0.05, respectively. PB and RMSE between our dataset and the MODIS daily gap-filled NDVI dataset (MCD19A3CMG) during 2000 to 2023 are 7.54% and 0.1, respectively. PB between our dataset and three monthly NDVI datasets (i.e., GIMMS3g, MODIS MOD13C2, and SPOT/PROBA) are only −5.79%, 4.82%, and 2.66%, respectively. To the best of our knowledge, this is the first long-term daily gap-free NDVI in China by far.
Similar content being viewed by others
Background & Summary
Vegetation indices (VIs) are pivotal tools for assessing vegetation conditions. Among all VIs, Normalized Difference Vegetation Index (NDVI) is one of the most popular indices due to its high reliability and easy accessibility1. This index has been widely used in many fields, such as identification of vegetation phenology2, assessment of biomass production3, estimation of crop yield4, evaluation of ecosystem stability5, simulation of vegetation carbon sequestration6, and fields related to ecosystem status and functions7,8,9,10,11,12,13,14,15. In addition, many new VIs use NDVI as the basic data to optimize the estimation of some key parameters of the terrestrial biosphere (e.g., leaf area index, gross primary productivity, and sun-induced chlorophyll fluorescence) through nonlinear generalization of NDVI, such as the near-infrared reflectance of terrestrial vegetation (NIRV)16 and the kernel NDVI (kNDVI)17. Therefore, obtaining long term, high temporal resolution (e.g., daily scale), and high accurate NDVI is of great significance for better monitoring the multiple properties of vegetation.
To the best of our knowledge, there are currently two datasets offering daily continuous NDVI: one is the Moderate Resolution Imaging Spectroradiometer (MODIS) MCD19A3CMG v6.1, and the other is the National Oceanic and Atmospheric Administration (NOAA) Climate Data Record (CDR) program. MCD19A3CMG is a global Terra + Aqua combined MODIS Version 6.1 Multi-Angle Implementation of Atmospheric Correction (MAIAC) data dataset, which provides a global daily gap-free NDVI dataset at 0.05° spatial resolution in a global Climate Modeling Grid (CMG) from February 24, 2000 to the present18. Apart from spatial gaps in the NDVI maps on the first few days, the MCD19A3CMG v6.1 NDVI maps are generally complete at other time points. However, its limitation is that data is only available after February 24, 2000. The NOAA CDR program generated a long-term global gridded daily NDVI dataset spanning from 1981 to the present by integrating AVHRR daily surface reflectance from eight NOAA polar orbiting satellites (NOAA-7, −9, −11, −14, −16, −17, −18, and −19), VIIRS daily surface reflectance from the Suomi National Polar-Orbiting Partnership (SNPP) satellite platform and NASA-NOAA Joint Polar Satellite System (JPSS-1 or NOAA-20), and other ancillary data (e.g., digital elevation model, surface pressure, precipitable water, water vapor, and ozone) at 0.05° spatial resolution19,20,21. However, the NOAA CDR NDVI dataset did not fully address the issue of spatial gaps caused by the interference of noise sources. Gaps and noise data are still severe in the NOAA CDR NDVI dataset.
Certainly, a series of methods have been developed to rectify, fill, and reconstruct the contaminated remote sensing NDVI data22. These methods can be broadly categorized into three categories: temporal-based methods (i.e., temporal interpolation-replacement methods, temporal filtering models, temporal function-fitting models, and temporal deep learning models), frequency-based methods, and hybrid methods. These methods have their respective advantages and disadvantages, as well as assumptions and application scopes23. Temporal interpolation-replacement methods may yield subpar outcomes when confronted with extensive data gaps due to their sensitivity to data quality and sample size24. Temporal filtering models are mainly used to smooth NDVI curves, relying on the sizes of sliding windows and filtering functions. The size of the sliding window and the filtering function are predicated on prior knowledge25,26,27, and these methods are particularly susceptible to data plunges or surges28. It is hard to obtain reliable parameters required for temporal function-fitting models when there is much noise in the original data29, which poses challenges for the application of these methods on a large scale. Temporal deep learning models can effectively mitigate noise in most cases, except when there are long temporal gaps that may reduce reconstruction accuracy28. Frequency-based methods require a considerable number of parameters, and perform poorly with irregular time series data30. Some models may exhibit suboptimal performance when there are high-frequency variations31. Hybrid methods consider both temporal continuity and spatial similarity for data denoising. These methods assume that the vegetation phenology in different areas of the same vegetation type in the influence window is the same, and there are discontinuous clouds in space. Therefore, adjacent pixels can be used to reduce the contamination of the target pixel. Hybrid methods can better utilize information in both spatial and temporal domains, but their reliability is influenced by window size, accuracy of the vegetation type, and cloud range32. In summary, the aforementioned methods aim to fill gaps and pay little attention to identifying valid data. However, these gap-filling methods require reliable original valid data in terms of both spatial coverage and temporal density. They can hardly reconstruct reliable results if data is missing or unavailable due to cloud or other noise on a large scale28. Therefore, there is an urgent need to develop a method that can identify valid data and fill gaps simultaneously, in order to reconstruct long-term, daily, and gap-free NDVI data.
This study aims to establish a framework to reconstruct daily gap-free NDVI data by combining valid data identification and spatiotemporal sequence gap-filling techniques based on the NOAA CDR NDVI. We applied the aforementioned framework to China and, through a systematic process of validation and assessment, reconstructed continuous spatiotemporal maps of daily NDVI spanning from June 1981 to the May 2023 in China.
Methods
NDVI reconstruction framework
Reconstructing a daily continuously valid NDVI dataset with no gaps requires addressing two issues: the identification of valid data and the filling of invalid data. To tackle these challenges, we conceived a reconstruction framework of daily NDVI that combines a daily-scale valid data identification algorithm and spatiotemporal sequence gap-filling technique (Fig. 1). To minimize the cumulative impact of errors introduced by different data sources and enhance the practicality of the reconstruction framework, our approach relies solely on the original NDVI CDR daily NDVI, without incorporating additional auxiliary data.
We constructed a sliding window as the fundamental processing unit (FPU). NDVI in the early segment (e.g., the first 10 days) of the target month may be closer to the month-end level of the preceding month. On the contrary, NDVI in the later stage (e.g., the last 10 days) of the target month may tend to be more similar to the level at the start of the following month. Meanwhile, NDVI of the middle of the target month may be closer to the maximum value of this month. Therefore, this study sets the FPU to a duration of three months. For each pixel, we hunt out the maximum NDVI (NDVImax) of each month within the FPU and calculated their mean values between the target month and two adjacent months (five numbers in each FPU, here recorded as Vmax). Then, we calculate the percent bias (PB) between daily NDVI of the target month and Vmax in the corresponding FPU. If the absolute PB (|PB|) between the daily NDVI and any of the Vmax do not exceed a specific threshold T1, NDVI in this day can be deemed as potentially valid data, otherwise, it was identified as invalid data. If the pixel lacks data within the target month (e.g., short gap periods during satellite replacement)21 or the |PB| between NDVImax in the target month and mean value of NDVImax in the same month within the adjacent four years exceed a specific threshold T2, data in the target month is identified as invalid data. The threshold T1 is a critical parameter, serving as the key criterion for delineating valid data. If this threshold is too large, there is a risk of misclassifying noise data as valid data. Conversely, if it is too small, it may lead to the omission of some valid data, especially during periods that vegetation grows (e.g., May) or wilts (e.g., October) rapidly. The threshold T2 is used to further filter out the situation that there is no valid data throughout the entire month due to contamination. If this threshold is too large, noise data may be misclassified as valid records. On the contrary, if the threshold is too small, some valid data may be ignored due to substantially interannual variations in NDVI during some periods. We used the GIMMS3g monthly maximum NDVI during 1982 through 2015 to calculate the two thresholds. The national average |PB| of adjacent months for all months is about 20.39% (Supplementary Figure S1a). Hence, we set the threshold T1 as 20%. The national average value of the maximum |PB| of all months is about 39.26% (Supplementary Figure S1b). We set the threshold T2 as 39%. This valid data identification approach fully considers the trend of data changes within the time window, and determines the availability of the data by setting |PB| thresholds. This strategy helps to reduce the impact of possible outliers while retaining sensitivity to vegetation changes. The above identification algorithm can eliminate invalid and missing values in the original NOAA CDR daily NDVI dataset, and provide important baseline data for the next step of data reconstruction.
Considering the asymmetric distribution of the vegetation greenness curves during growth and withering phases33, this study utilized the segmented fitting function for the reconstruction of daily NDVI. For each pixel, based on the extracted daily valid NDVI of each year, we identified the date of the maximum value for that year and determined it as a dividing point to categorize the dates into vegetation growth and withering stages. We stacked the valid NDVI in the last three months of the previous year and valid NDVI in the current year’s growth stage. Similarly, for the withering stage, we combined the valid NDVI in the first three months of the following year and valid NDVI in the current year’s withering stages. These two valid NDVI collections constitute the foundational data segments for reconstructing the daily NDVI during the growth and withering stages of the target year. We used the quartic polynomial fitting function to construct daily NDVI fitting functions for the growth and withering stages of each year at the pixel scale based on the valid NDVI records, respectively. Then, these functions were separately applied for the reconstruction of daily NDVI in these two segments. Finally, a continuous daily gap-free NDVI dataset from 1981 to 2023 was constructed by iterating through all dates for each pixel. Considering the issue of missing data for some pixels in few short periods, this study used the mean value of the reconstructed NDVI in the adjacent four years to fill these gaps.
NOAA CDR NDVI
The NOAA CDR program provides a series of long-term, consistent, and accurate climate datasets used for monitoring and analyzing global climate change. The NOAA CDR daily NDVI is an important remote sensing indicator used to evaluate the status and changes of vegetation. The NOAA CDR daily NDVI maps with a spatial resolution of 0.05° by 0.05° used in this study were acquired from NOAA’s National Centers for Environmental Information, with a time span from 1981 to ten days before the present20. The fundamental data of this dataset includes data obtained by AVHRR sensor that equipped in eight NOAA polar orbiting satellites (NOAA-7, -9, -11, -14, -16, -17, -18, and -19) and VIIRS sensor that mounted on the SNPP and JPSS-1 (or NOAA-20) satellite platforms20,21. A series of ancillary data (e.g., digital elevation model, surface pressure, precipitable water, water vapor, and ozone) are also used in producing the NOAA CDR daily NDVI. This dataset has undergone systematic atmospheric correction, bidirectional reflectance distribution function (BRDF) correction, and cloud/cloud shadow identification. The reflectance spectra and NDVI constructed by the NOAA CDR program were compared and validated against MODIS data, demonstrating high levels of consistency (root mean square deviation, RMSD < 0.03 for red and NIR spectral bands, RMSD < 0.075 for NDVI) between the two datasets. In addition, a comparison with the daily Long Term Data Record (LTDR) observations at 48 global sites in 1999 also revealed the high reliability of this dataset. However, this original dataset only identified partial valid information and did not addressed the spatial gaps. Limitations such as data gaps and noise remain noticeable20,21.
NDVI datasets for comparison
We validated the reconstructed results at different time scales. First, following a comparison with the original NOAA CDR valid NDVI, we further assessed the reliability of our reconstruction results at the daily scale by comparing with the MODIS MCD19A3CMG v6.1 daily continuous gap-filled NDVI18 between February 24, 2000 through May 10, 2023. Second, we conducted comparative validation at the monthly scale with three popular NDVI datasets—the third generation of Global Inventory Modeling and Mapping Studies (GIMMS3g) NDVI34, Systeme Probatoire d’Observation de la Terre (SPOT)/PROBA NDVI35, and MODIS MOD13C2 NDVI36. The GIMMS3g NDVI was constructed based on the daily NOAA AVHRR recordings by optimizing and integrating the original NDVI. This NDVI dataset has a temporal resolution of 15 days and a spatial resolution of 1/12° (approximately 8 km), spanning from 1982 to 201534. The SPOT/PROBA NDVI merged vegetation datasets from SPOT/VEGETATION C3 and PROBA-V, and further adjusted the original data based on the bidirectional reflectance distribution function37,38. This dataset generated a global scale NDVI dataset over a long duration (from 1999 to 2020) with a spatial resolution of 1 km × 1 km and a temporal resolution of 10 days35. The MODIS NDVI dataset (MOD13C2) is obtained from the Terra satellite platform. The MOD13C2 NDVI maps were cloud-free spatial composites that retrieved from daily, atmosphere-corrected, bidirectional surface reflectance and generated at monthly intervals using maximum-value composite (MVC) method. The MOD13C2 NDVI data is projected on a 0.05-degree geographic climate modeling grid. This dataset has been improved by undergoing various calibration changes and showed high accuracy from various airborne and field validation campaigns39. We generated monthly NDVI of the aforementioned three datasets with a spatial resolution of 0.05° based on the MVC algorithm and nearest neighbor assignment method. The time span of the monthly GIMMS3g NDVI is from January 1982 to December 2015, and that of the monthly SPOT/PROBA NDVI is from January 1999 to September 2019. The monthly MOD13C2 NDVI spans from February 2000 to May 2023.
Quality evaluation of the reconstructed NDVI
We adopted several approaches to validate the quality of the reconstructed daily NDVI. First, we quantified the accuracy of the reconstructed NDVI by comparing it with the original valid NOAA CDR NDVI at the pixel scale. Specifically, we calculated the PB, coefficient of determination (R2), and root mean square error (RMSE) of each pixel. We overlapped the above three maps to derive the RGB (Red-Green-Blue) combinations to visualize the spatial patterns of the quality of the reconstructed NDVI. Second, we analyzed the distribution of all pixels in nine geographical zones in a three-dimensional spatial coordinate system composed of |PB|, R2, and RMSE to evaluate the quality of the reconstructed NDVI in each zone. Third, we randomly selected three equidistantly distributed samples in each geographical zone (Supplementary Figure S2a), and quantified the matching degree of the temporal evolution characteristics between the reconstructed NDVI and the original valid NDVI at each sample point. In addition, we constructed a certain number of random points in each geographical zone (Supplementary Figure S2b), and evaluated the matching degree between the reconstructed NDVI and the original valid NDVI on all random points in each zone.
In addition to the aforementioned evaluation strategies, we conducted spatiotemporal consistency comparisons between the reconstructed NDVI and other NDVI datasets (i.e., MODIS MCD19A3CMG v6.1 daily continuous gap-filled NDVI, GIMMS3g monthly NDVI, SPOT/PROBA monthly NDVI, and MODIS MOD13C2 monthly NDVI). We evaluated the spatial patterns of the spatiotemporal consistency between the reconstructed NDVI and the comparative datasets by mapping the RGB combination of Pearson correlation coefficient (r), RMSE, and mean absolute error (MAE) between the reconstructed NDVI and the other NDVI datasets, respectively. We also analyzed the distribution patterns of these three statistical metrics along the latitude to systematically assess the differences between the reconstructed NDVI and the comparative datasets. The five criteria used in this study to evaluate the quality of the reconstructed NDVI include r, R2, MAE, RMSE, and PB.
where n is the number of the samples. Yi,rec and Yi,com are the reconstructed NDVI and comparative data, respectively. \({\bar{Y}}_{rec}\) and \({\bar{Y}}_{com}\) are the mean values of reconstructed NDVI and comparative data, respectively. Yi,obs and \({\bar{Y}}_{obs}\) are the original NOAA CDR valid NDVI and the corresponding mean value, respectively.
Data Records
The daily gap-free NDVI dataset covers a spatial extent of 18.16°N to 53.56°N and 73.44°E to 135.09°E with a spatial resolution of 0.05°. The NDVI dataset from June 24, 1981 to May 10, 2023 is available at figshare repository40. The NDVI data for each year is individually stored as a standard Network Common Data Form, version 4 (NetCDF-4, NC4) format and a total of 43 NC4 files are provided. The file names follow the structure of “Daily_Gap-filled_NDVI_YYYY.nc4”, where “YYYY” represents the year. Additional information, such as the scale factor, is stored in the files.
Technical Validation
The reconstruction NDVI
Figure 2a–f map the spatial distributions of the original NOAA CDR NDVI and the reconstructed NDVI in three specific days. We can see a large range of missing data in the original data. Our reconstructed NDVI maps filled these gaps in the original dataset. In addition, there are many unreliable data in the original dataset due to the impact of cloud and other noise. For example, Fig. 2g illustrates the difference map in the original dataset between September 3, 2009 and September 2, 2009. Large positive (>0.6) and negative (<−0.6) differences in NDVI within two days are widely distributed over China. It is certain that, in the real situation, there will be no significant difference in NDVI within a two-day period. We mapped the differences in the reconstructed NDVI within the same two-day period and there is no significant difference between the reconstructed NDVI within these two days (Fig. 2h). Overall, our reconstructed daily NDVI effectively filled the gaps in the original NOAA CDR NDVI and fixed the invalid data in the original dataset.
Quantitative evaluation
Evaluation at random point scale
We randomly selected three equidistantly distributed samples in each geographical zone (27 sites in total, Supplementary Figure S2a) and quantified the matching degree of the temporal evolution characteristics between the reconstructed NDVI and the original valid NOAA CDR NDVI at each site (Fig. 3, Supplementary Figure S3, and Supplementary Figure S4). The proportion of valid data in the entire time series showed an increasing trend from the southeastern China to the northwestern China due to more frequent impacts of cloud and rain in the humid and semi-humid regions in China. The distributions of daily NDVI showed substantial differences in different regions, even within the same geographical zone. Nevertheless, our reconstructed daily NDVI exhibited high consistency with the original valid NDVI across all sites (R2 ≥ 0.83, |PB| < 0.7%, and RMSE < 0.08), emphasizing the robustness of the reconstruction framework.
In addition, to fully evaluate the spatiotemporal reliability of our reconstructed daily NDVI, we drew a density scatter plot between the reconstructed NDVI and the original valid NDVI on a certain number of random sites in each geographical zone (Fig. 4). Scatter points in all plots are distributed nearby the 1:1 lines. High-density scatter points are tightly concentrated near the 1:1 lines. R2 between the reconstructed NDVI and original valid NDVI in the nine zones are all larger than 0.8, and the largest |PB| and RMSE are only 0.21% and 0.073, respectively. These evaluations confirmed the high spatiotemporal consistency between our reconstructed daily NDVI with the original valid NOAA CDR NDVI.
Evaluation at full pixel scale
Figure 5 shows the comparisons between the reconstructed daily NDVI and the original NOAA CDR valid NDVI. The PB between the two datasets over China are quite small with more than 99.91% of the study area showing a PB between −1% and 1% (Fig. 5a). The high positive PB (0.5%~1%) are mainly located in the border region between North China and Southeast China (accounting for an area ratio of 2.80%), where are mostly covered by farmlands. Negative PB are mainly distributed in Northeast China, Qinghai Tibetan Plateau, and the northwest of North China. However, most negative PB are larger than −0.5% with an area ratio of 48.41%. The overall average PB between the reconstructed NDVI and the original NOAA CDR valid NDVI is about −0.02%. R2 between the two datasets over China are ranging from 0.41 to 0.96 with high values in most parts of China, except for some farmlands in the North China Plain and deserts in the northwest of China (Fig. 5b). The overall average R2 between the two daily NDVI datasets over China is 0.79 with more than 96.22% of the study area showing an R2 larger than 0.6 and more than half of the study area (50.63%) owning an R2 larger than 0.8. The RMSE map of the two daily NDVI datasets shows that RMSE in more than 98.26% of the study area are smaller than 0.1 (Fig. 5c). The spatial pattern of high RMSE is similar to the PB map, which are mainly located in the border region between North China and Southeast China, some parts of Southwest China, and south of the Qinghai Tibetan Plateau. The overall mean value of RMSE in China is about 0.05. The above evaluation results indicate that our reconstructed daily NDVI are highly consistent with the NOAA CDR valid NDVI.
We visualized the quality of the reconstructed NDVI over China by giving an RGB combination of the PB, R2, and RMSE maps (Fig. 5d). NDVI pixels with high quality should have low |PB|, low RMSE, but high R2. Here, we defined that a high-quality pixel should have a |PB| < 1%, RMSE < 0.08, and R2 > 0.5, simultaneously. Based on this definition, we found that the reconstructed NDVI in 89.76% of the study area showed high quality. These high-quality pixels are widely distributed throughout China, except for farmlands in the North China Plain and some hilly land in Southwest China and South China. The lower quality of these pixels is mainly due to the higher RMSE ( >0.08). We further analyzed the pixel distributions of the nine geographical zones in a three-dimensional spatial coordinate system composed of |PB|, R2, and RMSE (Fig. 6). It is clear that most pixels (>68%) in all zones are located in the quadrant with a |PB| < 1%, RMSE < 0.08, and R2 > 0.5 (i.e., high-quality pixel proportion). Northwest China, Inner Mongolia, and Qinghai-Tibetan Plateau present the largest high-quality pixel proportions (>94%) among all geographical zones. Overall, the high-quality pixel proportions of eight zones are larger than 75.5%, and only that in South China is relatively lower (68.21%). These pixel-based evaluations indicate the reconstructed NDVI in the nine geographical zones are reliable.
Comparison with MODIS daily gap-filled NDVI
Figure 7 shows the comparison of the reconstructed daily NDVI with the MODIS MCD19A3CMG v6.1 daily continuously gap-filled NDVI. The national average r, RMSE, and PB between the two daily NDVI datasets are 0.62, 0.1, and 7.54%, respectively. The RGB combination map of the r, RMSE, and |PB| maps (Fig. 7a) indicates that in 66.93% of the study area, our reconstructed NDVI shows relatively high spatiotemporal consistency (r > 0.5, |PB| < 50%, and RMSE < 0.2) with the MCD19A3CMG gap-filled NDVI (Fig. 7e). These regions are widely distributed throughout the country, except for the northwestern desert region and the southwestern mountainous area. The distribution of average PB along latitudes is shown in Fig. 7b. PB is relatively large between 28°N and 45°N with a substantial fluctuation range (i.e., standard deviation). The maximum PB occurs around 40°N, primarily due to differences between the two datasets in the northwestern desert regions, where vegetation coverages are quite sparse. In terms of the average r along latitudes (Fig. 7c), the average r exceed 0.5 for most latitudinal bands, except for regions around 40°N and 20°N. Similarly, the latitudinal average correlation coefficients corresponding to the northwestern desert regions exhibit larger fluctuations. The average RMSE at most latitudes are smaller than 0.2, with the lowest values observed around 40°N (Fig. 7d). This is mainly attributed to the relatively small NDVI values in the northwestern desert regions within this range. We quantified the average values of the three statistical criteria between our reconstructed NDVI and the MCD19A3CMG gap-filled NDVI in each geographical zone (Fig. 7f–h). The average |PB| in all zones, except for Northwest China and Qinghai-Tibetan Plateau, are smaller than 20%. The average r in all zones, except for Northwest China, are larger than 0.5. The largest regional average r (0.9) occurs in Northeast China. In terms of RMSE, the regional average RMSE in all zones are smaller than 0.17. The largest regional average RMSE (0.16) occurs in South China and the smallest regional average RMSE (0.06) occurs in Northwest China.
Comparison at monthly scale
We further evaluated the reliability of our reconstructed NDVI by comparing with other three popular NDVI datasets (i.e., GIMMS3g NDVI, MODIS MOD13C2 NDVI, and SPOT/PROBA NDVI) on a monthly scale. Figure 8a–c map the RGB combinations of r, RMSE, and MAE between the monthly maximum NDVI of the reconstructed NDVI and the other three NDVI datasets, respectively. The spatial distributions of the RGB combinations of the three statistical criteria between our reconstructed NDVI and the three NDVI datasets present similar patterns over China. The reconstructed NDVI in most pixels ( > 75%) over China show a high significant positive correlation (r > 0.5, p < 0.05) with the other three datasets. Pixels with low r or negative r are mainly located in Southwest China, Qinghai Tibetan Plateau, and deserts in Northwest China. On the whole, the average r between our reconstructed NDVI and GIMMS3g NDVI, MODIS NDVI, and SPOT/PROBA NDVI are 0.65, 0.65, and 0.67, respectively. The corresponding average MAE (or RMSE) are 0.1, 0.07, and 0.08 (or 0.12, 0.09, and 0.1), respectively.
Figure 8d–l present the distributions of average MAE, r, and RMSE along the latitude, respectively. The average MAE and RMSE at most latitudes are smaller than 0.25 and show the lowest values at about 40°N in the three comparisons, presenting an increasing pattern with the decrease of latitude (Fig. 8d–f, Fig. 8j–l). In contrast, the latitudinal average r between our dataset and other three datasets all show a decreasing trend with the decrease of latitude (Fig. 8g–i). The average r at most latitudes in the three comparisons are larger than 0.5. However, there is a noticeable low-value region with high fluctuation range in the distribution curves of r, primarily attributed to the low correlations between our dataset and the three comparative datasets in the desert region of Northwest China. The PB maps between monthly maximum NDVI of our dataset and the three comparative datasets are shown in Fig. 8m–o. The spatial distributions of the three PB maps are similar, showing an increasing gradient from southeast to northwest. The national average PB between our dataset and GIMMS3g NDVI, MODIS NDVI, and SPOT/PROBA NDVI are −5.79%, 4.82%, and 2.66%, respectively.
Usage Notes
In this study, we established a daily NDVI reconstruction framework by integrating valid data identification method with spatiotemporal sequence gap-filling technique and provided the first long-term (1981~2023) gap-free daily NDVI dataset over China with a spatial resolution of 0.05°. We adopted several approaches to validate the reliability of the reconstruction framework and the accuracy of the reconstruction NDVI. Our systematic evaluation indicated that the framework effectively addressed the inefficiency encountered by previous reconstruction methods when dealing with extensive data gaps and successfully filled these gaps and restored contaminated data in the original dataset.
The reconstructed daily NDVI dataset can provide crucial data support for better understanding the vegetation responses under climate and environmental changes, especially for rapidly occurring climate extremes41,42,43. Furthermore, our reconstructed daily gap-free NDVI can offer essential data support for improving the inversion reliability and temporal resolution of related vegetation parameters (e.g., gross primary productivity and net primary productivity44). For instance, the conventional satellite-based light use efficiency models (e.g., the Carnegie-Ames-Stanford Approach, CASA) use monthly maximum NDVI to simulate the fraction of photosynthetically active radiation (FPAR), which is subsequently used for the estimates of monthly vegetation productivity45,46,47. Vegetation productivity models based on monthly maximum vegetation indices may overestimate monthly vegetation productivity. Our reconstructed daily gap-free NDVI dataset can help improve the temporal resolution of these model estimates while also enhance the reliability of the simulation results.
All the daily NDVI data are stored in the NC4 format at the figshare repository. Users can use Python, R, and MATLAB platforms to read and manipulate the data. It should be noted that the data must be multiplied by the scale factor (0.001).
Code availability
The Python codes for generating and processing the daily gap-free NDVI data in China can be accessed through GitHub (https://github.com/mainearth/Daily-Gap-free-NDVI-Code.git).
References
Zeng, Y. et al. Optical vegetation indices for monitoring terrestrial ecosystems globally. Nat. Rev. Earth. Environ. 3, 477–493 (2022).
Dronova, I., Taddeo, S. & Harris, K. Plant diversity reduces satellite-observed phenological variability in wetlands at a national scale. Sci. Adv. 8, eabl8214 (2022).
Wang, Y., Shen, X., Jiang, M., Tong, S. & Lu, X. Spatiotemporal change of aboveground biomass and its response to climate change in marshes of the Tibetan Plateau. Int. J. Appl. Earth Obs. 102, 102385 (2021).
Jin, N. et al. Assimilating remote sensing data into a crop model improves winter wheat yield estimation based on regional irrigation data. Agr. Water Manage. 266, 107583 (2022).
Smith, T., Traxl, D. & Boers, N. Empirical evidence for recent global shifts in vegetation resilience. Nat. Clim. Change 12, 477–484 (2022).
Liu, Z. et al. Respiratory loss during late-growing season determines the net carbon dioxide sink in northern permafrost regions. Nat. Commun. 13, 5626 (2022).
Bai, X. et al. A carbon-neutrality-capacity index for evaluating carbon sink contributions. Environmental Science and Ecotechnology 15, 100237 (2023).
Xiong, L. et al. High-Resolution Data Sets for Global Carbonate and Silicate Rock Weathering Carbon Sinks and Their Change Trends. Earth’s Future 10, e2022EF002746 (2022).
Li, H. W., Wang, S. J., Bai, X. Y., Cao, Y. & Wu, L. H. Spatiotemporal evolution of carbon sequestration of limestone weathering in China. Sci. China Earth Sci. 62, 974–991 (2019).
Li, C. et al. Effects of carbonate minerals and exogenous acids on carbon flux from the chemical weathering of granite and basalt. Global Planet. Change 221, 104053 (2023).
Li, H. et al. Responses of soil organic carbon to climate change in the Qilian Mountains and its future projection. J. Hydrol. 596, 126110 (2021).
Ran, C. et al. Threat of soil formation rate to health of karst ecosystem. Sci. Total Environ. 887, 163911 (2023).
Li, H. et al. The Grain-for-Green project offsets warming-induced soil organic carbon loss and increases soil carbon stock in Chinese Loess Plateau. Sci. Total Environ. 837, 155469 (2022).
Li, H. et al. Decipher soil organic carbon dynamics and driving forces across China using machine learning. Global Change Biol. 28, 3394–3410 (2022).
Tang, Z., Ma, J., Peng, H., Wang, S. & Wei, J. Spatiotemporal changes of vegetation and their responses to temperature and precipitation in upper Shiyang river basin. Adv. Space Res. 60, 969–979 (2017).
Badgley, G., Field, C. B. & Berry, J. A. Canopy near-infrared reflectance and terrestrial photosynthesis. Sci. Adv. 3, e1602244 (2017).
Camps-Valls, G. et al. A unified vegetation index for quantifying the terrestrial biosphere. Sci. Adv. 7, eabc7447 (2021).
Lyapustin, A. MODIS/Terra+Aqua Vegetation Index from MAIAC, Daily L3 Global 0.05Deg CMG V061. NASA EOSDIS Land Processes Distributed Active Archive Center https://doi.org/10.5067/MODIS/MCD19A3CMG.061 (2023).
Claverie, M., Matthews, J. L., Vermote, E. F. & Justice, C. O. A 30+ Year AVHRR LAI and FAPAR Climate Data Record: Algorithm Description and Validation. Remote Sens-Basel 8, 263 (2016).
Vermote, E., NOAA CDR Program. NOAA Climate Data Record (CDR) of AVHRR Normalized Difference Vegetation Index (NDVI), Version 5. NOAA National Centers for Environmental Information https://doi.org/10.7289/V5ZG6QH9 (2019).
Vermote, E., NOAA CDR Program. NOAA Climate Data Record (CDR) of VIIRS Normalized Difference Vegetation Index (NDVI), Version 1. NOAA National Centers for Environmental Information https://doi.org/10.25921/gakh-st76 (2022).
Julien, Y. & Sobrino, J. A. Optimizing and comparing gap-filling techniques using simulated NDVI time series from remotely sensed global data. Int. J. Appl. Earth Obs. 76, 93–111 (2019).
Deng, G., Tang, Z., Dong, C., Shao, D. & Wang, X. Development and Evaluation of a Cloud-Gap-Filled MODIS Normalized Difference Snow Index Product over High Mountain Asia. Remote Sens-Basel 16, 192 (2024).
Li, J. & Heap, A. D. Spatial interpolation methods applied in the environmental sciences: A review. Environ. Modell. Softw. 53, 173–189 (2014).
Liu, Y., Dang, B., Li, Y., Lin, H. & Ma, H. Applications of Savitzky-Golay Filter for Seismic Random Noise Reduction. Acta Geophys. 64, 101–124 (2016).
Tang, Z. et al. Satellite observed spatiotemporal variability of snow cover and snow phenology over high mountain Asia from 2002 to 2021. J. Hydrol. 613, 128438 (2022).
Tang, Z. et al. Spatiotemporal variation of snowline altitude at the end of melting season across High Mountain Asia, using MODIS snow cover product. Adv. Space Res. 66, 2629–2645 (2020).
Li, S. et al. High-quality vegetation index product generation: A review of NDVI time series reconstruction techniques. Int. J. Appl. Earth Obs. 105, 102640 (2021).
Beck, P. S. A., Atzberger, C., Høgda, K. A., Johansen, B. & Skidmore, A. K. Improved monitoring of vegetation dynamics at very high latitudes: A new method using MODIS NDVI. Remote Sens. Environ. 100, 321–334 (2006).
Sellers, P. J. et al. A global 1° by 1° NDVI data set for climate studies. Part 2: The generation of global fields of terrestrial biophysical parameters from the NDVI. Int. J. Remote Sens. 15, 3519–3545 (1994).
Yang, G., Shen, H., Zhang, L., He, Z. & Li, X. A Moving Weighted Harmonic Analysis Method for Reconstructing High-Quality SPOT VEGETATION NDVI Time-Series Data. IEEE T. Geosci. Remote 53, 6008–6021 (2015).
Cao, R. et al. A simple method to improve the quality of NDVI time-series data by integrating spatiotemporal information with the Savitzky-Golay filter. Remote Sens. Environ. 217, 244–257 (2018).
Zeng, L., Wardlow, B. D., Xiang, D., Hu, S. & Li, D. A review of vegetation phenological metrics extraction using time-series, multispectral satellite data. Remote Sens. Environ. 237, 111511 (2020).
Tucker, C. J. et al. An extended AVHRR 8-km NDVI dataset compatible with MODIS and SPOT vegetation NDVI data. Int. J. Remote Sens. 26, 4485–4498 (2005).
European Commission Directorate-General Joint Research Centre. Normalised Difference Vegetation Index 1999-2020 (raster 1 km), global, 10-daily - version 3. Copernicus Global Land Service https://land.copernicus.vgt.vito.be/geonetwork/srv/api/records/urn:cgls:global:ndvi_v3_1km (2021).
Didan, K. MODIS/Terra Vegetation Indices Monthly L3 Global 0.05Deg CMG V061. NASA EOSDIS Land Processes Distributed Active Archive Center https://doi.org/10.5067/MODIS/MOD13C2.061 (2021).
Toté, C. et al. Evaluation of the SPOT/VEGETATION Collection 3 reprocessed dataset: Surface reflectances and NDVI. Remote Sens. Environ. 201, 219–233 (2017).
Toté, C. et al. Evaluation of PROBA-V Collection 1: Refined Radiometry, Geometry, and Cloud Screening. Remote Sens-Basel 1013, 1375 (2018).
Huete, A. et al. Overview of the radiometric and biophysical performance of the MODIS vegetation indices. Remote Sens. Environ. 83, 195–213 (2002).
Li, H. et al. A daily gap-free normalized difference vegetation index dataset from 1981 to 2023 in China, Figshare, https://doi.org/10.6084/m9.figshare.c.7002225.v1 (2024).
Feldman, A. F. et al. A multi-satellite framework to rapidly evaluate extreme biosphere cascades: The Western US 2021 drought and heatwave. Global Change Biol. 29, 3634–3651 (2023).
Wolf, S. & Paul-Limoges, E. Drought and heat reduce forest carbon uptake. Nat. Commun. 14, 6217 (2023).
Yuan, X. et al. A global transition to flash droughts under climate change. Science 380, 187–191 (2023).
Shang, R. et al. China’s current forest age structure will lead to weakened carbon sinks in the near future. The Innovation 4, 100515 (2023).
Li, H., Wu, Y., Liu, S. & Xiao, J. Regional contributions to interannual variability of net primary production and climatic attributions. Agr. Forest Meteorol. 303, 108384 (2021).
Cao, Y. et al. Regional Contribution and Attribution of the Interannual Variation of Net Primary Production in the Yellow River Basin, China. Remote Sens-Basel 15, 5212 (2023).
Liang, W. et al. Analysis of spatial and temporal patterns of net primary production and their climate controls in China from 1982 to 2010. Agr. Forest Meteorol. 204, 22–36 (2015).
Acknowledgements
We thank Eric Vermote and colleagues from NOAA’s CDR Program, who developed the original NDVI CDR dataset. This work is jointly supported by the Opening Fund of the State Key Laboratory of Environmental Geochemistry (SKLEG2024202), National Funded Postdoctoral Researcher Program (GZC20233507), China Postdoctoral Science Foundation (2023M742832), Shaanxi Province Postdoctoral Science Foundation (2023BSHTBZZ19), the Open Project of Technology Innovation Center for Natural Ecosystem Carbon Sink (CS2023D05), National Natural Science Foundation of China (32230067, 32171581), Natural Science Basic Research Program of Shaanxi (22JHQ036), the Key Research and Development Program of Shaanxi (2024SF-YBXM-551), Natural Science Basic Research Program of Shaanxi (2023-JC-QN-0348), and the Fundamental Research Funds for the Central Universities. We also thank the editors and reviewers for their constructive comments which led to improvements to the paper.
Author information
Authors and Affiliations
Contributions
Huiwen Li: Conceptualization, Methodology, Formal analysis, Investigation, Writing - original draft. Yue Cao: Methodology, Formal analysis, Investigation. Jingfeng Xiao: Validation, Writing - review & editing. Zuoqiang Yuan: Formal analysis, Validation, Review & editing. Zhanqing Hao: Resources, Review & editing. Xiaoyong Bai: Resources, Review & editing. Yiping Wu: Review & editing. Yu Liu: Review & editing.
Corresponding authors
Ethics declarations
Competing interests
The authors declare no conflict of interest for this article. The funding agencies had no role in the design of the study, the data processing and writing or in the decision to publish the results.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary information
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Li, H., Cao, Y., Xiao, J. et al. A daily gap-free normalized difference vegetation index dataset from 1981 to 2023 in China. Sci Data 11, 527 (2024). https://doi.org/10.1038/s41597-024-03364-3
Received:
Accepted:
Published:
DOI: https://doi.org/10.1038/s41597-024-03364-3