Background & Summary

The increasing frequency of extreme climate events, coupled with increasing global volatility — such as the Russian-Ukrainian war and food trade restrictions — has had a dramatic impact on global food security and agricultural trade liberalization in recent years1,2,3. Since the 1990s, crop production has significantly increased both locally and globally4,5, primarily due to higher crop yields (production per harvested area), rather than the expansion of harvested areas. However, year-to-year variability remains substantial due to climate fluctuations6,7. As the world’s population continues to grow and environmental pressures increase, research on crop yields and their spatiotemporal changes has gained increasing prominence6,8. Therefore, a high-quality, spatially explicit, gridded crop yield dataset spanning several decades would be invaluable for addressing the risks posed by climate change, identifying yield gap, maintaining stability in international markets, and ensuring food security9,10,11.

Several global historical crop yield datasets, covering recent decades and derived from census or satellite data, are publicly available. These datasets have greatly supported studies on global food security, sustainable development, and climate change impacts and adaptation12,13,14. For example, M3Crops assigned uniform statistical crop yields to downscaled, crop-specific areas and generated the first global yield mapping circa 200014. The Global Agro-Ecological Zones (GAEZ) dataset provided the potential yields circa 2000, 2010 and 2015, respectively, through statistical downscaling12,15. Ray2012, based on approximately 13,500 crop censuses, allocated four major crop yields to each grid within each political unit and provided three 5-year average maps (1995, 2000 and 2005)16. The Spatial Production Allocation Model (SPAM) used cross entropy to downscale crop statistics to grid cells17, producing global crop yields data around 2000, 2005, and 2010 13,18,19. The four datasets primarily rely on downscaling agricultural census data and other supplementary information to a resolution of 5 arc-minutes. However, these datasets are limited to specific years, and as a result, lack temporal continuity. The GDHY (Global Dataset of Historical Yields) and GGCMI (Global Gridded Crop Model Intercomparison) are the only two datasets that provide continuous temporal coverage, supporting studies on the inter-annual variations in crop yields10,20,21,22. Unfortunately, these two crop yield datasets use national-level crop statistics or data from limited experimental sites as inputs, and have a coarser spatial resolution of 0.5°2,23. Therefore, they may overlook localized spatial variations at the sub-national or smaller scale, especially in major crop-producing countries across larger areas. Additionally, capturing interannual fluctuations and upward trends becomes challenging with these datasets24. Therefore, current global crop yield datasets are lack of detailed time and spatial information, warranting the necessary to develop a new, high-resolution, and long-term global crop yield dataset2.

The current global crop yield estimations primarily rely on process-oriented crop models (e.g., GGCMI) and downscaling methods (including M3Crops, GAEZ, SPAM and GDHY). However, the limited availability of detailed and precise input data on a global scale — such as spatially heterogeneous soil, cultivars, and management practices — poses challenges in calibrating process-oriented crop models. As a result, they often rely on national statistics or limited experimental data, and in some cases, remain uncalibrated. These limits them to accurately simulate crop yields over larger areas24,25,26. The downscaling methods depend on current agricultural statistics and other supplementary information, limiting the scalability of these methods across multiple crops, regions, and years10,23. Encouragingly, in recent years, Machine Learning (ML) algorithms — known for their ability to capture complex, higher-order, and nonlinear relationships between predictors and target variable — have been successfully employed for yield estimation in some countries27,28. However, to the best of our knowledge, global yields mapping for four major crops based on ML methods has not yet been conducted.

The objectives of this study are to: (1) develop ML models and select the optimal one for each country and crop by integrating satellite data, climate data, soil properties, agricultural management practices, and detailed census records for approximately 12,000 administrative units; (2) producing yield maps for maize, soybean, rice, and wheat, with a 5 arc-minute spatial resolution for the period 1982–2015 (GlobalCropYield5min), using the optimal ML model; and (3) comprehensively evaluate the data products. These four crops, as the primary cereal and legume sources for global population, account for nearly two-thirds of total calorie production worldwide16.

Methods

Research framework

The study’s flowchart is presented as Fig. 1. First, we compared three commonly used ML models for yield estimation, and selected the optimal models for each country and crop. Next, we applied the selected model to estimate annual crop yield for each 5 × 5 arc-minute grid cell from 1982 to 2015, producing a global crop yield dataset GlobalCropYield5min. To assess the accuracy of the dataset, we conducted a comprehensive evaluation from multiple perspectives. The evaluation included comparing simulation accuracy, analyzing the spatial patterns of the coefficient of variation (CV) and mean annual yield, as well as examining yield temporal trends and variations between GlobalCropYield5min and recorded data. Finally, the study compared the accuracy of GlobalCropYield5min with the SPAM and GDHY crop yield data for the years 2000, 2005, and 2010. The selected datasets are shown in Table 1.

Fig. 1
figure 1

The flowchart of this study.

Table 1 Information on the selected datasets.

Model development

The performance of ML models varies depending on factors such as data structure, distribution, and volume. It is important to note that a single model may not be universally applicable29,30. Consequently, three commonly used ML models — Random Forest (RF), eXtreme Gradient Boosting (XGBoost) and Light Gradient Boosting Machine (LightGBM) were employed for crop yield estimation. The LightGBM model supports efficient parallel training and offers notable advantages, including faster training, lower memory consumption, higher accuracy, and support for distributed, fast processing of large datasets. The XGBoost model utilizes optimization algorithms to reduce computational complexity and effectively mitigates overfitting through regularization, exhibiting significant advantages in handing large-scale datasets. Compared to XGBoost, the RF model is better suited for managing high-dimensional and noisy data. Further details about these models are described in previous studies31,32,33.

Crop planting and harvesting months were determined according to Vogel et al.34 (Fig. S4). All spatial data were aggregated to the administration unit level using the Google Earth Engine (GEE) platform. To ensure consistency, we re-gridded all datasets — including climate, NDVI, irrigation, fertilizer application rates, and soil data — onto a 5 × 5 arc-minute resolution grid over all months of the growing season, aligning with the harvested area dataset14. Cropped areas were masked using grid cells with a crop-specific harvested area fraction of ≥1% for each crop35. Before applying the ML models, all variables were standardized to have mean of 0 and standard deviation of 1. To evaluate and compare the accuracy of the three models, the datasets for all years and administrative units from 1982 to 2015 were randomly divided into two parts: 70% for predictor selection, model calibration/training, and through determination of optimal hyperparameters, and 30% for evaluating model performance.

For maize and rice, only the primary seasons were considered in this study, as they contribute most to national production and economic impacts36. For wheat, winter wheat and spring wheat differ in variety, growth habits, cropping regions, climate preferences, and harvest times37. For example, winter wheat requires vernalization to flower, so it must be sown before winter, typically in temperate climates. Spring wheat, on the other hand, does not need vernalization and is grown in warmer climates. Therefore, when modeling yields or studying the impacts of climate change, it is important to treat spring and winter wheat separately. Winter wheat and Spring wheat were determined following Vogel et al.34 (Fig. S5).

Model predictor selection and parameter optimization

We designed an automatic selection process for the optimal combination of predictors and the parameters in a modular and extensible manner, allowing for the selection of the optimal models for different crops and countries. First, Recursive Feature Elimination cross-validation (RFECV)38 was used to automatically select potential predictors with the best mean score across each country and crop, reducing predictors redundancy39 and minimizing the risk of overfitting and collinearity. For parameter optimization, we first identified the range of parameters values requiring tuning. Subsequently, Bayesian optimization was utilized to select the optimal parameters40,41. The selection of models, predictors, and the model parameters varied by country and crops (Table S4).

Comparison and selection of the optimal yield estimation models

To evaluate the simulation accuracy of the GlobalCropYield5min product, this study first aggregated the dataset by administrative unit and then compared it with recorded data. The coefficient of determination (R2), root mean square error (RMSE), and normalized root mean square error (NRMSE) were used to assess accuracy42.

$${R}^{2}=1-\frac{{\sum }_{i=1}^{n}{({y}_{i,j}^{{True}}-{y}_{i,j}^{{Pred}})}^{2}}{{\sum }_{i=1}^{n}{({y}_{i,j}^{{True}}-{\bar{y}}_{i,j}^{{True}})}^{2}}$$
(1)
$${RMSE}=\sqrt{\frac{{\sum }_{i=1}^{n}{\left({y}_{i,j}^{{True}}-{y}_{i,j}^{{Pred}}\right)}^{2}}{n}}$$
(2)
$${NRMSE}=\frac{{RMSE}}{{\bar{y}}_{i}}$$
(3)

Here, i represents the index of the administrative unit, n denotes the total number of administrative units, and j corresponds to the year. \({y}_{i,j}^{{True}}\) refers to the observed crop yield obtained from governmental or FAO websites for the i-th administrative unit in year j, \({\bar{y}}_{i,j}^{{True}}\) represents the average observed crop yield for the i-th administrative unit in year j, and \({y}_{i,j}^{{Pred}}\) indicates the GlobalCropYield5min yield estimate for the i-th administrative unit in year j. To ensure results stability, we ran each model for 50 times and calculated the mean predicted R2, RMSE, and NRMSE, and the standard deviation (SD) as a measure of model performance. Therefore, all the R2, RMSE, and NRMSE hereafter refer to the mean predicted R2, RMSE, and NRMSE. The model with the highest predicted R2 and lowest RMSE is selected as the optimal model for each crop and country.

Development of the GlobalCropYield5min product

For each crop and country, predictors at the gridded scale were aligned with those selected at the administrative scale level. These predictors were then input into the selected optimal model to generate gridded annual crop yield maps from 1982 to 2015 with a 5 arc-minute spatial resolution. To ensure the robustness of the results, 50 simulations were performed, and their outputs were averaged to create a global gridded long-term yield dataset. This process was consistently applied across all countries for each crop, and the resulting datasets were combined to form the GlobalCropYield5min product.

Verification of the GlobalCropYield5min product

The gridded crop yields were initially averaged and aggregated at the administrative unit level and then compared with recorded data to evaluate its accuracy. To further assess the product’s ability to capture interannual yield variations, we compared the temporal patterns of actual yield and yield anomalies (∆Y) at the national level. This comparison utilized data from GlobalCropYield5min and reported yields, calculated as average values across all administrative units within each country. The ∆Y was defined as follows:

$$\varDelta {Y}_{i,j}=\frac{{y}_{i,j}^{{True}}{-\hat{y}}_{i,j}^{{True}}}{{\bar{Y}}_{t,g}}\times 100$$
(4)

Here, \({\hat{y}}_{i,j}^{{True}}\) is calculated by the 5-year moving average method. The \(\varDelta {Y}_{i,j}\) (yield anomalies) represents the percent crop yield anomalies, which are widely used to quantify the changes in yields caused by climatic variability by removing the trend of the yield caused by non-climatic and time-dependent factors (i.e., demand, prices, technology, and other factors)43,44. It is worth mentioning that the percentage yield anomalies for the first two years and the last two years were not available because we used a time window of t−2 to t + 2 to get moving average means.

Comparison with existing global crop yield products

To assess the accuracy of our product and compare it with two widely used global products — SPAM and GDHY — across various crops, countries, and globally, we followed a two-step process. First, we computed the average values for each administrative unit in 2000, 2005, and 2010, as these are the only three years publicly accessible for SPAM. Then, we compared the three sets of global gridded yield products with our collected yield data at the administrative unit level, using R2 and RMSE between the reported yield and estimates.

Spatial uncertainty assessment of globalCropYield5min products

To evaluate the spatial uncertainty of the GlobalCropYield5min Products, we calculated the NRMSE following previous similar researches45,46. The NRMSE for each administrative unit was allocated to its centroid, and the kriging interpolation method was applied to spatially distribute the uncertainty.

Data

Crop system data

We collected crop yield, harvested area and production from various public sources, including national and regional statistical bureaus and agricultural agencies (Table S1, all websites are available and accessed before April 2020). Data availability varied across regions and time periods. To ensure a consistent global database for the four crops across all spatial levels, we conducted a preliminary assessment of data quality, excluding potential outliers that exceeded biophysically attainable yields for each crop type. Additionally, we excluded administrative units with missing yield data for more than one-third of the data collection period. In cases where crop yield data was entirely absent for a political unit during the study period or in a specific year, we examined the harvested areas and production data in the upper-level administrative units where the crop was harvested. If confirmed, we used an interpolation method to estimate the production and harvested area for the individual administrative unit. This involved selecting the five closest years of available data preceding the missing year and calculating the average harvested area and production for each administrative unit. Using the average values, we determined the proportion of the total harvested and production for each administrative unit. Subsequently, we used these proportion values to estimate the missing harvested area and production for the specific administrative unit, based on the corresponding harvested area and production at its upper administrative unit. For years when we have data for the administrative unit, we summed the harvested area and production up to the national level and compared it to the FAO-reported national data. In cases that discrepancies arose, we scaled the sub-national sum proportionately to match the FAO-reported data at the national level, assuming the FAO-reported data to be accurate. To ensure consistency, we converted all data for the period of 1982–2015 to the same units, specifically hectares for harvest areas, tons for production, and tons per hectare (ton/ha) for yield, as some countries used different units.

Regional climate and large-scale climate modes

In this study, we used the TerraClimate gridded monthly dataset, with a spatial resolution of 1/24° (~4 km)47. To verify the TerraClimate dataset in representing regional climate (RC) conditions, we initially examined the correspondence between monthly temperature and precipitation values from TerraClimate and national weather stations, specifically the China Meteorological Administration (CMA). For example, in the cultivation areas of winter wheat in mainland China (Fig. S1), the results showed the TerraClimate dataset effectively captured the spatial and temporal variations in regional climate, exhibiting high correlation coefficients (R2) of 0.83 for temperature and 0.99 for precipitation (Fig. S2). The primary climate variables considered in the study include precipitation (Pre), minimum temperature (Tmin), maximum temperature (Tmax), mean temperature (Tmean), Vapor pressure (Vap), vapor pressure deficit (Vpd), and Srad (Surface shortwave radiation).

In addition to regional climate variables, large-scale climate (LC) oscillations, such as El Niño–Southern Oscillation (ENSO), Indian Ocean Dipole (IOD) and Pacific Decadal Oscillation (PDO), have been reported to cause extreme events like floods, droughts, and storms, with significant impacts on harvested area and productions9,43,44,48,49,50,51,52. Hence, we selected the monthly values of six major climate mode indices as potential predictors of crop yields. These indices include Nino34 (Niño 3.4 SST Index), EA (Eastern Atlantic pattern of the 500-hPa height), PDO, North Atlantic Oscillation (NAO), Atlantic Multidecadal Oscillation (AMO) and IOD (Table S3).

Satellite dataset, environmental stressor factors, and technology advancement

The Normal Difference Vegetation Index (NDVI) has been widely used for estimating crop yields and detecting crop drought stress53,54. The PKU GIMMS provides NDVI values with a temporal resolution of 15 days and a spatial resolution of 5 minutes, covering the entire globe, starting from 1982. The GIMMS NDVI product stands out due to its extended temporal coverage, enabling yield simulations before 2000 and surpassing other available satellite NDVI datasets in this regard55. In this study, we utilized NDVI data from PKU GIMMS Normalized Difference Vegetation Index dataset as a predictor for crop yield.

In addition, to capture the spatial and time-dependent improvement of agriculture technology information (TI), we incorporated additional predicting variables into our analysis56,57,58,59. Specifically, we used the irrigation area ratio (Fig. S3)60, dynamic fertilizer application rates, and year46,61. These variables serve as indicators of evolving field management and techniques in agriculture. Recognizing the substantial impact of soil characteristics on crop growth and yield62,63, we also considered six soil parameters for each 1 km × 1 km grid. These parameters, obtained from the Harmonized World Soil Database31, include topsoil sand fraction (T_SAND), soil texture (T_TEXTURE), organic carbon content (T_OC), pH (T_PH_H2O), cation exchange capacity (T_CEC_SOIL) and bulk density (T_REF_BULK). Incorporating these soil parameters and spatial information (longitude, latitude, and elevation) as yield prediction variables enhances our understanding of the influence of constant environmental conditions (CEC) on crop performance.

Data Records

The resultant GlobalCropYield5min dataset64 in this paper is freely available online at https://doi.org/10.17632/hg8wzgx4yp.3. The dataset contains global gridded annual yield for the four major crops with 5-minute resolution from 1982 to 2015. Each crop dataset is in standard NetCDF4 format with the georeferenced information embedded. The unit of crop yield is t/ha the dataset.

Technical Validation

Performance of the ML models in estimating crop yield

We trained the three ML models separately for each country and crop. Figure 2 presents the R2, RMSE (t/ha), and NRMSE (%) values for the top seven countries and globally, using the selected optimal ML models. Overall, the results demonstrate a high accuracy, although the optimal model type and predictor combinations varied across different crop and countries. At the global scale, wheat yield estimation exhibits the highest accuracy among the four crops, with a R2 of 0.95 and a RMSE (NRMSE) of 0.46 t/ha (13.1%). Maize follows closely, with a R2 of 0.93 and a RMSE (NRMSE) of 0.76 t/ha (17.8%). Rice yield estimation achieved a R2 of 0.90 and a RMSE (NRMSE) of 0.63 t/ha (18.3%). The model for soybean yield estimation performed comparatively worse, with a R2 of 0.86 and a RMSE (NRMSE) of 0.31 t/ha (15.3%).

Fig. 2
figure 2

The R2 values of the three ML models for the top seven production countries and the globe using the testing dataset. The error bars are one standard deviation of R2 from 50 ensemble simulations by randomly separating training and testing datasets. Note: I-VII represent the top seven production countries for each crop respectively and VIII represents the globe (see Table S2).

The model skill is also the best for wheat among the major producing countries, with a R2 ranging from 0.84 to 0.91 and a NRMSE (RMSE) from 5.6% (0.27 t/ha) to 16.7% (0.53 t/ha). For maize, the models perform best in Mexico and Ukraine, with a R2 from 0.90 to 0.92, and a RMSE (NRMSE) from 0.50 t/ha (21.2%) to 0.39 t/ha (10.7%), respectively. In other countries, R2 values range from 0.76 to 0.87, RMSE ranges from 0.36 to 1.1 t/ha, and NRMSE ranges from 15.3% to 25.1%. Regarding rice, the model achieved the highest accuracy in Bangladesh, with a R2 of 0.90 and a RMSE (NRMSE) of 0.3 t/ha (9.8%). In other countries, the models had a R2 ranging from 0.69 to 0.87 and a RMSE from 0.23 (7.6%) to 0.88 t/ha (24.6%). For soybean, the models exhibited a R2 ranging from 0.72 to 0.85 and a RMSE (NRMSE) from 0.17 t/ha (7.2%) to 0.40 t/ha (18.3%). Furthermore, the error bars show relatively low deviations, suggesting the robustness of our models. The recorded and estimated yields for countries and the globe closely align along the 1: 1 line (Figs. S6-9). We further found that year and NDVI were the consistently important factors for four crop yield estimations, highlighting the role of long-term agronomic advancements and vegetation health in yield determination.

Comparison of GlobalCropYield5min products with observed records

Because global-scale field crop yield measurements were not available, we aggregated the GlobalCropYield5min data to the administrative unit level to compare the estimated yields with recorded data, enabling an analysis of yield estimation accuracy at the grid scale. At the global scale (Fig. 3), the mean R2 between the estimates and recorded data for the top seven production countries was 0.85 for maize, 0.82 for wheat, 0.76 for rice and 0.78 for soybean. The mean RMSE (NRMSE) was 0.97 t/ha (24%), 0.62 t/ha (21%), 1.19 t/ha (28%), and 0.36 t/ha (18%) for maize, wheat, rice, and soybean, respectively. Overall, the simulation accuracy was significantly higher for maize and wheat than rice and soybean. For both maize and wheat, the average R2 was greater than 0.7 in all major producing countries, with RMSE ranging from 0.32 t/ha to 1.1 t/ha and NRMSE from 10.6% to 28.1%, except for maize in Brazil and wheat in Russia. For rice, the R2 ranged from 0.62 to 0.79, RMSE mostly ranged from 0.34 to 0.86 t/ha, and NRMSE ranged from 11.1 to 24.1%. Regarding soybean, the R2 ranged from 0.61 to 0.76, RMSE ranged from 0.05 t/ha to 0.51 t/ha, and NRMSE ranged mostly from 5% to 21.8%. Note that a higher R2 did not necessarily correspond to a lower RMSE (NRMSE) when comparing different crop types and countries, due to the substantial variation in recorded yields.

Fig. 3
figure 3

Evaluation of the simulation accuracy using R2, RMSE(t/ha), and NRMSE (%) of the four crops across the top seven production countries and the globe from 1982 to 2015. The black lines within the box indicate the medians in 34 years, while red dots represent means. The boundaries of boxes are the 25th and 75th percentiles and the whiskers below and above the boxes represent the 10th and 90th percentiles.

Regionally, the performance of the selected crop estimation model at the grid level varied by country and crop type. For the top seven production countries, the model performance was generally better for maize and wheat, with mean R2 exceeding 0.7, except for maize in Brazil and wheat in Russia. The RMSE ranged from 0.33 t/ha to 1.1 t/ha, and NRMSE from 11% to 28%. The model performance was moderate for rice (mean R2 ranging from 0.62 to 0.79 and RMSE from 0.34 t/ha to 1.45 t/ha) and soybean (mean R2 ranging from 0.61 to 0.76 and RMSE from 0.19 t/ha to 0.51 t/ha, except for India).

In addition, we examined the prediction skill over time (Fig. S15). The average R2 at the grid level ranged from 0.65 to 0.81, with RMSE (NRMSE) from 0.24 t/ha (13%) to 1.1 t/ha (38%) during the period of 1982–2015. The highest R2 was achieved for maize in 2010, while the lowest R2 was observed for rice in 2002. The prediction skill for maize (with R2 ranging from 0.7 to 0.81 and mean R2 being 0.76) and wheat (with R2 ranging from 0.66 to 0.80 and mean R2 being 0.74) was higher than that for rice (R2 from 0.65 to 0.73 and mean R2 being 0.70) and soybean (R2 from 0.68 to 0.78 and mean R2 being 0.72). These results are consistent with the earlier sub-national analysis. The time series analysis indicated a relatively high skill of these models even at the grid level (all R2 ≥ 0.65), although the skill varied across different years.

The spatial patterns of CV and mean crop yield of GlobalCropYield5min over the past three decades closely align with the reported yields (Fig. 4 and Fig. S16). The CV and mean of crop yields exhibit distinct patterns, particularly in large countries. Regions with high crop production such as the United States tend to have relatively low CV. Over the study period, the global average CV of maize yield was approximately 0.39 tons/ha/year (Fig. 4a). The CV of maize was higher than the global average in Brazil, Argentina, India, parts of China and Africa, and the Southeast Asia. The global average CV for rice yield was relatively low, except in southern China, northeastern Brazil, central India, and northwestern Africa. The global average CV for wheat yield was 0.38 tons/ha/year. The highest CV values were observed in the Australian wheat belt, northeastern China, Argentina, and southeastern Brazil. Conversely, in major soybean-producing countries such as the Midwestern U.S., India, Southeast and Central Asia, and parts of Latin America, the CV of soybean yield was low. However, it was higher in northeastern China, marginal soybean-producing areas of the United States, Argentina, and Nigeria. Overall, the spatial patterns of crop yield CV and mean exhibit substantial spatial variability at the global level.

Fig. 4
figure 4

Spatial pattern of CV and mean crop yields over the entire study period based on GlobalCropYield5min product for maize (a, e), rice (b, f), wheat (c, g) and soybean (d, h). The sample size is approximately 12,000 administrative units × 34 years for each crop. White represents the areas where crop was not harvested or analyzed.

We also compared the yield time series for the top four producing countries to assess GlobalCropYield5min’s ability to capture yield trends (Fig. 5) and year-to-year variations (Fig. S17). The simulated and reported yields exhibit closely aligned increasing trends, with R2 values ranging from 0.80 to 1 (Fig. 5). Notably, a clear increasing yield trend is observed for each crop and major country from 1982 to 2015, although the magnitude varies. Additionally, interannual year-to-year yield variations are well captured, with R2 values spanning from 0.36 to 0.96. Both show statistically significant positive correlations (P < 0.001). Overall, these results indicate that GlobalCropYield5min captures both the spatial heterogeneity of yield and its year-to-year variation fairly well.

Fig. 5
figure 5

The yield change at the nationally aggregated level for the top four production countries according the Reported yield (red dashed line) and Simulated yield (green dashed-dotted line) during 1982–2015 for global maize, rice, wheat, and soybean, respectively.

Comparison with existing global crop yield products

We compared GlobalCropYield5min, SPAM, and GDHY with sub-national yield records from approximately 12,000 administrative units (Figs. 6, 7 and Figs. S1820) across the top seven crop-producing countries and globally. Their performance was evaluated using the Taylor Diagram (Fig. 6 and Figs. S1820) for 2000, 2005 and 2010, respectively. Notably, GlobalCropYield5min aligns more closely with the observed data (purple dot) on the x-axis than SPAM and GDHY. For all crop types in the top seven production countries, GlobalCropYield5min generally shows the highest correlation (grey lines) and the lowest RMSE (yellowish dashed lines), with the predicted yields closely matching the observed data (black dashed line, Fig. 6). SPAM exhibits a lower correlation with the observed data, moderate RMSE values, and greater spatiotemporal variations. In contrast, GDHY performs the worst at the sub-national level, showing the weakest correlation with the observed data (correlation coefficient <0.1 in many countries, which were excluded from the figures), and a high RMSE compared to observed data.

Fig. 6
figure 6

Taylor diagram comparing the existing three global maize yield data products with the records (i.e., observed in the figure) in 2000, 2005, and 2010, respectively.

Fig. 7
figure 7

The scatter plots between the simulated yields from the GlobalCropYield5min, SPAM and GDHY and the reported yield (~12,000 administrative units) in 2000, 2005, and 2010, respectively. The solid and gray dotted lines represent the fitted linear and 1:1 line, respectively.

At the global level (Fig. 7), similar to previous results, the GlobalCropYield5min dataset shows the highest performance, with R2 ranging from 0.73 to 0.86. This is followed by SPAM, with R2 ranging from 0.44 to 0.82, and GDHY, with R2 from 0.01 to 0.39. Clearly, the accuracy of GlobalCropYield5min and SPAM is significantly higher than that of GDHY. This is likely because the yield datasets used to generate GlobalCropYield5min and SPAM include data from all sub-national administrative units, whereas GDHY relies solely on FAO statistics and has a spatial resolution of 0.5°. Additionally, GlobalCropYield5min provides continuous coverage from 1982 to 2015, whereas SPAM offers crop yield data only for 2000, 2005, and 2010, though its accuracy continues to improve.

Collecting records yield data at the 5-minute resolution for model verification purposes is indeed challenging. However, to address this limitation, we collected actual maize yields from agro-meteorological stations in China for the years 2000, 2005, and 2010 (Figs. S21S22), and compared them with three existing products for crop yield prediction. Our analysis revealed a significant correlation between the GlobalCropYield5min and agro-meteorological stations yield data (P < 0.001), with an average R2 of 0.69 and an NRMSE of 16.2%. Importantly, the GlobalCropYield5min product consistently demonstrated the lowest NRMSE. Notably, the GlobalCropYield5min product consistently exhibited the lowest RRMSE, whether analyzed for all three years collectively or separately. In comparison, the SPAM dataset exhibited slightly lower accuracy, with an average R2 of 0.61 and an NRMSE of 21.15%. The GDHY dataset performed the worst, aligning with our validation results at the administrative scale (Figs. 6, 7, S1314). Consequently, GDHY consistently shows the lowest accuracy in both administrative and field-scale validations. Although SPAM has slightly lower accuracy than GlobalCropYield5min, its temporal discontinuity, with updates only every five years, is a major limitation. In contrast, our dataset offers continuous time-series data, supporting both spatial and temporal analyses. Overall, GlobalCropYield5min outperforms the other two global yield datasets in terms of simulation accuracy, spatial resolution, and temporal coverage.

Spatial distribution of uncertainty in GlobalCropYield5min

Regarding spatial uncertainty, the mean NRMSE for maize, rice, wheat, and soybean was 25.7%, 22.1%, 23.7%, and 22.1%, respectively. Notably, 83.6%, 72.9%, 74.6% and 78.4% of the grids for maize, rice, wheat, and soybean, respectively, had a NRMSE below 30%, indicating low uncertainty in the GlobalCropYield5min dataset. Specifically, uncertainty was low in the Midwest region of the United States, most of Europe, India, Thailand and Pakistan for maize. Uncertainty was low in Myanmar, Thailand, Lao People’s Democratic Republic, Cambodia, Viet Nam, Indonesia, and southern Brazil for rice. It was low in eastern United States, eastern Argentina, Europe, and parts of Asia including Afghanistan, Pakistan, and the North China Plain for wheat; and in the Midwest United States and central-western Brazil for soybean. However, uncertainty was high (NRMSE > 40%) in 12.4%, 1.8%, 4.0%, and 5.8% of grids for maize, rice, wheat, and soybean, respectively. These regions with higher uncertainty were primarily located in northeastern Brazil, northern Argentina, northeastern China, and the Philippines for maize; southwestern China, central Pakistan, and northwestern Brazil for rice (Fig. 8); northeastern China and western Australia for wheat; and southern Brazil and northern China for soybean.

Fig. 8
figure 8

Spatial pattern of uncertainty (NRMSE in %) in GlobalCropYield5min.

Uncertainties and caveats

We applied ML models optimized for crop yield prediction at global scales, demonstrating their notable performance compared to previous studies (See Supplementary Text 2 for details). While the GlobalCropYield5min dataset provides a high-resolution global crop yield coverage, we acknowledge the uncertainties in its production. Firstly, the input datasets from multiple sources47,65 may introduce biases in the crop yield estimates. For example, variations in planting and harvesting dates over time due to climate, technology, and socioeconomic changes can affect the estimation results. As some previous studies50,58,66, although we converted the fixed crop calendar to a monthly time step to reduce sensitivity, it is important to consider the dynamic nature of planting and harvesting dates for accurate yield mapping. Additionally, the use of a dynamic crop harvested area map would enhance long-time yield mapping. In this study, we employed crop harvested area-weighted gridded data for the administrative units based on some previous studies25,67,68, but future studies should address the challenge of obtaining globally continuous coverage, high-resolution and temporal crop harvested area dataset. The uncertainties mentioned above are difficult to reduce unless substantial improvements in data quality28,65,69,70,71. Secondly, due to the difference in time length of collected sub-national statistics across regions, we interpolated data from high-level units to low-level units by using their proportions based on 5 years of average proportions7,50,72. By assuming the fluctuations in the proportion and spatial distributions conserved over time, minor errors may arise if these assumptions do not hold true. Moreover, misreporting of the collected sub-national agricultural statistics, spatial interpolating methods, various data availability, and imperfect modeling can contribute to uncertainty in developing the crop yield products. Additionally, the large discrepancies in crop yield gaps between two adjacent countries may lead to apparent spatial edges in the GlobalCropYield5min dataset because the model was built for the country, particularly for wheat and maize in Europe. Finally, it is important note that while the GlobalCropYield5min dataset offers high-resolution and long-time coverage, it may not be suitable for the regions with microclimate features, complex terrains, or heterogeneous land cover, as these factors were not explicitly considered in the simulations.