Keywords

1 Introduction

The main goals of precision agriculture (or precision farming) generally include the minimisation of negative environmental impacts on the one hand, and the maximisation of economic profit on the other hand [3, 17]. Geospatial information is highly valuable for these purposes [24, 25], in particular when based on Semantic web principles [16]. A differentially corrected Global Navigation Satellite Systems (GNSS) equipped yield monitoring system on field harvesters enables collection of georeferenced yield data [10, 18]. These data can be processed within Geographic Information system (GIS) using several interpolation techniques in order to generate detailed yield maps [5, 19]. On their basis, farmers can more precisely determine where exactly to put which inputs and in what quantities, because data from field harvesters represent the most detailed source of yield information. Unfortunately, these measurements usually contain errors which influence results of complex spatial analyses [7]. As suggested for example by [2], as well as [4], such errors might arise for the following reasons: the occurrence of unexpected events during the harvesting process leading to unusual behaviour on the part of the machine; the trajectory of the field harvester; and errors caused by the wrong calibration of the yield monitor. Therefore, the measurements need to be processed and filtered (different types of filters can be used; see Sect. 2.2 for more details).

Other aspects requiring further investigation include the influence of the individual interpolation methods on the quality of the resulting yield maps. The objective of this paper is to investigate the influence of three interpolation methods (Inverse Distance Weighted, Inverse Distance Squared and Ordinary Kriging) commonly used to generate of yield maps. To compare them, we used descriptive statistics, Mean Prediction Error, Root Mean Square Prediction Error and Map Algebra. Interpolation methods were applied on three fields from Rostěnice Farm (Czech Republic). See details in Sect. 2.1.

2 Materials and Methods

2.1 Study Site

Data measured by a cereal field harvester were used to analyse and evaluate the approaches of spatial filtering and interpolation. Data acquisition was conducted at the Rostěnice cooperative farm in the south-eastern part of the Czech Republic (Fig. 1). The farm, Rostěnice a.s. (N49.105 E16.882), manages over 8,300 ha of arable land in the South Moravian region of the Czech Republic (see Fig. 2).

Fig. 1.
figure 1

Overview map of the Rostěnice farm.

Fig. 2.
figure 2

Detailed map of the Rostěnice farm with the studied fields highlighted.

The average annual temperature is 8.8 ℃ and the average annual rainfall is 544 mm. Within the managed land, the most prevalent soil types consist of Chernozem, Cambisol, haplic Luvisol, Fluvisol near bodies of water, and, occasionally, also Calcic Leptosols. The fields are located mainly in sloping terrain. The main programme consists in plant production, where the main focus is on the cultivation of malting barley, maize for grain and biogas production, winter wheat, oilseed rape, and other crops and products such as soybean and lamb. The high spatial variability of soil conditions in the southern part of farm has led to the adoption of precision farming practices, such as the variable application of fertilisers (since 2006) and crop yield mapping by field harvesters (since 2010).

2.2 Sensor Measurement and Processing

Data were measured for three fields by a CASE IH AXIAL FLOW 9120 field harvester equipped with an AFS Pro 700 monitoring unit in 2017. The measurements were of GNSS-RTK (Real Time Kinematics) quality. Measurements were taken continuously each second at an average speed of 1.55 m.s−1, recommended as optimal at the Rostěnice farm for cereal harvesting by the CASE IH AXIAL FLOW 9120 harvester. The harvesting width was 9.15 m. The monitoring of yield was conducted as on-the-go mapping by recording grain flow and moisture continuously over the whole plot area. The crop type did not influence the spatial density of the performed measurements. The measurements were stored directly in the field harvester and manually copied to a USB flash drive after the end of operations on the pilot fields.

As mentioned above, data from field harvesters contain different types of errors. These data errors corrupt the results, which means the datasets need to be processed and filtered. Filtering of data from field harvesters was described, e.g., by [6, 12, 22]. We used the approach introduced [20]. This approach comprises two subsequent steps – global filtering and local filtering. Global filtering removes non-reliable measurements (data point values) within the whole dataset by means of a statistical analysis of measurement values and related attributes. Local filtering then focuses on some parts of the dataset in a higher detail (Table 1), and it is mostly based on the analysis of the neighbourhood of data point values.

In the approach used [20], global filters detect incorrect outliers based on:

  • the range of possible yield values,

  • the speed of a field harvester,

  • the direction of harvesting.

Local filtering brings the most accurate results regarding domain knowledge, e.g. measurements, data processing and yield history, as well as knowledge of the data, of the situation, and of the whole range of issues in general. Local filtering comprises a set of subjective methods (points are excluded manually). In the approach used [20] local filters identify potentially incorrect values when the following situations occur in datasets:

  • the crossings in harvester trajectory,

  • the neighbouring rows in harvesting trajectory are too close to each other,

  • the gaps in measurements within one row of trajectory.

    Table 1. Absolute number of sensor measurements, relative percentage of data points after global and local filtering.

2.3 Interpolation Methods

There are many spatial interpolation methods applied in various environmental-related disciplines and many diverse factors affect the performance of these methods [13]. From the wide range of interpolation methods mentioned by [13], only a few are used in precision agriculture to process field harvester data. [11] used punctual and block kriging. [21] applied IDW in addition to the aforementioned two methods. [23] compares Ordinary Kriging (OK), Inverse Distance Weighted (IDW) and Inverse Distance Squared (IDS). We decided to test these three interpolation methods, firstly because they are used in precision agriculture and, secondly, because they are the most commonly used interpolation methods in environmental sciences (see [13]).

IDW is a popular deterministic method for spatial analysis. IDW is multivariate interpolation, which means it is a function for more than one variable. The method is used to predict the unknown points in specific locations. The unknown values are calculated as a weighted average from the known available values [15]. Inverse Distance Squared (IDS) is very similar to IDW, but power of IDW is equal to 1 and power of IDS is equal to 2. IDW is referred to as linear interpolation, while equation of IDS is squared. IDW and IDS are deterministic methods [1, 23].

Kriging – also known under the acronym BLUE (Best Linear Unbiased Estimator) – is a stochastic (geostatistical) interpolation method which uses geostationary estimation methods. The interpolated values are calculated by a Gaussian process and controlled by covariances. Local estimate is used for kriging, which means that the expected values of the variable are calculated from available data in a relatively small neighbouring area. There are several types of kriging, where ordinary kriging is the most frequently used one. Ordinary Kriging (OK) is spatial interpolation, where the error variance is minimised [8]. OK provides estimate values in points or in blocks for which a variogram is known. Data in the neighbourhood of the predicted value are used for the estimate [26]. An alternative to OK that is another variant of kriging and that can be used to process yield data is, for example, Simple Kriging (see [20]).

The three algorithms mentioned above represent both deterministic (IDW, IDS) and stochastic methods (OK). IDS and IDW methods can be considered simpler in terms of setting their input parameters. All three interpolations were computed in ArcGIS 10.6 software. The parameters were computed by means of Exploratory Spatial Data Analysis.

2.4 Verification Methods

The differences between observed and calculated values can be evaluated using simple descriptive statistics, Mean Prediction Error, Root Mean Square Prediction Error and Map Algebra. Within the methods of descriptive statistics, we investigated the mean, and especially the minimum and maximum, to determine whether the studied interpolation methods overestimate or underestimate the results compared to the original data.

Mean Prediction Error (MPE) is the average difference between the measured and the predicted values (Formula 1). The error values of the estimates should be impartial and their average should be zero [9].

$$ MPE\, = \, \frac{{\mathop {\sum }\nolimits_{i = 1}^{n} (\hat{Z}(s_{i} )\, - \,z(s_{i} ))}}{n} $$
(1)

Where:

  • \( \hat{Z}(s_{i} ) \): predicted value,

  • \( z(s_{i} ) \): measured value,

  • \( n \): number of observations.

Root Mean Square Prediction Error (RMSE) is the standard deviation of the prediction errors. It is the square root of the average of squared differences between predicted and measured values (see Formula 2). Smaller value of the RMSE means the model is more suitable, because the calculated values are closer to the measured values [9].

$$ RMSE\, = \, \sqrt {\frac{{\mathop \sum \nolimits_{i = 1}^{n} (\hat{Z}(s_{i} )\, - \,z(s_{i} ))^{2} }}{n}} $$
(2)

Where:

  • \( \hat{Z}(s_{i} ) \): predicted value,

  • \( z(s_{i} ) \): measured value,

  • \( n \): number of observations.

To visualise and describe spatial patterns of the differences between interpolation algorithms, Map Algebra was used. Map Algebra and especially the relative difference method provide insights into variances between values in overlapping raster data [14]. The relative differences (dv) were used as defined in Formula 3. Relative differences are often used as a quantitative indicator of quality assurance and quality control for repeated measurements/calculations where the results are expected to be similar or the same.

$$ d_{v} \, = \, \frac{{i_{a} \, - \, i_{b} }}{{i_{a} }}\, \times \,100 $$
(3)

Where:

  • ia: reference interpolated value,

  • ib: compared interpolated value.

All these verification methods were computed in ArcGIS 10.6 software. Verification methods were calculated for all three studied fields (Zákostelní, Milešovsko and Kobersko Široké – see Fig. 2) and interpolated surfaces from all three steps of data filtration (Measured data; Global filtering; Global and Local filtering).

3 Results

Interpolations were made for all datasets, with nine different interpolated surfaces for each field (see the example of Kobersko Široké field provided in Fig. 3). Relative yield values were used for more suitable comparison. The size of the pixels of the interpolated surfaces was 3.5 × 3.5 m. OK parameters are presented in Table 2. For filtered data, spatial extents were smaller due to the filtration process, which removes measurement points at the edges of fields. In order to achieve homogeneous (consistent) and comparable results, we decided not to use extrapolation methods because their precision in the respective areas would be debatable.

Fig. 3.
figure 3

Interpolations of measured, globally filtered, and both globally and locally filtered data for the Kobersko Široké field [relative yield values in %].

Table 2. Basic settings of the OK interpolation for the selected fields.

As the first step in comparison of the interpolation algorithms used, descriptive statistics were calculated. Figure 4 shows the mean values and the total range of values, both for the input data and the interpolated surfaces. At the same time, it is clear that especially global filtering removes extreme outliers and considerably reduces the overall range of values (Fig. 4 – compare the columns Measured data and Global filtering). A similar but much smaller effect in reducing the range to the measured data is also detectable in all three interpolation algorithms used (Fig. 4 – compare the columns Input data and the others).

Fig. 4.
figure 4

Minimal, mean and maximum values for three compared interpolation algorithms, input data and three testing fields [absolute yield values].

The second step in comparison of the interpolation algorithms used was based on MPE and RMSE calculation. Results are presented in Fig. 5. Both MPE and RMSE express average model prediction error, but RMSE also expresses extreme errors.

Fig. 5.
figure 5

RMSE and MPE values for three compared interpolation algorithms and three testing fields [absolute yield values].

Figure 6 depicts the relative differences between the compared interpolation methods. We chose the Kobersko Široké field for this visualization, because there are evident differences between interpolation algorithms (see Fig. 5). Figure 6 shows also the spatial pattern of the calculated relative differences. Relative differences occur in both pairs of compared algorithms at the field edges. However, the biggest relative differences exist between OK and IDS in some field passes; this is since IDS is an exact interpolator, which gives the most weight to near input points, while OK is rather a spatial estimator.

Fig. 6.
figure 6

Relative differences between IDS and IDW (left) and IDS and OK (right) for both globally and locally filtered data from Kobersko Široké field [in %].

4 Discussion

When comparing individual interpolation algorithms (IDW, IDS and OK), we focus primarily on their applicability within the three above-mentioned steps of processing data from field harvesters (measured data, global filtering and local filtering). The analysis of differences between fields in which algorithms have been compared lies beyond the scope of this paper as these differences are influenced by multiple natural and artificial factors (i.e., field shape, topography, soil, water regime, fertilisation, harvesting strategy). A set of various methods have been used in the comparison: descriptive statistic, MPE, RMSE and Map Algebra. RMSE is the most commonly used method for comparing the individual interpolation methods. However, a combination of multiple methods provides a more comprehensive comparison, although it cannot be interpreted as unambiguously as the result of a single indicator. [21] also reach similar conclusions.

It seems that IDW and IDS are more suitable for interpolation of directly measured (unfiltered) data. For the data used in this paper, IDS is characterised by the fact that the average, minimum and maximum values in the interpolated surface are closer to the input data. IDS is also characterized by lower RMSE values. These differences can be partially explained by the number of neighbouring measurement points used in the interpolations. For example, IDS increases weight of the nearest measurement points and, therefore, it essentially reduces the number of measurement points used in the interpolation, thus better capturing the local variability in the data. Similar conclusions are also provided by [23].

When comparing interpolation algorithms for filtered data (for globally filtered and especially globally and locally filtered data), the differences between IDW, IDS and OK have been considerably smaller. Also, with regard the filtered data interpolations, the lowest RMSE values were achieved for IDS. Regarding MPE it was also closest to zero on all three fields. An interesting fact is that MPE values for OK were less than zero in all three fields, which shows that OK tends to underestimate the interpolated results. In general, ordinary kriging acted as a spatial estimator, rather than an exact interpolator [21].

Essential settings for IDW/IDS calculation are simpler than the settings necessary for OK interpolation. Basically, it is necessary to set the cell size of the resulting raster and the power value (1 for IDW; 2 for IDS). Thus, we can conclude that IDW and IDS are more suitable for less experienced users of the GIS technology such as farmers. OK is more difficult to set up, but these additional parameters have a positive effect on the accuracy of the interpolation, especially if there are large gaps without measurement points in the input datasets appearing, e.g., when yield data are filtered.

The obvious limit of this research certainly lies in the fact that we have tested only a limited number of interpolation algorithms (and their settings). However, these interpolation algorithms are also used by other authors for yield data processing [11, 21, 23] and, according to [13], IDW, IDS and OK are generally the most commonly used interpolation algorithms in the environmental sciences.

5 Conclusions and Future Work

This paper compared three interpolation techniques in terms of their usability in yield mapping: Inverse Distance Weighted, Inverse Distance Squared and Ordinary Kriging. The measurements from a field harvester equipped with a GNSS unit need to be filtered and interpolated for follow-up analyses. This study verified the differences between the three aforementioned methods of interpolation used on data that were derived from field sensor measurements. These measurements were (1) obtained directly from the field harvester, (2) processed by global filters, and (3) processed by both global and local filters. Statistical analyses evaluated the results of interpolations from three fields (Zákostelní, Milešovsko and Kobersko Široké) cultivated by a fully operational farm (Rostěnice, the Czech Republic).

So far, existing approaches evaluated positional accuracy only with respect to the Root Mean Square Prediction Error. The outcomes of this study have confirmed that a different interpolation method has to be chosen when taking into account: (1) solely the Root Mean Square Prediction Error, or (2) a combination of Mean Prediction Error, Root Mean Square Prediction Error, in combination with descriptive statistics and Map Algebra. In general, Inverse Distance Squared seems to be the most suitable interpolation method, especially when it comes to interpolating unfiltered data. Both Inverse Distance Squared and Inverse Distance Weighted methods are exact interpolators and it is relatively easier to define their input parameters. Ordinary Kriging appears to be relatively the least suitable, but its importance grows especially when a dataset contains missing/removed measurements. However, as in previous studies, no universally valuable advice could be given as to which interpolation method is the best. More general recommendations, but only for yield mapping domain, can be formulated after additional testing on multiple datasets from different fields with different crops etc.

The conducted study will also serve as a resource for further research that will attempt to compare the measured yield with that predicted based on yield productivity zones. Interpolated surfaces serve as the yield productivity zones for both measurement and prediction. The resulting interpolated surfaces are considerably influenced by the applied interpolation method. Interpolation algorithms and their settings therefore influence the evaluation of yield productivity predictions when confronted with the measured values.