Article

Generalization Ability of Bagging and Boosting Type Deep Learning Models in Evapotranspiration Estimation

1 Central Research Institute for Dryland Agriculture, Hyderabad 500059, Telangana, India
2 Gramworkx Agrotech Pvt Ltd.—GramworkX, Keonics, Phase 3, 1st Sector, HSR Layout, Bengaluru 560102, Karnataka, India
3 National Bureau of Soil Survey and Land Use Planning (NBSS&LUP), Nagpur 440033, Maharashtra, India
4 Faculty of Science, University of Technology Sydney, Sydney, NSW 2007, Australia
* Authors to whom correspondence should be addressed.
Water 2024, 16(16), 2233; https://doi.org/10.3390/w16162233
Submission received: 3 July 2024 / Revised: 5 August 2024 / Accepted: 6 August 2024 / Published: 8 August 2024
(This article belongs to the Special Issue Water Management in Arid and Semi-arid Regions)
Figure 1. Map showing the locations selected for the study.
Figure 2. Flowchart to estimate ETo accurately using different models.
Figure 3. Scatter plots showing evapotranspiration computed by the P-M method and RF deep learning models for (a) Ballari, (b) Bengaluru, (c) Chikmaglur, (d) Chitradurga, (e) Devnagiri, (f) Dharwad, (g) Gadag, (h) Haveri, (i) Koppal, (j) Mandya, (k) Shivmoga, and (l) Tumkuru.
Figure 4. Scatter plots showing evapotranspiration computed by the P-M method and GB deep learning models for (a) Ballari, (b) Bengaluru, (c) Chikmaglur, (d) Chitradurga, (e) Devnagiri, (f) Dharwad, (g) Gadag, (h) Haveri, (i) Koppal, (j) Mandya, (k) Shivmoga, and (l) Tumkuru.
Figure 5. Scatter plots showing evapotranspiration computed by the P-M method and XGBoost deep learning models for (a) Ballari, (b) Bengaluru, (c) Chikmaglur, (d) Chitradurga, (e) Devnagiri, (f) Dharwad, (g) Gadag, (h) Haveri, (i) Koppal, (j) Mandya, (k) Shivmoga, and (l) Tumkuru.

Abstract

The potential of generalized deep learning models developed for crop water estimation was examined in the current study. The study was conducted in a semiarid region of India, i.e., Karnataka, with daily climatic data (maximum and minimum air temperatures, maximum and minimum relative humidity, wind speed, sunshine hours, and rainfall) spanning 44 years (1976–2020) for twelve locations. Three ensemble deep learning models, Extreme Gradient Boosting (XGBoost), Gradient Boosting (GB), and Random Forest (RF), were developed using all of the climatic data from a single location (Bengaluru) from January 1976 to December 2017 and then directly applied at eleven other locations (Ballari, Chikmaglur, Chitradurga, Devnagiri, Dharwad, Gadag, Haveri, Koppal, Mandya, Shivmoga, and Tumkuru) without any local calibration. For the test period of January 2018–June 2020, the models' capacity to estimate crop water requirement (Penman–Monteith (P-M) ETo values) was assessed. The developed ensemble deep learning models were evaluated using the performance criteria of mean absolute error (MAE), average absolute relative error (AARE), coefficient of correlation (r), noise-to-signal ratio (NS), Nash–Sutcliffe efficiency (η), and weighted standard error of estimate (WSEE). The results indicated that the WSEE values of the RF, GB, and XGBoost models were smaller than 1 mm per day at each location, and model effectiveness varied from 96% to 99% across locations. While all of the deep learning models performed well with respect to the P-M ETo approach, the XGBoost model estimated ETo with greater accuracy than the GB and RF models. The XGBoost model's strong performance was also indicated by its lower noise-to-signal ratio. Thus, this study develops a generalized mathematical model for short-term ETo estimation using ensemble deep learning techniques.
Because of this type of model's accuracy in calculating crop water requirements and its capacity for generalization, it can be readily integrated with a real-time water management system or an autonomous weather station at the regional level.

1. Introduction

Crop water requirements form the major basis for irrigation water management. They are quantified through evapotranspiration (ET), the term describing the combined processes of evaporation from the soil surface and transpiration from plant leaves. ET is a biophysical process in which evaporation from the soil surface and transpiration driven by plant stomatal activity are combined [1]. Determining ET is one of the most crucial elements of the hydrologic cycle and the water balance model. Numerous applications, including hydrologic water balance, crop yield simulation, irrigation system design and management, and the planning and management of water resources, depend on its precise estimation.
An empirical crop coefficient (Kc) is used to moderate the reference evapotranspiration (ETo) estimated from a standard surface, which is a typical method for predicting ET at any given time for a particular crop. Even though a lysimeter is the optimal tool for measuring ETo, this strategy is not always feasible because it is time-consuming and expensive and requires extensive fieldwork and careful observation. Because ETo data are widely used in agriculture and hydrology, over the past 70 years a number of studies and research projects have focused on developing various types of mathematical models to indirectly estimate ETo and on enhancing the performance of these models. The literature [2] offers a variety of indirect ETo estimation methods, from empirical to complicated models. However, the choice of appropriate models for ET estimation is contingent upon the data at hand, the features of the region, and the level of accuracy required. A sophisticated method known as the Penman–Monteith (P-M) model [3,4,5,6] makes use of the physical properties of the ET process, such as heat and mass balance, by fusing energy and aerodynamic terms. This approach is acknowledged worldwide as the most effective method for estimating ETo in situations where measured ETo values are unavailable [7,8]. Moreover, a number of numerical models are calibrated and validated using P-M calculated ETo as the base method.
Because machine learning and artificial intelligence-based numerical approaches can map the input–output relationship without requiring a deep understanding of the underlying physical process, they have become increasingly used in evapotranspiration estimation over the past 20 years [9,10,11,12]. Within machine learning, deep learning is a subset that simulates how the human brain processes information and forms patterns to aid in decision-making. Deep learning is different from typical machine learning techniques in that it can automatically learn data representations from raw inputs without requiring manual feature extraction. The availability of big datasets, increased processing power, advancements in algorithms, and optimization approaches have all contributed to the extraordinary advancements in deep learning in recent years. The key advantages of deep learning are unparalleled performance, feature learning, scalability, versatility, adaptability, interpretability, integration with big data, and real-world applications. In agriculture, deep learning can be used for crop monitoring and disease detection, precision agriculture, weed and pest control, yield predictions, climate resilience, and real-time water management by accurately estimating crop water requirements.
Deep learning uses many-layered artificial neural networks (ANNs) that are trained on a vast quantity of data in order to identify patterns and provide predictions. With the introduction of ANNs and their numerous variations, the use of advanced computational capabilities in the estimation of evapotranspiration was initiated [13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28]. The use of these techniques in modeling and evapotranspiration estimation has produced a number of success stories. The support vector machine (SVM) algorithm improved upon the ANN model, as it has (i) a less complex structure, (ii) fewer data requirements for training, (iii) very fast data training/learning, (iv) parameter optimization with less risk of converging to a local minimum of the learning function, unlike the ANN, and (v) less sensitivity to the initial randomization of the weight matrix. Thus, in the last decade, much of the focus has shifted to the implementation of the SVM algorithm in modeling the evapotranspiration process [29]. However, since 2015, simulating the evapotranspiration process with deep learning and extreme learning techniques has received a lot of interest. The decision tree method of the classification and regression tree (CART) type was primarily used in this technique [30,31,32,33,34]. However, by using the more reliable ensemble deep learning approach, the performance could be improved even further. The boosting techniques utilized in this ensemble approach further enhance the learning algorithm's performance [35].
The focus of this study is on how different deep learning algorithms, including Random Forest, Gradient Boosting (GB), and Extreme Gradient Boosting (XGBoost), perform in relation to one another. The application of these techniques and their comparative impact on the mapping ability of the evapotranspiration process has not been explicitly demonstrated in the literature [31,36,37,38,39,40,41,42]. Additionally, limited or no studies have been carried out in the past to explore the generalization capabilities of these models based on deep learning approaches. This is particularly important because once these models can be generalized, they can be directly implemented in locations where past data are unavailable to develop locally trained models. The current study summarizes the comparative effectiveness of these techniques in estimating the ETo for locations whose data are not included in the model training and validation, in addition to reporting the successful implementation of these enhancements in the modeling of the evapotranspiration process. The novelty of the present study involves the use of ensemble deep learning approaches (bagging and boosting) to develop generalized ETo models from climatic data from a single station and test their utility in eleven other locations that were not used during model training and development.

2. Materials and Methods

2.1. Location

The current study was conducted using climatic data collected from twelve locations, namely Bengaluru, Ballari, Chikmaglur, Chitradurga, Devnagiri, Dharwad, Gadag, Haveri, Koppal, Mandya, Shivmoga, and Tumkuru. The locations selected for the present study represent the Karnataka state of India, which mostly falls under Agroecological Zone 10, called the Southern Plateau and Hill region of the Indian subcontinent (Figure 1). This region extends from 11.30° N to 18.30° N and 74.0° E to 74.30° E and covers an area of 19.20 million ha. The climate is typically semiarid, with 69% of the cultivable area falling under dryland agriculture. The major soil types include sandy loam, red calcareous, and black soils. The region receives meager annual rainfall, ranging between 500 and 700 mm, compelling the inhabitant farmers (mostly small and marginal farmers owning less than 1 ha of culturable land) to practice subsistence farming due to low cropping intensity and agricultural productivity. Water management at a regional scale is thus a key factor in sustainable and enhanced farm production in this region. The important crops grown in this region include cotton, jowar, bajra, groundnut, millets, banana, turmeric, onions, and chilies, along with fodder crops.

2.2. Data Set and Methodology

Table 1 contains information on the latitude, longitude, and altitude of the chosen sites and on the length of data used. The altitude of the selected sites varied from 485 m above mean sea level at Ballari to 1090 m at Chikmaglur. In this study, the ensemble deep learning models were developed using input climate data of seven variables (maximum and minimum air temperatures, maximum and minimum relative humidity, wind speed, sunshine hours, and rainfall) for the Bengaluru location, with approximately 44 years of daily data. Since measured lysimeter ETo data were unavailable for the chosen study locations, the FAO-56 P-M approach was used to estimate ETo instead; this method is recommended as the only accepted way to compute ETo when lysimeter data are not available [5]. The 44 years (1 January 1976 to 30 June 2020) of daily climatic data, corresponding to all the basic parameters used to estimate ETo by the FAO-56 P-M method, were used as observed or measured data to map input-output relations using the ensemble deep learning approach. As recommended by [5], a data quality check was carried out, and incorrect and missing records were excluded from the model training, validation, and testing phases. The data were divided into a learning set (used for developing the model) and a validation set (used for testing the model).
Table 2 presents the statistical summary of the climatic data used in this study for the twelve locations. The values in Table 2 are the mean ± standard deviation of the corresponding climatic data for each location. The mean Tmax and Tmin range from 28.42 to 33.48 °C and from 18.14 to 21.73 °C at the Chikmaglur and Koppal locations, respectively. The mean RHmin and RHmax range from 44.2 to 66.0% and from 75.1 to 82.8%, respectively. The mean wind speed and sunshine hours range from 4.58 to 13.2 km h−1 and from 6.62 to 8.52 h, respectively. The mean annual rainfall varied from a minimum of 529.1 ± 124.6 mm at Ballari to a maximum of 1074.9 ± 183.7 mm at the Chikmaglur location. The mean P-M ETo varied from a minimum of 5.07 mm day−1 at Chikmaglur to a maximum of 7.97 mm day−1 at Chitradurga.
Figure 2 depicts the detailed methodology. For the meteorological station Bengaluru in the Indian state of Karnataka, daily meteorological data for fundamental climatic parameters, such as minimum and maximum temperature, minimum and maximum relative humidity, wind speed, sunshine hours, and rainfall, were gathered (Figure 1). Numerical values for Julian Days (1–365), months (1–12 respectively for January–December), and quarters (1–4 respectively for January–March; April–June; July–September and October–December) were also included in the input data set in order to capture the monthly and seasonal variation in the underlying evapotranspiration process. The chosen stations represent the agroclimatic zone of the Southern Plateau and Hill in the Indian subcontinent.
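The temporal inputs described above can be derived directly from the calendar date. A minimal sketch (the function and key names are illustrative, not taken from the paper's code):

```python
from datetime import date

def calendar_features(d: date) -> dict:
    """Temporal inputs used alongside the daily climate variables."""
    return {
        "julian_day": d.timetuple().tm_yday,  # day of year, 1-365 (366 in leap years)
        "month": d.month,                     # 1-12 for January-December
        "quarter": (d.month - 1) // 3 + 1,    # 1: Jan-Mar, ..., 4: Oct-Dec
    }
```

For example, 15 August 1976 maps to Julian day 228, month 8, and quarter 3.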
The models were developed using the climatic data of the Bengaluru station with the corresponding P-M ETo values as targets. For this purpose, daily records for the period 1 January 1976–31 December 2017, excluding incomplete and erroneous daily records, were considered. For model testing, daily records from 1 January 2018 to 30 June 2020 were considered. The models thus developed were directly applied to 11 other locations that were not included in model development. In these locations, the developed models were tested using the daily records from 1 January 2018 to 30 June 2020, as was done for model validation at the Bengaluru meteorological station. If discrepancies were observed in any of the input parameters, the entire daily record was discarded. As with the data sets for model learning, incomplete and erroneous records were also excluded from testing. All the code required to run the Extreme Gradient Boosting (XGBoost), Gradient Boosting (GB), and Random Forest (RF) models was written in Python 3.10 (Python.org) https://www.python.org/downloads/ (accessed on 6 May 2021). Note that these models can also be executed using R packages [43,44,45].
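The chronological split described above can be sketched with pandas (the frame layout and the handling of incomplete records are assumptions, not the authors' actual code):

```python
import pandas as pd

def split_by_date(df: pd.DataFrame):
    """Split daily records into the study's training window (1976-2017)
    and test window (Jan 2018-Jun 2020), dropping incomplete records."""
    train = df.loc["1976-01-01":"2017-12-31"].dropna()
    test = df.loc["2018-01-01":"2020-06-30"].dropna()
    return train, test
```

Because the split is chronological rather than random, the test period never leaks into training, which mirrors how the models would be used operationally.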

2.3. FAO-56 Penman Monteith (P-M) ETo Estimation

If the required climate data are available, the combination methods, more precisely the Food and Agricultural Organization (FAO)-56 Penman–Monteith (P-M) equation, produce the most accurate estimate of ETo across all climatic circumstances [5]. The FAO-56 P-M method uses a combination of mass and energy equations with climatological data as input to calculate ETo. By using a reference surface, this method avoids parameters specific to every crop and growth stage; instead, crop coefficients are used to convert ETo to actual crop ET. The FAO-56 P-M approach is acknowledged as the standard equation for estimating ETo and for evaluating other methods in the absence of measured lysimeter data. The disadvantage of the FAO-56 P-M method is that it cannot be used for locations with limited climatic data, as many sites in developing countries lack access to all the necessary climate variables. The FAO-56 P-M method, explained below, was used to calculate daily reference evapotranspiration using climatic data as input.
$$ET_o = \frac{0.408\,\Delta\,(R_n - G) + \gamma\,\dfrac{900}{T_{mean} + 273}\,u_2\,(e_a - e_d)}{\Delta + \gamma^{*}}$$

where ETo = reference evapotranspiration (mm d−1), Tmean = mean of minimum and maximum temperature (°C), u2 = horizontal wind speed at 2 m height (m s−1), Δ = slope of the saturation vapor pressure–temperature curve (kPa °C−1), ea = saturation vapor pressure at air temperature (kPa), ed = actual vapor pressure (kPa), Rn = net radiation (MJ m−2 d−1), γ = psychrometric constant (kPa °C−1), γ* = psychrometric constant modified for the wind term (kPa °C−1), and G = soil heat flux density (MJ m−2 d−1).
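Assuming the intermediate terms (Δ, Rn, G, γ, and the vapor pressures) have already been computed from the raw climate records, the equation itself can be sketched as a short function; the wind coefficient 0.34 in the modified psychrometric constant follows the standard FAO-56 form:

```python
def penman_monteith_eto(delta, rn, g, gamma, t_mean, u2, e_a, e_d, c_wind=0.34):
    """FAO-56 P-M reference evapotranspiration (mm/day).

    delta  : slope of the saturation vapor pressure curve (kPa/degC)
    rn, g  : net radiation and soil heat flux density (MJ m-2 d-1)
    gamma  : psychrometric constant (kPa/degC)
    t_mean : mean air temperature (degC)
    u2     : wind speed at 2 m height (m/s)
    e_a    : saturation vapor pressure (kPa)
    e_d    : actual vapor pressure (kPa)
    """
    gamma_star = gamma * (1.0 + c_wind * u2)  # wind-modified psychrometric constant
    numerator = (0.408 * delta * (rn - g)
                 + gamma * (900.0 / (t_mean + 273.0)) * u2 * (e_a - e_d))
    return numerator / (delta + gamma_star)
```

For illustration, the inputs delta=0.122, rn=13.28, g=0, gamma=0.0666, t_mean=16.9, u2=2.078, e_a=1.997, e_d=1.409 (a temperate-climate day) yield about 3.9 mm/day.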

2.4. Ensemble Learning through Regression Tree Algorithm

Regression trees are essentially decision trees utilized for regression problems. They can be used to predict continuous-valued outputs rather than discrete outputs. The decision tree algorithm is a kind of supervised learner that combines classification and regression analysis. The algorithm first classifies the target values, identifying the class within which the target variable most likely falls. The regression component, or predictive model component, of the decision tree then predicts the value based on the classified target variable. This iterative process continues until predefined termination criteria (usually on the subdivision process and on minimizing the error) are met. The training data, initially contained in the root node, are subdivided recursively during the creation of the regression tree model. The recursive subdivision divides the data into branches, or subdomains, most of which are fitted with multivariable linear regression models. The parent node of a branch is divided into left and right child nodes, or leaf nodes, by further subdividing the data. Predictions at the leaves are likewise generated by a linear model.
The growth process of the regression tree governs the recursive splitting of the training data set. This is accomplished by considering two factors: the subdivision process and the minimization of error, or impurity. At each stage, the data are split into smaller divisions so as to minimize the least-squares deviation. Several variants of decision tree models exist, based on the error optimization method used in the algorithm. The most widely adopted is Random Forest (RF), in which many decision trees are individually evaluated on the input and target data sets and then combined (bagging) to find the global optimal solution. In another approach, the data sets are first filtered through a base learner function before a decision tree algorithm is employed to find an optimal solution. This approach is called boosting-type deep learning; Gradient Boosting (GB) and Extreme Gradient Boosting (XGBoost) are the most common algorithms under this approach.
Bagging and boosting are two sophisticated ensemble techniques in deep learning. In the ensemble approach, several base models are combined to obtain the best possible prediction by reducing errors, facilitating more accurate and improved decision-making. While boosting trains models sequentially, emphasizing the errors produced by the previous model, bagging combines many models trained on distinct subsets of the data. Bagging uses the bootstrap to create subsets of the data, whereas boosting reweights the data based on the errors of the previous model, making the next model focus on poorly predicted instances. The advantages of the boosting and bagging ensemble approaches lie not only in producing more accurate model results but also in reducing noise, bias, and variance errors, avoiding over-fitting, and yielding more stable models.

2.5. Random Forest Algorithm (Bagging)—RF Model

Decision tree algorithms make greedy binary splits of the data, determining the optimal choice at each node without considering global optimization. A single decision tree minimizes bias-related error while leaving variance-related error largely unaddressed; as a result, the decision tree algorithm may overfit, and problems with local minima may occur. The RF algorithm takes into account both variance- and bias-related errors. A random forest is a set of decision trees whose outputs are combined to obtain the final result. The algorithm limits overfitting, thereby preventing significant error. Because the random forest approach trains each tree on a random subset of the data (row and feature subsampling), it lowers error due to variance. The final result is the aggregation of the individual trees grown on each subset of data points. Comprehensive details regarding the RF model can be found in [46].
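A minimal configuration of this bagging idea with scikit-learn's `RandomForestRegressor` (the hyperparameter values and the synthetic data are illustrative; the paper does not report its settings):

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(42)
X = rng.uniform(0.0, 1.0, size=(400, 7))  # stand-in for the 7 climate inputs
y = 3.0 + 4.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(0.0, 0.1, 400)

rf = RandomForestRegressor(
    n_estimators=200,      # number of trees whose predictions are averaged
    max_features="sqrt",   # feature subsampling at each split
    bootstrap=True,        # row subsampling via bootstrap resamples
    random_state=42,
).fit(X, y)
```

Averaging over many decorrelated trees is what reduces the variance-related error described above.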

2.6. Gradient Boosting (GB) Algorithm—GB Model

Another type of ensemble deep learning method is the GB algorithm [47]. This approach combines many base learners, which are basic functions, to produce a hypothesis function. Differentiating the loss function of this hypothesis yields the pseudo residuals that drive learning. Ultimately, the model learns from the training data set and the loss function fed into GB. The algorithm is briefly explained below. In the first phase, the model is initialized with a constant function, $F_0(x)$, obtained from an optimization problem:

$$F_0(x) = \gamma_{optimal} = \underset{\gamma}{\arg\min} \sum_{i=1}^{n} L(y_i, \gamma)$$

where $L(y_i, \gamma)$ is the loss function. Initially, $F_0(x) = \gamma$, the constant that best fits the actual y-values in the data set. The pseudo residuals are then computed by differentiating the loss function:

$$r_{im} = -\left[\frac{\partial L\left(y_i, F(x_i)\right)}{\partial F(x_i)}\right]_{F(x) = F_{m-1}(x)}, \quad i = 1, 2, \ldots, n$$

The target values $y_i$ in the data set are replaced with the pseudo residuals $r_{im}$, and this pseudo-residual data set is used to train and fit a base learner, $h_m(x)$. The algorithm then iteratively updates itself until the termination conditions are satisfied:

$$F_m(x) = F_{m-1}(x) + \gamma_m h_m(x)$$
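For the common squared-error loss, the pseudo residuals reduce to y minus the current prediction, so the update above can be sketched in a few lines (a simplified illustration with small regression trees as base learners, not the library implementation used in the study):

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def gradient_boost_fit(X, y, n_rounds=50, lr=0.1):
    """Squared-loss gradient boosting: F0 is the optimal constant (the mean),
    then each base learner h_m is fitted to the pseudo residuals y - F."""
    F = np.full(len(y), y.mean())
    trees = []
    for _ in range(n_rounds):
        residuals = y - F                                  # r_im for L = (y - F)^2 / 2
        tree = DecisionTreeRegressor(max_depth=3, random_state=0).fit(X, residuals)
        F = F + lr * tree.predict(X)                       # F_m = F_{m-1} + lr * h_m
        trees.append(tree)
    return F, trees
```

The learning rate lr plays the role of the step multiplier in the update equation; smaller values need more rounds but usually generalize better.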

2.7. Extreme Gradient Boosting (XGBoost) Algorithm—XGBoost Model

The XGBoost ensemble deep learning algorithm is an improvement on GB: it approximates the loss function using a second-order Taylor expansion rather than the pseudo residuals used by GB. The two main components of the XGBoost algorithm are the objective (loss) function and an explicit regularization term within that objective. In a nutshell, the XGBoost objective at iteration t is:

$$L^{(t)} = \sum_{i=1}^{n} l\left(y_i,\; \hat{y}_i^{(t-1)} + f_t(x_i)\right) + \Omega(f_t)$$

where

$$\Omega(f_t) = \gamma T + \frac{1}{2}\lambda \lVert \omega \rVert^2$$

Here, $y_i$ and $\hat{y}_i^{(t-1)}$ are the observed target and the predicted value from the previous iteration, respectively; $x_i$ is the input data; the loss $l$ is approximated by a second-order Taylor series; and $\Omega$ is the regularization function. The regularization terms $\gamma$ and $\lambda$ penalize $T$ (the number of leaves) and $\omega$ (the weights of the leaves), respectively.
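A practical consequence of the second-order approximation is a closed-form optimal leaf weight, $w^{*} = -\sum g \,/\, (\sum h + \lambda)$, where g and h are the first and second derivatives of the loss for the samples in a leaf. A small sketch of that formula (it reproduces the textbook expression, not the xgboost library internals):

```python
import numpy as np

def optimal_leaf_weight(grad, hess, reg_lambda=1.0):
    """XGBoost's closed-form leaf weight from the second-order objective."""
    return -grad.sum() / (hess.sum() + reg_lambda)

# For squared loss, g_i = prediction - y_i and h_i = 1. With predictions at 0
# and targets [1, 2, 3] in one leaf, the regularized weight is 6 / (3 + 1) = 1.5.
w = optimal_leaf_weight(np.array([-1.0, -2.0, -3.0]), np.ones(3))
```

Note how the L2 term reg_lambda shrinks the weight below the unregularized mean of 2.0, which is exactly how the penalty on $\omega$ stabilizes the model.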

2.8. Performance Evaluation

Standard statistical evaluation criteria were adopted for the performance evaluation of the developed models: mean absolute error (MAE), average absolute relative error (AARE), coefficient of correlation (r), noise-to-signal ratio (NS), and Nash–Sutcliffe efficiency (η). These evaluation equations are described in Table 3. The developed models were evaluated in two stages. First, the best-performing models were identified using the η coefficient and MAE as defined in Table 3.
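These criteria follow standard definitions; a sketch of the core ones (the exact forms in Table 3 may differ slightly in notation):

```python
import numpy as np

def evaluation_metrics(obs, est):
    """MAE, AARE, correlation r, and Nash-Sutcliffe efficiency (NSE)."""
    obs = np.asarray(obs, dtype=float)
    est = np.asarray(est, dtype=float)
    err = est - obs
    return {
        "MAE": float(np.mean(np.abs(err))),
        "AARE": float(np.mean(np.abs(err / obs))),
        "r": float(np.corrcoef(obs, est)[0, 1]),
        "NSE": float(1.0 - np.sum(err ** 2) / np.sum((obs - obs.mean()) ** 2)),
    }
```

A perfect estimate gives MAE = AARE = 0 and r = NSE = 1; NSE below 0 would mean the model is worse than simply predicting the observed mean.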
Ref. [7] evaluated several conventional ETo estimation models based on the statistical parameter of weighted standard error of estimate (WSEE). The WSEE is comprehensive, as it explicitly weights the model error that occurs during the peak season. This is an important aspect, as many irrigation systems are designed to meet water requirements during peak seasons. Therefore, the developed models were also tested against the WSEE parameter. The steps for computing the WSEE are given in the following paragraphs. Using the following equation, the standard error of estimate (SEE) for the model-estimated ETo is computed for all months and for peak months; this shows the goodness of fit without any adjustment.
$$SEE = \left[\frac{\sum_{i=1}^{n}\left(ET_M - ET_{P\text{-}M}\right)^2}{n - 1}\right]^{0.5}$$

where ETM = ETo estimated by the model, ETP-M = ETo estimated by the P-M method, and n = total number of data points. A linear regression line was fitted between ETP-M and ETM as below:

$$ET_{P\text{-}M} = b \times ET_M$$

where b = regression coefficient, used to adjust the ETo estimates; the SEEs were then recalculated on the adjusted estimates to obtain the adjusted SEE (ASEE) values. The WSEE is calculated as below [7]:

$$WSEE = 0.7\left(0.67\,SEE_{all} + 0.33\,ASEE_{all}\right) + 0.3\left(0.67\,SEE_{peak} + 0.33\,ASEE_{peak}\right)$$

where the subscripts all and peak denote all months and the peak months, respectively.
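The steps above can be combined into one function. The slope b is fitted here by least squares through the origin, which is an assumption consistent with the regression form above but not spelled out in the text:

```python
import numpy as np

def wsee(et_model, et_pm, peak_mask):
    """Weighted standard error of estimate.

    et_model, et_pm : model and P-M ETo values (mm/day)
    peak_mask       : boolean array flagging peak-season records
    """
    et_model = np.asarray(et_model, dtype=float)
    et_pm = np.asarray(et_pm, dtype=float)
    peak_mask = np.asarray(peak_mask, dtype=bool)

    def see(m, p):  # standard error of estimate
        return np.sqrt(np.sum((m - p) ** 2) / (len(m) - 1))

    # Slope of ET_PM = b * ET_M, least squares through the origin (assumed).
    b = np.sum(et_pm * et_model) / np.sum(et_model ** 2)
    adj = b * et_model  # adjusted estimates used for the ASEE values

    see_all, asee_all = see(et_model, et_pm), see(adj, et_pm)
    see_pk = see(et_model[peak_mask], et_pm[peak_mask])
    asee_pk = see(adj[peak_mask], et_pm[peak_mask])
    return (0.7 * (0.67 * see_all + 0.33 * asee_all)
            + 0.3 * (0.67 * see_pk + 0.33 * asee_pk))
```

A model that reproduces the P-M values exactly gives WSEE = 0, and the 0.3 weight on the peak terms is what penalizes peak-season error more heavily than a plain SEE would.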

3. Results

The input data (daily climate parameters of 44 years) for the Bengaluru location were analyzed using frequency distributions to examine their distribution and variability [48]. The analysis indicated that every feature in the input data set showed traits of a conventional Gaussian distribution. This implied that the bulk of the data was centered around the mean, the variables were random, and the corresponding probability distribution function had a conventional bell shape. According to the analysis, the highest frequency ranges of the different input data are 26–30 °C, 18–20 °C, 85–95%, 40–60%, 6–10 kmph, 6–8 h, 0–20 mm, and 4.5–7.0 mm/d for maximum and minimum temperature, maximum and minimum relative humidity, wind speed, sunshine hours, rainfall, and ETo, respectively. A Pearson correlation matrix between the input climatic data and ETo was computed to identify the most influential variables in the estimation [48].
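The correlation screening can be sketched as follows on synthetic stand-in data (the column names and the generating relationship are hypothetical, not the paper's data):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 200
# Hypothetical columns standing in for daily climate inputs and P-M ETo.
df = pd.DataFrame({
    "tmax": rng.normal(30.0, 3.0, n),      # maximum temperature (degC)
    "sunshine": rng.normal(7.0, 1.0, n),   # sunshine hours
})
df["eto"] = 0.15 * df["tmax"] + 0.3 * df["sunshine"] + rng.normal(0.0, 0.2, n)

# Pearson correlation of each input with ETo, strongest first.
corr_with_eto = df.corr(method="pearson")["eto"].drop("eto").sort_values(ascending=False)
```

Ranking the inputs this way makes it easy to see which climate variables carry the most signal for ETo before any model is trained.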

3.1. Performance Evaluation of RF Model

The scatter diagram of the RF-model-estimated ETo versus P-M ETo is presented in Figure 3. For all the locations, linear trends were observed, although the residuals varied to some extent. The model underestimates the higher values and overestimates the lower values; thus, the range of the model output shrinks compared to the observed P-M ETo values. Table 4 presents the performance statistics. The model performed fairly across the locations, with WSEE values less than 1 mm/day, except in Ballari and Koppal, where the Nash–Sutcliffe model efficiency and correlation coefficient deteriorated significantly. The model performed satisfactorily on the other performance criteria as well. However, performance deteriorated significantly in predicting higher values, except at the Bengaluru, Chikmaglur, and Mandya locations.

3.2. Performance Evaluation of the GB Model

Figure 4 presents the scatter plots between P-M ETo and the estimates of the GB model for the respective locations. A significant improvement in model performance over the RF model was observed for all the parameters. In this case, the WSEE error was 0.87 mm/day across the locations, compared to 1.05 mm/day for the RF model. The RF model's tendency to underestimate higher values and overestimate lower values was addressed significantly in the GB model. This model also performed substantially better than the RF model on the other selected performance criteria (Table 5). The Nash–Sutcliffe model efficiency and correlation coefficient were more than 0.95 for all the locations. The lower noise-to-signal ratio indicated that the model output is less scattered and more robust than that of the RF model. Ref. [49] also reported that the GB model was highly recommended for estimating ETo compared to generalized linear models (GLM) and RF in the Sistan Plains, Iran. Contrary to the current results, among the various deep learning models considered for accurately estimating ETo from climatic data, the RF model has elsewhere outperformed the GB model with a greater reduction in RMSE and MAE values [50,51,52].

3.3. Performance Evaluation of the XGBoost Model

The XGBoost model further improves ETo estimation performance to some extent over the GB model. The improvement over the RF model, however, is significant, as was the case for the GB model. Lower NS values could be obtained in this case, signifying greater robustness of the model compared to the GB and RF models. This can be observed from the scatter plots presented in Figure 5 and is substantiated by a higher correlation coefficient and Nash–Sutcliffe model efficiency, both of which exceed 0.95 (Table 6). The WSEE in estimating P-M ETo is slightly better than that of the GB model for all the locations, being less than 0.85 mm/day. It can thus be concluded that the XGBoost model scored highest in estimating ETo compared to the GB and RF models. These results are consistent with previous studies that estimated ETo using various deep learning models, such as XGBoost, GB, RF, Bagged Trees (BT), and custom deep learning [52]. This study also resembles an assessment of ETo in the same agroclimatic region (Karnataka, India), where the XGBoost model was found to be the most stable among the models considered (Linear Regressor Bagging, RF Regressor Bagging, and Light Gradient Boosting) [53].

4. Discussion

Over the past 20 years, numerous studies have reported the effective use of sophisticated computational methods to estimate water requirements. By contrast, little has been reported on how well these methods generalize. All three approaches chosen in this study show generalization capacity to varying degrees. Although it performs decently at the majority of locations, the more widely used RF model lags noticeably behind GB and XGBoost as a computational tool for calculating P-M ETo [54]. GB and XGBoost performed on a par with each other, both relying on fast-converging gradient boosting. For Bengaluru, the WSEE was as low as 0.19 mm/day.
This is to be expected, as model training and validation were performed with data from this location. More notable is the generalization capacity of the GB and XGBoost models, which fared reasonably well at sites excluded from model training and validation. The use of temporal and spatial inputs that map seasonal variation (quarter of the year, Julian day, longitude, and latitude) may be one factor contributing to this performance. This is particularly significant because most modeling work of this kind considers only the six basic daily climatic parameters (maximum and minimum temperature, sunshine hours or solar radiation, maximum and minimum relative humidity, and wind speed), whereas the P-M method also takes temporal and spatial information into account indirectly. As a result, the current modeling framework more accurately reflects the underlying evapotranspiration mechanism as specified by the P-M method.
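Assembling one daily input record with those temporal and spatial descriptors is straightforward. The sketch below shows one way to do it; the field names are illustrative and not the authors' exact schema.

```python
from datetime import date

def make_features(d, lat, lon, tmax, tmin, rhmax, rhmin, wind, sunshine):
    """Build one daily input record: six climatic inputs plus temporal
    (Julian day, quarter of the year) and spatial (latitude, longitude)
    descriptors, in the spirit of the inputs described in the paper."""
    julian_day = d.timetuple().tm_yday   # day of year, 1..366
    quarter = (d.month - 1) // 3 + 1     # quarter of the year, 1..4
    return {
        "julian_day": julian_day, "quarter": quarter,
        "lat": lat, "lon": lon,
        "tmax": tmax, "tmin": tmin,
        "rhmax": rhmax, "rhmin": rhmin,
        "wind": wind, "sunshine": sunshine,
    }

# Example: a monsoon-season day at the Bengaluru coordinates from Table 1
# (the climatic values here are made up for illustration).
row = make_features(date(2018, 7, 15), 12.97, 77.59,
                    29.5, 20.1, 88.0, 62.0, 9.3, 4.2)
```

Encoding the calendar position explicitly lets a tree-based model condition its splits on season and site, which is the mechanism suggested above for the models' cross-site performance.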
Owing to its high degree of nonlinearity, the ETo on any given day is not directly correlated with the previous day's weather. Even with limited input data, the ensemble models were able to map the underlying nonlinear evapotranspiration process. The findings therefore imply that these models can produce reasonably accurate estimates of P-M ETo, and their capacity for generalization means the results can be applied effectively at other locations.
The performance of ensemble deep learning models is known to vary with the input data: models can be trained more effectively and accurately with more inputs and longer records than with fewer inputs and shorter time series. In many locations, however, the availability of all the climatic variables required to model ETo is a major constraint, or most climatic data are available only on a monthly basis. In those cases, applying ensemble models developed with complete daily climatic data is not feasible, and because they are data-driven, their performance can be expected to deteriorate under limited climatic data. Ensemble deep learning models therefore need to be developed and tested with different input combinations, with acceptable ranges of the performance criteria defined for different climates, to enhance their applicability under data-limited conditions. It is further suggested to apply the same methodology to hourly data for more accurate crop water requirement determination and more precise water distribution and management. With advances in technology and artificial intelligence applications in precision agriculture, the best-performing ensemble model (XGBoost) could be incorporated into an automatic weather station to predict ETo and avoid delays in water management decisions.
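The suggested screening of models over different input combinations can be organized as a simple enumeration: each climatic-input subset would be used to retrain a model, which is then scored against the acceptance criteria. A minimal sketch of the enumeration step (variable names are illustrative):

```python
from itertools import combinations

# The six basic daily climatic inputs used in this study.
CLIMATIC = ["tmax", "tmin", "rhmax", "rhmin", "wind", "sunshine"]

def input_combinations(min_size=2):
    """Enumerate climatic-input subsets for data-limited screening,
    from the smallest admissible subset (e.g. temperature-only) up to
    the full six-variable set."""
    subsets = []
    for k in range(min_size, len(CLIMATIC) + 1):
        subsets.extend(combinations(CLIMATIC, k))
    return subsets

combos = input_combinations()
```

With a minimum subset size of two there are 57 candidate input sets; in practice one would retrain the ensemble model on each and retain those whose WSEE, r, and Nash–Sutcliffe efficiency stay within the defined acceptable ranges.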

5. Conclusions

Accurate quantification of ETo is necessary for activities such as computation of the hydrological water balance, crop water requirement estimation, crop yield simulation, irrigation scheduling, irrigation system design, reservoir operation, and water allocation. In this study, an attempt was made to develop generalized deep learning models to estimate ETo indirectly from climatic data. Three ensemble deep learning models (XGBoost, GB, and RF) were developed (trained) using data from one location (Bengaluru). After training and testing at Bengaluru, the developed models were applied at eleven other locations (Ballari, Chikmaglur, Chitradurga, Devnagiri, Dharwad, Gadag, Haveri, Koppal, Mandya, Shivmoga, and Tumkuru) to assess their generalization capability. The performance of the models was evaluated using statistical indices: MAE, AARE, r, NS, ɳ, and WSEE.
The findings showed that the decision-tree modeling approach based on ensemble learning algorithms effectively captures the nonlinear link between meteorological variables and the corresponding P-M ETo, and performance is further enhanced by the ensemble strategy itself, whether boosting (GB and XGBoost) or bagging (RF). For each model and location, the WSEE in P-M ETo estimation did not exceed 1.05 mm/day. The GB model was superior to the RF model in estimating ETo; similar results were reported by [49] in the Sistan Plain, Iran, while contradictory results were reported by [50,51,52] for other climatic locations across the globe. Although all the developed models performed well with respect to the standard P-M ETo method, the XGBoost model estimated ETo most accurately, ahead of the GB and RF models. The higher performance of XGBoost across climatic locations was also observed by [52], and XGBoost likewise performed best among several models in estimating ETo at different locations in the same study area (Karnataka, India) [53]. In this work, the ensemble deep learning technique was used to develop a generalized mathematical model for short-term ETo estimation. The applicability of these generalized models can be further enhanced by developing them with different input combinations (from maximum to minimum) for data-limited conditions, and with hourly climatic data for precise crop water requirement determination to save water under critical conditions. This is especially crucial for real-time water management when estimating P-M ETo.
The “on-demand” irrigation water supply made possible by ETo estimation has the potential to improve overall irrigation efficiency in the command areas of large irrigation projects. The results of this study can support precise water management decisions for stakeholders by incorporating the best-performing XGBoost model into a real-time irrigation sensor network or an automatic weather station.

Author Contributions

Conceptualization, M.K. and Y.A.; methodology, Y.A.; software, M.K. and Y.A.; validation, M.K., Y.A., and P.; formal analysis, M.K.; investigation, Y.A.; resources, V.K.S.; data supply, A.V.M.S.; data curation, S.A.; writing—original draft preparation, M.K.; writing—review and editing, S.A. and A.S.; visualization, M.K.; supervision, V.K.S.; project administration, M.K.; funding acquisition, M.K. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

The data supporting the findings of this study are available from the corresponding author upon request.

Acknowledgments

The authors thank the Director of the ICAR-Central Research Institute for Dryland Agriculture (CRIDA), Hyderabad, Telangana, India, for providing the facilities required to complete the current work.

Conflicts of Interest

Author Yash Agrawal was employed by the company Gramworkx Agrotech Pvt Ltd. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

References

  1. Smith, M.; Allen, R.G.; Pereira, L.; Camp, C.R.; Sadler, E.J.; Yoder, R.E. Revised FAO methodology for crop water requirements. In Proceedings of the International Conference on Evapotranspiration and Irrigation Scheduling, San Antonio, TX, USA, 3–6 November 1996; ASCE: Reston, VA, USA, 1996; pp. 116–123. [Google Scholar]
  2. Ghiat, I.; Mackey, H.R.; Al-Ansari, T. A review of evapotranspiration measurement models, techniques and methods for open and closed agricultural field applications. Water 2021, 13, 2523. [Google Scholar] [CrossRef]
  3. Penman, H.L. Vegetation and hydrology. Soil Sci. 1963, 96, 357. [Google Scholar] [CrossRef]
  4. Monteith, J.L. Evaporation and environment. In Symposia of the Society for Experimental Biology; Cambridge University Press (CUP): Cambridge, UK, 1965; Volume 19, pp. 205–234. [Google Scholar]
  5. Allen, R.G.; Pereira, L.S.; Raes, D.; Smith, M. Crop Evapo-Transpiration: Guidelines for Computing Crop Water Requirements; Irrigation and Drainage Paper No. 56; FAO: Rome, Italy, 1998. [Google Scholar]
  6. Pereira, L.S.; Perrier, A.; Allen, R.G.; Alves, I. Evapotranspiration: Concepts and future trends. J. Irrig. Drain. Eng. 1999, 125, 45–51. [Google Scholar] [CrossRef]
  7. Jensen, M.E.; Burman, R.D.; Allen, R.G. Evapotranspiration and irrigation water requirements: A manual. In ASCE Manuals and Reports on Engineering Practice (USA); No. 70; ASCE: New York, NY, USA, 1990. [Google Scholar]
  8. Srivastava, A.; Sahoo, B.; Raghuwanshi, N.S.; Singh, R. Evaluation of variable-infiltration capacity model and MODIS-terra satellite-derived grid-scale evapotranspiration estimates in a River Basin with Tropical Monsoon-Type climatology. J. Irrig. Drain. Eng. 2017, 143, 04017028. [Google Scholar] [CrossRef]
  9. Heramb, P.; Ramana Rao, K.V.; Subeesh, A.; Srivastava, A. Predictive modelling of reference evapotranspiration using machine learning models coupled with grey wolf optimizer. Water 2023, 15, 856. [Google Scholar] [CrossRef]
  10. ASCE Task Committee on Application of Artificial Neural Networks in Hydrology. Artificial neural networks in hydrology-I: Preliminary concepts. J. Hydrol. Eng. 2000, 5, 115–123. [Google Scholar] [CrossRef]
  11. ASCE Task Committee on Application of Artificial Neural Networks in Hydrology. Artificial neural networks in hydrology-II: Hydrologic applications. J. Hydrol. Eng. 2000, 5, 124–137. [Google Scholar] [CrossRef]
  12. Kumar, M.; Raghuwanshi, N.S.; Singh, R. Artificial neural networks approach in evapotranspiration modeling: A review. Irrig. Sci. 2011, 29, 11–25. [Google Scholar] [CrossRef]
  13. Kumar, M.; Raghuwanshi, N.S.; Singh, R.; Wallender, W.W.; Pruitt, W.O. Estimating evapotranspiration using artificial neural network. J. Irrig. Drain. Eng. 2002, 128, 224–233. [Google Scholar] [CrossRef]
  14. Kumar, M.; Bandyopadhyay, A.; Raghuwanshi, N.S.; Singh, R. Comparative study of conventional and artificial neural network based ETo estimation models. Irrig. Sci. 2008, 26, 531–545. [Google Scholar] [CrossRef]
  15. Kumar, M.; Raghuwanshi, N.S.; Singh, R. Development and validation of GANN model for evapotranspiration estimation. J. Hydrol. Eng. 2009, 14, 131–140. [Google Scholar] [CrossRef]
  16. Eslamian, S.S.; Gohari, S.A.; Zareian, M.J.; Firoozfar, A. Estimating Penman–Monteith reference evapotranspiration using artificial neural networks and genetic algorithm: A case study. Arab. J. Sci. Eng. 2012, 37, 935–944. [Google Scholar] [CrossRef]
  17. Adamala, S.; Raghuwanshi, N.S.; Mishra, A.; Tiwari, M.K. Evapotranspiration modeling using second-order neural networks. J. Hydrol. Eng. 2014, 19, 1131–1140. [Google Scholar] [CrossRef]
  18. Adamala, S.; Raghuwanshi, N.S.; Mishra, A.; Tiwari, M.K. Development of generalized higher-order synaptic neural based ETo models for different agro-ecological regions in India. J. Irrig. Drain. Eng. 2014, 140, 04014038. [Google Scholar] [CrossRef]
  19. Adamala, S.; Raghuwanshi, N.S.; Mishra, A. Generalized quadratic synaptic neural networks for ETo modeling. Environ. Process. 2015, 2, 309–329. [Google Scholar] [CrossRef]
  20. Dai, X.; Shi, H.; Li, Y.; Ouyang, Z.; Huo, Z. Artificial neural network models for estimating regional reference evapotranspiration based on climate factors. Hydrol. Process. 2009, 23, 442–450. [Google Scholar] [CrossRef]
  21. Jahanbani, H.; El-Shafie, A.H. Application of artificial neural network in estimating monthly time series reference evapotranspiration with minimum and maximum temperatures. Paddy Water Environ. 2011, 9, 207–220. [Google Scholar] [CrossRef]
  22. Jain, S.K.; Nayak, P.C.; Sudheer, K.P. Models for estimating evapotranspiration using artificial neural networks, and their physical interpretation. Hydrol. Process. 2008, 22, 2225–2234. [Google Scholar] [CrossRef]
  23. Kisi, O. The potential of different ANN techniques in evapotranspiration modelling. Hydrol. Process. 2008, 22, 2449–2460. [Google Scholar]
  24. Kisi, O. Modeling reference evapotranspiration using evolutionary neural networks. J. Irrig. Drain. Eng. 2011, 137, 636–643. [Google Scholar]
  25. Kisi, O. Evapotranspiration modeling using a wavelet regression model. Irrig. Sci. 2011, 29, 241–252. [Google Scholar]
  26. Marti, P.; Royuela, A.; Manzano, J.; Palau-Salvador, G. Generalization of ETo ANN models through data supplanting. J. Irrig. Drain. Eng. 2010, 136, 161–174. [Google Scholar] [CrossRef]
  27. Rahimikhoob, A. Estimation of evapotranspiration based on only air temperature data using artificial neural networks for a subtropical climate in Iran. Theor. Appl. Climatol. 2010, 101, 83–91. [Google Scholar] [CrossRef]
  28. Zanetti, S.S.; Sousa, E.F.; Oliveira, V.P.S.; Almeida, F.T.; Bernardo, S. Estimating evapotranspiration using artificial neural network and minimum climatological data. J. Irrig. Drain. Eng. 2007, 133, 83–89. [Google Scholar] [CrossRef]
  29. Yao, Y.; Liang, S.; Li, X.; Chen, J.; Liu, S.; Jia, K.; Zhang, X.; Xiao, Z.; Fisher, J.B.; Mu, Q.; et al. Improving global terrestrial evapotranspiration estimation using support vector machine by integrating three process-based algorithms. Agric. For. Meteorol. 2017, 242, 55–74. [Google Scholar] [CrossRef]
  30. Hassan, M.A.; Khalil, A.; Kaseb, S.; Kassem, M.A. Exploring the potential of tree-based ensemble methods in solar radiation modelling. Appl. Energy 2017, 203, 897–916. [Google Scholar] [CrossRef]
  31. Feng, Y.; Cui, N.; Zhao, L.; Hu, X.; Gong, D. Comparison of ELM, GANN, WNN and empirical models for estimating reference evapotranspiration in humid region of Southwest China. J. Hydrol. 2016, 536, 376–383. [Google Scholar] [CrossRef]
  32. Fan, J.; Wang, X.; Wu, L.; Zhou, H.; Zhang, F.; Yu, X.; Lu, X.; Xiang, Y. Comparison of Support Vector Machine and Extreme Gradient Boosting for predicting daily global solar radiation using temperature and precipitation in humid subtropical climates: A case study in China. Energy Convers. Manag. 2018, 164, 102–111. [Google Scholar] [CrossRef]
  33. Chia, M.Y.; Huang, Y.F.; Koo, C.H.; Fung, K.F. Recent advances in evapotranspiration estimation using artificial intelligence approaches with a focus on hybridization techniques—A review. Agronomy 2020, 10, 101. [Google Scholar] [CrossRef]
  34. Chia, M.Y.; Huang, Y.F.; Koo, C.H. Swarm-based optimization as stochastic training strategy for estimation of reference evapotranspiration using extreme learning machine. Agric. Water Manag. 2021, 243, 106447. [Google Scholar] [CrossRef]
  35. Tiwari, M.K.; Chatterjee, C. Development of an accurate and reliable hourly flood forecasting model using wavelet–bootstrap–ANN (WBANN) hybrid approach. J. Hydrol. 2010, 394, 458–470. [Google Scholar] [CrossRef]
  36. Huang, G.; Wu, L.; Ma, X.; Zhang, W.; Fan, J.; Yu, X.; Zeng, W.; Zhou, H. Evaluation of CatBoost method for prediction of reference evapotranspiration in humid regions. J. Hydrol. 2019, 574, 1029–1041. [Google Scholar] [CrossRef]
  37. Ponraj, A.S.; Vigneswaran, T. Daily evapotranspiration prediction using gradient boost regression model for irrigation planning. J. Supercomput. 2020, 76, 5732–5744. [Google Scholar] [CrossRef]
  38. Wu, M.; Feng, Q.; Wen, X.; Deo, R.C.; Yin, Z.; Yang, L.; Sheng, D. Random forest predictive model development with uncertainty analysis capability for the estimation of evapotranspiration in an Arid Oasis region. Hydrol. Res. 2020, 51, 648–665. [Google Scholar] [CrossRef]
  39. Wu, T.; Zhang, W.; Jiao, X.; Guo, W.; Hamoud, Y.A. Comparison of five boosting-based models for estimating daily reference evapotranspiration with limited meteorological variables. PLoS ONE 2020, 15, e0235324. [Google Scholar] [CrossRef]
  40. Mokari, E.; DuBois, D.; Samani, Z.; Mohebzadeh, H.; Djaman, K. Estimation of daily reference evapotranspiration with limited climatic data using machine learning approaches across different climate zones in New Mexico. Theor. Appl. Climatol. 2022, 147, 575–587. [Google Scholar] [CrossRef]
  41. Pagano, A.; Amato, F.; Ippolito, M.; De Caro, D.; Croce, D.; Motisi, A.; Provenzano, G.; Tinnirello, I. Machine learning models to predict daily actual evapotranspiration of citrus orchards under regulated deficit irrigation. Ecol. Inform. 2023, 76, 102133. [Google Scholar] [CrossRef]
  42. Kiraga, S.; Peters, R.T.; Molaei, B.; Evett, S.R.; Marek, G. Reference evapotranspiration estimation using genetic algorithm-optimized machine learning models and standardized Penman–Monteith equation in a highly advective environment. Water 2024, 16, 12. [Google Scholar] [CrossRef]
  43. Liaw, A.; Wiener, M. Classification and regression by random forest. R News 2002, 2, 18–22. [Google Scholar]
  44. Chen, T.; He, T.; Benesty, M.; Khotilovich, V.; Tang, Y.; Cho, H.; Chen, K.; Mitchell, R.; Cano, I.; Zhou, T. Xgboost: Extreme gradient boosting. R Package Version 0.4-2 2015, 1, 1–4. Available online: https://cran.r-project.org/web/packages/xgboost/index.html (accessed on 1 August 2024).
  45. Greenwell, B.; Boehmke, B.; Cunningham, J.; Developers, G.B.M. Gbm: Generalized boosted regression models. R Package Version 2.1-4 2018, 2, 37–40. Available online: https://cran.r-project.org/web/packages/gbm/index.html (accessed on 1 August 2024).
  46. Schonlau, M.; Zou, R.Y. The random forest algorithm for statistical learning. Stata J. Promot. Commun. Stat. Stata 2020, 20, 3–29. [Google Scholar] [CrossRef]
  47. Friedman, J.H. Greedy function approximation: A gradient boosting machine. Ann. Stat. 2001, 29, 1189–1232. [Google Scholar] [CrossRef]
  48. Agrawal, Y.; Kumar, M.; Ananthakrishnan, S.; Kumarapuram, G. Evapotranspiration modeling using different tree based ensembled machine learning algorithm. Water Resour. Manag. 2022, 36, 1025–1042. [Google Scholar] [CrossRef]
  49. Siasar, H.; Honar, T.; Abdolahipour, M. Comparing of generalized linear models, random forest and gradient boosting trees in estimation of reference crop evapotranspiration (Case study: The Sistan plain). J. Water Soil Sci. 2020, 23, 395–410. [Google Scholar]
  50. Tausif, M.; Dilshad, S.; Umer, Q.; Iqbal, M.W.; Latif, Z.; Lee, C.; Bashir, R.N. Ensemble learning-based estimation of reference evapotranspiration (ETo). Internet Things 2023, 24, 100973. [Google Scholar] [CrossRef]
  51. Kumar, U. Modelling monthly reference evapotranspiration estimation using machine learning approach in data-scarce North Western Himalaya region (Almora), Uttarakhand. J. Earth Syst. Sci. 2023, 132, 129. [Google Scholar] [CrossRef]
  52. Akar, F.; Katipoğlu, O.M.; Yeşilyurt, S.N.; Taş, M.B.H. Evaluation of tree-based machine learning and deep learning techniques in temperature-based potential evapotranspiration prediction. Polish J. Environ. Stud. 2023, 32, 1009–1023. [Google Scholar] [CrossRef] [PubMed]
  53. Jayashree, T.R.; Reddy, N.V.S.; Acharya, U.D. Modeling daily reference evapotranspiration from climate variables: Assessment of bagging and boosting regression approaches. Water Resour. Manag. 2023, 37, 1013–1032. [Google Scholar]
  54. Feng, Y.; Cui, N.; Gong, D.; Zhang, Q.; Zhao, L. Evaluation of random forests and generalized regression neural networks for daily reference evapotranspiration modelling. Agric. Water Manag. 2017, 193, 163–173. [Google Scholar] [CrossRef]
Figure 1. Map showing the locations selected for the study.
Figure 2. Flowchart to estimate ETo accurately using different models.
Figure 3. Scatter plot showing evapotranspiration computed by P-M method and RF deep learning models for (a) Ballari, (b) Bengaluru, (c) Chikmaglur, (d) Chitradurga, (e) Devnagiri, (f) Dharwad, (g) Gadag, (h) Haveri, (i) Koppal, (j) Mandya, (k) Shivmoga, and (l) Tumkuru.
Figure 4. Scatter plot showing evapotranspiration computed by P-M method and GB deep learning models for (a) Ballari, (b) Bengaluru, (c) Chikmaglur, (d) Chitradurga, (e) Devnagiri, (f) Dharwad, (g) Gadag, (h) Haveri, (i) Koppal, (j) Mandya, (k) Shivmoga, and (l) Tumkuru.
Figure 5. Scatter plots showing evapotranspiration computed by P-M method and XGBoost deep learning models for (a) Ballari, (b) Bengaluru, (c) Chikmaglur, (d) Chitradurga, (e) Devnagiri, (f) Dharwad, (g) Gadag, (h) Haveri, (i) Koppal, (j) Mandya, (k) Shivmoga, and (l) Tumkuru.
Table 1. Location characteristics and data.

| Sl. No. | Location | Latitude (°N) | Longitude (°E) | Elevation (m) | Model Development (Training) | Model Testing |
|---|---|---|---|---|---|---|
| 1 | Bengaluru | 12.97 | 77.59 | 920 | 1 January 1976–31 December 2017 | 1 January 2018–30 June 2020 |
| 2 | Ballari | 15.14 | 76.92 | 485 | - | - |
| 3 | Chikmaglur | 13.31 | 75.77 | 1090 | - | - |
| 4 | Chitradurga | 14.22 | 76.40 | 732 | - | - |
| 5 | Devnagiri | 14.33 | 75.99 | 603 | - | - |
| 6 | Dharwad | 15.46 | 75.01 | 750 | - | - |
| 7 | Gadag | 15.43 | 75.63 | 654 | - | - |
| 8 | Haveri | 14.79 | 75.40 | 571 | - | - |
| 9 | Koppal | 15.35 | 76.16 | 529 | - | - |
| 10 | Mandya | 12.52 | 76.89 | 678 | - | - |
| 11 | Shivmoga | 13.93 | 75.57 | 569 | - | - |
| 12 | Tumkuru | 13.34 | 77.12 | 822 | - | - |
Table 2. Summary of climatic characteristics (mean ± standard deviation) of study sites.

| Sl. No. | Location | RF (mm) | Tmax (°C) | Tmin (°C) | RHmax (%) | RHmin (%) | Ws (km/h) | SSh (h/day) | P-M ETo (mm/day) |
|---|---|---|---|---|---|---|---|---|---|
| 1 | Bengaluru | 650.6 ± 172.5 | 30.99 ± 2.8 | 20.19 ± 1.8 | 77.9 ± 11.7 | 47.4 ± 19.7 | 6.76 ± 4.2 | 7.22 ± 3.2 | 6.31 ± 1.9 |
| 2 | Ballari | 529.1 ± 124.6 | 32.93 ± 4.6 | 20.27 ± 3.0 | 75.1 ± 13.1 | 48.3 ± 20.4 | 7.10 ± 6.8 | 7.93 ± 3.1 | 6.92 ± 3.0 |
| 3 | Chikmaglur | 1074.9 ± 183.7 | 28.42 ± 3.3 | 18.14 ± 1.9 | 82.8 ± 6.8 | 66.0 ± 16.6 | 7.79 ± 1.7 | 6.62 ± 3.2 | 5.07 ± 1.8 |
| 4 | Chitradurga | 796.7 ± 184.4 | 32.03 ± 3.5 | 20.66 ± 2.0 | 80.8 ± 10.9 | 49.2 ± 22.9 | 13.2 ± 7.3 | 7.00 ± 3.9 | 7.97 ± 3.5 |
| 5 | Devnagiri | 764.6 ± 156.7 | 32.23 ± 4.0 | 19.25 ± 2.5 | 77.9 ± 9.9 | 47.5 ± 21.2 | 4.58 ± 4.5 | 8.16 ± 1.9 | 6.21 ± 2.9 |
| 6 | Dharwad | 1067.6 ± 217.1 | 31.64 ± 3.8 | 19.79 ± 2.6 | 77.6 ± 14.7 | 52.7 ± 21.9 | 5.85 ± 3.8 | 8.27 ± 2.0 | 6.04 ± 2.5 |
| 7 | Gadag | 827.8 ± 158.6 | 32.54 ± 3.8 | 20.74 ± 2.2 | 76.4 ± 12.7 | 44.2 ± 20.2 | 9.21 ± 6.2 | 7.20 ± 3.2 | 7.53 ± 2.7 |
| 8 | Haveri | 1038.7 ± 289.5 | 31.30 ± 4.0 | 20.50 ± 2.6 | 79.4 ± 11.9 | 52.9 ± 20.7 | 6.80 ± 3.8 | 8.14 ± 0.6 | 6.30 ± 2.6 |
| 9 | Koppal | 849.1 ± 131.5 | 33.48 ± 3.9 | 21.73 ± 2.4 | 78.4 ± 13.4 | 53.2 ± 20.9 | 4.76 ± 4.3 | 6.79 ± 1.9 | 6.27 ± 3.2 |
| 10 | Mandya | 856.4 ± 74.9 | 31.72 ± 2.6 | 20.45 ± 2.1 | 80.1 ± 8.4 | 50.1 ± 18.0 | 5.05 ± 2.7 | 8.52 ± 1.7 | 6.03 ± 1.9 |
| 11 | Shivmoga | 1021.3 ± 106.4 | 31.89 ± 3.5 | 19.79 ± 2.3 | 82.5 ± 10.2 | 57.9 ± 21.2 | 6.39 ± 2.5 | 7.20 ± 1.5 | 6.04 ± 2.4 |
| 12 | Tumkuru | 828.3 ± 123.8 | 31.45 ± 2.8 | 20.05 ± 1.8 | 75.4 ± 12.6 | 47.5 ± 19.1 | 6.87 ± 4.5 | 8.05 ± 1.0 | 6.60 ± 2.2 |

Notes: RF = Mean Annual Rainfall (mm), Tmax = Mean Maximum Temperature, Tmin = Mean Minimum Temperature, RHmax = Mean Maximum Relative Humidity, RHmin = Mean Minimum Relative Humidity, Ws = Mean Wind Speed, SSh = Mean Sunshine Hours, P-M ETo = Penman–Monteith Reference Evapotranspiration.
Table 3. Performance criteria.

| Statistical Criterion | Equation |
|---|---|
| Average Absolute Relative Error | $AARE = \frac{1}{n}\sum_{i=1}^{n}\lvert RE_i\rvert$, where $RE_i = (y_i^c - y_i^e)/y_i^c$ |
| Noise-to-Signal Ratio | $NS = SEE/\sigma_y$ |
| Mean Absolute Error | $MAE = \frac{1}{n}\sum_{i=1}^{n}\lvert y_i^c - y_i^e\rvert$ |
| Coefficient of Correlation | $r = \dfrac{\frac{1}{n}\sum_{i=1}^{n}(y_i^c - \bar{y}^c)(y_i^e - \bar{y}^e)}{\sigma_c\,\sigma_e}$ |
| Nash and Sutcliffe Efficiency | $\eta = 1 - \dfrac{\sum_{i=1}^{n}(y_i^c - y_i^e)^2}{\sum_{i=1}^{n}(y_i^c - \bar{y}^c)^2}$ |

Note(s): where $y_i^c$ and $y_i^e$ are the computed (P-M) and estimated values, $\bar{y}^c$ and $\bar{y}^e$ their means, $\sigma_c$ and $\sigma_e$ the corresponding standard deviations, and $SEE$ the standard error of estimate.
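The criteria in Table 3 translate directly into code. Below is an illustrative pure-Python implementation (not the authors' code); note that the noise-to-signal ratio takes SEE as the root-mean-square residual, which is an assumption since Table 3 does not expand the SEE term.

```python
import math

def metrics(y_c, y_e):
    """Compute Table 3 criteria for computed (P-M) values y_c and
    model-estimated values y_e (equal-length sequences)."""
    n = len(y_c)
    mean_c = sum(y_c) / n
    mean_e = sum(y_e) / n
    # Mean Absolute Error
    mae = sum(abs(c - e) for c, e in zip(y_c, y_e)) / n
    # Average Absolute Relative Error
    aare = sum(abs((c - e) / c) for c, e in zip(y_c, y_e)) / n
    # Coefficient of correlation
    cov = sum((c - mean_c) * (e - mean_e) for c, e in zip(y_c, y_e)) / n
    sd_c = math.sqrt(sum((c - mean_c) ** 2 for c in y_c) / n)
    sd_e = math.sqrt(sum((e - mean_e) ** 2 for e in y_e) / n)
    r = cov / (sd_c * sd_e)
    # Nash–Sutcliffe efficiency
    nse = 1 - (sum((c - e) ** 2 for c, e in zip(y_c, y_e))
               / sum((c - mean_c) ** 2 for c in y_c))
    # Noise-to-signal ratio (SEE taken here as the RMS residual)
    see = math.sqrt(sum((c - e) ** 2 for c, e in zip(y_c, y_e)) / n)
    ns = see / sd_c
    return {"MAE": mae, "AARE": aare, "r": r, "NSE": nse, "NS": ns}
```

For a perfect model the function returns MAE = 0, NS = 0, r = 1, and NSE = 1, matching the ideal values implied by Tables 4–6.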
Table 4. Model performance on various criteria of the RF model.

| Location | WSEE | r | AARE | NS | MAE | ɳ |
|---|---|---|---|---|---|---|
| Ballari | 1.05 | 0.92 | 7.36 | 0.26 | 0.56 | 0.92 |
| Bengaluru | 0.28 | 0.98 | 3.24 | 0.13 | 0.19 | 0.98 |
| Chikmaglur | 0.33 | 0.98 | 3.60 | 0.12 | 0.16 | 0.99 |
| Chitradurga | 0.99 | 0.96 | 5.14 | 0.21 | 0.45 | 0.95 |
| Devnagiri | 0.74 | 0.95 | 6.45 | 0.20 | 0.41 | 0.95 |
| Dharwad | 0.72 | 0.94 | 6.04 | 0.21 | 0.36 | 0.95 |
| Gadag | 0.88 | 0.93 | 5.75 | 0.24 | 0.44 | 0.94 |
| Haveri | 0.56 | 0.96 | 6.34 | 0.18 | 0.34 | 0.97 |
| Koppal | 1.00 | 0.80 | 13.05 | 0.32 | 0.68 | 0.89 |
| Mandya | 0.31 | 0.98 | 3.73 | 0.15 | 0.21 | 0.98 |
| Shivmoga | 0.55 | 0.94 | 6.66 | 0.21 | 0.36 | 0.96 |
| Tumkuru | 0.39 | 0.97 | 3.78 | 0.16 | 0.25 | 0.97 |
Table 5. Model performance on various criteria of the GB model.

| Location | WSEE | r | AARE | NS | MAE | ɳ |
|---|---|---|---|---|---|---|
| Ballari | 0.87 | 0.95 | 5.43 | 0.21 | 0.43 | 0.95 |
| Bengaluru | 0.25 | 0.98 | 3.14 | 0.12 | 0.17 | 0.98 |
| Chikmaglur | 0.32 | 0.98 | 3.75 | 0.15 | 0.18 | 0.98 |
| Chitradurga | 0.76 | 0.96 | 5.70 | 0.21 | 0.45 | 0.95 |
| Devnagiri | 0.53 | 0.98 | 4.00 | 0.14 | 0.27 | 0.98 |
| Dharwad | 0.56 | 0.97 | 4.74 | 0.16 | 0.27 | 0.97 |
| Gadag | 0.66 | 0.95 | 5.14 | 0.19 | 0.34 | 0.96 |
| Haveri | 0.39 | 0.98 | 4.41 | 0.10 | 0.19 | 0.99 |
| Koppal | 0.61 | 0.96 | 5.38 | 0.17 | 0.33 | 0.96 |
| Mandya | 0.21 | 0.99 | 2.84 | 0.11 | 0.15 | 0.99 |
| Shivmoga | 0.42 | 0.97 | 5.63 | 0.16 | 0.29 | 0.97 |
| Tumkuru | 0.31 | 0.98 | 3.28 | 0.14 | 0.20 | 0.98 |
Table 6. Model performance on various criteria for ensemble XGBoost.

| Location | WSEE | r | AARE | NS | MAE | ɳ |
|---|---|---|---|---|---|---|
| Ballari | 0.84 | 0.96 | 4.91 | 0.20 | 0.40 | 0.95 |
| Bengaluru | 0.19 | 0.99 | 2.13 | 0.09 | 0.12 | 0.99 |
| Chikmaglur | 0.19 | 0.99 | 2.82 | 0.09 | 0.13 | 0.99 |
| Chitradurga | 0.71 | 0.98 | 3.43 | 0.15 | 0.29 | 0.98 |
| Devnagiri | 0.49 | 0.98 | 3.77 | 0.13 | 0.25 | 0.98 |
| Dharwad | 0.50 | 0.98 | 3.89 | 0.14 | 0.23 | 0.98 |
| Gadag | 0.62 | 0.97 | 3.80 | 0.16 | 0.27 | 0.97 |
| Haveri | 0.33 | 0.99 | 3.17 | 0.10 | 0.19 | 0.99 |
| Koppal | 0.66 | 0.95 | 6.11 | 0.18 | 0.36 | 0.96 |
| Mandya | 0.19 | 0.99 | 2.43 | 0.09 | 0.13 | 0.99 |
| Shivmoga | 0.34 | 0.98 | 4.18 | 0.13 | 0.23 | 0.98 |
| Tumkuru | 0.22 | 0.99 | 2.35 | 0.09 | 0.15 | 0.99 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Kumar, M.; Agrawal, Y.; Adamala, S.; Pushpanjali; Subbarao, A.V.M.; Singh, V.K.; Srivastava, A. Generalization Ability of Bagging and Boosting Type Deep Learning Models in Evapotranspiration Estimation. Water 2024, 16, 2233. https://doi.org/10.3390/w16162233


