sustainability
Article
Exploring Machine Learning Models in Predicting Irrigation
Groundwater Quality Indices for Effective Decision Making in
Medjerda River Basin, Tunisia
Fatma Trabelsi *
and Salsebil Bel Hadj Ali
Research Unit Sustainable Management of Water and Soil Resources, Higher School of Engineers of
Medjez El Bab (ESIM), University of Jendouba, Jendouba 8189, Tunisia; belhadjali.salsebil@esim.u-jendouba.tn
* Correspondence: fatma.trabelsi@esim.u-jendouba.tn
Citation: Trabelsi, F.; Bel Hadj Ali, S.
Exploring Machine Learning Models
in Predicting Irrigation Groundwater
Quality Indices for Effective Decision
Making in Medjerda River Basin,
Tunisia. Sustainability 2022, 14, 2341.
https://doi.org/10.3390/su14042341
Academic Editor: Fernando António
Leal Pacheco
Received: 24 January 2022
Accepted: 12 February 2022
Published: 18 February 2022
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Abstract: In recent years, the application of machine learning (ML) models in groundwater quality studies has proved to be a robust, low-cost alternative for producing highly accurate results. This research evaluates the ability of ML models to predict the quality of groundwater for irrigation purposes in the downstream Medjerda river basin (DMB) in Tunisia. The random forest (RF), support vector regression (SVR), artificial neural network (ANN), and adaptive boosting (AdaBoost) models were tested to predict the irrigation water quality (IWQ) parameters: total dissolved solids (TDS), potential salinity (PS), sodium adsorption ratio (SAR), exchangeable sodium percentage (ESP), and magnesium adsorption ratio (MAR), using low-cost, in situ physicochemical parameters (T, pH, EC) as input variables. To this end, seventy-two (72) representative groundwater samples were collected and analysed for major cations and anions during the pre- and post-monsoon seasons of 3 years (2019–2021) to compute the IWQ parameters. The performance of the ML models was evaluated using Pearson’s correlation coefficient (r), the root mean square error (RMSE), and the relative bias (RBIAS). A sensitivity analysis was carried out to identify the input parameters that most affect the model predictions, using the one-factor-at-a-time (OFAT) method of the Monte Carlo (MC) approach. The results show that the AdaBoost model is the most appropriate model for predicting all parameters (r ranged between 0.88 and 0.89), while the random forest model is suitable for predicting only four parameters: TDS, PS, SAR, and ESP (r ranged from 0.65 to 0.87). In addition, the ANN and SVR models perform well in predicting three parameters (TDS, PS, SAR) and two parameters (PS, SAR), respectively, with generalization ability (GA) values close to unity (between 0.98 and 1). Moreover, the uncertainty analysis confirmed the robustness of the ML models, which produce excellent predictions with only a few physicochemical parameters as inputs. The developed ML models are relevant for the cost-effective prediction of irrigation water quality indices and can be applied as a decision support system (DSS) tool to improve water management in the Medjerda basin.
Keywords: groundwater; irrigation water quality indices; machine learning; RF; SVR; ANN; AdaBoost; Medjerda river basin; Tunisia
Copyright: © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
1. Introduction
Water is a critical input for agricultural production and plays an important role in food security [1]. Due to population growth, urbanization, and climate change (CC), competition for water resources has increased sharply, with adverse effects on agriculture. In particular, groundwater resources are being rapidly depleted in many parts of the world, especially in the Mediterranean region and notably in Tunisia, which is referenced as one of the regions most responsive to CC and a primary “hot-spot” [2,3]. This is an emerging threat to agriculture-led rural development. To achieve the sustainable development goals (SDGs) related to the efficient use of water and the elimination of hunger, it is crucial to improve water management, rationalize irrigation water use [4,5], and improve the tools of groundwater quality assessment. Indeed, the suitability of groundwater for irrigation purposes depends on the nature of the mineral elements present in the water and their impacts on soil and crops [6,7]. It is based on the concentration of cations and anions present in the groundwater. Quality indices such as the sodium adsorption ratio (SAR), residual sodium carbonate (RSC), magnesium adsorption ratio (MAR), Kelley ratio (KR), and percentage of sodium (%Na) are frequently used in assessing the suitability of waters for irrigation [8–10]. However, one of the main challenges of qualitative assessment methods is their subjectivity, as they require expert knowledge in assigning weights to variables when calculating the index score, which makes the actual result unclear [11,12]. Moreover, some parameters require a sampling protocol, laboratory analysis, and, at a larger scale, testing and data management [13], which increase the cost and duration of water quality assessment and affect decision-making in water quality management planning. To cope with these issues, it is crucial to develop a powerful and cost-effective approach for the quick and accurate assessment of irrigation water quality. Thus, several contemporary studies have opted for a non-physical tool and successfully predicted groundwater quality using machine learning (ML) models [14,15]. ML is a promising and versatile approach used across scientific fields [16,17]. Globally, several researchers have applied ML techniques in various water research studies. They have been applied [18,19] to nitrate groundwater contamination [20,21], manganese removal prediction [13], flood susceptibility [22], pollution source identification in water supply networks [23], wastewater heavy metal removal [24], heavy metal pollution prediction [25], and water level forecasting [26]. In the last decades, artificial intelligence (AI) techniques have also been investigated and have shown great ability to predict and monitor water quality [15,27]. These techniques include machine learning (ML), deep learning (DL), and artificial neural networks (ANN).
For example, ML models (supervised machine learning, gradient boosting, and multilayer perceptron) have been studied by [28,29], who demonstrated the relevance of this technique in predicting water quality [30,31] for drinking use. The support vector regression (SVR) model was applied by [12] to predict the water quality index and yielded accurate predictions. The authors of [32] compared deep learning (DL) models with three other ML models, namely random forest (RF), eXtreme Gradient Boosting (XGBoost), and ANN, to predict groundwater quality.
However, few research studies have applied AI models to predict irrigation water quality. Recently, the ANN model was used by [33] to predict the suitability of groundwater for irrigation purposes in India using physicochemical parameters as input variables. Similarly, [15] predicted groundwater quality in Morocco using AdaBoost, random forest (RF), ANN, and support vector regression (SVR) models based on irrigation water quality indices as inputs. It is important to note that these published studies report good performance of ML models in predicting the suitability of groundwater for irrigation purposes from small datasets of physicochemical parameters measured in situ or by smart sensor technologies.
This study is performed for the lower and middle sub-basins of the Medjerda catchment known as the basin downstream from the Sidi Salem dam (DMB). This basin is part
of the largest watershed of Tunisia, where it supplies about half of the country’s drinking
water. The DMB basin, subject of this study, is essentially agricultural, where irrigation
water supply depends on surface water in conjunction with groundwater resources. In
recent decades, the study area has experienced water scarcity problems due to the increased
frequency of droughts that have led to the increased exploitation of groundwater resources,
mainly by the agricultural and agro-industrial sectors [34,35]. Nevertheless, despite the
importance of groundwater in the Medjerda basin, there is currently a huge lack of data
regarding its quality that undermines the ability of decision makers and users to manage
it properly. The few studies that have been conducted are limited in both space and time, with few groundwater sampling campaigns and analyses, and they are therefore insufficient to fill the existing data gap and to give real-time information about the suitability of groundwater for use. Thus, it is essential in the DMB basin to improve the water quality evaluation process using low-cost data and an objective tool that offers reliability and flexibility in supporting decisions on water management and planning.
Against this backdrop, the main objectives of this research are: (i) to evaluate the
effectiveness of machine learning (ML) models to predict the suitability of groundwater for
irrigation purposes in the DMB basin using four ML models (random forest, support vector
regression (SVR), ANN, and adaptive boosting (AdaBoost)), (ii) to evaluate the accuracy of
the implemented models, and (iii) to analyse the uncertainty and sensitivity of the tested
models. Concerning the scientific interest, this study is original, as no previous similar
studies were carried out in the pilot area using machine learning methods. Then, the focus
of this study was to test the performance of the novel approach and to provide spatial
information and guidance to support decision-making processes concerning groundwater
management in the Medjerda basin.
2. Materials and Methods
2.1. Study Area
The DMB basin is located in the northern part of Tunisia and extends from the “Sidi Salem” dam to the outlet of the river into the Mediterranean Sea. It lies between 4,040,248 and 4,117,516 m north and between 527,822 and 613,659 m east (Universal Transverse Mercator (UTM) coordinate system, zone 32 North) (Figure 1). It covers a total area of about 1773 km². The average annual precipitation over the period 1991–2020 is about 448.6 mm/year.
From the geological framework, the study area is a subsidence zone belonging to the
Tellian domain. It consists of a Quaternary depression limited by the nappes zone in the
north [36,37] and the diapirs zone or Triassic province in the south [38,39]. The sedimentary
distribution of the basin is essentially controlled by two NE-trending master faults, which
are associated with outcrops of Triassic evaporites. From west to east, there is the El
Alia-Teboursouk fault (ETF) and the Tunis-Elles fault (TEF) [40]. The Lithostratigraphy of
the study area shows geological formations ranging from Triassic to late Quaternary. The
Triassic outcrops have often-abnormal contact with Jurassic and Cretaceous outcrops in
several localities. The thick lithostratigraphic sequences formed by the Cretaceous, Eocene,
Miocene, Pliocene, and Quaternary deposits host the shallow and deep aquifers of the
study area such as the aquifer of Bled Guenima, the aquifer of the Anti-Pliocene Medjerda,
the plio-quaternary aquifer of Medjerda, the Campanian limestone aquifer of Medjerda,
Medjerda aquifer of marls, and Barremian limestones. The alluvial aquifers known as
the aquifer of the middle valley of the Medjerda, the aquifer of the lower valley of the
Medjerda and the aquifer of Ousja Ghar El Meleh (OGM) are hosted in the colluvial series
of the mountains and the alluvial fillings of the deltaic plain. The groundwater of the DMB aquifers is primarily used for irrigation and agro-industry and has undergone severe exploitation in recent years, especially during drought seasons. Moreover, these aquifers suffer from salinization, largely caused by natural processes such as evaporation, water-rock interaction, saltwater intrusion, and up-coning of saline waters from deep layers, in addition to anthropogenic causes related to irrigation return flow [35,41,42]. The hydromorphic nature of the soils in the DMB is a significant problem; it is observed in the irrigated areas of Kalâat El Andalous, where drainage worsens it, together with clogging and stagnation at Garaâ. This phenomenon aggravates the groundwater salinity problem caused by the excessive use of chemical fertilizers in the irrigated areas. Moreover, the coastal aquifer of OGM is affected by saltwater intrusion due to the communication between the lagoon of Ghar El Melh and the sea [30]. Saline groundwater used in irrigation adversely affects soil as well as crop yields. The most harmful associated effects on the irrigated areas are sodification, salinization, and alkalinization, which may alter soil structure [43,44]. Consequently, groundwater quality has deteriorated, and it is crucial to evaluate its suitability, especially for irrigation purposes [45,46].
Figure 1. Location map of the downstream Medjerda River Basin (DMB).
2.2. Methodology and Datasets
The methodology adopted in this work is based on five steps (Figure 2): (i) data preparation (reliability checking and data exploration); (ii) development of machine learning models (ANN, AdaBoost, SVR, and RF) based on the training datasets; (iii) validation of the models’ performance based on the validation datasets; (iv) assessment of the generalization ability; (v) uncertainty and sensitivity analysis of the models. This allowed us to evaluate whether the developed models are useful for predicting irrigation groundwater quality parameters and for helping farmers and decision makers manage irrigation strategies.
Figure 2. Flowchart of adopted methodology. Panels: input data (physico-chemical parameters and irrigation water quality indices; 1008 values in 14 columns and 72 rows); dataset cleaning (statistical analysis and imputation of missing values, deletion of aberrant values such as pH = 0.5 or T° = 88, reliability check by ionic balance with samples above 5% rejected, scatterplot of the sum of cations vs. the sum of anions); data exploration (basic statistical characteristics, correlation matrix analysis); data normalization, x normalized = (x – x min) / (x max – x min); split into training (80%) and validation (20%) sets; machine learning models (ANN, SVR, RF, AdaBoost); metric validation (r, RMSE, RBIAS); generalization ability; uncertainty and sensitivity analysis (error E, 95% confidence bound CB, and |ΔRMSE| per input variable EC, pH, T°).
2.2.1. Input Data
• Physico-chemical parameters
The input data for the models are the results of physico-chemical analyses of groundwater sampled in the DMB basin. Sampling and analysis standards must be respected to obtain reliable data for use as input variables of the ML models. In this study, groundwater samples were collected in September 2020, during the dry season, so that the samples are less affected by dilution processes and present the highest solute concentrations of the year. A total of 72 groundwater samples were collected from surface wells and piezometers (Figure 1). The samples were analysed at the “LandcareMed” laboratory of water and soil analysis at the Higher School of Engineers of Medjez El Bab (ESIM) following standard procedures [46,47]. The filtrate dry residue, or TDS (total dissolved salts), was measured by evaporating 100 mL of groundwater sample at 105 °C for 24 h. Alkalinity was analysed by titration with 0.1 HCl acid. Major cations (Na+, NH4+, K+, Mg2+, and Ca2+) and anions (Cl−, NO3−, SO42−, F−, Br−) were measured by ion chromatography. Table 1 summarizes the statistics of the groundwater sample analyses.
Table 1. Statistical summary of physico-chemical parameters of groundwater samples.
Parameter | Unit | Min | Max | Mean | Standard Deviation | Skew | Kurtosis
TDS | mg/L | 282.20 | 15,818 | 3167.72 | 2525.07 | 3.39 | 13.61
T°C | °C | 5.80 | 26 | 18.65 | 0.67 | 3.40 | 13.06
pH | - | 3.70 | 10.1 | 7.66 | 3996.26 | 3.08 | −0.41
EC | µS/cm | 348 | 24,300 | 4974.97 | 3996.26 | 3.40 | 13.69
% O2 | - | 0.70 | 44.10 | 6.10 | 6.83 | 3.08 | 13.06
HCO3− | mg/L | 6.32 | 820.01 | 329.67 | 174.31 | 0.00 | −0.41
F− | mg/L | 0.12 | 9.44 | 1.62 | 1.55 | 2.84 | 11.98
Cl− | mg/L | 30.70 | 8492.67 | 1211.08 | 1306.23 | 4.18 | 19.80
NO2− | mg/L | 0.03 | 22.94 | 7.81 | 7.88 | 0.64 | −1.18
Br− | mg/L | 0.08 | 123.33 | 45.37 | 31.35 | −0.03 | −0.53
NO3− | mg/L | 0.38 | 805.43 | 124.90 | 125.99 | 3.01 | 12.56
PO42− | mg/L | 0.38 | 80.04 | 39.78 | 20.22 | 0.31 | −0.42
SO42− | mg/L | 1.85 | 2173.01 | 530.36 | 505.27 | 1.76 | 2.99
Na+ | mg/L | 19.52 | 4649 | 708.52 | 730.89 | 4.06 | 18.65
NH4+ | mg/L | 3.79 | 25.44 | 12.39 | 8.02 | 1.30 | 2.21
K+ | mg/L | 0.03 | 119.28 | 13.95 | 21.46 | 3.52 | 13.39
Mg2+ | mg/L | 0.54 | 521.53 | 150.72 | 92.03 | 1.50 | 3.98
Ca2+ | mg/L | 2.76 | 659.46 | 149.59 | 144.15 | 1.68 | 2.54
• Irrigation water quality indices (IWQ)
Irrigation water chemistry varies depending on its source, the lithology of the reservoir aquifer, and climatic trends. Poor irrigation water quality adversely affects plant growth, agricultural production, soil condition, and human health. Generally, the suitability of groundwater for irrigation purposes is assessed through various agricultural water quality indicators such as percent sodium (%Na), sodium adsorption ratio (SAR), Kelley ratio (KR), magnesium hazard (MH), residual sodium carbonate (RSC), residual sodium bicarbonate (RSBC), permeability index (PI), and potential salinity (PS). In this study, we focus on the SAR, PS, TDS, ESP, RSC, and MAR parameters, which are calculated according to Table 2.
Table 2. Irrigation water quality indices (IWQ).
Index Formula | Reference | Description
$\mathrm{TDS} = \sum (\text{cations} + \text{anions})$ | [48] | The TDS is the sum of the ion concentrations in the water.
$\mathrm{SAR} = \dfrac{\mathrm{Na^{+}}}{\sqrt{\dfrac{\mathrm{Mg^{2+}} + \mathrm{Ca^{2+}}}{2}}}$ | [49] | SAR (sodium adsorption ratio) is a measure that determines the degree of hazard to crops by measuring the alkali/sodium risk.
$\mathrm{PS} = \mathrm{Cl^{-}} + \dfrac{\mathrm{SO_4^{2-}}}{2}$ | [50] | The potential salinity or Doneen is used for risk assessment of cations (calcium, sodium, and magnesium) and bicarbonates present in water that can affect soil permeability if used for long-term irrigation.
$\mathrm{ESP} = \dfrac{\mathrm{Na^{+}}}{\mathrm{Ca^{2+}} + \mathrm{Mg^{2+}} + \mathrm{Na^{+}} + \mathrm{K^{+}}} \times 100$ | [9] | The percent exchangeable sodium parameter (ESP in %) is used to evaluate the effect of sodium on soil texture.
$\mathrm{RSC} = (\mathrm{CO_3^{2-}} + \mathrm{HCO_3^{-}}) - (\mathrm{Ca^{2+}} + \mathrm{Mg^{2+}})$ | [51] | Residual sodium carbonates (RSC) indicate excess bicarbonate and carbonate in the irrigation water.
$\mathrm{MAR} = \dfrac{\mathrm{Mg^{2+}}}{\mathrm{Mg^{2+}} + \mathrm{Ca^{2+}}} \times 100$ | [52] | The excess of the concentration of magnesium, compared with the sum of the concentrations of calcium and magnesium in water, affects the quality of soils, which can translate into low crop yield.
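The index formulas of Table 2 can be sketched in a few lines of Python. This is only an illustration: the function name `irrigation_indices`, the argument names, and the sample concentrations are hypothetical, and the sketch assumes that all ion concentrations are already expressed in meq/L.

```python
import math

def irrigation_indices(na, k, ca, mg, cl, so4):
    """Compute SAR, PS, ESP and MAR following the formulas of Table 2.

    All arguments are assumed to be ion concentrations in meq/L.
    """
    sar = na / math.sqrt((mg + ca) / 2.0)        # sodium adsorption ratio
    ps = cl + so4 / 2.0                          # potential salinity
    esp = na / (ca + mg + na + k) * 100.0        # exchangeable sodium percentage
    mar = mg / (mg + ca) * 100.0                 # magnesium adsorption ratio
    return {"SAR": sar, "PS": ps, "ESP": esp, "MAR": mar}

# Made-up sample, for illustration only (not taken from the dataset).
sample = irrigation_indices(na=8.0, k=0.5, ca=4.0, mg=4.0, cl=10.0, so4=6.0)
print(sample)
```

TDS and RSC are left out of the sketch because they need the full list of ions and the carbonate species, respectively.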
2.2.2. Data Pre-Processing and Exploratory Data Analysis (EDA)
Data pre-processing and EDA are among the most important parts of a machine learning project. They transform raw data into clean data (Figure 3).
The reliability of the physicochemical and IWQ datasets was verified using the ionic balance, the ionic scatter plot, and the boxplot.
First, data cleaning was performed to correct mistakes and errors in the quality dataset by checking the accuracy of the physico-chemical data. As a first step, the reliability of the analytical procedures was checked using the ionic balance (IB). Water samples whose IB exceeded 5% were eliminated.
$$\mathrm{IB}(\%) = \frac{\sum \text{Cations} - \sum \text{Anions}}{\sum \text{Cations} + \sum \text{Anions}} \times 100 \qquad (1)$$
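As a minimal sketch of this reliability check, the ionic balance of Equation (1) can be computed per sample and compared with the 5% threshold. The function names are hypothetical, the concentrations are assumed to be in meq/L, and applying the threshold to the absolute value of IB is an assumption of this sketch.

```python
def ionic_balance(cations, anions):
    """Return the ionic balance in percent, following Equation (1).

    `cations` and `anions` are lists of concentrations, assumed in meq/L.
    """
    total_cations = sum(cations)
    total_anions = sum(anions)
    return (total_cations - total_anions) / (total_cations + total_anions) * 100.0

def is_reliable(cations, anions, threshold=5.0):
    """Keep a sample when |IB| does not exceed the threshold (assumed absolute)."""
    return abs(ionic_balance(cations, anions)) <= threshold

# Made-up samples, for illustration only.
ib_ok = ionic_balance([8.0, 0.5, 4.0, 4.0], [10.0, 3.0, 3.0])
print(round(ib_ok, 2), is_reliable([8.0, 0.5, 4.0, 4.0], [10.0, 3.0, 3.0]))
print(is_reliable([20.0], [15.0]))
```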
Then, a scatter plot of the sum of anions against the sum of cations (Figure 4) was built; it shows a very good correlation (R² = 0.98), which confirms the reliability of the data used. Second, the IWQ were calculated (Table 3), and their accuracy was checked using a correlation matrix. The box plot of the distribution of the IWQ and physicochemical variables (Figure 5) was used to screen for outliers. Only a few outliers were detected for the majority of variables. Thus, 69 samples were retained and normalized to the interval from 0 to 1 to improve the prediction performance by reducing the influence of extreme values.
$$x_{\text{normalized}} = \frac{x - x_{\min}}{x_{\max} - x_{\min}} \qquad (2)$$
Finally, the dataset of computed Irrigation water quality parameters (IWQ) was divided into two sub-sets for model training and model validation (80:20).
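The normalization of Equation (2) and the 80:20 split can be sketched as follows. The helper names, the random seed, and the shuffling before the split are assumptions of this sketch and are not described in the text; the five EC values are used only as an example.

```python
import random

def min_max_normalize(values):
    """Scale a list of values to [0, 1] following Equation (2)."""
    lo, hi = min(values), max(values)
    if hi == lo:                      # constant column: avoid division by zero
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

def train_validation_split(rows, train_fraction=0.8, seed=0):
    """Shuffle the rows (an assumption) and split them into two subsets."""
    indices = list(range(len(rows)))
    random.Random(seed).shuffle(indices)
    cut = int(round(train_fraction * len(rows)))
    train = [rows[i] for i in indices[:cut]]
    validation = [rows[i] for i in indices[cut:]]
    return train, validation

ec = [3.71, 3.73, 24.3, 7.2, 4.13]           # example EC values
ec_scaled = min_max_normalize(ec)

# 69 retained samples, represented here only by their index.
train, validation = train_validation_split(list(range(69)))
print(len(train), len(validation))
```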
Figure 3. Scatter plots showing the correlation of major cations/anions.
Table 3. Descriptive statistics of the Irrigation Water Quality Indices (IWQ).
Statistic | Te | EC | TDS | pH | SAR | PS | ESP | MAR
Mean | 18.65 | 4.97 | 31.68 | 7.66 | 9.60 | 39.68 | 57.32 | 63.17
Standard error | 0.38 | 0.47 | 3.00 | 0.08 | 0.81 | 4.73 | 1.51 | 2.84
Median | 18.45 | 3.91 | 26.00 | 7.75 | 7.76 | 28.52 | 56.16 | 71.26
Mode | 14.70 | 3.71 | 50.38 | 7.61 | 10.65 | 13.78 | 56.05 | 85.25
Standard deviation | [illegible] | 4.02 | 25.43 | 0.67 | 6.91 | 40.12 | 12.77 | 24.08
Variance | 10.66 | 16.20 | 646.58 | 0.45 | 47.75 | 1609.57 | 163.18 | 579.90
Kurtosis (kurtosis coefficient) | 2.28 | 13.69 | 13.61 | 18.36 | 13.01 | 16.94 | 0.62 | −0.64
Skewness coefficient | −0.53 | 3.40 | 3.39 | −2.37 | 3.23 | 3.82 | 0.08 | −0.62
Range | 20.20 | 23.95 | 155.36 | 6.40 | 42.66 | 246.14 | 64.90 | 93.03
Minimum | 5.80 | 0.35 | 2.82 | 3.70 | 0.72 | 1.31 | 22.05 | 5.48
Maximum | 26.00 | 24.30 | 158.18 | 10.10 | 43.38 | 247.44 | 86.95 | 98.51
Figure 4. Boxplots of IWQ parameters and physico-chemical variables.
Figure 5. Distribution of the raw values of parameters by sample.
2.2.3. Machine Learning Modelling
The ML models were developed in JupyterLab, using the open-source Anaconda platform (www.anaconda.com/products/individual, accessed on 8 November 2021) and its Python packages for data science and machine learning.
• Artificial Neural Network (ANN)
ANN is commonly used as an ML model in groundwater modelling [53]. It is a well-established and long-standing machine learning technique designed for regression on processes (represented by the data) that are highly complex and for which little information is available [54]. In this study, a feed-forward multilayer perceptron (MLP) architecture was used for training the ANN committee model. An MLP, which is a specific case of ANN, consists of a weighted input layer, one or more hidden layers, and an output layer, interconnected by neurons [55–57]. Hence, designing an ANN requires the transformation from the jth to the (j + 1)th layer through an activation function (f), and so on until the target layer [57]. The iterative training process is repeated for the layers until a good preliminary performance is reached.
In this study, only three layers were developed to obtain an output $y_j$ following Equation (3):
$$y_j = f\left(\sum_{i=1}^{N} w_{ij}\, x_i + b_j\right) \qquad (3)$$
with $N$, $x_i$, $y_j$, $b_j$, and $w_{ij}$ denoting the number of nodes in the previous layer, the ith node in the previous layer, the jth node in the present layer, the bias of the jth node in the present layer, and the weight connecting $x_i$ and $y_j$, respectively [58].
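Equation (3) can be illustrated with a small forward pass through one layer. This is a sketch under stated assumptions: the function names, the weights, the biases, and the input values are made up for illustration and do not come from the trained model.

```python
import math

def sigmoid(z):
    """Logistic activation function."""
    return 1.0 / (1.0 + math.exp(-z))

def layer_output(x, weights, biases, activation=sigmoid):
    """Apply Equation (3): y_j = f(sum_i w_ij * x_i + b_j).

    weights[i][j] is the weight connecting input node i to node j.
    """
    outputs = []
    for j in range(len(biases)):
        z = sum(weights[i][j] * x[i] for i in range(len(x))) + biases[j]
        outputs.append(activation(z))
    return outputs

# Hypothetical normalized inputs (T, pH, EC) and a two-node layer.
x = [0.2, 0.5, 0.1]
weights = [[0.0, 1.0],
           [0.0, -1.0],
           [0.0, 2.0]]
biases = [0.0, 0.1]
hidden = layer_output(x, weights, biases)
print(hidden)
```

Stacking two such calls, the second one with an identity activation, gives the input, hidden, and output structure described above.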
• Adaptive boosting model (AdaBoost)
AdaBoost is an ensemble learning algorithm developed by [46]. It can be used in combination with many other types of learning algorithms to improve their performance. It integrates multiple weak learners into a single strong learner. It first assigns an equal weight to all samples. Then, the weights of the samples misclassified by the previous weak learner are increased. Finally, the samples with the updated weights are used to train the next weak learner. With this approach, new learners are trained to decrease the weighted error produced by previous learners (Figure 6).
Figure 6. Flow chart of the AdaBoost algorithm.
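The reweighting step described above can be sketched with the classic binary-classification form of AdaBoost. This form is chosen here only for illustration; the regression variant used for the IWQ predictions updates weights from a loss value rather than from a misclassification flag. The function name and the four-sample example are hypothetical.

```python
import math

def update_weights(weights, misclassified):
    """One round of classic AdaBoost reweighting.

    weights: current sample weights, summing to 1.
    misclassified: booleans, True where the weak learner was wrong.
    Assumes the weighted error is strictly between 0 and 1.
    """
    error = sum(w for w, wrong in zip(weights, misclassified) if wrong)
    alpha = 0.5 * math.log((1.0 - error) / error)   # weight of this learner
    raw = [w * math.exp(alpha if wrong else -alpha)
           for w, wrong in zip(weights, misclassified)]
    total = sum(raw)
    return [w / total for w in raw], alpha

weights = [0.25, 0.25, 0.25, 0.25]                  # equal initial weights
new_weights, alpha = update_weights(weights, [True, False, False, False])
print(new_weights, alpha)
```

After the update, the single misclassified sample carries half of the total weight, so the next weak learner concentrates on it.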
• Support vector machine
The SVM is a machine learning algorithm [59] based on statistical learning theory. It is extensively used for classification (SVC) and regression (SVR) problems and also reduces over-fitting [60].
For an observational dataset $D_s = \{(x_i, y_i)\}_{i=1}^{n}$, the optimal function is obtained by minimizing function (4) subject to constraints (5). Loss functions such as the ε-insensitive, quadratic, and Huber functions can be used [44].
$$\min_{\omega,\, b,\, \varepsilon,\, \varepsilon^{*}} \; \frac{1}{2}\lVert \omega \rVert^{2} + C \sum_{i=1}^{n} \left(\varepsilon_i + \varepsilon_i^{*}\right) \qquad (4)$$
with $\varepsilon_i$ and $\varepsilon_i^{*}$ as the lower and upper constraints on the output,
$$\text{s.t.} \quad \begin{cases} y_i - \omega^{T}\phi(x_i) - b \le \epsilon + \varepsilon_i \\ -y_i + \omega^{T}\phi(x_i) + b \le \epsilon + \varepsilon_i^{*} \\ \varepsilon_i,\ \varepsilon_i^{*} \ge 0, \quad i = 1, \ldots, n \end{cases} \qquad (5)$$
with $\omega$, $b$, and $C$ representing the weight vector, the bias term, and the prespecified value that penalizes the training error, while $\phi(x)$ is the mapping defined through a kernel function $k$ (polynomial, radial basis, or linear function).
In this study, a radial basis function (RBF) was adopted as the kernel function:
$$k(x_i, x_j) = \exp\left(-\gamma \lVert x_i - x_j \rVert^{2}\right) \qquad (6)$$
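Two building blocks of this formulation can be sketched directly: the RBF kernel of Equation (6) and the ε-insensitive loss mentioned above. The function names and the numerical values are hypothetical and serve only as an illustration.

```python
import math

def rbf_kernel(xi, xj, gamma):
    """Radial basis function kernel: exp(-gamma * ||xi - xj||^2)."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(xi, xj))
    return math.exp(-gamma * squared_distance)

def epsilon_insensitive_loss(y_true, y_pred, epsilon):
    """Zero inside the epsilon tube, linear outside it."""
    return max(0.0, abs(y_true - y_pred) - epsilon)

# Identical points give a kernel value of 1; the value decays with distance.
print(rbf_kernel([0.0, 0.0], [0.0, 0.0], gamma=1.2))
print(rbf_kernel([1.0, 0.0], [0.0, 0.0], gamma=1.0))

# A residual smaller than epsilon is not penalized.
print(epsilon_insensitive_loss(1.0, 1.001, epsilon=0.002))
print(epsilon_insensitive_loss(1.0, 1.5, epsilon=0.002))
```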
• Random forest
The random forest algorithm proposed by [45] is a general-purpose classification and regression method. It builds an ensemble of decision trees during training, varying the samples and covariates used by each tree, and averages their outputs to improve the prediction performance.
In this study, the k-fold (k = 5) cross-validation method was used during the learning process to further prevent model overfitting [61]. The optimal architectures, functions, and hyperparameters of each model were determined by trial-and-error analysis based on their evolution during the training process. All model parameters used for the prediction of the IWQ parameters are summarized in Table 4.
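The k-fold procedure can be sketched as an index generator: the samples are cut into k contiguous folds, and each fold is held out once while the model is fitted on the others. The function name is hypothetical, the absence of shuffling is a simplification of this sketch, and the sample count of 55 is only an example.

```python
def k_fold_indices(n_samples, k=5):
    """Return k (training_indices, held_out_indices) pairs.

    Folds are contiguous and differ in size by at most one sample.
    """
    indices = list(range(n_samples))
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0)
                  for i in range(k)]
    folds, start = [], 0
    for size in fold_sizes:
        held_out = indices[start:start + size]
        training = indices[:start] + indices[start + size:]
        folds.append((training, held_out))
        start += size
    return folds

folds = k_fold_indices(55, k=5)
for training, held_out in folds:
    print(len(training), len(held_out))
```

Averaging the validation error over the five held-out folds gives a more stable estimate than a single split when hyperparameters are tuned by trial and error.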
Table 4. Optimal parameters and functions used for IWQ indices prediction.
Model | Description of Parameters and Functions
ANN | 3 layers; 12 neurons in hidden layer; algorithm: Levenberg–Marquardt; activation function: sigmoid, identity in output layer; epoch number: 1000; learning rate: 0.01; momentum coefficient: 0.85
SVR | C = 200; kernel function: RBF (γ = 1.2); ε-function loss, ε = 0.002; Gamma = 0.1
Random Forest | Number of trees: 20
AdaBoost | Loss function: exponential; estimator number: 50; learning rate: 0.5
2.2.4. Validation of Models Performance
• Metric validation
This step consists of evaluating the developed models. Their robustness is tested in order to assess whether the results obtained can be trusted.
In this study, three statistical criteria were used to validate the above models (Table 5): (i) Pearson’s correlation coefficient (r), (ii) the root mean square error (RMSE), and (iii) the relative bias (RBIAS).
Table 5. Statistical criteria to validate the models. $X_{0i}$ and $X_{pi}$ are the observed and predicted values, and $\bar{X}_0$ and $\bar{X}_p$ are their means.
Designation | Formula | Description
Pearson’s correlation coefficient (r) | $r = \dfrac{\sum_{i=1}^{n} (X_{0i} - \bar{X}_0)(X_{pi} - \bar{X}_p)}{\left[\sum_{i=1}^{n} (X_{0i} - \bar{X}_0)^2 \sum_{i=1}^{n} (X_{pi} - \bar{X}_p)^2\right]^{0.5}}$ | r = 1: best correlation between the observed and predicted values, but it does not indicate the best model. r < 1 indicates a less well-fitted model.
The root mean square error (RMSE) | $\mathrm{RMSE} = \sqrt{\dfrac{\sum_{i=1}^{n} (X_{pi} - X_{0i})^2}{n}}$ | A lower value of RMSE compared with the values of the results indicates a better fit of the model.
The relative bias (RBIAS) | $\mathrm{RBIAS} = \dfrac{\sum_{i=1}^{n} (X_{pi} - X_{0i})}{\sum_{i=1}^{n} X_{0i}}$ | RBIAS > 0: the model tends to overestimate the target magnitude; RBIAS < 0: the model tends to underestimate it; RBIAS = 0: the model is perfect; a higher absolute value of RBIAS indicates that the model is biased.
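The three criteria of Table 5 can be written as short functions. This is a minimal sketch: the function names and the four observed and predicted values are made up for illustration.

```python
import math

def pearson_r(observed, predicted):
    """Pearson's correlation coefficient between two equal-length lists."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    covariance = sum((o - mean_o) * (p - mean_p)
                     for o, p in zip(observed, predicted))
    spread_o = sum((o - mean_o) ** 2 for o in observed)
    spread_p = sum((p - mean_p) ** 2 for p in predicted)
    return covariance / math.sqrt(spread_o * spread_p)

def rmse(observed, predicted):
    """Root mean square error."""
    n = len(observed)
    return math.sqrt(sum((p - o) ** 2 for o, p in zip(observed, predicted)) / n)

def rbias(observed, predicted):
    """Relative bias: sum of (predicted - observed) over sum of observed."""
    return sum(p - o for o, p in zip(observed, predicted)) / sum(observed)

observed = [2.0, 4.0, 6.0, 8.0]        # made-up values
predicted = [2.5, 4.5, 6.5, 8.5]       # constant offset of 0.5
print(pearson_r(observed, predicted), rmse(observed, predicted),
      rbias(observed, predicted))
```

The example shows why r alone does not identify the best model: a constant offset leaves r at 1 while RMSE and RBIAS are both non-zero.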
•
Generalization ability
Good performance in the testing phase is believed to be evidence for an algorithm’s
practical plausibility, where this performance provides an evaluation of the model’s generalization capability. Achievement of this objective is typically measured by the generalization
ability (GA) of the models [52]. The author of [62] defined GA in groundwater level
prediction by:
$$GA = \frac{\text{RMSE during the validation phase}}{\text{RMSE during the training phase}} \qquad (7)$$
GA values equal to unity indicate that the ML model is perfect. If the GA is less
than unity, the models are under-trained, while if it is greater than unity, the models are
over-trained.
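As a minimal sketch of Equation (7) and its interpretation, the ratio can be computed as follows. The function names, the RMSE values, and the tolerance used to decide when GA is "close to unity" are assumptions made for illustration; the text does not specify a tolerance.

```python
def generalization_ability(rmse_validation, rmse_training):
    """GA = RMSE in the validation phase / RMSE in the training phase."""
    if rmse_training <= 0:
        raise ValueError("training RMSE must be positive")
    return rmse_validation / rmse_training


def interpret_ga(ga, tolerance=0.05):
    """Label a GA value; the tolerance is a hypothetical choice, not from the paper."""
    if abs(ga - 1.0) <= tolerance:
        return "close to unity"
    return "over-trained" if ga > 1.0 else "under-trained"


# Hypothetical RMSE values, for illustration only.
ga = generalization_ability(rmse_validation=6.0, rmse_training=5.0)
label = interpret_ga(ga)
print(ga, label)
```

Here the validation error is larger than the training error, so GA exceeds unity and the model is labelled over-trained.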
• Uncertainty and Sensitivity Analysis
In this study, uncertainties of the fitted models were assessed by comparing the observed and simulated values and calculating the standard deviation of the prediction errors (SD) and the confidence bound (CB), as given in Equations (8) and (9):
$$SD = \sqrt{\frac{\sum_{i=1}^{n} (e_i - \bar{e})^2}{n - 1}} \qquad (8)$$
$$CB = z \times \frac{SD}{\sqrt{n}} \qquad (9)$$
with $e_i = X_{0i} - X_{pi}$, where z is the z-score of the confidence level (for 95%, it is about 1.96) and $\bar{e}$ is the mean prediction error.
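Equations (8) and (9) can be illustrated with a short Python sketch. The function name and the sample values are hypothetical; the error is taken as observed minus predicted, as defined above, and z = 1.96 corresponds to the 95% level mentioned in the text.

```python
import math


def confidence_bound(observed, predicted, z=1.96):
    """Return (mean error, SD of errors, confidence bound) per Equations (8) and (9)."""
    errors = [o - p for o, p in zip(observed, predicted)]  # e_i = X0i - Xpi
    n = len(errors)
    mean_error = sum(errors) / n
    sd = math.sqrt(sum((e - mean_error) ** 2 for e in errors) / (n - 1))
    cb = z * sd / math.sqrt(n)
    return mean_error, sd, cb


# Hypothetical observed and predicted values, for illustration only.
observed = [10.0, 12.0, 14.0, 16.0]
predicted = [9.0, 13.0, 13.0, 17.0]

mean_error, sd, cb = confidence_bound(observed, predicted)
print(mean_error, round(sd, 4), round(cb, 4))
```

A narrower confidence bound indicates less spread in the prediction errors for a given sample size.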
Finally, the model sensitivity analysis [63,64] was performed to identify input parameters that considerably impact the model predictions of IWQ. This analysis was performed using the one-factor-at-a-time (OFAT) method based on the Monte Carlo approach, which is used to estimate the possible outcomes of an uncertain event [65,66]: one input variable was generated randomly while the other variables were kept constant. Then, the absolute value of the difference in RMSE (|∆RMSE|) was calculated to assess the impact of each input variable. Therefore, the more sensitive the model is to an input, the larger the absolute value of the difference in RMSE.
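One possible reading of this OFAT procedure is sketched below. It is not the authors' code: the toy model (which depends on EC only), the sample rows, the noise levels, and the number of Monte Carlo draws are all hypothetical. Each input is perturbed in turn with random Gaussian noise while the other inputs are left unchanged, and the mean |∆RMSE| over the draws is reported.

```python
import math
import random

INPUT_NAMES = ("EC", "pH", "T")


def rmse(observed, predicted):
    n = len(observed)
    return math.sqrt(sum((p - o) ** 2 for o, p in zip(observed, predicted)) / n)


def toy_model(row):
    """Hypothetical stand-in for a trained model: uses EC only."""
    ec, ph, temp = row
    return 0.65 * ec


def ofat_sensitivity(model, rows, observed, noise_sd, n_draws=200, seed=0):
    """Mean |delta RMSE| when one input at a time receives Gaussian noise."""
    rng = random.Random(seed)
    baseline = rmse(observed, [model(r) for r in rows])
    sensitivity = {}
    for j, name in enumerate(INPUT_NAMES):
        total = 0.0
        for _ in range(n_draws):
            perturbed = []
            for r in rows:
                values = list(r)
                values[j] += rng.gauss(0.0, noise_sd[name])  # perturb one factor
                perturbed.append(tuple(values))
            total += abs(rmse(observed, [model(r) for r in perturbed]) - baseline)
        sensitivity[name] = total / n_draws
    return sensitivity


# Hypothetical samples: (EC, pH, T) inputs and observed TDS values.
rows = [(1500.0, 7.2, 20.0), (2300.0, 7.5, 21.0), (3100.0, 7.1, 19.5), (4200.0, 7.8, 22.0)]
observed = [1000.0, 1480.0, 2050.0, 2700.0]
noise_sd = {"EC": 50.0, "pH": 0.1, "T": 0.5}

result = ofat_sensitivity(toy_model, rows, observed, noise_sd)
print(result)
```

Because the toy model ignores pH and T, their |∆RMSE| is zero and only EC shows a non-zero sensitivity; with a real fitted model, all three inputs would generally contribute.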
3. Results
3.1. Statistical Analysis
For further exploration of the variables, a correlation matrix analysis and an assessment
of the importance of the input variables [66] were performed.
The correlation matrix was computed because it illustrates the importance of each parameter independently and its effect on the hydrochemistry [67,68]. Values of r equal to +1 or −1 in the Pearson's correlation matrix signify total correlation. Values close to zero mean that there is no significant interaction between two variables at the p < 0.05 level [19,55]. If r is greater than 0.7, the parameters are highly correlated, and if r is between 0.4 and 0.7, the parameters are moderately correlated. In this study, the correlation matrix is used to examine the correlation between chemical parameters and IWQ values. The results reported in Figure 7 show that electrical conductivity (EC) has a high correlation with TDS (r = 0.99), PS (r = 0.99), and SAR (r = 0.86), while it has a low correlation with the ESP (r = 0.30) and MAR (r = 0.05) indices. The pH has low correlations with all parameters, and the temperature has the lowest correlations with all parameters. These results show that EC is more strongly correlated with the predicted parameters than pH and temperature. Nevertheless, high correlations do not imply causality, since complex combinations of the features can influence the target variable. According to [15], the low correlations among T, pH, and EC indicate that these parameters are separable and non-redundant and are therefore useful for improving the predictive accuracy of machine learning.
Figure 7. Matrix correlation.
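The thresholds used above to read the correlation matrix can be expressed as a small helper. This is an illustrative sketch only: the function name is hypothetical, applying the thresholds to the absolute value of r is an assumption, and values below 0.4 are labelled "low" to match how the text describes the ESP and MAR correlations. The r values are those reported in the text for EC.

```python
def correlation_strength(r):
    """Classify a Pearson coefficient using the 0.7 and 0.4 thresholds from the text."""
    magnitude = abs(r)  # assumption: the thresholds apply to |r|
    if magnitude > 0.7:
        return "high"
    if magnitude >= 0.4:
        return "moderate"
    return "low"


# Correlations between EC and each IWQ parameter, as reported in the text.
ec_correlations = {"TDS": 0.99, "PS": 0.99, "SAR": 0.86, "ESP": 0.30, "MAR": 0.05}

labels = {name: correlation_strength(r) for name, r in ec_correlations.items()}
for name, label in labels.items():
    print(f"EC vs {name}: r = {ec_correlations[name]:.2f} ({label})")
```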
3.2. Implementation and Evaluation of Models
This study compared four different methods of predicting the irrigation water quality parameters (IWQ): artificial neural network (ANN), adaptive boosting (AdaBoost), support vector machine for regression (SVR), and random forest (RF). Three metric criteria were used to validate the above models: Pearson's correlation coefficient (r), RMSE, and RBIAS.
The results of the training and validation processes of the developed models are
illustrated in Figures 8 and 9, respectively.
The results of the training process reveal that the SVR model has noticeably higher values of RBIAS and RMSE than the other models for predicting the TDS parameter. The ANN, RF, and AdaBoost models revealed high accuracy in predicting the TDS parameter during the learning process, with r equal to 0.94, RMSE equal to 500.07 mg L−1, and RBIAS of 1% on average. The results also showed that all developed models performed very well, with average correlation coefficients of 0.90, RBIAS less than 3% in absolute value, and average RMSE around 5 meq L−1. Based on the training results (Figure 8), the four models perform satisfactorily for the prediction of the sodium adsorption ratio (SAR) and the exchangeable sodium percentage (ESP), with correlation coefficients of 0.61 and 0.62, respectively. Similarly, the RMSE and RBIAS values were acceptable for these two IWQs. As for the magnesium adsorption ratio (MAR), two of the statistical parameters (RBIAS and RMSE) showed that all models predicted it moderately well, and only AdaBoost has a good Pearson's coefficient (r). Hence, it was inferred that the AdaBoost model had a good performance in predicting all the IWQ parameters. However, the random forest and artificial neural network models were unable to predict the MAR parameter. Overall, there is no significant superiority among the ensemble models in the training process.
Figure 8. Results of training model performance.
Figure 9. Results of validation model performance.
Yet, the validation process and the evaluation of generalizability, sensitivity, and uncertainty are essential for evaluating the above models. Therefore, model validation was performed using the same algorithms with twenty percent of the data, which were simulated to assess the validation performance (Figure 9) and generalization ability. The Pearson's coefficient values range from 0.65 to 0.94 for the four parameters TDS, PS, SAR, and ESP for the ANN and SVR models. However, RMSE showed an unacceptable performance for all models for the simulation of the TDS and MAR parameters, and RBIAS showed the lowest performance for the SVR model for the simulation of the TDS and MAR parameters. When comparing the performance results, two of the simulated models (AdaBoost and RF) had lower performance in the training process, while the ANN and SVR models presented very close results during the two processes for the prediction of all IWQ parameters. All models, except ANN for the SAR parameter, have RBIAS values of less than 6% in absolute value, indicating that the fitted models are unbiased.
The scatter plot (Figure 10) shows the relationship between observed and simulated values of all IWQ parameters for all developed models. It shows a better distribution along the X = Y line for the random forest model. Moreover, it shows that the predicted values are very close to the observed values for the AdaBoost model, except for the MAR parameter. In fact, the accuracy of the models is satisfactory when the values are distributed on, or uniformly on both sides of, the X = Y line, showing that the errors follow a Gaussian distribution [15]. Even though the SVR and ANN models showed a satisfactory performance during the training phase, they failed to reproduce the ESP parameter because of a very high RMSE (greater than 10%).
Figure 10. Scatterplots of observed and simulated values for the prediction of IWQs parameters
during the validation process.
Therefore, it can be deduced that the SVR model has the weakest performance, predicting only the PS and SAR parameters well, whereas the AdaBoost model has the best performance in predicting all parameters. The ANN and RF models follow, predicting the TDS, PS, and SAR parameters and the TDS, PS, SAR, and ESP parameters, respectively. These results are in accordance with previous findings [15,69], in which the AdaBoost model was found to be superior to the support vector machine and artificial neural network models. To have useful models that predict new data sets while avoiding errors, it is necessary to test their generalization capability. This way, once a model is developed, the end-users can test it with any new dataset coming, for example, from real-time measurement sensors. Therefore, the stability of machine learning models in forecasting real-time water quality parameters is essential, especially when policy makers and researchers have strategies to develop this approach in irrigation water management [15]. In this study, the generalization ability to different input variables was evaluated. Figure 11 indicates that the ANN model for TDS is overfitted while all other models are underfitted. However, the generalization ability of the random forest and AdaBoost models is weaker than that of the ANN and SVR models.
Figure 11. Generalization ability (GA) indices of the models.
3.3. Uncertainty and Sensitivity Analysis
The issue of uncertainties in conceptual models in water quality modelling is inevitable
and has been discussed in many studies [42,45,70,71]. In this study, the uncertainty was
analysed and showed that the SVR model has the highest (95%) confidence bound values,
followed by the ANN, RF, and AdaBoost models (Table 6).
Table 6. Model uncertainty analysis.
Parameter | Error | ANN | SVR | RF | AdaBoost
TDS (mg L−1) | E | −27.01 | 412.48 | 4.79 | 11.57
TDS (mg L−1) | CB (95%) | 55.07 | 142.56 | 50.65 | 27.55
PS (meq L−1) | E | −0.27 | 0.45 | 0.21 | −0.09
PS (meq L−1) | CB (95%) | 1.00 | 1.96 | 0.97 | 0.91
SAR (meq0.5 L−0.5) | E | −0.36 | 0.04 | 0.01 | −0.02
SAR (meq0.5 L−0.5) | CB (95%) | 0.47 | 0.57 | 0.09 | 0.04
ESP (%) | E | −21.31 | −1.45 | 0.13 | 0.56
ESP (%) | CB (95%) | 1.69 | 1.89 | 1.14 | 0.74
MAR (%) | E | 0.05 | 0.27 | −0.02 | 0.19
MAR (%) | CB (95%) | 2.01 | 2.47 | 1.47 | 0.69
The sensitivity of the model provides an overview of the impact of input variables on the output. This analysis is necessary to assess how the model acts according to shifts in input values (data quality, noise tolerance, etc.). Therefore, in this study, sensitivity analysis of built models (Figure 12) was performed by simulating the models after adding a random Gaussian noise to the input variables (EC, pH and T).
Figure 12. Sensitivity analysis results.
Sensitivities of the models to the inputs differ based on the type of inputs, the IWQ parameters, and the models. In fact, the results of the sensitivity analysis show that the models are more sensitive to: (i) electrical conductivity, followed by temperature and pH, respectively, for predicting TDS and MAR; (ii) pH for predicting the ESP parameter; (iii) electrical conductivity, followed by pH and temperature, respectively, for predicting PS and SAR.
Moreover, AdaBoost was found to be the most sensitive model since it has the highest values of the absolute value of the difference in RMSE. However, the overall results of the sensitivity analysis show that the models are quite stable in predicting IWQ.
4. Discussion
In this research, four models, namely random forest (RF), support vector regression (SVR), artificial neural networks (ANN), and adaptive boosting (AdaBoost), were used to predict the irrigation water quality parameters (IWQ): total dissolved solids (TDS), potential salinity (PS), sodium adsorption ratio (SAR), exchangeable sodium percentage (ESP), and magnesium adsorption ratio (MAR) through low-cost in situ physicochemical parameters [72,73] (T, pH, EC) as input variables. The performance of the tested models was evaluated according to Pearson's correlation coefficient (r), the root mean square error (RMSE), and the relative bias (RBIAS). The model sensitivity was evaluated to identify input parameters that considerably impact the model prediction [74] using the one-factor-at-a-time (OFAT) method of the Monte Carlo (MC) approach. In accordance with the reviewed literature [30,69,75], the results show that the AdaBoost model is the most appropriate for predicting all parameters, with r ranging between 0.88 and 0.89, and that the random forest model is suitable for predicting only four parameters: TDS, PS, SAR, and ESP, with r ranging between 0.65 and 0.87. In addition, as found by [22,76], this study shows that the ANN and SVR models perform well in predicting three parameters (TDS, PS, SAR) and two parameters (PS, SAR), respectively, with the most optimal values of generalization ability (GA) close to unity.
Furthermore, MAR is the most poorly predicted parameter. This poor prediction accuracy is probably due to the weak relationship between MAR and the EC and pH used as input variables. Additionally, as explained by [7,9,22,27,29,61,74], the more significant the correlation between the input and output variables, the higher the performance of the models. Hence, accurate prediction highly depends on the number of input variables and their impact.
In general, the methodology of the proposed models for prediction of the irrigation
water quality parameters (IWQ) has proved its effectiveness. The effectiveness of ML
models does not only depend on the accuracy of the prediction but also on the nature and
number of predictors used. It is noteworthy that the use of physicochemical parameters
such as EC, pH, and T could significantly enhance the performance of machine learning
models [15,77]. Consequently, it is important to explore ML models for water quality index
prediction using only physicochemical parameters as input variables without decreasing
the efficiency of the models. Accordingly, this provides an incentive for decision makers to
apply artificial intelligence for water quality planning and management.
However, the stability of the ML models in the forecasting of the IWQ parameters in
real time is crucial, mainly when it is closely linked with the decision maker. Therefore,
while ML models are fairly stable in forecasting the IWQ parameters, it should be highlighted that the selection of the models must be based on deeper sensitivity analysis by
using smart technologies based on the Internet of Things (IoT) as a more secure and regular
data alternative as explained by [60]. Moreover, the generalization of these models must be
deeply studied because there are other variables that may interfere and influence water
quality.
5. Conclusions and Future Trends
The key goal of this research is to evaluate the ability of machine learning (ML) models
to predict the quality of groundwater for irrigation purposes in the downstream Medjerda
river basin (DMB), in Tunisia. Therefore, Adaboost, random forest, ANN, and SVR models
were developed and evaluated to predict TDS, PS, SAR, ESP, and MAR parameters using
physico-chemical parameters as input variables. This study confirmed that the AdaBoost
model is appropriate for predicting all parameters while the random forest model is suitable
for predicting only four parameters: TDS, PS, SAR, and ESP.
In addition, this study found that the ANN and SVR models perform well in predicting three (TDS, PS, SAR) and two (PS, SAR) of the five parameters, respectively. However, the SVR and ANN models showed better generalization ability than
the AdaBoost and random forest models. Then, the sensitivity analysis showed that the
developed models are less sensitive to the input variables used compared with the range
of each predicted parameter. The ML models characterized by physical parameters are
effective tools and should be recommended for predicting water quality parameters.
This research presents an effective use of machine learning models in forecasting the irrigation groundwater quality indices through low-cost data, and these models can be used as a decision support system (DSS) tool for sustainable water management in the DMB. In fact, the traditional simulation modelling approaches are dependent on datasets that involve a large amount of unknown or unspecified input data and generally consist of high-cost, time-consuming processes. Therefore, setting up a DSS based on machine learning models will boost the efficient use of water and rationalize its use by all water stakeholders at the watershed level.
Author Contributions: Conceptualization, F.T. and S.B.H.A.; methodology, F.T.; software, S.B.H.A.;
validation, F.T.; formal analysis, F.T.; investigation, S.B.H.A.; resources, F.T.; data curation, F.T.;
writing—original draft preparation, F.T. and S.B.H.A.; writing—review and editing, F.T.; visualization,
F.T. and S.B.H.A.; supervision, F.T.; project administration, F.T.; funding acquisition, F.T. All authors
have read and agreed to the published version of the manuscript.
Funding: This research was funded by the United States Agency for International Development
(USAID) through Partnerships for Enhanced Engagement in Research program of the National
Academies of Sciences, Engineering, and Medicine (grant number: PEER 7_ Tunisia project 7-289).
Institutional Review Board Statement: Not applicable.
Informed Consent Statement: Not applicable.
Data Availability Statement: The study did not report any data.
Acknowledgments: The authors are greatly thankful to the four Regional Commissariats for Agricultural Development (CRDA) of the Béjà, Mannouba, Ariana, and Bizerte regions for providing some
data and facilitating the groundwater sampling campaigns. We thank all reviewers and the editors
for their kind reviews and comments that improved the clarity of the final manuscript.
Conflicts of Interest: The authors declare no conflict of interest.
References
1. FAO. Water for Sustainable Food and Agriculture; Food and Agriculture Organization of the United Nations: Caracalla, Rome, 2017; ISBN 978-92-5-109977-3.
2. Knaepen, H. Climate Risks in Tunisia Challenges to Adaptation in the Agri-Food System; European Centre for Development Policy Management (ECDPM): Maastricht, The Netherlands, 2021.
3. Hssaisoune, M.; Bouchaou, L.; Sifeddine, A.; Bouimetarhan, I.; Chehbouni, A. Moroccan Groundwater Resources and Evolution with Global Climate Changes. Geosciences 2020, 10, 81. [CrossRef]
4. Aureli, A.; Ganoulis, J.; Margat, J. Groundwater Resources in the Mediterranean Region: Importance, Uses and Sharing. Water Mediterr. 2008, 96–105. Available online: https://www.iemed.org/publication/groundwater-resources-in-the-mediterraneanregion-importance-uses-and-sharing (accessed on 8 November 2021).
5. Berhail, S. The impact of climate change on groundwater resources in northwestern Algeria. Arab. J. Geosci. 2019, 12, 770. [CrossRef]
6. Rahmati, O.; Pourghasemi, H.R.; Melesse, A.M. Application of GIS-based data driven random forest and maximum entropy models for groundwater potential mapping: A case study at Mehran Region, Iran. CATENA 2016, 137, 360–372. [CrossRef]
7. Yang, L.; Hua, G.; Caoab, L.; Wanga, X.; Chen, M.-H. A comparison of Monte Carlo methods for computing marginal likelihoods of item response theory models. J. Korean Stat. Soc. 2019, 48, 503–512. [CrossRef]
8. Kopittke, P.M.; So, H.B.; Menzies, N.W. Effect of ionic strength and clay mineralogy on Na–Ca exchange and the SAR–ESP relationship. Eur. J. Soil Sci. 2006, 57, 626–633. [CrossRef]
9. Wang, L.; Long, F.; Liao, W.; Liu, H. Prediction of anaerobic digestion performance and identification of critical operational parameters using machine learning algorithms. Bioresour. Technol. 2020, 298, 122495. [CrossRef]
10. Paliwal, K.V. Irrigation with Saline Water; Water Technology Centre, Indian Agriculture Research Institute: New Delhi, India, 1972; p. 198.
11. Amiri, V.; Rezaei, M.; Sohrabi, N. Groundwater quality assessment using entropy weighted water quality index (EWQI) in Lenjanat, Iran. Environ. Earth Sci. 2014, 72, 3479–3490. [CrossRef]
12. Gorgij, A.D.; Kisi, O.; Moghaddam, A.A.; Taghipour, A. Groundwater quality ranking for drinking purposes, using the entropy method and the spatial autocorrelation index. Environ Earth Sci. 2017, 76, 269. [CrossRef]
13. Bhagat, S.K.; Tiyasha, T.; Tung, T.M.; Mostafa, R.R.; Yaseen, Z.M. Manganese (Mn) removal prediction using extreme gradient model. Ecotoxicol. Environ. Saf. 2020, 204, 111059. [CrossRef]
14. Leong, Y.C.; Hughes, B.L.; Wang, Y.; Zaki, J. Neurocomputational mechanisms underlying motivated seeing. Nat. Hum. Behav. 2019, 3, 1. [CrossRef]
15. El Bilali, A.; Taleb, A.; Brouziyne, Y. Groundwater quality forecasting using machine learning algorithms for irrigation purposes. Agric. Water Manag. 2021, 245, 106625. [CrossRef]
16. Evangelos, R. Machine learning, urban water resources management and operating policy. Resources 2019, 8, 173.
17. Kim, H.; Kim, S.; Hwang, J.Y.; Seo, C. Efficient Privacy-Preserving Machine Learning for Blockchain Network. IEEE Access 2019, 7, 136481–136495. [CrossRef]
18. Nolan, B.T.; Fienen, M.N.; Lorenz, D.L. A statistical learning framework for groundwater nitrate models of the Central Valley, California, USA. J. Hydrol. 2015, 531, 902–911. [CrossRef]
19. Ransom, K.M.; Nolan, B.T.; Traum, J.A.; Faunt, C.C.; Bell, A.M.; Gronberg, J.A.M.; Wheeler, D.C.; Rosecrans, C.Z.; Jurgens, B.; Schwarz, G.E.; et al. A hybrid machine learning model to predict and visualize nitrate concentration throughout the Central Valley aquifer, California, USA. Sci. Total Environ. 2017, 601–602, 1160–1172. [CrossRef]
20. Rodriguez-Galiano, J.A.V.F.; Luque-Espinar, M.; Chica-Olmo, M.P. Mendes, Feature selection approaches for predictive modelling of groundwater nitrate pollution: An evaluation of filters, embedded and wrapper methods. Sci. Total Environ. 2018, 624, 661–672. [CrossRef]
21. Ouedraogo, I.; Defourny, P.; Vanclooster, M. Application of random forest regression and comparison of its performance to multiple linear regression in modeling groundwater nitrate concentration at the African continent scale. Hydrogeol. J. 2019, 27, 1081–1098. [CrossRef]
22. Chen, H.K.; Chen, C.; Zhou, Y.; Huang, X.; Qi, R.; Shen, F.; Liu, M.; Zuo, X.; Zou, J.; Wang, Y.; et al. Ren Comparative analysis of surface water quality prediction performance and identification of key water parameters using different machine learning models based on big data. Water Res. 2020, 171, 115454. [CrossRef]
23. Grbčić, L.; Lučin, I.; Kranjčević, L.; Družeta, S. Water supply network pollution source identification by random forest algorithm. J. Hydroinform. 2020, 22, 1521–1535. [CrossRef]
24. Bhagat, S.K.; Tung, T.M.; Yaseen, Z.M. Development of artificial intelligence for modeling wastewater heavy metal removal: State of the art, application assessment and possible future research. J. Clean. Prod. 2020, 250, 119473. [CrossRef]
25. Lal, R.; Stewart, B.A. Soil Processes and Water Quality, 1st ed.; CRC Press: Boca Raton, FL, USA, 1994. [CrossRef]
26. Zhu, S.; Hrnjica, B.; Ptak, M.; Choiński, A.; Sivakumar, B. Forecasting of water level in multiple temperate lakes using machine learning models. J. Hydrol. 2020, 585, 124819. [CrossRef]
27. Ahmed, U.; Mumtaz, R.; Anwar, H.; Shah, A.A.; Irfan, R. Efficient water quality prediction using supervised machine learning. Water 2019, 11, 2210. [CrossRef]
28. Fijani, E.; Barzegar, R.; Deo, R.; Tziritis, E.; Skordas, K. Design and implementation of a hybrid model based on two-layer decomposition method coupled with extreme learning machines to support real-time environmental monitoring of water quality parameters. Sci. Total Environ. 2019, 648, 839–853. [CrossRef]
29. Lu, H.; Ma, X. Hybrid decision tree-based machine learning models for short-term water quality prediction. Chemosphere 2020, 249, 126169. [CrossRef]
30. Bel Hadj Ali, S.; Trabelsi, F. CAJG-2020-P527: Saltwater Intrusion Vulnerability Mapping Using Multi-Model Ensemble of Machine Learning Algorithms: A Case Study of the Aousja Ghar El Melh Coastal Aquifer, Northeast of Tunisia; Advances in Science, Technology & Innovation (ASTI); Springer: Berlin/Heidelberg, Germany, 2022.
31. Bel Hadj Ali, S.; Trabelsi, F. Impact of Anthropogenic Activities on the Groundwater Quality Using Machine Learning Algorithms: A Case Study of the Aousja Ghar El Melh Coastal Aquifer, Northeast of Tunisia. In Proceedings of the Mediterranean Geosciences Union Annual Meeting (MedGU-21), Istanbul, Turkey, 25–28 November 2021.
32. Singh, R.; Kumar, S.; Nangare, D.D.; Meena, M.S. Drip irrigation and black polyethylene mulch influence on growth. Yield-WaterUse Effic. Tomato 2009, 4, 1427–1430. [CrossRef]
33. Wagh, V.M.; Panaskar, D.B.; Muley, A.A.; Mukate, S.V.; Lolage, Y.P.; Aamalawar, M.L. Prediction of groundwater suitability for irrigation using artificial neural network model: A case study of Nanded tehsil, Maharashtra, India. Model. Earth Syst. Environ. 2016, 2, 1–10. [CrossRef]
34. Trabelsi, F.; LEE, S. GIS-based groundwater potential mapping using Machine learning models: Case of Medjerda aquifer, North of Tunisia. In Proceedings of the IAH2019, the 46th Annual Congress of the International Association of Hydrogeologists, Málaga, Spain, 22–27 September 2019.
35. Trabelsi, F.; Ali, S.B.; Mukherjee, S.; Sipolya, R. Integrated Use of Satellite Remote Sensing and Hydraulic Modeling for the flood Risk Assessment at the middle valley of Medjerda. In Proceedings of the International Conference & Exhibition. Advanced Geospatial Science & Technology (TeanGeo 2016), Tunis, Tunisia, 26–28 September 2016.
36. Ayed, B.N. Evolution Tectonique de l’Avant-Pays de la Chaîne Alpine de Tunisie du Début du Mésozoïque à l’Actuel Thèse d’Etat; Université de Paris Sud - Centre d’Orsay: Gif-sur-Yvette, France, 1986.
37. Rouvier, H. Géologie de l’Extrême Nord-Tunisien: Tectonique et Paléogéographie Superposées à l’Extrémité Orientale de la Chaine Nord-Maghrébine. Thèse d’Etat, Paris, France, 1977; p. 307.
38. Perthuisot, V. Dynamique et Pétrogenèse des Extrusions Triasiques en Tunisie Septentrionale. Thèse Doct, ès Science, Travelling Laboratory Geology Ecole North Superior, Paris, France, 1978; p. 312.
39. Ghanmi, M. Etude géologique du J. Kebbouch (Tunisie septentrionale). Ph.D. Thesis, Thèse 3 ème Cycle, Toulouse, France, 1980; p. 141.
40. Melki, F.; Zouaghi, T.; Chelbi, M.B.; Bédir, M.; Zargouni, F. Role of the NE-SW Hercynian Master Fault Systems and Associated Lineaments on the Structuring and Evolution of the Mesozoic and Cenozoic Basins of the Alpine Margin, Northern Tunisia. In Tectonics - Recent Advances; IntechOpen: London, UK, 2012. Available online: https://www.intechopen.com/chapters/37864 (accessed on 8 November 2021).
41. Trabelsi, F.; Mukherjee, S. Remote Sensing and GIS Techniques for Evaluation of Groundwater Quality in middle valley of Medjerda, Tunisia. In Proceedings of the 1st Euro-Mediterranean Conference for Environmental Integration (EMCEI), Sousse, Tunisia, 22–25 November 2017; p. 526.
42. Trabelsi, F.; Mammou, A.B.; Tarhouni, J.; Piga, C.; Ranieri, G. Delineation of saltwater intrusion zones using the time domain electromagnetic method: The Nabeul–Hammamet coastal aquifer case study (NE Tunisia). Hydrol. Process. 2013, 27, 2004–2020. [CrossRef]
43. Hachicha, M.; Cheverry, C.; Mhiri, A. The impact of long-term irrigation on change of groundwater level and soil salinity in northern Tunisia. Arid. Soil Res. Rehabil. 2010, 14, 175–182. [CrossRef]
44. Chatti, A.; Trabelsi, F.; Arfaoui, A. Qualité et Vulnérabilité des Ressources en eau Souterraine de la Basse Vallée de la Medjerda; University of Jendouba: Jendouba, Tunisia, 2018.
45. Breiman, L. Random Forests. Mach. Learn. USA 2001, 45, 5–32. [CrossRef]
46. APHA. Standard Methods for the Examination of Water and Wastewater, 21st ed.; American Public Health Association/American Water Works Association/Water Environment Federation: Washington, DC, USA, 2005.
47. Freund, Y.; Schapire, R.E. A Decision-Theoretic Generalization of On-Line Learning and an Application to Boosting. J. Comput. Syst. Sci. 1997, 55, 119–139. [CrossRef]
48. Sorensen, D.L. Suspended and Dissolved Solids Effects on Freshwater Biota: A Review; US Environmental Protection Agency, Office of Research and Development: Washington, DC, USA, 1977.
49. Richards, L.A. Diagnosis and Improvement of Saline Alkali Soils, Agriculture, 160, Handbook 60; US Department of Agriculture: Washington, DC, USA, 1954.
50. Freeze, R.A.; Cherry, J.A. Groundwater; Prentice-Hall: Hoboken, NJ, USA, 1979.
51. Raghunath, H.M. Groundwater; Wiley Eastern Ltd.: Delhi, India, 1987; p. 563.
52. Barzegar, R.; Moghaddam, A.A.; Baghban, H. A supervised committee machine artificial intelligent for improving DRASTIC method to assess groundwater contamination risk: A case study from Tabriz plain aquifer, Iran. Stoch. Env. Res. Risk A. 2016, 30, 883–899. [CrossRef]
53. Barzegar, R.; Adamowski, J.; Moghaddam, A.A. Application of wavelet-artificial intelligence hybrid models for water quality prediction: A case study in Aji-Chay River, Iran. Stoch. Env. Res. Risk Assess. 2016, 30, 1797–1819. [CrossRef]
54. Barzegar, R.; Moghaddam, A.A. Combining the advantages of neural networks using the concept of committee machine in the groundwater salinity prediction. Model. Earth Syst. Environ. 2016, 2, 26. [CrossRef]
55. Belayneh, A.; Adamowski, J.; Khalil, B.; Quilty, J. Coupling machine learning methods with wavelet transforms and the bootstrap and boosting ensemble approaches for drought prediction. Atmos. Res. 2016, 172, 37–47. [CrossRef]
56. Dawson, C.W.; Wilby, R. An Artificial Neural Network Approach to Rainfall-Runoff Modelling. Hydrol. Sci. J. 1998, 43, 47–66. [CrossRef]
57. Robert, J.S. Artificial Neural Networks by (1997-06-01) Hardcover–January 1; Mcgraw-hill Companies: New York, NY, USA, 1997.
58. Castrillo, M.; García, A.L. Estimation of high frequency nutrient concentrations from water quality surrogates using machine learning methods. Water Res. 2020, 172, 115490. [CrossRef]
59. Chen, K.; Chen, H.; Zhou, C.; Huang, Y.; Qi, X.; Shen, R.; Liu, F.; Cortes, C.; Vapnik, V. Support-Vector Networks. Mach. Learn. 1995, 20, 273–297. [CrossRef]
60. Gayen, A.; Pourghasemi, H.R.; Saha, S.; Keesstra, S.; Bai, S. Gully erosion susceptibility assessment and management of hazard-prone areas in India using different machine learning algorithms. Sci. Total Environ. 2019, 668, 124–138. [CrossRef]
61. Arlot, S.; Celisse, A. A survey of cross-validation procedures for model selection. Stat. Surv. 2010, 4, 40–79. [CrossRef]
62. Rajaee, T.; Ebrahimi, H.; Nourani, V. A review of the artificial intelligence methods in groundwater level modeling. J. Hydrol. 2019, 572, 336–351. [CrossRef]
63. Khalil, A.; Almasri, M.N.; McKee, M.; Kaluarachchi, J.J. Applicability of statistical learning algorithms in groundwater quality modelling. Water Resour. Res. 2005, 41, W05010. [CrossRef]
64. Yoon, H.; Jun, S.C.; Hyun, Y.; Bae, G.O.; Lee, K.K. A comparative study of artificial neural networks and support vector machines for predicting groundwater levels in a coastal aquifer. J. Hydrol. 2011, 396, 128–138. [CrossRef]
65. Qiu, Y.; Aufiero, M.; Wang, K.; Fratoni, M. Development of sensitivity analysis capabilities of generalized responses to nuclear data in Monte Carlo code RMC. Ann. Nucl. Energy 2016, 97, 142–152. [CrossRef]
66. Patil, R.; Bellary, S. Machine learning approach in melanoma cancer stage detection. J. King Saud Univ.-Comput. Inf. Sci. 2020. [CrossRef]
67. Islam, M.M.S.; Ferdous, Z.; Potenza, M.N. Panic and generalized anxiety during the COVID-19 pandemic among Bangladeshi people: An online pilot survey early in the outbreak. J. Affect. Disord. 2020, 276, 30–37. [CrossRef] [PubMed]
68. Zhao, X.; Ning, B.; Liu, L.; Song, G. A prediction model of short-term ionospheric foF2 based on AdaBoost. Adv. Space Res. 2014, 53, 387–394. [CrossRef]
69. Kardos, J.S.; Obropta, C.C. Water quality model uncertainty analysis of a pointpoint source phosphorus trading program. J. Am. Water Resour. Assoc. 2011, 47, 1317–1337. [CrossRef]
70. Moreno-Rodenas, A.M.; Tscheikner-Gratl, F.; Langeveld, J.G.; Clemens, F.H.L.R. Uncertainty analysis in a large-scale water quality integrated catchment modelling study. Water Res. 2019, 158, 46–60. [CrossRef]
71. Radwan, M.; Willems, P.; Berlamont, J. Sensitivity and uncertainty analysis for river quality modelling. J. Hydroinform. 2004, 6, 83–99. [CrossRef]
72. Saghafi, H.; Arabloo, M. Modeling of CO2 solubility in MEA, DEA, TEA, and MDEA aqueous solutions using adaboost-decision tree and artificial neural network. Int. J. Greenh. Gas Control 2017, 58, 256–265. [CrossRef]
73. Zhou, Z.; Feng, J. Deep Forest. Natl. Sci. Rev. 2019, 6, 74–86. [CrossRef] [PubMed]
74. Di, M.Z.; Chang, P. Guo Water quality evaluation of the Yangtze River in China using machine learning techniques and data monitoring on different time scales. Water 2019, 11, 339. [CrossRef]
75. Shojaei, M.; Nazif, S.; Kerachian, R. Joint uncertainty analysis in river water quality simulation: A case study of the Karoon River in Iran. Environ. Earth Sci. 2015, 73, 3819–3831. [CrossRef]
76. Ayadi, A.; Ghorbel, O.; BenSalah, M.S.; Abid, M. A framework of monitoring water pipeline techniques based on sensors technologies. J. King Saud Univ.-Comput. Inf. Sci. 2022. [CrossRef]
77. Chowdury, M.S.U.; Emran, T.; Ghosh, S.B.; Pathak, A.; Alam, M.M.; Absar, N.; Andersson, K.; Hossain, M.S. IoT based real-time river water quality monitoring system. Procedia Comput. Sci. 2019, 155, 161–168. [CrossRef]