10 Machine Learning Applications in Hydrology

H. Lange (*)
Norwegian Institute of Bioeconomy Research, Ås, Norway
e-mail: Holger.Lange@nibio.no

S. Sippel
Norwegian Institute of Bioeconomy Research, Ås, Norway
Department of Environmental System Science, ETH Zürich, Zürich, Switzerland

10.1 Introduction
Table 10.1 Examples for methods used in data analysis in four different categories

                          Unsupervised                               Supervised
  Manual/non-iterative    Time series analysis, frequency            ESLLE^a; GLMs, GAMs, GAMMs^b
                          decomposition, principal component
                          analysis
  Automatic/iterative     Clustering, autoencoders, data mining,     Neural networks, decision trees, random
                          dimensionality reduction                   forests, support vector machines, and many
                                                                     more; proper machine learning

^a Enhanced Supervised Locally Linear Embedding (Zhang 2009)
^b Generalized Linear Models, Generalized Additive Models, Generalized Additive Mixed Models
The term “Machine Learning” was coined as early as 60 years ago (Samuel 1959) in the
context of game-learning computers (in that case, checkers). The fundamental and
then-new insight was that software is able to learn how to play strategic games, with
a performance superior to that of the human programmer.
   From the start, machine learning was closely connected to Artificial Intelligence
(AI). However, since probabilistic and iterative methods are notoriously data-thirsty
and data availability was always an issue prior to the ability to collect, store and
distribute large datasets automatically, machine learning faded away in favor of
expert systems and knowledge-based AI software.
   However, with the advent of the Internet, the corresponding availability of
digitized information and with increasing computing power, machine learning
experienced a revival, now as its own field separated from AI. It has strong roots
in statistics and probability theory, and many applications of machine learning,
including those fashionable in hydrology, are indeed tools based on and designed
for regression and/or classification.
   A closely related concept is that of data mining. In this review, we consider data
mining and machine learning as separate approaches under the umbrella of data-
oriented modeling: whereas data mining explores unknown datasets for knowledge
discovery, the typical machine learning algorithm has prediction as its target; to
achieve it, it is first fed with training data to learn patterns and dynamics and then
validated on hitherto unseen datasets. Still, the two concepts are interwoven: some
unsupervised learning methods are taken from data mining as a preprocessing step of
machine learning; some data mining approaches utilize machine learning methods
but with a different goal. Data mining is often concerned with dimension reduction,
providing an efficient (low-dimensional) representation of the main patterns present
in high-dimensional data, unraveling redundancies, and so on. Whereas some
machine learning methods perform dimension reduction as a preparatory step, it is
not part of machine learning per se.
   It is surely no overstatement that, for a few years now, we have been living in the era of
machine learning. Its applications span a wide range, including not only scientific
domains but also industry and commercial businesses. Examples are automatic text
and speech recognition, translation from one natural language to another or from
specialist to laymen language, health care and medical data analysis, stock market
predictions, spotting fraud and plagiarism in science and elsewhere, optimizing
search engines like Google, autonomous driving vehicles, or autonomous self-
exploration and social interactions of robots. This list is far from being exhaustive,
and there are many job announcements and project applications mentioning machine
learning in their title. Overall, the scientific and commercial potential is considered
to be huge.
    A word of caution might be necessary at this stage. The rise of machine learning
is deeply connected to data availability, which includes the ability to automatically
collect, store, and distribute data—we also live in the era of “big data.” The titanic
flow of data presents a challenge in its own right, not only to computer storage and
processing speed but also to human comprehension. However, large classes of
methods used in machine learning are not genuinely new; others are mere refine-
ments of existing approaches. It is possible that there is a mismatch between data
volume and the sophistication of our tools; the latter is always in danger of lagging behind.1
As of today, skeptics could consider machine learning as just nonlinear fitting of
big datasets, vulnerable to wild errors when extrapolating beyond the domain of
training sets provided to the “machine” and (as a consequence) difficult to apply in
areas where huge amounts of reliable data are difficult or impossible to get or where
the test datasets differ in some underlying aspects from the training datasets.
    A recent poll among chemists2 indicated that 45% of the participants agreed or
even strongly agreed to the statement that machine learning is overhyped. An
extreme example of hyping is that of the director of Artificial Intelligence at Tesla
calling machine learning (and neural nets in particular) “Software 2.0.”3 While
overrating or overselling are common side effects of new technologies and emerging
markets, one could as well relax and just add machine learning approaches to the
toolbox of the modeler, recognizing that machine learning is good at things where
machines are good at learning and where the appropriate training data are available,
no more, no less.
    The increasing popularity of machine learning in hydrology originates also in the
availability of data sources other than traditional ones like precipitation, runoff,
groundwater height, and so on. Remote sensing from satellites or airplanes, embed-
ded sensor networks, drones and even internet-based social networks (sensu citizen
science) all contribute to ever-increasing data streams, creating a rich playground for
machine learning approaches.
1 https://insidebigdata.com/2018/10/19/report-depth-look-big-data-trends-challenges/ (accessed January 30th, 2019).
2 Found on https://cen.acs.org/physical-chemistry/computational-chemistry/machine-learning-overhyped/96/i34 (accessed January 30th, 2019).
3 https://medium.com/@karpathy/software-2-0-a64152b37c35 (accessed January 30th, 2019).
$$\hat{Y}(x) = \frac{1}{k} \sum_{x_i \in N_k(x)} y_i$$
where Nk(x) is the set of the k points xi in the training data which are closest to the
point x. Mean squared error (MSE) curves for approximating f are shown as a
function of k in Fig. 10.1a for the training and test datasets. For large k, kNN
regression “predictions” are close to the training sample mean; hence, a bias arises
toward the extreme cases of X resulting in underfitting (e.g., k = 50 in Fig. 10.1b
where the prediction tends toward a constant at both edges). In contrast, for small k,
kNN “predictions” follow (arbitrarily) closely the training dataset (Fig. 10.1b).
Hence, by successively decreasing k, the training set MSE reduces monotonically
as flexibility is added to the model. This continuously reduces bias and leads
eventually to a perfect fit of the training dataset (Fig. 10.1b). However, a “perfect”
fit on the training dataset (such as shown for k = 1 in Fig. 10.1b) is clearly
undesirable, as the model will generalize poorly to a different (“test set”) realization
of f (Fig. 10.1c), i.e., leading to high variance of the fit. Hence, generalization ability
to unseen data is a crucial property of any machine learning model, and consequently
training set MSE is not a reliable measure for model performance. MSE on an unseen
test (or validation) dataset is a sum of bias (resulting from too little model flexibility,
i.e., underfitting, such as for k = 50 in Fig. 10.1b/c), variance (resulting from too
much model flexibility, i.e., overfitting, e.g., for k = 1 in Fig. 10.1b/c), and
irreducible error (see for more details, e.g., Hastie, Tibshirani, and Friedman
(2008)); and hence from Fig. 10.1a, the modeler would choose a k-value that
minimizes test MSE.
    In most practical cases, hyperparameters are tuned through cross-validation, i.e.,
through a systematic partitioning of the training dataset into internal training/
validation sets.
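As a minimal sketch of this idea, and assuming synthetic data and the "kknn" R package already used for Fig. 10.1 (all variable names and parameter values here are ours, chosen for illustration), the tuning parameter k could be selected by internal cross-validation as follows:

```r
library(kknn)

set.seed(42)
n       <- 200
train   <- data.frame(x = runif(n, -2, 2))
train$y <- train$x^3 - 2 * train$x + rnorm(n, sd = 0.5)   # noisy third-degree polynomial

k_grid <- c(1, 2, 5, 10, 20, 50)
folds  <- sample(rep(1:5, length.out = n))   # random folds; fine for i.i.d. synthetic data,
                                             # but see the caveats below for correlated hydrological data

cv_mse <- sapply(k_grid, function(k) {
  mean(sapply(1:5, function(f) {
    fit <- kknn(y ~ x,
                train  = train[folds != f, ],
                test   = train[folds == f, ],
                k      = k,
                kernel = "rectangular")               # plain (unweighted) kNN average
    mean((train$y[folds == f] - fit$fitted.values)^2) # MSE on the held-out fold
  }))
})

best_k <- k_grid[which.min(cv_mse)]   # hyperparameter value minimizing cross-validated MSE
```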
    An important property of hydrological datasets is that they are highly structured
and typically involve substantial spatial and temporal correlations and
nonstationarities (Roberts et al. 2017) that might be induced either through missing
predictors or through underlying correlated noise. This property implies that a
random partitioning of data into folds for cross-validation, which is the default in
many standard software cross-validation routines, is prone to fail: The reason is of
course that data used internally for training and validation in cross-validation are
correlated rather than independent, and hence the resulting MSE curve would resemble the
“training set MSE” in Fig. 10.1a; resulting in an underestimation of generalization
error of the model and leading to a model fit that is tuned too closely to the
available data.

Fig. 10.1 Illustration of the bias-variance trade-off as a key concept in machine learning that is
crucially relevant for hydrological applications. (a) Training and test set mean squared error of
k-nearest neighbor (kNN) regression with various tuning parameter values k used to approximate an
unknown function (third-degree polynomial). (b, c) kNN regression fits for the (b) training and (c)
test datasets. k-nearest neighbor regression fits are based on the "kknn" R-package (Schliep and
Hechenbichler 2016)

Hence, a smart partitioning of the data into training and test datasets
is crucial particularly in real-world hydrological applications (Roberts et al. 2017).
For practical applications, the issue of correlations might be alleviated by choosing
block-wise folds (in space and/or time, i.e., eliminating information leakage
between folds) for cross-validation (Roberts et al. 2017). Although the problem
of correlated errors in prediction problems has been known for quite some time (see
Schoups and Vrugt (2010) for hydrology, and Bergmeir et al. (2018) for an
overview), it has been argued that some spectacular prediction failures, such as
those concerning the outcome of the 2016 US presidential election or the onset of the
2008 financial crisis, were at least partly caused by ignoring correlated errors among the
instances used for prediction (Silver 2012).
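As an illustrative sketch (assuming a response series y and a predictor matrix X that are ordered in time; the object names are ours), block-wise folds can be built simply by keeping contiguous chunks of the series together instead of drawing folds at random:

```r
# y: response series ordered in time, X: matrix of predictors (assumed to exist)
n       <- length(y)
n_folds <- 5

random_folds <- sample(rep(1:n_folds, length.out = n))            # default random partition (leaks information)
block_folds  <- rep(1:n_folds, each = ceiling(n / n_folds))[1:n]  # contiguous blocks in time

for (f in 1:n_folds) {
  train_idx <- which(block_folds != f)   # fit the model of choice on these rows of X and y
  valid_idx <- which(block_folds == f)   # evaluate it on this held-out block
}
```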
The k-nearest neighbor (kNN) algorithm is a classic method for both classification and
regression (Altman 1992) and has been used for illustration in Sect. 10.2. In feature space,
i.e., the space spanned by pairs of input and output vectors, one must find the k nearest
neighbors of any sample point, which obviously requires a distance measure. Among the
common choices, the Minkowski distance (with distance parameter q), as a generalization
of the Euclidean distance, is the most flexible one. In kNN regression, the output
(prediction) for the sample point is the weighted average of the values for the
k-nearest neighbors, where the weights are inversely proportional to the distances.
It might be necessary to reduce the dimensions of the problem first.
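As a minimal sketch, distance-weighted kNN regression with a Minkowski distance can be written with the "kknn" package already referenced for Fig. 10.1; the data frames, the target Q, and the parameter values below are hypothetical:

```r
library(kknn)

fit <- kknn(Q ~ .,                 # target Q, all other columns used as features
            train    = train_df,   # hypothetical training data frame
            test     = test_df,    # hypothetical test data frame
            k        = 10,         # number of nearest neighbors
            distance = 2,          # Minkowski distance parameter q (2 = Euclidean)
            kernel   = "inv")      # weights inversely proportional to the distances

pred <- fit$fitted.values          # kNN predictions for the test points
```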
4 https://mloss.org/software/
Regularized regression methods have been developed to account for cases in which
the number of predictors is relatively large compared to the number of samples, i.e.,
where multiple linear regression might overfit (Hastie et al. 2008).5 The idea is to
reduce the regression model’s flexibility by shrinking the regression coefficients,
so as to avoid overfitting and to keep the model interpretable. Shrinkage is
achieved by adding a penalty on the size of the regression coefficients to
the least squares objective function. Lasso and ridge regression are two related
regression methods that differ in the nature of shrinkage (Hastie et al. 2008), where
ridge regression shrinks coefficients toward zero based on the L2-norm, whereas
Lasso regression performs some kind of subset selection by preferably shrinking
coefficients exactly to zero via the L1-norm penalty. The degree of shrinkage is
regulated by a hyperparameter (λ in the equation below) that is typically determined
by cross-validation.
   Elastic net models (Zou and Hastie 2005) represent a blending of Lasso and ridge
regression and use an additional hyperparameter that allows switching continuously
between the two regression methods. The vector of regression coefficients β is
obtained in elastic nets as
$$\hat{\beta} = \underset{\beta}{\operatorname{argmin}} \left[ \lVert y - X\beta \rVert^2 + \lambda \left( \alpha \lVert \beta \rVert^2 + (1 - \alpha) \lVert \beta \rVert_1 \right) \right]$$

where X is the matrix of predictors, λ is the penalty strength, and α is the new
hyperparameter (α = 0 is Lasso, α = 1 is ridge regression).
  Elastic nets are implemented in the glmnet package in R (Friedman et al. 2010).
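A minimal usage sketch follows, assuming a predictor matrix X and a response vector y exist (X_new is a hypothetical matrix of new predictor values). Note that glmnet's alpha convention is the reverse of the equation above: in glmnet, alpha = 1 corresponds to the Lasso and alpha = 0 to ridge regression.

```r
library(glmnet)

cv_fit <- cv.glmnet(x = X, y = y, alpha = 0.5, nfolds = 10)  # alpha = 0.5 blends the L1 and L2 penalties
coef(cv_fit, s = "lambda.min")                               # coefficients at the CV-optimal penalty strength
pred <- predict(cv_fit, newx = X_new, s = "lambda.min")      # predictions for new data
```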
Artificial neural networks (ANNs) are probably the most well-known learning algorithms,
also among the oldest, with applications in hydrology going back several decades (Maier
and Dandy 1995; Kuligowski and Barros 1998; Zealand et al. 1999; Lischeid 2001;
Parasuraman et al. 2006; Wang et al. 2006; Daliakopoulos and Tsanis 2016). There are
also many excellent textbooks on the subject, so we provide only some key elements here.
The interested reader might consult Chapter 11 of Hastie et al. (2008) or Chapter 5 in
Bishop (2006) for a thorough overview of ANNs.
5 This book is freely available as a PDF document from https://web.stanford.edu/~hastie/ElemStatLearn/
   ANNs are motivated by an analogy to the physiology of the human brain, where
neurons are represented as nodes in a network, whereas the connections between
neurons (synapses) are links between them. Each neuron is equipped with an
activation function, determining according to its input value whether it “fires” (pro-
duces output) or not (or rather, the “firing” is a continuous process between none and
maximal firing). The prototypical architecture of an ANN is shown in Fig. 10.2. It
should be obvious from the figure that the analogy with the human brain should not
be taken too far. In the latter, there are no immediately identifiable “input” and
“output” neurons, let alone hidden layers between them.
   In many applications, there will be only one output node Y. It is also important to
note that using more than one hidden layer does not necessarily improve model
performance, consistent with the universal approximation theorem (Cybenko 1989),
which states that a feed-forward network with a single hidden layer containing a
finite number of neurons can approximate continuous functions under mild assump-
tions on the activation function; therefore, one hidden layer is the most common
choice.
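As a minimal sketch of such a single-hidden-layer feed-forward network for a regression task (using the "nnet" R package; the data objects X_train, y_train, X_test and the parameter values are hypothetical, not taken from the chapter):

```r
library(nnet)

set.seed(1)
ann <- nnet(x = X_train, y = y_train,   # hypothetical predictor matrix and target vector
            size   = 8,                 # number of neurons in the single hidden layer
            linout = TRUE,              # linear output unit, i.e., regression rather than classification
            decay  = 1e-3,              # weight decay (regularization of the link weights)
            maxit  = 500)               # maximum number of training iterations

y_hat <- predict(ann, X_test)           # predictions for unseen inputs
```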
   ANNs have a long history, in which the invention of the perceptron (Rosenblatt
1958) was a particularly important milestone. Plagued by the bottleneck of insufficient
computing power and some theoretical problems, the proper breakthrough came
with the invention of the backpropagation algorithm (Werbos 1975), later made
popular in particular through the seminal work of Rumelhart et al. (1986), which is
to this day by far the most common method to determine the weights on the links
through training.
   There are some serious issues with ANNs in practical circumstances. The learn-
ing rate can be quite slow, preventing efficient online updating, where the
network learns continuously while new data are streaming in, rather than working
6 See, e.g., https://en.wikipedia.org/wiki/Comparison_of_deep_learning_software
Fig. 10.3 The typical architecture of a convolutional neural network. The convolutional layer to
the right is a three-dimensional object which is connected to a small region of the layer before it
(box in the middle), the receptive field, which, in turn, has few connections with the layer before it
(left). (Source: Wikipedia (https://en.wikipedia.org/wiki/Convolutional_neural_network);
Reproduced under Creative Commons license type Attribution-Share Alike 4.0 International)

The starting point for support vector machines (SVMs) (Boser et al. 1992) is the
observation that there is always noise in any set of observations and that a
(generalized) regression should not run through data points which differ less than an
assumed threshold ε; SVM regression is sometimes also called ε-insensitive regres-
sion. Only the data points outside of the threshold are used for the regression; they
constitute the support vectors (Fig. 10.4). Therefore, SVMs are especially suitable
when analyzing noisy data. Arbitrary nonlinear regression is possible when using
kernel functions (the “kernel trick” (Hofmann et al. 2008)). Typical kernels are
Gaussian (in this context often called “Radial Basis Functions” (RBF)) or polyno-
mial ones, but also sigmoidal kernels have been tried. This flexibility also comes
at a price, however: the selection of the kernel function is largely heuristic and
based on trial and error.
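A minimal sketch of ε-insensitive support vector regression with a Gaussian (RBF) kernel, using the "e1071" R package (the data objects and parameter values are hypothetical):

```r
library(e1071)

svr <- svm(x = X_train, y = y_train,    # hypothetical training data
           type    = "eps-regression",
           kernel  = "radial",          # Gaussian / radial basis function kernel
           epsilon = 0.1,               # half-width of the insensitive tube around the regression
           cost    = 10)                # penalty on points lying on or outside the tube

y_hat <- predict(svr, X_test)           # predictions for hypothetical test inputs
```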
   The best-known application of SVMs is the automatic identification of handwrit-
ten digits, but there are also other important areas like bioinformatics, biochemistry,
and not least the environmental sciences, including hydrology.
Decision trees are one of the key ingredients of many machine learning algorithms.
These are (graphical) representations of conditional statements about the outcome of
decisions; depending on weights (probabilities) of the input features, one branch or
the other in a decision tree is taken. Observations are contained in the branches,
whereas target values (predictions) are contained in the leaves. Thus, decision trees
are examples of prediction models. If the target values are continuous, decision trees
are often called regression trees. The prototype and arguably the oldest regression
tree procedure is the Classification and Regression Tree (CART) introduced by
7 https://cran.r-project.org/web/packages/rpart/index.html
8 https://github.com/gbm-developers/gbm3
(threshold) values. The M5 model tree treats the standard deviation of the class
values that reach a node as the node error and calculates the expected reduction in
this error by splitting the node and testing each attribute at that node. Among all
splits, M5 chooses the one which maximizes the expected error reduction. In most
cases, a subsequent pruning step is required. M5 was introduced by Quinlan (1993).
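In formula form (a standard way to state this criterion, not a quotation from the chapter; T denotes the set of cases reaching a node and T_1, ..., T_m the subsets produced by a candidate split), the expected error reduction is

$$\mathrm{SDR} = \mathrm{sd}(T) - \sum_{i} \frac{|T_i|}{|T|}\,\mathrm{sd}(T_i),$$

and M5 selects the split with the largest SDR.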
    Cubist or M5-cubist (Loh 2011) is a further extension of M5 avoiding sudden
“jumps” or discontinuities potentially arising in M5 when changing the decision
node, if the corresponding linear models have very different coefficients. M5-cubist
is available through the Cubist R package.9
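A minimal usage sketch with that package (the training objects and parameter values are hypothetical):

```r
library(Cubist)

cub   <- cubist(x = X_train, y = y_train, committees = 10)   # committee of rule-based model trees
y_hat <- predict(cub, X_test, neighbors = 5)                 # optional instance-based correction of the rule predictions
```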
As the diversity of methods discussed in this chapter indicates, it is not always easy to
decide on the “optimal” approach when confronted with a specific modeling prob-
lem or dataset. Meta approaches (or ensemble models) combine the predictions of a
suite of (machine learning) algorithms (level 0 methods). These predictions are input
(predictor variables) to another layer of machine learning (level 1). This procedure
resembles boosting in the GBM or bagging in random forests and is referred to as
stacked regression (Wolpert 1992). It can also be iterated (level 2, i.e., meta-meta
models and so on), although this has rarely been done so far.
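The mechanics can be sketched in a few lines of R, assuming two level-0 models have already produced out-of-fold predictions p1 and p2 on the training data and pred1_new and pred2_new on new data (all names are ours, chosen for illustration):

```r
# level-0 out-of-fold predictions become the predictors of the level-1 (meta) model
level1_data <- data.frame(p1 = p1, p2 = p2, y = y_train)
meta        <- lm(y ~ p1 + p2, data = level1_data)   # any learner could serve as the level-1 model

# at prediction time, the level-0 predictions for new data are fed into the meta-model
y_hat <- predict(meta, newdata = data.frame(p1 = pred1_new, p2 = pred2_new))
```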
As machine learning applications are still relatively new in the hydrological litera-
ture, the performance is usually compared to more conventional approaches, or
against each other. This raises the question of how to compare observations and pre-
dictions (or simulations). There are many ways to do this; it is not obvious which
metric serves the purpose best, and it also depends on the focus of the data analyst: is
reproduction of the mean and the variance most important? Should the autocorrela-
tion function be reproduced (or the power spectrum)? Should the prediction do well
only on short time scales, or also on longer ones (statistically)? Is the reproduction of
the seasonal cycle (phase, asymmetry) an issue? There are many more aspects of the
time series to be considered. However, the vast majority of papers restrict themselves
to very basic metrics, usually delivering just one number for the whole record
indicating the data-model mismatch: the correlation coefficient, Root Mean Squared
9 https://cran.r-project.org/web/packages/Cubist/index.html
Error (RMSE), Mean Absolute Error (MAE) or the Nash-Sutcliffe Efficiency (Nash
and Sutcliffe 1970). There is a risk that, according to these simple metrics, the
methods do not differ much from each other and that their ranking is more a
random outcome of the case study at hand than a result typical of the research
area. In some cases, such metrics might also not do justice to methods that
describe the general dynamics very well but simply fail to reproduce the correct
scale (or the mean value) accurately enough.
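For reference, these basic single-number metrics amount to a few lines of code; obs and sim denote observed and simulated series of equal length (the names are ours):

```r
rmse <- sqrt(mean((obs - sim)^2))                           # Root Mean Squared Error
mae  <- mean(abs(obs - sim))                                # Mean Absolute Error
nse  <- 1 - sum((obs - sim)^2) / sum((obs - mean(obs))^2)   # Nash-Sutcliffe Efficiency
r    <- cor(obs, sim)                                       # correlation coefficient
```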
   There are many more sophisticated model-data comparisons available; two
examples are Lange et al. (2013) and Tongal and Berndtsson (2017), focusing on
the complexity of streamflow. In general, we advocate optimizing for more than just
one metric (or several, but mutually independent ones), i.e., a multi-objective
optimization (Miettinen 1999), to properly evaluate the different machine
learning techniques and to bring structure into the zoo of algorithms and their
performance in hydrological applications.
same paper). A strictly univariate approach (called “top-down black box”), i.e.,
using only streamflow to predict streamflow, was used in Wang et al. (2006). It
turned out that careful preprocessing, in particular deseasonalization without nor-
malization, was the key to obtaining good performance; consistent with that, periodic
ANNs performed best in their case study.
   The excellent performance of ANNs for real-time forecasting purposes was
documented in Toth and Brath (2007); however, these authors point to the impor-
tance of extensive hydro-meteorological datasets which the ANNs need for training.
   Daliakopoulos and Tsanis (2016) compare an ANN with a conceptual model for
rainfall-runoff modeling. They conclude that, on average, their ANN variant (“input
delay neural network”) is superior to the conceptual model, but it is outmatched for
low flow conditions. It could be that extrapolation to yet unseen conditions (not
contained in the training set) is a problem for ANNs. A comparison of ANNs to a
newer variant, the extreme learning machine (ELM) (Huang et al. 2006), was
performed in Yaseen et al. (2018). ELMs are single-hidden layer feedforward neural
networks; however, the output weights are calculated analytically rather than itera-
tively using gradient-based methods: they are non-tuned and tremendously faster
than ordinary ANNs. In the paper, it was also shown that they exhibit slightly better
performance on different time scales. Another paper on ELM expands the method to
incorporate online changes in the network structure (number of hidden nodes),
which the authors call variable complexity online sequential ELM (Lima et al.
2017). For daily streamflow prediction, this extension clearly outperforms the
standard ELM.
   In an applied approach (Bozorg-Haddad et al. 2018), ANNs and SVMs are used
to give advice for the optimal operation of water reservoirs. In their case, SVMs
outperform ANNs in the forecasting scenarios. In a similar vein, Siqueira et al.
(2018) investigate forecasting for Brazilian hydroelectric power plants using “unor-
ganized” machines—ELM and so-called echo state networks—in comparison with
standard ANNs. Concerning quality metrics, they present an exception to the rule,
as they consider the partial autocorrelation function, mutual information, and max-
imum relevance/minimum common redundancy as evaluation criteria. Still another
exception is the already mentioned paper (Tongal and Berndtsson 2017), which
concludes, for daily streamflow data and based on entropy and complexity analysis, that
ANNs are well suited for 1-day forecasting, but should be used with care beyond
2-day forecasting (their favorite algorithm for longer forecasting horizons, the Self-
Exciting Threshold AutoRegressive Model (SETAR), is not in the class of machine
learning techniques).
It should be obvious at this stage that there is no single best machine learning
technique for hydrological applications. Analysis and modeling needs are spread
over different temporal and spatial scales, the amount of (training) data varies
widely, and even if we focus here on prediction tasks, comparisons with
conceptual or process-based models are occasionally desired. This cannot be
covered by a single framework. Thus, an increasing number of papers compares
several methods and elucidates their respective strengths and weaknesses for the
application at hand.
   Groundwater potential mapping is the focus of Naghibi and Pourghasemi (2015).
Here, boosted regression trees, CART (which is not necessarily in the class of
machine learning algorithms), and random forests were trained on spring locations,
using 14 predictors. All three performed very well, almost indistinguishably, and
easily outperformed conventional methods. The heterogeneity
of the predictors makes the task a typical application domain for machine learning.
   Using wavelets as base functions for ANNs, Shafaei and Kisi (2017) compare
ANNs to SVM for daily streamflow prediction. For short prediction times (up to
3 days), the wavelet-based ANN outperformed both the ordinary ANN and the SVM.
   In the context of daily streamflow from semiarid mountainous regions, Yin et al.
(2018) compare SVR, multivariate adaptive regression splines (MARS), and the M5
tree discussed above, where M5 turned out to be the winner for short-term prediction
up to 3 days.
   A new technique coined selected model fusion is developed in Modaresi et al.
(2018a, b). This is an example where the individual methods are not run in isolation
and then compared against each other; rather, the outputs of all of
them (ANN, SVR, K-NN, and ordinary multiple linear regression) are fused together
with an ordered selection algorithm. The fusion of the outputs is superior to even the
best of the individual methods for the case of monthly streamflow prediction,
demonstrating that it is feasible to combine the respective strengths of single
algorithms.
   Finally, Worland et al. (2018) set up most of the methods presented here (i.e.,
elastic nets, gradient boosting machine, KNN, M5-cubist, and SVM) to predict a
particular extreme statistic, the 7-day mean streamflow which corresponds to the 10%
quantile (“7Q10”), i.e., an indicator for low flow. They tackle the problem of
generalizing these 7Q10 values obtained from gauged sites also to ungauged ones.
They also exploit stacked generalization, using an M5-cubist model as metamodel, resembling
a Leave-One-Out Cross-Validation, but now on the level of machine learning
techniques (plus three baseline models not related to machine learning). The meta-
model (“meta cubist”) outcompetes each individual method in terms of the standard
quality metrics RMSE and Nash-Sutcliffe.
10.5 Outlook
Machine learning and deep learning are on their way to becoming the key empirical
analysis techniques in hydrology; and increasingly ML applications in hydrological
studies are made reproducible through code sharing (e.g., Peters et al. 2016).
According to Shen (2018), however, we are still in the early “value discovery”
stage of deep learning. One can expect synergies between deep learning/machine
learning methods and process-based models; it is possible that patterns detected by
the automatic methods initiate new questions on the relevance and nature of pro-
cesses at different scales, leading to new routes in mechanistic modeling. Whether
this will happen remains an open bet. So far, process-based modeling and ML
approaches are more in competition with each other, and their proponents often belong
to different scientific communities.
    However, the interplay between process understanding and ML application is a
more complex one. Broadly speaking, ML results are based solely on data presented
to the algorithm, and do not come with any interpretation of what is going on in the
system. It is rather easy to produce colorful results devoid of any meaning unless
combined with expert knowledge. In return, this knowledge can be sharpened,
confirmed, or revised with the aid of pattern detection based on ML. It is this
feedback loop which should be pursued further, be it in the field of hydrology proper
or as an interdisciplinary effort.
   Arguably the central benefit of deep learning is its ability to learn from huge amounts of
unsupervised (or unlabeled, uncategorized) data. The pattern extraction, information
retrieval, classification, and prediction abilities of deep learning algorithms indicate
their suitability for “Big Data Hydrology” (Irving et al. 2018). Still, the low maturity
of deep learning warrants extensive further research (Najafabadi et al. 2015). A
prerequisite for big data analysis methods to be successful is the presence of big data
in the first place. The monitoring of hydrological systems has to be continued and
extended. This is a challenging and long-lasting task, since some patterns are only
apparent in time series extending over decades.
    We expect the field of ML and deep learning to expand rapidly, with a prolifer-
ation of publications also in the field of hydrology. Currently, SVMs, CNNs, and
random forests appear to be the most actively investigated algorithms, but new ones
are forthcoming. Thus, a future chapter on ML in hydrology written in the coming
years would probably focus on new additional (or different) methods as well as those
that are now being widely utilized.
    At the time of writing, a special collection of Water Resources Research on “Big
Data and Machine Learning in Water Sciences: Recent Progress and Their Use in
Advancing Science”10 is being compiled, where seven articles already have been
published. We look forward with excitement to the rest of the contributions and their
discussion within the community of hydrology researchers.
10 https://agupubs.onlinelibrary.wiley.com/doi/toc/10.1002/(ISSN)1944-7973.MACHINELEARN

References
Peters J, Janzing D, Schölkopf B (2017) Elements of causal inference, Foundations and learning
    algorithms. MIT Press, Cambridge, MA. 288 p
Peters R, Lin Y, Berger U (2016) Machine learning meets individual-based modelling: self-
    organising feature maps for the analysis of below-ground competition among plants. Ecol
    Model 326:142–151. https://doi.org/10.1016/j.ecolmodel.2015.10.014
Quinlan JR (1993) Combining instance-based and model-based learning. In: Proceedings of the
    tenth international conference on machine learning. Morgan Kaufmann, Amherst, MA, pp
    236–243
Raghavendra SN, Deka PC (2014) Support vector machine applications in the field of hydrology: a
    review. Appl Soft Comput 19:372–386. https://doi.org/10.1016/j.asoc.2014.02.002
Rasouli K, Hsieh WW, Cannon AJ (2012) Daily streamflow forecasting by machine learning
    methods with weather and climate inputs. J Hydrol 414–415:284–293. https://doi.org/10.
    1016/j.jhydrol.2011.10.039
Rasp S, Pritchard MS, Gentine P (2018) Deep learning to represent subgrid processes in climate
    models. Proc Natl Acad Sci USA 115:9684–9689. https://doi.org/10.1073/pnas.1810286115
Richards LA (1931) Capillary conduction of liquids in porous mediums. Physics 1:318–333.
    https://doi.org/10.1063/1.1745010
Roberts DR, Bahn V, Ciuti S, Boyce MS, Elith J, Guillera-Arroita G et al (2017) Cross-validation
    strategies for data with temporal, spatial, hierarchical, or phylogenetic structure. Ecography
    40:913–929. https://doi.org/10.1111/ecog.02881
Rosenblatt F (1958) The perceptron: a probabilistic model for information storage and organization
    in the brain. Psychol Rev 65:386–408. https://doi.org/10.1037/h0042519
Rumelhart DE, Hinton GE, Williams RJ (1986) Learning representations by back-propagating
    errors. Nature 323:533–536. https://doi.org/10.1038/323533a0
Samuel AL (1959) Some studies in machine learning using the game of checkers. IBM J Res Dev
    3:210–229. https://doi.org/10.1147/rd.33.0210
Schliep K, Hechenbichler K (2016) kknn: Weighted k-Nearest Neighbors. R package version 1.3.1.
https://CRAN.R-project.org/package=kknn
Schmidhuber J (2015) Deep learning in neural networks: an overview. Neural Netw 61:85–117.
    https://doi.org/10.1016/j.neunet.2014.09.003
Schoups G, Vrugt JA (2010) A formal likelihood function for parameter and predictive inference of
    hydrologic models with correlated, heteroscedastic, and non-Gaussian errors. Water Resour Res
    46:W10531. https://doi.org/10.1029/2009WR008933
Schultz W (2007) Reward signals. Scholarpedia 2:2184. https://doi.org/10.4249/scholarpedia.2184
Shafaei M, Kisi O (2017) Predicting river daily flow using wavelet-artificial neural networks based
    on regression analyses in comparison with artificial neural networks and support vector machine
    models. Neural Comput Appl 28:S15–S28. https://doi.org/10.1007/s00521-016-2293-9
Shen C (2018) Deep learning: a next-generation big-data approach for hydrology. EOS Trans 99.
    https://doi.org/10.1029/2018EO095649
Silver D, Huang A, Maddison CJ, Guez A, Sifre L, van den Driessche G et al (2016) Mastering the
    game of Go with deep neural networks and tree search. Nature 529:484–489. https://doi.org/10.
    1038/nature16961
Silver N (2012) The signal and the noise: why so many predictions fail--but some don’t. Penguin
    Books, New York. 560 p
Siqueira H, Boccato L, Luna I, Attux R, Lyra C (2018) Performance analysis of unorganized
    machines in streamflow forecasting of Brazilian plants. Appl Soft Comput 68:494–506. https://
    doi.org/10.1016/j.asoc.2018.04.007
Sivapalan M (2003) Process complexity at hillslope scale, process simplicity at the watershed scale:
    is there a connection? Hydrol Process 17:1037–1041. https://doi.org/10.1002/hyp.5109
Sivapalan M (2006) Pattern, process and function: elements of a unified theory of hydrology at the
    catchment scale. Encycl Hydrol Sci. https://doi.org/10.1002/0470848944.hsa012
Sivapalan M, Grayson R, Woods R (2004) Scale and scaling in hydrology. Hydrol Process
    18:1369–1371. https://doi.org/10.1002/hyp.1417