Math Geosci (2010) 42: 101–127
DOI 10.1007/s11004-009-9259-8
Estimating Intrinsic Formation Constants of Mineral
Surface Species Using a Genetic Algorithm
Adrián Villegas-Jiménez · Alfonso Mucci
Received: 1 April 2007 / Accepted: 21 November 2009 / Published online: 10 December 2009
© International Association for Mathematical Geosciences 2009
Abstract The application of a powerful evolutionary optimization technique for the
estimation of intrinsic formation constants describing geologically relevant adsorption reactions at mineral surfaces is introduced. We illustrate the optimization power
of a simple Genetic Algorithm (GA) for forward (aqueous chemical speciation calculations) and inverse (calibration of Surface Complexation Models, SCMs) modeling
problems of varying degrees of complexity, including problems where conventional
deterministic derivative-based root-finding techniques such as Newton–Raphson, implemented in popular programs such as FITEQL, fail to converge or yield poor data
fits upon convergence.
Subject to sound a priori physical–chemical constraints, adequate solution encoding schemes, and simple GA operators, the GA conducts an exhaustive probabilistic
search in a broad solution space and finds a suitable solution regardless of the input values and without requiring sophisticated GA implementations (e.g., advanced
GA operators, parallel genetic programming). The drawback of the GA approach is
the large number of iterations that must be performed to obtain a satisfactory solution. Nevertheless, for computationally demanding problems, the efficiency of the
optimization can be greatly improved by combining heuristic GA optimization with
the Newton–Raphson approach to exploit the power of deterministic techniques after
the evolutionary-driven set of potential solutions has reached a suitable level of numerical viability. Despite the computational requirements of the GA, its robustness,
flexibility, and simplicity make it a very powerful, alternative tool for the calibration of SCMs, a critical step in the generation of a reliable thermodynamic database
describing adsorption equilibria. The latter is fundamental to the forward modeling
of the adsorption behavior of minerals and geologically based adsorbents in hydro-
A. Villegas-Jiménez () · A. Mucci
Earth and Planetary Sciences, McGill University, 3450 University Street, Montréal, QC H3A 2A7
Canada
e-mail: adriano@eps.mcgill.ca
102
Math Geosci (2010) 42: 101–127
geological settings (e.g., aquifers, pore waters, water basins) and/or in engineered
reactors (e.g., mining, hazardous waste disposal industries).
Keywords Evolutionary programming · Heuristic optimization · Inverse modeling ·
Surface complexation modeling and calibration · Adsorption equilibria
1 Introduction
The quantitative characterization of the sorptive properties of minerals and geologically based adsorbents (geo-adsorbents) is key to the understanding of natural geochemical processes (e.g., solute mobility/sequestration) in hydro-geological settings
(e.g., aquifers, pore waters, water basins) and to the optimization of multiple engineering processes (e.g., mining, hazardous waste disposal) and wastewater treatment technologies. Among the approaches devised for the quantitative description of
adsorption equilibria (e.g., isotherm equations, partition coefficients), Surface Complexation Models (SCMs) represent, at present, the most geochemically sound and
powerful theoretical framework for the prediction of adsorption equilibria. Detailed
descriptions of SCMs can be found in most modern aquatic chemistry/geochemistry
textbooks (e.g., Morel and Hering 1993; Stumm and Morgan 1996; Langmuir 1997).
In the last few decades, a number of computer programs have been developed and successfully validated to perform SCM-based routine calculations of
adsorption equilibria in heterogeneous systems involving aqueous and adsorbent
phases (forward modeling): MICROQL II (Westall 1979); WATEQ (Ball et al.
1981); HYDRAQL (Papelis et al. 1988); SOILCHEM (Sposito and Coves 1988);
MINTEQA2/PRODEFA2 (Allison et al. 1991); MINEQL+ (Schecher and McAvoy
1992); EQ3NR (Wolery 1992); WHAM (Tipping 1994); PHREEQC (Parkhurst
1995); GEOSURF (Sahai and Sverjensky 1998); CHESS (van der Lee and de Windt
1999), and ECOSAT (Keizer and van Riemsdijk 1999). These computer codes were
preceded by the program HALTAFALL, developed by a group of Swedish chemists
(Ingri et al. 1967). The latter may have been the first computer code devised for the
computation of the equilibrium concentrations of chemical species in a fluid (liquid
or gaseous) phase in the presence of one or several solid phases and, hence, laid the
foundations of all modern geochemical equilibrium codes.
In general, the solution to adsorption equilibrium problems can be achieved by two
equivalent approaches: (i) the Gibbs Free Energy Minimization of the system (GEM)
and (ii) the application of Laws of Mass Action (LMA) and mass balance constraints
where chemical species concentrations are relaxed until solution of the derived set of
nonlinear equations (see Zeleznik and Gordon 1968, for a review of both methods).
This latter approach was exploited by Morel and Morgan (1972) nearly four decades
ago to develop a derivative-based iterative numerical procedure for the solution of
aqueous chemical speciation in homogeneous and heterogeneous systems (forward
problem) that was later extended to the computation of adsorption equilibria (Westall
1979). This method typically computes a correct and unique solution, provided that
the geochemical equilibrium problem is mathematically defined in terms of adequate
chemical components (Tableau approach, see Morel and Hering 1993) and the set
Math Geosci (2010) 42: 101–127
103
of values initializing the iterative procedure is wisely chosen to prevent convergence
problems (Westall 1979). Most modern forward adsorption modeling computer codes
are based upon the LMA-Tableau approach.
Similarly, computer codes may be adapted for the calibration of Surface Complexation Models (SCMs) that typically involve the estimation of SCM parameters such
as intrinsic formation constant(s) [K int’s ], capacitance(s) [C], and/or total concentrations of primary surface sites [TOTsite ] from experimental sorption (i.e., adsorption/desorption) data (inverse modeling). For example, FITEQL (Herbelin and Westall 1996) is a derivative-based nonlinear least squares optimization program (also
based upon the LMA-Tableau approach) that is commonly used to obtain best estimates of intrinsic formation constants of mineral surface species using data from
batch or titration adsorption experiments. As most modern chemical and geochemical equilibrium programs, FITEQL shows few convergence problems, provided that
the number of adjustable parameters is not particularly large (especially when an extensive data set is not available) and the adjustable SCM parameters are not strongly
correlated (Herbelin and Westall 1996). Other inverse modeling codes with similar
applications either make use of deterministic root-finding approaches (ECOSAT-FIT,
Kinniburgh 1999), heuristic direct search minimization techniques (Turner and Fein
2006), or hybrid optimization schemes that combine heuristic Particle Swarm Optimization with deterministic Levenberg–Marquardt nonlinear regression (ISOFIT,
Matott and Rabideau 2008). The latter two were devised for specific inverse modeling applications: Protofit estimates proton adsorption/desorption intrinsic (surface
complexation) constants using a proton buffering function, whereas ISOFIT fits conditional constants to adsorption isotherms, in contrast to FITEQL and ECOSAT-FIT
that can extract intrinsic constants involving any type of sorbate(s) (in addition to
protons) within the framework of surface complexation theory. Recently, the nonlinear regression computer code UCODE-2005, typically used for the calibration of
water flow and contaminant transport models (Poeter et al. 2005), was coupled with
PHREEQC and ECOSAT forward modeling codes and used in inverse problem applications (i.e., calibration of SCMs).
Despite the usefulness of the aforementioned computer codes, it is well known that
derivative-based and simple hill-climbing numerical techniques are local in scope and
are often plagued by numerical instability and convergence problems, particularly
for nondifferentiable, discontinuous, and underdetermined (i.e., more unknowns than
data points) functions. Furthermore, nonlinear regression techniques are susceptible
to excessive parameter correlation (Essaid et al. 2003). Consequently, these techniques may provide solutions close to the initial “guess” values, possibly a local well
from which the solver may not be able to emerge, rather than the best solution, or
they may not converge at all. Accordingly, an alternative tool that can circumvent
or minimize these limitations and provide a higher flexibility in the optimization of
multiple SCM parameters (including multisorbate adsorption) would be desirable.
Genetic Algorithms (GAs) are efficient and robust heuristic, evolutionary, exhaustive sampling techniques that have been successfully used in a wide range
of applications (e.g., Holland 1975; Goldberg 1989; Mestres and Scuseira 1995;
Michalewicz 1996; Gen and Cheng 1997, 2000; Sait and Youssef 1999). Nowadays,
GAs and Simulated Annealing are the preferred stochastic optimization algorithms
104
Math Geosci (2010) 42: 101–127
(Mosegaard and Sambridge 2002) and are particularly reliable for small inverse problems (Mosegaard 1998). GA optimizations are performed using probabilistic rather
than deterministic rules and, thus, are especially well-suited for ill-conditioned, nonsmooth, discontinuous problems (Fernández Alvarez et al. 2008) and perform well
irrespective of the number of data points or the error associated with the data. This
contrasts with other conventional root-finding methods such as Newton–Raphson and
Quasi-Newton that require: (i) calculation of the local gradient, (ii) a reasonably wellbehaved (smooth) objective function with reasonably separated roots, and (iii) extensive data sets (Gans 1976; Epperson 2002).
In this paper, we examine the application of a simple GA to the solution of adsorption equilibrium inverse problems of varying degrees of complexity. We first verify
the ability of GAs to solve several forward aqueous speciation problems subjected to
identical thermodynamic, mass, and charge balance constraints to those of the inverse
problems. We then test the performance of the GA on several inverse problems requiring the optimization of multiple SCM parameters and compare the results against
those returned by FITEQL, the most frequently used inverse modeling speciation
code for SCM calibration.
2 Definition of the Adsorption Equilibrium Problem
Relevant definitions and elaborated explanations on adsorption equilibrium problems
in aqueous solutions can be found in the textbooks of Dzombak and Morel (1990),
Morel and Hering (1993), and Stumm and Morgan (1996). The generalized relationship defining the adsorption equilibrium inverse problem is given by (Malinverno
2000)
d = Gm + e,
(1)
m = Aθ ,
(2)
where d stands for the vector (single sorbate) or matrix (D, for multiple sorbates)
of experimental adsorption data (adsorption densities), m is the vector (for a single
chemical component) or matrix (M, for multiple chemical components) of measured
data (free concentrations of chemical components in aqueous solution), e is the vector
or matrix (E, for multiple chemical components) of measurement errors, θ is the
vector of unknown SCM parameters (e.g., K int’s , C’s, TOTsite ), G is the forward
modeling mass balance matrix that relates d (or D) and m (or M), and A is the
mass action matrix relating θ and m (or M). In this study, we focus on the solution
of inverse adsorption equilibrium problems (computation of elements of θ ) via GAbased exhaustive sampling, but, as a preliminary step to evaluate the performance
of this technique in the optimization of variables or parameters varying over several
orders of magnitude, we also address the solution of the aqueous equilibrium forward
problem which is defined by
t = H s + e,
(3)
s = Bω,
(4)
Math Geosci (2010) 42: 101–127
105
where H is the forward modeling mass balance matrix relating t (vector of total
concentrations of chemical components in aqueous solution) and s (vector of concentrations of free chemical components in aqueous solution), and B is the mass action law matrix relating ω (vector of thermodynamic formation constants of aqueous
species) and s. Given known values of ω and t, the elements of vector s (unknowns)
can be calculated via root-finding (e.g., Newton–Raphson) or optimization (e.g., GA)
techniques.
3 Implementation of the Genetic Algorithm
Matlab© software (MathWorks, Inc.) was used to write the subroutines in which we
incorporated a modified version of the GA originally written by Ron Shaffer from
the Chemometrics Research Group of the Naval Research Laboratory (USA). Six
subroutines are required: (i) EQUIL reads the input file containing all the information
relevant to the definition of the sorption equilibrium problem, (ii) FITGEN defines the
GA parameters, performs the binary-string encoding (see below), and initializes iterations, (iii) FITLOG decodes the potential set of solutions, performs all calculations
defining the objective function, and computes the weighted squared residuals corresponding to the mass and/or charge balance equations (see later sections); finally,
three additional subroutines, (iv) EVAL_GA, (v) MUT_GA, and (vi) XOVER_GA,
contain the genetic operators, selection, mutation, and crossover, required by the evolutionary process.
The optimization of the unknown quantities (s for the forward problem and θ for
the inverse problem) is based on global minimization of selected objective functions
(see below). Each unknown was encoded as a binary string within a section of the solution chromosome. The length of each section (the number of bits nj ) is proportional
to the search domain and constrained to reasonable boundary values (i.e., maximum
and minimum for each parameter). The length of each section is given by
nj = log2 (Vj ),
(5)
where Vj stands for the boundary value that requires the maximum number of bits
for its encoding. For instance, in the forward problem, the maximum value assigned
to the free chemical component concentrations would correspond to the total analytical molar concentration (i.e., free plus complexed species), whereas the minimum is
assigned an arbitrary value of 10−50 M, grossly overestimating the degree of interaction with other chemical components in the system. Accordingly, the search domain
would be defined by log2 (10−50 ), which, in binary representation, corresponds to 166
bits. To shorten the string length and, thus, save computational time while maintaining an acceptable numerical precision of the adjustable parameters, Vj values were
expressed as 103 times their logarithmic values. Accordingly, the modified string
length (nj -ext )
(6)
nj -ext = log2 log10 (Vj ) · 103
corresponds to a chromosome section of 16 bits for each chemical component. This
operation requires that the decoded values (extended logarithmic units) be divided by
106
Math Geosci (2010) 42: 101–127
a factor of 103 at each generation (iteration) to be consistent with units of the objective function described below. This simple encoding scheme substantially reduces
round-off and truncation error during GA optimization while keeping the chromosome size practical for GA optimization. In addition, as recognized earlier (Fernández
Alvarez et al. 2008), logarithmic parameterization linearizes the correlation structure
among model parameters, improving the sampling efficiency by reducing the number
of rejected moves in the algorithm. All fitting parameters involved in the solution of
forward and inverse problems were encoded according to this scheme.
Simple stochastic genetic operators (Gen and Cheng 2000) were used in all optimization problems presented in this study. To carry out chromosome selection, the
binary tournament operator was used (Goldberg et al. 1989). Only the fittest chromosome from each generation was preserved to exploit its entire numerical genotype
(encoded solution) in the next generation. This operation is called elitism and was
used in all the equilibrium problems described in this paper. All other selected chromosomes participate in the crossover and mutation operations to generate a transient
set of solutions to the problem. In this study, we tested four crossover operators: onecut-point, two-cut-point, uniform (Gen and Cheng 2000), and the randomized and/or
crossover (RAOC, Keller and Lutz 1997). This operator is essential because the robustness of GA comes from its ability to transmit information and create, after a number of generations, better fitted individuals. Hence, the search for the best individual
is not blind, as in random walk procedures, but guided (Mestres and Scuseira 1995).
Similarly, different types of mutation techniques can be carried out (e.g., Sait and
Youssef 1999), but only the simplest type, the so-called “uniformly distributed random mutation,” was applied in this study. Low mutation probabilities, equal to the
reciprocal of the length of the chromosomes (Keller and Lutz 1997), were chosen to
avoid pushing the population towards unfavorable areas of the solution space. Nevertheless, higher probabilities (0.05, 0.1, and 0.15) were also tested but produced
statistically identical results. In all optimizations performed in this study, all other
GA parameters (population size, number of generations, and type and probability of
crossover) were arbitrarily chosen for each run and were empirically optimized for
each type of problem.
In the following sections, we illustrate the application of a simple GA to a number
of forward and inverse problems defined within the LMA-Tableau approach. The fitting strategy shown in Fig. 1 applies in all inverse problems presented here, but some
adaptations were made to meet problem-specific requirements and are described below.
4 Application of the GA to the Forward Problem
In this section, we verified the performance of the GA in the solution of chemical
speciation (forward problem) which requires the optimization of variables (molar
concentrations) varying over several orders of magnitude. For simplicity, a single
phase (aqueous solution) was considered for the forward problem. We solved a number of aqueous speciation problems and compared the GA results to those returned
by commercially available programs (MINEQL+, HYDRAQL, and WHAM). As explained above, the forward problem consists of obtaining the elements of vector s
Math Geosci (2010) 42: 101–127
107
Fig. 1 Generalized fitting strategy to perform adsorption equilibrium inverse modeling using a genetic
algorithm
(see (3) and (4)). Hence, the GA searches the solution that best minimizes the total
sum of residuals (Y ) between the total experimental molar concentrations (free plus
complexed, vector t) specified by the modeler and those estimated from the mass
balance equations of all chemical components in the system as defined by Herbelin
108
Math Geosci (2010) 42: 101–127
and Westall (1996)
Y=
n
m
j =1
i=1
v(i, j )ci − tj
2
,
(7)
where the first term inside the brackets represents the calculated molar concentration of the j th chemical component, n is the number of species (aqueous) derived
from chemical component j th, v is the stoichiometric coefficient for the j th chemical component describing the formation of the ith aqueous species, ci is the molar
concentration of the ith aqueous species produced by chemical component j th, tj is
the total concentration of component j , and m is the number of chemical components
(Morel and Morgan 1972; Westall 1979).
Large differences in the experimental concentrations of the chemical components
may bias the optimization because of the weight carried by the individual residuals
(Rj , term in brackets in (7)). Consequently, these residuals were normalized (Rj ′ ) as
follows
1
.
(8)
Rj ′ =
exp(Rj ) + | log10 (Rj )|
In general, for problems with 10 chemical components or less, the GA returned a
suitable solution after 100 generations, using a population size of 100 and either the
one-point (Gen and Cheng 2000) or the RAOC crossover strategy (Keller and Lutz
1997) at 10% of crossover probability. For this type of applications, both crossover
operators appeared to outperform the two-cut-point and uniform operators both in
terms of speed and ability to locate the best solution in the search space. The GApredicted concentrations of chemical components for three equilibrium problems (involving 4, 6, and 10 chemical components and 7, 32, and 39 chemical species, respectively) were nearly identical (RSD ≤ 0.03%) to those obtained using HYDRAQL,
WHAM, and MINEQL+ for ionic strengths ≤0.001 M. This exercise confirmed the
efficiency of the GA in dealing with optimization problems with numerous adjustable
parameters varying over several orders of magnitude and subjected to identical constraints to those of the inverse adsorption problem that we wish to address via stochastic optimization. Hereafter our discussion will be exclusively focused on the solution
of inverse problems.
5 Application of the GA to the Inverse Problem
5.1 Estimation of Intrinsic Ionization Constants: Constant Capacitance Model
Our main objective is to implement a reliable and flexible approach to address specific inverse problems that cannot be easily handled by conventional, deterministic,
derivative-based, root-finding techniques implemented in popular SCM calibration
programs such as FITEQL.
Because a reasonable a priori knowledge is available (physical–chemical constraints, geochemistry of the adsorbent phase, etc.), the viability of conceptual adsorption reactions can be evaluated intuitively against specified criteria prior to optimization, greatly reducing the number of alternative SCMs that deserve systematic
Math Geosci (2010) 42: 101–127
109
evaluation via inverse modeling. Furthermore, emphasis on conceptual and mathematical simplicity in the formulation of SCMs is paramount and must be consistent with the quantity and quality of available data. As emphasized in earlier studies
(e.g., Herbelin and Westall 1996), when several SCMs fit the data equally well, the
most parsimonious one is preferred unless there is compelling evidence in support
of another. Models with many degrees of freedom incur serious risks, among which:
(i) fitting of inconsistent or irrelevant noise in the data records; (ii) severely diminished predictive power; (iii) the generation of ill-defined, near-redundant parameter
combinations; and (iv) masking of geochemically significant behavior derived from
data over-fitting (Jakeman et al. 2006). Furthermore, the inverse problem should be
well determined and, hence, contain more data points than adjustable parameters. Finally, a posteriori physical–chemical evaluation of the fitted SCM parameters is key
to ascertain their thermodynamic relevance within the SCM. It follows that within
this scheme, the calibration of SCMs, by deterministic or heuristic approaches, is
properly constrained and goes well beyond a mere data fitting exercise.
We illustrate the above approach by testing the GA on various adsorption equilibrium inverse problems. The first and simplest one calls for the estimation of surface
ionization (i.e., proton adsorption/desorption) constants from proton adsorption data
at a mineral surface. For a generic hydrated (primary) surface site or adsorption center (e.g., ≡S·H2 O), these reactions can be generalized as follows (Davis and Kent
1990)
≡S·H2 O0 + H+ ⇔ ≡S·H3 O+
≡S·H2 O0 ⇔ ≡S·OH− + H+
(protonation),
(9a)
(deprotonation).
(9b)
Each of the above reactions is governed by an intrinsic ionization constant whose
estimation represents the essence of the inverse problem (calculation of elements
of θ ). The first step is to define the generalized objective function that applies to all
adsorption inverse problems. This implies the computation of the residuals between
the theoretical and experimental adsorption values, Yk , which, for an n number of
adsorbed species (i) produced by adsorbate, k, is given by
Yk =
n
i=1
v(i, k)Mi − tk ,
(10)
where Mi represents the predicted molar concentration of the ith adsorbate-bearing
surface species, v is the stoichiometric coefficient for the k adsorbate describing the
formation of the ith surface species, tk is the experimental adsorbed molar concentration of the adsorbate k. For each chromosome, the GA optimization is subjected to
WSSE =
p
r
Yk
1
1
Sk
2
,
(11)
where WSSE is the weighted sum of squared errors of an r number of adsorbate
components computed at all titration points, p, and Sk is the input error specified for
110
Math Geosci (2010) 42: 101–127
Yk from the experimental errors associated with the quantitative (analytical) determination of the kth adsorbate. Equation (11) is the generalized objective function used
in the optimization of intrinsic formation constants (ionization and adsorption) when
suitable adsorption data for all adsorbates under consideration are available.
To estimate intrinsic formation constants of mineral surface species, reactions
must be referenced to a zero electrical potential surface by taking into account, at
each stage of the titration, of the electrostatic work required to transport ions through
the interfacial electrical potential gradient according to (Dzombak and Morel 1990)
e
K app = K int ·
x=1
−ZF ψx
exp
,
RT
(12)
where K app is the apparent formation constant, K int stands for the intrinsic constant,
Z is the net charge transfer of the reaction, F is the Faraday constant, R is the
gas constant, T is the absolute temperature, e is the number of electrostatic planes
where explicit adsorption is assumed to take place, and Ψx is the electrical potential
at the plane(s) “x” involved in adsorption (e.g., 0-plane, Stern-plane; Davis and Kent
1990). For brevity, the latter term will, hereafter, be referred to as potential (ascribed
to a specific electrostatic plane). It is an adjustable parameter that, upon minimization
of (11), must satisfy the following constraint
q
F
·
zi [≡ Ci ] = σxelect ,
S ·A
(13)
i=1
where S is the specific surface area of the mineral (m2 g−1 ), A is the mass-volume
ratio of the experimental suspension (g L−1 ), zi and [≡ Ci ] are, respectively, the
charge and molar concentration of species i adsorbed at plane x, and q is the number of species contributing to the charge at plane x. The left-hand term gives the net
charge density at plane x (C m−2 ) from the surface species concentrations computed
at each generation, whereas the right-hand term, σxelect , represents the charge density
at plane x (C m−2 ) derived from a theoretical electrostatic model describing the relationship between the surface charge and surface potential (see Davis and Kent 1990).
The electrostatic correction is specified in the mathematical definition of the inverse
(and forward as well) adsorption equilibrium problem, according to the method presented by Westall and Hohl (1980). An e number of dummy chemical component(s)
is added to the model, corresponding to the electrostatic planes where explicit adsorption is assumed to take place according to the selected electrostatic model (see
below).
The GA was initially tested with data taken from Gao and Mucci (2001), who used
FITEQL v. 2.0 to optimize the intrinsic ionization constants of the goethite surface in
a 0.7 M NaCl solution considering the following set of ionization reactions
≡FeOH0 + H+ ⇔ ≡FeOH+
2,
0
−
≡FeOH ⇔ ≡FeO + H
+
(14)
(15)
which are governed, respectively, by the intrinsic ionization constants Ka1 and Ka2 .
Math Geosci (2010) 42: 101–127
111
To describe the electrostatics at the interface, these authors applied the Constant
Capacitance Model (CCM, Schindler and Kamber 1968; Hohl and Stumm 1976),
and, thus, this model was implemented in the Matlab© script to compute the value
of Ψx at each generation according to the following expression (Stumm and Morgan
1996)
σ0
(16)
ψ0 = ,
C
where C is the specific capacitance, and σ0 and ψ0 are the charge and potential at
the surface (i.e., plane-0) calculated from electrostatics σ0elect and ψ0elect . Given that
only ionization reactions were considered to take place at plane 0, experimental surface charge data are available (net proton adsorption densities are identical to surface charge densities), and, therefore, for a given C value, these data can be used
to compute the surface potential at each titration point and perform the electrostatic
correction using (12) and (16).
Aqueous equilibrium was solved first using either the GA or the Newton–Raphson
approach (implemented in an additional Matlab© subroutine) to compute the concentrations of the free chemical components in solution (computation of s, forward problem) before the K int values were optimized with the GA (inverse problem). In contrast to the strategy typically applied by FITEQL users, whereby the K int values are
numerically optimized, whereas the value of C is varied manually in each optimization and the goodness-of-fit evaluated on the basis of the WSOS/DOF (“weighted sum
of squares divided by the degrees of freedom”) parameter (Dzombak and Morel 1990;
Herbelin and Westall 1996), the K int and C values were optimized simultaneously by
the GA. Using the encoding rules described earlier, the capacitance can be added to
the chromosome encoding the solution to the intrinsic ionization constants and successfully treated as a fitting parameter.
Like the chemical component concentrations (see preceding section), K int values
were formulated in extended logarithmic units. Conventionally, chromosome binary
strings represent solutions from −25000 to 25000 (i.e., 10−25 to 1025 M or 50 orders
of magnitude) that approximately cover the range of formation constants typically reported in thermodynamic compilations (e.g., NIST 1998). Nevertheless, for specific
adsorption reactions, such a wide range may not necessarily represent thermodynamically meaningful quantities, and hence, based upon physical–chemical concepts, the
modeler may select a reasonable solution space for each adjustable parameter (as
explained above), constraining the range of potential solutions and its compatibility
with the physical reality. The specific capacitance was constrained between 0.2 and
2 F/m2 , covering the range of values previously reported for oxide mineral surfaces
(e.g., Hiemstra et al. 1999).
SCM parameters optimized with the GA and input values of S and A are presented
in Table 1 and compared to those reported by Gao and Mucci (2001). Because of the
probabilistic nature of the GA optimization, several GA runs were performed using
different GA parameters (e.g., population size, maximum number of generations, mutation rates, etc.), and the associated error was calculated for each adjustable quantity.
It should be noted that this error is associated to the GA-optimization rather than to
the experiment. True uncertainties of the optimized SCM parameters must be determined from multiple GA runs using independent replicate data sets (≥3). As shown
112
Math Geosci (2010) 42: 101–127
Table 1 Intrinsic acidity constants of the goethite surface in a 0.7 M NaCl solution returned by the GA
optimization and FITEQL using the Constant Capacitance Model. Errors represent confidence intervals at
95%. Results shown using the GA approach are averaged values obtained from three optimizations using
the following GA parameter sets (population size, number of generations, crossover rate, mutation rate):
[10, 1000, 0.1, 0.1], [50, 500, 0.1, 0.05], and [100, 1000, 0.1, 0.02]
Surface Equilibria
≡FeOH0 + H+ = ≡FeOH+
2
≡FeOH0 = ≡FeO− + H+
Capacitance (F/m2 )
Log10 K int
Gao and Mucci (2001)
This study
FITEQL 2.0
GA
7.45
−9.60
1.86
7.23 ± (< 0.01)a
−9.6 ± 0.04a
1.93 ± (< 0.01)a
Experimental conditions: A = 27.7 m2 g−1 , S = 7.93 g L−1
Site Density: 2.96 · 10−6 moles m−2
a Reported errors are a measure of the reproducibility of the optimization itself and, thus, do not provide
information on the associated experimental error
in Table 1, the optimized values are very reproducible, even when the population
size or number of generations varied significantly, and the mean values are close to
those obtained by Gao and Mucci (2001). Nevertheless, the K int and C values that
best describe the experimental data are slightly different than those obtained when
fixed capacitance values are used in the optimization, as in the case of FITEQL. In
other words, the GA facilitates the search of suitable K int and C values in a single
optimization run, in contrast to FITEQL and most available adsorption equilibrium
inverse modeling programs. Although the difference is small in this particular example, in other applications, relaxation of the capacitance(s) and minimization of
round-off and truncation errors via GA optimization may impact significantly on the
optimized SCM parameters. The experimental proton adsorption data and the GApredicted proton adsorption densities are shown in Fig. 2.
5.2 Estimation of Intrinsic Adsorption Constants: Constant Capacitance Model
Typically, direct, experimental surface charge data are unavailable when adsorption/desorption of adsorbates other than protons occurs. Consequently, surface charge
must be computed from the speciation predicted at each generation to obtain adequate
estimates of the surface potential and apply the appropriate electrostatic corrections
((12) and (16) for the CCM). Within the CCM scheme, this problem can be easily
circumvented by initializing the GA iterative process using a set of best guess surface charge values. This serves to preadjust the constants to reasonable estimates of
surface potential (related to surface charge via an electrostatic model: (16), Westall and Hohl 1980) and leads the chromosome population towards favorable regions
of the solution space. After a number of generations (iterth ), the GA will calculate
surface potentials from (16) using the surface charge computed from the predicted
surface speciation (left-hand term in (13)) and will estimate the electrostatic correction in subsequent generations. This procedure guarantees fulfillment of (13) from
generation iterth while pushing evolution towards successful minimization of (11).
Math Geosci (2010) 42: 101–127
113
Fig. 2 Data fit of proton adsorption data (goethite in a 0.7 M NaCl solution) obtained with the
GA-optimized ionization constants
Using published phosphate adsorption data on the goethite surface in 0.7 M NaCl
solutions (Gao and Mucci 2001), we reoptimized the corresponding intrinsic adsorpint , with the GA, using an identical conceptual SCM defined by the
tion constants, Kads
following set of surface reactions
2−
+
≡FeOH0 + H2 PO−
4 ⇔ ≡FePO4 + H + H2 O,
(17)
−
≡FeOH0 + H2 PO−
4 ⇔ ≡FePO4 H + H2 O,
(18)
+
≡FeOH0 + H2 PO−
4 + H ⇔ ≡FePO4 H2 + H2 O.
(19)
For this application, assigning initial zero potentials (i.e., null electrostatic corrections) to the surface allowed for a good performance of the GA in combination with a rather large maximum number of iterations (maxiter ≥ 300). Several
GA runs indicated that after 50% of the prefixed maxiter (150), the numerical viability of the evolved chromosome population was suitable for the computation
of surface potentials from the surface speciation (left-hand term of (13)) and (16)
and for successful fitting of the data (Fig. 3). The success of the optimization conint values within the CCM formalism, in
firms the ability of the GA to extract Kads
the absence of surface charge data, without explicit fitting of the surface potential.
int values are reported in Table 2 and compared to those obtained
The optimized Kads
by Gao and Mucci (2001). To see the impact of the GA optimization on the estimation
114
Math Geosci (2010) 42: 101–127
Fig. 3 Data fit of phosphate adsorption data (goethite in a 0.7 M NaCl solution) obtained with the
GA-optimized adsorption constants
Table 2 Intrinsic affinity constants of phosphate on the goethite surface in a 0.7 M NaCl solution returned
by the GA optimization and FITEQL using the Constant Capacitance Model. Errors represent confidence
intervals at 95%. Results shown using the GA approach are averaged values obtained from three optimizations using the following GA parameter sets (population size, number of generations, crossover rate,
mutation rate): [80, 500, 0.1, 0.1], [100, 500, 0.1, 0.15], and [100, 1000, 0.1, 0.02]
Surface Equilibria
2−
+
≡FeOH0 + H2 PO−
4 = ≡FePO4 + H + H2 O
−
0
−
≡FeOH + H2 PO4 = ≡FePO4 H + H2 O
0
+
≡FeOH0 + H2 PO−
4 + H = ≡FePO4 H2 + H2 O
Capacitance (F/m2 )
int
Log10 Kads
Gao and Mucci (2001)
This study
FITEQL 2.0
GA
0.70
7.83
12.47
1.86
−0.23 ± 0.16a
7.68 ± 0.1a
13.02 ± 0.48a
1.86 (Fixed)
Experimental conditions: A = 27.7 m2 g−1 , S = 0.234 g L−1
Site Density: 2.96 · 10−6 moles m−2
a Reported errors are a measure of the reproducibility of the optimization itself and, thus, do not provide
information on the associated experimental error
of phosphate adsorption constants independently of other SCM parameters, the Ka1 ,
Ka2 , and C values originally proposed by Gao and Mucci (2001) were used in this GA
Math Geosci (2010) 42: 101–127
115
int values corresponding to reactions
optimization exercise. Note that, whereas the Kads
(18) and (19) optimized by the GA are, within error, very similar to those estimated by
int of reaction (17) is nearly one order of magnitude
FITEQL, the GA-optimized Kads
lower. This discrepancy is attributable to the better numerical stability of the GA
optimization.
To evaluate the performance of the GA with more complex optimization problems
within the CCM scheme, we simultaneously optimized ionization and adsorption intrinsic constants using synthetic data of a gedanken experiment: an acidimetric titration of a reactive mineral (MeX(s) ) displaying moderate pH-promoted dissolution.
In this experiment, we considered that dissolved, lattice-constituent divalent metal
ions (Me2+ ) reabsorb on the mineral surface, competing with protons for reactive
surface sites and altering surface protonation and surface charge (see synthetic data
in Appendix). The ionization and metal adsorption surface reactions, formulated in
terms of the single-site 2-pK ionization model (Lützenkirchen 2003), are
≡MeOH0 + H+ ⇔ ≡MeOH+
2,
≡MeOH0 ⇔ ≡MeO− + H+ ,
0
≡MeOH + Me
2+
(20)
(21)
+
⇔ ≡MeOMe + H
+
(22)
pH, total dissolved Me2+ concentration (Me), total adsorbed proton ([H+ ]Ads , data
set (1), and total adsorbed metal concentrations ([Me2+ ]Ads , data set (2) compose the
available data at each titration point and were used in the minimization of (11) for the
optimization of the intrinsic constants describing reactions (20)–(22). It is assumed
that Me2+ does not form ion pairs or complexes in aqueous solution (i.e., total Me2+
concentrations are equal to free Me2+ concentrations). The chemical scenario of this
gedanken experiment is similar to classical, competitive ion adsorption studies where
the pH and the aqueous adsorbate(s) concentration(s) are simultaneously monitored
at each experimental point.
Using a population of 100 chromosomes and after 500 generations, the GA successfully reproduces all the experimental data, as shown in Fig. 4. In contrast, the
simultaneous optimization of two or three of the constants (reaction (20), (21), or
(22)) and the concentration of available primary surface sites (TOTsite ) with a specified capacitance (GA-optimized value) with FITEQL v. 3.2 resulted in convergence
problems and no output. When only the constant describing reaction (20) was optimized, while the constants for reactions (21) and (22) were fixed at the GA-optimized
values, FITEQL converged but gave very poor fits to the [Me2+ ]Ads , and [H+ ]Ads
data, with a WSOS/DOF value of 19.5. Similarly, when only one intrinsic constant
was estimated with FITEQL using the GA-optimized TOTsite value and after manual
trial-and-error adjustment of the other two constants and the capacitance, FITEQL
converged (WSOS/DOF = 2.8), and the fit to data set 1 largely improved (although
the visual goodness-of-fit was still inferior to that of the GA) but to the detriment of
the quality of the fit to data set 2 (Fig. 4). This reveals that, in some cases, successful
convergence of FITEQL does not necessarily return good fits to all the experimental
data. In turn, this raises questions about the validity of the recommended range of
WSOS/DOF values (0.1 to 20) for the evaluation of the goodness of fit, a parameter that strongly depends on the input error estimates used in its calculation (Westall
116
Math Geosci (2010) 42: 101–127
Fig. 4 Data fits of proton (upper panel) and metal (lower panel) adsorption data (Gedanken acidimetric
titration of MeX(s) in a 0.001 M NaCl solution) obtained with ionization and adsorption constants returned
by the GA and FITEQL: (A) WSOS/DOF = 19.5 and (B) WSOS/DOF = 2.8 (see text for details)
Math Geosci (2010) 42: 101–127
117
Fig. 5 Illustration of an 8-bit binary-string 3D matrix encoding a set of solutions for the “dummy” components (see text for details). The selected group of chromosomes that best minimizes the objective function
at all titration points represents the “elite” of the population and, therefore, is preserved in the next generation
1982). It appears that FITEQL converged towards a local minimum rather than to a
global one, and the relative contribution of reactions (20) and (21) (ionization reactions) versus (22) (divalent metal adsorption reaction) to the proton adsorption behavior could not be resolved accurately upon fitting of data sets 1 and 2. Conversely, the
GA optimization gradually minimizes, for both sets of data, the difference between
the experimental and simulated data, and, thus, the evolved generation retains critical
information for searching the global minimum. Given an adequate conceptual SCM
and sufficient high-quality data, the GA will find the combination of model parameters that best minimizes (11) even in cases that are difficult to tackle by deterministic
approaches.
5.3 Simultaneous Estimation of Intrinsic Ionization and Adsorption Constants:
Triple Layer Model
It is often necessary to adopt an electrostatic model other than the CCM to describe
the spatial distribution of charged surface species at the mineral–water interface
(Lützenkirchen 2003). For multilayer electrostatic models, such as the Triple Layer
Model (TLM, see Yates et al. 1974; Davis et al. 1978), a specific potential must be
defined at each interfacial plane (surface and Stern planes and diffuse layer). In this
case, the GA must find a suitable set of K int , C, and TOTsite values, intensive variables that in combination with appropriate potentials, extensive variables (Herbelin
and Westall 1996), minimize (11) while satisfying (13). Since the surface charge displayed by the mineral varies according to the chemical conditions of the system (i.e.,
pH, chemical speciation), individual potentials must be adjusted at each titration or
experimental data point, and for each electrostatic plane. Given the large number of
118
Math Geosci (2010) 42: 101–127
adjustable potentials, some modifications were made to the optimization approach
described earlier.
Sets of potential solutions for these extensive variables are encoded in a 3D binarystring matrix (Matrix-A) whose dimensions are determined by: (i) the selected population size, (ii) the number of available data points, and (iii) the number of interfacial
electrostatic planes (dummy components), excluding the diffuse region, required by
the selected multilayer electrostatic model (Westall and Hohl 1980). To define the
size of the chromosomes in Matrix-A and the solution space, realistic boundary values of surface potential must be chosen for each electrostatic plane. For this purpose,
zeta-potentials obtained from electrokinetic measurements are useful for the selection of these constraints (Dzombak and Morel 1990). The structure of a generalized
3D matrix, including the dummy components, is illustrated in Fig. 5.
Another matrix (Matrix-B) encodes the solutions for the intensive variables. Its
dimensions are dictated by the selected population size and the number of adjustable
intensive variables. To improve the effectiveness of the evolutionary process, MatrixA and Matrix-B were dimensioned and manipulated separately throughout the GA
optimization. In other words, genetic operations are carried out independently for
each matrix, maintaining a good numerical diversity in both matrices and facilitating
the search for the best solution to the optimization problem. Whereas the evaluation
of the chromosomes in Matrix-B is subject to (11), the evaluation of chromosomes in
Matrix-A is based on fulfillment of (13) by minimization of
Yσ =
p
e
calc
2
σx − σxelect ,
1
(23)
1
where the total sum of residuals (Yσ ) corresponds to the difference between
the charge calculated from the speciation predicted at plane “x” (σxcalc ) and the
charge computed from electrostatics for plane “x” using appropriate surface-chargepotential relationships (σxelect ). For the TLM, these are (Davis and Kent 1990)
σ0 = C1 (ψ0 − ψβ ),
(24)
σβ = C2 (ψβ − ψd ) − σ0 ,
√
σd = −0.1174 I sinh(ZF ψd /2RT ),
(25)
(26)
where, as before, σ and ψ represent, respectively, the charge and potential at each
specific interfacial region: (i) the surface (plane 0), (ii) the plane where outer-sphere
complexes are located (plane β), and (iii) the diffuse layer (plane d). C1 and C2
are the integral capacitance values of the interfacial layers, and F /RT describes the
Boltzmann factor for charged molecules and ions (38.92 V−1 ). Because the TLM
makes no explicit provision for ion adsorption at the diffuse layer, the calculation of
the diffuse layer charge, σd , is obtained from the charge neutrality equation (Davis
and Kent 1990)
σd = −σ0calc − σβcalc
(27)
and hence, the ψd can be obtained directly from (26). In other words, only ψ0 and
ψβ are subjected to optimization via (23), and, thus, only two dummy components
Math Geosci (2010) 42: 101–127
119
Fig. 6 Evaluation strategy to select best chromosomes encoding the values of extensive (Matrix-A) and
intensive (Matrix-B) variables for multilayer electrostatic models
are required in Matrix A. Additional dummy components would be required by more
sophisticated multilayer electrostatic models such as the Four-Layer Model (Charmas
1999).
One advantage of this treatment is that the best potential computed at each iteration, the one that best minimizes (23), can be tested against inequality constraints
defined by physically realistic boundary values for each electrostatic plane (Dzombak
and Morel 1990; Davis and Kent 1990). This operation serves to determine whether
the calculated potentials are realistic and, thus, can be used in the next iteration; otherwise the old value is retained for evaluation in the next generation. The performance
of the GA is improved by interrupting the optimization of the extensive variables
once the above-mentioned constraints are achieved. In other words, hereafter, only
Matrix-B will be subject to evolution, via (11) using, in the first subsequent iteration,
the realistic set of potential values that best minimizes (23) but using the potentials
computed from (24)–(27) at each subsequent iteration. The strategy employed for
each matrix is schematically represented in Fig. 6.
To further improve the performance (i.e., speed and computational requirements)
of the GA optimization in this type of application, an additional implementation must
be made. Given a chromosome population ≥100 and after a sufficient number of generations (i.e., n > 100), the GA optimization should have reached a suitable level of
numerical viability and can be interrupted even if the data fit is inadequate. The optimized intensive variables are best values and can be used to quickly verify whether
they can successfully simulate the data upon solution of the adsorption equilibrium
forward problem upon adjustment of extensive variables by a derivative-based nu-
120
Math Geosci (2010) 42: 101–127
Fig. 7 Data fits of proton adsorption data (goethite in a 0.01 M NaCl solution) obtained with intrinsic
constants returned by the GA-NR optimization and FITEQL within the framework of the TLM
merical technique. The implementation of the Newton–Raphson (NR) technique for
these types of problems is straightforward and was explained in detail elsewhere
(e.g., Papelis et al. 1988). In other words, evolutionary optimization can be switched
to a subroutine that calculates the Jacobians corresponding to all chemical component(s) and electrostatic term(s). Once the Jacobians are obtained (Westall 1979),
the matrix is solved by the Gauss-elimination procedure (Nicholson 1995), and the
iterative procedure carries on until the residual meets a prefixed tolerance level (Sahai and Sverjensky 1998). This strategy exploits the advantages of the GA to optimize the intensive variables (with preadjusted extensive variables), which, in combination with deterministically derived potentials, can successfully simulate titration
data. The Newton–Raphson technique is used to fine-tune the estimation of the potentials, ensure minimization of (23) at the end of the iterative process, and reduce
the computational time. If required, additional GA-NR micro-iterations can be implemented to improve the quality of the fit and further refine the intensive variable
estimates.
This hybrid GA-NR optimization technique was successfully tested against published surface titration data for goethite in 0.01 M NaCl solutions (Villalobos and
Leckie 2001). The authors originally modeled their data using the TLM (Fig. 7).
In this formalism, seven intensive variables must be adjusted: two ionization constants (Ka1 and Ka2 , describing reactions (14) and (15)), two background electrolyte
int
int ) describing, respectively, the following equiadsorption constants (Kcation
and Kanion
libria
Math Geosci (2010) 42: 101–127
≡FeOH0 + Na+ ⇔ ≡ FeO− · · · Na+ + H+ ,
−
≡FeOH0 + Cl− + H+ ⇔ ≡FeOH+
2 · · · Cl
121
(28)
(29)
two capacitance values (C1 and C2 ), and the total concentration of primary surface
sites (TOTsite ).
Because FITEQL convergence is not guaranteed when several parameters are adjusted, particularly when interdependent reactions (reactions (14), (15), (28), and (29)
are pH-dependent) are fitted simultaneously (Hayes et al. 1991), some criteria are
commonly applied to preselect either the ionization or electrolyte adsorption constants before proceeding with the optimization of all other fitting parameters. Accordingly, Villalobos and Leckie (2001) selected fixed values of Ka1 and Ka2 (in addition
int
int
to preselected values of C1 , C2 , and TOTsite ) and optimized Kcation
and Kanion
to fit
their data (although C1 was varied manually at each FITEQL run). In contrast, the
hybrid GA-NR technique allows for the simultaneous optimization of all seven intensive variables invoked by the TLM and the successful simulation of titration data, as
shown in Fig. 7.
5.4 Adequacy of GA-based Exhaustive Sampling and Model Evaluation
It has been stressed that an a posteriori analysis of nonuniqueness and uncertainty
of solutions to inverse problems (resolution analysis) is necessary to assess scientific conclusions based on inverse modeling (Mosegaard 1998). Whereas this is an
important step for inverse problems involving a large number of ill-defined, poorly
constrained parameters, in general, it is not crucial for the small inverse problems
presented in our study since the selected SCM parameters are physically meaningful and reasonably well-constrained. Hence, unless the available data are insufficient
(fewer data points that unknowns) or unsuitable (high uncertainty) to quantitatively
resolve the contribution of each adsorption reaction, exhaustive sampling techniques
such as GA optimization can be safely applied to problems of this size (Mosegaard
1998).
To illustrate, the reliability of GAs in exhaustive sampling applications, and in
analogy to the procedures applied in earlier studies (e.g., Fernández Alvarez et al.
2008; Armstrong and Hibbert 2009; Hibbert and Armstrong 2009), we evaluated the
behavior of the sample statistics derived from numerous GA optimizations of the inverse problem presented in Sect. 5.1 and compared it with the theoretical (Gaussian)
posterior probability distribution function (pdf, Armstrong and Hibbert 2009) using
experimental uncertainties typically associated with adsorption data (Hayes et al.
1991). In Fig. 8, we show how the GA visited solutions reasonably match the expected posterior pdf for one key SCM parameter, pK = | log10 Ka2 + log10 Ka1 |.
Most noteworthy is that the highest frequencies of the GA visited solutions are centered along the Gaussian curve, slightly exceeding the posterior values, whereas the
lowest frequencies are approximately evenly distributed at both sides of the maximum value. This exercise demonstrates that, given a sound hypothesis (set of adsorption reactions and appropriate electrostatic model), the binary tournament operator in combination with the crossover and mutation probabilities selected in our
GA optimizations consistently converge towards the solution within a restricted tolerance.
122
Math Geosci (2010) 42: 101–127
Fig. 8 Histogram of GA visited solutions (crossover and mutation probabilities of 0.1) and posterior
probability distribution function of the pK parameter (pK = | log10 Ka2 + log10 Ka1 |), given a priori
information (I), describing proton adsorption data of goethite in a 0.7 M NaCl solution in terms of reactions
(14) and (15)
For more complex adsorption equilibrium inverse problems that may be equivalently solved by several SCM models (i.e., multiple hypotheses) and for which a
priori information on the physical viability of the selected adsorption reactions and
the values of their associated intrinsic formation constants is unavailable, a sensitivity
analysis should be conducted to choose the best solution among the possible options
(several SCMs fitting the data reasonably well) and prevent over-interpretation of the
data (e.g., Poeter et al. 2005). This is typically done by gradually incorporating adsorption reactions to the simplest physically meaningful SCM and by testing its ability to reproduce the experimental data. We performed this analysis in a recent study
(Villegas-Jiménez et al. 2009), where, based upon fundamental chemistry concepts,
several SCMs were formulated and alternatively calibrated via GA optimization using relatively large experimental data sets for two rhombohedral carbonate minerals:
magnesite and dolomite (Pokrovsky et al. 1999a, 1999b). Whereas more than one
SCM fitted the data within the same tolerance (as confirmed by ANOVA tests), only
one reasonably complied with all other a posteriori criteria (theoretical constraints
and alternate experimental data) imposed for model validation. As emphasized earlier, we favor model parsimony.
Math Geosci (2010) 42: 101–127
123
6 Concluding Remarks
A powerful evolutionary optimization technique, the genetic algorithm, was applied
to the calibration of SCMs for mineral surfaces. This technique was successfully
tested for the inverse modeling of adsorption equilibria under several scenarios of
varying degrees of complexity using published and synthetic data.
The GA performs an exhaustive probabilistic search in a broad solution space that
is constrained by physically realistic values selected by the modeler. Given suitable
combinations of geochemical (set of adsorption reactions) and electrostatic (charge
distribution at the adsorbent–water interface) models and sufficient quality experimental data, the GA can successfully optimize numerous parameters without incurring convergence problems while achieving good numerical stability, a notable advantage over conventional deterministic, root-finding, and optimization techniques
implemented in popular inverse modeling geochemical codes such as FITEQL. At
the modeler’s discretion, multiple intrinsic ionization and adsorption constants, capacitance(s), and/or primary surface site concentrations can be simultaneously fitted
by the GA. Nevertheless, an a posteriori theoretical evaluation of the fitted SCM parameters (based on physical–chemical concepts and alternate experimental evidence)
must be made to ascertain their thermodynamic meaningfulness and confirm their
relevance within the SCM. This aspect is key for the construct of sound, yet parsimonious, SCMs.
The drawback of the GA approach is the large number of iterations that must be
performed to obtain a satisfactory solution, particularly for cases involving numerous fitting parameters. Alternatively, the power of the GA can be more efficiently
exploited when a deterministic technique such as Newton–Raphson is incorporated
once the chromosome population has reached a suitable level of numerical viability. Consequently, we propose the use of a hybrid GA-NR approach for the efficient
optimization of intrinsic constants in complex problems such as those involving the
TLM.
We believe that the computational requirements of the GA to the calibration of
SCMs are greatly outweighed by its robustness, simplicity, and potential to generate
a reliable thermodynamic database describing adsorption reactions. This is a critical
step to making reliable predictions of the adsorption behavior of minerals and geologically based adsorbents in hydro-geological settings (e.g., aquifers, pore waters,
water basins) and/or in engineered reactors (e.g., mining and wastewater treatment industries). Future work to test and develop more sophisticated adaptive strategies and
hybrid heuristic–deterministic optimization approaches may improve the GA performance in this type of applications.
Acknowledgements A.V.-J. thanks Dr. David Burns for critical discussions that inspired this investigation and EMEA S.C. Environmental Consulting Firm for providing suitable computer facilities to complete
this work. The insightful comments of two anonymous reviewers and the critical remarks of Dr. Johannes
Lützenkirchen have improved the quality of this paper. This research was supported by a graduate research student grant to A.V.-J. from the Geological Society of America (GSA) and Natural Sciences and
Engineering Research Council of Canada (NSERC) Discovery grants to AM. A.V.-J. benefited from postgraduate scholarships from the Consejo Nacional de Ciencia y Tecnología (CONACyT) of Mexico and
additional financial support from the Department of Earth and Planetary Sciences, McGill University and
Consorcio Mexicano Flotus-Nanuk.
124
Math Geosci (2010) 42: 101–127
Appendix: Experimental Data Used in GA Optimizations
(Gedanken Experiment)
The synthetic data used for the optimization of intrinsic ionization and adsorption
constants in the gedanken experiment problem presented in our study are provided in
the Appendix.
Acidimetric Titration Data. Sorbent: MeX(s) ; Sorbates: H+ and Me2+ ; Solution: I = 0.001 NaCl; Site
Density: 2.2 · 10−6 moles m−2 ; Specific Surface Area: 0.232 m2 g−1 ; Mass/Volume Ratio: 48.4 g L−1
PH
[H+ ]ads
[Me2+ ]ads
Mesolution
mol L−1
10.17
10.14
10.11
10.08
10.06
10.02
9.98
9.94
9.89
9.85
9.79
9.73
9.67
9.59
9.51
9.40
9.27
9.10
8.86
8.62
8.48
−2.17E-05
−2.15E-05
−2.14E-05
−2.12E-05
−2.11E-05
−2.09E-05
−2.07E-05
−2.04E-05
−2.01E-05
6.74E-08
1.47E-07
8.25E-08
1.81E-07
1.02E-07
2.25E-07
1.23E-07
2.74E-07
1.42E-07
3.20E-07
1.77E-07
4.04E-07
2.22E-07
5.11E-07
2.81E-07
6.57E-07
3.55E-07
8.43E-07
4.47E-07
1.08E-06
−1.93E-05
5.66E-07
1.40E-06
7.25E-07
1.84E-06
−1.82E-05
9.17E-07
2.40E-06
1.15E-06
3.16E-06
−1.97E-05
−1.88E-05
−1.74E-05
−1.65E-05
−1.51E-05
−1.34E-05
−1.11E-05
−7.23E-06
−3.22E-06
1.44E-06
4.15E-06
1.77E-06
5.50E-06
2.10E-06
7.27E-06
2.34E-06
9.37E-06
2.30E-06
1.20E-05
1.97E-06
1.43E-05
1.68E-06
1.57E-05
1.35E-06
1.45E-06
1.69E-05
8.19
4.10E-06
1.11E-06
1.84E-05
8.07
6.17E-06
8.91E-07
1.96E-05
8.35
−6.82E-07
7.90
9.12E-06
6.21E-07
2.13E-05
7.77
1.13E-05
4.56E-07
2.30E-05
7.63
1.37E-05
3.18E-07
2.46E-05
7.49
1.61E-05
2.20E-07
2.65E-05
7.36
1.76E-05
1.40E-07
2.83E-05
7.23
1.92E-05
8.76E-08
3.02E-05
7.09
2.08E-05
4.49E-08
3.31 E-05
6.94
2.18E-05
2.53E-08
3.56E-05
Math Geosci (2010) 42: 101–127
125
6.80
2.25E-05
1.53E-08
6.60
2.32E-05
<1E-9
3.81 E-05
4.17E-05
6.48
2.36E-05
<1E-9
4.39E-05
6.30
2.41E-05
<1E-9
4.76E-05
6.17
2.44E-05
<1E-9
5.05E-05
5.99
2.48E-05
<1E-9
5.46E-05
5.85
2.50E-05
<1E-9
5.78E-05
5.68
2.52E-05
<1E-9
6.21 E-05
5.54
2.54E-05
<1E-9
6.59E-05
5.40
2.55E-05
<1E-9
6.96E-05
5.27
2.57E-05
<1E-9
7.33E-05
5.15
2.58E-05
<1E-9
7.71 E-05
5.03
2.59E-05
<1E-9
8.09E-05
References
Allison JD, Brown DS, Novo-Gradac KJ (1991) MINTEQA2/PRODEFA2, a geochemical assessment
model for environmental systems: version 3.0. User’s manual. Environmental Research Laboratory,
Office of Research and Development, US Environmental Protection Agency, Athens, GA, 106
Armstrong N, Hibbert DB (2009) An introduction to Bayesian methods for analyzing chemistry data Part 1:
An introduction to Bayesian theory and methods. Chemom Intell Lab Syst 97:194–210
Ball JW, Jenne EA, Norstrom DK (1981) WATEQ2—A computerized chemical model for trace and major
element speciation and mineral equilibrium of natural waters. In: Jenne EA (ed) Chemical modeling
in aqueous systems. Symposium series, vol 93. American Chemical Society, Washington, pp 815–
836
Charmas R (1999) Four-layer complexation model for ion adsorption at energetically heterogeneous metal
oxide/electrolyte interfaces. Langmuir 15(17):5635–5648
Davis JA, Kent DB (1990) Surface complexation modeling in aqueous geochemistry. In: Hochella MF,
White AF (eds) Mineral-water interface geochemistry. Rev mineral, vol 23. Mineral Soc, Washington, pp 177–260
Davis JA, James RO, Leckie JO (1978) Surface ionization and complexation at the oxide/water interface:
I. Computation of electrical double layer properties in simple electrolytes. J Colloid Interface Sci
63(3):480–499
Dzombak DA, Morel FMM (1990) Surface complexation modeling. Wiley, New York
Epperson JF (2002) An introduction to numerical methods and analysis. Wiley, New York
Essaid HI, Cozzarelli IM, Eganhouse RP, Herkelrath WN, Bekins BA, Delin GN (2003) Inverse modeling
of BTEX dissolution and biodegradation at the Bemidji, MN crude-oil spill site. J Contam Hydrol
67(1):269–299
Fernández Alvarez JP, Fernández Martínez JL, Menéndez Pérez CO (2008) Feasibility analysis of the
use of binary genetic algorithms as importance samplers application to 1-D DC resistivity inverse
problem. Math Geosci 40:375–408
Gans P (1976) Numerical methods for data-fitting problems. Coord Chem Rev 19:99–124
Gao Y, Mucci A (2001) Acid base reactions, phosphate and arsenate complexation, and their competitive
adsorption at the surface of goethite in 0.7 M NaCl solution. Geochim Cosmochim Acta 65:2361–
2378
Gen M, Cheng R (1997) Genetic algorithms and engineering design. Wiley, New York
Gen M, Cheng R (2000) Genetic algorithms and engineering optimization. Wiley, New York
Goldberg D (1989) Genetic algorithms in search, optimization and machine learning. Addison-Wesley,
Reading
Goldberg D, Korb K, Deb K (1989) Messy genetic algorithms: Motivation, analysis and first results. Complex Syst 3:493–530
126
Math Geosci (2010) 42: 101–127
Hayes KF, Redden G, Ela W, Leckie JO (1991) Surface complexation models: An evaluation of model
parameter estimation using FITEQL and oxide mineral titration. J Colloid Interface Sci 142(2):448–
469
Herbelin A, Westall J (1996) FITEQL—A computer program for determination of chemical equilibrium
constants from experimental data version 3.2: User’s manual. Department of Chemistry, Oregon State
University, Corvallis, OR, Report 96-01
Hibbert DB, Armstrong N (2009) An introduction to Bayesian methods for analyzing chemistry data.
Part II: A review of applications of Bayesian methods in chemistry. Chemom Intell Lab Syst 97:211–
220
Hiemstra T, Yong H, van Riemsdijk WH (1999) Interfacial charging phenomena of aluminum
(hydr)oxides. Langmuir 15(18):5942–5955
Hohl H, Stumm W (1976) Interaction of Pb2+ with hydrous γ -Al2 O3 . J Colloid Interface Sci 55:281–288
Holland J (1975) Adaptation in natural and artificial systems. University of Michigan Press, Ann Arbor
Ingri N, Kakolowicz W, Sillén LG, Warnqvist B (1967) High-speed computers as a supplement to graphical methods-V* HALTAFALL, A general program for calculating the composition of equilibrium
mixtures. Talanta 14:1261–1286
Jakeman AJ, Letcher RA, Norton JP (2006) Ten iterative steps in development and evaluation of environmental models. Environ Model Softw 21:602–614
Keizer MG, van Riemsdijk WH (1999) ECOSAT. Technical Report of the departments of Soil Science and
Plant Nutrition, Wageningen University, Wageningen, The Netherlands
Keller B, Lutz R (1997) A new crossover operator for rapid function optimization using a genetic algorithm. In: Proceedings of the eighth Ireland conference on artificial intelligence (AI-97), vol 2,
pp 48–57
Kinniburgh DG (1999) FIT user guide. British Geological Survey, Keyworth, WD/93/23
Langmuir D (1997) Aqueous environmental geochemistry. Prentice Hall, Upper Saddle River
Lützenkirchen J (2003) Surface complexation models for adsorption: A critical survey in the context of
experimental data. In: Tóth J (ed) Adsorption: Theory, modeling, and analysis. Surfactant science
series, vol 107. Dekker, New York, pp 631–710
Malinverno A (2000) A Bayesian criterion for simplicity in inverse problem parametrization. Geophys J
Int 140:267–285
Matott LS, Rabideau AJ (2008) ISOFIT—A program for fitting sorption isotherms to experimental data.
Environ Model Softw 23:670–676
Mestres J, Scuseira GE (1995) Genetic algorithms, a robust scheme for geometry optimizations and global
minimum structure problems. J Comput Chem 16(6):729–742
Michalewicz Z (1996) Genetic algorithms + data structure = evolution programs, 2nd edn. Springer, New
York
Morel FMM, Hering JG (1993) Principles and applications of aquatic chemistry. Wiley, New York
Morel FMM, Morgan J (1972) A numerical method for computing equilibria in aqueous chemical systems.
Environ Sci Technol 6:58–87
Mosegaard K (1998) Resolution analysis of general inverse problems through inverse Monte Carlo sampling. Inverse Probl 14:405–426
Mosegaard K, Sambridge M (2002) Monte Carlo analysis of inverse problems. Inverse Probl 18:29–54
Nicholson K (1995) Linear algebra with applications. PWS, Boston
NIST (1998) Critically selected stability constants of metal complexes electronic database. National Institute of Standards and Technology, reference database 46, US Department of Commerce, Gaithersburg, MD, USA
Papelis C, Hayes KF, Leckie JO (1988) HYDRAQL: A program for the computation of chemical equilibrium, composition of aqueous batch systems including surface-complexation modeling of ion adsorption at the oxide/solution interface. Report No 306, Stanford University, Stanford, CA
Parkhurst DL (1995) User’s guide to PHREEQC. US Geological Survey, Water Resources Investigations
Report 95-4277, 212
Poeter EP, Hill MC, Banta ER, Mehl S, Christensen S (2005) UCODE 2005 and six other computer codes
for universal sensitivity analysis, calibration, and uncertainty evaluation. US Geological Survey Techniques and Methods 6-A11
Pokrovsky OS, Schott J, Thomas F (1999a) Processes at the magnesium-bearing carbonates/solution interface. I. A surface speciation model for magnesite. Geochim Cosmochim Acta 63(6):863–880
Pokrovsky OS, Schott J, Thomas F (1999b) Dolomite surface speciation and reactivity in aquatic systems.
Geochim Cosmochim Acta 63(19/20):3133–3143
Math Geosci (2010) 42: 101–127
127
Sahai N, Sverjensky DA (1998) GEOSURF: A computer program for modeling adsorption on mineral
surfaces from aqueous solution. Comput Geosci 24(9):853–873
Sait SM, Youssef H (1999) Iterative computer algorithms with applications in engineering solving combinatorial optimization problems. IEEE Computer Society, Los Alamitos
Schecher WD, McAvoy DC (1992) MINEQL+: A software environment for chemical equilibrium modeling. Comput Environ Urban Syst 16:65–76
Schindler PW, Kamber HR (1968) Die acidität von silanolgruppen. Helv Chim Acta 51:1781–1786
Sposito G, Coves J (1988) SOILCHEM: A computer program for the calculation of chemical speciation
in soils. Keamey Found Soil Sci, Univ California, Riverside
Stumm W, Morgan J (1996) Aquatic chemistry: Chemical equilibria and rates in natural waters. Wiley,
New York
Tipping E (1994) WHAM—A chemical equilibrium model and computer code for water, sediments and
soils incorporating a discrete site/electrostatic model of ion-binding by humic substances. Comput
Geosci 20:973–1023
Turner BF, Fein JB (2006) Protofit: A program for determining surface protonation constants from titration
data. Comput Geosci 32:1344–1356
van der Lee J, de Windt L (1999) CHESS tutorial and cookbook. Technical report No LHM/RD/99/05,
École des Mines de Paris, Fontainebleu, France
Villalobos M, Leckie JO (2001) Surface complexation modeling and FTIR study of carbonate adsorption
to goethite. J Colloid Interface Sci 235:15–32
Villegas-Jiménez A, Mucci A, Pokrovsky OS, Schott J (2009) Defining reactive sites on hydrated mineral
surfaces: Rhombohedral carbonate minerals. Geochim Cosmochim Acta 73:4326–4345
Westall JC (1979) MICROQL II: Computation of adsorption equilibria in BASIC. Technical Report, Swiss
Federal Institute of Technology, EAWAG, 8600 Dübendorf, Switzerland
Westall JC (1982) FITEQL: A computer program for determination of chemical equilibrium constants
from experimental data. Version 1.2. Report 82-01, Department of Chemistry, Oregon State University, Corvallis, OR, USA
Westall J, Hohl H (1980) A comparison of electrostatic models for the oxide/solution interface. Adv Colloid Interface Sci 12:265–294
Wolery TJ (1992) EQ3NR: A computer program for geochemical aqueous speciation-solubility calculations: Theoretical manual, user’s guide and related documentation (Version 7.0). Report No UCRLMA-110662-PT-III, Lawrence Livermore National Laboratory, Livermore, CA
Yates DE, Levine S, Healy TW (1974) Site-binding model of the electrical double layer at the oxide-water
interface. J Chem Soc Faraday Trans 1 70:1807–1818
Zeleznik FJ, Gordon S (1968) Calculation of complex chemical equilibria. Ind Eng Chem 60:27–57