[go: up one dir, main page]

Academia.eduAcademia.edu
Math Geosci (2010) 42: 101–127 DOI 10.1007/s11004-009-9259-8 Estimating Intrinsic Formation Constants of Mineral Surface Species Using a Genetic Algorithm Adrián Villegas-Jiménez · Alfonso Mucci Received: 1 April 2007 / Accepted: 21 November 2009 / Published online: 10 December 2009 © International Association for Mathematical Geosciences 2009 Abstract The application of a powerful evolutionary optimization technique for the estimation of intrinsic formation constants describing geologically relevant adsorption reactions at mineral surfaces is introduced. We illustrate the optimization power of a simple Genetic Algorithm (GA) for forward (aqueous chemical speciation calculations) and inverse (calibration of Surface Complexation Models, SCMs) modeling problems of varying degrees of complexity, including problems where conventional deterministic derivative-based root-finding techniques such as Newton–Raphson, implemented in popular programs such as FITEQL, fail to converge or yield poor data fits upon convergence. Subject to sound a priori physical–chemical constraints, adequate solution encoding schemes, and simple GA operators, the GA conducts an exhaustive probabilistic search in a broad solution space and finds a suitable solution regardless of the input values and without requiring sophisticated GA implementations (e.g., advanced GA operators, parallel genetic programming). The drawback of the GA approach is the large number of iterations that must be performed to obtain a satisfactory solution. Nevertheless, for computationally demanding problems, the efficiency of the optimization can be greatly improved by combining heuristic GA optimization with the Newton–Raphson approach to exploit the power of deterministic techniques after the evolutionary-driven set of potential solutions has reached a suitable level of numerical viability. Despite the computational requirements of the GA, its robustness, flexibility, and simplicity make it a very powerful, alternative tool for the calibration of SCMs, a critical step in the generation of a reliable thermodynamic database describing adsorption equilibria. The latter is fundamental to the forward modeling of the adsorption behavior of minerals and geologically based adsorbents in hydro- A. Villegas-Jiménez () · A. Mucci Earth and Planetary Sciences, McGill University, 3450 University Street, Montréal, QC H3A 2A7 Canada e-mail: adriano@eps.mcgill.ca 102 Math Geosci (2010) 42: 101–127 geological settings (e.g., aquifers, pore waters, water basins) and/or in engineered reactors (e.g., mining, hazardous waste disposal industries). Keywords Evolutionary programming · Heuristic optimization · Inverse modeling · Surface complexation modeling and calibration · Adsorption equilibria 1 Introduction The quantitative characterization of the sorptive properties of minerals and geologically based adsorbents (geo-adsorbents) is key to the understanding of natural geochemical processes (e.g., solute mobility/sequestration) in hydro-geological settings (e.g., aquifers, pore waters, water basins) and to the optimization of multiple engineering processes (e.g., mining, hazardous waste disposal) and wastewater treatment technologies. Among the approaches devised for the quantitative description of adsorption equilibria (e.g., isotherm equations, partition coefficients), Surface Complexation Models (SCMs) represent, at present, the most geochemically sound and powerful theoretical framework for the prediction of adsorption equilibria. Detailed descriptions of SCMs can be found in most modern aquatic chemistry/geochemistry textbooks (e.g., Morel and Hering 1993; Stumm and Morgan 1996; Langmuir 1997). In the last few decades, a number of computer programs have been developed and successfully validated to perform SCM-based routine calculations of adsorption equilibria in heterogeneous systems involving aqueous and adsorbent phases (forward modeling): MICROQL II (Westall 1979); WATEQ (Ball et al. 1981); HYDRAQL (Papelis et al. 1988); SOILCHEM (Sposito and Coves 1988); MINTEQA2/PRODEFA2 (Allison et al. 1991); MINEQL+ (Schecher and McAvoy 1992); EQ3NR (Wolery 1992); WHAM (Tipping 1994); PHREEQC (Parkhurst 1995); GEOSURF (Sahai and Sverjensky 1998); CHESS (van der Lee and de Windt 1999), and ECOSAT (Keizer and van Riemsdijk 1999). These computer codes were preceded by the program HALTAFALL, developed by a group of Swedish chemists (Ingri et al. 1967). The latter may have been the first computer code devised for the computation of the equilibrium concentrations of chemical species in a fluid (liquid or gaseous) phase in the presence of one or several solid phases and, hence, laid the foundations of all modern geochemical equilibrium codes. In general, the solution to adsorption equilibrium problems can be achieved by two equivalent approaches: (i) the Gibbs Free Energy Minimization of the system (GEM) and (ii) the application of Laws of Mass Action (LMA) and mass balance constraints where chemical species concentrations are relaxed until solution of the derived set of nonlinear equations (see Zeleznik and Gordon 1968, for a review of both methods). This latter approach was exploited by Morel and Morgan (1972) nearly four decades ago to develop a derivative-based iterative numerical procedure for the solution of aqueous chemical speciation in homogeneous and heterogeneous systems (forward problem) that was later extended to the computation of adsorption equilibria (Westall 1979). This method typically computes a correct and unique solution, provided that the geochemical equilibrium problem is mathematically defined in terms of adequate chemical components (Tableau approach, see Morel and Hering 1993) and the set Math Geosci (2010) 42: 101–127 103 of values initializing the iterative procedure is wisely chosen to prevent convergence problems (Westall 1979). Most modern forward adsorption modeling computer codes are based upon the LMA-Tableau approach. Similarly, computer codes may be adapted for the calibration of Surface Complexation Models (SCMs) that typically involve the estimation of SCM parameters such as intrinsic formation constant(s) [K int’s ], capacitance(s) [C], and/or total concentrations of primary surface sites [TOTsite ] from experimental sorption (i.e., adsorption/desorption) data (inverse modeling). For example, FITEQL (Herbelin and Westall 1996) is a derivative-based nonlinear least squares optimization program (also based upon the LMA-Tableau approach) that is commonly used to obtain best estimates of intrinsic formation constants of mineral surface species using data from batch or titration adsorption experiments. As most modern chemical and geochemical equilibrium programs, FITEQL shows few convergence problems, provided that the number of adjustable parameters is not particularly large (especially when an extensive data set is not available) and the adjustable SCM parameters are not strongly correlated (Herbelin and Westall 1996). Other inverse modeling codes with similar applications either make use of deterministic root-finding approaches (ECOSAT-FIT, Kinniburgh 1999), heuristic direct search minimization techniques (Turner and Fein 2006), or hybrid optimization schemes that combine heuristic Particle Swarm Optimization with deterministic Levenberg–Marquardt nonlinear regression (ISOFIT, Matott and Rabideau 2008). The latter two were devised for specific inverse modeling applications: Protofit estimates proton adsorption/desorption intrinsic (surface complexation) constants using a proton buffering function, whereas ISOFIT fits conditional constants to adsorption isotherms, in contrast to FITEQL and ECOSAT-FIT that can extract intrinsic constants involving any type of sorbate(s) (in addition to protons) within the framework of surface complexation theory. Recently, the nonlinear regression computer code UCODE-2005, typically used for the calibration of water flow and contaminant transport models (Poeter et al. 2005), was coupled with PHREEQC and ECOSAT forward modeling codes and used in inverse problem applications (i.e., calibration of SCMs). Despite the usefulness of the aforementioned computer codes, it is well known that derivative-based and simple hill-climbing numerical techniques are local in scope and are often plagued by numerical instability and convergence problems, particularly for nondifferentiable, discontinuous, and underdetermined (i.e., more unknowns than data points) functions. Furthermore, nonlinear regression techniques are susceptible to excessive parameter correlation (Essaid et al. 2003). Consequently, these techniques may provide solutions close to the initial “guess” values, possibly a local well from which the solver may not be able to emerge, rather than the best solution, or they may not converge at all. Accordingly, an alternative tool that can circumvent or minimize these limitations and provide a higher flexibility in the optimization of multiple SCM parameters (including multisorbate adsorption) would be desirable. Genetic Algorithms (GAs) are efficient and robust heuristic, evolutionary, exhaustive sampling techniques that have been successfully used in a wide range of applications (e.g., Holland 1975; Goldberg 1989; Mestres and Scuseira 1995; Michalewicz 1996; Gen and Cheng 1997, 2000; Sait and Youssef 1999). Nowadays, GAs and Simulated Annealing are the preferred stochastic optimization algorithms 104 Math Geosci (2010) 42: 101–127 (Mosegaard and Sambridge 2002) and are particularly reliable for small inverse problems (Mosegaard 1998). GA optimizations are performed using probabilistic rather than deterministic rules and, thus, are especially well-suited for ill-conditioned, nonsmooth, discontinuous problems (Fernández Alvarez et al. 2008) and perform well irrespective of the number of data points or the error associated with the data. This contrasts with other conventional root-finding methods such as Newton–Raphson and Quasi-Newton that require: (i) calculation of the local gradient, (ii) a reasonably wellbehaved (smooth) objective function with reasonably separated roots, and (iii) extensive data sets (Gans 1976; Epperson 2002). In this paper, we examine the application of a simple GA to the solution of adsorption equilibrium inverse problems of varying degrees of complexity. We first verify the ability of GAs to solve several forward aqueous speciation problems subjected to identical thermodynamic, mass, and charge balance constraints to those of the inverse problems. We then test the performance of the GA on several inverse problems requiring the optimization of multiple SCM parameters and compare the results against those returned by FITEQL, the most frequently used inverse modeling speciation code for SCM calibration. 2 Definition of the Adsorption Equilibrium Problem Relevant definitions and elaborated explanations on adsorption equilibrium problems in aqueous solutions can be found in the textbooks of Dzombak and Morel (1990), Morel and Hering (1993), and Stumm and Morgan (1996). The generalized relationship defining the adsorption equilibrium inverse problem is given by (Malinverno 2000) d = Gm + e, (1) m = Aθ , (2) where d stands for the vector (single sorbate) or matrix (D, for multiple sorbates) of experimental adsorption data (adsorption densities), m is the vector (for a single chemical component) or matrix (M, for multiple chemical components) of measured data (free concentrations of chemical components in aqueous solution), e is the vector or matrix (E, for multiple chemical components) of measurement errors, θ is the vector of unknown SCM parameters (e.g., K int’s , C’s, TOTsite ), G is the forward modeling mass balance matrix that relates d (or D) and m (or M), and A is the mass action matrix relating θ and m (or M). In this study, we focus on the solution of inverse adsorption equilibrium problems (computation of elements of θ ) via GAbased exhaustive sampling, but, as a preliminary step to evaluate the performance of this technique in the optimization of variables or parameters varying over several orders of magnitude, we also address the solution of the aqueous equilibrium forward problem which is defined by t = H s + e, (3) s = Bω, (4) Math Geosci (2010) 42: 101–127 105 where H is the forward modeling mass balance matrix relating t (vector of total concentrations of chemical components in aqueous solution) and s (vector of concentrations of free chemical components in aqueous solution), and B is the mass action law matrix relating ω (vector of thermodynamic formation constants of aqueous species) and s. Given known values of ω and t, the elements of vector s (unknowns) can be calculated via root-finding (e.g., Newton–Raphson) or optimization (e.g., GA) techniques. 3 Implementation of the Genetic Algorithm Matlab© software (MathWorks, Inc.) was used to write the subroutines in which we incorporated a modified version of the GA originally written by Ron Shaffer from the Chemometrics Research Group of the Naval Research Laboratory (USA). Six subroutines are required: (i) EQUIL reads the input file containing all the information relevant to the definition of the sorption equilibrium problem, (ii) FITGEN defines the GA parameters, performs the binary-string encoding (see below), and initializes iterations, (iii) FITLOG decodes the potential set of solutions, performs all calculations defining the objective function, and computes the weighted squared residuals corresponding to the mass and/or charge balance equations (see later sections); finally, three additional subroutines, (iv) EVAL_GA, (v) MUT_GA, and (vi) XOVER_GA, contain the genetic operators, selection, mutation, and crossover, required by the evolutionary process. The optimization of the unknown quantities (s for the forward problem and θ for the inverse problem) is based on global minimization of selected objective functions (see below). Each unknown was encoded as a binary string within a section of the solution chromosome. The length of each section (the number of bits nj ) is proportional to the search domain and constrained to reasonable boundary values (i.e., maximum and minimum for each parameter). The length of each section is given by nj = log2 (Vj ), (5) where Vj stands for the boundary value that requires the maximum number of bits for its encoding. For instance, in the forward problem, the maximum value assigned to the free chemical component concentrations would correspond to the total analytical molar concentration (i.e., free plus complexed species), whereas the minimum is assigned an arbitrary value of 10−50 M, grossly overestimating the degree of interaction with other chemical components in the system. Accordingly, the search domain would be defined by log2 (10−50 ), which, in binary representation, corresponds to 166 bits. To shorten the string length and, thus, save computational time while maintaining an acceptable numerical precision of the adjustable parameters, Vj values were expressed as 103 times their logarithmic values. Accordingly, the modified string length (nj -ext )   (6) nj -ext = log2 log10 (Vj ) · 103  corresponds to a chromosome section of 16 bits for each chemical component. This operation requires that the decoded values (extended logarithmic units) be divided by 106 Math Geosci (2010) 42: 101–127 a factor of 103 at each generation (iteration) to be consistent with units of the objective function described below. This simple encoding scheme substantially reduces round-off and truncation error during GA optimization while keeping the chromosome size practical for GA optimization. In addition, as recognized earlier (Fernández Alvarez et al. 2008), logarithmic parameterization linearizes the correlation structure among model parameters, improving the sampling efficiency by reducing the number of rejected moves in the algorithm. All fitting parameters involved in the solution of forward and inverse problems were encoded according to this scheme. Simple stochastic genetic operators (Gen and Cheng 2000) were used in all optimization problems presented in this study. To carry out chromosome selection, the binary tournament operator was used (Goldberg et al. 1989). Only the fittest chromosome from each generation was preserved to exploit its entire numerical genotype (encoded solution) in the next generation. This operation is called elitism and was used in all the equilibrium problems described in this paper. All other selected chromosomes participate in the crossover and mutation operations to generate a transient set of solutions to the problem. In this study, we tested four crossover operators: onecut-point, two-cut-point, uniform (Gen and Cheng 2000), and the randomized and/or crossover (RAOC, Keller and Lutz 1997). This operator is essential because the robustness of GA comes from its ability to transmit information and create, after a number of generations, better fitted individuals. Hence, the search for the best individual is not blind, as in random walk procedures, but guided (Mestres and Scuseira 1995). Similarly, different types of mutation techniques can be carried out (e.g., Sait and Youssef 1999), but only the simplest type, the so-called “uniformly distributed random mutation,” was applied in this study. Low mutation probabilities, equal to the reciprocal of the length of the chromosomes (Keller and Lutz 1997), were chosen to avoid pushing the population towards unfavorable areas of the solution space. Nevertheless, higher probabilities (0.05, 0.1, and 0.15) were also tested but produced statistically identical results. In all optimizations performed in this study, all other GA parameters (population size, number of generations, and type and probability of crossover) were arbitrarily chosen for each run and were empirically optimized for each type of problem. In the following sections, we illustrate the application of a simple GA to a number of forward and inverse problems defined within the LMA-Tableau approach. The fitting strategy shown in Fig. 1 applies in all inverse problems presented here, but some adaptations were made to meet problem-specific requirements and are described below. 4 Application of the GA to the Forward Problem In this section, we verified the performance of the GA in the solution of chemical speciation (forward problem) which requires the optimization of variables (molar concentrations) varying over several orders of magnitude. For simplicity, a single phase (aqueous solution) was considered for the forward problem. We solved a number of aqueous speciation problems and compared the GA results to those returned by commercially available programs (MINEQL+, HYDRAQL, and WHAM). As explained above, the forward problem consists of obtaining the elements of vector s Math Geosci (2010) 42: 101–127 107 Fig. 1 Generalized fitting strategy to perform adsorption equilibrium inverse modeling using a genetic algorithm (see (3) and (4)). Hence, the GA searches the solution that best minimizes the total sum of residuals (Y ) between the total experimental molar concentrations (free plus complexed, vector t) specified by the modeler and those estimated from the mass balance equations of all chemical components in the system as defined by Herbelin 108 Math Geosci (2010) 42: 101–127 and Westall (1996) Y=  n m   j =1 i=1 v(i, j )ci − tj 2 , (7) where the first term inside the brackets represents the calculated molar concentration of the j th chemical component, n is the number of species (aqueous) derived from chemical component j th, v is the stoichiometric coefficient for the j th chemical component describing the formation of the ith aqueous species, ci is the molar concentration of the ith aqueous species produced by chemical component j th, tj is the total concentration of component j , and m is the number of chemical components (Morel and Morgan 1972; Westall 1979). Large differences in the experimental concentrations of the chemical components may bias the optimization because of the weight carried by the individual residuals (Rj , term in brackets in (7)). Consequently, these residuals were normalized (Rj ′ ) as follows  1 . (8) Rj ′ = exp(Rj ) + | log10 (Rj )| In general, for problems with 10 chemical components or less, the GA returned a suitable solution after 100 generations, using a population size of 100 and either the one-point (Gen and Cheng 2000) or the RAOC crossover strategy (Keller and Lutz 1997) at 10% of crossover probability. For this type of applications, both crossover operators appeared to outperform the two-cut-point and uniform operators both in terms of speed and ability to locate the best solution in the search space. The GApredicted concentrations of chemical components for three equilibrium problems (involving 4, 6, and 10 chemical components and 7, 32, and 39 chemical species, respectively) were nearly identical (RSD ≤ 0.03%) to those obtained using HYDRAQL, WHAM, and MINEQL+ for ionic strengths ≤0.001 M. This exercise confirmed the efficiency of the GA in dealing with optimization problems with numerous adjustable parameters varying over several orders of magnitude and subjected to identical constraints to those of the inverse adsorption problem that we wish to address via stochastic optimization. Hereafter our discussion will be exclusively focused on the solution of inverse problems. 5 Application of the GA to the Inverse Problem 5.1 Estimation of Intrinsic Ionization Constants: Constant Capacitance Model Our main objective is to implement a reliable and flexible approach to address specific inverse problems that cannot be easily handled by conventional, deterministic, derivative-based, root-finding techniques implemented in popular SCM calibration programs such as FITEQL. Because a reasonable a priori knowledge is available (physical–chemical constraints, geochemistry of the adsorbent phase, etc.), the viability of conceptual adsorption reactions can be evaluated intuitively against specified criteria prior to optimization, greatly reducing the number of alternative SCMs that deserve systematic Math Geosci (2010) 42: 101–127 109 evaluation via inverse modeling. Furthermore, emphasis on conceptual and mathematical simplicity in the formulation of SCMs is paramount and must be consistent with the quantity and quality of available data. As emphasized in earlier studies (e.g., Herbelin and Westall 1996), when several SCMs fit the data equally well, the most parsimonious one is preferred unless there is compelling evidence in support of another. Models with many degrees of freedom incur serious risks, among which: (i) fitting of inconsistent or irrelevant noise in the data records; (ii) severely diminished predictive power; (iii) the generation of ill-defined, near-redundant parameter combinations; and (iv) masking of geochemically significant behavior derived from data over-fitting (Jakeman et al. 2006). Furthermore, the inverse problem should be well determined and, hence, contain more data points than adjustable parameters. Finally, a posteriori physical–chemical evaluation of the fitted SCM parameters is key to ascertain their thermodynamic relevance within the SCM. It follows that within this scheme, the calibration of SCMs, by deterministic or heuristic approaches, is properly constrained and goes well beyond a mere data fitting exercise. We illustrate the above approach by testing the GA on various adsorption equilibrium inverse problems. The first and simplest one calls for the estimation of surface ionization (i.e., proton adsorption/desorption) constants from proton adsorption data at a mineral surface. For a generic hydrated (primary) surface site or adsorption center (e.g., ≡S·H2 O), these reactions can be generalized as follows (Davis and Kent 1990) ≡S·H2 O0 + H+ ⇔ ≡S·H3 O+ ≡S·H2 O0 ⇔ ≡S·OH− + H+ (protonation), (9a) (deprotonation). (9b) Each of the above reactions is governed by an intrinsic ionization constant whose estimation represents the essence of the inverse problem (calculation of elements of θ ). The first step is to define the generalized objective function that applies to all adsorption inverse problems. This implies the computation of the residuals between the theoretical and experimental adsorption values, Yk , which, for an n number of adsorbed species (i) produced by adsorbate, k, is given by Yk = n  i=1 v(i, k)Mi − tk , (10) where Mi represents the predicted molar concentration of the ith adsorbate-bearing surface species, v is the stoichiometric coefficient for the k adsorbate describing the formation of the ith surface species, tk is the experimental adsorbed molar concentration of the adsorbate k. For each chromosome, the GA optimization is subjected to WSSE = p  r   Yk 1 1 Sk 2 , (11) where WSSE is the weighted sum of squared errors of an r number of adsorbate components computed at all titration points, p, and Sk is the input error specified for 110 Math Geosci (2010) 42: 101–127 Yk from the experimental errors associated with the quantitative (analytical) determination of the kth adsorbate. Equation (11) is the generalized objective function used in the optimization of intrinsic formation constants (ionization and adsorption) when suitable adsorption data for all adsorbates under consideration are available. To estimate intrinsic formation constants of mineral surface species, reactions must be referenced to a zero electrical potential surface by taking into account, at each stage of the titration, of the electrostatic work required to transport ions through the interfacial electrical potential gradient according to (Dzombak and Morel 1990) e K app = K int · x=1  −ZF ψx exp , RT (12) where K app is the apparent formation constant, K int stands for the intrinsic constant, Z is the net charge transfer of the reaction, F is the Faraday constant, R is the gas constant, T is the absolute temperature, e is the number of electrostatic planes where explicit adsorption is assumed to take place, and Ψx is the electrical potential at the plane(s) “x” involved in adsorption (e.g., 0-plane, Stern-plane; Davis and Kent 1990). For brevity, the latter term will, hereafter, be referred to as potential (ascribed to a specific electrostatic plane). It is an adjustable parameter that, upon minimization of (11), must satisfy the following constraint q  F · zi [≡ Ci ] = σxelect , S ·A (13) i=1 where S is the specific surface area of the mineral (m2 g−1 ), A is the mass-volume ratio of the experimental suspension (g L−1 ), zi and [≡ Ci ] are, respectively, the charge and molar concentration of species i adsorbed at plane x, and q is the number of species contributing to the charge at plane x. The left-hand term gives the net charge density at plane x (C m−2 ) from the surface species concentrations computed at each generation, whereas the right-hand term, σxelect , represents the charge density at plane x (C m−2 ) derived from a theoretical electrostatic model describing the relationship between the surface charge and surface potential (see Davis and Kent 1990). The electrostatic correction is specified in the mathematical definition of the inverse (and forward as well) adsorption equilibrium problem, according to the method presented by Westall and Hohl (1980). An e number of dummy chemical component(s) is added to the model, corresponding to the electrostatic planes where explicit adsorption is assumed to take place according to the selected electrostatic model (see below). The GA was initially tested with data taken from Gao and Mucci (2001), who used FITEQL v. 2.0 to optimize the intrinsic ionization constants of the goethite surface in a 0.7 M NaCl solution considering the following set of ionization reactions ≡FeOH0 + H+ ⇔ ≡FeOH+ 2, 0 − ≡FeOH ⇔ ≡FeO + H + (14) (15) which are governed, respectively, by the intrinsic ionization constants Ka1 and Ka2 . Math Geosci (2010) 42: 101–127 111 To describe the electrostatics at the interface, these authors applied the Constant Capacitance Model (CCM, Schindler and Kamber 1968; Hohl and Stumm 1976), and, thus, this model was implemented in the Matlab© script to compute the value of Ψx at each generation according to the following expression (Stumm and Morgan 1996) σ0 (16) ψ0 = , C where C is the specific capacitance, and σ0 and ψ0 are the charge and potential at the surface (i.e., plane-0) calculated from electrostatics σ0elect and ψ0elect . Given that only ionization reactions were considered to take place at plane 0, experimental surface charge data are available (net proton adsorption densities are identical to surface charge densities), and, therefore, for a given C value, these data can be used to compute the surface potential at each titration point and perform the electrostatic correction using (12) and (16). Aqueous equilibrium was solved first using either the GA or the Newton–Raphson approach (implemented in an additional Matlab© subroutine) to compute the concentrations of the free chemical components in solution (computation of s, forward problem) before the K int values were optimized with the GA (inverse problem). In contrast to the strategy typically applied by FITEQL users, whereby the K int values are numerically optimized, whereas the value of C is varied manually in each optimization and the goodness-of-fit evaluated on the basis of the WSOS/DOF (“weighted sum of squares divided by the degrees of freedom”) parameter (Dzombak and Morel 1990; Herbelin and Westall 1996), the K int and C values were optimized simultaneously by the GA. Using the encoding rules described earlier, the capacitance can be added to the chromosome encoding the solution to the intrinsic ionization constants and successfully treated as a fitting parameter. Like the chemical component concentrations (see preceding section), K int values were formulated in extended logarithmic units. Conventionally, chromosome binary strings represent solutions from −25000 to 25000 (i.e., 10−25 to 1025 M or 50 orders of magnitude) that approximately cover the range of formation constants typically reported in thermodynamic compilations (e.g., NIST 1998). Nevertheless, for specific adsorption reactions, such a wide range may not necessarily represent thermodynamically meaningful quantities, and hence, based upon physical–chemical concepts, the modeler may select a reasonable solution space for each adjustable parameter (as explained above), constraining the range of potential solutions and its compatibility with the physical reality. The specific capacitance was constrained between 0.2 and 2 F/m2 , covering the range of values previously reported for oxide mineral surfaces (e.g., Hiemstra et al. 1999). SCM parameters optimized with the GA and input values of S and A are presented in Table 1 and compared to those reported by Gao and Mucci (2001). Because of the probabilistic nature of the GA optimization, several GA runs were performed using different GA parameters (e.g., population size, maximum number of generations, mutation rates, etc.), and the associated error was calculated for each adjustable quantity. It should be noted that this error is associated to the GA-optimization rather than to the experiment. True uncertainties of the optimized SCM parameters must be determined from multiple GA runs using independent replicate data sets (≥3). As shown 112 Math Geosci (2010) 42: 101–127 Table 1 Intrinsic acidity constants of the goethite surface in a 0.7 M NaCl solution returned by the GA optimization and FITEQL using the Constant Capacitance Model. Errors represent confidence intervals at 95%. Results shown using the GA approach are averaged values obtained from three optimizations using the following GA parameter sets (population size, number of generations, crossover rate, mutation rate): [10, 1000, 0.1, 0.1], [50, 500, 0.1, 0.05], and [100, 1000, 0.1, 0.02] Surface Equilibria ≡FeOH0 + H+ = ≡FeOH+ 2 ≡FeOH0 = ≡FeO− + H+ Capacitance (F/m2 ) Log10 K int Gao and Mucci (2001) This study FITEQL 2.0 GA 7.45 −9.60 1.86 7.23 ± (< 0.01)a −9.6 ± 0.04a 1.93 ± (< 0.01)a Experimental conditions: A = 27.7 m2 g−1 , S = 7.93 g L−1 Site Density: 2.96 · 10−6 moles m−2 a Reported errors are a measure of the reproducibility of the optimization itself and, thus, do not provide information on the associated experimental error in Table 1, the optimized values are very reproducible, even when the population size or number of generations varied significantly, and the mean values are close to those obtained by Gao and Mucci (2001). Nevertheless, the K int and C values that best describe the experimental data are slightly different than those obtained when fixed capacitance values are used in the optimization, as in the case of FITEQL. In other words, the GA facilitates the search of suitable K int and C values in a single optimization run, in contrast to FITEQL and most available adsorption equilibrium inverse modeling programs. Although the difference is small in this particular example, in other applications, relaxation of the capacitance(s) and minimization of round-off and truncation errors via GA optimization may impact significantly on the optimized SCM parameters. The experimental proton adsorption data and the GApredicted proton adsorption densities are shown in Fig. 2. 5.2 Estimation of Intrinsic Adsorption Constants: Constant Capacitance Model Typically, direct, experimental surface charge data are unavailable when adsorption/desorption of adsorbates other than protons occurs. Consequently, surface charge must be computed from the speciation predicted at each generation to obtain adequate estimates of the surface potential and apply the appropriate electrostatic corrections ((12) and (16) for the CCM). Within the CCM scheme, this problem can be easily circumvented by initializing the GA iterative process using a set of best guess surface charge values. This serves to preadjust the constants to reasonable estimates of surface potential (related to surface charge via an electrostatic model: (16), Westall and Hohl 1980) and leads the chromosome population towards favorable regions of the solution space. After a number of generations (iterth ), the GA will calculate surface potentials from (16) using the surface charge computed from the predicted surface speciation (left-hand term in (13)) and will estimate the electrostatic correction in subsequent generations. This procedure guarantees fulfillment of (13) from generation iterth while pushing evolution towards successful minimization of (11). Math Geosci (2010) 42: 101–127 113 Fig. 2 Data fit of proton adsorption data (goethite in a 0.7 M NaCl solution) obtained with the GA-optimized ionization constants Using published phosphate adsorption data on the goethite surface in 0.7 M NaCl solutions (Gao and Mucci 2001), we reoptimized the corresponding intrinsic adsorpint , with the GA, using an identical conceptual SCM defined by the tion constants, Kads following set of surface reactions 2− + ≡FeOH0 + H2 PO− 4 ⇔ ≡FePO4 + H + H2 O, (17) − ≡FeOH0 + H2 PO− 4 ⇔ ≡FePO4 H + H2 O, (18) + ≡FeOH0 + H2 PO− 4 + H ⇔ ≡FePO4 H2 + H2 O. (19) For this application, assigning initial zero potentials (i.e., null electrostatic corrections) to the surface allowed for a good performance of the GA in combination with a rather large maximum number of iterations (maxiter ≥ 300). Several GA runs indicated that after 50% of the prefixed maxiter (150), the numerical viability of the evolved chromosome population was suitable for the computation of surface potentials from the surface speciation (left-hand term of (13)) and (16) and for successful fitting of the data (Fig. 3). The success of the optimization conint values within the CCM formalism, in firms the ability of the GA to extract Kads the absence of surface charge data, without explicit fitting of the surface potential. int values are reported in Table 2 and compared to those obtained The optimized Kads by Gao and Mucci (2001). To see the impact of the GA optimization on the estimation 114 Math Geosci (2010) 42: 101–127 Fig. 3 Data fit of phosphate adsorption data (goethite in a 0.7 M NaCl solution) obtained with the GA-optimized adsorption constants Table 2 Intrinsic affinity constants of phosphate on the goethite surface in a 0.7 M NaCl solution returned by the GA optimization and FITEQL using the Constant Capacitance Model. Errors represent confidence intervals at 95%. Results shown using the GA approach are averaged values obtained from three optimizations using the following GA parameter sets (population size, number of generations, crossover rate, mutation rate): [80, 500, 0.1, 0.1], [100, 500, 0.1, 0.15], and [100, 1000, 0.1, 0.02] Surface Equilibria 2− + ≡FeOH0 + H2 PO− 4 = ≡FePO4 + H + H2 O − 0 − ≡FeOH + H2 PO4 = ≡FePO4 H + H2 O 0 + ≡FeOH0 + H2 PO− 4 + H = ≡FePO4 H2 + H2 O Capacitance (F/m2 ) int Log10 Kads Gao and Mucci (2001) This study FITEQL 2.0 GA 0.70 7.83 12.47 1.86 −0.23 ± 0.16a 7.68 ± 0.1a 13.02 ± 0.48a 1.86 (Fixed) Experimental conditions: A = 27.7 m2 g−1 , S = 0.234 g L−1 Site Density: 2.96 · 10−6 moles m−2 a Reported errors are a measure of the reproducibility of the optimization itself and, thus, do not provide information on the associated experimental error of phosphate adsorption constants independently of other SCM parameters, the Ka1 , Ka2 , and C values originally proposed by Gao and Mucci (2001) were used in this GA Math Geosci (2010) 42: 101–127 115 int values corresponding to reactions optimization exercise. Note that, whereas the Kads (18) and (19) optimized by the GA are, within error, very similar to those estimated by int of reaction (17) is nearly one order of magnitude FITEQL, the GA-optimized Kads lower. This discrepancy is attributable to the better numerical stability of the GA optimization. To evaluate the performance of the GA with more complex optimization problems within the CCM scheme, we simultaneously optimized ionization and adsorption intrinsic constants using synthetic data of a gedanken experiment: an acidimetric titration of a reactive mineral (MeX(s) ) displaying moderate pH-promoted dissolution. In this experiment, we considered that dissolved, lattice-constituent divalent metal ions (Me2+ ) reabsorb on the mineral surface, competing with protons for reactive surface sites and altering surface protonation and surface charge (see synthetic data in Appendix). The ionization and metal adsorption surface reactions, formulated in terms of the single-site 2-pK ionization model (Lützenkirchen 2003), are ≡MeOH0 + H+ ⇔ ≡MeOH+ 2, ≡MeOH0 ⇔ ≡MeO− + H+ , 0 ≡MeOH + Me 2+ (20) (21) + ⇔ ≡MeOMe + H + (22) pH, total dissolved Me2+ concentration (Me), total adsorbed proton ([H+ ]Ads , data set (1), and total adsorbed metal concentrations ([Me2+ ]Ads , data set (2) compose the available data at each titration point and were used in the minimization of (11) for the optimization of the intrinsic constants describing reactions (20)–(22). It is assumed that Me2+ does not form ion pairs or complexes in aqueous solution (i.e., total Me2+ concentrations are equal to free Me2+ concentrations). The chemical scenario of this gedanken experiment is similar to classical, competitive ion adsorption studies where the pH and the aqueous adsorbate(s) concentration(s) are simultaneously monitored at each experimental point. Using a population of 100 chromosomes and after 500 generations, the GA successfully reproduces all the experimental data, as shown in Fig. 4. In contrast, the simultaneous optimization of two or three of the constants (reaction (20), (21), or (22)) and the concentration of available primary surface sites (TOTsite ) with a specified capacitance (GA-optimized value) with FITEQL v. 3.2 resulted in convergence problems and no output. When only the constant describing reaction (20) was optimized, while the constants for reactions (21) and (22) were fixed at the GA-optimized values, FITEQL converged but gave very poor fits to the [Me2+ ]Ads , and [H+ ]Ads data, with a WSOS/DOF value of 19.5. Similarly, when only one intrinsic constant was estimated with FITEQL using the GA-optimized TOTsite value and after manual trial-and-error adjustment of the other two constants and the capacitance, FITEQL converged (WSOS/DOF = 2.8), and the fit to data set 1 largely improved (although the visual goodness-of-fit was still inferior to that of the GA) but to the detriment of the quality of the fit to data set 2 (Fig. 4). This reveals that, in some cases, successful convergence of FITEQL does not necessarily return good fits to all the experimental data. In turn, this raises questions about the validity of the recommended range of WSOS/DOF values (0.1 to 20) for the evaluation of the goodness of fit, a parameter that strongly depends on the input error estimates used in its calculation (Westall 116 Math Geosci (2010) 42: 101–127 Fig. 4 Data fits of proton (upper panel) and metal (lower panel) adsorption data (Gedanken acidimetric titration of MeX(s) in a 0.001 M NaCl solution) obtained with ionization and adsorption constants returned by the GA and FITEQL: (A) WSOS/DOF = 19.5 and (B) WSOS/DOF = 2.8 (see text for details) Math Geosci (2010) 42: 101–127 117 Fig. 5 Illustration of an 8-bit binary-string 3D matrix encoding a set of solutions for the “dummy” components (see text for details). The selected group of chromosomes that best minimizes the objective function at all titration points represents the “elite” of the population and, therefore, is preserved in the next generation 1982). It appears that FITEQL converged towards a local minimum rather than to a global one, and the relative contribution of reactions (20) and (21) (ionization reactions) versus (22) (divalent metal adsorption reaction) to the proton adsorption behavior could not be resolved accurately upon fitting of data sets 1 and 2. Conversely, the GA optimization gradually minimizes, for both sets of data, the difference between the experimental and simulated data, and, thus, the evolved generation retains critical information for searching the global minimum. Given an adequate conceptual SCM and sufficient high-quality data, the GA will find the combination of model parameters that best minimizes (11) even in cases that are difficult to tackle by deterministic approaches. 5.3 Simultaneous Estimation of Intrinsic Ionization and Adsorption Constants: Triple Layer Model It is often necessary to adopt an electrostatic model other than the CCM to describe the spatial distribution of charged surface species at the mineral–water interface (Lützenkirchen 2003). For multilayer electrostatic models, such as the Triple Layer Model (TLM, see Yates et al. 1974; Davis et al. 1978), a specific potential must be defined at each interfacial plane (surface and Stern planes and diffuse layer). In this case, the GA must find a suitable set of K int , C, and TOTsite values, intensive variables that in combination with appropriate potentials, extensive variables (Herbelin and Westall 1996), minimize (11) while satisfying (13). Since the surface charge displayed by the mineral varies according to the chemical conditions of the system (i.e., pH, chemical speciation), individual potentials must be adjusted at each titration or experimental data point, and for each electrostatic plane. Given the large number of 118 Math Geosci (2010) 42: 101–127 adjustable potentials, some modifications were made to the optimization approach described earlier. Sets of potential solutions for these extensive variables are encoded in a 3D binarystring matrix (Matrix-A) whose dimensions are determined by: (i) the selected population size, (ii) the number of available data points, and (iii) the number of interfacial electrostatic planes (dummy components), excluding the diffuse region, required by the selected multilayer electrostatic model (Westall and Hohl 1980). To define the size of the chromosomes in Matrix-A and the solution space, realistic boundary values of surface potential must be chosen for each electrostatic plane. For this purpose, zeta-potentials obtained from electrokinetic measurements are useful for the selection of these constraints (Dzombak and Morel 1990). The structure of a generalized 3D matrix, including the dummy components, is illustrated in Fig. 5. Another matrix (Matrix-B) encodes the solutions for the intensive variables. Its dimensions are dictated by the selected population size and the number of adjustable intensive variables. To improve the effectiveness of the evolutionary process, MatrixA and Matrix-B were dimensioned and manipulated separately throughout the GA optimization. In other words, genetic operations are carried out independently for each matrix, maintaining a good numerical diversity in both matrices and facilitating the search for the best solution to the optimization problem. Whereas the evaluation of the chromosomes in Matrix-B is subject to (11), the evaluation of chromosomes in Matrix-A is based on fulfillment of (13) by minimization of Yσ = p  e   calc 2 σx − σxelect , 1 (23) 1 where the total sum of residuals (Yσ ) corresponds to the difference between the charge calculated from the speciation predicted at plane “x” (σxcalc ) and the charge computed from electrostatics for plane “x” using appropriate surface-chargepotential relationships (σxelect ). For the TLM, these are (Davis and Kent 1990) σ0 = C1 (ψ0 − ψβ ), (24) σβ = C2 (ψβ − ψd ) − σ0 , √ σd = −0.1174 I sinh(ZF ψd /2RT ), (25) (26) where, as before, σ and ψ represent, respectively, the charge and potential at each specific interfacial region: (i) the surface (plane 0), (ii) the plane where outer-sphere complexes are located (plane β), and (iii) the diffuse layer (plane d). C1 and C2 are the integral capacitance values of the interfacial layers, and F /RT describes the Boltzmann factor for charged molecules and ions (38.92 V−1 ). Because the TLM makes no explicit provision for ion adsorption at the diffuse layer, the calculation of the diffuse layer charge, σd , is obtained from the charge neutrality equation (Davis and Kent 1990) σd = −σ0calc − σβcalc (27) and hence, the ψd can be obtained directly from (26). In other words, only ψ0 and ψβ are subjected to optimization via (23), and, thus, only two dummy components Math Geosci (2010) 42: 101–127 119 Fig. 6 Evaluation strategy to select best chromosomes encoding the values of extensive (Matrix-A) and intensive (Matrix-B) variables for multilayer electrostatic models are required in Matrix A. Additional dummy components would be required by more sophisticated multilayer electrostatic models such as the Four-Layer Model (Charmas 1999). One advantage of this treatment is that the best potential computed at each iteration, the one that best minimizes (23), can be tested against inequality constraints defined by physically realistic boundary values for each electrostatic plane (Dzombak and Morel 1990; Davis and Kent 1990). This operation serves to determine whether the calculated potentials are realistic and, thus, can be used in the next iteration; otherwise the old value is retained for evaluation in the next generation. The performance of the GA is improved by interrupting the optimization of the extensive variables once the above-mentioned constraints are achieved. In other words, hereafter, only Matrix-B will be subject to evolution, via (11) using, in the first subsequent iteration, the realistic set of potential values that best minimizes (23) but using the potentials computed from (24)–(27) at each subsequent iteration. The strategy employed for each matrix is schematically represented in Fig. 6. To further improve the performance (i.e., speed and computational requirements) of the GA optimization in this type of application, an additional implementation must be made. Given a chromosome population ≥100 and after a sufficient number of generations (i.e., n > 100), the GA optimization should have reached a suitable level of numerical viability and can be interrupted even if the data fit is inadequate. The optimized intensive variables are best values and can be used to quickly verify whether they can successfully simulate the data upon solution of the adsorption equilibrium forward problem upon adjustment of extensive variables by a derivative-based nu- 120 Math Geosci (2010) 42: 101–127 Fig. 7 Data fits of proton adsorption data (goethite in a 0.01 M NaCl solution) obtained with intrinsic constants returned by the GA-NR optimization and FITEQL within the framework of the TLM merical technique. The implementation of the Newton–Raphson (NR) technique for these types of problems is straightforward and was explained in detail elsewhere (e.g., Papelis et al. 1988). In other words, evolutionary optimization can be switched to a subroutine that calculates the Jacobians corresponding to all chemical component(s) and electrostatic term(s). Once the Jacobians are obtained (Westall 1979), the matrix is solved by the Gauss-elimination procedure (Nicholson 1995), and the iterative procedure carries on until the residual meets a prefixed tolerance level (Sahai and Sverjensky 1998). This strategy exploits the advantages of the GA to optimize the intensive variables (with preadjusted extensive variables), which, in combination with deterministically derived potentials, can successfully simulate titration data. The Newton–Raphson technique is used to fine-tune the estimation of the potentials, ensure minimization of (23) at the end of the iterative process, and reduce the computational time. If required, additional GA-NR micro-iterations can be implemented to improve the quality of the fit and further refine the intensive variable estimates. This hybrid GA-NR optimization technique was successfully tested against published surface titration data for goethite in 0.01 M NaCl solutions (Villalobos and Leckie 2001). The authors originally modeled their data using the TLM (Fig. 7). In this formalism, seven intensive variables must be adjusted: two ionization constants (Ka1 and Ka2 , describing reactions (14) and (15)), two background electrolyte int int ) describing, respectively, the following equiadsorption constants (Kcation and Kanion libria Math Geosci (2010) 42: 101–127 ≡FeOH0 + Na+ ⇔ ≡ FeO− · · · Na+ + H+ , − ≡FeOH0 + Cl− + H+ ⇔ ≡FeOH+ 2 · · · Cl 121 (28) (29) two capacitance values (C1 and C2 ), and the total concentration of primary surface sites (TOTsite ). Because FITEQL convergence is not guaranteed when several parameters are adjusted, particularly when interdependent reactions (reactions (14), (15), (28), and (29) are pH-dependent) are fitted simultaneously (Hayes et al. 1991), some criteria are commonly applied to preselect either the ionization or electrolyte adsorption constants before proceeding with the optimization of all other fitting parameters. Accordingly, Villalobos and Leckie (2001) selected fixed values of Ka1 and Ka2 (in addition int int to preselected values of C1 , C2 , and TOTsite ) and optimized Kcation and Kanion to fit their data (although C1 was varied manually at each FITEQL run). In contrast, the hybrid GA-NR technique allows for the simultaneous optimization of all seven intensive variables invoked by the TLM and the successful simulation of titration data, as shown in Fig. 7. 5.4 Adequacy of GA-based Exhaustive Sampling and Model Evaluation It has been stressed that an a posteriori analysis of nonuniqueness and uncertainty of solutions to inverse problems (resolution analysis) is necessary to assess scientific conclusions based on inverse modeling (Mosegaard 1998). Whereas this is an important step for inverse problems involving a large number of ill-defined, poorly constrained parameters, in general, it is not crucial for the small inverse problems presented in our study since the selected SCM parameters are physically meaningful and reasonably well-constrained. Hence, unless the available data are insufficient (fewer data points that unknowns) or unsuitable (high uncertainty) to quantitatively resolve the contribution of each adsorption reaction, exhaustive sampling techniques such as GA optimization can be safely applied to problems of this size (Mosegaard 1998). To illustrate, the reliability of GAs in exhaustive sampling applications, and in analogy to the procedures applied in earlier studies (e.g., Fernández Alvarez et al. 2008; Armstrong and Hibbert 2009; Hibbert and Armstrong 2009), we evaluated the behavior of the sample statistics derived from numerous GA optimizations of the inverse problem presented in Sect. 5.1 and compared it with the theoretical (Gaussian) posterior probability distribution function (pdf, Armstrong and Hibbert 2009) using experimental uncertainties typically associated with adsorption data (Hayes et al. 1991). In Fig. 8, we show how the GA visited solutions reasonably match the expected posterior pdf for one key SCM parameter, pK = | log10 Ka2 + log10 Ka1 |. Most noteworthy is that the highest frequencies of the GA visited solutions are centered along the Gaussian curve, slightly exceeding the posterior values, whereas the lowest frequencies are approximately evenly distributed at both sides of the maximum value. This exercise demonstrates that, given a sound hypothesis (set of adsorption reactions and appropriate electrostatic model), the binary tournament operator in combination with the crossover and mutation probabilities selected in our GA optimizations consistently converge towards the solution within a restricted tolerance. 122 Math Geosci (2010) 42: 101–127 Fig. 8 Histogram of GA visited solutions (crossover and mutation probabilities of 0.1) and posterior probability distribution function of the pK parameter (pK = | log10 Ka2 + log10 Ka1 |), given a priori information (I), describing proton adsorption data of goethite in a 0.7 M NaCl solution in terms of reactions (14) and (15) For more complex adsorption equilibrium inverse problems that may be equivalently solved by several SCM models (i.e., multiple hypotheses) and for which a priori information on the physical viability of the selected adsorption reactions and the values of their associated intrinsic formation constants is unavailable, a sensitivity analysis should be conducted to choose the best solution among the possible options (several SCMs fitting the data reasonably well) and prevent over-interpretation of the data (e.g., Poeter et al. 2005). This is typically done by gradually incorporating adsorption reactions to the simplest physically meaningful SCM and by testing its ability to reproduce the experimental data. We performed this analysis in a recent study (Villegas-Jiménez et al. 2009), where, based upon fundamental chemistry concepts, several SCMs were formulated and alternatively calibrated via GA optimization using relatively large experimental data sets for two rhombohedral carbonate minerals: magnesite and dolomite (Pokrovsky et al. 1999a, 1999b). Whereas more than one SCM fitted the data within the same tolerance (as confirmed by ANOVA tests), only one reasonably complied with all other a posteriori criteria (theoretical constraints and alternate experimental data) imposed for model validation. As emphasized earlier, we favor model parsimony. Math Geosci (2010) 42: 101–127 123 6 Concluding Remarks A powerful evolutionary optimization technique, the genetic algorithm, was applied to the calibration of SCMs for mineral surfaces. This technique was successfully tested for the inverse modeling of adsorption equilibria under several scenarios of varying degrees of complexity using published and synthetic data. The GA performs an exhaustive probabilistic search in a broad solution space that is constrained by physically realistic values selected by the modeler. Given suitable combinations of geochemical (set of adsorption reactions) and electrostatic (charge distribution at the adsorbent–water interface) models and sufficient quality experimental data, the GA can successfully optimize numerous parameters without incurring convergence problems while achieving good numerical stability, a notable advantage over conventional deterministic, root-finding, and optimization techniques implemented in popular inverse modeling geochemical codes such as FITEQL. At the modeler’s discretion, multiple intrinsic ionization and adsorption constants, capacitance(s), and/or primary surface site concentrations can be simultaneously fitted by the GA. Nevertheless, an a posteriori theoretical evaluation of the fitted SCM parameters (based on physical–chemical concepts and alternate experimental evidence) must be made to ascertain their thermodynamic meaningfulness and confirm their relevance within the SCM. This aspect is key for the construct of sound, yet parsimonious, SCMs. The drawback of the GA approach is the large number of iterations that must be performed to obtain a satisfactory solution, particularly for cases involving numerous fitting parameters. Alternatively, the power of the GA can be more efficiently exploited when a deterministic technique such as Newton–Raphson is incorporated once the chromosome population has reached a suitable level of numerical viability. Consequently, we propose the use of a hybrid GA-NR approach for the efficient optimization of intrinsic constants in complex problems such as those involving the TLM. We believe that the computational requirements of the GA to the calibration of SCMs are greatly outweighed by its robustness, simplicity, and potential to generate a reliable thermodynamic database describing adsorption reactions. This is a critical step to making reliable predictions of the adsorption behavior of minerals and geologically based adsorbents in hydro-geological settings (e.g., aquifers, pore waters, water basins) and/or in engineered reactors (e.g., mining and wastewater treatment industries). Future work to test and develop more sophisticated adaptive strategies and hybrid heuristic–deterministic optimization approaches may improve the GA performance in this type of applications. Acknowledgements A.V.-J. thanks Dr. David Burns for critical discussions that inspired this investigation and EMEA S.C. Environmental Consulting Firm for providing suitable computer facilities to complete this work. The insightful comments of two anonymous reviewers and the critical remarks of Dr. Johannes Lützenkirchen have improved the quality of this paper. This research was supported by a graduate research student grant to A.V.-J. from the Geological Society of America (GSA) and Natural Sciences and Engineering Research Council of Canada (NSERC) Discovery grants to AM. A.V.-J. benefited from postgraduate scholarships from the Consejo Nacional de Ciencia y Tecnología (CONACyT) of Mexico and additional financial support from the Department of Earth and Planetary Sciences, McGill University and Consorcio Mexicano Flotus-Nanuk. 124 Math Geosci (2010) 42: 101–127 Appendix: Experimental Data Used in GA Optimizations (Gedanken Experiment) The synthetic data used for the optimization of intrinsic ionization and adsorption constants in the gedanken experiment problem presented in our study are provided in the Appendix. Acidimetric Titration Data. Sorbent: MeX(s) ; Sorbates: H+ and Me2+ ; Solution: I = 0.001 NaCl; Site Density: 2.2 · 10−6 moles m−2 ; Specific Surface Area: 0.232 m2 g−1 ; Mass/Volume Ratio: 48.4 g L−1 PH [H+ ]ads [Me2+ ]ads Mesolution mol L−1 10.17 10.14 10.11 10.08 10.06 10.02 9.98 9.94 9.89 9.85 9.79 9.73 9.67 9.59 9.51 9.40 9.27 9.10 8.86 8.62 8.48 −2.17E-05 −2.15E-05 −2.14E-05 −2.12E-05 −2.11E-05 −2.09E-05 −2.07E-05 −2.04E-05 −2.01E-05 6.74E-08 1.47E-07 8.25E-08 1.81E-07 1.02E-07 2.25E-07 1.23E-07 2.74E-07 1.42E-07 3.20E-07 1.77E-07 4.04E-07 2.22E-07 5.11E-07 2.81E-07 6.57E-07 3.55E-07 8.43E-07 4.47E-07 1.08E-06 −1.93E-05 5.66E-07 1.40E-06 7.25E-07 1.84E-06 −1.82E-05 9.17E-07 2.40E-06 1.15E-06 3.16E-06 −1.97E-05 −1.88E-05 −1.74E-05 −1.65E-05 −1.51E-05 −1.34E-05 −1.11E-05 −7.23E-06 −3.22E-06 1.44E-06 4.15E-06 1.77E-06 5.50E-06 2.10E-06 7.27E-06 2.34E-06 9.37E-06 2.30E-06 1.20E-05 1.97E-06 1.43E-05 1.68E-06 1.57E-05 1.35E-06 1.45E-06 1.69E-05 8.19 4.10E-06 1.11E-06 1.84E-05 8.07 6.17E-06 8.91E-07 1.96E-05 8.35 −6.82E-07 7.90 9.12E-06 6.21E-07 2.13E-05 7.77 1.13E-05 4.56E-07 2.30E-05 7.63 1.37E-05 3.18E-07 2.46E-05 7.49 1.61E-05 2.20E-07 2.65E-05 7.36 1.76E-05 1.40E-07 2.83E-05 7.23 1.92E-05 8.76E-08 3.02E-05 7.09 2.08E-05 4.49E-08 3.31 E-05 6.94 2.18E-05 2.53E-08 3.56E-05 Math Geosci (2010) 42: 101–127 125 6.80 2.25E-05 1.53E-08 6.60 2.32E-05 <1E-9 3.81 E-05 4.17E-05 6.48 2.36E-05 <1E-9 4.39E-05 6.30 2.41E-05 <1E-9 4.76E-05 6.17 2.44E-05 <1E-9 5.05E-05 5.99 2.48E-05 <1E-9 5.46E-05 5.85 2.50E-05 <1E-9 5.78E-05 5.68 2.52E-05 <1E-9 6.21 E-05 5.54 2.54E-05 <1E-9 6.59E-05 5.40 2.55E-05 <1E-9 6.96E-05 5.27 2.57E-05 <1E-9 7.33E-05 5.15 2.58E-05 <1E-9 7.71 E-05 5.03 2.59E-05 <1E-9 8.09E-05 References Allison JD, Brown DS, Novo-Gradac KJ (1991) MINTEQA2/PRODEFA2, a geochemical assessment model for environmental systems: version 3.0. User’s manual. Environmental Research Laboratory, Office of Research and Development, US Environmental Protection Agency, Athens, GA, 106 Armstrong N, Hibbert DB (2009) An introduction to Bayesian methods for analyzing chemistry data Part 1: An introduction to Bayesian theory and methods. Chemom Intell Lab Syst 97:194–210 Ball JW, Jenne EA, Norstrom DK (1981) WATEQ2—A computerized chemical model for trace and major element speciation and mineral equilibrium of natural waters. In: Jenne EA (ed) Chemical modeling in aqueous systems. Symposium series, vol 93. American Chemical Society, Washington, pp 815– 836 Charmas R (1999) Four-layer complexation model for ion adsorption at energetically heterogeneous metal oxide/electrolyte interfaces. Langmuir 15(17):5635–5648 Davis JA, Kent DB (1990) Surface complexation modeling in aqueous geochemistry. In: Hochella MF, White AF (eds) Mineral-water interface geochemistry. Rev mineral, vol 23. Mineral Soc, Washington, pp 177–260 Davis JA, James RO, Leckie JO (1978) Surface ionization and complexation at the oxide/water interface: I. Computation of electrical double layer properties in simple electrolytes. J Colloid Interface Sci 63(3):480–499 Dzombak DA, Morel FMM (1990) Surface complexation modeling. Wiley, New York Epperson JF (2002) An introduction to numerical methods and analysis. Wiley, New York Essaid HI, Cozzarelli IM, Eganhouse RP, Herkelrath WN, Bekins BA, Delin GN (2003) Inverse modeling of BTEX dissolution and biodegradation at the Bemidji, MN crude-oil spill site. J Contam Hydrol 67(1):269–299 Fernández Alvarez JP, Fernández Martínez JL, Menéndez Pérez CO (2008) Feasibility analysis of the use of binary genetic algorithms as importance samplers application to 1-D DC resistivity inverse problem. Math Geosci 40:375–408 Gans P (1976) Numerical methods for data-fitting problems. Coord Chem Rev 19:99–124 Gao Y, Mucci A (2001) Acid base reactions, phosphate and arsenate complexation, and their competitive adsorption at the surface of goethite in 0.7 M NaCl solution. Geochim Cosmochim Acta 65:2361– 2378 Gen M, Cheng R (1997) Genetic algorithms and engineering design. Wiley, New York Gen M, Cheng R (2000) Genetic algorithms and engineering optimization. Wiley, New York Goldberg D (1989) Genetic algorithms in search, optimization and machine learning. Addison-Wesley, Reading Goldberg D, Korb K, Deb K (1989) Messy genetic algorithms: Motivation, analysis and first results. Complex Syst 3:493–530 126 Math Geosci (2010) 42: 101–127 Hayes KF, Redden G, Ela W, Leckie JO (1991) Surface complexation models: An evaluation of model parameter estimation using FITEQL and oxide mineral titration. J Colloid Interface Sci 142(2):448– 469 Herbelin A, Westall J (1996) FITEQL—A computer program for determination of chemical equilibrium constants from experimental data version 3.2: User’s manual. Department of Chemistry, Oregon State University, Corvallis, OR, Report 96-01 Hibbert DB, Armstrong N (2009) An introduction to Bayesian methods for analyzing chemistry data. Part II: A review of applications of Bayesian methods in chemistry. Chemom Intell Lab Syst 97:211– 220 Hiemstra T, Yong H, van Riemsdijk WH (1999) Interfacial charging phenomena of aluminum (hydr)oxides. Langmuir 15(18):5942–5955 Hohl H, Stumm W (1976) Interaction of Pb2+ with hydrous γ -Al2 O3 . J Colloid Interface Sci 55:281–288 Holland J (1975) Adaptation in natural and artificial systems. University of Michigan Press, Ann Arbor Ingri N, Kakolowicz W, Sillén LG, Warnqvist B (1967) High-speed computers as a supplement to graphical methods-V* HALTAFALL, A general program for calculating the composition of equilibrium mixtures. Talanta 14:1261–1286 Jakeman AJ, Letcher RA, Norton JP (2006) Ten iterative steps in development and evaluation of environmental models. Environ Model Softw 21:602–614 Keizer MG, van Riemsdijk WH (1999) ECOSAT. Technical Report of the departments of Soil Science and Plant Nutrition, Wageningen University, Wageningen, The Netherlands Keller B, Lutz R (1997) A new crossover operator for rapid function optimization using a genetic algorithm. In: Proceedings of the eighth Ireland conference on artificial intelligence (AI-97), vol 2, pp 48–57 Kinniburgh DG (1999) FIT user guide. British Geological Survey, Keyworth, WD/93/23 Langmuir D (1997) Aqueous environmental geochemistry. Prentice Hall, Upper Saddle River Lützenkirchen J (2003) Surface complexation models for adsorption: A critical survey in the context of experimental data. In: Tóth J (ed) Adsorption: Theory, modeling, and analysis. Surfactant science series, vol 107. Dekker, New York, pp 631–710 Malinverno A (2000) A Bayesian criterion for simplicity in inverse problem parametrization. Geophys J Int 140:267–285 Matott LS, Rabideau AJ (2008) ISOFIT—A program for fitting sorption isotherms to experimental data. Environ Model Softw 23:670–676 Mestres J, Scuseira GE (1995) Genetic algorithms, a robust scheme for geometry optimizations and global minimum structure problems. J Comput Chem 16(6):729–742 Michalewicz Z (1996) Genetic algorithms + data structure = evolution programs, 2nd edn. Springer, New York Morel FMM, Hering JG (1993) Principles and applications of aquatic chemistry. Wiley, New York Morel FMM, Morgan J (1972) A numerical method for computing equilibria in aqueous chemical systems. Environ Sci Technol 6:58–87 Mosegaard K (1998) Resolution analysis of general inverse problems through inverse Monte Carlo sampling. Inverse Probl 14:405–426 Mosegaard K, Sambridge M (2002) Monte Carlo analysis of inverse problems. Inverse Probl 18:29–54 Nicholson K (1995) Linear algebra with applications. PWS, Boston NIST (1998) Critically selected stability constants of metal complexes electronic database. National Institute of Standards and Technology, reference database 46, US Department of Commerce, Gaithersburg, MD, USA Papelis C, Hayes KF, Leckie JO (1988) HYDRAQL: A program for the computation of chemical equilibrium, composition of aqueous batch systems including surface-complexation modeling of ion adsorption at the oxide/solution interface. Report No 306, Stanford University, Stanford, CA Parkhurst DL (1995) User’s guide to PHREEQC. US Geological Survey, Water Resources Investigations Report 95-4277, 212 Poeter EP, Hill MC, Banta ER, Mehl S, Christensen S (2005) UCODE 2005 and six other computer codes for universal sensitivity analysis, calibration, and uncertainty evaluation. US Geological Survey Techniques and Methods 6-A11 Pokrovsky OS, Schott J, Thomas F (1999a) Processes at the magnesium-bearing carbonates/solution interface. I. A surface speciation model for magnesite. Geochim Cosmochim Acta 63(6):863–880 Pokrovsky OS, Schott J, Thomas F (1999b) Dolomite surface speciation and reactivity in aquatic systems. Geochim Cosmochim Acta 63(19/20):3133–3143 Math Geosci (2010) 42: 101–127 127 Sahai N, Sverjensky DA (1998) GEOSURF: A computer program for modeling adsorption on mineral surfaces from aqueous solution. Comput Geosci 24(9):853–873 Sait SM, Youssef H (1999) Iterative computer algorithms with applications in engineering solving combinatorial optimization problems. IEEE Computer Society, Los Alamitos Schecher WD, McAvoy DC (1992) MINEQL+: A software environment for chemical equilibrium modeling. Comput Environ Urban Syst 16:65–76 Schindler PW, Kamber HR (1968) Die acidität von silanolgruppen. Helv Chim Acta 51:1781–1786 Sposito G, Coves J (1988) SOILCHEM: A computer program for the calculation of chemical speciation in soils. Keamey Found Soil Sci, Univ California, Riverside Stumm W, Morgan J (1996) Aquatic chemistry: Chemical equilibria and rates in natural waters. Wiley, New York Tipping E (1994) WHAM—A chemical equilibrium model and computer code for water, sediments and soils incorporating a discrete site/electrostatic model of ion-binding by humic substances. Comput Geosci 20:973–1023 Turner BF, Fein JB (2006) Protofit: A program for determining surface protonation constants from titration data. Comput Geosci 32:1344–1356 van der Lee J, de Windt L (1999) CHESS tutorial and cookbook. Technical report No LHM/RD/99/05, École des Mines de Paris, Fontainebleu, France Villalobos M, Leckie JO (2001) Surface complexation modeling and FTIR study of carbonate adsorption to goethite. J Colloid Interface Sci 235:15–32 Villegas-Jiménez A, Mucci A, Pokrovsky OS, Schott J (2009) Defining reactive sites on hydrated mineral surfaces: Rhombohedral carbonate minerals. Geochim Cosmochim Acta 73:4326–4345 Westall JC (1979) MICROQL II: Computation of adsorption equilibria in BASIC. Technical Report, Swiss Federal Institute of Technology, EAWAG, 8600 Dübendorf, Switzerland Westall JC (1982) FITEQL: A computer program for determination of chemical equilibrium constants from experimental data. Version 1.2. Report 82-01, Department of Chemistry, Oregon State University, Corvallis, OR, USA Westall J, Hohl H (1980) A comparison of electrostatic models for the oxide/solution interface. Adv Colloid Interface Sci 12:265–294 Wolery TJ (1992) EQ3NR: A computer program for geochemical aqueous speciation-solubility calculations: Theoretical manual, user’s guide and related documentation (Version 7.0). Report No UCRLMA-110662-PT-III, Lawrence Livermore National Laboratory, Livermore, CA Yates DE, Levine S, Healy TW (1974) Site-binding model of the electrical double layer at the oxide-water interface. J Chem Soc Faraday Trans 1 70:1807–1818 Zeleznik FJ, Gordon S (1968) Calculation of complex chemical equilibria. Ind Eng Chem 60:27–57