Machine learning technique for the prediction of shear wave velocity using petrophysical logs

Mohammad Anemangely a,∗, Ahmad Ramezanzadeh a, Hamed Amiri a, Seyed-Ahmad Hoseinpour b

a School of Mining, Petroleum and Geophysics Engineering, Shahrood University of Technology, Shahrood, Iran
b Petroleum University of Technology, Ahvaz, Iran
Keywords: Shear wave velocity estimation; Machine learning; Outlier detection; Feature selection; Optimization algorithm

Abstract

Compressional wave velocity (Vp) and shear wave velocity (Vs) are widely used as quick, easy-to-use, and cost-effective means of determining the mechanical properties of formations in the petroleum industry. However, shear wave logs are only available in a limited number of wells in an oil field due to the high cost of log acquisition. For this reason, many attempts have been made to find a correlation between Vs and other petrophysical logs. In this study, a set of log data consisting of depth, neutron porosity (NPHI), density (RHOB), photoelectric factor (PEF), gamma ray (GR), caliper, true resistivity (RT), Vp, and Vs was used to develop a model for Vs estimation in a well drilled in the Ahvaz field. For this purpose, Tukey's method was employed to preprocess the data and eliminate outliers. Then, using the Non-dominated Sorting Genetic Algorithm II (NSGA-II), the best features were selected among the inputs to estimate Vs. The results indicate that increasing the number of input parameters of the model leads to an increase in accuracy and determination coefficient; however, this increase is negligible for models with more than five input parameters. These five parameters, i.e., Vp, RT, GR, RHOB, and NPHI, were selected as the best features to estimate Vs using the Least Square Support Vector Machine (LSSVM) combined with Particle Swarm Optimization (PSO), the Genetic Algorithm (GA), and the Cuckoo Optimization Algorithm (COA). The modeling results reveal that the LSSVM-COA algorithm produces a more accurate estimate of Vs than LSSVM-PSO and LSSVM-GA in both the training and testing steps. In addition, the very slight difference between the testing and training errors suggests a higher reliability of LSSVM-COA compared to the two other hybrid algorithms. Comparing this model with empirical and regression models clearly showed that its error is considerably lower. Eventually, another data set gathered in the Ab-Teymour field was used to evaluate the model from the perspective of generalizability. Similarly, Vs estimation by the LSSVM-COA model provided more reliable and accurate results than those of the empirical and regression models, as well as the LSSVM-PSO and LSSVM-GA algorithms. Finally, it can be concluded that the method employed in this study can be used as an efficient means of accurately estimating Vs.
∗ Corresponding author. E-mail address: manemangaly@gmail.com (M. Anemangely).
Journal of Petroleum Science and Engineering 174 (2019) 306–327. https://doi.org/10.1016/j.petrol.2018.11.032
Received 2 June 2018; Received in revised form 26 October 2018; Accepted 13 November 2018; Available online 17 November 2018.
In addition to laboratory studies, empirical correlations have been commonly used as a simple and quick method to estimate Vs from other available logs. Nevertheless, these correlations are not able to produce the desired results in different settings (Wang, 2000; Akhundi et al., 2014); this holds even for the most frequently used correlations, such as those proposed by Pickett (1963), Carroll (1969), Castagna et al. (1993), Eskandari et al. (2003), and Brocher (2005). Besides, these correlations merely take into account the compressional wave velocity (Vp) and, as a result, are barely able to provide an accurate estimate of Vs. Therefore, researchers have a higher tendency toward multiple regression methods (Anemangely et al., 2017a; Eskandari et al., 2003; Lee, 1990).

In recent years, numerous studies have shown the high power of intelligent systems such as artificial neural networks (ANN), support vector regression (SVR), fuzzy logic, and neuro-fuzzy methods for the prediction of the geomechanical properties of rocks. The results obtained from these studies demonstrate that intelligent systems are far superior to regression methods (Akhundi et al., 2014; Anemangely et al., 2017a; Bagheripour et al., 2015; Behnia et al., 2017; Mehrgini et al., 2017; Maleki et al., 2014; Rezaee et al., 2007; Rajabi et al., 2010). However, most of the related works have not deeply explored the feature selection process and the existence of noise in the data. Anemangely et al. (2017a) showed that considering these two processes in the data preprocessing phase can have a significant effect on the accuracy and performance of estimator models. A noteworthy point in this regard is assessing the impact of feature selection and noise reduction on the generalizability of the estimator model to other wells. Furthermore, LSSVM has shown a high performance in solving complex classification and estimation problems (Suykens and Vandewalle, 1999; Suykens et al., 2003), in preference to ANN, fuzzy logic, and neuro-fuzzy methods.

The present study was conducted to develop an estimator model of Vs from petrophysical logs. For this purpose, a data set was collected from two wells drilled in the southwest reservoirs of Iran and analyzed in order to eliminate outliers and select the best features. Next, the selected features were considered as input parameters for hybrid methods, including LSSVM-COA, LSSVM-PSO, and LSSVM-GA.

2. Studied wells

In this study, two vertical wells drilled in two oilfields located in the southwest of Iran were studied. In order to keep the wells' information confidential, we name the well drilled in the Ahvaz oilfield Well A and the other one, drilled in the Ab-Teymour oilfield, Well B. Moreover, since petrophysical logs were only available in the reservoir intervals, the intervals of interest were placed in the Ilam and Sarvak carbonate reservoirs for Well A and Well B, respectively. Variations of these logs along with the Vs and Vp logs in the two wells are shown in Figs. 1 and 2. Some statistical properties of these data are presented in Table 1. There are 2632 and 1042 data points in Well A and Well B, respectively.
Table 1
Some statistical characteristics of the petrophysical logs in the studied wells.

Well name  Statistic     Depth (m)  Caliper (in)  PEF (b/e)  GR (gAPI)  RT (ohm·m)  NPHI (v/v)  RHOB (g/cm³)  Vp (km/s)  Vs (km/s)
Well A     Minimum       3514.80    1.77          3.50       3.80       4.98        0.00        1.99          4.28       2.30
Well A     1st quartile  3615.04    6.21          4.79       18.30      79.36       0.02        2.52          5.23       2.86
Well A     Median        3715.28    6.26          4.98       29.54      148.66      0.04        2.62          5.61       3.00
Well A     Average       3715.28    6.48          4.94       30.74      159.46      0.06        2.59          5.55       2.96
Well A     3rd quartile  3815.53    6.54          5.14       39.84      210.01      0.07        2.68          5.87       3.10
Well A     Maximum       3915.77    11.64         5.73       141.31     992.86      0.60        2.78          6.48       3.37
Well B     Minimum       3315.00    3.88          2.51       3.78       0.36        0.12        2.14          4.05       2.14
Well B     1st quartile  3354.67    6.43          4.51       12.35      32.47       0.25        2.44          4.59       2.53
Well B     Median        3394.33    6.79          4.69       21.48      53.95       0.27        2.53          5.06       2.69
Well B     Average       3394.33    6.76          4.57       23.21      85.39       0.28        2.51          5.02       2.69
Well B     3rd quartile  3433.99    7.37          4.86       32.18      92.33       0.30        2.61          5.40       2.83
Well B     Maximum       3473.65    8.69          5.53       54.02      1801.32     0.37        3.06          6.28       3.40
3. Methodology

To construct a model with high accuracy and generalizability, the data obtained from Well A were used for modeling and the data of Well B for model validation. We did so because the number of data points is larger and the range of variation of the parameters is wider in Well A. Fig. 3 shows the general procedure followed in this study. As can be seen, the input data are first subjected to a preprocessing stage to filter out the outliers. Next, to select the best features for shear wave estimation using NSGA-II (Non-dominated Sorting Genetic Algorithm II), the input data are normalized. The selected features are then considered as inputs for the LSSVM-COA (Least Square Support Vector Machine combined with the Cuckoo Optimization Algorithm), LSSVM-PSO (Least Square Support Vector Machine combined with Particle Swarm Optimization), and LSSVM-GA (Least Square Support Vector Machine combined with the Genetic Algorithm) models. Each of these stages is briefly described in the following sections.

3.1. Outlier elimination

Several factors, such as environmental conditions, human error, and poor instrument calibration, can result in inaccurate data acquisition. The error in the data, commonly known as noise, has negative effects on data interpretation and, as a result, on deriving a reliable correlation between independent and dependent variables. However, the presence of noise in the data is inevitable, so that there is a 5% error even under the best-controlled conditions (Maletic and Marcus, 2000; Wu, 1995). In many circumstances it is impossible to determine the effect of noise on a parameter, and noise reduction techniques are therefore usually employed.

Tukey's method was applied to detect outliers and reduce the noise in the data, mainly because it is simple and conventional. In this method, after finding the first and third quartiles, the interquartile range (IQR) is calculated using Eq. (1) (Tukey, 1975). The Lower Inner Fence (LIF) and Upper Inner Fence (UIF) are subsequently calculated using Eqs. (2) and (3) (Tukey, 1975). According to Tukey's method, all values outside these two fences are considered outliers and should be eliminated.

IQR = 1.5 × (Q3 − Q1)   (1)

LIF = Q1 − IQR   (2)

UIF = Q3 + IQR   (3)
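As a concrete illustration of this preprocessing step, the short Python sketch below applies the Tukey fences of Eqs. (1)-(3) column-wise to a table of logs. It is a minimal sketch, not the authors' code; the pandas-based implementation and the column names in the usage comment are assumptions.

```python
import pandas as pd

def tukey_filter(df: pd.DataFrame, columns) -> pd.DataFrame:
    """Drop rows in which any of the given columns falls outside the
    Tukey inner fences of Eqs. (1)-(3)."""
    keep = pd.Series(True, index=df.index)
    for col in columns:
        q1, q3 = df[col].quantile([0.25, 0.75])
        iqr = 1.5 * (q3 - q1)          # Eq. (1): the 1.5 factor is folded into IQR
        lif, uif = q1 - iqr, q3 + iqr  # Eqs. (2) and (3)
        keep &= df[col].between(lif, uif)
    return df[keep]

# Hypothetical usage with illustrative column names:
# logs = pd.read_csv("well_a_logs.csv")
# clean = tukey_filter(logs, ["NPHI", "RHOB", "PEF", "GR", "RT", "Vp", "Vs"])
```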
3.2. Feature selection

Using the best features instead of all available features in an estimator model reduces the processing volume and thereby increases the processing speed (Anemangely et al., 2018). Furthermore, feature selection improves the accuracy and generalizability of the models. In this study, the second version of the Non-dominated Sorting Genetic Algorithm (NSGA-II) was employed to select the best features among all the parameters available for estimating Vs. As the units of the input parameters are different, their ranges of values vary considerably. Inputs with larger values may dominate the outputs of the models; consequently, the coefficients of the parameters are not correctly determined, which may reduce both the accuracy and the generalizability of a model. Therefore, data normalization is needed to carefully select the best features. Eq. (4) normalizes the data between 0 and 1:

Xnorm = (X − Xmin) / (Xmax − Xmin)   (4)

where X is an input parameter, Xmin and Xmax are the minimum and maximum values of that parameter, respectively, and Xnorm is the value of the parameter normalized to the [0, 1] range.

Fig. 4 shows the flowchart of NSGA-II. In this algorithm, binary tournament selection is used as an effective method to choose the best solutions among those obtained from each generation. Initially, two solutions are randomly selected from the population, and these two solutions are compared to select the better one. The selection criteria consist primarily of the solution rank, as the more important criterion, and the crowding distance of the solution (Deb et al., 2002). The lower the solution rank and the greater the crowding distance, the better the solution. By applying binary selection to the population of each generation, a subset of that generation is selected for crossover and mutation; crossover is applied to part of the selected members and the rest are subjected to mutation, ultimately generating a population of offspring and mutants. Afterward, a new population is formed by combining this generated population with the main one. The members of the new population are first sorted by rank in ascending order; members with the same rank are then sorted by crowding distance in descending order. A number of members equal to the size of the main population is selected from the top of the sorted list, and the remaining members are discarded. This cycle is repeated until a stopping criterion is satisfied. Solving a multi-objective optimization problem yields a series of non-dominated solutions, often termed the Pareto front. None of these solutions is prioritized over the others (Deb et al., 2002), and any of them can be suggested as the optimal solution under various circumstances (Anemangely et al., 2017a).

In this study, the crossover and mutation coefficients were taken as 0.6 and 0.3, respectively, with a mutation rate of 0.1. In addition, the initial population size and the maximum number of iterations of the algorithm's main loop were set to 60 and 50, respectively. Reducing the number of inputs of the estimator model and minimizing the estimation error are the two main objectives that were established for the NSGA-II algorithm. The estimation error was calculated from the observed (T) and predicted (O) values by means of the root mean square error (RMSE) (Eq. (5)). Furthermore, a multilayer perceptron (MLP) neural network was used as the cost function for NSGA-II. This ANN is made up of three hidden layers with 4, 3, and 3 neurons, respectively; this structure was determined through a trial and error procedure. Following the literature, 70% of all data were selected to train the network and the other 30% to test the trained model. Eq. (6) was used to combine the training and testing errors so that a single error is reported as the NSGA-II output for each number of input parameters. Weights of 0.6 and 0.4 were assigned to the testing error (WTest) and the training error (WTrain), respectively; the higher weight on the testing error favors selecting parameters that increase the generalizability of the estimator model. The same weights were used in Eq. (7) for the corresponding terms. Moreover, a neural network produces a different solution for a given problem in each run. To deal with this issue and achieve more reliable results, the average of five runs of the neural network was used for each number of input parameters.

RMSE = sqrt( (1/n) Σ_{i=1}^{n} (Ti − Oi)² )   (5)

Estimation Error = WTrain × RMSE_Train + WTest × RMSE_Test   (6)

R²_Model = WTrain × R²_Train + WTest × R²_Test   (7)
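A minimal sketch of the cost evaluated inside this feature selection loop is given below: each candidate feature subset is normalized (Eq. (4)), used to train a 4-3-3 MLP on a 70/30 split, and scored with the weighted error of Eqs. (5) and (6). The scikit-learn MLPRegressor is a stand-in for the authors' network, and the function and variable names are illustrative assumptions.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor

W_TRAIN, W_TEST = 0.4, 0.6  # weights of Eq. (6); testing error is weighted more heavily

def rmse(t, o):
    # Eq. (5)
    return float(np.sqrt(np.mean((t - o) ** 2)))

def subset_cost(X, y, feature_idx, seed=0):
    """Weighted estimation error (Eq. (6)) of an MLP trained on one
    candidate feature subset; used here as the NSGA-II cost function."""
    Xs = X[:, feature_idx]
    # Eq. (4): min-max normalization of each selected input to [0, 1]
    Xs = (Xs - Xs.min(axis=0)) / (Xs.max(axis=0) - Xs.min(axis=0))
    X_tr, X_te, y_tr, y_te = train_test_split(Xs, y, test_size=0.3, random_state=seed)
    mlp = MLPRegressor(hidden_layer_sizes=(4, 3, 3), max_iter=2000,
                       random_state=seed).fit(X_tr, y_tr)
    return (W_TRAIN * rmse(y_tr, mlp.predict(X_tr))
            + W_TEST * rmse(y_te, mlp.predict(X_te)))

# In the paper the cost is averaged over five runs of the network, e.g.:
# cost = np.mean([subset_cost(X, y, idx, seed=s) for s in range(5)])
```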
3.3. LSSVM algorithm

Suykens and Vandewalle (1999) first introduced LSSVM, which is a modified version of the standard support vector machine in which the inequality constraints are replaced by equality constraints. For regression, the problem is formulated as the following optimization problem (Eqs. (8) and (9)):

min J(W, e) = (1/2) WᵀW + (γ/2) Σ_{i=1}^{N} ei²   (8)

subject to  yi = Wᵀ φ(xi) + b + ei,  i = 1, …, N   (9)

where φ(xi) is a nonlinear function for mapping the input data into a higher-dimensional feature space, ei is the error in the i-th data point, and γ is the regularization parameter. Also, N, W, and b denote the number of data points in the input vector, the weight vector, and the bias term, respectively.

According to the Karush-Kuhn-Tucker conditions and the Lagrangian function, LSSVM can be used for nonlinear functions as follows:

y(x) = Σ_{i=1}^{N} αi K(x, xi) + b   (10)

where αi and K(x, xi) are the support values and the kernel function, respectively. The most common kernel functions in this regard are presented in Table 2. The ability of LSSVM is basically determined by the kernel function, and different kernel functions have different effects on the estimation ability of the model.
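The sketch below shows how the predictor of Eq. (10) with an RBF kernel can be obtained from the standard LSSVM dual linear system (Suykens and Vandewalle, 1999). It is an illustrative NumPy implementation under that standard formulation, not the authors' code; the kernel scaling convention and function names are assumptions. The two hyper-parameters γ and σ² appearing here are the ones tuned later by COA, PSO, and GA.

```python
import numpy as np

def rbf_kernel(A, B, sig2):
    """RBF kernel K(a, b) = exp(-||a - b||^2 / sig2)."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / sig2)

def lssvm_fit(X, y, gamma, sig2):
    """Solve the standard LSSVM dual linear system for the support values and bias."""
    n = len(y)
    K = rbf_kernel(X, X, sig2)
    A = np.block([[np.zeros((1, 1)), np.ones((1, n))],
                  [np.ones((n, 1)), K + np.eye(n) / gamma]])
    rhs = np.concatenate(([0.0], y))
    sol = np.linalg.solve(A, rhs)
    b, alpha = sol[0], sol[1:]
    return alpha, b

def lssvm_predict(X_new, X_train, alpha, b, sig2):
    # Eq. (10): y(x) = sum_i alpha_i * K(x, x_i) + b
    return rbf_kernel(X_new, X_train, sig2) @ alpha + b
```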
3.3.1. Cuckoo Optimization Algorithm (COA)

Rajabioun (2011) first introduced COA, inspired by the life of cuckoos. The algorithm begins its search for an optimum solution with an initial population that takes random values in the search space, and it is based on the reproduction and egg-laying behavior of cuckoos. These birds lay their eggs in other birds' nests, but only the eggs similar to those of the host bird have a chance of surviving and growing; the other eggs are identified and destroyed by the host birds. Therefore, cuckoos seek the positions in which the largest number of eggs can be saved. After growing and maturing, the surviving eggs create their own societies. A specific habitat is then assigned to each society, and the cuckoos living in the different societies migrate toward the best habitat. In this method, the egg-laying radius (ELR) is a function of the number of each cuckoo's eggs, the lower limit (varlow) and upper limit (varhi) of each variable, and the distance between the cuckoo and the best habitat (Eq. (11)). This procedure is followed by the cuckoos until they reach the habitat in which the survival rate of the population is maximum (Rajabioun, 2011). In Eq. (11), β is an integer that controls the maximum value of the ELR. In this study, the initial value of β was set to 1; however, it is reduced by 1% at each iteration in order to increase both the convergence rate and the accuracy of the algorithm. The flowchart of COA is shown in Fig. 5. In this algorithm, using a trial and error approach, the cuckoo population and the number of iterations were set to 15 and 30, respectively. Moreover, the RMSE calculated from the predicted and observed values of Vs was employed as the cost function.

3.3.2. PSO algorithm

The PSO algorithm seeks the best solution in the search space by means of a population of particles known as a swarm. At first, the initial positions and velocities of the population are randomly generated within their upper and lower limits, and each particle's position is taken as its best personal position (Pb). Having evaluated all particles' positions with a cost function, the position corresponding to the minimum cost is selected as the best global position among the particles (Gb). Then, a new velocity (Vi(t+1)) for each particle i is defined based on its previous velocity (Vi(t)) and the distances from the particle's current position (xi(t)) to the best personal and global positions (Eq. (12)).
The particle's new position (xi(t+1)) is then calculated from the previous position and the new velocity (Eq. (13)). The next stage is to evaluate the new position of each particle with the cost function; if its current cost is less than the cost of its best personal position, the best personal position is updated. In the same way, the best global position is updated by comparing its cost with that of each particle. These stages are depicted in Fig. 6.

Vi(t+1) = w Vi(t) + c1 r1 (Pbi(t) − xi(t)) + c2 r2 (Gb(t) − xi(t))   (12)

xi(t+1) = xi(t) + Vi(t+1)   (13)

where i = 1, 2, …, n, with n the number of particles; w is the inertia weight that controls how much of the particle's previous velocity is retained (Pedersen and Chipperfield, 2010); r1 and r2 are random numbers in the range 0–1 (Coello et al., 2007); and c1 and c2 are positive coefficients named the personal and social factors, respectively. These factors can be taken as 2 for most applications (Marini and Walczak, 2015).

Here, adopting a trial and error approach, the particle population and the number of iterations were set to 40 and 100, respectively. Minimization of the RMSE was considered as the objective function, as in the COA algorithm.
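For concreteness, the update rules of Eqs. (12) and (13) can be written in a few lines of NumPy. The sketch below is a generic PSO step, not the authors' code; c1 = c2 = 2 follows the recommendation quoted above, while the inertia weight value is an assumption since it is not reported.

```python
import numpy as np

def pso_step(x, v, p_best, g_best, w=0.7, c1=2.0, c2=2.0):
    """One PSO iteration: Eq. (12) velocity update and Eq. (13) position update.
    x, v, p_best have shape (n_particles, n_dims); g_best has shape (n_dims,)."""
    r1 = np.random.rand(*x.shape)
    r2 = np.random.rand(*x.shape)
    v_new = w * v + c1 * r1 * (p_best - x) + c2 * r2 * (g_best - x)  # Eq. (12)
    x_new = x + v_new                                                # Eq. (13)
    return x_new, v_new
```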
3.3.3. GA

The genetic algorithm (GA) was first proposed by Holland (1975) for finding the best solution to complex problems. Afterward, many researchers presented modified versions of this algorithm. Its flowchart is shown in Fig. 7.

Fig. 7. Flowchart of GA.

4. Results and discussion

Figs. 8 and 9 show the relation between the input parameters and Vs in Wells A and B. There appears to be a strong correlation between the compressional and shear wave velocities in both wells, and a high correlation is also observed between depth and Vs in Well B. In addition, it is seen that the relation between the inputs and Vs is affected by the outliers. These outliers were therefore eliminated using Tukey's method; 467 and 195 data points were detected and removed in Wells A and B, respectively. The results obtained after data preprocessing and outlier elimination are shown in Figs. 10 and 11. It can be seen that, in addition to the compressional wave velocity, there is a high correlation between the density log and Vs in both wells. Moreover, in Well A, the porosity log shows a stronger correlation with Vs than Vp does; in this well, porosity is the only log that has a negative correlation with the shear wave log. The depth log has the same characteristic as porosity in Well B.

Fig. 12 demonstrates the results of feature selection using NSGA-II. It is observed that an increase in the number of inputs of the estimator model results in an increase in both the accuracy and the determination coefficient of the model. However, the rate of these changes is negligible for models that have more than five inputs. Therefore, to avoid developing a complex and time-consuming model, a model with five inputs is used for estimating Vs. Table 3 presents the combination of parameters providing the best solution for each number of inputs. The five inputs selected in this paper are the Vp, RHOB, NPHI, RT, and GR logs.

These five parameters were considered, along with different kernel functions (as presented in Table 2), as inputs for the LSSVM algorithm so that a kernel function appropriate for this problem could be selected. The results suggest that the RBF kernel produces more accurate results than the other functions (Fig. 13); therefore, this function is used in combination with LSSVM and the PSO, GA, and COA algorithms. In this regard, two variables, γ and σ², must be optimized when the RBF kernel function is used. To construct the LSSVM model with the RBF kernel, the optimal model parameters must be appropriately selected to achieve the desired performance: gamma (γ), which is the regularization parameter, and the RBF kernel parameter sig2 (σ²), the squared bandwidth. These two tuning parameters determine the prediction, learning, and generalization abilities of the developed LSSVM, and they were determined using the optimization algorithms. To obtain the optimized hyper-parameters of the LSSVM, the candidate values of each hyper-parameter (γ, σ²) were placed in two separate vectors. The objective of each optimization algorithm is to find the (γ, σ²) pair with the lowest cost function among the different combinations of γ and σ² in the search space. Gamma (γ) is used to maximize model performance on the training data while minimizing the complexity of the model; a large γ implies little regularization, which leads to a more nonlinear model. Sig2 (σ²) influences the number of neighbors in the model, so that a large σ² means more neighbors in the model and thus a more nonlinear model.
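This hyper-parameter search can be illustrated with the LSSVM helpers sketched after Eq. (10): each candidate (γ, σ²) pair is used to train the model on the training split and is scored by the RMSE of Vs on a held-out split. The grid scan below is only a simple stand-in for the COA, PSO, and GA searches actually used in the paper, and the candidate ranges are illustrative assumptions.

```python
import numpy as np
# lssvm_fit and lssvm_predict are the illustrative helpers sketched earlier.

def tune_rbf_lssvm(X_tr, y_tr, X_val, y_val,
                   gammas=np.logspace(-2, 3, 12),   # assumed candidate ranges
                   sig2s=np.logspace(-2, 3, 12)):
    """Return the (gamma, sig2) pair minimizing the validation RMSE of Vs."""
    best = (None, None, np.inf)
    for gamma in gammas:
        for sig2 in sig2s:
            alpha, b = lssvm_fit(X_tr, y_tr, gamma, sig2)
            pred = lssvm_predict(X_val, X_tr, alpha, b, sig2)
            err = float(np.sqrt(np.mean((y_val - pred) ** 2)))  # RMSE cost, Eq. (5)
            if err < best[2]:
                best = (gamma, sig2, err)
    return best
```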
Fig. 8. Cross plot of input variables versus shear wave velocity (Vs) before the removal of outliers, Well A.
Fig. 9. Cross plot of input variables versus Vs before the removal of outliers, Well B.
Fig. 10. Cross plot of input variables versus Vs after the removal of outliers, Well A.
Fig. 11. Cross plot of input variables versus Vs after the removal of outliers, Well B.
Fig. 12. The results from NSGA-II to compare the models with different number of inputs in Well A.
Table 3
Selected input parameters corresponding to each number of inputs by NSGA-II in Well A (the five-input combination in the fifth row was selected for developing the model).

Number of inputs  Selected feature(s)                          Error   R²
1                 Vp                                           0.0701  0.834
2                 Vp, RHOB                                     0.0553  0.912
3                 Vp, RHOB, NPHI                               0.0478  0.931
4                 Vp, RHOB, NPHI, RT                           0.0436  0.942
5                 Vp, RHOB, NPHI, RT, GR                       0.0397  0.949
6                 Vp, RHOB, NPHI, RT, GR, Depth                0.0386  0.951
7                 Vp, RHOB, NPHI, RT, GR, Depth, PEF           0.0377  0.952
8                 Vp, RHOB, NPHI, RT, GR, Depth, PEF, Caliper  0.0373  0.954

To develop proper estimator models and verify their generalizability, 70% of the data points in Well A were used to train the models and the other 30% to test them.
Fig. 13. Evaluation of different Kernel functions by the selected parameters in Well A.
Fig. 14. Error reduction rate in different iterations of COA, PSO, and GA algorithms for training LSSVM using the data of Well A.
Fig. 14 shows the error reduction rate in the training phase of LSSVM in Well A. As can be observed, the error reduction slope reaches zero over the last 10 iterations for all three algorithms, which confirms that the number of iterations was appropriately selected. As shown in this figure, the COA algorithm converges more quickly than the PSO and GA algorithms. Furthermore, the final error obtained with the COA algorithm is lower than those obtained with PSO and GA, implying that these two algorithms became stuck in local optima.

Fig. 15 illustrates cross plots of the measured Vs versus the Vs predicted using LSSVM-COA, LSSVM-PSO, and LSSVM-GA in the training and testing phases in Well A. It can be observed that the RMSE obtained from LSSVM-COA is smaller than those of the two other models in both the training and testing phases. Besides, comparing the differences between the training and testing errors of the three models indicates that LSSVM-COA has the higher reliability. Consequently, it is expected that this model will estimate Vs most accurately in other wells.

After being trained on the data of Well A, the models were used to estimate Vs in Well B. As demonstrated in Fig. 16, the values estimated by the LSSVM-COA model are closest to the measured values when compared to the results of the other models. Table 4 summarizes the RMSE values and the corresponding R² obtained from the developed models. The results demonstrate that the LSSVM-COA model is capable of estimating Vs with greater accuracy and, therefore, can be implemented in other wells.

5. A comparison between LSSVM-COA and empirical and regression relations

In this section, empirical and regression relations are used to validate the hybrid Least Square Support Vector Machine-Cuckoo Optimization Algorithm (LSSVM-COA) model. Using some empirical relations, Vs is estimated in Well A and Well B and the results are compared with those of LSSVM-COA. Then, univariate and multivariable regression methods are applied to create estimator models of Vs in Well A, followed by applying the prepared models to estimate Vs in Well B.

5.1. Estimation of Vs using Vp

Several empirical and regression models have been proposed for the estimation of Vs from the compressional wave velocity (Vp). In this section, some of the most common models in this regard (Table 5) are used. In addition, a univariate linear regression model is fitted to the Well A data. The main reason for applying linear regression instead of multivariable regression with higher orders is described in the Appendix.

The results of applying the models shown in Table 5 and the linear equation fitted by the authors to the Well A data are presented in Fig. 17. As can be seen, the relation fitted by the authors (Eq. (14)) has a higher accuracy than the empirical and regression models; nevertheless, its accuracy is still much lower than that of the LSSVM-COA model for this well. Among the empirical models, the one proposed by Castagna et al. (1993) has the lowest error in the estimation of Vs in Well A, although this model still underestimates the Vs values. In contrast, the model proposed by Carroll (1969) considerably overestimated the Vs values.

The linear regression model derived by the authors using the Well A data was then used to estimate Vs in Well B. Fig. 18 compares the Vs estimated with this model and with the LSSVM-COA model in Well B. As can be noted, the Vs values estimated using LSSVM-COA are closer to the measured values in Well B than those estimated using the regression model. Table 6 compares the performance of all these models for Vs estimation in the studied wells based on the RMSE and coefficient of determination (R²). As can be noted, the LSSVM-COA model has a considerably lower error than the other models.
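The univariate fit referred to as Eq. (14) is an ordinary least-squares line of Vs on Vp. Since the coefficients of Eq. (14) are not repeated here, the sketch below only illustrates the fitting procedure; the array names are placeholders, not data from the paper.

```python
import numpy as np

def fit_linear_vs_vp(vp, vs):
    """Least-squares line Vs = a * Vp + c (the form of the authors' Eq. (14));
    vp and vs are 1-D arrays of velocities in km/s."""
    a, c = np.polyfit(vp, vs, deg=1)
    return a, c

def predict_vs(vp, a, c):
    return a * np.asarray(vp) + c

# Hypothetical usage:
# a, c = fit_linear_vs_vp(vp_well_a, vs_well_a)
# vs_well_b_est = predict_vs(vp_well_b, a, c)
```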
Fig. 15. Cross plot of measured versus predicted Vs using LSSVM-COA, LSSVM-PSO, and LSSVM-GA in training (a, c, and e) and testing (b, d, and f) phases in Well A.
Fig. 16. A comparison between the observed and estimated Vs in Well B by means of the models trained in Well A.
Hence, the use of this model for Vs estimation is highly recommended. If a simpler and more convenient model for Vs estimation is preferred, Eq. (14) is recommended. As shown in Table 6, the model proposed by Carroll (1969) has a very low accuracy for Wells A and B; thus, using this model for Vs estimation is not recommended at all.

5.2. Estimation of Vs using petrophysical logs

Considering the widespread use of regression models in previous studies, we also validated LSSVM-COA against nonlinear multivariable stepwise regression (NLMSR). For this purpose, the five selected parameters were used and, in each step, some of the features were taken as inputs of the regression model. The regression type used for this purpose was quadratic, which contains an intercept, linear terms, all products of pairs of distinct predictors, and squared terms. Five criteria were employed to select the input parameters of the stepwise procedure.
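As an illustration of this quadratic form, the sketch below builds the intercept, linear, pairwise-interaction, and squared terms for the five selected logs and fits them by ordinary least squares. It omits the stepwise selection criteria themselves, and the variable names are assumptions rather than the authors' code.

```python
import numpy as np
from itertools import combinations

def quadratic_design_matrix(X, names):
    """Columns: intercept, linear terms, pairwise interactions, squared terms."""
    cols, labels = [np.ones(len(X))], ["1"]
    for j, name in enumerate(names):                 # linear terms
        cols.append(X[:, j])
        labels.append(name)
    for j, k in combinations(range(len(names)), 2):  # interaction terms
        cols.append(X[:, j] * X[:, k])
        labels.append(f"{names[j]}*{names[k]}")
    for j, name in enumerate(names):                 # squared terms
        cols.append(X[:, j] ** 2)
        labels.append(f"{name}^2")
    return np.column_stack(cols), labels

# Hypothetical usage with the five selected inputs:
# names = ["Vp", "RHOB", "NPHI", "RT", "GR"]
# A, labels = quadratic_design_matrix(X_well_a, names)
# coeffs, *_ = np.linalg.lstsq(A, vs_well_a, rcond=None)  # least-squares fit of Vs
```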
Fig. 17. The plots of the measured and estimated Vs in Well A using equations of a) Castagna et al. (1993), b) Carroll (1969), c) Brocher (2005), d) Eskandari et al.
(2003), and e) Pickett (1963) and f) the model fitted by the authors of this study.
Fig. 18. A comparison between the outputs of the LSSVM-COA and one-degree polynomial (ODP) models trained in Well A for Vs estimation in Well B, together with the measured Vs values in Well B.
Table 6
A comparison between regression, empirical, and LSSVM-COA models for Vs estimation in the studied wells.

Well name  Model                   RMSE   R-square
           Castagna et al. (1993)  0.117  0.741

Table 7
The selected variables along with their coefficients using the NLMSR model for Vs estimation based on the processed data of Well A.

Vs = 1.6876 − 0.0034 × RT − 6.1786 × RHOB + 4.0897 × Vp + 0.0016 × RT × RHOB − 1.4608 × RHOB × Vp − 1.9154 × 10⁻⁶ × RT²
RT × RHOB  0.0016  0.0003  5.0571  4.6172 × 10⁻⁷

Table 8
RMSE and R² of the proposed NLMSR model on Well A data.

R-square  Adjusted R-square  RMSE  p-value
Fig. 19. A comparison of the LSSVM-COA and NLMSR models trained based on Well A data for Vs estimation in Well B and the measured Vs in Well B.
Table 9
A comparison of the LSSVM-COA and NLMSR models in Well A and Well B based on the RMSE and coefficient of determination values.

Well name  Model      RMSE   R-square
Well A     LSSVM-COA  0.029  0.983
Well A     NLMSR      0.079  0.864
Well B     LSSVM-COA  0.073  0.929
Well B     NLMSR      0.138  0.757

6. Conclusion

In the present study, the hybrid LSSVM-COA algorithm was used to design a Vs estimator from the common petrophysical variables in an oil well (Well A) in the Ahvaz oil field. To evaluate the obtained model, two hybrid algorithms, LSSVM-PSO and LSSVM-GA, were used in addition to the empirical and regression models. To assess the accuracy and generalizability of the models, data from a similar well (Well B) in the Ab-Teymour oil field were employed.

Implementation of the NSGA-II algorithm combined with the MLP revealed that an increase in the number of the estimator model's inputs results in a decrease in the modeling error; however, the decreasing trend becomes slight from five input variables onward. Hence, the variables Vp, RHOB, NPHI, RT, and GR were selected as the optimum modeling inputs. The results of training the hybrid algorithms in Well A showed that the LSSVM-COA model has a higher accuracy than the other hybrid models. Besides, the smaller difference between the training and testing errors of this model suggests its higher reliability and accuracy in Vs estimation. Implementation of the LSSVM-COA model for Vs estimation in Well B revealed its much higher accuracy compared to the other hybrid models. The comparison of LSSVM-COA with the univariate and multivariable regression models in the studied wells also proves the superiority of LSSVM-COA. Accordingly, it can be confidently stated that using this model for Vs estimation in other wells provides more accurate results than the other models studied in this paper.

Acknowledgement

National Iranian South Oil Company (NISOC) is gratefully acknowledged for providing useful data and valuable support.
Appendix
In this section, we evaluate the efficiency of polynomial equations for finding the optimum model to estimate Vs from Vp. For this purpose, the models are first prepared separately for both studied wells; the Vs estimation model of Well A is then validated in Well B, and vice versa.

Tables 10 and 11 present fitted polynomial equations of order one to three for Vs estimation from Vp for Well A and Well B, respectively. As presented in these tables, the error values first decrease as the order of the polynomial increases and then increase again, such that the minimum error is obtained for the second-order polynomial. Implementing the models developed for each well for Vs estimation in the other well reveals some additional useful points. Fig. 20 illustrates the implementation of the polynomial models of different orders fitted to Well A for Vs estimation in Well B. The results show a much higher accuracy of the linear model compared to the second- and third-order polynomial models. This procedure was also repeated for Vs estimation in Well A using the polynomial models of Well B (Fig. 21). The results confirm the previous findings and indicate a lower modeling error of the linear model compared to the higher-order polynomial models for Vs estimation from Vp.
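The cross-well comparison described above can be sketched as follows: polynomials of order one to three are fitted to the Vp-Vs data of one well and scored by RMSE both on that well and on the other well. The array names are placeholders and the snippet only illustrates the procedure, not the authors' code.

```python
import numpy as np

def rmse(t, o):
    return float(np.sqrt(np.mean((t - o) ** 2)))

def compare_orders(vp_fit, vs_fit, vp_other, vs_other, orders=(1, 2, 3)):
    """Fit Vs = P(Vp) of each order on one well and report the fitting RMSE
    and the cross-well (generalization) RMSE."""
    results = {}
    for k in orders:
        coeffs = np.polyfit(vp_fit, vs_fit, deg=k)
        results[k] = (rmse(vs_fit, np.polyval(coeffs, vp_fit)),
                      rmse(vs_other, np.polyval(coeffs, vp_other)))
    return results  # {order: (fit RMSE, cross-well RMSE)}

# Hypothetical usage:
# print(compare_orders(vp_well_a, vs_well_a, vp_well_b, vs_well_b))
```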
Table 10
The polynomial models proposed for estimating Vs using Vp in Well A
Table 11
The prepared polynomial models for estimating Vs using Vp in Well B
Fig. 20. Implementation of the prepared polynomial models with the orders of a) one, b) two, and c) three in Well A for Vs estimation in Well B.
Fig. 21. Implementation of the polynomial models prepared for Well B for Vs estimation in Well A.
Fig. 22 presents the Vs estimates in Well B obtained using the empirical and regression models prepared for Well A. As can be noted, the linear model fitted by the authors provides a higher accuracy than the other models. Thus, this model, rather than the other regression and empirical models, is recommended for the Ahvaz and Ab-Teymour oil fields. Moreover, it is seen that the model proposed by Carroll (1969) considerably overestimated the Vs values.
Fig. 22. The plots of predicted Vs versus the measured Vs values for Well B using the equations proposed by a) Castagna et al. (1993), b) Carroll (1969), c) Brocher (2005), d) Eskandari et al. (2003), and e) Pickett (1963), and f) the model fitted by the authors of this study.