Geographic Information System and Machine Learning

This study investigates the suitability of the Cholistan Desert in Pakistan for solar photovoltaic (PV) power plants using Geographic Information System (GIS) and machine learning techniques. It identifies optimal areas for solar energy production, highlighting Bahawalnagar and Bahawalpur as regions with high potential for PV installation. The research emphasizes the integration of real-time data and advanced modeling to enhance renewable energy planning and address the energy crisis in Punjab.

Article

Geographic Information System and Machine Learning

Approach for Solar Photovoltaic Site Selection: A Case
Study in Pakistan
Hafiz Adnan Ashraf 1 , Jiajun Li 1, *, Zeyu Li 1 , Azam Sohail 2 , Raza Ahmed 3 , Muhammad Hamza Butt 4
and Hameed Ullah 5

1 School of Astronautics, Beihang University, Beijing 102206, China; adnanspsc@buaa.edu.cn (H.A.A.);
lizeyu123478@buaa.edu.cn (Z.L.)
2 Punjab Wildlife and Park Department, Loi Bher, Islamabad 44000, Pakistan; sohail.spsc@gmail.com
3 Key Laboratory of Remote Sensing and Digital Earth, Aerospace Information Research Institute,
Chinese Academy of Sciences, Beijing 100101, China; razaahmed3550@mails.ucas.ac.cn
4 Department of Space Science, University of the Punjab, Lahore 54000, Pakistan; hamzabutt466@gmail.com
5 GIS Department, The Urban Unit 503 Shaheen Complex, Lahore 54000, Pakistan; bhuttabrother@hotmail.com
* Correspondence: jiajunli@buaa.edu.cn

Abstract: Punjab, the most populous province in Pakistan, is currently facing substantial
electricity shortages that are adversely affecting both residential and industrial sectors. To
address this issue, the Cholistan Desert presents a promising solution due to its high solar
irradiance, making it an ideal location for solar energy production. This study aims to
identify the most suitable area for solar photovoltaic (PV) power plants in the Cholistan
Desert using Geographic Information System (GIS) and machine learning techniques. The
analysis included field survey data encompassing 14 conditioning factors such as geophysi-
cal, socio-economic, and resource conditions. Three machine learning models were utilized:
Random Forest, XGBoost, and Multilayer Perceptron (MLP). The Random Forest model
demonstrated superior performance with an AUC of 0.92, and feature importance was
measured through SHAP. The resulting suitability map indicates that Bahawalnagar in the
eastern region and Bahawalpur in the central region have 10.50% and 11.06% of their areas
classified as having a “high” and “very high” probability for solar PV installation, respectively.
For stakeholders in the wind industry, these regions also present potential for wind
farm feasibility due to favorable wind conditions and flat terrain. The methodology can be
adapted to prioritize wind energy sites by incorporating factors such as land availability,
wind direction, and other related factors. Co-locating solar and wind farms in these regions
could optimize land use, enhance grid stability, and support Pakistan’s renewable energy
targets. Future research integrating real-time solar and wind data could further refine
site selection and support multi-source renewable energy planning, providing actionable
insights for policymakers and investors.

Keywords: site selection; feature importance; renewable energy planning; Random Forest;
XGBoost; Multilayer Perceptron; site suitability analysis

Academic Editor: Michael C. Georgiadis
Received: 1 February 2025
Revised: 5 March 2025
Accepted: 18 March 2025
Published: 25 March 2025

Citation: Ashraf, H.A.; Li, J.; Li, Z.; Sohail, A.; Ahmed, R.; Butt, M.H.; Ullah, H. Geographic Information
System and Machine Learning Approach for Solar Photovoltaic Site Selection: A Case Study in Pakistan.
Processes 2025, 13, 981. https://doi.org/10.3390/pr13040981

Copyright: © 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article
distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license
(https://creativecommons.org/licenses/by/4.0/).

1. Introduction
Pakistan has grappled with a significant energy crisis since 2004, driven by several
factors, including the absence of advanced modeling tools in power planning and policy-making,
over-reliance on imported energy resources, and inadequate governance [1]. This
crisis has particularly impacted the province of Punjab, which is home to over 90 million
people and consumes 68% of the country’s electricity. With an annual demand growth
of 6–8%, Punjab faces a persistent demand–supply gap of 4000 MW, leading to severe
economic repercussions and disruptions in citizens’ daily lives. Addressing this energy
deficit is critical for the region’s sustainable development and economic stability.
In response to this crisis, the government has prioritized mitigating power deficits by
investing in the energy sector and promoting renewable energy initiatives to fully leverage
the country’s sustainable electricity potential [2]. Renewable energy sources, such as solar,
wind, and hydro, offer sustainable and abundant alternatives to finite fossil fuels. These
alternatives not only enhance energy security and reduce geopolitical tensions associated
with energy dependence but also play a vital role in addressing climate change. Further-
more, renewable energy improves air quality and drives economic growth by creating
employment opportunities in manufacturing, installation, operation, and maintenance.
These benefits underscore the importance of transitioning to renewable energy systems to
ensure long-term energy sustainability.
Technological advancements and economies of scale have significantly reduced the
cost of renewable energy production, making it increasingly competitive with conventional
energy sources. The global transition to renewable energy reveals significant disparities
in adoption rates across regions, with the EU, China, and India leading in wind and solar
advancements [3], in part because renewable energy prices are more stable over time than the volatile prices
of fossil fuels. Despite these advantages, Pakistan still relies heavily on non-renewable
resources such as coal, oil, and natural gas, which dominate the country’s electricity
generation. This reliance highlights the urgent need to accelerate the adoption of renewable
energy technologies to diversify the energy mix and reduce dependency on fossil fuels.
Among the renewable energy options, recent advances in solar photovoltaic (PV) ma-
terials and systems have enhanced efficiency, reduced costs, and improved energy storage,
making PV a viable renewable energy solution [4]. The Cholistan Desert, with its vast
high-irradiation areas, offers an ideal location for large-scale solar PV deployment, pro-
viding a viable pathway to address the energy crisis effectively and sustainably. However,
despite this immense potential, the solar energy sector in Pakistan remains in its nascent
stages, presenting both challenges and opportunities for growth [5]. The increasing global
demand for renewable energy has introduced complexities in assessing site suitability and
evaluating the technical potential for solar installations. Optimal spatial location selection
for utility-scale PV systems is critical to maximizing the benefits of solar resources while
addressing the inherent variability of solar energy [6].
While Geographic Information Systems (GIS) and machine learning (ML) have been
widely adopted for renewable energy site selection, existing studies often rely on frag-
mented approaches that prioritize theoretical datasets or fail to integrate multi-dimensional,
field-verified inputs. This limitation is particularly evident in hyper-arid regions like the
Cholistan Desert, Pakistan, where extreme environmental conditions (e.g., temperature
gradients, seasonal variability) and dynamic land-use patterns pose unique challenges. Ad-
ditionally, the integration of GIS and ML for solar photovoltaic (PV) site selection remains
underexplored, especially in understudied desert ecosystems. Current methodologies lack
adaptability to real-time environmental fluctuations and often overlook the interpretability
of ML models, limiting their practical relevance and scalability. This gap highlights the
need for an integrated framework that combines spatially explicit ground-truth data with
advanced ML algorithms to enhance prediction accuracy, transparency, and long-term
project viability in complex environments.
The primary research question guiding this study is as follows: How can an integrated
GIS–ML framework, leveraging spatially explicit ground-truth data and advanced machine
learning algorithms (Random Forest, XGBoost, and Multilayer Perceptron), improve
the precision, interpretability, and adaptability of solar PV site selection in hyper-arid
regions like the Cholistan Desert, Pakistan? To address this, this study aims to develop
and validate an integrated GIS–ML framework tailored to the unique challenges of hyper-
arid environments. Specifically, it seeks to synthesize spatially explicit ground-truth data
with advanced ML algorithms to enhance prediction accuracy and adaptability, incorpo-
rate SHAP (SHapley Additive exPlanations) for transparent feature contribution analysis,
and unify adaptive geospatial modeling with temporal dynamics to account for extreme
environmental conditions and ensure long-term project viability. By providing a scalable,
context-sensitive solution, this framework advances precision in site selection and supports
sustainable solar energy development in understudied desert ecosystems, bridging the gap
between theoretical datasets and practical applications.
The remaining part of the paper is structured as follows. Section 2 reviews literature
on GIS–ML approaches in renewable energy planning, contextualizing global trends and
regional challenges. Section 3 outlines the methodology, including the Cholistan Desert
study area and data sources for solar PV site evaluation. Section 4 presents results, mapping
high-potential zones and validating outcomes via geospatial/statistical models. Section 5
discusses strategies for addressing policy, economic, and environmental factors. The con-
clusion underscores how data-driven analytics can bridge renewable energy potential and
practical deployment, offering scalable solutions for energy security in resource-constrained
regions like Punjab.

2. Literature Review
Identifying optimal locations for solar photovoltaic (PV) installations is a critical step in
advancing renewable energy strategies, requiring a comprehensive evaluation of ecological,
technical, economic, and social factors. The integration of Geographic Information Systems
(GIS) and machine learning (ML) methods provides a structured and robust approach to
addressing the complexities inherent in site suitability assessments. The proliferation of
Geographic Information System (GIS) platforms [7,8] has revolutionized spatial analysis
for renewable energy projects, enabling a systematic assessment of solar and wind farm
feasibility and the identification of optimal site configurations. GIS facilitates the integration,
processing, and visualization of multi-source geospatial datasets—such as solar irradiance,
land cover, and topography—to support data-driven decision-making in photovoltaic
(PV) infrastructure development. By coupling GIS with economic models, researchers can
quantify grid-connected technical potential and analyze cost-benefit dynamics for solar
energy generation, enhancing the precision of large-scale project planning.
A critical application of GIS lies in evaluating land suitability for utility-scale PV in-
stallations. Advanced methodologies incorporating exclusion criteria (e.g., environmental
sensitivities, slope limitations) and spatial constraints (e.g., proximity to grids, protected
areas) have proven instrumental in minimizing ecological and operational risks [9]. Fur-
thermore, GIS-based multi-criteria decision-making (MCDM) frameworks are increasingly
adopted to assess site suitability and technical capacity for renewable energy projects.
These frameworks synthesize environmental, economic, and social variables, offering
stakeholders actionable insights for prioritizing high-potential locations and streamlining
regulatory compliance [10]. Such tools also enable the integration of dynamic parameters
like solar radiation patterns and land-use changes, reinforcing GIS’s role as a cornerstone
of sustainable energy planning [11].
Machine learning (ML) has emerged as a transformative paradigm for addressing
complex spatial classification and regression challenges in renewable energy siting [12].
ML algorithms excel in processing heterogeneous datasets, with applications spanning soil
property mapping, biodiversity conservation, land-use classification, and energy infrastructure
optimization. For instance, the study in [13] demonstrated the efficacy of ensemble
ML techniques in identifying optimal locations for waste-to-energy facilities, generating
high-resolution suitability maps while isolating critical siting parameters like population
density and transportation networks. Similarly, the study in [14] employed ML models to
map wind turbine suitability in Iowa, USA, validating their utility in balancing energy
output with environmental and land-use constraints.
These advancements underscore how machine learning (ML) complements GIS-driven
spatial analysis by enhancing predictive accuracy and adaptability in clean energy infras-
tructure planning. Building on GIS frameworks that evaluate land suitability and technical
potential (e.g., [9]), ML algorithms such as Support Vector Regression (SVR), decision trees,
and Random Forests now enable a deeper analysis of multidimensional geospatial data.
By systematically integrating resource availability (e.g., solar irradiance), microclimatic
variability, regulatory constraints, and socio-economic indicators, ML refines predictive
models for solar PV deployment, optimizing both site selection precision and long-term
project viability [15].
The integration of ML with GIS platforms addresses a critical gap identified in earlier
studies: the need for dynamic, data-driven adaptability in renewable energy planning.
For instance, while traditional GIS methodologies rely on static criteria (e.g., [11]), ML–GIS
hybrid models uncover latent patterns in large-scale datasets, such as shifting climatic
trends or evolving land-use dynamics, to improve decision robustness [16–18]. This synergy
enables predictive analytics that adapt to real-time environmental fluctuations, ensuring
that renewable energy projects remain resilient under changing conditions.
Existing photovoltaic (PV) site suitability assessments in arid regions frequently em-
ploy fragmented analytical approaches—relying solely on geospatial tools or machine
learning (ML) and prioritizing theoretical datasets over multi-dimensional, field-verified
inputs. Such methods inadequately address hyper-arid environmental complexities, including
extreme temperature gradients and seasonal resource variability, limiting their practical
relevance in regions like the Cholistan Desert, Pakistan. To resolve these shortcomings, this
study introduces an integrated framework that synthesizes spatially explicit ground-truth
data with three advanced ML algorithms: Random Forest, XGBoost, and a Multilayer
Perceptron (MLP) neural network. Leveraging ArcGIS 10.8 for spatial data processing
and SHAP (SHapley Additive exPlanations) for interpretability, the methodology ensures
both high prediction accuracy and transparent feature contribution analysis. By unifying
adaptive geospatial modeling with temporal dynamics, the framework delivers a scalable,
context-sensitive solution tailored to the unique solar energy challenges of understudied
desert ecosystems, advancing precision in site selection and long-term project viability.
The main contributions of this study are threefold. First, it introduces a hybrid GIS–ML
framework that integrates geospatial analysis with advanced machine learning algorithms
(Random Forest, XGBoost, and MLP neural networks), overcoming the limitations of
isolated methodologies. This synergy enhances adaptability to dynamic desert-specific
challenges. Second, the study pioneers the use of ground-truth data, including solar
potential, environmental, and socio-economic indicators, to empirically validate models
in hyper-arid regions, a critical advancement beyond theoretical or single-source datasets.
Third, it establishes interpretable and scalable decision-making through SHAP-driven
model transparency, clarifying feature contributions (e.g., environmental factors, solar irradiance)
while ensuring replicability for other arid ecosystems. Collectively, these innovations
provide a precision-driven, context-sensitive solution for sustainable solar energy planning
in understudied desert environments in Pakistan.

3. Materials and Methods

3.1. Study Area
The study focused on the Cholistan Desert in Punjab, Pakistan, as the designated area
of interest, covering approximately 25,900 square kilometers. This desert lies
between latitudes 27°42′ N and 29°45′ N and longitudes 69°52′ E and 75°24′ E [19]. It supports a
population exceeding 0.3 million residents and sustains around 2.0 million livestock.
The land in Cholistan is utilized for various purposes, including grasslands, agriculture,
built-up areas, and barren terrain. The desert stretches roughly 480 km in length, with a
width that varies between 32 and 192 km.
The Cholistan Desert is divided into two regions: greater Cholistan, encompassing
an area of 18,130 square kilometers, and lesser Cholistan, covering 7770 square kilometers.
Situated in southwest Punjab, Pakistan, it spans three districts: Bahawalpur, Bahawal-
nagar, and Rahim Yar Khan [20]. The region’s climate is classified as arid to semi-arid,
characteristic of tropical desert environments, and is marked by very low annual humidity.
The average yearly temperature is 28.33 °C (82.99 °F), with July being the hottest month,
recording a mean temperature of 38.5 °C (101.3 °F).
The region experiences abundant sunshine throughout the year, making it highly
suitable for solar photovoltaic (PV) systems. This climatic advantage presents significant
potential for harnessing solar energy to meet Punjab’s increasing energy needs while
addressing environmental challenges. The study highlights the strategic role of utilizing
the country’s solar resources to optimize energy generation and contribute to sustainable
energy solutions. Figure 1 provides a spatial representation of the study area, offering
essential geographical context for this research.

Figure 1. Map of the study area.

3.2. Data Collection

A detailed geospatial dataset was developed to identify optimal locations for the in-
stallation of solar photovoltaic (PV) plants. This dataset was constructed using field survey
data collected from 355 location points, including both PV and non-PV sites. These points
were meticulously analyzed using ArcGIS software to process and evaluate critical vari-
ables for PV site selection, considering 14 distinct features such as physical-geographical,
socio-economic, and resource conditioning factors. The raster values for these features
were extracted using the Extract Values to Points tool in ArcGIS 10.8, with all conditioning
factor maps converted into raster format at a spatial resolution of 1 km. To ensure compati-
bility among datasets with differing resolutions, a multi-step procedure was implemented,
resampling all datasets to a uniform 1 km resolution using the nearest neighbor method in
ArcGIS 10.8. The integration of advanced spatial analytics and field-derived data ensures
the dataset’s reliability for sustainable energy planning and decision-making.
Solar irradiation data were downloaded from https://solargis.com (accessed on
31 October 2024) at a resolution of 250 m. The Digital Elevation Model (DEM) was
obtained from the USGS repository, https://www.usgs.gov/ (accessed on 18 November
2024), at a resolution of 30 m. Vegetation analysis was conducted using the Normalized
Difference Vegetation Index (NDVI), derived from Sentinel-2 imagery (COPERNICUS/S2),
with Near-Infrared (NIR) and red bands obtained from http://livingatlas.arcgis.com
(accessed on 20 November 2024) at a resolution of 30 m. These finer-resolution datasets were
resampled to the coarser uniform resolution of 1 km for consistency.
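The NDVI calculation and the coarsening step can be illustrated with a minimal sketch in plain Python. This is not the authors' ArcGIS workflow: the function names (`ndvi`, `nearest_neighbor_coarsen`), the toy reflectance values, and the convention used to pick the "nearest" source cell are all assumptions made for illustration. NDVI itself follows the standard definition, (NIR - Red) / (NIR + Red).

```python
def ndvi(nir, red):
    """Normalized Difference Vegetation Index for one pixel."""
    total = nir + red
    # Guard against division by zero for pixels with no signal in either band.
    return (nir - red) / total if total != 0 else 0.0


def nearest_neighbor_coarsen(grid, factor):
    """Coarsen a 2-D grid by an integer factor.

    Each output cell takes the value of the source cell found at the centre
    of the block it covers (for even factors the centre falls on a cell
    boundary and the higher index is used). No averaging is performed.
    """
    n_rows = len(grid) // factor
    n_cols = len(grid[0]) // factor
    return [
        [grid[int((r + 0.5) * factor)][int((c + 0.5) * factor)] for c in range(n_cols)]
        for r in range(n_rows)
    ]


# Hypothetical reflectance values for a 4 x 4 patch of fine-resolution pixels.
nir_band = [[0.40, 0.35, 0.30, 0.25]] * 4
red_band = [[0.10, 0.15, 0.20, 0.25]] * 4

ndvi_fine = [
    [ndvi(n, r) for n, r in zip(nir_row, red_row)]
    for nir_row, red_row in zip(nir_band, red_band)
]
ndvi_coarse = nearest_neighbor_coarsen(ndvi_fine, factor=2)  # 4 x 4 -> 2 x 2
```

Nearest neighbor resampling keeps original cell values, which suits categorical layers such as land cover; for continuous layers, averaging within each block is a common alternative.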
Infrastructure and socio-economic considerations were addressed using road network
and population density data, which were critical for evaluating the proximity of potential
sites to transportation routes and population centers. Both datasets were retrieved
from https://data.humdata.org (accessed on 11 and 3 November 2024, respectively) at a
spatial resolution of 1 km and were left unchanged because they already matched the target
resolution. Wind speed data, sourced from the ERA5 Monthly Aggregates dataset
(ECMWF/ERA5/MONTHLY), were computed by combining the u- and v-components of
wind measured at 10 m above the surface. This dataset, with a spatial resolution of 1 km,
was obtained through https://earthengine.google.com (accessed on 31 October 2024)
and also remained unchanged. Temperature data were obtained from https://solargis.com
(accessed on 31 October 2024) at a spatial resolution of 1 km. This process ensured
that all datasets could be effectively integrated into the site suitability analysis, providing a
coherent and consistent basis for the machine learning models. The summary of detailed
indicators is shown in Figure 2 and listed in Table 1.
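Combining the u- and v-components into a wind speed is normally done by taking the magnitude of the horizontal wind vector. The sketch below shows that calculation in plain Python; the function name `wind_speed` and the sample component values are hypothetical, and the source does not state whether the authors applied this formula before or after temporal averaging.

```python
import math


def wind_speed(u, v):
    """Horizontal wind speed (m/s) from eastward (u) and northward (v) components."""
    return math.hypot(u, v)  # sqrt(u**2 + v**2)


# Hypothetical 10 m wind components (m/s) for three grid cells.
components = [(3.0, 4.0), (-1.5, 2.0), (0.0, 0.0)]
speeds = [wind_speed(u, v) for u, v in components]
```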

Table 1. Detailed description of the dataset used in this study.

Category                    Variables                     Description                                  References
Geographical Factors        Land Use/Land Cover           Land use/Land cover                          [21]
Geographical Factors        Aspect                        Topography                                   [22]
Geographical Factors        Slope                         Topography                                   [22]
Geographical Factors        Elevation                     Topography                                   [23]
Geographical Factors        Air Temperature               Air temperature at 2 m above surface         [24]
Geographical Factors        PM10 Concentration            Particles with a diameter ≤ 10 micrometers   [17]
Geographical Factors        NDVI                          Using NIR and red bands from Sentinel-2      [25]
Geographical Factors        Wind Speed                    u- and v-components at 10 m above surface    [26]
Socio-Economic Factors      Carbon emission               Fossil CO2 emissions                         [27]
Socio-Economic Factors      Distance to residential area  Proximity to residential population          [28]
Socio-Economic Factors      Population density            Proximity to population centers              [29]
Socio-Economic Factors      Distance to road network      Proximity to infrastructure                  [30]
Resource Condition Factors  Solar irradiation (kWh/m²)    Estimates energy generation potential        [31]
Resource Condition Factors  Sunshine duration             Calculated from the hourly sunshine time     [25]

Figure 2. Maps of key conditioning factors for solar PV site suitability (a–i).

4. Methodology
It is presumed that existing PV power stations occupy fairly optimal geographical
positions, as determined collaboratively by investment decision-makers and
experts. The overall process comprises the following steps. Initially, ground-truth data
were gathered via field surveys conducted across the Cholistan Desert in Punjab, Pakistan.
Subsequently, paired non-PV installation points were randomly generated using the spatial
buffer sampling method. A total of 14 conditioning factors, including physical geography,
proximity, and solar resources, were identified through a literature review and extracted
from multi-source datasets using ArcGIS 10.8 software. Next, the combined dataset of
dependent and independent indicators was randomly split in a 70/30 ratio.
Specifically, 70% of the dataset was utilized for model building, while the remaining 30%
was reserved for model validation.
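A random 70/30 split of this kind can be sketched as follows. The sketch uses only the Python standard library and is not the authors' code: the function name `split_70_30` is hypothetical, the seed value is borrowed from the `random_state` reported for the models (whether the same seed was used for the split is an assumption), and the rounding convention for the 70% share is also an assumption.

```python
import random


def split_70_30(samples, seed=123):
    """Shuffle the samples reproducibly and split them 70/30."""
    rng = random.Random(seed)      # local generator, so global state is untouched
    shuffled = list(samples)
    rng.shuffle(shuffled)
    n_train = len(shuffled) * 70 // 100   # integer arithmetic avoids float rounding
    return shuffled[:n_train], shuffled[n_train:]


# 355 sample identifiers, matching the number of surveyed location points.
sample_ids = list(range(355))
train_ids, test_ids = split_70_30(sample_ids)
```

With 355 points this convention yields 248 training and 107 validation samples. A stratified split, which preserves the PV to non-PV ratio in both subsets, would be a reasonable refinement when classes are imbalanced.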
Following this, three widely used machine learning techniques—Multilayer Percep-
tron (MLP), Random Forest (RF), and Extreme Gradient Boosting (XGBoost) models—were
employed for evaluation. Ultimately, the most robust model was selected to predict the suit-
ability of PV installations throughout the desert. The relative variable importance diagram
(SHAP) from the chosen robust model was then utilized to evaluate the marginal contribu-
tion and direction of each variable in relation to PV location selection. The methodological
framework is comprehensively detailed in Figure 3.

Figure 3. Illustration of proposed methodology.

4.1. Multilayer Perceptron Neural Network (MLP NN)

The MLP network is one of the most widely applied feed-forward neural network techniques
for real-world modeling and prediction and has therefore served as a benchmark
model in many fields. We implemented a Multilayer Perceptron (MLP) that consists of
three hidden layers with 256, 128, and 64 neurons, respectively. The Rectified Linear Unit
(ReLU) [32] activation function was used for the hidden layers.
To optimize the MLP model, we employed the Adam optimizer [33], which combines
the benefits of momentum and adaptive learning rates. The optimization process of the
Adam algorithm is given by Equations (1)–(4) [34].
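$$m_t = \beta_1 m_{t-1} + (1 - \beta_1) g_t \tag{1}$$

$$v_t = \beta_2 v_{t-1} + (1 - \beta_2) g_t^2 \tag{2}$$

$$\hat{m}_t = \frac{m_t}{1 - \beta_1^t}, \quad \hat{v}_t = \frac{v_t}{1 - \beta_2^t} \tag{3}$$

$$\theta_t = \theta_{t-1} - \eta \frac{\hat{m}_t}{\sqrt{\hat{v}_t} + \epsilon} \tag{4}$$

where $m_t$ and $v_t$ are the biased first and second moment estimates, $\beta_1$ and $\beta_2$ are the decay
rates, $\eta$ is the learning rate, and $\epsilon$ is a small constant for numerical stability.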
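Equations (1)–(4) translate almost line by line into code. The following is a minimal single-parameter sketch, not the optimizer implementation used in the study; the function name `adam_step` is hypothetical, and the default hyperparameters (β1 = 0.9, β2 = 0.999, ε = 1e-8) are the values commonly recommended for Adam rather than values reported in the source. Here `grad` plays the role of $g_t$, the gradient of the loss at step $t$.

```python
import math


def adam_step(theta, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update for a scalar parameter; t is the 1-based step index."""
    m = beta1 * m + (1 - beta1) * grad            # Eq. (1): first moment estimate
    v = beta2 * v + (1 - beta2) * grad ** 2       # Eq. (2): second moment estimate
    m_hat = m / (1 - beta1 ** t)                  # Eq. (3): bias correction
    v_hat = v / (1 - beta2 ** t)
    theta = theta - lr * m_hat / (math.sqrt(v_hat) + eps)   # Eq. (4): update
    return theta, m, v


# Toy problem: minimise f(theta) = theta**2, whose gradient is 2 * theta.
theta, m, v = 1.0, 0.0, 0.0
for t in range(1, 11):
    theta, m, v = adam_step(theta, 2 * theta, m, v, t, lr=0.01)
```

Because of the bias correction, the very first update has magnitude close to the learning rate regardless of the gradient's scale, which is one reason Adam is relatively insensitive to how the loss is scaled.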
ReLU introduces non-linearity while mitigating the vanishing gradient problem.
To prevent overfitting, early stopping is employed with a validation fraction of 10% of
the training data. Additionally, L2 regularization (α = 1000) adds stability to the model.
The parameters of the MLP are optimized through fine-tuning, and the most suitable values
are subsequently selected.
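To make the architecture concrete, the sketch below runs a forward pass through a network with hidden layers of 256, 128, and 64 ReLU units in plain Python. It is illustrative only: the input width of 14 (one value per conditioning factor), the single sigmoid output unit, the He-style random initialisation, and all function names are assumptions, and no training is performed.

```python
import math
import random


def relu(x):
    return x if x > 0 else 0.0


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def init_layer(n_in, n_out, rng):
    """Random weights (He-style scaling, an assumption) and zero biases."""
    scale = math.sqrt(2.0 / n_in)
    weights = [[rng.gauss(0.0, scale) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def dense(inputs, weights, biases, activation):
    """Fully connected layer: activation(W x + b)."""
    return [
        activation(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]


def forward(features, layers):
    """ReLU on hidden layers, sigmoid on the output unit."""
    hidden = features
    for weights, biases in layers[:-1]:
        hidden = dense(hidden, weights, biases, relu)
    weights, biases = layers[-1]
    return dense(hidden, weights, biases, sigmoid)[0]


rng = random.Random(123)
sizes = [14, 256, 128, 64, 1]   # assumed: 14 inputs, three hidden layers, 1 output
layers = [init_layer(a, b, rng) for a, b in zip(sizes, sizes[1:])]
n_params = sum(len(w) * len(w[0]) + len(b) for w, b in layers)
suitability = forward([0.5] * 14, layers)   # untrained, so the value is arbitrary
```

Under these assumptions the network has 45,057 trainable parameters, which is large relative to 355 samples and helps explain why early stopping and L2 regularization are applied.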

4.2. Random Forest

Random Forest (RF) is a robust ensemble learning approach that enhances predictive
accuracy by constructing numerous decision trees and aggregating their outputs. It is partic-
ularly effective in addressing overfitting and variance, making it suitable for datasets with
complex non-linear relationships, high-dimensional features, and minimal pre-processing
requirements. In classification tasks, RF leverages the principle of combining multiple
uncorrelated decision trees to achieve more reliable predictions.
Specifically, RF constructs regression trees by recursively splitting the data into subsets,
optimizing a criterion such as the residual sum of squares at each step. Each split divides
the feature space into regions represented as nodes, which are further divided until a
predefined stopping condition is met, such as a minimum number of observations in
terminal nodes. For regression tasks, predictions are derived by averaging the outputs
across all trees, enabling the estimation of solar PV installation potential through the
aggregated results.
We employed the Random Forest (RF) algorithm as an ensemble learning approach
for classification tasks. In our study, we utilized 500 decision trees (n_estimators = 500) [35],
with the class_weight parameter set to "balanced" to handle class imbalance in the dataset.
We also fixed the random seed (random_state = 123) to ensure the reproducibility of our
results. During inference, class probabilities were calculated as the mean of the probabilities
estimated by all decision trees in the ensemble:

$$P(c \mid x) = \frac{1}{N} \sum_{t=1}^{N} P_t(c \mid x), \tag{5}$$

where N is the total number of decision trees in the ensemble.

The predicted class ŷ for input x is determined by selecting the class with the highest
aggregated probability:
$$\hat{y} = \arg\max_{c} P(c \mid x). \tag{6}$$

This ensemble approach is highly effective in classification tasks, particularly in scenarios
involving imbalanced datasets and complex feature spaces [36]. Equations (5) and (6)
describe the process of calculating class probabilities and determining the predicted class
in the Random Forest algorithm.
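The averaging in Equation (5) and the arg max in Equation (6) can be written out directly. The sketch below assumes each tree has already produced a probability vector for one input; the function names and the three-tree example are hypothetical and stand in for the 500-tree ensemble.

```python
def ensemble_proba(tree_probas):
    """Eq. (5): mean of the per-tree class probability vectors."""
    n_trees = len(tree_probas)
    n_classes = len(tree_probas[0])
    return [sum(p[c] for p in tree_probas) / n_trees for c in range(n_classes)]


def predict_class(tree_probas):
    """Eq. (6): index of the class with the highest averaged probability."""
    proba = ensemble_proba(tree_probas)
    return max(range(len(proba)), key=proba.__getitem__)


# Hypothetical outputs of three trees for one location: [P(non-PV), P(PV)].
tree_outputs = [[0.2, 0.8], [0.6, 0.4], [0.3, 0.7]]
mean_proba = ensemble_proba(tree_outputs)
label = predict_class(tree_outputs)   # 1 means "PV-suitable" in this toy encoding
```

Averaging probabilities rather than taking a majority vote over hard labels lets confident trees outweigh uncertain ones, and it yields the continuous suitability score needed to draw a probability map.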

4.3. Extreme Gradient Boosting Model (XGBoost)

The Extreme Gradient Boosting (XGBoost) algorithm was proposed by [37]. It combines
multiple weak learners to produce a strong learner. XGBoost
can help to reduce overfitting and improve prediction accuracy. Moreover,
XGBoost is not strongly influenced by multicollinearity, so all influential features in the model
can be retained even if some of them are strongly correlated with one another. XGBoost
applies a second-order Taylor expansion to the loss function and adds a regularization term to the
loss function to find the optimal solution, balancing the decline of the loss function against the
complexity of the model to avoid overfitting.
We applied the Extreme Gradient Boosting (XGBoost) algorithm as a second ensemble
learning method. XGBoost builds an additive ensemble of weak learners (decision trees) in
a sequential manner, where each new tree minimizes the residual errors of the previously
constructed ensemble. To achieve this, the algorithm optimizes the objective
function given by Equation (7):

$$\mathrm{Obj}(\theta) = L(y, \hat{y}) + \Omega(f) \tag{7}$$

where $L(y, \hat{y})$ is the logarithmic loss function, and $\Omega(f)$ is the regularization term. The
complete objective function is given by Equation (8):

$$\mathrm{Obj}(\theta) = -\frac{1}{N} \sum_{i=1}^{N} \left[ y_i \log(\hat{y}_i) + (1 - y_i) \log(1 - \hat{y}_i) \right] + \gamma T + \frac{1}{2} \lambda \sum_{j=1}^{T} w_j^2 \tag{8}$$

In our study, we used XGBoost for classification tasks. The loss function used is the
logarithmic loss, which is given by Equation (9):

$$L(y, \hat{y}) = -\frac{1}{N} \sum_{i=1}^{N} \left[ y_i \log(\hat{y}_i) + (1 - y_i) \log(1 - \hat{y}_i) \right], \tag{9}$$

where $L(y, \hat{y})$ is the loss function, $N$ is the number of instances, $y_i$ is the true label, and
$\hat{y}_i$ is the predicted probability for the $i$-th instance.
The model is regularized by penalizing the complexity, which is controlled by the
number of leaves ($T$) and their weights ($w$) in each tree. The regularization term is given by
Equation (10):

$$\Omega(f) = \gamma T + \frac{1}{2} \lambda \sum_{j=1}^{T} w_j^2, \tag{10}$$

where $\gamma$ is a parameter that controls the complexity by penalizing the number of leaves, $w_j$
represents the weight of the $j$-th leaf, and $\lambda$ is a regularization parameter that penalizes
large weights to prevent overfitting.
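The objective in Equations (7)–(10) can be evaluated numerically for a single tree, as in the sketch below. It computes the quantity being minimised and is not XGBoost's training procedure; the function names, the probability clipping constant, and the toy labels, predictions, and leaf weights are all assumptions introduced for illustration.

```python
import math


def log_loss(y_true, y_pred, clip=1e-15):
    """Eq. (9): mean binary logarithmic loss, with clipping to avoid log(0)."""
    total = 0.0
    for y, p in zip(y_true, y_pred):
        p = min(max(p, clip), 1.0 - clip)
        total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return -total / len(y_true)


def regularization(leaf_weights, gamma, lam):
    """Eq. (10): gamma * T + 0.5 * lambda * sum of squared leaf weights."""
    return gamma * len(leaf_weights) + 0.5 * lam * sum(w * w for w in leaf_weights)


def objective(y_true, y_pred, leaf_weights, gamma, lam):
    """Eq. (8): data-fit term plus complexity penalty."""
    return log_loss(y_true, y_pred) + regularization(leaf_weights, gamma, lam)


# Hypothetical labels, predicted probabilities, and leaf weights of one tree.
labels = [1, 0, 1, 0]
predictions = [0.9, 0.2, 0.6, 0.4]
value = objective(labels, predictions, leaf_weights=[1.0, -2.0], gamma=0.5, lam=1.0)
```

Raising `gamma` makes every additional leaf more expensive, while raising `lam` shrinks leaf weights toward zero; both push the model toward simpler trees.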
To address class imbalance in the dataset, we used the scale_pos_weight parameter,
which rescales the weight of positive-class samples so that the minority class
is not underweighted during training.
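The source does not report the value chosen for scale_pos_weight. A commonly used heuristic is the ratio of negative to positive training samples, sketched below; the function name and the class counts are hypothetical, and this is not necessarily the setting the authors used.

```python
def suggested_scale_pos_weight(labels):
    """Heuristic: number of negative samples divided by number of positive samples."""
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    if n_pos == 0:
        raise ValueError("at least one positive sample is required")
    return n_neg / n_pos


# Hypothetical training labels: 20 PV sites (1) and 60 non-PV sites (0).
toy_labels = [1] * 20 + [0] * 60
weight = suggested_scale_pos_weight(toy_labels)
```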
We used 500 boosting rounds (n_estimators = 500) with a fixed random seed
(random_state = 123) for consistency. XGBoost leverages second-order gradients (Hessian
information) for efficient optimization, and its implementation supports parallel processing
to improve computational efficiency.

4.4. Model Settings

The experiments were performed in a computational environment running Windows
10 Pro (64-bit), powered by an Intel Core i7-6820HQ CPU operating at 2.70 GHz (8 CPUs)
and supported by an NVIDIA GeForce RTX 3090 GPU server to ensure efficient and high-
performance computation. For coding and implementation, we utilized Visual Studio Code
(VS Code) as the integrated development environment (IDE), with Python as the primary
programming language.

4.5. Models Evaluation

A total of 355 samples, which include both PV and non-PV location points, were
randomly divided into two groups, with 70% allocated for training and 30% for testing.
For each type of machine learning model, receiver operating characteristic (ROC) curves
were constructed to assess and compare the effectiveness of various data-driven models.
The ROC curve illustrates the false positive rate on the x-axis versus the true positive rate
on the y-axis. This curve has become a widely accepted method for gauging the overall
performance of classification models [38]. Additionally, the area under the ROC curve
(AUC) serves as a quantitative indicator of a model’s quality. In this study, the AUC is
computed using Equation (11):
$$\mathrm{AUC} = \frac{\sum TP + \sum TN}{P + N} \tag{11}$$
In this study, P denotes the number of data points representing solar PV installation
locations, while N denotes the number of data points for non-solar PV installation sites.
True positive (TP) and true negative (TN) refer to the correctly classified solar PV
installation locations and non-solar PV sites, respectively. The classification accuracy (CA)
is the ratio of correctly classified samples to all samples in the 30% test set and is
calculated by Equation (12). A higher CA value suggests better model performance.

$$\mathrm{CA} = \frac{TP + TN}{TP + FP + TN + FN} \tag{12}$$

where TP represents the number of pixels correctly identified as solar PV installation
locations, FN denotes the number of pixels misclassified as non-solar PV installation sites,
TN is the count of pixels correctly classified as non-solar PV sites, and FP refers to the
number of pixels incorrectly assigned to solar PV installation locations. Accuracy serves as
a metric to assess the performance of binary classification tasks by evaluating the rate of
correct identifications [17].
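The two metrics can be illustrated with a small sketch. The accuracy function follows Equation (12) directly. For the AUC, the sketch uses the standard rank-based definition (the probability that a randomly chosen positive sample scores higher than a randomly chosen negative one), which is how the area under an ROC curve is usually computed and may differ from the simplified expression in Equation (11). The labels, scores, and the 0.5 decision threshold are hypothetical.

```python
def classification_accuracy(y_true, y_pred):
    """CA = (TP + TN) / (TP + FP + TN + FN), as in Equation (12)."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)

def roc_auc(y_true, scores):
    """Rank-based AUC: share of positive/negative pairs ranked correctly (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = 0.0
    for sp in pos:
        for sn in neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Hypothetical test labels (1 = PV site) and predicted probabilities.
y_true = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # illustrative 0.5 threshold

ca = classification_accuracy(y_true, y_pred)
auc = roc_auc(y_true, scores)
```

Because the AUC depends only on how samples are ranked, while CA depends on a fixed threshold, a model can have a high AUC and a noticeably lower CA on the same test set.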
The SHAP (SHapley Additive exPlanations) method, as discussed by [39,40], is a
robust technique for interpreting machine learning models, particularly tree-based models.
SHAP quantifies the contribution of each feature to the model’s predictions, offering
valuable insights into how different features influence the outcomes. While SHAP can
evaluate the effect of features on individual predictions, it also provides a way to estimate
feature importance. In this study, we focused on calculating the feature importance values
rather than assessing the direct influence of individual features on the model’s output.
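One common way to turn per-sample SHAP values into a global feature importance ranking is to average their absolute values per feature. The sketch below assumes that approach, which the text does not confirm as the authors' exact procedure. The SHAP values and feature names are hypothetical, and in practice the matrix would come from an explainer in the `shap` package.

```python
def mean_abs_importance(shap_values, feature_names):
    """Global importance = mean absolute SHAP value per feature, sorted descending."""
    n_samples = len(shap_values)
    importance = {}
    for j, name in enumerate(feature_names):
        importance[name] = sum(abs(row[j]) for row in shap_values) / n_samples
    return sorted(importance.items(), key=lambda kv: kv[1], reverse=True)

# Hypothetical per-sample SHAP values (rows = samples, columns = features).
features = ["GHI", "aspect", "distance_to_road"]
shap_values = [
    [0.30, -0.10, 0.02],
    [-0.20, 0.15, -0.01],
    [0.25, -0.05, 0.00],
]
ranking = mean_abs_importance(shap_values, features)
```

Taking absolute values matters: a feature that pushes some predictions up and others down would otherwise average out to near zero despite being influential.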

5. Results and Discussion

5.1. Evaluation of Three Models Applied
Three frequently used machine learning algorithms were assessed and compared using two
statistical metrics, AUC and CA; the ROC curves are shown in Figure 4, and the results of
the optimal models are given in Table 2 and Figure 5. Overall, the Random Forest (RF)
model demonstrated the strongest performance in modeling the locations of photovoltaic
(PV) power plants, with an AUC of 0.921. In contrast, the Extreme Gradient Boosting
(XGBoost) model had the lowest scores, with an AUC of 0.819 and a CA of 0.681. The RF
model also achieved the highest classification accuracy, with a CA of 0.723, ahead of MLP
(0.702). Thus, the RF model holds significant potential for modeling PV power plant
locations in the study area.

Figure 4. ROC curve comparison between RF, MLP, and XGBoost models.

Figure 5. Comparison of RF, XGBoost, and MLP models (AUC and CA) values.

Table 2. Comparison of AUC and CA of considered models for testing sets.

Models     AUC     CA
RF         0.921   0.723
XGBoost    0.819   0.681
MLP        0.858   0.702

5.2. Feature Importance Analysis

To better interpret the machine learning (ML) modeling results, a feature importance
analysis was conducted for all three models: Random Forest (RF), Multilayer Perceptron
(MLP), and XGBoost (XGB). The analysis reveals the key factors influencing photovoltaic
(PV) power plant location predictions, as illustrated in Figures 6–8. In the RF model, the
most influential factors are Global Horizontal Irradiance (GHI) and population density,
followed by aspect, distance to residential areas, and elevation. These findings suggest that
solar irradiance, terrain (aspect and elevation), and proximity to population play a pivotal
role in determining optimal locations for PV installations. Normalized Difference Vegetation
Index (NDVI), PM10 concentration, temperature, wind speed, and slope also emerged as
relevant considerations. In contrast, distance to road had a less pronounced impact on PV
site suitability.
In the Multilayer Perceptron (MLP) model, Global Horizontal Irradiance (GHI) is the most
critical factor, underscoring the pivotal role of solar resource availability in determining
the suitability of photovoltaic (PV) installations. Elevation and wind speed also play
significant roles, further emphasizing the impact of local geographic and atmospheric
conditions on PV potential. Slope, distance to residential areas, and aspect are less
influential, and population density, temperature, and distance to roads are the least
impactful factors in the model’s assessment of PV suitability. This finding suggests that,
while certain environmental and infrastructural factors are more influential in determining
PV installation suitability, others have a relatively minor effect on the model’s predictions.

Figure 6. Feature importance ranking diagram for the RF model.

Figure 7. Feature importance ranking diagram for the MLP model.

Figure 8. Feature importance ranking diagram for the XGBoost model.

In the XGBoost (XGB) model, the most dominant feature is aspect, a geographical attribute
that influences the efficiency of solar energy capture. Its prominence underscores the
importance of the landscape’s orientation in maximizing solar energy yield. Global
Horizontal Irradiance (GHI) and population density are the next most significant
contributors to the model’s predictions. GHI reflects the direct impact of solar energy
availability on PV potential, while population density serves as an indicator of accessibility
and of the potential for infrastructure development in the area. The XGBoost model also
draws on temperature, distance to residential areas, elevation, and wind speed, but these
factors rank lower in importance, suggesting that they have a lesser impact on the model’s
predictions.
Although the “sunshine duration” feature was expected to be of great importance, its
importance was negligible in all three models. Several factors may explain this. First,
the feature may be redundant with Global Horizontal Irradiance (GHI): because GHI directly
measures the amount of solar energy available, it likely captures the essential information
about solar potential and leaves sunshine duration little independent contribution to the
model’s decision-making process. Second, the accuracy and resolution of the “sunshine
duration” data might not be sufficient to provide meaningful insights, as inaccuracies or a
lack of granularity could diminish its predictive power. Third, “sunshine duration” might be
highly correlated with other features, such as temperature, leading the models to attribute
predictive power to those features instead. Finally, Random Forest, XGBoost, and MLP
models are capable of handling complex interactions and selecting the most relevant features;
if “sunshine duration” does not add unique predictive value beyond what is already
captured by other features, the models may effectively ignore it.
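The redundancy hypothesis can be checked with a simple correlation between the two predictors. The sketch below computes the Pearson correlation coefficient in plain Python. The GHI and sunshine duration values, and their units, are hypothetical and are not drawn from the study's dataset.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)

# Hypothetical site values: annual GHI (kWh/m^2) and mean daily sunshine hours.
ghi = [1900.0, 1950.0, 2000.0, 2050.0, 2100.0]
sunshine = [8.1, 8.4, 8.6, 8.9, 9.2]
r = pearson(ghi, sunshine)
```

A coefficient close to 1 would indicate that sunshine duration carries little information beyond GHI, which is consistent with a model assigning it negligible importance.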

5.3. Spatial Modeling of Probability Maps for Solar Photovoltaic Installations

Three machine learning techniques, Multilayer Perceptron (MLP), Random Forest (RF), and
XGBoost, were used to predict the suitability of regions for large-scale solar photovoltaic
(PV) installations; the resulting probability maps are shown in Figure 9. These models
process geospatial and environmental data differently, resulting in variations in predicted
suitability and emphasizing the importance of evaluating multiple models for robust assessments.
In the Cholistan Desert, Punjab, Pakistan, Bahawalnagar in the eastern region has 10.50%
of its area classified as “high” and “very high” probability for solar PV installations,
attributed to its flat terrain and abundant solar irradiance. Bahawalpur, in the central
Cholistan Desert, shows even greater potential, with 11.06% of its area classified similarly,
highlighting its strategic suitability for large-scale solar energy development. Conversely,
the part of the Cholistan Desert in Rahim Yar Khan exhibits a very low probability for
solar PV installations, likely due to its limited road network and topographical factors that
reduce its overall feasibility for solar energy projects.

Figure 9. The probability maps for Solar PV installations, generated using the Multilayer Perceptron
(MLP) (a), Random Forest (RF) (b), and XGBoost models (c), are presented. The area percentages
corresponding to each machine learning model’s output are compared in (d).

5.4. Area Suitability Distributions

The analysis of the classification maps derived from the three machine learning models,
Random Forest (RF), Multilayer Perceptron (MLP), and XGBoost, revealed distinct
suitability classes for solar photovoltaic (PV) installations in the Cholistan Desert of Punjab,
Pakistan. These classes, determined by the predicted probability of PV installation, are
categorized as low, moderate, high, and very high suitability, as shown in Figure 9d.
For the Random Forest model, the largest portion of the study area, 76.99%, was classified
under the low suitability class, indicating a minimal likelihood for successful solar PV
installations. This was followed by the moderate class, which accounted for 14.37% of the
area, suggesting a slightly better potential for solar energy generation. The high suitability
class covered 2.89% of the area, while the very high class, which is the most promising for
solar PV installations, made up 5.75% of the total area.
Similarly, the Multilayer Perceptron model classified 68.25% of the area as low suitability,
12.02% as moderate, 5.10% as high, and 8.63% as very high suitability. XGBoost produced a
comparable distribution, with 68.72% in the low class, 16.22% in the moderate class,
10.61% in the high class, and 6.45% in the very high class. These classification results were
instrumental in estimating the solar energy generation potential of the Cholistan Desert.
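The step from a probability map to class area percentages can be sketched as follows. The equal-interval breaks (0.25, 0.50, 0.75) are an assumption made for illustration, since the class thresholds are not stated in the text, and the pixel probabilities are hypothetical.

```python
from collections import Counter

# Hypothetical equal-interval breaks; the paper does not state its class thresholds.
BREAKS = [(0.25, "low"), (0.50, "moderate"), (0.75, "high")]
CLASSES = ("low", "moderate", "high", "very high")

def suitability_class(prob):
    """Map a predicted probability to one of the four suitability classes."""
    for upper, label in BREAKS:
        if prob < upper:
            return label
    return "very high"

def area_percentages(probabilities):
    """Percentage of equal-area pixels falling in each suitability class."""
    counts = Counter(suitability_class(p) for p in probabilities)
    total = len(probabilities)
    return {label: 100.0 * counts.get(label, 0) / total for label in CLASSES}

# Hypothetical predicted probabilities for ten equal-area pixels.
pixel_probs = [0.05, 0.10, 0.20, 0.22, 0.30, 0.45, 0.60, 0.80, 0.90, 0.15]
shares = area_percentages(pixel_probs)
```

Because every pixel falls in exactly one class, the four percentages for any single model should sum to 100%.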

5.5. Model Validation and Comparison

To evaluate the generalization capability of the models, cross-validation was employed,
providing a robust measure of their performance on unseen data. The Random Forest (RF)
model demonstrated strong and consistent performance, achieving a mean cross-validation
accuracy of 86.10% with a low standard deviation of 5.83%. This indicates that the RF model
is highly reliable and generalizes well across different subsets of the data. The Multilayer
Perceptron (MLP) model also performed competitively, with a mean accuracy of 85.24%,
though it exhibited a slightly higher standard deviation of 6.60%, suggesting moderate
variability in its predictions. In contrast, the XGBoost model achieved a mean accuracy of
79.35% with a standard deviation of 11.61%, indicating challenges in generalization and
stability compared to RF and MLP.
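The mean accuracy and standard deviation reported above are the usual summary of k-fold cross-validation. The sketch below shows that procedure in plain Python. The number of folds (k = 5), the toy one-feature dataset, and the threshold classifier standing in for RF, MLP, or XGBoost are all assumptions for illustration, as the text does not state the fold count.

```python
import random
import statistics

def kfold_indices(n_samples, k, seed=123):
    """Shuffle sample indices and split them into k nearly equal folds."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def cross_val_accuracy(X, y, fit, predict, k=5):
    """Per-fold accuracies for a model supplied as fit/predict callables."""
    folds = kfold_indices(len(y), k)
    scores = []
    for i, test_idx in enumerate(folds):
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        model = fit([X[j] for j in train_idx], [y[j] for j in train_idx])
        correct = sum(1 for j in test_idx if predict(model, X[j]) == y[j])
        scores.append(correct / len(test_idx))
    return scores

# Toy stand-in model: a threshold at the midpoint of the two class means.
def fit(X, y):
    pos = [x for x, t in zip(X, y) if t == 1]
    neg = [x for x, t in zip(X, y) if t == 0]
    return (statistics.mean(pos) + statistics.mean(neg)) / 2

def predict(threshold, x):
    return 1 if x > threshold else 0

X = [i / 10 for i in range(20)]  # hypothetical one-feature data
y = [0] * 10 + [1] * 10
scores = cross_val_accuracy(X, y, fit, predict, k=5)
mean_acc = statistics.mean(scores)
std_acc = statistics.pstdev(scores)  # spread of accuracy across folds
```

A low standard deviation across folds indicates that performance does not depend strongly on which samples end up in the held-out fold.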
These results highlight the superior performance of the RF model, which not only
achieved the highest mean accuracy but also demonstrated the lowest variability across
folds. The MLP model, while competitive in terms of mean accuracy, showed greater
variability, which may be attributed to its sensitivity to hyperparameters or the complexity
of the dataset. The XGBoost model, despite its lower mean accuracy and higher variability,
still provides valuable insights, particularly in scenarios where interpretability and feature
importance are critical. These findings underscore the importance of selecting the
appropriate model based on the specific requirements of the task, balancing accuracy, stability,
and computational efficiency.

5.6. Sensitivity Analysis

Sensitivity analysis was conducted on the Random Forest (RF) model to evaluate
the impact of varying input features on model performance, quantified through changes
in accuracy and area under the curve (AUC). Each of the most influential variables was
perturbed by ±10%, and the resulting changes in model performance were measured over
100 iterations. The baseline RF model achieved an accuracy of 0.723 and an AUC of 0.921.
Perturbations in Global Horizontal Irradiance (GHI) and slope resulted in negligible changes
to both metrics, indicating that the model’s performance is insensitive to small variations in
these inputs. Population density had a larger effect, with an average decrease of 0.0100 in
accuracy and a slight increase of 0.0020 in AUC. Distance to residential areas showed a
comparable decrease in accuracy (−0.0111) and a negligible effect on AUC (−0.0001), while
aspect produced a minor reduction in accuracy (−0.0041) with a marginal improvement in
AUC (0.0001). These findings highlight the model’s robustness to perturbations in most
features while identifying population density, along with distance to residential areas, as
the variables to which its accuracy is most sensitive, warranting further investigation to
enhance predictive stability and performance. The results complement the feature
importance analysis and can guide future efforts in refining the model for improved reliability.
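A perturbation-based sensitivity analysis of this kind can be sketched as follows. The sketch assumes that each value of the chosen feature is scaled by a random factor drawn uniformly from [0.9, 1.1] in each of 100 iterations, which is one plausible reading of the ±10% perturbation and is not confirmed by the text. The stand-in classifier and data are hypothetical.

```python
import random

def accuracy(model, X, y):
    """Share of rows for which the model's prediction matches the label."""
    return sum(1 for row, t in zip(X, y) if model(row) == t) / len(y)

def sensitivity(model, X, y, feature, rel=0.10, n_iter=100, seed=123):
    """Mean change in accuracy when one feature is scaled by a factor in [1-rel, 1+rel]."""
    rng = random.Random(seed)
    baseline = accuracy(model, X, y)
    deltas = []
    for _ in range(n_iter):
        perturbed = []
        for row in X:
            new_row = list(row)
            new_row[feature] *= 1.0 + rng.uniform(-rel, rel)
            perturbed.append(new_row)
        deltas.append(accuracy(model, perturbed, y) - baseline)
    return sum(deltas) / n_iter

# Hypothetical stand-in classifier that depends only on feature 0.
def model(row):
    return 1 if row[0] > 5.0 else 0

X = [[float(i), 100.0] for i in range(1, 11)]
y = [model(row) for row in X]

delta_f0 = sensitivity(model, X, y, feature=0)  # feature the model relies on
delta_f1 = sensitivity(model, X, y, feature=1)  # feature the model ignores
```

In this toy setup only samples near the decision boundary change class, so a feature can be important to the model and still show a small average change in accuracy under a ±10% perturbation.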

6. Conclusions
This study provides a comprehensive framework for identifying optimal sites for
solar photovoltaic (PV) power plants in the Cholistan Desert, leveraging Geographic
Information System (GIS) and machine learning techniques. By evaluating 14 critical
parameters—including solar irradiance, terrain slope, proximity to infrastructure, and land-
use constraints—we generated detailed suitability maps that prioritize high-yield zones in
Bahawalnagar and Bahawalpur. The Random Forest (RF) model demonstrated superior
performance, achieving an AUC of 0.921 and a CA of 0.723, underscoring its effectiveness
in modeling the spatial dynamics of solar energy deployment. Under the RF model,
approximately 5.75% of the study area falls within the very-high-suitability class (8.64%
when the high class is included), contributing disproportionately to the total solar energy
potential, estimated at 120,475 TWh/year for the region.
The findings of this study hold transformative implications for renewable energy
developers, policymakers, grid infrastructure planners, environmental consultancies, and
financial institutions. For renewable energy developers, the identification of high-suitability
zones reduces site-selection costs and enhances project feasibility, enabling more efficient
deployment of solar PV plants. Policymakers can leverage these insights to align incentives
with national and global net-zero targets, fostering a supportive regulatory environment for
renewable energy projects. Grid infrastructure planners can prioritize transmission routes
to high-yield hubs, improving energy distribution efficiency and reducing transmission
losses. Environmental consultancies can integrate ecological constraints into site-selection
processes, ensuring that solar energy development is balanced with biodiversity
conservation. Finally, financial institutions can utilize suitability maps to de-risk solar
investments, fostering greater confidence in renewable energy projects and attracting more
capital to the sector.
For academic researchers, this study bridges the gap between machine learning and
industrial-scale energy planning, offering a replicable blueprint for optimizing solar energy
deployment in resource-constrained regions. The methodology’s transferability provides a
scalable framework for similar studies in other arid or high-irradiance regions, advancing
global climate resilience. By integrating geospatial analysis with machine learning, this
research contributes to the growing body of knowledge on sustainable energy planning,
offering actionable insights for both theoretical and applied domains. Future research could
explore the integration of real-time solar and wind data, enabling multi-source renewable
energy planning and further enhancing the accuracy of site suitability models.
The findings of this study not only identify optimal sites for solar PV installations but
also provide a scalable, data-driven approach to maximize solar energy potential, reduce
costs, and support global climate resilience. By aligning these findings with Pakistan’s
renewable energy targets, this research offers a replicable model for stakeholders across
industries and academia, fostering collaboration to accelerate the transition to renewable
energy. The integration of ecological and socio-economic considerations ensures that solar
energy development is both environmentally sustainable and socially equitable, paving the
way for a more resilient and sustainable energy future.

Author Contributions: H.A.A.: Conceptualization; H.A.A.: Methodology; H.A.A.: Software; H.A.A.:
Validation; H.A.A., J.L., Z.L., A.S., R.A., M.H.B. and H.U.: Formal Analysis; H.A.A.: Data Curation;
H.A.A.: Writing—Original Draft Preparation; H.A.A. and R.A.: Writing—Review and Editing; H.A.A.
and R.A.: Visualization; J.L.: Supervision. All authors have read and agreed to the published version
of the manuscript.

Funding: This work was supported in part by the National Natural Science Foundation of China
under Grant 62325101 and Grant 62202020, and in part by the Fundamental Research Funds for the
Central Universities.

Data Availability Statement: The data presented in this study are available on request from the
corresponding author.

Acknowledgments: We acknowledge the support of Beihang University, Beijing, China, for providing
the infrastructure and resources required for this study.

Conflicts of Interest: Author Hameed Ullah was employed by The Urban Unit (USPMU) during the course
of this research. The other authors declare no competing commercial or financial interests that could
be perceived as influencing the study. The Urban Unit (USPMU) had no involvement in the study’s
design, data collection, analysis, interpretation, manuscript preparation, or the decision to publish
the findings.

References
1. Raza, M.A.; Khatri, K.L.; Israr, A.; Ul Haque, M.I.; Ahmed, M.; Rafique, K.; Saand, A.S. Energy demand and production
forecasting in Pakistan. Energy Strategy Rev. 2022, 39, 100788. [CrossRef]
2. Ahmad, H.; Jamil, F. Investigating power outages in Pakistan. Energy Policy 2024, 189, 114117. [CrossRef]
3. Hassan, Q.; Viktor, P.; Al-Musawi, T.J.; Ali, B.M.; Algburi, S.; Alzoubi, H.M.; Al-Jiboory, A.K.; Sameen, A.Z.; Salman, H.M.;
Jaszczur, M. The renewable energy role in the global energy Transformations. Renew. Energy Focus 2024, 48, 100545.
4. Panagoda, L.; Sandeepa, R.; Perera, W.; Sandunika, D.; Siriwardhana, S.; Alwis, M.; Dilka, S. Advancements in photovoltaic (Pv)
technology for solar energy generation. J. Res. Technol. Eng. 2023, 4, 30–72.
5. Mian, S.H.; Moiduddin, K.; Alkhalefah, H.; Abidi, M.H.; Ahmed, F.; Hashmi, F.H. Mechanisms for choosing PV locations that
allow for the most sustainable usage of solar energy. Sustainability 2023, 15, 3284. [CrossRef]
6. Aghaloo, K.; Ali, T.; Chiu, Y.R.; Sharifi, A. Optimal site selection for the solar-wind hybrid renewable energy systems in
Bangladesh using an integrated GIS-based BWM-fuzzy logic method. Energy Convers. Manag. 2023, 283, 116899.
7. Munkhbat, U.; Choi, Y. GIS-based site suitability analysis for solar power systems in Mongolia. Appl. Sci. 2021, 11, 3748.
[CrossRef]
8. Al Garni, H.Z.; Awasthi, A. Solar PV power plants site selection: A review. In Advances in Renewable Energies and Power Technologies;
Elsevier: Amsterdam, The Netherlands, 2018; pp. 57–75.
9. Hafeznia, H.; Yousefi, H.; Astaraei, F.R. A novel framework for the potential assessment of utility-scale photovoltaic solar energy,
application to eastern Iran. Energy Convers. Manag. 2017, 151, 240–258.
10. Settou, B.; Settou, N.; Gouareh, A.; Negrou, B.; Mokhtara, C.; Messaoudi, D. A high-resolution geographic information system-
analytical hierarchy process-based method for solar PV power plant site selection: A case study Algeria. Clean Technol. Environ.
Policy 2021, 23, 219–234.
11. Al-Ruzouq, R.; Shanableh, A.; Yilmaz, A.G.; Idris, A.; Mukherjee, S.; Khalil, M.A.; Gibril, M.B.A. Dam site suitability mapping
and analysis using an integrated GIS and machine learning approach. Water 2019, 11, 1880. [CrossRef]
12. Ashraf, W.M.; Uddin, G.M.; Ahmad, H.A.; Jamil, M.A.; Tariq, R.; Shahzad, M.W.; Dua, V. Artificial intelligence enabled efficient
power generation and emissions reduction underpinning net-zero goal from the coal-based power plants. Energy Convers. Manag.
2022, 268, 116025.
13. Al-Ruzouq, R.; Abdallah, M.; Shanableh, A.; Alani, S.; Obaid, L.; Gibril, M.B.A. Waste to energy spatial suitability analysis using
hybrid multi-criteria machine learning approach. Environ. Sci. Pollut. Res. 2022, 29, 2613–2628. [CrossRef] [PubMed]
14. Petrov, A.N.; Wessling, J.M. Utilization of machine-learning algorithms for wind turbine site suitability modeling in Iowa, USA.
Wind Energy 2015, 18, 713–727. [CrossRef]
15. Yin, P.Y.; Wu, T.H.; Hsu, P.Y. Risk management of wind farm micro-siting using an enhanced genetic algorithm with simulation
optimization. Renew. Energy 2017, 107, 508–521. [CrossRef]
16. Shorabeh, S.N.; Samany, N.N.; Minaei, F.; Firozjaei, H.K.; Homaee, M.; Boloorani, A.D. A decision model based on decision tree
and particle swarm optimization algorithms to identify optimal locations for solar power plants construction in Iran. Renew.
Energy 2022, 187, 56–67. [CrossRef]
17. Sun, Y.; Zhu, D.; Li, Y.; Wang, R.; Ma, R. Spatial modelling the location choice of large-scale solar photovoltaic power plants:
Application of interpretable machine learning techniques and the national inventory. Energy Convers. Manag. 2023, 289, 117198.
[CrossRef]
18. Cattani, G. Combining data envelopment analysis and Random Forest for selecting optimal locations of solar PV plants. Energy
AI 2023, 11, 100222. [CrossRef]
19. Zubair, M.; Saleem, A.; Baig, M.A.; Islam, M.; Razzaq, A.; Gul, S.; Ahmad, S.; Moyo, H.P.; Hassan, S.; Rischkowsky, B.; et al. The
influence of protection from grazing on Cholistan desert vegetation, Pakistan. Rangelands 2018, 40, 136–145.
20. Haider, S.; Malik, S.; Nadeem, B.; Sadiq, N.; Ghaffari, A. Impact of population growth on the natural resources of Cholistan desert.
PalArch’s J. Archaeol. Egypt/Egyptol. 2021, 18, 1778–1790.
21. Jahangiri, M.; Ghaderi, R.; Haghani, A.; Nematollahi, O. Finding the best locations for establishment of solar-wind power stations
in Middle-East using GIS: A review. Renew. Sustain. Energy Rev. 2016, 66, 38–52.
22. Aly, A.; Jensen, S.S.; Pedersen, A.B. Solar power potential of Tanzania: Identifying CSP and PV hot spots through a GIS
multicriteria decision making analysis. Renew. Energy 2017, 113, 159–175.
23. Sirén, A.P.; Pekins, P.J.; Kilborn, J.R.; Kanter, J.J.; Sutherland, C.S. Potential influence of high-elevation wind farms on carnivore
mobility. J. Wildl. Manag. 2017, 81, 1505–1512.
24. Koc, A.; Turk, S.; Şahin, G. Multi-criteria of wind-solar site selection problem using a GIS-AHP-based approach with an
application in Igdir Province/Turkey. Environ. Sci. Pollut. Res. 2019, 26, 32298–32310.
25. Kuru, A. Solar power plant site selection modeling for sensitive ecosystems. Clean Technol. Environ. Policy 2023, 25, 2529–2544.
26. Rekik, S.; El Alimi, S. Optimal wind-solar site selection using a GIS-AHP based approach: A case of Tunisia. Energy Convers.
Manag. X 2023, 18, 100355. [CrossRef]
27. Hasti, F.; Mamkhezri, J.; McFerrin, R.; Pezhooli, N. Optimal solar photovoltaic site selection using geographic information
system–based modeling techniques and assessing environmental and economic impacts: The case of Kurdistan. Sol. Energy 2023,
262, 111807.
28. Demir, A.; Dinçer, A.E.; Yılmaz, K. A novel procedure for the AHP method for the site selection of solar PV farms. Int. J. Energy
Res. 2024, 2024, 5535398.
29. Joseph, J.I.; Umoren, A.M.; Markson, I. Development of optimal site selection method for large scale solar photovoltaic power
plant. Math. Softw. Eng. 2016, 2, 66–75.
30. Georgiou, A.; Skarlatos, D. Optimal site selection for sitting a solar park using multi-criteria decision analysis and geographical
information systems. Geosci. Instrum. Methods Data Syst. 2016, 5, 321–332.
31. Alhammad, A.; Sun, Q.; Tao, Y. Optimal solar plant site identification using GIS and remote sensing: framework and case study.
Energies 2022, 15, 312. [CrossRef]
32. Wang, M.X.; Qu, Y. Approximation capabilities of neural networks on unbounded domains. Neural Netw. 2022, 145, 56–67.
[PubMed]
33. Goodfellow, I.; Pouget-Abadie, J.; Mirza, M.; Xu, B.; Warde-Farley, D.; Ozair, S.; Courville, A.; Bengio, Y. Generative adversarial
networks. Commun. ACM 2020, 63, 139–144.
34. Kingma, D.P.; Ba, J. Adam: A Method for Stochastic Optimization. In Proceedings of the 3rd International Conference on
Learning Representations, ICLR, San Diego, CA, USA, 7–9 May 2015.
35. Li, Y.; Sun, Y.; Li, J. Heterogeneous effects of climate change and human activities on annual landscape change in coastal cities of
mainland China. Ecol. Indic. 2021, 125, 107561.
36. Diaconu, A.M.; Sulea, M. A review on ensemble methods for classification problems. Comput. Sci. Eng. 2018, 11, 45–52.
37. Chen, T.; Guestrin, C. Xgboost: A scalable tree boosting system. In Proceedings of the 22nd Acm Sigkdd International Conference
on Knowledge Discovery and Data Mining, San Francisco, CA, USA, 13–17 August 2016; pp. 785–794.
38. Arabameri, A.; Cerda, A.; Pradhan, B.; Tiefenbacher, J.; Lombardo, L.; Bui, D. A methodological comparison of head-cut based
gully erosion susceptibility models: Combined use of statistical and artificial intelligence. Geomorphology 2020, 359, 107136.
39. Roshan, K.; Zafar, A. Utilizing XAI technique to improve autoencoder based model for computer network anomaly detection
with shapley additive explanation (SHAP). arXiv 2021, arXiv:2112.08442.
40. Kim, Y.; Kim, Y. Explainable heat-related mortality with random forest and SHapley Additive exPlanations (SHAP) models.
Sustain. Cities Soc. 2022, 79, 103677.

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual
author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to
people or property resulting from any ideas, methods, instructions or products referred to in the content.

8 pages
1 s2.0 S2090447921002938 Main
No ratings yet
1 s2.0 S2090447921002938 Main
20 pages
Journal 1
No ratings yet
Journal 1
7 pages
Rotational Motion - DPPs - Pathshala 11th NEET 2024 (AN11MA)
No ratings yet
Rotational Motion - DPPs - Pathshala 11th NEET 2024 (AN11MA)
24 pages
MAT170 - Exam 2 Review - Fall - 24
No ratings yet
MAT170 - Exam 2 Review - Fall - 24
15 pages
TAGGING
No ratings yet
TAGGING
2 pages
Sliver On Rolled Alluminium
No ratings yet
Sliver On Rolled Alluminium
1 page
MATH 9 Q4 Module 2 Trigonometric Ratios of Special Angles 1
No ratings yet
MATH 9 Q4 Module 2 Trigonometric Ratios of Special Angles 1
16 pages
BS en 1926
No ratings yet
BS en 1926
20 pages
UG190239 - Avaneesh Nataraja
No ratings yet
UG190239 - Avaneesh Nataraja
21 pages
Design of Agricultural Machine Elements
No ratings yet
Design of Agricultural Machine Elements
12 pages
By Yeoh Seng Guan in Philippine Studies
No ratings yet
By Yeoh Seng Guan in Philippine Studies
6 pages
202A-V08-0000-A Standard Practices Manual SEP-23
No ratings yet
202A-V08-0000-A Standard Practices Manual SEP-23
154 pages
CIC BIM Standards For Underground Utilities
100% (1)
CIC BIM Standards For Underground Utilities
37 pages
Feur Feasibility Study Format
No ratings yet
Feur Feasibility Study Format
4 pages
Application Pack For Teacher of Science - 5
No ratings yet
Application Pack For Teacher of Science - 5
12 pages
Q4M4 - Dispersion, Scattering
No ratings yet
Q4M4 - Dispersion, Scattering
25 pages
SYCM Before Final
No ratings yet
SYCM Before Final
30 pages
PRM
No ratings yet
PRM
24 pages
Testbank For Fundamentals of Taxation 2024 Edition 17th Edition Cruz
No ratings yet
Testbank For Fundamentals of Taxation 2024 Edition 17th Edition Cruz
17 pages
KSD 5703
No ratings yet
KSD 5703
2 pages
CJC - Maths 2008 JC2 H1 Prelim Exam
No ratings yet
CJC - Maths 2008 JC2 H1 Prelim Exam
4 pages
Automotive NVH With Abaqus
No ratings yet
Automotive NVH With Abaqus
26 pages
FPS Second Semester FINAL Time Table 2023-2024
No ratings yet
FPS Second Semester FINAL Time Table 2023-2024
6 pages
CM 11 MCQ (2018)
No ratings yet
CM 11 MCQ (2018)
13 pages
The Role of Music in Enhancing Student Performance
No ratings yet
The Role of Music in Enhancing Student Performance
6 pages
Mensuration Module
No ratings yet
Mensuration Module
29 pages
Information Booklet 2025 26
No ratings yet
Information Booklet 2025 26
28 pages
Paracetamol Safety Data Sheet
No ratings yet
Paracetamol Safety Data Sheet
5 pages
Kalman Filter Enhances Buck Converter
No ratings yet
Kalman Filter Enhances Buck Converter
2 pages
Ipdc-2 Unit1
No ratings yet
Ipdc-2 Unit1
14 pages
14 Principles of Toyota
No ratings yet
14 Principles of Toyota
18 pages
Acumen Measurements and Consultancy Private Limited
No ratings yet
Acumen Measurements and Consultancy Private Limited
66 pages

Copyright: © 2025 by the authors.

Processes 2025, 13, 981 https://doi.org/10.3390/pr13040981

the precision, interpretability, and adaptability of solar PV site selection in hyper-arid

3. Materials and Methods

Figure 1. Map of the study area.

3.2. Data Collection

Table 1. Detailed description of the dataset used in this study.

Category Variables Description References

Figure 3. Illustration of proposed methodology.

4.1. Multilayer Perceptron Neural Network (MLP NN)

$$v_t = \beta_2 v_{t-1} + (1 - \beta_2)\, g_t^2 \tag{2}$$
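As a minimal sketch of Equation (2), the running second-moment estimate can be updated one gradient at a time. The function name `update_second_moment` and the default `beta2=0.999` are assumptions for illustration (0.999 is a commonly used default for this decay rate); the value used in the study is not shown in this excerpt.

```python
def update_second_moment(v_prev, grad, beta2=0.999):
    """Return v_t = beta2 * v_prev + (1 - beta2) * grad**2 (Equation 2).

    v_prev : previous second-moment estimate v_{t-1}
    grad   : current gradient g_t
    beta2  : decay rate (assumed default, not taken from the source)
    """
    return beta2 * v_prev + (1.0 - beta2) * grad ** 2


# Illustrative gradients only; these are not values from the study.
v = 0.0
for g in [0.5, -0.25, 0.1]:
    v = update_second_moment(v, g)
```

Because the gradient enters squared, the estimate stays non-negative regardless of the gradient's sign.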

4.2. Random Forest

requirements. In classification tasks, RF leverages the principle of combining multiple

where N is the total number of decision trees in the ensemble.
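The idea of combining the outputs of N decision trees can be illustrated with a simple majority vote. This is a hedged sketch, not the authors' implementation: the helper names `majority_vote` and `vote_fraction` are hypothetical, and the per-tree predictions are made-up labels (1 = suitable, 0 = unsuitable).

```python
from collections import Counter


def majority_vote(tree_predictions):
    """Return the class label predicted by the most trees.

    tree_predictions holds one predicted label per tree, so its
    length is N, the number of trees in the ensemble.
    """
    counts = Counter(tree_predictions)
    return counts.most_common(1)[0][0]


def vote_fraction(tree_predictions, label):
    """Return the share of the N trees that voted for `label`."""
    return sum(p == label for p in tree_predictions) / len(tree_predictions)


# Illustrative votes from N = 5 hypothetical trees for one pixel.
votes = [1, 0, 1, 1, 0]
predicted = majority_vote(votes)
share = vote_fraction(votes, 1)
```

The vote share can be read as a rough per-pixel probability of suitability, which is one common way an ensemble's output is turned into a probability map.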

This ensemble approach is highly effective in classification tasks, particularly in scenar-

4.3. Extreme Gradient Boosting Model (XGBoost)

$$\mathrm{Obj}(\theta) = L(y, \hat{y}) + \Omega(f) \tag{7}$$
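To make the two terms of Equation (7) concrete, the sketch below evaluates a loss plus a complexity penalty for a single tree. The specific choices are assumptions, since this excerpt does not define L or Ω: the loss is taken to be summed binary log loss, and the penalty is taken to be the form commonly used for gradient-boosted trees, gamma times the number of leaves plus half of lambda times the squared leaf weights. All function and parameter names are hypothetical.

```python
import math


def log_loss(y_true, y_prob):
    """Summed binary log loss L(y, y_hat); probabilities are clipped."""
    eps = 1e-15
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1.0 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total


def regularization(leaf_weights, gamma=0.0, lam=1.0):
    """Assumed penalty Omega(f) = gamma * T + 0.5 * lam * sum(w^2)."""
    return gamma * len(leaf_weights) + 0.5 * lam * sum(w * w for w in leaf_weights)


def objective(y_true, y_prob, leaf_weights, gamma=0.0, lam=1.0):
    """Obj = L(y, y_hat) + Omega(f), following Equation (7)."""
    return log_loss(y_true, y_prob) + regularization(leaf_weights, gamma, lam)


# Illustrative values only; not data from the study.
obj = objective([1, 0], [0.8, 0.3], leaf_weights=[0.4, -0.2], gamma=0.1, lam=1.0)
```

Raising `gamma` or `lam` increases the penalty for larger or more extreme trees, which is how the regularization term discourages overfitting.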

4.4. Model Settings

4.5. Models Evaluation

where TP represents the number of pixels correctly identified as solar PV installation

5. Results and Discussion

Table 2. Comparison of AUC and CA of considered models for testing sets.

5.2. Feature Importance Analysis

Figure 6. Feature importance ranking diagram for the RF model.

Figure 7. Feature importance ranking diagram for the MLP model.

Figure 8. Feature importance ranking diagram for the XGBoost model.

5.3. Spatial Modeling of Probability Maps for Solar Photovoltaic Installations

5.4. Area Suitability Distributions

5.5. Model Validation and Comparison

5.6. Sensitivity Analysis

Author Contributions: H.A.A.: Conceptualization; H.A.A.: Methodology; H.A.A.: Software; H.A.A.:
