[go: up one dir, main page]

0% found this document useful (0 votes)
51 views26 pages

Sustainability 12 09338 v2

Uploaded by

gis300765
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
51 views26 pages

Sustainability 12 09338 v2

Uploaded by

gis300765
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 26

sustainability

Article
Application of Remote Sensing, GIS and Machine
Learning with Geographically Weighted Regression
in Assessing the Impact of Hard Coal Mining on the
Natural Environment
Anna Kopeć * , Paweł Trybała , Dariusz Głabicki
˛ , Anna Buczyńska , Karolina Owczarz,
Natalia Bugajska , Patrycja Kozińska, Monika Chojwa and Agata Gattner
Department of Mining and Geodesy, Faculty of Geoengineering, Mining and Geology,
Wroclaw University of Science and Technology, Wybrzeże Wyspiańskiego 27, 50-370 Wrocław, Poland;
pawel.trybala@pwr.edu.pl (P.T.); dariusz.glabicki@pwr.edu.pl (D.G.); anna.buczynska@pwr.edu.pl (A.B.);
karolina.owczarz@pwr.edu.pl (K.O.); natalia.bugajska@pwr.edu.pl (N.B.); patrycja.kozinska@pwr.edu.pl (P.K.);
monika.chojwa@pwr.edu.pl (M.C.); agata.gattner@pwr.edu.pl (A.G.)
* Correspondence: anna.kopec@pwr.edu.pl; Tel.: +48-71-320-6810

Received: 22 September 2020; Accepted: 3 November 2020; Published: 10 November 2020 

Abstract: Mining operations cause negative changes in the environment. Therefore, such areas
require constant monitoring, which can benefit from remote sensing data. In this article, research was
carried out on the environmental impact of underground hard coal mining in the Bogdanka mine,
located in the southeastern Poland. For this purpose, spectral indexes, satellite radar interferometry,
Geographic Information System (GIS) tools and machine learning algorithms were utilized. Based on
optical, radar, geological, hydrological and meteorological data, a spatial model was developed to
determine the statistical significance of the selected factors’ individual impact on the occurrence
of wetlands. Obtained results show that Normalized Difference Vegetation Index (NDVI) change,
terrain height, groundwater level and terrain displacement had a considerable influence on the
occurrence of wetlands in the research area. Moreover, the machine learning model developed
using the Random Forest algorithm allowed for an efficient determination of potential flooding
zones based on a set of spatial variables, correctly detecting 76% area of wetlands. Finally, the GWR
(Geographically Weighted Regression (GWR) modelling enabled identification of local anomalies of
selected factors’ influence on the occurrence of wetlands, which in turn helped to understand the
causes of wetland formation.

Keywords: machine learning; wetlands; mining subsidence; spectral indexes; SBAS

1. Introduction
The mining industry, through the exploitation of raw materials, has a negative impact on the
natural environment and urbanized areas [1–3]. Therefore, the monitoring and protection of areas under
the influence of mining is an important issue because changes caused by mining are, in many cases,
irreversible, as the reclamation of such areas can take up to several dozen years [4–6]. Monitoring of
mining areas is aimed at observing the changes taking place, defining their extent and determining the
level of threat of a given phenomenon (e.g., displacements of terrain and induced shocks) [7,8]. On the
other hand, environmental protection is associated with the implementation of solutions in technological
processes related to the operation, which will both minimize the undesirable impact on the environment
and mitigate the negative effects through reclamation. Environmental degradation depends primarily
on the method of exploitation of raw materials: opencast and underground methods. Specifically,

Sustainability 2020, 12, 9338; doi:10.3390/su12229338 www.mdpi.com/journal/sustainability


Sustainability 2020, 12, 9338 2 of 26

the negative effects of exploitation using the above methods include: terrain subsidence [9,10], changes
in hydrogeological conditions [10–12], mining tremors [13,14], changes in land cover [15,16] and
damage to technical and urban infrastructure [17,18]. Terrain deformations lead to disturbances in
the gravitational runoff of water in both surface and underground watercourses. The result of this
phenomenon is the formation of floodplains and wetlands in the areas of subsidence. The following
factors contribute to the formation of the catchment areas: the size and distribution of post-exploitation
depressions, natural conditions concerning the permeability of the substrate and topography and
weather conditions. River valleys are among the areas most endangered by waterlogging [19].
Monitoring and research of the above effects arising in the environment of mining areas in large
domains using field methods is difficult, depends on the space and location, consumes plenty of time
and requires significant financial outlays. This difficulty is answered by passive and active remote
sensing from outer space, which gives the possibility of frequent observations of the Earth, and a
Geographic Information System (GIS), which offers tools for various types of spatial analysis and
graphic visualization of the results. One of the applications of remote sensing in mining is the detection
of terrain displacements caused by active or terminated mining, and the second application is the
study of terrain coverage (vegetation condition, land use and changes in the range of water reservoirs).
Noteworthy scientific publications on the use of remote sensing in monitoring and environmental
impact assessment are presented in Table 1. The studies cited in Table 1 show a clear division between
passive and active remote sensing techniques. Passive remote sensing is suitable for investigating
the environment in the context of land cover changes [20–26] while active remote sensing allows for
the detection of terrain deformation [27–31]. Both passive and active methods are used to assess
the condition of mining areas, enabling the creation of a sustainable mining strategy. The above
studies confirm the validity of using individual aspects of remote sensing, GIS and machine learning
classification and regression algorithms to identify subsidence zones in mining areas. This is due to
the fact that remote sensing is characterized by systematic observation of large areas. Additionally,
the time needed to download and process imagery is relatively short, often impossible to obtain during
field measurements. Another advantage is a short interval for image acquisition of the same area
(e.g., Sentinel-1A/B satellites provide new imagery every 6 days) and the possibility of researching past
phenomena, as long as satellite images for the studied period are available. The dynamic development
of remote sensing and related computing methods is caused by the increase in the number of satellite
missions observing the surface of terrain and oceans, as well as universal access to Earth images
and open-source software for satellite data processing. GIS is a complement to remote sensing
research [21,27] because it provides plenty of tools for further processing in conjunction with other
spatially referenced data. Spatial analysis enables the linking of environmental, geological-tectonic,
topographic and mining variables appearing over the research areas in such a way as to obtain
the result of the relationship between them both in time and space. GIS is also identified with the
storing, indexing, data management and modeling for various types of phenomena. The classification
and regression machine learning algorithms [20,26,32–34] enable the appropriate adjustment and
generalization of the dataset in the statistical analysis of dynamic phenomena. These methods not only
determine the statistically significant factors influencing the phenomenon studied, but also make it
possible to effectively forecast values of the dependent variable based on a set of independent variables.
Additionally, it should be emphasized that machine learning as a process of statistical analysis aids in
recognition of hidden relationships in the dataset and can detect anomalies and relationships between
variables. The above advantages of the use of remote sensing, GIS and machine learning, as well
as satisfactory results obtained in the referenced papers, prompted the authors to combine all the
techniques in the case study in this paper.
Sustainability 2020, 12, 9338 3 of 26

Table 1. Selected studies on the use of active and passive remote sensing in monitoring the impact of
exploitation on the environment.

Source Area of Interest Issues Satellite Data Methods


Impact of mining activities Landsat-5 and Unsupervised
[21] Amynteon mine (Greece) on terrain and water Landsat-7, SPOT classification techniques in
resource and ASTER GIS
The PT Freeport Impact of mining activities Three false-color
[22] Landsat-5
Indonesia (Indonesia) on land cover composite image
Lignite mine Impact of mining on Landsat-5 and Normalized Difference
[23]
“Belchatów”, (Poland) hydrogenic habitats Landsat-7 Vegetation Index (NDVI)
ETM, SPOT-4,
Coal mine in Tangshan, Impact of mining activities Multi-spectral composite
[24] SPOT-5 and
(China) on terrain imagery
IKONOS
Monitoring of the Maximum Likelihood
Lignite mines (eastern Landsat-TM and
[25] environment and Classification of Landsat
Germany) ERS-1
reclaimed areas Thematic Mapper
Coal mine in
Subsidence analysis in a
[27] Gangwon-do (South JERS-1 SBAS algorithm and GIS
mining area
Korea)
Determination of common
Two coalfields, Bulianta
[28] features of deformations ALOS PALSAR SBAS processing
and Shangwan (China)
detected in two areas
Coal mines in the
Mapping coal
Greater Region of ERS-1/2 and
[29] mining-related ground SBAS processing
Luxembourg (the ENVISAT
subsidence and uplift
French–German border)
The Northumberland Investigation terrain
ERS, ENVISAT and Intermittent SBAS
[30] and Durham coalfield motion and groundwater
Sentinel-1 technique
(United Kingdom) level change phenomena
The South Wales Monitoring of uplift
Intermittent SBAS
[31] Coalfield (United caused by pumping ERS-1/2
technique
Kingdom) groundwater
Random Forest, Spectral
Mine of coal and metal
Landscape dynamics and Mixture Analysis (SMA),
[20] ores, Kirchheller Heide Landsat ETM+
extensive soil movement Normalized Difference
(Germany)
Vegetation Index (NDVI)
The Three Gorges Prediction of groundwater Classification and
[32] -
Reservoir area (China) level regression tree (CART)
Multi-criteria decision
model (MCDM) integrated
The southwestern part of Assessment of
[26] Landsat 8 OLI 7/5/3 with decision tree
Ritu county, Bahrain groundwater potential
techniques (C5.0) and
CART
Least Square Support
Sichuan Province Predicting displacement of
[33] - Vector Machines (LSSVM),
(China) landslide
Genetic Algorithm (GA)
Darling River Floodplain
[34] Modeling of wetlands - Random Forest
(Australia)

The main aim of the article was to combine Synthetic Aperture Radar (SAR) data from the
Sentinel-1A/B mission and optical data from the Sentinel-2A/B mission as well as geological and
meteorological data with machine learning and GIS analysis, to statistically determine whether the
terrain subsidence caused by underground operations has an influence on the state of the natural
environment. Emphasis was also put on learning whether different variables, such as geological
structure or hydrogeological state, influence the occurrence rate of flooding. The analysis was
conducted for the area of the Bogdanka underground hard coal mine. The time scope of the study
is from October 2014 to April 2019. The article is divided into six main sections. The Introduction
contains the motivation and goals of the conducted research. In the second section, various materials
and methods used in the study are characterized and the study area is described. Next, the GIS and
Sustainability 2020, 12, 9338 4 of 26

Machine Learning analyses performed on the gathered data are outlined in the third section. The fourth
section presents the results of the analysis, inter alia the identified terrain subsidence zones and the
determined floodplains, together with the Machine Learning models and then the discussion on the
results is carried out. The last section formulates the conclusions of the research.

2. Materials

2.1. Study Area Description


The Bogdanka Hard Coal Mine, located in Lublin Voivodeship in south-eastern Poland (Figure 1a,b),
is currently the only active mining facility in the Lublin Coal Basin (LCB) region, specifically its
north-eastern part called the Central Coal Region (CCR). A total of 10 coal deposits are documented in
the LCB region, covering an area of around 1200 km2 altogether, out of which only one (Bogdanka)
is under exploitation, two (K-3 and Ostrów) are being prepared for extraction and the rest of the
documented reserves remain undeveloped [35]. The mining area of the Bogdanka deposit is divided
into three exploitation fields—one main field and two peripherals (Nadrybie and Stefanów). The three
coal deposits, numbered 385/2, 389 and 391, lie under a layer of overburden 650- to 730-m thick [36].
Mining operations are carried out at a maximum depth of around 1100 m below sea level using a
longwall extraction system with roof collapse and simultaneous elimination of longwall galleries as the
mining face progresses [37]. The extraction system used in the Bogdanka mine causes the roof layers to
collapse, filling the exploitation void with rock material. This leads the strata above the roof to slowly
subside and, as a consequence, continuous deformations may appear on the terrain, e.g., in the form of
subsidence troughs. If a subsidence occurs over an area with naturally shallow groundwater levels,
it can lead to local flooding and the formation of wetlands.
From a geological standpoint, the LCB area is situated in marginal, south-west part of the East
European (Pre-Cambrian) Platform in the region of the Bug River Basin. The basin, also known as
the Lublin Coal Basin, together with its fragment located in Ukraine—the Lviv Basin—is a gentle,
asymmetric synclinal structure. Its south-west wing is steep, while the north-east wing takes the form
of a gentle monocline, with many framework and fold structures manifesting as well. Relative to the
axis, the basin is divided into two predominant parts—the Lublin trench and the Łuków-Hrubieszów
elevation, over which the documented deposits of the LCB are located. The Bug River Basin area is
filled with lower and upper Carboniferous sediments, from Visen to Westphalian, formed as a result
of cyclical processes of marine, paralytic and limnic sedimentation. Two tectonic phases played an
important role in shaping the structure of the basin—the Breton one, which started the formation
of the structure, and the Asturian one, which decided its final shape [38]. Productive carbon within
the LCB is constituted by the Lublin Formation (Westphalian B), and hard coal is located within
relatively weak silt and clay rocks. In contrast to the deposits of the Upper Silesian Coal Basin, the
seams and accompanying layers are mostly horizontal and no faults are characterized by significant
discharges [36].
Sustainability 2020, 12, 9338 5 of 26
Sustainability 2020, 12, x FOR PEER REVIEW 5 of 26

0 ’ Digital Elevation Model (DEM) Shuttle Radar


Figure
Figure (a)(a)
1. 1. TheThe Lublin
Lublin Coal
Coal Basin
Basin location
location over
over the
the 1′’1Digital Elevation Model (DEM) Shuttle Radar
Topography Mission (SRTM) background. The black
Topography Mission (SRTM) background. The black polygons polygons represent the documented
represent coal deposits,
the documented coal
the thicker line covers the Bogdanka deposit and the Bogdanka Coal Mine is indicated
deposits, the thicker line covers the Bogdanka deposit and the Bogdanka Coal Mine is indicated by the redby
dot.
Map representing Lublin Coal Basin is in WGS 84 /Universal Transverse Mercator (UTM)
the red dot. Map representing Lublin Coal Basin is in WGS 84 /Universal Transverse Mercator (UTM) zone 32N
projection.
zone (b) Map showing
32N projection. (b) Maplocalization of Lublin Coal
showing localization of Basin
LublininCoal
Poland, in WGS84
Basin projection.
in Poland, in WGS84
projection.
2.2. Meteorological Data
Before preparing
2.2. Meteorological Data the input data, an analysis of precipitation distribution was performed for
the study area, based on meteorological data. Data on the sum of monthly precipitation used in
Before preparing the input data, an analysis of precipitation distribution was performed for the
the analysis were obtained from the Bulletins of the Polish Institute of Meteorology and Water
study area, based on meteorological data. Data on the sum of monthly precipitation used in the
Management National Research Institute [39]. It covers two stations located closest to the Bogdanka
analysis were obtained from the Bulletins of the Polish Institute of Meteorology and Water
mine (Lublin and Terespol), shown in the Figure 1b. Figure 2 shows a graph of the average sum of
Management National Research Institute [39]. It covers two stations located closest to the Bogdanka
precipitation in each month, covering the studied period. It can be noted that the highest sums of
mine (Lublin and Terespol), shown in the Figure 1b. Figure 2 shows a graph of the average sum of
rainfall were recorded in July while the lowest were observed in the winter months. It should be noted,
precipitation in each month, covering the studied period. It can be noted that the highest sums of
however, that from November to March, a snow cover can occur in this region of Poland; therefore,
rainfall were recorded in July while the lowest were observed in the winter months. It should be
in early spring, the range of areas with a high groundwater level may be enlarged due to snow melting.
noted, however, that from November to March, a snow cover can occur in this region of Poland;
In the summer (June–August) and autumn (September–November) periods, cycles of large amounts
therefore, in early spring, the range of areas with a high groundwater level may be enlarged due to
of rainfall or seasonal drought can occur. Due to the above, it was decided that the best period to
snow melting. In the summer (June–August) and autumn (September–November) periods, cycles of
perform the analysis is springtime, after the thaw period (April), as the period with the average sum of
large amounts of rainfall or seasonal drought can occur. Due to the above, it was decided that the
precipitation allowing for the most objective analysis.
best period to perform the analysis is springtime, after the thaw period (April), as the period with the
average sum of precipitation allowing for the most objective analysis.
Sustainability 2020, 12, 9338 6 of 26
Sustainability 2020, 12, x FOR PEER REVIEW 6 of 26

Figure 2.
Figure Monthlyaverage
2.Monthly averagesum
sumof
ofprecipitation
precipitationcalculated
calculatedbased
basedon
onthe
themeteorological
meteorologicaldata
datacollected
collected
over the studied period [39].
over the studied period [39].
2.3. Identification of Flooding Areas Based on the Optical Satellite Data Analysis
2.3. Identification of Flooding Areas Based on the Optical Satellite Data Analysis
The identification of flooding areas over the area of the Bogdanka Coal Mine, and areas
The identification of flooding areas over the area of the Bogdanka Coal Mine, and areas directly
directly adjacent to it, was carried out using multispectral imagery from the Sentinel-2 mission
adjacent to it, was carried out using multispectral imagery from the Sentinel-2 mission (Level-1C
(Level-1C product), downloaded from [40]. For research purposes, 9 data packages were obtained
product), downloaded from [40]. For research purposes, 9 data packages were obtained in *SAFE
in *SAFE format, which contained images showing the reflectance value in 13 spectral channels
format, which contained images showing the reflectance value in 13 spectral channels (from visible
(from visible light to mid-infrared wavelengths). The aforementioned data set covered images
light to mid-infrared wavelengths). The aforementioned data set covered images registered over 6-
registered over 6-month periods, starting from the second half of the year 2015. In order to increase the
month periods, starting from the second half of the year 2015. In order to increase the accuracy of the
accuracy of the conducted analysis, during the selection of data, only images with a cloud cover not
conducted analysis, during the selection of data, only images with a cloud cover not exceeding 10%
exceeding 10% were chosen.
were chosen.
The acquired images were subjected to procedures aimed at standardizing the data registered in
The acquired images were subjected to procedures aimed at standardizing the data registered
various weather and lighting conditions. The first stage consisted of converting the radiometric values
in various weather and lighting conditions. The first stage consisted of converting the radiometric
of Digital Number (DN) to the value of recorded radiance LSAT according to the following Equation (1):
values of Digital Number (DN) to the value of recorded radiance LSAT according to the following
Equation (1): L = c + c ·DN (1)
SAT 0 1

𝐿 = 𝑐 + 𝑐 ∙ 𝐷𝑁 (1)
where c0 and c1 are the calibration constants for a given type of sensor and spectra channel, called shift
where 𝑐0 respectively
and gain, and 𝑐1 are the calibration constants for a given type of sensor and spectra channel, called
[41].
shift and
The gain, respectively
influence [41].
of phenomena and processes happening in the atmosphere was reduced in
the process of atmospheric correction,processes
The influence of phenomena and happening
during which in the atmosphere
the following parameterswaswere
reduced
takenin into
the
process
account:ofaverage
atmospheric
terraincorrection,
height (200during
m), thewhich the model
aerosol following parameters
(in the analyzedwerecase,taken intodedicated
a model account:
average
to poorlyterrain heightand
urbanized (200slightly
m), theindustrialized
aerosol modelareas
(in the analyzed
was adopted case,
[42])a and
model
thededicated
atmosphereto poorly
model
urbanized and slightly industrialized areas was adopted [42]) and the atmosphere
(depending on the date of image registration, the models Mid-Latitude Summer, Mid-Latitude Winter model (depending
on
andthe date of image
Sub-Arctic Summerregistration, the models Mid-Latitude Summer, Mid-Latitude Winter and Sub-
were chosen).
ArcticDuring
Summer were chosen).
the last stage of pre-processing, the spatial resolution of the data registered in near- and
During the
mid-infrared last stage
channels wasofresampled
pre-processing,
to 10 mthe spatial resolution
(channels 5, 6, 7, 8a, of
11 the
anddata registered
12 have in resolution
a spatial near- and
mid-infrared
of 20 m) using channels was interpolation
the bilinear resampled to method.
10 m (channels 5, 6, 7, 8a, 11 and 12 have a spatial resolution
of 20 In
m) using the bilinear interpolation method.
order to determine areas of flooding in the study area, two spectral indices were
used:InNormalized
order to determine areas of flooding
Difference Vegetation in theand
Index (NDVI) study area, Normalized
Modified two spectral indices were
Difference Water used:
Index
Normalized Difference Vegetation Index (NDVI) and Modified Normalized Difference Water Index
(MNDWI). The NDVI index was used to identify sites of vegetation degradation. Terrain subsidence,
with relatively high groundwater level, may lead to local flooding, having a significant impact on the
Sustainability 2020, 12, 9338 7 of 26

(MNDWI). The NDVI index was used to identify sites of vegetation degradation. Terrain subsidence,
with relatively high groundwater level, may lead to local flooding, having a significant impact on
the plant cover condition. The MNDWI index, on the other hand, is a combination of green and
mid-infrared channels
Sustainability and
2020, 12, x FOR was used for the purpose of monitoring changes in the coastlines
PEER REVIEW 7 of 26 of water

reservoirs (MNDWI assigns positive values to surface waters and negative values to vegetation cover,
plant cover condition. The MNDWI index, on the other hand, is a combination of green and mid-
soil andinfrared
built-up areas).and
channels Table
was2used
contains
for thethe equations
purpose used tochanges
of monitoring calculate thecoastlines
in the spectral of
indices.
water
reservoirs (MNDWI assigns positive values to surface waters and negative values to vegetation cover,
Table The
soil2.and main studies
built-up on the2 contains
areas). Table use of active and passive
the equations usedremote sensing
to calculate in monitoring
the spectral indices.the impact of
exploitation on the environment.
Table 2. The main studies on the use of active and passive remote sensing in monitoring the impact
environment. Mathematical Formula 1
of exploitation on the Index Source
Index
NDVI NIR−RED
Mathematical Formula 1 Source[43]
NIR+RED
MNDWI 𝑁𝐼𝑅 𝑅𝐸𝐷
GREEN−SWIR1
NDVI GREEN +SWIR1 [43] [44]
𝑁𝐼𝑅 + 𝑅𝐸𝐷
𝐺𝑅𝐸𝐸𝑁 𝑆𝑊𝐼𝑅1
1 NIR, Red, Green and SWIR1 correspond to the reflectance value in the Near-IR, Red, Green and Mid-IR channels,

respectively (1565–1655 nm). MNDWI 𝐺𝑅𝐸𝐸𝑁 + 𝑆𝑊𝐼𝑅1


[44]
1NIR, Red, Green and SWIR1 correspond to the reflectance value in the Near-IR, Red, Green and Mid-
IR channels, respectively (1565–1655 nm).
Calculations were carried out in the ENVI 5.5 software, using the following modules:
CalculationsFLAASH,
Raster Management, were carried out in the ENVI
Radiometric 5.5 software,
Calibration andusing
Bandthe following modules: Raster
Math.
Management, FLAASH, Radiometric Calibration and Band Math.
The location of the floodplains was determined on the basis of classification of the image
The location of the floodplains was determined on the basis of classification of the image
representing the difference
representing between
the difference between the
theMNDWI indexvalues
MNDWI index values for for
20152015 and During
and 2019. 2019. During the process
the process
of classification,
of classification, it was assumed that a change in the said index by a value exceeding 0.9 indicatesindicates
it was assumed that a change in the said index by a value exceeding 0.9 a a
significant increase in the area of floodplains or an increase in the area of water reservoirs in the study
significant increase in the area of floodplains or an increase in the area of water reservoirs in the study
area. Onarea.
the On thehand,
other other hand,
areas areas
where where pixels
pixels taketake values
values between0.7
between 0.7and
and0.90.9may
may potentially
potentially be be flooded
flooded by groundwater in the future. Figure 3 presents the identified flooding areas within and
by groundwater in the future. Figure 3 presents the identified flooding areas within and adjacent to
adjacent to the Bogdanka hard coal mine.
the Bogdanka hard coal mine.

Figure 3. Identification of floodplains in the Bogdanka hard coal mining region, based on Modified
Normalized Difference Water Index (MNDWI) values difference between 2015 and 2019. The black
solid polygon represents the Bogdanka deposit.
Sustainability 2020, 12, x FOR PEER REVIEW 8 of 26

Figure 3. Identification of floodplains in the Bogdanka hard coal mining region, based on Modified
Sustainability 2020, 12, 9338 8 of 26
Normalized Difference Water Index (MNDWI) values difference between 2015 and 2019. The black
solid polygon represents the Bogdanka deposit.
As the results indicate, the floodplains constitute 0.14% of the total area (4597 out of 3,243,601 pixels
As the results indicate, the floodplains constitute 0.14% of the total area (4597 out of 3,243,601
were classified as floodplains), covering 4511 km2 , and 2are located mostly along the borders of water
pixels were classified as floodplains), covering 4511 km , and are located mostly along the borders of
bodies located in the northern part of the study area. It should be emphasized that the southernmost
water bodies located in the northern part of the study area. It should be emphasized that the
subdivision zonesubdivision
southernmost is located within the areawithin
zone is located of pseudo-vertical displacements
the area of pseudo-vertical of terrain, the
displacements extent of
of terrain,
which was determined using Synthetic Aperture Radar Interferometry (InSAR).
the extent of which was determined using Synthetic Aperture Radar Interferometry (InSAR).
The identification
The identificationofofthe
theflooding
floodingzones
zones was
was also based on
also based onthe
theanalysis
analysisofofchanges
changes ininthethe NDVI
NDVI
in in
2015
2015 and 2019. Figure 4a shows the areas of the smallest and the largest changes in the NDVI in in
and 2019. Figure 4a shows the areas of the smallest and the largest changes in the NDVI
thethe
analyzed
analyzedtime
timeperiod,
period,andandFigure
Figure4b 4bshows
shows the final boundaries
the final boundariesofofthe
theflooding
floodingzones
zones (the
(the zones
zones
areare
located inin
located the places
the placesofofthe
thelargest
largestnegative
negative changes
changes inin NDVI
NDVIandandthe
thelargest
largestpositive
positive changes
changes in in
MNDWI). These areas were used at a later
MNDWI). These areas were used at a later stage stage in the analyses as input data in the spatial regression
analyses as input data in the spatial regression
model
modeland
andmachine
machinelearning.
learning.

Figure
Figure 4. 4.
(a)(a) NormalizedDifference
Normalized DifferenceVegetation
Vegetation Index
Index (NDVI)
(NDVI) index
indexrange;
range;(b)
(b)floodplains identified
floodplains identified
based on spectral indices for the studied area. The black solid polygon represents the Bogdanka
based on spectral indices for the studied area. The black solid polygon represents the Bogdanka deposit.
deposit.
2.4. InSAR Terrain Displacement Data
2.4. InSAR Terrain Displacement Data
Pseudo-vertical (in the Line of Sight LOS direction) displacements of the terrain caused by
Pseudo-vertical
underground hard coal(in the Line
mining in theof area
SightofLOS direction) displacements
the “Bogdanka” of the terrain
mine were determined usingcaused by
Sentinel-1
A/Bunderground hard coal
Synthetic Aperture mining
Radar in the area
imagery. of the “Bogdanka”
The displacement mine were
calculations were determined
carried outusing
using Sentinel-
the Small
1 A/B Synthetic
Baseline Aperture
Subset (SBAS) Radar[45–47],
technique imagery.allowing
The displacement calculations
for the processing were carried out
of interferograms in ausing the
time-series
Small Baseline Subset (SBAS) technique [45–47], allowing for the processing
manner. A total of 228 SAR images from ascending path no. 131 were used, covering a time period of interferograms in a
between 26th October 2014 and 1st October 2019. The computational process was performed with athe
time-series manner. A total of 228 SAR images from ascending path no. 131 were used, covering
time period between 26th October 2014 and 1st October 2019. The computational process was
use of the GMTSAR software [48]. The precise orbit ephemerides provided by the Sentinel-1 Quality
performed with the use of the GMTSAR software [48]. The precise orbit ephemerides provided by
Control Subsystem [49] were used to correct the satellite state vector. The 1 arc second digital elevation
the Sentinel-1 Quality Control Subsystem [49] were used to correct the satellite state vector. The 1 arc
model developed as part of the Shuttle Radar Topography Mission (SRTM) was used to correct for the
second digital elevation model developed as part of the Shuttle Radar Topography Mission (SRTM)
topographic phase of interferograms [50]. The interferometric phase unwrapping process was carried
was used to correct for the topographic phase of interferograms [50]. The interferometric phase
outunwrapping
using the Statistical-cost Network-flow
process was carried out using Algorithm for Phase
the Statistical-cost Unwrapping
Network-flow (SNAPHU)
Algorithm [51–53].
for Phase
A total of 778 interferograms
Unwrapping (SNAPHU) [51–53]. wereAcreated
total offor778the time-series analysis
interferograms with the
were created SBAS
for the technique.
time-series
Theanalysis
results with
of thethe
SBAS analysis are contained in raster format with cumulative terrain
SBAS technique. The results of the SBAS analysis are contained in raster format displacements
forwith
the date of each acquisition,
cumulative beginning for
terrain displacements withthethedate
firstof
acquisition (displacement
each acquisition, beginningvaluewith
0). For
the further
first
analysis, rasters
acquisition representing
(displacement the 0).
value annual (consistent
For further withrasters
analysis, optical data) increase
representing in subsidence
the annual (consistentwere
selected, i.e., October 2015, April 2016, April 2017, April 2018 and April 2019.
The majority of the subsidence areas were identified in the vicinity of the exploited deposit
Bogdanka. Figure 5 shows cumulative displacements from 27 September 2015 to 21 April 2019.
The only area outside the deposit boundaries is the Brzezno Lake Reserve, which is located 2 km north
of the Bogdanka deposit. Cumulative displacements in this area have reached a maximum of 185 mm.
with optical data) increase in subsidence were selected, i.e., October 2015, April 2016, April 2017,
April 2018 and April 2019.
The majority of the subsidence areas were identified in the vicinity of the exploited deposit
Bogdanka. Figure 5 shows cumulative displacements from 27 September 2015 to 21 April 2019. The
Sustainability 2020, 12, 9338 9 of 26
only area outside the deposit boundaries is the Brzezno Lake Reserve, which is located 2 km north of
the Bogdanka deposit. Cumulative displacements in this area have reached a maximum of 185 mm.
The
Thelargest
largestdisplacements
displacementswere wererecorded
recordedininthe
thearea
areaof
ofthe
themain
mainexploitation
exploitationfield,
field,especially
especiallyininits
its
northern
northern and and central parts,
parts, where
wherethe thecumulative
cumulativevalue
valueofof terrain
terrain displacement
displacement hashas reached
reached up1050
up to to
1050
mm.mm. WithinWithin the other
the other two two fields
fields (Stefanów
(Stefanów and and Nadrybie),
Nadrybie), the maximum
the maximum displacement
displacement waswas
600 600
mm.
mm.

Figure 5. Top: Maps of cumulative Line of Sight (LOS) displacements from October 2015;
Figure 5. Top: Maps of cumulative Line of Sight (LOS) displacements from October 2015; middle:
middle: cross-section AA’; bottom: cross-section BB’ of cumulative displacements over the studied
cross-section AA’; bottom: cross-section BB’ of cumulative displacements over the studied period.
period. The red border in the upper part of the map indicates the Brzezno Lake Reserve.
The red border in the upper part of the map indicates the Brzezno Lake Reserve.
2.5. Geological and Hydrogeological Data
The Detailed Geological Map of Poland 1:50,000 and the Hydrogeological Map of Poland—the
First Aquifer—Occurrence and Hydrodynamics 1:50,000, provided by the Polish Geological
Institute—National Research Institute—were used to select places where, from both geological
and engineering standpoints, flooding may occur. The analyzed area of the Bogdanka deposit is
Sustainability 2020, 12, x FOR PEER REVIEW 10 of 26

2.5. Geological
Sustainability and9338
2020, 12, Hydrogeological Data 10 of 26

The Detailed Geological Map of Poland 1:50,000 and the Hydrogeological Map of Poland—the
First
contained Aquifer—Occurrence
on four sheets of the andabove-mentioned
Hydrodynamics maps, 1:50,000, provided714
numbered by(Ostrów
the Polish Geological
Lubelski) [54,55],
Institute—National Research Institute—were used to select
715 (Orzechów Nowy) [56,57], 750 (ٞeczna) [58,59] and 751 (Siedliszcze) [60,61].places where, from both geological and
engineering standpoints, flooding may occur. The analyzed area of the Bogdanka deposit is contained
According to the Polish Hydrogeological Dictionary [62], a flooding is defined as “the appearance
on four sheets of the above-mentioned maps, numbered 714 (Ostrów Lubelski) [54,55], 715
of groundwater close to the ground surface in connection with: lowering of the ground surface,
(Orzechów Nowy) [56,57], 750 (Łęczna) [58,59] and 751 (Siedliszcze) [60,61].
accumulation of groundwater due to the rising of the water table in watercourses and surface reservoirs,
According to the Polish Hydrogeological Dictionary [62], a flooding is defined as “the
and anthropogenic inhibition of groundwater flow”. The phenomenon of terrain flooding may occur
appearance of groundwater close to the ground surface in connection with: lowering of the ground
naturally, inter alia in non-drainage depressions, provided there is a poorly permeable substrate below
surface, accumulation of groundwater due to the rising of the water table in watercourses and surface
thereservoirs,
aquifer orand could be a consequence
anthropogenic of human
inhibition activity, especially
of groundwater flow”. Theinphenomenon
the case of underground mining,
of terrain flooding
which leads to the formation of subsidence troughs. Flooding often occurs
may occur naturally, inter alia in non-drainage depressions, provided there is a poorly permeable in large areas with a slight
slope
substrate below the aquifer or could be a consequence of human activity, especially in the case of or
or depression, with shallow occurrence of groundwater, in the presence of impermeable
poorly permeable
underground soil, and
mining, during
which leadsthe occurrence
to the formationofofheavy rainfall,
subsidence whichFlooding
troughs. lead to an increase
often occursininthe
groundwater
large areas with levela [63].
slight slope or depression, with shallow occurrence of groundwater, in the presence
Various sitesor were
of impermeable poorlyclassified
permeable as soil,areas predisposed
and during to theof occurrence
the occurrence heavy rainfall, ofwhich
flooding
lead toand
wetlands—mainly
an increase in theareas with organic
groundwater soil, originating primarily from lake accumulation, such as low
level [63].
Various sites were
peats (characterized classified
by slowly flowingas areas predisposed
groundwater), to the occurrence
transitional peats (fedof mainly
floodingwithandrainwater)
wetlands—and
mainly areas with organic soil, originating primarily from lake accumulation,
silt, as well as the accompanying clays and stagnant silts and their derivatives, with the simultaneous such as low peats
(characterized
presence by slowly flowing
of the groundwater table atgroundwater),
a shallow depth transitional
(up to 2 peats
m below (fedthe
mainly
terrainwith rainwater)
level). and
The obtained
silt,were
data as well as the accompanying
vectorized to enable their clays and stagnant
further siltsand
processing andthe
their derivatives,was
classification withperformed
the simultaneous
based on
thepresence of the groundwater
susceptibility to flooding andtableoccurrence
at a shallowofdepth (up to 2Three
wetlands. m below the terrain
categories for level). The obtained
the occurrence of the
data were vectorized to enable their further
first level of the groundwater table were adopted (Figure 6b):processing and the classification was performed based
on the susceptibility to flooding and occurrence of wetlands. Three categories for the occurrence of
1. theUp firsttolevel
1 m of
below ground level.
the groundwater table were adopted (Figure 6b):
2. Between 1–2m below ground level.
1. Up to 1 m below ground level.
3. 2. More than1–2m
Between 2 m below
belowground
ground level.
level.
3. More than
Furthermore, 2m
three below ground
simplified level.
soil categories were adopted (Figure 6a):
1. Furthermore, three simplified soil categories were adopted (Figure 6a):
Organic soils.
2. 1. Poorly permeable
Organic soils. silts, loams and clays.
3. 2. Permeable
Poorly permeable
sands andsilts, loams and clays.
gravels.
3. Permeable sands and gravels.

Figure 6. (a) Soil types; (b) groundwater levels used in the analysis, acquired from the geological and
Figure 6. (a) Soil types; (b) groundwater levels used in the analysis, acquired from the geological and
hydrogeological
hydrogeologicalmaps.
maps.The
Theblack
blacksolid
solid polygon representsthe
polygon represents theBogdanka
Bogdankadeposit.
deposit.

3. Methods—GIS and Machine Learning Analysis


Figure 7 contains a diagram presenting the methodology of data processing and the performed
analysis. The individual stages are described in this chapter.
Sustainability 2020, 12, x FOR PEER REVIEW 11 of 26

3. Methods—GIS and Machine Learning Analysis


Sustainability 2020, 12, 9338 11 of 26
Figure 7 contains a diagram presenting the methodology of data processing and the performed
analysis. The individual stages are described in this chapter.

Figure7.7.Block
Figure Blockdiagram
diagramofofthe
theperformed
performedanalysis.
analysis.

3.1.Overview
3.1. Overview
Analysisofofthethe
Analysis flooding
flooding characteristics
characteristics within
within the study
the study area started
area started with
with the the preparation
preparation of inputof
inputIndata.
data. In addition
addition to all thetodata
all the data mentioned
mentioned in the previous
in the previous section, a section, a digital model
digital elevation elevation
frommodel
the
from the
SRTM SRTMwas
mission mission was (Figure
obtained obtained8a)(Figure 8a) a[50].
[50]. As As of
result a result of processing,
processing, the rastersthe of
rasters of terrain
terrain slope
slope (values
(values in %) and inexposure
%) and exposure (values generalized
(values generalized to 4 main geographical
to 4 main geographical directions anddirections and
flat terrain) wereflat
terrain)
also were(Figure
obtained also obtained (Figure 8b,c,
8b,c, respectively). Therespectively).
above data, as Thewellabove
as thedata, as welldeveloped
previously as the previously
rasters
developed
(Table rasters
3), were (Tableinto
combined 3), were combined
a stack of data ininto a stack ofreference
a uniform data in aframe,
uniform reference
spatial extentframe, spatial
and spatial
extent andAll
resolution. spatial resolution.
rasters All rastersinto
were transformed werethetransformed into the Universal
Universal Transverse MercatorTransverse
(UTM) reference (UTM)
Mercatorframe
reference
(UTM frameData
zone 32N). (UTM withzone 32N).resolution
a spatial Data with a spatial
other than 10 resolution otherresolution
× 10 m (highest than 10 of × the
10 m (highest
Sentinel-2
resolution
data) of the Sentinel-2
were converted data) wereusing
to that resolution converted
a linearto interpolation
that resolution using aDuring
method. linear the
interpolation
analysis,
method.
the pixels During
that didthenotanalysis, the pixels
have correct valuesthat
fordid not have
at least correct
one of valueswere
the rasters for atomitted.
least oneTheof the rasters
range of
were
the NDVIomitted.
valuesThe range
in the of the period
analyzed NDVI values
were alsoin the analyzed
calculated outperiod weredatasets;
of 4 NDVI also calculated
for eachout of 4
pixel,
aNDVI
minimumdatasets;
valuefor
waseach pixel, a minimum
subtracted value was subtracted
from the maximum. from the
The cumulative maximum.
terrain The cumulative
displacements for the
terrain
entire displacements
period of the study for the calculated
were entire period of the
as well. Bothstudy
data were calculatedand
pre-processing as proper
well. Both
data data pre-
analysis
processing
and modeling and proper
were data analysis
performed in theand modeling
Python were performed
environment in the Python
using open-source environment
libraries: numpy using
[64],
open-source
sklearn libraries:[66]
[65], geopandas numpy
and [64],
pysalsklearn
[67]. [65], geopandas [66] and pysal [67].
Sustainability 2020, 12, 9338 12 of 26

Sustainability 2020, 12, x FOR PEER REVIEW 12 of 26

Figure 8. (a) terrain elevation (based on the SRTM 1” DEM); (b) terrain slope; (c) terrain aspect,
Figure 8. (a) terrain elevation (based on the SRTM 1” DEM); (b) terrain slope; (c) terrain aspect,
calculated from the terrain elevation data. The black solid polygon represents the Bogdanka deposit.
calculated from the terrain elevation data. The black solid polygon represents the Bogdanka deposit.
Table 3. List of variables used in the analysis.
Table 3. List of variables used in the analysis.
No. Name Description
No. Name Description
Based on the Sentinel-2 imagery obtained for April,
1 NDVI RangeBased on the Sentinel-2 imagery obtained for(Section
April, during the
1 NDVI Range during the years 2016–2019 2.3)
years
Based on 2016–2019 (Section
the Sentinel-1 2.3)5 rasters: October
SAR data,
2 LOS terrain displacements
LOS terrain Based on the Sentinel-12015,SARApril
data,2016–2019
5 rasters:(Section
October 2.4)
2015, April
2 3displacementsGroundwater depth Divided into 3 categories (Section 2.5)
2016–2019 (Section 2.4)
Divided into 3 categories, depending on the soil
3 Groundwater
4 depthclassification of soil Divided
Geological into 3 categories (Section 2.5)
susceptibility to water retention (Section 2.5)
Geological
5 classification DEM Divided into 3 categories, dependingSRTM on1”the soil susceptibility
4
6 of soil Slope to waterBased
retention
on the(Section 2.5) in %
DEM, values
5 DEM Based on theSRTM
DEM, 1”values generalized to 4 main
7 Exposure
6 Slope Basedgeographic
on the DEM, directions and flat terrain
values in %
8 Identified floodplains Based on the MNDWI values (Section 2.3)
Based on the DEM, values generalized to 4 main geographic
7 Exposure
directions and flat terrain
3.2.
8 Preliminary Statistical
Identified floodplainsAnalysis Based on the MNDWI values (Section 2.3)
The analysis started with a correlation study between spatial variables that have the potential for
3.2.being
Preliminary Statisticalvariables
the descriptive Analysis for the occurrence of flooding. Histograms of independent variable
distributions
The analysiswere thenwith
started prepared and examined.
a correlation The variable
study between distributions
spatial variables that were
have compared with
the potential
for being the descriptive variables for the occurrence of flooding. Histograms of independent variable
distributions were then prepared and examined. The variable distributions were compared with
Sustainability 2020, 12, 9338 13 of 26

histograms of variables, separated according to the classification of the flooding areas performed
in Section 2.3. Zone statistics describing these distributions were also calculated. This allowed for
the selection of factors that may affect the occurrence of flooding and the size of their impact on
floodplain formation.

3.3. Machine Learning Global Model


The rate of influence of the examined variables on the occurrence of wetlands was assessed
on the basis of a model created with the use of supervised classification. Due to the presence of
both continuous and discrete (categorical) variables, the ensemble model based on tree architecture,
Random Forest (RF), was selected [68]. After discarding pixels with missing data, 3,177,834 pixels out
of the initial 3,243,601 pixels were processed. Incorrect values, which were excluded from the analysis,
constituted 2.03% of the total image pixel count. Each pixel contained information about the values
of 7 independent variables (features 1–7 in Table 3) and 1 dependent variable (feature 8 in Table 3).
The optimal parameters of the RF model, in particular the maximum tree depth, the minimum number
of observations in a leaf, the minimum number of observations for branch split and the number of
base estimators, were all determined using Grid Search analysis using stratified 3-fold cross-validation.
The F1 (4) value, i.e., the harmonic mean of the precision (2) and recall (3) metrics, was selected as a
criterion of the quality of the model fit.

TP
precision = (2)
TP + FP

TP
recall = , (3)
TP + FN
2
F1 = 1 1
, (4)
precision + recall

As the RF model is based on an ensemble of decision trees, it is impossible to determine the rate
of influence of individual independent variables on the model result, contrary to traditional regression
models. It is, however, possible to do so indirectly by analyzing the SHapley Additive exPlanations
(SHAP) values. SHAP is based on assumptions resulting from the game theory and allows for the
assessment of the size and type of influence of individual explanatory variables on the result of any
machine learning or deep learning model [69]. Efficient algorithms are available to calculate SHAP
values, in particular for tree-based models [70]. For the RF model used in the analysis, the SHAP
values were calculated and analyzed in two ways: (1) the SHAP point values of all observations
were summarized, reflecting the positive or negative impact of a given independent variable on the
classification of a given pixel as a flooded area and (2) the SHAP values were averaged to obtain the
overall weight of a given independent variable in the model’s decision on whether to classify the area
as flooded.

3.4. Geographically Weighted Regression (GWR) Local Modeling


The RF model constructed in Section 3.3 is a global model, i.e., it does not take into account
the spatial relationships between pixels closely related to each other. In order to check whether
such relationships exist, and, as a result, whether the influence of a given explanatory variable on
the occurrence of flooding in some regions is greater or smaller than in others, the Geographically
Weighted Regression (GWR) model [71] was fitted to the set of explanatory variables with a continuous
distribution. This method is based on fitting local regression models to the groupings of observations in
a specific neighborhood of each point and can be expressed as (5). Observation weights are determined
Sustainability 2020, 12, 9338 14 of 26

based on the distance from the central point of a given local regression model (fixed bandwidth) or by
the neighborhood rank of the k-nearest neighbors, computed for the central point (adaptive bandwidth).
r
X
yi = β0 (ui , vi ) + β j (ui , vi )xij + εi , i = 1, 2, . . . , n (5)
j=1

where: yi —values of the dependent variable; xij — independent variables; ui , vi —coordinates of n


observations’ locations; β j (ui , vi )—r + 1 coefficients of the local regression model; εi —independent,
random errors following the N(0, 1) distribution.
The use of GWR allows for the identification and correct modeling of the relationship between the
independent variables and the dependent variables in areas where this relationship differs significantly
from the global trend. GWR comes in several variants, depending on the type of distribution of
dependent variables. Three types of distributions can be used: Gaussian, Poisson and binomial
distribution [72]. Due to lack of possibility of simultaneous modeling of variables with different
distributions in the utilized software, the GWR model was adjusted only to continuous variables:
NDVI range, terrain height, cumulative terrain displacements and slope. During the analysis in
Section 4.2, these variables were ranked first, second, fourth and fifth, respectively, in order of the
magnitude of the impact on the RF model classification score, so the resulting GWR model should
describe the relationships between the variables well enough to identify possible local anomalies
in the impact of specific factors. Due to the use of the Gaussian kernel, the analyzed variables
were transformed using the Yeo–Johnson power transform [73], converting the distributions of these
variables to the standard normal distribution N(0, 1).
Due to computational limitations and a significant excess of data on non-flooded areas, the GWR
analysis was carried out on down-sampled data: all pixels with flooding were selected, and an additional
10 times more pixels, selected randomly from the whole study area, were added, representing places
of no flooding. The optimal bandwidth of the model was determined by optimizing the Akaike
Information Criterion with a correction for small sample sizes (AICc) estimator, which describes
the quality of the statistical model used for the comparative analysis of a number of models [72].
Lower estimator values correspond to a higher quality of the model.

4. Results and Discussion


The presentation of analysis results has been divided into three subsections corresponding to the
order of operations performed (see Section 3).

4.1. Preliminary Statistical Analysis


The correlation analysis between spatial variables showed that most of the variables do not have
a significant correlation with each other (Figure 9). Only the groundwater depth shows a noticeable
correlation with the geological type of the substrate. However, it is not significant enough to consider
them as directly dependent (collinearity) and exclude any of them from further analysis.
Figure 10 shows histograms and statistics of cumulative displacements, NDVI range, elevation and
slope, classified into flooded and non-flooded areas. Results of the same classification of parameters
with discrete values: terrain aspect, groundwater level and soil type, are shown in Figure 11.
The histogram of cumulative terrain displacements clearly shows that in the analyzed period,
flooding occurred more frequently in the areas of subsidence caused by mining workings. Both mean
and median value of subsidence in flooded areas is 2.5–3 times higher than in the rest of the area.
The NDVI index range distribution also indicates differences between flooded areas and areas with
no flooding. However, this dependence is not so clear—although, flooding was much more frequent in
areas where the NDVI value was approximately constant (and therefore the maximum differences
oscillated around 0). The non-flooded areas were most often characterized by moderate differences,
reaching 0.3. In the flooded areas, there were maximum NDVI range values, from predominant values
Sustainability 2020, 12, 9338 15 of 26

Sustainability 2020, 12, x FOR PEER REVIEW 15 of 26


of around 0 to very significant differences in the index values, constituting the second distribution
mode of around 1.35. Both distribution medians are approximately the same, but their mean values are
their mean values are significantly different: for flooded areas, the mean is almost twice the mean for
significantly different: for flooded areas, the mean is almost twice the mean for the remaining areas.
the remaining areas.

Figure 9. Correlation matrix of explanatory features.


Figure 9. Correlation matrix of explanatory features.
The terrain elevation and the slope histograms clearly show that the flooding occurred mainly in
The terrain elevation and the slope histograms clearly show that the flooding occurred mainly
the areas situated relatively low and with low slopes, particularly in flat areas. The mean elevation
in the areas situated relatively low and with low slopes, particularly in flat areas. The mean elevation
of the flooded areas was approximately 5 m lower than in other areas. For the slope, the analogous
of the flooded areas was approximately 5 m lower than in other areas. For the slope, the analogous
difference in mean values was 0.7%, but their medians differed by more than 1.5%.
difference in mean values was 0.7%, but their medians differed by more than 1.5%.
The terrain aspect, apart from the more frequent occurrence of flooding in flat areas,
The terrain aspect, apart from the more frequent occurrence of flooding in flat areas, already
already mentioned in the slope analysis, does not show a significantly strong correlation with
mentioned in the slope analysis, does not show a significantly strong correlation with flooding
flooding occurrence. The share of individual geographic directions of exposure for flooded areas,
occurrence. The share of individual geographic directions of exposure for flooded areas, after
after considering the increase in the share of flat areas, is approximately reduced relative to non-flooded
considering the increase in the share of flat areas, is approximately reduced relative to non-flooded
regions. The smallest difference is for areas facing north.
regions. The smallest difference is for areas facing north.
The groundwater level category distributions for flooded and other areas differ substantially.
The groundwater level category distributions for flooded and other areas differ substantially.
Around 75% of the flooding occurred in areas with the shallowest groundwater table. The regions with
Around 75% of the flooding occurred in areas with the shallowest groundwater table. The regions
the deepest groundwaters account for only 5% of the floodplains, compared to 53% of the entire region
with the deepest groundwaters account for only 5% of the floodplains, compared to 53% of the entire
where these waters occur. Similarly, the dependence of the geological type of substrate on flooding
region where these waters occur. Similarly, the dependence of the geological type of substrate on
occurrence is considerable—in the areas most susceptible to water retention, constituting a total of 28%
flooding occurrence is considerable—in the areas most susceptible to water retention, constituting a
of the study area, 86% of flooding occurred. On the other hand, soil types with the lowest retention,
total of 28% of the study area, 86% of flooding occurred. On the other hand, soil types with the lowest
covering 40% of the research area, were covered by only 1% of all flooding.
retention, covering 40% of the research area, were covered by only 1% of all flooding.
Sustainability 2020, 12, 9338 16 of 26
Sustainability 2020, 12, x FOR PEER REVIEW 16 of 26

Figure
Figure 10. Histograms
10. Histograms andand statistics
statistics of continuous
of continuous variables,
variables, classified
classified intointo flooded
flooded andand non-flooded
non-flooded areas.
areas.
Sustainability 2020, 12, 9338 17 of 26
Sustainability 2020, 12, x FOR PEER REVIEW 17 of 26

Figure 11. Histograms


Figure 11. Histograms of
of discrete
discrete variables,
variables, classified
classified into
into flooded
flooded and
andnon-flooded
non-floodedareas.
areas.
4.2. Machine Learning Global Model
4.2. Machine Learning Global Model
The metrics values of the trained RF model are summarized in Table 4. The confusion matrix is
The metrics
presented in Tablevalues
5. Theofmap
the (Figure
trained 12)
RF model
depictsare
thesummarized in Table
predictions made by 4.
theThe confusion
model. It canmatrix is
be seen
presented in Table 5. The map (Figure 12) depicts the predictions made by the model. It can
that the model achieves satisfactory results. The developed RF model correctly detected about 76% of
be seen
that the model achieves satisfactory results. The developed RF model correctly detected about 76%
all floodplains, with a total model accuracy of 99.96%. However, it should be considered that pixels
of all floodplains, with a total model accuracy of 99.96%. However, it should be considered that pixels
classified as flooded are also subject to an error of earlier classification with the MNDWI. Moreover,
classified as flooded are also subject to an error of earlier classification with the MNDWI. Moreover,
the discrepancies in the classification results and the RF model’s prediction do not concern most of the
the discrepancies in the classification results and the RF model’s prediction do not concern most of
entire floodplains but only the spatial extent of individual larger flooding areas, which is visible on the
the entire floodplains but only the spatial extent of individual larger flooding areas, which is visible
map in Figure 12a,b.
on the map in Figure 12.
Table 4. Random Forest model: accuracy metrics.
Table 4. Random Forest model: accuracy metrics.
Precision Recall F-1 Score
Precision Recall F-1 Score
Non-flooded area 99.97% 99.99% 99.98%
Non-flooded
Flooded area area 99.97%
93.42% 99.99%
75.85% 99.98%
83.73%
Flooded area 93.42% 75.85% 83.73%
Table 5. Random Forest model: confusion matrix.
Table 5. Random Forest model: confusion matrix.
Predicted
Non-FloodedPredicted
Area Flooded Area
Non-Flooded Area Flooded Area
Non-flooded area 3,173,140 238
True Non-flooded area 3,173,140 238
True Flooded area 1076 3380
Flooded area 1076 3380
The SHAP values and the averaged influence of the dependent variables are shown in Figure 13.
The SHAP values and the averaged influence of the dependent variables are shown in Figure
The SHAP values analysis confirms most of the observations made in the histogram analysis in
13. The SHAP values analysis confirms most of the observations made in the histogram analysis in
Section 4.1. It accounts for an additional validation of the correct functioning of the developed RF
Section 4.1. It accounts for an additional validation of the correct functioning of the developed RF
model. High values of the NDVI ranges contributed significantly to the classification of the area
model. High values of the NDVI ranges contributed significantly to the classification of the area as
as flooded, while low ranges were only of slight importance in reducing the probability of such
flooded, while low ranges were only of slight importance in reducing the probability of such
classification. Lower elevation values contributed to the classification of the area as flooded more
classification. Lower elevation values contributed to the classification of the area as flooded more
often, but it is also clear that many low-lying areas were not classified correctly. High groundwater
often, but it is also clear that many low-lying areas were not classified correctly. High groundwater
depth values clearly reduced the likelihood of the site being identified as a floodplain. Large values
depth values clearly reduced the likelihood of the site being identified as a floodplain. Large values
of subsidence significantly increased this probability, while values around 0 resulted in a slight
of subsidence significantly increased this probability, while values around 0 resulted in a slight
decrease of possibility of classifying an area as flooded. Areas with a low slope or flat regions
decrease of possibility of classifying an area as flooded. Areas with a low slope or flat regions had an
had an increased chance of being classified as a floodplain. Finally, high values of the geological
increased chance of being classified as a floodplain. Finally, high values of the geological type of
type of substrate (corresponding to the areas with the lowest retention) had a negative impact on
substrate (corresponding to the areas with the lowest retention) had a negative impact on positive
classification.The averaged influence of the dependent variables shows that the NDVI range variable
Sustainability 2020, 12, 9338 18 of 26

positive classification.The averaged influence of the dependent variables shows that the NDVI
Sustainability 2020, 12, x FOR PEER REVIEW 18 of 26
range
variable mostly associated with the occurrence of floodplains. This may be due to both the change in
Sustainability 2020, 12, x FOR PEER REVIEW 18 of 26
mostly associated
characteristics with the occurrence
of the electromagnetic waveofreflection
floodplains. Thisarea
in the maymostly
be duecovered
to bothwith
the change
water andin the
characteristics
degradation of the electromagnetic wave reflection in the area mostly covered
mostly associated with the occurrence of floodplains. This may be due to both the change in local
of vegetation caused by inundation. The next factor is terrain with water
elevation—logically, and the
the
lowerdegradation
areas are more
characteristics ofofvegetation
prone to caused
flooding.
the electromagnetic by The
inundation.
wavenext Thevariables
three
reflection next
in thefactor is terrain
area(groundwater
mostly elevation—logically,
covered level, water and the
withcumulative terrain
the
local lower
degradation areas are more prone to flooding. The next three variables (groundwater level, cumulative
displacement andofslope)
vegetation
havecaused by inundation.
a similar The next factor
effect on prediction results.is terrain elevation—logically,
The geological the has
type of subsoil
terrain displacement
local lower and slope) have a similar
Theeffect on prediction results. The geological type of
a slightly lowerareas are more
impact. prone to
Exposure flooding.
has the lowest next three
impact variables
of (groundwater
all variables—flat level, cumulative
areas, more prone to
subsoil
terrain has a slightly lower
displacement impact.
and slope) haveExposure haseffect
a similar the lowest impact ofresults.
all variables—flat areas, more
flooding,
prone
areto
already
flooding,
included
are
in the
already
slope
included
variable,
in the
andonthe
slope
prediction
geographic
variable, and the
The geological
direction
geographic
type
of exposure
direction
of not
did
of
subsoil has a slightly lower impact. Exposure has the lowest impact of all variables—flat areas, more
turn out to be did
exposure related
not turnto occurrence of inundation.
out to be related to occurrence of inundation.
prone to flooding, are already included in the slope variable, and the geographic direction of
exposure did not turn out to be related to occurrence of inundation.

Figure 12. Result of the Random Forest machine learning algorithm, showing the classification of the
Figure 12. Result of the Random Forest machine learning algorithm, showing the classification of the
areas as flooded and non-flooded, over the studied area.
areasFigure 12. Result
as flooded and of the Random over
non-flooded, Forestthe
machine learning
studied area. algorithm, showing the classification of the
areas as flooded and non-flooded, over the studied area.

Figure 13. Left: feature importance in the Random Forest prediction of flooded areas; right: feature
impact on the Random Forest prediction of flooded areas.
Figure
Figure 13. Left:
13. Left: feature
feature importanceininthe
importance theRandom
Random Forest
Forest prediction
predictionofofflooded areas;
flooded right:
areas; feature
Right: feature
impact on the Random Forest prediction of flooded
impact on the Random Forest prediction of flooded areas. areas.
Sustainability 2020, 12, x FOR PEER REVIEW 19 of 26
Sustainability 2020, 12, 9338 19 of 26
4.3. GWR Local Modeling

4.3. GWR Local Modeling


The bandwidth estimated for the optimal AICc value was 383 m. The global linear regression
model,Thefitted to compare
bandwidth the results,
estimated for thehas an AICc
optimal AICcofvalue
22,533, and
was them.correlation
383 The globalcoefficient reaches
linear regression
only 0.02.
model, Onto
fitted itscompare
basis, however,
the results,the has
statistical
an AICcsignificance
of 22,533, of
andthethe
regionalization of the regression
correlation coefficient reaches
model
only deviations
0.02. for a given
On its basis, however, variable was determined.
the statistical The of
significance GWR model statisticsof
the regionalization are
thepresented
regression in
Table 6.deviations
model Based on for the ap-value, for which
given variable wasthe statistical significance
determined. The GWR model threshold was set
statistics areatpresented
<5%, it wasin
found6.that
Table onlyonthe
Based therelation
p-value, offor
flooding
whichand the NDVIsignificance
the statistical range did not show a was
threshold set at <5%,
statistically significant
it was
regionalization.
found that only theTherelation
remaining three variables
of flooding and thereached p-values
NDVI range didbelow
not show0.5%.a The AICc value
statistically for the
significant
GWR model was
regionalization. The44,768 and three
remaining the correlation coefficient
variables reached was below
p-values 0.92. This
0.5%. means
The AICc thatvalue
the for
model
the
accounting
GWR modelforwasthe spatial
44,768 andvariability of thecoefficient
the correlation regressionwascoefficients
0.92. This for the that
means fourthe
analyzed dependent
model accounting
variables
for is much
the spatial better of
variability thanthethe global regression
regression coefficientsmodel.
for theThe
fourmodel
analyzed deviation
dependentmap variables
(Figure 14) is
showsbetter
much a relatively
than the low
globalspatial autocorrelation
regression of positive
model. The model andmap
deviation negative
(Figuredeviation
14) showsvalues—the
a relatively
regionalization
low of the model
spatial autocorrelation prevented
of positive andthe occurrence
negative of large,
deviation uniform regionalization
values—the clusters of similar values
of the modelof
deviation. the occurrence of large, uniform clusters of similar values of deviation.
prevented

Table 6. GWR
Table6. GWR model:
model: coefficients
coefficients statistics.
statistics.

Variable
Variable Mean Std.
Mean Std. Min.
Min. Median
Median Max.
Max. p-Value
p-Value
Cumulative
Cumulative displacement−0.01−0.01
displacement 0.08
0.08 −0.66
−0.66 0.00
0.00 0.33
0.33 0.00
0.00
NDVI
NDVI maximum
maximum difference
difference −0.02−0.02 0.14
0.14 −0.54
−0.54 0.00
0.00 0.61
0.61 0.54
0.54
Terrain heightheight
Terrain −0.10−0.10 0.33
0.33 −8.13
−8.13 0.00
0.00 2.28
2.28 0.00
0.00
Slope 0.00 0.14 −1.98 0.00 4.06 0.00
Slope 0.00 0.14 −1.98 0.00 4.06 0.00

Figure 14. GWR


Figure 14. GWR model:
model: map
map of
of residuals.
residuals.

The spatial variability of the regression coefficients in the GWR model is shown in Figure 15.
The spatial variability of the regression coefficients in the GWR model is shown in Figure 15.
Through their analysis, one can see the dependencies causing their regionalization. The areas highlighted
Through their analysis, one can see the dependencies causing their regionalization. The areas
in Figure 15 as no. 1, 3 and 4, where the negative values of terrain displacement have a significantly
highlighted in Figure 15 as no. 1, 3 and 4, where the negative values of terrain displacement have a
Sustainability 2020, 12, 9338 20 of 26

Sustainability 2020, 12, x FOR PEER REVIEW 20 of 26


greater impact on the occurrence of flooding, are in fact covering lakes created (or enlarged) as a
resultsignificantly
of damage causedgreater impact on the
by mining occurrence
activities, of flooding,
in particular theare in fact covering
subsidence formation lakes created
and (or
disturbances
enlarged) as a result of damage caused by mining activities, in particular the subsidence formation
in local groundwater levels. Negative values of the coefficient for the Z variable are caused by the
and disturbances in local groundwater levels. Negative values of the coefficient for the Z variable are
increased probability of flooding in the subsided regions, but due to local changes in the topography, the
caused by the increased probability of flooding in the subsided regions, but due to local changes in
magnitude of this impact may be greater or smaller in various regions. For this reason, negative values
the topography, the magnitude of this impact may be greater or smaller in various regions. For this
of thereason,
GWRnegative
coefficient were
values obtained
of the in the areas
GWR coefficient wereof flooding
obtained marked
in the asflooding
areas of 1, 3 andmarked
4. In area
as 1, 3no. 2,
with and
a topography
4. In area no. 2, with a topography favoring the occurrence of flooding, there were only a fewof the
favoring the occurrence of flooding, there were only a few enlargements
existing water reservoirs,
enlargements most likely
of the existing unrelated most
water reservoirs, to mining
likely activities,
unrelated to hence
mining theactivities,
positivehence
valuestheof the
GWRpositive values
coefficient forofthe
theelevation
GWR coefficient
variable.forBased
the elevation
on the variable.
NDVI, no Based on the NDVI,
significant no significant
vegetation degradation
vegetationindegradation
was detected was detected
this area, resulting in in this area,
negative resulting
values in negative
of this factor values
assigned of this factor
in the assigned
GWR analysis.
in the GWR analysis. The impact of slope on the occurrence of flooding may also
The impact of slope on the occurrence of flooding may also vary, depending on the local conditions vary, depending on of
the local conditions of the terrain relief, e.g., increasing the risk of flooding in flat areas with natural
the terrain relief, e.g., increasing the risk of flooding in flat areas with natural topography (area no. 1).
topography (area no. 1). On the other hand, for the no. 4 area the values of the slope coefficient assume
On the other hand, for the no. 4 area the values of the slope coefficient assume peak values-negative for
peak values-negative for the area of the water reservoir (because it was formed on a flat area), and
the area of the water reservoir (because it was formed on a flat area), and positive near its north-west
positive near its north-west shore, which may be caused by its bigger inclination. In the areas marked
shore,with
which
no. 5,may be caused
flooding byidentified
has been its biggerin inclination. In the
agricultural areas areas
in the nearmarked
vicinity with no. 5, flooding
of post-mining heaps. has
been NDVI
identified
valueindecreases
agricultural
were areas in the near
also detected vicinity
for this of post-mining
area, visible on the NDVI heaps.
rangeNDVI
factor value
map fordecreases
the
were GWR
also detected
method. for this area, visible on the NDVI range factor map for the GWR method.

Figure 15. Geographically Weighted Regression (GWR) model: spatial variability of coefficients.
Black solid rectangles represent significant areas, characterized in text.
Sustainability 2020, 12, 9338 21 of 26

4.4. Summary
A very wide spectrum of spatio-temporal data was used in the conducted study: radar and optical
imaging from the Sentinel-1 and Sentinel-2 satellites, geological and hydrogeological data and maps,
a Digital Elevation Model and meteorological data as well. SBAS satellite radar interferometry and
indices based on hyperspectral data as well as GIS tools were used to process these input data. As a
result, various datasets (NDVI, LOS displacement time series, groundwater depth, geological substrate,
DEM, slope map, aspect map and map of areas identified as flooded, based on the MNDWI index)
were used. Based on these datasets, a statistical study of the correlation between spatial variables using
spatial regression methods and machine learning supervised classification was performed.
The RF model trained in the study allowed for the correct identification of 76% of flooded zones,
based on seven independent spatial variables, and an analysis of the impact of individual variables
on the recognition of a given region as susceptible to flooding by the machine learning algorithm.
It should also be noted that with the GWR analysis, it was possible to identify and interpret local
changes in the magnitude of said impact in selected areas.
Firstly, attention should be paid to the fact that the flooded area formation may be the result
of several factors, as is shown by the analysis. Inundations occurred much more frequently in the
areas of subsidence caused by mining activities, as well as in natural local sites of terrain depression.
An important aspect is that coal mining causes a gradual subsidence of areas where mining is carried out.
The process of subsiding is due to the fact that there are no larger geological structures, e.g., faults [74],
which would most likely reinforce the terrain deformation process. Over the studied area, the main
effect of mining operations is a slow increase in terrain range of the existing subsidence troughs.
As the exploitation area grows, the subsiding area also widens. Based on the analysis of cross-sections
(Figure 5), it can be concluded that the terrain displacements are not spatially uniform—the change
in displacement speed depends on the progress of the front and the exploitation of a given seam.
For this reason, in some areas, the rate of subsidence can be very high in one year and can decrease
sharply later. Such periodic changes may also affect the rate of appearance and disappearance of
floodplains. Contrary to other Polish hard coal mines located in the Silesian region, the influence zone
of the Bogdanka mine covers agricultural areas that are not heavily urbanized. Over uninhabited
areas, the terrain subsidence causes a seeming rise in the water level [75]. With a relatively high
groundwater level in the Bogdanka mine vicinity, its further raising contributes to the enlargement
of inundation zones, some of which have a permanent form [76]. This phenomenon is exacerbated
by unfavorable weather conditions, such as prolonged rainfall or spring thaw, which may further
cause flooding. In places where the variable of cumulative ground displacement is of the greatest
significance, primarily water reservoirs and wetlands can be found.
The examples of the use of remote sensing data, GIS and machine learning in the aspect of
monitoring changes in the area of terrain surface, quoted in Section 1, do not undertake this issue in
such a comprehensive way as the approach adopted in this study. In the presented paper, it was shown
that with the combined use of various datasets and modern methods of geospatial data processing
and analysis, a more accurate and comprehensive analysis of the occurring phenomena is possible.
The methodology presented in the paper can provide a tool for companies from the mining and energy
industries to enhance the monitoring of areas affected by mining activities, or by other operations with
a significant impact on the surface and the natural environment. Systematic monitoring of the influence
of mining on the environment, e.g., the occurrence of flooding, may translate into controlling the
degree of weight that underground mining may put on the surface. In line with the idea of sustainable
extraction of natural resources, it can play a vital role when mining has a negative social response
related to progressing climate changes. Due to the worldwide coverage, as well as open access to
the data from the Copernicus Sentinel programme, the adopted methodology can be successfully
applied in other areas of raw material extraction. Satellite data mentioned above, as well as geological,
hydrological and other data listed in the article, can be extended with additional datasets available for
selected areas. An emphasis must be put on the stage of initial data processing and preparation for
Sustainability 2020, 12, 9338 22 of 26

use with machine learning algorithms and GWR modeling. It should be taken into account, though,
that the classification accuracy, as well as the degree of influence of individual factors on the studied
phenomenon, may achieve values different from those obtained in this paper, e.g., due to different
geological structure and topography, or for particular climate zones and excavation characteristics.

5. Conclusions
An application of various available open-source data (Sentinel-1 SAR imagery, Sentinel-2 optical
imagery and other spatial datasets) in monitoring the changes in the natural environment on a case
study of an underground coal mine operation was presented in the article. During the study, a focus
was put on the area covered by the influence of underground mining operations. Based on the
Sentinel-1 SAR data, a Line-of-Sight displacement time-series was calculated using the Small Baseline
InSAR technique for the study area, which showed how the terrain surface subsided over the analyzed
period (2015–2019). Using Sentinel-2 optical and hyperspectral imagery, the NDVI and MNDWI
were calculated, based on which the wetland locations were determined and the overall condition of
vegetation was assessed. The derived products of the Sentinel-1 and Sentinel-2 missions, together with
data about terrain elevation (DEM), groundwater depth and geological data, were processed using
GIS tools and machine learning algorithms. A Random Forest machine learning model was created,
which aims at detecting floodplains based on the input data. The developed model achieved an
accuracy of about 75%. In addition, a Geographically Weighted Regression (GWR) analysis was
conducted on the input data to investigate the influence of individual factors on the occurrence of
flooding. The executed analysis showed that the elevation and displacement variables play the most
significant role on the probability of wetland occurrence. The GWR analysis also made it possible to
identify areas where individual factors may locally influence said probability in a more significant way.
Both the causes and effects of flooding were studied. The results obtained clearly indicate that
flooding occurrence does not depend on a single factor but on multiple ones. A significant subsidence
of terrain is not the only and the most important factor, as the analysis has shown. Terrain elevation,
geological structure of the terrain and groundwater level all play a significant role in the occurrence of
flooding. Additionally, the NDVI changes allowed for an assessment of the extent to which flooding
affects the condition of the vegetation cover.
In subsequent analyses, it is suggested that more precise geological and hydrogeological data
should be used to improve the accuracy of the machine learning model. Furthermore, the model
can also be extended with additional variables based on additional meteorological or hydrological
geospatial data, which were either not available for the area studied in this paper or their form did not
enable their use in spatial modeling.

Author Contributions: Conceptualization, A.K., P.T., D.G., A.B. and K.O.; Methodology, P.T., A.K., A.B. and
D.G.; Software, P.T., D.G., A.B.; Validation, A.K., P.T., D.G., A.B. and K.O.; Formal analysis, P.T., D.G. and A.B.;
Investigation, P.T., A.B., D.G. and A.K.; Resources, K.O. and M.C.; Data curation, A.B., N.B., P.K., D.G., A.G., P.T.
and A.K.; Writing—original draft preparation, A.K., P.T., K.O., N.B., A.B., D.G., M.C. and P.K.; Writing—review and
editing, A.K. and D.G.; Visualization, D.G., P.T., A.K., N.B., A.B. and A.G.; Supervision, A.K.; Project administration,
A.K. All authors have read and agreed to the published version of the manuscript.
Funding: The research was partly supported by the statutory grant at the Department of Mining and Geodesy,
Faculty of Geoengineering, Mining and Geology, Wroclaw University of Science and Technology.
Conflicts of Interest: The authors declare no conflict of interest.
Sustainability 2020, 12, 9338 23 of 26

Abbreviations
AICc Akaike Information Criterion with a correction for
small sample sizes
CCR Central Coal Region
DN Digital Number
DEM Digital Elevation Model
GIS Geographic Information Systems
GWR Geographically Weighted Regression
InSAR Interferometric Synthetic Aperture Radar
LCB Lublin Coal Basin
LOS Line of Sight
MNDWI Modified Normalized Difference Water Index
NDVI Normalized Difference Vegetation Index
RF Random Forest
SAR Synthetic Aperture Radar
SBAS Small Baseline Subset
SHAP SHapley Additive exPlanations
SNAPHU Statistical-cost Network-flow Algorithm for Phase
Unwrapping
SRTM Shuttle Radar Topography Mission
UTM Universal Transverse Mercator

References
1. Klukanová, A.; Rapant, S. Impact of mining activities upon the environment of the Slovak Republic: Two case
studies. J. Geochem. Explor. 1999, 66, 299–306. [CrossRef]
2. Chauhan, S.S. Mining, Development and Environment: A Case Study of Bijolia Mining Area in Rajasthan,
India. J. Hum. Ecol. 2010, 31, 65–72. [CrossRef]
3. Majumder, S.; Sarkar, K. Impact of mining and related activities on physical and cultural environment
of Singrauli Coalfield—A case study through application of remote sensing techniques. J. Indian Soc.
Remote Sens. 1994, 22, 45–56. [CrossRef]
4. Nichols, O.G.; Carbon, B.A.; Colquhoun, I.J.; Croton, J.T.; Murray, N.J. Rehabilitation after bauxite mining in
south-western australia. Landsc. Plan. 1985, 12, 75–92. [CrossRef]
5. Sklenicka, P.; Prikyl, I. Non-productive principles of landscape rehabilitation after long-term opencast mining
in north-west Bohemia. J. S. Afr. Inst. Min. Metall. 2004, 104, 83–88.
6. Lechner, A.M.; Kassulke, O.; Unger, C. Spatial assessment of open cut coal mining progressive rehabilitation
to support the monitoring of rehabilitation liabilities. Resour. Policy 2016, 50, 234–243. [CrossRef]
7. Kwinta, A.; Gradka, R. Mining exploitation influence range. Nat. Hazards 2018, 94, 979–997. [CrossRef]
8. Kwinta, A.; Gradka, R. Analysis of the damage influence range generated by underground mining. Int. J.
Rock Mech. Min. Sci. 2020, 128, 104263. [CrossRef]
9. Blachowski, J.; Milczarek, W. Analysis of surface changes in the Walbrzych hard coal mining grounds
(SW Poland) between 1886 and 2009. Geol. Q. 2014, 58, 353–368. [CrossRef]
10. Blachowski, J.; Kopec, A.; Milczarek, W.; Owczarz, K. Evolution of Secondary Deformations Captured by
Satellite Radar Interferometry: Case Study of an Abandoned Coal Basin in SW Poland. Sustainability 2019,
11, 884. [CrossRef]
11. Szczepiński, J. The Significance of Groundwater Flow Modeling Study for Simulation of Opencast Mine
Dewatering, Flooding, and the Environmental Impact. Water 2019, 11, 848. [CrossRef]
12. Currell, M.J.; Werner, A.D.; McGrath, C.; Webb, J.A.; Berkman, M. Problems with the application of
hydrogeological science to regulation of Australian mining projects: Carmichael Mine and Doongmabulla
Springs. J. Hydrol. 2017, 548, 674–682. [CrossRef]
13. Milczarek, W. Investigation of post inducted seismic deformation of the 2016 MW 4.2 Tarnowek Poland
mining tremor based on DinSAR and SBAS method. Acta Geodyn. Geomater. 2019, 16, 183–193. [CrossRef]
Sustainability 2020, 12, 9338 24 of 26

14. Hejmanowski, R.; Malinowska, A.A.; Witkowski, W.T.; Guzy, A. An Analysis Applying InSAR of Subsidence
Caused by Nearby Mining-Induced Earthquakes. Geosciences 2019, 9, 490. [CrossRef]
15. Sonter, L.J.; Moran, C.J.; Barrett, D.J.; Soares-Filho, B.S. Processes of land use change in mining regions.
J. Clean. Prod. 2014, 84, 494–501. [CrossRef]
16. Redondo-Vega, J.M.; Gómez-Villar, A.; Santos-González, J.; González-Gutiérrez, R.B.; Álvarez-Martínez, J.
Changes in land use due to mining in the north-western mountains of Spain during the previous 50years.
Catena 2017, 149, 844–856. [CrossRef]
17. Malinowska, A.; Hejmanowski, R. Building damage risk assessment on mining terrains in Poland with GIS
application. Int. J. Rock Mech. Min. Sci. 2010, 47, 238–245. [CrossRef]
18. Florkowska, L. Example building damage caused by mining exploitation in disturbed rock mass.
Stud. Geotech. Mech. 2013, 35, 19–37. [CrossRef]
19. Brinson, M.M. A Hydrogeomorphic Classification for Wetlands; U.S. Army Engineer Waterways Experiment
Station: Vicksburg, MS, USA, 1993; pp. 1–103.
20. Padmanaban, R.; Bhowmik, A.K.; Cabral, P. A Remote Sensing Approach to Environmental Monitoring in a
Reclaimed Mine Area. ISPRS Int. J. Geo-Inf. 2017, 6, 401. [CrossRef]
21. Charou, E.; Stefouli, M.; Dimitrakopoulos, D.; Vasiliou, E.; Mavrantza, O.D. Using Remote Sensing to Assess
Impact of Mining Activities on Land and Water Resources. Mine Water Environ. 2010, 29, 45–52. [CrossRef]
22. Paull, D.; Banks, G.; Ballard, C.; Gillieson, D. Monitoring the Environmental Impact of Mining in Remote
Locations through Remotely Sensed Data. Geocarto Int. 2006, 21, 33–42. [CrossRef]
23. Miatkowski, Z.; Kowalik, W.; Lewiński, S.; Sołtysik, A.; Turbiak, J. Assessment of Possibilities of Satellite
Remote Sensing use for the Identification of Hydrogenic Habitats Transformations under the Influence of
Deep Drainage in the Region of the Bełchatów Brown Coal Mine. (In Polish with English summary). Available
online: http://warsztatygornicze.pl/wp-content/uploads/2004_36.pdf (accessed on 10 November 2020).
24. Wang, X.; Nie, H.-F.; Li, C.-Z.; Wang, J. Application of different data sources in the investigation of exploitation
situation and environment of mines. Remote Sens. Land Resour. 2006, 68, 69–71. [CrossRef]
25. Schmidt, H.; Glaesser, C. Multitemporal analysis of satellite data and their use in the monitoring of the
environmental impacts of open cast lignite mining areas in Eastern Germany. Int. J. Remote Sens. 1998,
19, 2245–2260. [CrossRef]
26. Duan, H.; Deng, Z.; Deng, F.; Wang, D. Assessment of Groundwater Potential Based on Multicriteria Decision
Making Model and Decision Tree Algorithms. Available online: https://www.hindawi.com/journals/mpe/
2016/2064575/ (accessed on 6 September 2020).
27. Baek, J.; Kim, S.-W.; Park, H.-J.; Jung, H.-S.; Kim, K.-D.; Kim, J.W. Analysis of ground subsidence in coal
mining area using SAR interferometry. Geosci. J. 2008, 12, 277–284. [CrossRef]
28. Zhao, C.; Lu, Z.; Zhang, Q. Time-series deformation monitoring over mining regions with SAR intensity-based
offset measurements. Remote Sens. Lett. 2013, 4, 436–445. [CrossRef]
29. Samsonov, S.; D’Oreye, N.; Smets, B. Ground deformation associated with post-mining activity at the
French–German border revealed by novel InSAR time series method. Int. J. Appl. Earth Obs. Geoinf. 2013,
23, 142–154. [CrossRef]
30. Gee, D.; Bateson, L.; Sowter, A.; Grebby, S.; Novellino, A.; Cigna, F.; Marsh, S.; Banton, C.; Wyatt, L. Ground
Motion in Areas of Abandoned Mining: Application of the Intermittent SBAS (ISBAS) to the Northumberland
and Durham Coalfield, UK. Geosciences 2017, 7, 85. [CrossRef]
31. Bateson, L.; Cigna, F.; Boon, D.; Sowter, A. The application of the Intermittent SBAS (ISBAS) InSAR method
to the South Wales Coalfield, UK. Int. J. Appl. Earth Obs. Geoinf. 2015, 34, 249–257. [CrossRef]
32. Zhao, Y.; Li, Y.; Zhang, L.; Wang, Q. Groundwater level prediction of landslide based on classification and
regression tree. Geodesy Geodyn. 2016, 7, 348–355. [CrossRef]
33. Zhu, X.; Xu, Q.; Tang, M.; Nie, W.; Ma, S.; Xu, Z. Comparison of two optimized machine learning models for
predicting displacement of rainfall-induced landslide: A case study in Sichuan Province, China. Eng. Geol.
2017, 218, 213–222. [CrossRef]
34. Karimi, S.S.; Saintilan, N.; Wen, L.; Valavi, R. Application of Machine Learning to Model Wetland Inundation
Patterns Across a Large Semiarid Floodplain. Water Resour. Res. 2019, 55, 8765–8778. [CrossRef]
35. Bońda, R.; Brzeziński, D.; Czapigo-Czapla, M.; Czapowski, G.; Fabiańczyk, J.; Kalinowska, A.; Malon, A.;
Mazurek, S.; Mikulski, S.Z.; Miśkiewicz, W.; et al. Balance of Mineral Resources in Poland as at 31 December 2019;
Polish Geological Institute–National Research Institute: Warsaw, Poland, 2020.
Sustainability 2020, 12, 9338 25 of 26

36. Lubelski Coal “Bogdanka” Characteristics of Coal Deposit. 2018. Available online: https://www.lw.com.pl/
en,2,s169.html (accessed on 10 September 2020).
37. Stachowicz, S. Current Problems of Exploitations of Hard Coal in “Bogdanka” Mine in Lublin Coal
Basin (LZW). (In Polish with English Summary). Available online: http://warsztatygornicze.pl/wp-content/
uploads/2005-1.pdf (accessed on 10 November 2020).
38. Stupicka, E.; Stempień-Sałek, M. Regional Geology of Poland; University of Warsaw Publishing House: Warsaw,
Poland, 2019; pp. 161–168. ISBN 978-83-235-2022-1. (In Polish)
39. Bulletins of the Polish Institute of Meteorology and Water Management; Institute of Meteorology and Water
Management–National Research Institute: Warsaw, Poland, 2015.
40. U.S. Geological Survey, EarthExplorer. Available online: https://earthexplorer.usgs.gov/ (accessed on 15
August 2020).
41. Osińska-Skotak, K. The Importance of Radiometric Correction in Sarellite Images. Arch. Photogramm. Cartogr.
Remote Sens. 2007, 17, 577–590, (In Polish with English summary).
42. Harris Geospatial Solutions, Inc. Fast Line-of-Sight Atmospheric Analysis of Hypercubes (FLAASH).
Available online: https://www.l3harrisgeospatial.com/docs/FLAASH.html (accessed on 17 August 2020).
43. Bell, G.E. Turfgrass Physiology and Ecology: Advanced Management Principles; CABI: Wallingford, UK, 2011;
ISBN 978-1-84593-648-8.
44. Acharya, T.D.; Lee, D.H.; Yang, I.T.; Lee, J.K. Identification of Water Bodies in a Landsat 8 OLI Image Using a
J48 Decision Tree. Sensors 2016, 16, 1075. [CrossRef] [PubMed]
45. Berardino, P.; Fornaro, G.; Lanari, R.; Sansosti, E. A new algorithm for surface deformation monitoring based
on small baseline differential SAR interferograms. IEEE Trans. Geosci. Remote Sens. 2002, 40, 2375–2383.
[CrossRef]
46. Schmidt, D.A.; Bürgmann, R. Time-dependent land uplift and subsidence in the Santa Clara valley, California,
from a large interferometric synthetic aperture radar data set. J. Geophys. Res. Solid Earth 2003, 108. [CrossRef]
47. Tong, X.; Schmidt, D. Active movement of the Cascade landslide complex in Washington from a
coherence-based InSAR time series method. Remote Sens. Environ. 2016, 186, 405–415. [CrossRef]
48. Sandwell, D.T.; Mellors, R.J.; Tong, X.; Wei, M.; Wessel, P. Open radar interferometry software for mapping
surface Deformation. Eos Trans. Am. Geophys. Union 2011, 92, 234. [CrossRef]
49. European Space Agency, Sentinel-1 Quality Control. Available online: https://qc.sentinel1.eo.esa.int/
(accessed on 19 February 2020).
50. Farr, T.G.; Rosen, P.A.; Caro, E.; Crippen, R.; Duren, R.; Hensley, S.; Kobrick, M.; Paller, M.; Rodriguez, E.;
Roth, L.; et al. The Shuttle Radar Topography Mission. Rev. Geophys. 2007, 45. [CrossRef]
51. Chen, C.W.; Zebker, H.A. Network approaches to two-dimensional phase unwrapping: Intractability and
two new algorithms. J. Opt. Soc. Am. A 2000, 17, 401–414. [CrossRef]
52. Chen, C.W.; Zebker, H.A. Two-dimensional phase unwrapping with use of statistical models for cost functions
in nonlinear optimization. J. Opt. Soc. Am. A 2001, 18, 338–351. [CrossRef]
53. Chen, C.W.; Zebker, H.A. Phase unwrapping for large SAR interferograms: Statistical segmentation and
generalized network models. IEEE Trans. Geosci. Remote Sens. 2002, 40, 1709–1719. [CrossRef]
54. Liszkowski, J. Detailed Geological Map of Poland 1:50 000, Sheet 714 (Ostrów Lubelski); Polish Geological
Institute–National Research Institute: Warsaw, Poland, 1977.
55. Czerwińska-Tomczyk, J.; Łusiak, R. Hydrogeological Map of Poland 1:50 000, First Aquifer, Occurrence and
Hydrodynamics, Sheet 714 (Ostrów Lubelski); Polish Geological Institute–National Research Institute: Warsaw,
Poland, 2005.
56. Buraczyński, J.; Wojtanowicz, J. Detailed Geological Map of Poland 1:50 000, Sheet 715 (Orzechów Nowy);
Polish Geological Institute–National Research Institute: Warsaw, Poland, 1979.
57. Rysak, A.; Zwoliński, Z. Hydrogeological Map of Poland 1:50 000, First Aquifer, Occurrence and Hydrodynamics,
Sheet 715 (Orzechów Nowy); Polish Geological Institute–National Research Institute: Warsaw, Poland, 2005.
58. Harasimiuk, M.; Henkiel, A. Detailed Geological Map of Poland 1:50 000, Sheet 750 (ٞeczna); Polish Geological
Institute–National Research Institute: Warsaw, Poland, 1978.
59. Pietruszka, W.; Zezula, H. Hydrogeological Map of Poland 1:50 000, First Aquifer, Occurrence and Hydrodynamics,
Sheet 750 (ٞeczna); Polish Geological Institute–National Research Institute: Warsaw, Poland, 2006.
60. Harasimiuk, M.; Szwajgier, W.; Jezierski, W. Detailed Geological Map of Poland 1:50 000, Sheet 751 (Siedliszcze);
Polish Geological Institute–National Research Institute: Warsaw, Poland, 1998.
Sustainability 2020, 12, 9338 26 of 26

61. Zezula, H.; Pietruszka, W. Hydrogeological Map of Poland 1:50 000, First Aquifer, Occurrence and Hydrodynamics,
Sheet 751 (Siedliszcze); Polish Geological Institute–National Research Institute: Warsaw, Poland, 2006.
62. Dowgiałło, J.; Kleczkowski, A.S.; Macioszczyk, T.; Różkowski, A. (Eds.) Hydrogeological Dictionary;
Polish Geological Institute–National Research Institute: Warsaw, Poland, 2002. (In Polish)
63. Frankowski, Z.; Gałkowski, P.; Majer, K. Directions of Use Maps of Areas at Risk of Flooding in May
2010; Polish Geological Institute–National Research Institute: Warsaw, Poland. Available online:
https://www.pgi.gov.pl/psh/materialy-informacyjne-psh/artykuly-psh/8851-artykul-2010-kierunki-
wykorzystania-mapy-obszarow-zagrozonych-ryzykiem-podtopien-w-majowej-powodzi-2010-roku.html
(accessed on 17 August 2020).
64. Oliphant, T.E. A Guide to NumPy; Trelgol Publishing: USA, 2006; Volume 1. Available online: https:
//numpy.org/ (accessed on 25 August 2020).
65. Pedregosa, F.; Varoquaux, G.; Gramfort, A.; Michel, V.; Thirion, B.; Grisel, O.; Blondel, M.; Prettenhofer, P.;
Weiss, R.; Dubourg, V.; et al. Scikit-learn: Machine learning in Python. J. Mach. Learn. Res. 2011, 12, 2825–2830.
66. Jordahl, K. GeoPandas: Python Tools for Geographic Data. 2014. Available online: https://github.com/
geopandas/geopandas (accessed on 25 August 2020).
67. Rey, S.J.; Anselin, L. PySAL: A Python Library of Spatial Analytical Methods. In Handbook of Applied Spatial
Analysis; Springer: Berlin/Heidelberg, Germany, 2009; pp. 175–193.
68. Pal, M. Random forest classifier for remote sensing classification. Int. J. Remote Sens. 2005, 26, 217–222.
[CrossRef]
69. Lundberg, S.M.; Lee, S.-I. A unified approach to interpreting model predictions. In Advances in
Neural Information Processing Systems 30; Guyon, I., Luxburg, U.V., Bengio, S., Wallach, H., Fergus, R.,
Vishwanathan, S., Garnett, R., Eds.; Curran Associates, Inc.: New York, NY, USA, 2017; pp. 4765–4774.
70. Lundberg, S.M.; Erion, G.; Chen, H.; Degrave, A.; Prutkin, J.M.; Nair, B.; Katz, R.; Himmelfarb, J.; Bansal, N.;
Lee, S.-I. From local explanations to global understanding with explainable AI for trees. Nat. Mach. Intell.
2020, 2, 56–67. [CrossRef] [PubMed]
71. Brunsdon, C.; Fotheringham, A.S.; Charlton, M.E. Geographically Weighted Regression: A Method for
Exploring Spatial Nonstationarity. Geogr. Anal. 1996, 28, 281–298. [CrossRef]
72. Fotheringham, A.S.; Brunsdon, C.; Charlton, M. Geographically Weighted Regression: The Analysis of Spatially
Varying Relationships; John Wiley & Sons: Hoboken, NJ, USA, 2003.
73. Yeo, I.-K.; Johnson, R.A. A new family of power transformations to improve normality or symmetry.
Biometrika 2000, 87, 954–959. [CrossRef]
74. Bielski, P.; Goluch, M.; Koszarna, A.; Kraszewski, P.; Szewczyk, M.; Wojewoda, J.; Dymowski, J. Integrated
report GW LW “Bogdanka” for 2016; GK LW “Bogdanka” SA. 2016. Available online: https://ri.lw.com.pl/
pub/files/BogdankaRAPORTZINTEGROWANY2016.pdf (accessed on 17 August 2020).
75. Król, Ż.; Mikrut, S.; Gabryszuk, J.; Postek, P.; Mazur, A. Changes in the structure of land use in the areas of
mining damage. Inżynieria Ekol. Ecol. Eng. 2015, 44, 26–33. [CrossRef]
76. GW LW “Bogdanka”, Responsible Business Report of the Lubelski Coal Bogdanka Capital Group for
2012–2013; 2013. Available online: https://www.lw.com.pl/pl,2,s428,raport_csr_za_lata_2012-2013.html
(accessed on 17 August 2020).

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional
affiliations.

© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access
article distributed under the terms and conditions of the Creative Commons Attribution
(CC BY) license (http://creativecommons.org/licenses/by/4.0/).

You might also like