Article

Estimating Cadmium Concentration in Agricultural Soils with ZY1-02D Hyperspectral Data: A Comparative Analysis of Spectral Transformations and Machine Learning Models

1 School of Geospatial Engineering and Science, Sun Yat-sen University, Zhuhai 519082, China
2 Key Laboratory of Natural Resources Monitoring in Tropical and Subtropical Area of South China, Ministry of Natural Resources, Zhuhai 519082, China
3 Key Laboratory of Ecosystem Network Observation and Modeling, Institute of Geographical Sciences and Natural Resources Research, Chinese Academy of Sciences, Beijing 100101, China
4 The Zhongke-Ji’an Institute for Eco-Environmental Sciences, Ji’an 343000, China
5 College of Resources and Environment, University of Chinese Academy of Sciences, Beijing 100049, China
* Author to whom correspondence should be addressed.
Agriculture 2024, 14(9), 1619; https://doi.org/10.3390/agriculture14091619
Submission received: 1 July 2024 / Revised: 7 September 2024 / Accepted: 13 September 2024 / Published: 15 September 2024
(This article belongs to the Section Digital Agriculture)
Figure 1. Distribution of soil sampling sites in the study area: (a) Jiangxi Province, China; (b) Geographic location of the study area; (c) Distribution of sampling points and elevation within the study area. The top-right image shows the coverage of the study area by the original ZY1-02D imagery.
Figure 2. (a) Original spectral curves and (b) Savitzky–Golay (SG) smoothed spectral curves of soil samples from hyperspectral images. Note: Each color represents a sampling point.
Figure 3. The correlation coefficients between soil Cd and the original soil spectral data, and between soil Cd and the Savitzky–Golay (SG) smoothed spectral data.
Figure 4. Nine spectral transformation curves of soil samples from hyperspectral images. (a) logarithmic transformation (LT), (b) reciprocal transformation (RT), (c) first derivative (FD), (d) logarithm of reciprocal transformation (LR), (e) reciprocal of logarithmic transformation (RL), (f) reciprocal of logarithmic and first derivative (RLFD), (g) standard normal variate (SNV), (h) continuum removal (CR), and (i) multiplicative scatter correction (MSC). Note: Each color represents a sampling point.
Figure 5. The correlation coefficient curves between the spectra derived from nine spectral transformation methods and the soil Cd content. (a) logarithmic transformation (LT), (b) reciprocal transformation (RT), (c) first derivative (FD), (d) logarithm of reciprocal transformation (LR), (e) reciprocal of logarithmic transformation (RL), (f) reciprocal of logarithmic and first derivative (RLFD), (g) standard normal variate (SNV), (h) continuum removal (CR), and (i) multiplicative scatter correction (MSC).
Figure 6. Spatial distribution of soil Cd content in the study area driven by the RF model constructed with first derivative-transformed spectral data. Note that this Cd distribution map has been masked with a cropland layer derived from the GlobeLand30 dataset (http://www.globallandcover.com/, accessed on 20 December 2022).
Figure 7. Relative proportional and spatial extents of three soil pollution categories based on soil Cd contents.

Abstract

The accumulation of cadmium (Cd) in agricultural soils presents a significant threat to crop safety, emphasizing the critical necessity for effective monitoring and management of soil Cd levels. Despite technological advancements, accurately monitoring soil Cd concentrations using satellite hyperspectral technology remains challenging, particularly in efficiently extracting spectral information. In this study, a total of 304 soil samples were collected from agricultural soils surrounding a tungsten mine located in the Xiancha River basin, Jiangxi Province, Southern China. Leveraging hyperspectral data from the ZY1-02D satellite, this research developed a comprehensive framework that evaluates the predictive accuracy of nine spectral transformations across four modeling approaches to estimate soil Cd concentrations. The spectral transformation methods included four logarithmic and reciprocal transformations, two derivative transformations, and three baseline correction and normalization transformations. The four models utilized for predicting soil Cd were partial least squares regression (PLSR), support vector machine (SVM), bidirectional recurrent neural networks (BRNN), and random forest (RF). The results indicated that these spectral transformations markedly enhanced the absorption and reflection features of the spectral curves, accentuating key peaks and troughs. Compared to the original spectral curves, the correlation between the transformed spectra and soil Cd content improved notably, particularly with derivative transformations. The combination of the first derivative (FD) transformation with the RF model yielded the highest accuracy (R² = 0.61, RMSE = 0.37 mg/kg, MAE = 0.21 mg/kg). Furthermore, across multiple spectral transformations, the RF model proved more suitable for modeling soil Cd content than the other models. Overall, this research highlights the substantial application potential of ZY1-02D satellite hyperspectral data for detecting soil heavy metals and provides a framework that integrates optimal spectral transformations and modeling techniques to estimate soil Cd contents.


1. Introduction

Soil, as a fundamental component of the natural geographical landscape, is an indispensable resource for both agriculture and human well-being [1]. Yet, the intensification of human activities, including increased mineral exploitation, pesticide usage, and industrial waste discharge, has led to a concerning escalation in soil contamination with heavy metals [2]. Among heavy metals, cadmium (Cd) contamination in soil is a matter of great concern due to its high phytotoxicity and lipophilic nature that enhances uptake by diverse crop species [3,4]. Moreover, the incorporation of these metals into the food chain poses substantial risks to human health and longevity [5]. Considering these alarming effects, accurate monitoring of soil Cd levels, particularly in agricultural regions, is imperative for preservation of ecological integrity and public well-being. This underscores the urgent need for timely prediction of Cd concentrations to safeguard both ecosystems and human health. Traditional methods for soil heavy metal assessment, which involve field sampling followed by laboratory analyses, are acknowledged for their high determination accuracy. However, these methods are resource-intensive, requiring significant investments of time, labor, and financial resources [6]. The advancement of geostatistical theories has introduced sophisticated interpolation techniques such as Kriging and inverse distance weighting (IDW), facilitating the creation of spatially continuous maps of soil heavy metal content [7]. Despite their effectiveness, these methods depend heavily on the availability of comprehensive and evenly distributed sampling data [8].
Advances in remote sensing techniques have enabled the use of distinct soil spectral signatures for the detection of heavy metals [9,10]. Hyperspectral remote sensing technology captures hundreds of narrow-band spectral reflectance data across extensive areas and is distinguished by its high spatial and spectral resolution [11,12]. It therefore offers a cost-effective approach for the quantitative analysis of soil heavy metals [13], and its capacity to produce high-resolution spatial soil maps is invaluable for soil science [14,15]. Hyperspectral sensors, such as the ASD FieldSpec3 portable spectroradiometer (Analytical Spectral Devices Inc., Boulder, CO, USA) and the Foss NIRSystems 5000 spectrophotometer (Silver Spring, MD, USA), have proven effective in identifying spectral responses within the visible and near-infrared (VNIR), as well as shortwave infrared (SWIR), regions that are sensitive to soil Cd [16]. Consequently, numerous researchers have employed hyperspectral data to estimate soil Cd content. For instance, Arif et al. [17] utilized a FieldSpec-3 portable handheld spectroradiometer to measure the spectral reflectance of soils in urban roadside greenbelts, effectively and rapidly estimating Cd content in these greenbelts using characteristic hyperspectral bands. Similarly, Zhang et al. [18] employed the ASD FieldSpec4 spectrometer to measure hyperspectral data and found that the continuous wavelet transform (CWT) effectively minimized noise and baseline drift, enabling high-precision inversion of soil Cd content. Hong et al. [19], through the integration of ground-based hyperspectral reflectance data with soil covariates, demonstrated that combining these datasets significantly enhances the accuracy of Cd content prediction. However, the majority of these studies have relied on point spectral data acquired in laboratory or in situ environments, which introduces significant limitations. Such data are often not scalable to broader regional contexts, thus constraining their applicability for comprehensive environmental monitoring [20]. This highlights a critical area for further research and development in the application of remote sensing technologies for soil heavy metal prediction at regional and larger scales.
Due to the small components and low spectral responses of heavy metals in soil, directly estimating their content through spectral responses is generally challenging [6]. However, the enrichment mechanisms of soil heavy metals can be observed through the adsorption of various minerals such as clay minerals, iron oxides, and organic matter [21]. These components influence both the morphology and reflectance properties of soil spectra, while also exhibiting distinct spectral absorption features [22]. By recognizing the co-variance between these components and metals, even those that are spectrally indistinctive can be accurately estimated [23]. This relationship forms the theoretical basis for the hyperspectral estimation of soil heavy metals. Nevertheless, mapping soil heavy metal concentrations using satellite hyperspectral imagery presents significant challenges. These challenges primarily stem from the susceptibility of satellite-derived data to disturbances caused by variations in topography, soil moisture, and atmospheric conditions [24]. Such factors introduce significant noise, complicating the analysis and interpretation of spectral signatures. To ensure reliable mapping of soil heavy metals, spectral preprocessing steps such as atmospheric correction and topographic normalization are essential [25]. Additionally, employing advanced spectral transformation techniques, including derivatives and reciprocal and logarithmic transformations, can enhance the detectability of these metals by emphasizing their unique spectral features [26]. Nevertheless, determining the most effective spectral transformation technique for preprocessing hyperspectral imagery, particularly in estimating soil Cd, remains an unresolved issue. This uncertainty highlights the necessity for ongoing research to evaluate and compare various spectral transformation methods.
Furthermore, estimation methods for soil heavy metal concentrations utilize both statistical and machine learning models [27]. Traditional empirical statistical techniques such as multiple linear regression (MLR) and partial least squares regression (PLSR) have been widely used in soil heavy metal prediction [28]. However, these linear models often struggle to fully capture the complex relationships between heavy metal concentrations and their spectral signatures due to their inherent linear assumptions [29]. In contrast, machine learning offers a robust alternative through its capability to model nonlinear relationships. As nonlinear mathematical analytical approaches, machine learning techniques can more effectively simulate the intricate interactions between soil heavy metals and hyperspectral data, thereby significantly enhancing the accuracy of heavy metal estimation [30]. These advanced computational methods enable more precise and comprehensive analyses, making them increasingly preferable in contemporary soil heavy metal studies.
The ZY1-02D satellite, launched in September 2019, is renowned for its wide swath, rich spectral information, and high spatial resolution. Equipped with an Advanced Hyperspectral Imager (AHSI), it captures spectral data across 166 bands spanning from the visible range (400 nm) to the shortwave infrared region (2500 nm). While the ZY1-02D has proven effective in soil organic carbon and nitrogen mapping [31,32], its potential for monitoring low-concentration soil heavy metals, such as Cd, remains largely unexplored. Therefore, the primary objective of this study is to develop a framework for effectively monitoring soil Cd content using hyperspectral data from the ZY1-02D satellite. Specifically, our aim is to evaluate nine spectral transformation methods in conjunction with four distinct modeling strategies to assess their impact on the inversion model for soil Cd content. By identifying the optimal framework, we seek to validate the capability of ZY1-02D data in estimating soil Cd concentrations. Ultimately, we will generate a detailed map of soil Cd distribution using the most effective model. Through a rigorous examination of the nine spectral transformation methods applied to ZY1-02D hyperspectral data, our goal is to determine the most effective preprocessing approach for accurately estimating soil Cd content via satellite hyperspectral imagery.

2. Materials and Methods

2.1. Study Area

The study site is situated in the Xiancha River basin (114°58′–115°16′ E, 26°36′–26°50′ N) within the central-southern region of Jiangxi Province, China (Figure 1). This area exhibits elevations ranging from 29 m to 944 m. It is characterized by a subtropical monsoon humid climate, and red soil predominates in this region due to its high Fe-oxide content. This region is rich in mineral resources, particularly tungsten ore. Historical surveys have indicated that the Xiancha River Basin has been significantly impacted by heavy metal emissions from the Xiaolong tungsten mining area [33]. Established in 1934, the Xiaolong tungsten mining site is situated in the Nanling metallogenic belt. Prolonged mineral extraction activities have led to increased environmental contamination, including heightened flue gas emissions, accumulation of solid waste residues, and increased effluent discharge [34]. Additionally, the utilization of river water for irrigation and the extensive application of fertilizers have contributed to heavy metal pollution in the cultivated soils of this region, posing risks to local food security and the ecological environment.

2.2. Soil Sampling and Cd Content Determination

Considering the spatial distribution of the mining site and farmlands, as well as topographical features and road accessibility within the study area, a total of 304 surface soil samples were collected at a depth of 0–20 cm in December 2020. Samples were collected at both the central location and the four corners of each sampling point within a 30 m × 30 m square grid to form a composite sample, and the coordinates of the five points were recorded using a handheld GPS device. Subsequently, the soil samples were stored in a refrigerator, and transported back to the laboratory. They underwent air-drying, grinding, and sieving processes to achieve particle sizes ≤ 0.15 mm before being stored in clean, sealed bags.
The procedure for determining the Cd content in the soil was as follows: Approximately 0.1 g of soil was placed into a microwave digestion system, followed by the addition of 5 mL of nitric acid, 2 mL of hydrofluoric acid, and 3 mL of hydrogen peroxide into the digestion tank. After digestion, 1 mL of perchloric acid was added to remove the hydrofluoric acid, and finally, the solution was brought up to a 50 mL volume in a colorimetric tube and left to stand overnight. Measurements were then conducted using an Inductively Coupled Plasma Mass Spectrometer (ICP-MS, Thermo Fisher, Waltham, MA, USA).

2.3. Hyperspectral Data Acquisition and Preprocessing

The hyperspectral satellite imagery used in this study was derived from the ZY1-02D hyperspectral satellite, accessible through the Natural Resources Satellite Remote Sensing Cloud Service Platform of China (http://sasclouds.com/, accessed on 22 October 2022). The ZY1-02D hyperspectral satellite covers the visible and near-infrared (VNIR) and short-wave infrared (SWIR) regions, comprising 76 bands with a spectral resolution of 10 nm in the VNIR range and 90 bands with a spectral resolution of 20 nm in the SWIR range, for a total of 166 bands. The satellite features an image swath width of 60 km and a spatial resolution of 30 m, making it a capable tool for detailed earth observation and analysis.
The ZY1-02D hyperspectral image was acquired on 8 December 2019, encompassing the entire study area. Due to the extended revisit cycle of hyperspectral satellites, acquiring imagery during bare soil periods can be challenging, and no high-quality, cloud-free hyperspectral images were available for the exact year of soil sampling. However, considering that soil heavy metal concentrations tend to remain stable over short time periods in the absence of significant disturbances, such as those resulting from mining or industrial activities [35,36,37], we determined that using a high-quality image from a nearby year (2019) was suitable for our analysis. Meanwhile, the selected image was cloud-free and provided the necessary spectral information for comprehensive analysis. The visualized image is presented in Figure 1.
The hyperspectral imagery underwent radiometric correction and atmospheric correction using ENVI 5.3 (Environment for Visualizing Images 5.3, Harris Geospatial Corporation, Broomfield, CO, USA) software. The fast line-of-sight atmospheric analysis of spectral hypercubes (FLAASH) atmospheric correction model was applied to obtain accurate surface reflectance in the study area. Subsequently, Landsat-8 satellite imagery and elevation information from the ASTER Global Digital Elevation Model (ASTER GDEM) with a spatial resolution of 30 m (www.usgs.gov, accessed on 22 October 2022) were employed as reference datasets. Utilizing these datasets enabled orthorectification of the ZY1-02D imagery to achieve an RPC orthorectification accuracy of less than half a pixel. Considering the inherent limitations of the sensor in specific areas, such as signal-to-noise ratio and susceptibility to atmospheric water interference during spectral acquisition, several bands were excluded. These excluded bands include 395.861 nm, 1005.604 to 1039.193 nm, 1357.908 to 1408.492 nm, 1828.817 to 1929.929 nm, and 2467.568 to 2501.081 nm.
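The band exclusion described above can be illustrated with a minimal sketch. It assumes band centres are available as a list of wavelengths in nanometres; the function names and the demonstration wavelengths are hypothetical and are not the actual AHSI band list. Only the excluded intervals are taken from the text.

```python
# Wavelength intervals (nm) listed in the text as excluded.
# A single excluded band is written as a zero-width interval.
EXCLUDED_NM = [
    (395.861, 395.861),
    (1005.604, 1039.193),
    (1357.908, 1408.492),
    (1828.817, 1929.929),
    (2467.568, 2501.081),
]


def is_excluded(wavelength_nm, tol=1e-3):
    """Return True if a band centre falls inside any excluded interval."""
    return any(lo - tol <= wavelength_nm <= hi + tol for lo, hi in EXCLUDED_NM)


def keep_bands(wavelengths_nm):
    """Return the indices of the bands that survive the exclusion mask."""
    return [i for i, w in enumerate(wavelengths_nm) if not is_excluded(w)]


# Hypothetical band centres, for illustration only.
demo_wavelengths = [395.861, 404.0, 1020.0, 1055.0, 1900.0, 2450.0, 2480.0]
kept = keep_bands(demo_wavelengths)
print([demo_wavelengths[i] for i in kept])
```

The same index list can then be used to subset every pixel or sample spectrum so that all later transformations operate on identical bands.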

2.4. Spectral Transformations

The raw spectral data (R) from hyperspectral imagery can be influenced by sensor operational states and environmental conditions during acquisition, leading to baseline drift and random noise [38]. Spectral preprocessing is essential to mitigate these issues and enhance the performance of the prediction model for soil Cd [39,40]. Initially, a Savitzky–Golay filter was applied for smoothing to reduce random noise. Based on the smoothed reflectance, nine different spectral transformation techniques were applied: (1) logarithmic transformation (LT); (2) reciprocal transformation (RT); (3) logarithm of reciprocal transformation (LR); (4) reciprocal of logarithmic transformation (RL); (5) first derivative (FD); (6) reciprocal of logarithmic and first derivative (RLFD); (7) standard normal variate (SNV); (8) continuum removal (CR); (9) multiplicative scatter correction (MSC). All spectral transformations were performed using MATLAB 2022a.
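As an illustration of the six logarithmic, reciprocal, and derivative transformations listed above, the following Python sketch applies them to a single smoothed reflectance spectrum. The study itself used MATLAB and does not state the exact formulas, so several choices here are assumptions: base-10 logarithms, a forward finite difference for the derivative, and RLFD interpreted as the first derivative of the RL-transformed spectrum. Reflectance values are assumed to lie strictly between 0 and 1.

```python
import math


def lt(r):
    """Logarithmic transformation: log10(R). The base is an assumption."""
    return [math.log10(v) for v in r]


def rt(r):
    """Reciprocal transformation: 1 / R."""
    return [1.0 / v for v in r]


def lr(r):
    """Logarithm of reciprocal: log10(1 / R)."""
    return [math.log10(1.0 / v) for v in r]


def rl(r):
    """Reciprocal of logarithm: 1 / log10(R). Undefined where R == 1."""
    return [1.0 / math.log10(v) for v in r]


def fd(r, wavelengths):
    """First derivative by forward finite differences.

    Dividing by the wavelength step matters because the band spacing
    is not uniform across the VNIR and SWIR ranges.
    """
    return [
        (r[i + 1] - r[i]) / (wavelengths[i + 1] - wavelengths[i])
        for i in range(len(r) - 1)
    ]


def rlfd(r, wavelengths):
    """First derivative of the RL-transformed spectrum (assumed definition)."""
    return fd(rl(r), wavelengths)


# Hypothetical reflectance values and band centres, for illustration only.
wl = [400.0, 410.0, 430.0]
refl = [0.1, 0.2, 0.4]
print(fd(refl, wl))
```

Note that a finite-difference derivative returns one value fewer than the number of input bands; a central-difference or Savitzky–Golay derivative would be a reasonable alternative.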
These spectral transformation techniques serve distinct purposes. Logarithmic and reciprocal transformations (LT, RT, LR, RL) enhance spectral variations, particularly in the visible light region [41]. Derivative transformations (FD, RLFD) are effective in neutralizing background disturbances and mitigating baseline drift, which improves the clarity of spectral features [42]. The spectral preprocessing techniques SNV, MSC, and CR are referred to as baseline correction and normalization techniques. Specifically, SNV removes a constant offset by subtracting the mean of the spectrum and dividing by the standard deviation, aligning all spectra to a common scale; it is widely adopted due to its simplicity and effectiveness [43]. CR is a preprocessing technique that enhances spectral absorption features by establishing a continuous line connecting local maxima of reflectance, which proves valuable in isolating specific absorption characteristics [44]. MSC, on the other hand, addresses variations caused by scattering effects and improves the correlation between spectra and their corresponding data by adjusting each spectrum to match a reference spectrum, thereby mitigating scatter-induced distortions [45]. These preprocessing steps are essential for improving the accuracy and reliability of soil Cd monitoring using hyperspectral imagery.
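The three baseline correction and normalization techniques can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the MATLAB implementation used in the study: SNV uses the sample standard deviation, MSC uses the mean spectrum of the data set as its reference and an ordinary least-squares fit, and CR divides each reflectance value by an upper convex hull interpolated linearly between hull vertices. All function names are hypothetical.

```python
import math


def snv(x):
    """Standard normal variate: centre one spectrum and scale by its std."""
    n = len(x)
    mean = sum(x) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in x) / (n - 1))
    return [(v - mean) / sd for v in x]


def msc(spectra):
    """Multiplicative scatter correction against the mean spectrum.

    Each spectrum s is fitted as s = intercept + slope * reference and
    then corrected to (s - intercept) / slope.
    """
    n_bands = len(spectra[0])
    ref = [sum(s[j] for s in spectra) / len(spectra) for j in range(n_bands)]
    ref_mean = sum(ref) / n_bands
    sxx = sum((m - ref_mean) ** 2 for m in ref)
    corrected = []
    for s in spectra:
        s_mean = sum(s) / n_bands
        slope = sum((m - ref_mean) * (v - s_mean) for m, v in zip(ref, s)) / sxx
        intercept = s_mean - slope * ref_mean
        corrected.append([(v - intercept) / slope for v in s])
    return corrected


def continuum_removed(wavelengths, r):
    """Continuum removal: reflectance divided by its upper convex hull.

    Assumes wavelengths are strictly increasing and len(r) >= 2.
    """
    hull = []  # indices of the upper-hull vertices, left to right
    for i in range(len(r)):
        while len(hull) >= 2:
            a, b = hull[-2], hull[-1]
            cross = (wavelengths[b] - wavelengths[a]) * (r[i] - r[a]) - (
                r[b] - r[a]
            ) * (wavelengths[i] - wavelengths[a])
            if cross >= 0:  # vertex b lies on or below the chord a -> i
                hull.pop()
            else:
                break
        hull.append(i)
    out, k = [], 0
    for i in range(len(r)):
        while hull[k + 1] < i:  # advance to the hull segment containing band i
            k += 1
        a, b = hull[k], hull[k + 1]
        frac = (wavelengths[i] - wavelengths[a]) / (wavelengths[b] - wavelengths[a])
        out.append(r[i] / (r[a] + (r[b] - r[a]) * frac))
    return out


# Hypothetical spectrum, for illustration only.
wl = [400.0, 500.0, 600.0, 700.0, 800.0]
refl = [0.20, 0.30, 0.25, 0.35, 0.40]
print(continuum_removed(wl, refl))
```

Because SNV and MSC both remove an additive offset and a multiplicative scale from each spectrum, they often produce similar results, which is consistent with the similarity between the two methods reported later in the article.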

2.5. Model Techniques and Accuracy Evaluation

This study employed both linear and nonlinear models for predicting soil Cd content. As the linear model, partial least squares regression (PLSR) was applied, a multivariate regression method that minimizes squared errors to find an optimal fitting function in a reduced-dimensional space. This approach effectively addresses multicollinearity among independent variables, making it particularly useful for hyperspectral bands [17]. In contrast, support vector machine (SVM), bidirectional recurrent neural network (BRNN), and random forest (RF) models were used to build nonlinear predictive models. The SVM model aims to find the optimal separating hyperplane that classifies the training dataset while maximizing the geometric margin [46]. BRNN improves the prediction quality by capturing input information from both past and future contexts in recurrent neural networks [47]. The RF model, an ensemble learning algorithm based on decision trees, can handle high-dimensional feature input samples and achieve satisfactory accuracy even in noisy conditions [48]. These models were selected to provide a comprehensive comparison between linear and nonlinear approaches, aiming to identify the most effective method for predicting soil Cd content using hyperspectral satellite data.
All the models were constructed using the caret package in the R programming language. Ten-fold cross-validation was utilized to evaluate the accuracy of these models. The evaluation metrics included the coefficient of determination (R²), root mean square error (RMSE), and mean absolute error (MAE), defined as follows:
$$R^2 = 1 - \frac{\sum_{i=1}^{n} (\hat{y}_i - y_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2}$$
$$\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (\hat{y}_i - y_i)^2}$$
$$\mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} \left| \hat{y}_i - y_i \right|$$
where $n$ denotes the number of soil samples, $\hat{y}_i$ signifies the predicted value of the $i$th sample, $y_i$ represents the actual value of the $i$th sample, and $\bar{y}$ is the average value of the actual samples.
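For reference, the three evaluation metrics and a ten-fold split can be written out in a few lines of Python. This is a sketch of the definitions above, not the caret implementation used in the study; the fold assignment (a seeded shuffle followed by round-robin allocation) and the function names are assumptions made for illustration.

```python
import math
import random


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_bar = sum(y_true) / len(y_true)
    ss_res = sum((p - t) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - y_bar) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def rmse(y_true, y_pred):
    """Root mean square error."""
    n = len(y_true)
    return math.sqrt(sum((p - t) ** 2 for t, p in zip(y_true, y_pred)) / n)


def mae(y_true, y_pred):
    """Mean absolute error."""
    n = len(y_true)
    return sum(abs(p - t) for t, p in zip(y_true, y_pred)) / n


def k_fold_indices(n_samples, k=10, seed=0):
    """Shuffle sample indices and deal them into k folds of near-equal size."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


# Toy values, for illustration only.
y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.5, 2.0, 2.5, 4.0]
print(r_squared(y_true, y_pred), rmse(y_true, y_pred), mae(y_true, y_pred))

# With 304 samples, ten folds contain either 30 or 31 samples each.
print(sorted(len(f) for f in k_fold_indices(304)))
```

In each round, one fold is held out for prediction while the model is fitted on the remaining nine, and the metrics are computed on the held-out predictions.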

3. Results and Discussion

3.1. Descriptive Statistics of Soil Cd Content

The statistical characteristics of Cd content in the 304 collected soil samples are presented in Table 1. The Cd content ranged from 0.1 to 3.251 mg/kg, with a mean value of 0.535 mg/kg and a standard deviation of 0.563 mg/kg. The average Cd concentration thus exceeded the risk screening threshold of 0.3 mg/kg specified in the Soil Environmental Quality Control Standards (Chinese National Standard: GB 15618-2018), indicating relatively severe soil contamination [49]. Furthermore, the distribution of Cd content exhibited a pronounced right skewness, suggesting the presence of high concentrations in specific geographical regions. The coefficient of variation (CV), the ratio of the standard deviation to the mean, was 105.23%, reflecting substantial environmental heterogeneity [50]. In soil science, the CV is used to quantify the variability of soil properties: a CV of 0% to 15% indicates low variability, 16% to 35% suggests moderate variability, and values above 36% represent high variability [51]. The considerable CV observed in this study therefore implies large fluctuations in soil Cd content and suggests notable anthropogenic influence on soil Cd levels within this region [14].
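The coefficient of variation reported above follows directly from the stated mean and standard deviation, as the short Python check below shows. The classification function mirrors the variability classes cited in the text; how values falling between the stated class boundaries (for example, between 15% and 16%) are assigned is an assumption of this sketch.

```python
def coefficient_of_variation(mean, sd):
    """CV in percent: standard deviation divided by the mean."""
    return sd / mean * 100.0


def variability_class(cv_percent):
    """Map a CV (%) to the low / moderate / high classes cited in the text."""
    if cv_percent <= 15.0:
        return "low"
    if cv_percent <= 35.0:
        return "moderate"
    return "high"


# Mean and standard deviation of soil Cd (mg/kg) as reported in the text.
cv = coefficient_of_variation(mean=0.535, sd=0.563)
print(round(cv, 2), variability_class(cv))
```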

3.2. Spectral Characteristics and Correlation Analysis with Cd Content

The spectral data extracted from hyperspectral imagery across various sample points revealed broadly consistent reflectance trends (Figure 2). Reflectance consistently increased within the spectral ranges of 404–1040 nm and 1055–1341 nm. The soil’s reflectance spectrum is intrinsically linked to the physicochemical properties of its components, each contributing distinct spectral features at specific wavelengths [52]. For instance, iron oxide and manganese oxide introduce subtle absorption peaks in the soil’s spectrum at approximately 430 nm and 470 nm, respectively [53]. Additionally, soil carbon exhibits numerous absorption features across the entire VNIR-SWIR spectrum [54]. Within the 1425–1812 nm range, there was an initial decreasing trend followed by an increase in spectral reflectance. The 1946–2450 nm range exhibited a fluctuating downward trend in reflectance (Figure 2a). A broad absorption band proximate to 2100 nm is indicative of the presence of cellulose and lignin [55]. Importantly, although soil Cd does not possess distinguishable spectral characteristics within the VNIR-SWIR range, its association with other soil properties that have spectral features, such as soil organic matter and iron oxides [4,16,56], forms the basis for successfully estimating soil Cd content.
A comparison of the reflectance curves before and after applying Savitzky–Golay (SG) smoothing illustrates the effectiveness of this technique in reducing noise while preserving the original spectral characteristics (Figure 2). Moreover, the correlation between soil spectral signatures and Cd content before and after SG smoothing exhibited a high level of consistency in the overall morphology of both curves (Figure 3). Notably, the SG-smoothed curve demonstrated a stronger correlation compared to the original, particularly within the 400–700 nm range, where the correlation coefficients were consistently above 0.3, indicating a highly significant correlation (p < 0.01). The spectral variations within this range are closely associated with soil organic matter and iron oxides, specifically goethite and hematite [57,58]. In the spectral range of 2100–2450 nm, there is also a noteworthy correlation between soil Cd and the spectrum, with correlation coefficients reaching 0.25 and exhibiting significant correlation (p < 0.01). The spectral attributes within this range correspond to carbonates and the C-H bonds in organic matter [59,60].
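To make the smoothing step concrete, the sketch below applies a five-point quadratic Savitzky–Golay filter using the standard convolution coefficients (−3, 12, 17, 12, −3)/35. The window length and polynomial order used in the study are not stated, so both are assumptions here, as is the choice to leave the two bands at each end of the spectrum unchanged. The filter also assumes evenly spaced bands, which holds only approximately for spectra that combine 10 nm and 20 nm sampling and have excluded bands.

```python
# Standard 5-point Savitzky-Golay smoothing coefficients (quadratic/cubic fit).
SG5_COEFFS = (-3.0, 12.0, 17.0, 12.0, -3.0)
SG5_NORM = 35.0


def sg_smooth5(y):
    """Smooth a spectrum with a 5-point Savitzky-Golay filter.

    The first and last two values are returned unchanged (an assumed,
    simple edge treatment).
    """
    out = list(y)
    for i in range(2, len(y) - 2):
        window = y[i - 2 : i + 3]
        out[i] = sum(c * v for c, v in zip(SG5_COEFFS, window)) / SG5_NORM
    return out


# A quadratic trend passes through unchanged, while an isolated spike is damped.
print(sg_smooth5([float(i * i) for i in range(7)]))
print(sg_smooth5([0.0, 0.0, 1.0, 0.0, 0.0]))
```

This behaviour is what allows the filter to reduce random noise while preserving the overall shape of the reflectance curve.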
The spectral transformation curves of soil were obtained using nine spectral transformation methods, namely logarithmic transformation (LT), reciprocal transformation (RT), logarithm of reciprocal transformation (LR), reciprocal of logarithmic transformation (RL), first derivative (FD), reciprocal of logarithmic and first derivative (RLFD), standard normal variate (SNV), continuum removal (CR), and multiplicative scatter correction (MSC) (Figure 4). Through these transformations, the absorption and reflection attributes of the spectral curve were significantly amplified, with peaks and troughs becoming more pronounced. This enhancement facilitated the identification of spectral bands in the soil that exhibited sensitivity to Cd content, which is consistent with previous studies [61].
Furthermore, the correlation between each of the nine transformed spectra and soil Cd content was analyzed, yielding a correlation coefficient curve across spectral bands for each transformation (Figure 5). The analysis showed an overall improvement in the correlation between soil spectral signatures and soil Cd after applying the spectral transformations. Among the mathematical transformation methods, the correlation coefficient curves for LT, RT, LR, and RL exhibited similar shapes and comparable maximum correlation values, with the peak correlation band at 687 nm (Table 2). However, these four transformations yielded only marginal improvement. In contrast, the derivative transformation methods, FD and RLFD, showed substantial variations in their trends. Notably, RLFD recorded the highest correlation coefficient of 0.414 at the 567 and 593 nm bands (Table 2). Moreover, the correlation coefficient curves for standard normal variate (SNV) and multiplicative scatter correction (MSC) were similar in structure. For these two methods, the band with the peak correlation between the derived spectra and soil Cd lay within the shortwave infrared spectrum, specifically at 1795 nm (Table 2).
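The band-wise correlation analysis described above amounts to computing a Pearson correlation coefficient between soil Cd and the values of each band across all samples, then locating the band with the largest absolute coefficient. A minimal Python sketch follows; the function names and the toy data are hypothetical and do not reproduce the values in Table 2.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def band_correlations(spectra, cd):
    """Correlation of Cd with each band; spectra is a list of per-sample rows."""
    n_bands = len(spectra[0])
    return [pearson([s[j] for s in spectra], cd) for j in range(n_bands)]


def peak_band(wavelengths, spectra, cd):
    """Return (wavelength, r) for the band with the largest |r|."""
    r = band_correlations(spectra, cd)
    j = max(range(len(r)), key=lambda i: abs(r[i]))
    return wavelengths[j], r[j]


# Toy data: four samples, three bands (illustrative values only).
toy_wavelengths = [500.0, 600.0, 700.0]
toy_spectra = [
    [0.30, 0.50, 0.10],
    [0.10, 0.40, 0.30],
    [0.40, 0.30, 0.20],
    [0.20, 0.20, 0.25],
]
toy_cd = [0.2, 0.4, 0.6, 0.8]
print(peak_band(toy_wavelengths, toy_spectra, toy_cd))
```

Taking the absolute value when searching for the peak matters because a strongly negative correlation is as informative for modeling as a strongly positive one.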
These results suggest that the four mathematical transformation methods (LT, RT, LR, and RL) do not effectively capture the spectral information pertinent to soil Cd. In contrast, the derivative methods excel at extracting the spectral information associated with soil Cd, pinpointing the spectral bands at 567 nm and 593 nm, which are characteristic of hematite [19,62]. Furthermore, the strong similarity between SNV and MSC suggests that these two preprocessing techniques are largely interchangeable for most practical applications [63]. The maximum correlation coefficient (0.414, p < 0.01) was obtained with the RLFD spectral transformation, with the most significantly correlated spectral bands predominantly located between 400 and 700 nm (Table 2). This aligns with existing research indicating that soil organic matter and Fe are pivotal attributes in explaining the mechanism behind predicting soil Cd levels from visible to near-infrared reflectance spectra [21,64]. The spectral responses of these two components predominantly occur within the 400–700 nm range [65], making this spectral interval crucial for accurately estimating soil Cd content.

3.3. Comparison of Model Performances across Nine Spectral Transformations

The impacts of the nine spectral transformation methods on prediction were compared by training and evaluating four models: random forest (RF), bidirectional recurrent neural network (BRNN), support vector machine (SVM), and partial least squares regression (PLSR) (Table 3). The results, based on refined models derived from smoothed spectral data, indicated that the mathematical transformations (LT, RT, LR, and RL) had minimal impact on model performance. In contrast, the derivative transformations (FD and RLFD) demonstrated superior performance within the RF, BRNN, and SVM models. Specifically, the FD transformation achieved the highest accuracy with the RF model (R² = 0.61, RMSE = 0.37 mg/kg, MAE = 0.21 mg/kg), closely followed by the RLFD transformation (R² = 0.60, RMSE = 0.37 mg/kg, MAE = 0.21 mg/kg). Moreover, the MSC and SNV methods showed suboptimal performance, only marginally improving the accuracy across three models compared to the Savitzky–Golay (SG) preprocessing. However, the CR preprocessing technique negatively affected the performance of the BRNN, SVM, and PLSR models, which aligns with the correlation analysis results in Table 2.
It is important to recognize that this study relies on satellite hyperspectral data to estimate soil heavy metal concentrations, a process that is inherently influenced by various environmental and atmospheric conditions. These real-world factors introduce complexities that are not present in controlled laboratory experiments, making direct comparisons with laboratory-based studies difficult [64]. Consequently, the predictive accuracy of our models may be lower than that achieved under ideal conditions where such external influences are minimized. Despite these challenges, our results are consistent with findings from other satellite-based hyperspectral studies. For instance, the accuracy we obtained is comparable to that of Sun et al. [66], who used the GF-5 hyperspectral satellite to estimate soil zinc concentrations. This suggests that our model performance aligns with current research in the field, where environmental variability is a known limitation.
Overall, derivative transformations stand out as superior to conventional mathematical transformations such as LT, RT, LR, and RL [67]. This advantage is primarily due to the objectives of derivative transformations, which include minimizing external noise, resolving overlapping peaks, eliminating baselines, and addressing other challenges in spectral modeling. Furthermore, they enhance the detection of subtle variations in spectral data [68]. In contrast, mathematical spectral transformations such as reciprocal and logarithmic conversions primarily adjust the overall spectral shape or dynamic range [69,70]. Consequently, for soil Cd, which lacks a direct spectral response, derivative transformations establish a more pronounced connection between spectra and soil Cd content.
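The contrast between derivative and purely mathematical transformations can be sketched with a small example. It assumes a finite-difference first derivative between adjacent bands and a base-10 logarithm; the exact formulas used for FD, LT, and RT are not given in this section, so these definitions, the function names, and the toy spectrum are illustrative assumptions.

```python
import math

def first_derivative(wavelengths, reflectance):
    """Finite-difference first derivative between adjacent bands (assumed form)."""
    return [
        (reflectance[i + 1] - reflectance[i]) / (wavelengths[i + 1] - wavelengths[i])
        for i in range(len(reflectance) - 1)
    ]

def log_transform(reflectance):
    """Logarithmic transformation; base 10 is an assumption."""
    return [math.log10(r) for r in reflectance]

def reciprocal_transform(reflectance):
    """Reciprocal transformation, 1 / R."""
    return [1.0 / r for r in reflectance]

# Hypothetical reflectance spectrum and the same spectrum with a constant
# additive baseline offset (for example, from illumination differences).
wavelengths = [560.0, 570.0, 580.0, 590.0]
spectrum = [0.20, 0.22, 0.25, 0.29]
shifted = [r + 0.05 for r in spectrum]

fd_original = first_derivative(wavelengths, spectrum)
fd_shifted = first_derivative(wavelengths, shifted)
lt_original = log_transform(spectrum)
lt_shifted = log_transform(shifted)

# The derivative is unchanged by the offset; the log-transformed values are not.
fd_gap = max(abs(a - b) for a, b in zip(fd_original, fd_shifted))
lt_gap = max(abs(a - b) for a, b in zip(lt_original, lt_shifted))
print(fd_gap, lt_gap)
```

The constant offset cancels in the differences, which is the baseline-elimination property described above, whereas the logarithmic and reciprocal conversions reshape the whole curve and carry the offset through.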
Moreover, the RF model, renowned for its robustness, effectively establishes a coherent relationship between spectral data and soil Cd, even when faced with challenges arising from CR-transformed spectral data. It exhibited superior prediction accuracy compared to the SVM, BRNN, and PLSR models. Designed by aggregating multiple decision trees, the RF model boasts a heightened resistance to interference [71]. Additionally, considering that the estimation of soil Cd content indirectly relies on other soil properties, such as soil organic matter or iron oxides, the RF model adeptly manages a diverse range of features and their intricate interrelationships [72]. Consequently, it captures the nuanced interactions among these characteristics more effectively.

3.4. Spatial Mapping and Analysis of Soil Cd Levels

The RF model constructed with FD-transformed spectral data was the most effective model and was therefore used for soil Cd content inversion in the research area. Applying this model produced a spatial distribution map illustrating the variation in soil Cd content across the region (Figure 6). The map revealed a gradual decrease in soil Cd content from the southeast to the northwest of the research area. Notably, soils located near the river exhibited lower Cd content than those farther away, which is consistent with findings from previous field investigations [34]. Furthermore, Cd concentrations were higher near the Xiancha River than near other tributaries, although this effect diminished downstream. This pattern could be attributed to untreated or inadequately treated wastewater discharged into the river by the upstream Xiaolong tungsten mine [33]. When this contaminated water is used for irrigation or flows through agricultural areas, heavy metals, including cadmium, tend to settle and accumulate in the soil [73].
According to the Agricultural Soil Pollution Intervention Standards, the soil within the research area can be categorized based on its Cd content and the stipulated soil pollution risk screening and intervention values [49]. The prediction map was classified accordingly, and the proportions of the three soil categories are illustrated in Figure 7. Specifically, soil with Cd content below 0.3 mg/kg is designated as “Priority Protection”. Soil with Cd levels exceeding 0.3 mg/kg but not surpassing the intervention value of 1.5 mg/kg is termed “Safe Use”. Although these areas are deemed safe, they present potential risks to the growth and safety quality of agricultural products [74]. Soil with a Cd concentration above the intervention value of 1.5 mg/kg falls under the “Strict Control” category. These lands require stringent interventions, including ceasing agricultural activities and possibly reforesting. Soils in the “Priority Protection” category, which represent a low risk of contamination, cover only 0.771 km², or 1.1% of the study area. In contrast, “Safe Use” soils dominate, constituting 89.9% of the total area. Approximately 9% of the soils fall under “Strict Control” and require stringent interventions such as prohibiting agricultural cultivation and implementing reforestation measures. This highlights the urgent soil safety concerns in the region that necessitate immediate remedial actions.
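The threshold-based classification described above can be sketched as follows. The 0.3 mg/kg and 1.5 mg/kg thresholds are those quoted in the text; the function names and pixel values are hypothetical, and assigning a value of exactly 0.3 mg/kg to “Priority Protection” is an assumption, since the text does not specify how that boundary is handled.

```python
SCREENING_VALUE = 0.3     # mg/kg, screening value quoted in the text
INTERVENTION_VALUE = 1.5  # mg/kg, intervention value quoted in the text

def classify_cd(cd):
    """Map a predicted Cd content (mg/kg) to one of the three categories."""
    if cd > INTERVENTION_VALUE:
        return "Strict Control"
    if cd > SCREENING_VALUE:
        return "Safe Use"
    # Boundary handling at exactly 0.3 mg/kg is an assumption.
    return "Priority Protection"

def category_shares(cd_values):
    """Return the percentage of values falling in each category."""
    counts = {"Priority Protection": 0, "Safe Use": 0, "Strict Control": 0}
    for value in cd_values:
        counts[classify_cd(value)] += 1
    total = len(cd_values)
    return {name: 100.0 * count / total for name, count in counts.items()}

# Hypothetical predicted pixel values in mg/kg, not taken from the Cd map.
pixels = [0.25, 0.45, 0.80, 1.20, 1.60]
shares = category_shares(pixels)
print(shares)
```

With equal-area pixels, these percentages correspond to area shares of the kind reported in Figure 7.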

3.5. Limitations and Future Work

This study explored the impact of various spectral transformation methods on the hyperspectral inversion of soil Cd content and validated the effectiveness of these transformations, providing a foundational approach for the hyperspectral estimation of soil heavy metals. However, several limitations remain that warrant attention. Firstly, due to the scarcity of satellite hyperspectral data, it was challenging to consistently obtain high-quality, cloud-free hyperspectral imagery during the bare soil period within the study area. The temporal mismatch between image acquisition and soil sampling may cause some uncertainties. To mitigate these uncertainties, future work could consider incorporating multiple sources of satellite imagery, which would allow for cross-validation and potentially more accurate estimations.
Secondly, this study focused exclusively on a single hyperspectral image from the ZY1-02D sensor. Different hyperspectral satellite sensors vary in parameters such as spatial resolution, spectral resolution, and swath width, which may influence their effectiveness in detecting soil heavy metals. Moreover, variations in soil heavy metals across different regions could significantly affect the results. Future work could explore how different hyperspectral sensors respond to soil heavy metals, or apply our research framework to other regions to validate its applicability.

4. Conclusions

This research provides a framework that integrates spectral transformations and machine learning algorithms for estimating soil Cd content at a regional scale using satellite hyperspectral data. It highlights the potential of ZY1-02D satellite data for monitoring soil heavy metals. The main findings are as follows:
(1) Derivative transformations, especially the first derivative (FD), proved more effective for predicting soil Cd content than the other transformations. Specifically, the FD minimizes external noise, addresses challenges associated with overlapping mixed peaks, and rectifies baseline deviations, thus enhancing the correlation between spectral bands and soil Cd content.
(2) The optimal wavelengths for predicting soil Cd content under the mathematical and derivative transformation methods lie between 400 and 700 nm, whereas the bands with the peak correlation between the transformed spectra and soil Cd after the SNV and MSC transformations were located in the shortwave infrared region.
(3) Among the soil Cd prediction models built from the different spectral transformations and four algorithms, the RF model combined with the FD transformation yielded the highest accuracy (R² = 0.61, RMSE = 0.37 mg/kg, MAE = 0.21 mg/kg). Notably, the RF model showed high stability and accuracy in estimating soil Cd concentrations.
Given the pronounced variations in soil properties across different regions, the applicability of this research framework to soil heavy metal estimation in other regions warrants further exploration and validation.

Author Contributions

Conceptualization, J.G.; methodology, J.L. and X.X.; software, J.L. and X.X.; validation, J.G., Y.Y. and Y.G.; formal analysis, J.L., J.G. and X.X.; investigation, J.G., H.F., Y.G. and S.C.; resources, J.G., H.F. and S.C.; data curation, J.L., X.X. and Y.Y.; writing—original draft preparation, J.L. and J.G.; writing—review and editing, J.L., J.G., X.X., Y.Y., H.F., Y.G. and S.C.; visualization, J.L. and X.X.; supervision, J.G. and H.F.; project administration, J.G. and H.F.; funding acquisition, J.G., H.F. and S.C. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Natural Science Foundation of China (32101301), and the “Unveiling the List of Hanging” Science and Technology Project of Jinggangshan Agricultural High-tech Industrial Demonstration Zone (No.20222-051244).

Data Availability Statement

The data presented in this study are available upon reasonable request.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. McBratney, A.; Field, D.J.; Koch, A. The dimensions of soil security. Geoderma 2014, 213, 203–213.
  2. Wang, J.; Hu, X.; Shi, T.; He, L.; Hu, W.; Wu, G. Assessing toxic metal chromium in the soil in coal mining areas via proximal sensing: Prerequisites for land rehabilitation and sustainable development. Geoderma 2022, 405, 115399.
  3. Gu, Y.; Li, S.; Gao, W.; Wei, H. Hyperspectral estimation of the cadmium content in leaves of Brassica rapa chinesis based on the spectral parameters. Acta Ecol. Sin. 2015, 35, 4445–4453.
  4. Gholizadeh, A.; Saberioon, M.; Ben-Dor, E.; Borůvka, L. Monitoring of selected soil contaminants using proximal and remote sensing techniques: Background, state-of-the-art and future perspectives. Crit. Rev. Environ. Sci. Technol. 2018, 48, 243–278.
  5. Wang, F.; Gao, J.; Zha, Y. Hyperspectral sensing of heavy metals in soil and vegetation: Feasibility and challenges. ISPRS J. Photogramm. Remote Sens. 2018, 136, 73–84.
  6. Wu, Y.; Chen, J.; Wu, X.; Tian, Q.; Ji, J.; Qin, Z. Possibilities of reflectance spectroscopy for the assessment of contaminant elements in suburban soils. Appl. Geochem. 2005, 20, 1051–1059.
  7. Ha, H.; Olson, J.R.; Bian, L.; Rogerson, P.A. Analysis of Heavy Metal Sources in Soil Using Kriging Interpolation on Principal Components. Environ. Sci. Technol. 2014, 48, 4999–5007.
  8. Xie, Y.; Chen, T.-B.; Lei, M.; Yang, J.; Guo, Q.-J.; Song, B.; Zhou, X.-Y. Spatial distribution of soil heavy metal pollution estimated by different interpolation methods: Accuracy and uncertainty analysis. Chemosphere 2011, 82, 468–476.
  9. Agyeman, P.C.; Borůvka, L.; Kebonye, N.M.; Khosravi, V.; John, K.; Drabek, O.; Tejnecky, V. Prediction of the concentration of cadmium in agricultural soil in the Czech Republic using legacy data, preferential sampling, Sentinel-2, Landsat-8, and ensemble models. J. Environ. Manag. 2023, 330, 117194.
  10. Khosravi, V.; Gholizadeh, A.; Saberioon, M. Soil toxic elements determination using integration of Sentinel-2 and Landsat-8 images: Effect of fusion techniques on model performance. Environ. Pollut. 2022, 310, 119828.
  11. Yin, F.; Wu, M.; Liu, L.; Zhu, Y.; Feng, J.; Yin, D.; Yin, C.; Yin, C. Predicting the abundance of copper in soil using reflectance spectroscopy and GF5 hyperspectral imagery. Int. J. Appl. Earth Obs. Geoinf. 2021, 102, 102420.
  12. Dai, X.; Wang, Z.; Liu, S.; Yao, Y.; Zhao, R.; Xiang, T.; Fu, T.; Feng, H.; Xiao, L.; Yang, X.; et al. Hyperspectral imagery reveals large spatial variations of heavy metal content in agricultural soil—A case study of remote-sensing inversion based on Orbita Hyperspectral Satellites (OHS) imagery. J. Clean. Prod. 2022, 380, 134878.
  13. Tan, W.; Wang, S.; He, H.; Qi, W. Reconstructing coastal blue with blue spectrum based on ZY-1 (02D) satellite. Optik 2021, 242, 166901.
  14. Zhang, B.; Guo, B.; Zou, B.; Wei, W.; Lei, Y.; Li, T. Retrieving soil heavy metals concentrations based on GaoFen-5 hyperspectral satellite image at an opencast coal mine, Inner Mongolia, China. Environ. Pollut. 2022, 300, 118981.
  15. Sun, Y.; Chen, S.; Dai, X.; Li, D.; Jiang, H.; Jia, K. Coupled retrieval of heavy metal nickel concentration in agricultural soil from spaceborne hyperspectral imagery. J. Hazard. Mater. 2023, 446, 130722.
  16. Shi, T.; Chen, Y.; Liu, Y.; Wu, G. Visible and near-infrared reflectance spectroscopy—An alternative for monitoring soil contamination by heavy metals. J. Hazard. Mater. 2014, 265, 166–176.
  17. Arif, M.; Qi, Y.; Dong, Z.; Wei, H. Rapid retrieval of cadmium and lead content from urban greenbelt zones using hyperspectral characteristic bands. J. Clean. Prod. 2022, 374, 133922.
  18. Zhang, S.; Shen, Q.; Nie, C.; Huang, Y.; Wang, J.; Hu, Q.; Ding, X.; Zhou, Y.; Chen, Y. Hyperspectral inversion of heavy metal content in reclaimed soil from a mining wasteland based on different spectral transformation and modeling methods. Spectrochim. Acta Part A Mol. Biomol. Spectrosc. 2019, 211, 393–400.
  19. Hong, Y.; Shen, R.; Cheng, H.; Chen, S.; Chen, Y.; Guo, L.; He, J.; Liu, Y.; Yu, L.; Liu, Y. Cadmium concentration estimation in peri-urban agricultural soils: Using reflectance spectroscopy, soil auxiliary information, or a combination of both? Geoderma 2019, 354, 113875.
  20. Liu, Z.; Lu, Y.; Peng, Y.; Zhao, L.; Wang, G.; Hu, Y. Estimation of Soil Heavy Metal Content Using Hyperspectral Data. Remote Sens. 2019, 11, 1464.
  21. Wu, Y.; Chen, J.; Ji, J.; Gong, P.; Liao, Q.; Tian, Q.; Ma, H. A mechanism study of reflectance spectroscopy for investigating heavy metals in soils. Soil Sci. Soc. Am. J. 2007, 71, 918–926.
  22. Cheshire, M.; Dumat, C.; Fraser, A.; Hillier, S.; Staunton, S. The interaction between soil organic matter and soil clay minerals by selective removal and controlled addition of organic matter. Eur. J. Soil Sci. 2000, 51, 497–509.
  23. Stenberg, B. Effects of soil sample pretreatments and standardised rewetting as interacted with sand classes on Vis-NIR predictions of clay and soil organic carbon. Geoderma 2010, 158, 15–22.
  24. Gomez, C.; Viscarra Rossel, R.A.; McBratney, A.B. Soil organic carbon prediction by hyperspectral remote sensing and field vis-NIR spectroscopy: An Australian case study. Geoderma 2008, 146, 403–411.
  25. Hou, L.; Li, X.; Li, F. Hyperspectral-based Inversion of Heavy Metal Content in the Soil of Coal Mining Areas. J. Environ. Qual. 2019, 48, 57–63.
  26. Lin, X.; Su, Y.-C.; Shang, J.; Sha, J.; Li, X.; Sun, Y.-Y.; Ji, J.; Jin, B. Geographically Weighted Regression Effects on Soil Zinc Content Hyperspectral Modeling by Applying the Fractional-Order Differential. Remote Sens. 2019, 11, 636.
  27. Yang, H.; Huang, K.; Zhang, K.; Weng, Q.; Zhang, H.; Wang, F. Predicting Heavy Metal Adsorption on Soil with Machine Learning and Mapping Global Distribution of Soil Adsorption Capacities. Environ. Sci. Technol. 2021, 55, 14316–14328.
  28. Riedel, F.; Denk, M.; Müller, I.; Barth, N.; Gläßer, C. Prediction of soil parameters using the spectral range between 350 and 15,000 nm: A case study based on the Permanent Soil Monitoring Program in Saxony, Germany. Geoderma 2018, 315, 188–198.
  29. Chabrillat, S.; Ben-Dor, E.; Cierniewski, J.; Gomez, C.; Schmid, T.; van Wesemael, B. Imaging Spectroscopy for Soil Mapping and Monitoring. Surv. Geophys. 2019, 40, 361–399.
  30. Jia, X.; Hu, B.; Marchant, B.P.; Zhou, L.; Shi, Z.; Zhu, Y. A methodological framework for identifying potential sources of soil heavy metal pollution based on machine learning: A case study in the Yangtze Delta, China. Environ. Pollut. 2019, 250, 601–609.
  31. Yang, Y.; Shang, K.; Xiao, C.; Wang, C.; Tang, H. Spectral Index for Mapping Topsoil Organic Matter Content Based on ZY1-02D Satellite Hyperspectral Data in Jiangsu Province, China. ISPRS Int. J. Geo-Inf. 2022, 11, 111.
  32. Xu, Z.; Chen, S.; Zhu, B.; Chen, L.; Ye, Y.; Lu, P. Evaluating the Capability of Satellite Hyperspectral Imager, the ZY1–02D, for Topsoil Nitrogen Content Estimation and Mapping of Farmlands in Black Soil Area, China. Remote Sens. 2022, 14, 1008.
  33. Guo, Y.; Cheng, S.; Fang, H.; Yang, Y.; Li, Y.; Zhou, Y. Responses of soil fungal taxonomic attributes and enzyme activities to copper and cadmium co-contamination in paddy soils. Sci. Total Environ. 2022, 844, 157119.
  34. Zhang, F.; Wang, Y.; Liao, X. Recognition method for the health risks of potentially toxic elements in a headwater catchment. Sci. Total Environ. 2022, 839, 156287.
  35. Feng, W.; Guo, Z.; Xiao, X.; Peng, C.; Shi, L.; Ran, H.; Xu, W. A dynamic model to evaluate the critical loads of heavy metals in agricultural soil. Ecotoxicol. Environ. Saf. 2020, 197, 110607.
  36. Li, Z.; Ma, Z.; van der Kuijp, T.J.; Yuan, Z.; Huang, L. A review of soil heavy metal pollution from mines in China: Pollution and health risk assessment. Sci. Total Environ. 2014, 468–469, 843–853.
  37. Miao, X.; Hao, Y.; Zhang, F.; Zou, S.; Ye, S.; Xie, Z. Spatial distribution of heavy metals and their potential sources in the soil of Yellow River Delta: A traditional oil field in China. Environ. Geochem. Health 2020, 42, 7–26.
  38. Vasques, G.M.; Grunwald, S.; Sickman, J.O. Comparison of multivariate methods for inferential modeling of soil carbon using visible/near-infrared spectra. Geoderma 2008, 146, 14–25.
  39. Dotto, A.C.; Dalmolin, R.S.D.; Grunwald, S.; ten Caten, A.; Pereira Filho, W. Two preprocessing techniques to reduce model covariables in soil property predictions by Vis-NIR spectroscopy. Soil Tillage Res. 2017, 172, 59–68.
  40. Jia, X.; Hou, D. Mapping soil arsenic pollution at a brownfield site using satellite hyperspectral imagery and machine learning. Sci. Total Environ. 2023, 857, 159387.
  41. Yang, C.; Feng, M.; Song, L.; Wang, C.; Yang, W.; Xie, Y.; Jing, B.; Xiao, L.; Zhang, M.; Song, X.; et al. Study on hyperspectral estimation model of soil organic carbon content in the wheat field under different water treatments. Sci. Rep. 2021, 11, 18582.
  42. Hong, Y.; Guo, L.; Chen, S.; Linderman, M.; Mouazen, A.M.; Yu, L.; Chen, Y.; Liu, Y.; Liu, Y.; Cheng, H.; et al. Exploring the potential of airborne hyperspectral image for estimating topsoil organic carbon: Effects of fractional-order derivative and optimal band combination algorithm. Geoderma 2020, 365, 114228.
  43. Zhu, C.; Ding, J.; Zhang, Z.; Wang, Z. Exploring the potential of UAV hyperspectral image for estimating soil salinity: Effects of optimal band combination algorithm and random forest. Spectrochim. Acta Part A Mol. Biomol. Spectrosc. 2022, 279, 121416.
  44. Clark, R.N.; Roush, T.L. Reflectance spectroscopy: Quantitative analysis techniques for remote sensing applications. J. Geophys. Res. Solid Earth 1984, 89, 6329–6340.
  45. Rinnan, Å.; Berg, F.v.d.; Engelsen, S.B. Review of the most common pre-processing techniques for near-infrared spectra. TrAC Trends Anal. Chem. 2009, 28, 1201–1222.
  46. Noble, W.S. What is a support vector machine? Nat. Biotechnol. 2006, 24, 1565–1567.
  47. Rezaei, M.; Mohammadifar, A.; Gholami, H.; Mina, M.; Riksen, M.J.P.M.; Ritsema, C. Mapping of the wind erodible fraction of soil by bidirectional gated recurrent unit (BiGRU) and bidirectional recurrent neural network (BiRNN) deep learning models. Catena 2023, 223, 106953.
  48. Belgiu, M.; Drăguţ, L. Random forest in remote sensing: A review of applications and future directions. ISPRS J. Photogramm. Remote Sens. 2016, 114, 24–31.
  49. GB15618-2018; Soil Environmental Quality–Risk Control Standard for Soil Contamination of Agricultural Land. China Environmental Science Press: Beijing, China, 2018. Available online: https://www.mee.gov.cn/ywgz/fgbz/bz/bzwb/trhj/201807/t20180703_446029.shtml (accessed on 28 December 2022).
  50. Manta, D.S.; Angelone, M.; Bellanca, A.; Neri, R.; Sprovieri, M. Heavy metals in urban soils: A case study from the city of Palermo (Sicily), Italy. Sci. Total Environ. 2002, 300, 229–243.
  51. Ma, Q.; Zhao, G. Effects of different land use types on soil nutrients in intensive agricultural region. J. Nat. Resour. 2010, 25, 1834–1843.
  52. Ye, M.; Zhu, L.; Li, X.; Ke, Y.; Huang, Y.; Chen, B.; Yu, H.; Li, H.; Feng, H. Estimation of the soil arsenic concentration using a geographically weighted XGBoost model based on hyperspectral data. Sci. Total Environ. 2023, 858, 159798.
  53. Zhou, W.; Yang, H.; Xie, L.; Li, H.; Huang, L.; Zhao, Y.; Yue, T. Hyperspectral inversion of soil heavy metals in Three-River Source Region based on random forest model. Catena 2021, 202, 105222.
  54. Zhang, Y.; Hartemink, A.E.; Huang, J.; Townsend, P.A. Synergistic use of hyperspectral imagery, Sentinel-1 and LiDAR improves mapping of soil physical and geochemical properties at the farm-scale. Eur. J. Soil Sci. 2021, 72, 1690–1717.
  55. Daughtry, C.S.T.; Doraiswamy, P.C.; Hunt, E.R.; Stern, A.J.; McMurtrey, J.E.; Prueger, J.H. Remote sensing of crop residue cover and soil tillage intensity. Soil Tillage Res. 2006, 91, 101–108.
  56. Nawar, S.; Cipullo, S.; Douglas, R.; Coulon, F.; Mouazen, A. The applicability of spectroscopy methods for estimating potentially toxic elements in soils: State-of-the-art and future trends. Appl. Spectrosc. Rev. 2020, 55, 525–557.
  57. Song, Y.; Li, F.; Yang, Z.; Ayoko, G.A.; Frost, R.L.; Ji, J. Diffuse reflectance spectroscopy for monitoring potentially toxic elements in the agricultural soils of Changjiang River Delta, China. Appl. Clay Sci. 2012, 64, 75–83.
  58. Shen, L.; Gao, M.; Yan, J.; Li, Z.-L.; Leng, P.; Yang, Q.; Duan, S.-B. Hyperspectral Estimation of Soil Organic Matter Content using Different Spectral Preprocessing Techniques and PLSR Method. Remote Sens. 2020, 12, 1206.
  59. White, W.B. Infrared characterization of water and hydroxyl ion in the basic magnesium carbonate minerals. Am. Mineral. J. Earth Planet. Mater. 1971, 56, 46–53.
  60. Ben-Dor, E.; Inbar, Y.; Chen, Y. The reflectance spectra of organic matter in the visible near-infrared and short wave infrared region (400–2500 nm) during a controlled decomposition process. Remote Sens. Environ. 1997, 61, 1–15.
  61. Lin, N.; Jiang, R.; Li, G.; Yang, Q.; Li, D.; Yang, X. Estimating the heavy metal contents in farmland soil from hyperspectral images based on Stacked AdaBoost ensemble learning. Ecol. Indic. 2022, 143, 109330.
  62. Rossel, R.V.; Behrens, T. Using data mining to model and interpret soil diffuse reflectance spectra. Geoderma 2010, 158, 46–54.
  63. Chen, T.; Chang, Q.; Clevers, J.G.P.W.; Kooistra, L. Rapid identification of soil cadmium pollution risk at regional scale based on visible and near-infrared spectroscopy. Environ. Pollut. 2015, 206, 217–226.
  64. Hong, Y.; Chen, Y.; Shen, R.; Chen, S.; Xu, G.; Cheng, H.; Guo, L.; Wei, Z.; Yang, J.; Liu, Y.; et al. Diagnosis of cadmium contamination in urban and suburban soils using visible-to-near-infrared spectroscopy. Environ. Pollut. 2021, 291, 118128.
  65. Baumgardner, M.F.; Silva, L.F.; Biehl, L.L.; Stoner, E.R. Reflectance properties of soils. Adv. Agron. 1986, 38, 1–44.
  66. Sun, W.; Liu, S.; Zhang, X.; Zhu, H. Performance of hyperspectral data in predicting and mapping zinc concentration in soil. Sci. Total Environ. 2022, 824, 153766.
  67. Wang, X.; Zhang, F.; Johnson, V.C. New methods for improving the remote sensing estimation of soil organic matter content (SOMC) in the Ebinur Lake Wetland National Nature Reserve (ELWNNR) in northwest China. Remote Sens. Environ. 2018, 218, 104–118.
  68. Hong, Y.; Liu, Y.; Chen, Y.; Liu, Y.; Yu, L.; Liu, Y.; Cheng, H. Application of fractional-order derivative in the quantitative estimation of soil organic matter content through visible and near-infrared spectroscopy. Geoderma 2019, 337, 758–769.
  69. Leydesdorff, L.; Bensman, S. Classification and powerlaws: The logarithmic transformation. J. Am. Soc. Inf. Sci. Technol. 2006, 57, 1470–1486.
  70. He, T.; Wang, J.; Lin, Z.; Cheng, Y. Spectral features of soil organic matter. Geo-Spat. Inf. Sci. 2009, 12, 33–40.
  71. Tan, K.; Ma, W.; Wu, F.; Du, Q. Random forest–based estimation of heavy metal concentration in agricultural soils with hyperspectral sensor data. Environ. Monit. Assess. 2019, 191, 446.
  72. Wang, L.; Zhou, Y.; Liu, J.; Liu, Y.; Zuo, Q.; Li, Q. Exploring the potential of multispectral satellite images for estimating the contents of cadmium and lead in cropland: The effect of the dimidiate pixel model and random forest. J. Clean. Prod. 2022, 367, 132922.
  73. Gebrekidan, A.; Weldegebriel, Y.; Hadera, A.; Van der Bruggen, B. Toxicological assessment of heavy metals accumulated in vegetables and fruits grown in Ginfel river near Sheba Tannery, Tigray, Northern Ethiopia. Ecotoxicol. Environ. Saf. 2013, 95, 171–178.
  74. Sun, Y.; Li, H.; Guo, G.; Semple, K.T.; Jones, K.C. Soil contamination in China: Current priorities, defining background levels and standards for heavy metals. J. Environ. Manag. 2019, 251, 109512.
Figure 1. Distribution of soil sampling sites in the study area: (a) Jiangxi Province, China; (b) Geographic location of the study area; (c) Distribution of sampling points and elevation within the study area. The top-right image shows the coverage of the study area by the original ZY1-02D imagery.
Figure 2. (a) Original spectral curves and (b) Savitzky–Golay (SG) smoothed spectral curves of soil samples from hyperspectral images. Note: Each color represents a sampling point.
Figure 3. Correlation coefficients between soil Cd content and the original and Savitzky–Golay (SG) smoothed spectral data.
Figure 4. Nine spectral transformation curves of soil samples from hyperspectral images. (a) logarithmic transformation (LT), (b) reciprocal transformation (RT), (c) first derivative (FD), (d) logarithm of reciprocal transformation (LR), (e) reciprocal of logarithmic transformation (RL), (f) reciprocal of logarithmic and first derivative (RLFD), (g) standard normal variate (SNV), (h) continuum removal (CR), and (i) multiplicative scatter correction (MSC). Note: Each color represents a sampling point.
Figure 5. The correlation coefficient curves between the spectra derived from nine spectral transformation methods and the soil Cd content. (a) logarithmic transformation (LT), (b) reciprocal transformation (RT), (c) first derivative (FD), (d) logarithm of reciprocal transformation (LR), (e) reciprocal of logarithmic transformation (RL), (f) reciprocal of logarithmic and first derivative (RLFD), (g) standard normal variate (SNV), (h) continuum removal (CR), and (i) multiplicative scatter correction (MSC).
Figure 6. Spatial distribution of soil Cd content in the study area driven by the RF model constructed with first derivative-transformed spectral data. Note that this Cd distribution map has been masked with a cropland layer derived from the GlobeLand30 dataset (http://www.globallandcover.com/, accessed on 20 December 2022).
Figure 7. Relative proportions and spatial extents of the three soil pollution categories based on soil Cd content.
Table 1. Descriptive statistical analysis of soil Cd content in study area.
| Soil Component | Count | Mean (mg·kg⁻¹) | Maximum (mg·kg⁻¹) | Minimum (mg·kg⁻¹) | Standard Deviation (mg·kg⁻¹) | Skewness | Kurtosis | CV (%) |
|---|---|---|---|---|---|---|---|---|
| Cd | 304 | 0.535 | 3.251 | 0.1 | 0.563 | 2.624 | 7.625 | 105.23 |
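The statistics in Table 1 can be reproduced from raw measurements with a short helper. The sketch below is illustrative only: the name `describe` is hypothetical, and the sample standard deviation (`ddof=1`), moment-based skewness, and excess kurtosis are assumed conventions, since the source does not say which estimators were used. The last line checks that the reported CV follows from the reported standard deviation and mean.

```python
import numpy as np

def describe(values):
    """Descriptive statistics for a 1-D sample (hypothetical helper)."""
    x = np.asarray(values, dtype=float)
    mean = x.mean()
    sd = x.std(ddof=1)                 # sample standard deviation (assumed)
    d = x - mean
    m2 = np.mean(d ** 2)               # second central moment
    return {
        "count": x.size,
        "mean": mean,
        "max": x.max(),
        "min": x.min(),
        "sd": sd,
        "skewness": np.mean(d ** 3) / m2 ** 1.5,      # moment-based (assumed)
        "kurtosis": np.mean(d ** 4) / m2 ** 2 - 3.0,  # excess kurtosis (assumed)
        "cv_percent": 100.0 * sd / mean,              # coefficient of variation
    }

# Consistency check using the values reported in Table 1:
# CV = SD / mean * 100 = 0.563 / 0.535 * 100, which is about 105.23 %.
cv_from_table = 0.563 / 0.535 * 100.0
```

A CV above 100% means the standard deviation exceeds the mean, which agrees with the strongly right-skewed distribution indicated by the skewness and kurtosis values.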
Table 2. Maximum correlation coefficients between Cd content and spectra.
| Spectral Transformations | Maximum Correlation Band (nm) | Correlation Coefficients |
|---|---|---|
| SR | 413 | −0.339 ** |
| LT | 687 | −0.350 ** |
| RT | 687 | 0.359 ** |
| FD | 593 | −0.412 ** |
| LR | 687 | 0.350 ** |
| RL | 687 | 0.338 ** |
| RLFD | 567/593 | 0.414 ** |
| SNV | 1795 | −0.381 ** |
| CR | 687 | −0.301 ** |
| MSC | 1795 | −0.390 ** |

Note: ** represents significance at the p < 0.01 level.
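A table of this kind can be produced by correlating each band of the transformed spectra with the measured Cd content and keeping the band with the largest absolute coefficient. The sketch below assumes Pearson correlation; the function names `band_correlations` and `strongest_band` are hypothetical, and the significance test behind the ** marker is not reproduced.

```python
import numpy as np

def band_correlations(X, y):
    """Pearson r between each column (band) of X and the target y.

    X : (n_samples, n_bands) transformed spectra
    y : (n_samples,) measured Cd content
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    Xc = X - X.mean(axis=0)            # centre every band
    yc = y - y.mean()                  # centre the target
    num = (Xc * yc[:, None]).sum(axis=0)
    den = np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum())
    return num / den

def strongest_band(X, y, wavelengths):
    """Return (wavelength, r) for the band with the largest |r|."""
    r = band_correlations(X, y)
    k = int(np.argmax(np.abs(r)))
    return wavelengths[k], r[k]
```

Ranking by absolute value keeps the sign in the result, so a strong negative correlation (as reported for FD at 593 nm) is selected in the same way as a strong positive one.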
Table 3. Modeling performance based on four models across various spectral transformations.
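| Spectral Transformations | RF RMSE (mg/kg) | RF R² | RF MAE (mg/kg) | BRNN RMSE (mg/kg) | BRNN R² | BRNN MAE (mg/kg) | SVM RMSE (mg/kg) | SVM R² | SVM MAE (mg/kg) | PLSR RMSE (mg/kg) | PLSR R² | PLSR MAE (mg/kg) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| SG | 0.44 | 0.41 | 0.26 | 0.41 | 0.28 | 0.23 | 0.41 | 0.50 | 0.22 | 0.42 | 0.50 | 0.29 |
| LT | 0.44 | 0.41 | 0.26 | 0.41 | 0.28 | 0.23 | 0.40 | 0.50 | 0.21 | 0.40 | 0.53 | 0.27 |
| RT | 0.45 | 0.40 | 0.26 | 0.40 | 0.33 | 0.23 | 0.41 | 0.50 | 0.22 | 0.39 | 0.54 | 0.28 |
| FD | 0.37 | 0.61 | 0.21 | 0.37 | 0.35 | 0.19 | 0.38 | 0.57 | 0.25 | 0.43 | 0.49 | 0.29 |
| LR | 0.45 | 0.40 | 0.26 | 0.42 | 0.25 | 0.24 | 0.40 | 0.50 | 0.21 | 0.40 | 0.53 | 0.28 |
| RL | 0.45 | 0.40 | 0.26 | 0.42 | 0.25 | 0.24 | 0.40 | 0.51 | 0.21 | 0.42 | 0.49 | 0.29 |
| RLFD | 0.37 | 0.60 | 0.21 | 0.35 | 0.44 | 0.18 | 0.38 | 0.56 | 0.21 | 0.43 | 0.49 | 0.31 |
| SNV | 0.38 | 0.56 | 0.23 | 0.35 | 0.35 | 0.20 | 0.40 | 0.52 | 0.21 | 0.41 | 0.51 | 0.28 |
| CR | 0.32 | 0.47 | 0.29 | 0.46 | 0.19 | 0.26 | 0.48 | 0.30 | 0.29 | 0.50 | 0.25 | 0.35 |
| MSC | 0.38 | 0.56 | 0.23 | 0.35 | 0.38 | 0.20 | 0.39 | 0.53 | 0.20 | 0.41 | 0.52 | 0.28 |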
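The three accuracy measures reported in Table 3 can be computed from observed and predicted Cd values as sketched below. The function name `regression_metrics` is hypothetical, and R² is taken here as the coefficient of determination, 1 − SS_res/SS_tot. That definition is an assumption, because some studies report the squared correlation between observed and predicted values instead.

```python
import numpy as np

def regression_metrics(y_true, y_pred):
    """RMSE, R2 and MAE for paired observations (hypothetical helper)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    err = y_pred - y_true
    rmse = np.sqrt(np.mean(err ** 2))        # root mean squared error
    mae = np.mean(np.abs(err))               # mean absolute error
    ss_res = np.sum(err ** 2)                # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total sum of squares
    r2 = 1.0 - ss_res / ss_tot               # coefficient of determination (assumed)
    return {"rmse": rmse, "r2": r2, "mae": mae}
```

RMSE and MAE carry the units of the target (mg/kg here), whereas R² is unitless. RMSE penalises large errors more heavily than MAE, so the two can rank models differently.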
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Lv, J.; Geng, J.; Xu, X.; Yu, Y.; Fang, H.; Guo, Y.; Cheng, S. Estimating Cadmium Concentration in Agricultural Soils with ZY1-02D Hyperspectral Data: A Comparative Analysis of Spectral Transformations and Machine Learning Models. Agriculture 2024, 14, 1619. https://doi.org/10.3390/agriculture14091619
