Base Paper
Traditional Machine Learning & Crowd Sourcing
Abstract
This study introduces a novel methodology for crop type classification in Tanzania by integrating crowdsourced data with time-series features extracted from Sentinel-2 satellite imagery. Leveraging the YouthMappers network, we collected ground validation data on various crops, including challenging types such as cassava, millet, sunflower, sorghum, and cotton, across a range of agricultural areas. Traditional machine learning algorithms, augmented with carefully engineered time-series features, were employed to map the different crop classes. Our approach achieved high classification accuracy, evidenced by a Cohen's Kappa score of 0.80 and an F1-micro score of 0.82. The model often matches or outperforms broadly used land cover models, which simply classify 'agriculture' without specifying crop types. By interpreting feature importance using SHAP values, we identified key time-series features driving the model's performance, enhancing both interpretability and reliability. Our findings demonstrate that traditional machine learning techniques, combined with computationally efficient feature extraction methods, offer a practical and effective "lite learning" approach for mapping crop types in data-scarce environments. This methodology facilitates accurate crop type classification using a low-cost, resource-limited approach that contributes valuable insights for sustainable agricultural practices and informed policy-making, ultimately impacting food security and land management in the region.
This preprint research paper has not been peer reviewed. Electronic copy available at: https://ssrn.com/abstract=5090897
Introduction

Background and Context
The free access to remotely sensed data, such as imagery from satellites (e.g. Sentinel-2, Landsat), has allowed for crop type classification in developing countries. By leveraging the power of advanced imaging technologies combined with machine learning algorithms, researchers and practitioners can now identify and map different crop types over large geographic areas at no or low cost (Hersh, Engstrom, and Mann 2021). This has the potential to improve food security, land use planning, and agricultural policy in regions where ground-based data collection is limited or non-existent (Bégué, Arvor, Bellon, Betbeder, De Abelleyra, P. D. Ferraz, et al. 2018; H. Li et al. 2023; Ibrahim et al. 2021).

In recent years, machine learning approaches have emerged as powerful tools for crop type classification using remotely sensed data. Specifically, methods based on machine learning algorithms have gained recognition for their effectiveness in matching valuable spectral information from satellite imagery to observations of crop type for particular locations. Machine learning algorithms, including decision trees, random forests, support vector machines (SVM), and k-nearest neighbors (KNN), have been successfully used to classify imagery into unique agricultural types (Ibrahim et al. 2021; Bégué, Arvor, Bellon, Betbeder, De Abelleyra, P. D. Ferraz, et al. 2018; Delince et al. 2017). These algorithms leverage the rich spectral information captured by satellite sensors, allowing them to identify distinctive patterns associated with different crop types. By training on large labeled datasets, where ground-validation information on crop types is linked to corresponding image pixels, these models can effectively learn the relationships between the spectral characteristics of crops and their respective classes (Bégué, Arvor, Bellon, Betbeder, De Abelleyra, P. D. Ferraz, et al. 2018).
The strength of traditional machine learning approaches lies in their ability to exploit both the spectral and time-series patterns within the remotely sensed data. Traditional machine learning approaches offer advantages in terms of interpretability and computational efficiency compared to deep learning architectures. They provide insight into the decision-making process and can be more readily understood and explained by domain experts. Additionally, these methods are generally less computationally demanding and require less training data, making them suitable for applications with limited computational resources (Höhl et al. 2024; Maxwell, Warner, and Guillén 2021; Teixeira et al. 2023; Y. Li et al. 2023; Ma et al. 2019).
Traditional machine learning algorithms require the extraction of variables (e.g. max EVI, mean B2) that can help distinguish different plant or crop types (Bégué, Arvor, Bellon, Betbeder, De Abelleyra, P. D. Ferraz, et al. 2018). The development of salient time-series features to capture phenological differences between locations from remotely sensed images remains a challenge. These features are typically derived from the spectral bands (e.g. red edge, NIR) of the satellite imagery or indexes, such as the enhanced vegetation index (EVI), and basic time-series statistics (e.g. mean, max, minimum, slope) for the growing season (Morton et al. 2006). Meanwhile, a broader set of time-series statistics from bands or indexes may be more relevant for a number of applications. For instance, the skewness of EVI might help distinguish crops that green up earlier vs. later in the season, and measures of the number of peaks in EVI might help differentiate intercropping or multiple plantings in a season (Bégué, Arvor, Bellon, Betbeder, De Abelleyra, P. D. Ferraz, et al. 2018). However, the selection and extraction of these features can be time-consuming and labor-intensive.
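To make these phenology cues concrete, the sketch below computes two such features from a hypothetical monthly EVI series using SciPy. This is an illustration of the idea only, not the paper's implementation; the 0.1 prominence threshold and the example EVI values are assumed.

```python
import numpy as np
from scipy.signal import find_peaks
from scipy.stats import skew

def evi_season_features(evi):
    """Summarize a pixel's monthly EVI series with two phenology cues:
    skewness (early vs. late green-up) and number of seasonal peaks
    (single vs. multiple plantings)."""
    evi = np.asarray(evi, dtype=float)
    # Peaks must rise meaningfully above their surroundings to count.
    peaks, _ = find_peaks(evi, prominence=0.1)
    return {"skewness": float(skew(evi)), "n_peaks": int(len(peaks))}

# A single late-peaking season vs. a double-cropped season (Jan-Aug).
single = [0.2, 0.2, 0.3, 0.4, 0.6, 0.8, 0.5, 0.3]
double = [0.2, 0.5, 0.8, 0.4, 0.3, 0.6, 0.9, 0.4]
print(evi_season_features(single)["n_peaks"])  # 1 peak
print(evi_season_features(double)["n_peaks"])  # 2 peaks
```

A classifier can then use the peak count to separate single plantings from double cropping even when the two pixels have similar mean EVI.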
Recurrent neural networks (RNNs), in contrast, are deep learning architectures designed for modeling sequential data such as time series, speech, text, and audio. The fundamental feature of RNNs is their ability to maintain a 'memory' of previous inputs by using their internal state (hidden layers), which allows them to exhibit dynamic temporal behavior. RNNs and their variants allow the integration of time-series imagery, significantly improving crop type classification outcomes, especially in data-rich environments (Teixeira et al. 2023; Camps-Valls et al. 2021). Deep learning approaches, however, typically require much larger sets of training data, may be more prone to overfitting especially with small sample sizes, have significant limitations to interpretability, and require expensive compute (Höhl et al. 2024; Maxwell, Warner, and Guillén 2021; Teixeira et al. 2023; Y. Li et al. 2023; Ma et al. 2019). Although recent efforts have closed the gap (e.g. Tseng et al. 2021), the lack of readily available and reliable ground truth data or benchmark datasets for training, as discussed earlier, may limit the applicability of deep learning for a variety of tasks including crop classification, and make researchers more reliant on less reliable techniques like transfer learning or zero-shot or low-shot methods (Owusu et al. 2024; Y. Li et al. 2023; Ma et al. 2019). Moreover, training data for extreme events, like crop losses, disease, and lodging, are largely non-existent. Interpretability is also a salient weakness, as interpretation of models allows us to gain scientific insight and assess trustworthiness and fairness insofar as outputs affect policy decisions.
An alternative approach turns back the clock on deep learning approaches. For instance, CNN classifiers, through the exertion of tremendous effort by GPUs, can apply and learn from thousands of filters or convolutions that help detect distinct features like edges, textures, or patterns. It is, however, possible to apply a more limited yet salient set of filters, like Fourier transforms, Differential Morphological Profiles (Pesaresi and Benediktsson 2001), Line Support Regions, or Structural Feature Sets (Huang, Zhang, and Li 2007), amongst others, to images and then use these as features in more traditional machine learning approaches (Engstrom, Hersh, and Newhouse 2022). This approach may be particularly useful in data-scarce environments, requiring less training data and potentially offering more efficient results in low-information settings. The same approach has been taken for time series analysis, where instead of learning patterns through an RNN's memory, we can apply a more limited but potentially salient series of time-series filters. Measures of trends, descriptions of distributions, or measures of change and complexity might adequately describe time series properties for regression and classification tasks (Christ et al. 2018; Yang et al. 2021). This time-series filter approach, developed for this paper, can also be applied on a pixel-by-pixel basis to satellite image bands or index values (Mann, Michael L. 2024).
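As a toy illustration of this fixed-filter idea, the sketch below stacks a few classic filter responses (local mean, Sobel gradient magnitude, local variance) as per-pixel features that a traditional classifier could consume. The specific filters shown are illustrative stand-ins, not the cited Differential Morphological Profiles or Structural Feature Sets.

```python
import numpy as np
from scipy import ndimage

def filter_features(img):
    """Stack a few fixed filter responses as per-pixel features:
    local mean (smoothed brightness), Sobel gradient magnitude (edges),
    and local variance (texture/heterogeneity)."""
    mean = ndimage.uniform_filter(img, size=3)
    grad = np.hypot(ndimage.sobel(img, axis=0), ndimage.sobel(img, axis=1))
    var = ndimage.uniform_filter(img**2, size=3) - mean**2
    return np.stack([mean, grad, var], axis=-1)  # shape (H, W, 3)

img = np.zeros((8, 8))
img[:, 4:] = 1.0  # a vertical edge between two "land covers"
feats = filter_features(img)
print(feats.shape)  # (8, 8, 3)
```

Each pixel's feature vector can then be fed to a random forest or SVM, replacing thousands of learned convolutions with a handful of hand-picked ones.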
Field-collected data provides the necessary validation and calibration for remote sensing-based models. It serves as the benchmark against which the model's predictions are evaluated and refined. Ground validation data, collected through field visits, observation, and interactions with local farmers, offers essential insights into the specific crop types present in the study area. Validating and training models with accurate ground reference information allows the spectral patterns captured by remote sensing data to be correctly associated with the corresponding crop. By combining the spectral information from satellite imagery with ground validation data, researchers can develop robust models that effectively differentiate between crop types based on their unique spectral signatures and temporal patterns.
The collection of field observations and ground validation data is a critical input for the development of models to classify crop types (Delince et al. 2017; Ma et al. 2019). However, obtaining accurate and timely ground validation data can be challenging in developing countries due to limited resources, infrastructure, and local capacity (Delince et al. 2017; Bégué, Arvor, Bellon, Betbeder, De Abelleyra, P. D. Ferraz, et al. 2018). In many cases, researchers rely on crowdsourced data from volunteers or citizen scientists to supplement or validate ground truth data collected through traditional methods. Projects like Tseng et al. (2021) point to the paucity of multi-class crop type datasets globally. This is a significant gap in the field of crop type classification, as the availability of high-quality training data is essential for the development of accurate and reliable machine learning models (Maxwell, Warner, and Guillén 2021).
In this study, we aim to address two critical challenges in the field of crop type classification: the lack of in-season multi-class crop type datasets, and the need for new methods to obtain high-accuracy crop type predictions from limited amounts of training data. We propose a novel approach that combines crowdsourced data with a new automated approach to extracting time-series features from satellite imagery. We apply this new approach to classify crop types in Northern Tanzania. By leveraging the power of crowdsourcing and remote sensing technologies, we aim to develop a robust and scalable solution for crop type classification that can be adapted to other regions and contexts at minimal or no cost.

Data for this study were collected from multiple sources, including satellite imagery and crowdsourced ground truth observations. The section below describes the input data and methods used throughout the paper.
Study Area
The study was conducted in 50 wards within three major districts of Arusha, Dodoma, and Mwanza in Tanzania, as seen in Figure 1. Tanzania, a country in East Africa, is known for its diverse agricultural landscape. The region is characterized by a mix of smallholder farms, commercial plantations, and natural vegetation, making it an ideal yet challenging location for studying crop type classification. Our choice of these three districts was driven by the distinct variation in the major crop types that dominate in each district, among oil seeds, grains, and commercial crops such as cotton.
Crop type data collection was designed and executed by YouthMappers through a crowdsourced GIS approach. The method was designed in three steps: 1) development of training materials and training of all intended student participants; 2) data collection using KoboToolbox, hosting a well-developed data model (the exercise lasted 14 days, with 7 days of iterative pilot testing on different farms, crops, and landscapes); and 3) data review and cleaning to generate a sample for training.
Additional training data were collected utilizing high-resolution imagery from Google Earth. These data were used to supplement the crowdsourced data and improve the model's ability to distinguish between crops and more common land cover types like forests, urban areas, and water. The final cleaned dataset includes 1,400 crop type observations of rice, maize, cassava, sunflower, sorghum, cotton, and millet, plus 386 observations of other land cover classes including water, tidal areas, forest, shrub, and urban.
Data Collection Methods

To ensure the success of our project, we focused heavily on the design of our data collection methods. These methods were carefully integrated, taking into account the crop calendar, information on the different stages of crop development, the distances between crop fields, the tools used, and data quality assurance.

Young crops exhibit significant differences compared to mature crops in terms of color, density, and phenological development. Variations in the crop cycle across different fields could lead to heteroscedasticity in the spectral reflectance measurements used for machine learning (ML) training, thereby affecting the precision and accuracy of the model. By targeting the period of April through May, we aimed to capture crops late in the growing season, yet before harvest, as seen in the crop calendar in Figure 2 below.
In the north and northeast, bimodal areas experience the short rains (Vuli) from late October to mid-January, during which crops like maize, beans, and vegetables are planted in October and November, and the long rains (Masika) from March to May, supporting crops like maize, rice, sorghum, and cassava, typically planted in February and March. In the central, southern, and western regions with unimodal rainfall, there is a single rainy season from November to April, when crops such as cotton, maize, millet, rice, and sunflower are planted in November and December. This diversity in rainfall patterns allows for a wide variety of crops suited to the local climate and seasonal conditions.
The data collection took place between late April and May, as shown in Figure 2, to align with mid-season for many crops. YouthMappers were advised to focus on a set of target crops known to be present in the region and at appropriate crop growth stages. Before embarking on data collection, discussions covered several factors to consider in selecting field collection sites. Factors included field size, to establish a minimum detectable by the satellite imagery; clear and open fields, to enhance clean spectra sampling; prioritizing areas covered by a single crop, to reduce confusion; a sampling distribution of at least one kilometer between stops; and even crop maturity and health. YouthMappers were advised to identify only fields 30 meters or greater across to ensure a minimum size detectable by the satellite imagery. When picking between fields for data collection, defining clear and open fields was discussed with several examples, as agriculture can include mixed land cover types with tree cover, power lines, buildings, and other obstructions that prevent the satellite from cleanly capturing spectra of only the crop. YouthMappers were advised to pick only clear and open fields and to prioritize those growing only one crop. The recommendation of a sampling distribution of at least one kilometer was a compromise between the amount of time available for data collection, the expense of travel, and a sufficient distribution to reduce spatial autocorrelation. YouthMappers were permitted to identify adjacent fields growing different types of crops, but were otherwise highly encouraged to return to the vehicle and drive the 1 km to collect more data. The most important factors driving the timing of data collection were crop maturity and health. The fieldwork was conducted between late April and May 2023 because the target crops typically reach reproductive stages with maximum canopy cover during this time of year. This crop stage is best suited for discerning different crop types with satellite imagery. While most fields were found in late reproductive stages, drought conditions impacted the health of some fields. YouthMappers were advised to prioritize mature, lush green fields as ideal data collection sites. By thoroughly discussing each of these factors, we trained YouthMappers to select fields best suited as in-situ training data for satellite imagery analysis.
The data collection was managed through KoboCollect, hosted on the KoboToolbox infrastructure, which provided an effective platform for gathering and organizing data. This approach enabled the collection of the desired volume of data points necessary for model training and evaluation, as summarized in Table 1.
Table 1. Desired data points and target crops by district.

                  Arusha                 Mwanza    Dodoma
Desired points:   300                    1000      800
Crops:            Maize                  Maize     Sorghum
                  Rice                   Cotton    Maize
                  Sorghum & Millet       Rice      Millet
                  Peanuts or Groundnut             Sunflower
                                                   Peanuts or Groundnut
                                                   Cotton Fields
Satellite Imagery

Satellite imagery was obtained from the Sentinel-2 satellite constellation, which provides high-resolution multispectral data at 10-meter spatial resolution. The imagery was acquired over the study area between January and August of 2023, during the growing season, capturing the spectral characteristics of different crop types and coinciding with field data collection. The Sentinel-2 L2 harmonized reflectance data were pre-processed to remove noise and atmospheric effects, ensuring that the spectral information was accurate and reliable for classification purposes (Bégué, Arvor, Bellon, Betbeder, De Abelleyra, P. D. Ferraz, et al. 2018).
In our study, cloud and cloud shadow contamination was mitigated using the 's2cloudless' machine learning model on the Google Earth Engine platform. Cloudy pixels were identified using a cloud probability mask, with pixels having a probability above 50% classified as clouds. To detect cloud shadows, we used the near-infrared (NIR) band to flag dark pixels not identified as water as potential shadow pixels. The projected shadows from the clouds were identified using a directional distance transform based on the solar azimuth angle from the image metadata. A combined cloud and shadow mask was refined through morphological dilation, creating a buffer zone to ensure comprehensive coverage. This mask was applied to the Sentinel-2 surface reflectance data to exclude all pixels identified as clouds or shadows, enhancing the reliability of the dataset for environmental analysis.
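The masking steps above can be reduced to the following NumPy sketch. This is a simplified stand-in for the Earth Engine implementation: the 50% cloud probability threshold follows the text, while the NIR darkness threshold and the fixed pixel shift (standing in for the azimuth-based directional distance transform) are assumed values.

```python
import numpy as np
from scipy.ndimage import binary_dilation

def cloud_shadow_mask(cloud_prob, nir, water, shadow_shift=(2, 2), buffer_px=1):
    """Build a combined cloud/shadow mask for one scene.

    cloud_prob : 2D array of s2cloudless-style cloud probabilities (0-100)
    nir        : 2D array of NIR reflectance (0-1)
    water      : 2D boolean array of known water pixels
    """
    clouds = cloud_prob > 50                 # probability threshold
    dark = (nir < 0.15) & ~water             # dark, non-water pixels
    # Project clouds along the solar-azimuth direction (here a fixed shift).
    projected = np.roll(clouds, shadow_shift, axis=(0, 1))
    shadows = dark & projected
    # Morphological dilation buffers the combined mask.
    return binary_dilation(clouds | shadows, iterations=buffer_px)
```

Pixels flagged True are dropped before compositing, so that only clear observations enter the monthly time series.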
Monthly composites were collected for January through August of 2023 for the bands B2 Blue (458-523 nm), B6 Vegetation Red Edge (733-738 nm), B8 Near Infrared (785-899 nm), B11 Short-Wave Infrared (SWIR) (1565-1655 nm), and B12 Short-Wave Infrared (2100-2280 nm). We also calculate the Enhanced Vegetation Index (EVI) and hue, the color spectrum value (Google, n.d.). This computed hue value provides the basic color as perceived on the color wheel, from red, through green and blue, and back to red, for each pixel. Due to the high prevalence of clouds in the region, linear interpolation was used to fill in missing data in the time series using xr_fresh (Mann, Michael L. 2024). These bands were selected based on their relevance to crop type classification and their ability to capture the unique spectral signatures of different crops. The monthly composites were used to generate time series features for each pixel in the study area, providing valuable information on the temporal dynamics of crop growth and development.
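For reference, both derived values follow standard formulas, sketched here for a single pixel. The red band required by EVI is assumed available alongside the listed bands, and the coefficients are the standard Sentinel-2/MODIS EVI constants.

```python
import colorsys

def evi(nir, red, blue):
    """Enhanced Vegetation Index (standard coefficients)."""
    return 2.5 * (nir - red) / (nir + 6.0 * red - 7.5 * blue + 1.0)

def hue(red, green, blue):
    """Hue in [0, 1): position on the color wheel from red through
    green and blue and back to red."""
    h, _, _ = colorsys.rgb_to_hsv(red, green, blue)
    return h

# Dense vegetation: high NIR, low red -> high EVI; green-dominant hue ~1/3.
print(round(evi(0.45, 0.05, 0.03), 2))  # ~0.66
print(round(hue(0.1, 0.8, 0.1), 2))     # ~0.33
```

Computed per month, these two values form the EVI and hue time series to which the feature extraction below is applied.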
Time Series Features

The resulting monthly time series provide information on the phenological patterns of different crops. We leverage the time series nature of the satellite imagery to extract relevant features for crop type classification.

In this study, we utilized the xr_fresh toolkit to compute detailed time-series statistics for various spectral bands, facilitating comprehensive pixel-by-pixel temporal analysis (Mann, Michael L. 2024). The xr_fresh framework is specifically designed to extract a wide array of statistical measures from time-series data, which are essential for understanding temporal dynamics in remote sensing datasets.
The metrics computed by xr_fresh in this study include basic statistical descriptors, changes over time, and distribution-based metrics, applied to each pixel's time series for selected spectral bands (B12, B11, hue, B6, EVI, and B2). The list of computed time-series statistics encompasses:

• Energy Measures: Absolute energy, which provides the sum of squares of the values.
• Change Metrics: Absolute sum of changes to quantify overall variability, mean absolute change, and mean change.
• Autocorrelation: Calculated for three lags (1, 2, and 3) to assess the serial dependence at different time intervals.
• Count Metrics: Count above and below mean, capturing the frequency of high and low values relative to the average.
• Extreme Values: Day of the year for maximum and minimum values, providing insight into seasonal patterns.
• Distribution Characteristics: Kurtosis, skewness, and quantiles (5th and 95th percentiles) to describe the shape and spread of the distribution.
• Variability Metrics: Standard deviation, variance, and whether variance is larger than standard deviation, to evaluate the dispersion of values.
• Complexity and Trend Analysis: Time series complexity and symmetry looking, adding depth to the analysis of temporal patterns.
For a full list of the time series features extracted in this study and their descriptions, please refer to the Appendix.
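A few of the listed statistics can be reproduced for a single pixel's series as follows. This is an illustrative NumPy reimplementation, not the xr_fresh API, and the feature names are paraphrased; the monthly EVI values are hypothetical.

```python
import numpy as np

def ts_features(x):
    """A handful of xr_fresh-style statistics for one pixel's time series."""
    x = np.asarray(x, dtype=float)
    diffs = np.diff(x)
    lag1 = np.corrcoef(x[:-1], x[1:])[0, 1]  # serial dependence, lag 1
    return {
        "absolute_energy": float(np.sum(x**2)),
        "absolute_sum_of_changes": float(np.sum(np.abs(diffs))),
        "mean_change": float(np.mean(diffs)),
        "autocorr_lag1": float(lag1),
        "count_above_mean": int(np.sum(x > x.mean())),
        "argmax": int(np.argmax(x)),          # index of seasonal maximum
        "quantile_q95": float(np.quantile(x, 0.95)),
        "variance_larger_than_std": bool(np.var(x) > np.std(x)),
    }

monthly_evi = [0.2, 0.25, 0.4, 0.6, 0.8, 0.7, 0.5, 0.3]  # Jan-Aug
feats = ts_features(monthly_evi)
print(feats["argmax"])  # 4 -> peak greenness in May
```

Applied across every pixel and band, these scalars replace the raw eight-month series as the model's input features.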
The integration of xr_fresh into our analytical workflow allowed for an automated and robust analysis of temporal patterns across the study area. By leveraging this toolkit, we could efficiently process large datasets, ensuring that each pixel's temporal dynamics were comprehensively characterized, which is critical for accurate environmental monitoring and change detection.
Data Extraction

To partially account for variation in field size, we extracted pixels based on a buffer around field point locations. This allows us to account for the fact that fields likely represent groups of adjacent pixels. Small fields were buffered by only 5 meters, medium fields by 10 m, and large fields by 30 m. This approach allowed us to capture the time series features from the surrounding area, providing a more comprehensive representation of the field's characteristics. The use of larger buffers was explored but found to decrease model performance, as fields tended to be heterogeneous, for instance containing patches of trees. To account for this in our modeling, we treat observations from the same field as a "group" in our cross-validation scheme, as described below.
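The buffered extraction can be sketched on a raster grid as follows. This is illustrative only; it assumes 10 m Sentinel-2 pixels and the small/medium/large buffer distances given above, with distances measured between pixel centers.

```python
import numpy as np

PIXEL_M = 10  # Sentinel-2 resolution in meters
BUFFER_M = {"small": 5, "medium": 10, "large": 30}

def pixels_in_buffer(row, col, field_size, shape):
    """Return (row, col) indices of pixels whose centers fall within the
    size-dependent buffer around a field point."""
    radius = BUFFER_M[field_size] / PIXEL_M  # radius in pixel units
    rr, cc = np.indices(shape)
    dist = np.hypot(rr - row, cc - col)
    return list(zip(*np.where(dist <= radius)))

# A 5 m buffer covers only the point's own pixel; 30 m spans neighbors.
print(len(pixels_in_buffer(10, 10, "small", (21, 21))))  # 1
print(len(pixels_in_buffer(10, 10, "large", (21, 21))))  # 29
```

Every returned pixel inherits the field's crop label and field id, which is why grouping by field in cross-validation matters.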
Several preprocessing steps were applied to the extracted features before crop classification. Notably, features were centered and scaled using the scikit-learn library to normalize the data, followed by the application of a variance threshold method to reduce dimensionality by excluding features with low variance (Pedregosa et al. 2011).
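In scikit-learn terms, these two steps can be sketched as a pipeline. The variance cutoff here is an assumed value; the paper does not state the threshold used.

```python
import numpy as np
from sklearn.feature_selection import VarianceThreshold
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Center/scale features, then drop near-constant ones.
prep = Pipeline([
    ("scale", StandardScaler()),
    ("variance", VarianceThreshold(threshold=1e-8)),
])

X = np.array([[1.0, 5.0, 2.0],
              [2.0, 5.0, 8.0],
              [3.0, 5.0, 4.0]])  # middle column is constant
Xt = prep.fit_transform(X)
print(Xt.shape)  # constant column removed -> (3, 2)
```

After scaling, non-constant columns have unit variance and constant columns have zero variance, so the threshold cleanly removes only the uninformative features.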
We employ Optuna, an optimization framework, to conduct systematic model selection and hyperparameter tuning (Akiba et al. 2019). Our methodology involved defining a study using Optuna, where each trial proposes a set of model parameters aimed at optimizing performance metrics. Specifically, we used stratified group k-fold cross-validation with the number of splits set to three, ensuring that samples from the same field were not split across training and validation sets, to prevent data leakage. The scoring metric utilized is the kappa statistic, chosen for its suitability in evaluating models on imbalanced datasets.

This approach allows us to rigorously evaluate and compare different classifiers, including LightGBM, Support Vector Classification (SVC), and RandomForest, and their configurations under a variety of conditions. The final selection of the model and its parameters was based on the ability to maximize the kappa statistic, ensuring that the chosen model provided the best possible performance for the classification of land cover types in our dataset.
To interpret model predictions, we use SHapley Additive exPlanations (SHAP) (Lundberg and Lee 2017). This approach, based on game theory, quantifies the impact of each feature on the prediction outcome, providing insights into which features are most influential in determining land cover types.
In our feature selection process, we incorporate both the mean and maximum SHAP values to comprehensively assess the influence of features on model predictions. The mean of the absolute SHAP values across all samples provides a measure of the average impact of each feature, highlighting its overall importance across the dataset. This approach emphasizes features that consistently affect the model's output but might underrepresent the significance of features causing substantial impacts under specific conditions. To address this, we also consider the maximum absolute SHAP values. Sorting features by their maximum absolute SHAP values allows us to identify those that have significant, albeit possibly infrequent, effects on individual predictions. This method ensures that features crucial for particular scenarios are not overlooked, thus offering a more nuanced understanding of feature importance that balances general influence with critical, situation-specific impacts.

Feature selection, then, is the union of the top 30 time-series features found with both the mean and maximum SHAP values, resulting in 33 total features. This approach ensures that the selected features are both consistently influential across the dataset and capable of exerting substantial impacts under specific conditions, providing a comprehensive set of features for model training and evaluation.
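The union-of-top-k selection can be sketched as follows. This is illustrative NumPy only, operating on a toy SHAP matrix rather than output from the shap library; for multi-class models the absolute values would first be aggregated across classes.

```python
import numpy as np

def select_features(shap_values, names, k=30):
    """Union of the top-k features ranked by mean |SHAP| and by max |SHAP|.
    shap_values: (n_samples, n_features) array of SHAP values."""
    a = np.abs(shap_values)
    top_mean = np.argsort(a.mean(axis=0))[::-1][:k]
    top_max = np.argsort(a.max(axis=0))[::-1][:k]
    keep = sorted(set(top_mean) | set(top_max))
    return [names[i] for i in keep]

# Toy case: f2 matters on average, f0 only in one extreme sample.
sv = np.array([[0.0, 0.3, 0.5],
               [0.0, 0.3, 0.6],
               [0.9, 0.3, 0.4]])
print(select_features(sv, ["f0", "f1", "f2"], k=1))  # ['f0', 'f2']
```

With k=1, the union keeps both the consistently important feature (f2) and the occasionally decisive one (f0), which is exactly why the union can exceed k (30 features per ranking yielding 33 in total).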
Crowdsourcing involves gathering data from a large number of volunteers or citizen scientists, who provide valuable ground truth information. This method has proven especially useful in areas where traditional data collection methods are challenging due to logistical, financial, or infrastructural constraints.
By leveraging the YouthMappers student organization, with over 420 chapters in 80 countries, we were able to collect a large dataset of crop type observations in Tanzania. Participating YouthMappers chapters included: the Institute of Rural Development Planning - Dodoma, the Institute of Rural Development Planning - Mwanza, the University of Dodoma, the Nelson Mandela African Institution of Science and Technology, and the Institute of Accountancy Arusha. Moreover, this exercise provided an important opportunity for students to gain practical experience in data collection, analysis, and interpretation, contributing to their professional development and capacity building in the geospatial domain.
Challenges and Lessons Learned

There were a number of challenges involved with planning and implementing a large-scale field operation. One of the primary challenges encountered was the variability in crop cycles across different fields, and crop identification more generally. This was particularly true in Arusha, where fields were found in almost every stage of crop development and some fields were visited before the reproductive stages, as the drought delayed planting. In other regions, some crops had already been harvested. This discrepancy resulted in incomplete datasets, as certain crop types were missing or not easily accounted for. The absence of these crops in certain areas impacted our modeling efforts by reducing the representativeness of the training data. Second, although the YouthMappers teams did a commendable job, crop identification is challenging for non-agricultural experts. This task was even more challenging given the heterogeneity of local planting practices and the similarity of early-stage growth between, for instance, crops like maize and sorghum. To mitigate this issue, YouthMappers teams took detailed photos of each field. These images provided us the ability to verify crop types remotely before training the model. While extremely useful, the collection of more detailed single-plant images could have helped us minimize the removal of some observations. Third, the site selection depended on many non-crop-related factors, including where the training could be hosted and time constraints based on YouthMappers students' academic calendars. This led to changes in which target crops were selected. Fourth, travel was finally approved during a drought year. This is helpful for transportation during fieldwork, yet poses a challenge, as more fields can be abandoned, harvested early, or otherwise found in poor condition. To mitigate many of these issues, future data collection efforts should allow for more flexibility in the timing of data collection and ensure coverage that reflects local crop cycles.
The distribution of observations across land cover classes is presented in Figure 3. The dataset consists of a diverse range of land cover types, each contributing differently to the total number of observations. Maize is the most prevalent land cover type, accounting for the highest percentage of the observations, followed by rice and sunflower. This is indicative of the agricultural dominance in the region being studied. Less common land covers, such as millet, sorghum, and urban areas, represent intermediate percentages, reflecting the heterogeneous landscape that includes both agricultural and urbanized zones.
Figure 3: Land Cover by Percentage of Observations
1 Feature Importance
The interpretation of model behavior using SHAP values has allowed for a deeper understanding of how different spectral features impact the model's predictions, which is critical for refining the feature selection process. By analyzing both the mean and maximum SHAP values, we were able to prioritize features based on their overall impact as well as their critical contributions to specific model decisions.
In the two summary plots below, we display the SHAP values for each feature to identify how much impact each feature has on the model output for pixels in the validation dataset. Features are sorted by the sum of the SHAP values across all samples. Each figure's bar lengths represent the mean contributions to explaining each predicted land class value, with different land classes represented by different colors (hues). This visualization provides a comprehensive overview of feature importance, highlighting the key predictors that drive the model's predictions. For example, features that are highly influential for "maize" may not be as impactful for "rice" or "sorghum", reflecting the unique spectral signatures of these crops.
rin
Mean SHAP Values. In Figure 4, the mean SHAP values provide insights into the average impact of each feature across all predictions. This analysis highlights the features that consistently influence the model's output across various scenarios. For example, the mean value of B11 (B11.mean) and the 5th percentile of hue (hue.quantile.q.0.05) were found to have substantial average impacts on model outputs, suggesting their strong relevance in distinguishing between different crop types. Reflecting on the colors of the bars, we can see that 'B11.mean' is important in distinguishing sunflower, sorghum, and millet to a roughly equal degree, and has some small impact on distinguishing other classes, while 'hue.quantile.q.0.05' has its strongest effect distinguishing rice, sunflower, and, to a lesser degree, cotton. Looking down the list, we can see that features like 'EVI.standard.deviation' are most effective at isolating urban areas, and 'B12.mean.second.derivative.central' substantively differentiates shrub from other classes. Note that the mean second derivative of B12 is a measure of the rate of change of the rate of change of the B12 band over time, so positive values indicate an increasing rate of change (an increasingly upward trend) and negative values a decreasing rate of change (an increasingly downward trend).
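To make the statistic concrete, here is a minimal numpy version of the mean second derivative central (named and normalized as in the appendix table); the example series is invented.

```python
import numpy as np

def mean_second_derivative_central(x):
    """Mean of the halved centered second difference,
    (x[i+2] - 2*x[i+1] + x[i]) / 2, i.e. Sum/(2(n-2)):
    a measure of the average acceleration of the series."""
    x = np.asarray(x, dtype=float)
    return ((x[2:] - 2 * x[1:-1] + x[:-2]) / 2).mean()

# A series curving upward gives a positive value; a straight line gives zero.
print(mean_second_derivative_central([0, 1, 4, 9, 16]))  # -> 1.0
print(mean_second_derivative_central([0, 2, 4, 6, 8]))   # -> 0.0
```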
Figure 4: Mean SHAP Values
Maximum SHAP Values. On the other hand, in Figure 5 the maximum SHAP values uncover features that, while perhaps not consistently influential, have high impacts under particular conditions. This aspect of the analysis is crucial for identifying features that can cause significant shifts in model output, potentially corresponding to specific agricultural or environmental contexts. Features such as 'hue.median' and 'B11.maximum' show high maximum SHAP values, indicating their pivotal roles in determining certain classes. For instance, 'B11.maximum' reflects peak reflectance in the shortwave infrared (SWIR), which could be critical in identifying crops at their maximum biomass, like sunflower at full bloom compared to other crops at different stages of growth.
Figure 5: Maximum SHAP Values
The final selection of features for model training was carefully curated to include the 30 features with the highest mean SHAP values and the 30 with the highest maximum SHAP values, ensuring a comprehensive set of predictors for accurate and reliable classification of crop types in Tanzania. This strategic selection process not only improved model accuracy but also enhanced our understanding of the spectral characteristics most relevant for distinguishing among the diverse agricultural landscapes of the region.
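This selection rule can be sketched as follows; `shap_vals` and the feature names are hypothetical stand-ins, and the function simply keeps the union of the top-k features by mean and by maximum absolute SHAP value.

```python
import numpy as np

def select_top_features(shap_vals, feature_names, k=30):
    """Union of the k features with the highest mean |SHAP| and the k
    with the highest maximum |SHAP|, across all samples (and classes)."""
    a = np.abs(np.asarray(shap_vals, dtype=float))
    if a.ndim == 3:  # (samples, features, classes) -> (samples*classes, features)
        a = a.transpose(0, 2, 1).reshape(-1, a.shape[1])
    top_mean = np.argsort(a.mean(axis=0))[::-1][:k]
    top_max = np.argsort(a.max(axis=0))[::-1][:k]
    keep = sorted(set(top_mean.tolist()) | set(top_max.tolist()))
    return [feature_names[i] for i in keep]

# Tiny invented example: "d" spikes occasionally, "a" matters consistently.
sv = np.array([[1.0, 0.0, 0.0, 0.0],
               [1.0, 0.0, 0.0, 5.0],
               [1.0, 0.5, 0.0, 0.0]])
print(select_top_features(sv, ["a", "b", "c", "d"], k=2))  # -> ['a', 'd']
```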
Model Selection
Optuna tuning trials (Akiba et al. 2019) selected LightGBM (Ke et al. 2017) as the final model. LightGBM is a gradient boosting algorithm that combines many simple decision trees to produce a stronger single model, improving the model at each step. LightGBM grows decision trees leaf-wise rather than level-wise, thereby targeting the branches that most need refining. Here we find an optimal bagging fraction of approximately 0.58, a bagging frequency of 3, a learning rate of 0.025, a maximum depth of 35, a minimum of 154 data points in each leaf, and the number of leaves set at 51.
Model Performance
The classification model demonstrated robust performance across multiple land cover classes, as evidenced by the out-of-sample mean confusion matrix, with a Cohen's Kappa score of 0.800 and an F1-micro score of 0.822 (Table 2), indicating substantial agreement between predicted and actual classifications. Recall that each field is treated as a 'group' in the group k-fold procedure to ensure that pixels from the same field are not split between the testing and training groups. The confusion matrix (Figure 6) shows high diagonal values for most classes, highlighting the model's ability to accurately identify specific land covers. For instance, the rice and urban categories achieved classification accuracies of 90% and 94%, respectively. Other well-classified categories included millet, maize, sunflower, tidal, water, shrubs, and forest, each with over 70% accuracy. However, forest is primarily confused with the shrub category, which is likely a result of poor training data and the difficulty of visually distinguishing trees from shrubs in high-resolution imagery without the benefit of field visits.
Categories such as sorghum, sunflower, and cotton displayed moderate confusion with other classes, indicating potential areas for model improvement, especially in distinguishing features that are common between similar crop types. Notably, the 'other' category showed a broader distribution of misclassifications, achieving a lower accuracy of 40%, likely because it encompasses a diverse range of less frequent land covers. However, this class is not relevant to the objectives of this paper.
Figure 6: Out-of-Sample Confusion Matrix
The overall high out-of-sample performance in Table 2 across the majority of categories suggests that the model is effective for practical applications in land cover classification, though further refinement is recommended for categories showing lower accuracy and higher misclassification rates.
We can compare our results across multiple models using Figure 7, reproduced from Kerner et al. (2024). This plot presents multiple performance metrics of land cover models that include an 'agricultural' category specifically for Tanzania. Our model's performance is indicated by the dashed line. The high level of performance, particularly for the more challenging F1 score, is not surprising given that our model is
Metric              Value
Balanced Accuracy   0.79
Cohen's Kappa       0.80
Accuracy            0.82
F1-micro            0.82

Table 2: Summary of Classification Metrics
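The four metrics in Table 2 can be computed directly with scikit-learn (Pedregosa et al. 2011); the label vectors below are invented for illustration. Note that for single-label classification the micro-averaged F1 score is mathematically identical to overall accuracy, which is consistent with those two rows of Table 2 agreeing.

```python
import numpy as np
from sklearn.metrics import (accuracy_score, balanced_accuracy_score,
                             cohen_kappa_score, f1_score)

# Hypothetical true/predicted labels standing in for out-of-sample results.
y_true = np.array([0, 0, 1, 1, 2, 2, 2, 1])
y_pred = np.array([0, 1, 1, 1, 2, 2, 0, 1])

metrics = {
    "Balanced Accuracy": balanced_accuracy_score(y_true, y_pred),
    "Cohen's Kappa": cohen_kappa_score(y_true, y_pred),
    "Accuracy": accuracy_score(y_true, y_pred),
    "F1-micro": f1_score(y_true, y_pred, average="micro"),
}
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```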
specifically trained on Tanzanian data, while the other models are typically global or regional models. On the other hand, most of these models include only a single 'agricultural' class, meaning their prediction task is significantly easier than the one presented here. Given this, our strong out-of-sample performance is notable.
Figure 7: Land cover model performance metrics for Tanzania; the dashed line indicates this paper's model's out-of-sample performance across all land covers
The integration of crowdsourced data with traditional machine learning and engineered time-series features yielded a robust model for crop classification in Tanzania. While the model performed exceptionally well for crops like maize and rice, some confusion persisted among similar crop types such as sorghum and cotton. This suggests that additional discriminative features or more extensive training data may be necessary to further enhance classification accuracy for these crops. The challenges encountered, such as variability in crop cycles and the difficulty of crop identification, highlight the complexities of agricultural monitoring in resource-limited settings. Addressing these issues in future research could improve model performance and generalizability. Overall, our findings demonstrate the practicality of using efficient, interpretable machine learning methods in conjunction with community-driven data collection to advance agricultural monitoring in developing regions.
Conclusion
ed
2 In this study, we introduced a novel methodology for crop type classification in Tanzania by leveraging crowd-
3 sourced data and time-series features extracted from Sentinel-2 satellite imagery. By combining advanced
4 remote sensing techniques with local knowledge, we addressed significant gaps in agricultural monitoring
5 within resource- and data-limited settings. Our approach gathered a new dataset and successfully applied it
to a real-world task at very low cost, using traditional machine learning algorithms augmented with carefully
iew
6
ev
13 outperforms broadly used land cover models that perform the simpler task of classifying ‘agriculture’ without
14 specifying the crop type. This highlights the need for better and more frequent crop type classification data.
By interpreting feature importance using SHAP values, we gained a deeper understanding of the model's behavior and the key predictors driving its predictions. Identifying the most influential features across different land cover types allowed us to refine the feature selection process, ensuring that the selected features capture the most discriminative information for separating crop types. More broadly, this work demonstrates the value of traditional machine learning augmented with carefully engineered time-series features for crop type classification in data-scarce environments. By "turning back the clock" on deep learning, we demonstrate that applying a limited yet salient set of filters, such as measures of trends, distribution descriptions, and complexity metrics, can capture essential temporal dynamics without the extensive data requirements of deep learning models. This methodology not only achieved high classification accuracy but also enhanced interpretability and computational efficiency.
Our findings highlight that traditional machine learning techniques, combined with advanced yet computationally efficient feature extraction methods, offer a practical and effective alternative to deep learning, particularly in the low-information settings prevalent in developing regions. This approach facilitates accurate crop type classification and contributes valuable insights for sustainable agricultural practices and informed policy-making, ultimately impacting food security and land management in resource-limited contexts.
Appendix

Acknowledgments

The United States Agency for International Development generously supports this program through a grant from the USAID GeoCenter under Award #AID-OAA-G-15-00007 and Cooperative Agreement Number 7200AA18CA00015.
Time Series Features Description
The following table provides a comprehensive list of the time series features extracted from the satellite imagery using the xr_fresh module. These features capture the temporal dynamics of crop growth and development, providing valuable information on the phenological patterns of different crops. The computed metrics encompass a wide range of statistical measures, changes over time, and distribution-based metrics, offering a detailed analysis of the temporal patterns in the study area.
| Statistic | Description | Equation |
| --- | --- | --- |
| Absolute Sum of Changes | Sum of the absolute values of consecutive changes in the series | $\sum_{i=1}^{n-1} \lvert x_{i+1} - x_i \rvert$ |
| Autocorrelation (1 & 2 month lag) | Correlation between the time series and its lagged values | $\frac{1}{(n-l)\sigma^2} \sum_{t=1}^{n-l} (X_t - \mu)(X_{t+l} - \mu)$ |
| Count Above Mean | Number of values above the mean | — |
| Linear Time Trend | Linear trend (slope) coefficient estimated over the entire time series | $b = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(t_i - \bar{t})}{\sum_{i=1}^{n} (t_i - \bar{t})^2}$ |
| Longest Strike Above Mean | Longest consecutive sequence of values above the mean | — |
| Mean | Mean value of the time series | $\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i$ |
| Mean Absolute Change | Mean of the absolute differences between consecutive values | $\frac{1}{n-1} \sum_{i=1}^{n-1} \lvert x_{i+1} - x_i \rvert$ |
| Mean Change | Mean of the differences between consecutive values | $\frac{1}{n-1} \sum_{i=1}^{n-1} (x_{i+1} - x_i)$ |
| Mean Second Derivative Central | Measure of the acceleration of changes in the time series | $\frac{1}{2(n-2)} \sum_{i=1}^{n-2} (x_{i+2} - 2x_{i+1} + x_i)$ |
| Median | Median value of the time series | $\tilde{x}$ |
| Minimum | Minimum value of the time series | $x_{\min}$ |
| Quantile (q = 0.05, 0.95) | Values at the specified quantiles (5th and 95th percentiles) | $Q_{0.05},\; Q_{0.95}$ |
| Ratio Beyond r Sigma (r = 1, 2, 3) | Proportion of values beyond r standard deviations from the mean | $P_r = \frac{1}{n} \sum_{i=1}^{n} \mathbf{1}\left(\lvert x_i - \bar{x} \rvert > r\sigma_x\right)$ |
| Skewness | Measure of the asymmetry of the time series distribution | $\frac{n}{(n-1)(n-2)} \sum_{i=1}^{n} \left(\frac{x_i - \bar{x}}{s}\right)^3$ |
| Standard Deviation | Standard deviation of the time series | $\sigma_x = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})^2}$ |
| Sum Values | Sum of all values in the time series | $S = \sum_{i=1}^{n} x_i$ |
| Symmetry Looking | Measures the similarity of the time series when flipped horizontally | $\lvert \bar{x} - \tilde{x} \rvert < r \, (x_{\max} - x_{\min})$ |
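Several of the tabulated statistics can be sketched in plain numpy for a single pixel's time series (in the paper these are computed over whole rasters with the xr_fresh package (Mann 2024)); the NDVI-like example series is invented.

```python
import numpy as np

def time_series_features(x):
    """A few of the tabulated statistics for one pixel's band time series."""
    x = np.asarray(x, dtype=float)
    mean, std = x.mean(), x.std()
    t = np.arange(x.size)
    return {
        "absolute_sum_of_changes": np.abs(np.diff(x)).sum(),
        "count_above_mean": int((x > mean).sum()),
        "linear_trend": np.polyfit(t, x, 1)[0],     # slope b from the table
        "mean_abs_change": np.abs(np.diff(x)).mean(),
        "quantile_q05": np.quantile(x, 0.05),
        "ratio_beyond_1_sigma": (np.abs(x - mean) > std).mean(),
    }

# e.g. a green-up then senescence curve over one growing season
ndvi = np.array([0.2, 0.3, 0.5, 0.7, 0.8, 0.6, 0.4, 0.3])
print(time_series_features(ndvi))
```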
References
Akiba, Takuya, Shotaro Sano, Toshihiko Yanase, Takeru Ohta, and Masanori Koyama. 2019. "Optuna: A Next-Generation Hyperparameter Optimization Framework." In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining.
Bégué, Agnès, Damien Arvor, Beatriz Bellon, Julie Betbeder, Diego De Abelleyra, Rodrigo P. D. Ferraz, Valentine Lebourgeois, Camille Lelong, Margareth Simões, and Santiago R. Verón. 2018. "Remote Sensing and Cropping Practices: A Review." Remote Sensing 10 (1): 99. https://doi.org/10.3390/rs10010099.
Chao, Steven, Ryan Engstrom, Michael Mann, and Adane Bedada. 2021. "Evaluating the Ability to Use Contextual Features Derived from Multi-Scale Satellite Imagery to Map Spatial Patterns of Urban Attributes and Population Distributions." Remote Sensing 13 (19): 3962.
Christ, Maximilian, Nils Braun, Julius Neuffer, and Andreas W. Kempa-Liehr. 2018. "Time Series Feature Extraction on Basis of Scalable Hypothesis Tests (Tsfresh – a Python Package)." Neurocomputing 307: 72–77.
Delince, J., G. Lemoine, P. Defourny, J. Gallego, A. Davidson, S. Ray, O. Rojas, J. Latham, and F. Achard. 2017.
FAS. n.d. "Tanzania Production." USDA. https://ipad.fas.usda.gov/countrysummary/default.aspx?id=TZ.
Google. n.d. "ee.Image.rgbToHsv." Google Earth Engine. https://developers.google.com/earth-engine/apidocs/ee-image-rgbtohsv.
Graesser, Jordan, Anil Cheriyadat, Ranga Raju Vatsavai, Varun Chandola, Jordan Long, and Eddie Bright. 2012. "Image Based Characterization of Formal and Informal Neighborhoods in an Urban Landscape." IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing 5 (4): 1164–76.
Hersh, Jonathan, Ryan Engstrom, and Michael Mann. 2021. "Open Data for Algorithms: Mapping Poverty in Belize Using Open Satellite Derived Features and Machine Learning." Information Technology for Development.
Huang, Xin, Liangpei Zhang, and Pingxiang Li. 2007. "Classification and Extraction of Spatial Features in Urban Areas Using High-Resolution Multispectral Imagery." IEEE Geoscience and Remote Sensing Letters 4 (2): 260–64.
Ibrahim, Esther Shupel, Philippe Rufin, Leon Nill, Bahareh Kamali, Claas Nendel, and Patrick Hostert. 2021. "Mapping Crop Types and Cropping Systems in Nigeria with Sentinel-2 Imagery." Remote Sensing 13 (17): 3523.
Ke, Guolin, Qi Meng, Thomas Finley, Taifeng Wang, Wei Chen, Weidong Ma, Qiwei Ye, and Tie-Yan Liu. 2017. "LightGBM: A Highly Efficient Gradient Boosting Decision Tree." Advances in Neural Information Processing Systems 30.
Kerner, Hannah, Catherine Nakalembe, Adam Yang, Ivan Zvonkov, Ryan McWeeny, Gabriel Tseng, and Inbal Becker-Reshef. 2024. "How Accurate Are Existing Land Cover Maps for Agriculture in Sub-Saharan Africa?" Scientific Data 11 (1): 486.
Li, Haijun, Xiao-Peng Song, Matthew C. Hansen, Inbal Becker-Reshef, Bernard Adusei, Jeffrey Pickering, Li Wang, et al. 2023. "Development of a 10-m Resolution Maize and Soybean Map over China: Matching Satellite-Based Crop Classification with Sample-Based Area Estimation." Remote Sensing of Environment 294: 113623.
Li, Yansheng, Xinwei Li, Yongjun Zhang, Daifeng Peng, and Lorenzo Bruzzone. 2023. "Cost-Efficient Information Extraction from Massive Remote Sensing Data: When Weakly Supervised Deep Learning Meets Remote Sensing Big Data." International Journal of Applied Earth Observation and Geoinformation 120: 103345. https://doi.org/10.1016/j.jag.2023.103345.
Lundberg, Scott M., and Su-In Lee. 2017. "A Unified Approach to Interpreting Model Predictions." In Advances in Neural Information Processing Systems 30, edited by I. Guyon, U. V. Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, and R. Garnett, 4765–74. Curran Associates, Inc. http://papers.nips.cc/paper/7062-a-unified-approach-to-interpreting-model-predictions.pdf.
Ma, Lei, Yu Liu, Xueliang Zhang, Yuanxin Ye, Gaofei Yin, and Brian Alan Johnson. 2019. "Deep Learning in Remote Sensing Applications: A Meta-Analysis and Review." ISPRS Journal of Photogrammetry and Remote Sensing 152: 166–77. https://doi.org/10.1016/j.isprsjprs.2019.04.015.
Mann, Michael L. 2024. "xr_fresh: Python Package for Feature Extraction from Raster Data Time Series." Zenodo. https://doi.org/10.5281/zenodo.12701466.
Maxwell, Aaron E., Timothy A. Warner, and Luis Andrés Guillén. 2021. "Accuracy Assessment in Convolutional Neural Network-Based Deep Learning Remote Sensing Studies—Part 2: Recommendations and Best Practices." Remote Sensing 13 (13). https://doi.org/10.3390/rs13132591.
Morton, Douglas C., Ruth S. DeFries, Yosio E. Shimabukuro, Liana O. Anderson, Egidio Arai, Fernando del Bon Espirito-Santo, Ramon Freitas, and Jeff Morisette. 2006. "Cropland Expansion Changes Deforestation Dynamics in the Southern Brazilian Amazon." Proceedings of the National Academy of Sciences 103 (39): 14637–41.
Owusu, Maxwell, Ryan Engstrom, Dana Thomson, Monika Kuffer, and Michael L. Mann. 2023. "Mapping Deprived Urban Areas Using Open Geospatial Data and Machine Learning in Africa." Urban Science 7 (4). https://doi.org/10.3390/urbansci7040116.
Owusu, Maxwell, Arathi Nair, Amir Jafari, Dana Thomson, Monika Kuffer, and Ryan Engstrom. 2024. "Towards a Scalable and Transferable Approach to Map Deprived Areas Using Sentinel-2 Images and Machine Learning." Computers, Environment and Urban Systems 109: 102075.
Pedregosa, F., G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, et al. 2011. "Scikit-Learn: Machine Learning in Python." Journal of Machine Learning Research 12: 2825–30.
Pesaresi, Martino, and Jon Atli Benediktsson. 2001. "A New Approach for the Morphological Segmentation of High-Resolution Satellite Imagery." IEEE Transactions on Geoscience and Remote Sensing 39 (2): 309–20.
Teixeira, Igor, Raul Morais, Joaquim J. Sousa, and António Cunha. 2023. "Deep Learning Models for the Classification of Crops in Aerial Imagery: A Review." Agriculture 13 (5). https://doi.org/10.3390/agriculture13050965.
Tseng, Gabriel, Ivan Zvonkov, Catherine Lilian Nakalembe, and Hannah Kerner. 2021. "CropHarvest: A Global Dataset for Crop-Type Classification."
Yang, Zhongguo, Irshad Ahmed Abbasi, Elfatih Elmubarak Mustafa, Sikandar Ali, and Mingzhu Zhang. 2021. "An Anomaly Detection Algorithm Selection Service for IoT Stream Data Based on Tsfresh Tool and Genetic Algorithm." Security and Communication Networks 2021 (1): 6677027.