Base Paper
Traditional Machine Learning & Crowd Sourcing
Abstract
This study introduces a novel methodology for crop type classification in Tanzania by integrating crowdsourced data with time-series features extracted from Sentinel-2 satellite imagery. Leveraging the YouthMappers network, we collected ground validation data on various crops, including challenging types such as cassava, millet, sunflower, sorghum, and cotton, across a range of agricultural areas. Traditional machine learning algorithms, augmented with carefully engineered time-series features, were employed to map the different crop classes. Our approach achieved high classification accuracy, evidenced by a Cohen's Kappa score of 0.80 and an F1-micro score of 0.82. The model often matches or outperforms broadly used land cover models, which simply classify 'agriculture' without specifying crop types. By interpreting feature importance using SHAP values, we identified key time-series features driving the model's performance, enhancing both interpretability and reliability. Our findings demonstrate that traditional machine learning techniques, combined with computationally efficient feature extraction methods, offer a practical and effective "lite learning" approach for mapping crop types in data-scarce environments. This methodology facilitates accurate crop type classification using a low-cost, resource-limited approach that contributes valuable insights for sustainable agricultural practices and informed policy-making, ultimately impacting food security and land management in the region.
This preprint research paper has not been peer reviewed. Electronic copy available at: https://ssrn.com/abstract=5090897
Introduction

Background and Context
The free access to remotely sensed data, such as imagery from satellites (e.g. Sentinel-2, Landsat), has allowed for crop type classification in developing countries. By leveraging the power of advanced imaging technologies combined with machine learning algorithms, researchers and practitioners can now identify and map different crop types over large geographic areas at no or low cost (Hersh, Engstrom, and Mann 2021). This has the potential to improve food security, land use planning, and agricultural policy in regions where ground-based data collection is limited or non-existent (Bégué, Arvor, Bellon, Betbeder, De Abelleyra, P. D. Ferraz, et al. 2018; H. Li et al. 2023; Ibrahim et al. 2021).

In recent years, machine learning approaches have emerged as powerful tools for crop type classification using remotely sensed data. Specifically, methods based on machine learning algorithms have gained recognition for their effectiveness in matching valuable spectral information from satellite imagery to observations of crop type for particular locations. Machine learning algorithms, including decision trees, random forests, support vector machines (SVM), and k-nearest neighbors (KNN), have been successfully used to classify imagery into unique agricultural types (Ibrahim et al. 2021; Bégué, Arvor, Bellon, Betbeder, De Abelleyra, P. D. Ferraz, et al. 2018; Delince et al. 2017). These algorithms leverage the rich spectral information captured by satellite sensors, allowing them to identify distinctive patterns associated with different crop types. By training on large labeled datasets, where ground-validation information on crop types is linked to corresponding image pixels, these models can effectively learn the relationships between the spectral characteristics of crops and their respective classes (Bégué, Arvor, Bellon, Betbeder, De Abelleyra, P. D. Ferraz, et al. 2018).
The strength of traditional machine learning approaches lies in their ability to exploit both the spectral and time-series patterns within the remotely sensed data. Traditional machine learning approaches offer advantages in terms of interpretability and computational efficiency compared to deep learning architectures. They provide insight into the decision-making process and can be more readily understood and explained by domain experts. Additionally, these methods are generally less computationally demanding and require less training data, making them suitable for applications with limited computational resources (Höhl et al. 2024; Maxwell, Warner, and Guillén 2021; Teixeira et al. 2023; Y. Li et al. 2023; Ma et al. 2019).
Traditional machine learning algorithms require the extraction of variables (e.g. max EVI, mean B2) that can help distinguish different plant or crop types (Bégué, Arvor, Bellon, Betbeder, De Abelleyra, P. D. Ferraz, et al. 2018). The development of salient time-series features to capture phenological differences between locations from remotely sensed images remains a challenge. These features are typically derived from the spectral bands (e.g. red edge, NIR) of the satellite imagery or indexes, such as the enhanced vegetation index (EVI), and basic time-series statistics (e.g. mean, max, minimum, slope) for the growing season (Morton et al. 2006). Meanwhile, a broader set of time-series statistics from bands or indexes may be more relevant for a number of applications. For instance, the skewness of EVI might help distinguish crops that green up earlier vs. later in the season, and measures of the number of peaks in EVI might help differentiate intercropping or multiple plantings in a season (Bégué, Arvor, Bellon, Betbeder, De Abelleyra, P. D. Ferraz, et al. 2018). However, the selection and extraction of these features can be time-consuming and labor-intensive.
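To make these phenology cues concrete, the sketch below computes two such features from a hypothetical monthly EVI series using SciPy. This is an illustration of the idea only, not the paper's implementation; the 0.1 prominence threshold and the example EVI values are assumed.

```python
import numpy as np
from scipy.signal import find_peaks
from scipy.stats import skew

def evi_season_features(evi):
    """Summarize a pixel's monthly EVI series with two phenology cues:
    skewness (early vs. late green-up) and number of seasonal peaks
    (single vs. multiple plantings)."""
    evi = np.asarray(evi, dtype=float)
    # Peaks must rise meaningfully above their surroundings to count.
    peaks, _ = find_peaks(evi, prominence=0.1)
    return {"skewness": float(skew(evi)), "n_peaks": int(len(peaks))}

# A single late-peaking season vs. a double-cropped season (Jan-Aug).
single = [0.2, 0.2, 0.3, 0.4, 0.6, 0.8, 0.5, 0.3]
double = [0.2, 0.5, 0.8, 0.4, 0.3, 0.6, 0.9, 0.4]
print(evi_season_features(single)["n_peaks"])  # 1 peak
print(evi_season_features(double)["n_peaks"])  # 2 peaks
```

A classifier can then use the peak count to separate single plantings from double cropping even when the two pixels have similar mean EVI.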
Recurrent neural networks (RNNs), in contrast, are deep learning architectures designed for modeling sequential data such as time series, speech, text, and audio. The fundamental feature of RNNs is their ability to maintain a 'memory' of previous inputs by using their internal state (hidden layers), which allows them to exhibit dynamic temporal behavior. RNNs and their variants allow the integration of time-series imagery, significantly improving crop type classification outcomes, especially in data-rich environments (Teixeira et al. 2023; Camps-Valls et al. 2021). Deep learning approaches, however, typically require much larger sets of training data, may be more prone to overfitting especially with small sample sizes, have significant limitations to interpretability, and require expensive compute (Höhl et al. 2024; Maxwell, Warner, and Guillén 2021; Teixeira et al. 2023; Y. Li et al. 2023; Ma et al. 2019). Although recent efforts have closed the gap (e.g. Tseng et al. 2021), the lack of readily available and reliable ground truth data or benchmark datasets for training, as discussed earlier, may limit the applicability of deep learning for a variety of tasks including crop classification, and make researchers more reliant on less reliable techniques like transfer learning or zero-shot or low-shot methods (Owusu et al. 2024; Y. Li et al. 2023; Ma et al. 2019). Moreover, training data for extreme events, like crop losses, disease, and lodging, are largely non-existent. Interpretability is also a salient weakness, as interpretation of models allows us to gain scientific insight and assess trustworthiness and fairness insofar as outputs affect policy decisions.
An alternative approach turns back the clock on deep learning approaches. For instance, CNN classifiers, through the exertion of tremendous effort by GPUs, can apply and learn from thousands of filters or convolutions that help detect distinct features like edges, textures, or patterns. It is, however, possible to apply a more limited yet salient set of filters, like Fourier transforms, Differential Morphological Profiles (Pesaresi and Benediktsson 2001), Line Support Regions, or Structural Feature Sets (Huang, Zhang, and Li 2007), amongst others, to images and then use these as features in more traditional machine learning approaches (Engstrom, Hersh, and Newhouse 2022). This approach may be particularly useful in data-scarce environments, requiring less training data and potentially offering more efficient results in low-information settings. The same approach has been taken for time series analysis, where instead of learning patterns through an RNN's memory, we can apply a more limited but potentially salient series of time-series filters. Measures of trends, descriptions of distributions, or measures of change and complexity might adequately describe time series properties for regression and classification tasks (Christ et al. 2018; Yang et al. 2021). This time-series filter approach, developed for this paper, can also be applied on a pixel-by-pixel basis to satellite image bands or index values (Mann, Michael L. 2024).
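As a toy illustration of this fixed-filter idea, the sketch below stacks a few classic filter responses (local mean, Sobel gradient magnitude, local variance) as per-pixel features that a traditional classifier could consume. The specific filters shown are illustrative stand-ins, not the cited Differential Morphological Profiles or Structural Feature Sets.

```python
import numpy as np
from scipy import ndimage

def filter_features(img):
    """Stack a few fixed filter responses as per-pixel features:
    local mean (smoothed brightness), Sobel gradient magnitude (edges),
    and local variance (texture/heterogeneity)."""
    mean = ndimage.uniform_filter(img, size=3)
    grad = np.hypot(ndimage.sobel(img, axis=0), ndimage.sobel(img, axis=1))
    var = ndimage.uniform_filter(img**2, size=3) - mean**2
    return np.stack([mean, grad, var], axis=-1)  # shape (H, W, 3)

img = np.zeros((8, 8))
img[:, 4:] = 1.0  # a vertical edge between two "land covers"
feats = filter_features(img)
print(feats.shape)  # (8, 8, 3)
```

Each pixel's feature vector can then be fed to a random forest or SVM, replacing thousands of learned convolutions with a handful of hand-picked ones.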
Field-collected data provides the necessary validation and calibration for remote sensing-based models. It serves as the benchmark against which the model's predictions are evaluated and refined. Ground validation data, collected through field visits, observation, and interactions with local farmers, offers essential insights into the specific crop types present in the study area. Validating and training models with accurate ground reference information allows the spectral patterns captured by remote sensing data to be correctly associated with the corresponding crop. By combining the spectral information from satellite imagery with ground validation data, researchers can develop robust models that effectively differentiate between crop types based on their unique spectral signatures and temporal patterns.
The collection of field observations and ground validation data is a critical input for the development of models to classify crop types (Delince et al. 2017; Ma et al. 2019). However, obtaining accurate and timely ground validation data can be challenging in developing countries due to limited resources, infrastructure, and local capacity (Delince et al. 2017; Bégué, Arvor, Bellon, Betbeder, De Abelleyra, P. D. Ferraz, et al. 2018). In many cases, researchers rely on crowdsourced data from volunteers or citizen scientists to supplement or validate ground truth data collected through traditional methods. Projects like Tseng et al. (2021) point to the paucity of multi-class crop type datasets globally. This is a significant gap in the field of crop type classification, as the availability of high-quality training data is essential for the development of accurate and reliable machine learning models (Maxwell, Warner, and Guillén 2021).
In this study, we aim to address two critical challenges in the field of crop type classification: the lack of in-season multi-class crop type datasets, and the need for new methods to obtain high-accuracy crop type predictions from limited amounts of training data. We propose a novel approach that combines crowdsourced data with a new automated approach to extracting time-series features from satellite imagery. We apply this new approach to classify crop types in Northern Tanzania. By leveraging the power of crowdsourcing and remote sensing technologies, we aim to develop a robust and scalable solution for crop type classification that can be adapted to other regions and contexts at minimal or no cost.

Data for this study were collected from multiple sources, including satellite imagery and crowdsourced ground truth observations. The section below describes the input data and methods used throughout the paper.
Study Area
The study was conducted in 50 wards within three major districts of Arusha, Dodoma, and Mwanza in Tanzania, as seen in Figure 1. Tanzania, a country in East Africa, is known for its diverse agricultural landscape. The region is characterized by a mix of smallholder farms, commercial plantations, and natural vegetation, making it an ideal yet challenging location for studying crop type classification. Our choice of these three districts was driven by the distinct variation in the major crop types that dominate in each district, among oil seeds, grains, and commercial crops such as cotton.
Crop type data collection was designed and executed by YouthMappers through a crowdsourced GIS approach. The method was designed in three steps: 1) development of training materials and training of all intended student participants; 2) data collection using KoboToolbox, hosting a well-developed data model (the exercise lasted 14 days, with 7 days of iterative pilot testing on different farms, crops, and landscapes); and 3) data review and cleaning to generate a sample for training.
Additional training data were collected utilizing high-resolution imagery from Google Earth. These data were used to supplement the crowdsourced data and improve the model's ability to distinguish between crops and more common land cover types like forests, urban areas, and water. The final cleaned dataset includes 1,400 crop type observations of rice, maize, cassava, sunflower, sorghum, cotton, and millet, plus 386 observations of other land cover classes including water, tidal areas, forest, shrub, and urban.
Data Collection Methods

To ensure the success of our project, we focused heavily on the design of our data collection methods. These methods were carefully integrated, taking into account the crop calendar, information on the different stages of crop development, the distances between crop fields, the tools used, and data quality assurance.

Young crops exhibit significant differences compared to mature crops in terms of color, density, and phenological development. Variations in the crop cycle across different fields could lead to heteroscedasticity in the spectral reflectance measurements used for machine learning (ML) training, thereby affecting the precision and accuracy of the model. By targeting the period of April through May, we aimed to capture crops late in the growing season, yet before harvest, as seen in the crop calendar in Figure 2 below.
In the north and northeast, bimodal areas experience the short rains (Vuli) from late October to mid-January, during which crops like maize, beans, and vegetables are planted in October and November, and the long rains (Masika) from March to May, supporting crops like maize, rice, sorghum, and cassava, typically planted in February and March. In the central, southern, and western regions with unimodal rainfall, there is a single rainy season from November to April, when crops such as cotton, maize, millet, rice, and sunflower are planted in November and December. This diversity in rainfall patterns allows for a wide variety of crops suited to the local climate and seasonal conditions.
The data collection took place between late April and May, as shown in Figure 2, to align with mid-season for many crops. YouthMappers were advised to focus on a set of target crops known to be present in the region and at appropriate crop growth stages. Before embarking on data collection, discussions covered several factors to consider in selecting field collection sites. Factors included field size, to establish a minimum detectable by the satellite imagery; clear and open fields, to enhance clean spectra sampling; prioritizing areas covered by a single crop, to reduce confusion; a sampling distribution of at least one kilometer between stops; and even crop maturity and health. YouthMappers were advised to identify only fields 30 meters or greater across to ensure a minimum size detectable by the satellite imagery. When picking between fields for data collection, defining clear and open fields was discussed with several examples, as agriculture can include mixed land cover types with tree cover, power lines, buildings, and other obstructions that prevent the satellite from cleanly capturing spectra of only the crop. YouthMappers were advised to pick only clear and open fields and to prioritize those growing only one crop. The recommendation of a sampling distribution of at least one kilometer was a compromise between the amount of time available for data collection, the expense of travel, and a sufficient distribution to reduce spatial autocorrelation. YouthMappers were permitted to identify adjacent fields growing different types of crops, but were otherwise highly encouraged to return to the vehicle and drive the 1 km to collect more data. The most important factors driving the timing of data collection were crop maturity and health. The fieldwork was conducted between late April and May 2023 because the target crops typically reach reproductive stages with maximum canopy cover during this time of year. This crop stage is best suited for discerning different crop types with satellite imagery. While most fields were found in late reproductive stages, drought conditions impacted the health of some fields. YouthMappers were advised to prioritize mature, lush green fields as ideal data collection sites. By thoroughly discussing each of these factors, we trained YouthMappers to select fields best suited as in-situ training data for satellite imagery analysis.
The data collection was managed through KoboCollect, hosted on the KoboToolbox infrastructure, which provided an effective platform for gathering and organizing data. This approach enabled the collection of the desired volume of data points necessary for model training and evaluation, as summarized in Table 1.
Table 1. Desired data points and target crops by district.

                  Arusha                 Mwanza    Dodoma
Desired points:   300                    1000      800
Crops:            Maize                  Maize     Sorghum
                  Rice                   Cotton    Maize
                  Sorghum & Millet       Rice      Millet
                  Peanuts or Groundnut             Sunflower
                                                   Peanuts or Groundnut
                                                   Cotton Fields
Satellite Imagery

Satellite imagery was obtained from the Sentinel-2 satellite constellation, which provides high-resolution multispectral data at 10-meter spatial resolution. The imagery was acquired over the study area between January and August of 2023, during the growing season, capturing the spectral characteristics of different crop types and coinciding with field data collection. The Sentinel-2 L2 harmonized reflectance data were pre-processed to remove noise and atmospheric effects, ensuring that the spectral information was accurate and reliable for classification purposes (Bégué, Arvor, Bellon, Betbeder, De Abelleyra, P. D. Ferraz, et al. 2018).
In our study, cloud and cloud shadow contamination was mitigated using the 's2cloudless' machine learning model on the Google Earth Engine platform. Cloudy pixels were identified using a cloud probability mask, with pixels having a probability above 50% classified as clouds. To detect cloud shadows, we used the near-infrared (NIR) band to flag dark pixels not identified as water as potential shadow pixels. The projected shadows from the clouds were identified using a directional distance transform based on the solar azimuth angle from the image metadata. A combined cloud and shadow mask was refined through morphological dilation, creating a buffer zone to ensure comprehensive coverage. This mask was applied to the Sentinel-2 surface reflectance data to exclude all pixels identified as clouds or shadows, enhancing the reliability of the dataset for environmental analysis.
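The masking steps above can be reduced to the following NumPy sketch. This is a simplified stand-in for the Earth Engine implementation: the 50% cloud probability threshold follows the text, while the NIR darkness threshold and the fixed pixel shift (standing in for the azimuth-based directional distance transform) are assumed values.

```python
import numpy as np
from scipy.ndimage import binary_dilation

def cloud_shadow_mask(cloud_prob, nir, water, shadow_shift=(2, 2), buffer_px=1):
    """Build a combined cloud/shadow mask for one scene.

    cloud_prob : 2D array of s2cloudless-style cloud probabilities (0-100)
    nir        : 2D array of NIR reflectance (0-1)
    water      : 2D boolean array of known water pixels
    """
    clouds = cloud_prob > 50                 # probability threshold
    dark = (nir < 0.15) & ~water             # dark, non-water pixels
    # Project clouds along the solar-azimuth direction (here a fixed shift).
    projected = np.roll(clouds, shadow_shift, axis=(0, 1))
    shadows = dark & projected
    # Morphological dilation buffers the combined mask.
    return binary_dilation(clouds | shadows, iterations=buffer_px)
```

Pixels flagged True are dropped before compositing, so that only clear observations enter the monthly time series.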
Monthly composites were collected for January through August of 2023 for the bands B2 Blue (458-523 nm), B6 Vegetation Red Edge (733-738 nm), B8 Near Infrared (785-899 nm), B11 Short-Wave Infrared (SWIR) (1565-1655 nm), and B12 Short-Wave Infrared (2100-2280 nm). We also calculate the Enhanced Vegetation Index (EVI) and hue, the color spectrum value (Google, n.d.). This computed hue value provides the basic color as perceived on the color wheel, from red, through green and blue, and back to red, for each pixel. Due to the high prevalence of clouds in the region, linear interpolation was used to fill in missing data in the time series using xr_fresh (Mann, Michael L. 2024). These bands were selected based on their relevance to crop type classification and their ability to capture the unique spectral signatures of different crops. The monthly composites were used to generate time series features for each pixel in the study area, providing valuable information on the temporal dynamics of crop growth and development.
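For reference, both derived values follow standard formulas, sketched here for a single pixel. The red band required by EVI is assumed available alongside the listed bands, and the coefficients are the standard Sentinel-2/MODIS EVI constants.

```python
import colorsys

def evi(nir, red, blue):
    """Enhanced Vegetation Index (standard coefficients)."""
    return 2.5 * (nir - red) / (nir + 6.0 * red - 7.5 * blue + 1.0)

def hue(red, green, blue):
    """Hue in [0, 1): position on the color wheel from red through
    green and blue and back to red."""
    h, _, _ = colorsys.rgb_to_hsv(red, green, blue)
    return h

# Dense vegetation: high NIR, low red -> high EVI; green-dominant hue ~1/3.
print(round(evi(0.45, 0.05, 0.03), 2))  # ~0.66
print(round(hue(0.1, 0.8, 0.1), 2))     # ~0.33
```

Computed per month, these two values form the EVI and hue time series to which the feature extraction below is applied.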
Time Series Features

The resulting monthly time series provide information on the phenological patterns of different crops. We leverage the time series nature of the satellite imagery to extract relevant features for crop type classification.

In this study, we utilized the xr_fresh toolkit to compute detailed time-series statistics for various spectral bands, facilitating comprehensive pixel-by-pixel temporal analysis (Mann, Michael L. 2024). The xr_fresh framework is specifically designed to extract a wide array of statistical measures from time-series data, which are essential for understanding temporal dynamics in remote sensing datasets.
The metrics computed by xr_fresh in this study include basic statistical descriptors, changes over time, and distribution-based metrics, applied to each pixel's time series for selected spectral bands (B12, B11, hue, B6, EVI, and B2). The list of computed time-series statistics encompasses:

• Energy Measures: Absolute energy, which provides the sum of squares of the values.
• Change Metrics: Absolute sum of changes to quantify overall variability, mean absolute change, and mean change.
• Autocorrelation: Calculated for three lags (1, 2, and 3) to assess the serial dependence at different time intervals.
• Count Metrics: Count above and below mean, capturing the frequency of high and low values relative to the average.
• Extreme Values: Day of the year for maximum and minimum values, providing insight into seasonal patterns.
• Distribution Characteristics: Kurtosis, skewness, and quantiles (5th and 95th percentiles) to describe the shape and spread of the distribution.
• Variability Metrics: Standard deviation, variance, and whether variance is larger than standard deviation, to evaluate the dispersion of values.
• Complexity and Trend Analysis: Time series complexity and symmetry looking, adding depth to the analysis of temporal patterns.
For a full list of the time series features extracted in this study and their descriptions, please refer to the Appendix.
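A few of the listed statistics can be reproduced for a single pixel's series as follows. This is an illustrative NumPy reimplementation, not the xr_fresh API, and the feature names are paraphrased; the monthly EVI values are hypothetical.

```python
import numpy as np

def ts_features(x):
    """A handful of xr_fresh-style statistics for one pixel's time series."""
    x = np.asarray(x, dtype=float)
    diffs = np.diff(x)
    lag1 = np.corrcoef(x[:-1], x[1:])[0, 1]  # serial dependence, lag 1
    return {
        "absolute_energy": float(np.sum(x**2)),
        "absolute_sum_of_changes": float(np.sum(np.abs(diffs))),
        "mean_change": float(np.mean(diffs)),
        "autocorr_lag1": float(lag1),
        "count_above_mean": int(np.sum(x > x.mean())),
        "argmax": int(np.argmax(x)),          # index of seasonal maximum
        "quantile_q95": float(np.quantile(x, 0.95)),
        "variance_larger_than_std": bool(np.var(x) > np.std(x)),
    }

monthly_evi = [0.2, 0.25, 0.4, 0.6, 0.8, 0.7, 0.5, 0.3]  # Jan-Aug
feats = ts_features(monthly_evi)
print(feats["argmax"])  # 4 -> peak greenness in May
```

Applied across every pixel and band, these scalars replace the raw eight-month series as the model's input features.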
The integration of xr_fresh into our analytical workflow allowed for an automated and robust analysis of temporal patterns across the study area. By leveraging this toolkit, we could efficiently process large datasets, ensuring that each pixel's temporal dynamics were comprehensively characterized, which is critical for accurate environmental monitoring and change detection.
Data Extraction

To partially account for variation in field size, we extracted pixels based on a buffer around field point locations. This allows us to account for the fact that fields likely represent groups of adjacent pixels. Small fields were buffered by only 5 meters, medium fields by 10 m, and large fields by 30 m. This approach allowed us to capture the time series features from the surrounding area, providing a more comprehensive representation of the field's characteristics. The use of larger buffers was explored but found to decrease model performance, as fields tended to be heterogeneous, for instance containing patches of trees. To account for this in our modeling, we treat observations from the same field as a "group" in our cross-validation scheme, as described below.
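The buffered extraction can be sketched on a raster grid as follows. This is illustrative only; it assumes 10 m Sentinel-2 pixels and the small/medium/large buffer distances given above, with distances measured between pixel centers.

```python
import numpy as np

PIXEL_M = 10  # Sentinel-2 resolution in meters
BUFFER_M = {"small": 5, "medium": 10, "large": 30}

def pixels_in_buffer(row, col, field_size, shape):
    """Return (row, col) indices of pixels whose centers fall within the
    size-dependent buffer around a field point."""
    radius = BUFFER_M[field_size] / PIXEL_M  # radius in pixel units
    rr, cc = np.indices(shape)
    dist = np.hypot(rr - row, cc - col)
    return list(zip(*np.where(dist <= radius)))

# A 5 m buffer covers only the point's own pixel; 30 m spans neighbors.
print(len(pixels_in_buffer(10, 10, "small", (21, 21))))  # 1
print(len(pixels_in_buffer(10, 10, "large", (21, 21))))  # 29
```

Every returned pixel inherits the field's crop label and field id, which is why grouping by field in cross-validation matters.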
Several preprocessing steps were applied to the extracted features before crop classification. Notably, features were centered and scaled using the scikit-learn library to normalize the data, followed by the application of a variance threshold method to reduce dimensionality by excluding features with low variance (Pedregosa et al. 2011).
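In scikit-learn terms, these two steps can be sketched as a pipeline. The variance cutoff here is an assumed value; the paper does not state the threshold used.

```python
import numpy as np
from sklearn.feature_selection import VarianceThreshold
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Center/scale features, then drop near-constant ones.
prep = Pipeline([
    ("scale", StandardScaler()),
    ("variance", VarianceThreshold(threshold=1e-8)),
])

X = np.array([[1.0, 5.0, 2.0],
              [2.0, 5.0, 8.0],
              [3.0, 5.0, 4.0]])  # middle column is constant
Xt = prep.fit_transform(X)
print(Xt.shape)  # constant column removed -> (3, 2)
```

After scaling, non-constant columns have unit variance and constant columns have zero variance, so the threshold cleanly removes only the uninformative features.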
We employ Optuna, an optimization framework, to conduct systematic model selection and hyperparameter tuning (Akiba et al. 2019). Our methodology involved defining a study using Optuna, where each trial proposes a set of model parameters aimed at optimizing performance metrics. Specifically, we used stratified group k-fold cross-validation with the number of splits set to three, ensuring that samples from the same field were not split across training and validation sets, to prevent data leakage. The scoring metric utilized is the kappa statistic, chosen for its suitability in evaluating models on imbalanced datasets.

This approach allows us to rigorously evaluate and compare different classifiers, including LightGBM, Support Vector Classification (SVC), and RandomForest, and their configurations under a variety of conditions. The final selection of the model and its parameters was based on the ability to maximize the kappa statistic, ensuring that the chosen model provided the best possible performance for the classification of land cover types in our dataset.
To interpret model predictions, we use SHapley Additive exPlanations (SHAP) (Lundberg and Lee 2017). This approach, based on game theory, quantifies the impact of each feature on the prediction outcome, providing insights into which features are most influential in determining land cover types.
In our feature selection process, we incorporate both the mean and maximum SHAP values to comprehensively assess the influence of features on model predictions. The mean of the absolute SHAP values across all samples provides a measure of the average impact of each feature, highlighting its overall importance across the dataset. This approach emphasizes features that consistently affect the model's output but might underrepresent the significance of features causing substantial impacts under specific conditions. To address this, we also consider the maximum absolute SHAP values. Sorting features by their maximum absolute SHAP values allows us to identify those that have significant, albeit possibly infrequent, effects on individual predictions. This method ensures that features crucial for particular scenarios are not overlooked, thus offering a more nuanced understanding of feature importance that balances general influence with critical, situation-specific impacts.

Feature selection, then, is the union of the top 30 time-series features found with both the mean and maximum SHAP values, resulting in 33 total features. This approach ensures that the selected features are both consistently influential across the dataset and capable of exerting substantial impacts under specific conditions, providing a comprehensive set of features for model training and evaluation.
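The union-of-top-k selection can be sketched as follows. This is illustrative NumPy only, operating on a toy SHAP matrix rather than output from the shap library; for multi-class models the absolute values would first be aggregated across classes.

```python
import numpy as np

def select_features(shap_values, names, k=30):
    """Union of the top-k features ranked by mean |SHAP| and by max |SHAP|.
    shap_values: (n_samples, n_features) array of SHAP values."""
    a = np.abs(shap_values)
    top_mean = np.argsort(a.mean(axis=0))[::-1][:k]
    top_max = np.argsort(a.max(axis=0))[::-1][:k]
    keep = sorted(set(top_mean) | set(top_max))
    return [names[i] for i in keep]

# Toy case: f2 matters on average, f0 only in one extreme sample.
sv = np.array([[0.0, 0.3, 0.5],
               [0.0, 0.3, 0.6],
               [0.9, 0.3, 0.4]])
print(select_features(sv, ["f0", "f1", "f2"], k=1))  # ['f0', 'f2']
```

With k=1, the union keeps both the consistently important feature (f2) and the occasionally decisive one (f0), which is exactly why the union can exceed k (30 features per ranking yielding 33 in total).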
Crowdsourcing involves gathering data from a large number of volunteers or citizen scientists, who provide valuable ground truth information. This method has proven especially useful in areas where traditional data collection methods are challenging due to logistical, financial, or infrastructural constraints.
By leveraging the YouthMappers student organization, with over 420 chapters in 80 countries, we were able to collect a large dataset of crop type observations in Tanzania. Participating YouthMappers chapters included: the Institute of Rural Development Planning - Dodoma, the Institute of Rural Development Planning - Mwanza, the University of Dodoma, the Nelson Mandela African Institution of Science and Technology, and the Institute of Accountancy Arusha. Moreover, this exercise provided an important opportunity for students to gain practical experience in data collection, analysis, and interpretation, contributing to their professional development and capacity building in the geospatial domain.
Challenges and Lessons Learned

There were a number of challenges involved with planning and implementing a large-scale field operation. One of the primary challenges encountered was the variability in crop cycles across different fields, and crop identification more generally. This was particularly true in Arusha, where fields were found in almost every stage of crop development and some fields were visited before the reproductive stages, as the drought delayed planting. In other regions, some crops had already been harvested. This discrepancy resulted in incomplete datasets, as certain crop types were missing or not easily accounted for. The absence of these crops in certain areas impacted our modeling efforts by reducing the representativeness of the training data. Second, although the YouthMappers teams did a commendable job, crop identification is challenging for non-agricultural experts. This task was even more challenging given the heterogeneity of local planting practices and the similarity of early-stage growth between, for instance, crops like maize and sorghum. To mitigate this issue, YouthMappers teams took detailed photos of each field. These images provided us the ability to verify crop types remotely before training the model. While extremely useful, the collection of more detailed single-plant images could have helped us minimize the removal of some observations. Third, the site selection depended on many non-crop-related factors, including where the training could be hosted and time constraints based on YouthMappers students' academic calendars. This led to changes in which target crops were selected. Fourth, travel was finally approved during a drought year. This is helpful for transportation during fieldwork, yet poses a challenge, as more fields can be abandoned, harvested early, or otherwise found in poor condition. To mitigate many of these issues, future data collection efforts should allow for more flexibility in the timing of data collection and ensure coverage that reflects local crop cycles.
The distribution of observations across land cover classes is presented in Figure 3. The dataset consists of a diverse range of land cover types, each contributing differently to the total number of observations. Maize is the most prevalent land cover type, accounting for the highest percentage of the observations, followed by rice and sunflower. This is indicative of the agricultural dominance in the region being studied. Less common land covers, such as millet, sorghum, and urban areas, represent intermediate percentages, reflecting the heterogeneous landscape that includes both agricultural and urbanized zones.
Figure 3: Land Cover by Percentage of Observations
1 Feature Importance
The interpretation of model behavior using SHAP values has allowed for a deeper understanding of how different spectral features impact the model's predictions, which is critical for refining the feature selection process. By analyzing both the mean and maximum SHAP values, we were able to prioritize features based on their overall impact as well as their critical contributions to specific model decisions.
In the two summary plots below, we display the SHAP values for each feature to identify how much impact each feature has on the model output for pixels in the validation dataset. Features are sorted by the sum of the SHAP values across all samples. Each figure's bar lengths represent the mean contributions to explaining each predicted land class value, with different land classes represented by different colors (hues). This visualization provides a comprehensive overview of feature importance, highlighting the key predictors that drive the model's predictions. For example, features that are highly influential for "maize" may not be as impactful for "rice" or "sorghum", reflecting the unique spectral signatures of these crops.
rin
Mean SHAP Values. In Figure 4, the mean SHAP values provide insights into the average impact of each feature across all predictions. This analysis highlights the features that consistently influence the model's output across various scenarios. For example, the mean value of B11 (B11.mean) and the 5th percentile of hue (hue.quantile.q.0.05) were found to have substantial average impacts on model outputs, suggesting their strong relevance in distinguishing between different crop types. Reflecting on the colors of the bars, we can see that 'B11.mean' is important in distinguishing sunflower, sorghum, and millet to a roughly equal degree, and has some small impact on distinguishing other classes, while 'hue.quantile.q.0.05' has its strongest effect distinguishing rice, sunflower, and, to a lesser degree, cotton. Looking down the list, we can see that features like 'EVI.standard.deviation' are most effective at isolating urban areas, and 'B12.mean.second.derivative.central' substantively differentiates shrub from other classes. Note that the mean second derivative of B12 is a measure of the rate of change of the rate of change of the B12 band over time, so positive values indicate an increasing rate of change (an increasingly upward trend) and negative values a decreasing rate of change (an increasingly downward trend).
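To make the statistic concrete, here is a minimal numpy version of the mean second derivative central (named and normalized as in the appendix table); the example series is invented.

```python
import numpy as np

def mean_second_derivative_central(x):
    """Mean of the halved centered second difference,
    (x[i+2] - 2*x[i+1] + x[i]) / 2, i.e. Sum/(2(n-2)):
    a measure of the average acceleration of the series."""
    x = np.asarray(x, dtype=float)
    return ((x[2:] - 2 * x[1:-1] + x[:-2]) / 2).mean()

# A series curving upward gives a positive value; a straight line gives zero.
print(mean_second_derivative_central([0, 1, 4, 9, 16]))  # -> 1.0
print(mean_second_derivative_central([0, 2, 4, 6, 8]))   # -> 0.0
```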
Figure 4: Mean SHAP Values
Maximum SHAP Values. On the other hand, in Figure 5 the maximum SHAP values uncover features that, while perhaps not consistently influential, have high impacts under particular conditions. This aspect of the analysis is crucial for identifying features that can cause significant shifts in model output, potentially corresponding to specific agricultural or environmental contexts. Features such as 'hue.median' and 'B11.maximum' show high maximum SHAP values, indicating their pivotal roles in determining certain classes. For instance, 'B11.maximum' reflects peak reflectance in the shortwave infrared (SWIR), which could be critical in identifying crops at their maximum biomass, like sunflower at full bloom compared to other crops at different stages of growth.
Figure 5: Maximum SHAP Values
The final selection of features for model training was carefully curated to include the 30 features with the highest mean SHAP values and the 30 with the highest maximum SHAP values, ensuring a comprehensive set of predictors for accurate and reliable classification of crop types in Tanzania. This strategic selection process not only improved model accuracy but also enhanced our understanding of the spectral characteristics most relevant for distinguishing among the diverse agricultural landscapes of the region.
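This selection rule can be sketched as follows; `shap_vals` and the feature names are hypothetical stand-ins, and the function simply keeps the union of the top-k features by mean and by maximum absolute SHAP value.

```python
import numpy as np

def select_top_features(shap_vals, feature_names, k=30):
    """Union of the k features with the highest mean |SHAP| and the k
    with the highest maximum |SHAP|, across all samples (and classes)."""
    a = np.abs(np.asarray(shap_vals, dtype=float))
    if a.ndim == 3:  # (samples, features, classes) -> (samples*classes, features)
        a = a.transpose(0, 2, 1).reshape(-1, a.shape[1])
    top_mean = np.argsort(a.mean(axis=0))[::-1][:k]
    top_max = np.argsort(a.max(axis=0))[::-1][:k]
    keep = sorted(set(top_mean.tolist()) | set(top_max.tolist()))
    return [feature_names[i] for i in keep]

# Tiny invented example: "d" spikes occasionally, "a" matters consistently.
sv = np.array([[1.0, 0.0, 0.0, 0.0],
               [1.0, 0.0, 0.0, 5.0],
               [1.0, 0.5, 0.0, 0.0]])
print(select_top_features(sv, ["a", "b", "c", "d"], k=2))  # -> ['a', 'd']
```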
Model Selection
Optuna tuning trials (Akiba et al. 2019) selected LightGBM (Ke et al. 2017) as the final model. LightGBM is a gradient boosting algorithm that combines many simple decision trees to produce a stronger single model, improving the model at each step. LightGBM grows decision trees leaf-wise rather than level-wise, thereby targeting the branches that most need refining. Here we find an optimal bagging fraction of approximately 0.58, a bagging frequency of 3, a learning rate of 0.025, a maximum depth of 35, a minimum of 154 data points in each leaf, and the number of leaves set at 51.
Model Performance
The classification model demonstrated robust performance across multiple land cover classes, as evidenced by the out-of-sample mean confusion matrix, with a Cohen's Kappa score of 0.800 and an F1-micro score of 0.822 (Table 2), indicating substantial agreement between predicted and actual classifications. Recall that each field is treated as a 'group' in the group k-fold procedure to ensure that pixels from the same field are not split between the testing and training groups. The confusion matrix (Figure 6) shows high diagonal values for most classes, highlighting the model's ability to accurately identify specific land covers. For instance, the rice and urban categories achieved classification accuracies of 90% and 94%, respectively. Other well-classified categories included millet, maize, sunflower, tidal, water, shrubs, and forest, each with over 70% accuracy. However, forest is primarily confused with the shrub category, which is likely a result of poor training data and the difficulty of visually distinguishing trees from shrubs in high-resolution imagery without the benefit of field visits.
Categories such as sorghum, sunflower, and cotton displayed moderate confusion with other classes, indicating potential areas for model improvement, especially in distinguishing features that are common between similar crop types. Notably, the 'other' category showed a broader distribution of misclassifications, achieving a lower accuracy of 40%, likely because it encompasses a diverse range of less frequent land covers. However, this class is not relevant to the objectives of this paper.
Figure 6: Out-of-Sample Confusion Matrix
The overall high out-of-sample performance in Table 2 across the majority of categories suggests that the model is effective for practical applications in land cover classification, though further refinement is recommended for categories showing lower accuracy and higher misclassification rates.
We can compare our results across multiple models using Figure 7, reproduced from Kerner et al. (2024). This plot presents multiple performance metrics of land cover models that include an 'agricultural' category specifically for Tanzania. Our model's performance is indicated by the dashed line. The high level of performance, particularly for the more challenging F1 score, is not surprising given that our model is
Metric              Value
Balanced Accuracy   0.79
Cohen's Kappa       0.80
Accuracy            0.82
F1-micro            0.82

Table 2: Summary of Classification Metrics
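The four metrics in Table 2 can be computed directly with scikit-learn (Pedregosa et al. 2011); the label vectors below are invented for illustration. Note that for single-label classification the micro-averaged F1 score is mathematically identical to overall accuracy, which is consistent with those two rows of Table 2 agreeing.

```python
import numpy as np
from sklearn.metrics import (accuracy_score, balanced_accuracy_score,
                             cohen_kappa_score, f1_score)

# Hypothetical true/predicted labels standing in for out-of-sample results.
y_true = np.array([0, 0, 1, 1, 2, 2, 2, 1])
y_pred = np.array([0, 1, 1, 1, 2, 2, 0, 1])

metrics = {
    "Balanced Accuracy": balanced_accuracy_score(y_true, y_pred),
    "Cohen's Kappa": cohen_kappa_score(y_true, y_pred),
    "Accuracy": accuracy_score(y_true, y_pred),
    "F1-micro": f1_score(y_true, y_pred, average="micro"),
}
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```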
specifically trained on Tanzanian data, while the other models are typically global or regional models. On the other hand, most of these models include only a single 'agricultural' class, meaning their prediction task is significantly easier than the one presented here. Given this, our strong out-of-sample performance is notable.
Figure 7: Land cover model performance metrics for Tanzania; the dashed line indicates this paper's model's out-of-sample performance across all land covers
The integration of crowdsourced data with traditional machine learning and engineered time-series features yielded a robust model for crop classification in Tanzania. While the model performed exceptionally well for crops like maize and rice, some confusion persisted among similar crop types such as sorghum and cotton. This suggests that additional discriminative features or more extensive training data may be necessary to further enhance classification accuracy for these crops. The challenges encountered, such as variability in crop cycles and the difficulty of crop identification, highlight the complexities of agricultural monitoring in resource-limited settings. Addressing these issues in future research could improve model performance and generalizability. Overall, our findings demonstrate the practicality of using efficient, interpretable machine learning methods in conjunction with community-driven data collection to advance agricultural monitoring in developing regions.
Conclusion
ed
2 In this study, we introduced a novel methodology for crop type classification in Tanzania by leveraging crowd-
3 sourced data and time-series features extracted from Sentinel-2 satellite imagery. By combining advanced
4 remote sensing techniques with local knowledge, we addressed significant gaps in agricultural monitoring
5 within resource- and data-limited settings. Our approach gathered a new dataset and successfully applied it
to a real-world task at very low cost, using traditional machine learning algorithms augmented with carefully
iew
6
ev
13 outperforms broadly used land cover models that perform the simpler task of classifying ‘agriculture’ without
14 specifying the crop type. This highlights the need for better and more frequent crop type classification data.
By interpreting feature importance using SHAP values, we gained a deeper understanding of the model's behavior and the key predictors driving its predictions. Identifying the most influential features across different land cover types allowed us to refine the feature selection process, ensuring that the selected features capture the most discriminative information for separating crop types. More broadly, this work demonstrates the value of traditional machine learning augmented with carefully engineered time-series features for crop type classification in data-scarce environments. By "turning back the clock" on deep learning, we demonstrate that applying a limited yet salient set of filters, such as measures of trends, distribution descriptions, and complexity metrics, can capture essential temporal dynamics without the extensive data requirements of deep learning models. This methodology not only achieved high classification accuracy but also enhanced interpretability and computational efficiency.
Our findings highlight that traditional machine learning techniques, combined with advanced yet computationally efficient feature extraction methods, offer a practical and effective alternative to deep learning, particularly in the low-information settings prevalent in developing regions. This approach facilitates accurate crop type classification and contributes valuable insights for sustainable agricultural practices and informed policy-making, ultimately impacting food security and land management in resource-limited contexts.
Appendix

Acknowledgments

The United States Agency for International Development generously supports this program through a grant from the USAID GeoCenter under Award #AID-OAA-G-15-00007 and Cooperative Agreement Number 7200AA18CA00015.
Time Series Features Description
The following table provides a comprehensive list of the time series features extracted from the satellite imagery using the xr_fresh module. These features capture the temporal dynamics of crop growth and development, providing valuable information on the phenological patterns of different crops. The computed metrics encompass a wide range of statistical measures, changes over time, and distribution-based metrics, offering a detailed analysis of the temporal patterns in the study area.
| Statistic | Description | Equation |
| --- | --- | --- |
| Absolute Sum of Changes | Sum of the absolute values of consecutive changes in the series | $\sum_{i=1}^{n-1} \lvert x_{i+1} - x_i \rvert$ |
| Autocorrelation (1 & 2 month lag) | Correlation between the time series and its lagged values | $\frac{1}{(n-l)\sigma^2} \sum_{t=1}^{n-l} (X_t - \mu)(X_{t+l} - \mu)$ |
| Count Above Mean | Number of values above the mean | — |
| Linear Time Trend | Linear trend (slope) coefficient estimated over the entire time series | $b = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(t_i - \bar{t})}{\sum_{i=1}^{n} (t_i - \bar{t})^2}$ |
| Longest Strike Above Mean | Longest consecutive sequence of values above the mean | — |
| Mean | Mean value of the time series | $\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i$ |
| Mean Absolute Change | Mean of the absolute differences between consecutive values | $\frac{1}{n-1} \sum_{i=1}^{n-1} \lvert x_{i+1} - x_i \rvert$ |
| Mean Change | Mean of the differences between consecutive values | $\frac{1}{n-1} \sum_{i=1}^{n-1} (x_{i+1} - x_i)$ |
| Mean Second Derivative Central | Measure of the acceleration of changes in the time series | $\frac{1}{2(n-2)} \sum_{i=1}^{n-2} (x_{i+2} - 2x_{i+1} + x_i)$ |
| Median | Median value of the time series | $\tilde{x}$ |
| Minimum | Minimum value of the time series | $x_{\min}$ |
| Quantile (q = 0.05, 0.95) | Values at the specified quantiles (5th and 95th percentiles) | $Q_{0.05},\; Q_{0.95}$ |
| Ratio Beyond r Sigma (r = 1, 2, 3) | Proportion of values beyond r standard deviations from the mean | $P_r = \frac{1}{n} \sum_{i=1}^{n} \mathbf{1}\left(\lvert x_i - \bar{x} \rvert > r\sigma_x\right)$ |
| Skewness | Measure of the asymmetry of the time series distribution | $\frac{n}{(n-1)(n-2)} \sum_{i=1}^{n} \left(\frac{x_i - \bar{x}}{s}\right)^3$ |
| Standard Deviation | Standard deviation of the time series | $\sigma_x = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})^2}$ |
| Sum Values | Sum of all values in the time series | $S = \sum_{i=1}^{n} x_i$ |
| Symmetry Looking | Measures the similarity of the time series when flipped horizontally | $\lvert \bar{x} - \tilde{x} \rvert < r \, (x_{\max} - x_{\min})$ |
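Several of the tabulated statistics can be sketched in plain numpy for a single pixel's time series (in the paper these are computed over whole rasters with the xr_fresh package (Mann 2024)); the NDVI-like example series is invented.

```python
import numpy as np

def time_series_features(x):
    """A few of the tabulated statistics for one pixel's band time series."""
    x = np.asarray(x, dtype=float)
    mean, std = x.mean(), x.std()
    t = np.arange(x.size)
    return {
        "absolute_sum_of_changes": np.abs(np.diff(x)).sum(),
        "count_above_mean": int((x > mean).sum()),
        "linear_trend": np.polyfit(t, x, 1)[0],     # slope b from the table
        "mean_abs_change": np.abs(np.diff(x)).mean(),
        "quantile_q05": np.quantile(x, 0.05),
        "ratio_beyond_1_sigma": (np.abs(x - mean) > std).mean(),
    }

# e.g. a green-up then senescence curve over one growing season
ndvi = np.array([0.2, 0.3, 0.5, 0.7, 0.8, 0.6, 0.4, 0.3])
print(time_series_features(ndvi))
```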
References
Akiba, Takuya, Shotaro Sano, Toshihiko Yanase, Takeru Ohta, and Masanori Koyama. 2019. "Optuna: A Next-Generation Hyperparameter Optimization Framework." In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining.
Bégué, Agnès, Damien Arvor, Beatriz Bellon, Julie Betbeder, Diego De Abelleyra, Rodrigo P. D. Ferraz, Valentine Lebourgeois, Camille Lelong, Margareth Simões, and Santiago R. Verón. 2018. "Remote Sensing and Cropping Practices: A Review." Remote Sensing 10 (1): 99. https://doi.org/10.3390/rs10010099.
Chao, Steven, Ryan Engstrom, Michael Mann, and Adane Bedada. 2021. "Evaluating the Ability to Use Contextual Features Derived from Multi-Scale Satellite Imagery to Map Spatial Patterns of Urban Attributes and Population Distributions." Remote Sensing 13 (19): 3962.
Christ, Maximilian, Nils Braun, Julius Neuffer, and Andreas W. Kempa-Liehr. 2018. "Time Series Feature Extraction on Basis of Scalable Hypothesis Tests (Tsfresh – a Python Package)." Neurocomputing 307: 72–77.
Delince, J., G. Lemoine, P. Defourny, J. Gallego, A. Davidson, S. Ray, O. Rojas, J. Latham, and F. Achard. 2017.
FAS. n.d. "Tanzania Production." USDA. https://ipad.fas.usda.gov/countrysummary/default.aspx?id=TZ.
Google. n.d. "ee.Image.rgbToHsv." Google Earth Engine. https://developers.google.com/earth-engine/apidocs/ee-image-rgbtohsv.
Graesser, Jordan, Anil Cheriyadat, Ranga Raju Vatsavai, Varun Chandola, Jordan Long, and Eddie Bright. 2012. "Image Based Characterization of Formal and Informal Neighborhoods in an Urban Landscape." IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing 5 (4): 1164–76.
Hersh, Jonathan, Ryan Engstrom, and Michael Mann. 2021. "Open Data for Algorithms: Mapping Poverty in Belize Using Open Satellite Derived Features and Machine Learning." Information Technology for Development.
Huang, Xin, Liangpei Zhang, and Pingxiang Li. 2007. "Classification and Extraction of Spatial Features in Urban Areas Using High-Resolution Multispectral Imagery." IEEE Geoscience and Remote Sensing Letters 4 (2): 260–64.
Ibrahim, Esther Shupel, Philippe Rufin, Leon Nill, Bahareh Kamali, Claas Nendel, and Patrick Hostert. 2021. "Mapping Crop Types and Cropping Systems in Nigeria with Sentinel-2 Imagery." Remote Sensing 13 (17): 3523.
Ke, Guolin, Qi Meng, Thomas Finley, Taifeng Wang, Wei Chen, Weidong Ma, Qiwei Ye, and Tie-Yan Liu. 2017. "LightGBM: A Highly Efficient Gradient Boosting Decision Tree." Advances in Neural Information Processing Systems 30.
Kerner, Hannah, Catherine Nakalembe, Adam Yang, Ivan Zvonkov, Ryan McWeeny, Gabriel Tseng, and Inbal Becker-Reshef. 2024. "How Accurate Are Existing Land Cover Maps for Agriculture in Sub-Saharan Africa?" Scientific Data 11 (1): 486.
Li, Haijun, Xiao-Peng Song, Matthew C. Hansen, Inbal Becker-Reshef, Bernard Adusei, Jeffrey Pickering, Li Wang, et al. 2023. "Development of a 10-m Resolution Maize and Soybean Map over China: Matching Satellite-Based Crop Classification with Sample-Based Area Estimation." Remote Sensing of Environment 294: 113623.
Li, Yansheng, Xinwei Li, Yongjun Zhang, Daifeng Peng, and Lorenzo Bruzzone. 2023. "Cost-Efficient Information Extraction from Massive Remote Sensing Data: When Weakly Supervised Deep Learning Meets Remote Sensing Big Data." International Journal of Applied Earth Observation and Geoinformation 120: 103345. https://doi.org/10.1016/j.jag.2023.103345.
Lundberg, Scott M., and Su-In Lee. 2017. "A Unified Approach to Interpreting Model Predictions." In Advances in Neural Information Processing Systems 30, edited by I. Guyon, U. V. Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, and R. Garnett, 4765–74. Curran Associates, Inc. http://papers.nips.cc/paper/7062-a-unified-approach-to-interpreting-model-predictions.pdf.
Ma, Lei, Yu Liu, Xueliang Zhang, Yuanxin Ye, Gaofei Yin, and Brian Alan Johnson. 2019. "Deep Learning in Remote Sensing Applications: A Meta-Analysis and Review." ISPRS Journal of Photogrammetry and Remote Sensing 152: 166–77. https://doi.org/10.1016/j.isprsjprs.2019.04.015.
Mann, Michael L. 2024. "xr_fresh: Python Package for Feature Extraction from Raster Data Time Series." Zenodo. https://doi.org/10.5281/zenodo.12701466.
Maxwell, Aaron E., Timothy A. Warner, and Luis Andrés Guillén. 2021. "Accuracy Assessment in Convolutional Neural Network-Based Deep Learning Remote Sensing Studies—Part 2: Recommendations and Best Practices." Remote Sensing 13 (13). https://doi.org/10.3390/rs13132591.
Morton, Douglas C., Ruth S. DeFries, Yosio E. Shimabukuro, Liana O. Anderson, Egidio Arai, Fernando del Bon Espirito-Santo, Ramon Freitas, and Jeff Morisette. 2006. "Cropland Expansion Changes Deforestation Dynamics in the Southern Brazilian Amazon." Proceedings of the National Academy of Sciences 103 (39): 14637–41.
Owusu, Maxwell, Ryan Engstrom, Dana Thomson, Monika Kuffer, and Michael L. Mann. 2023. "Mapping Deprived Urban Areas Using Open Geospatial Data and Machine Learning in Africa." Urban Science 7 (4). https://doi.org/10.3390/urbansci7040116.
Owusu, Maxwell, Arathi Nair, Amir Jafari, Dana Thomson, Monika Kuffer, and Ryan Engstrom. 2024. "Towards a Scalable and Transferable Approach to Map Deprived Areas Using Sentinel-2 Images and Machine Learning." Computers, Environment and Urban Systems 109: 102075.
Pedregosa, F., G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, et al. 2011. "Scikit-Learn: Machine Learning in Python." Journal of Machine Learning Research 12: 2825–30.
Pesaresi, Martino, and Jon Atli Benediktsson. 2001. "A New Approach for the Morphological Segmentation of High-Resolution Satellite Imagery." IEEE Transactions on Geoscience and Remote Sensing 39 (2): 309–20.
Teixeira, Igor, Raul Morais, Joaquim J. Sousa, and António Cunha. 2023. "Deep Learning Models for the Classification of Crops in Aerial Imagery: A Review." Agriculture 13 (5). https://doi.org/10.3390/agriculture13050965.
Tseng, Gabriel, Ivan Zvonkov, Catherine Lilian Nakalembe, and Hannah Kerner. 2021. "CropHarvest: A Global Dataset for Crop-Type Classification."
Yang, Zhongguo, Irshad Ahmed Abbasi, Elfatih Elmubarak Mustafa, Sikandar Ali, and Mingzhu Zhang. 2021. "An Anomaly Detection Algorithm Selection Service for IoT Stream Data Based on Tsfresh Tool and Genetic Algorithm." Security and Communication Networks 2021 (1): 6677027.