Climate Change Prediction with ML
A project-based report
Submitted in partial fulfillment of the requirements
for the award of the degree of
Bachelor of Technology
in
Department of Computer Science and Engineering
by
2100039067 PABBATHI VENKATA KARTHIKEYA
2100039017 MUDRABOYINA VENKATA CHAKRESH
2100030274 KONDAMUDI RACHEL ANUPAMA
2100031770 JONNALA SAI TEJASWINI
DECLARATION
The project report entitled "Prediction of Climate Changes and Its Effects" is a record of bona fide work of the students Pabbathi Venkata Karthikeya (2100039067), Mudraboyina Venkata Chakresh (2100039017), Jonnala Sai Tejaswini (2100031770), and Kondamudi Rachel Anupama (2100030274), submitted in partial fulfillment for the award of B.Tech in Computer Science and Engineering to the K L University. The results embodied in this report have not been copied from any other department, university, or institute.
CERTIFICATE
This is to certify that the project report entitled "Prediction of Climate Changes and Its Effects", being submitted by Pabbathi Venkata Karthikeya (2100039067), Mudraboyina Venkata Chakresh (2100039017), Jonnala Sai Tejaswini (2100031770), and Kondamudi Rachel Anupama (2100030274) in partial fulfillment for the award of B.Tech in Computer Science and Engineering to the K L University, is a record of bona fide work carried out under our guidance and supervision. The results embodied in this report have not been copied from any other department, university, or institute.
Dr. V.S.V. Prabhakar
ACKNOWLEDGEMENT
It is with all the humility that I would like to thank God Almighty without whose
blessings, no work can be completed. Next, I would like to thank my parents who have
always allowed me to pursue the career of my liking. I am grateful to the Department
of Computer Science and Engineering, K L E F for giving me the opportunity to
execute this project, which is an integral part of the curriculum in B. Tech program at
the Koneru Lakshmaiah Education Foundation, Vijayawada. I owe my sincere thanks to my internal project guide and project coordinator, Dr. K. Rajeshkumar, for their continuous support and encouragement throughout my research program. My special thanks to Dr. V.S.V. Prabhakar, Head of the Computer Science and Engineering Department, for all the facilities provided to successfully complete the project work. I am also very thankful to all the faculty members and technicians of the department for their constant valuable advice, encouragement, support, and blessings during the project.
ABSTRACT
Climate change has emerged as one of the most critical challenges of our time,
necessitating cutting-edge solutions for accurate prediction and mitigation. This
project, Prediction of Climate Changes and Their Effects, harnesses the power of
advanced machine learning algorithms, including ensemble techniques like Random
Forest, Gradient Boosting, and deep learning architectures such as Recurrent Neural
Networks (RNNs) and Long Short-Term Memory (LSTM). These technologies analyze
vast amounts of historical climatic data, uncovering hidden patterns and correlations to
deliver highly precise forecasts. With the integration of AI-powered anomaly detection
and time-series forecasting, the system predicts climate variations and their potential
impacts on ecosystems, agriculture, and human livelihoods. By leveraging state-of-the-
art technologies, this project sets a benchmark in proactive climate change modeling
and fosters sustainable decision-making for a resilient future.
TABLE OF CONTENTS
1 INTRODUCTION
2 LITERATURE REVIEW
3 THEORETICAL ANALYSIS
4 METHODOLOGY
5 EXPERIMENTAL RESULTS
6 DISCUSSION OF THE RESULTS
7 CONCLUSION
8 REFERENCES
LIST OF IMAGES
LIST OF TABLES
INTRODUCTION
Predicting climate change and its effects using machine learning is gaining widespread
adoption across various fields, including environmental science, agriculture, and
disaster management. The ability of machine learning to analyze complex patterns in
historical climate data allows for accurate forecasting of future trends and the
identification of potential hazards such as extreme weather events, rising sea levels, or
temperature anomalies. These predictive insights are critical in enabling governments,
organizations, and communities to implement timely interventions to mitigate adverse
impacts. Machine learning-based prediction systems are invaluable because they go
beyond traditional models, identifying patterns and making connections that would
otherwise remain undetected.
[1] In recent years, climate prediction using machine learning has garnered significant
attention due to the increasing availability of historical climate datasets and
advancements in computational power. As data grows more complex and
multidimensional, conventional statistical models struggle to manage high-dimensional
relationships. Machine learning algorithms, including supervised, unsupervised, and
deep learning methods, enable the extraction of meaningful patterns from these
datasets, facilitating highly accurate climate forecasting. These approaches empower
researchers and policymakers to address pressing climate-related challenges
proactively.
production and consumption, particularly in renewable energy.
Urban Planning: Enables cities to prepare for extreme weather events,
ensuring infrastructure resilience.
Environmental Conservation: Tracks shifts in ecosystems to protect
biodiversity and mitigate the impact of climate change on natural habitats.
1.5 Advanced Algorithms for Climate Prediction
This project employs state-of-the-art machine learning algorithms, including:
Random Forest and Gradient Boosting: These ensemble techniques enhance
prediction accuracy by combining the outputs of multiple models.
Recurrent Neural Networks (RNNs) and Long Short-Term Memory
(LSTM): [2]Specialized for time-series data, these architectures identify
temporal dependencies in climatic variables.
Convolutional Neural Networks (CNNs): Applied to analyze spatial data,
such as satellite imagery, for detecting trends in cloud patterns or sea surface
temperatures.
1.6 Scope of the Study
This study focuses on leveraging machine learning techniques to analyze and predict
climate changes and their effects. By implementing advanced models, the project aims
to forecast key climatic variables, such as temperature, rainfall, and sea-level rise, while
evaluating their broader impacts on agriculture, ecosystems, and human health. The
research explores the potential of these methods in addressing real-world climate
challenges and highlights their relevance in global climate action strategies.
1.7 Challenges in Climate Prediction
Despite its potential, machine learning for climate prediction faces several challenges:
Data Quality: Climate datasets often contain missing or inconsistent data,
which can compromise prediction accuracy.
Dynamic Nature of Climate: Rapid and unpredictable changes in climate
variables require models to adapt continuously.
High Dimensionality: The complexity of climate data necessitates efficient
algorithms capable of managing multidimensional relationships.
Computational Demands: Advanced models require significant computational
power, particularly for deep learning techniques.
Interpretability: Machine learning models, particularly deep learning, can act
as "black boxes," making it difficult to explain their predictions to
policymakers.
HYPOTHESIS
[3]The premise of this project is that employing advanced machine learning algorithms
such as Linear Regression, Random Forest Regression, and XGBoost will enable
accurate climate prediction and provide insights into its effects. It is expected that the
use of these models, combined with effective feature engineering and visualization
techniques, will significantly enhance the prediction accuracy and offer a deeper
understanding of climate trends. Below are the specific hypotheses for this project:
1. Prediction Accuracy Improvement:
The primary hypothesis is that transitioning from simple models like Linear
Regression to more sophisticated models such as Random Forest Regression and
XGBoost will improve the accuracy of climate predictions. These advanced models
are expected to better capture the non-linear relationships and interactions between
features.
2. Comparison of Linear Regression and Random Forest:
It is hypothesized that while Linear Regression serves as a good baseline for model
training, Random Forest Regression will outperform it by capturing complex
interactions between variables and reducing overfitting. This transition highlights the
advantage of ensemble methods in handling climate data.
3. Performance of XGBoost:
The hypothesis regarding XGBoost is that it will deliver superior performance compared to other models due to its ability to handle missing data, prevent overfitting, and learn efficiently from the dataset's features. XGBoost's scalability and computational efficiency are expected to make it the optimal choice for climate prediction.
4. Impact of Feature Engineering and Visualization:
It is anticipated that the selection of relevant features and the implementation of various visualizations will significantly enhance model interpretability and prediction accuracy. Feature engineering, such as deriving weather indices or aggregating seasonal data, is expected to reduce noise and improve the model's ability to discern patterns.
5. Efficiency of Model Training:
The hypothesis includes exploring the computational trade-offs between models. It is expected that Random Forest Regression and XGBoost, while more accurate, will require higher computational resources compared to Linear Regression. This aspect of model efficiency will be evaluated during implementation.
LITERATURE REVIEW
[5]Key steps include:
Data Cleaning: Removing missing or noisy data points to ensure a consistent
dataset.
Scaling and Normalization: Ensuring that features with different units and
ranges do not disproportionately affect the model.
Feature Selection: Identifying the most relevant features, like temperature,
wind speed, and pressure, which significantly impact weather prediction.
Handling Temporal Data: Properly encoding time-series data to maintain
sequence integrity in algorithms like LSTM.
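The preprocessing steps above can be sketched in a few lines. This is a minimal illustration, assuming a pandas/scikit-learn workflow; the column names and values are placeholders, not the project's actual dataset:

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Hypothetical miniature weather frame (column names are assumptions).
df = pd.DataFrame({
    "temperature_celsius": [21.0, 23.5, np.nan, 25.1],
    "wind_kph": [10.2, 12.8, 11.0, 40.0],
    "pressure_mb": [1012.0, 1009.0, 1011.0, 1008.0],
})

# Data cleaning: fill the missing temperature reading by linear interpolation.
df["temperature_celsius"] = df["temperature_celsius"].interpolate()

# Scaling: bring features with different units into a common [0, 1] range.
scaled = pd.DataFrame(MinMaxScaler().fit_transform(df), columns=df.columns)
```

After scaling, no single feature (such as pressure in millibars) dominates the others purely because of its unit.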
records can impair model performance.
Real-Time Predictions: The need for instantaneous predictions can strain
computational resources and model efficiency.
Adapting to Climate Change: Models must evolve to account for changing
baselines due to global warming and environmental shifts.
variability, necessitate advanced techniques for preparation to ensure that the ML
models yield meaningful outputs. In this study, the dataset consists of critical weather
variables, including temperature (in Celsius and Fahrenheit), wind speed and
direction, atmospheric pressure, humidity, and UV index.
Preprocessing Workflow:
1. Data Cleaning: Missing values, outliers, and inconsistencies were addressed
using interpolation techniques and statistical outlier detection.
2. Feature Selection: The most influential variables were selected using
methods such as Recursive Feature Elimination (RFE) and correlation
analysis.
3. Scaling and Normalization: Standardizing features like temperature and
pressure to comparable ranges enhanced model efficiency.
4. Encoding Temporal Features: By encoding patterns such as seasonal cycles,
temporal dependencies were preserved for time-series models like LSTM.
THEORETICAL ANALYSIS
[8]Machine learning, a key subfield of artificial intelligence, focuses on developing
algorithms that learn from data to make decisions or predictions without explicit
programming for every specific task. These algorithms are trained on datasets to
identify patterns and relationships, enabling machines to mimic human cognitive
abilities in tasks such as image recognition, time series forecasting, and predicting
trends like price fluctuations.
For those eager to delve into this exciting field, numerous affordable and flexible
learning opportunities are available, catering to learners from all backgrounds. These
courses often include hands-on projects, theoretical foundations, and practical
applications, allowing participants to understand and apply machine learning
techniques effectively.
Supervised
Supervised learning involves training a model using historical data with known
outcomes, enabling it to make predictions or classifications for new, unseen data. For
example, in an image classification task, a model is trained on a [10] labeled dataset of
images, such as those tagged as "cats" or "dogs," to classify new images. Supervised
learning is characterized by its reliance on labeled datasets, where the goal is to map
inputs to their corresponding outputs accurately. Common algorithms include linear
regression for regression tasks and decision trees for classification tasks. It is widely
used in applications like natural language processing, image recognition, and
recommendation systems. While it performs well when labeled data is available, it can
be challenging when such data is scarce or expensive to obtain, and it often requires
careful feature engineering to achieve optimal results.
Unsupervised
Unsupervised learning deals with unlabeled data and aims to uncover hidden patterns,
structures, or relationships within the dataset. For instance, clustering similar
documents based on their content is a typical application, where the model groups
documents without prior knowledge of their categories. This type of learning is
characterized by its ability to identify latent structures and reduce data dimensionality,
with algorithms such as k-means clustering and principal component analysis (PCA)
being commonly used. Unsupervised learning is particularly useful for exploratory data
analysis, customer segmentation, and anomaly detection. However, the lack of labeled
data makes validation and evaluation more complex, and the interpretation of results
often depends on the specific context and expertise.
Reinforcement
[11]Reinforcement learning involves an agent interacting with an environment to learn
a strategy that maximizes a cumulative reward through trial and error. For example, an
autonomous robot navigating a maze can learn to reach its goal by receiving rewards
for correct actions and penalties for incorrect moves. This type of learning operates in
a continuous feedback loop of decision-making, action-taking, and reward assessment,
where the agent refines its policy over time. Algorithms like Q-learning and deep
reinforcement learning with neural networks are commonly used in this domain.
Reinforcement learning is well-suited for dynamic decision-making tasks, such as
robotics, autonomous systems, and game-playing agents. However, it often demands
significant computational resources and careful parameter tuning, with complex
environments requiring extensive exploration and experimentation to achieve effective
learning.
Precision is a metric that evaluates the accuracy of the positive predictions made by
the model. It is calculated as the ratio of true positive predictions to the total number of
positive predictions (true positives plus false positives). Precision is particularly useful
when the cost of false positives is high, as it reflects how many of the predicted positive
instances were actually correct.
Recall, also referred to as
sensitivity or the true positive rate, measures the model's ability to correctly identify all
actual positive instances. It is calculated as the ratio of true positive predictions to the
total number of actual positive instances (true positives plus false negatives). Recall is
crucial in scenarios where missing positive cases, such as in medical diagnoses, can
have severe consequences.
F1-score [13] is the harmonic mean of precision and recall, providing a single metric that balances the trade-offs between the two. It is especially useful when the dataset has an uneven class distribution or when both false positives and false negatives carry significant weight. The F1-score is calculated as F1 = 2 × (Precision × Recall) / (Precision + Recall).
Support [14]represents the number of actual occurrences of each class in the dataset.
While it does not directly measure performance, support contextualizes the precision,
recall, and F1-score metrics by showing the class distribution in the data. It helps
identify if the model's performance is influenced by class imbalances.
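The four metrics above can be computed in one call with scikit-learn's `precision_recall_fscore_support`; the labels below are toy values for illustration only:

```python
from sklearn.metrics import precision_recall_fscore_support

# Toy binary labels: 1 = positive class (e.g., an anomaly), 0 = normal.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

# For the positive class: TP = 3, FP = 1, FN = 1.
precision, recall, f1, support = precision_recall_fscore_support(
    y_true, y_pred, labels=[1], zero_division=0)
print(precision[0], recall[0], f1[0], support[0])  # 0.75 0.75 0.75 4
```

Here precision = 3/4 (one false positive), recall = 3/4 (one missed positive), so the F1-score is also 0.75, and support is the 4 actual positives.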
1. Linear Regression
[15]Linear Regression is one of the simplest and most commonly used algorithms for
predicting continuous values. It establishes a linear relationship between the input
features (independent variables) and the output (dependent variable). The model tries
to fit a straight line that minimizes the error (difference) between the predicted and
actual values.
Example use: estimating the impact of one weather parameter (e.g., wind speed) on another (e.g., humidity).
Benefits:
Simple to implement, fast to train, and easy to interpret.
Challenges:
Assumes a linear relationship, which may not be suitable for complex weather data.
Fig 2 – Linear Regression
Fig 3 – Random Forest Regression Model
Fig 4 – XGBoost Model
characteristic makes MSE particularly valuable in applications where large errors are highly undesirable, such as financial forecasting or precision engineering. Its use ensures that the model prioritizes reducing substantial deviations, which can have a more pronounced impact on real-world outcomes.
Root Mean Squared Error (RMSE), derived as the square root of MSE, provides an
error metric in the same units as the output variable, making it easier to interpret. Like
MSE, RMSE emphasizes larger errors due to the squaring operation but offers
improved clarity through unit consistency. It is widely used in applications such as
weather forecasting and climate modeling, where accurately understanding error
magnitude in real-world units is critical for actionable insights.
R-Squared (R²), also known as the [21] Coefficient of Determination, measures the proportion of variance in the dependent variable that is explained by the independent variables. It provides a summary statistic for model performance, with values ranging from 0 to 1, where closer to 1 indicates a better fit. Negative values can occur in poorly fitted models, offering a clear indication of suboptimal performance. R² is especially useful for understanding how well a model captures overall trends in the data, making it a go-to metric for assessing regression models.
Mean Absolute Percentage Error (MAPE) expresses the error as a percentage of the actual values, making it particularly effective for comparing model performance across datasets with different scales. However, it is sensitive to small actual values, which can distort the error calculation. MAPE is often used in business and financial contexts where percentage errors provide a clearer understanding of model performance relative to the actual values.
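The error metrics discussed above reduce to a few numpy expressions. The true and predicted values here are arbitrary examples, not project results:

```python
import numpy as np

y_true = np.array([20.0, 22.0, 25.0, 30.0])
y_pred = np.array([21.0, 21.0, 26.0, 28.0])

mse = np.mean((y_true - y_pred) ** 2)                     # Mean Squared Error
rmse = np.sqrt(mse)                                       # same units as the target
ss_res = np.sum((y_true - y_pred) ** 2)
ss_tot = np.sum((y_true - y_true.mean()) ** 2)
r2 = 1 - ss_res / ss_tot                                  # Coefficient of Determination
mape = np.mean(np.abs((y_true - y_pred) / y_true)) * 100  # percentage error
print(round(mse, 2), round(rmse, 2), round(r2, 3), round(mape, 2))  # 1.75 1.32 0.877 5.05
```

Note how the single 2-degree error contributes four times as much to MSE as each 1-degree error, illustrating why MSE penalizes large deviations.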
Fig 5-Example of Anomaly Detection
Metric: False Positive Rate (FPR)
Description: Proportion of normal points incorrectly classified as anomalies.
Used for: Evaluating the classifier's error rate.
Linear Regression: [23]A simple and interpretable model used to predict
continuous outcomes based on input features. It attempts to find the linear
relationship between the independent variables (e.g., temperature, humidity, wind
speed) and the dependent variable (e.g., temperature or humidity prediction).
Random Forest Regression: A robust ensemble method that creates multiple
decision trees during training and outputs the average prediction from all the trees.
It works well for handling complex datasets with nonlinear relationships, such as
predicting climate patterns, by capturing the interactions between different features.
XGBoost (Extreme Gradient Boosting): An efficient and scalable gradient
boosting algorithm used for regression tasks. It works by combining the predictions
of multiple weak models (decision trees) to improve accuracy. XGBoost is highly
effective in capturing complex relationships and is less prone to overfitting, making
it suitable for weather and climate prediction.
LSTM (Long Short-Term Memory): A type of recurrent neural network (RNN)
designed to capture long-term dependencies in time series data. It is especially
useful for predicting climate changes where past weather patterns influence future
conditions. LSTM can model sequential data, making it ideal for predicting time-
dependent variables like temperature and humidity.
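Before an LSTM can learn temporal dependencies, the series must be reshaped into fixed-length windows of past observations, each paired with the next-step value to predict. A minimal numpy sketch (the window size and temperature values are illustrative):

```python
import numpy as np

def make_windows(series, window):
    """Turn a 1-D series into (samples, window) inputs and next-step targets."""
    series = np.asarray(series)
    X = np.array([series[i:i + window] for i in range(len(series) - window)])
    y = series[window:]
    return X, y

temps = [21.0, 21.4, 22.0, 22.3, 21.9, 21.5]
X, y = make_windows(temps, window=3)
print(X.shape, y.shape)  # (3, 3) (3,)
```

Each row of `X` is three consecutive readings and the matching entry of `y` is the reading that follows, which is the input/target shape recurrent models consume.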
different thresholds.
Cross-validation was performed to ensure robustness, and the models were evaluated
on the test set to validate their effectiveness.
4.8 Prediction of Weather Workflow
The workflow followed in the project is as follows:
Data Collection and Preprocessing: Collect and preprocess the data to make it
ready for training.
Feature Selection: Use feature selection techniques to focus on the most
important variables.
Model Training: Train multiple models and optimize their hyperparameters.
Model Evaluation: Evaluate models using precision, recall, F1-score, and
AUC.
Anomaly Detection: Use the trained models to detect anomalies in new data.
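As a sketch of the final step, an unsupervised detector can flag unusual weather readings. The report does not name a specific detector, so scikit-learn's IsolationForest is used here purely as an illustrative stand-in:

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(42)
# Mostly normal temperature readings, plus two injected extremes at the end.
temps = np.concatenate([rng.normal(22, 2, 100), [45.0, -10.0]]).reshape(-1, 1)

# contamination=0.02 tells the detector roughly 2% of points are anomalous.
detector = IsolationForest(contamination=0.02, random_state=42)
labels = detector.fit_predict(temps)  # -1 = anomaly, 1 = normal
print(np.where(labels == -1)[0])      # indices flagged as anomalous
```

The two injected extremes are far outside the normal range, so they are the points the detector isolates first.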
EXPERIMENTAL RESULTS
import pandas as pd
df = pd.read_excel('Weather_Data.xlsx')  # placeholder filename; the original name was lost in extraction
df.shape   # inspect dimensions (call reconstructed; original garbled)
df.head()  # preview the first rows (call reconstructed; original garbled)
# Normalize the values (min-max scaling)
ax1.set_xlabel('Time')
ax1.set_ylabel('Temperature (°C) & Wind Speed (kph)', color='black')
ax1.plot(df['last_updated_epoch'], df['temperature_celsius'], label='Temperature (°C)',
color='red')
ax1.plot(df['last_updated_epoch'], df['wind_kph'], label='Wind Speed (kph)',
color='blue')
ax1.tick_params(axis='y', labelcolor='black')
ax1.legend(loc='upper left')
Model Training :
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, mean_absolute_error
# Features and target
X = data[['wind_kph', 'humidity', 'pressure_mb', 'cloud']]
y = data['temperature_celsius']
# Train-test split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2,
random_state=42)
# Model
model = LinearRegression()
model.fit(X_train, y_train)
# Predictions
y_pred = model.predict(X_test)
# Evaluation
mse = mean_squared_error(y_test, y_pred)
mae = mean_absolute_error(y_test, y_pred)
print(f'Mean Squared Error: {mse:.2f}')
print(f'Mean Absolute Error: {mae:.2f}')
Output :
Mean Squared Error: 9.60
Mean Absolute Error: 2.29
Updated Model :
# Train the model
rf_model.fit(X_train, y_train)
# Predictions
y_pred_rf = rf_model.predict(X_test)
Output:
Random Forest - Mean Squared Error: 3.63
Random Forest - Mean Absolute Error: 1.25
# model: a time-series model (e.g., statsmodels ARIMA) constructed on the preceding page, lost in extraction
model_fit = model.fit()
# Forecast
forecast = model_fit.forecast(steps=30)
print(forecast)
# Plot
plt.figure(figsize=(10, 5))
plt.plot(wind_data[-100:], label='Historical Data')
plt.plot(forecast, label='Forecast', color='red')
plt.title('Wind Speed Forecast')
plt.legend()
plt.show()
data['rainfall'] = (data['precip_mm'] > 0).astype(int)
# Train-test split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2,
random_state=42)
# Model
rf_model = RandomForestClassifier(random_state=42)
rf_model.fit(X_train, y_train)
# Predictions
y_pred = rf_model.predict(X_test)
# Evaluation
print(classification_report(y_test, y_pred))
Output:
Classify weather conditions into categories such as sunny, rainy, or stormy [24]:
def classify_weather(row):
if row['precip_mm'] > 20 and row['wind_mph'] > 30:
return 'Stormy'
elif row['precip_mm'] > 5:
return 'Rainy'
elif row['cloud'] < 30 and row['precip_mm'] == 0:
return 'Sunny'
else:
return 'Cloudy'
# Apply classification
data['Weather Condition'] = data.apply(classify_weather, axis=1)
Fig 6.3-Distribution of Weather Conditions
plt.figure(figsize=(8, 5))
plt.scatter(data['temperature_celsius'], data['feels_like_celsius'], alpha=0.6,
color='blue')
plt.plot([min(data['temperature_celsius']), max(data['temperature_celsius'])],
[min(data['temperature_celsius']), max(data['temperature_celsius'])],
color='red', linestyle='--', label='Actual = Feels Like')
plt.xlabel('Actual Temperature (°C)')
plt.ylabel('Feels Like Temperature (°C)')
plt.title('Actual vs. Feels Like Temperature')
plt.legend()
plt.show()
Fig 6.4- Actual vs. Feels Like Temperature
plt.figure(figsize=(8, 6))
sns.heatmap(correlation, annot=True, cmap='coolwarm', fmt='.2f', square=True)
plt.title('Correlation with Feels Like Temperature')
plt.show()
def comfort_index(feels_like):
if feels_like < 20:
return 'Cold'
elif feels_like > 30:
return 'Hot'
else:
return 'Comfortable'
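The comfort_index function above can be applied column-wise in the same way as the earlier weather classifier. The frame below is a toy example, and the function is restated so the snippet runs on its own:

```python
import pandas as pd

def comfort_index(feels_like):
    # Same thresholds as defined above: below 20 °C is cold, above 30 °C is hot.
    if feels_like < 20:
        return 'Cold'
    elif feels_like > 30:
        return 'Hot'
    else:
        return 'Comfortable'

data = pd.DataFrame({'feels_like_celsius': [15.0, 25.0, 33.0]})
data['comfort'] = data['feels_like_celsius'].apply(comfort_index)
print(data['comfort'].tolist())  # ['Cold', 'Comfortable', 'Hot']
```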
# Define foggy/hazy threshold
data['is_foggy'] = (data['visibility_km'] < 2).astype(int)
# Train-test split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2,
random_state=42)
1. Data Preprocessing
The initial phase of the project involved preprocessing the data. This step included:
Handling missing values: [25] Missing or null values in the dataset were handled by
either imputation or removal based on the nature of the data.
Feature engineering: New features were created or adjusted as needed, ensuring that
the data was formatted properly for input into the machine learning model.
machine learning models for predicting temperature. Several algorithms were tested,
including:
Linear Regression
Random Forest Regressor
XGBoost
Model Deployment
For user interaction, a Streamlit interface was built. This interface allows users to
input various weather-related parameters (e.g., wind speed, pressure, humidity) and get
the predicted temperature after a specified number of hours.
Streamlit Interface Features:
Input fields: [27]Users can input weather features (wind speed, humidity, pressure,
etc.) and specify the number of hours for which they want to predict the temperature.
Prediction display: Based on the user's inputs, the model predicts the temperature after
the specified time period.
Fig 7-Streamlit Interface
DISCUSSION OF THE RESULTS
The project focuses on predicting climate changes and weather conditions using
machine learning algorithms. [28]Initially, models such as Linear Regression were
implemented and evaluated to assess their performance. Subsequently, more advanced
algorithms, including Random Forest Regressor, XGBoost, and LSTM, were employed
to improve prediction accuracy. The performance of each model was thoroughly
analyzed using evaluation metrics to ensure the most accurate and reliable predictions.
The results underscore the importance of accurately forecasting weather conditions, as
these predictions have significant real-world applications in various sectors.
The dataset used in the project includes features such as temperature (in Celsius and
Fahrenheit), wind speed (in mph and kph), wind direction, pressure (in mb and inches),
precipitation, humidity, cloud cover, visibility (in km and miles), UV index, and gust
speeds. These features provide a comprehensive view of the weather conditions,
making it possible to predict climate patterns and their potential effects on different
domains.
The predictive model developed in this project has several practical applications:
1. Agriculture: Farmers can use weather predictions to make informed decisions
about planting, irrigation, and harvesting schedules. Accurate forecasts help
mitigate risks related to extreme weather events, such as droughts or floods,
ensuring better crop yields and food security.
2. Hospitality: The hospitality industry can leverage climate predictions to plan
outdoor events, manage bookings during peak travel seasons, and offer weather-
based services to guests, such as recommending clothing or activities.
3. Transportation: Weather forecasts play a crucial role in ensuring
transportation safety and efficiency. Airlines can plan flight routes to avoid
turbulence or storms, while road and rail networks can prepare for adverse
conditions like snow or heavy rain, reducing accidents and delays.
4. Disaster Management: Real-time predictions enable authorities to prepare for
extreme weather events, such as hurricanes or heatwaves, by issuing early
warnings, evacuating affected areas, and mobilizing emergency services.
5. Energy Management: Weather predictions aid in optimizing energy
consumption and production. For instance, solar and wind energy industries can
adjust operations based on expected sunlight or wind speeds.
6. Retail and Supply Chain: Retailers can anticipate demand for seasonal
products, such as winter clothing or summer beverages, while supply chain
managers can ensure timely delivery by planning around weather disruptions.
Agriculture Insights
Analyze historical weather patterns and their correlation with crop yields or growing
seasons. Use predictive modeling to forecast optimal planting and harvesting times
based on weather patterns.
Fig 8.2-Wind Speed vs UV Index
Transportation
Healthcare
Fig 8.4-Feature Importance
CONCLUSION
References
[1] [1]Balti, H., Abbes, A. B., Mellouli, N., Farah, I. R., Sang, Y., & Lamolle,
M. (2020). A review of drought monitoring with big data: Issues, methods,
challenges and research directions. Ecological Informatics, 60, 101136.
[2] [2]Y.H. Fu, H. Zhao, S. Piao, M. Peaucelle, S. Peng, G. Zhou, P. Ciais, M.
Huang, A. Menzel, J. Peñuelas, Y. Song, Y. Vitasse, Z. Zeng, and I. A.
Janssens, ‘‘Declining global warming effects on the phenology of spring leaf
unfolding,’’ Nature, vol. 526, no. 7571, pp. 104–107, Oct. 2015, doi:
10.1038/nature15402.
[3] [3] Balti, H., Abbes, A. B., Mellouli, N., Sang, Y., Farah, I. R., Lamolle, M.,
& Zhu, Y. (2021, July). Big data based architecture for drought forecasting
using LSTM, ARIMA, and Prophet: Case study of the Jiangsu Province,
China. In 2021 International Congress of Advanced Technology and
Engineering (ICOTEN) (pp. 1-8). IEEE.
[4] [4]Hanadé Houmma, I., El Mansouri, L., Gadal, S., Garba, M., & Hadria, R.
(2022). Modelling agricultural drought: a review of latest advances in big
data technologies. Geomatics, Natural Hazards and Risk, 13(1), 2737-2776.
[5] [5]W. Cramer, A. Bondeau, F. I. Woodward, I. C. Prentice, R. A. Betts, V.
Brovkin, P. M. Cox, V. Fisher, J. A. Foley, A. D. Friend, C. Kucharik, M. R.
Lomas, N. Ramankutty, S. Sitch, B. Smith, A. White, and C. Young-
Molling,‘‘Global response of terrestrial ecosystem structure and function to.
CO2 and climate change: Results from six dynamic global vegetation
models,’’ Global Change Biol., vol. 7, no. 4, pp. 357–373, Apr. 2001, doi:
10.1046/j.1365-2486.2001.00383.x.
[6] [6]Nandgude, N., Singh, T. P., Nandgude, S., & Tiwari, M. (2023). Drought
prediction: a comprehensive review of different drought prediction models
and adopted technologies. Sustainability, 15(15), 11684R. Kaur et al.,
“CNN-Based Anomaly Detection in Industrial IoT Systems,” Sensors
Journal, 2020.
[7] [7]Jaber, M. M., Ali, M. H., Abd, S. K., Jassim, M. M., Alkhayyat, A., Aziz,
H. W., & Alkhuwaylidee, A. R. (2022). Predicting climate factors based on
big data analytics based agricultural disaster management. Physics and
Chemistry of the Earth, Parts A/B/C, 128, 103243Y. Kim, “Unsupervised
Learning in CPS: Autoencoders,” Neural Networks, 2019.
[8] [8]J. Matthewman and G. Magnusdottir, ‘‘Observed interaction between
Pacific sea ice and the Western Pacific pattern on intraseasonal time scales,’’
J. Climate, vol. 24, no. 19, pp. 5031–5042, Oct. 2011, doi:
10.1175/2011JCLI4216.1..
[9] [9] Fung, K. F., Huang, Y. F., Koo, C. H., & Soh, Y. W. (2020). Drought
forecasting: A review of modelling approaches 2007–2017. Journal of Water
and Climate Change, 11(3), 771-799.
[10] [10]G. Krinner, N. Viovy, N. de Noblet-Ducoudré, J. Ogée, J. Polcher, P.
Friedlingstein, P. Ciais, S. Sitch, and I. C. Prentice, ‘‘A dynamic global
vegetation model for studies of the coupled atmosphere-biosphere system,’’
Global Biogeochem. Cycles, vol. 19, no. 1, Mar. 2005, Art. no. GB1015, doi:
10.1029/2003GB002199.
[11] Canavera, G., Magnanini, E., Lanzillotta, S., Malchiodi, C., Cunial, L.,
& Poni, S. (2023). A sensorless, Big Data based approach for phenology
and meteorological drought forecasting in vineyards. Scientific Reports,
13(1), 16818.
[12] Fathi, M., Haghi Kashani, M., Jameii, S. M., & Mahdipour, E. (2022).
Big data analytics in weather forecasting: A systematic review. Archives of
Computational Methods in Engineering, 29(2), 1247-1275.
[13] Kaur, A., & Sood, S. K. (2020). Artificial intelligence-based model for
drought prediction and forecasting. The Computer Journal, 63(11), 1704-1712.
[14] Hao, Z., Singh, V. P., & Xia, Y. (2018). Seasonal drought prediction:
Advances, challenges, and future prospects. Reviews of Geophysics, 56(1),
108-141.
[15] Xu, X., Xie, F., & Zhou, X. (2016). Research on spatial and temporal
characteristics of drought based on GIS using Remote Sensing Big Data.
Cluster Computing, 19, 757-767.
[16] Brust, C., Kimball, J. S., Maneta, M. P., Jencso, K., & Reichle, R. H.
(2021). DroughtCast: A machine learning forecast of the United States
drought monitor. Frontiers in Big Data, 4, 773478.
[17] Mishra, A. K., & Singh, V. P. (2019). Drought modeling–A review.
Journal of Hydrology, 403(1-2), 157-175.
[18] Kaur, A., & Sood, S. K. (2020). Cloud-Fog based framework for drought
prediction and forecasting using artificial neural network and genetic
algorithm. Journal of Experimental & Theoretical Artificial Intelligence,
32(2), 273-289.
[19] Mishra, A. K., & Desai, V. R. (2022). Drought forecasting using
stochastic models. Stochastic Environmental Research and Risk Assessment,
19, 326-339.
[20] Prodhan, F. A., Zhang, J., Hasan, S. S., Sharma, T. P. P., & Mohana, H.
P. (2022). A review of machine learning methods for drought hazard
monitoring and forecasting: Current research trends, challenges, and future
research directions. Environmental Modelling & Software, 149, 105327.
[21] Gyaneshwar, A., Mishra, A., Chadha, U., Raj Vincent, P. D.,
Rajinikanth, V., Pattukandan Ganapathy, G., & Srinivasan, K. (2023). A
contemporary review on deep learning models for drought prediction.
Sustainability, 15(7), 6160.
[22] Akanbi, A., & Masinde, M. (2020). A distributed stream processing
middleware framework for real-time analysis of heterogeneous data on big
data platform: Case of environmental monitoring. Sensors, 20(11), 3166.
[23] Rhee, J., & Im, J. (2019). Meteorological drought forecasting for
ungauged areas based on machine learning: Using long-range climate
forecast and remote sensing data. Agricultural and Forest Meteorology, 237,
105-122.
[24] Elbeltagi, A., Kumar, M., Kushwaha, N. L., Pande, C. B., Ditthakit, P.,
Vishwakarma, D. K., & Subeesh, A. (2023). Drought indicator analysis and
forecasting using data driven models: case study in Jaisalmer, India.
Stochastic Environmental Research and Risk Assessment, 37(1), 113-131.
[25] Amanambu, A. C., Mossa, J., & Chen, Y. H. (2022). Hydrological
drought forecasting using a deep transformer model. Water, 14(22), 3611.
[26] Chen, Y., & Han, D. (2019). Big data and hydroinformatics. Journal of
Hydroinformatics, 18(4), 599-614.
[27] Deng, M., Di, L., Han, W., Yagci, A. L., Peng, C., & Heo, G. (2023).
Web-service-based monitoring and analysis of global agricultural drought.
Photogrammetric Engineering & Remote Sensing, 79(10), 929-943.
[28] Morid, S., Smakhtin, V., & Bagherzadeh, K. (2022). Drought forecasting
using artificial neural networks and time series of drought indices.
International Journal of Climatology, 27(15), 2103-2112.
[29] Panu, U. S., & Sharma, T. C. (2022). Challenges in drought research:
some perspectives and future directions. Hydrological Sciences Journal,
47(S1), S19-S30.
[30] Poornima, S., & Pushpalatha, M. (2023). Drought prediction based on
SPI and SPEI with varying timescales using LSTM recurrent neural network.
Soft Computing, 23(18), 8399-8412.
[31] J. Wang, J. Dong, Y. Yi, G. Lu, J. Oyler, W. K. Smith, M. Zhao, J. Liu,
and S. Running, ‘‘Decreasing net primary production due to drought and
slight decreases in solar radiation in China from 2000 to 2012,’’ J. Geophys.
Res., Biogeosci., vol. 122, no. 1, pp. 261–278, Jan. 2020, doi:
10.1002/2016JG003417.
[32] I. H. Myers-Smith et al., ‘‘Complexity revealed in the greening of the
Arctic,’’ Nature Climate Change, vol. 10, no. 2, pp. 106–117, Feb. 2020,
doi: 10.1038/s41558-019-0688-1.
[33] G. Sugihara, B. T. Grenfell, R. M. May, and H. Tong, ‘‘Nonlinear
forecasting for the classification of natural time series,’’ Phil. Trans. Roy. Soc.
London. A, Phys. Eng. Sci., vol. 348, no. 1688, pp. 477–495, 2022, doi:
10.1098/rsta.1994.0106.
[34] D. Liu, C. Zhang, R. Ogaya, M. Fernández-Martínez, T. A. M. Pugh, and
J. Peñuelas, ‘‘Increasing climatic sensitivity of global grassland vegetation
biomass and species diversity correlates with water availability,’’ New
Phytologist, vol. 230, no. 5, pp. 1761–1771, Jun. 2021, doi:
10.1111/nph.17269.
[35] R. A. Pielke Sr., R. Avissar, M. Raupach, A. J. Dolman, X. Zeng, and
[Link], ‘‘Interactions between the atmosphere and terrestrial
ecosystems: Influence on weather and climate,’’ Global Change Biol., vol. 4,
no. 5, pp. 461–475, Jun. 2022, doi: 10.1046/j.1365-2486.1998.t01-1-
00176.x.
[36] T. S. Bhatia and R. Choudhury, “Recent Advances in Anomaly
Detection Techniques for Machine Learning Systems,” ACM Computing
Surveys, 2022.
[37] K. H. Kim et al., “Deep Learning Techniques for Anomaly Detection
Systems: A Survey,” IEEE