Agripredict CAPSTONE Report
BACHELOR OF TECHNOLOGY
IN
COMPUTER SCIENCE & ENGINEERING
by
DECEMBER 2024
ACKNOWLEDGEMENTS
We would like to express our heartfelt gratitude to our professor, Dr. Srinivasrao Pokuri, for
his valuable guidance, encouragement, and support throughout the development of this project.
His insights and expertise were instrumental in shaping the project's direction and ensuring its
successful completion. We are also deeply thankful to VIT-AP University for providing a
collaborative environment that fostered innovation and learning. Finally, we extend our sincere
appreciation to our fellow team members for their dedication, hard work, and seamless
collaboration, which made this project possible.
ABSTRACT
AgriPredict, an AI-driven crop yield and price forecasting system, is a web-based
application designed to help farmers make informed decisions about crop selection and yield
estimation. The frontend is built with HTML, CSS, and JavaScript and the backend with Flask;
together they provide an intuitive interface where users enter details such as land area, crop type,
fertilizers used, and other relevant factors. A trained machine learning model processes this data
to predict crop yields and estimate potential income. Because the solution is purely software-based,
it needs no additional hardware, which keeps it cost-effective and accessible to any user with an
internet connection. The model is trained on datasets from trusted sources such as the FAO and
Indian Government portals, with the aim of giving farmers accurate, actionable insights for
optimizing their agricultural practices and maximizing productivity.
TABLE OF CONTENTS
Acknowledgements
Abstract
List of Figures and Tables
1 Introduction
1.1 Objectives
1.2 Background and Literature Survey
1.3 Organization of the Report
2 AgriPredict: AI-Driven Crop Yield and Price Forecasting
2.1 Proposed System
2.2 Working Methodology
2.3 Standards
2.4 System Details
2.4.1 Software
2.4.2 Hardware
3 Cost Analysis
3.1 List of components and their cost
4 Results and Discussion
4.1 Tentative Methodologies
4.2 Data Set Information
4.3 Processing Data Sets
4.4 Exploring Data Sets
4.4.1 Distribution of data
4.4.2 Statistical Description
4.5 Visualizing the data to find Insights
4.5.1 Correlation matrix
4.6 Building pipelines for data preprocessing
4.6.1 Numerical Pipeline
4.6.2 Categorical Pipeline
4.7 Selecting and Training the machine learning model
4.8 Evaluation Metrics
4.8.1 Performance of models on training set
4.8.2 Evaluation of models on the test set
4.9 Combining yield with prices
5 Conclusion & Future Works
6 Appendix
7 References
List of Tables
2. Train R2 Score on Training Set
List of Figures
3 Home Page
4 Login Page
5 Sign Up Page
6 Input Page
7 Guide Page
8 Help Page
9 Output Page
10 Distribution of Data
11 Heat Map
12 Scatter Plot
13 Bar Graphs
14 One Hot Encoding
CHAPTER 1
INTRODUCTION
Agriculture remains the cornerstone of the Indian economy, providing livelihoods to millions and
contributing significantly to the nation’s GDP. India’s agricultural landscape includes a diverse
array of crops, such as cereals, pulses, fruits, vegetables, coffee, tea, oilseeds, spices, and non-food
crops like rubber, jute, and cotton. The sector plays a pivotal role in meeting the growing food
demands of an ever-increasing population, yet farmers face persistent challenges, particularly
price volatility. Fluctuating crop prices hinder farmers’ ability to plan production patterns, causing
economic instability, especially for perishable crops like tomatoes, which have a short shelf life and
are highly sensitive to market price changes. Figure 1 in this report provides a detailed depiction of
crop price fluctuations in India from 2000 to 2024, highlighting the instability in prices over the
years and emphasizing the urgent need for improved agricultural policies and systems to stabilize
market conditions and support farmers.
1.1 Objectives
The following are the objectives of this project:
• To assist farmers in selecting the most profitable crop to cultivate based on prior crop yield,
climatic conditions, nutrients supplied, and fertilizers used.
• To utilize machine learning methods to predict the best crops for the upcoming season,
ensuring accurate and data-driven decision-making.
• To analyze and interpret weather conditions such as rainfall and temperature, along with
pesticide usage, to forecast crop yields effectively.
• To provide farmers with a tool that calculates and compares the expected yield for various
crops using input attributes, enabling them to identify the highest-yielding crop.
• To empower farmers with reliable information for agricultural risk management, thereby
improving productivity and optimizing resource utilization.
CHAPTER 2
AgriPredict: AI-Driven Crop Yield and Price Forecasting
This chapter describes the proposed system, the working methodology, and the software and hardware details.
The following block diagram (Figure 1) shows the workflow of this project.
2.2 Working Methodology
The system is a web-based application developed using HTML, CSS, and Flask. It allows farmers
to enter essential details directly through a user-friendly interface. The inputs include
land area (both used and cultivated), crop type, fertilizers and nutrients used, and the current
market price. Once the farmer submits the data, the system processes the inputs and applies a
Random Forest model, trained on historical data, to predict the crop yield. The system then
combines the yield prediction with the current market price entered by the farmer and the inflation
rate to estimate the farmer's expected income at the time of harvest. The output, which includes the
predicted crop yield, the estimated income, and the market price, is displayed on the web interface.
The application also suggests the most profitable crops for the given inputs, helping farmers make
informed decisions for the upcoming cultivating season. The web application aims to help farmers
optimize their crop production and financial planning without the need for additional hardware
or complex setups.
2.3 Standards
Various standards used in this project are:
• HTTPS (Hypertext Transfer Protocol Secure)
HTTPS secures communication between the user's browser and the web server by
encrypting the data with TLS (the successor to SSL). This is crucial for protecting user data,
particularly any personal or sensitive information shared during data entry. Because the data
transmitted between the client (farmer) and the server is encrypted, unauthorized parties cannot
read it in transit.
• W3C Web Standards (HTML5, CSS3, JavaScript ES6):
The web application follows the W3C web standards for HTML5 and CSS3. These standards
help ensure that the website is accessible, compatible across different browsers, and mobile-responsive.
HTML5 offers semantic elements, multimedia capabilities, and better accessibility, while CSS3
enables advanced styling techniques such as animations, transitions, and responsive designs.
2.4 System Details
This section describes the software and hardware details of the system:
2.4.1 Software
Fig 4. Login Page
Fig 6. Input Page
Fig 10. Output Page
2.4.2 Hardware
The Crop Yield Prediction System is entirely software-based and does not involve any hardware
components. The project is implemented as a web-based application using modern web
development technologies, including HTML, CSS, JavaScript for the frontend, and Flask for the
backend. It facilitates data collection directly from the user through a simple, intuitive interface.
Farmers input details such as land area, crop type, fertilizers, and other relevant factors, which are
then processed by a trained machine learning model to provide yield predictions and income
estimations. The system's focus is solely on software solutions, eliminating the need for additional
hardware, making it cost-effective and accessible to users with just an internet connection.
CHAPTER 3
COST ANALYSIS
3.1 List of components and their cost
The project does not use any hardware components; it is implemented entirely as a web-based
application using modern web development technologies: HTML, CSS, and JavaScript for the
frontend, and Flask for the backend. It collects data directly from the user through a simple,
intuitive interface. Farmers input details such as land area, crop type, fertilizers, and other
relevant factors, which are then processed by a trained machine learning model (Random Forest)
to provide yield predictions and income estimations. Because the system relies solely on software,
it needs no additional hardware, which makes it cost-effective and accessible to users with just an
internet connection. It is therefore a ZERO COST APPLICATION.
TOTAL COST = Rs. 0.
CHAPTER 4
RESULTS AND DISCUSSION
4.1 Tentative Methodologies
• Problem Identification
• Data Collection
• Dataset Processing
• Data Visualization to gain insights
• Getting the data ready for training
• Training the machine learning models
• Finalizing the model that gives the best crop yield predictions
• Relating the crop yield to current and probable future market prices
4.2 Data Set Information
The datasets are collected from official and trusted sources such as the FAO (Food and
Agriculture Organization of the United Nations) and the Indian Government's
OGD (Open Government Data) platform.
Multiple datasets are collected on the following categories:
• Crop yield
• Temperature data
• Fertilizers by nutrients containing Nitrogen, Potash, Phosphate
• Pesticides
• Quantity Produced
• Area Irrigated
• Area Harvested
• Land Used
• Synthetic Fertilizers
4.3 Processing Data Sets
• All the collected datasets contain data on 80 different crops
grown in India over 30 years (from 1990 to 2019).
• The columns in each dataset are checked thoroughly and unwanted features
are removed.
• Where several features are redundant, only the most important one is kept.
• All the datasets are finally merged into one larger dataset keyed on the crop
item and the year of production.
• This merged dataset is used for all further processing.
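The merge step described above can be sketched as follows. This is a minimal illustration using only the Python standard library, not the project's actual processing code; the records, the values, and the helper name merge_on_keys are hypothetical, and only the key names CropItem and Year are taken from the application listing in the appendix.

```python
# Hypothetical per-source records; the real datasets would be loaded from files.
yield_rows = [
    {"CropItem": "Rice", "Year": 2018, "Yield": 3.9},
    {"CropItem": "Rice", "Year": 2019, "Yield": 4.1},
    {"CropItem": "Wheat", "Year": 2019, "Yield": 3.5},
]
pesticide_rows = [
    {"CropItem": "Rice", "Year": 2018, "Pesticides": 52.0},
    {"CropItem": "Rice", "Year": 2019, "Pesticides": 55.0},
    {"CropItem": "Maize", "Year": 2019, "Pesticides": 12.0},
]

KEYS = ("CropItem", "Year")


def merge_on_keys(left, right, keys=KEYS):
    """Inner-join two lists of dicts on the given key columns."""
    # Index the right-hand rows by their key tuple for constant-time lookup.
    index = {tuple(row[k] for k in keys): row for row in right}
    merged = []
    for row in left:
        match = index.get(tuple(row[k] for k in keys))
        if match is not None:
            # Combine the columns of both rows into one record.
            merged.append({**row, **match})
    return merged


merged = merge_on_keys(yield_rows, pesticide_rows)
for row in merged:
    print(row)
```

Only crop and year combinations present in both sources survive this inner join. With pandas, merging two data frames on the same two key columns would express the same idea.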
4.4.2 Statistical Description
Fig 11. Heat Map
Fig 13. Bar Graphs
4.6 BUILDING PIPELINES FOR DATA PREPROCESSING
4.6.1 Numerical Pipeline
I. Imputation
• Sklearn's SimpleImputer is an imputation transformer for completing missing values.
• Strategy = median: null values are replaced with the median value of the column.
II. Standard Scaling
• Standardization removes the mean and scales each feature to unit variance: it first subtracts the mean
(so standardized values always have a zero mean) and then divides by the standard deviation.
• The standard score of a sample x is calculated as z = (x - u) / s, where u is the mean of the
training samples and s is their standard deviation.
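The two numerical steps can be illustrated by hand on a single column. The sketch below is not the project's pipeline; it reimplements median imputation and the standard score with the Python standard library so that the arithmetic is visible. The function names and the temperature values are hypothetical. The population standard deviation (pstdev) is used because that is what scikit-learn's standard scaling divides by.

```python
from statistics import mean, median, pstdev


def impute_median(column):
    """Replace None entries with the median of the observed values."""
    observed = [v for v in column if v is not None]
    fill = median(observed)
    return [fill if v is None else v for v in column]


def standardize(column):
    """Return z = (x - u) / s for every x, using the population std. deviation."""
    u = mean(column)
    s = pstdev(column)
    return [(x - u) / s for x in column]


# Hypothetical annual temperatures with one missing reading.
temps = [24.0, None, 26.0, 25.0, 29.0]

imputed = impute_median(temps)   # the gap is filled with the median of the rest
scaled = standardize(imputed)    # zero mean, unit variance

print(imputed)
print([round(z, 3) for z in scaled])
```

Imputing before scaling matters: the mean and standard deviation are computed on the completed column, so the filled-in value takes part in the scaling statistics.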
4.6.2 Categorical Pipeline
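The List of Figures names one-hot encoding as the illustration for this step. A minimal sketch of that technique is given below, assuming the categorical feature is the crop name (CropItem in the application listing); the function names and sample values are hypothetical and this is not the project's own encoder.

```python
def fit_categories(values):
    """Collect the distinct categories in a fixed (sorted) order."""
    return sorted(set(values))


def one_hot(value, categories):
    """Encode one value as a 0/1 vector with a single 1 at its category."""
    return [1 if value == c else 0 for c in categories]


# Hypothetical CropItem column.
crops = ["Rice", "Wheat", "Maize", "Rice"]

categories = fit_categories(crops)
encoded = [one_hot(c, categories) for c in crops]

print(categories)
for crop, vector in zip(crops, encoded):
    print(crop, vector)
```

Each crop becomes its own binary column, so the model does not read any ordering into the crop names. A value that was not seen when the categories were collected encodes as all zeros in this sketch.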
4.7 SELECTING AND TRAINING THE MACHINE LEARNING MODEL
• From the problem and the dataset, we can infer that this is a regression problem.
• Various regression algorithms are selected and trained to find the best model out of them.
• The models trained in this project are
i. Linear Regressor
ii. Decision Tree Regressor
iii. Random Forest Regressor
iv. Gradient Boosting Regressor
v. Linear Support Vector Regressor (SVR)
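4.8 EVALUATION METRICS
R2 SCORE: R2 = 1 - SS_res / SS_tot, where SS_res is the sum of squared residuals and SS_tot is the
total sum of squares about the mean of the observed values.
4.8.1 PERFORMANCE OF MODELS ON TRAINING SET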
• The Decision Tree Regressor has the highest R2 score (1), but its RMSE value is
worse than that of the Random Forest Regressor.
• This indicates that the Decision Tree Regressor has overfitted the training data.
• The Random Forest Regressor did well in terms of both RMSE and R2 score.
• The Linear SVR did not perform well: its R2 score is negative, which means
it does not follow the trend in the dataset and fits worse than a
horizontal line at the mean of the target.
• The Linear Regressor fitted the data reasonably well but performed worse than the
Random Forest Regressor and the Decision Tree Regressor.
• The Gradient Boosting Regressor, an ensemble of weak decision trees, did not perform as well as
the individual Decision Tree or the Random Forest.
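To make the two metrics discussed above concrete, the following sketch computes RMSE and the R2 score by hand for three sets of predictions. The data are hypothetical and are not the project's results; the point is only to show why an R2 score of 0 corresponds to a horizontal line at the mean and why a negative score means a worse fit than that line.

```python
from math import sqrt


def rmse(y_true, y_pred):
    """Root mean squared error between observed and predicted values."""
    return sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


y_true = [2.0, 4.0, 6.0, 8.0]        # hypothetical observed yields
perfect = [2.0, 4.0, 6.0, 8.0]       # reproduces every observation
mean_only = [5.0, 5.0, 5.0, 5.0]     # horizontal line at the mean
wrong_trend = [8.0, 6.0, 4.0, 2.0]   # predicts the opposite trend

for name, pred in [("perfect", perfect), ("mean_only", mean_only), ("wrong_trend", wrong_trend)]:
    print(name, round(rmse(y_true, pred), 3), r2_score(y_true, pred))
```

A model that reproduces its training data exactly scores R2 = 1 on that data, so a perfect training score alone does not show that the model generalizes; this is why the test-set evaluation is needed as well.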
4.8.2 EVALUATION OF MODELS ON THE TEST SET
4.9 COMBINING YIELD WITH PRICES
• The crop yield predicted by the machine learning model is combined
with the market price and the inflation rate on the day of entry to estimate the amount
the farmer will receive at the time of harvest.
• The farmer fills in the required fields in a small, farmer-friendly UI, which
then runs the whole calculation and shows the output.
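The report does not state the exact formula used to combine these quantities, so the sketch below is an assumption made purely for illustration: production is taken as yield per hectare times area, and the entered price is adjusted once by the inflation rate. The function name, the parameter names, the units, and the sample numbers are all hypothetical.

```python
def estimate_income(yield_per_hectare, area_hectares, price_per_tonne, inflation_rate):
    """Estimate income at harvest under the assumed formula.

    Assumption (not taken from the report):
        income = yield * area * price * (1 + inflation_rate)
    """
    production = yield_per_hectare * area_hectares          # tonnes expected at harvest
    harvest_price = price_per_tonne * (1 + inflation_rate)  # price adjusted for inflation
    return production * harvest_price


# Hypothetical inputs: 3 tonnes per hectare on 2 hectares, Rs. 20000 per tonne, 5% inflation.
income = estimate_income(3.0, 2.0, 20000.0, 0.05)
print(round(income, 2))
```

Keeping the price adjustment separate from the production estimate makes it straightforward to swap in a different price model later, for example one based on collected market data rather than the farmer's own entry.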
CHAPTER 5
CONCLUSION & FUTURE WORKS
• This project depends on the input given by the farmer based on the current
price of the crop. This can be improved by collecting data on crop prices
across the country.
• This project can be specialized regionally (for example, statewide crop yield
prediction) depending on the factors prevailing in that region.
• More climatic factors such as rainfall, wind speed, and humidity
can also be taken into consideration in order to improve the model’s
performance with respect to climatic conditions.
• Data on the soil in which the crop is grown can also be considered for more
insights, by examining the soil through other artificial intelligence techniques.
• Considering the effects of natural disasters might help farmers in
exceptional environmental conditions.
• More crops can be added to the list of crops in order to extend the reach
of the project to more farmers.
• We can create a Web Application to get inputs from farmers.
• The application’s UI can be made more farmer-friendly by supporting
regional languages.
CHAPTER 6
APPENDIX
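import pickle
import pandas as pd
from flask import Flask, render_template, request, redirect, url_for, session

app = Flask(__name__)
# Flask sessions require a secret key. Placeholder value: the real key is not shown in this listing.
app.secret_key = "replace-with-a-secret-key"

# In-memory user store used by the login and signup routes.
# Assumed to be a dict of username -> password; its definition is not shown in this listing.
authorized_users = {}

# Load the fitted preprocessing pipeline and the trained model.
# The file names are placeholders: the original paths are not shown in this listing.
with open("full_pipeline.pkl", "rb") as f:
    full_pipeline = pickle.load(f)
with open("model.pkl", "rb") as f:
    model = pickle.load(f)
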
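# Homepage Route
@app.route('/')
def homepage():
    return render_template('homepage.html')

# Login Route
@app.route('/login', methods=['GET', 'POST'])
def login():
    if request.method == 'POST':
        username = request.form.get('username')
        password = request.form.get('password')
        # The credential check is missing from this listing; comparing against
        # authorized_users is an assumption based on the signup route.
        if username in authorized_users and authorized_users[username] == password:
            session['logged_in'] = True
            session['username'] = username
            return redirect(url_for('main_index'))
        else:
            return redirect(url_for('login'))
    return render_template('login.html')
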
# Signup Route (if needed for adding new users)
@app.route('/signup', methods=['GET', 'POST'])
def signup():
    if request.method == 'POST':
        username = request.form.get('username')
        password = request.form.get('password')
        # Send the user back to the form if the username is already taken.
        if username in authorized_users:
            return redirect(url_for('signup'))
        authorized_users[username] = password
        return redirect(url_for('login'))
    return render_template('signup.html')

# Guide Route
@app.route('/guide')
def guide_page():
    return render_template('Guide.html')

# Help Route
@app.route('/help')
def help_page():
    return render_template('help.html')

# Input Page Route (only for logged-in users)
@app.route('/index')
def main_index():
    if not session.get('logged_in'):
        return redirect(url_for('login'))
    return render_template('index.html')

# Logout Route
@app.route('/logout')
def logout():
    session.pop('logged_in', None)
    return redirect(url_for('homepage'))

# Form Submission Route
@app.route('/submit', methods=['POST'])
def submit():
    # Only logged-in users may submit the form.
    if not session.get('logged_in'):
        return redirect(url_for('login'))
    # Read the form fields and convert the numeric ones.
    user_input = {
        "CropItem": request.form.get("CropItem"),
        "Year": int(request.form.get("Year")),
        "Nutrients(tonnes)": float(request.form.get("Nutrients(tonnes)")),
        "SyntheticFert(tonnes)": float(request.form.get("SyntheticFert(tonnes)")),
        "Pesticides(tonnes)": float(request.form.get("Pesticides(tonnes)")),
        "Temp_ann_degC": float(request.form.get("Temp_ann_degC")),
        "LandUsed": float(request.form.get("LandUsed")),
        "LandIrrigated": float(request.form.get("LandIrrigated")),
        "MarketPrice": float(request.form.get("MarketPrice")),
    }
    try:
        input_values = list(user_input.values())
        # The prediction and rendering steps are not shown in this listing.
    except Exception as e:
        # On any failure, send the user back to the input page.
        return redirect(url_for('main_index'))

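def get_processed_data(user_input):
    # Convert units, wrap the record in a one-row DataFrame, and apply the fitted pipeline.
    user_input = convert_units(user_input)
    user_df = pd.DataFrame([user_input])
    return full_pipeline.transform(user_df)

def get_predictions(user_input):
    # Keep the price and the area, then drop the price before preprocessing.
    market_price = user_input["MarketPrice"]
    crop_area = user_input["LandUsed"]
    del user_input["MarketPrice"]
    processed_data = get_processed_data(user_input)
    # The remaining steps of this function are not shown in this listing.
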
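def convert_units(user_input):
    # Scale the form values to the units used by the model.
    # Dividing by 1000 suggests a conversion from kilograms to tonnes (assumed).
    user_input["Nutrients(tonnes)"] /= 1000
    user_input["SyntheticFert(tonnes)"] /= 1000
    user_input["Pesticides(tonnes)"] /= 1000
    # 2471.052 acres = 1000 hectares, so this appears to convert acres to thousands of hectares.
    user_input["LandUsed"] /= 2471.052
    user_input["LandIrrigated"] /= 2471.052
    return user_input
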
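def get_inflation_rate():
    try:
        inflation = pd.read_csv("datasets/InflationRates.csv")
        # Fill missing values with the column median (plain assignment instead of inplace on a column).
        inflation["Value"] = inflation["Value"].fillna(inflation["Value"].median())
        # Average the values within each year, then return half of the median yearly value.
        inflation = inflation.groupby("Year")["Value"].mean().reset_index()
        return round(inflation["Value"].median() / 2, 5)
    except FileNotFoundError:
        # Fall back to zero inflation when the dataset is missing.
        return 0

if __name__ == '__main__':
    app.run(debug=True)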
Home Page
Login Page
Help Page
Training the model over the data set
REFERENCES
• S. Veenadhari, B. Misra and C. Singh, "Machine learning approach for forecasting crop yield
based on climatic parameters," 2014 International Conference on Computer Communication
and Informatics, Coimbatore, 2014, pp. 1-5, doi: 10.1109/ICCCI.2014.6921718.
• Balakrishnan, N., Muthukumarasamy, G.: Crop production-ensemble machine learning model
for prediction. Int. J. Comput. Sci. Softw. Eng. 5(7), 148–153 (2016)
• Jeong JH, Resop JP, Mueller ND, Fleisher DH, Yun K, Butler EE, Timlin DJ, Shim KM,
Gerber JS, Reddy VR, Kim SH. Random Forests for Global and Regional Crop Yield
Predictions. PLoS One. 2016 Jun 3;11(6):e0156571. doi: 10.1371/journal.pone.0156571.
PMID: 27257967; PMCID: PMC4892571.
• Mishra, Subhadra & Mishra, Debahuti & Santra, Gour. (2016). Applications of Machine
Learning Techniques in Agricultural Crop Production: A Review Paper. Indian Journal of
Science and Technology. 9. 10.17485/ijst/2016/v9i38/95032.
• Priya, P., Muthaiah, U., & Balamurugan, M. (2018). Predicting yield of the crop using
machine learning algorithm. International Journal of Engineering Sciences & Research
Technology, 7(1), 1-7.
• Pavan P., Virendra P., Shrikhant K., Crop Prediction System using Machine Learning
Algorithms, International Research Journal of Engineering and Technology (IRJET), 7(2),
2020 Feb.
• Aurelien Geron, Hands-on Machine Learning with Scikit-Learn, Keras and TensorFlow, 2nd
Edition, O’Reilly.
• David Freedman, Robert Pisani, Roger Purves, Statistics, 4th Edition, Viva Books Pvt ltd.
• Wes McKinney, Python for Data Analysis, 2nd Edition, O’Reilly.
• Jake VanderPlas, Python Data Science Handbook, O’Reilly.
• Peter Harrington, Machine Learning in action, Manning Publications
• Samprit Chatterjee, Ali S. Hadi, Regression Analysis by Example, 5th Edition, Wiley.
BIODATA
TEAM MEMBER-1
TEAM MEMBER-2
TEAM MEMBER-3
TEAM MEMBER-4