Batch 1
OF
Dr. M. Sridhar
Head & Associate Professor
BALAJI INSTITUTE OF TECHNOLOGY & SCIENCE
Accredited by NBA (UG-CE, ECE, ME, CSE Programs) & NAAC A+ Grade
(Affiliated to JNTU Hyderabad and Approved by AICTE, New Delhi)
NARSAMPET, WARANGAL – 506331
DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING
(DATA SCIENCE)
CERTIFICATE
External Examiner
ACKNOWLEDGEMENT
We would like to express our sincere gratitude to our HoD & Guide, Dr. M. Sridhar,
whose knowledge and guidance have motivated us to achieve goals we never thought
possible. He has consistently been a source of motivation, encouragement and inspiration.
The time we have spent working under his supervision has truly been a pleasure.
We heartily thank our Principal, Dr. V. S. Hariharan, for giving us this great
opportunity and for his support in completing our project.
We thank all our senior faculty members for their effort, guidance and help during our
course. Thanks also to the programmers and non-teaching staff of the CSD Department of our college.
Finally, special thanks to our parents for their support and encouragement throughout
our lives and this course, and to all our friends and well-wishers for their constant support.
A.DEVARAJ (22C31A6705)
MUBASHIR HUSSAIN SHARIEF (22C31A6744)
L.BALAJI (22C31A6736)
D.RAHUL (22C31A6715)
ABSTRACT
The growing global population has intensified the demand for increased
agricultural productivity and sustainability. Traditional farming practices often rely on
general assumptions about crop selection and fertilizer application, which can lead to
inefficient resource utilization and reduced crop yields. In this context, technological
innovations such as machine learning offer immense potential to revolutionize agriculture
by enabling data-driven decision-making. This project, titled "Crop and Fertilizer
Recommendation System Using Machine Learning", is developed with the objective
of assisting farmers in making intelligent decisions regarding suitable crop cultivation
and fertilizer usage based on their specific soil and environmental conditions.
TABLE OF CONTENTS
CERTIFICATE
ACKNOWLEDGEMENT
ABSTRACT
TABLE OF CONTENTS
1. Introduction
1.2. Objective
1.4. Motivation
2. Literature Survey
3. System Development
3.2. Introduction to Machine Learning
3.3. Approaches
3.4. Dataset
3.5. Data Preprocessing
3.7. Tools and Libraries Used
4. Performance Analysis
4.1. About The Data
4.3. Accuracy Comparison of Algorithms
4.4. Output
5. Conclusion
6. References
1. INTRODUCTION
Ever since humans began practicing agriculture, it has been one of the most important
human activities. In today's world agriculture is not only about survival; it also plays a
huge role in the economy of any country. Agriculture plays a vital role in India's economy
and in the human future, and in India it also provides a large share of employment. As a
result, with the passage of time the need for production has grown exponentially, and in
order to produce in such mass quantities, people often use technology in the wrong way.
As technology improves day by day, new hybrid varieties are created day by day. In
comparison with naturally grown crops, these hybrid varieties do not provide the same
essential contents, and depending too much on unnatural techniques may lead to soil
acidification and crusting. All of these activities contribute to environmental pollution.
Such unnatural practices are adopted to avoid or reduce losses; however, once the farmer
or producer has the correct data on crop yield, that information itself can help the farmer
avoid or reduce the loss.
India is the second largest country in the world in terms of population. Many people
depend on agriculture, but the sector lacks efficiency and technology, especially in our
country. By bridging the gap between traditional agriculture and data science, effective
crop cultivation can be achieved. Good crop production is important, and the crop yield
is directly influenced by factors such as soil type, composition of the soil, seed quality,
and lack of technical facilities.
In India agriculture plays an important role in the economic sector and also in global
development. More than 60% of the country's land is used for agriculture to meet the
needs of 1.3 billion people, so adopting new technologies for agriculture plays an
important role and can lead our country's farmers to profit. In most parts of India, crop
and fertilizer choices are made by farmers who prefer the previous or neighbouring crops
most common in the surrounding region, because they do not have sufficient information
about the content of their soil, such as phosphorus, potassium and nitrogen. They grow
the same crops over and over again without trying new varieties, and fertilize randomly
without knowing the amounts and contents that are missing. This directly affects crop
yield and acidifies the soil, reducing soil fertility.
We are designing a system using machine learning to help farmers with crop and
fertilizer prediction. The right crop will be recommended for a specific soil, also keeping
climatic boundaries in mind, and the system provides information about the fertilizer and
the seeds needed for planting. With the help of our system, farmers can try to grow or
cultivate different varieties with the right inputs.
1.2) Objective
• Recommend crops that should be planted by farmers based on several criteria and
help them make an informed decision before planting.
• Recommend the most suitable fertilizer based on the same criteria.
• In the crop recommendation app, the user can provide soil data from his side and
the app will predict which crop the user should grow.
• With the fertilizer application, the user can enter soil data and the type of crop
they are planting, and the application will predict what the soil is lacking.
• Defining the problem
• Comparing algorithms
1.4) Motivation
2. LITERATURE SURVEY
Recommendation systems for crop and fertilizer are present in the market, and many
more are in the development stage, considering various factors such as the climate
condition at the time of plantation, rainfall, humidity and soil contents. Much research
has been done in this field; the following are some of the research papers that have been
carried out in it.
The article "Prediction of crop yield and fertilizer recommendation using machine
learning algorithms" [1] concludes that prediction of crop yield based on location,
together with proper implementation of algorithms, can achieve a higher crop yield.
From this work we conclude that for soil classification Random Forest performs well,
with an accuracy of 86.35% compared to Support Vector Machine.
For crop yield prediction, Support Vector Machine performs very well, with an
accuracy of 99.47% compared to the Random Forest algorithm. The work can be
extended further to add the following functionality: a mobile application can be built to
help farmers by uploading images of farms; crop disease detection using image
processing, in which the user gets pesticide suggestions based on disease images; and a
smart irrigation system for farms to obtain a higher yield.
The paper [2] by Rakesh Kumar, M.P. Singh, Prabhat Kumar and J.P. Singh
proposed the use of seven machine learning techniques, i.e., ANN, SVM, KNN, Decision
Tree, Random Forest, GBDT and Regularized Gradient Forest, for crop selection. The
framework is designed to retrieve all of the crops planted and their season of growth at a
particular time of year. The yield rate of each crop is obtained and the crops giving better
yields are selected. The system also proposes a sequence of crops to be planted to obtain
higher returns.
Leo Breiman [3] specializes in the accuracy, strength and correlation of the random
forest algorithm. The random forest algorithm builds decision trees on different data
samples, predicts the data from each subset, and then by voting provides a better answer
for the system. Random Forest uses the bagging technique to train the data. To boost the
accuracy, the randomness injected has to minimize the correlation ρ while maintaining
strength.
3. SYSTEM DEVELOPMENT
3.2) Introduction to Machine Learning
Machine learning is a part of artificial intelligence. Machine learning's purpose is to
recognize the structure of data and fit it into models that humans can understand and
utilize.
Machine learning is a subfield of computer science that differs from conventional
computational approaches. In classical computer science, an algorithm is a set of explicit
instructions used by a computer to compute or solve a problem. Machine learning,
however, permits computers to learn from data inputs and then use statistical analysis to
produce outputs that fall within a given range. As a result, machine learning makes it
easier for computers to build models from sample data and automate decision-making
processes based on data inputs.
Machine learning is a rapidly changing field. Whether working with machine learning
methods or analyzing the impact of machine learning techniques, there are a few
elements to keep in mind.
In machine learning, tasks are generally divided into broad categories. These
classifications are based on how data is acquired and how the system responds to it.
Two of the most widely used machine learning methods are unsupervised learning,
which gives the algorithm unlabelled data so that it can find structure within its input
data, and supervised learning, which trains algorithms based on example input and
output data labelled by humans. Let's take a deeper look at each of these methods.
Supervised
In this type of learning, the machine learning model is provided with a dataset containing
inputs as well as their correct outputs. In other words, labelled datasets are provided to
the algorithms in the machine learning model for training (guided training). Applications
of supervised learning include speech recognition, spam detection, bioinformatics, etc.
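The supervised setup described above can be sketched in a few lines. This is an illustrative toy example (the feature values and crop codes are invented, not taken from the project's dataset): the model is fitted on inputs paired with their correct outputs.

```python
# Supervised learning sketch: train on labelled examples, then predict.
from sklearn.linear_model import LogisticRegression

# Toy labelled data: [nitrogen, rainfall] -> crop code (0 = rice, 1 = maize)
X = [[90, 200], [85, 220], [40, 60], [35, 80]]
y = [0, 0, 1, 1]

model = LogisticRegression()
model.fit(X, y)                    # guided training on labelled data
print(model.predict([[88, 210]]))  # a new sample resembling the 'rice' rows
```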
Unsupervised
In this type of learning, labelled datasets are not provided; the model tries to find
patterns in the data within the datasets. This type of learning requires less human
involvement or supervision than supervised learning, and it can manage unstructured
and unlabelled data more easily, which makes it suitable for analyzing and finding
patterns in complex data.
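The unsupervised case can be sketched with clustering: no labels are given, and the algorithm groups the samples on its own. The soil values below are invented for the sketch.

```python
# Unsupervised learning sketch: KMeans receives no labels and finds
# the grouping structure in the data by itself.
from sklearn.cluster import KMeans

# Unlabelled soil samples: [nitrogen, moisture]
X = [[90, 80], [88, 78], [20, 30], [22, 28]]

km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
# Samples close together in feature space end up with the same cluster id
print(km.labels_)
```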
3.3) Approaches
3.4) Dataset
We have considered two datasets: one helps with the recommendation of crops, and the
second helps with the prediction or recommendation of fertilizer.
Dataset for crop recommendation
As we all know, good crop production or a good yield depends on various factors; in this
dataset we are provided with the various factors involved in the production of a crop.
With the help of this dataset a crop recommendation model can be created.
a) N: the ratio of Nitrogen content in the soil
b) P: the ratio of Phosphorus content in the soil
c) K: the ratio of Potassium content in the soil
d) Temperature: in Celsius
e) Humidity: relative humidity in %
f) pH: pH value of the soil
g) Rainfall: in mm
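The soil and weather factors listed above can be pictured as a small table. Below is a sketch with invented values (the real dataset's rows differ); each row is one soil sample and the label column is the crop to recommend.

```python
# Illustrative rows shaped like the crop-recommendation dataset
# described above (values are made up for the sketch).
import pandas as pd

df = pd.DataFrame({
    "N": [90, 40], "P": [42, 60], "K": [43, 55],
    "temperature": [20.8, 23.0],   # Celsius
    "humidity": [82.0, 60.2],      # relative humidity in %
    "ph": [6.5, 7.1],
    "rainfall": [202.9, 80.3],     # mm
    "label": ["rice", "maize"],    # crop to recommend
})
print(df.shape)   # one row per soil sample, one column per factor
```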
Only finding the right crop to grow is not enough for a good yield; we must also find
what fertilizer should be used for crop care.
The dataset for fertilizer recommendation has the following data fields:
a) N: tells about the ratio of Nitrogen
b) P: tells about the ratio of Phosphorus
c) K: tells about the ratio of Potassium
d) pH
e) Soil moisture
f) Crop
3.5) Data Preprocessing
Data is collected from various sources, so it may contain many missing values. The raw
data that is collected is processed in such a manner that it can be easily used in different
tasks, such as in a machine learning model or other data science tasks.
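A minimal sketch of this preprocessing step, assuming two common operations (the exact steps used in the project may differ): imputing a missing numeric value and encoding the text crop label as integers.

```python
# Preprocessing sketch: fill missing values, encode the text label.
import pandas as pd
from sklearn.preprocessing import LabelEncoder

raw = pd.DataFrame({
    "N": [90, None, 40],          # one missing nitrogen reading
    "rainfall": [202.9, 180.0, 80.3],
    "label": ["rice", "rice", "maize"],
})

raw["N"] = raw["N"].fillna(raw["N"].mean())   # impute with the column mean
raw["crop_code"] = LabelEncoder().fit_transform(raw["label"])
print(raw["N"].isna().sum(), sorted(raw["crop_code"].unique()))
```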
Model Building
Model building is the process of creating a mathematical model that will help in
predicting or calculating future outcomes based on data collected in the past.
E.g.:
A retailer wants to know the default behaviour of its credit card customers. They want to
predict the probability of default for each customer in the next three months.
The probability of default would lie between 0 and 1.
Assume every customer has a 10% default rate; then the probability of default for each
customer in the next 3 months = 0.1.
The model moves this probability towards one of the extremes based on attributes from
past information. A customer with a volatile income is more likely to default (closer to 1),
while a customer with a healthy credit history in recent years has a low chance of default
(closer to 0).
Steps in Model Building:
1. Algorithm Selection
2. Training Model
3. Prediction / Scoring
Algorithm Selection
Figure 5: Algorithm Selection. The flowchart first asks whether the data is labelled:
if not, unsupervised learning is used; if yes, supervised learning is used, where a
continuous dependent variable leads to regression and a categorical one leads to
classification. Candidate classification algorithms include Logistic Regression,
Decision Tree and Random Forest.
Training Model
The training dataset, whose dependent variable is known, is used to train the model; the
scoring dataset (unknown dependent variable) is used to score.
Predictive Modelling
Supervised Learning:
Unsupervised learning:
It is a branch of machine learning that deals with unlabeled data. Unlike supervised
learning, where the data is labeled with a specific category or outcome, unsupervised
learning algorithms are tasked with finding patterns and relationships within the data
without any prior knowledge of the data's meaning. Unsupervised machine learning
algorithms find hidden patterns in the data without any human intervention, i.e., we
don't give an output to our model. The training model has only input parameter values
and discovers the groups or patterns on its own.
Clustering:
A clustering problem is where you want to discover the inherent groupings in the data,
such as grouping customers by purchasing behavior.
Association: An association rule learning problem is where you want to discover rules
that describe large portions of your data, such as people that buy X also tend to buy Y.
i. Problem definition
v. Predictive Modelling
Logistic Regression
• Naive Bayes
This algorithm assumes that the features of the dataset are all independent of each
other. It works better the larger the dataset is. A DAG (directed acyclic graph) is used
for classification in the naive Bayes algorithm.
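A small Gaussian Naive Bayes sketch on invented toy values: each feature is treated as independent given the class, as described above.

```python
# Naive Bayes sketch: features are assumed independent given the class.
from sklearn.naive_bayes import GaussianNB

X = [[90, 200], [85, 220], [40, 60], [35, 80]]   # toy [N, rainfall] values
y = ["rice", "rice", "maize", "maize"]

nb = GaussianNB().fit(X, y)
print(nb.predict([[38, 70]])[0])   # a sample falling among the 'maize' rows
```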
• Random forest
Random Forest has the ability to analyze crop growth in relation to the current climatic
conditions and biophysical changes. The random forest algorithm creates decision trees
on different data samples, then predicts the data from each subset, and then by voting
gives a better solution for the system. Random Forest uses the bagging method to train
the data, which increases the accuracy of the result.
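The bagging-and-voting idea described above can be sketched as follows (toy values again; the number of trees is an arbitrary choice for the example):

```python
# Random Forest sketch: many trees grown on bootstrap samples of the
# data, with their votes combined into one prediction.
from sklearn.ensemble import RandomForestClassifier

X = [[90, 200], [85, 220], [40, 60], [35, 80]]
y = ["rice", "rice", "maize", "maize"]

rf = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)
print(rf.predict([[87, 210]])[0])   # the majority vote of the 50 trees
print(len(rf.estimators_))          # the individual decision trees
```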
Decision Tree
The decision tree is a powerful and popular tool for classification and prediction. A
decision tree is a flowchart-like tree structure, where each internal node denotes a test on
an attribute, each branch represents an outcome of the test, and each leaf node (terminal
node) holds a class label.
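The node-test / leaf-label structure described above can be made visible with a small sketch (toy values; `export_text` prints the learned tests and the class label at each leaf):

```python
# Decision tree sketch: internal nodes test an attribute, leaves hold
# a class label, matching the flowchart description above.
from sklearn.tree import DecisionTreeClassifier, export_text

X = [[90, 200], [85, 220], [40, 60], [35, 80]]
y = ["rice", "rice", "maize", "maize"]

dt = DecisionTreeClassifier(random_state=0).fit(X, y)
print(export_text(dt, feature_names=["N", "rainfall"]))
```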
3.7) Tools and libraries used
Python:
To carry out this project in the best possible manner, we decided to use the Python
language, which comes with several pre-built libraries (such as pandas, NumPy and
SciPy) and is loaded with numerous features for implementing data science and machine
learning techniques, which allowed us to design the model in the most efficient manner
possible. For building this project we utilized numerous Python libraries to execute
different operations.
● Python - Python is a robust programming language with a wide range of
capabilities. Its broad feature set makes working with specialized programs (including
metaprogramming and metaobjects) simple. Python uses dynamic typing together with a
combination of reference counting and a cycle-detecting garbage collector for memory
management. It also supports dynamic name resolution (late binding), which binds
method and variable names during program execution.
Patches to non-critical parts of CPython that would give a minor improvement in
performance at the cost of clarity are rejected by Python's developers, who try to avoid
premature optimization. When speed is crucial, the Python developer can move time-
sensitive jobs to extension modules written in C, or use PyPy, a just-in-time compiler.
Cython translates Python scripts to C and uses the C-level API to call the Python
interpreter directly. Python's creators attempt to make the language as fun to use as
possible. Python's design supports some functional programming in the Lisp tradition:
filter, map and reduce functions, as well as list comprehensions, dictionaries, sets and
generator expressions, are all included. Two modules (itertools and functools) in the
standard library implement functional tools borrowed from Haskell and Standard ML.
We are using Python because it works on a wide range of platforms; Python is a
cross-platform language. Python reads almost as simply as English: it has many libraries
and a simple syntax similar to English, whereas Java and C++ have more complicated
code, and Python applications contain fewer lines than programs written in other
languages. That is why we chose Python for machine learning, artificial intelligence and
dealing with massive volumes of data. Python is an object-oriented programming
language: classes, objects, polymorphism, encapsulation, inheritance and abstraction are
all concepts in Python.
NumPy library:
NumPy is a Python library which adds support for large, multi-dimensional arrays and
matrices, as well as a large collection of mathematical functions designed to operate on
these structures.
The use of NumPy in Python is broadly similar to that of MATLAB, as both are
interpreted and allow users to write programs quickly as long as most operations work
on arrays or matrices rather than scalars.
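A tiny sketch of the array-oriented style mentioned above, using invented nutrient values: operations apply to whole arrays at once rather than element by element.

```python
# NumPy sketch: vectorised operations on whole arrays.
import numpy as np

npk = np.array([[90, 42, 43],
                [40, 60, 55]])   # N, P, K for two soil samples

print(npk.mean(axis=0))          # per-nutrient average across samples
print(npk * 2)                   # elementwise arithmetic on the array
```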
Pandas library:
Pandas is a software library in Python for manipulating and analyzing data. It provides
data structures and functions to manage numerical tables and time series. It is free
software released under the three-clause BSD license. The name is taken from the term
"panel data", an econometrics term for data sets that include observations over multiple
time periods for the same individuals.
Data structures can be extended by a robust community, which allows different
applications to be integrated with data sets. Pandas offers high-performance merging
and joining of data, and hierarchical indexing provides an intuitive way of working with
high-dimensional data in a lower-dimensional data structure.
Matplotlib:
John Hunter and many others built the matplotlib Python library to create graphs,
charts and high-quality figures. Some of the key concepts and objects in matplotlib are:
Figure
Every drawing is called a figure, and every figure contains one or more axes. A figure
can be considered as a way to draw multiple plots.
Plotting
Data is the first thing a graph needs. A dictionary with keys and values such as x and y
values can be declared. Next, scatter(), bar(), pie() and a host of other functions can be
used to create a plot.
Axes
Adjustments are possible using the figure and axes obtained from subplots(). The set()
function is used to adjust x-axis and y-axis features.
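The figure / plotting / axes concepts above can be sketched in a few lines. The data values are invented, and the non-interactive Agg backend is used here so the chart is written to a file rather than shown on screen.

```python
# Matplotlib sketch: one figure, one axes, a scatter plot saved to disk.
import matplotlib
matplotlib.use("Agg")            # non-interactive backend: render to file
import matplotlib.pyplot as plt

data = {"rainfall": [60, 80, 200, 220], "yield": [1.2, 1.5, 3.1, 3.4]}

fig, ax = plt.subplots()                       # a figure containing one axes
ax.scatter(data["rainfall"], data["yield"])    # plot the dictionary's values
ax.set(xlabel="rainfall (mm)", ylabel="yield (t/ha)")
fig.savefig("rainfall_vs_yield.png")           # the rendered figure
```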
Scikit-learn:
Scikit-learn is among the best-known Python machine learning libraries. The sklearn
library contains many efficient tools for machine learning and statistical modelling,
including classification, regression, clustering and dimensionality reduction. Scikit-learn
is focused on modelling data rather than on loading or summarizing it.
Scikit-learn comes with many features. Some of them are here to help us explore and
model the data:
• Supervised learning algorithms: Almost any supervised learning algorithm you may
have studied is part of scikit-learn. Starting with the standard linear models, SVM and
decision trees are all in the toolbox. This breadth of machine learning algorithms is one
of the main reasons for scikit-learn's wide adoption, and it is a good starting point for
solving supervised learning problems.
• Unsupervised learning algorithms: There is also a wide variety of unsupervised
machine learning algorithms, ranging from clustering, factor analysis and principal
component analysis to unsupervised neural networks.
• Cross-validation: a variety of methods are provided by sklearn to check the accuracy
of supervised models on unseen data.
• Feature extraction: scikit-learn can extract features from images and text.
• Toy datasets: these come in handy while learning scikit-learn and help a lot when
trying out a new library.
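The cross-validation and toy-dataset features mentioned above fit together in one short sketch: a classifier's accuracy is estimated on data it has not seen during fitting, using a built-in dataset.

```python
# Cross-validation sketch: accuracy estimated on held-out folds.
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)     # one of scikit-learn's toy datasets
scores = cross_val_score(DecisionTreeClassifier(random_state=0), X, y, cv=5)
print(scores.mean())                  # average held-out accuracy over 5 folds
```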
4. PERFORMANCE ANALYSIS
The data used in this project was made by enlarging and consolidating India's publicly
available data sets such as weather, soil, etc. This data is relatively simple, with very few
but useful factors, as opposed to the many complex factors that affect crop yields.
The data covers Nitrogen, Phosphorus, Potassium and soil pH; it also contains the
humidity, temperature and rainfall required for a particular plant.
• Logistic regression
• Naive Bayes
• Random Forest
• Decision tree
• SVM (Support Vector Machine)
4.3) Accuracy Comparison of Algorithms
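A sketch of how such an accuracy comparison could be produced: each of the algorithms listed above is fitted and scored on the same held-out split. A built-in toy dataset stands in for the project's crop dataset here, so the numbers are illustrative, not the project's results.

```python
# Accuracy comparison sketch: same train/test split for every algorithm.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

models = {
    "Logistic regression": LogisticRegression(max_iter=500),
    "Naive Bayes": GaussianNB(),
    "Random Forest": RandomForestClassifier(random_state=0),
    "Decision tree": DecisionTreeClassifier(random_state=0),
    "SVM": SVC(),
}
results = {}
for name, model in models.items():
    results[name] = model.fit(X_tr, y_tr).score(X_te, y_te)
    print(f"{name}: {results[name]:.3f}")
```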
4.4) Output:
Crop Recommended
Fertilizer Recommended
5. CONCLUSION
In this project we tried to obtain the best crop and fertilizer recommendations with the
help of machine learning. Many machine learning techniques were used and their
accuracy was calculated; numerous algorithms were applied to the datasets to get the
best output, leading to the best crop and fertilizer recommendation for the particular
soil of a particular region.
This system will help farmers to visualize crop yields based on climatic and soil
conditions. Using this, a farmer can decide whether to plant a given crop or to look for
another crop if the yield forecast is poor. This tool can help the farmer make the best
decisions when it comes to growing a crop, and it may also predict negative effects on
the plant.
Currently our farmers use outdated technology, or do not use technology effectively, so
there is a chance of a wrong choice of cultivated crops that will reduce the profit from
production. To reduce these types of losses we have tried to create a farmer-friendly
system which helps in predicting which crop is best for a specific soil; the project also
gives recommendations about the fertilizer needed by the soil for cultivation, the seeds
needed for cultivation, the expected yield and the market price. This enables farmers to
make the right choice when choosing a crop, so that the agricultural sector can develop
with new ideas.
For upcoming updates to this project we can use deep learning techniques for plant
disease prediction with the help of images, and we can also implement IoT techniques
for getting the contents of the soil directly from the fields.
Future work on this project can add many options, such as:
• A mobile app can be developed to assist farmers with uploading farm photos.
• Plant disease detection can process images, where the user finds pesticides based on
their pictures of diseases.
6. REFERENCES
[4] Priya, P., Muthaiah, U., Balamurugan, M., "Predicting Yield of the Crop Using
Machine Learning Algorithm", 2015.