
Project Report

On
Loan Prediction System
Submitted in partial fulfillment of the requirements for the award of the degree of Bachelor of
Engineering in Computer Science and Engineering

Submitted to

Rajiv Gandhi Proudyogiki Vishwavidyalaya, Bhopal (M.P.)


Submitted By:
ADITYA RANA (0126CS171005)
KANHA GOYAL (0126CS171042)
NIHAL GOUR (0126CS171056)
ASTHA JAIN (0126CS171026)

Under the Guidance of


Prof. AMIT DUBEY
Associate Professor
Department of Computer Science & Engineering

ORIENTAL COLLEGE OF TECHNOLOGY, BHOPAL


Approved by AICTE New Delhi & Govt. of M.P.
Affiliated to Rajiv Gandhi Proudyogiki Vishwavidyalaya, Bhopal (M.P.)
Session: 2020-21
ORIENTAL COLLEGE OF TECHNOLOGY, BHOPAL
Approved by AICTE New Delhi & Govt. of M.P. & Affiliated to Rajiv Gandhi
Proudyogiki Vishwavidyalaya, Bhopal (M.P.)
DEPARTMENT OF COMPUTER SCIENCE & ENGINEERING

CERTIFICATE
This is to certify that the work embodied in this Project Dissertation Report
entitled “Loan Prediction System”, being submitted by ADITYA RANA
(0126CS171005), KANHA GOYAL (0126CS171042), NIHAL GOUR
(0126CS171052) and ASTHA JAIN (0126CS171026) in partial fulfillment of the
requirements for the award of “Bachelor of Engineering” in the Computer Science &
Engineering discipline to Rajiv Gandhi Proudyogiki Vishwavidyalaya, Bhopal (M.P.)
during the academic year 2020-21, is a record of a bona fide piece of work carried out
under my supervision and guidance in the Department of Computer Science &
Engineering, Oriental College of Technology, Bhopal.

Approved by

Prof. AMIT DUBEY DR. SANJAY SHARMA

Guide Head of Department

DR. AMITA MAHOR


Director

ORIENTAL COLLEGE OF TECHNOLOGY, BHOPAL
Approved by AICTE New Delhi & Govt. of M.P. & Affiliated to Rajiv Gandhi
Proudyogiki Vishwavidyalaya, Bhopal (M.P.)
DEPARTMENT OF COMPUTER SCIENCE & ENGINEERING

CERTIFICATE OF APPROVAL

This Project, “Loan Prediction System”, being submitted by ADITYA RANA
(0126CS171005), KANHA GOYAL (0126CS171042), NIHAL GOUR (0126CS171052)
and ASTHA JAIN (0126CS171026), has been examined by us and is hereby approved
for the partial fulfillment of the requirements for the award of “Bachelor of
Engineering in Computer Science & Engineering”, for which it has been
submitted. It is understood that by this approval the undersigned do not
necessarily endorse or approve any statement made, opinion expressed or
conclusion drawn therein, but approve the Project only for the purpose for which it has
been submitted.

INTERNAL EXAMINER EXTERNAL EXAMINER

Date: Date:

CANDIDATE DECLARATION

We hereby declare that the Project dissertation work presented in the report
entitled as “LOAN PREDICTION SYSTEM” submitted in the partial fulfillment of
the requirements for the award of the degree of Bachelor of Engineering in
Computer Science & Engineering of Oriental College of Technology is an
authentic record of our own work.
We have not submitted this report, in part or in full, for the award of any
other degree or diploma.

ADITYA RANA (0126CS171005)


KANHA GOYAL (0126CS171042)
NIHAL GOUR (0126CS171052)
ASTHA JAIN (0126CS171026)

Date:

This is to certify that the above statement made by the candidates is correct
to the best of my knowledge.

Prof. AMIT DUBEY


Guide

ACKNOWLEDGMENT

We are heartily thankful to the Management of Oriental College of
Technology for providing us all the facilities and infrastructure to take our
work to the final stage. It is the constant supervision, moral support and
proper guidance of our respected Director, DR. AMITA MAHOR, that
motivated us throughout the work.
We express a deep sense of gratitude and respect to our learned guide,
Prof. AMIT DUBEY, Department of Computer Science & Engineering, for his
guidance during all phases of our work. Without his enthusiasm and
encouragement this dissertation would not have been completed. His
valuable knowledge and innovative ideas helped us take the work to the
final stage. He timely suggested actions and procedures to follow, for
which we are really grateful and thankful to him.
We express our gratefulness to DR. SANJAY SHARMA, Head of the Computer
Science & Engineering Department, for providing all the facilities available
in the department, and for his continuous support, advice, and encouragement
during this work, which also helped us extend our knowledge under proper
guidelines.
The constant help and the moral and financial support of our loving parents
motivated us to complete the work. We express our heartfelt thanks to all
our family members for their co-operation.
We really admire the fond support of our classmates for their co-operation
and constant help. It gives immense pleasure to acknowledge the
encouragement and support extended by them. Last but not the least,
we are extremely thankful to all who have directly or indirectly helped us
in the completion of the work.

ADITYA RANA (0126CS171005)


KANHA GOYAL (0126CS171042)
NIHAL GOUR (0126CS171052)
ASTHA JAIN (0126CS171026)

Abstract
The rate at which banks lose funds to loan beneficiaries due to loan default is alarming.
This trend has led to the closure of many banks, has deprived potential beneficiaries of access
to loans, and has cost many workers their jobs in banks and other sectors.
Recently, scams have occurred in the Indian banking sector through scammers such as VIJAY MALLYA,
MEHUL CHAUKSEY, NIRAV MODI, and the list goes on. The average amount conned is around
10,000 crores, which brings us to a serious question: “Is our money safe in the bank?”
This project is a small practical approach to find out what could be done next with the system,
so that every person can be a little safer.

This work uses past loan records and machine learning to predict
fraud in bank loan administration and subsequently avoid loan defaults that manual scrutiny
by a credit officer would not have discovered. Such hidden patterns are revealed
by machine learning.
We have used methods such as the confusion matrix, decision tree, XGBoost and
logistic regression, with the aim of replacing the old system based on the CIBIL score.
Statistical and conventional approaches in this direction are restricted in their accuracy.
With a large volume and variety of data, human judgment of credit history is inefficient; case-based,
analogy-based reasoning and statistical approaches have been employed, but 21st-century
fraudulent attempts cannot be discovered by these approaches. Hence, we use a
machine learning approach with the decision tree method to predict fraud, and it delivers
an accuracy of more than 90 percent.

INDEX
1 Introduction
2 Detailed Project Profile
3 Steps of Implementation
4 Process of Implementing Machine Learning
4.1 Dataset Preparation & Preprocessing
4.2 Data Visualization
4.3 Data Preprocessing
4.3.1 Data Labeling
4.3.2 Data Selection
4.3.3 Data Formatting
4.3.4 Data Cleaning
4.3.5 Data Anonymization
4.3.6 Data Sampling
4.4 Data Transformation
4.4.1 Scaling
4.4.2 Decomposition
4.4.3 Aggregation
4.5 Dataset Splitting
4.5.1 Training Set
4.5.2 Test Set
4.5.3 Validation Set
4.6 Modelling
4.6.1 Model Training
4.6.1.1 Supervised Learning
4.6.1.2 Unsupervised Learning
4.7 Model Evaluation and Testing
4.8 Model Deployment
5 Python
6 Libraries Used
6.1 NUMPY
6.2 PANDAS
6.3 SEABORN
6.4 MATPLOTLIB
7 Flask
8 Software and Hardware Requirements
9 Screen Layout
10 Output Layout
11 Limitations
12 Future Scope
13 Conclusion
14 Appendix

1. Introduction

Loans are the core business of banks. The main profit comes directly from the loan’s
interest. The loan companies grant a loan after an intensive process of verification and
validation. However, they still don’t have assurance if the applicant is able to repay the loan
with no difficulties. In this project, we’ll build a predictive model to predict if an applicant is
able to repay the lending company or not.
A loan is a form of debt incurred by an individual or other entity. The lender—usually a
corporation, financial institution, or government—advances a sum of money to the
borrower. In return, the borrower agrees to a certain set of terms including any finance
charges, interest, repayment date, and other conditions.
Companies want to automate the loan eligibility process (real-time) based on customer
detail provided while filling an online application form. These details are Gender, Marital
Status, Education, Number of Dependents, Income, Loan Amount, Credit History and
others. To automate this process, they have given a problem to identify the customer
segments that are eligible for a loan amount, so that they can specifically target these
customers. Here they have provided a partial dataset.
There are unsolved fraudulent practices in financial operations in society, including
bank credit administration, calling for a remedy through intelligent technology. Existing
fraud detection techniques in bank credit administration have not sufficiently met the
desired accuracy and avoidance of false alarms, and none has focused on fraud in bank credit
default. Also, fraudulent duplicates, missing data, and undefined fraud scenarios affect
prediction accuracy.
Any unlawful act by human beings, or invoked by machines, that leads to personal
gain at the expense of institutions or the legal human beneficiaries is a financial fraud, but
an error must not be taken for a fraud. Considering the overall effect of financial frauds, they
are referred to as economic sabotage. Examples of financial fraud are money
laundering, bank credit fraud, pension fraud, co-operative society fraud, tax evasion,
telecommunications fraud, credit card fraud, inflated

Motivation

In the Indian banking system, we use the CIBIL score to determine whether a person is eligible for a loan,
but it can be highly manipulated when we dig down deep.
A CIBIL Score is a consumer's credit score. Simply put, this is a three-digit numeric summary
of a consumer's credit history and a reflection of the person's credit profile. This is based on
past credit behavior, such as borrowing and repayment habits as shared by banks and
lenders with CIBIL on a regular basis.

We are trying to give a person a clear yes-or-no answer, so that he/she can get a clear
picture instead of a score somewhere in the wide range from 0-900. It will be more
practical for banks too.

2. Detailed Project Profile


We use the waterfall model for developing this project. The waterfall model is the basic software
development life cycle model. It is very simple but idealistic. Earlier this model was very
popular, but nowadays it is not used. Nevertheless, it is very important because all the other software
development life cycle models are based on the classical waterfall model.
The classical waterfall model divides the life cycle into a set of phases. This model considers
that one phase can be started only after completion of the previous phase; that is, the output of
one phase will be the input to the next phase. Thus the development process can be
considered as a sequential flow, as in a waterfall. Here the phases do not overlap with each
other. The different sequential phases of the classical waterfall model are shown
below:

Steps of working in this project were:
• Collection of Data to Start with
• Data Pre-processing
• Exploratory Data Analysis
• Model Building (Test and Train)
• Improving Efficiency
• Making a web Application

Our model foundationally follows a decision tree.


A Decision Tree is a type of supervised learning algorithm (having a pre-defined target
variable) that is mostly used in classification problems. In this technique, we split the
population or sample into two or more homogeneous sets (or sub-populations) based on
the most significant splitter/differentiator in the input variables. Decision trees use multiple
algorithms to decide to split a node into two or more sub-nodes. The creation of sub-nodes
increases the homogeneity of the resultant sub-nodes. In other words, we can say that the purity of
the node increases with respect to the target variable.
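
As a concrete illustration, the following is a minimal sketch of training and inspecting a shallow decision tree with scikit-learn. The made-up records and the column names ApplicantIncome, LoanAmount, Credit_History and Loan_Status are assumptions for a typical loan dataset, not necessarily the exact data used in this project.

# A minimal sketch: a shallow decision tree on a hypothetical loan dataset.
import pandas as pd
from sklearn.tree import DecisionTreeClassifier, export_text

# Hypothetical past-loan records; the real project dataset may differ.
df = pd.DataFrame({
    "ApplicantIncome": [4500, 6000, 2500, 8000, 3000, 5200],
    "LoanAmount":      [130,  200,  90,   260,  110,  150],
    "Credit_History":  [1,    1,    0,    1,    0,    1],
    "Loan_Status":     ["Y",  "Y",  "N",  "Y",  "N",  "Y"],
})
X = df[["ApplicantIncome", "LoanAmount", "Credit_History"]]
y = (df["Loan_Status"] == "Y").astype(int)   # 1 = repaid, 0 = defaulted

tree = DecisionTreeClassifier(max_depth=3, random_state=42)
tree.fit(X, y)

# Each split picks the most significant differentiator among the input variables.
print(export_text(tree, feature_names=list(X.columns)))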
3. Steps of Implementation
Any project has three parts to start with: language, framework and methods.
We are using Python as the language to implement machine learning, and for the front end the
framework used is Flask.
1. KERAS, TENSORFLOW, and SKLEARN for machine learning
2. NUMPY for high-performance scientific computing and data analysis
3. SCIPY for advanced computing
4. Pandas for general-purpose data analysis
5. SEABORN for data visualization

4. Process of Implementing Machine Learning

We are doing this project using machine learning. Every time a model is implemented,
a series of steps is executed one by one. The general process flow includes the following.

4.1 Dataset preparation and preprocessing


Data is the foundation for any machine learning project. The second stage of project
implementation is complex and involves data collection, selection, preprocessing, and
transformation. Each of these phases can be split into several steps.
Data collection
It’s time for a data analyst to pick up the baton and lead the way to machine learning
implementation. The job of a data analyst is to find ways and sources of collecting relevant
and comprehensive data, interpreting it, and analyzing results with the help of statistical
techniques. The type of data depends on what you want to predict.
There is no exact answer to the question “How much data is needed?” because each
machine learning problem is unique. In turn, the number of attributes data scientists will
use when building a predictive model depends on the attributes’ predictive value.
‘The more, the better’ approach is reasonable for this phase. Some data scientists suggest
that less than one-third of collected data may be useful. It’s difficult to estimate
which part of the data will provide the most accurate results until the model training
begins. That’s why it’s important to collect and store all data — internal and open,
structured and unstructured.

4.2 Data visualization


A large amount of information represented in graphic form is easier to understand and
analyze. Some companies specify that a data analyst must know how to create slides,
diagrams, charts, and templates.
4.3 Data Preprocessing
The purpose of preprocessing is to convert raw data into a form that fits machine learning.
Structured and clean data allows a data scientist to get more precise results from an
applied machine learning model. The technique includes data formatting, cleaning, and
sampling.
4.3.1 Data Labeling
Supervised machine learning builds a predictive model on historical data with predefined target
answers. An algorithm must be shown which target answers or attributes to look for.
Mapping these target attributes in a dataset is called labeling.

Data labeling takes much time and effort, as datasets sufficient for machine learning may
require thousands of records to be labeled. For instance, if your image recognition
algorithm must classify types of bicycles, these types should be clearly defined and labeled
in a dataset. There are several approaches that streamline this tedious and time-consuming
procedure.
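
For a loan dataset, labeling typically means mapping the repayment outcome to numeric classes. Below is a minimal sketch, assuming a hypothetical Loan_Status column coded as 'Y'/'N'; this is an illustration, not the project's exact code.

import pandas as pd

# Hypothetical raw records with a categorical target attribute.
df = pd.DataFrame({"Loan_Status": ["Y", "N", "Y", "Y", "N"]})

# Labeling: map the target attribute to the classes the algorithm should learn.
df["label"] = df["Loan_Status"].map({"Y": 1, "N": 0})
print(df)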
4.3.2 Data Selection
After having collected all information, a data analyst chooses a subgroup of data to solve
the defined problem. For instance, if you save your customers’ geographical location, you
don’t need to add their cell phone and bank card numbers to a dataset. But purchase
history would be necessary. The selected data includes attributes that need to be
considered when building a predictive model.

4.3.3 Data Formatting


The importance of data formatting grows when data is acquired from various sources by
different people. The first task for a data scientist is to standardize record formats. A
specialist checks whether variables representing each attribute are recorded in the same
way. Titles of products and services, prices, date formats, and addresses are examples of
variables. The principle of data consistency also applies to attributes represented by
numeric ranges.
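
A minimal sketch of standardizing record formats with pandas follows; the column names, date format and amount notation are illustrative assumptions.

import pandas as pd

# Records gathered from different sources with inconsistent formats.
df = pd.DataFrame({
    "application_date": ["01/03/2021", "15/04/2021", "05/03/2021"],
    "loan_amount": ["1,20,000", "95000", "70,500"],
})

# Standardize the date format and the numeric representation of amounts.
df["application_date"] = pd.to_datetime(df["application_date"], format="%d/%m/%Y")
df["loan_amount"] = df["loan_amount"].str.replace(",", "", regex=False).astype(int)
print(df.dtypes)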

4.3.4 Data Cleaning


This set of procedures allows for removing noise and fixing inconsistencies in data. A data
scientist can fill in missing data using imputation techniques, e.g. substituting missing values
with mean attribute values. A specialist also detects outliers — observations that deviate
significantly from the rest of the distribution. If an outlier indicates erroneous data, a data
scientist deletes or corrects it if possible. This stage also includes removing incomplete
and useless data objects.
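
A minimal sketch of mean imputation and simple outlier removal with pandas is shown below; the column names and the 3-standard-deviation rule are illustrative choices, not the project's exact cleaning steps.

import pandas as pd

df = pd.DataFrame({
    "ApplicantIncome": [4500, 6000, None, 120000, 5200, 4800],
    "LoanAmount":      [130, None, 95, 700, 110, 125],
})

# Imputation: substitute missing values with the mean of each attribute.
df = df.fillna(df.mean(numeric_only=True))

# Outlier detection: flag observations far from the rest of the distribution.
z = (df - df.mean()) / df.std()
df_clean = df[(z.abs() <= 3).all(axis=1)]
print(df_clean)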

4.3.5 Data Anonymization


Sometimes a data scientist must anonymize or exclude attributes representing sensitive
information (e.g. when working with healthcare and banking data).
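
A minimal sketch of dropping or hashing an identifying attribute before modelling; the column names and the hashing choice are hypothetical.

import hashlib
import pandas as pd

df = pd.DataFrame({
    "customer_name": ["A. Sharma", "B. Verma"],
    "ApplicantIncome": [4500, 6000],
})

# Replace the direct identifier with a one-way hash, then drop the raw attribute.
df["customer_id"] = df["customer_name"].apply(
    lambda s: hashlib.sha256(s.encode()).hexdigest()[:12]
)
df = df.drop(columns=["customer_name"])
print(df)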

4.3.6 Data Sampling


Big datasets require more time and computational power for analysis. If a dataset is too
large, applying data sampling is the way to go. A data scientist uses this technique to select
a smaller but representative data sample to build and run models much faster, and at the
same time to produce accurate outcomes.
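
A minimal sketch using pandas' sample method; the synthetic dataset and the 10% fraction are arbitrary assumptions.

import numpy as np
import pandas as pd

# Hypothetical large dataset of loan records.
df = pd.DataFrame({"LoanAmount": np.random.randint(50, 700, size=100_000)})

# Draw a 10% random sample to build and run models faster.
sample = df.sample(frac=0.10, random_state=42)
print(len(df), "->", len(sample), "records")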

4.4 Data Transformation


In this final preprocessing phase, we transform or consolidate data into a form appropriate
for mining (creating algorithms to get insights from data) or machine learning. Data can be
transformed through scaling (normalization), attribute decompositions, and attribute
aggregations. This phase is also called feature engineering.

4.4.1 Scaling
Data may have numeric attributes (features) that span different ranges, for example,
millimeters, meters, and kilometers. Scaling is about converting these attributes so that
they will have the same scale, such as between 0 and 1, or 1 and 10 for the smallest and
biggest value for an attribute.
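
A minimal sketch using scikit-learn's MinMaxScaler; the values are arbitrary.

import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Two attributes on very different scales (e.g. income and loan term in months).
X = np.array([[4500.0, 360.0],
              [6000.0, 120.0],
              [52000.0, 240.0]])

# Rescale every attribute to the same [0, 1] range.
scaler = MinMaxScaler(feature_range=(0, 1))
print(scaler.fit_transform(X))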

4.4.2 Decomposition
Sometimes finding patterns in data with features representing complex concepts is more
difficult. Decomposition technique can be applied in this case. During decomposition, a
specialist converts higher level features into lower level ones. In other words, new features
based on the existing ones are being added. Decomposition is mostly used in time series
analysis. For example, to estimate a demand for air conditioners per month, a market
research analyst converts data representing demand per quarters.
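
One common form of decomposition is splitting a complex date feature into lower-level calendar features. A minimal sketch with pandas; the column name is an assumption.

import pandas as pd

df = pd.DataFrame({
    "application_date": pd.to_datetime(["2021-01-15", "2021-06-30", "2021-11-02"])
})

# Decompose the higher-level feature into lower-level ones.
df["year"] = df["application_date"].dt.year
df["month"] = df["application_date"].dt.month
df["day_of_week"] = df["application_date"].dt.dayofweek
print(df)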

4.4.3 Aggregation
Unlike decomposition, aggregation aims at combining several features into a feature that
represents them all. For example, you’ve collected basic information about your customers
and particularly their age. To develop a demographic segmentation strategy, you need to
distribute them into age categories, such as 16-20, 21-30, 31-40, etc. You use aggregation
to create large-scale features based on small-scale ones. This technique allows you to
reduce the size of a dataset without the loss of information.
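
A minimal sketch of the age-bucketing example above with pandas; the bin edges and labels are illustrative.

import pandas as pd

# Hypothetical customer ages collected at application time.
ages = pd.Series([18, 24, 29, 35, 47, 62])

# Aggregate the fine-grained attribute into coarse age categories.
age_group = pd.cut(ages, bins=[15, 20, 30, 40, 65],
                   labels=["16-20", "21-30", "31-40", "41-65"])
print(age_group.value_counts().sort_index())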

4.5 Dataset splitting

A dataset used for machine learning should be partitioned into three subsets:
1 – Training set
2 – Test set
3 – Validation set

4.5.1 Training set


A data scientist uses a training set to train a model and define its optimal parameters —
parameters it has to learn from data.

4.5.2 Test set


A test set is needed for an evaluation of the trained model and its capability for
generalization. The latter means a model’s ability to identify patterns in new, unseen data
after having been trained on the training data. It’s crucial to use different subsets for
training and testing to avoid model overfitting, which is the incapacity for generalization
we mentioned above.

4.5.3 Validation set


The purpose of a validation set is to tweak a model’s hyperparameters — higher-level
structural settings that can’t be directly learned from data. These settings can express, for
instance, how complex a model is and how fast it finds patterns in data.
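
A minimal sketch of such a three-way split using scikit-learn; the 60/20/20 proportions and the synthetic data are assumptions.

import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical feature matrix and labels for 1,000 loan applications.
rng = np.random.default_rng(42)
X = rng.random((1000, 5))
y = rng.integers(0, 2, size=1000)

# Carve out a 20% test set, then a validation set from the remaining data.
X_temp, X_test, y_temp, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
X_train, X_val, y_train, y_val = train_test_split(X_temp, y_temp, test_size=0.25, random_state=42)

print(len(X_train), len(X_val), len(X_test))  # roughly 60% / 20% / 20%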
4.6 Modeling
During this stage, we train numerous models to define which one of them provides the
most accurate predictions.

4.6.1 Model Training


After a data scientist has preprocessed the collected data and split it into three subsets, he
or she can proceed with model training. This process entails “feeding” the algorithm with
training data. An algorithm will process data and output a model that is able to find a target
value (attribute) in new data — an answer you want to get with predictive analysis. The
purpose of model training is to develop a model.
Two model training styles are most common — supervised and unsupervised learning. The
choice of each style depends on whether you must forecast specific attributes or group
data objects by similarities.

4.6.1.1 Supervised learning


Supervised learning allows for processing data with target attributes or labeled data. These
attributes are mapped in historical data before the training begins. With supervised
learning, a data scientist can solve classification and regression problems.
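
As a small illustration of supervised classification, the sketch below fits a logistic regression model, one of the methods mentioned in the abstract, on synthetic labeled data; it is not the project's exact code.

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic labeled data standing in for preprocessed loan records.
X, y = make_classification(n_samples=500, n_features=6, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Supervised learning: the model is fit on features paired with known target labels.
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)
print("Test accuracy:", accuracy_score(y_test, model.predict(X_test)))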

4.6.1.2 Unsupervised learning


During this training style, an algorithm analyzes unlabeled data. The goal of model training
is to find hidden interconnections between data objects and structure objects by
similarities or differences. Unsupervised learning aims at solving such problems as
clustering, association rule learning, and dimensionality reduction. For instance, it can be
applied at the data preprocessing stage to reduce data complexity.

4.7 Model evaluation and testing


The goal of this step is to develop the simplest model able to formulate a target value fast
and well enough. We can achieve this goal through model tuning. That’s the optimization of
model parameters to achieve an algorithm’s best performance.
One of the more efficient methods for model evaluation and tuning is cross-validation.
Cross-validation
Cross-validation is the most commonly used tuning method. It entails splitting a training
dataset into ten equal parts (folds). A given model is trained on only nine folds and then
tested on the tenth one (the one previously left out). Training continues until every fold is
left aside and used for testing. As a result of measuring model performance, a specialist
calculates a cross-validated score for each set of hyperparameters. A data scientist trains
models with different sets of hyperparameters to define which model has the highest
prediction accuracy. The cross-validated score indicates average model performance across
the ten hold-out folds.
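
A minimal sketch of 10-fold cross-validation and a small hyperparameter search with scikit-learn; the synthetic data and the parameter grid are illustrative assumptions.

from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Synthetic data used only to illustrate the procedure.
X, y = make_classification(n_samples=600, n_features=8, random_state=42)

# 10-fold cross-validation: train on nine folds, test on the held-out tenth, repeat.
tree = DecisionTreeClassifier(random_state=42)
scores = cross_val_score(tree, X, y, cv=10)
print("Cross-validated accuracy:", scores.mean().round(3))

# Grid search computes a cross-validated score for each hyperparameter set.
grid = GridSearchCV(tree, {"max_depth": [3, 5, 7]}, cv=10)
grid.fit(X, y)
print("Best hyperparameters:", grid.best_params_)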

4.8 Model deployment
The model deployment stage covers putting a model into production use.
Once a data scientist has chosen a reliable model and specified its performance
requirements, he or she delegates its deployment to a data engineer or database
administrator. The distribution of roles depends on your organization’s structure and the
amount of data you store.
Generally, a data engineer implements, tests, and maintains the infrastructural components for
proper data collection, storage, and accessibility. Besides working with big data and building
and maintaining a data warehouse, a data engineer takes part in model deployment. To do
so, a specialist may translate the final model from high-level programming languages (i.e.
Python and R) into lower-level languages such as C/C++ and Java.
The distinction between the two types of languages lies in the level of their abstraction in
reference to hardware. A model that’s written in a low-level or a computer’s native language
therefore integrates better with the production environment.
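
Since this project uses Python and Flask, a common lightweight alternative to rewriting the model in a low-level language is to serialize the trained model and load it inside the web application. A minimal sketch, assuming scikit-learn and joblib; the file name is hypothetical.

import joblib
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

# Train and persist the final model so the web application can load it at startup.
X, y = make_classification(n_samples=300, n_features=5, random_state=42)
model = DecisionTreeClassifier(max_depth=3, random_state=42).fit(X, y)
joblib.dump(model, "loan_model.pkl")       # hypothetical artifact name

# Inside the deployed application:
loaded = joblib.load("loan_model.pkl")
print(loaded.predict(X[:1]))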

5. Python
Python is an interpreted high-level general-purpose programming language. Python's
design philosophy emphasizes code readability with its notable use of significant
indentation.
Python offers concise and readable code. While complex algorithms and versatile
workflows stand behind machine learning and AI, Python’s simplicity allows developers to
write reliable systems. Developers get to put all their effort into solving an ML problem
instead of focusing on the technical nuances of the language. Additionally, Python is
appealing to many developers as it’s easy to learn. Python code is understandable by
humans, which makes it easier to build models for machine learning.
To reduce development time, programmers turn to a number of Python frameworks and
libraries. A software library is pre-written code that developers use to solve common
programming tasks. Python, with its rich technology stack, has an extensive set of libraries
for artificial intelligence and machine learning. Here are some of them:

6. Libraries Used
A general list of the libraries used in developing this model is:
6.1 NUMPY
NUMPY is a library for the Python programming language, adding support for large, multi-
dimensional arrays and matrices, along with a large collection of high-level mathematical
functions to operate on these arrays.

Moreover, NUMPY forms the foundation of the machine learning stack. Below, we
cover the most frequently used NUMPY operations.
1) Creating a Vector
We use NUMPY to create a 1-D Array which we then call a vector.
2) Creating a Matrix
We create a 2-D Array in NUMPY and call it a Matrix. It contains 2 rows and 3
columns.
3) Creating a Sparse Matrix
Sparse matrices store only non-zero elements and assume all other values will be
zero, leading to significant computational savings.
4) Selecting Elements
When you need to select one or more elements in a vector or matrix.
5) Describing a Matrix
When you want to know the shape, size and dimensions of a matrix.

6) Applying operations to elements
You want to apply some function to multiple elements in an array.
NUMPY’s VECTORIZE class converts a function into one that can be applied to
multiple elements in an array or a slice of an array.
7) Finding the max and min values
We use NUMPY’s max and min functions.
8) Calculating Average, Variance and Standard deviation
When you want to calculate some descriptive statistics about an array.
9) Reshaping Arrays
When you want to reshape an array, changing the number of rows and columns
without changing the elements.
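
A minimal sketch covering the operations listed above; the values are arbitrary, and the sparse matrix uses SCIPY, which is already listed among the project libraries.

import numpy as np
from scipy.sparse import csr_matrix

vector = np.array([1, 2, 3])                       # 1) a 1-D array (vector)
matrix = np.array([[1, 2, 3], [4, 5, 6]])          # 2) a 2 x 3 matrix

sparse = csr_matrix(np.array([[0, 0, 3], [0, 1, 0]]))  # 3) store only non-zero elements

print(matrix[1, 2])                                # 4) selecting an element
print(matrix.shape, matrix.size, matrix.ndim)      # 5) shape, size, dimensions

add_100 = np.vectorize(lambda x: x + 100)          # 6) apply a function element-wise
print(add_100(matrix))

print(matrix.max(), matrix.min())                  # 7) max and min values
print(matrix.mean(), matrix.var(), matrix.std())   # 8) average, variance, std deviation
print(matrix.reshape(3, 2))                        # 9) reshape without changing elements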

6.2 PANDAS
PANDAS is one of the tools in machine learning used for data cleaning and
analysis. It has features for exploring, cleaning, transforming and visualizing
data.

PANDAS is an open-source Python package built on top of NUMPY, developed by Wes
McKinney. It is used as one of the most important data cleaning and analysis tools. It
provides fast, flexible and expressive data structures.
Pandas deals with three types of data structures:
Series, Data Frame and Panel.
a) Series is a one-dimensional array-like structure with homogeneous data. The size of the
series is immutable (cannot be changed) but its values are mutable.
b) Data Frame is a two-dimensional array-like structure with heterogeneous data. Data is
aligned in a tabular manner (Rows &Columns form). The size and values of Data Frame are
mutable.
c) The panel is a three-dimensional data structure with heterogeneous data. It is hard to
represent the panel in graphical representation. But it can be illustrated as a container of
Data Frame. The size and values of a Panel are mutable.
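
A minimal sketch of the Series and DataFrame structures; the column names are illustrative, and the Panel structure is not shown because it has been removed from recent pandas versions.

import pandas as pd

# Series: one-dimensional, homogeneous data.
income = pd.Series([4500, 6000, 5200], name="ApplicantIncome")

# DataFrame: two-dimensional, heterogeneous data in rows and columns.
df = pd.DataFrame({
    "Gender": ["Male", "Female", "Male"],
    "ApplicantIncome": income,
    "LoanAmount": [130.0, None, 95.0],
})

df.info()                                                            # explore structure and missing values
df["LoanAmount"] = df["LoanAmount"].fillna(df["LoanAmount"].mean())  # clean
print(df.describe())                                                 # summary statistics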

6.3 SEABORN
SEABORN is a data visualization library built on top of MATPLOTLIB and closely integrated
with pandas data structures in Python. Visualization is the central part of SEABORN which
helps in exploration and understanding of data. One has to be familiar with NUMPY and
MATPLOTLIB and Pandas to learn about SEABORN.
SEABORN offers the following functionalities:
 Dataset-oriented API to determine the relationship between variables.
 Automatic estimation and plotting of linear regression plots.
 Support for high-level abstractions for multi-plot grids.
 Visualizing univariate and bivariate distributions.

Using SEABORN we can plot a wide variety of plots, such as:


1. Distribution Plots
2. Pie Chart & Bar Chart
3. Scatter Plots
4. Pair Plots
5. Heat Map
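
A minimal sketch of a few of the plots listed above, using Seaborn's built-in "tips" sample dataset; the dataset is chosen only for illustration and load_dataset downloads it on first use.

import matplotlib.pyplot as plt
import seaborn as sns

tips = sns.load_dataset("tips")       # built-in sample dataset, for illustration only

sns.histplot(tips["total_bill"])                                      # distribution plot
plt.show()

sns.scatterplot(data=tips, x="total_bill", y="tip", hue="time")       # scatter plot
plt.show()

sns.heatmap(tips[["total_bill", "tip", "size"]].corr(), annot=True)   # heat map
plt.show()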

6.4 MATPLOTLIB

MATPLOTLIB is a 2-D plotting library that helps in visualizing figures. MATPLOTLIB emulates
MATLAB-like graphs and visualizations. MATLAB is not free, is difficult to scale and as a
programming language is tedious. So, MATPLOTLIB in Python is used, as it is a robust,
free and easy library for data visualization.

Visualizations are the easiest way to analyze and absorb information. Visuals help to easily
understand complex problems. They help in identifying patterns, relationships, and
outliers in data. They help in understanding business problems better and quickly, and they
help to build a compelling story based on visuals. Insights gathered from the visuals help in
building strategies for businesses. Visualization is also a precursor to many high-level data
analysis tasks such as Exploratory Data Analysis (EDA) and Machine Learning (ML).
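
A minimal sketch of a simple MATPLOTLIB chart; the approval counts are made up for illustration.

import matplotlib.pyplot as plt

# Hypothetical counts of approved vs. rejected loan applications.
status = ["Approved", "Rejected"]
counts = [312, 148]

plt.bar(status, counts, color=["green", "red"])   # simple bar chart
plt.title("Loan application outcomes")
plt.ylabel("Number of applications")
plt.show()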

7. Flask
Flask is a lightweight WSGI web application framework. It is designed to make getting
started quick and easy, with the ability to scale up to complex applications. It began as a
simple wrapper around WERKZEUG and JINJA and has become one of the most popular
Python web application frameworks.

Flask offers suggestions, but doesn't enforce any dependencies or project layout. It is up to
the developer to choose the tools and libraries they want to use. There are many
extensions provided by the community that make adding new functionality easy.
Features
 Development server and debugger
 Integrated support for unit testing
 Restful request dispatching
 Uses JINJA TEMPLATING
 Support for secure cookies (client-side sessions)
 100% WSGI 1.0 compliant
 Unicode-based
 Extensive documentation
 Google App Engine compatibility
 Extensions available to enhance features desired
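
A minimal sketch of how such a Flask front end could look; the routes, form fields, template name and model file are illustrative assumptions, not the project's exact code.

# app.py - a sketch of the web front end.
import joblib
from flask import Flask, render_template, request

app = Flask(__name__)
model = joblib.load("loan_model.pkl")          # hypothetical serialized model

@app.route("/")
def home():
    return render_template("index.html")       # hypothetical home page template

@app.route("/predict", methods=["POST"])
def predict():
    # Collect applicant details from the submitted form and run the model.
    income = float(request.form["income"])
    loan_amount = float(request.form["loan_amount"])
    credit_history = float(request.form["credit_history"])
    result = model.predict([[income, loan_amount, credit_history]])[0]
    return "Eligible" if result == 1 else "Not eligible"

if __name__ == "__main__":
    app.run(debug=True)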

8. Software and Hardware Requirements
(Initial Requirements)
Hardware Requirements:
1 - 8 GB of RAM,
2 - 20 GB of ROM,
3 - 4 GB Graphics Card.
Software Requirements:
1 - Python (version 3.8.1),
2 - Anaconda (JUPYTER NOTEBOOK),
3 - Visual Studio,
4 - Web Browser.
List of Libraries Required
(To be preinstalled in Anaconda)
1- NUMPY
2- PANDAS
3- MATPLOTLIB
4- SEABORN
5- SKLEARN
6 –XGBOOST
For Client Side:
A computer or phone with an active internet connection.

9. Screen Layout
1. Home Screen Tab

This is how our home page looks: a simple layout, no fancy stuff.
Once you click on the Predict button, the next webpage will be shown.

2. Data Selection Tab

Here you have the choice to opt for an individual or a joint loan.

3. Analysis Tab
4. Data Filling Tab

10. Output Layout


11. Limitations:
Even though the models with the best accuracies have been found, more work still needs to be
done to optimize the model for our application. The goal of the model is to help make
decisions on issuing loans so as to maximize profit.
Random Forest and XGBOOST classify instances based on the calculated probabilities of
falling into classes. In order to balance the trade-off between the decrease in revenue and the
decrease in cost, an optimization problem has to be solved by adjusting the classification
threshold and seeking the optimum, where “Settled” is defined as positive and “Past Due” is
defined as negative.
Loan amount and interest due are two vectors from the dataset. The other three masks are
binary flags (vectors) that use 0 and 1 to represent whether the specific conditions are met
for a certain record. The model is easy to underfit and, compared to ensemble models, the
performance is not that good.
There is a high demand for data; the model is sensitive to missing values and anomalous
values, and unable to process non-linear features. This means data cleaning and feature
engineering will cost quite a lot of time. It is not good at dealing with unbalanced data,
high-dimension feature sets, and categorical features. These calculation differences hugely
affect the results and hence are minimized.
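
To illustrate the threshold adjustment described above, the sketch below sweeps the classification threshold of a probability-based model and picks the one that maximizes a simple, made-up profit function; the gain and loss values, the synthetic data and the class coding "Settled" = 1, "Past Due" = 0 are all assumptions.

import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in data; class 1 = "Settled" (positive), class 0 = "Past Due" (negative).
X, y = make_classification(n_samples=1000, weights=[0.3, 0.7], random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

clf = RandomForestClassifier(random_state=42).fit(X_train, y_train)
proba = clf.predict_proba(X_test)[:, 1]       # predicted probability of "Settled"

def profit(y_true, y_pred, gain_per_settled=1.0, loss_per_default=5.0):
    # Revenue from correctly approved loans minus losses from approved loans that go past due.
    true_pos = np.sum((y_pred == 1) & (y_true == 1))
    false_pos = np.sum((y_pred == 1) & (y_true == 0))
    return gain_per_settled * true_pos - loss_per_default * false_pos

# Sweep the decision threshold instead of using the default 0.5.
thresholds = np.linspace(0.1, 0.9, 81)
best = max(thresholds, key=lambda t: profit(y_test, (proba >= t).astype(int)))
print("Threshold that maximizes the illustrative profit:", round(best, 2))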

12. Future Scope


Loan approval is a very important process for banking organizations. The system approves
or rejects loan applications. Recovery of loans is a major contributing parameter in the
financial statements of a bank. It is very difficult to predict the possibility of repayment of a
loan by the customer.

 Customers will have an idea of whether they are eligible or not, which will help in
reducing the time they spend going to apply for a loan.
 Customers can see the reason why they are not eligible to get a loan, because the
reasons for defaulting on a loan can be endless.
 If a customer has an existing loan in another bank, the system will reflect it directly once
the application is integrated with a bank.
 Small finance banks can be the first to use this service at a mass level. In the current
situation too, they will get extra security.
 Banks' N.P.A. will reduce, and this reduced N.P.A. will boost the loan-giving ability of the
bank.
 Added security can be a game changer, as banks like Yes Bank, LAXMI Villas Bank and
other small co-operative banks will not have to file for bankruptcy after a major scam and
consecutive frauds.

13. Conclusion

From a proper analysis of the positive points and constraints of the component, it can be safely
concluded that the product is a highly efficient component. This application is working
properly and meets all banker requirements.
Initially it can be used as a third-party application which can give you a better idea of whether
you are eligible to get a loan or not.
In the next phase, after integrating it with a bank server, we can get a clearer picture of loan
approval. This may seem like a joke now, but it can be a revolutionary product if used and
customized properly.
This component can be easily plugged into many other systems. There have been a number of
cases of computer glitches, errors in content and, most importantly, fixed feature weights
in automated prediction systems. So in the near future the software could be made more
secure and reliable, with dynamic weight adjustment. In the near future this prediction
module can be integrated with the module of an automated processing system. The system is
trained on an old training dataset; in future, the software can be made such that new test data
also takes part in the training data after some fixed time.
Its efficiency will vary from time to time, but overall it is a practical approach which we
need at the current time.
The anomaly of taking credit and ending up in default, to the detriment of the lender, has
been confirmed to have a remedy in machine learning. Using a real-life dataset, it has been
revealed that false positives can be reduced with the employment of a decision tree, thereby
getting a highly reliable accuracy that financial institutions can depend on while scrutinizing
loan applications.

14. Appendix
The link below gives access to all the content of the project. It includes:
1. Lab Report digital copy
2. PowerPoint Presentation
3. Source Code

https://drive.google.com/drive/folders/1E1SxyMC2OIRDkX7ofeP5noY0PDXAJNBa

