ML MiniProject Report
MNIST DATASET
Mini Project Report
by:
CERTIFICATE
Approval Sheet
Project Report Approval
Submitted by:
1.
2.
Date:
Place:
Declaration
We declare that this written submission represents our ideas in our own words
and where others' ideas or words have been included, we have adequately
cited and referenced the original sources. We also declare that we have
adhered to all principles of academic honesty and integrity and have not
misrepresented or fabricated or falsified any idea/data/fact/source in our
submission. We understand that any violation of the above will be cause for
disciplinary action by the Institute and can also evoke penal action from the
sources which have thus not been properly cited or from whom proper
permission has not been taken when needed.
Date:
Place:
Acknowledgement
We would like to express our sincere gratitude towards our guide Prof. Ujwala
Gaikwad and Mini Project Coordinators Prof. Randeep Kaur, Prof. Nilesh
Kulal, Prof. Dnyaneshwar Bavkar and Prof. Vishwajit Gaikwad for the help,
guidance and encouragement they provided during the project development.
This work would not have been possible without their valuable time, patience
and motivation. We thank them for making our stint thoroughly pleasant and
enriching. It was a great learning experience and an honor being their students.
We take the privilege to express our sincere thanks to Dr. L. K. Ragha, our
Principal, for providing encouragement and much support throughout our work.
Date:
Place:
Abstract
Handwritten digit recognition is difficult because writing styles vary widely.
This work therefore aims to provide a base for future research in the area so
that researchers can overcome the existing problems. The existing methods and
techniques for handwritten digit recognition were reviewed to identify the most
suitable method for digit recognition. A total of 60,000 images with a pixel
size of 8×8 were used as the training set. The images in the training set were
matched with the original image. The analysis and review found that a
classifier ensemble system has the lowest error rate, at just 1.28%. This paper
reviews and analyzes different methods for handwritten digit recognition.
Table of Contents
Chapter 1 Introduction
1.1 Aim and Objectives of Project
1.2 Scope
1.3 Organization of Report
Chapter 2 Literature Survey
2.1 Existing System
Chapter 3 Software Analysis
3.1 Waterfall Model
3.1.1 Phases of Waterfall model
3.2 Proposed System
Chapter 4 Design and Implementation
4.1 Use Case Diagram
4.2 Flowchart Diagram
4.3 State Chart Diagram
4.4 Sequence Diagram
4.5 Hardware and Software Requirement
4.6 Software Requirement
Chapter 5 Methodology
5.1 Project Module
Chapter 6 Implementation Detail
6.1 Working of System
Chapter 7 Performance Evaluation
7.1 Evaluation metrics
7.2 Experimental setup
7.2.1 Description of Data
7.2.2 Methodology used to perform experiment
Chapter 8 Problem Timeline
8.1 Gantt Chart
Chapter 9 Results
9.1 Project Screenshots
Chapter 10 Conclusion
References
List of Figures
Chapter 1
INTRODUCTION
1.1 Aim and Objectives of Project
1.2 Scope
The current model works on 8x8 images, which can be substituted with
higher-resolution images.
We can implement more advanced machine learning techniques, such as neural
networks, in the build.
We can also make use of deep learning techniques in our model.
1.3 Organization Of The Report
The organization of this report is presented below. The structure follows the
flow used to develop the system, so that the reader can follow each of the
necessary steps in sequence.
Chapter 3 discusses the workflow, described using a waterfall model, and
then describes the proposed system.
Chapter 2
LITERATURE SURVEY
Research paper 1 :
TITLE :
Handwritten Digit Recognition Using Structural, Statistical Features and K-nearest
Neighbor Classifier
Observation :
The proposed algorithm is tested on the MNIST digit database. The algorithm uses
60,000 training samples of numerals and 5,000 testing samples. The algorithm is executed
for k=1, k=3, and k=5, and the results are compared to find the optimum value of k;
k=1 is found to be optimal. The recognition rate for individual digits of the MNIST
test database is listed. The overall recognition rate is found to be 98.42%.
Conclusion :
In this paper, we used thirteen (13) statistical and five (5) structural features for
recognition of handwritten numerals. In any recognition process, the important
problem is to address the feature extraction and correct classification approaches. An
overall accuracy of 98.42% is achieved in the recognition process.
Accuracy Result :
A total of 5,000 numeral images are tested, and the overall accuracy is found to be 98.42%.
Author :
U Ravi Babu, Research Scholar, Aacharya Nagarjuna University; Assoc. Professor, GIET,
Rajahmundry, A.P, India (uppu.ravibabu@gmail.com)
Aneel Kumar Chintha, M.Tech (CSE) Student, GIET, Rajahmundry, A.P, India
(aneelkumar.chintha@gmail.com)
Dr. Y Venkateswarlu, Professor & Head, Department of CSE, GIET Engg College,
Rajahmundry, A.P, India (yalla_venkat@yahoo.com)
Research paper 2 :
TITLE :
Offline Handwritten Digits Recognition Using
Machine learning
Conclusion :
This paper applied different machine learning techniques and models for data training,
attempting to discover a representation of isolated handwritten digits that allows their
effective recognition and achieves the highest accuracy in predicting handwritten numerals.
The study classified a given handwritten digit image as the required digit using five
different algorithms and tested the accuracy of each. It built handwritten digit recognizers,
evaluated their performance on the MNIST (Modified National Institute of Standards and
Technology) dataset, and then improved the training speed and the recognition performance.
Accuracy Result :
Using the dataset obtained by Image Attribute Reduction in MATLAB (discussed earlier in
the preprocessing section), analysis is done to check the accuracy of the K-NN and
Neural Net classifiers. With 196 attributes, we got an accuracy of 95.73% and 95.93%
in Neural Net and K-NN respectively.
Author :
Shengfeng Chen, Department of Industrial Engineering, Western Michigan University, Kalamazoo, MI.
Rabia Almamlook, Department of Industrial Engineering, Western Michigan University, Kalamazoo, MI.
2.1 Existing System
Current digit/pattern recognition models are very useful in industrial
fields and for persons in need.
Currently, there are models built on more advanced machine learning
techniques, such as neural networks and deep learning, which achieve high performance.
Chapter 3
SOFTWARE ANALYSIS
➢ Design Phase: During this phase, there may also be a need for new
insights or more detailed analysis.
➢ Testing Phase: The results obtained will then be evaluated in the Testing phase.
3.2 Proposed System
The core of our application is the algorithm that guesses the drawn number. Machine
learning is the tool used to achieve good guess quality. This approach allows a system
to learn automatically from a given amount of data. In broader terms, machine learning
is a process of finding a pattern or set of patterns in the data and relying on them
to guess the result.
Chapter 4
DESIGN AND IMPLEMENTATION
4.1 Use Case Diagram
Figure 4.1
4.2 Flowchart Diagram
Figure 4.2
4.3 State Chart Diagram
Figure 4.3
4.4 Sequence Diagram
Figure 4.4
4.5 Hardware and Software Requirements
Software Requirements Specification
➢ Python 3 in-line IDE (Google Colab)
➢ Jupyter Notebook
Libraries
➢ Tkinter
➢ Scikit-learn
➢ NumPy
➢ Matplotlib
Operating System
➢ Windows/macOS
4.6 Software Components
➢ Python3:-
The project is built using the Python programming language.
➢ Libraries:-
Sklearn:- Scikit-learn is one of the most widely used libraries for
machine learning in Python. The sklearn library contains many efficient tools
for machine learning and statistical modeling, including classification,
regression, clustering and dimensionality reduction.
Chapter 5
METHODOLOGY
5.1 Project Modules
The project can be divided into these modules:
1. Dataset preparation:- Training the model well requires a large amount of
data. To meet that need, we use the already preprocessed “MNIST” dataset, so
we assume that no further data cleaning is needed.
2. Model training:- This process entails feeding the model with training data.
The model processes the data using a specific classifier and predicts outputs
for the inputs it is given.
3. Visualization of predicted output:- This module checks the output on the
test set once the model is trained.
Chapter 6
IMPLEMENTATION DETAILS
6.1 Working of the system
The data flows through each of the modules described in Section 5.1, and the last
module presents the final result. Each module uses different libraries that provide
it with ready-made functions.
First, we fetch the MNIST dataset. After fetching the data, we split it into
training and testing sets and use the training set to train the model.
We build the model with a KNN classifier. Once trained, the model is tested by
measuring its accuracy.
The KNN classifier reached an accuracy of about 98.42% on the test dataset, which
indicates that the model is well trained. After computing the accuracy, we compute the
confusion matrix to check its precision.
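The steps above (fetch, split, train, score, confusion matrix) can be sketched with scikit-learn as follows. This is a minimal sketch under stated assumptions, not the project's actual code: it uses scikit-learn's bundled 8x8 digits dataset (`load_digits`) as a small stand-in for MNIST, and the 80/20 split, `random_state=42` and `n_neighbors=3` are illustrative choices that the report does not specify.

```python
from sklearn.datasets import load_digits
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Assumption: the bundled 8x8 digits dataset stands in for MNIST here.
digits = load_digits()
X, y = digits.data, digits.target  # X has one row of 64 pixel values per image

# Assumption: an 80/20 stratified split; the report does not state the ratio.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Assumption: k = 3; the report does not state which k was used.
model = KNeighborsClassifier(n_neighbors=3)
model.fit(X_train, y_train)

# Score the trained model on the held-out test set.
y_pred = model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
cm = confusion_matrix(y_test, y_pred)  # rows: true digit, columns: predicted digit

print(f"Test accuracy: {accuracy:.4f}")
print(cm)
```

The accuracy printed by this sketch comes from the stand-in dataset and should not be read as the 98.42% figure reported above.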
After building the model, we create a GUI consisting of a canvas board on which the
user draws the number to be predicted. The GUI also provides “CLEAR” and “PREDICT”
options.
To try another number, the user clicks CLEAR, draws a new number, and clicks
PREDICT to check whether the model predicts it correctly.
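A GUI of this kind could be sketched with Tkinter roughly as below. Everything here is hypothetical rather than the project's code: the names `strokes_to_grid` and `build_gui`, the 200-pixel canvas, the brush radius, and the 0 to 16 intensity scale (borrowed from scikit-learn's bundled digits dataset) are all assumptions. `model` stands for any fitted classifier that exposes `predict` and was trained on 64-value inputs.

```python
import numpy as np

CANVAS_SIZE = 200  # canvas width and height in pixels (assumed)
GRID_SIZE = 8      # 8x8 input resolution, as stated in the report


def strokes_to_grid(points, canvas_size=CANVAS_SIZE, grid_size=GRID_SIZE):
    """Rasterize drawn (x, y) canvas points into a grid_size x grid_size image."""
    grid = np.zeros((grid_size, grid_size))
    cell = canvas_size / grid_size
    for x, y in points:
        # Ignore points dragged outside the canvas.
        if not (0 <= x < canvas_size and 0 <= y < canvas_size):
            continue
        row = min(int(y // cell), grid_size - 1)
        col = min(int(x // cell), grid_size - 1)
        grid[row, col] = 16.0  # assumed maximum pixel intensity
    return grid


def build_gui(model):
    """Build a window with a drawing canvas plus CLEAR and PREDICT buttons."""
    import tkinter as tk  # imported lazily so the helper above needs no display

    points = []
    root = tk.Tk()
    root.title("Digit Recognition")
    canvas = tk.Canvas(root, width=CANVAS_SIZE, height=CANVAS_SIZE, bg="white")
    canvas.pack()
    result = tk.Label(root, text="Draw a digit")
    result.pack()

    def draw(event):
        # Record the pointer position and paint a small dot on the canvas.
        points.append((event.x, event.y))
        canvas.create_oval(event.x - 6, event.y - 6, event.x + 6, event.y + 6,
                           fill="black")

    def clear():
        points.clear()
        canvas.delete("all")
        result.config(text="Draw a digit")

    def predict():
        features = strokes_to_grid(points).reshape(1, -1)  # one row of 64 values
        result.config(text=f"Prediction: {model.predict(features)[0]}")

    canvas.bind("<B1-Motion>", draw)
    tk.Button(root, text="CLEAR", command=clear).pack(side=tk.LEFT)
    tk.Button(root, text="PREDICT", command=predict).pack(side=tk.RIGHT)
    return root
```

With a fitted classifier in hand, the window would be started with `build_gui(model).mainloop()`. Prediction quality depends on how closely the rasterized drawing resembles the training images, so a real build would likely need preprocessing that matches the training data.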
Chapter 7
PERFORMANCE EVALUATION
7.1 Evaluation Metrics
1. Confusion matrix
2. Accuracy:
Test Data:- 98.42%
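As an illustration of how the two metrics relate, accuracy and per-class precision and recall can be read off a confusion matrix as sketched below. The 3-class matrix is made up for the example and is not the project's result; rows are assumed to be true classes and columns predicted classes, which is the convention scikit-learn uses.

```python
import numpy as np

# Hypothetical confusion matrix (rows: true class, columns: predicted class).
cm = np.array([
    [50, 2, 0],
    [1, 45, 4],
    [0, 3, 47],
])

# Accuracy: correctly classified samples (the diagonal) over all samples.
accuracy = np.trace(cm) / cm.sum()

# Precision per class: correct predictions of a class over all predictions of it.
precision = np.diag(cm) / cm.sum(axis=0)

# Recall per class: correct predictions of a class over all true members of it.
recall = np.diag(cm) / cm.sum(axis=1)

print(f"accuracy  = {accuracy:.4f}")
print("precision =", np.round(precision, 4))
print("recall    =", np.round(recall, 4))
```

Here 142 of the 152 samples lie on the diagonal, so the accuracy is 142/152, or about 93.4%.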
7.2.1 Description of Data
7.2.2 Methodology Used To Perform Experiment
The k-nearest neighbors (KNN) algorithm takes a set of labeled data samples and
places them in a space whose axes are a given set of characteristics. To
understand it better, let’s review the following image:
To determine the type of the Green Dot, we check the types of its k nearest
neighbors, where k is a parameter we set. Considering the image above, if k is
equal to 1, 2, 3, or 4, the guess will be a Black Triangle, as most of the green
dot’s closest k neighbors are black triangles. If we increase k to 5, then the
majority of the objects are blue squares, hence the guess will be a Blue Square.
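The voting rule described above can be sketched in a few lines of NumPy. This is an illustrative toy, not the project's classifier: the function name `knn_predict`, the five sample points and the Euclidean distance are assumptions, and the points do not reproduce the figure. They are only arranged so that small k favors the triangles and k = 5 favors the squares.

```python
from collections import Counter

import numpy as np


def knn_predict(X_train, y_train, query, k):
    """Return the majority label among the k training points nearest to query."""
    distances = np.linalg.norm(X_train - query, axis=1)  # Euclidean distances
    nearest = np.argsort(distances)[:k]                  # indices of the k closest
    votes = Counter(y_train[i] for i in nearest)
    return votes.most_common(1)[0][0]


# Toy data: two triangles close to the query, three squares farther away.
X_train = np.array([[1.0, 0.0], [0.0, 2.0], [3.0, 0.0], [0.0, 4.0], [5.0, 0.0]])
y_train = ["triangle", "triangle", "square", "square", "square"]
query = np.array([0.0, 0.0])  # plays the role of the green dot

predictions = {k: knn_predict(X_train, y_train, query, k) for k in (1, 3, 5)}
print(predictions)  # {1: 'triangle', 3: 'triangle', 5: 'square'}
```

Odd values of k are used here so that a two-class vote cannot tie. With an even k a tie is possible, and the result then depends on how the implementation breaks it.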
Chapter 8
PROBLEM TIMELINE
[Figure: Gantt chart. Tasks: Feasibility Study, Requirement Gathering, Analysis, 1st Review, Design, Testing, 2nd Review. Table columns: Tasks, Start, End, Days.]
Chapter 9
RESULTS
9.1 Project Screenshots
Chapter 10
CONCLUSION
REFERENCES
Link :
https://www.researchgate.net/publication/272854375_Handwritten_Digit_Recognition_Using_Structural_Statistical_Features_and_K-nearest_Neighbor_Classifier
http://ieomsociety.org/dc2018/papers/123.pdf
https://scikit-learn.org/stable/modules/generated/sklearn.neighbors.KNeighborsClassifier.html