A
PROJECT REPORT
Submitted in partial fulfillment of the requirements
for the award of the Bachelor of Computer Application degree
LNCT UNIVERSITY
Submitted by
Name of Student – Pratham Chourasiya
Enrollment No. LNCBBCAAI130
BHOPAL
Submission Date: 18 May 2024
BACHELOR OF COMPUTER APPLICATION
CERTIFICATE
This is to certify that the work embodied in this major project “OBJECT
DETECTION” has been satisfactorily completed by Pratham Chourasiya
(Enrollment No. LNCBBCAAI130). It is a bona fide piece of work, carried out
under my guidance in the Bachelor of Computer Application programme, LNCT
University, Bhopal, in partial fulfillment of the Bachelor of Computer
Application degree during the academic session January-June 2024.
Guided By
Prof. Dev Kumar Chouhan
Approved By
Head of Department
Forwarded by
Director
LNCT University, Bhopal
LNCT UNIVERSITY, BHOPAL
BACHELOR OF COMPUTER APPLICATION
CERTIFICATE OF APPROVAL
DECLARATION
Pratham Chourasiya
Enrollment No.: LNCBBCAAI130
Date: 18 May 2024
LNCT UNIVERSITY, BHOPAL
BACHELOR OF COMPUTER APPLICATION
ACKNOWLEDGEMENT
1 INTRODUCTION
1.1 PURPOSE AND OBJECTIVES
1.2 EXISTING AND PROPOSED SYSTEM
1.3 SCOPE OF PROJECT
2 LITERATURE SURVEY
3 SYSTEM ANALYSIS
3.1 HARDWARE AND SOFTWARE REQUIREMENTS
4 SYSTEM DESIGN
4.1 DESCRIPTION
4.2 ARCHITECTURE
5 METHODOLOGY
6 IMPLEMENTATION
CONCLUSION
BIBLIOGRAPHY
ABSTRACT
Now, with the advances in deep learning and neural networks, we can finally
tackle such problems in real time without coming up with various heuristics.
➢ The improved versions of R-CNN, such as Fast R-CNN and Faster R-CNN, used
more strategies to reduce the computation of region proposals, but they still did not
reach real-time inference speed.
➢ With the help of transfer learning, the network used will be trained on other
objects to increase the range of objects that Mobile Net can detect.
LITERATURE SURVEY
➢ To avoid the above situations, we propose this model, in which we include the
Single Shot MultiBox Detector (SSD) architecture.
SYSTEM ANALYSIS
3.1 HARDWARE AND SOFTWARE REQUIREMENTS
The development and deployment of the application require the following
general and specific minimum requirements for hardware:
The development and deployment of the application require the following general and
specific minimum requirements for software:
Component | Minimum requirement
Functional Requirements
➢ A real-time image or a video can be fed to the ML model.
➢ All the detected objects, along with their labels, are displayed within the video or the
image.
Non-Functional Requirements
➢ Performance: the user gets the desired output without interruption.
SOURCE CODE
clc
close all
% Test Two
% Feature descriptors considered: Histogram of Oriented Gradients,
% histogram of pixel orientation, histogram of curvatures, eccentricity,
% and area-ratio weights.
% clear all

tic
load('newData.mat')                % pre-computed training data
% load('FinalHog.mat')             % optional: pre-computed HOG features

depth  = 6;
Params = [9 3 2 1 0.2];            % descriptor parameters

% mmreader is no longer available in MATLAB; VideoReader replaces it.
video = VideoReader('F:\Thesis\Testing Videos\T4.h64');
% Alternative test clip: '\Other Datasets\test_videos\test_videos\3.avi'

for k = 3501:5:4000
    % Read frame k, crop the region of interest, and display it.
    figure(1);
    frame = imcrop(read(video, k), [7.5 18.5 345 224]);
    imshow(frame);
end
toc
SYSTEM DESIGN
4.1 DESCRIPTION
The goal of object detection is to detect all instances of objects from a known class,
such as people, cars, or faces, in an image. Generally, only a small number of instances
of the object are present in the image, but there is a very large number of possible
locations and scales at which they can occur, and these need to be explored in some way.
Each detection is reported with some form of pose information. This can be as simple
as the location of the object, a location and scale, or the extent of the object defined
in terms of a bounding box.
In some other situations, the pose information is more detailed and contains the
parameters of a linear or non-linear transformation. For example, a face detector may
compute the locations of the eyes, nose, and mouth, in addition to the bounding box
of the face.
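As an illustration only (not code from the report), a single detection in its simplest
form can be represented by a class label, a confidence score, and a bounding box:

from dataclasses import dataclass

@dataclass
class Detection:
    label: str    # e.g. "person", "car", "face"
    score: float  # confidence in [0, 1]
    box: tuple    # (x_min, y_min, x_max, y_max) in pixels

d = Detection("face", 0.92, (120, 60, 220, 180))
print(d)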
4.2 ARCHITECTURE
Depending on the development culture, some of these artifacts are treated more or
less formally than others. Such artifacts are not only the deliverables of a project;
they are also critical in controlling, measuring, and communicating about a system
during its development and after its deployment.
The UML addresses the documentation of a system's architecture and all of its details.
The UML also provides a language for expressing requirements and for tests. Finally,
the UML provides a language for modeling the activities of project planning and
release management.
A use case diagram is a graph of actors, a set of use cases enclosed by a system
boundary, communication associations between the actors and the use cases, and
generalizations among the use cases.
In our project, the actors are the user and the machine (server). The user starts the
application and selects the desired mode of video input for the network; the server
classifies the objects based on the label map and then returns the labelled frame or
object as the output.
4.3.2 Sequence Diagram:
Python
Python's high-level built-in data structures, combined with dynamic typing and
dynamic binding, make it very attractive for Rapid Application Development, as
well as for use as a scripting or glue language to connect existing components
together.
Python's simple, easy-to-learn syntax emphasizes readability and therefore reduces
the cost of program maintenance.
OpenCV
OpenCV-Python is a library of Python bindings designed to solve computer
vision problems.
All the OpenCV array structures are converted to and from Numpy arrays. This
also makes it easier to integrate with other libraries that use Numpy such as SciPy and
Matplotlib.
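The sketch below (assuming only that a local image file such as frame.jpg exists) shows
that an OpenCV image is an ordinary Numpy array, so Numpy, SciPy, and Matplotlib can
operate on it directly:

import cv2
import numpy as np

img = cv2.imread("frame.jpg")            # returns a numpy.ndarray (H x W x 3, BGR)
print(type(img), img.shape, img.dtype)   # <class 'numpy.ndarray'> (h, w, 3) uint8

gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
print(np.mean(gray))                     # Numpy functions work on OpenCV images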
Numpy
Numpy is a general-purpose array-processing package. It provides a high-performance
multidimensional array object and tools for working with these arrays. Besides its
obvious scientific uses, Numpy can also be used as an efficient multi-dimensional
container of generic data.
5.2 MODULES DESCRIPTION
Modules:
1) GUI
2) Object-Detection
Module I:
It mainly concentrates on the user interface. A library called Tkinter is used to
develop the application UI. The UI shows text describing the project and buttons that
let the user choose between options such as Live Object Detection and Object Detection
Using a Video. It also lets the user upload an MP4 file in order to detect objects in a
video file; a minimal sketch of such a UI follows.
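The following is a minimal sketch of that UI, assuming placeholder callback names and
window text rather than the report's actual code:

import tkinter as tk
from tkinter import filedialog

def start_live_detection():
    print("Live object detection selected")     # would open the webcam feed here

def detect_from_video():
    path = filedialog.askopenfilename(filetypes=[("MP4 files", "*.mp4")])
    print("Selected video:", path)              # would run detection on this file

root = tk.Tk()
root.title("Object Detection")
tk.Label(root, text="Object detection using SSD Mobile Net").pack(pady=10)
tk.Button(root, text="Live Object Detection", command=start_live_detection).pack(pady=5)
tk.Button(root, text="Object Detection Using a Video", command=detect_from_video).pack(pady=5)
root.mainloop()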
Module II:
This module lets the user choose detection from either a live feed or an uploaded MP4
file. The project uses a pre-trained Single Shot MultiBox Detector (SSD) Mobile Net to
detect various objects. A frozen inference graph, which holds the pre-trained weights of
the objects, is extracted and used; this helps in faster execution and efficient
detection of objects in the frame.
A session is used in the project, which allows the graphs, i.e. the frozen inference
graphs, to be executed.
Each frame from the live feed or from the video file is used to detect objects; after
successful detection, a rectangular box is drawn around each detected object, which is
also labelled according to the label map.
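A minimal sketch of this flow is given below. It assumes a TensorFlow 1.x-style frozen
graph; the file name frozen_inference_graph.pb, the small label dictionary, and the
tensor names (image_tensor, detection_boxes, detection_scores, detection_classes) follow
the TensorFlow Object Detection API conventions and are assumptions here, not the
report's exact code:

import cv2
import numpy as np
import tensorflow.compat.v1 as tf

tf.disable_v2_behavior()

# Load the frozen inference graph (pre-trained weights baked into one file).
graph = tf.Graph()
with graph.as_default():
    graph_def = tf.GraphDef()
    with tf.gfile.GFile("frozen_inference_graph.pb", "rb") as f:
        graph_def.ParseFromString(f.read())
    tf.import_graph_def(graph_def, name="")

LABELS = {1: "person", 3: "car"}          # placeholder label map

cap = cv2.VideoCapture(0)                 # 0 = live webcam; pass a path for an mp4 file
with tf.Session(graph=graph) as sess:
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        # The exported graph expects a batch of uint8 RGB images.
        boxes, scores, classes = sess.run(
            ["detection_boxes:0", "detection_scores:0", "detection_classes:0"],
            feed_dict={"image_tensor:0": np.expand_dims(frame[..., ::-1], 0)})
        h, w = frame.shape[:2]
        for box, score, cls in zip(boxes[0], scores[0], classes[0]):
            if score < 0.5:
                continue
            y1, x1, y2, x2 = (box * [h, w, h, w]).astype(int)
            label = LABELS.get(int(cls), str(int(cls)))
            cv2.rectangle(frame, (x1, y1), (x2, y2), (0, 255, 0), 2)
            cv2.putText(frame, label, (x1, y1 - 5),
                        cv2.FONT_HERSHEY_SIMPLEX, 0.6, (0, 255, 0), 2)
        cv2.imshow("Object Detection", frame)
        if cv2.waitKey(1) & 0xFF == ord("q"):
            break
cap.release()
cv2.destroyAllWindows()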
5.3 PROCESS/ALGORITHM
Mobile Net:
❖ Mobile Net is an efficient CNN architecture designed for mobile and embedded
vision applications. The architecture uses depth-wise separable convolutions to build
lightweight deep neural networks.
❖ The core layers of Mobile Net are built on depth-wise separable filters. The first
layer, which is a full convolution, is an exception.
❖ Basic operations such as reshaping and resizing of images are performed while
feeding the data to the model.
❖ Data preprocessing involves converting data from its given format into a more
user-friendly, desired, and meaningful format.
❖ The proposed method handles image and video data using Numpy and OpenCV.
❖ A frozen inference graph from the pre-trained model, which holds the weights of the
pre-trained objects, is used in detecting objects.
❖ Freezing is the process of identifying and saving all the required components (graph,
weights, etc.) in a single file that can be used directly.
The figure above shows a pictorial representation of depth-wise separable convolutions;
a short code sketch of the idea follows.
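As a concrete illustration (assuming TensorFlow/Keras, which the report does not name
for this purpose), the sketch below contrasts a standard 3x3 convolution with the
depth-wise separable version: a per-channel 3x3 depth-wise convolution followed by a
1x1 point-wise convolution, which needs far fewer parameters and multiply-adds:

import tensorflow as tf

inputs = tf.keras.Input(shape=(224, 224, 32))

# Standard convolution: each filter mixes spatial and channel information at once.
standard = tf.keras.layers.Conv2D(64, 3, padding="same")(inputs)

# Depth-wise separable convolution: per-channel 3x3 filter, then 1x1 channel mixing.
depthwise = tf.keras.layers.DepthwiseConv2D(3, padding="same")(inputs)
pointwise = tf.keras.layers.Conv2D(64, 1, padding="same")(depthwise)

print(tf.keras.Model(inputs, standard).count_params())   # 18,496 parameters
print(tf.keras.Model(inputs, pointwise).count_params())  # 2,432 parameters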
R-CNN
Fast R-CNN
The same author of the previous paper (R-CNN) solved some of the drawbacks of
R-CNN to build a faster object detection algorithm, called Fast R-CNN. The approach
is similar to the R-CNN algorithm, but instead of feeding the region proposals to the
CNN, we feed the input image to the CNN to generate a convolutional feature map.
From the convolutional feature map, we identify the region proposals and warp them
into squares, and by using an RoI pooling layer we reshape them into a fixed size so
that they can be fed into a fully connected layer. From the RoI feature vector, a
softmax layer predicts the class of the proposed region as well as the offset values
for the bounding box.
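A minimal sketch of the RoI-pooling idea is shown below, using tf.image.crop_and_resize,
a common way in TensorFlow to cut each region proposal out of the shared feature map and
resize it to a fixed size; the feature-map shape and the proposals are made-up values:

import tensorflow as tf

feature_map = tf.random.normal([1, 38, 50, 256])        # one image's feature map

# Two region proposals in normalized [y1, x1, y2, x2] coordinates.
proposals = tf.constant([[0.10, 0.20, 0.55, 0.60],
                         [0.30, 0.40, 0.90, 0.95]])
box_indices = tf.zeros([2], dtype=tf.int32)             # both boxes come from image 0

# Every proposal becomes a fixed 7x7x256 tensor, whatever its original size.
rois = tf.image.crop_and_resize(feature_map, proposals, box_indices, [7, 7])
print(rois.shape)                                       # (2, 7, 7, 256)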
The reason “Fast R-CNN” is faster than R-CNN is that you don't have to feed 2,000
region proposals to the convolutional neural network every time. Instead, the
convolution operation is done only once per image and a feature map is generated
from it.
Faster R-CNN
Both of the above algorithms (R-CNN and Fast R-CNN) use selective search to find
the region proposals. Selective search is a slow and time-consuming process that
affects the performance of the network.
Faster R-CNN therefore does away with selective search and instead uses a Region
Proposal Network (RPN) that learns the region proposals directly from the convolutional
feature map. From the above graph, you can see that Faster R-CNN is much faster than
its predecessors; therefore, it can even be used for real-time object detection.
YOLO
All the previous object detection algorithms use regions to localize the object within
the image: the network does not look at the complete image, only at the parts of the
image that have high probabilities of containing the object. YOLO, or You Only Look
Once, is an object detection algorithm that differs from the region-based algorithms
seen above. In YOLO, a single convolutional network predicts both the bounding boxes
and the class probabilities for these boxes.
YOLO works by taking an image and splitting it into an S x S grid; within each grid
cell we take bounding boxes. For each of the bounding boxes, the network outputs a
class probability and offset values for the bounding box. The bounding boxes whose
class probability is above a threshold value are selected and used to locate the object
within the image (a small code sketch of this selection step is given after the next
paragraph).
YOLO is orders of magnitude faster (45 frames per second) than other object detection
algorithms. The limitation of the YOLO algorithm is that it struggles with small
objects within the image; for example, it might have difficulty identifying a flock of
birds. This is due to the spatial constraints of the algorithm.
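The sketch below (assuming only Numpy and a made-up YOLO-style output tensor) shows how
the grid predictions are thresholded; each of the S x S cells predicts B boxes, each with
(x, y, w, h, confidence), plus C class probabilities per cell:

import numpy as np

S, B, C = 7, 2, 20
preds = np.random.rand(S, S, B * 5 + C)      # stand-in for the network output
threshold = 0.5

detections = []
for row in range(S):
    for col in range(S):
        cell = preds[row, col]
        class_probs = cell[B * 5:]
        for b in range(B):
            x, y, w, h, conf = cell[b * 5:(b + 1) * 5]
            score = conf * class_probs.max()          # class-specific confidence
            if score >= threshold:
                detections.append((row, col, (x, y, w, h),
                                   int(class_probs.argmax()), score))

print(len(detections), "boxes kept above the threshold")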
SSD:
The SSD object detector is composed of two parts: extracting feature maps, and applying
convolution filters on those maps to detect objects.
Each prediction is composed of a boundary box and 21 scores, one per class (with one
extra class for “no object”); we pick the highest score as the class for the bounded
object. Conv4_3 makes a total of 38 × 38 × 4 predictions: four predictions per cell,
regardless of the depth of the feature maps. As expected, many predictions contain no
object, and SSD reserves the class “0” to indicate such a prediction.
SSD does not use a delegated region proposal network. Instead, it resolves to a very
simple method: it computes both the location and the class scores using small
convolution filters. After extracting the feature maps, SSD applies 3 × 3 convolution
filters for each cell to make predictions. (These filters compute the results just like
regular CNN filters.) Each filter outputs 25 channels: 21 scores for each class plus
one boundary box (four offset values).
In the beginning, we described an SSD that detects objects from a single layer. In
fact, it uses multiple layers (multi-scale feature maps) to detect objects
independently. As the CNN reduces the spatial dimension gradually, the resolution of
the feature maps also decreases. SSD uses the lower-resolution layers to detect
larger-scale objects; for example, the 4 × 4 feature maps are used for larger-scale
objects. A sketch of such a prediction head is given below.
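A minimal sketch of an SSD prediction head is shown below (assuming TensorFlow/Keras and
a conv4_3-like 38 × 38 × 512 feature map); one 3 × 3 convolution outputs, for each cell
and each of the four default boxes, 21 class scores plus 4 box offsets, i.e. 25 channels
per box:

import tensorflow as tf

num_classes = 21       # 20 object classes + class "0" for "no object"
boxes_per_cell = 4     # default boxes predicted per feature-map cell

feature_map = tf.keras.Input(shape=(38, 38, 512))    # conv4_3-like feature map

# One 3x3 filter bank produces all predictions for every cell at once.
pred = tf.keras.layers.Conv2D(
    boxes_per_cell * (num_classes + 4), 3, padding="same")(feature_map)

# Reshape to (cells * boxes, 25): 21 class scores + 4 box-offset values each.
pred = tf.keras.layers.Reshape((38 * 38 * boxes_per_cell, num_classes + 4))(pred)
print(pred.shape)      # (None, 5776, 25); 38 * 38 * 4 = 5776 predictions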
BIBLIOGRAPHY