A
PROJECT REPORT
Submitted in partial fulfillment of the requirements
for the award of the Bachelor of Computer Application degree
LNCT UNIVERSITY
Submitted by
Name of Student – Pratham Chourasiya
Enrollment No. LNCBBCAAI130
BHOPAL
Submission Date: 18 May 2024
BACHELOR OF COMPUTER APPLICATION
CERTIFICATE
This is to certify that the work embodied in this major project “OBJECT
DETECTION” has been satisfactorily completed by Pratham Chourasiya
(Enrollment No. LNCBBCAAI130). It is a bona fide piece of work, carried out
under my guidance in the Bachelor of Computer Application programme, LNCT
University, Bhopal, in partial fulfillment of the Bachelor of Computer
Application degree during the academic session January-June 2024.
Guided By
Prof. Dev Kumar Chouhan
Approved By
Head of Department
Forwarded by
Director
LNCT University, Bhopal
LNCT UNIVERSITY, BHOPAL
BACHELOR OF COMPUTER APPLICATION
CERTIFICATE OF APPROVAL
DECLARATION
Pratham Chourasiya
Enrollment No.: LNCBBCAAI130
Date: 18 May 2024
LNCT UNIVERSITY, BHOPAL
BACHELOR OF COMPUTER APPLICATION
ACKNOWLEDGEMENT
1 INTRODUCTION
1.1 PURPOSE AND OBJECTIVES
1.2 EXISTING AND PROPOSED SYSTEM
1.3 SCOPE OF PROJECT
2 LITERATURE SURVEY
3 SYSTEM ANALYSIS
3.1 HARDWARE AND SOFTWARE REQUIREMENTS
4 SYSTEM DESIGN
4.1 DESCRIPTION
4.2 ARCHITECTURE
5 METHODOLOGY
6 IMPLEMENTATION
CONCLUSION
BIBLIOGRAPHY
ABSTRACT
Now, with the advances in deep learning and neural networks, we can finally
tackle such problems in real time without coming up with various heuristics.
➢ The improved versions of R-CNN, such as Fast R-CNN and Faster R-CNN, used
more strategies to reduce the computation of region proposals, but they still did not
reach real-time inference speed.
➢ With the help of transfer learning, the network used will be trained on other
objects to increase the range of objects that Mobile Net can detect.
LITERATURE SURVEY
➢ To avoid the above situations, we propose this model, in which we include the
Single Shot MultiBox Detector (SSD) architecture.
SYSTEM ANALYSIS
3.1 HARDWARE AND SOFTWARE REQUIREMENTS
The development and deployment of the application require the following
general and specific minimum requirements for hardware:
The development and deployment of the application require the following general and
specific minimum requirements for software:
Component | Minimum requirement
Functional Requirements
➢ A real-time image or a video can be fed to the ML model.
➢ All the detected objects, along with their labels, are displayed within the video or the
image.
Non-Functional Requirements
➢ Performance: the user gets the desired output without interruption.
SOURCE CODE
clc
close all
% Test Two
% Feature descriptors considered: Histogram of Oriented Gradients,
% histogram of pixel orientation, histogram of curvatures, eccentricity,
% and area-ratio weights.
% clear all

tic
load('newData.mat')                % pre-computed training data
% load('FinalHog.mat')             % optional: pre-computed HOG features

depth  = 6;
Params = [9 3 2 1 0.2];            % descriptor parameters

% mmreader is no longer available in MATLAB; VideoReader replaces it.
video = VideoReader('F:\Thesis\Testing Videos\T4.h64');
% Alternative test clip: '\Other Datasets\test_videos\test_videos\3.avi'

for k = 3501:5:4000
    % Read frame k, crop the region of interest, and display it.
    figure(1);
    frame = imcrop(read(video, k), [7.5 18.5 345 224]);
    imshow(frame);
end
toc
SYSTEM DESIGN
4.1 DESCRIPTION
The goal of object detection is to detect all instances of objects from a known class,
such as people, cars, or faces, in an image. Generally, only a small number of instances
of the object are present in the image, but there is a very large number of possible
locations and scales at which they can occur, and these need to be explored in some way.
Each detection is reported with some form of pose information. This can be as simple
as the location of the object, a location and scale, or the extent of the object defined
in terms of a bounding box.
In some other situations, the pose information is more detailed and contains the
parameters of a linear or non-linear transformation. For example, a face detector may
compute the locations of the eyes, nose, and mouth, in addition to the bounding box
of the face.
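As an illustration only (not code from the report), a single detection in its simplest
form can be represented by a class label, a confidence score, and a bounding box:

from dataclasses import dataclass

@dataclass
class Detection:
    label: str    # e.g. "person", "car", "face"
    score: float  # confidence in [0, 1]
    box: tuple    # (x_min, y_min, x_max, y_max) in pixels

d = Detection("face", 0.92, (120, 60, 220, 180))
print(d)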
4.2 ARCHITECTURE
Depending on the development culture, some of these artifacts are treated more or
less formally than others. Such artifacts are not only the deliverables of a project;
they are also critical in controlling, measuring, and communicating about a system
during its development and after its deployment.
The UML addresses the documentation of a system's architecture and all of its details.
The UML also provides a language for expressing requirements and for tests. Finally,
the UML provides a language for modeling the activities of project planning and
release management.
A use case diagram is a graph of actors, a set of use cases enclosed by a system
boundary, communication associations between the actors and the use cases, and
generalizations among the use cases.
In our project, the actors are the user and the machine (server). The user starts the
application and selects the desired mode of video input for the network; the server
classifies the objects based on the label map and then returns the labelled frame or
object as the output.
4.3.2 Sequence Diagram:
Python
Python's high-level built-in data structures, combined with dynamic typing and
dynamic binding, make it very attractive for Rapid Application Development, as
well as for use as a scripting or glue language to connect existing components
together.
Python's simple, easy-to-learn syntax emphasizes readability and therefore reduces
the cost of program maintenance.
OpenCV
OpenCV-Python is a library of Python bindings designed to solve computer
vision problems.
All the OpenCV array structures are converted to and from Numpy arrays. This
also makes it easier to integrate with other libraries that use Numpy such as SciPy and
Matplotlib.
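The sketch below (assuming only that a local image file such as frame.jpg exists) shows
that an OpenCV image is an ordinary Numpy array, so Numpy, SciPy, and Matplotlib can
operate on it directly:

import cv2
import numpy as np

img = cv2.imread("frame.jpg")            # returns a numpy.ndarray (H x W x 3, BGR)
print(type(img), img.shape, img.dtype)   # <class 'numpy.ndarray'> (h, w, 3) uint8

gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
print(np.mean(gray))                     # Numpy functions work on OpenCV images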
Numpy
Numpy is a general-purpose array-processing package. It provides a high-performance
multidimensional array object and tools for working with these arrays. Besides its
obvious scientific uses, Numpy can also be used as an efficient multi-dimensional
container of generic data.
5.2 MODULES DESCRIPTION
Modules:
1) GUI
2) Object-Detection
Module I:
It mainly concentrates on the user interface. A library called Tkinter is used to
develop the application UI. The UI shows text describing the project and buttons that
let the user choose between options such as Live Object Detection and Object Detection
Using a Video. It also lets the user upload an MP4 file in order to detect objects in a
video file; a minimal sketch of such a UI follows.
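The following is a minimal sketch of that UI, assuming placeholder callback names and
window text rather than the report's actual code:

import tkinter as tk
from tkinter import filedialog

def start_live_detection():
    print("Live object detection selected")     # would open the webcam feed here

def detect_from_video():
    path = filedialog.askopenfilename(filetypes=[("MP4 files", "*.mp4")])
    print("Selected video:", path)              # would run detection on this file

root = tk.Tk()
root.title("Object Detection")
tk.Label(root, text="Object detection using SSD Mobile Net").pack(pady=10)
tk.Button(root, text="Live Object Detection", command=start_live_detection).pack(pady=5)
tk.Button(root, text="Object Detection Using a Video", command=detect_from_video).pack(pady=5)
root.mainloop()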
Module II:
This module lets the user choose detection from either a live feed or an uploaded MP4
file. The project uses a pre-trained Single Shot MultiBox Detector (SSD) Mobile Net to
detect various objects. A frozen inference graph, which holds the pre-trained weights of
the objects, is extracted and used; this helps in faster execution and efficient
detection of objects in the frame.
A session is used in the project, which allows the graphs, i.e. the frozen inference
graphs, to be executed.
Each frame from the live feed or from the video file is used to detect objects; after
successful detection, a rectangular box is drawn around each detected object, which is
also labelled according to the label map.
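A minimal sketch of this flow is given below. It assumes a TensorFlow 1.x-style frozen
graph; the file name frozen_inference_graph.pb, the small label dictionary, and the
tensor names (image_tensor, detection_boxes, detection_scores, detection_classes) follow
the TensorFlow Object Detection API conventions and are assumptions here, not the
report's exact code:

import cv2
import numpy as np
import tensorflow.compat.v1 as tf

tf.disable_v2_behavior()

# Load the frozen inference graph (pre-trained weights baked into one file).
graph = tf.Graph()
with graph.as_default():
    graph_def = tf.GraphDef()
    with tf.gfile.GFile("frozen_inference_graph.pb", "rb") as f:
        graph_def.ParseFromString(f.read())
    tf.import_graph_def(graph_def, name="")

LABELS = {1: "person", 3: "car"}          # placeholder label map

cap = cv2.VideoCapture(0)                 # 0 = live webcam; pass a path for an mp4 file
with tf.Session(graph=graph) as sess:
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        # The exported graph expects a batch of uint8 RGB images.
        boxes, scores, classes = sess.run(
            ["detection_boxes:0", "detection_scores:0", "detection_classes:0"],
            feed_dict={"image_tensor:0": np.expand_dims(frame[..., ::-1], 0)})
        h, w = frame.shape[:2]
        for box, score, cls in zip(boxes[0], scores[0], classes[0]):
            if score < 0.5:
                continue
            y1, x1, y2, x2 = (box * [h, w, h, w]).astype(int)
            label = LABELS.get(int(cls), str(int(cls)))
            cv2.rectangle(frame, (x1, y1), (x2, y2), (0, 255, 0), 2)
            cv2.putText(frame, label, (x1, y1 - 5),
                        cv2.FONT_HERSHEY_SIMPLEX, 0.6, (0, 255, 0), 2)
        cv2.imshow("Object Detection", frame)
        if cv2.waitKey(1) & 0xFF == ord("q"):
            break
cap.release()
cv2.destroyAllWindows()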
5.3 PROCESS/ALGORITHM
Mobile Net:
❖ Mobile Net is an efficient CNN architecture designed for mobile and embedded
vision applications. The architecture uses depth-wise separable convolutions to build
lightweight deep neural networks.
❖ The core layers of Mobile Net are built on depth-wise separable filters. The first
layer, which is a full convolution, is an exception.
❖ Basic operations such as reshaping and resizing of images are performed while
feeding the data to the model.
❖ Data preprocessing involves converting data from its given format into a more
user-friendly, desired, and meaningful format.
❖ The proposed method handles image and video data using Numpy and OpenCV.
❖ A frozen inference graph from the pre-trained model, which holds the weights of the
pre-trained objects, is used in detecting objects.
❖ Freezing is the process of identifying and saving all the required components (graph,
weights, etc.) in a single file that can be used directly.
The figure above shows a pictorial representation of depth-wise separable convolutions;
a short code sketch of the idea follows.
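As a concrete illustration (assuming TensorFlow/Keras, which the report does not name
for this purpose), the sketch below contrasts a standard 3x3 convolution with the
depth-wise separable version: a per-channel 3x3 depth-wise convolution followed by a
1x1 point-wise convolution, which needs far fewer parameters and multiply-adds:

import tensorflow as tf

inputs = tf.keras.Input(shape=(224, 224, 32))

# Standard convolution: each filter mixes spatial and channel information at once.
standard = tf.keras.layers.Conv2D(64, 3, padding="same")(inputs)

# Depth-wise separable convolution: per-channel 3x3 filter, then 1x1 channel mixing.
depthwise = tf.keras.layers.DepthwiseConv2D(3, padding="same")(inputs)
pointwise = tf.keras.layers.Conv2D(64, 1, padding="same")(depthwise)

print(tf.keras.Model(inputs, standard).count_params())   # 18,496 parameters
print(tf.keras.Model(inputs, pointwise).count_params())  # 2,432 parameters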
R-CNN
Fast R-CNN
The same author of the previous paper (R-CNN) solved some of the drawbacks of
R-CNN to build a faster object detection algorithm, called Fast R-CNN. The approach
is similar to the R-CNN algorithm, but instead of feeding the region proposals to the
CNN, we feed the input image to the CNN to generate a convolutional feature map.
From the convolutional feature map, we identify the region proposals and warp them
into squares, and by using an RoI pooling layer we reshape them into a fixed size so
that they can be fed into a fully connected layer. From the RoI feature vector, a
softmax layer predicts the class of the proposed region as well as the offset values
for the bounding box.
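A minimal sketch of the RoI-pooling idea is shown below, using tf.image.crop_and_resize,
a common way in TensorFlow to cut each region proposal out of the shared feature map and
resize it to a fixed size; the feature-map shape and the proposals are made-up values:

import tensorflow as tf

feature_map = tf.random.normal([1, 38, 50, 256])        # one image's feature map

# Two region proposals in normalized [y1, x1, y2, x2] coordinates.
proposals = tf.constant([[0.10, 0.20, 0.55, 0.60],
                         [0.30, 0.40, 0.90, 0.95]])
box_indices = tf.zeros([2], dtype=tf.int32)             # both boxes come from image 0

# Every proposal becomes a fixed 7x7x256 tensor, whatever its original size.
rois = tf.image.crop_and_resize(feature_map, proposals, box_indices, [7, 7])
print(rois.shape)                                       # (2, 7, 7, 256)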
The reason “Fast R-CNN” is faster than R-CNN is that you don't have to feed 2,000
region proposals to the convolutional neural network every time. Instead, the
convolution operation is done only once per image and a feature map is generated
from it.
Faster R-CNN
Both of the above algorithms (R-CNN and Fast R-CNN) use selective search to find
the region proposals. Selective search is a slow and time-consuming process that
affects the performance of the network.
Faster R-CNN therefore does away with selective search and instead uses a Region
Proposal Network (RPN) that learns the region proposals directly from the convolutional
feature map. From the above graph, you can see that Faster R-CNN is much faster than
its predecessors; therefore, it can even be used for real-time object detection.
YOLO
All the previous object detection algorithms use regions to localize the object within
the image: the network does not look at the complete image, only at the parts of the
image that have high probabilities of containing the object. YOLO, or You Only Look
Once, is an object detection algorithm that differs from the region-based algorithms
seen above. In YOLO, a single convolutional network predicts both the bounding boxes
and the class probabilities for these boxes.
YOLO works by taking an image and splitting it into an S x S grid; within each grid
cell we take bounding boxes. For each of the bounding boxes, the network outputs a
class probability and offset values for the bounding box. The bounding boxes whose
class probability is above a threshold value are selected and used to locate the object
within the image (a small code sketch of this selection step is given after the next
paragraph).
YOLO is orders of magnitude faster (45 frames per second) than other object detection
algorithms. The limitation of the YOLO algorithm is that it struggles with small
objects within the image; for example, it might have difficulty identifying a flock of
birds. This is due to the spatial constraints of the algorithm.
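The sketch below (assuming only Numpy and a made-up YOLO-style output tensor) shows how
the grid predictions are thresholded; each of the S x S cells predicts B boxes, each with
(x, y, w, h, confidence), plus C class probabilities per cell:

import numpy as np

S, B, C = 7, 2, 20
preds = np.random.rand(S, S, B * 5 + C)      # stand-in for the network output
threshold = 0.5

detections = []
for row in range(S):
    for col in range(S):
        cell = preds[row, col]
        class_probs = cell[B * 5:]
        for b in range(B):
            x, y, w, h, conf = cell[b * 5:(b + 1) * 5]
            score = conf * class_probs.max()          # class-specific confidence
            if score >= threshold:
                detections.append((row, col, (x, y, w, h),
                                   int(class_probs.argmax()), score))

print(len(detections), "boxes kept above the threshold")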
SSD:
The SSD object detector is composed of two parts: extracting feature maps, and applying
convolution filters on those maps to detect objects.
Each prediction is composed of a boundary box and 21 scores, one per class (with one
extra class for “no object”); we pick the highest score as the class for the bounded
object. Conv4_3 makes a total of 38 × 38 × 4 predictions: four predictions per cell,
regardless of the depth of the feature maps. As expected, many predictions contain no
object, and SSD reserves the class “0” to indicate such a prediction.
SSD does not use a delegated region proposal network. Instead, it resolves to a very
simple method: it computes both the location and the class scores using small
convolution filters. After extracting the feature maps, SSD applies 3 × 3 convolution
filters for each cell to make predictions. (These filters compute the results just like
regular CNN filters.) Each filter outputs 25 channels: 21 scores for each class plus
one boundary box (four offset values).
In the beginning, we described an SSD that detects objects from a single layer. In
fact, it uses multiple layers (multi-scale feature maps) to detect objects
independently. As the CNN reduces the spatial dimension gradually, the resolution of
the feature maps also decreases. SSD uses the lower-resolution layers to detect
larger-scale objects; for example, the 4 × 4 feature maps are used for larger-scale
objects. A sketch of such a prediction head is given below.
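A minimal sketch of an SSD prediction head is shown below (assuming TensorFlow/Keras and
a conv4_3-like 38 × 38 × 512 feature map); one 3 × 3 convolution outputs, for each cell
and each of the four default boxes, 21 class scores plus 4 box offsets, i.e. 25 channels
per box:

import tensorflow as tf

num_classes = 21       # 20 object classes + class "0" for "no object"
boxes_per_cell = 4     # default boxes predicted per feature-map cell

feature_map = tf.keras.Input(shape=(38, 38, 512))    # conv4_3-like feature map

# One 3x3 filter bank produces all predictions for every cell at once.
pred = tf.keras.layers.Conv2D(
    boxes_per_cell * (num_classes + 4), 3, padding="same")(feature_map)

# Reshape to (cells * boxes, 25): 21 class scores + 4 box-offset values each.
pred = tf.keras.layers.Reshape((38 * 38 * boxes_per_cell, num_classes + 4))(pred)
print(pred.shape)      # (None, 5776, 25); 38 * 38 * 4 = 5776 predictions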
BIBLIOGRAPHY