
OBJECT DETECTION

A
PROJECT REPORT
Submitted in partial fulfillment of the requirements
for the award of the Bachelor of Computer Application degree

LNCT UNIVERSITY

MAJOR PROJECT REPORT

Submitted by
Name of Student – Pratham Chourasiya
Enrollment No. LNCBBCAAI130

Under the Guidance of


Prof. Dev Kumar Chouhan

BACHELOR OF COMPUTER APPLICATION


LNCT UNIVERSITY

BHOPAL
Submission Date: 18 May 2024
BACHELOR OF COMPUTER APPLICATION

CERTIFICATE

This is to certify that the work embodied in this Major Project, “OBJECT
DETECTION”, has been satisfactorily completed by Pratham Chourasiya
(LNCBBCAAI130). It is a bonafide piece of work, carried out under my guidance
in the Bachelor of Computer Application, LNCT University, Bhopal, in partial
fulfillment of the Bachelor of Computer Application degree during the
academic session January–June 2024.

Guided By
Prof. Dev Kumar Chouhan

Approved By

Head of Department

Forwarded by

Director
LNCT University, Bhopal
LNCT UNIVERSITY, BHOPAL
BACHELOR OF COMPUTER APPLICATION

CERTIFICATE OF APPROVAL

The foregoing major project is hereby approved as a creditable study of a
Computer Application subject, carried out and presented in a manner
satisfactory enough to warrant its acceptance as a prerequisite to the degree
for which it has been submitted. It is understood that by this approval the
undersigned do not necessarily endorse or approve any statement made, opinion
expressed or conclusion drawn therein, but approve the project only for the
purpose for which it has been submitted.

Internal Examiner                                        External Examiner
LNCT UNIVERSITY, BHOPAL

BACHELOR OF COMPUTER APPLICATION

DECLARATION

I, Pratham Chourasiya (LNCBBCAAI130), a student of Bachelor of
Computer Application, LNCT University, Bhopal, hereby declare that the
work presented in this Major Project is the outcome of my own work, is
bonafide and correct to the best of my knowledge, and has been carried out
observing engineering ethics. The work presented does not infringe any
patented work and has not been submitted to any university for the award of
any degree or any professional diploma.

Pratham Chourasiya
Enrollment No.: LNCBBCAAI130
Date: 18 May 2024
LNCT UNIVERSITY, BHOPAL
BACHELOR OF COMPUTER APPLICATION

ACKNOWLEDGEMENT

We express our sincere indebtedness to our guide Prof. Dev
Kumar Chouhan, Bachelor of Computer Application, LNCT
UNIVERSITY, Bhopal, for his invaluable guidance, suggestions and
supervision throughout the work. Without his kind patronage and
guidance the project would not have taken shape. We would also like
to express our gratitude and sincere regards for his kind approval of
the project and for his counseling and advice from time to time. We
would also like to thank our Director, Dr. Sanjay Bajpai, Bachelor of
Computer Application, LNCT UNIVERSITY, BHOPAL, for his expert
advice and counseling from time to time. We owe sincere thanks to all
the faculty members in the Department of Bachelor of Computer
Application, LNCT UNIVERSITY, Bhopal, for their kind guidance
and encouragement from time to time.
TABLE OF CONTENTS

S.NO   TITLE

1      INTRODUCTION
       1.1  PURPOSE AND OBJECTIVES
       1.2  EXISTING AND PROPOSED SYSTEM
       1.3  SCOPE OF PROJECT

2      LITERATURE SURVEY

3      SYSTEM ANALYSIS
       3.1  HARDWARE AND SOFTWARE REQUIREMENTS
       3.2  SOFTWARE REQUIREMENTS SPECIFICATION

4      SYSTEM DESIGN
       4.1  DESCRIPTION
       4.2  ARCHITECTURE
       4.3  UML DIAGRAMS

5      METHODOLOGY
       5.1  TECHNOLOGIES USED
       5.2  MODULES DESCRIPTION
       5.3  PROCESS/ALGORITHM

6      IMPLEMENTATION
       6.1  SAMPLE CODE
       6.2  OUTPUT SCREENS
       6.3  TEST CASES

7      CONCLUSION

       BIBLIOGRAPHY
ABSTRACT

Object detection is a key problem in computer vision. Detection can
be difficult since variations in orientation, lighting, background and
occlusion can produce completely different images of the very same
object.

With the advances in deep learning and neural networks, we can
finally tackle such problems in real time without devising various
hand-crafted heuristics.

The project “Object Detection” detects objects efficiently using a
CNN-based algorithm applied to image data. Algorithms such as
‘You Only Look Once’ (YOLO) and other convolutional neural
networks help detect objects in different positions and orientations.
We start from a pretrained convolutional neural network.
INTRODUCTION

1.1 PURPOSE AND OBJECTIVES

A few years ago, the creation of software and hardware
image processing systems was mainly limited to the
development of the user interface, which most of the
programmers of each firm were engaged in.

However, this did not lead to cardinal progress in
solving typical tasks such as recognizing faces, car
number plates and road signs, or analyzing remote-sensing
and medical images.

In these scenarios, deep learning is typically used, and
among the various network architectures used in deep
learning, Convolutional Neural Networks (CNNs) are widely
used for image recognition.

We train the last layer of the network based on the
number of classes that need to be detected, and the
model is fed with different types of objects in different
positions, lighting and orientations, which is essential
for better prediction. We then obtain the Region of
Interest for each image. Each detected object is labelled,
along with the accuracy of the detection, over the
bounding boxes.
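The retraining of the final layer described above can be sketched as follows. This is a minimal illustration assuming a Keras MobileNetV2 base; the input size and num_classes value are illustrative assumptions, not the project's exact training script.

# Minimal transfer-learning sketch: freeze a pretrained CNN and retrain
# only a new classification head for our own classes. The base model,
# input size and num_classes are illustrative assumptions.
import tensorflow as tf

num_classes = 90  # e.g. the COCO label set used later in this report

base = tf.keras.applications.MobileNetV2(
    input_shape=(224, 224, 3), include_top=False, weights="imagenet")
base.trainable = False  # keep the pretrained feature extractor fixed

model = tf.keras.Sequential([
    base,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(num_classes, activation="softmax"),  # new last layer
])
model.compile(optimizer="adam", loss="categorical_crossentropy",
              metrics=["accuracy"])
model.summary()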
1.2 EXISTING SYSTEM

➢ The improved versions of R-CNN, such as Fast R-CNN and Faster R-CNN, used
more strategies to reduce the computation of regions, but they did not reach
real-time inference speed.

➢ YOLO, however, broke this bottleneck by integrating region proposal and
classification into a single regression problem, straight from image pixels to
bounding box coordinates and class probabilities, evaluating each full image in a
single run.

➢ Since the whole detection pipeline is a single network, it can be optimized
end-to-end directly on detection performance. YOLO was the first framework to
reach the real-time detection standard, with 45 FPS (on GPU) and an mAP (mean
Average Precision) of 63.4% on VOC2007, but it still has drawbacks in detecting
smaller objects.

1.2 PROPOSED SYSTEM

➢ To address the above limitations, the proposed model adopts the Single Shot
MultiBox Detector (SSD) architecture.

➢ We build a MobileNet-based model, using TensorFlow and OpenCV, that detects
objects with good accuracy and is robust.

➢ By combining an anchor-box proposal scheme similar to Faster R-CNN's with
multi-scale features in the detection layers, detecting smaller objects becomes
easier.

1.3 SCOPE OF PROJECT

➢ This application is intended to be used in any working environment where
accuracy and precision are highly desired to serve the purpose.

➢ As mentioned, the proposed model is able to detect around 90 classes of
objects. As part of future enhancements, the model will be custom-trained with
other objects to increase its detection capability.

➢ With the help of transfer learning, the network will be trained with
other objects to increase the range of objects MobileNet can detect.

LITERATURE SURVEY
➢ R-CNN (Girshick et al.) introduced region-proposal-based detection: selective
search extracts about 2000 candidate regions from an image, and a convolutional
neural network classifies each of them.

➢ Fast R-CNN and Faster R-CNN reduced the cost of region computation: Fast
R-CNN runs the convolution only once per image and pools region features, and
Faster R-CNN replaces selective search with a separate region proposal network.
They still did not reach real-time inference speed.

➢ YOLO recast detection as a single regression problem, straight from image
pixels to bounding box coordinates and class probabilities, and was the first
framework to reach real-time detection (45 FPS on GPU, 63.4% mAP on VOC2007),
but it struggles with smaller objects.

➢ SSD (Single Shot MultiBox Detector) combines an anchor-box proposal scheme
with multi-scale feature maps, which eases the detection of smaller objects.

SYSTEM ANALYSIS
3.1 HARDWARE AND SOFTWARE REQUIREMENTS

The development and deployment of the application require the following
general and specific minimum hardware requirements:

Component            Minimum requirement
RAM capacity         Minimum of 4 GB
Camera               Any desktop/laptop-supported camera
Hard disk            Minimum of 2 GB
Processor            Intel Pentium or higher

The development and deployment of the application require the following
general and specific minimum software requirements:

Component            Minimum requirement
Operating system     Windows 7/8/10 or Linux
Coding language      Python
Libraries            OpenCV (version 4.1.0 and above) – pip install opencv-python
                     TensorFlow (as backend) – pip install tensorflow
                     Keras – pip install keras
                     Numpy – pip install numpy
Software environment Visual Studio

3.2 SOFTWARE REQUIREMENTS SPECIFICATION

Functional Requirements

➢ A real-time image or a video can be fed to the ML model.

➢ The user opens the application.

➢ The user chooses either live detection or video detection.

➢ The user gets the detected objects, labelled.

➢ All detected objects, along with their labels, are displayed within the video
or the image.

Non-Functional Requirements

➢ Performance: the user gets the desired output without interruption.

SOURCE CODE
clc
%% Test Two
%% Features used: Histogram of Oriented Gradients, histogram of pixel
%% orientation, histogram of curvatures, eccentricity, area-ratio weights
close all

tic
load('newData.mat')   % assumed to provide the background frame, fi
depth = 6;
Params = [9 3 2 1 0.2];

video = VideoReader('F:\Thesis\Testing Videos\T4.h64');   % read video

for k = 3501:5:4000
    % Read one frame and crop the region of interest
    figure(1);
    image = imcrop(read(video, k), [7.5 18.5 345 224]);

    % Convert to grayscale and extract Canny edges
    img = rgb2gray(image);
    BW = edge(sqrt(double(img)), 'canny', 0.29);

    % Background subtraction: keep pixels well above the maximum difference
    img1 = sqrt(double(img)) - sqrt(double(fi));
    fore = zeros(size(img1));
    ind = find(img1 > max(max(img1)) * 0.6);
    fore(ind) = 255;

    % Link broken edges (edgelinking2_C is a user-supplied function),
    % then clean the masks morphologically
    [BW, AngleLeft, AngleRight] = edgelinking2_C(BW, 3, 3);
    BW = abs(BW);
    st = strel('disk', 3);
    BW = imopen(BW, st);
    fore = imdilate(fore, st);

    % Labels of edge components overlapping the foreground mask
    LabelsList = unique(BW(ind));
end

SYSTEM DESIGN

4.1 DESCRIPTION
Object detection aims to detect all instances of objects from a known class, such
as people, cars, or faces, in an image. Generally, only a small number of instances
of the object are present in the image, but there is a very large number of possible
locations and scales at which they can occur, and these need to be explored somehow.

Each detection in the image is reported with some form of pose information. This may
be as simple as the location of the object, a location and scale, or the extent of
the object defined in terms of a bounding box.

In other situations, the pose information is more detailed and contains the
parameters of a linear or non-linear transformation. For example, a face detector
may compute the locations of the eyes, nose, and mouth, in addition to the bounding
box of the face.

4.2 ARCHITECTURE

R-CNN, or Region-based Convolutional Neural Network, determines the location of
multiple objects in an image. An image is split into various regions of interest to
scan for objects, focusing on the specific regions that are likely to contain an
object of value. The input image is processed using the selective search method,
generating about 2000 region proposals. The proposal regions are run through a CNN,
then fed into a classification subnetwork to determine the object class. The figure
below illustrates the R-CNN architecture.
4.3 UML DIAGRAMS

The Unified Modeling Language (UML) is a standard language for specifying,
visualizing, constructing, and documenting a system and its components. It is a
graphical language that provides a vocabulary and a set of semantics and rules.
The UML focuses on the conceptual and physical representation of the system.

It captures the decisions and understandings about systems that must
be constructed. It is used to understand, design, configure and control
information about the systems.

Depending on the development culture, some of these artifacts are treated more or
less formally than others.

Such artifacts are not only the deliverables of a project; they are also
critical in controlling, measuring, and communicating about a system during
its development and after its deployment.

The UML addresses the documentation of a system's architecture and all of its
details. The UML also provides a language for expressing requirements and for tests.

4.3.1 Use Case Diagram:

A use case diagram is a graph of actors, a set of use cases enclosed by a system
boundary, communication associations between the actors and the use cases, and
generalizations among use cases.

In our project, the actors are the user and the machine (server). The user
starts the application and selects a desired mode of video input for the
network; the server classifies the objects based on the label map, then
returns the labelled frame or object as the output.
4.3.2 Sequence Diagram:

Sequence diagrams are interaction diagrams that detail how operations are
carried out. They capture the interaction between objects in the context of a
collaboration. There are three lifelines in our project: user, application,
and server. The operations between these lifelines proceed stepwise, in order.
4.3.3 Activity Diagram:

A UML activity diagram is basically used to document the logic of a single
operation or a single use case, and to follow the communication process.
An activity can be described as an operation of the system. The control flow
is drawn from one operation to another.
4.3.4 Class Diagram:

The objects in this class diagram are the user, the application and the server.
In this class diagram, the server and the application have an association
connection, and the user and the application have a direct association connection.
5. METHODOLOGY

5.1 TECHNOLOGIES USED:


Python:

Python is an interpreted, object-oriented, high-level programming language with


dynamic semantics.

Its high-level built-in data structures, combined with dynamic typing and
dynamic binding, make it very attractive for Rapid Application Development, as
well as for use as a scripting or glue language to connect existing components
together.

Python's simple, easy to learn syntax emphasizes readability and therefore reduces

the cost of program maintenance.

OpenCV
OpenCV-Python is a library of Python bindings designed to solve computer
vision problems.

OpenCV-Python makes use of Numpy, which is a highly optimized library for


numerical operations with a MATLAB-style syntax.

All the OpenCV array structures are converted to and from Numpy arrays. This
also makes it easier to integrate with other libraries that use Numpy such as SciPy and
Matplotlib.
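
As a small illustration of this interoperability, the following sketch reads an image with OpenCV and manipulates it directly as a Numpy array; the image path 'sample.jpg' is a hypothetical input.

# Minimal sketch of OpenCV/Numpy interoperability; 'sample.jpg' is a
# hypothetical input path.
import cv2
import numpy as np

img = cv2.imread("sample.jpg")          # loaded directly as a numpy.ndarray
print(type(img), img.shape, img.dtype)  # e.g. <class 'numpy.ndarray'> (H, W, 3) uint8

gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)  # OpenCV operation
bright = np.clip(gray.astype(np.int16) + 40, 0, 255).astype(np.uint8)  # Numpy operation

cv2.imwrite("sample_bright.jpg", bright)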

TensorFlow – pip install tensorflow-gpu

It is an open-source artificial intelligence library that uses data flow graphs
to build models. It allows developers to create large-scale neural networks with
many layers. TensorFlow is mainly used for classification, perception,
understanding, discovery, prediction and creation.
Numpy – pip install numpy

Numpy is a general-purpose array-processing package. It provides a
high-performance multidimensional array object and tools for working with these
arrays.

It is the fundamental package for scientific computing with Python.

Besides its obvious scientific uses, Numpy can also be used as an efficient multi-
dimensional container of generic data.

5.2 MODULES DESCRIPTION

Modules:

Various modules used in the project are:

1) GUI

2) Object-Detection

Module I:
This module concentrates on the user interface. A library called Tkinter is used
to develop the application UI. It shows text describing the project and has
buttons that let the user choose between options such as Live Object Detection
and Object Detection Using a Video. A sketch of such a UI is given below.

It also lets the user upload an mp4 file in order to detect objects in a video file.
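
A minimal Tkinter sketch of such a window follows; the callbacks live_detection and video_detection are hypothetical placeholders for the module's actual handlers.

# Minimal Tkinter UI sketch: two modes plus an mp4 picker. The callback
# bodies are placeholders, not the project's actual detection code.
import tkinter as tk
from tkinter import filedialog

def live_detection():
    print("Starting live object detection...")   # placeholder

def video_detection():
    path = filedialog.askopenfilename(filetypes=[("MP4 files", "*.mp4")])
    if path:
        print(f"Running object detection on {path}")  # placeholder

root = tk.Tk()
root.title("Object Detection")
tk.Label(root, text="Object Detection using SSD MobileNet").pack(pady=10)
tk.Button(root, text="Live Object Detection", command=live_detection).pack(pady=5)
tk.Button(root, text="Object Detection Using a Video", command=video_detection).pack(pady=5)
root.mainloop()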

Module II:
This module lets the user run detection either live or on an uploaded mp4
file. The project uses a pre-trained Single Shot MultiBox Detector (SSD)
MobileNet to detect various objects. A frozen inference graph, which holds the
pre-trained weights, is extracted and used; this helps in faster execution and
efficient detection of objects in the frame.

A session is used in the project, which allows the user to execute graphs,
i.e., the frozen inference graph.

Each frame from the live feed or from the video file is used to detect objects;
after successful detection, a rectangular box is drawn around each detected
object, labelled according to the label map. A sketch of this loop is given below.
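
The following sketch shows how such a frozen graph is typically loaded and run per frame. The file name and the tensor names ('image_tensor:0', 'detection_boxes:0', etc.) follow the TensorFlow Object Detection API convention and are assumptions about this project's exact export.

# Sketch of running a frozen SSD MobileNet inference graph per frame.
# Paths and tensor names are assumptions based on the TF Object Detection
# API's standard exports; TF1-style graph execution via tf.compat.v1.
import cv2
import numpy as np
import tensorflow.compat.v1 as tf

tf.disable_eager_execution()

# Load the frozen inference graph (pre-trained weights baked in)
graph = tf.Graph()
with graph.as_default():
    graph_def = tf.GraphDef()
    with tf.gfile.GFile("frozen_inference_graph.pb", "rb") as f:
        graph_def.ParseFromString(f.read())
    tf.import_graph_def(graph_def, name="")

cap = cv2.VideoCapture(0)  # 0 = live camera; pass a file path for mp4 input
with tf.Session(graph=graph) as sess:
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        # Note: OpenCV frames are BGR; the model typically expects RGB
        boxes, scores, classes = sess.run(
            ["detection_boxes:0", "detection_scores:0", "detection_classes:0"],
            feed_dict={"image_tensor:0": np.expand_dims(frame, axis=0)})
        h, w = frame.shape[:2]
        for box, score in zip(boxes[0], scores[0]):
            if score < 0.5:        # confidence threshold
                continue
            y1, x1, y2, x2 = box   # normalized [ymin, xmin, ymax, xmax]
            cv2.rectangle(frame, (int(x1 * w), int(y1 * h)),
                          (int(x2 * w), int(y2 * h)), (0, 255, 0), 2)
        # classes[0] holds class ids, to be mapped through the label map
        cv2.imshow("Object Detection", frame)
        if cv2.waitKey(1) & 0xFF == ord("q"):
            break
cap.release()
cv2.destroyAllWindows()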

5.3 PROCESS/ALGORITHM

MobileNet:

❖ MobileNet is an efficient CNN architecture designed for mobile and embedded
vision applications, used here as the backbone of the object detector. The
architecture uses depth-wise separable convolutions to build lightweight deep
neural networks.
❖ The core layers of MobileNet are built on depth-wise separable filters. The
first layer, which is a full convolution, is an exception.
❖ Basic operations like reshaping and resizing of images are performed while
feeding the data to the model.
❖ Data preprocessing involves converting the data from a given format to a much
more user-friendly, desired and meaningful format.
❖ The proposed method handles image and video data using Numpy and OpenCV.

❖ Data visualization is the process of transforming abstract data into meaningful
representations, enabling knowledge communication and insight discovery through
encodings.

❖ The SSD MobileNet model is pretrained on the COCO (Common Objects in Context)
dataset.

❖ This model covers 90 different labelled classes.

❖ These networks use depth-wise separable convolutions in place of the standard
convolutions used in earlier architectures to build lighter models.

❖ Each depth-wise separable convolution layer consists of a depthwise convolution
and a pointwise convolution. Counting depthwise and pointwise convolutions as
separate layers, a MobileNet has 28 layers.

❖ A frozen inference graph, which holds the weights of the pretrained objects, is
used in detecting objects.

❖ Freezing is the process of identifying and saving all the required artifacts
(graph, weights, etc.) in a single file that can easily be used.
Depth-Wise Convolution:
A depth-wise convolution is a type of convolution in which a single
convolutional filter is applied to each input channel. In a regular 2D
convolution performed over multiple input channels, the filter is as deep as the
input and lets us freely mix channels to generate each element in the output. In
contrast, depth-wise convolutions keep each channel separate. The procedure is:

1. Split the input and filter into channels.
2. Convolve each input channel with its respective filter.
3. Stack the convolved outputs together.

[Figure: pictorial representation of depth-wise separable convolutions]
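
A short Keras sketch comparing the parameter counts of a standard convolution and a depth-wise separable convolution makes the saving concrete; the layer sizes are illustrative assumptions, not taken from the project's model.

# Compare parameter counts: standard conv vs depth-wise separable conv.
# Input and filter sizes are illustrative assumptions.
import tensorflow as tf

inp = tf.keras.Input(shape=(32, 32, 64))

# Standard 3x3 convolution mixing all 64 input channels into 128 outputs
standard = tf.keras.layers.Conv2D(128, 3, padding="same")(inp)

# Depth-wise separable: per-channel 3x3 depthwise conv + 1x1 pointwise conv
dw = tf.keras.layers.DepthwiseConv2D(3, padding="same")(inp)
pw = tf.keras.layers.Conv2D(128, 1, padding="same")(dw)

m_std = tf.keras.Model(inp, standard)
m_sep = tf.keras.Model(inp, pw)
print("standard conv params: ", m_std.count_params())  # 64*3*3*128 + 128 = 73,856
print("separable conv params:", m_sep.count_params())  # (3*3*64 + 64) + (64*128 + 128) = 8,960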
R-CNN

To circumvent the problem of selecting a huge number of regions, Ross Girshick
et al. proposed a method that uses selective search to extract just 2000 regions
from the image, which he called region proposals. Therefore, instead of trying to
classify a huge number of regions, you can work with just 2000. These 2000 region
proposals are generated using the selective search algorithm, outlined below.

Selective Search:

1. Generate the initial sub-segmentation, producing many candidate regions.
2. Use a greedy algorithm to recursively combine similar regions into larger ones.
3. Use the generated regions to produce the final candidate region proposals.

These 2000 candidate region proposals are warped into squares and fed into a
convolutional neural network that produces a 4096-dimensional feature vector as
output. The CNN acts as a feature extractor, and the output dense layer consists
of the features extracted from the image; these features are fed into an SVM to
classify the presence of an object within the candidate region proposal. In
addition to predicting the presence of an object within the region proposals, the
algorithm also predicts four offset values to increase the precision of the
bounding box. For example, given a region proposal, the algorithm might have
predicted the presence of a person, but the face of that person within the region
proposal could have been cut in half; the offset values help adjust the bounding
box of the region proposal. A selective-search sketch follows.
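
The selective search step can be tried directly with OpenCV's contrib module; this is a minimal sketch assuming opencv-contrib-python is installed and 'sample.jpg' is a hypothetical input.

# Minimal selective-search sketch using OpenCV's contrib implementation
# (requires opencv-contrib-python). 'sample.jpg' is a hypothetical input.
import cv2

img = cv2.imread("sample.jpg")
ss = cv2.ximgproc.segmentation.createSelectiveSearchSegmentation()
ss.setBaseImage(img)
ss.switchToSelectiveSearchFast()   # trade quality for speed

rects = ss.process()               # candidate regions as (x, y, w, h)
print(f"{len(rects)} region proposals")

# Draw the first 100 proposals, analogous to R-CNN's ~2000 candidates
for (x, y, w, h) in rects[:100]:
    cv2.rectangle(img, (x, y), (x + w, y + h), (0, 255, 0), 1)
cv2.imwrite("proposals.jpg", img)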

Fast R-CNN

The author of the previous paper (R-CNN) solved some of the drawbacks of R-CNN
to build a faster object detection algorithm, called Fast R-CNN. The approach is
similar to the R-CNN algorithm, but instead of feeding the region proposals to
the CNN, we feed the input image to the CNN to generate a convolutional feature
map.

From the convolutional feature map, we identify the region proposals and warp
them into squares, and by using an RoI pooling layer we reshape them into a
fixed size so that they can be fed into a fully connected layer. From the RoI
feature vector, we use a softmax layer to predict the class of the proposed
region and also the offset values for the bounding box.

The reason Fast R-CNN is faster than R-CNN is that you don't have to feed 2000
region proposals to the convolutional neural network every time. Instead, the
convolution operation is done only once per image, and a feature map is
generated from it. A small RoI-pooling sketch follows.
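
The fixed-size pooling step can be illustrated with a tiny Numpy sketch; the grid size and region coordinates are illustrative assumptions, not Fast R-CNN's exact implementation.

# Toy RoI max-pooling sketch: reduce an arbitrary-sized region of a feature
# map to a fixed out_size x out_size grid. Sizes here are illustrative.
import numpy as np

def roi_max_pool(feature_map, roi, out_size=2):
    """roi = (x1, y1, x2, y2) in feature-map coordinates."""
    x1, y1, x2, y2 = roi
    region = feature_map[y1:y2, x1:x2]
    h_step = region.shape[0] / out_size
    w_step = region.shape[1] / out_size
    out = np.zeros((out_size, out_size))
    for i in range(out_size):
        for j in range(out_size):
            cell = region[int(i * h_step):int((i + 1) * h_step),
                          int(j * w_step):int((j + 1) * w_step)]
            out[i, j] = cell.max()
    return out

fmap = np.arange(64, dtype=float).reshape(8, 8)   # fake 8x8 feature map
print(roi_max_pool(fmap, (1, 1, 7, 6)))           # always a 2x2 output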

Faster R-CNN

Both of the above algorithms (R-CNN and Fast R-CNN) use selective search to find
the region proposals. Selective search is a slow and time-consuming process that
affects the performance of the network.

Similar to Fast R-CNN, the image is provided as input to a convolutional network,
which produces a convolutional feature map. Instead of using the selective search
algorithm on the feature map to identify the region proposals, a separate network
is used to predict them. The predicted region proposals are then reshaped using
an RoI pooling layer, which is used to classify the image within the proposed
region and predict the offset values for the bounding boxes.

As the speed-comparison graph shows, Faster R-CNN is much faster than its
predecessors and can therefore even be used for real-time object detection.
YOLO — You Only Look Once

All the previous object detection algorithms use regions to localize the object
within the image; the network does not look at the complete image, only at the
parts of the image that have a high probability of containing an object. YOLO,
or You Only Look Once, is an object detection algorithm quite different from the
region-based algorithms above. In YOLO, a single convolutional network predicts
the bounding boxes and the class probabilities for these boxes.

YOLO works by taking an image and splitting it into an S×S grid; within each
grid cell we take bounding boxes. For each bounding box, the network outputs a
class probability and offset values for the box. The bounding boxes whose class
probability is above a threshold value are selected and used to locate the
object within the image, as the sketch below illustrates.

YOLO is orders of magnitude faster (45 frames per second) than other object
detection algorithms. Its limitation is that it struggles with small objects
within the image; for example, it might have difficulty identifying a flock of
birds. This is due to the spatial constraints of the algorithm.
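
A toy Numpy sketch of the grid-and-threshold step follows; the grid size, box count and threshold are illustrative assumptions, not YOLO's actual network.

# Toy sketch of YOLO-style post-processing: keep boxes whose class
# probability exceeds a threshold. Shapes and values are illustrative.
import numpy as np

S, B = 7, 2                      # S x S grid, B boxes per cell
rng = np.random.default_rng(0)
# per box: (x, y, w, h, confidence), plus a class probability per box
boxes = rng.random((S, S, B, 5))
class_prob = rng.random((S, S, B))

threshold = 0.9
score = class_prob * boxes[..., 4]        # class prob weighted by confidence
keep = np.argwhere(score > threshold)     # (row, col, box) indices

for (i, j, b) in keep:
    x, y, w, h, conf = boxes[i, j, b]
    print(f"cell ({i},{j}) box {b}: score={score[i, j, b]:.2f}")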

SSD:
SSD object detection is composed of two parts:

1. Extract feature maps, and
2. Apply convolution filters to detect objects.

SSD uses VGG16 to extract feature maps. It then detects objects using the
Conv4_3 layer. For illustration, we draw Conv4_3 as 8 × 8 spatially (it is
actually 38 × 38). For each cell in the image (also called a location), it makes
4 object predictions.

Each prediction is composed of a boundary box and 21 scores, one per class (one
extra class for no object), and we pick the highest score as the class of the
bounded object. Conv4_3 makes a total of 38 × 38 × 4 predictions: four
predictions per cell, regardless of the depth of the feature maps. As expected,
many predictions contain no object; SSD reserves class “0” to indicate that no
object is present.

SSD does not use a delegated region proposal network. Instead, it resolves to a
very simple method: it computes both the location and the class scores using
small convolution filters. After extracting the feature maps, SSD applies 3 × 3
convolution filters on each cell to make predictions. (These filters compute
their results just like regular CNN filters.) Each filter outputs 25 channels:
21 scores for each class plus one boundary box (four coordinates).

So far we have described an SSD that detects objects from a single layer. In
fact, it uses multiple layers (multi-scale feature maps) to detect objects
independently. As the CNN gradually reduces the spatial dimension, the
resolution of the feature maps also decreases. SSD uses the lower-resolution
layers to detect larger-scale objects; for example, the 4 × 4 feature maps are
used for larger-scale objects. The prediction count is worked out in the sketch
below.
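
The total number of predictions from such multi-scale maps is easy to tabulate; the feature-map sizes and boxes-per-cell below are illustrative SSD300-style values, not an exact specification of this project's model.

# Count SSD-style predictions across multi-scale feature maps. The map
# sizes and boxes-per-cell are illustrative SSD300-like values.
feature_maps = [          # (grid size, default boxes per cell)
    (38, 4),              # Conv4_3: 38 x 38 x 4 = 5776 predictions
    (19, 6),
    (10, 6),
    (5, 6),
    (3, 4),
    (1, 4),
]

total = 0
for size, boxes in feature_maps:
    n = size * size * boxes
    total += n
    print(f"{size:>2} x {size:<2} map, {boxes} boxes/cell -> {n:>5} predictions")
print(f"total: {total} predictions")   # 8732 for these settings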
