L7 Detection

Object detection involves predicting bounding boxes and class labels for objects in images. R-CNN was an early two-stage detector that used selective search to generate region proposals which were then classified using CNN features. Fast R-CNN improved on R-CNN by making the whole system trainable end-to-end using a multi-task loss over classification and bounding box regression. It introduced ROI pooling to extract fixed-length feature vectors from convolutional feature maps for each region proposal. Faster R-CNN built on this by incorporating the region proposal network to generate proposals within the detection network.

Uploaded by

Agha Kazim


Object detection

Outline
• Task definition and evaluation
• Two-stage detectors:
• R-CNN
• Fast R-CNN
• Faster R-CNN
• Single-stage and multi-resolution detectors
• Recent trends
Object detection evaluation
• At test time, predict bounding boxes, class labels, and confidence
scores
• For each detection, determine whether it is a true or false positive
• PASCAL criterion: Area(GT ∩ Det) / Area(GT ∪ Det) > 0.5
• For multiple detections of the same ground truth box, only one is
considered a true positive
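
The PASCAL overlap criterion above can be sketched in a few lines. This is a minimal illustration, assuming boxes are given as (x1, y1, x2, y2) corner coordinates; the function names are hypothetical and not taken from any evaluation toolkit.

```python
def box_area(box):
    """Area of an axis-aligned box (x1, y1, x2, y2); zero if the box is empty."""
    x1, y1, x2, y2 = box
    return max(0.0, x2 - x1) * max(0.0, y2 - y1)


def iou(gt, det):
    """Intersection over union: Area(GT ∩ Det) / Area(GT ∪ Det)."""
    # Corners of the intersection rectangle (may be empty).
    ix1, iy1 = max(gt[0], det[0]), max(gt[1], det[1])
    ix2, iy2 = min(gt[2], det[2]), min(gt[3], det[3])
    inter = box_area((ix1, iy1, ix2, iy2))
    union = box_area(gt) + box_area(det) - inter
    return inter / union if union > 0 else 0.0


def passes_pascal_criterion(gt, det, threshold=0.5):
    """True if the detection overlaps the ground truth box by more than 0.5 IoU."""
    return iou(gt, det) > threshold
```

A detection that passes this test still counts as a false positive if another detection has already been matched to the same ground truth box.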

[Figure: detections "dog: 0.6", "dog: 0.55", and "cat: 0.8" shown against ground truth (GT) boxes labeled "dog" and "cat"]

• For each class, sort detections from highest to lowest confidence,
plot Recall-Precision curve and compute Average Precision
(area under the curve)
• Take mean of AP over classes to get mAP
• Precision: true positive detections / total detections
• Recall: true positive detections / total positive test instances
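
As a rough sketch of the AP and mAP computation described above: the code below sorts one class's detections by confidence and accumulates the area under the recall-precision curve with a plain rectangle rule. That integration rule is an assumption made for brevity (the official PASCAL VOC evaluation uses interpolated precision), and the function names and input layout are hypothetical.

```python
def average_precision(detections, num_gt):
    """AP for one class.

    detections: list of (confidence, is_true_positive) pairs.
    num_gt: total number of positive test instances for this class.
    """
    if num_gt == 0:
        return 0.0
    # Sort detections from highest to lowest confidence.
    ranked = sorted(detections, key=lambda d: d[0], reverse=True)
    true_positives = 0
    ap = 0.0
    prev_recall = 0.0
    for rank, (_, is_tp) in enumerate(ranked, start=1):
        if is_tp:
            true_positives += 1
        precision = true_positives / rank
        recall = true_positives / num_gt
        # Rectangle-rule area under the recall-precision curve (assumption).
        ap += precision * (recall - prev_recall)
        prev_recall = recall
    return ap


def mean_average_precision(per_class):
    """per_class: dict mapping class name -> (detections, num_gt)."""
    aps = [average_precision(dets, n) for dets, n in per_class.values()]
    return sum(aps) / len(aps)
```

For example, a class with two ground truth boxes and ranked detections TP, FP, TP gets AP = 1.0 × 0.5 + (2/3) × 0.5.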
PASCAL VOC Challenge (2005-2012)

• 20 challenge classes:
• Person
• Animals: bird, cat, cow, dog, horse, sheep
• Vehicles: airplane, bicycle, boat, bus, car, motorbike, train
• Indoor: bottle, chair, dining table, potted plant, sofa, tv/monitor

• Dataset size (by 2012): 11.5K training/validation images, 27K bounding boxes, 7K segmentations

http://host.robots.ox.ac.uk/pascal/VOC/
Progress on PASCAL detection
[Figure: progress on PASCAL VOC detection, annotated "Before CNNs" and "After CNNs"]
More recent benchmark: COCO

http://cocodataset.org/#home
COCO dataset: Tasks

• Image classification
• Object detection
• Semantic segmentation
• Instance segmentation

• Also: keypoint prediction, captioning, question answering…

COCO detection metrics

• Leaderboard: http://cocodataset.org/#detection-leaderboard
• Not updated since 2020
Object detection: Outline
• Task definition and evaluation
• Two-stage detectors

[Figure: proposal generation producing region proposals]
R-CNN: Region proposals + CNN features
1. Input image
2. Region proposals
3. Warped image regions
4. Forward each region through ConvNet
5. Classify regions with SVMs
Source: R. Girshick
R. Girshick, J. Donahue, T. Darrell, and J. Malik, Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation, CVPR 2014
R-CNN details

• Regions: ~2000 Selective Search proposals

• Network: AlexNet pre-trained on ImageNet (1000 classes), fine-tuned
on PASCAL (21 classes)
• Final detector: warp proposal regions, extract fc7 network activations
(4096 dimensions), classify with linear SVM
• Bounding box regression to refine box locations
• Performance: mAP of 53.7% on PASCAL 2010
(vs. 35.1% for Selective Search and 33.4% for Deformable Part Models)
R-CNN pros and cons
• Pros
• Much more accurate than previous approaches!
• Any deep architecture can immediately be “plugged in”
• Cons
• Not a single end-to-end system
• Fine-tune network with softmax classifier (log loss)
• Train post-hoc linear SVMs (hinge loss)
• Train post-hoc bounding-box regressions (least squares)
• Training was slow (84h), took up a lot of storage
• 2000 CNN passes per image
• Inference (detection) was slow (47s / image with VGG16)
Fast R-CNN

1. Forward whole image through ConvNet to get the conv5 feature map of the image
2. RoI pooling layer applied to the region proposals
3. Fully-connected layers (FCs)
4. Two outputs: softmax classifier (linear + softmax) and bounding-box regressors (linear)

Source: R. Girshick, Fast R-CNN, ICCV 2015

RoI pooling
• “Crop and resample” a fixed-size feature representing a
region of interest out of the outputs of the last conv layer
• Use nearest-neighbor interpolation of coordinates, max pooling
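
A minimal sketch of the "crop and resample" idea, assuming a single-channel feature map stored as a list of rows and an RoI already expressed in feature-map coordinates that overlaps the map. The bin-boundary rule (floor for starts, ceil for ends) and the function name are illustrative choices, not the reference implementation.

```python
import math


def roi_max_pool(feature, roi, out_h=2, out_w=2):
    """Max-pool the RoI (x1, y1, x2, y2) into a fixed out_h x out_w grid."""
    height, width = len(feature), len(feature[0])
    # Nearest-neighbor snapping of the RoI corners to integer grid cells.
    x1, y1, x2, y2 = (int(math.floor(v + 0.5)) for v in roi)
    # Clip to the feature map (the RoI is assumed to overlap it).
    x1, x2 = max(0, x1), min(width - 1, x2)
    y1, y2 = max(0, y1), min(height - 1, y2)
    roi_h, roi_w = y2 - y1 + 1, x2 - x1 + 1

    pooled = []
    for by in range(out_h):
        # Row range of this bin; floor/ceil keeps every bin non-empty.
        ys = y1 + (by * roi_h) // out_h
        ye = y1 + math.ceil((by + 1) * roi_h / out_h)
        row = []
        for bx in range(out_w):
            xs = x1 + (bx * roi_w) // out_w
            xe = x1 + math.ceil((bx + 1) * roi_w / out_w)
            row.append(max(feature[y][x]
                           for y in range(ys, ye)
                           for x in range(xs, xe)))
        pooled.append(row)
    return pooled
```

Whatever the size of the RoI, the output has the same out_h x out_w shape, which is what lets the fully-connected layers accept it.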

[Figure: region of interest (RoI) on the conv feature map → RoI pooling layer → RoI feature → FC layers]
Source: R. Girshick, K. He
[Figure: RoI pooling illustration]
Prediction
• For each RoI, network predicts probabilities for 𝐶 + 1 classes
(class 0 is background) and four bounding box offsets for each of
the 𝐶 object classes

R. Girshick, Fast R-CNN, ICCV 2015

Fast R-CNN training
• Multi-task loss: log loss + smooth L1 loss
[Figure: ConvNet, FCs, and the two output layers (linear + softmax, linear) marked as trainable]

Source: R. Girshick, Fast R-CNN, ICCV 2015

Multi-task loss
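• Loss for ground truth class $y$, predicted class probabilities $P(y)$, ground
truth box $b$, and predicted box $\hat{b}$:
$$L(y, P, b, \hat{b}) = -\log P(y) + \lambda\, \mathbb{I}[y \geq 1]\, L_{\text{reg}}(b, \hat{b})$$
• The first term is the softmax loss; the second term is the regression loss
• Regression loss: smooth $L_1$ loss on top of log space offsets relative to
proposal
$$L_{\text{reg}}(b, \hat{b}) = \sum_{i \in \{x, y, w, h\}} \text{smooth}_{L_1}(b_i - \hat{b}_i)$$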
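
A small sketch of the multi-task loss for a single RoI. Two details are assumptions not stated on the slide: the piecewise form of smooth L1 (quadratic below 1, linear above, as in the Fast R-CNN paper) and the default weight λ = 1. Box arguments are taken to be already-encoded (x, y, w, h) offsets.

```python
import math


def smooth_l1(x):
    """Smooth L1: 0.5 * x^2 if |x| < 1, else |x| - 0.5 (assumed form)."""
    ax = abs(x)
    return 0.5 * x * x if ax < 1.0 else ax - 0.5


def regression_loss(b, b_hat):
    """Sum of smooth L1 over the four offsets (x, y, w, h)."""
    return sum(smooth_l1(bi - bhi) for bi, bhi in zip(b, b_hat))


def multi_task_loss(probs, y, b, b_hat, lam=1.0):
    """-log P(y) + lam * [y >= 1] * L_reg(b, b_hat).

    probs: predicted probabilities over C + 1 classes (index 0 = background).
    y: ground truth class index.
    """
    cls_loss = -math.log(probs[y])
    # Background RoIs (y == 0) contribute no box regression loss.
    reg_loss = regression_loss(b, b_hat) if y >= 1 else 0.0
    return cls_loss + lam * reg_loss
```

The indicator means that background RoIs are trained only through the classification term.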
Bounding box regression
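[Figure: a region proposal (a.k.a. default box, prior, reference, anchor), the ground truth box, and the predicted box. The target offset to predict* goes from the region proposal to the ground truth box; the loss compares the predicted offset with the target offset.]

*Typically in transformed, normalized coordinates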
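
One common choice of "transformed, normalized coordinates" is the parameterization used across the R-CNN family: center shifts scaled by the proposal size, and log ratios of widths and heights. The sketch below assumes that parameterization and (x1, y1, x2, y2) corner boxes; the helper names are hypothetical.

```python
import math


def to_center(box):
    """Convert (x1, y1, x2, y2) to (center_x, center_y, width, height)."""
    x1, y1, x2, y2 = box
    w, h = x2 - x1, y2 - y1
    return x1 + 0.5 * w, y1 + 0.5 * h, w, h


def encode(proposal, gt):
    """Target offsets that map the proposal onto the ground truth box."""
    px, py, pw, ph = to_center(proposal)
    gx, gy, gw, gh = to_center(gt)
    return ((gx - px) / pw, (gy - py) / ph,
            math.log(gw / pw), math.log(gh / ph))


def decode(proposal, offsets):
    """Apply (predicted) offsets to the proposal to get a box."""
    px, py, pw, ph = to_center(proposal)
    tx, ty, tw, th = offsets
    cx, cy = px + tx * pw, py + ty * ph
    w, h = pw * math.exp(tw), ph * math.exp(th)
    return (cx - 0.5 * w, cy - 0.5 * h, cx + 0.5 * w, cy + 0.5 * h)
```

Decoding the encoded target recovers the ground truth box, and a proposal that already matches the ground truth has an all-zero target.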
RoI pooling: Backpropagation
• Similar to max pooling, but has to take into account overlap of
pooling regions
[Figure: feature map input $x_{33}$ lies inside two overlapping RoIs, $r_1$ and $r_2$. RoI pooling selects it as the max for outputs $z_{1,4}$ and $z_{2,1}$, so $i^*(1,4) = 33$ and $i^*(2,1) = 33$.]
• Backward pass: max pooling "switch" (argmax back-pointer)
• Have $\frac{\partial e}{\partial z}$, want $\frac{\partial e}{\partial x}$:
$$\frac{\partial e}{\partial x_i} = \sum_r \sum_j \frac{\partial e}{\partial z_{rj}} \frac{\partial z_{rj}}{\partial x_i} = \sum_r \sum_j \mathbb{I}[i = i^*(r, j)]\, \frac{\partial e}{\partial z_{rj}}$$
• The sums run over regions $r$ and RoI indices $j$; the indicator is 1 if $(r, j)$ "pooled" input $i$, and 0 otherwise
Source: Ross Girshick
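
The backward rule amounts to routing each output gradient to the input that won the max, and summing where RoIs overlap. A minimal sketch, assuming the forward pass saved its argmax back-pointers in a dictionary keyed by (r, j); this data layout is hypothetical.

```python
from collections import defaultdict


def roi_pool_backward(argmax, grad_z):
    """Accumulate de/dx_i from de/dz_rj using the saved argmax switches.

    argmax: dict mapping (r, j) -> index i*(r, j) of the input that was pooled.
    grad_z: dict mapping (r, j) -> de/dz_rj.
    Returns a dict mapping input index i -> de/dx_i (absent keys mean zero).
    """
    grad_x = defaultdict(float)
    for (r, j), i_star in argmax.items():
        # An input pooled by several overlapping RoIs receives every gradient.
        grad_x[i_star] += grad_z[(r, j)]
    return dict(grad_x)


# Input 33 was the max for output 4 of RoI 1 and for output 1 of RoI 2.
switches = {(1, 4): 33, (2, 1): 33, (1, 1): 7}
grads = roi_pool_backward(switches, {(1, 4): 0.5, (2, 1): 0.25, (1, 1): 1.0})
```

Here grads[33] is the sum of the two incoming gradients, which is the only difference from ordinary max pooling.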
Mini-batch sampling
• Sample a few images (e.g., 2)
• Sample many regions from each image (64)

[Figure: sample images, with the regions sampled from each image forming one SGD mini-batch]

Source: R. Girshick, K. He
Fast R-CNN results

Metric | Fast R-CNN | R-CNN
Train time (h) | 9.5 | 84
Train speedup | 8.8x |
Test time / image | 0.32s | 47.0s
Test speedup | 146x |
mAP | 66.9% | 66.0% (vs. 53.7% for AlexNet)

Timings exclude object proposal time, which is equal for all methods.
All methods use VGG16.

Source: R. Girshick, K. He
Faster R-CNN

[Figure: CNN → feature map → Region Proposal Network → region proposals; the proposal network and the detection network share features]

S. Ren, K. He, R. Girshick, and J. Sun, Faster R-CNN: Towards Real-Time Object Detection with
Region Proposal Networks, NIPS 2015
Region proposal network (RPN)
• Idea: put an “anchor box” of fixed size over each position in
the feature map and try to predict whether this box is likely to
contain an object

[Figure: for the anchor at each feature-map position, a conv layer predicts "Anchor is an object?"]
Figure source: J. Johnson

• Introduce anchor boxes at multiple scales and aspect ratios
to handle a wider range of object sizes and shapes

[Figure: with several anchors per position, the conv layer predicts "Anchor is object?" for each anchor]
Figure source: J. Johnson

Faster R-CNN RPN design
• Slide a small window (3x3) over the conv5 layer
• Predict object/no object
• Regress bounding box coordinates with reference to anchors
(3 scales x 3 aspect ratios)
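
A sketch of how the 3 scales x 3 aspect ratios anchor grid could be laid out. The slide gives only the counts; the stride of 16 and the specific scales (128, 256, 512) and ratios (0.5, 1, 2) below are assumed values commonly associated with Faster R-CNN, and the function name is hypothetical.

```python
import math


def make_anchors(feat_h, feat_w, stride=16,
                 scales=(128, 256, 512), ratios=(0.5, 1.0, 2.0)):
    """Return (x1, y1, x2, y2) anchors, 9 per feature-map position by default.

    Each anchor with scale s and ratio r (height / width) has area s * s.
    """
    anchors = []
    for gy in range(feat_h):
        for gx in range(feat_w):
            # Center of this feature-map cell in image coordinates.
            cx, cy = (gx + 0.5) * stride, (gy + 0.5) * stride
            for s in scales:
                for r in ratios:
                    w = s / math.sqrt(r)
                    h = s * math.sqrt(r)
                    anchors.append((cx - w / 2, cy - h / 2,
                                    cx + w / 2, cy + h / 2))
    return anchors
```

The RPN then predicts one object/no object score and four box offsets per anchor, so a feature map of size H x W yields H * W * 9 candidate boxes.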
One network, four losses
[Figure: image → CNN → feature map → Region Proposal Network (classification loss and bounding-box regression loss) → proposals → RoI pooling → classification loss and bounding-box regression loss]
Source: R. Girshick, K. He
Faster R-CNN results
Object detection progress

[Figure: detection progress plot annotated "Before CNNs" and "After CNNs", with R-CNNv1, Fast R-CNN, and Faster R-CNN marked]
Outline
• Task definition and evaluation
• Two-stage detectors
• R-CNN
• Fast R-CNN
• Faster R-CNN
• Single-stage and multi-resolution detectors
Streamlined detection architectures
• The Faster R-CNN pipeline separates proposal generation
and region classification
[Figure: conv feature map of the entire image → RPN → region proposals → RoI pooling → RoI features → classification + regression → detections]

• Is it possible to do detection in one shot?

[Figure: conv feature map of the entire image → classification + regression → detections]
YOLO
• Divide the image into a coarse grid and directly predict class
label and a few candidate boxes for each grid cell

J. Redmon, S. Divvala, R. Girshick, and A. Farhadi, You Only Look Once: Unified, Real-Time
Object Detection, CVPR 2016
YOLO
1. Take conv feature maps at 7x7 resolution
2. Add two FC layers to predict, at each location,
a score for each class and 2 bboxes w/ confidences
• For PASCAL, output is 7 × 7 × 30 (30 = 20 + 2 ∗ (4 + 1))
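
To make the 7 × 7 × 30 arithmetic concrete, here is a small sketch that computes the output size and splits one cell's 30 values. The ordering within a cell (class scores first, then the two boxes as x, y, w, h, confidence) is an assumed layout for illustration; the slide fixes only the counts.

```python
S, B, C = 7, 2, 20  # grid size, boxes per cell, PASCAL classes


def output_size(s=S, b=B, c=C):
    """Total number of predicted values: S * S * (C + B * (4 + 1))."""
    return s * s * (c + b * (4 + 1))


def split_cell(cell, b=B, c=C):
    """Split one grid cell's vector into class scores and b boxes.

    Assumed layout: [c class scores | b * (x, y, w, h, confidence)].
    """
    if len(cell) != c + b * 5:
        raise ValueError("unexpected cell length")
    class_scores = cell[:c]
    boxes = [tuple(cell[c + 5 * k: c + 5 * (k + 1)]) for k in range(b)]
    return class_scores, boxes
```

Because the class scores are shared by both boxes in a cell, each cell can commit to only one class.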

J. Redmon, S. Divvala, R. Girshick, and A. Farhadi, You Only Look Once: Unified, Real-Time
Object Detection, CVPR 2016
YOLO
• Objective function has three groups of terms: regression, object/no object confidence, and class prediction
• Indicator on the regression terms: cell i contains object, predictor j is responsible for it
• Regression: small deviations matter less for larger boxes than for smaller boxes
• Confidence for object
• Confidence for no object: down-weight loss from boxes that don't contain objects (𝜆noobj = 0.5)
• Class probability
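
For reference, the objective as written in the YOLO paper (Redmon et al., CVPR 2016) has the following form. The symbols are the paper's, not the slide's: $S^2$ grid cells, $B$ predictors per cell, $\mathbb{1}_{ij}^{\text{obj}}$ for "cell $i$ contains an object and predictor $j$ is responsible for it", and a coordinate weight $\lambda_{\text{coord}}$ (set to 5 in the paper). Treat this as a transcription to check against the paper rather than a derivation.

```latex
\begin{aligned}
\mathcal{L} ={}& \lambda_{\text{coord}} \sum_{i=0}^{S^2} \sum_{j=0}^{B}
    \mathbb{1}_{ij}^{\text{obj}}
    \left[ (x_i - \hat{x}_i)^2 + (y_i - \hat{y}_i)^2 \right] \\
&+ \lambda_{\text{coord}} \sum_{i=0}^{S^2} \sum_{j=0}^{B}
    \mathbb{1}_{ij}^{\text{obj}}
    \left[ \left(\sqrt{w_i} - \sqrt{\hat{w}_i}\right)^2
         + \left(\sqrt{h_i} - \sqrt{\hat{h}_i}\right)^2 \right] \\
&+ \sum_{i=0}^{S^2} \sum_{j=0}^{B}
    \mathbb{1}_{ij}^{\text{obj}} \left( C_i - \hat{C}_i \right)^2 \\
&+ \lambda_{\text{noobj}} \sum_{i=0}^{S^2} \sum_{j=0}^{B}
    \mathbb{1}_{ij}^{\text{noobj}} \left( C_i - \hat{C}_i \right)^2 \\
&+ \sum_{i=0}^{S^2} \mathbb{1}_{i}^{\text{obj}}
    \sum_{c \in \text{classes}} \left( p_i(c) - \hat{p}_i(c) \right)^2
\end{aligned}
```

The square roots on width and height are what make small deviations matter less for larger boxes than for smaller ones.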
YOLO: Results
• Each grid cell predicts only two boxes and can only have one class –
this limits the number of nearby objects that can be predicted
• Localization accuracy suffers compared to Fast(er) R-CNN due to
coarser features, errors on small boxes
• 7x speedup over Faster R-CNN (45-155 FPS vs. 7-18 FPS)

[Table: performance on PASCAL 2007]

YOLO v2
• Remove FC layer, do convolutional prediction with anchor boxes
instead
• Increase resolution of input images and conv feature maps
• Improve accuracy using batch normalization and other tricks
[Figure: VOC 2007 results]

J. Redmon and A. Farhadi, YOLO9000: Better, Faster, Stronger, CVPR 2017

Multi-resolution prediction: SSD
• Predict boxes of different size from different conv maps
• Each level of resolution has its own predictor

W. Liu, D. Anguelov, D. Erhan, C. Szegedy, S. Reed, C.-Y. Fu, and A. Berg, SSD: Single Shot MultiBox Detector, ECCV 2016
Feature pyramid networks
• Improve predictive power of
lower-level feature maps by
adding contextual information
from higher-level feature maps
• Predict different sizes of
bounding boxes from different
levels of the pyramid (but
share parameters of
predictors)

T.-Y. Lin, P. Dollar, R. Girshick, K. He, B. Hariharan, and S. Belongie, Feature pyramid networks for object detection, CVPR 2017
RetinaNet
• Combine feature pyramid network with focal loss to reduce the standard
cross-entropy loss for well-classified examples
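
A minimal sketch of how focal loss shrinks the cross-entropy loss for well-classified examples, for the binary case. The form FL = -α_t (1 - p_t)^γ log(p_t) and the defaults γ = 2, α = 0.25 follow the focal loss paper as I understand it; they are not stated on the slide.

```python
import math


def cross_entropy(p, y):
    """Binary cross-entropy; p is the predicted probability of class 1."""
    p_t = p if y == 1 else 1.0 - p
    return -math.log(p_t)


def focal_loss(p, y, gamma=2.0, alpha=0.25):
    """Focal loss: -alpha_t * (1 - p_t)^gamma * log(p_t) (assumed defaults)."""
    p_t = p if y == 1 else 1.0 - p
    alpha_t = alpha if y == 1 else 1.0 - alpha
    # (1 - p_t)^gamma is near 0 for well-classified examples (p_t near 1).
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)
```

With these defaults, a positive example predicted at p = 0.9 keeps only 0.25 × 0.1² = 0.25% of its cross-entropy loss, while a hard example at p = 0.1 keeps about 20%.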

T.-Y. Lin, P. Goyal, R. Girshick, K. He, P. Dollar, Focal loss for dense object detection, ICCV 2017
RetinaNet: Results

T.-Y. Lin, P. Goyal, R. Girshick, K. He, P. Dollar, Focal loss for dense object detection, ICCV 2017
Outline
• Task definition and evaluation
• Two-stage detectors
• R-CNN
• Fast R-CNN
• Faster R-CNN
• Single-stage and multi-resolution detectors
• Recent trends
CornerNet

H. Law and J. Deng, CornerNet: Detecting Objects as Paired Keypoints, ECCV 2018
CenterNet
• Use an additional center point to verify predictions:

K. Duan et al. CenterNet: Keypoint Triplets for Object Detection, ICCV 2019
Detection Transformer (DETR)

N. Carion et al., End-to-end object detection with transformers, ECCV 2020
