Deep Learning
Object Detection and Segmentation
Huỳnh Văn Thống
FPT Univ.
Semantic Segmentation
• Label each pixel in the image with
a category label.
• Don’t differentiate instances, only
care about pixels.
2/24/2025 2
Segmentation: Dataset
• Pascal VOC: 16K training natural images divided into 20 classes.
• Cityscapes: 25K urban street-scene images divided into 30 classes.
• ADE20K: 25K scene-parsing images (the "20K" refers to the 20K training images) divided into 150 classes.
• MS COCO: 328K images with 80 "things" categories and 91 "stuff" categories.
Models are often pre-trained on the large MS COCO dataset before being fine-tuned on the specific target dataset.
Semantic Segmentation: FCN
• FCN = Fully Convolutional Network.
• Design a network as a bunch of convolutional layers to make
predictions for pixels all at once.
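The simplest fully convolutional head is a 1x1 convolution: the same linear classifier applied independently at every pixel. A minimal NumPy sketch (the feature and weight shapes here are illustrative assumptions, not the lecture's exact architecture):

```python
import numpy as np

def per_pixel_classify(features, weights):
    """features: (C_in, H, W); weights: (C_out, C_in).
    Applies the same linear classifier at every pixel (a 1x1 conv)
    and returns the (H, W) map of predicted class labels."""
    scores = np.einsum('oc,chw->ohw', weights, features)
    return scores.argmax(axis=0)
```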
Semantic Segmentation: FCN
• Design a network as a bunch of convolutional layers to make
predictions for pixels all at once.
Problem #1: Effective receptive field size is linear in the number of conv layers: with L 3x3 conv layers, the receptive field is 1 + 2L.
Problem #2: Convolution on high-resolution images is expensive! Recall that the ResNet stem aggressively downsamples.
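The 1 + 2L growth can be checked with a tiny helper (a sketch; the kernel size and layer counts are the slide's example):

```python
def receptive_field(num_layers, kernel=3):
    """Receptive field of a stack of stride-1 conv layers:
    each k x k layer adds (k - 1) pixels on top of the initial
    1-pixel field, so L 3x3 layers give 1 + 2L."""
    rf = 1
    for _ in range(num_layers):
        rf += kernel - 1
    return rf
```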
Semantic Segmentation: FCN
• Design network as a bunch of convolutional layers, with
downsampling and upsampling inside the network!
Semantic Segmentation: FCN
• Design network as a bunch of convolutional layers, with
downsampling and upsampling inside the network!
Downsampling: pooling, strided convolution
Upsampling: ?
In-Network Upsampling: “Unpooling”
In-Network Upsampling: Bilinear Interpolation
Use two closest neighbors in 𝑥 and 𝑦
to construct linear approximations
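As a sketch, bilinear upsampling can be written directly in NumPy (the scale factor and the align-corners sampling convention below are my assumptions for illustration):

```python
import numpy as np

def bilinear_upsample(x, scale):
    """Upsample a 2D array by linear interpolation along each axis
    (align_corners=True convention: corner pixels map to corners)."""
    h, w = x.shape
    H, W = h * scale, w * scale
    ys = np.linspace(0, h - 1, H)          # sample coords in input grid
    xs = np.linspace(0, w - 1, W)
    y0 = np.floor(ys).astype(int); y1 = np.minimum(y0 + 1, h - 1)
    x0 = np.floor(xs).astype(int); x1 = np.minimum(x0 + 1, w - 1)
    wy = (ys - y0)[:, None]; wx = (xs - x0)[None, :]
    # Interpolate along x on the two neighboring rows, then along y.
    top = (1 - wx) * x[np.ix_(y0, x0)] + wx * x[np.ix_(y0, x1)]
    bot = (1 - wx) * x[np.ix_(y1, x0)] + wx * x[np.ix_(y1, x1)]
    return (1 - wy) * top + wy * bot
```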
In-Network Upsampling: Bicubic Interpolation
Use the four closest neighbors in 𝑥 and 𝑦 to construct cubic approximations.
(This is how we normally resize images)
In-Network Upsampling: “Max Unpooling”
Max Pooling: remember which position had the max.
Max Unpooling: place values into the remembered positions.
Pair each downsampling layer with
an upsampling layer
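A toy NumPy sketch of such a paired pool/unpool (2x2 windows and a single channel are my simplifying assumptions):

```python
import numpy as np

def max_pool_2x2(x):
    """2x2 max pooling that also remembers where each max came from."""
    h, w = x.shape
    out = np.zeros((h // 2, w // 2))
    idx = np.zeros((h // 2, w // 2, 2), dtype=int)
    for i in range(h // 2):
        for j in range(w // 2):
            block = x[2*i:2*i+2, 2*j:2*j+2]
            k = np.unravel_index(np.argmax(block), (2, 2))
            out[i, j] = block[k]
            idx[i, j] = (2*i + k[0], 2*j + k[1])
    return out, idx

def max_unpool_2x2(y, idx, shape):
    """Place each value back at its remembered position; zeros elsewhere."""
    out = np.zeros(shape)
    for i in range(y.shape[0]):
        for j in range(y.shape[1]):
            out[tuple(idx[i, j])] = y[i, j]
    return out
```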
Learnable Upsampling: Transposed Convolution
Recall: Normal 3 x 3 convolution, stride 1, pad 1
Learnable Upsampling: Transposed Convolution
Recall: Normal 3 x 3 convolution, stride 2, pad 1
Learnable Upsampling: Transposed Convolution
Recall: Normal 3 x 3 convolution, stride 2, pad 1
Convolution with stride > 1 is “Learnable Downsampling”
Can we use stride < 1 for “Learnable Upsampling”?
Learnable Upsampling: Transposed Convolution
3 x 3 transposed convolution, stride 2
Learnable Upsampling: Transposed Convolution
3 x 3 transposed convolution, stride 2
Sum where outputs overlap.
Transposed Convolution: 1D example
Output has copies of the filter weighted by the input.
Stride 2: move 2 pixels in the output for each pixel in the input.
Sum at overlaps.
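The 1D picture above translates into a few lines of NumPy (a sketch; the kernel and input values are toy examples):

```python
import numpy as np

def transposed_conv1d(x, w, stride=2):
    """The output is a sum of copies of the filter w, each scaled by one
    input value and shifted by `stride`; overlapping copies are summed."""
    out = np.zeros(stride * (len(x) - 1) + len(w))
    for i, xi in enumerate(x):
        out[i * stride : i * stride + len(w)] += xi * w
    return out
```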
Transposed Convolution: 1D example
Many names:
• Deconvolution (bad).
• Upconvolution.
• Fractionally strided
convolution.
• Backward strided
convolution.
• Transposed Convolution
(best).
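The name "transposed" comes from the matrix view: writing a strided convolution as a matrix X so that y = X @ x, the transposed convolution is multiplication by X.T. A small check (the kernel, stride, and sizes are illustrative choices):

```python
import numpy as np

# 1D convolution (kernel [1, 2, 3], stride 2, no padding) of a
# length-5 input, written as a matrix: y = X @ x maps 5 -> 2 samples.
X = np.array([[1., 2., 3., 0., 0.],
              [0., 0., 1., 2., 3.]])

x_low = np.array([4., 5.])   # a length-2 low-resolution signal
up = X.T @ x_low             # transposed convolution: maps 2 -> 5 samples
# Each input value scales a shifted copy of the kernel;
# the copies overlap (and sum) at index 2.
```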
Semantic Segmentation: FCN
• Design network as a bunch of convolutional layers, with
downsampling and upsampling inside the network!
Downsampling: pooling, strided convolution
Upsampling: interpolation, transposed convolution
Semantic Segmentation: FCN
• Combine predictions at different resolutions.
Fully Convolutional Networks for Semantic Segmentation. Long et al., CVPR, 2015
Semantic Segmentation: U-Net
• Incorporate low-level information via skip connections.
U-Net: Convolutional Networks for Biomedical Image
Segmentation, Ronneberger et al., MICCAI 2015
Semantic Segmentation: DeepLabV3+
• Encode multi-scale contextual information by applying atrous convolution at multiple scales.
Encoder-Decoder with Atrous Separable Convolution for Semantic Image Segmentation, Chen et al., ECCV 2018
Atrous Convolution
Sparse feature extraction with
standard convolution on a
low-resolution input feature
map.
Dense feature extraction with
atrous convolution with rate r=2,
applied on a high-resolution input
feature map.
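In 1D, atrous (dilated) convolution simply spaces the filter taps `rate` pixels apart, enlarging the receptive field without extra parameters or downsampling (a sketch with toy values):

```python
import numpy as np

def atrous_conv1d(x, w, rate):
    """1D convolution 'with holes': filter taps are spaced `rate`
    apart, so the effective kernel extent is rate*(len(w)-1)+1
    while the number of weights stays len(w)."""
    span = rate * (len(w) - 1) + 1
    out = np.zeros(len(x) - span + 1)
    for i in range(len(out)):
        out[i] = sum(w[j] * x[i + j * rate] for j in range(len(w)))
    return out
```

With rate=1 this reduces to an ordinary convolution.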
Semantic Segmentation: DeepLabV3+
• Encode multi-scale contextual
information by applying atrous
convolution at multiple scales.
• Refine the segmentation
results along object
boundaries.
Encoder-Decoder with Atrous Separable Convolution for Semantic Image Segmentation, Chen et al., ECCV 2018
Computer Vision Tasks
Object Detection: detects individual object instances, but only gives a box.
Semantic Segmentation: gives per-pixel labels, but merges instances.
Things and Stuff
Things: object categories that can be separated into object instances (e.g. cats, cars, people).
Stuff: object categories that cannot be separated into instances (e.g. sky, grass, water, trees).
Computer Vision Tasks
Object Detection: detects individual object instances, but only gives a box. (Only things)
Semantic Segmentation: gives per-pixel labels, but merges instances. (Both things and stuff)
Computer Vision Tasks
Instance Segmentation: detect all objects in the image and identify the pixels that belong to each object. (Only things!)
Semantic Segmentation: gives per-pixel labels, but merges instances. (Both things and stuff)
Computer Vision Tasks: Instance Segmentation
Instance Segmentation: Detect all
objects in the image, and identify the
pixels that belong to each object.
(Only things!)
Approach: Perform object detection,
then predict a segmentation mask
for each object!
Beyond Instance Segmentation: Panoptic Segmentation
• Label all pixels in the image
(both things and stuff).
• For “thing” categories also
separate into instances.
Beyond Instance Segmentation: Panoptic Segmentation
Panoptic quality (PQ) measure
• Computed per-category and results are averaged
across categories.
• The ground truth and predicted segments are matched with an IoU threshold of 0.5.
• TP (matched pairs), FP (unmatched predicted segments), and FN (unmatched ground-truth segments).
SQ: how close the matched predicted segments are to their ground-truth segments (does not consider bad predictions!)
RQ: just like for detection, we want to know if we are missing any instances (FN) or predicting more instances (FP)
Next
• Visualization and Understanding
• Attention and Transformer
• Foundation Models and Promptable Segmentation.
• ….
Questions?