Computer Vision
IFT6758 - Data Science
Sources:
http://www.cs.cmu.edu/~16385/
http://cs231n.stanford.edu/2018/syllabus.html
http://www.cse.psu.edu/~rtc12/CSE486/
Announcement
• Assignment 3 is available on Gradescope:
• Algorithmic discrimination
• NLP (has two bonus-point questions)
• CV (basics)
• Due: November 28
• Grades for Assignment 2 and the midterm will be published this week.
CV Pipeline
Image transformations
Recall: filtering
Convolution
Convolution
All of computer vision is convolutions (basically)
Convolution
• The mathematics of many filters, such as smoothing, sharpening, and edge detection, can be expressed in a principled manner using 2D convolution.
• Convolution in 2D operates on two images: one functions as the input image and the other, called the kernel, serves as a filter.
• It expresses the amount of overlap of one function as it is shifted over another: the output image is produced by sliding the kernel over the input image.
Convolution
• Convolution is the process of adding each element of the image to its local neighbors,
weighted by the kernel.
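The per-pixel weighted sum described above can be sketched in a few lines of NumPy. This is a minimal "valid"-mode 2D convolution; the function and variable names are ours, not from the slides:

```python
import numpy as np

def conv2d(image, kernel):
    """Naive 'valid' 2D convolution: flip the kernel, then slide it
    over the image, taking a weighted sum at every position."""
    k = np.flipud(np.fliplr(kernel))  # convolution flips the kernel
    kh, kw = k.shape
    oh, ow = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    out = np.empty((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(image[i:i+kh, j:j+kw] * k)
    return out

# A 3x3 box filter (all ones / 9) averages each pixel with its neighbors.
image = np.arange(25, dtype=float).reshape(5, 5)
box = np.ones((3, 3)) / 9.0
smoothed = conv2d(image, box)
```

On this linear intensity ramp, smoothing leaves interior values unchanged, which is a quick sanity check that the weights sum to one.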
Convolution for 2D discrete signals
Definition of filtering as convolution:

$f[m,n] = (I * g)[m,n] = \sum_{k,l} I[k,l]\, g[m-k, n-l]$

where $f$ is the filtered image, $g$ is the filter, and $I$ is the input image.
Convolution for 2D discrete signals
Definition of filtering as convolution:

$f[m,n] = (I * g)[m,n] = \sum_{k,l} I[k,l]\, g[m-k, n-l]$

(filtered image $f$, filter $g$, input image $I$)

If the filter is non-zero only within $-1 \le k, l \le 1$, then

$f[m,n] = \sum_{k=m-1}^{m+1} \sum_{l=n-1}^{n+1} I[k,l]\, g[m-k, n-l]$

The box filter kernel we saw earlier is the 3x3 matrix representation of $g[k,l]$.
Convolution Examples
Convolution Examples
• A sharpening filter can be broken down into two steps: take a smoothed image, subtract it from the original image to obtain the "details" of the image, and add the "details" back to the original image.
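The two sharpening steps above can be sketched directly in NumPy. The `box_blur` helper is our stand-in for any smoothing filter; names and constants are illustrative:

```python
import numpy as np

def box_blur(img):
    """Smooth with a 3x3 box filter, replicating the border pixels."""
    p = np.pad(img, 1, mode="edge")
    out = np.zeros_like(img, dtype=float)
    for di in (-1, 0, 1):
        for dj in (-1, 0, 1):
            out += p[1+di:1+di+img.shape[0], 1+dj:1+dj+img.shape[1]]
    return out / 9.0

img = np.zeros((5, 5))
img[:, 2:] = 10.0                    # a vertical step edge
details = img - box_blur(img)        # "details" = original - smoothed
sharpened = img + details            # add details back -> stronger edge
```

Note the overshoot on both sides of the step: sharpening exaggerates the transition, which is exactly what makes edges look crisper.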
Convolution Examples
• Gaussian smoothing filter: a weighted-averaging filter that gives more weight to the central pixels and less to their neighbors.
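A Gaussian kernel can be built directly by sampling the 2D Gaussian and normalizing. A minimal sketch (the size and sigma values are just for illustration):

```python
import numpy as np

def gaussian_kernel(size=5, sigma=1.0):
    """Sample exp(-(x^2 + y^2) / (2 sigma^2)) on a size x size grid
    and normalize so the weights sum to 1."""
    ax = np.arange(size) - (size - 1) / 2.0
    xx, yy = np.meshgrid(ax, ax)
    k = np.exp(-(xx**2 + yy**2) / (2.0 * sigma**2))
    return k / k.sum()

k = gaussian_kernel(5, 1.0)
```

The center entry is the largest weight and the weights fall off symmetrically, matching the "more weight to central pixels" description.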
Convolution vs Correlation
Definition of discrete 2D convolution (notice the flip):

$f[m,n] = \sum_{k,l} I[k,l]\, g[m-k, n-l]$

Definition of discrete 2D correlation (notice the lack of a flip):

$f[m,n] = \sum_{k,l} I[k,l]\, g[m+k, n+l]$

• Most of the time it won't matter, because our kernels will be symmetric.
Convolution vs correlation
• Convolution is an integral that expresses the amount of overlap of one function as it is shifted over another function.
• Convolution is a filtering operation.
• Correlation compares the similarity of two sets of data: it computes a measure of similarity of two input signals as one is shifted relative to the other.
• The correlation reaches a maximum at the shift where the two signals match best.
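The only mechanical difference between the two operations is the kernel flip, which can be verified numerically: convolving with a kernel equals correlating with the flipped kernel. A pure-NumPy sketch ("valid" mode, names are ours):

```python
import numpy as np

def corr2d(image, kernel):
    """'Valid'-mode cross-correlation: slide the kernel with no flip."""
    kh, kw = kernel.shape
    oh, ow = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    out = np.empty((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(image[i:i+kh, j:j+kw] * kernel)
    return out

def conv2d(image, kernel):
    """'Valid'-mode convolution = correlation with the flipped kernel."""
    return corr2d(image, np.flipud(np.fliplr(kernel)))

rng = np.random.default_rng(0)
img = rng.random((6, 6))
ker = rng.random((3, 3))   # asymmetric, so the flip actually matters
```

With a symmetric kernel (e.g. the box or Gaussian filter) the two results coincide, which is why the distinction is often glossed over.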
Correlation application
• Correlation tells you how similar the signal is to the filter at any point. This is used for image alignment, template matching, and simple image matching.
(Figure: a template patch matched against the original image.)
Separable filters
A 2D filter is separable if it can be written as the product of a "column" and a "row".

Example: the 3x3 box filter

1 1 1     1
1 1 1  =  1  *  1 1 1    (column * row)
1 1 1     1

2D convolution with a separable filter is equivalent to two 1D convolutions (with the "column" and "row" filters).

If the image has M x M pixels and the filter kernel has size N x N:
What is the cost of convolution with a non-separable filter? M^2 x N^2
What is the cost of convolution with a separable filter? 2 x N x M^2
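The equivalence behind the cost argument can be checked numerically: convolving with the full N x N box filter gives the same result as convolving with the column and then the row, at roughly 2N instead of N^2 multiplications per pixel. A NumPy sketch ("valid" mode; for this symmetric kernel correlation equals convolution):

```python
import numpy as np

def corr_valid(img, k):
    """'Valid' correlation; for the symmetric box filter this equals convolution."""
    kh, kw = k.shape
    out = np.empty((img.shape[0] - kh + 1, img.shape[1] - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i+kh, j:j+kw] * k)
    return out

rng = np.random.default_rng(1)
img = rng.random((8, 8))
box = np.ones((3, 3))                     # separable: column @ row
col, row = np.ones((3, 1)), np.ones((1, 3))

full_2d = corr_valid(img, box)                    # ~ M^2 x N^2 work
two_1d = corr_valid(corr_valid(img, col), row)    # ~ 2 x N x M^2 work
```

The two outputs agree to floating-point precision, and `col @ row` reconstructs the 2D kernel exactly.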
Examples of separable filters
• Box filter, which is used as a smoothing filter.
• Sobel operator, which is commonly used for edge detection.
CV Pipeline
Edge detection
• Edges are the points in an image where the image brightness changes sharply or has discontinuities. Such discontinuities generally correspond to:
• Discontinuities in depth
• Discontinuities in surface orientation
• Changes in material properties
• Variations in scene illumination
• Edges are important for two main reasons:
• 1) Most semantic and shape information can be deduced from them, so we can perform object recognition and analyze the perspective and geometry of an image.
• 2) They are a more compact representation than pixels.
Edge detection
Characterizing edges
• We can pinpoint where edges occur from an image's intensity profile along a row or column of the image: a rapid change in the intensity function indicates an edge, which appears where the function's first derivative has a local extremum.
Partial derivatives with Convolution
Partial derivatives of an image
(Figure: finite-difference kernels, e.g. [-1 0 1], for the partial derivatives in x and y.)
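The idea can be sketched with the simplest finite-difference kernel, [-1 0 1]: on a horizontal intensity ramp, the partial derivative in x is constant and the partial derivative in y is zero. Pure NumPy, central differences on interior pixels only:

```python
import numpy as np

# A ramp image: intensity grows left to right, constant top to bottom.
img = np.tile(np.arange(6, dtype=float), (6, 1))

# Central differences ~ correlation with [-1, 0, 1] along each axis.
dx = img[:, 2:] - img[:, :-2]   # partial derivative in x (columns)
dy = img[2:, :] - img[:-2, :]   # partial derivative in y (rows)
```

The x-derivative picks up the horizontal brightness change while the y-derivative sees none, which is exactly how gradient filters localize oriented edges.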
Image Gradient
Intensity profile
Effect of noise
Solution: Smoothing
Edge detection via Convolution
Derivative of Gaussian filter
Derivatives of Gaussian filter
Edge detectors
Canny Edge detection
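The "effect of noise / solution: smoothing" story above is easy to reproduce in 1D: the derivative of a noisy step is dominated by noise, while the derivative of the smoothed signal peaks at the edge. A NumPy sketch in which a simple moving average stands in for the Gaussian; all constants are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
step = np.zeros(200)
step[100:] = 1.0                                  # a clean edge at index 100
noisy = step + 0.1 * rng.standard_normal(200)     # plus sensor noise

d_noisy = np.diff(noisy)                          # derivative of the raw signal
smoothed = np.convolve(noisy, np.ones(15) / 15, mode="same")
d_smooth = np.diff(smoothed)                      # derivative after smoothing

edge = int(np.argmax(np.abs(d_smooth)))           # strongest response ~ the edge
```

Combining the two convolutions into one, i.e. differentiating the smoothing kernel itself, gives the derivative-of-Gaussian filter from the slides.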
Corner/blob detectors
• Edges are useful as local features, but corners and small areas (blobs) are
generally more helpful in computer vision tasks. Blob detectors can be built
by extending the basic edge detector idea that we just discussed.
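One minimal version of this extension is the Difference-of-Gaussians blob detector used later by SIFT: smooth the image at two scales, subtract, and look for extrema of the response. A pure-NumPy sketch with a separable Gaussian; the sigmas and image are illustrative:

```python
import numpy as np

def gauss1d(sigma):
    """A normalized 1D Gaussian sampled out to 3 sigma."""
    r = int(3 * sigma)
    x = np.arange(-r, r + 1)
    g = np.exp(-x**2 / (2.0 * sigma**2))
    return g / g.sum()

def blur(img, sigma):
    """Separable Gaussian smoothing via two 1D convolutions."""
    g = gauss1d(sigma)
    tmp = np.apply_along_axis(lambda m: np.convolve(m, g, mode="same"), 0, img)
    return np.apply_along_axis(lambda m: np.convolve(m, g, mode="same"), 1, tmp)

# A bright blob in the middle of a dark image.
img = np.zeros((41, 41))
img[18:23, 18:23] = 1.0

dog = blur(img, 1.0) - blur(img, 2.0)       # Difference of Gaussians
peak = np.unravel_index(np.argmax(dog), dog.shape)
```

The DoG response is strongest where the image contains structure at the matching scale, here at the blob's center.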
Scale Invariant Feature Transform (SIFT)
• Keypoints are basically the points of interest in an image; they are analogous to the features of a given image.
• They are locations that define what is interesting in the image. Keypoints are important because, no matter how the image is modified (rotation, shrinking, expanding, distortion), we will always find the same keypoints for the image.
Lowe, David G. "Distinctive image features from scale-invariant keypoints." International Journal of Computer Vision 60.2 (2004): 91-110.
SIFT
Speeded-Up Robust Features (SURF)
• Speeded-Up Robust Features (SURF) is an enhanced version of SIFT. It works much faster and is more robust to image transformations.
• In SIFT, the scale space is built by approximating the Laplacian of Gaussian (LoG) with a Difference of Gaussians (DoG). The Laplacian kernel approximates the second derivative of the image and is therefore very sensitive to noise, so a Gaussian kernel is applied to the image before the Laplacian kernel, giving it the name Laplacian of Gaussian.
• In SURF, the Laplacian of Gaussian is instead approximated using a box filter (kernel). Convolution with a box filter can be computed in parallel for different scales, which is the underlying reason for the enhanced speed of SURF (compared to SIFT).
Bay, Herbert, Tinne Tuytelaars, and Luc Van Gool. "SURF: Speeded Up Robust Features." European Conference on Computer Vision. Springer, Berlin, Heidelberg, 2006.
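The box-filter trick relies on an integral image (summed-area table): after one pass over the image, the sum over any axis-aligned rectangle costs four lookups, independent of the box size, which is why box-filter responses are cheap at every scale. A minimal NumPy sketch; the function names are ours:

```python
import numpy as np

def integral_image(img):
    """ii[i, j] = sum of img[:i, :j]; padded with a zero row and column."""
    ii = np.zeros((img.shape[0] + 1, img.shape[1] + 1))
    ii[1:, 1:] = img.cumsum(0).cumsum(1)
    return ii

def box_sum(ii, r0, c0, r1, c1):
    """Sum of img[r0:r1, c0:c1] in four lookups, whatever the box size."""
    return ii[r1, c1] - ii[r0, c1] - ii[r1, c0] + ii[r0, c0]

img = np.arange(16, dtype=float).reshape(4, 4)
ii = integral_image(img)
```

Because the cost per box is constant, evaluating a larger scale costs no more than a smaller one, and the scales are independent of each other.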
SURF
Bag of Words
• Dictionary learning: learn visual words using clustering
• Encode: build Bag-of-Words (BoW) vectors for each image
• Classify: train and test data using BoW vectors
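Once a visual dictionary exists, the encoding step is just nearest-centroid assignment plus a histogram. A sketch with a hand-made "dictionary" of 2D points; a real pipeline would learn the words by clustering SIFT or SURF descriptors:

```python
import numpy as np

def bow_encode(descriptors, vocabulary):
    """Assign each local descriptor to its nearest visual word and
    return the normalized histogram of word counts."""
    # Pairwise squared distances, shape (n_descriptors, n_words).
    d2 = ((descriptors[:, None, :] - vocabulary[None, :, :]) ** 2).sum(-1)
    words = d2.argmin(axis=1)
    hist = np.bincount(words, minlength=len(vocabulary)).astype(float)
    return hist / hist.sum()

vocab = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])   # 3 "visual words"
desc = np.array([[0.2, 0.1], [9.8, 0.3], [9.9, -0.1], [0.1, 9.7]])
bow = bow_encode(desc, vocab)
```

The resulting fixed-length vector can be fed to any standard classifier, regardless of how many local features each image produced.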
Which object do these parts belong to?
Some local features are very informative. An object can be represented as a collection of local features (bag-of-words):
• deals well with occlusion
• scale invariant
• rotation invariant
BOW
• Extract features (e.g., SIFT)
• Learn a visual dictionary
Image Features
Image Features: Motivation
Image features vs ConvNets
ImageNet
• ImageNet is an image database organized according to the WordNet hierarchy (currently only the nouns), in which each node of the hierarchy is depicted by hundreds to thousands of images. Currently there is an average of over five hundred images per node.
AlexNet
“AlexNet is considered one of the most influential papers published in computer
vision, having spurred many more papers published employing CNNs and GPUs
to accelerate deep learning.”
Convolutional Neural Networks
We will learn about them on Thursday!
Conferences focusing on CV
• CVPR: IEEE/CVF Conference on Computer Vision and Pattern Recognition
http://cvpr2020.thecvf.com/
• ICCV: IEEE/CVF International Conference on Computer Vision
http://iccv2019.thecvf.com/
• ACMMM: ACM International Conference on Multimedia
https://www.acmmm.org/2020/
• CV is also one of the main topics of the major machine learning and AI conferences such as:
AAAI, IJCAI, ICML, NeurIPS, …