

9 ComputerVision

The document discusses the Scale-Invariant Feature Transform (SIFT) in the context of computer vision, focusing on its application for image matching and panorama creation. It outlines the importance of detecting and matching feature points across images, ensuring robustness to various transformations such as scale and rotation. Additionally, it highlights the advantages of local features, the process of extracting key points, and the significance of SIFT in achieving efficient and distinctive feature matching.

Uploaded by

Omer Amin

Recap Scale-Invariant Feature Transform

Scale-Invariant Feature Transform


CS-477 Computer Vision

Dr. Mohsin Kamal


Associate Professor
dr.mohsinkamal@seecs.edu.pk

School of Electrical Engineering and Computer Science (SEECS)


National University of Sciences and Technology (NUST), Pakistan

CS-477 Computer Vision by Dr. Mohsin Kamal

1 Recap

2 Scale-Invariant Feature Transform


Matching with features

How do we build a panorama?

We need to match (align) images


Matching with features

How do we build a panorama?

Detect feature points in both images
Find corresponding pairs
Use these matching pairs to align images - the required mapping is called a homography.
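
To make the role of the homography concrete, here is a minimal sketch of how a 3×3 homography maps a point from one image into the other. The function name `apply_homography` and the example matrix `H_shift` are hypothetical illustrations, not part of the slides, and estimating the matrix from the matched pairs is not shown.

```python
def apply_homography(H, point):
    """Map a 2D point through a 3x3 homography H (nested lists, row-major)."""
    x, y = point
    # Lift to homogeneous coordinates (x, y, 1) and multiply by H.
    xh = H[0][0] * x + H[0][1] * y + H[0][2]
    yh = H[1][0] * x + H[1][1] * y + H[1][2]
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    if w == 0:
        raise ValueError("point maps to infinity under this homography")
    # Divide by the third coordinate to return to pixel coordinates.
    return (xh / w, yh / w)


# Hypothetical homography: a pure translation by (100, 20) pixels.
H_shift = [[1.0, 0.0, 100.0],
           [0.0, 1.0, 20.0],
           [0.0, 0.0, 1.0]]
print(apply_homography(H_shift, (10.0, 5.0)))  # (110.0, 25.0)
```

A homography is defined only up to scale, which is why the result is divided by the third homogeneous coordinate.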


Matching with features

Problem 1: Detect the same point independently in both images


Matching with features

Problem 2: For each point, correctly recognize the corresponding one


Matching with features

Among all my matches, how do I know which ones are good?


Matching with features

Finding the "same" thing across images


Why do we care about matching features?

Object recognition
Wide baseline matching
Tracking


Matching with features

We want invariance!

Good features should be robust to all sorts of nastiness that can occur between images.


Image matching


Image matching

Harder case


Image matching

Even harder case

"How the Afghan Girl was Identified by Her Iris Patterns" [1]

[1] https://www.cl.cam.ac.uk/~jgd1000/afghan.html

Image matching

Still a harder case

NASA Mars Rover images



Image matching

Answer below (look for tiny colored squares...)

NASA Mars Rover images with SIFT feature matches (Figure by Noah Snavely)

Image matching


Image matching


Invariant local features

Find features that are invariant to transformations


geometric invariance: translation, rotation, scale
photometric invariance: brightness, exposure, ...


Advantages of local features

Locality: features are local, so robust to occlusion and clutter


Distinctiveness: can differentiate a large database of objects
Quantity: hundreds or thousands in a single image
Efficiency: real-time performance achievable
Generality: exploit different types of features in different situations


Advantages of local features

More motivation

Feature points are used for:


Image alignment (e.g., mosaics)
3D reconstruction
Motion tracking
Object recognition
Robot navigation, and
many more


Advantages of local features

Uniqueness

Look for image regions that are unusual


Lead to unambiguous matches in other images
How to define "unusual"?


Detector, Descriptor and Correspondence


Detector: detect the same scene points independently in both images
Descriptor: encode the local neighboring window
Note how the scale and rotation of the window are the same in both images (but computed independently)
Correspondence: find the most similar descriptor in the other image


Selecting Good features

What is a "good feature"?


Satisfies brightness constancy - looks the same in both images
Has sufficient texture variation
Does not have too much texture variation
Corresponds to a "real" surface patch.

Does not deform too much over time


Harris detector: Some properties

Invariant to image scale?


Harris detector: Some properties

Not invariant to image scale!


Scale Invariant Detection

Consider regions (e.g. circles) of different sizes around a point
Regions of corresponding sizes will look the same in both images


Scale Invariant Detection

The problem
How do we choose corresponding circles independently in each image?
Do objects in the image have a characteristic scale that we can identify?


Scale Invariant Detection

Solution

Design a function on the region (circle) which is "scale invariant" (the same for corresponding regions, even if they are at different scales)
Example: average intensity. For corresponding regions (even of different sizes) it will be the same.
For a point in one image, we can consider it as a function of region size (circle radius)


Scale Invariant Detection

A "good" function for scale detection has one stable sharp peak

For usual images: a good function would be one which responds to contrast (sharp local intensity change)


SIFT - Key Point Extraction

Stands for Scale-Invariant Feature Transform
Patented by the University of British Columbia
Similar to the one used in the primate visual system (human, ape, monkey, etc.)
Transforms image data into scale-invariant coordinates


Goal

Extracting distinctive invariant features


Correctly matched against a large database of features from many images
Invariance to image scale and rotation
Robustness to
Affine distortion,
Change in 3D viewpoint,
Addition of noise,
Change in illumination.


Properties of SIFT
Highly distinctive
A single feature can be correctly matched with high probability against a large database of features from many images.
Scale and rotation invariant.
Partially invariant to 3D camera viewpoint
Can tolerate up to about 60 degrees of out-of-plane rotation
Can handle significant changes in illumination
Sometimes even day vs. night (below)
Fast and efficient - can run in real time
Lots of code available


Advantages

Locality: features are local, so robust to occlusion and clutter
Distinctiveness: individual features can be matched to a large database of objects
Quantity: many features can be generated for even small objects
Efficiency: close to real-time performance


Steps for Extracting Key Points

Scale space peak selection


Potential locations for finding features
Key point localization
Accurately locating the feature key points
Orientation Assignment
Assigning orientation to the key points
Key point descriptor
Describing the key point as a high dimensional vector


Scales

What should be the sigma value for Canny and LoG edge detection?
If we use multiple sigma values (scales), how do we combine multiple edge maps?
Marr-Hildreth:
Spatial coincidence assumption:
Zero-crossings that coincide over several scales are physically significant.


Edge detection - Rewind


Edge detection - Rewind


Edge detection - Rewind


Laplacian-of-Gaussian (LoG)


What is a useful Signature function?

Laplacian-of-Gaussian = "Blob" detector


Scale-space blob detector: Example


Scale-space blob detector: Example


Scale-space blob detector: Example


Building a Scale Space

All scales must be examined to identify scale-invariant features
An efficient way to do this is to compute the Laplacian pyramid (Difference of Gaussians)


Approximation of LoG by Difference of Gaussians

We can approximate the Laplacian with a difference of Gaussians; more efficient to implement.
\[ L = \sigma^2 \left( G_{xx}(x, y, \sigma) + G_{yy}(x, y, \sigma) \right) \]
(Laplacian: 2nd derivative of Gaussian)

\[ DoG = G(x, y, k\sigma) - G(x, y, \sigma) \]
(Difference of Gaussians)
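
The slide states the approximation without the intermediate step. One standard way to see it, given here as a supporting sketch rather than as part of the original slides, is to use the fact that the Gaussian satisfies the diffusion equation in σ and to replace the derivative with a finite difference between the nearby scales kσ and σ. Here ∇²G stands for G_xx + G_yy.

```latex
\frac{\partial G}{\partial \sigma} = \sigma \nabla^2 G
\quad\Rightarrow\quad
\sigma \nabla^2 G \approx \frac{G(x, y, k\sigma) - G(x, y, \sigma)}{k\sigma - \sigma}
\quad\Rightarrow\quad
G(x, y, k\sigma) - G(x, y, \sigma) \approx (k - 1)\,\sigma^2 \nabla^2 G
```

The factor (k − 1) is constant across scales, so it does not change where the extrema are located.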


Building a Scale Space


Using
\[ G(x, y, k\sigma) = \frac{1}{2\pi (k\sigma)^2} \, e^{-(x^2 + y^2)/2k^2\sigma^2} \]


Building a Scale Space

\[ L(x, y, \sigma) = G(x, y, \sigma) * I(x, y) \]

\[ D(x, y, \sigma) = L(x, y, k\sigma) - L(x, y, \sigma) \]
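
The two definitions above can be sketched in pure Python for a small image stored as nested lists. This is a minimal illustration under stated assumptions: the kernel is truncated at 3σ, borders are handled by clamping, and all function names are hypothetical. A real implementation would build a full pyramid with several octaves.

```python
import math


def gaussian_kernel(sigma):
    """1D Gaussian kernel, truncated at 3*sigma and normalized to sum to 1."""
    radius = max(1, int(math.ceil(3 * sigma)))
    weights = [math.exp(-(i * i) / (2 * sigma * sigma))
               for i in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def blur_rows(image, kernel):
    """Convolve every row with the kernel, clamping indices at the border."""
    radius = len(kernel) // 2
    width = len(image[0])
    return [
        [sum(kernel[j + radius] * row[min(max(x + j, 0), width - 1)]
             for j in range(-radius, radius + 1))
         for x in range(width)]
        for row in image
    ]


def transpose(image):
    return [list(col) for col in zip(*image)]


def gaussian_blur(image, sigma):
    """L(x, y, sigma) = G(x, y, sigma) * I(x, y), using separability of G."""
    kernel = gaussian_kernel(sigma)
    return transpose(blur_rows(transpose(blur_rows(image, kernel)), kernel))


def difference_of_gaussians(image, sigma, k):
    """D(x, y, sigma) = L(x, y, k*sigma) - L(x, y, sigma)."""
    coarse = gaussian_blur(image, k * sigma)
    fine = gaussian_blur(image, sigma)
    return [[c - f for c, f in zip(crow, frow)]
            for crow, frow in zip(coarse, fine)]


# Toy 9x9 image with a single bright pixel in the centre.
image = [[0.0] * 9 for _ in range(9)]
image[4][4] = 1.0
D = difference_of_gaussians(image, sigma=1.0, k=math.sqrt(2))
print(D[4][4] < 0)  # True: the wider blur lowers the central peak
```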


Scale Space peak detection

Compare a pixel (X) with its 26 neighbors in the current and adjacent scales (green circles)
Select a pixel (X) if it is larger or smaller than all 26 neighbors
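
The 26-neighbor comparison can be sketched as follows, assuming the DoG images are stored as a nested list indexed `dog[scale][row][col]` and that the caller only passes interior positions. The function name is hypothetical.

```python
def is_scale_space_extremum(dog, s, y, x):
    """True if dog[s][y][x] is strictly larger or strictly smaller than all
    26 neighbours in the 3x3x3 block spanning scales s-1, s and s+1."""
    centre = dog[s][y][x]
    neighbours = [
        dog[s + ds][y + dy][x + dx]
        for ds in (-1, 0, 1)
        for dy in (-1, 0, 1)
        for dx in (-1, 0, 1)
        if (ds, dy, dx) != (0, 0, 0)
    ]
    return (all(centre > v for v in neighbours)
            or all(centre < v for v in neighbours))


# Toy stack: three 3x3 DoG layers with one strong response in the middle.
dog = [[[0.0] * 3 for _ in range(3)] for _ in range(3)]
dog[1][1][1] = 0.5
print(is_scale_space_extremum(dog, 1, 1, 1))  # True
```

The 26 neighbors are the 8 surrounding pixels at the same scale plus 9 pixels in each of the two adjacent scales.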


Key Point Localization


Candidates are chosen from extrema detection


Feature Point Localization - Initial outlier rejection

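1 Low contrast candidates
2 Poorly localized candidates along an edge

Let $x_o = [x, y, \sigma]^T$ and $z = [\delta x, \delta y, \delta\sigma]^T$; the offset $z$ is written as $x$ in the expansion.
Taylor series expansion of DoG, D:
\[ D(x_o + x) = D(x_o) + \frac{\partial D^T}{\partial x} x + \frac{1}{2} x^T \frac{\partial^2 D}{\partial x^2} x \]
The minimum or maximum is located at
\[ \hat{x} = -\left( \frac{\partial^2 D}{\partial x^2} \right)^{-1} \frac{\partial D}{\partial x} \]
The value of D(x) at the minimum/maximum must be large, $|D(x)| > th$.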


Feature Point Localization - Initial outlier rejection

Solution(1/3)

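\[ D(x_o + x) \approx D(x_o) + \frac{\partial D^T}{\partial x} x + \frac{1}{2} x^T \frac{\partial^2 D}{\partial x^2} x \tag{1} \]
To find maxima or minima, we know that
\[ \frac{\partial D(x_o + x)}{\partial x} = 0 \tag{2} \]
Differentiate (1) w.r.t. x:
\[ \frac{\partial D(x_o + x)}{\partial x} = 0 + \frac{\partial D}{\partial x} + \frac{1}{2} \times 2 \times \frac{\partial^2 D}{\partial x^2} x \]
Put in (2), which becomes
\[ \frac{\partial D}{\partial x} + \frac{\partial^2 D}{\partial x^2} \hat{x} = 0 \tag{3} \]
By solving (3), we get
\[ \hat{x} = -\left( \frac{\partial^2 D}{\partial x^2} \right)^{-1} \frac{\partial D}{\partial x} \tag{4} \]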

Feature Point Localization - Initial outlier rejection

Solution(2/3)

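(1) can be modified as
\[ D(x_o + \hat{x}) \approx D(x_o) + \frac{\partial D^T}{\partial x} \hat{x} + \frac{1}{2} \hat{x}^T \frac{\partial^2 D}{\partial x^2} \hat{x} \tag{5} \]
Put the value of $\hat{x}$ computed in (4) into the factor $\hat{x}^T$. (5) becomes
\[ D(x_o + \hat{x}) \approx D(x_o) + \frac{\partial D^T}{\partial x} \hat{x} + \frac{1}{2} \left( -\left( \frac{\partial^2 D}{\partial x^2} \right)^{-1} \frac{\partial D}{\partial x} \right)^T \frac{\partial^2 D}{\partial x^2} \hat{x} \tag{6} \]
\[ D(x_o + \hat{x}) \approx D(x_o) + \frac{\partial D^T}{\partial x} \hat{x} - \frac{1}{2} \frac{\partial D^T}{\partial x} \left( \left( \frac{\partial^2 D}{\partial x^2} \right)^{-1} \right)^T \frac{\partial^2 D}{\partial x^2} \hat{x} \tag{7} \]
In (7), $\frac{\partial^2 D}{\partial x^2}$ represents the Hessian matrix, which is symmetric in nature, so its inverse is also symmetric: $\left( \left( \frac{\partial^2 D}{\partial x^2} \right)^{-1} \right)^T = \left( \frac{\partial^2 D}{\partial x^2} \right)^{-1}$, and its product with $\frac{\partial^2 D}{\partial x^2}$ is the identity.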


Feature Point Localization - Initial outlier rejection

Solution(3/3)

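(7) is solved to
\[ D(x_o + \hat{x}) \approx D(x_o) + \frac{\partial D^T}{\partial x} \hat{x} - \frac{1}{2} \frac{\partial D^T}{\partial x} \hat{x} \tag{8} \]
\[ D(x_o + \hat{x}) \approx D(x_o) + \frac{1}{2} \frac{\partial D^T}{\partial x} \hat{x} \tag{9} \]
If the magnitude of (9) is less than 0.03, the candidate is discarded as a low-contrast point.
All pixels are normalized between 0 and 1.
Note that $\frac{\partial D}{\partial z} = \left[ \frac{\partial D}{\partial x} \;\; \frac{\partial D}{\partial y} \;\; \frac{\partial D}{\partial \sigma} \right]$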
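
To see equations (4) and (9) at work, here is a one-dimensional analogue, offered as a simplified sketch and not the full 3D procedure: along a single axis the gradient and Hessian reduce to scalars, estimated with central differences from three neighboring DoG samples. The function name, the sample values and the default threshold argument are illustrative assumptions; the 0.03 value itself comes from the slide.

```python
def refine_extremum_1d(d_minus, d_zero, d_plus, threshold=0.03):
    """1D analogue of equations (4) and (9).

    d_minus, d_zero, d_plus are DoG samples at offsets -1, 0, +1.
    Returns (offset, value, keep) or None if the curvature is zero.
    """
    g = (d_plus - d_minus) / 2.0         # first derivative (central difference)
    h = d_plus - 2.0 * d_zero + d_minus  # second derivative
    if h == 0:
        return None                      # flat: no well-defined extremum
    offset = -g / h                      # equation (4) with scalars
    value = d_zero + 0.5 * g * offset    # equation (9) with scalars
    keep = abs(value) >= threshold       # low-contrast rejection
    return offset, value, keep


# Samples of the parabola 0.05 - 0.02 * (t - 0.25)**2 at t = -1, 0, 1.
offset, value, keep = refine_extremum_1d(0.01875, 0.04875, 0.03875)
print(round(offset, 6), round(value, 6), keep)  # 0.25 0.05 True
```

In the full method the same two formulas are applied with the 3-vector gradient and the 3×3 Hessian over (x, y, σ).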


Feature Point Localization - Initial outlier rejection

After Feature Point Localization and thresholding


Removing points on the edges - Further outlier rejection

DoG has a strong response along edges
The principal curvatures can be computed from a 2×2 Hessian matrix, H, computed at the location and scale of the key point
\[ H = \begin{bmatrix} D_{xx} & D_{xy} \\ D_{xy} & D_{yy} \end{bmatrix} \]
Let α and β be the eigenvalues of the matrix
\[ \frac{Tr(H)^2}{Det(H)} = \frac{(\alpha + \beta)^2}{\alpha\beta} \]
Let r represent the ratio of the eigenvalues, i.e., $r = \frac{\alpha}{\beta}$, then α = rβ
Putting in the equation above, we get
\[ \frac{Tr(H)^2}{Det(H)} = \frac{(r\beta + \beta)^2}{r\beta^2} = \frac{(r + 1)^2}{r} \]
Keep only those key points which satisfy the following condition
\[ \frac{Tr(H)^2}{Det(H)} < \frac{(r + 1)^2}{r} \quad \text{for } r = 10 \]
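
The condition above can be checked directly from the three second derivatives, without computing the eigenvalues. The sketch below assumes the derivatives are already available; the function name is hypothetical, and rejecting a non-positive determinant is an added assumption that is not stated on the slide.

```python
def passes_edge_test(dxx, dyy, dxy, r=10.0):
    """Keep a key point only if Tr(H)^2 / Det(H) < (r + 1)^2 / r."""
    trace = dxx + dyy
    det = dxx * dyy - dxy * dxy
    if det <= 0:
        # Assumption: curvatures of opposite sign (or a zero curvature)
        # are treated as not being a stable extremum.
        return False
    return trace * trace / det < (r + 1) ** 2 / r


print(passes_edge_test(1.0, 1.0, 0.0))   # True: blob-like, ratio 4 < 12.1
print(passes_edge_test(20.0, 1.0, 0.0))  # False: edge-like, ratio 22.05
```

For r = 10 the bound (r + 1)²/r equals 12.1.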

Removing points on the edges - Further outlier rejection

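Further explanation (for reading)

Analogous to the Harris corner detector
Compute the Hessian of D
\[ H = \begin{bmatrix} D_{xx} & D_{xy} \\ D_{xy} & D_{yy} \end{bmatrix} \]
where
\[ Tr(H) = D_{xx} + D_{yy} = \alpha + \beta \]
\[ Det(H) = D_{xx} D_{yy} - (D_{xy})^2 = \alpha\beta \]
Remove outliers by evaluating
\[ \frac{Tr(H)^2}{Det(H)} = \frac{(r + 1)^2}{r} \]
where $r = \frac{\alpha}{\beta}$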


Removing points on the edges - Further outlier rejection

The following quantity is minimum when r = 1
It increases with r
\[ \frac{Tr(H)^2}{Det(H)} = \frac{(r + 1)^2}{r} \]
where $r = \frac{\alpha}{\beta}$
Eliminate key points if r > 10


Removing points on the edges - Further outlier rejection

After removal of edge points


Orientation Assignment

Descriptor - Why do we need it?


Orientation Assignment

To achieve rotation invariance
Compute central derivatives, gradient magnitude and direction of L (smoothed image) at the scale of key point (x, y)
\[ m(x, y) = \sqrt{(L(x + 1, y) - L(x - 1, y))^2 + (L(x, y + 1) - L(x, y - 1))^2} \]
\[ \theta(x, y) = \tan^{-1}\left( \frac{L(x, y + 1) - L(x, y - 1)}{L(x + 1, y) - L(x - 1, y)} \right) \]
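
A minimal sketch of these two formulas follows, assuming the smoothed image is stored as nested lists indexed `L[y][x]` and the point is not on the border. It uses `math.atan2` in place of a plain inverse tangent so that the angle covers the full 0° to 360° range and a zero horizontal difference does not cause a division by zero; that substitution and the function name are choices made for this example.

```python
import math


def gradient_at(L, x, y):
    """Gradient magnitude and direction (degrees in [0, 360)) at (x, y)."""
    dx = L[y][x + 1] - L[y][x - 1]  # central difference along x
    dy = L[y + 1][x] - L[y - 1][x]  # central difference along y
    magnitude = math.sqrt(dx * dx + dy * dy)
    theta = math.degrees(math.atan2(dy, dx)) % 360.0
    return magnitude, theta


# Toy 3x3 image whose intensity increases from left to right.
ramp = [[float(x) for x in range(3)] for _ in range(3)]
print(gradient_at(ramp, 1, 1))  # (2.0, 0.0)
```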


Orientation Assignment

Create a weighted direction histogram in a neighborhood of a key point (36 bins, i.e., 360°/10°)


Orientation Assignment

Select the peak as the direction of the key point
Introduce additional key points (same location) at local peaks (within 80% of max peak) of the histogram with different directions
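
The histogram and peak selection can be sketched as below. The assumptions are that each sample is a `(weight, angle_in_degrees)` pair whose weight already combines gradient magnitude and any spatial weighting, and that each peak is reported at its bin centre with no interpolation between bins. Function names and sample values are hypothetical.

```python
def orientation_histogram(samples, num_bins=36):
    """Accumulate (weight, angle_degrees) samples into num_bins bins."""
    bin_width = 360.0 / num_bins
    hist = [0.0] * num_bins
    for weight, theta in samples:
        hist[int((theta % 360.0) // bin_width) % num_bins] += weight
    return hist


def dominant_orientations(hist, peak_ratio=0.8):
    """Bin centres of local peaks that reach peak_ratio of the highest bin."""
    n = len(hist)
    bin_width = 360.0 / n
    top = max(hist)
    peaks = []
    for i, v in enumerate(hist):
        # The histogram is circular, so neighbours wrap around.
        is_local_peak = v > hist[(i - 1) % n] and v > hist[(i + 1) % n]
        if is_local_peak and v >= peak_ratio * top:
            peaks.append((i + 0.5) * bin_width)
    return peaks


samples = [(1.0, 12.0), (0.5, 15.0), (1.3, 95.0), (0.4, 200.0)]
print(dominant_orientations(orientation_histogram(samples)))  # [15.0, 95.0]
```

In this toy example the second peak (1.3) is within 80% of the highest one (1.5), so the key point would be duplicated with a second orientation, while the third (0.4) is ignored.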


Extraction of Local Image Descriptors at Key Points

Compute relative orientation and magnitude in a 16 × 16 neighborhood at key point
Form weighted histogram (8 bin) for 4 × 4 regions
Weight by magnitude and spatial Gaussian
Concatenate 16 histograms in one long vector of 128 dimensions
Example for 8 × 8 to 2 × 2 descriptors
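
The descriptor layout (16 cells × 8 bins = 128 values) can be sketched as follows. This is a simplified illustration, not a complete SIFT descriptor: it assumes the 16 × 16 gradient magnitudes and angles (in degrees, already relative to the key point orientation) are given, it uses a Gaussian window with an assumed sigma of 8 pixels, it skips interpolation between bins, and it normalizes the result to unit length, which the slide does not mention. The function name is hypothetical.

```python
import math


def sift_like_descriptor(magnitude, angle):
    """Build a 128-value vector from 16x16 magnitude and angle grids."""
    descriptor = [0.0] * 128
    for y in range(16):
        for x in range(16):
            cell = (y // 4) * 4 + (x // 4)  # index of the 4x4-pixel cell (0..15)
            # Spatial Gaussian centred on the patch (sigma = 8 is assumed).
            w = math.exp(-((x - 7.5) ** 2 + (y - 7.5) ** 2) / (2 * 8.0 ** 2))
            b = int((angle[y][x] % 360.0) // 45.0) % 8  # 8 bins of 45 degrees
            descriptor[cell * 8 + b] += w * magnitude[y][x]
    norm = math.sqrt(sum(v * v for v in descriptor))
    return [v / norm for v in descriptor] if norm > 0 else descriptor


# Toy patch: unit gradient magnitude everywhere, all pointing at 0 degrees.
mag = [[1.0] * 16 for _ in range(16)]
ang = [[0.0] * 16 for _ in range(16)]
vec = sift_like_descriptor(mag, ang)
print(len(vec))  # 128
```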


Key point matching

Match the key points against a database of key points obtained from training images.
Find the nearest neighbor, i.e. the key point with minimum Euclidean distance.
Efficient nearest-neighbor matching
Looks at the ratio of the distances to the best and 2nd best match.
What if you get ratio = 0.8? It is ambiguous.


Matching local features


Matching local features

To generate candidate matches, find patches that have the most similar appearance or SIFT descriptor
Simplest approach: compare them all, take the closest (or closest k, or within a thresholded distance)

Matching local features

Ambiguous matches

At what distance do we have a good match?


To add robustness to matching, can consider ratio :
distance to best match / distance to second best match
If low, first match looks good.
If high, could be ambiguous match.
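
The ratio test can be sketched with a brute-force search over all candidates. The cut-off of 0.8 is taken from the ratio mentioned earlier in these slides as the point where a match becomes ambiguous; treating anything below it as acceptable is an assumption of this example, as are the function names and the toy descriptors.

```python
import math


def euclidean(a, b):
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def match_with_ratio_test(query, candidates, max_ratio=0.8):
    """Index of the nearest candidate, or None if the match is ambiguous."""
    if len(candidates) < 2:
        raise ValueError("need at least two candidates for the ratio test")
    distances = sorted((euclidean(query, c), i) for i, c in enumerate(candidates))
    (best, best_index), (second, _) = distances[0], distances[1]
    if second == 0:
        return None  # two exact duplicates: cannot tell them apart
    return best_index if best / second < max_ratio else None


query = [1.0, 0.0]
print(match_with_ratio_test(query, [[1.0, 0.1], [0.0, 1.0], [-1.0, 0.0]]))  # 0
print(match_with_ratio_test(query, [[1.0, 0.1], [1.0, -0.1]]))  # None
```

In the second call both candidates are equally close, so the ratio is 1 and the match is rejected.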

Matching local features

Given a feature in I1 , how to find the best match in I2 ?


1 Define distance function that compares two descriptors.
2 Test all the features in I2, find the one with min distance.


Matching local features

L2 Distance

Essentially, comparing two arrays of data
Let $H_1(k)$ and $H_2(k)$ be two arrays of data of length n
\[ d(H_1, H_2) = \sqrt{\sum_k (H_1(k) - H_2(k))^2} \]
The smaller the distance metric, the better the match
A perfect match is achieved when $d(H_1, H_2) = 0$
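
As a small worked example of the L2 distance, the sketch below compares two equal-length lists. The function name is hypothetical.

```python
import math


def l2_distance(h1, h2):
    """Euclidean (L2) distance between two equal-length arrays."""
    if len(h1) != len(h2):
        raise ValueError("arrays must have the same length")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(h1, h2)))


print(l2_distance([1.0, 2.0, 2.0], [1.0, 2.0, 2.0]))  # 0.0 (perfect match)
print(l2_distance([0.0, 0.0], [3.0, 4.0]))            # 5.0
```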


Matching local features

Normalized Correlation

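\[ d(H_1, H_2) = \frac{\sum_k \left[ (H_1(k) - \bar{H}_1)(H_2(k) - \bar{H}_2) \right]}{\sqrt{\sum_k (H_1(k) - \bar{H}_1)^2} \, \sqrt{\sum_k (H_2(k) - \bar{H}_2)^2}} \tag{10} \]

where $\bar{H}_i = \frac{1}{N} \sum_{k=1}^{N} H_i(k)$
The larger the distance metric, the better the match
A perfect match is achieved when $d(H_1, H_2) = 1$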
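
Equation (10) can be sketched as follows. Raising an error for a constant array, where the denominator is zero, is a choice made for this example, and the function name is hypothetical.

```python
import math


def normalized_correlation(h1, h2):
    """Normalized correlation of two equal-length arrays, in [-1, 1]."""
    if len(h1) != len(h2):
        raise ValueError("arrays must have the same length")
    n = len(h1)
    mean1 = sum(h1) / n
    mean2 = sum(h2) / n
    numerator = sum((a - mean1) * (b - mean2) for a, b in zip(h1, h2))
    denominator = (math.sqrt(sum((a - mean1) ** 2 for a in h1))
                   * math.sqrt(sum((b - mean2) ** 2 for b in h2)))
    if denominator == 0:
        raise ValueError("correlation is undefined for a constant array")
    return numerator / denominator


# Scaling an array does not change the score.
print(round(normalized_correlation([1.0, 2.0, 3.0], [2.0, 4.0, 6.0]), 6))  # 1.0
print(round(normalized_correlation([1.0, 2.0, 3.0], [3.0, 2.0, 1.0]), 6))  # -1.0
```

Unlike the L2 distance, this score is unchanged when one array is scaled or shifted by a constant, which makes it less sensitive to brightness and contrast changes.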
