Module 3
Boundary Detection:
1. Fitting Lines and Curves,
2. Active Contours,
3. Hough Transform,
4. Generalized Hough Transform,
5. SIFT Detector:
a. Interest Points,
b. Detecting Blobs,
c. SIFT Detector,
d. SIFT Descriptor,
e. SURF Features
Boundary Detection: Fitting Lines and Curves
Introduction
Boundary detection is a fundamental task in computer vision and image processing, where the goal is to
identify the edges or boundaries of objects within an image. After detecting boundaries, one often needs to
fit geometric primitives like lines and curves to these boundaries for further analysis, such as object
recognition, shape analysis, and scene understanding.
Key Concepts:
1. Edge Detection: The first step in boundary detection, which involves identifying points in the image
where the intensity changes significantly. Common techniques include Sobel, Canny, and Laplacian
operators.
2. Line Fitting: Once edges are detected, fitting a line involves finding a straight line that best
represents the boundary points. This is often done using methods like the Hough Transform or least
squares fitting.
3. Curve Fitting: In some cases, the boundaries are not straight lines but curves. Curve fitting involves
finding a mathematical function that best describes the curve, such as polynomial fitting or spline
fitting.
Mathematical Formulation
1. Line Fitting using Least Squares:
Suppose you have a set of points (x₁, y₁), (x₂, y₂), …, (xₙ, yₙ) that lie on or near a line. The equation
of the line can be written as y = mx + c, where m is the slope and c is the intercept.
The goal is to minimize the sum of the squared differences between the observed values and the
values predicted by the line:
S(m, c) = \sum_{i=1}^{n} \left( y_i - (m x_i + c) \right)^2
To find m and c, take the partial derivatives of S(m,c) with respect to m and c, and set them to
zero. Solving the resulting system of equations gives
m = \frac{n \sum x_i y_i - \sum x_i \sum y_i}{n \sum x_i^2 - (\sum x_i)^2}, \qquad c = \frac{\sum y_i - m \sum x_i}{n}
2. Hough Transform for Line Detection:
In the Hough Transform, each point in the image space is transformed to a line in the parameter
space (typically m and c in y=mx+c).
The intersection of these lines in the parameter space corresponds to the most likely line in the
image space.
The Hough Transform is particularly robust to noise and is commonly used in detecting lines in
edge maps.
3. Curve Fitting:
Polynomial Fitting: For curve fitting, a common approach is to fit a polynomial of degree n to
the data points:
y = a_0 + a_1 x + a_2 x^2 + \dots + a_n x^n
The coefficients a₀, a₁, …, aₙ are found by minimizing the sum of the squared errors, similar to the
line fitting approach.
4. Spline Fitting:
Splines are piecewise polynomials that can fit data more flexibly than a single polynomial. A
cubic spline, for example, fits a piecewise cubic polynomial between each pair of data points,
ensuring smoothness at the joins.
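As a brief illustration, the sketch below fits a cubic spline with SciPy's CubicSpline; the boundary points are made-up values, and the spline is piecewise cubic and smooth at the joins:

```python
import numpy as np
from scipy.interpolate import CubicSpline

# Hypothetical boundary points from an edge map (x must be increasing).
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.0, 2.2, 1.8, 3.5, 3.0])

spline = CubicSpline(x, y)        # piecewise cubic, smooth at the knots

# Evaluate the fitted curve densely between the data points.
xs = np.linspace(0, 4, 9)
print(np.round(spline(xs), 3))
```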
Example 1
Find the equation of the best-fit line using the least squares method for the five boundary points: (1,2), (2,3),
(3,5), (4,4), (5,6).
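This exercise can be checked numerically; a short NumPy sketch (np.polyfit with degree 1 solves exactly the least-squares problem formulated above):

```python
import numpy as np

# Boundary points from Example 1.
x = np.array([1, 2, 3, 4, 5], dtype=float)
y = np.array([2, 3, 5, 4, 6], dtype=float)

# Degree-1 polyfit minimizes sum((y - (m*x + c))**2).
m, c = np.polyfit(x, y, 1)
print(f"y = {m:.2f}x + {c:.2f}")   # expected: y = 0.90x + 1.30
```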
Example 2
Given three points on a boundary: (1,2), (2,3), (3,6). Fit a quadratic curve y = ax² + bx + c to these points.
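Three points determine a, b, c exactly, so the fit reduces to solving a 3×3 linear system; a minimal NumPy sketch:

```python
import numpy as np

# Points from Example 2: (1,2), (2,3), (3,6).
pts = np.array([(1, 2), (2, 3), (3, 6)], dtype=float)
x, y = pts[:, 0], pts[:, 1]

# Each point gives one equation a*x^2 + b*x + c = y.
A = np.column_stack([x**2, x, np.ones_like(x)])
a, b, c = np.round(np.linalg.solve(A, y), 6)
print(f"a={a}, b={b}, c={c}")   # a=1.0, b=-2.0, c=3.0, i.e. y = x^2 - 2x + 3
```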
Example 3
Given three points on a boundary: (1, 4), (2, 5), (3, 8), fit a quadratic curve y = ax² + bx + c to these points.
Solution:
Step 1: Write Down the System of Equations
For each point:
(1, 4): a + b + c = 4
(2, 5): 4a + 2b + c = 5
(3, 8): 9a + 3b + c = 8
Step 2: Solve the System
Subtracting the first equation from the second gives 3a + b = 1, and the second from the third gives 5a + b = 3, so a = 1, b = −2, c = 5. The fitted curve is y = x² − 2x + 5.
Example 4
Given three points on a boundary: (0, 1), (1, 0), (2, 3), fit a quadratic curve y = ax² + bx + c to these points.
Solution:
Step 1: Write Down the System of Equations
For each point:
(0, 1): c = 1
(1, 0): a + b + c = 0
(2, 3): 4a + 2b + c = 3
Step 2: Solve the System
With c = 1, the remaining equations give a + b = −1 and 2a + b = 1, so a = 2, b = −3. The fitted curve is y = 2x² − 3x + 1.
Example 5
Given three points on a boundary: (-1, 0), (1, 2), (3, 6), fit a quadratic curve y = ax² + bx + c to these points.
Solution:
Step 1: Write Down the System of Equations
For each point:
(−1, 0): a − b + c = 0
(1, 2): a + b + c = 2
(3, 6): 9a + 3b + c = 6
Step 2: Solve the System
Subtracting the first equation from the second gives 2b = 2, so b = 1; then a + c = 1 and 9a + c = 3, giving a = 0.25, c = 0.75. The fitted curve is y = 0.25x² + x + 0.75.
Example 6
Given three points on a boundary: (1, 1), (2, 0), (3, 3), fit a quadratic curve y = ax² + bx + c to these points.
Solution:
Step 1: Write Down the System of Equations
For each point:
(1, 1): a + b + c = 1
(2, 0): 4a + 2b + c = 0
(3, 3): 9a + 3b + c = 3
Step 2: Solve the System
Successive subtractions give 3a + b = −1 and 5a + b = 3, so a = 2, b = −7, c = 6. The fitted curve is y = 2x² − 7x + 6.
Example 7
Given three points on a boundary: (-1, 2), (0, -1), (1, 0), fit a quadratic curve y = ax² + bx + c to these points.
Solution:
Step 1: Write Down the System of Equations
For each point:
(−1, 2): a − b + c = 2
(0, −1): c = −1
(1, 0): a + b + c = 0
Step 2: Solve the System
With c = −1, the remaining equations give a − b = 3 and a + b = 1, so a = 2, b = −1. The fitted curve is y = 2x² − x − 1.
Example 8
Find the equation of the best-fit line using the least squares method for the five boundary points: (2, 4), (3,
7), (4, 8), (5, 11), (6, 13).
Example 9
Find the equation of the best-fit line using the least squares method for the four boundary points: (1, 3), (2,
5), (3, 7), (4, 9).
Example 10
Find the equation of the best-fit line using the least squares method for the five boundary points: (1,1), (2,2),
(3,1), (4,3), (5,4).
Example 11
Find the equation of the best-fit line using the least squares method for the four boundary points: (2,5), (4,7),
(6,8), (8,9).
Active Contours
Active contours, commonly referred to as "snakes," are a widely used method in image processing and
computer vision for object boundary detection. The concept was introduced by Kass, Witkin, and
Terzopoulos in 1988. An active contour is a deformable model that evolves under the influence of internal forces
(coming from the shape of the contour itself) and external forces (coming from the image data), to find
object boundaries.
Key Concepts:
Internal Forces: These forces keep the contour smooth during its evolution. They are derived from
the properties of the curve itself.
External Forces: These forces are derived from the image data and are used to pull the contour
towards the edges of the object.
Energy Minimization: The active contour evolves to minimize an energy functional that combines
the internal and external forces. The contour reaches its final position when the energy is minimized,
ideally at the object's boundary.
Mathematical Formulation
The active contour model can be formulated as an energy minimization problem. The energy functional
E_snake for a contour C(s), parameterized by s (where s typically ranges from 0 to 1), is given by:
E_{snake} = \int_0^1 \left[ E_{int}(C(s)) + E_{ext}(C(s)) \right] ds
where:
Eint(C(s)) represents the internal energy, which controls the smoothness of the contour.
Eext(C(s)) represents the external energy, which attracts the contour towards image features like
edges.
1. Internal Energy: The internal energy is typically composed of two terms: the tension (first
derivative of the contour) and the bending (second derivative of the contour):
E_{int}(C(s)) = \frac{1}{2} \left( \alpha \, |C'(s)|^2 + \beta \, |C''(s)|^2 \right)
α controls the elasticity (tension), which encourages the contour to maintain its shape.
β controls the rigidity (bending), which encourages the contour to be smooth.
2. External Energy: The external energy is derived from the image and is often based on the image
gradient. It is designed to attract the contour towards edges:
E_{ext}(C(s)) = - \, |\nabla I(C(s))|^2
where ∇I is the image gradient.
3. Total Energy: The total energy that the snake minimizes is:
E_{snake} = \int_0^1 \left[ E_{int}(C(s)) + E_{ext}(C(s)) \right] ds
The contour evolves over time to minimize this energy. The evolution can be done iteratively using
techniques like gradient descent.
Example 1
Consider a grayscale image with a circular object centered at (50, 50) with a radius of 20 pixels. The image
is 100×100 pixels. Use an active contour (snake) to detect the boundary of the circle. The initial contour is a
circle centered at (50, 50) with a radius of 15 pixels.
Parameters for the snake:
α=0.1 (internal energy weight)
β=0.4 (smoothing term)
γ=0.9 (step size)
Number of iterations: 5
Solution:
We will consider some sample points on the initial circle with radius 15 centered at (50,50):
Contour Points (Initial):
o P1 = (50, 50−15) = (50, 35)
o P2 = (50+15, 50) = (65, 50)
o P3 = (50, 50+15) = (50, 65)
o P4 = (50−15, 50) = (35, 50)
The goal is to iteratively update these points using the active contour parameters.
Active Contour Parameters:
o α = 0.1 (internal energy weight)
o β = 0.4 (smoothing term weight, assumed negligible in the calculations for simplicity)
o γ = 0.9 (step size)
o Number of iterations = 5
Assumed Image Gradients for each point in every iteration (to simulate external energy):
o P1 = (50, 35): ∇I = (−2, 3)
o P2 = (65, 50): ∇I = (−3, 2)
o P3 = (50, 65): ∇I = (2, −2)
o P4 = (35, 50): ∇I = (3, −2)
Let’s now show the updates for each point over all 5 iterations.
Iteration 1
1. Point P1 = (50, 35)
Elasticity Gradient:
o Neighboring Points: P4 = (35, 50) and P2 = (65, 50).
o Average Position: (50, 50)
o Elasticity Gradient: (50, 50) − (50, 35) = (0, 15).
External Energy Gradient:
o Given: ∇I = (−2, 3).
Total Energy Gradient:
∇E_total = 0.1 ⋅ (0, 15) + (−2, 3) = (0, 1.5) + (−2, 3) = (−2, 4.5)
Update (each point moves a step of size γ along its combined gradient, treated as a force):
P1_new = P1 + γ ⋅ ∇E_total = (50, 35) + 0.9 ⋅ (−2, 4.5) = (48.2, 39.05)
The other three points, and the remaining iterations, follow the same pattern, as sketched in the code below.
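A minimal NumPy sketch of the full update loop, under the example's simplifying assumptions (β neglected, constant per-point image gradients, the additive update used above); iteration 1 reproduces the numbers just computed:

```python
import numpy as np

# Contour points in order (P1..P4) and the assumed constant image
# gradients from the example; both are illustrative values, not
# quantities measured from a real image.
pts = np.array([(50.0, 35.0), (65.0, 50.0), (50.0, 65.0), (35.0, 50.0)])
grads = np.array([(-2.0, 3.0), (-3.0, 2.0), (2.0, -2.0), (3.0, -2.0)])

alpha, gamma, n_iters = 0.1, 0.9, 5    # beta neglected, as in the example

for _ in range(n_iters):
    # Elasticity term: pull each point toward the average of its neighbours.
    elastic = 0.5 * (np.roll(pts, 1, axis=0) + np.roll(pts, -1, axis=0)) - pts
    # Combined gradient (force) and update step.
    pts = pts + gamma * (alpha * elastic + grads)

print(np.round(pts, 2))   # after iteration 1, P1 is (48.2, 39.05)
```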
Example 2
Consider a grayscale image with an elliptical object centered at (60, 40) with a semi-major axis of 30 pixels
and a semi-minor axis of 20 pixels. The image size is 100×100 pixels. Use an active contour (snake) to
detect the boundary of the ellipse. The initial contour is an ellipse centered at (60, 40) with a semi-major
axis of 25 pixels and a semi-minor axis of 15 pixels.
Parameters for the snake:
α=0.2 (internal energy weight)
β=0.3 (smoothing term)
γ=0.8 (step size)
Number of iterations: 5
Example 3
Consider a grayscale image with a circular object centered at (50, 50) with a radius of 20 pixels. The image
size is 100×100 pixels. Use an active contour (snake) to detect the boundary of the circle. The initial contour
is a smaller circle centered at (50, 50) with a radius of 15 pixels.
Parameters for the snake:
α=0.4 (internal energy weight)
β=0.2 (smoothing term)
γ=0.9 (step size)
Number of iterations: 10
Hough Transform
The Hough Transform is a feature extraction technique used in image analysis, computer vision, and digital
image processing. It is particularly useful for detecting simple shapes, such as lines, circles, and ellipses, in
an image. The Hough Transform works by transforming the image space into a parameter space, where the
detection of shapes becomes easier. For instance, a straight line in an image can be represented as a point in
Hough space.
Key Concepts:
Image Space: The original image where objects are located.
Hough Space (Parameter Space): A transformed space where geometric shapes like lines and
circles correspond to points, making it easier to detect them.
Mathematical Formulation
1. Line Detection: For line detection, the Hough Transform uses the polar representation of a line:
ρ=xcosθ+ysinθ
where:
ρ is the perpendicular distance from the origin to the line.
θ is the angle between the x-axis and the perpendicular line from the origin to the line.
(x,y) are the coordinates of a point in the image.
Each point (x,y) in the image can be represented by a sinusoidal curve in (ρ,θ) space. Points that lie on the
same line in the image will produce sinusoids that intersect at a single point in Hough space.
2. Circle Detection: For detecting circles, the Hough Transform uses the equation of a circle:
(x - a)^2 + (y - b)^2 = r^2
where:
(a,b) is the center of the circle.
r is the radius of the circle.
(x,y) are the coordinates of a point on the circle.
In this case, the Hough space is three-dimensional, consisting of (a, b, r).
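A sketch of the voting step when the radius is known, which reduces the search to the 2-D centre space (a, b); the edge points and grid size below are synthetic:

```python
import numpy as np

def hough_circle_votes(points, radius, shape):
    """Vote for circle centres (a, b), assuming the radius is known."""
    acc = np.zeros(shape, dtype=int)
    phis = np.deg2rad(np.arange(0, 360, 1))
    for x, y in points:
        # Candidate centres lie on a circle of the same radius around (x, y).
        a = np.rint(x - radius * np.cos(phis)).astype(int)
        b = np.rint(y - radius * np.sin(phis)).astype(int)
        ok = (a >= 0) & (a < shape[0]) & (b >= 0) & (b < shape[1])
        ab = np.unique(np.stack([a[ok], b[ok]], axis=1), axis=0)
        acc[ab[:, 0], ab[:, 1]] += 1   # one vote per point per candidate centre
    return acc

# Synthetic edge points on a circle of radius 5 centred at (18, 20).
points = [(23, 20), (13, 20), (18, 25), (18, 15)]
acc = hough_circle_votes(points, radius=5, shape=(40, 40))
print(np.unravel_index(acc.argmax(), acc.shape))   # -> (18, 20)
```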
Applications
Line Detection: Detecting road markings, edges of objects, and lanes in autonomous driving.
Circle Detection: Identifying circular objects like coins, wheels, or pupils in images.
Ellipse Detection: Detecting elliptical shapes in various applications, such as eye detection in facial
recognition.
Example 1
You have an image with the following edge points:
(1,2)
(2,3)
(3,4)
(4,5)
We'll detect the line that passes through these points using the Hough Transform.
Solution:
Each edge point (x, y) is mapped to the sinusoid ρ = x cosθ + y sinθ in (ρ, θ) space. Evaluating this for all
four points, the four sinusoids intersect at a single point: θ = 135°, ρ = √2/2 ≈ 0.707. Converting back,
x cos135° + y sin135° = 0.707 is exactly the line y = x + 1.
Conclusion:
In this example, the Hough Transform detected the line y=x+1 that passes through the points (1,2), (2,3),
(3,4), and (4,5). The calculations at each step show how the Hough Transform is used to detect lines by
transforming the problem from the image space to the Hough space and then identifying the parameters that
define the line.
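The voting procedure can be sketched in a few lines of NumPy; run on the four points above, the accumulator peaks at θ = 135°, ρ ≈ 0.70 (the bin sizes are illustrative choices):

```python
import numpy as np

points = [(1, 2), (2, 3), (3, 4), (4, 5)]     # edge points from Example 1
thetas = np.deg2rad(np.arange(0, 180, 1))     # theta sampled every 1 degree
rho_res, rho_max = 0.1, 10.0                  # rho bin width and range
n_rho = int(2 * rho_max / rho_res) + 1

acc = np.zeros((n_rho, len(thetas)), dtype=int)
for x, y in points:
    rhos = x * np.cos(thetas) + y * np.sin(thetas)    # one sinusoid per point
    bins = np.rint((rhos + rho_max) / rho_res).astype(int)
    acc[bins, np.arange(len(thetas))] += 1            # vote along the sinusoid

r_idx, t_idx = np.unravel_index(acc.argmax(), acc.shape)
rho = r_idx * rho_res - rho_max
theta = np.rad2deg(thetas[t_idx])
print(f"rho = {rho:.2f}, theta = {theta:.0f} deg")    # ~ (0.70, 135)
```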
Example 2
You have an image with the following edge points:
(5, 8)
(6, 9)
(7, 10)
(8, 11)
We'll detect the line that passes through these points using the Hough Transform.
Example 3
You have an image with the following edge points corresponding to a circular object:
(15, 20)
(18, 24)
(21, 20)
(18, 16)
Assume the radius of the circle is known to be 5 pixels. Detect the circle using the Hough Transform.
Example 4
An image has multiple edge points corresponding to several lines. The edge points are:
(2, 3)
(4, 6)
(5, 8)
(7, 14)
(10, 20)
(12, 24)
Use the Hough Transform to detect the lines.
Generalized Hough Transform
Introduction:
The Generalized Hough Transform (GHT) is an extension of the standard Hough Transform. While the
traditional Hough Transform is mainly used to detect simple shapes like lines and circles in an image, the
Generalized Hough Transform allows for the detection of arbitrary shapes, particularly when the shape can
be represented by a template. This method is especially useful when the shape to be detected does not have a
simple parametric representation.
Mathematical Formulation:
1. Template Definition:
o Define a reference shape (template) using a set of boundary points P = {(xᵢ, yᵢ)}.
o For each boundary point (xᵢ, yᵢ), compute a vector from a reference point
(usually the centroid of the shape) to the boundary point.
2. R-Table Construction:
o For each boundary point (xᵢ, yᵢ) on the template, compute the gradient
direction θᵢ at that point.
o The vector from the reference point (centroid) to the boundary point is stored in an R-Table,
indexed by the gradient direction θᵢ.
3. Detection in the Image:
o For each edge point in the image, compute its gradient direction.
o Use the gradient direction to look up the corresponding vectors in the R-Table.
o Vote for possible locations of the reference point (centroid) of the shape in the image.
4. Accumulator Array:
o An accumulator array is used to record votes for the position of the reference point in the
image.
o Peaks in the accumulator array indicate likely locations of the shape in the image.
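As a concrete toy illustration of the R-Table construction and voting steps above, the following sketch detects a translated copy of a template; the diamond-shaped template, its gradient angles, and the grid size are all made-up inputs:

```python
import numpy as np
from collections import defaultdict

N_BINS = 36   # quantize gradient direction into 10-degree bins

def bin_of(theta):
    return int(theta % 360) * N_BINS // 360

def build_r_table(boundary, reference):
    """boundary: list of ((x, y), gradient_angle_deg) for the template."""
    table = defaultdict(list)
    rx, ry = reference
    for (x, y), theta in boundary:
        table[bin_of(theta)].append((rx - x, ry - y))  # vector point -> reference
    return table

def vote(edges, table, shape):
    """edges: list of ((x, y), gradient_angle_deg) from the target image."""
    acc = np.zeros(shape, dtype=int)
    for (x, y), theta in edges:
        for dx, dy in table[bin_of(theta)]:
            a, b = x + dx, y + dy                      # candidate reference point
            if 0 <= a < shape[0] and 0 <= b < shape[1]:
                acc[a, b] += 1
    return acc

# Toy template: 4 boundary points of a diamond, reference at its centre (5, 5).
template = [((5, 0), 270), ((10, 5), 0), ((5, 10), 90), ((0, 5), 180)]
table = build_r_table(template, reference=(5, 5))

# The same shape translated by (+7, +3) in the target image.
edges = [((12, 3), 270), ((17, 8), 0), ((12, 13), 90), ((7, 8), 180)]
acc = vote(edges, table, shape=(30, 30))
print(np.unravel_index(acc.argmax(), acc.shape))   # -> (12, 8), the shifted centre
```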
Applications:
Object Recognition: The GHT can be used to recognize complex objects in images, especially when
the object may be rotated, translated, or even partially occluded.
Medical Imaging: Detecting specific anatomical structures in medical images where the structure
has a complex shape.
Automated Industrial Inspection: Detecting specific parts or defects in machinery components.
SIFT Detector: Interest Points, Detecting Blobs, SIFT
Detector, SIFT Descriptor
1. Interest Points
Interest points, also known as keypoints, are specific points in an image that are stable and distinctive. These
points are often edges, corners, or blobs that can be consistently identified across different images and
scales. Interest points are crucial in tasks such as object recognition, image stitching, and 3D reconstruction
because they provide a reliable way to match features between images.
2. Detecting Blobs
Blobs are regions in an image that have similar intensity or color. Detecting blobs involves identifying these
homogeneous regions, which can vary in size, shape, and intensity. Blob detection is a foundational
technique in computer vision, used for identifying regions of interest in an image.
Laplacian of Gaussian (LoG): The Laplacian of Gaussian is a common method for blob detection.
It involves smoothing the image with a Gaussian filter and then applying the Laplacian operator to
detect areas where the intensity changes significantly.
Difference of Gaussian (DoG): This method approximates the LoG by subtracting two Gaussian-
blurred versions of the image with different standard deviations. The DoG is computationally
efficient and is used in the SIFT algorithm for detecting keypoints.
3. SIFT Detector
The Scale-Invariant Feature Transform (SIFT) detector is a highly robust algorithm for detecting keypoints
in an image that are invariant to scale, rotation, and slight changes in illumination. SIFT first detects
keypoints using the DoG method and then refines these points to ensure stability and distinctiveness.
Steps in SIFT Detector:
1. Scale-space Construction: Generate a scale-space by progressively smoothing the image with
Gaussian filters of increasing standard deviations.
2. Keypoint Detection: Apply the Difference of Gaussian (DoG) method to detect keypoints across
different scales.
3. Keypoint Localization: Refine the detected keypoints by eliminating those with low contrast or that
lie along edges.
4. Orientation Assignment: Assign an orientation to each keypoint based on the local image gradient,
ensuring rotational invariance.
4. SIFT Descriptor
Once the keypoints are detected, the SIFT descriptor is used to describe the local image region around each
keypoint. The descriptor encodes the appearance of the keypoint in a way that is invariant to scale, rotation,
and illumination changes.
Steps in SIFT Descriptor:
1. Gradient Calculation: Compute the gradient magnitude and orientation in a 16x16 neighborhood
around the keypoint.
2. Orientation Histogram: Divide the neighborhood into 4x4 sub-regions, and for each, compute an
orientation histogram with 8 bins.
3. Descriptor Vector: The histograms from all sub-regions are concatenated to form a 128-
dimensional descriptor vector.
Applications:
Object Recognition: SIFT is widely used in object recognition tasks due to its robustness in
matching keypoints across different images.
Image Stitching: SIFT features are used to align images in panorama creation by matching
keypoints across overlapping images.
3D Reconstruction: SIFT keypoints are used in Structure from Motion (SfM) to reconstruct 3D
models from 2D images.
Tracking: SIFT features can be used to track objects across video frames.
Mathematical Formulation:
Scale-space Construction:
L(x, y, σ) = G(x, y, σ) ∗ I(x, y)
where:
L(x,y,σ) is the scale-space representation.
G(x,y,σ) is a Gaussian function with standard deviation σ.
I(x,y) is the input image.
Difference of Gaussian (DoG):
D(x, y, σ) = L(x, y, kσ) − L(x, y, σ)
where k is a constant multiplicative factor between successive scales.
Keypoint Localization:
The keypoints are found by locating the maxima and minima of the DoG function.
Orientation Assignment:
The gradient magnitude and orientation of the scale-space image are computed as
m(x, y) = \sqrt{(L(x+1, y) - L(x-1, y))^2 + (L(x, y+1) - L(x, y-1))^2}
\theta(x, y) = \tan^{-1}\left(\frac{L(x, y+1) - L(x, y-1)}{L(x+1, y) - L(x-1, y)}\right)
and each keypoint is assigned the dominant peak of an orientation histogram built from these values in its neighborhood.
Descriptor Calculation:
For each keypoint, an orientation histogram is constructed with 8 bins, and the final descriptor is a 128-
dimensional vector.
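In practice these steps are not hand-coded; a minimal usage sketch with OpenCV's SIFT implementation is shown below (the file name example.jpg is a placeholder):

```python
import cv2

# Load a grayscale image (the path is illustrative).
img = cv2.imread("example.jpg", cv2.IMREAD_GRAYSCALE)

# Detect keypoints and compute 128-dimensional SIFT descriptors.
sift = cv2.SIFT_create()
keypoints, descriptors = sift.detectAndCompute(img, None)

print(f"{len(keypoints)} keypoints, descriptor shape {descriptors.shape}")
# Each row of `descriptors` is one 128-D vector, as described above.
```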
Example 1
You are given two small grayscale images, I1 and I2. Each image contains several distinct points
(keypoints). Your task is to use the SIFT algorithm to find corresponding points (matches) between the two
images.
Consider the following descriptors (each descriptor is a simplified 3-dimensional vector) for the keypoints in
the two images:
Image I1 Keypoints:
o k1: Descriptor = [1, 3, 2]
o k2: Descriptor = [4, 4, 3]
o k3: Descriptor = [2, 1, 4]
Image I2 Keypoints:
o k1: Descriptor = [2, 1, 3]
o k2′: Descriptor = [1, 3, 2]
o k3′: Descriptor = [4, 4, 3]
Task: Find the best match for each keypoint in I1 with the keypoints in I2 using Euclidean distance.
Solution:
Step 1: Calculate Euclidean Distances Between Descriptors
For k1 = [1, 3, 2]: d(k1, k1′) = √6 ≈ 2.45, d(k1, k2′) = 0, d(k1, k3′) = √11 ≈ 3.32.
For k2 = [4, 4, 3]: d(k2, k1′) = √13 ≈ 3.61, d(k2, k2′) = √11 ≈ 3.32, d(k2, k3′) = 0.
For k3 = [2, 1, 4]: d(k3, k1′) = 1, d(k3, k2′) = 3, d(k3, k3′) = √14 ≈ 3.74.
Step 2: Pick the Nearest Neighbour
Best matches: k1 → k2′, k2 → k3′, k3 → k1′.
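These distance computations are easy to verify with NumPy; the sketch below reproduces Example 1:

```python
import numpy as np

d1 = np.array([[1, 3, 2], [4, 4, 3], [2, 1, 4]], dtype=float)   # k1, k2, k3
d2 = np.array([[2, 1, 3], [1, 3, 2], [4, 4, 3]], dtype=float)   # k1', k2', k3'

# Pairwise Euclidean distance matrix: dist[i, j] = ||d1[i] - d2[j]||.
dist = np.linalg.norm(d1[:, None, :] - d2[None, :, :], axis=2)
print(np.round(dist, 2))
print("best matches:", dist.argmin(axis=1))   # expected: [1 2 0]
```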
Example 2:
You are given two small grayscale satellite images, S1 and S2. Each image contains distinct
geographical features (keypoints). Your task is to use the SIFT algorithm to find corresponding points
(matches) between the two images.
Consider the following descriptors (each descriptor is a simplified 3-dimensional vector) for the keypoints in
the two images:
Image S1 Keypoints:
o p1: Descriptor = [4, 1, 2]
o p2: Descriptor = [3, 5, 4]
o p3: Descriptor = [1, 1, 3]
Image S2 Keypoints:
o p1′: Descriptor = [5, 2, 1]
o p2′: Descriptor = [4, 1, 2]
o p3′: Descriptor = [3, 5, 4]
Task: Find the best match for each keypoint in S1 with the keypoints in S2 using Euclidean distance.
Solution:
Step 1: Calculate Euclidean Distances Between Descriptors
For p1 = [4, 1, 2]: d(p1, p1′) = √3 ≈ 1.73, d(p1, p2′) = 0, d(p1, p3′) = √21 ≈ 4.58.
For p2 = [3, 5, 4]: d(p2, p1′) = √22 ≈ 4.69, d(p2, p2′) = √21 ≈ 4.58, d(p2, p3′) = 0.
For p3 = [1, 1, 3]: d(p3, p1′) = √21 ≈ 4.58, d(p3, p2′) = √10 ≈ 3.16, d(p3, p3′) = √21 ≈ 4.58.
Step 2: Pick the Nearest Neighbour
Best matches: p1 → p2′ (distance 0), p2 → p3′ (distance 0), p3 → p2′ (distance √10 ≈ 3.16).
Example 3
You are given two grayscale images, O1 and O2, containing different objects. Your task is to use the SIFT
algorithm to find corresponding points (matches) between the two images.
Consider the following descriptors (each descriptor is a simplified 3-dimensional vector) for the keypoints in
the two images:
Image O1 Keypoints:
o q1: Descriptor = [2, 2, 5]
o q2: Descriptor = [1, 3, 3]
o q3: Descriptor = [4, 4, 1]
Image O2 Keypoints:
o q1′: Descriptor = [3, 3, 4]
o q2′: Descriptor = [2, 2, 5]
o q3′: Descriptor = [1, 3, 3]
Task: Find the best match for each keypoint in O1 with the keypoints in O2 using Euclidean distance.
Solution:
Step 1: Calculate Euclidean Distances Between Descriptors
For q1 = [2, 2, 5]: d(q1, q1′) = √3 ≈ 1.73, d(q1, q2′) = 0, d(q1, q3′) = √6 ≈ 2.45.
For q2 = [1, 3, 3]: d(q2, q1′) = √5 ≈ 2.24, d(q2, q2′) = √6 ≈ 2.45, d(q2, q3′) = 0.
For q3 = [4, 4, 1]: d(q3, q1′) = √11 ≈ 3.32, d(q3, q2′) = √24 ≈ 4.90, d(q3, q3′) = √14 ≈ 3.74.
Step 2: Pick the Nearest Neighbour
Best matches: q1 → q2′ (distance 0), q2 → q3′ (distance 0), q3 → q1′ (distance √11 ≈ 3.32).
SURF Features
SURF (Speeded-Up Robust Features) is a robust local feature detector and descriptor that is faster and more
efficient than SIFT (Scale-Invariant Feature Transform). It was introduced by Herbert Bay, Tinne
Tuytelaars, and Luc Van Gool in 2006. SURF is used in computer vision tasks like object recognition, image
registration, and 3D reconstruction.
Key Characteristics of SURF:
1. Speed: SURF uses an integral image and approximates the determinant of the Hessian matrix,
making it much faster than SIFT.
2. Robustness: SURF is robust to image scaling, rotation, and, to some extent, changes in illumination
and viewpoint.
3. Scale Invariance: Like SIFT, SURF is scale-invariant, meaning it can detect the same features in
images at different scales.
Mathematical Formulation of SURF
1. Integral Image: The first step in SURF is computing the integral image, which allows for fast
computation of box-type convolution filters. The integral image at a point (x, y) is defined as:
I_{\Sigma}(x, y) = \sum_{i=0}^{x} \sum_{j=0}^{y} I(i, j)
where I(i, j) is the pixel intensity at position (i, j).
2. Hessian Matrix-Based Detector: SURF detects keypoints using the Hessian matrix. For a point
x = (x, y) in an image, the Hessian matrix H(x, σ) at scale σ is defined as:
H(x, \sigma) = \begin{bmatrix} L_{xx}(x, \sigma) & L_{xy}(x, \sigma) \\ L_{xy}(x, \sigma) & L_{yy}(x, \sigma) \end{bmatrix}
where L_xx(x, σ), L_xy(x, σ), and L_yy(x, σ) are the convolutions of the Gaussian second-order derivatives
with the image at scale σ.
3. Keypoint Detection: The determinant of the Hessian matrix, approximated with box filters D_xx, D_yy, D_xy, is used to detect interest points:
\det(H_{approx}) = D_{xx} D_{yy} - (0.9 \, D_{xy})^2
The keypoints are detected at the local maxima of this determinant across space and scale.
Orientation Assignment: To achieve rotation invariance, an orientation is assigned to each keypoint.
The dominant orientation is found by calculating the Haar wavelet responses in a circular region around the
keypoint.
Descriptor Construction: SURF constructs a descriptor by summarizing the information in a
neighborhood around the keypoint. The descriptor is built using the sum of the Haar wavelet responses in
4x4 sub-regions.
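A short pure-NumPy sketch of the integral-image idea behind SURF's speed: once the integral image is built, any box sum costs only four array lookups (the test image is arbitrary):

```python
import numpy as np

def integral_image(img):
    """II(x, y) = sum of img[i, j] for all i <= x, j <= y."""
    return img.cumsum(axis=0).cumsum(axis=1)

def box_sum(ii, r0, c0, r1, c1):
    """Sum of img[r0:r1+1, c0:c1+1] from at most 4 integral-image lookups."""
    total = ii[r1, c1]
    if r0 > 0: total -= ii[r0 - 1, c1]
    if c0 > 0: total -= ii[r1, c0 - 1]
    if r0 > 0 and c0 > 0: total += ii[r0 - 1, c0 - 1]
    return total

img = np.arange(16, dtype=float).reshape(4, 4)
ii = integral_image(img)
print(box_sum(ii, 1, 1, 2, 2), img[1:3, 1:3].sum())   # both 30.0
```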