Lec 1 - 2
Introduction
Definition
“Computer Vision is an interdisciplinary field of study that empowers
computers to interpret and understand visual information from the
world, much like humans do with their sense of vision.”
• It involves the development of algorithms and techniques;
• Enabling computers to process, analyze, and interpret images and videos
• These algorithms allow machines to extract meaningful information from
visual data
• Leading to a wide range of applications in various industries.
Applications
1. Image and Video Analysis:
• Enables machines to identify objects, track motion, detect patterns, and even recognize faces in images and videos;
• This has applications in security, surveillance, and content analysis.
2. Medical Imaging:
• Diagnosing diseases from medical images (e.g., X-rays, MRIs, CT scans) and segmenting organs and tissues;
• It aids in early disease detection and treatment planning.
3. Autonomous Vehicles:
• Self-driving cars and autonomous drones heavily rely on Computer Vision for perception
• These systems use computer vision algorithms to detect pedestrians, recognize traffic signs, and navigate safely.
4. Industrial Automation:
• Computer Vision is crucial in industrial automation for quality control and inspection tasks
• It can identify defects in manufacturing processes, such as flaws in products on an assembly line.
5. Agriculture:
• In agriculture, computer vision is used for crop monitoring, yield prediction, and disease detection;
• Drones equipped with cameras can capture images of large agricultural fields and provide insights to farmers.
6. Robotics:
• Robots equipped with computer vision systems can navigate in unstructured environments, pick and place objects, and interact with humans;
• This has applications in manufacturing, healthcare, and even space exploration.
7. Human-Computer Interaction:
• Gesture recognition and facial expression analysis enable natural and intuitive interactions between humans and computers, improving user experience.
Importance
Automation: Computer Vision enables automation in various industries, reducing the need for manual labor
and enhancing efficiency.
For instance, it allows factories to inspect products 24/7 without human intervention.
Accuracy: Computer Vision systems can perform tasks with high precision and consistency, surpassing
human capabilities in tasks like medical diagnosis or detecting defects in manufacturing.
Safety: In autonomous vehicles and drones, Computer Vision contributes to safety by identifying obstacles,
pedestrians, and other vehicles, potentially reducing accidents.
Productivity: By automating tasks, Computer Vision frees up human resources to focus on more complex
and creative aspects of their work.
Scientific Advancements: In fields like biology, astronomy, and environmental science, Computer Vision
aids researchers in analyzing vast datasets, leading to discoveries and breakthroughs.
Historical Development (1/2)
1950s-1960s: Early Beginnings
• 1956: The Dartmouth Workshop marked the birth of Artificial Intelligence (AI) as a field. While Computer
Vision wasn't explicitly mentioned, it laid the foundation for AI research.
1960s: Early computer vision experiments involved edge detection and simple pattern recognition.
Researchers aimed to replicate human visual perception in machines.
1970s-1980s: Image Processing Emerges
1970s: Researchers like David Marr developed early theories of vision, emphasizing the importance of
understanding how the human visual system works.
1980: The "Pictorial Structures for Object Recognition" paper by Fischler and Elschlager introduced the
idea of modeling objects as arrangements of simple parts.
1980s: The "Vision" textbook by David Marr became influential in the field. Marr's work laid the foundation
for understanding vision as a process involving multiple levels of representation.
Historical Development (2/2)
1990s-2000s: Rise of Feature-Based Methods
1990s: The development of feature-based methods for object recognition and tracking gained prominence. Techniques
like Scale-Invariant Feature Transform (SIFT) emerged.
1999: The publication of "Multiple View Geometry in Computer Vision" by Richard Hartley and Andrew Zisserman
became a seminal work on geometric methods in computer vision.
2009-2010: The ImageNet dataset and its Large Scale Visual Recognition Challenge (ILSVRC) marked a significant
milestone, providing the benchmark that would drive advances in image classification using deep learning.
2010s-Present: Deep Learning Revolution
2012: The ImageNet competition was won by a Convolutional Neural Network (CNN) model developed by Alex
Krizhevsky, sparking the deep learning revolution in computer vision.
2015: The "Deep Residual Learning for Image Recognition" paper introduced ResNet, a deep neural network
architecture that allowed training extremely deep networks efficiently.
2014: The advent of Generative Adversarial Networks (GANs) opened up possibilities in image generation and
manipulation.
Vision Paradigms and Basics of Image Formation
Different Paradigms in Computer Vision
Paradigms in Computer Vision (1/3)
Computer Vision encompasses various paradigms, each addressing different aspects of visual data analysis and
understanding. Here's an overview of these paradigms
1. Image Processing:
• Definition: Image processing focuses on manipulating and enhancing images to extract useful
information or improve their visual quality. It involves a wide range of techniques, including filtering,
segmentation, noise reduction, and image restoration.
• Importance: Image processing is foundational in Computer Vision. It's used to preprocess images
before higher-level tasks, such as object detection or recognition. It's also essential in medical imaging,
satellite image analysis, and more.
2. Object Detection:
• Definition: Object detection is the task of locating and classifying objects within an image or video
sequence. It involves identifying and delineating objects of interest, often using bounding boxes.
• Importance: Object detection has numerous applications, including autonomous vehicles, surveillance,
robotics, and face detection in cameras and smartphones.
Paradigms in Computer Vision (2/3)
3. Image Understanding:
Definition: Image understanding aims to interpret the content and context of images at a higher
semantic level. It involves recognizing objects, scenes, relationships, and even emotions within
images.
Importance: Image understanding is essential in applications like content-based image retrieval,
scene understanding for autonomous robots, and sentiment analysis based on visual content.
4. 3D Computer Vision:
Definition: 3D computer vision deals with the reconstruction and understanding of three-dimensional
scenes from two-dimensional images or video streams. It involves tasks like depth estimation, 3D object
recognition, and 3D scene reconstruction.
Importance: 3D computer vision is critical for applications such as augmented reality, 3D modeling,
autonomous navigation, and robotics.
Paradigms in Computer Vision (3/3)
5. Deep Learning and Neural Networks:
Definition: Deep learning applies multi-layer neural networks, most notably Convolutional Neural Networks (CNNs), that learn
visual features directly from large collections of images rather than relying on hand-crafted features.
Importance: Deep learning underpins current state-of-the-art results in image classification, object detection, and image
generation, and has driven the field's rapid progress since 2012.
Basics of Image Formation (1/2)
1. Pixels:
Definition: Pixels, short for "picture elements," are the smallest units of an image. They are tiny square or rectangular elements,
each containing a single color value. The arrangement of pixels forms the visual content of an image.
Importance: Understanding pixels is fundamental in image processing and computer vision. Pixels are the building blocks of
digital images, and their manipulation enables various image processing operations.
2. Color Spaces:
Definition: Color spaces are mathematical models that represent colors using a set of coordinates or values. They provide a
structured way to describe and manipulate colors in images.
Importance: Color spaces are crucial for tasks like image analysis, color correction, and color-based object detection. Different
color spaces are used for different purposes.
Common Color Spaces:
RGB (Red, Green, Blue): The RGB color space represents colors using combinations of red, green, and blue intensities. It's
widely used in digital displays and cameras.
HSV (Hue, Saturation, Value): HSV separates color information into hue (the type of color), saturation (color intensity),
and value (brightness).
YUV: YUV separates luminance (Y) from chrominance (U and V), making it useful for image compression and
transmission.
CMYK: CMYK is used in printing and represents colors as combinations of cyan, magenta, yellow, and black ink.
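As a minimal, hedged sketch of the pixel and color-space ideas above (the file name image.jpg is only a placeholder, and OpenCV stores loaded images as BGR pixel arrays rather than RGB):
```python
import cv2

# Load an image as a NumPy array of pixels in BGR channel order
# ("image.jpg" is a placeholder path).
bgr = cv2.imread("image.jpg")

print(bgr.shape)   # (rows, columns, 3 channels)
print(bgr[0, 0])   # blue, green, red values of the top-left pixel

rgb = cv2.cvtColor(bgr, cv2.COLOR_BGR2RGB)   # reorder channels to RGB
hsv = cv2.cvtColor(bgr, cv2.COLOR_BGR2HSV)   # hue, saturation, value
yuv = cv2.cvtColor(bgr, cv2.COLOR_BGR2YUV)   # luminance (Y) + chrominance (U, V)
```
The HSV representation is often preferred for color-based detection because hue is largely separated from brightness.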
Basics of Image Formation (2/2)
3. Image Representation:
Definition: Image representation refers to how images are stored and structured in a digital format. It
involves encoding pixel values, dimensions, and metadata.
Importance: Image representation determines how images are processed, transmitted, and displayed by
computers and imaging devices. Efficient image representations are essential for storage and transmission.
Common Image Formats:
JPEG (Joint Photographic Experts Group): A lossy compression format suitable for photographs and
images with continuous tones.
PNG (Portable Network Graphics): A lossless compression format that preserves image quality, often
used for graphics and images with transparency.
BMP (Bitmap): An uncompressed format used for simple image storage with no loss of quality.
TIFF (Tagged Image File Format): A flexible format supporting lossless compression and multiple
color spaces, often used in professional image processing.
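As a small sketch of how format choice affects storage, the same pixel array can be written as lossy JPEG and lossless PNG with OpenCV (file names are placeholders; actual file sizes depend on the image content):
```python
import cv2

img = cv2.imread("photo.jpg")   # placeholder input image

# Lossy JPEG at quality 90: smaller file, some detail discarded.
cv2.imwrite("out.jpg", img, [cv2.IMWRITE_JPEG_QUALITY, 90])

# Lossless PNG: larger file, pixel values preserved exactly.
cv2.imwrite("out.png", img, [cv2.IMWRITE_PNG_COMPRESSION, 3])
```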
Image Sensing and Acquisition (1/2)
1. Cameras:
Definition: Cameras are devices designed for capturing still images or recording video by capturing and storing visual information. They consist of
optical components, an image sensor, and electronics for processing and storing images.
Importance: Cameras are fundamental tools in photography, computer vision, and many scientific and industrial applications. Understanding how
cameras work is essential for various fields.
Components of a Camera:
Lens: The lens focuses incoming light onto the image sensor. Different lenses offer varying focal lengths and apertures.
Image Sensor: The image sensor, often a CCD (Charge-Coupled Device) or CMOS (Complementary Metal-Oxide-Semiconductor) sensor,
converts light into electrical signals.
Shutter: The shutter controls the duration of light exposure to the image sensor.
Aperture: The aperture controls the amount of light entering the camera.
Viewfinder or LCD Screen: These components allow users to compose and view images.
2. Sensors:
Definition: Image sensors are electronic devices that convert light into electrical signals, representing the captured image. Common types include
CCD and CMOS sensors.
Importance: Image sensors are at the heart of digital cameras, smartphones, and other imaging devices. They play a critical role in determining image
quality.
Types of Image Sensors:
CCD (Charge-Coupled Device): CCD sensors are known for their high-quality image capture, making them suitable for digital cameras.
CMOS (Complementary Metal-Oxide-Semiconductor): CMOS sensors are widely used in digital cameras and smartphones due to their
lower power consumption and faster readout.
Sensor Characteristics: Image sensors are characterized by parameters like resolution (megapixels), sensor size (e.g., APS-C, full-frame), and
sensitivity (ISO range).
Image Sensing and Acquisition (2/2)
3. Lenses:
Definition: Lenses are optical elements that focus light onto an image sensor or film. They play a crucial role in determining image quality,
focus, and field of view.
Importance: Lenses are critical components of cameras and optical systems. Different types of lenses are used for various applications, from
wide-angle photography to macro imaging.
Types of Lenses:
Prime Lenses: Fixed focal length lenses that offer excellent optical quality.
Zoom Lenses: Variable focal length lenses that provide flexibility in framing.
Macro Lenses: Designed for close-up photography with high magnification.
Wide-Angle Lenses: Capture a broader field of view, ideal for landscapes and architecture.
Telephoto Lenses: Offer high magnification for distant subjects.
Lens Characteristics: Lens characteristics include focal length, aperture (f-stop), lens coatings, and distortion.
Note:
Understanding cameras, sensors, and lenses is crucial for photographers, filmmakers, and professionals in fields like
computer vision and remote sensing. These components collectively determine the quality and characteristics of the
images captured.
Camera Calibration in Computer Vision
• Definition: Camera calibration is the process of determining the intrinsic and extrinsic parameters of a camera to establish a
mathematical relationship between the 3D world and the 2D image. Intrinsic parameters include focal length, principal point, and lens
distortion, while extrinsic parameters define the camera's position and orientation in 3D space.
• Camera calibration is crucial in Computer Vision for several reasons:
1. Accurate 3D Reconstruction: In Computer Vision, one of the primary objectives is to reconstruct 3D scenes or objects from 2D
images. Accurate calibration provides the necessary parameters to transform 2D image coordinates into real-world 3D coordinates.
Without proper calibration, the reconstructed 3D models would be inaccurate and unreliable.
2. Object Measurement: Calibrated cameras enable precise measurements of objects in the real world. This is essential in
applications like industrial quality control, where accurate dimensions must be obtained from images.
3. Camera Pose Estimation: Knowing the extrinsic parameters allows for determining the camera's position and orientation in space.
This is vital for tasks like simultaneous localization and mapping (SLAM) in robotics and augmented reality applications.
4. Image Rectification: Calibration helps correct lens distortions, such as radial and tangential distortion, ensuring that straight lines
remain straight in rectified images. This is critical for tasks like stereo vision and visual odometry.
5. Depth Estimation: In applications requiring depth information, such as 3D reconstruction or obstacle avoidance in autonomous
vehicles, calibrated cameras provide the data needed to estimate depth accurately.
6. Object Tracking: In object tracking scenarios, calibrated cameras provide a consistent coordinate system, allowing objects to be
tracked consistently across frames. This is vital in surveillance and motion analysis.
7. Augmented Reality: In augmented reality applications, accurate camera calibration is necessary for overlaying virtual objects
seamlessly onto the real world. Misalignment due to incorrect calibration can lead to a poor user experience.
Methods for Camera Calibration
• Camera calibration can be achieved through various methods:
1. Checkerboard Calibration: This involves capturing images of a checkerboard pattern from different
angles. The known properties of the pattern allow for the extraction of camera parameters.
2. Calibration Grids: Similar to checkerboard calibration but using grid patterns. They provide known
reference points for calibration.
3. Self-Calibration: This method estimates camera parameters using images of a scene with known 3D
geometry, such as a scene with a planar surface.
4. Bundle Adjustment: A more complex optimization technique that refines camera parameters along with 3D
scene structure. It's often used in Structure from Motion (SfM) and multi-camera systems.
5. Commercial Calibration Tools: Some software and hardware tools are available for automated calibration
of cameras, which can be useful in industrial applications.
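A hedged sketch of the checkerboard method (method 1) using OpenCV; the calib/ folder, the 9x6 inner-corner pattern, and the 25 mm square size are assumptions for illustration, not fixed requirements:
```python
import glob
import cv2
import numpy as np

pattern = (9, 6)     # inner corners per row/column of the board (assumed)
square = 25.0        # square size in millimetres (assumed)

# 3D corner coordinates in the board's own frame (Z = 0 plane).
objp = np.zeros((pattern[0] * pattern[1], 3), np.float32)
objp[:, :2] = np.mgrid[0:pattern[0], 0:pattern[1]].T.reshape(-1, 2) * square

objpoints, imgpoints, image_size = [], [], None
for path in glob.glob("calib/*.jpg"):          # placeholder image folder
    gray = cv2.cvtColor(cv2.imread(path), cv2.COLOR_BGR2GRAY)
    image_size = gray.shape[::-1]
    found, corners = cv2.findChessboardCorners(gray, pattern)
    if found:
        objpoints.append(objp)
        imgpoints.append(corners)

# Intrinsics (K, distortion coefficients) and per-image extrinsics (rvecs, tvecs).
rms, K, dist, rvecs, tvecs = cv2.calibrateCamera(
    objpoints, imgpoints, image_size, None, None)
print("RMS reprojection error (pixels):", rms)
```
cv2.calibrateCamera returns the intrinsic matrix, the distortion coefficients, and a rotation/translation pair (the extrinsics) for every calibration image.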
Calibration Mistakes
Camera calibration is a crucial process in computer vision, and making mistakes during calibration can lead to inaccurate
results and unreliable vision systems. Here are some common calibration mistakes to be aware of:
1. Insufficient Data: Collecting too few calibration images or using a limited range of camera poses can result in an
incomplete calibration. A lack of diverse data may lead to inaccuracies, especially when dealing with varying lighting
conditions or perspectives.
2. Inaccurate Feature Detection: The calibration process often relies on detecting and tracking features in the calibration
images, such as corners of a checkerboard pattern. Mistakes in feature detection, such as misidentifying corners or missing
some, can lead to calibration errors.
3. Not Accounting for Lens Distortion: Failing to model and correct for lens distortion (e.g., radial and tangential
distortion) can result in distorted images and inaccurate calibration. Lens distortion can significantly affect the accuracy of
measurements and 3D reconstructions.
4. Incorrect Pattern Dimensions: When using calibration patterns like checkerboards or grids, it's essential to provide
accurate dimensions of the pattern. Errors in specifying the size of the squares can lead to scaling issues during calibration.
5. Ignoring Nonlinear Distortion Models: Some cameras exhibit complex, nonlinear distortion that cannot be accurately
modeled with simple distortion models. Neglecting these nonlinear distortions can result in calibration errors.
Calibration Mistakes Cont…
6. Inconsistent Image Quality: Inconsistent image quality across calibration images, such as variations in exposure, focus, or
sharpness, can lead to inaccuracies. It's important to maintain consistent image quality during calibration.
7. Inadequate Calibration Object Placement: Placing the calibration pattern too close to the camera or too far away can lead
to calibration errors. It's important to cover the entire field of view, including regions near the edges.
8. Not Updating Calibration: Over time, cameras may undergo mechanical changes or suffer from wear and tear, which can
affect their intrinsic parameters. It's crucial to periodically recalibrate cameras to account for these changes.
9. Not Considering Environmental Factors: Changes in environmental conditions, such as temperature or humidity, can
impact the camera's optical properties. Neglecting these factors can lead to calibration inaccuracies.
10. Not Validating Calibration: After performing calibration, it's essential to validate its accuracy using a separate set of test
images. Failing to do so may result in the use of an inaccurate calibration.
11. Using Incorrect Calibration Models: Choosing an inappropriate calibration model or assuming a simplified model when a
more complex one is required can lead to errors. Ensure that the selected model matches the camera's characteristics.
12. Ignoring Sensor Characteristics: Not taking into account sensor-specific characteristics, such as noise, can affect
calibration accuracy, especially in low-light conditions.
13. Improper Documentation: Failing to document the calibration process thoroughly can make it challenging to reproduce or
troubleshoot calibration results.
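Point 10 above (validating the calibration) is commonly done by measuring the reprojection error: project the known 3D points with the estimated parameters and compare them with the detected 2D points. A minimal sketch, assuming objpoints, imgpoints, K, dist, rvecs, and tvecs come from a cv2.calibrateCamera run such as the one sketched earlier:
```python
import cv2
import numpy as np

def mean_reprojection_error(objpoints, imgpoints, K, dist, rvecs, tvecs):
    """RMS pixel distance between detected corners and their reprojections."""
    total_sq, count = 0.0, 0
    for objp, imgp, rvec, tvec in zip(objpoints, imgpoints, rvecs, tvecs):
        projected, _ = cv2.projectPoints(objp, rvec, tvec, K, dist)
        total_sq += cv2.norm(imgp, projected, cv2.NORM_L2) ** 2
        count += len(projected)
    return np.sqrt(total_sq / count)
```
An error of a fraction of a pixel usually indicates a usable calibration; errors of several pixels suggest one of the mistakes listed above.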
Intrinsic and Extrinsic Camera Parameters
Intrinsic Camera Parameters:
Intrinsic camera parameters describe the internal properties of the camera that affect how it captures an image.
These parameters are often considered constant for a specific camera (once calibrated) and are used to transform
3D world coordinates into 2D image coordinates. The primary intrinsic parameters include:
Focal Length (f): The focal length is the distance from the camera's optical center (the camera's aperture) to
the image sensor or film plane. It is usually expressed in millimeters. The focal length determines how much
a camera can zoom in on a subject.
Principal Point (cx, cy): The principal point represents the coordinates of the optical center on the image
sensor or film plane. It is typically given in pixels. The principal point defines the point where the optical
axis intersects the image plane and is often close to the center of the image.
Pixel Aspect Ratio (sx, sy): This parameter accounts for the difference in scale between the x and y
dimensions of the camera sensor's pixels. It is used to ensure that pixels are square. In most cases, sx and sy
are equal (sx = sy = 1), indicating square pixels.
Lens Distortion Parameters: These parameters account for optical distortions introduced by the camera
lens. Common types of distortion include radial distortion and tangential distortion. Lens distortion
parameters are used to correct distortions in the images.
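These intrinsic parameters are conventionally collected into a 3x3 matrix K; a minimal sketch with made-up example values:
```python
import numpy as np

fx, fy = 800.0, 800.0   # focal lengths expressed in pixels (example values)
cx, cy = 320.0, 240.0   # principal point, roughly the image centre

# Intrinsic matrix: maps normalized camera coordinates to pixel coordinates.
K = np.array([[fx, 0.0, cx],
              [0.0, fy, cy],
              [0.0, 0.0, 1.0]])
```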
Extrinsic Camera Parameters
Extrinsic camera parameters describe the camera's position and orientation in 3D space concerning the world
coordinate system. These parameters define the camera's pose, enabling the transformation of 3D world
coordinates into the camera's coordinate system. The primary extrinsic parameters include:
Rotation Matrix (R): The rotation matrix represents how the camera is oriented in 3D space. It describes
the rotation of the camera from the world coordinate system to its coordinate system. The rotation matrix is
typically expressed as a 3x3 matrix.
Translation Vector (T): The translation vector specifies the camera's position in the world coordinate
system. It represents the translation of the camera's optical center from the world origin. The translation
vector is typically a 3D vector.
Extrinsic Matrix (E): The extrinsic matrix combines the rotation matrix and translation vector to represent
the complete transformation from world coordinates to camera coordinates. It is often used in camera
projection equations.
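A minimal sketch of how R and T move a world point into the camera's coordinate system, and of the 3x4 extrinsic matrix built from them (all numeric values are arbitrary examples):
```python
import numpy as np

R = np.eye(3)                    # camera aligned with the world axes (example)
T = np.array([0.0, 0.0, 2.0])    # camera origin offset along Z (example)

X_world = np.array([0.5, -0.2, 3.0])   # a 3D point in world coordinates
X_cam = R @ X_world + T                # the same point in camera coordinates

# Equivalent form: the 3x4 extrinsic matrix applied to homogeneous coordinates.
E = np.hstack([R, T.reshape(3, 1)])
X_cam_h = E @ np.append(X_world, 1.0)  # equals X_cam
```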
Why Intrinsic and Extrinsic Parameters Matter?
3D-to-2D Projection: Intrinsic parameters are essential for the projection of 3D points in the world onto the
2D image plane. They define how objects in the real world are transformed into pixel coordinates in the
image.
Camera Pose: Extrinsic parameters determine the camera's position and orientation relative to the world.
This information is crucial for tasks like object localization, augmented reality, and robotics.
Lens Distortion Correction: Intrinsic parameters, specifically lens distortion parameters, are necessary to
correct optical distortions in images, ensuring that straight lines remain straight and accurate measurements
can be made.
Camera Calibration: Together, intrinsic and extrinsic parameters form the core of camera calibration,
allowing computer vision systems to accurately understand the relationship between the camera and the real
world.
• Calibrating a camera involves estimating these parameters accurately, typically through the use of calibration
patterns or known 3D points. Once calibrated, the camera can be used for various computer vision tasks,
including 3D reconstruction, object tracking, and image analysis, with greater accuracy and reliability.
The Pinhole Camera
• The pinhole camera model is a fundamental concept in computer vision and computer graphics used to
represent the way cameras capture images. It simplifies the complex process of light entering a camera lens
and striking a photosensitive surface into a basic mathematical model. Let's explore the pinhole camera
model in detail, including its parameters and limitations:
Pinhole Camera Model:
• The pinhole camera model is based on the idea that light from a scene passes through a single point (the
"pinhole") and projects an inverted image onto a flat surface (the "image plane"). This model is highly
simplified but provides a solid foundation for understanding camera projection and geometry.
Parameters of the Pinhole Camera Model
Focal Length (f): The focal length represents the distance from the pinhole (or the lens) to the image plane.
It determines the magnification and field of view of the camera. A shorter focal length results in a wider
field of view, while a longer focal length narrows the field of view, essentially acting as a zoom.
Pinhole (or Lens) Position: The location of the pinhole (or lens) is crucial. It represents the camera's
position in 3D space relative to the scene being photographed. This position is often defined as the optical
center of the camera.
Image Plane: The image plane is the flat surface where the 2D image is formed. It is usually parallel to the
lens and positioned at a certain distance (the focal length) from the lens.
Principal Point (Cx, Cy): The principal point represents the coordinates of the image plane where the
optical axis (a straight line passing through the pinhole) intersects. It is typically near the center of the image
plane.
Pixel Coordinates (u, v): These represent the 2D coordinates of a point on the image plane. The pixel
coordinates are related to the 3D world coordinates (X, Y, Z) through the process of camera projection.
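A minimal sketch of these parameters in use: projecting a 3D point given in camera coordinates onto the image plane (the focal length and principal point are illustrative values):
```python
def pinhole_project(X, Y, Z, f=800.0, cx=320.0, cy=240.0):
    """Project a 3D point (camera coordinates, Z > 0) to pixel coordinates."""
    u = f * X / Z + cx
    v = f * Y / Z + cy
    return u, v

# Doubling the distance moves the projection closer to the principal point.
print(pinhole_project(0.5, 0.3, 2.0))   # (520.0, 360.0)
print(pinhole_project(0.5, 0.3, 4.0))   # (420.0, 300.0)
```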
Limitations of the Pinhole Camera Model
While the pinhole camera model is a useful simplification, it has several limitations:
No Lens Effects: The pinhole camera model assumes an idealized "pinhole" and does not account for the
complex optical effects introduced by real camera lenses, such as distortion and aberration.
Infinite Depth of Field: In reality, cameras can have a limited depth of field where objects at various
distances from the camera are in focus. The pinhole camera model assumes an infinite depth of field.
No Shutter Mechanism: The model does not consider the concept of a shutter or exposure time, which is
essential for capturing images in the real world.
No Color Information: The pinhole camera model simplifies the capture process by not considering color
information. Real cameras capture color using sensors with multiple channels.
No Noise or Sensor Artifacts: In practice, cameras may introduce noise and artifacts into images. The
pinhole camera model does not account for these real-world imperfections.
Projective Geometry
Homogeneous Coordinates and Vanishing Points
• Projective geometry is a branch of geometry that deals with the properties
of geometric figures and transformations, focusing on the principles of
perspective and the projective properties of objects. In computer vision and
computer graphics, projective geometry plays a crucial role in
understanding how 3D scenes are projected onto 2D images.
• Two fundamental concepts in projective geometry are homogeneous
coordinates and vanishing points
1. Homogeneous Coordinates
Definition: Homogeneous coordinates are a mathematical representation used to simplify projective
transformations, such as perspective projection. In this coordinate system, a point in 2D space is represented as
a vector of three values (x, y, w), where (x/w, y/w) is the actual 2D point; a 3D point is represented analogously
with four values. The variable w is a scaling factor introduced to enable the representation of points at infinity.
Importance: Homogeneous coordinates allow projective transformations, including perspective projection,
to be represented as matrix multiplications, making the mathematics more concise and suitable for computer
algorithms. They are particularly useful for describing transformations that involve translations, rotations,
and perspective effects.
Perspective Division: To obtain the 2D point from its homogeneous representation (x, y, w), you divide the
x and y coordinates by the w coordinate: (x/w, y/w). This operation is known as perspective division and
converts the homogeneous coordinates back to Cartesian coordinates.
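A minimal sketch of homogeneous coordinates and perspective division in NumPy; the 3x3 transformation used here is an arbitrary example, not taken from the lecture:
```python
import numpy as np

def to_homogeneous(p):
    return np.append(p, 1.0)

def from_homogeneous(ph):
    return ph[:-1] / ph[-1]   # perspective division

p_h = to_homogeneous(np.array([2.0, 3.0]))   # [2, 3, 1]

# A projective transformation: translation plus a perspective row (example).
H = np.array([[1.0, 0.0, 1.0],
              [0.0, 1.0, -1.0],
              [0.1, 0.0, 1.0]])

print(from_homogeneous(H @ p_h))   # the transformed 2D point
```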
2. Vanishing Points
Definition: Vanishing points are essential concepts in perspective geometry. They are points in the image where
parallel lines in 3D space appear to converge or meet. Vanishing points are a result of perspective projection and
are critical for understanding the depth and 3D structure of a scene from a 2D image.
Types of Vanishing Points:
Horizon Line Vanishing Points: In scenes with a flat ground plane (e.g., a road or a field), parallel
lines that lie on the ground plane converge to two vanishing points on the horizon line.
Vertical Vanishing Points: Vertical lines (e.g., the edges of tall buildings) converge to a single
vanishing point above or below the image depending on the direction of the lines.
Applications: Vanishing points are used in computer vision for tasks like camera calibration, structure-
from-motion, and estimating scene depth. They provide critical cues for understanding the 3D arrangement
of objects in a scene.
• Example: When you look at a photograph of a road that extends into the distance, the sides of the road
appear to converge to a point on the horizon. This point is a vanishing point, and understanding its location
and properties helps in analyzing the depth and perspective of the scene.
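In homogeneous coordinates a vanishing point can be estimated as the intersection of two image lines that are parallel in 3D, using cross products; the line endpoints below are made-up pixel coordinates for illustration:
```python
import numpy as np

def line_through(p1, p2):
    """Homogeneous line through two image points given as (x, y, 1) vectors."""
    return np.cross(p1, p2)

def intersect(l1, l2):
    """Intersection of two homogeneous lines, normalized by the last component."""
    p = np.cross(l1, l2)
    return p[:2] / p[2]

# Two image lines belonging to parallel 3D edges (e.g., the sides of a road).
l1 = line_through(np.array([100.0, 400.0, 1.0]), np.array([300.0, 300.0, 1.0]))
l2 = line_through(np.array([500.0, 400.0, 1.0]), np.array([350.0, 300.0, 1.0]))

print(intersect(l1, l2))   # the estimated vanishing point in pixel coordinates
```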
Camera Projection Matrix and Perspective Projection Equation (1/2)
How are 3D points in the world transformed into 2D image coordinates on the camera's image plane?
Definition: The perspective projection equation describes how a 3D point in world coordinates is projected onto a 2D image plane. It's
a mathematical model that simulates how objects in the real world appear smaller as they move farther away from the camera.
Components of the Equation:
X,Y,Z: The 3D coordinates of the point in the world coordinate system.
u,v: The 2D pixel coordinates of the point on the image plane.
fx,fy: The focal lengths of the camera in the x and y directions.
cx,cy: The principal point coordinates (usually close to the center of the image).
w: The perspective scaling factor.
Mathematical Representation: The perspective projection equation relates 3D world coordinates to 2D image coordinates using
the camera's intrinsic parameters:
• u = fx · X/Z + cx
• v = fy · Y/Z + cy
• These equations describe how the X and Y coordinates of the 3D point are divided by their distance from the camera (Z) and then
scaled by the focal lengths and principal point offsets to obtain the 2D pixel coordinates (u,v) on the image plane.
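The same equations can be written in matrix form: combining an intrinsic matrix K with an extrinsic [R | t] gives the 3x4 camera projection matrix, and perspective division by w recovers (u, v). A minimal sketch with illustrative values:
```python
import numpy as np

K = np.array([[800.0, 0.0, 320.0],   # fx, fy, cx, cy (example values)
              [0.0, 800.0, 240.0],
              [0.0, 0.0, 1.0]])
R, t = np.eye(3), np.zeros((3, 1))   # camera placed at the world origin
P = K @ np.hstack([R, t])            # 3x4 camera projection matrix

X_h = np.array([0.5, 0.3, 2.0, 1.0])  # homogeneous 3D world point
u, v, w = P @ X_h
print(u / w, v / w)                   # (520.0, 360.0): pixel coordinates
```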
Applications
• The camera projection matrix and perspective projection equation are fundamental
for tasks such as camera calibration, 3D reconstruction from images, augmented
reality, and camera pose estimation.
• They enable the mapping of real-world objects and scenes into the image space,
allowing computer vision systems to analyze and interpret visual data.
Camera Calibration Techniques
Direct Linear Transform (DLT) Method/Algorithm
• Step 1: Collect Calibration Data:
To calibrate the camera, you need a set of calibration images that contain known 3D world points and their
corresponding 2D image coordinates. These calibration images should ideally cover a wide range of camera
poses and object positions within the camera's field of view.
• Step 2: Set Up Correspondence Pairs:
Each calibration image provides a set of correspondence pairs. These pairs consist of a 3D point (X,Y,Z) in the
world coordinate system and its corresponding 2D pixel coordinates (u,v) in the image.
• Step 3: Formulate the Projection Matrix:
The goal is to find the 3x4 camera projection matrix P that relates 3D world points to 2D image points. This
matrix can be expressed as P = K [R | t], where K is the 3x3 intrinsic matrix and [R | t] combines the rotation
matrix and translation vector.
Camera Calibration Techniques Cont…
• Step 4: Build a System of Linear Equations:
For each correspondence pair, you can form two linear equations representing the x and y components of the
projection equation. This results in a system of linear equations.
• Step 5: Solve the Linear System:
To obtain the camera projection matrix P, you need to solve the overdetermined linear system of equations.
Techniques like the Singular Value Decomposition (SVD) can be used to find the least-squares solution to this
system, which minimizes the error between the predicted and actual 2D image coordinates.
• Step 6: Extract Intrinsic and Extrinsic Parameters:
Once you have the camera projection matrix P, you can extract the intrinsic and extrinsic camera parameters:
Intrinsic Parameters: These parameters include the focal length, pixel aspect ratio, and principal point.
They are typically obtained from the calibration matrix within P.
Extrinsic Parameters: These parameters describe the camera's position and orientation in 3D space and
can be derived from the rotation and translation components of P.
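A hedged sketch of steps 4 and 5 with NumPy: each correspondence contributes two rows to a linear system, which is solved with SVD for the 3x4 matrix P. The input arrays are assumed to hold at least six 3D-2D correspondences; decomposing P into intrinsics and extrinsics (step 6) is left out:
```python
import numpy as np

def dlt_projection_matrix(world_pts, image_pts):
    """Estimate the 3x4 projection matrix P from N >= 6 correspondences.

    world_pts: (N, 3) array of 3D points; image_pts: (N, 2) array of pixels.
    """
    A = []
    for (X, Y, Z), (u, v) in zip(world_pts, image_pts):
        # Two linear equations per correspondence (step 4).
        A.append([X, Y, Z, 1, 0, 0, 0, 0, -u * X, -u * Y, -u * Z, -u])
        A.append([0, 0, 0, 0, X, Y, Z, 1, -v * X, -v * Y, -v * Z, -v])
    # Least-squares solution (step 5): the right singular vector associated
    # with the smallest singular value of A, reshaped to 3x4.
    _, _, Vt = np.linalg.svd(np.asarray(A, dtype=float))
    return Vt[-1].reshape(3, 4)
```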
Limitations and Considerations
Overdetermined System: The DLT method requires a sufficient number of correspondence pairs to form an
overdetermined system of linear equations. More correspondences provide more robust calibration.
Noise and Error Handling: The DLT method is sensitive to noise in correspondence pairs. Outliers or
inaccuracies in the data can lead to incorrect calibration results. Robust techniques like RANSAC (Random
Sample Consensus) are often used to handle outliers.
Nonlinear Effects: The DLT method assumes a pinhole camera model, which does not account for complex
lens distortions. For highly accurate calibration, additional techniques may be required to model and correct
lens distortions.
Applications
The DLT method is used in a wide range of applications, including photogrammetry, 3D reconstruction,
computer vision, robotics, and augmented reality.
It's particularly useful when you need to calibrate a camera and understand its parameters for accurate
measurement or scene reconstruction.
Assignment #01
• Please explore lens distortion further and summarize it.
• Why is it important?
• What are its possible correction methods?
• You need to capture indoor and outdoor scenes with one or more cameras;
would you prefer a pinhole camera for them?
• If YES, would you use it for both indoor and outdoor? Why?
• If YES for only one of the cases, defend it in your own words.
• If NOT for both cases, defend your choice with proper reasoning.
• Please provide a complete case study implementing the camera
projection matrix and perspective projection equation.
Next Lecture
Lighting and Image Formation