Lec 1 - 2
Introduction
Definition
“Computer Vision is an interdisciplinary field of study that empowers
computers to interpret and understand visual information from the
world, much like humans do with their sense of vision.”
• It involves the development of algorithms and techniques;
• Enabling computers to process, analyze, and interpret images and videos
• These algorithms allow machines to extract meaningful information from
visual data
• Leading to a wide range of applications in various industries.
Applications
1. Image and Video Analysis:
• Enables machines to identify objects, track motion, detect patterns, and even recognize faces in images and videos;
• This has applications in security, surveillance, and content analysis.
2. Medical Imaging:
• Diagnosing diseases from medical images (e.g., X-rays, MRIs, CT scans) and segmenting organs and tissues;
• It aids in early disease detection and treatment planning.
3. Autonomous Vehicles:
• Self-driving cars and autonomous drones heavily rely on Computer Vision for perception
• These systems use computer vision algorithms to detect pedestrians, recognize traffic signs, and navigate safely.
4. Industrial Automation:
• Computer Vision is crucial in industrial automation for quality control and inspection tasks
• It can identify defects in manufacturing processes, such as flaws in products on an assembly line.
5. Agriculture:
• In agriculture, computer vision is used for crop monitoring, yield prediction, and disease detection;
• Drones equipped with cameras can capture images of large agricultural fields and provide insights to farmers.
6. Robotics:
• Robots equipped with computer vision systems can navigate in unstructured environments, pick and place objects, and interact with humans;
• This has applications in manufacturing, healthcare, and even space exploration.
7. Human-Computer Interaction:
• Gesture recognition and facial expression analysis enable natural and intuitive interactions between humans and computers, improving user experience.
Importance
Automation: Computer Vision enables automation in various industries, reducing the need for manual labor
and enhancing efficiency.
For instance, it allows factories to inspect products 24/7 without human intervention.
Accuracy: Computer Vision systems can perform tasks with high precision and consistency, surpassing
human capabilities in tasks like medical diagnosis or detecting defects in manufacturing.
Safety: In autonomous vehicles and drones, Computer Vision contributes to safety by identifying obstacles,
pedestrians, and other vehicles, potentially reducing accidents.
Productivity: By automating tasks, Computer Vision frees up human resources to focus on more complex
and creative aspects of their work.
Scientific Advancements: In fields like biology, astronomy, and environmental science, Computer Vision
aids researchers in analyzing vast datasets, leading to discoveries and breakthroughs.
Historical Development (1/2)
1950s-1960s: Early Beginnings
• 1956: The Dartmouth Workshop marked the birth of Artificial Intelligence (AI) as a field. While Computer
Vision wasn't explicitly mentioned, it laid the foundation for AI research.
1960s: Early computer vision experiments involved edge detection and simple pattern recognition.
Researchers aimed to replicate human visual perception in machines.
1970s-1980s: Image Processing Emerges
1970s: Researchers like David Marr developed early theories of vision, emphasizing the importance of
understanding how the human visual system works.
1980: The "Pictorial Structures for Object Recognition" paper by Fischler and Elschlager introduced the
idea of modeling objects as arrangements of simple parts.
1980s: The "Vision" textbook by David Marr became influential in the field. Marr's work laid the foundation
for understanding vision as a process involving multiple levels of representation.
Historical Development (2/2)
1990s-2000s: Rise of Feature-Based Methods
1990s: The development of feature-based methods for object recognition and tracking gained prominence. Techniques
like Scale-Invariant Feature Transform (SIFT) emerged.
1999: The publication of "Multiple View Geometry in Computer Vision" by Richard Hartley and Andrew Zisserman
became a seminal work on geometric methods in computer vision.
2009-2010: The ImageNet dataset and its Large Scale Visual Recognition Challenge (ILSVRC) marked a significant
milestone, providing the benchmark that would drive advances in image classification using deep learning.
2010s-Present: Deep Learning Revolution
2012: The ImageNet competition was won by a Convolutional Neural Network (CNN) model developed by Alex
Krizhevsky, sparking the deep learning revolution in computer vision.
2015: The "Deep Residual Learning for Image Recognition" paper introduced ResNet, a deep neural network
architecture that allowed training extremely deep networks efficiently.
2014: The advent of Generative Adversarial Networks (GANs) opened up possibilities in image generation and
manipulation.
Vision Paradigms and Basics of Image Formation
Different Paradigms in Computer Vision
Paradigms in Computer Vision (1/3)
Computer Vision encompasses various paradigms, each addressing different aspects of visual data analysis and
understanding. Here's an overview of these paradigms
1. Image Processing:
• Definition: Image processing focuses on manipulating and enhancing images to extract useful
information or improve their visual quality. It involves a wide range of techniques, including filtering,
segmentation, noise reduction, and image restoration.
• Importance: Image processing is foundational in Computer Vision. It's used to preprocess images
before higher-level tasks, such as object detection or recognition. It's also essential in medical imaging,
satellite image analysis, and more.
2. Object Detection:
• Definition: Object detection is the task of locating and classifying objects within an image or video
sequence. It involves identifying and delineating objects of interest, often using bounding boxes.
• Importance: Object detection has numerous applications, including autonomous vehicles, surveillance,
robotics, and face detection in cameras and smartphones.
Paradigms in Computer Vision (2/3)
3. Image Understanding:
Definition: Image understanding aims to interpret the content and context of images at a higher
semantic level. It involves recognizing objects, scenes, relationships, and even emotions within
images.
Importance: Image understanding is essential in applications like content-based image retrieval,
scene understanding for autonomous robots, and sentiment analysis based on visual content.
4. 3D Computer Vision:
Definition: 3D computer vision deals with the reconstruction and understanding of three-dimensional
scenes from two-dimensional images or video streams. It involves tasks like depth estimation, 3D object
recognition, and 3D scene reconstruction.
Importance: 3D computer vision is critical for applications such as augmented reality, 3D modeling,
autonomous navigation, and robotics.
Paradigms in Computer Vision (3/3)
5. Deep Learning and Neural Networks:
Definition: Deep learning applies multi-layer neural networks, most notably Convolutional Neural Networks (CNNs), that learn
visual features directly from large collections of images rather than relying on hand-crafted features.
Importance: Deep learning underpins current state-of-the-art results in image classification, object detection, and image
generation, and has driven the field's rapid progress since 2012.
Basics of Image Formation (1/2)
1. Pixels:
Definition: Pixels, short for "picture elements," are the smallest units of an image. They are tiny square or rectangular elements,
each containing a single color value. The arrangement of pixels forms the visual content of an image.
Importance: Understanding pixels is fundamental in image processing and computer vision. Pixels are the building blocks of
digital images, and their manipulation enables various image processing operations.
2. Color Spaces:
Definition: Color spaces are mathematical models that represent colors using a set of coordinates or values. They provide a
structured way to describe and manipulate colors in images.
Importance: Color spaces are crucial for tasks like image analysis, color correction, and color-based object detection. Different
color spaces are used for different purposes.
Common Color Spaces:
RGB (Red, Green, Blue): The RGB color space represents colors using combinations of red, green, and blue intensities. It's
widely used in digital displays and cameras.
HSV (Hue, Saturation, Value): HSV separates color information into hue (the type of color), saturation (color intensity),
and value (brightness).
YUV: YUV separates luminance (Y) from chrominance (U and V), making it useful for image compression and
transmission.
CMYK: CMYK is used in printing and represents colors as combinations of cyan, magenta, yellow, and black ink.
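As a minimal, hedged sketch of the pixel and color-space ideas above (the file name image.jpg is only a placeholder, and OpenCV stores loaded images as BGR pixel arrays rather than RGB):
```python
import cv2

# Load an image as a NumPy array of pixels in BGR channel order
# ("image.jpg" is a placeholder path).
bgr = cv2.imread("image.jpg")

print(bgr.shape)   # (rows, columns, 3 channels)
print(bgr[0, 0])   # blue, green, red values of the top-left pixel

rgb = cv2.cvtColor(bgr, cv2.COLOR_BGR2RGB)   # reorder channels to RGB
hsv = cv2.cvtColor(bgr, cv2.COLOR_BGR2HSV)   # hue, saturation, value
yuv = cv2.cvtColor(bgr, cv2.COLOR_BGR2YUV)   # luminance (Y) + chrominance (U, V)
```
The HSV representation is often preferred for color-based detection because hue is largely separated from brightness.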
Basics of Image Formation (2/2)
3. Image Representation:
Definition: Image representation refers to how images are stored and structured in a digital format. It
involves encoding pixel values, dimensions, and metadata.
Importance: Image representation determines how images are processed, transmitted, and displayed by
computers and imaging devices. Efficient image representations are essential for storage and transmission.
Common Image Formats:
JPEG (Joint Photographic Experts Group): A lossy compression format suitable for photographs and
images with continuous tones.
PNG (Portable Network Graphics): A lossless compression format that preserves image quality, often
used for graphics and images with transparency.
BMP (Bitmap): An uncompressed format used for simple image storage with no loss of quality.
TIFF (Tagged Image File Format): A flexible format supporting lossless compression and multiple
color spaces, often used in professional image processing.
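As a small sketch of how format choice affects storage, the same pixel array can be written as lossy JPEG and lossless PNG with OpenCV (file names are placeholders; actual file sizes depend on the image content):
```python
import cv2

img = cv2.imread("photo.jpg")   # placeholder input image

# Lossy JPEG at quality 90: smaller file, some detail discarded.
cv2.imwrite("out.jpg", img, [cv2.IMWRITE_JPEG_QUALITY, 90])

# Lossless PNG: larger file, pixel values preserved exactly.
cv2.imwrite("out.png", img, [cv2.IMWRITE_PNG_COMPRESSION, 3])
```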
Image Sensing and Acquisition (1/2)
1. Cameras:
Definition: Cameras are devices designed for capturing still images or recording video by capturing and storing visual information. They consist of
optical components, an image sensor, and electronics for processing and storing images.
Importance: Cameras are fundamental tools in photography, computer vision, and many scientific and industrial applications. Understanding how
cameras work is essential for various fields.
Components of a Camera:
Lens: The lens focuses incoming light onto the image sensor. Different lenses offer varying focal lengths and apertures.
Image Sensor: The image sensor, often a CCD (Charge-Coupled Device) or CMOS (Complementary Metal-Oxide-Semiconductor) sensor,
converts light into electrical signals.
Shutter: The shutter controls the duration of light exposure to the image sensor.
Aperture: The aperture controls the amount of light entering the camera.
Viewfinder or LCD Screen: These components allow users to compose and view images.
2. Sensors:
Definition: Image sensors are electronic devices that convert light into electrical signals, representing the captured image. Common types include
CCD and CMOS sensors.
Importance: Image sensors are at the heart of digital cameras, smartphones, and other imaging devices. They play a critical role in determining image
quality.
Types of Image Sensors:
CCD (Charge-Coupled Device): CCD sensors are known for their high-quality image capture, making them suitable for digital cameras.
CMOS (Complementary Metal-Oxide-Semiconductor): CMOS sensors are widely used in digital cameras and smartphones due to their
lower power consumption and faster readout.
Sensor Characteristics: Image sensors are characterized by parameters like resolution (megapixels), sensor size (e.g., APS-C, full-frame), and
sensitivity (ISO range).
Image Sensing and Acquisition (2/2)
3. Lenses:
Definition: Lenses are optical elements that focus light onto an image sensor or film. They play a crucial role in determining image quality,
focus, and field of view.
Importance: Lenses are critical components of cameras and optical systems. Different types of lenses are used for various applications, from
wide-angle photography to macro imaging.
Types of Lenses:
Prime Lenses: Fixed focal length lenses that offer excellent optical quality.
Zoom Lenses: Variable focal length lenses that provide flexibility in framing.
Macro Lenses: Designed for close-up photography with high magnification.
Wide-Angle Lenses: Capture a broader field of view, ideal for landscapes and architecture.
Telephoto Lenses: Offer high magnification for distant subjects.
Lens Characteristics: Lens characteristics include focal length, aperture (f-stop), lens coatings, and distortion.
Note:
Understanding cameras, sensors, and lenses is crucial for photographers, filmmakers, and professionals in fields like
computer vision and remote sensing. These components collectively determine the quality and characteristics of the
images captured.
Camera Calibration in Computer Vision
• Definition: Camera calibration is the process of determining the intrinsic and extrinsic parameters of a camera to establish a
mathematical relationship between the 3D world and the 2D image. Intrinsic parameters include focal length, principal point, and lens
distortion, while extrinsic parameters define the camera's position and orientation in 3D space.
• Camera calibration is crucial in Computer Vision for several reasons:
1. Accurate 3D Reconstruction: In Computer Vision, one of the primary objectives is to reconstruct 3D scenes or objects from 2D
images. Accurate calibration provides the necessary parameters to transform 2D image coordinates into real-world 3D coordinates.
Without proper calibration, the reconstructed 3D models would be inaccurate and unreliable.
2. Object Measurement: Calibrated cameras enable precise measurements of objects in the real world. This is essential in
applications like industrial quality control, where accurate dimensions must be obtained from images.
3. Camera Pose Estimation: Knowing the extrinsic parameters allows for determining the camera's position and orientation in space.
This is vital for tasks like simultaneous localization and mapping (SLAM) in robotics and augmented reality applications.
4. Image Rectification: Calibration helps correct lens distortions, such as radial and tangential distortion, ensuring that straight lines
remain straight in rectified images. This is critical for tasks like stereo vision and visual odometry.
5. Depth Estimation: In applications requiring depth information, such as 3D reconstruction or obstacle avoidance in autonomous
vehicles, calibrated cameras provide the data needed to estimate depth accurately.
6. Object Tracking: In object tracking scenarios, calibrated cameras provide a consistent coordinate system, allowing objects to be
tracked consistently across frames. This is vital in surveillance and motion analysis.
7. Augmented Reality: In augmented reality applications, accurate camera calibration is necessary for overlaying virtual objects
seamlessly onto the real world. Misalignment due to incorrect calibration can lead to a poor user experience.
Methods for Camera Calibration
• Camera calibration can be achieved through various methods:
1. Checkerboard Calibration: This involves capturing images of a checkerboard pattern from different
angles. The known properties of the pattern allow for the extraction of camera parameters.
2. Calibration Grids: Similar to checkerboard calibration but using grid patterns. They provide known
reference points for calibration.
3. Self-Calibration: This method estimates camera parameters using images of a scene with known 3D
geometry, such as a scene with a planar surface.
4. Bundle Adjustment: A more complex optimization technique that refines camera parameters along with 3D
scene structure. It's often used in Structure from Motion (SfM) and multi-camera systems.
5. Commercial Calibration Tools: Some software and hardware tools are available for automated calibration
of cameras, which can be useful in industrial applications.
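A hedged sketch of the checkerboard method (method 1) using OpenCV; the calib/ folder, the 9x6 inner-corner pattern, and the 25 mm square size are assumptions for illustration, not fixed requirements:
```python
import glob
import cv2
import numpy as np

pattern = (9, 6)     # inner corners per row/column of the board (assumed)
square = 25.0        # square size in millimetres (assumed)

# 3D corner coordinates in the board's own frame (Z = 0 plane).
objp = np.zeros((pattern[0] * pattern[1], 3), np.float32)
objp[:, :2] = np.mgrid[0:pattern[0], 0:pattern[1]].T.reshape(-1, 2) * square

objpoints, imgpoints, image_size = [], [], None
for path in glob.glob("calib/*.jpg"):          # placeholder image folder
    gray = cv2.cvtColor(cv2.imread(path), cv2.COLOR_BGR2GRAY)
    image_size = gray.shape[::-1]
    found, corners = cv2.findChessboardCorners(gray, pattern)
    if found:
        objpoints.append(objp)
        imgpoints.append(corners)

# Intrinsics (K, distortion coefficients) and per-image extrinsics (rvecs, tvecs).
rms, K, dist, rvecs, tvecs = cv2.calibrateCamera(
    objpoints, imgpoints, image_size, None, None)
print("RMS reprojection error (pixels):", rms)
```
cv2.calibrateCamera returns the intrinsic matrix, the distortion coefficients, and a rotation/translation pair (the extrinsics) for every calibration image.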
Calibration Mistakes
Camera calibration is a crucial process in computer vision, and making mistakes during calibration can lead to inaccurate
results and unreliable vision systems. Here are some common calibration mistakes to be aware of:
1. Insufficient Data: Collecting too few calibration images or using a limited range of camera poses can result in an
incomplete calibration. A lack of diverse data may lead to inaccuracies, especially when dealing with varying lighting
conditions or perspectives.
2. Inaccurate Feature Detection: The calibration process often relies on detecting and tracking features in the calibration
images, such as corners of a checkerboard pattern. Mistakes in feature detection, such as misidentifying corners or missing
some, can lead to calibration errors.
3. Not Accounting for Lens Distortion: Failing to model and correct for lens distortion (e.g., radial and tangential
distortion) can result in distorted images and inaccurate calibration. Lens distortion can significantly affect the accuracy of
measurements and 3D reconstructions.
4. Incorrect Pattern Dimensions: When using calibration patterns like checkerboards or grids, it's essential to provide
accurate dimensions of the pattern. Errors in specifying the size of the squares can lead to scaling issues during calibration.
5. Ignoring Nonlinear Distortion Models: Some cameras exhibit complex, nonlinear distortion that cannot be accurately
modeled with simple distortion models. Neglecting these nonlinear distortions can result in calibration errors.
Calibration Mistakes Cont…
6. Inconsistent Image Quality: Inconsistent image quality across calibration images, such as variations in exposure, focus, or
sharpness, can lead to inaccuracies. It's important to maintain consistent image quality during calibration.
7. Inadequate Calibration Object Placement: Placing the calibration pattern too close to the camera or too far away can lead
to calibration errors. It's important to cover the entire field of view, including regions near the edges.
8. Not Updating Calibration: Over time, cameras may undergo mechanical changes or suffer from wear and tear, which can
affect their intrinsic parameters. It's crucial to periodically recalibrate cameras to account for these changes.
9. Not Considering Environmental Factors: Changes in environmental conditions, such as temperature or humidity, can
impact the camera's optical properties. Neglecting these factors can lead to calibration inaccuracies.
10. Not Validating Calibration: After performing calibration, it's essential to validate its accuracy using a separate set of test
images. Failing to do so may result in the use of an inaccurate calibration.
11. Using Incorrect Calibration Models: Choosing an inappropriate calibration model or assuming a simplified model when a
more complex one is required can lead to errors. Ensure that the selected model matches the camera's characteristics.
12. Ignoring Sensor Characteristics: Not taking into account sensor-specific characteristics, such as noise, can affect
calibration accuracy, especially in low-light conditions.
13. Improper Documentation: Failing to document the calibration process thoroughly can make it challenging to reproduce or
troubleshoot calibration results.
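Point 10 above (validating the calibration) is commonly done by measuring the reprojection error: project the known 3D points with the estimated parameters and compare them with the detected 2D points. A minimal sketch, assuming objpoints, imgpoints, K, dist, rvecs, and tvecs come from a cv2.calibrateCamera run such as the one sketched earlier:
```python
import cv2
import numpy as np

def mean_reprojection_error(objpoints, imgpoints, K, dist, rvecs, tvecs):
    """RMS pixel distance between detected corners and their reprojections."""
    total_sq, count = 0.0, 0
    for objp, imgp, rvec, tvec in zip(objpoints, imgpoints, rvecs, tvecs):
        projected, _ = cv2.projectPoints(objp, rvec, tvec, K, dist)
        total_sq += cv2.norm(imgp, projected, cv2.NORM_L2) ** 2
        count += len(projected)
    return np.sqrt(total_sq / count)
```
An error of a fraction of a pixel usually indicates a usable calibration; errors of several pixels suggest one of the mistakes listed above.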
Intrinsic and Extrinsic Camera Parameters
Intrinsic Camera Parameters:
Intrinsic camera parameters describe the internal properties of the camera that affect how it captures an image.
These parameters are often considered constant for a specific camera (once calibrated) and are used to transform
3D world coordinates into 2D image coordinates. The primary intrinsic parameters include:
Focal Length (f): The focal length is the distance from the camera's optical center (the camera's aperture) to
the image sensor or film plane. It is usually expressed in millimeters. The focal length determines how much
a camera can zoom in on a subject.
Principal Point (cx, cy): The principal point represents the coordinates of the optical center on the image
sensor or film plane. It is typically given in pixels. The principal point defines the point where the optical
axis intersects the image plane and is often close to the center of the image.
Pixel Aspect Ratio (sx, sy): This parameter accounts for the difference in scale between the x and y
dimensions of the camera sensor's pixels. It is used to ensure that pixels are square. In most cases, sx and sy
are equal (sx = sy = 1), indicating square pixels.
Lens Distortion Parameters: These parameters account for optical distortions introduced by the camera
lens. Common types of distortion include radial distortion and tangential distortion. Lens distortion
parameters are used to correct distortions in the images.
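These intrinsic parameters are conventionally collected into a 3x3 matrix K; a minimal sketch with made-up example values:
```python
import numpy as np

fx, fy = 800.0, 800.0   # focal lengths expressed in pixels (example values)
cx, cy = 320.0, 240.0   # principal point, roughly the image centre

# Intrinsic matrix: maps normalized camera coordinates to pixel coordinates.
K = np.array([[fx, 0.0, cx],
              [0.0, fy, cy],
              [0.0, 0.0, 1.0]])
```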
Extrinsic Camera Parameters
Extrinsic camera parameters describe the camera's position and orientation in 3D space concerning the world
coordinate system. These parameters define the camera's pose, enabling the transformation of 3D world
coordinates into the camera's coordinate system. The primary extrinsic parameters include:
Rotation Matrix (R): The rotation matrix represents how the camera is oriented in 3D space. It describes
the rotation of the camera from the world coordinate system to its coordinate system. The rotation matrix is
typically expressed as a 3x3 matrix.
Translation Vector (T): The translation vector specifies the camera's position in the world coordinate
system. It represents the translation of the camera's optical center from the world origin. The translation
vector is typically a 3D vector.
Extrinsic Matrix (E): The extrinsic matrix combines the rotation matrix and translation vector to represent
the complete transformation from world coordinates to camera coordinates. It is often used in camera
projection equations.
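A minimal sketch of how R and T move a world point into the camera's coordinate system, and of the 3x4 extrinsic matrix built from them (all numeric values are arbitrary examples):
```python
import numpy as np

R = np.eye(3)                    # camera aligned with the world axes (example)
T = np.array([0.0, 0.0, 2.0])    # camera origin offset along Z (example)

X_world = np.array([0.5, -0.2, 3.0])   # a 3D point in world coordinates
X_cam = R @ X_world + T                # the same point in camera coordinates

# Equivalent form: the 3x4 extrinsic matrix applied to homogeneous coordinates.
E = np.hstack([R, T.reshape(3, 1)])
X_cam_h = E @ np.append(X_world, 1.0)  # equals X_cam
```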
Why Intrinsic and Extrinsic Parameters Matter?
3D-to-2D Projection: Intrinsic parameters are essential for the projection of 3D points in the world onto the
2D image plane. They define how objects in the real world are transformed into pixel coordinates in the
image.
Camera Pose: Extrinsic parameters determine the camera's position and orientation relative to the world.
This information is crucial for tasks like object localization, augmented reality, and robotics.
Lens Distortion Correction: Intrinsic parameters, specifically lens distortion parameters, are necessary to
correct optical distortions in images, ensuring that straight lines remain straight and accurate measurements
can be made.
Camera Calibration: Together, intrinsic and extrinsic parameters form the core of camera calibration,
allowing computer vision systems to accurately understand the relationship between the camera and the real
world.
• Calibrating a camera involves estimating these parameters accurately, typically through the use of calibration
patterns or known 3D points. Once calibrated, the camera can be used for various computer vision tasks,
including 3D reconstruction, object tracking, and image analysis, with greater accuracy and reliability.
The Pinhole Camera
• The pinhole camera model is a fundamental concept in computer vision and computer graphics used to
represent the way cameras capture images. It simplifies the complex process of light entering a camera lens
and striking a photosensitive surface into a basic mathematical model. Let's explore the pinhole camera
model in detail, including its parameters and limitations:
Pinhole Camera Model:
• The pinhole camera model is based on the idea that light from a scene passes through a single point (the
"pinhole") and projects an inverted image onto a flat surface (the "image plane"). This model is highly
simplified but provides a solid foundation for understanding camera projection and geometry.
Parameters of the Pinhole Camera Model
Focal Length (f): The focal length represents the distance from the pinhole (or the lens) to the image plane.
It determines the magnification and field of view of the camera. A shorter focal length results in a wider
field of view, while a longer focal length narrows the field of view, essentially acting as a zoom.
Pinhole (or Lens) Position: The location of the pinhole (or lens) is crucial. It represents the camera's
position in 3D space relative to the scene being photographed. This position is often defined as the optical
center of the camera.
Image Plane: The image plane is the flat surface where the 2D image is formed. It is usually parallel to the
lens and positioned at a certain distance (the focal length) from the lens.
Principal Point (Cx, Cy): The principal point represents the coordinates of the image plane where the
optical axis (a straight line passing through the pinhole) intersects. It is typically near the center of the image
plane.
Pixel Coordinates (u, v): These represent the 2D coordinates of a point on the image plane. The pixel
coordinates are related to the 3D world coordinates (X, Y, Z) through the process of camera projection.
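A minimal sketch of these parameters in use: projecting a 3D point given in camera coordinates onto the image plane (the focal length and principal point are illustrative values):
```python
def pinhole_project(X, Y, Z, f=800.0, cx=320.0, cy=240.0):
    """Project a 3D point (camera coordinates, Z > 0) to pixel coordinates."""
    u = f * X / Z + cx
    v = f * Y / Z + cy
    return u, v

# Doubling the distance moves the projection closer to the principal point.
print(pinhole_project(0.5, 0.3, 2.0))   # (520.0, 360.0)
print(pinhole_project(0.5, 0.3, 4.0))   # (420.0, 300.0)
```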
Limitations of the Pinhole Camera Model
While the pinhole camera model is a useful simplification, it has several limitations:
No Lens Effects: The pinhole camera model assumes an idealized "pinhole" and does not account for the
complex optical effects introduced by real camera lenses, such as distortion and aberration.
Infinite Depth of Field: In reality, cameras can have a limited depth of field where objects at various
distances from the camera are in focus. The pinhole camera model assumes an infinite depth of field.
No Shutter Mechanism: The model does not consider the concept of a shutter or exposure time, which is
essential for capturing images in the real world.
No Color Information: The pinhole camera model simplifies the capture process by not considering color
information. Real cameras capture color using sensors with multiple channels.
No Noise or Sensor Artifacts: In practice, cameras may introduce noise and artifacts into images. The
pinhole camera model does not account for these real-world imperfections.
Projective Geometry
Homogeneous Coordinates and Vanishing Points
• Projective geometry is a branch of geometry that deals with the properties
of geometric figures and transformations, focusing on the principles of
perspective and the projective properties of objects. In computer vision and
computer graphics, projective geometry plays a crucial role in
understanding how 3D scenes are projected onto 2D images.
• Two fundamental concepts in projective geometry are homogeneous
coordinates and vanishing points
1. Homogeneous Coordinates
Definition: Homogeneous coordinates are a mathematical representation used to simplify projective
transformations, such as perspective projection. In this coordinate system, a point in 2D space is represented as
a vector of three values (x, y, w), where (x/w, y/w) is the actual 2D point; a 3D point is represented analogously
with four values. The variable w is a scaling factor introduced to enable the representation of points at infinity.
Importance: Homogeneous coordinates allow projective transformations, including perspective projection,
to be represented as matrix multiplications, making the mathematics more concise and suitable for computer
algorithms. They are particularly useful for describing transformations that involve translations, rotations,
and perspective effects.
Perspective Division: To obtain the 2D point from its homogeneous representation (x, y, w), you divide the
x and y coordinates by the w coordinate: (x/w, y/w). This operation is known as perspective division and
converts the homogeneous coordinates back to Cartesian coordinates.
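A minimal sketch of homogeneous coordinates and perspective division in NumPy; the 3x3 transformation used here is an arbitrary example, not taken from the lecture:
```python
import numpy as np

def to_homogeneous(p):
    return np.append(p, 1.0)

def from_homogeneous(ph):
    return ph[:-1] / ph[-1]   # perspective division

p_h = to_homogeneous(np.array([2.0, 3.0]))   # [2, 3, 1]

# A projective transformation: translation plus a perspective row (example).
H = np.array([[1.0, 0.0, 1.0],
              [0.0, 1.0, -1.0],
              [0.1, 0.0, 1.0]])

print(from_homogeneous(H @ p_h))   # the transformed 2D point
```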
2. Vanishing Points
Definition: Vanishing points are essential concepts in perspective geometry. They are points in the image where
parallel lines in 3D space appear to converge or meet. Vanishing points are a result of perspective projection and
are critical for understanding the depth and 3D structure of a scene from a 2D image.
Types of Vanishing Points:
Horizon Line Vanishing Points: In scenes with a flat ground plane (e.g., a road or a field), parallel
lines that lie on the ground plane converge to two vanishing points on the horizon line.
Vertical Vanishing Points: Vertical lines (e.g., the edges of tall buildings) converge to a single
vanishing point above or below the image depending on the direction of the lines.
Applications: Vanishing points are used in computer vision for tasks like camera calibration, structure-
from-motion, and estimating scene depth. They provide critical cues for understanding the 3D arrangement
of objects in a scene.
• Example: When you look at a photograph of a road that extends into the distance, the sides of the road
appear to converge to a point on the horizon. This point is a vanishing point, and understanding its location
and properties helps in analyzing the depth and perspective of the scene.
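In homogeneous coordinates a vanishing point can be estimated as the intersection of two image lines that are parallel in 3D, using cross products; the line endpoints below are made-up pixel coordinates for illustration:
```python
import numpy as np

def line_through(p1, p2):
    """Homogeneous line through two image points given as (x, y, 1) vectors."""
    return np.cross(p1, p2)

def intersect(l1, l2):
    """Intersection of two homogeneous lines, normalized by the last component."""
    p = np.cross(l1, l2)
    return p[:2] / p[2]

# Two image lines belonging to parallel 3D edges (e.g., the sides of a road).
l1 = line_through(np.array([100.0, 400.0, 1.0]), np.array([300.0, 300.0, 1.0]))
l2 = line_through(np.array([500.0, 400.0, 1.0]), np.array([350.0, 300.0, 1.0]))

print(intersect(l1, l2))   # the estimated vanishing point in pixel coordinates
```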
Camera Projection Matrix and Perspective Projection Equation (1/2)
How are 3D points in the world transformed into 2D image coordinates on the camera's image plane?
Definition: The perspective projection equation describes how a 3D point in world coordinates is projected onto a 2D image plane. It's
a mathematical model that simulates how objects in the real world appear smaller as they move farther away from the camera.
Components of the Equation:
X,Y,Z: The 3D coordinates of the point in the world coordinate system.
u,v: The 2D pixel coordinates of the point on the image plane.
fx,fy: The focal lengths of the camera in the x and y directions.
cx,cy: The principal point coordinates (usually close to the center of the image).
w: The perspective scaling factor.
Mathematical Representation: The perspective projection equation relates 3D world coordinates to 2D image coordinates using
the camera's intrinsic parameters:
• u = fx · X/Z + cx
• v = fy · Y/Z + cy
• These equations describe how the X and Y coordinates of the 3D point are divided by their distance from the camera (Z) and then
scaled by the focal lengths and principal point offsets to obtain the 2D pixel coordinates (u,v) on the image plane.
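The same equations can be written in matrix form: combining an intrinsic matrix K with an extrinsic [R | t] gives the 3x4 camera projection matrix, and perspective division by w recovers (u, v). A minimal sketch with illustrative values:
```python
import numpy as np

K = np.array([[800.0, 0.0, 320.0],   # fx, fy, cx, cy (example values)
              [0.0, 800.0, 240.0],
              [0.0, 0.0, 1.0]])
R, t = np.eye(3), np.zeros((3, 1))   # camera placed at the world origin
P = K @ np.hstack([R, t])            # 3x4 camera projection matrix

X_h = np.array([0.5, 0.3, 2.0, 1.0])  # homogeneous 3D world point
u, v, w = P @ X_h
print(u / w, v / w)                   # (520.0, 360.0): pixel coordinates
```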
Applications
• The camera projection matrix and perspective projection equation are fundamental
for tasks such as camera calibration, 3D reconstruction from images, augmented
reality, and camera pose estimation.
• They enable the mapping of real-world objects and scenes into the image space,
allowing computer vision systems to analyze and interpret visual data.
Camera Calibration Techniques
Direct Linear Transform (DLT) Method/Algorithm
• Step 1: Collect Calibration Data:
To calibrate the camera, you need a set of calibration images that contain known 3D world points and their
corresponding 2D image coordinates. These calibration images should ideally cover a wide range of camera
poses and object positions within the camera's field of view.
• Step 2: Set Up Correspondence Pairs:
Each calibration image provides a set of correspondence pairs. These pairs consist of a 3D point (X,Y,Z) in the
world coordinate system and its corresponding 2D pixel coordinates (u,v) in the image.
• Step 3: Formulate the Projection Matrix:
The goal is to find the 3x4 camera projection matrix P that relates 3D world points to 2D image points. This
matrix can be expressed as P = K [R | t], where K is the 3x3 intrinsic matrix and [R | t] combines the rotation
matrix and translation vector.
Camera Calibration Techniques Cont…
• Step 4: Build a System of Linear Equations:
For each correspondence pair, you can form two linear equations representing the x and y components of the
projection equation. This results in a system of linear equations.
• Step 5: Solve the Linear System:
To obtain the camera projection matrix P, you need to solve the overdetermined linear system of equations.
Techniques like the Singular Value Decomposition (SVD) can be used to find the least-squares solution to this
system, which minimizes the error between the predicted and actual 2D image coordinates.
• Step 6: Extract Intrinsic and Extrinsic Parameters:
Once you have the camera projection matrix P, you can extract the intrinsic and extrinsic camera parameters:
Intrinsic Parameters: These parameters include the focal length, pixel aspect ratio, and principal point.
They are typically obtained from the calibration matrix within P.
Extrinsic Parameters: These parameters describe the camera's position and orientation in 3D space and
can be derived from the rotation and translation components of P.
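A hedged sketch of steps 4 and 5 with NumPy: each correspondence contributes two rows to a linear system, which is solved with SVD for the 3x4 matrix P. The input arrays are assumed to hold at least six 3D-2D correspondences; decomposing P into intrinsics and extrinsics (step 6) is left out:
```python
import numpy as np

def dlt_projection_matrix(world_pts, image_pts):
    """Estimate the 3x4 projection matrix P from N >= 6 correspondences.

    world_pts: (N, 3) array of 3D points; image_pts: (N, 2) array of pixels.
    """
    A = []
    for (X, Y, Z), (u, v) in zip(world_pts, image_pts):
        # Two linear equations per correspondence (step 4).
        A.append([X, Y, Z, 1, 0, 0, 0, 0, -u * X, -u * Y, -u * Z, -u])
        A.append([0, 0, 0, 0, X, Y, Z, 1, -v * X, -v * Y, -v * Z, -v])
    # Least-squares solution (step 5): the right singular vector associated
    # with the smallest singular value of A, reshaped to 3x4.
    _, _, Vt = np.linalg.svd(np.asarray(A, dtype=float))
    return Vt[-1].reshape(3, 4)
```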
Limitations and Considerations
Overdetermined System: The DLT method requires a sufficient number of correspondence pairs to form an
overdetermined system of linear equations. More correspondences provide more robust calibration.
Noise and Error Handling: The DLT method is sensitive to noise in correspondence pairs. Outliers or
inaccuracies in the data can lead to incorrect calibration results. Robust techniques like RANSAC (Random
Sample Consensus) are often used to handle outliers.
Nonlinear Effects: The DLT method assumes a pinhole camera model, which does not account for complex
lens distortions. For highly accurate calibration, additional techniques may be required to model and correct
lens distortions.
Applications
The DLT method is used in a wide range of applications, including photogrammetry, 3D reconstruction,
computer vision, robotics, and augmented reality.
It's particularly useful when you need to calibrate a camera and understand its parameters for accurate
measurement or scene reconstruction.
Assignment #01
• Please explore lens distortion further and summarize it.
• Why is it important?
• What are its possible correction methods?
• You need to capture indoor and outdoor scenes with one or more cameras;
would you prefer a pinhole camera for them?
• If YES, would you use it for both indoor and outdoor? Why?
• If YES for only one of the cases, defend it in your own words.
• If NOT for both cases, defend your choice with proper reasoning.
• Please provide a complete case study implementing the camera
projection matrix and perspective projection equation.
Next Lecture
Lighting and Image Formation