ELB1502
Study Guide
ELECTRICAL ENGINEERING ROBOTICS
Unit 5: Introduction to Robot Vision
Diploma in Electrical Engineering
In the Department of Electrical Engineering
School of Engineering
College of Science, Engineering & Technology (CSET)
University of South Africa
Compiled by: Dr. E.M. Migabo (PhD Computer Science & DEng Electrical Engineering)
Instructors: Dr. M.E. Migabo & Mr. A.M. Dlamini
May, 2023
I. Learning objectives
The learning objectives for this study unit are:
a. To define computer vision.
b. To understand the human vision system and machine vision.
c. To understand images as matrices.
d. To understand the camera model.
e. To understand robotic applications of machine vision.
II. Unit summary
The following set of slides summarizes the content of the study unit with respect to learning
objectives a. to d.:
STUDY UNIT 5
Introduction to Computer
Vision for Robotics
Compiled by: Dr. E.M. Migabo (PhD)
Introduction to Electrical Robotics
ELB1502
Introduction to Computer Vision 1
Unit Outline
● Introduction
○ What is CV?
○ Overview of the field
○ A look at history
○ Hard Problem?
● Human Vision System & the Machine
○ The human vision system
○ Fooling humans
○ The computer vision system
● Images as matrices
○ How cameras work to produce these matrices
○ Meaning of Intensity, Color, etc.
○ Shoutout to Image Processing
● Camera Model
○ Pinhole Camera Model
○ Intrinsic Camera Matrix
○ Camera Calibration
Introduction
What is Computer Vision?
[Diagram: the Universe is captured as an Image; a Computer Vision System extracts Information from it, with Image Processing as an intermediate step.]
Image Credits: CS131, Fall '18, Stanford
What is Computer Vision?
● Computer Vision deals with extracting information about the 3D world we live
in from one or more images.
● Computer Vision, like most other fields today, sits at the junction of numerous
disciplines, from Biology to Computer Science, and has applications limited
only by our imagination.
Overview of the field
Image Credits: XKCD, 1425, 2014; https://tinyurl.com/y53by9pr
Overview of the field
What kind of Information?
[Diagram: the Universe is captured as an Image; an (Image Processing +) Computer Vision System extracts information from it.]
Image Credits: Karpathy, CVPR'15; https://tinyurl.com/lxuex6o
Overview of the field
Primary themes in Computer Vision are:
1. Object Detection: recognition (is there a cat?), localization (where is the cat?), and detection (which objects are here, and where?). Images: https://tinyurl.com/yanp2o5e, https://tinyurl.com/y4ly96rd
2. Segmentation: which pixels belong to which object? (Credits: Own Work)
3. Image Modifications/Enhancements: image colorization, from grayscale to colored images (Richard Zhang, CVPR 2016); real-time image enhancement (Michael Gharbi, ACM Graphics 2017); super resolution, upsampling images while preserving quality (https://github.com/tensorlayer/srgan)
4. Image to Text: automatic semantic description for images (Karpathy, CVPR 2015)
5. Image Generation: a style-based generator architecture for GANs (Tero Karras, arXiv 2018)
6. Motion Estimation: optical flow, e.g. the Lucas-Kanade method for motion estimation (https://tinyurl.com/y5rloh3g)
7. 3D Reconstruction from Images: e.g. REMODE, real-time reconstruction (Matia Pizzoli, ICRA 2014)
8. Visual SLAM
9. Biometrics and more: fingerprint detection, Apple Face ID (https://tinyurl.com/y2a7wybz; TheVerge YouTube)
A look at history
● Robert Nathan started writing computer programs for enhancing images from
NASA's spacecraft at the Jet Propulsion Laboratory, NASA. (Credits: EE604, nasa.gov)
● The Summer Vision Project: a project at MIT to solve a significant part of the
visual system over the course of a summer. Its primary objective was to divide
the image into object, background and chaos regions.
(Credits: https://tinyurl.com/y6bpo4nk)
A look at history
Credits: Prof. Tanaya Guha, EE698K
Hard Problem?
● Why are we still working on roughly the same problem as the "Summer Vision
Project"?
● Why is it easier to create 3D models of chairs than to identify them?
➔ There is a large gap between ~1920x1080x3 raw numbers and the high-level
abstract meaning we associate with them.
➔ Images are 2D representations of information from the 3D world.
Human Vision System & Computer Vision System
The human vision system
Credits: https://tinyurl.com/y6bkhnqa; Ulas Bagci, UCF
Fooling humans
Credits: https://tinyurl.com/y49rp7sd
Credits: Wikipedia, Spinning Dancer
Credits: Oleg Shuplyak, Pinterest
The human vision system vs. the computer vision system
Credits: CS131, Stanford
Fooling Computers
Credits: https://tinyurl.com/l5pwp6t Credits: Wikipedia, Barber Pole Illusion
Images as Matrices
Camera Models
Not this one, but "models" as in modelling a phenomenon.
Credits: https://tinyurl.com/y6qen2vb
Camera Models
● Like so many things in engineering, we create a simple "model" of a camera
which is easy to understand and approximates the actual functioning of a
camera to a good degree.
● There are different models:
■ Pinhole camera model
■ Lens model
■ ...
Pinhole camera model
[Diagram: rays from the scene pass through a small aperture and form an inverted image on the image plane.]
Credits: Wikipedia, Pinhole Camera Model
Pinhole camera model
The projection equations (the slide's figures are not reproduced here) map a 3D camera-frame
point (x1, x2, x3) onto the image plane as x'i = f·xi / x3 for i = 1, 2, where f is the focal
length and x3 = z is the depth along the optical axis; adding an offset c (in pixels) moves the
origin to the principal point: x'i = f·xi / x3 + ci.
Can we make this into a matrix multiplication of the
form p' = Mp?
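The answer is yes: in homogeneous coordinates the projection becomes a single matrix multiplication. A minimal NumPy sketch, with an illustrative focal length and principal point (here the point is already in camera coordinates, so M is just the intrinsic matrix K):

```python
import numpy as np

# Intrinsic camera matrix K: focal length f (in pixels) and principal-point
# offsets (cx, cy). Values are illustrative, not from the slides.
f, cx, cy = 500.0, 320.0, 240.0
K = np.array([[f, 0, cx],
              [0, f, cy],
              [0, 0, 1]])

# A 3D point in camera coordinates (x1, x2, x3); x3 is the depth.
p = np.array([0.2, -0.1, 2.0])

# Homogeneous projection p' = K p, then divide by the third coordinate.
ph = K @ p
u, v = ph[0] / ph[2], ph[1] / ph[2]
print(u, v)  # u = 500*0.2/2 + 320 = 370.0, v = 500*(-0.1)/2 + 240 = 215.0
```

The perspective divide at the end is what the pure matrix multiplication cannot express; homogeneous coordinates defer it to a final normalization step.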
Intrinsic camera matrix
Credits: Edwin Olson, University of Michigan
Camera calibration
[Figure: a calibration rig; known 3D points P_i and their image projections P_ci are used to estimate the camera parameters.]
Credits: Gaurav Pandey, Ford
Summary notes and Computer Vision Applications to Robotics
Machine vision is concerned with the sensing of vision data and its interpretation by a computer.
The typical vision system consists of the camera and digitizing hardware, a digital computer and
hardware and software necessary to interface them. This interface hardware and software is often
referred to as a pre-processor. The operation of the vision system consists of three functions:
1. Sensing and digitizing image data
The sensing and digitizing functions involve the input of vision data by means of a camera
focused on the scene of interest. Special lighting techniques are frequently used to obtain an
image of sufficient contrast for later processing. The image viewed by the camera is typically
digitized and stored in computer memory.
The digital image is called a frame of vision data and is frequently captured by a hardware device
called a frame grabber. These devices are capable of digitizing images at a rate of 30 frames
per second.
2. Image processing and analysis
The digitized image matrix for each frame is stored and then subjected to image processing and
analysis functions for data reduction and interpretation of the image. These steps are needed to
permit the real-time vision analysis that robot applications demand.
Typically, an image frame will be thresholded to produce a binary image, and various feature
measurements will then further reduce the data representation of the image. This data reduction can
change the representation of a frame from several hundred thousand bytes of raw image data to
several hundred bytes of feature-value data. The resultant feature data can be analysed in the
time available for action by the robot system.
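The thresholding and feature-measurement steps described above can be sketched as follows; the tiny synthetic frame, the threshold value, and the choice of features (area and centroid) are all illustrative:

```python
import numpy as np

# A small synthetic 8-bit grayscale frame: a bright 3x4 "part" on a dark background.
frame = np.zeros((6, 8), dtype=np.uint8)
frame[2:5, 3:7] = 200

# Threshold to a binary image (threshold value chosen for illustration).
binary = frame > 128

# Data reduction: the raw pixel array collapses to a few feature values.
area = int(binary.sum())               # number of object pixels
rows, cols = np.nonzero(binary)
centroid = (rows.mean(), cols.mean())  # object centre in pixel coordinates
print(area, centroid)                  # 12 pixels, centred at row 3.0, column 4.5
```

On a real 640x480 frame the same idea replaces hundreds of kilobytes of raw data with a handful of feature values the robot controller can act on in real time.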
3. Application
The third function of a machine vision system is the applications function. The current
applications of machine vision in robotics include inspection, part identification, location and
orientation.
The relationship between the three functions is shown in Figure 1.
Figure 1: Relationship between the three functions
4. Sensing and digitizing function in machine vision
Image sensing requires some type of image formation device such as a camera and a digitizer
which stores a video frame in the computer memory. We divide the sensing and digitizing
functions into several steps.
The initial step involves capturing the image of the scene with the vision camera. The image
consists of relative light intensities corresponding to the various portions of the scene. These light
intensities are continuous analogue values which must be sampled and converted into a digital
form.
The second step, digitizing, is achieved by an analogue to digital converter. The A/D converter is
either part of a digital video camera or the front end of a frame grabber. The choice is dependent
on the type of hardware in the system.
The frame grabber, representing the third step, is an image storage and computational device which
stores a given pixel array. Frame grabbers vary in capability from those that simply store
an image to those with significant computational capability.
In the more powerful frame grabbers, thresholding, windowing, and histogram modification
calculations can be carried out under computer control. The stored image is then subsequently
processed and analysed by the combination of the frame grabber and the vision controller.
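The sampling and quantizing performed by the A/D converter can be illustrated with a toy example, assuming a continuous light-intensity signal normalized to [0, 1] (the signal shape and sample count are invented for illustration):

```python
import math

# A continuous intensity signal, normalized to [0, 1].
def intensity(t):
    return 0.5 + 0.5 * math.sin(2 * math.pi * t)

# Step 1: sample the analogue signal at discrete times (16 samples per period).
samples = [intensity(n / 16) for n in range(16)]

# Step 2: quantize each sample to an 8-bit level (0..255), as an A/D converter would.
digital = [round(s * 255) for s in samples]
print(digital)
```

A camera's A/D converter does the same thing for every pixel of every frame, which is why frame grabbers need both storage and bandwidth.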
4.1.Robotic applications
Robotic application of machine vision falls into three broad categories listed below:
· Inspection
The first category is one in which the primary function is the inspection process. This is carried
out by the machine vision system, and the robot is used in a secondary role to support the application.
The objectives of machine vision inspection include checking for gross surface defects, discovery
of flaws in labelling, verification of the presence of components in assembly, and checking for the
presence of holes and other features in a part.
When these kinds of inspection operations are performed manually, there is a tendency for human
error. The time that manual inspection requires also favours carrying out the procedures
automatically, using 100 percent inspection, usually in much less time.
· Identification
This is concerned with applications in which the purpose of the machine vision system is to
recognise and classify an object rather than to inspect it. Inspection implies that the part must be
either accepted or rejected. Identification involves a recognition process in which the part itself,
or its position and/or orientation, is determined.
This is usually followed by subsequent decision and action taken by the robot. Identification
applications of machine vision include part sorting, palletizing and depalletizing and picking parts
that are randomly oriented from a conveyer or bin.
· Visual servoing and navigation
In the third category, visual servoing and navigation control, the purpose of the vision system is to
direct the actions of the robot based on its visual input.
The generic example of robot visual servoing is one where the machine vision system is used to control
the trajectory of the robot's end effector toward an object in the workspace. Industrial examples
of this application include part positioning, retrieving and reorienting parts moving along a
conveyor, assembly, etc.
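The visual-servoing idea can be sketched as a simple proportional control loop. Everything here (the image size, the gain, and the assumption that the vision system reports the target's pixel position each frame) is a hypothetical illustration, not an industrial controller:

```python
# Proportional visual servoing: drive the target's image position toward
# the image centre. All numbers are illustrative.
IMAGE_CENTRE = (320.0, 240.0)
GAIN = 0.5  # proportional gain (hypothetical)

def servo_step(target_px):
    """Return a (du, dv) correction that moves the target toward the centre."""
    err_u = IMAGE_CENTRE[0] - target_px[0]
    err_v = IMAGE_CENTRE[1] - target_px[1]
    return GAIN * err_u, GAIN * err_v

# Simulate: assume each correction is reflected directly in the next observation.
pos = (400.0, 100.0)
for _ in range(20):
    du, dv = servo_step(pos)
    pos = (pos[0] + du, pos[1] + dv)
print(pos)  # converges toward (320, 240)
```

In a real system the correction would command end-effector motion, and the camera would observe the result; the closed loop is what makes the approach robust to calibration error.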
III. Tutorials
1. Q: What is computer vision? A: Computer vision is the field of study that focuses on
enabling computers to interpret and understand visual information from digital images
or videos.
2. Q: How does human vision differ from machine vision? A: Human vision is a complex
process involving the eyes, brain, and perception, while machine vision refers to the
use of computer algorithms and techniques to extract information from images or
videos.
3. Q: How are images represented in computer vision? A: In computer vision, images are
represented as matrices or grids of pixels, where each pixel stores numerical values
representing the color or intensity of the corresponding image location.
4. Q: What are the components of the camera model in computer vision? A: The camera
model includes intrinsic parameters (focal length, principal point) and extrinsic
parameters (position and orientation) that describe the relationship between the 3D
world and 2D image coordinates.
5. Q: What are some robotic applications of machine vision? A: Robotic applications of
machine vision include object recognition and localization, robot navigation, industrial
automation, surveillance, autonomous vehicles, and augmented reality.
6. Q: Define computer vision. A: Computer vision is an interdisciplinary field that focuses
on developing algorithms and techniques for machines to extract, analyze, and interpret
information from digital images or videos.
7. Q: How does the human vision system work? A: The human vision system involves
the eyes capturing light, which is then processed by the brain to form visual perception,
including recognition, depth perception, and object tracking.
8. Q: Explain images as matrices in computer vision. A: In computer vision, images are
represented as matrices, where each element in the matrix represents a pixel value that
encodes color or intensity information.
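The answer to Q8 can be made concrete with NumPy: a grayscale image is simply a 2D array whose entries are intensity values (the values below are illustrative):

```python
import numpy as np

# A 4x5 grayscale image: each entry is an 8-bit intensity (0 = black, 255 = white).
img = np.array([[  0,  50, 100, 150, 200],
                [ 10,  60, 110, 160, 210],
                [ 20,  70, 120, 170, 220],
                [ 30,  80, 130, 180, 230]], dtype=np.uint8)

print(img.shape)   # (rows, columns) = (4, 5)
print(img[2, 3])   # intensity at row 2, column 3 -> 170
img[2, 3] = 255    # editing the matrix edits the image
```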
9. Q: What is the camera model in computer vision? A: The camera model describes the
mathematical relationship between 3D points in the world and their projection onto a
2D image plane. It includes intrinsic and extrinsic parameters.
10. Q: Provide examples of robotic applications that utilize machine vision. A: Examples
include industrial robots for quality control, autonomous vehicles for road scene
understanding, surgical robots for precise image-guided procedures, and drones for
object tracking.
11. Q: How would you define computer vision in the context of robotics? A: In robotics,
computer vision refers to the application of image processing and analysis techniques
to enable robots to perceive and interpret visual information from the environment.
12. Q: What are the primary stages of human vision processing? A: Human vision
processing involves image formation on the retina, feature extraction in the visual
cortex, and higher-level interpretation in the brain for object recognition and
understanding.
13. Q: How can images as matrices be manipulated in computer vision? A: Matrices
representing images can be processed using various techniques, such as filtering, edge
detection, morphological operations, and transformations like rotation or scaling.
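One of the manipulations named in Q13, filtering, can be sketched as a 3x3 box blur, in which each interior output pixel becomes the mean of its 3x3 neighbourhood (a minimal illustration; production code would use an optimized library routine):

```python
import numpy as np

def box_blur(img):
    """3x3 mean filter on a 2D float array; border pixels are left unchanged."""
    out = img.copy()
    for r in range(1, img.shape[0] - 1):
        for c in range(1, img.shape[1] - 1):
            out[r, c] = img[r - 1:r + 2, c - 1:c + 2].mean()
    return out

img = np.zeros((5, 5))
img[2, 2] = 9.0           # a single bright pixel
blurred = box_blur(img)
print(blurred[2, 2])      # 9/9 = 1.0: the spike is spread over its neighbourhood
```

Edge detection, morphological operations and geometric transformations all follow the same pattern: a rule applied over the matrix produces a new matrix.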
14. Q: Explain the concept of intrinsic parameters in the camera model. A: Intrinsic
parameters describe the internal characteristics of the camera, such as focal length,
principal point, and lens distortion, which affect the mapping of 3D points to the image
plane.
15. Q: What are some examples of robotic applications that utilize machine vision for
object recognition? A: Examples include industrial robots identifying and sorting
objects on an assembly line, autonomous drones detecting and avoiding obstacles, and
robots in healthcare assisting in surgical procedures.
16. Q: How does the machine vision process contribute to robot navigation? A: Machine
vision allows robots to perceive and understand the environment by analyzing visual
information, which aids in tasks such as obstacle detection, mapping, and localization.
17. Q: Explain the concept of extrinsic parameters in the camera model. A: Extrinsic
parameters define the position and orientation of the camera in the 3D world coordinate
system, enabling the transformation from 3D world points to the 2D image plane.
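Q14 and Q17 together describe the full projection pipeline. A minimal NumPy sketch, using illustrative values for K, R and t: the extrinsic parameters first move a world point into camera coordinates, then the intrinsic matrix projects it to pixels:

```python
import numpy as np

# Intrinsics (illustrative): focal length 500 px, principal point (320, 240).
K = np.array([[500., 0., 320.],
              [0., 500., 240.],
              [0., 0., 1.]])

# Extrinsics (illustrative): identity rotation, camera translated 2 m along z.
R = np.eye(3)
t = np.array([0., 0., 2.])

def project(p_world):
    p_cam = R @ p_world + t   # world -> camera coordinates (extrinsics)
    ph = K @ p_cam            # camera -> homogeneous pixel coordinates (intrinsics)
    return ph[:2] / ph[2]     # perspective divide

u, v = project(np.array([0.4, 0.2, 0.0]))
print(u, v)  # 500*0.4/2 + 320 = 420.0, 500*0.2/2 + 240 = 290.0
```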
18. Q: What are some challenges in robotic applications of machine vision? A: Challenges
include handling variations in lighting conditions, occlusions, complex scenes, real-time
processing requirements, and robustness to noise and uncertainties.
19. Q: How does machine vision contribute to industrial automation? A: Machine vision
systems are used in industrial automation for tasks such as quality control, defect
detection, object sorting, robotic assembly, and visual inspection.
20. Q: What is the role of computer vision in robot navigation? A: Computer vision enables
robots to perceive and interpret the environment, allowing them to understand
obstacles, landmarks, and spatial relationships. This information is crucial for tasks
such as mapping, localization, path planning, and obstacle avoidance. By analyzing
visual data from cameras or other sensors, robots can make informed decisions to
navigate their surroundings safely and efficiently. Computer vision provides valuable
input for autonomous navigation systems, enabling robots to adapt to dynamic
environments and handle complex scenarios.
IV. Exercises and problems:
1. Q: Convert a color image with dimensions 640x480 pixels into a grayscale image.
Calculate the resulting image size.
A: The resulting image size will be 640x480 pixels since a grayscale image has only
one channel.
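The conversion in Exercise 1 can be sketched with NumPy. The luminance weights (0.299, 0.587, 0.114) are the common ITU-R BT.601 choice; the 2x2 image is invented for illustration:

```python
import numpy as np

# A tiny 2x2 RGB image (three channels per pixel).
rgb = np.array([[[255, 0, 0], [0, 255, 0]],
                [[0, 0, 255], [255, 255, 255]]], dtype=float)

# A weighted sum over the channel axis collapses 3 channels to 1.
weights = np.array([0.299, 0.587, 0.114])
gray = rgb @ weights      # shape (2, 2): same pixel count, one channel

print(gray.shape)         # (2, 2)
print(gray[1, 1])         # white maps to ~255 (the weights sum to 1)
```

The pixel dimensions are unchanged, which is exactly why the answer to Exercise 1 is still 640x480.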
2. Q: Given an image represented as a 3x3 matrix, perform element-wise multiplication
by a scalar value of 2.
A: If the original image matrix is
[1 2 3]
[4 5 6]
[7 8 9]
the resulting image matrix will be
[ 2  4  6]
[ 8 10 12]
[14 16 18]
3. Q: Calculate the total number of pixels in a grayscale image with dimensions 800x600
pixels.
A: The total number of pixels will be 800 * 600 = 480,000 pixels.
4. Q: Given a camera with a focal length of 50 mm and an object distance of 2 meters,
calculate the image distance using the thin-lens camera model equation 1/f = 1/d_o + 1/d_i.
A: Rearranging, 1/d_i = 1/f − 1/d_o = 1/0.050 − 1/2 = 19.5 m⁻¹, so the image distance
d_i ≈ 0.051 meters, or about 51.3 millimeters.
5. Q: Determine the intrinsic matrix K given the camera's focal length of 500 pixels and
principal point coordinates (320, 240).
A: The intrinsic matrix K will be:
[500   0 320]
[  0 500 240]
[  0   0   1]
6. Q: Calculate the aspect ratio of an image with dimensions 1024x768 pixels.
A: The aspect ratio is calculated by dividing the width by the height, resulting in
1024/768 ≈ 1.3333.
7. Q: Given an RGB image with dimensions 640x480 pixels, calculate the total number
of color channels.
A: RGB images have three color channels (Red, Green, and Blue). So, the total number
of color channels will be 3.
8. Q: Determine the field of view (FOV) of a camera with a focal length of 35 mm and
an image sensor size of 22.3 mm x 14.9 mm.
A: The horizontal FOV can be calculated using the formula:
FOV = 2 × tan⁻¹(sensor width / (2 × focal length))
Substituting the values, FOV = 2 × tan⁻¹(22.3 / (2 × 35)) ≈ 35.3 degrees.
9. Q: Given a camera with a pixel size of 5 μm and a resolution of 2048x1536 pixels,
calculate the physical size of the image sensor.
A: The physical size of the image sensor is found by multiplying the pixel size by the
pixel count in each dimension: 2048 × 5 μm ≈ 10.24 mm wide and 1536 × 5 μm ≈ 7.68 mm
high.
10. Q: Calculate the Euclidean distance between two points A(3, 4) and B(7, 1) in an
image.
A: The Euclidean distance can be calculated using the formula:
d = √((x₂ − x₁)² + (y₂ − y₁)²)
Substituting the values, the Euclidean distance between points A and B will be
√((7 − 3)² + (1 − 4)²) = √(16 + 9) = √25 = 5 units
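The computation can be checked directly in Python:

```python
import math

# Points A(3, 4) and B(7, 1) in pixel coordinates.
ax, ay = 3, 4
bx, by = 7, 1
dist = math.hypot(bx - ax, by - ay)  # sqrt(4**2 + (-3)**2)
print(dist)  # 5.0
```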
11. Problem: A robot is equipped with a camera that captures images at a resolution of
800x600 pixels. Each pixel represents a 0.1 cm x 0.1 cm area in the real world. The
robot needs to determine the size of an object in the image. If the object occupies 200
pixels in width, what is its size in centimeters?
Solution: The size of the object in centimeters can be calculated by multiplying the
number of pixels by the pixel size. In this case, the object size is 200 pixels * 0.1
cm/pixel = 20 cm.
12. Problem: A robot is using machine vision to detect defects on manufactured parts. The
camera captures images at a rate of 30 frames per second. If each image processing
operation takes 20 milliseconds to complete, what is the maximum number of parts the
robot can inspect per minute?
Solution: The time taken to process one image is 20 milliseconds, so the processing
hardware can handle 1000 / 20 = 50 images per second, or 50 × 60 = 3000 parts per
minute. Note, however, that the camera captures only 30 frames per second, so the
effective inspection rate is limited to 30 × 60 = 1800 parts per minute.
13. Problem: A robot is performing object recognition using machine vision. The camera
captures images at a resolution of 1280x960 pixels. The robot's algorithm requires the
images to be converted to grayscale. If each grayscale conversion operation takes 5
milliseconds, what is the total processing time for a video sequence of 100 frames?
Solution: The time taken to convert one frame to grayscale is 5 milliseconds.
Therefore, the total processing time for 100 frames will be 100 frames * 5 milliseconds
= 500 milliseconds or 0.5 seconds.
14. Problem: A robot is equipped with a depth-sensing camera that measures the distance
of objects in a scene. The camera has a depth resolution of 1 millimeter. If the robot
detects an object at a distance of 5 meters, what is the depth measurement accuracy in
centimeters?
Solution: The depth measurement accuracy is equal to the depth resolution, which is 1
millimeter. Converting this to centimeters gives an accuracy of 0.1 centimeters.
15. Problem: A robot is using machine vision to navigate through a maze. The camera
captures images at a resolution of 640x480 pixels. The robot's algorithm requires the
images to be resized to a resolution of 320x240 pixels. If each image resizing operation
takes 10 milliseconds, what is the processing time for a video sequence of 50 frames?
Solution: The time taken to resize one frame is 10 milliseconds. Therefore, the total
processing time for 50 frames will be 50 frames * 10 milliseconds = 500 milliseconds
or 0.5 seconds.