
UNIT 1-COMPUTER VISION BASICS

COMPUTER VISION

Computer Vision is a branch of Artificial Intelligence (AI) that enables computers to acquire, process, analyze, and understand images or videos, and make decisions or take actions based on that information.

In short, Computer Vision is the technology that allows machines to gain understanding from images and videos.

Key Goals of Computer Vision:

 Detect and recognize objects
 Classify and label images
 Track motion in videos
 Understand scenes and environments

Objective:
To simulate human vision by enabling machines to:
 See (capture images)
 Understand (interpret objects, scenes, motion)
 Make decisions (based on visual input)

Key Tasks in Computer Vision:

 Image classification – Identifying what is in an image
 Object detection – Locating and identifying multiple objects
 Image segmentation – Dividing an image into meaningful parts
 Facial recognition – Identifying individuals from their facial features
 Motion tracking – Analyzing movement in video sequences

Applications of Computer Vision:

 Autonomous Vehicles – Object and lane detection
 Medical Imaging & Healthcare – Analyzing X-rays, MRIs, and scans
 Surveillance and Security Systems – Face and activity recognition
 Augmented and Virtual Reality
 Manufacturing – Quality inspection using cameras
 Retail & Marketing – Customer behavior analysis

Related Fields/Disciplines

 Artificial Intelligence (AI)
 Machine Learning (ML)
 Computer Graphics
 Image Processing
 Robotics

Computer Vision aims to enable machines to perceive, interpret, and understand visual information from the world. Below are its key goals along with purposes and examples.

Goals of Computer Vision

1. Image Understanding
→ Understand content in an image
Example: Google Photos grouping pictures by person or location

2. Object Recognition & Classification
→ Identify and classify objects
Example: Amazon Go stores recognizing items for checkout

3. Object Detection & Localization
→ Detect objects and their positions
 Object Detection is a computer vision technique that identifies and locates objects within an image or video.
 Localization refers to identifying the position of the detected object in the image, usually by drawing a bounding box around it.

Example: Face detection in mobile phone cameras (a bounding-box sketch follows this list)


4. Scene Reconstruction
→ Create 3D models from 2D images
Example: Augmented Reality (AR) in interior design apps

5. Motion Analysis & Tracking
→ Track moving objects in video
Example: CCTV tracking a person’s movement in real time

6. Image Restoration & Enhancement
→ Improve image quality
Example: AI tools restoring old or blurred photographs

7. Automation & Robotics
→ Help machines interact with surroundings
Example: Self-driving cars detecting roads and obstacles

8. Face & Text Recognition
→ Identify faces or read text in images
Example: Passport scanners at airports, Google Lens for text
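
A minimal sketch of the bounding-box idea from goal 3, in Python with OpenCV (the blank image and the box coordinates below are invented purely for illustration; a real detector would compute the box from the image content):

import cv2
import numpy as np

# Blank test image; a real system would load a camera frame instead
img = np.zeros((200, 200, 3), dtype=np.uint8)

# Localization result expressed as a bounding box:
# top-left corner (50, 40), bottom-right corner (150, 160) -- made-up coordinates
cv2.rectangle(img, (50, 40), (150, 160), color=(0, 255, 0), thickness=2)

cv2.imwrite("detection.jpg", img)   # saved image now shows a green box around the "object"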

Advantages of Computer Vision:

1. Automation and Speed: Processes visual data much faster than humans (e.g., inspection in factories). Enables real-time decisions in applications like self-driving cars.
2. Accuracy and Consistency: Reduces human error in tasks like medical image analysis or quality control.
3. Handles Large Volumes of Data: Can process and analyze vast amounts of image or video data that would be overwhelming for humans.

Disadvantages of Computer Vision:

1. High Initial Cost and Complexity: Requires expensive hardware and large datasets for
training.
2. Limited in Unstructured Environments: Performance may drop in poor lighting,
cluttered scenes, or unfamiliar situations.
3. Privacy Concerns: Widespread surveillance and facial recognition can raise ethical and legal issues.

IMAGE FORMATION

Image formation is the process of capturing a visual representation of a scene using a camera or sensor and converting it into a digital image that a computer can process.

Key Steps in Image Formation:

1. Light Reflection from Objects
   o Light from a source (like the sun or a bulb) reflects off objects in the scene.
2. Camera Lens Captures Light
   o The reflected light passes through a camera lens, which focuses it to form an image.
3. Projection onto Image Sensor
   o The focused light hits a sensor (like CCD or CMOS) in the camera, converting it into electrical signals.
4. Conversion to Digital Image
   o The signals are digitized into pixels — small units that represent brightness and color.

Example:

When you take a photo of a tree using a smartphone:
o The tree reflects light.
o The phone's lens captures and focuses that light.
o The sensor records the light and produces a digital image.

IMAGE CAPTURE

Definition:
Image capture refers to the process of recording the formed image using a sensor and converting
it to a digital format.
Steps/Process:
 Analog Signal Generation: The sensor detects light intensity.
 Analog-to-Digital Conversion (ADC): Converts analog signals to digital pixel values.
 Image Storage: The digital image is stored in memory (as JPG, PNG, etc.).

CCD (Charge-Coupled Device) and CMOS (Complementary Metal-Oxide-Semiconductor) are the image sensors used in cameras to capture light and convert it into digital images.

Example:
A CCTV camera records a video stream in a store. It captures continuous frames per second and stores them in a digital video format.
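
A toy sketch of the ADC step in Python with NumPy (the normalized sensor readings below are invented numbers; the real conversion happens in the sensor's hardware):

import numpy as np

# Analog light intensities from the sensor, normalized to the range 0.0-1.0 (made up)
analog = np.array([0.02, 0.37, 0.80, 0.99])

# 8-bit analog-to-digital conversion: map 0.0-1.0 onto integer pixel values 0-255
digital = np.round(analog * 255).astype(np.uint8)

print(digital)   # [  5  94 204 252]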

IMAGE REPRESENTATION
Definition:
Image representation is how a digital image is stored and processed in a computer system.

Types:

 Grayscale Image: Each pixel has one intensity value (0–255).
 Color Image (RGB): Each pixel has three components – Red, Green, Blue.
 Binary Image: Pixels are either 0 (black) or 1 (white).

Image as Matrix:

 An image is represented as a 2D (grayscale) or 3D (color) matrix of pixels.

Example:
Face recognition systems convert captured facial images into pixel matrices to compare and identify people.
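
A minimal sketch of these representations as matrices, assuming NumPy and OpenCV are installed and "photo.jpg" is a hypothetical filename:

import cv2
import numpy as np

# Color image: 3D matrix of shape (height, width, 3), one value per channel (OpenCV uses BGR order)
color = cv2.imread("photo.jpg")                 # dtype uint8, values 0-255

# Grayscale image: 2D matrix of shape (height, width), one intensity value per pixel
gray = cv2.cvtColor(color, cv2.COLOR_BGR2GRAY)

# Binary image: every pixel becomes 0 (black) or 1 (white)
binary = (gray > 127).astype(np.uint8)

print(color.shape, gray.shape, binary.max())    # e.g. (480, 640, 3) (480, 640) 1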

Summary Table:

Concept              | Meaning                                               | Real-Time Example
Image Formation      | Converting a scene into an image using optics         | Taking a photo using a camera
Image Capture        | Digitizing and storing the image                      | CCTV recording a video
Image Representation | Storing the image as a pixel matrix in digital format | Face recognition software processing an image

LINEAR FILTERING, CORRELATION, AND CONVOLUTION

Linear filtering, correlation, and convolution are fundamental operations in image processing and computer vision. They are used to manipulate or extract features from images.

What is a Kernel?
A kernel (or filter or mask) is a small matrix used in image processing. It is applied to each pixel
of an image to change its value based on its neighbors. Common sizes are 3×3, 5×5, etc.

 It moves over each pixel in the image (this is called convolution or correlation).
 At each position, it performs a calculation using neighboring pixel values to produce a
new value.
Example:
In photo editing apps, when you apply blur or sharpen, the app is using different kernels behind the scenes.

Example 3×3 Kernel: This kernel averages the pixel values in a 3×3 neighborhood. It is useful for blurring or smoothing an image.
[1/9 1/9 1/9]
[1/9 1/9 1/9]
[1/9 1/9 1/9]

If you use a 5×5 kernel (25 pixels), each element will be 1/25.
If you use a 2×2 kernel, each element will be 1/4, and so on.

Kernel Size | Number of Pixels | Value of Each Cell
3×3         | 9                | 1/9
5×5         | 25               | 1/25
2×2         | 4                | 1/4
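
The pattern in this table can be generated for any kernel size; a short sketch (only NumPy is assumed):

import numpy as np

def mean_kernel(k):
    """Build a k x k averaging kernel: every cell is 1/(k*k), so all weights sum to 1."""
    return np.ones((k, k)) / (k * k)

print(mean_kernel(3)[0, 0])   # 0.111... = 1/9
print(mean_kernel(5)[0, 0])   # 0.04    = 1/25
print(mean_kernel(2)[0, 0])   # 0.25    = 1/4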

Center Pixel Concept

When a kernel is placed over a patch of an image, the pixel at the center of the patch is called the center pixel. The kernel calculates a new value for this center pixel using its neighbors.

Example image patch:

[10 20 30]
[40 50 60]
[70 80 90]

Here, 50 is the center pixel.

Why only the middle? Because when the kernel slides across the image, we assign the calculated
result to the position of the center pixel in the output image. This ensures the output image size
remains the same.

LINEAR FILTERING

Definition: A process of applying a filter (or kernel) to an image to enhance certain features (like
edges) or reduce noise.

 The image is processed by sliding a kernel (small matrix) across it.
 Each pixel is updated based on the weighted sum of neighboring pixels.

Use cases:

 Noise reduction (e.g., Gaussian blur)
 Edge detection
 Smoothing

Common Filters:

 Mean Filter: Averages surrounding pixels – smoothing
 Gaussian Filter: Weighted average – less blurring than mean
 Laplacian Filter: Edge enhancement

Formula:

Output(x, y) = Σ Σ [ Image(x+i, y+j) × Kernel(i, j) ]

Example: Mean Blur Kernel (3×3):

[1/9 1/9 1/9]
[1/9 1/9 1/9]
[1/9 1/9 1/9]
Applied to:
[10 20 30]
[40 50 60]
[70 80 90]

Take all 9 numbers, add them, and then divide by 9. This gives the average – that's why it blurs or smooths the image.

10 + 20 + 30 + 40 + 50 + 60 + 70 + 80 + 90 = 450
450 / 9 = 50 → new value for the center pixel

So the center pixel (50) stays the same in this case, but in real images this would smooth sharp edges and reduce noise.
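
A quick check of this hand calculation in Python (a sketch; only NumPy is assumed):

import numpy as np

patch = np.array([[10, 20, 30],
                  [40, 50, 60],
                  [70, 80, 90]])

kernel = np.ones((3, 3)) / 9           # the mean blur kernel from above

# Linear filtering at one position: element-wise multiply, then sum everything
new_center = np.sum(patch * kernel)

print(new_center)                      # 50.0, matching the 450 / 9 result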
CORRELATION
Correlation measures similarity between the kernel and the image patch. We slide the kernel over
the image, multiply corresponding values, and sum them. The kernel is NOT flipped.

Formula:

Output(x, y) = Σ Σ [ Image(x+i, y+j) × Kernel(i, j) ]

Example Kernel:
[0 1 0]
[1 -4 1]
[0 1 0]
Image Patch:
[1 2 3]
[4 5 6]
[7 8 9]
Calculation: (1×0)+(2×1)+(3×0)+(4×1)+(5×-4)+(6×1)+(7×0)+(8×1)+(9×0) = 0

In a face detection system, correlation helps to:

 Detect eyes or nose by comparing parts of the image with a known pattern.
 If a part of the image matches the kernel (like the shape of an eye), the correlation result
is high.
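
The same single-position correlation, sketched in Python (only NumPy is assumed):

import numpy as np

kernel = np.array([[0,  1, 0],
                   [1, -4, 1],
                   [0,  1, 0]])

patch = np.array([[1, 2, 3],
                  [4, 5, 6],
                  [7, 8, 9]])

# Correlation: multiply corresponding values and sum them -- the kernel is NOT flipped
result = np.sum(patch * kernel)

print(result)   # 0, matching the hand calculation above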

CONVOLUTION

Convolution is similar to correlation, but the kernel is flipped horizontally and vertically before applying. In many cases, if the kernel is symmetric, flipping has no effect.

What Does "Kernel Flipped" Mean?

When we say "flipping a kernel", we mean reversing the kernel in both directions:
1. Flip Horizontally (Left–Right): Switch columns from left to right.
2. Flip Vertically (Top–Bottom): Switch rows from top to bottom.

In mobile photo editing apps, convolution is behind effects like sharpen, emboss, or blur.

Formula:

Output(x, y) = Σ Σ [ Image(x+i, y+j) × Kernel(-i, -j) ]

Example:
Kernel before flip:
[0 1 0]
[1 -4 1]
[0 1 0]
After flipping (same in this symmetric case), applying to:
[1 2 3]
[4 5 6]
[7 8 9]
Result = 0.
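
A sketch of the flip step in Python (only NumPy is assumed); because this kernel is symmetric, the flipped kernel is identical and the result matches correlation:

import numpy as np

kernel = np.array([[0,  1, 0],
                   [1, -4, 1],
                   [0,  1, 0]])

patch = np.array([[1, 2, 3],
                  [4, 5, 6],
                  [7, 8, 9]])

# Convolution = correlation with the kernel flipped left-right AND top-bottom
flipped = np.flip(kernel)                  # flips both axes at once

print(np.array_equal(kernel, flipped))     # True: symmetric kernel, flipping has no effect
print(np.sum(patch * flipped))             # 0, the same result as correlation here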

General Formula for Correlation / Linear Filtering

The general formula for applying a kernel of size (2m+1) × (2n+1) to an image is:

g(x, y) = Σ (i = -m to m) Σ (j = -n to n) [ f(x + i, y + j) × h(i, j) ]

where:
- g(x, y) is the output image pixel value at (x, y)
- f(x + i, y + j) is the input image pixel value at the corresponding position
- h(i, j) is the kernel value at position (i, j)
- m, n define the kernel size (for a 3×3 kernel, m = 1, n = 1)
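
A direct Python implementation of this formula (a sketch; zero padding at the borders is an assumption, since the formula itself does not fix a border rule):

import numpy as np

def linear_filter(f, h):
    """Apply kernel h to image f using the general formula (correlation form)."""
    m, n = h.shape[0] // 2, h.shape[1] // 2      # kernel half-sizes
    padded = np.pad(f, ((m, m), (n, n)))         # zero-pad so output size == input size
    g = np.zeros(f.shape)
    for x in range(f.shape[0]):
        for y in range(f.shape[1]):
            # Weighted sum over the (2m+1) x (2n+1) neighborhood centered at (x, y)
            g[x, y] = np.sum(padded[x:x + 2*m + 1, y:y + 2*n + 1] * h)
    return g

image = np.array([[10., 20., 30.],
                  [40., 50., 60.],
                  [70., 80., 90.]])

print(linear_filter(image, np.ones((3, 3)) / 9)[1, 1])   # 50.0 for the center pixel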

EDGE DETECTION

What is an Edge?

An edge in an image is a point where the brightness (intensity) of the image changes sharply. It
marks the boundary between two regions, such as between an object and the background.

Why Detect Edges?

Edge detection helps:

 Identify object boundaries
 Reduce the amount of data
 Extract important features for further processing (like object recognition, segmentation, etc.)

Types of Edges

1. Step Edge – sudden change in intensity.
2. Ramp Edge – gradual change in intensity.
3. Line Edge – bright line on a dark background.
4. Roof Edge – sharp peak (similar to a ramp but thinner).

Common Edge Detection Operators

Operator | Description
Sobel    | Uses gradient magnitude in horizontal and vertical directions.
Prewitt  | Similar to Sobel but with simpler masks.
Canny    | Advanced method with noise reduction and edge thinning.
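
A hedged Sobel sketch in Python (assuming NumPy and SciPy are installed; the tiny test image is made up so the edge location is known in advance):

import numpy as np
from scipy import ndimage

# Synthetic 5x5 image: dark left half, bright right half -> one vertical step edge
img = np.zeros((5, 5))
img[:, 3:] = 255.0

gx = ndimage.sobel(img, axis=1)    # horizontal gradient: responds to vertical edges
gy = ndimage.sobel(img, axis=0)    # vertical gradient: responds to horizontal edges

magnitude = np.hypot(gx, gy)       # combine both directions into edge strength

print(magnitude.argmax(axis=1))    # per row, the strongest response sits at the intensity jump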

Lane Detection in Self-Driving Cars

 The car's camera captures road images.
 Edge detection highlights white lane markings.
 Helps the car stay in the correct lane.
