Lecture 1 AI Summary

The document provides an overview of biological vision and computer vision, detailing the processes involved in perceiving and interpreting visual information. It discusses key concepts such as image formation, digitization, and the challenges faced in computer vision, including occlusion and low illumination. Additionally, it highlights various applications of computer vision, including medical imaging, self-driving cars, and optical character recognition.


1. Biological Vision

What is it?

● Biological vision is the ability of living beings to perceive their surroundings using
their eyes.

● This process involves:

1. Reception: Eyes capture visual information as light enters them.

2. Transmission: The captured signals are sent to the brain via the optic nerve.

3. Interpretation: The brain processes these signals to form a "visual perception."

Why is this important in Computer Vision?

● Biological vision serves as inspiration. If we can understand how humans see and
interpret the world, we can try to replicate this process in machines.

Key Research:

● In the 1960s, Hubel and Wiesel identified:

o Simple cells in the primary visual cortex that detect edges or bars of light at specific
orientations.

o Complex cells that detect edges but are less sensitive to position.

Example:

● When you look at a flower, simple cells might detect its edges at particular orientations,
while complex cells respond to those same edges even if their exact position shifts slightly.

2. Computer Vision

What is it?

● Computer vision is the process of enabling computers to understand visual
information (images, videos).

● It’s not just about taking a photo (like in photography); it's about extracting
meaningful insights from it.

Key Idea:
● Machines need to "see" to:

o Recognize objects in images (e.g., identifying a cat).

o Understand events in videos (e.g., detecting someone waving).

How It Works (Simplified Flow):

1. Capture: A digital camera captures the image/video.

2. Preprocessing: Noise is removed, and the image is enhanced.

3. Feature Extraction: Key information (edges, textures) is identified.

4. Analysis: Algorithms classify objects, detect patterns, or predict outcomes.

Example:

● A self-driving car uses cameras to identify road signs, pedestrians, and obstacles.
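
The flow above can be sketched in a few lines of Python with OpenCV. This is only an illustrative sketch: the library choice, the file name "frame.jpg", and the edge-pixel count used as "analysis" are assumptions, not part of the lecture.

import cv2

img = cv2.imread("frame.jpg")                        # 1. Capture (here: load from disk)
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
denoised = cv2.GaussianBlur(gray, (5, 5), 0)         # 2. Preprocessing: reduce noise
edges = cv2.Canny(denoised, 100, 200)                # 3. Feature extraction: edge map
print("Edge pixels found:", int((edges > 0).sum()))  # 4. Analysis (a crude measure of detail)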

3. Issues with Computer Vision

a. Inverse Problem

What is the inverse problem?

● In computer graphics, we create a 2D image from a 3D scene (forward problem).

● In computer vision, we try to reverse this process:

o From 2D images, infer 3D properties like shape, depth, and illumination.

Why is it hard?

● The same 2D image could correspond to many different 3D scenes. For example:

o A flat photo of a mountain gives no clue about its depth.

b. Challenges

1. Occlusion: Some objects may block others in the scene, making it hard to identify all
objects.

o Example: A person standing behind a chair.

2. Low Illumination or Resolution: Dark or blurry images make object detection
difficult.

o Example: Identifying faces in a low-light CCTV video.


3. Viewpoint Variability: Objects look different from different angles.

o Example: A cup viewed from the top vs. the side.

4. Image Formation Process

What happens when an image is captured?

● Light from the real world enters a camera (or eyes) and gets converted into a 2D
image.

Steps:

1. Light Interaction: Light reflects off objects and enters the lens.

2. Projection: The camera lens projects the light onto an image sensor, forming a 2D
representation.

3. Digitization: The continuous light signals are converted into discrete pixel values.

Mathematical Representation: An image is represented as a function I(x, y):

● x,y: Coordinates of the pixel.

● I(x,y): Intensity (brightness) at that pixel.

Example:

o 0: Black

o 255: White
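
As a concrete sketch, an image can be stored as a grid of intensity values and I(x, y) read off by indexing. The 4x4 array, the pixel coordinates, and the use of NumPy below are illustrative assumptions, not from the lecture.

import numpy as np

I = np.array([[  0,  64, 128, 255],
              [ 32,  96, 160, 224],
              [ 16,  80, 144, 208],
              [  8,  72, 136, 200]], dtype=np.uint8)   # a tiny 4x4 grayscale image

x, y = 1, 3                                            # pixel coordinates
print(I[y, x])                                         # intensity I(x, y); 0 = black, 255 = white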

5. Digitization

What is it?
● Digitization converts a continuous image into discrete pixels and intensities.

a. Sampling

● Dividing the image into a grid of pixels.

● Higher sampling rate = more pixels = better resolution.

o Example: A 1080p image has more details than a 240p image.

b. Quantization

● Assigning intensity values to pixels.

● Example: Grayscale images usually have intensity values between 0 (black) and 255
(white).

Trade-off:

● Higher resolution and more intensity levels = better image quality but require more
storage.
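
A minimal sketch of both steps on a synthetic signal (the 256-sample ramp, the factor-of-4 sampling, and the 8 intensity levels are illustrative values; NumPy is assumed):

import numpy as np

signal = np.linspace(0.0, 1.0, 256)          # a "continuous" intensity ramp in [0, 1]
sampled = signal[::4]                        # sampling: keep every 4th value (64 samples)
levels = 8                                   # quantization: only 8 intensity levels
quantized = np.round(sampled * (levels - 1)) / (levels - 1)
print(len(signal), "->", len(sampled), "samples;", levels, "levels")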

6. Types of Images

1. Binary Images: Pixels are either black (0) or white (1).

2. Grayscale Images: Pixels have intensity values from 0 to 255.

3. Color Images: Each pixel has three intensity values for red, green, and blue (RGB).

Example:

● A pixel in a grayscale image might have a value of 128 (medium gray).

● A pixel in a color image might have RGB values (255, 0, 0), representing bright red.
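
The three types map naturally onto array shapes. A small sketch with made-up values (NumPy assumed):

import numpy as np

binary    = np.array([[0, 1], [1, 0]], dtype=np.uint8)        # 0 = black, 1 = white
grayscale = np.array([[0, 128], [200, 255]], dtype=np.uint8)  # intensities 0..255
color     = np.zeros((2, 2, 3), dtype=np.uint8)               # height x width x 3 (RGB)
color[0, 0] = [255, 0, 0]                                     # one bright-red pixel
print(grayscale[0, 1], color[0, 0])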

7. Applications of Computer Vision

Computer vision has a wide range of applications, enabling machines to perform tasks that
typically require human visual understanding. Below are some common and impactful
applications:

1. Optical Character Recognition (OCR)


● What it does: Converts images of text (printed or handwritten) into machine-
readable text.

● Example:

o Scanning a document to extract its text for editing.

o Recognizing numbers from handwritten bank cheques.

How it works:

1. Preprocessing: The image is cleaned to enhance the contrast and remove noise.

2. Segmentation: The text is broken into lines, words, and characters.

3. Feature Extraction: Patterns of strokes, edges, and shapes are identified.

4. Classification: Algorithms match the patterns to known characters (A, B, C, etc.).

Mathematics (Feature Matching):

● Features like edge gradients and pixel distributions are compared using similarity
metrics (e.g., Euclidean distance).
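
A toy sketch of that matching step; the 4-dimensional feature vectors and the templates "A" and "B" are invented purely for illustration:

import numpy as np

unknown = np.array([0.9, 0.1, 0.8, 0.2])                  # features of an unknown character
templates = {"A": np.array([1.0, 0.0, 0.9, 0.1]),         # stored feature templates
             "B": np.array([0.2, 0.9, 0.1, 0.8])}

best = min(templates, key=lambda c: np.linalg.norm(unknown - templates[c]))
print("Closest character:", best)                         # prints "A"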

2. Medical Imaging

● What it does: Assists doctors in diagnosing diseases from X-rays, MRIs, and CT scans.

● Example:

o Detecting tumors in brain scans.

o Identifying fractures in bone X-rays.

How it works:

● Algorithms process high-resolution images to detect abnormalities.

● Techniques like edge detection and region segmentation help isolate areas of
interest (e.g., a tumor).

Deep Learning Use:

● Convolutional Neural Networks (CNNs) are often used for this task, as they excel at
pattern recognition in images.
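
As a very simplified stand-in for the segmentation step, bright regions can be isolated by thresholding. The file name "scan.png" and the threshold of 200 are illustrative, and OpenCV 4.x is assumed; this is not the method prescribed by the lecture.

import cv2

scan = cv2.imread("scan.png", cv2.IMREAD_GRAYSCALE)
_, mask = cv2.threshold(scan, 200, 255, cv2.THRESH_BINARY)   # keep only bright areas
contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL,      # OpenCV 4.x return signature
                               cv2.CHAIN_APPROX_SIMPLE)
print("Candidate regions:", len(contours))
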
3. Self-Driving Cars

● What it does: Allows vehicles to "see" their environment and navigate without
human input.

● Example:

o Detecting traffic lights, pedestrians, and road signs.

How it works:

1. Object Detection: Identifying objects like vehicles, humans, and obstacles.

2. Lane Detection: Recognizing road boundaries.

3. Motion Prediction: Predicting the movement of other objects to avoid collisions.
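
A basic sketch of the lane-detection step using Canny edges and a Hough transform; the file name "road.jpg" and the parameter values are illustrative assumptions, and the lecture does not prescribe this particular method.

import cv2
import numpy as np

frame = cv2.imread("road.jpg", cv2.IMREAD_GRAYSCALE)
edges = cv2.Canny(frame, 50, 150)                            # edge map of the road scene
lines = cv2.HoughLinesP(edges, 1, np.pi / 180, threshold=50,
                        minLineLength=100, maxLineGap=10)    # straight-line segments
print("Line segments found:", 0 if lines is None else len(lines))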

4. Surveillance

● What it does: Monitors areas for security purposes.

● Example:

o Face recognition in public spaces.

o Detecting suspicious activity using motion detection.

How it works:

1. Cameras capture footage.

2. Algorithms analyze the footage to track objects and recognize faces.

3. Anomalies (e.g., unauthorized access) are flagged.

5. 3D Model Building (Photogrammetry)

● What it does: Constructs 3D models of objects or scenes from multiple images.

● Example:

o Creating 3D maps of buildings or terrains.


How it works:

1. Images are captured from different angles.

2. Keypoints (like edges or corners) are matched across images.

3. Geometry is reconstructed using mathematical techniques like triangulation.


Interaction Between the 3 R’s

The three R's (recognition, reconstruction, and reorganization) work together:

1. Recognition helps reconstruction: Identifying objects provides context for better 3D
modelling.

2. Reconstruction helps reorganization: 3D information (e.g., depth) improves
segmentation.

3. Reorganization helps recognition: Grouping regions simplifies object identification.


1. Example:

o A tree in 3D space is captured as a flat, 2D image by your camera.

2. Sampling and Quantization:

o Sampling: The continuous scene is divided into discrete pixels.

o Quantization: The intensity (brightness) of each pixel is converted into a
numeric value.

Filtering
What is it?

● A technique to process and modify images to enhance certain features or suppress
unwanted noise.

Types of Filters

1. Low-Pass Filters (Smoothing):

o Removes high-frequency content (fine details, sharp transitions, noise) to create a
smoother image.

o Example: Blurring an image to reduce noise.
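
A one-step sketch of such a smoothing filter; the file name "noisy.jpg", the 5x5 kernel, and sigma = 1 are illustrative choices (OpenCV assumed):

import cv2

img = cv2.imread("noisy.jpg", cv2.IMREAD_GRAYSCALE)
smoothed = cv2.GaussianBlur(img, (5, 5), sigmaX=1.0)   # 5x5 Gaussian kernel, sigma = 1
cv2.imwrite("smoothed.jpg", smoothed)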

Applications of Filtering
1. Noise Removal:

o Smoothing noisy images (e.g., blurry CCTV footage).

2. Enhancing Features:

o Highlighting edges in X-ray images.

3. Preprocessing:

o Preparing images for further analysis (e.g., before object detection).


Types of Digitized Images

1. Binary Images:

o Each pixel has only two possible values: 0 (black) or 1 (white).

2. Grayscale Images:

o Each pixel represents a single intensity value, typically ranging from 0 (black)
to 255 (white).

3. Color Images:

o Each pixel has three intensity values (Red, Green, Blue).

o Example: [255,0,0] represents bright red.

Applications of Digitization

1. Medical Imaging:

o High-resolution digitization helps detect small abnormalities.


2. Image Compression:

o Reduces the number of samples and quantization levels to save storage.

3. Image Analysis:

o Enables machine learning models to process and classify images.


Why Resolution Matters for Applications Like Medical Imaging

● In medical imaging (e.g., MRI or CT scans), high resolution is critical to capture fine
details for accurate diagnosis.

● Example: A low-resolution brain scan might miss small abnormalities (like a tumor)
that a high-resolution scan would reveal.

Key Trade-offs: Resolution vs. Storage

● High Resolution: Better image quality but higher storage and processing
requirements.

● Low Resolution: Saves storage but sacrifices quality.

● Applications must balance these based on their requirements. For example:

o Streaming platforms may adjust resolution dynamically based on internet
speed.

o Surveillance systems might use lower resolution to save storage but record at
higher resolutions during critical events.

Practical Solutions and Examples

● Deep Learning: Modern algorithms like YOLO or Mask R-CNN are designed to handle
occlusion, variability, and illumination differences by learning from large datasets.

● Preprocessing: Techniques like noise reduction, histogram equalization, and edge
detection improve image quality before analysis.
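
For example, histogram equalization can be applied in a single call; the file name "dark.jpg" is an illustrative assumption (OpenCV assumed):

import cv2

img = cv2.imread("dark.jpg", cv2.IMREAD_GRAYSCALE)
equalized = cv2.equalizeHist(img)          # spread intensities over the full 0..255 range
cv2.imwrite("equalized.jpg", equalized)
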
Example Workflow

1. Take two images of the same scene from slightly different angles.

2. Identify corresponding points in the images.

3. Compute disparities and use triangulation to estimate depth.
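
A sketch of step 3 with made-up numbers; the focal length, baseline, and disparity below are assumed values, not from the lecture. Triangulation gives depth as Z = f * B / d:

focal_px = 700.0        # camera focal length in pixels (assumed)
baseline_m = 0.12       # distance between the two camera centres, in metres (assumed)
disparity_px = 35.0     # horizontal shift of a matched point between the two images

depth_m = focal_px * baseline_m / disparity_px      # triangulation: Z = f * B / d
print(f"Estimated depth: {depth_m:.2f} m")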


5. Practical Example: Panoramic Stitching

● Task: Stitch two overlapping images into a single panoramic view.

● Workflow:

1. Detect key points in both images (e.g., using SIFT or ORB).

2. Match corresponding points.

3. Compute the homography matrix H.

4. Warp one image to align with the other using H.

5. Blend the images into a seamless panorama.
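
A compact sketch of this workflow using ORB features and OpenCV; the file names "left.jpg"/"right.jpg", the choice of the 50 best matches, the RANSAC threshold, and the naive overwrite blend are illustrative assumptions:

import cv2
import numpy as np

img1 = cv2.imread("left.jpg")
img2 = cv2.imread("right.jpg")
gray1 = cv2.cvtColor(img1, cv2.COLOR_BGR2GRAY)
gray2 = cv2.cvtColor(img2, cv2.COLOR_BGR2GRAY)

orb = cv2.ORB_create()                                     # 1. detect keypoints and descriptors
kp1, des1 = orb.detectAndCompute(gray1, None)
kp2, des2 = orb.detectAndCompute(gray2, None)

matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
matches = sorted(matcher.match(des1, des2), key=lambda m: m.distance)   # 2. match points

src = np.float32([kp2[m.trainIdx].pt for m in matches[:50]]).reshape(-1, 1, 2)
dst = np.float32([kp1[m.queryIdx].pt for m in matches[:50]]).reshape(-1, 1, 2)
H, _ = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)                    # 3. homography H

h, w = img1.shape[:2]
pano = cv2.warpPerspective(img2, H, (w * 2, h))            # 4. warp img2 into img1's frame
pano[0:h, 0:w] = img1                                      # 5. naive blend (overwrite overlap)
cv2.imwrite("panorama.jpg", pano)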

