Image and Video Processing Notes
Based on your full syllabus from the "Image and Video Processing" course, I’ve organized
important topics and provided detailed study notes for each. This will help you focus your
revision and ensure coverage of all key concepts for the end-semester exam.
1. Sampling and Quantization
Quantization: Converts continuous amplitude (gray levels) into discrete levels. Controls
intensity precision.
2. Image Representation
Grayscale: 0–255 intensity values.
3. Histogram Processing
Histogram: Plot of pixel intensity vs. frequency.
4. Image Smoothing (Filtering)
Median Filter: Reduces salt-and-pepper noise.
Gaussian Filter: Weighted averaging using a Gaussian kernel (better edge preservation).
5. Convolution
Convolution Operation: Apply a kernel to an image via sliding window and dot product.
6. Edge Detection
Prewitt/Sobel: First-order derivatives (detect gradients).
Circle Detection: Detect circular shapes using parameter space (a, b, r).
3. Color Models
RGB: Red, Green, Blue — additive model.
4. Morphological Operations
Dilation: Expands object boundaries.
4. Image Compression
Lossless: No information loss (e.g., PNG).
DCT (Discrete Cosine Transform): Used in JPEG, focuses energy in low frequencies.
🔴 MODULE 4: Advanced Topics and Video Processing
1. Face Detection
Viola-Jones: Uses Haar-like features and AdaBoost.
2. Dimensionality Reduction
PCA (Principal Component Analysis): Reduces high-dimensional data while preserving
variance.
3. Video Processing
Video Formation: Sequence of frames.
Would you like a printable summary PDF or flashcards for rapid revision?
Got it! ✅
Please go ahead and ask your questions.
Detailed,
It is a point operation: The output value at a pixel depends only on the input value at
that same pixel.
Mathematically:
s = T (r)
where:
s = output intensity
r = input intensity
T = transformation function
🔶 Types of Intensity Transformations
1. Identity Transformation
Nothing changes.
s=r
2. Image Negative
Formula:

s = L − 1 − r

For an 8-bit image (L = 256), s = 255 − r:

| r | s |
|---|-----|
| 10 | 245 |
| 20 | 235 |
| 30 | 225 |
| 40 | 215 |
3. Log Transformation
Formula:

s = c · log(1 + r)

where:

c = 255 / log(1 + rmax)

Now compute s = c × log(1 + r) for each input level r.
4. Power-Law (Gamma) Transformation
Formula:

s = c · r^γ

γ < 1 brightens dark regions; γ > 1 darkens and increases contrast in bright regions. The slides tabulate sample r → s values for two different γ settings.
(a) Contrast Stretching
Formula:

s = (r − Imin) · (Omax − Omin) / (Imax − Imin) + Omin

where, for the worked example:

r = 10
Imin = 5, Imax = 150
Omin = 0, Omax = 255

s = (10 − 5) · (255 − 0) / (150 − 5) + 0 = 5 · (255/145) ≈ 8.8

So, new intensity s ≈ 9
(b) Thresholding
s = 0, if r < T
s = L − 1, if r ≥ T
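These point operations can be sketched in a few lines of NumPy (a minimal sketch; the function names and the 8-bit assumption L = 256 are mine, not from the slides):

```python
import numpy as np

L = 256  # assuming an 8-bit image

def negative(r):
    """Image negative: s = L - 1 - r."""
    return L - 1 - r

def log_transform(r, r_max=255):
    """Log transform s = c * log(1 + r), with c chosen so r_max maps to 255."""
    c = (L - 1) / np.log(1 + r_max)
    return c * np.log(1 + r)

def gamma_transform(r, gamma):
    """Power-law transform s = c * r^gamma on intensities normalized to [0, 1]."""
    return (L - 1) * (r / (L - 1)) ** gamma

def contrast_stretch(r, i_min, i_max, o_min=0, o_max=255):
    """Linear stretch of [i_min, i_max] onto [o_min, o_max]."""
    return (r - i_min) * (o_max - o_min) / (i_max - i_min) + o_min

def threshold(r, t):
    """Binary threshold: 0 below t, L - 1 at or above t."""
    return np.where(np.asarray(r) >= t, L - 1, 0)

print(negative(10))                            # 245, as in the table above
print(round(contrast_stretch(10, 5, 150), 1))  # 8.8, as in the worked example
```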
🔚 Summary Table
Method Formula Purpose
Let me know if you’d like practice questions or want to go deeper into any transformation!
now do the same for the next topic in the slides which is
Histograms
✅ Histograms in Image Processing
🔷 What is a Histogram?
A histogram is a graphical representation of the distribution of pixel intensity values in a
digital image.
🔷 Histogram Terminology
h(rk) = nk : the number of pixels with intensity rk.

Normalized Histogram:

P(rk) = h(rk) / (M · N)

∑_{k=0}^{L−1} P(rk) = 1

Where M × N is the image size and L is the number of gray levels.
1. Image Enhancement
→ Improves contrast and visibility
2. Thresholding
→ Used to convert grayscale image to binary
3. Image Segmentation
→ Identifying object/background regions
4. Image Analysis
→ Statistical analysis of intensity distribution
✅ Histogram Examples
🔶 Example 1 (Slide 46):
Given a 3×3 grayscale image:

1 2 6
6 1 3
1 6 6

| rk | nk | P(rk) |
|---|---|---|
| 1 | 3 | 3/9 ≈ 0.33 |
| 2 | 1 | 1/9 ≈ 0.11 |
| 3 | 1 | 1/9 ≈ 0.11 |
| 6 | 4 | 4/9 ≈ 0.44 |
| others (0, 4, 5, 7) | 0 | 0 |
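The counts above can be reproduced with NumPy (a small sketch; `np.bincount` does the h(rk) = nk counting):

```python
import numpy as np

img = np.array([[1, 2, 6],
                [6, 1, 3],
                [1, 6, 6]])

h = np.bincount(img.ravel(), minlength=8)  # h(rk) = nk for levels 0..7
p = h / img.size                           # P(rk) = nk / (M*N)

print(h)        # [0 3 1 1 0 0 4 0] -> n1=3, n2=1, n3=1, n6=4
print(p.sum())  # 1.0 (the normalized histogram sums to one)
```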
✅ Histogram Equalization (Slides 47–51)
🔶 Goal:
Enhance image contrast by spreading out frequent intensity values to utilize full dynamic
range (0 to 255 for 8-bit images).
🔶 Steps:
1. Compute PDF:

P(rk) = h(rk) / MN

2. Compute CDF (Cumulative Distribution Function):

Sk = T(rk) = (L − 1) · ∑_{j=0}^{k} P(rj)
Worked example (64×64 image, MN = 4096, L = 8). The slides tabulate, for each gray level:

| rk | nk | P(rk) = nk/4096 | Sk | Approx. value |
|---|---|---|---|---|
| ⋮ | ⋮ | ⋮ | ⋮ | ⋮ |
| 7 | 81 | 0.02 | 7.00 | 7 |
📌 After Equalization:
Many values get mapped to higher levels.
📊 Summary Table

| Concept | Description |
|---|---|
| Sk formula | Sk = (L − 1) · ∑_{j=0}^{k} P(rj) |
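The 64×64 equalization example can be checked numerically. A sketch: the counts for r0–r4 and r7 are the ones given in these notes (790, 1023, 850, 656, 329, …, 81); the two counts not visible here, n5 and n6, are assumed to be 245 and 122 (they must sum to 367 so the total comes out to 4096):

```python
import numpy as np

# n_k for a 64x64 image (MN = 4096), L = 8 gray levels.
# n5 = 245 and n6 = 122 are assumed values (not shown in these notes).
n = np.array([790, 1023, 850, 656, 329, 245, 122, 81])
L = 8

p = n / n.sum()                                   # PDF: P(rk) = nk / MN
s = np.round((L - 1) * np.cumsum(p)).astype(int)  # Sk = (L-1) * cumulative sum

print(s)  # [1 3 5 6 6 7 7 7] -> the rounded Sk mapping
```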
Let me know if you’d like a step-by-step Python code version of histogram equalization or
have a follow-up question!
Great! Let's now dive into Image Convolution and Image Smoothing, based strictly on your
slides (Pages 52–66). I'll break it into clear, well-structured parts with examples and details
retained from the original content.
✅ Image Convolution
🔷 What is Convolution?
Convolution is a fundamental operation in image processing. It involves applying a small
matrix called a kernel or filter to an image to modify its appearance — like blurring,
sharpening, edge detection, etc.
📌 Intuition:
A kernel (like a 3×3 matrix) slides over the image.
For each position, a weighted sum of neighborhood pixels is computed using the
kernel.
🔷 Convolution Formula:

g(x, y) = ∑_{i=−a}^{a} ∑_{j=−b}^{b} w(i, j) · f(x + i, y + j)
🔶 Example: (Slide 52–53)
3×3 Kernel:

W1 W2 W3
W4 W5 W6
W7 W8 W9

Image patch:

X1 X2 X3
X4 X5 X6
X7 X8 X9

Response at the center pixel:

W1·X1 + W2·X2 + ... + W9·X9
🔁 Repeat this for each position in the image, typically with padding to maintain size.
Kernel (3×3 averaging, scaled by 1/9):

1 1 1
1 1 1
1 1 1

Image patch:

1 2 3
4 5 6
7 8 9

Sum = 45 → Average = 45 / 9 = 5
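A direct sliding-window implementation reproduces this (a sketch; 'valid' mode with no padding, so a 3×3 input and 3×3 kernel give a single output value):

```python
import numpy as np

def convolve2d_valid(image, kernel):
    """Sliding-window filtering as described above: at each position,
    take the weighted sum of the neighborhood ('valid' mode, no padding)."""
    kh, kw = kernel.shape
    ih, iw = image.shape
    out = np.zeros((ih - kh + 1, iw - kw + 1))
    for y in range(out.shape[0]):
        for x in range(out.shape[1]):
            out[y, x] = np.sum(image[y:y + kh, x:x + kw] * kernel)
    return out

patch = np.array([[1., 2., 3.],
                  [4., 5., 6.],
                  [7., 8., 9.]])
mean_kernel = np.ones((3, 3)) / 9

print(convolve2d_valid(patch, mean_kernel))  # [[5.]] -- sum 45 / 9
```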
Gaussian kernel (3×3, scaled by 1/16):

1 2 1
2 4 2
1 2 1
🧮 Max of [1, 2, 5, 3, 4] = 5
🧮 Min of [1, 2, 5, 3, 4] = 1
Used in applications like morphological operations (e.g., dilation/erosion).
Filter Type Good For Downsides
✅ Summary
| Concept | Description |
|---|---|
| Median Filter | Replaces the pixel with the neighborhood's middle value — great for salt-and-pepper noise |
Visual examples or
Excellent question! Let's now explain in detail how to compute the "updated nk " and
"updated P (rk )" in Histogram Equalization, exactly as per your slides (Slide 51).
P(rk): normalized histogram = nk / (M · N)
📋 Original Table

| rk | nk | P(rk) = nk/4096 | Sk = T(rk) | Approx. Sk |
|---|---|---|---|---|
| ⋮ | ⋮ | ⋮ | ⋮ | ⋮ |
| 7 | 81 | 0.02 | 7.00 | 7 |
Now, we need to group all original pixels that map to the same new intensity Sk .
Example:
New Intensity 1:
rk = 0 ⇒ Sk ≈ 1, nk = 790
→ Updated n1 = 790

New Intensity 3:
rk = 1 ⇒ Sk ≈ 3, nk = 1023
→ Updated n3 = 1023

New Intensity 5:
rk = 2 ⇒ Sk ≈ 5, nk = 850
→ Updated n5 = 850

New Intensity 6:
rk = 3 ⇒ Sk ≈ 6 → n = 656
rk = 4 ⇒ Sk ≈ 6 → n = 329
→ Updated n6 = 656 + 329 = 985

New Intensity 7:
rk = 5, 6, 7 ⇒ Sk ≈ 7
Using:

Updated P(rk) = Updated nk / MN = Updated nk / 4096

So:

| New Intensity Sk | Updated nk | Updated P(rk) |
|---|---|---|
| 1 | 790 | 790/4096 ≈ 0.193 |
| 3 | 1023 | 1023/4096 ≈ 0.25 |
| 5 | 850 | 850/4096 ≈ 0.207 |
| 6 | 985 | 985/4096 ≈ 0.24 |
| 7 | 448 | 448/4096 ≈ 0.11 |
🔚 Summary

| Step | Description |
|---|---|
| 1 | Compute the original histogram nk and P(rk) |
| 2 | Compute Sk = (L − 1) · ∑ P(rj) |
| 3 | Round each Sk to the nearest integer level |
| 4 | Group all rk mapping to the same Sk and sum their nk |
| 5 | Compute Updated P(rk) = Updated nk / total pixels |
Great! Let’s now explain Edge Detection from your slides (Pages 72–93). This is a very
important topic for your exam, so we’ll go step-by-step, clearly and completely, while staying
true to the content.
🔍 Why Important?
Edges represent object boundaries, shapes, and significant features in an image — crucial
for tasks like:
Object detection
Image segmentation
Feature extraction
Scene analysis
🔷 First Derivative:
Highlights intensity change rate.
🔷 Second Derivative:
Highlights change of gradient.
Derivative Key Property
∇f = [∂f/∂x, ∂f/∂y]

Gradient Magnitude:

|∇f| = √(Gx² + Gy²)

Gradient Direction:

θ = tan⁻¹(Gy / Gx)
Prewitt Operator

Horizontal:

−1 0 1
−1 0 1
−1 0 1

Vertical:

−1 −1 −1
 0  0  0
 1  1  1

Sobel Operator

Horizontal:

−1 0 1
−2 0 2
−1 0 1

Vertical:

−1 −2 −1
 0  0  0
 1  2  1

Laplacian Kernel:

−1 −1 −1
−1  8 −1
−1 −1 −1

Responds to regions of rapid intensity change.
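A sketch of applying the Sobel kernels to a single 3×3 patch (the step-edge test patch is my own example):

```python
import numpy as np

# Sobel kernels from above
Kx = np.array([[-1., 0., 1.], [-2., 0., 2.], [-1., 0., 1.]])
Ky = np.array([[-1., -2., -1.], [0., 0., 0.], [1., 2., 1.]])

def sobel_response(patch):
    """Gradient components, magnitude, and direction at the patch center."""
    gx = np.sum(Kx * patch)
    gy = np.sum(Ky * patch)
    return gx, gy, np.hypot(gx, gy), np.arctan2(gy, gx)

# Vertical step edge: dark left, bright right -> strong Gx, zero Gy
patch = np.array([[0., 0., 255.],
                  [0., 0., 255.],
                  [0., 0., 255.]])
gx, gy, mag, theta = sobel_response(patch)
print(gx, gy)  # 1020.0 0.0
```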
Combines Gaussian smoothing + Laplacian edge detection.
📌 Steps:
1. Gaussian Smoothing: Reduces noise.
2. Gradient Computation: Find edge strength and direction (e.g., with Sobel).
3. Non-Maximum Suppression: Thin edges to single-pixel width along the gradient direction.
4. Hysteresis Thresholding: Use high and low thresholds to keep strong edges and the weak edges connected to them.
📝 Features:
Good Detection
Good Localization
Feature Prewitt Sobel LoG Canny
📊 Summary
Concept Description
Python implementations
Practice questions
"Lect. Week: 17th March to 21st March 2025"
"Module 3"
Topics listed include:
“Image Enhancement in Frequency Domain”
“Fourier Transform”
“Image Compression”
“Discrete Cosine Transform (DCT)”
“Optical Flow”
Sure! Let’s now break down Image Enhancement in the Frequency Domain, as introduced
in Module 3 starting on Page 155 of your slides.
We’ll explain:
Fourier transforms,
👉 In frequency domain, we analyze how fast pixel intensities change (not their direct
values).
✅ 1. Transform Domain Basics (Slide 156–158)
🔷 What is a Transform?
A transform is a mathematical tool that converts data from one domain to another.
For images:
Each frequency component shows how often intensity changes over space.
Inverse FT (continuous, 1-D):

f(t) = ∫_{−∞}^{∞} F(u) · e^{j2πut} du

Forward 2-D DFT (for an M × N image):

F(u, v) = ∑_{x=0}^{M−1} ∑_{y=0}^{N−1} f(x, y) · e^{−j2π(ux/M + vy/N)}

Inverse DFT:

f(x, y) = (1/MN) ∑_{u=0}^{M−1} ∑_{v=0}^{N−1} F(u, v) · e^{j2π(ux/M + vy/N)}
🧠 Interpretation:
Low frequencies → slow changes (smooth areas)
Types:
Ideal HPF
Gaussian HPF
Butterworth HPF
Mathematically (ideal low-pass filter):

H(u, v) = 1, if D(u, v) ≤ D0
H(u, v) = 0, if D(u, v) > D0
Where D(u, v) is the distance of frequency (u, v) from the center of the spectrum, and D0 is the cutoff frequency.
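The ideal low-pass filter can be sketched with NumPy's FFT (a minimal sketch; the helper name and `d0` value are mine):

```python
import numpy as np

def ideal_lowpass(image, d0):
    """Keep frequencies with D(u, v) <= d0 from the spectrum center,
    zero the rest, then transform back to the spatial domain."""
    F = np.fft.fftshift(np.fft.fft2(image))         # centered spectrum
    rows, cols = image.shape
    u = np.arange(rows) - rows // 2
    v = np.arange(cols) - cols // 2
    D = np.sqrt(u[:, None] ** 2 + v[None, :] ** 2)  # distance D(u, v)
    H = (D <= d0).astype(float)                     # ideal LPF transfer function
    return np.real(np.fft.ifft2(np.fft.ifftshift(F * H)))

rng = np.random.default_rng(0)
img = rng.random((64, 64))
smoothed = ideal_lowpass(img, d0=10)
print(smoothed.var() < img.var())  # True: high frequencies were removed
```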
Enhance contrast and simultaneously suppress low-frequency illumination variations (like
shadows).
Then:
3. Apply Inverse FT
📊 Summary Table
Concept Explanation
Here are the visual results of applying Low-Pass and High-Pass Filters in the frequency
domain:
Enhances edges and fine structures.
Let me know if you’d like to see Gaussian filters, Butterworth filters, or apply these to a
real-world image!
Great! The next topic in Module 3 after frequency domain enhancement is:
✅ Image Compression
(As covered in your slides from Page 158 to 170)
Let’s go through this topic step-by-step with complete clarity and detail:
1. Coding Redundancy
Data encoded inefficiently (e.g., plain ASCII text; Huffman coding corrects this)
2. Spatial Redundancy
3. Psycho-visual Redundancy
Human eyes are less sensitive to some details (e.g., slight color changes)
✅ Methods:
Run-Length Encoding (RLE)
Huffman Coding
LZW (Lempel-Ziv-Welch)
Dictionary-based approach
✅ JPEG Compression (Lossy) – Slide 165–170
JPEG = Joint Photographic Experts Group
Y (Luminance)
Cb/Cr (Chrominance)
Each 8×8 block is transformed with the forward DCT:

F(u, v) = (1/4) C(u) C(v) ∑_{x=0}^{7} ∑_{y=0}^{7} f(x, y) · cos[(2x+1)uπ/16] · cos[(2y+1)vπ/16]

Where:

C(u) = 1/√2, if u = 0
C(u) = 1, if u > 0
🧠 Low frequencies (top-left of 8×8 DCT block) contain most visual info.
6. Entropy Coding
Uses Huffman coding or Run-Length Encoding
🔄 Decompression:
JPEG decoder reverses the steps:
Entropy decode → Dequantize → Inverse DCT → Reconstruct image
✅ Summary Table
Step Description
Next topic
✅ Optical Flow
(From your slides: Page 171–174)
Let’s explore this important concept in computer vision step-by-step with detailed and clear
explanations based on your slides.
It is a vector field:
Each vector describes how a point in one frame moves to the next frame.
Used in:
Motion detection
Video analysis
Object tracking
Autonomous navigation
1. Brightness Constancy:
A moving point keeps the same intensity across frames.
Mathematically:
I(x, y, t) = I(x + dx, y + dy, t + dt)
39/73
🔶 Optical Flow Constraint Equation
From brightness constancy and Taylor series expansion:

∂I/∂x · u + ∂I/∂y · v + ∂I/∂t = 0

Where:

u = dx/dt : motion in x-direction
v = dy/dt : motion in y-direction
∂I/∂x, ∂I/∂y, ∂I/∂t : image gradients
⚠️ Underdetermined Problem
This equation has two unknowns (u and v), but only one equation.
This is called the aperture problem — we cannot determine motion uniquely from local
information.
🔶 Lucas–Kanade Method
Assumes the flow is constant within a small window and solves the constraint equation for all pixels in the window using least squares.
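A sketch of the per-window least-squares solve (the toy gradients are my own; real use would compute Ix, Iy, It from two consecutive frames):

```python
import numpy as np

def lucas_kanade_step(Ix, Iy, It):
    """Stack the constraint Ix*u + Iy*v = -It for every pixel in the
    window and solve for (u, v) by least squares."""
    A = np.stack([Ix.ravel(), Iy.ravel()], axis=1)  # N x 2
    b = -It.ravel()
    (u, v), *_ = np.linalg.lstsq(A, b, rcond=None)
    return u, v

# Toy window whose true motion is u = 1, v = 0 (so It = -Ix)
Ix = np.array([[1., 2.], [3., 4.]])
Iy = np.array([[0.5, 0.5], [0.5, 0.5]])
It = -Ix
u, v = lucas_kanade_step(Ix, Iy, It)
print(round(u, 6), round(v, 6))  # 1.0 0.0
```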
Mobile devices
Robotics
🔶 Horn–Schunck Method
Minimizes a global energy functional (α controls smoothness):

E = ∬ [ (∂I/∂x · u + ∂I/∂y · v + ∂I/∂t)² + α² (|∇u|² + |∇v|²) ] dx dy
3D motion estimation
Gesture recognition
Video compression
Feature Lucas–Kanade Horn–Schunck
✅ Summary
Concept Explanation
A visual demo of optical flow (e.g., arrows showing motion between frames)
Or to continue to the next topic: Face Detection & Recognition (from Module 4).
Perfect! Let’s now begin Module 4 with one of its most practical and exam-relevant topics:
✅ Face Detection
(From your slides: Page 178 to 188)
We'll break this topic into:
🧩 Key Concepts:
1. Haar-like Features:
Simple rectangular features (black and white regions) that capture contrast
patterns in faces.
2. Integral Image:
Lets the sum of any rectangular region be computed in constant time, so Haar-like features can be evaluated very quickly.
3. AdaBoost:
A machine learning technique that selects the best features and forms a strong
classifier from many weak ones.
4. Cascade of Classifiers:
Chains stages from simple to complex so that obvious non-face regions are rejected early with little computation.
✅ Strengths:
Real-time performance
❌ Limitations:
Struggles with:
Non-frontal faces
Occlusions
Lighting variations
✅ HOG (Histogram of Oriented Gradients)
🧩 Key Steps:
1. Gradient Computation:
2. Orientation Binning:
3. Block Normalization:
Group multiple cells into blocks and normalize to reduce illumination effects.
4. Feature Vector:
Concatenate all histograms into a 1D vector for classification (typically with an SVM).
✅ Strengths:
Robust to illumination changes
❌ Limitations:
Not rotation-invariant
✅ SIFT (Scale-Invariant Feature Transform)
🧩 Steps:
1. Scale-space Extrema Detection:
2. Keypoint Localization:
3. Orientation Assignment:
4. Descriptor Creation:
✅ Used in:
Face matching
Object recognition
Image stitching
📊 Comparison Summary
Method Key Idea Pros Cons
✅ Summary Table
Concept Description
Concept Description
next topic
Let’s explain this thoroughly in a simple yet complete way, based on your slides.
Remove redundancy
Preserve only the most informative features
✅ What is PCA?
Principal Component Analysis (PCA) is a linear, unsupervised dimensionality reduction
technique.
🧠 Goal:
Transform data to a new coordinate system where the greatest variance lies along the first axis (the first principal component), the second greatest along the second, and so on.

🔶 Step 1: Arrange the Data Matrix

Each image is flattened into a row vector and stacked:

X = [ Image1
      Image2
      ⋮
      Imagen ]   ⇒ Shape: n × d
🔶 Step 2: Mean-Center the Data
Subtract the mean of each column:

Xcentered = X − μ

🔶 Step 3: Compute the Covariance Matrix

C : d × d matrix — captures how features vary together

🔶 Step 4: Eigen-Decomposition

C ⋅ v = λv

where v is an eigenvector (a principal direction) and λ its eigenvalue (the variance along that direction). Keep the top-k eigenvectors as the columns of W.

🔶 Step 5: Project the Data

Xreduced = Xcentered ⋅ W
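The five steps can be sketched end-to-end in NumPy (a sketch; `eigh` is used because the covariance matrix is symmetric, and the random test data is my own):

```python
import numpy as np

def pca(X, k):
    """PCA: mean-center, covariance, eigendecomposition, project onto
    the k eigenvectors with the largest eigenvalues."""
    Xc = X - X.mean(axis=0)            # Step 2: mean-center
    C = np.cov(Xc, rowvar=False)       # Step 3: d x d covariance matrix
    vals, vecs = np.linalg.eigh(C)     # Step 4: C v = lambda v
    order = np.argsort(vals)[::-1]     # sort components by variance
    W = vecs[:, order[:k]]             # top-k principal directions
    return Xc @ W                      # Step 5: X_reduced = Xc W

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))          # n = 100 samples, d = 5 features
X_reduced = pca(X, k=2)
print(X_reduced.shape)  # (100, 2)
```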
✅ Visual Interpretation (Slide 192)
Imagine a cloud of data points → PCA rotates the coordinate axes to align with
directions of maximum variance.
2. Noise Reduction: discard low-variance components, which mostly capture noise.
3. Visualization: project to 2 or 3 dimensions for plotting.
4. Compression: store only the top-k projection coefficients per sample.
✅ Summary
Step Description
next topic
This topic explains how image processing extends into video analysis, and how actions or
movements are recognized from video sequences.
🎬 What is Video?
A video is a sequence of image frames shown rapidly over time (typically 24–60 frames per
second), giving the illusion of motion.
Each frame is an individual image — so many image processing techniques (like smoothing,
edge detection, etc.) still apply.
3. Action Recognition
Each frame can be processed like an image — and the change between frames gives motion
information.
🔶 Types of Redundancy:
Temporal Redundancy: Adjacent frames often have similar content
🔷 Compression Techniques
✅ Intra-frame Compression:
Applies image compression (like JPEG) within each frame
✅ Inter-frame Compression:
Uses motion estimation between frames to encode only differences
🔶 Methods of Action Recognition:
1. Frame-by-frame feature extraction
Extract features like edges, contours, HOG, etc., from each frame.
3. Pose Estimation
4. Spatio-temporal descriptors
3D CNNs
HOG3D
🔷 Applications:
Surveillance: Detect suspicious activities
🔶 What is Depth?
Depth = distance of object from camera

Common depth sensors:

Intel RealSense
Stereo Cameras
Gesture recognition
Scene segmentation
Object tracking
Robot navigation
Depth data makes action recognition more accurate — as it includes shape and structure,
not just appearance.
✅ Summary
Topic Key Idea
Perfect! Let's now cover Module 2, which we skipped earlier. Based on your slides, Module 2
spans Pages 100 to 138 and includes the following key topics:
1. Hough Transform
2. Corner Detection (Harris)
3. Color Models
4. Morphological Operations
5. Texture Analysis (GLCM)
We’ll now explain each topic one by one in detailed, simple language based strictly on your
slides.
🔷 1. Hough Transform (Slides 100–108)
✅ Purpose:
Used to detect shapes in images, especially lines and circles, that may be partially visible or
noisy.
1. Each line is represented in polar (normal) form:

x cos θ + y sin θ = ρ

2. For each edge pixel (x, y), compute all possible (ρ, θ) and vote in an accumulator array; peaks correspond to detected lines.

✅ Circle Detection:

(x − a)² + (y − b)² = r²

where:

a, b: center of circle
r: radius
🔷 2. Corner Detection (Harris)

Corners are useful for:

Tracking
Object recognition
Image stitching
🔶 Intuition:
A corner is a point where intensity changes in all directions.
🔶 Algorithm Steps:
1. Compute image gradients: Ix , Iy
2. Form the structure matrix:

M = [ Ix²    Ix·Iy ]
    [ Ix·Iy  Iy²   ]

3. Compute the corner response:

R = det(M) − k · (trace(M))²

k : sensitivity constant (0.04–0.06)

4. Threshold R to detect corners
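For a single window this boils down to a few sums (a sketch; the 2×2 toy gradients are my own):

```python
import numpy as np

def harris_response(Ix, Iy, k=0.04):
    """Harris corner response for one window:
    R = det(M) - k * trace(M)^2, with M built from summed gradient products."""
    Sxx = np.sum(Ix * Ix)
    Syy = np.sum(Iy * Iy)
    Sxy = np.sum(Ix * Iy)
    det_M = Sxx * Syy - Sxy ** 2
    trace_M = Sxx + Syy
    return det_M - k * trace_M ** 2

# Corner-like window: strong, uncorrelated gradients in both directions
Ix = np.array([[1., 0.], [0., 1.]])
Iy = np.array([[0., 1.], [1., 0.]])
print(harris_response(Ix, Iy))  # 3.36 -- positive, so this window is corner-like
```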
🔷 HSV Example:
Hue: Color type (angle on color wheel)
Saturation: Colorfulness
Value: Brightness
✅ Basic Operations:
Operation Effect
🔶 Applications:
Noise removal
Shape smoothing
Hole filling
Skeletonization
Border extraction
🔷 5. Texture Analysis with GLCM

A Gray-Level Co-occurrence Matrix (GLCM) counts how often pairs of pixels with specific values occur at a given distance and direction.
For example, for d = 1, θ = 0°:

Count how often a pixel with value i has a pixel with value j immediately to its right.
✅ Applications:
Texture classification
Medical imaging
Surface inspection
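The counting rule can be sketched directly (a sketch using the i-followed-by-j convention for d = 1, θ = 0°; the 3×3 test image is my own):

```python
import numpy as np

def glcm(img, levels, d=1):
    """GLCM for d = 1, theta = 0 (horizontal neighbor): count how
    often value i is immediately followed by value j in a row."""
    M = np.zeros((levels, levels), dtype=int)
    for row in img:
        for i, j in zip(row[:-1], row[1:]):
            M[i, j] += 1
    return M

img = np.array([[0, 0, 1],
                [1, 1, 0],
                [0, 1, 1]])
print(glcm(img, levels=2))
# [[1 2]
#  [1 2]]
```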
📊 Module 2 Summary

| Topic | Key Idea |
|---|---|
| Hough Transform | Detects lines and circles via voting in parameter space |
Absolutely! Let’s now cover Dilation, Erosion, and Opening in detail, as discussed in Module
2, Slides 123 to 128.
These are essential morphological operations used in binary and grayscale image
processing, particularly for shape-based processing.
These operations are most often applied to binary images (black & white), but can also work
on grayscale.
🔧 Operation:
The maximum value in the neighborhood defined by the structuring element is taken.
🧮 Example:
Input (3×3 region):
0 1 0
1 1 1
0 0 0
Structuring Element:
1 1 1
1 1 1
1 1 1
After dilation:

1 1 1
1 1 1
1 1 1
🔧 Operation:
The minimum value in the neighborhood is taken.
🧮 Example:
Input:
0 1 0
1 1 1
0 0 0
After erosion:
0 0 0
0 1 0
0 0 0
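Both operations can be sketched as a min/max over the structuring-element neighborhood (a sketch with zero padding; note that with a full 3×3 SE and zero padding, erosion of the small 3×3 example above removes everything, so the erosion demo below uses a 3×3 block inside a 5×5 image instead):

```python
import numpy as np

def morph(img, se, op):
    """Dilation (op = np.max) / erosion (op = np.min) over the
    structuring-element neighborhood, with zero padding."""
    padded = np.pad(img, 1)
    out = np.zeros_like(img)
    for y in range(img.shape[0]):
        for x in range(img.shape[1]):
            win = padded[y:y + 3, x:x + 3]
            out[y, x] = op(win[se == 1])
    return out

se = np.ones((3, 3), dtype=int)

img = np.array([[0, 1, 0],
                [1, 1, 1],
                [0, 0, 0]])
print(morph(img, se, np.max))   # dilation: grows to all ones, as above

block = np.zeros((5, 5), dtype=int)
block[1:4, 1:4] = 1                       # 3x3 white block
eroded = morph(block, se, np.min)
print(eroded.sum())             # 1: only the block's center survives erosion
```

Opening is then `morph(morph(x, se, np.min), se, np.max)` (erode, then dilate), and closing is the reverse order.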
🧠 Purpose:
Removes small white noise (objects smaller than structuring element)
🔧 Intuition:
First erodes to remove small white patches
📊 Summary Table
Operation Effect Use Case
Dilation:
White regions grow:

Before:        After:
0 1 0          1 1 1
1 1 1    →     1 1 1
0 0 0          1 1 1

Erosion:
White regions shrink:

Before:        After:
0 1 0          0 0 0
1 1 1    →     0 1 0
0 0 0          0 0 0
closing as well
In short:
Closing = Dilation followed by Erosion
🔶 Purpose of Closing:
Fills small holes or black spots inside white regions
Smooths contours
🧠 Intuition
Dilation grows the white area — closes small black gaps or holes
🧮 Example
Input Binary Image:
0 1 0
1 0 1
0 1 0
Structuring Element:
1 1 1
1 1 1
1 1 1
After Dilation:
1 1 1
1 1 1
1 1 1
After Erosion:
0 1 0
1 1 1
0 1 0
🔍 Visual Intuition:
Opening: removes small white noise from the background

Closing: fills small black holes inside white foreground
✅ Summary
Operation Description Use Case
Move on to the final topic in Module 2: Texture analysis using GLCM (already briefly
covered).
Certainly! Let's now go through the Discrete Cosine Transform (DCT) in detail — this is part
of Image Compression under Module 3, primarily discussed in your slides on Pages 165 to
170.
The Discrete Cosine Transform is a mathematical technique that transforms an image from
the spatial domain (pixel intensity) to the frequency domain.
This allows:
🔶 Forward DCT (8×8 block):

F(u, v) = (1/4) C(u) C(v) ∑_{x=0}^{7} ∑_{y=0}^{7} f(x, y) · cos[(2x+1)uπ/16] · cos[(2y+1)vπ/16]

Where:

C(u) = 1/√2, if u = 0
C(u) = 1, otherwise (and likewise for C(v))
🔶 Inverse DCT:

f(x, y) = (1/4) ∑_{u=0}^{7} ∑_{v=0}^{7} C(u) C(v) F(u, v) · cos[(2x+1)uπ/16] · cos[(2y+1)vπ/16]

| Region of the DCT block | Frequency content |
|---|---|
| Top-left | Low-frequency |
| Bottom-right | High-frequency |
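The formula can be checked with a direct (slow) implementation (a sketch; a flat block puts all its energy in the DC coefficient F(0,0), which equals 8 × the mean value):

```python
import numpy as np

def dct2_8x8(f):
    """Direct 8x8 forward DCT, term by term as in the formula above."""
    F = np.zeros((8, 8))
    for u in range(8):
        for v in range(8):
            cu = 1 / np.sqrt(2) if u == 0 else 1.0
            cv = 1 / np.sqrt(2) if v == 0 else 1.0
            s = 0.0
            for x in range(8):
                for y in range(8):
                    s += (f[x, y]
                          * np.cos((2 * x + 1) * u * np.pi / 16)
                          * np.cos((2 * y + 1) * v * np.pi / 16))
            F[u, v] = 0.25 * cu * cv * s
    return F

block = np.full((8, 8), 100.0)   # flat block: no spatial variation
F = dct2_8x8(block)
print(round(F[0, 0]))  # 800 -- DC term; every other coefficient is ~0
```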
✅ Why DCT is Preferred Over DFT:
Feature DCT DFT
✅ Summary Table
Step Description