Computer Vision Deep Dive - January 2025
COMPUTER VISION & CNNS
Teaching Machines to See and Understand the Visual World
The Vision Revolution: From simple edge detection to understanding
complex scenes - Computer Vision has transformed how machines
perceive our visual world. CNNs are the backbone of this revolution!
1. Computer Vision Overview
Computer Vision (CV) is about extracting meaningful information from
images and videos. It's how we give machines the gift of sight!
Core Principle: Images are just numbers! A grayscale image is a 2D
matrix, RGB is a 3D tensor. Our job is to find patterns in these
numbers.
Evolution of Computer Vision:
1960s-1980s: Hand-crafted features (edges, corners)
1990s-2000s: Statistical methods (SVM, Random Forests)
2012-Present: Deep Learning dominance (AlexNet breakthrough)
2. Image Fundamentals
Digital Image Representation:
Grayscale: Image[height, width] → values 0-255
RGB: Image[height, width, 3] → R, G, B channels
Example: 224×224×3 = 150,528 input values!
Common Preprocessing:
Normalization: pixel_value / 255.0
Standardization: (pixel - mean) / std
Resizing: Match model input size
Data Augmentation: Rotation, flip, zoom, crop
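A minimal sketch of these steps with torchvision.transforms (the mean/std values below are the usual ImageNet constants, an assumption here, not something every model requires):
import torchvision.transforms as T

preprocess = T.Compose([
    T.Resize((224, 224)),                    # match model input size
    T.ToTensor(),                            # PIL image -> tensor, scales pixels to [0, 1]
    T.Normalize(mean=[0.485, 0.456, 0.406],  # standardize: (pixel - mean) / std
                std=[0.229, 0.224, 0.225]),
])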
3. Traditional CV Techniques
Edge Detection
Sobel Filter: Gradient-based edge detection
Canny Edge: Multi-stage algorithm for optimal edges
Laplacian: Second derivative for edge detection
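A quick sketch of all three detectors using OpenCV (assumes cv2 is installed; 'photo.png' is a placeholder path):
import cv2

img = cv2.imread('photo.png', cv2.IMREAD_GRAYSCALE)  # load as a 2D grayscale matrix

# Sobel: first-derivative gradients along x and y
grad_x = cv2.Sobel(img, cv2.CV_64F, 1, 0, ksize=3)
grad_y = cv2.Sobel(img, cv2.CV_64F, 0, 1, ksize=3)

# Canny: multi-stage detector with low/high hysteresis thresholds
edges = cv2.Canny(img, 100, 200)

# Laplacian: second-derivative operator
lap = cv2.Laplacian(img, cv2.CV_64F)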
Feature Descriptors
SIFT: Scale-Invariant Feature Transform
SURF: Speeded-Up Robust Features
HOG: Histogram of Oriented Gradients
ORB: Oriented FAST and Rotated BRIEF
4. Enter Convolutional Neural Networks
Why CNNs? They automatically learn hierarchical features! Early layers
detect edges, middle layers detect shapes, deep layers detect objects.
CNN Building Blocks:
Input Image
↓
[Convolution Layer] → Feature Maps
↓
[ReLU Activation] → Non-linearity
↓
[Pooling Layer] → Downsampling
↓
[Convolution Layer] → More Features
↓
[ReLU Activation]
↓
[Pooling Layer]
↓
[Flatten]
↓
[Fully Connected] → Classification
↓
Output (Classes)
5. Convolution Operation - The Heart of CNNs
Input (5×5)      Filter (3×3)
1 2 3 4 5        1  0 -1
2 3 4 5 6        2  0 -2      =  Feature Map (3×3)
3 4 5 6 7        1  0 -1
4 5 6 7 8        (sliding window operation)
5 6 7 8 9
Output Size = (Input - Filter + 2×Padding) / Stride + 1
Example: (32 - 3 + 2×1) / 1 + 1 = 32 (same padding; verified in the snippet after the list below)
Key Concepts:
Filters/Kernels: Learnable feature detectors
Stride: How much the filter moves
Padding: Adding borders to maintain size
Receptive Field: Input region affecting output
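A minimal check of the output-size formula with PyTorch's nn.Conv2d:
import torch
import torch.nn as nn

# 3×3 filter, stride 1, padding 1 on a 32×32 input → (32 - 3 + 2×1)/1 + 1 = 32
conv = nn.Conv2d(in_channels=3, out_channels=16, kernel_size=3, stride=1, padding=1)
x = torch.randn(1, 3, 32, 32)        # (batch, channels, height, width)
print(conv(x).shape)                 # torch.Size([1, 16, 32, 32]) — "same" padding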
6. Pooling Layers
Max Pooling (2×2, stride=2):
Input:        Output:
1 3 2 4
2 2 1 3   →   3 4
4 1 3 2       4 3
2 1 2 1
Takes the maximum value in each 2×2 window (verified in the snippet below)
Why Pooling?
Reduces spatial dimensions (computation)
Provides translation invariance
Helps prevent overfitting
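The 4×4 example above, checked with nn.MaxPool2d:
import torch
import torch.nn as nn

x = torch.tensor([[1., 3., 2., 4.],
                  [2., 2., 1., 3.],
                  [4., 1., 3., 2.],
                  [2., 1., 2., 1.]]).reshape(1, 1, 4, 4)  # (batch, channel, H, W)
pool = nn.MaxPool2d(kernel_size=2, stride=2)
print(pool(x).reshape(2, 2))  # tensor([[3., 4.], [4., 3.]])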
7. Famous CNN Architectures
Architecture   Year  Key Innovation           Depth
LeNet-5        1998  First successful CNN     7 layers
AlexNet        2012  ReLU, Dropout, Data Aug  8 layers
VGGNet         2014  Small filters (3×3)      16-19 layers
GoogLeNet      2014  Inception modules        22 layers
ResNet         2015  Skip connections         50-152 layers
DenseNet       2017  Dense connections        100+ layers
EfficientNet   2019  Compound scaling         Varies
8. ResNet - The Game Changer
Problem: Deeper networks suffered from vanishing gradients
Solution: Skip connections! Allow gradients to flow directly
Output = F(x) + x
where F(x) is the residual function to be learned
and x is the identity mapping (the skip connection)
Residual Block:
Input (x)
|
├──────────────────┐ (skip connection)
| |
[Conv → BN → ReLU] |
| |
[Conv → BN] |
| |
+──────────────────┘
|
[ReLU]
|
Output
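A minimal PyTorch sketch of the block above, assuming the identity-shortcut case (same shape in and out; real ResNets add a 1×1 projection on the skip path when channels or stride change):
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU()

    def forward(self, x):
        identity = x                               # the skip connection
        out = self.relu(self.bn1(self.conv1(x)))   # Conv → BN → ReLU
        out = self.bn2(self.conv2(out))            # Conv → BN
        return self.relu(out + identity)           # F(x) + x, then ReLU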
9. Transfer Learning - The Practical Approach
Golden Rule: Don't train from scratch unless you have millions of
images! Use pre-trained models and fine-tune.
Transfer Learning Strategies:
1. Feature Extraction: Freeze conv layers, train only classifier
2. Fine-tuning: Unfreeze top layers, train with low learning rate
3. Full Training: Unfreeze all, but initialize with pre-trained weights
PyTorch Transfer Learning:
import torch.nn as nn
import torchvision.models as models

# Load pre-trained ResNet (torchvision ≥ 0.13 uses `weights=` instead of `pretrained=True`)
model = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)

# Freeze all layers
for param in model.parameters():
    param.requires_grad = False

# Replace final layer for your task
num_classes = 10  # e.g. 10 target classes
num_features = model.fc.in_features
model.fc = nn.Linear(num_features, num_classes)
# Now only the new final layer will train!
10. Data Augmentation Techniques
Spatial Transformations:
Random Crop
Horizontal/Vertical Flip
Rotation (±15-30°)
Translation
Zoom/Scale
Pixel-level Transformations:
Brightness adjustment
Contrast changes
Saturation/Hue shifts
Gaussian noise
Cutout/Random erasing
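One way to combine several of the spatial and pixel-level transforms above with torchvision (the exact values are illustrative choices, not canonical ones):
import torchvision.transforms as T

train_tf = T.Compose([
    T.RandomResizedCrop(224),          # random crop + zoom/scale
    T.RandomHorizontalFlip(),          # flip with probability 0.5
    T.RandomRotation(15),              # rotation within ±15°
    T.ColorJitter(brightness=0.2, contrast=0.2, saturation=0.2, hue=0.1),
    T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
    T.RandomErasing(p=0.5),            # random erasing operates on tensors, so it comes last
])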
11. Object Detection
Two-Stage Detectors:
R-CNN: Region proposals → CNN features → Classification
Fast R-CNN: Shared computation for proposals
Faster R-CNN: Region Proposal Network (RPN)
Single-Stage Detectors:
YOLO: You Only Look Once - Real-time detection
SSD: Single Shot MultiBox Detector
RetinaNet: Focal loss for class imbalance
mAP (mean Average Precision) = the primary metric
IoU (Intersection over Union) = Area_overlap / Area_union
An IoU threshold of 0.5 is typical for counting a detection as correct
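IoU for axis-aligned boxes is a few lines of plain Python; a minimal sketch with boxes given as (x1, y1, x2, y2):
def iou(a, b):
    # Intersection rectangle corners
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    # Union = area_a + area_b - intersection
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

print(iou((0, 0, 10, 10), (5, 5, 15, 15)))  # 25 / 175 ≈ 0.143 → below the 0.5 threshold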
12. Semantic Segmentation
Classify every pixel! Applications: Medical imaging, autonomous
driving, satellite imagery
Popular Architectures:
FCN: Fully Convolutional Networks
U-Net: Encoder-decoder with skip connections
DeepLab: Atrous convolutions for multi-scale
Mask R-CNN: Instance segmentation
13. Vision Transformers (ViT)
2020 Breakthrough: Transformers aren't just for NLP! ViT treats
images as sequences of patches.
Vision Transformer Pipeline:
Image (224×224)
↓
[Split into 16×16 patches]
↓
[Linear Projection of patches]
↓
[Add position embeddings]
↓
[Transformer Encoder blocks]
↓
[Classification head]
↓
Output
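A minimal sketch of the front end of this pipeline, assuming 224×224 RGB input, 16×16 patches, and a 768-dim embedding (a strided conv is the standard trick for patchify + linear projection in one step):
import torch
import torch.nn as nn

img = torch.randn(1, 3, 224, 224)
patch_embed = nn.Conv2d(3, 768, kernel_size=16, stride=16)  # one token per 16×16 patch
tokens = patch_embed(img).flatten(2).transpose(1, 2)        # (1, 196, 768): 14×14 patches
cls = torch.zeros(1, 1, 768)                                # [CLS] token (learnable in a real ViT)
tokens = torch.cat([cls, tokens], dim=1)                    # (1, 197, 768)
tokens = tokens + torch.zeros(1, 197, 768)                  # position embeddings (learnable in a real ViT)
# ...tokens now feed the Transformer encoder blocks
print(tokens.shape)  # torch.Size([1, 197, 768])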
14. Practical Tips & Tricks
Training Best Practices:
Start with a small learning rate (e.g. 1e-4 for fine-tuning)
Use learning rate scheduling (see the sketch after this list)
Monitor validation loss for early stopping
Batch Normalization helps convergence
Mixed precision training for speed
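A runnable skeleton of LR scheduling plus early stopping (the model and loss values below are stand-ins, not a real training loop):
import torch
import torch.nn as nn

model = nn.Linear(10, 2)  # stand-in for a real CNN
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)  # small LR, as for fine-tuning
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(optimizer, factor=0.1, patience=2)

val_losses = [0.9, 0.7, 0.6, 0.61, 0.62, 0.63]  # dummy validation losses
best, bad_epochs, patience = float('inf'), 0, 3
for epoch, val_loss in enumerate(val_losses):
    scheduler.step(val_loss)              # drop LR when the loss plateaus
    if val_loss < best:
        best, bad_epochs = val_loss, 0
    else:
        bad_epochs += 1
        if bad_epochs >= patience:        # early stopping
            print(f"early stop at epoch {epoch}")
            break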
Common Pitfalls:
Forgetting to normalize inputs
Wrong channel order (RGB vs BGR)
Training on imbalanced datasets
Not using data augmentation
Overfitting on small datasets
15. Evaluation Metrics
Task              Metrics
Classification    Accuracy, Precision, Recall, F1, Confusion Matrix
Object Detection  mAP, IoU, FPS (for real-time)
Segmentation      Pixel Accuracy, IoU, Dice Coefficient
Face Recognition  ROC curve, FAR/FRR
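The Dice coefficient from the table, for binary segmentation masks; a minimal sketch:
import torch

def dice(pred, target, eps=1e-6):
    # Dice = 2|A ∩ B| / (|A| + |B|)
    inter = (pred * target).sum()
    return (2 * inter + eps) / (pred.sum() + target.sum() + eps)

pred   = torch.tensor([[1., 1., 0.], [0., 1., 0.], [0., 0., 0.]])
target = torch.tensor([[1., 0., 0.], [0., 1., 1.], [0., 0., 0.]])
print(dice(pred, target))  # 2×2 / (3 + 3) ≈ 0.667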
16. Real-World Applications
Where CNNs Shine:
Medical: Tumor detection, X-ray analysis, retinal scans
Automotive: Self-driving cars, parking assistance
Security: Face recognition, anomaly detection
Retail: Visual search, inventory management
Agriculture: Crop disease detection, yield prediction
Manufacturing: Quality control, defect detection
17. Code Example - Building a CNN
import torch
import torch.nn as nn

class SimpleCNN(nn.Module):
    def __init__(self, num_classes=10):
        super(SimpleCNN, self).__init__()
        # Convolutional layers
        self.conv1 = nn.Conv2d(3, 32, kernel_size=3, padding=1)
        self.conv2 = nn.Conv2d(32, 64, kernel_size=3, padding=1)
        self.conv3 = nn.Conv2d(64, 128, kernel_size=3, padding=1)
        # Pooling layer (halves height and width each time)
        self.pool = nn.MaxPool2d(2, 2)
        # Fully connected layers (assumes 224×224 input: 224 → 112 → 56 → 28)
        self.fc1 = nn.Linear(128 * 28 * 28, 512)
        self.fc2 = nn.Linear(512, num_classes)
        # Activation and regularization
        self.relu = nn.ReLU()
        self.dropout = nn.Dropout(0.5)

    def forward(self, x):
        # Conv Block 1
        x = self.relu(self.conv1(x))
        x = self.pool(x)
        # Conv Block 2
        x = self.relu(self.conv2(x))
        x = self.pool(x)
        # Conv Block 3 (pooled as well, so the flattened size matches fc1)
        x = self.relu(self.conv3(x))
        x = self.pool(x)
        # Flatten and classify
        x = x.view(x.size(0), -1)
        x = self.dropout(self.relu(self.fc1(x)))
        x = self.fc2(x)
        return x
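A quick sanity check of the model above on a dummy batch:
model = SimpleCNN(num_classes=10)
x = torch.randn(4, 3, 224, 224)   # batch of 4 RGB images, 224×224
print(model(x).shape)             # torch.Size([4, 10])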
18. Latest Trends & Future
What's Hot in 2025:
Self-supervised learning (DINO, MAE)
Neural Architecture Search (NAS)
3D Computer Vision
Video understanding
Multimodal models (CLIP, DALL-E)
Efficient models for edge devices
19. Resources & Next Steps
My Study Plan:
1. Master PyTorch/TensorFlow for CV
2. Implement classic papers from scratch
3. Kaggle competitions for practice
4. Build an end-to-end CV application
5. Explore 3D vision and video
6. Dive into vision transformers
"The eye sees only what the mind is prepared to comprehend" - Now we're teaching
machines to comprehend!