W11 Lecture ITS69204 Image Recognition
Image Recognition
Week 11
ITS69204 Computer Vision & Natural Language Processing
Recap
• Digital Images
• Digitization of Images
• Digital Image Processing
Lecture Content
• Object Recognition
o Object Identification (Classification)
o Object Detection (Localisation)
o Image Segmentation
• Transfer Learning
Modern Applications of CV
o Detection – an object localisation task: Is there a cat in the image? Where is the cat?
o Image segmentation – an object localisation task: Which pixels is the cat located in?
Object Recognition
Object Recognition – Image Classification
Image Classification
• One of the most studied topics ever since the ImageNet dataset was released in 2010.
• Image classification processes the entire image as a whole and assigns a single label to it.
Object Recognition – Object Detection
Object Detection
• Refers to the detection and localisation of objects using bounding boxes.
• Earlier object detection methods used Haar features, SIFT, and HOG features to detect features in an image and classify them based on classical machine learning approaches.
• This process, other than being time-consuming and largely inaccurate, has severe limitations on the number of objects that can be detected.
Object Recognition – Object Detection
Object Detection
• Modern deep-learning detectors: SSD, YOLO
Object Recognition – Image Segmentation
Image Segmentation
• To identify the region of interest (ROI), segmentation labels the pixels that belong to each object.
Summary
Traditional Approach
• These approaches use hand-crafted features to represent objects. For example, the SIFT feature descriptor is commonly used to represent local features in an image.
Deep Learning
• Patterns are presented to the network via the 'input layer', which communicates with one or more 'hidden layers' where the actual processing is done via a system of weighted 'connections'.
Object Recognition with Deep Learning
What is a Neural Network?
• Neuron – the basic unit of a neural network. A neuron takes inputs, does some math with them, and produces one output.
Hyperparameters:
• Activation function – a function that maps an unbounded input into an output with a nice, predictable form. (A commonly used activation function is the sigmoid, which only outputs numbers in the range (0,1). You can think of it as compressing (−∞,+∞) to (0,1): big negative numbers become ~0, and big positive numbers become ~1.)
• Loss – a way to quantify how "good" the network is doing so that it can try to do "better". The aim is to minimize the loss: lower loss → better prediction (MSE is a commonly used loss).
• Learning rate – a constant that controls how fast we train.
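As a minimal pure-Python sketch (the names `sigmoid`, `neuron`, and `mse` are illustrative, not from any library), a single neuron and the loss above can be written as:

```python
import math

def sigmoid(x):
    # squashes (-inf, +inf) into the range (0, 1)
    return 1.0 / (1.0 + math.exp(-x))

def neuron(inputs, weights, bias):
    # weighted sum of the inputs plus a bias, passed through the activation
    total = sum(i * w for i, w in zip(inputs, weights)) + bias
    return sigmoid(total)

def mse(targets, predictions):
    # mean squared error: lower loss -> better prediction
    n = len(targets)
    return sum((t - p) ** 2 for t, p in zip(targets, predictions)) / n

print(sigmoid(0))                          # -> 0.5
print(neuron([2.0, 3.0], [0.0, 0.0], 0.0)) # all-zero weights -> sigmoid(0) = 0.5
print(mse([1, 0], [0.5, 0.5]))             # -> 0.25
```

Note how a big positive input drives the sigmoid towards 1 and a big negative input towards 0, exactly the compression described above.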
Key Concepts – NN for Image Classification
• Pixels are the structured format of image data
[Figure: "What we see" vs. "What computers see" – an image as a grid of pixel values]
• Dense neural networks reuse features learned by previous layers.
Key Challenge – dense NN for Image Classification
• Fully connected NN
o Too many parameters
o Doesn't account for spatial variance
o Rarely ever used
• Solution: Locally connected network
o Smaller number of parameters
o Accounts for local variance
o Preferred choice
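A quick back-of-the-envelope comparison shows why fully connected layers are rarely used on raw images (the 224×224 RGB input, 1000-unit layer, and 64 kernels of 3x3 are made-up but typical numbers, not from this lecture):

```python
# One fully connected layer: every input pixel connects to every output unit.
inputs = 224 * 224 * 3               # a 224x224 RGB image, flattened
units = 1000
fc_params = inputs * units + units   # weights + biases

# One convolutional layer: 64 kernels of size 3x3 over 3 input channels,
# with the same weights shared across every image position.
conv_params = 64 * (3 * 3 * 3) + 64  # weights + biases

print(fc_params)    # -> 150529000
print(conv_params)  # -> 1792
```

The locally connected (convolutional) layer needs tens of thousands of times fewer parameters than the fully connected one.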
Convolutional Neural Network
• Convolution
– Filter/kernel
– convolution operation
– padding
• Data reduction
– stride
– max pooling
• Flatten
CNN Components - Kernels
1. Kernels/filters act as highlighters of a particular feature in the image.
2. They produce an activation map of the feature in the image.
Convolution Operations
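A minimal pure-Python sketch of the convolution operation (stride 1, no padding; the image and kernel values are made-up toy data): a vertical-edge kernel "highlights" the dark-to-bright boundary in its activation map.

```python
def convolve2d(image, kernel):
    # slide the kernel over the image (stride 1, no padding) and
    # sum the element-wise products at each position
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    out = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            s = sum(image[i + di][j + dj] * kernel[di][dj]
                    for di in range(kh) for dj in range(kw))
            row.append(s)
        out.append(row)
    return out

# a tiny image: dark (0) on the left, bright (1) on the right
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
# a vertical-edge kernel: responds where brightness jumps left-to-right
edge_kernel = [
    [-1, 1],
    [-1, 1],
]
print(convolve2d(image, edge_kernel))  # -> [[0, 2, 0], [0, 2, 0], [0, 2, 0]]
```

The activation map peaks exactly on the column where the edge sits, which is what "kernels as highlighters" means.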
CNN Components - Kernels
• Number of kernels – more kernels can detect more features
• Kernel size
o Larger kernels need more processing time
o Kernel size should be large enough to detect meaningful features while keeping processing time manageable
o Kernels of 3x3 or 5x5 are common
CNN Components - Padding
• Adds a border of extra pixels (typically zeros) around the image so the output maintains the same size as the input
CNN Components - Stride
• Kernel step size
• A stride length of 1 pixel is a common option
• Another common choice is a 2-pixel stride
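The output size of a convolution follows the standard formula floor((n + 2p − k) / s) + 1. A small helper (the name `conv_output_size` is illustrative; the 7×7 input is a toy example) shows how padding and stride interact:

```python
def conv_output_size(input_size, kernel_size, padding=0, stride=1):
    # standard formula: floor((n + 2p - k) / s) + 1
    return (input_size + 2 * padding - kernel_size) // stride + 1

print(conv_output_size(7, 3))             # -> 5  (no padding shrinks the map)
print(conv_output_size(7, 3, padding=1))  # -> 7  ("same" padding keeps the size)
print(conv_output_size(7, 3, stride=2))   # -> 3  (stride 2 halves the map)
```

This is why a 1-pixel padding with a 3x3 kernel preserves image size, while a 2-pixel stride is used for data reduction.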
CNN Components - Pooling
Max Pooling
- makes the representations smaller and more manageable
- operates over each activation map independently
Like a convolutional layer, the pooling layer slides from left to right and from top to bottom over the matrix of values input into it. With a 2×2-sized filter, the layer retains only the largest of the four input values.
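A minimal pure-Python sketch of 2×2 max pooling with stride 2 (the feature-map values are made-up toy data; `max_pool2d` is an illustrative name, not a library call):

```python
def max_pool2d(fmap, size=2, stride=2):
    # keep only the largest value in each size x size window
    out = []
    for i in range(0, len(fmap) - size + 1, stride):
        row = []
        for j in range(0, len(fmap[0]) - size + 1, stride):
            row.append(max(fmap[i + di][j + dj]
                           for di in range(size) for dj in range(size)))
        out.append(row)
    return out

fmap = [
    [1, 3, 2, 4],
    [5, 6, 7, 8],
    [3, 2, 1, 0],
    [1, 2, 3, 4],
]
print(max_pool2d(fmap))  # -> [[6, 8], [3, 4]]
```

The 4×4 activation map shrinks to 2×2 while the strongest responses in each window survive.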
• LeNet
• AlexNet
• VGG Net
https://www.researchgate.net/publication/330511306_A_Survey_of_the_Recent_Architectures_of_Deep_Convolutional_Neural_Networks/figures?lo=1&utm_source=google&utm_medium=organic
LeNet (1998)
• LeNet was one of the earliest CNN architectures, primarily designed for handwritten digit recognition tasks, such as recognizing digits in postal addresses.
• LeNet consists of convolutional layers, pooling layers, and fully connected layers, making it the foundational model for future CNN architectures.
• LeNet's design included convolutional filters, non-linear activation functions (such as tanh), and subsampling operations (such as average pooling).
AlexNet (2012)
• AlexNet is deeper and larger than LeNet, consisting of multiple convolutional layers followed by fully connected layers.
• It used large filter sizes, e.g. kernel_size=(11, 11), in the earliest convolutional layers relative to what is popular today.
VGG Net (2014)
• VGG Net increased network depth by stacking many small 3x3 convolutional filters in place of large kernels.
Transfer Learning
Transfer Learning – Pre-trained model
"After supervised learning – Transfer Learning will be the next driver of ML commercial success." – Andrew Ng
https://www.kaggle.com/competitions/dogs-vs-cats/code?competitionId=3362&sortBy=voteCount
Transfer Learning – Why?
Review of typical characteristics of DL models in CV:
When the dataset is small (a small number of labelled samples):
• The biggest benefit of transfer learning shows when the target dataset is relatively small.
• With large networks, a small dataset leads to overfitting. Data augmentation is one way to address a small dataset, but it does not resolve overfitting on its own.
• In these cases, transfer learning is the best choice, where the source model has been trained on a vastly bigger training set.
Transfer Learning – Approach
Where to Unfreeze?
• In the fine-tuning approach, the biggest question is where to unfreeze. Determining where to cut off the unfreezing is a tedious task when handling large networks. It is resolved by adjusting the learning rates of the layers.
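As a toy illustration only (no real framework is used; the layer names and the `fine_tune_plan` helper are invented for this sketch), freezing early layers and giving deeper layers progressively larger learning rates can be expressed as:

```python
# A model represented as an ordered list of layer names, earliest first.
layers = ["conv1", "conv2", "conv3", "conv4", "fc"]

def fine_tune_plan(layers, unfreeze_from, base_lr=1e-3, decay=0.1):
    # layers before `unfreeze_from` are frozen (learning rate 0);
    # unfrozen layers closer to the output get learning rates closer
    # to base_lr, so well-trained early features change only slightly
    plan = {}
    for idx, name in enumerate(layers):
        if idx < unfreeze_from:
            plan[name] = 0.0                   # frozen: no weight updates
        else:
            depth_from_top = len(layers) - 1 - idx
            plan[name] = base_lr * (decay ** depth_from_top)
    return plan

print(fine_tune_plan(layers, unfreeze_from=3))
```

Here `conv1`–`conv3` stay frozen, `conv4` trains gently at one tenth of the base rate, and the new `fc` head trains at the full rate, which is the usual fine-tuning compromise when the exact unfreeze point is hard to pick.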
Questions?
Ask me anything!