
Object Recognition
Week 11
ITS69204 Computer Vision & Natural Language Processing

Recap
• Digital Images
• Digitization of Images
• Digital Image Processing
Lecture Content
• Object Recognition
o Object Identification (Classification)
o Object Detection (Localisation)
o Image Segmentation

• Object Recognition with Deep Learning
o Neural Network (Deep Learning)
o Types of Neural Network – dense NN vs CNN
o Components of Convolutional Neural Network (CNN)
o Kernel / Filter
o Max Pooling
o Zero padding

• Transfer Learning
1. Object Recognition

Modern Applications of CV
• Image Classification + Object Localisation

Object Recognition
Introduction to Object Recognition
• Object recognition is the ability of a computer to identify and classify objects in images and videos.
• It is a fundamental task in computer vision, with applications in a wide range of fields, including:
o Robotics. Robots use object recognition to identify and interact with objects in the environment.
o Autonomous driving. Self-driving cars use object recognition to detect and avoid pedestrians, vehicles, and traffic signs.
o Image search. Search engines use object recognition to provide relevant results to users who search for images.
o Medical imaging. Doctors use object recognition to identify diseases and abnormalities in medical images.
Object Recognition
Object Recognition – Components
• Object recognition typically consists of two main components, identification and localisation:
o Identification – an object/image classification task. What is it? Is it a cat or a dog? Who is it? Is it Jane or Erik?
o Detection – an object localisation task. Is there a cat in the image? Where is the cat?
o Image segmentation – an object localisation task. Which pixels is the cat located in?
Object Recognition
Object Recognition – Image Classification
Image Classification
• one of the most studied topics ever since the ImageNet dataset was released in 2010.
• the problem statement of image classification is quite simple: given a group of images, the task is to classify them into a set of predefined classes using solely a set of sample images that have already been classified.
 image classification deals with processing the entire image as a whole and assigning a specific label to it.
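To make the problem statement concrete, here is a minimal sketch (not from the lecture): each whole image is treated as one sample and assigned one of a set of predefined classes, using scikit-learn's small digits dataset and a plain classifier purely for illustration.

```python
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Each 8x8 digit image is flattened and processed as a whole; the task is to
# assign one of 10 predefined classes using already-labelled sample images.
digits = load_digits()
X_train, X_test, y_train, y_test = train_test_split(
    digits.data, digits.target, test_size=0.2, random_state=0)

clf = LogisticRegression(max_iter=2000)
clf.fit(X_train, y_train)
print("test accuracy:", clf.score(X_test, y_test))
```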
Object Recognition
Object Recognition – Object Detection
Object Detection
• refers to detection and localisation of objects using bounding boxes.
• looks for class-specific details in an image or a video and identifies them whenever they appear.
• earlier methods of object detection used Haar features, SIFT, and HOG features to detect features in an image and classify them based on classical machine learning approaches. This process, other than being time-consuming and largely inaccurate, has severe limitations on the number of objects that can be detected.
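As a concrete example of this classical pipeline, the sketch below uses OpenCV's built-in HOG descriptor with its default people detector. The image path is a placeholder and this is only an illustrative sketch of the pre-deep-learning approach, not the lecture's own code.

```python
import cv2

# Classical object detection: HOG features + a linear SVM people detector.
hog = cv2.HOGDescriptor()
hog.setSVMDetector(cv2.HOGDescriptor_getDefaultPeopleDetector())

img = cv2.imread("street.jpg")           # placeholder image path
boxes, weights = hog.detectMultiScale(img, winStride=(8, 8), scale=1.05)

# Draw a bounding box around each detected person.
for (x, y, w, h) in boxes:
    cv2.rectangle(img, (int(x), int(y)), (int(x + w), int(y + h)), (0, 255, 0), 2)
cv2.imwrite("street_detections.jpg", img)
```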
Object Recognition
Object Recognition – Object Detection
Object Detection
• As such, deep learning models like YOLO, RCNN and SSD, which use millions of parameters to break through these limitations, are popularly employed for this task.
• Object detection is often accompanied by object recognition, aka object classification.

[Figures: SSD and YOLO]
Object Recognition
Object Recognition – Image Segmentation
Image Segmentation
• division of an image into subparts or sub-objects to demonstrate that the machine can discern an object from the background and/or another object in the same image.
• A “segment” of an image represents a particular class of object that the neural network has identified in an image, represented by a pixel mask that can be used to extract it.
• traditional image processing algorithms: watershed algorithms, clustering-based segmentation
• deep learning architectures: PSPNet, FPN, U-Net, SegNet, etc.

[Figure: image operations used to identify the ROI]
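A minimal sketch of the clustering-based approach listed above, using OpenCV's k-means on raw pixel colours; the file name and the choice of k = 4 clusters are assumptions for illustration, not part of the lecture.

```python
import cv2
import numpy as np

# Clustering-based segmentation: group pixels into k colour clusters.
img = cv2.imread("scene.jpg")                     # placeholder image path
pixels = img.reshape(-1, 3).astype(np.float32)

k = 4
criteria = (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 10, 1.0)
_, labels, centers = cv2.kmeans(pixels, k, None, criteria, 10,
                                cv2.KMEANS_RANDOM_CENTERS)

# Paint every pixel with its cluster centre to visualise the segments.
segmented = centers[labels.flatten()].astype(np.uint8).reshape(img.shape)
cv2.imwrite("scene_segments.jpg", segmented)
```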
Summary

Task | Traditional Approach | Deep Learning
Image Classification | Hierarchical approach – edge detection based | CNN-based models: AlexNet, GoogleNet, VGG, MSRA
Object Detection | Haar Features, SIFT, and HOG Features | YOLO, RCNN, SSD
Image Segmentation | Watershed algorithms, clustering-based segmentation | PSPNet, FPN, U-Net, SegNet, etc.
2. Object Recognition with Deep Learning (Implementation)

Object Recognition with Deep Learning
Recap:
• Traditional approaches. These approaches use hand-crafted features to represent objects. For example, the SIFT feature descriptor is commonly used to represent local features in an image.
• Deep learning approaches. These approaches use deep neural networks to learn features from images. Deep learning approaches have achieved state-of-the-art results on object recognition tasks in recent years.
Object Recognition with Deep Learning
What is a Neural Network?
• Neural network (Artificial Neural Network) – a machine learning algorithm that mimics the neuronal structure of the human brain.
• Components: input layer, hidden layer(s) and output layer
• The layers are fully connected between the input and output layers
Object Recognition with Deep Learning
What is a Neural Network?
• Neural networks are typically organized in layers.
• Layers are made up of a number of interconnected 'nodes', each of which contains an 'activation function'.
• Patterns are presented to the network via the 'input layer', which communicates to one or more 'hidden layers' where the actual processing is done via a system of weighted 'connections'.
Object Recognition with Deep Learning
What is a Neural Network?
• Neuron – the basic unit of a neural network. A neuron takes inputs, does some math with them, and produces one output.
• Neural network – a bunch of neurons connected together.
• Training a neural network – changing the weights and biases to minimize the loss. An optimization algorithm such as Stochastic Gradient Descent can be used.

Hyperparameters (a small numeric sketch of a single neuron using these follows below):
• Activation function – a function that compresses an unbounded input into an output with a nice, predictable form. (A commonly used activation function is the sigmoid function, which only outputs numbers in the range (0, 1). You can think of it as compressing (−∞, +∞) to (0, 1): big negative numbers become ~0, and big positive numbers become ~1.)
• Loss – a way to quantify how "good" the network is doing so that it can try to do "better". The aim is to minimize the loss: lower loss → better prediction. (MSE is a commonly used loss.)
• Learning rate – a constant that controls how fast we train.
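A minimal NumPy sketch of these ideas (the weights, inputs and target are made-up numbers, not from the lecture): one neuron computes a weighted sum plus bias, squashes it with the sigmoid, and the MSE loss measures how far the prediction is from the label.

```python
import numpy as np

def sigmoid(x):
    # squashes (-inf, +inf) into the range (0, 1)
    return 1 / (1 + np.exp(-x))

w = np.array([0.5, -0.3])   # weights (arbitrary example values)
b = 0.1                     # bias
x = np.array([2.0, 1.0])    # inputs

y_pred = sigmoid(np.dot(w, x) + b)   # the neuron's single output
y_true = 1.0                         # desired label
loss = (y_true - y_pred) ** 2        # MSE for this one example

print(f"prediction={y_pred:.3f}, loss={loss:.3f}")
```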
Object Recognition with Deep Learning
Key Concepts – NN for Image Classification
• Pixels are the structured format of image data
[Figure: what we see vs. what computers see]
• Dense neural networks connect all layers to each other, which allows each layer to learn from all previous layers.
• Dense neural networks reuse features learned by previous layers.
Object Recognition with Deep Learning
Key Challenge – dense NN for Image Classification
• Fully connected NN:
o Too many parameters
o Doesn't account for spatial variance
o Rarely ever used
• Solution: locally connected network
o Smaller number of parameters
o Accounts for local variance
o Preferred choice

Convolutional Neural Network
• Example: a 1000 x 1000 image with 1 million hidden units  10^12 parameters
• Spatial correlation is local  better to put resources elsewhere
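The 10^12 figure comes from one weight per pixel per hidden unit; a quick sanity check of the slide's numbers:

```python
# Fully connected layer on a 1000x1000 (greyscale) image with 1M hidden units.
pixels = 1000 * 1000
hidden_units = 1_000_000
weights = pixels * hidden_units    # one weight per (pixel, hidden unit) pair
print(f"{weights:.0e} weights")    # 1e+12, i.e. ~10^12 parameters before biases
```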
Object Recognition with Deep Learning
Convolutional Neural Networks (CNNs)
• CNNs are the most popular deep learning neural nets used for object recognition.
• CNNs are well-suited for object recognition because they can learn features that are invariant to viewpoint, illumination, and scale variations.
• The original CNNs started with 8/9 layers (AlexNet); now hundreds of layers are common.
Object Recognition with Deep Learning
Convolutional Neural Networks (CNNs)
• Components of deep learning for image classification:
o Feature extractor. The feature extractor extracts features from the input image. These features can be low-level features, such as edges and corners, or high-level features, such as object parts.
o Classifier. The classifier takes the features extracted by the feature extractor and predicts the object category.
Object Recognition with Deep Learning
Convolutional Neural Networks (CNNs)
• Feature extraction is a core process of computer vision. A typical feature in computer vision tasks is an edge (object boundary).
• This is done by convolving the image with a suitable filter.
1. Kernels/filters act as highlighters of a particular feature in the image.
2. They produce an activation map of the feature in the image.

[Figure: an edge kernel and the resulting activation map]


Object Recognition with Deep Learning
Convolutional Neural Networks (CNNs)
Feature extraction in a CNN is achieved through the hierarchical application of the convolution process with several filters, e.g. convolution and subsampling (pooling).

• Convolution
– Filter/kernel
– Convolution operation
– Padding

• Data reduction
– Stride
– Max pooling

• Flatten
Object Recognition with Deep Learning
CNN Components – Kernels
1. Kernels/filters act as highlighters of a particular feature in the image.
2. They produce an activation map of the feature in the image.

[Figure: convolution operations, shown step by step]
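A minimal sketch of such a convolution with a hand-made vertical-edge kernel using OpenCV (the file names are placeholders; note that cv2.filter2D actually computes cross-correlation, which is also what CNN layers do in practice):

```python
import cv2
import numpy as np

# A 3x3 kernel that "highlights" vertical edges in the image.
kernel = np.array([[-1, 0, 1],
                   [-1, 0, 1],
                   [-1, 0, 1]], dtype=np.float32)

img = cv2.imread("cat.jpg", cv2.IMREAD_GRAYSCALE)    # placeholder image path
activation_map = cv2.filter2D(img, ddepth=-1, kernel=kernel)
cv2.imwrite("edge_activation_map.jpg", activation_map)
```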
Object Recognition with Deep Learning
CNN Components – Kernels
• Number of kernels – more kernels can detect more features
• The number of kernels in a network layer may vary:
 Early layers detect simple features, so fewer kernels are used
 Later layers detect more complex features – combinations of simple features
• Kernel size
o Larger kernels need more processing time
o Kernel size should be sufficient to detect meaningful features while keeping the processing time manageable
o Kernels of 3x3 or 5x5 are common
Object Recognition with Deep Learning
CNN Components – Padding
• Zero padding is used to maintain the same image size after convolution.
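A small NumPy sketch of zero padding (the sizes are chosen arbitrarily): padding a 5x5 feature map by one pixel on each side means a subsequent 3x3 convolution produces a 5x5 output again.

```python
import numpy as np

feature_map = np.arange(25).reshape(5, 5)

# Add a one-pixel border of zeros around the feature map ("same" padding
# for a 3x3 kernel with stride 1).
padded = np.pad(feature_map, pad_width=1, mode="constant", constant_values=0)
print(feature_map.shape, "->", padded.shape)   # (5, 5) -> (7, 7)
```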
Object Recognition with Deep Learning
CNN Components – Stride
• Stride is the kernel step size.
• A stride length of 1 pixel is a common option; another common choice is a 2-pixel stride.
• A stride of 3 is less common. Anything larger may skip regions of the image that are of value to the model.
• Increasing the stride will yield an increase in speed because there are fewer calculations that need to be carried out.

Example: a 7x7 input (spatially) with a 3x3 filter applied at stride 2 => 3x3 output
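The output size follows the usual formula out = (W − F + 2P) / S + 1; a one-line check of the slide's example (W = 7, F = 3, P = 0, S = 2):

```python
# Output size of a convolution: (W - F + 2P) / S + 1
W, F, P, S = 7, 3, 0, 2        # 7x7 input, 3x3 filter, no padding, stride 2
out = (W - F + 2 * P) // S + 1
print(out)                      # 3  -> a 3x3 output, as on the slide
```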
Object Recognition with Deep Learning
CNN Components – Pooling
Max Pooling
- makes the representations smaller and more manageable
- operates over each activation map independently

Like a convolutional layer, the pooling layer slides from left to right and from top to bottom over the matrix of values input into it. With a 2×2-sized filter, the layer retains only the largest of four input values.

• Other pooling variants (e.g., average pooling, L2-norm pooling)
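A minimal NumPy sketch of 2×2 max pooling with stride 2 (the input values are arbitrary): each 2×2 block is reduced to its largest value, halving the height and width.

```python
import numpy as np

x = np.array([[1, 3, 2, 4],
              [5, 6, 1, 2],
              [7, 2, 9, 0],
              [1, 8, 3, 4]])

# Split the 4x4 map into 2x2 blocks and keep only the maximum of each block.
pooled = x.reshape(2, 2, 2, 2).max(axis=(1, 3))
print(pooled)
# [[6 4]
#  [8 9]]
```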


Object Recognition with Deep Learning
CNN Components – Flatten
The Flatten layer enables us to collapse many-dimensional arrays down to one dimension.
Example: converting a 3x3 image to 9x1
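The same idea as a minimal NumPy sketch: a 3×3 array is collapsed into a 9-element vector that a fully connected layer can consume.

```python
import numpy as np

feature_map = np.arange(9).reshape(3, 3)
flat = feature_map.flatten()          # shape (9,): the 3x3 map as one vector
print(feature_map.shape, "->", flat.shape)
```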
Object Recognition with Deep Learning
Putting it all together:

Major CNN architectures in image classification:
• LeNet
• AlexNet
• VGG Net

https://www.researchgate.net/publication/330511306_A_Survey_of_the_Recent_Architectures_of_Deep_Convolutional_Neural_Networks/figures?lo=1&utm_source=google&utm_medium=organic
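Putting the components together in code: below is a minimal Keras sketch (not one of the architectures listed above); the input shape of 32×32×3 and the 10 output classes are assumptions for illustration.

```python
from tensorflow.keras import layers, models

# Convolution (+ zero padding) -> max pooling -> flatten -> dense classifier.
model = models.Sequential([
    layers.Input(shape=(32, 32, 3)),
    layers.Conv2D(32, (3, 3), padding="same", activation="relu"),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(64, (3, 3), padding="same", activation="relu"),
    layers.MaxPooling2D((2, 2)),
    layers.Flatten(),
    layers.Dense(64, activation="relu"),
    layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.summary()
```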
Object Recognition with Deep Learning
LeNet (1998)
• LeNet was one of the earliest CNN architectures, primarily designed for handwritten digit recognition tasks, such as recognizing digits in postal addresses.
• LeNet consists of convolutional layers, pooling layers, and fully connected layers, making it the foundational model for future CNN architectures.
• LeNet's design included the use of convolutional filters, non-linear activation functions (such as tanh), and subsampling operations (such as average pooling).
Object Recognition with Deep Learning
AlexNet (2012)
• AlexNet is deeper and larger than LeNet, consisting of multiple convolutional layers followed by fully connected layers.
• It used large filter sizes, kernel_size=(11, 11), in the earliest convolutional layers relative to what is popular today.
Object Recognition with Deep Learning
VGG Net (2014)
• VGG Net follows the same repeated conv-pool block structure as AlexNet.
• VGG Net simply has more of them, and with smaller (all 3×3-pixel) kernel sizes.
3. Transfer Learning (Implementation)

Transfer Learning
Transfer Learning – Pre-trained model
“After supervised learning — Transfer Learning will be the next driver of ML commercial success.” – Andrew Ng

https://www.kaggle.com/competitions/dogs-vs-cats/code?competitionId=3362&sortBy=voteCount
Transfer Learning
Transfer Learning – Why?
Review of typical characteristics of DL models in CV:
• typically have several layers – up to a few hundred
• several million parameters to learn
• need a very large number of labelled examples to learn the model
• need huge computing resources
• need a very long time to train
• convolutional layers in deep learning models extract features
• low-level to mid-level features in images are similar

Neural networks are layer-wise self-contained – that is:
• after training, you can create a different neural network by:
– removing all layers after a particular layer while retaining the layers below and their weights
– adding a fully connected layer with a different number of neurons and random weights
Transfer Learning
Transfer Learning – General Idea
• Transfer learning is a research problem in deep learning that focuses on storing knowledge gained while solving one problem and applying it to a different but related problem.
• Use the outputs of one or more layers of a network trained on a different task as generic feature detectors. Train a new shallow model on these features (see the sketch below).
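A minimal Keras sketch of this idea, assuming an ImageNet-pretrained VGG16 as the source model and a binary task (e.g. the dogs-vs-cats dataset linked earlier); the input size and head sizes are illustrative choices, not the lecture's.

```python
from tensorflow.keras import layers, models
from tensorflow.keras.applications import VGG16

# Reuse a network trained on a different task (ImageNet) as a frozen
# generic feature detector, and train only a new shallow head on top.
base = VGG16(weights="imagenet", include_top=False, input_shape=(224, 224, 3))
base.trainable = False                      # freeze the transferred layers

model = models.Sequential([
    base,
    layers.Flatten(),
    layers.Dense(256, activation="relu"),
    layers.Dense(1, activation="sigmoid"),  # new classifier, random weights
])
model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=["accuracy"])
# model.fit(...) would then be run on the small target dataset.
```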
Transfer Learning
Transfer Learning – Approach

When the dataset is small (a small number of labelled samples):
• The biggest benefit of transfer learning shows when the target dataset is relatively small.
• With large networks, a small dataset leads to overfitting. Data augmentation is one of the choices for solving the small-dataset problem, but this doesn't resolve overfitting.

In these cases, transfer learning is the best choice, where the source model has been trained on a vastly bigger training set.
Transfer Learning
Transfer Learning – Approach

Where to Unfreeze?
• In the fine-tuning approach, the biggest question is where to unfreeze. Determining where to cut off the unfreeze is a tedious task when handling large networks. It is resolved by adjusting the learning rates of the layers.
• A typical learning-rate rule, 2:4:6, involves (see the sketch after this list):
– using a learning rate of 10⁻⁶ for the bottommost few layers,
– 10⁻⁴ for the other transfer layers,
– and 10⁻² for the new layers.
• There may be other ratios.
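A sketch of the 2:4:6 rule using per-group learning rates, shown here with PyTorch parameter groups; the three tiny sub-networks below are placeholders standing in for the bottommost transferred layers, the other transferred layers, and the new head, not a real transferred model.

```python
import torch
from torch import nn

# Placeholder sub-networks for illustration only.
bottom = nn.Sequential(nn.Conv2d(3, 8, 3), nn.ReLU())    # bottommost layers
middle = nn.Sequential(nn.Conv2d(8, 16, 3), nn.ReLU())   # other transfer layers
head = nn.Sequential(nn.Flatten(), nn.Linear(16, 10))    # newly added layers

# 2:4:6 rule: different learning rates for different depths of the network.
optimizer = torch.optim.SGD([
    {"params": bottom.parameters(), "lr": 1e-6},
    {"params": middle.parameters(), "lr": 1e-4},
    {"params": head.parameters(),   "lr": 1e-2},
])
```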
Transfer Learning
Transfer Learning – Summary
• Transfer learning is the concept of using heavily trained models to build models for new tasks
• Suitable for tasks with limited labelled data availability
• Solves the overfitting problem for heavily domain-dependent tasks
• Feature extraction and fine tuning are the two main approaches in transfer learning
Summary
• Object Recognition
o Object Identification (Classification)
o Object Detection (Localisation)
o Image Segmentation

• Object Recognition with Deep Learning
o Neural Network (Deep Learning)
o Types of Neural Network – dense NN vs CNN
o Components of Convolutional Neural Network (CNN)
o Kernel / Filter
o Max Pooling
o Zero padding

• Transfer Learning
Questions?
Ask me anything!
