ARTIFICIAL NEURAL NETWORKS AND DEEP LEARNING
Deep Learning and its role in Computer Vision
Dr. Sandeep Singh Sengar
What is a Digital Image?
An image is a two-dimensional intensity function f(x, y): the value of f at a spatial location (x, y) is the intensity (gray level) of the image at that point.
[Figure: image plane with x and y axes; the surface height at (x, y) is the gray level f(x, y).]
Common image formats
– 1 sample per point (B&W) [0,1]
– 1 sample per point (Grayscale)[0-255]
– 3 samples per point (Red, Green, and Blue)[0-255]
– 4 samples per point (Red, Green, Blue, and “Alpha”, a.k.a. Opacity) [0-255, 0-1]
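A minimal NumPy sketch (not from the slides) of how these formats are typically stored as arrays; the 4x4 image size is purely illustrative:

```python
import numpy as np

# Binary (B&W): one sample per pixel, values in {0, 1}
bw = np.zeros((4, 4), dtype=np.uint8)
bw[1, 2] = 1

# Grayscale: one sample per pixel, values in [0, 255]
gray = np.full((4, 4), 128, dtype=np.uint8)

# RGB: three samples per pixel (red, green, blue), each in [0, 255]
rgb = np.zeros((4, 4, 3), dtype=np.uint8)
rgb[..., 0] = 255                                   # a pure-red image

# RGBA: RGB plus an alpha (opacity) channel
rgba = np.dstack([rgb, np.full((4, 4), 255, dtype=np.uint8)])

print(bw.shape, gray.shape, rgb.shape, rgba.shape)  # (4, 4) (4, 4) (4, 4, 3) (4, 4, 4)
```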
Color Image
RGB Color Space
A color image is just three functions pasted together. We can write this as a “vector-valued” function:
f(x, y) = [ r(x, y), g(x, y), b(x, y) ]
RGB Image
Image Processing
An image processing operation typically defines a new image g in terms of an existing image f.
We can write the image transform as g(x, y) = T[f(x, y)], where T is an operator on f defined over a neighbourhood of the point (x, y).
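As a concrete illustration of g(x, y) = T[f(x, y)], here is a hedged NumPy sketch of one simple point transform, the image negative T[r] = 255 - r (the choice of transform is illustrative, not from the slides):

```python
import numpy as np

def negative(f: np.ndarray) -> np.ndarray:
    """Point transform g(x, y) = T[f(x, y)] with T[r] = 255 - r."""
    return (255 - f.astype(int)).astype(np.uint8)

f = np.array([[0, 64], [128, 255]], dtype=np.uint8)
g = negative(f)
print(g)   # [[255 191], [127 0]]
```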
Why Digital Image Processing?
Digital image processing focuses on two major tasks
– Improvement of pictorial information for human interpretation
– Processing of image data for storage, transmission and
representation for autonomous machine perception
There is some debate about where image processing ends and fields such as image analysis and computer vision begin.
The Spatial Filtering Process
A simple 3*3 filter w (coefficients j-r) is slid over the image f(x, y), starting from the origin. For the 3*3 neighbourhood of pixel e (values a-i):
a b c       j k l
d e f   *   m n o
g h i       p q r
eprocessed = j*a + k*b + l*c + m*d + n*e + o*f + p*g + q*h + r*i
The above is repeated for every pixel in the original image to generate the filtered image (see the code sketch that follows).
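A small Python/NumPy sketch of this sliding-window filtering; the function name filter2d is hypothetical, and border handling is deliberately naive here (border pixels are skipped, padding is discussed later):

```python
import numpy as np

def filter2d(image: np.ndarray, w: np.ndarray) -> np.ndarray:
    """Slide a 3x3 filter w over the image and compute the weighted
    sum of each 3x3 neighbourhood. Border pixels are left untouched."""
    height, width = image.shape
    out = np.zeros_like(image, dtype=float)
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            neighbourhood = image[y - 1:y + 2, x - 1:x + 2]
            out[y, x] = np.sum(neighbourhood * w)
    return out

# Example: the identity filter leaves interior pixels unchanged
w = np.zeros((3, 3))
w[1, 1] = 1
img = np.arange(25, dtype=float).reshape(5, 5)
print(filter2d(img, w))
```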
Levels of Digital Image Processing
The continuum from image processing to computer vision
can be broken up into low-, mid- and high-level processes
Low-level processes. Input: image; output: image. Examples: noise removal, image sharpening.
Mid-level processes. Input: image; output: attributes. Examples: object recognition, segmentation.
High-level processes. Input: attributes; output: understanding. Examples: scene understanding, autonomous navigation.
Spatial filters
Recall the two types of neighbourhood:
– intensity transformation: neighbourhood of size 1*1
– spatial filter (also called mask, kernel, template or window): neighbourhood of larger size, e.g. a 3*3 mask
The spatial filter mask is moved from point to point in an image. At each point (x, y), the response of the filter is calculated.
[Figure: a neighbourhood centred at (x, y) in image f(x, y), with the origin at the top-left corner.]
Neighbourhood Operations
For each pixel in the original image, the outcome is written at the same location in the target image.
[Figure: a neighbourhood about (x, y) in the original image f(x, y) maps to the same location in the target image.]
Smoothing Spatial Filtering
A simple 3*3 smoothing filter (every coefficient equal to 1/9) applied to the neighbourhood
104 100 108
 99 106  98
 95  90  85
e = 1/9*(104 + 100 + 108 + 99 + 106 + 98 + 95 + 90 + 85) = 885/9 ≈ 98.33
The above is repeated for every pixel in the original image to generate the smoothed image.
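Assuming SciPy is available, a short sketch reproducing this worked example: the response at the centre pixel is simply the average of the 3*3 neighbourhood.

```python
import numpy as np
from scipy import ndimage

neighbourhood = np.array([[104, 100, 108],
                          [ 99, 106,  98],
                          [ 95,  90,  85]], dtype=float)

# 3x3 averaging (box) filter: every coefficient is 1/9
kernel = np.full((3, 3), 1 / 9)

# Response at the centre pixel: the plain average of the neighbourhood
print(np.sum(neighbourhood * kernel))                        # 98.333...

# Applied to a whole image, e.g. with scipy.ndimage
smoothed = ndimage.uniform_filter(neighbourhood, size=3, mode='reflect')
print(smoothed[1, 1])                                        # 98.333...
```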
Spatial filters : Smoothing
linear smoothing : averaging kernels
Standard average
Spatial filters : Smoothing
Standard Average - example
The mask is moved from point to point in an image. At each point (x, y), the response of the filter is calculated.
110 120  90 130
 91  94  98 200
 90  91  99 100
 82  96  85  90
Standard averaging filter applied to the top-left 3*3 neighbourhood:
(110 + 120 + 90 + 91 + 94 + 98 + 90 + 91 + 99)/9 = 883/9 ≈ 98.1
Spatial filters : Smoothing
Weighted Average- example
Spatial filters : Smoothing
Median Filter- example
Another smoothing example
By smoothing the original image we remove much of the finer detail, leaving only the gross features for thresholding.
[Figure: original image, smoothed image, thresholded image.]
Averaging filter vs. median filter example
[Figure: original image with noise; image after averaging filter; image after median filter.]
• Filtering is often used to remove noise from images.
• Sometimes a median filter works better than an averaging filter.
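A small SciPy sketch (the 7x7 test image is made up for illustration) showing why a median filter can beat an averaging filter on impulse ("salt") noise:

```python
import numpy as np
from scipy import ndimage

# A flat 7x7 region (value 100) corrupted by one salt-noise pixel
img = np.full((7, 7), 100.0)
img[3, 3] = 255.0

mean_out = ndimage.uniform_filter(img, size=3)   # averaging filter
med_out = ndimage.median_filter(img, size=3)     # median filter

print(mean_out[3, 3])   # ~117.2 -- the spike is smeared into the output
print(med_out[3, 3])    # 100.0  -- the spike is removed entirely
```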
Strange things happen at the edges!
At the edges of an image we are missing pixels to form a neighbourhood.
[Figure: the filter centred at border pixels of image f(x, y), where part of the neighbourhood falls outside the image.]
What happens when the values of the kernel fall outside the image?
Border padding
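A brief NumPy sketch of common padding strategies used when the kernel would otherwise fall outside the image (the 2x2 example image is illustrative):

```python
import numpy as np

img = np.array([[1, 2],
                [3, 4]])

# Common border-padding strategies:
print(np.pad(img, 1, mode='constant', constant_values=0))  # zero padding
print(np.pad(img, 1, mode='edge'))      # replicate the border pixels
print(np.pad(img, 1, mode='reflect'))   # mirror the image at the border
```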
Applications
Text Recognition
Biometrics
Computer Vision
“One picture is worth more than a thousand words”
Object Detection
• Moving-object detection is one of the basic and most
active research domains in the field of computer vision.
• The underlying assumption is that moving objects generally entail intensity changes between consecutive frames.
Object Tracking
Object tracking computes the configuration (i.e., position and size) of the target in subsequent frames, given the state of the target in the initial frame.
Object Recognition
Object recognition is a computer vision technique for
identifying objects in images or videos.
Medical Image Segmentation
Medical Imaging
What is Machine Learning?
Machine learning is a subset of Artificial Intelligence that provides computers with the ability to learn without being explicitly programmed.
Machine learning emerged in the 1950s; the term was coined in 1959 by Arthur Samuel at IBM, who designed a checkers-playing program.
Ref: https://www.forbes.com/sites/kalevleetaru/2019/01/15/why-machine-learning-needs-semantics-not-just-statistics/?sh=730fa3aa77b5
Branches of Machine Learning
Ref: https://www.wordstream.com/blog/ws/2017/07/28/machine-learning-applications
Deep Learning
Deep Learning is a subfield of machine learning concerned
with algorithms inspired by the structure and function of the
brain called artificial neural networks.
DL/ML is used to learn the algorithm (model) from data; deep learning in particular benefits from large data and high-performance hardware.
Ref: https://www.intel.la/content/www/xl/es/artificial-intelligence/posts/difference-between-ai-machine-learning-deep-learning.html
Why Deep Learning Today?
▪ Better algorithms and
understanding
▪ Computational power (GPUs,
TPUs, …)
▪ Massive labelled data
▪ Variety of open source tools
and models
Slide adapted from Wai K.
End-to-end approach?
Ref: https://lawtomated.com/a-i-technical-machine-vs-deep-learning/
Deep Learning Process
▪ Deep neural networks provide state-of-the-art accuracy in many tasks, from object detection to speech recognition
▪ They can learn features automatically, without predefined knowledge explicitly coded by the programmers
Effectiveness of Deep Learning
▪ Deep learning algorithms attempt to learn representations by using a hierarchy of multiple layers
▪ If we provide the system with tons of information, it begins to understand it and respond in useful ways
▪ Manually designed features are often over-specified, incomplete and take a long time to design and validate
▪ Learned features are easy to adapt, fast to learn
Effectiveness of Deep Learning
▪ Deep learning provides a very flexible, universal and learnable framework for representing the world
▪ Can learn in both unsupervised and supervised
manner
▪ Utilize large amounts of training data
▪ Since 2010, deep learning started outperforming
other machine learning techniques especially in
the areas of machine vision and speech
recognition
Deep Learning Examples
▪ Hierarchy of representations with increasing level
of abstraction
▪ Each stage is a kind of trainable nonlinear feature
transform
▪ Image recognition example
• Pixel → edge → texton → motif → part → object
▪ Text example
• Character → word → word group → clause →
sentence → story
Deep Learning in Practice
▪ Visual question answering : Given an image and a
natural language question about the image, the
task is to provide an accurate natural language
answer
▪ Click here for demo: http://visualqa.csail.mit.edu/
Deep Learning Architectures
Architecture: Application
CNN: Image recognition, video analysis, natural language processing
RNN: Speech recognition, handwriting recognition, machine translation
LSTM/GRU networks: Natural language text compression, handwriting recognition, speech recognition, gesture recognition, image captioning
DBN: Image recognition, information retrieval, natural language understanding, failure prediction
DSN: Information retrieval, continuous speech recognition
The Spatial Filtering Process (recap)
A 3*3 filter w slides over the image; at each pixel the filtered value is the weighted sum of the 3*3 neighbourhood (eprocessed = j*a + k*b + l*c + m*d + n*e + o*f + p*g + q*h + r*i), and this is repeated for every pixel to generate the filtered image. Convolutional layers apply the same operation, but with filter coefficients that are learned from data.
Convolutional Neural Network
A Convolutional Neural Network is a Deep Learning algorithm which can take
in an input image, assign importance (learnable weights and biases) to various
aspects/objects in the image and be able to differentiate one from the other.
The pre-processing required in a CNN is much lower as compared to other
classification algorithms.
CNN layers
An image is passed through a series of layers (see the sketch below):
– Convolutional layers: the filters can be thought of as feature identifiers
– Non-linear activation (ReLU): lets the network approximate complex functions
– Max pooling: down-sampling
– Fully connected layers (softmax/sigmoid): produce the output
Ref: https://towardsdatascience.com/understanding-and-implementing-lenet-5-cnn-architecture-deep-learning-a2d531ebc342
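A minimal PyTorch sketch of such a layer stack; the layer sizes and the 28x28 grayscale input are illustrative assumptions, not taken from the slides:

```python
import torch
import torch.nn as nn

# Convolution -> ReLU -> Max pooling -> Fully connected -> class probabilities
model = nn.Sequential(
    nn.Conv2d(in_channels=1, out_channels=8, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.MaxPool2d(kernel_size=2),          # down-sampling
    nn.Flatten(),
    nn.Linear(8 * 14 * 14, 10),           # fully connected layer
    nn.Softmax(dim=1),                    # class probabilities
)

x = torch.randn(1, 1, 28, 28)             # one 28x28 grayscale image
print(model(x).shape)                     # torch.Size([1, 10])
```

In practice the softmax is often folded into the loss function (e.g. cross-entropy) rather than kept as a separate layer.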
Convolutional Neural Network
Basic idea of convolution
Convolutional Layer Example
Stride s=2
#filters=2
#channels=3
Padding p=1
Size of Output
I/P size: n*n
Filter size: f*f
O/P size: (n-f+1)*(n-f+1)
Padding and stride convolutions
Padding: used to keep the output the same size as the input (“same” convolution)
With padding p: O/P size = (n+2p-f+1)*(n+2p-f+1); for same-size output, p = (f-1)/2
With stride s: O/P size = [(n+2p-f)/s + 1] * [(n+2p-f)/s + 1], rounded down (a small helper is sketched below)
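A tiny helper (the name conv_output_size is hypothetical) that evaluates these formulas:

```python
def conv_output_size(n: int, f: int, p: int = 0, s: int = 1) -> int:
    """Spatial output size for an n x n input, f x f filter,
    padding p and stride s: floor((n + 2p - f) / s) + 1."""
    return (n + 2 * p - f) // s + 1

print(conv_output_size(6, 3))             # 4 -> (n - f + 1)
print(conv_output_size(6, 3, p=1))        # 6 -> 'same' padding, p = (f-1)/2
print(conv_output_size(7, 3, p=1, s=2))   # 4
```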
Multiple filters
For example, to detect horizontal and vertical edges.
An n*n*nc input convolved with nc' filters of size f*f*nc gives an output of size (n-f+1)*(n-f+1)*nc', where nc' is the number of filters.
Number of parameters in one layer
Suppose 10 filters of size 3*3*3
Then the total number of parameters is [3*3*3 + 1 (bias)]*10 = 280.
That is, one bias per filter.
The parameter count does not depend on the input image size (one of the beauties of deep learning).
This makes the model less prone to overfitting (a one-line helper follows below).
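The same count as a small helper (the function name is hypothetical), reproducing the slide's 280:

```python
def conv_layer_params(f: int, n_c: int, n_filters: int) -> int:
    """Learnable parameters in one convolutional layer:
    each filter has f*f*n_c weights plus one bias."""
    return (f * f * n_c + 1) * n_filters

# The slide's example: 10 filters of size 3x3x3
print(conv_layer_params(f=3, n_c=3, n_filters=10))   # 280
```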
Automatically learnt features
Early layers retain most of the information (edge detectors); deeper layers move towards more abstract representations and encode high-level concepts.
Representations become sparser: they detect fewer (more abstract) features.
https://towardsdatascience.com/applied-deep-learning-part-4-convolutional-neural-networks-584bc134c1e2
Non-linear Activation Function
Pooling
▪ The goal of the pooling operation is to reduce the
spatial size of convolved features
▪ Pooling helps in extracting salient features that are rotation and position invariant
• For example, even if the orientation of the nose, eyes and ears changes, the image segment would still be detected as a head
• This is one of the most prominent features of CNNs
Pooling
▪ Two types of pooling operators are common: Max
pooling and Average pooling
• Max pooling returns the maximum value from the
portion of the image covered by the filter
• Average pooling returns the average of all the
values from the portion of the image covered by the
filter
Max Pooling
▪ Let's apply a 3 x 3 max-pooling filter (stride 1) to a 5 x 5 convolved feature map
15.5 23.8  7.9 20.6 12.9
12.7 18.3 22.3  7.9  8.3
11.3  9.2 11.8 18.9 10.3
11.7 11.3 17.5  6.8 19.3
18.3 19.6 11.2 15.2  7.2
▪ Sliding the window one position at a time and keeping the maximum of each 3 x 3 window gives the pooled map
23.8 23.8 22.3
22.3 22.3 22.3
19.6 19.6 19.3
Average Pooling
▪ Applying a 3 x 3 average-pooling filter (stride 1) to the same 5 x 5 convolved feature map, each output value is the mean of the values covered by the window
▪ The first window (the top-left 3 x 3 block) averages to 14.8 and the next one to 15.6; the remaining positions are computed in the same way
Max Pooling
Hidden layer i (4*4 input):
-4  5  4  6
 0 -3  2 -3
 7  8 -5  9
 3  0 -4  1
Possible pooled nodes in hidden layer i + 1:
– 4*4 max over the whole map: 9
– 2*2 max, non-overlapping (stride 2):
  5 6
  8 9
– 2*2 max, overlapping (stride 1):
  5 5 6
  8 8 9
  8 8 9
  (this contains the non-overlapping result, so there is no need for both)
As for convolution, with input size n*n, filter size f*f, padding p and stride s, the pooled output size is (n+2p-f)/s + 1.
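A NumPy sketch (the pool2d helper is hypothetical) that reproduces the 5 x 5 max- and average-pooling example above and can be configured for the stride variants:

```python
import numpy as np

def pool2d(x: np.ndarray, size: int, stride: int, op=np.max) -> np.ndarray:
    """Apply max or average pooling with the given window size and stride."""
    h, w = x.shape
    out_h = (h - size) // stride + 1
    out_w = (w - size) // stride + 1
    out = np.empty((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            window = x[i * stride:i * stride + size,
                       j * stride:j * stride + size]
            out[i, j] = op(window)
    return out

fmap = np.array([[15.5, 23.8,  7.9, 20.6, 12.9],
                 [12.7, 18.3, 22.3,  7.9,  8.3],
                 [11.3,  9.2, 11.8, 18.9, 10.3],
                 [11.7, 11.3, 17.5,  6.8, 19.3],
                 [18.3, 19.6, 11.2, 15.2,  7.2]])

print(pool2d(fmap, size=3, stride=1, op=np.max))   # 3x3 max-pooled map
print(pool2d(fmap, size=3, stride=1, op=np.mean))  # 3x3 average-pooled map
```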
Fully Connected Layer
Fully Connected Layer
• These are simply feed-forward neural networks.
• Fully connected layers form the last few layers in the network.
• The input to the fully connected layer is the output of the final pooling or convolutional layer, in flattened form.
• After passing through the fully connected layers, the final layer uses the softmax activation function to obtain the probability of the input belonging to each class (classification); a small sketch follows below.
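A small NumPy sketch of the softmax used in that final layer (the example scores are made up):

```python
import numpy as np

def softmax(z: np.ndarray) -> np.ndarray:
    """Turn the final layer's raw scores into class probabilities."""
    e = np.exp(z - np.max(z))      # subtract the max for numerical stability
    return e / np.sum(e)

scores = np.array([2.0, 1.0, 0.1])    # hypothetical class scores
print(softmax(scores))                # ~[0.659, 0.242, 0.099], sums to 1
```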
CNN Architectures
Various CNN architectures have been key in building the algorithms that power, and will continue to power, AI in the foreseeable future. Some of them are listed below:
• LeNet
• AlexNet
• VGGNet
• GoogLeNet
• ResNet
• ZFNet
U-Net
Ref: Ronneberger, Olaf, Philipp Fischer, and Thomas Brox. "U-Net: Convolutional networks for biomedical image segmentation." In International Conference on Medical Image Computing and Computer-Assisted Intervention, pp. 234-241. Springer, Cham, 2015.
Train, Validation and Test Datasets
• Training Dataset: The sample of data used to fit the model.
• Validation Dataset: The sample of data used to evaluate a given model while fine-tuning its hyperparameters. The model occasionally "sees" this data but never learns from it, so the validation set affects the model only indirectly.
• Test Dataset: The sample of data used to provide an unbiased evaluation of a final model fit on the
training dataset. The Test dataset provides the gold standard used to evaluate the model. It is only
used once a model is completely trained (using the train and validation sets).
Make sure the validation and test sets come from the same distribution (see the split sketch below).
Hyperparameters: learning rate, #iterations, #hidden layers, #hidden units, choice of activation function
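A hedged scikit-learn sketch of one common way to carve out the three sets; the 80/20 and 75/25 ratios and the random data are illustrative assumptions:

```python
import numpy as np
from sklearn.model_selection import train_test_split

X = np.random.rand(1000, 20)          # hypothetical feature matrix
y = np.random.randint(0, 2, 1000)     # hypothetical labels

# First carve out the test set, then split the rest into train/validation
X_rest, X_test, y_rest, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X_rest, y_rest, test_size=0.25, random_state=0)

print(len(X_train), len(X_val), len(X_test))   # 600 200 200
```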
Under-fitting and Over-fitting
High bias: underfitting
High variance: overfitting
Bias and Variance
Training set error:    1%              15%          15%                       0.5%
Validation set error:  11%             16%          30%                       1%
Result:                high variance   high bias    high bias and variance    low bias and variance
Bias-variance Trade-off
Under fitting (High bias)
• A statistical model or machine learning algorithm is said to underfit when it cannot capture the underlying trend of the data.
• Underfitting destroys the accuracy of the machine learning model.
• Training accuracy is low in this case.
Steps for reducing underfitting:
⮚ Use a bigger network
⮚ Train for longer
⮚ Increase the number of parameters in the model
Overfitting (high variance)
• Overfitting happens when your model fits too well to the training set.
• It then becomes difficult for the model to generalize to new examples
that were not in the training set.
Steps for reducing overfitting:
⮚ Add more data
⮚ Data augmentation (rotate, crop, zoom)
⮚ Simplify the model
⮚ Change the training process (like loss function)
⮚ Early termination
⮚ Regularization (see the sketch below)
❑ Dropout and drop connect
❑ L1 and L2 regularization
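A short PyTorch sketch of two of these regularizers, dropout and L2 weight decay; the layer sizes and hyperparameter values are illustrative assumptions:

```python
import torch
import torch.nn as nn

# Dropout randomly zeroes activations during training, which discourages
# co-adaptation of units and reduces overfitting.
model = nn.Sequential(
    nn.Linear(100, 64),
    nn.ReLU(),
    nn.Dropout(p=0.5),        # dropout regularization
    nn.Linear(64, 10),
)

# L2 regularization is applied through the optimizer's weight_decay term
# (used in the training loop together with a loss function).
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, weight_decay=1e-4)

model.train()                              # dropout active during training
print(model(torch.randn(4, 100)).shape)    # torch.Size([4, 10])
model.eval()                               # dropout disabled at evaluation time
```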
Ideas to improve ML/DL strategies
• Collect more data
• Collect more diverse training examples
• Train algorithm longer with suitable optimizer
• Try bigger network
• Try smaller network
• Try dropout
• Add regularization
• Network architectures:
❑ Activation function
❑ #hidden units
❑ Learning rate
❑ Iterations
Problems where ML/DL significantly surpasses
human level performance
• Online advertising: estimating how likely someone is to click on an ad
• Product recommendations
• Loan approval
• Lots of data
CNN for Computer Vision tasks
• Object Detection
• Object Tracking
• Recognition
• Face Recognition
• Action and Activity Recognition
• Human Pose Estimation
• Image Classification
• Image Classification with Localization
• Object Segmentation
• Image Style Transfer
• Image Colorization
• Image Reconstruction
• Image Super-Resolution
• Image Synthesis
Challenges
The challenge of making systems human-like:
• It is difficult to simulate something as complex as the human visual system.
• Objects may appear in a variety of sizes and aspect ratios.
• One object must be distinguished from multiple others.
• There is a variety of handwriting styles, curves and shapes employed while writing.
• Deformation, appearance variation, scale variation, occlusion and rotation of objects.
Computer vision has its present challenges, but the humans working on this technology are steadily
improving it.
CNN: A Real Example
[Figure sequence: a worked CNN example showing an input image, the learned filters and the resulting feature maps at successive layers.]
Convolutional Neural Network
Suppose the task is to predict an image caption
▪ The CNN receives an image, say of a cat
• To the computer, this image is a collection of pixels
▪ Generally there is one channel for a greyscale picture and three channels for a colour picture
▪ During feature learning (i.e., in the hidden layers), the network identifies unique features, for instance the tail of the cat, the ear, etc.
▪ Once the network has thoroughly learned how to recognise a picture, it can provide a probability for each label it knows
▪ The label with the highest probability becomes the prediction of the network
Which Works Better: RNN or CNN?
▪ There is a vast number of neural network architectures, each designed to perform a given task
▪ CNN works very well with images
▪ RNN (Recurrent Neural Network) provides
impressive results with time series and text
analysis
Self-Review Questions
▪ What is convolution and how does it work?
▪ What is pooling and how does it work?
▪ What would be the impact of a large/small stride length?
References
“Digital Image Processing”, Rafael C. Gonzalez & Richard E. Woods, Addison-Wesley, 2002
– Much of the material in this lecture is taken from this book
“Machine Vision: Automated Visual Inspection and Robot Vision”, David Vernon, Prentice Hall, 1991
Thank You