Convolutional Neural Networks (Part I)

This document provides an overview of convolutional neural networks (CNNs). It discusses how CNNs use convolution operations instead of general matrix multiplication, which lets them exploit sparse interactions, parameter sharing, and equivariance to translation, making them well suited to grid-like data such as images; pooling additionally gives approximate invariance to small translations. The document outlines CNN architecture and operations such as convolution, cross-correlation, and pooling.

Convolutional Neural Networks (Part I)
• SHAHRULLOHON LUTFILLOHONOV
• shahrullo@pusan.ac.kr
• PNU DataLab
• 2020/10/09
Outline
• Overview
• Motivation
• The Convolution Operation
• Pooling
• Convolution and Pooling as an Infinitely Strong Prior

2
Outline
• Overview
• Motivation
• The Convolution Operation
• Pooling
• Convolution and Pooling as an Infinitely Strong Prior

3
Overview of Convolutional Networks
• Convolutional Networks, a.k.a. Convolutional Neural Networks (CNNs), are a specialized kind of neural network
• For processing data that has a known grid-like topology
• Ex: time-series data, which is a 1-D grid, taking samples at regular intervals
• Image data, which is a 2-D grid of pixels

• They utilize convolution, which is a specialized kind of linear operation
4
Key Idea
• Replace matrix multiplication in neural nets with convolution

• Everything else stays the same

• Maximum likelihood

• Back-propagation

• Etc.

5
Overview of Convolutional Networks
A computer sees an image as an array of
numbers

6
Outline
• Overview
• Motivation
• The Convolution Operation
• Pooling
• Convolution and Pooling as an Infinitely Strong Prior

7
Why Convolution Instead of Matrix
Multiplication?

We saw before:

• A series of matrix multiplications:

8
A Problem

[Figure: a network with an input layer and an output layer]
• Will a NN that recognizes the left image as a flower
also recognize the one on the right as a flower?
• Need a network that will “fire” regardless of the precise location of the target object
9
Motivation For Using Convolution Networks

• 1. Convolution leverages three important ideas to improve ML systems:


• Sparse interactions
• Parameter sharing
• Equivariant representations
• 2. Convolution also allows for working with inputs of variable size

10
Motivation: Sparse Connectivity

• Fully connected network: each output unit is computed by full matrix multiplication, with no sparse connectivity
11
Motivation: Sparse Connectivity

• Kernel of size 3: each output unit depends on only three neighboring inputs
12
Motivation: Sparse Connectivity

• It is possible to obtain good performance while keeping the kernel several orders of magnitude smaller than the input

• Connections in CNNs are sparse, but units in deeper layers are indirectly connected to all of the input (larger receptive field sizes)
13
Practical Example

Fully connected layer: ~7.5 billion parameters (= 224² × 224² × 3)
Sparse layer with a 3×3 kernel: ~1,354 thousand parameters (= 224² × 3 × 3 × 3)

Convolution uses thousands of times fewer parameters, and a convolutional layer also incorporates parameter sharing, so this number will decrease further
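As a quick sanity check on these numbers, here is a minimal sketch (Python; the 224×224×3 input size and 3×3 kernel are the ones assumed on this slide):

```python
# Parameter counts for a 224x224x3 input with one output unit per spatial position.
height, width, channels = 224, 224, 3
num_inputs = height * width * channels         # 150,528 input values
num_outputs = height * width                   # 50,176 output units

fully_connected = num_inputs * num_outputs     # every output connected to every input
sparse_3x3 = num_outputs * (3 * 3 * channels)  # every output sees only a 3x3x3 window

print(fully_connected)                 # 7,552,892,928  (~7.5 billion)
print(sparse_3x3)                      # 1,354,752      (~1,354 thousand)
print(fully_connected // sparse_3x3)   # 5575 -> thousands of times fewer parameters
```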
14
Motivation: Parameter Sharing

• Parameter sharing refers to using the same parameter for more than one function in a model

• Plain vanilla NN: Each element of the weight matrix is used exactly once when computing the

output of a layer

• It is multiplied by one element of the input and never revisited

• Parameter sharing is synonymous with tied weights

• Weight applied to one input is tied to the value of a weight applied elsewhere

• Each member of the kernel is used in every position of the input (except at the boundary)

15
How Parameter Sharing Works

• 1. Convolutional model: black arrows indicate uses of the central element of a 3-element kernel

• 2. Fully connected model: single black arrow indicates use of the central element of the weight
matrix
• Model has no parameter sharing,
so the parameter is used only once

16
Motivation: Equivariance To Translation

• The particular form of parameter sharing leads to equivariance to translation
• Equivariant means that if the input changes, the output changes in the same way
• A function f is equivariant to a function g if f(g(x)) = g(f(x))
• If g is a function that translates the input, i.e., shifts it, then the convolution function is equivariant to g (illustrated in the sketch below)
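To make this concrete, here is a minimal numerical sketch (NumPy; the signal, the kernel, and the zero-padded shift helper are all illustrative choices, not part of the slides): convolving a shifted input gives the shifted convolution output.

```python
import numpy as np

x = np.array([0., 1., 3., 2., 0., 0.])   # toy 1-D signal (trailing zeros avoid edge effects)
w = np.array([1., -1.])                   # toy kernel

def shift(a, k=1):
    """Translate a signal to the right by k samples, padding with zeros."""
    return np.concatenate([np.zeros(k), a[:-k]])

shift_then_conv = np.convolve(shift(x), w, mode="full")   # translate first, then convolve
conv_then_shift = shift(np.convolve(x, w, mode="full"))   # convolve first, then translate

# Equivariance to translation: both orders give the same result
print(np.allclose(shift_then_conv, conv_then_shift))      # True
```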

17
Various instances of cats detected due to the property of translational equivariance.

18
Absence of Equivariance
• In some cases, we may not wish to share parameters

across entire image

• If image is cropped to be centered on a face, we

may want different features from different parts of

the face

• Part of the network processing the top of the face

looks for eyebrows

• Part of the network processing the bottom of the

face looks for the chin

• Convolution is not equivariant to certain other image operations, such as changes in scale or rotation

• Other mechanisms are needed for such

transformations
19
A Problem

• Scan for the desired object


• “Look” for the target object at each position
• At each location, the entire region is sent through the NN
20
Outline
• Overview
• Motivation
• The Convolution Operation
• Pooling
• Convolution and Pooling as an Infinitely Strong Prior

21
Dropped Ball

• Drop a ball and let it travel a distance a with probability f(a); drop it again from where it lands and let it travel a further distance b with probability g(b)
• To find how likely the ball ends up a total distance c away, we consider all the possible ways of getting there

22
What is Convolution?

• The probability for each case (a, b) is f(a)·g(b)
• Total likelihood of ending up at c: the sum over all ways with a + b = c, i.e. Σ_{a+b=c} f(a)·g(b)
• The convolution of f and g, evaluated at c, is: (f ∗ g)(c) = Σ_a f(a)·g(c − a)
• If we substitute b = c − a, this is exactly the total likelihood above
23
What is Convolution?
• Convolution – an operation on two functions of a real-valued argument

• Input x: a multidimensional array of data
• Kernel w: an array of parameters adapted by the learning algorithm
• Output: the feature map

Arrays of input and kernel are referred to as tensors


24
Convolution Operation

• Continuous: s(t) = (x ∗ w)(t) = ∫ x(a) w(t − a) da
• Discrete: s(t) = (x ∗ w)(t) = Σ_{a = −∞}^{+∞} x(a) w(t − a)
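A minimal sketch of the discrete formula above (Python, assuming finite-length arrays so the infinite sum becomes a finite one; the example values are arbitrary):

```python
import numpy as np

def conv1d(x, w):
    """s(t) = sum_a x(a) * w(t - a), for finite 1-D arrays x and w."""
    s = np.zeros(len(x) + len(w) - 1)
    for t in range(len(s)):
        for a in range(len(x)):
            if 0 <= t - a < len(w):       # keep only terms where w(t - a) exists
                s[t] += x[a] * w[t - a]
    return s

x = np.array([1., 2., 3.])
w = np.array([0., 1., 0.5])
print(conv1d(x, w))         # [0.  1.  2.5 4.  1.5]
print(np.convolve(x, w))    # NumPy's implementation gives the same result
```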

25
Two-Dimensional Convolution
• Convolutions over more than one axis
• The operation is commutative; commutativity arises because we have flipped the kernel relative to the input

• The commutative (flipped) form is easier to implement, since there is less variation in the range of valid values of m and n
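For reference, the two formulas behind this slide can be written out as follows (a standard statement of 2-D discrete convolution; the symbols I for the input image and K for the kernel are the usual ones, not taken verbatim from the slide):

```latex
% 2-D discrete convolution of an input image I with a kernel K,
% followed by the commutative (flipped) form, in which m and n
% range only over the small kernel.
\begin{align}
S(i,j) = (I * K)(i,j) &= \sum_m \sum_n I(m,n)\, K(i-m,\, j-n) \\
                      &= \sum_m \sum_n I(i-m,\, j-n)\, K(m,n)
\end{align}
```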

26
Cross-Correlation
• Same as convolution, but without flipping the kernel

• Both operations are often referred to as convolution, whether or not the kernel is flipped

• In ML, the learning algorithm will learn the appropriate values of the kernel in the appropriate place

27
Convolution vs Cross-Correlation
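Since the comparison figure is not reproduced here, the following minimal NumPy sketch makes the same point in 1-D (the arrays are illustrative): cross-correlating with a flipped kernel gives exactly the convolution, and the same relationship holds for 2-D images.

```python
import numpy as np

x = np.array([1., 2., 3., 4.])    # toy input
w = np.array([1., 0., -1.])       # toy kernel

conv  = np.convolve(x, w, mode="full")           # convolution: the kernel is flipped
xcorr = np.correlate(x, w[::-1], mode="full")    # cross-correlation with the flipped kernel

print(conv)                      # [ 1.  2.  2.  2. -3. -4.]
print(np.allclose(conv, xcorr))  # True: flipping the kernel converts one operation into the other
```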

28
Overall Architecture

Input Layer → Convolutional Layer → Detector layer (non-linearity) → Pooling Layer → Next layers…

• Convolutional layer: perform several convolutions in parallel to produce a set of linear activations
• Detector layer: each linear activation is run through a nonlinear activation function such as ReLU
• Pooling layer: use a pooling function to modify the output of the layer further
29
Convolution

Input Layer → Convolutional Layer → Detector layer (non-linearity) → Pooling Layer → Next layers…

30
Convolution

Convolve image with kernel having weights w (learned by backpropagation)
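A minimal sketch of what this looks like in code (Python; a "valid" sliding-window cross-correlation, which is what most deep learning libraries compute under the name convolution; the image and the edge-detecting weights are illustrative, since in practice w is learned):

```python
import numpy as np

def conv2d_valid(image, w):
    """Slide the kernel w over the image; each output is a dot product with one window."""
    H, W = image.shape
    kH, kW = w.shape
    out = np.zeros((H - kH + 1, W - kW + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kH, j:j + kW] * w)
    return out

image = np.random.rand(6, 6)            # toy grayscale image
w = np.array([[1., 0., -1.],
              [1., 0., -1.],
              [1., 0., -1.]])           # simple vertical-edge kernel (weights would be learned)
print(conv2d_valid(image, w).shape)     # (4, 4) feature map
```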


31
32
The Convolution Operation

Generally, an image is a 3-D array of pixel values with RGB color channels (height × width × depth)
33
Non-Linearity

Input Layer → Convolutional Layer → Detector layer (non-linearity) → Pooling Layer → Next layers…

34
The detector (activation) layer

After obtaining the feature map, apply an elementwise non-linearity to obtain a transformed feature map (same size)
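A minimal sketch of this detector stage (Python; the feature-map values are illustrative):

```python
import numpy as np

def relu(z):
    """Elementwise non-linearity: negative activations become zero, shape is unchanged."""
    return np.maximum(z, 0.0)

feature_map = np.array([[ 1.5, -0.3],
                        [-2.0,  0.7]])
print(relu(feature_map))         # [[1.5 0. ]
                                 #  [0.  0.7]]
```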
35
Outline
• Overview
• Motivation
• The Convolution Operation
• Pooling
• Convolution and Pooling as an Infinitely Strong Prior

36
What is Pooling?
• Pooling in a CNN is a subsampling step
• It replaces the output at a location with a summary statistic of nearby outputs

Input Layer → Convolutional Layer → Detector layer (non-linearity) → Pooling Layer → Next layers…

37
38
39
40
41
42
43
Types of Pooling Functions

• 4 popular types of pooling functions (a minimal sketch of the first two follows this list):

• 1. Max Pooling

• 2. Average Pooling of a rectangular neighborhood

• 3. L2 norm of a rectangular neighborhood

• 4. Weighted average

• based on the distance from the central pixel
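A minimal sketch of the first two of these (Python; 2×2 non-overlapping windows, assuming the feature-map dimensions are divisible by 2; the values are illustrative):

```python
import numpy as np

def pool2x2(fmap, mode="max"):
    """Replace each non-overlapping 2x2 neighborhood with a single summary statistic."""
    H, W = fmap.shape
    blocks = fmap.reshape(H // 2, 2, W // 2, 2)   # group entries into 2x2 windows
    return blocks.max(axis=(1, 3)) if mode == "max" else blocks.mean(axis=(1, 3))

fmap = np.array([[1., 2., 5., 6.],
                 [3., 4., 7., 8.],
                 [0., 1., 2., 3.],
                 [1., 0., 3., 2.]])
print(pool2x2(fmap, "max"))    # [[4. 8.]
                               #  [1. 3.]]
print(pool2x2(fmap, "mean"))   # [[2.5 6.5]
                               #  [0.5 2.5]]
```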

44
Pooling causes Translation Invariance

• Pooling makes the representation become approximately invariant to


small translations of the input
• If we translate the input by a small amount, the values of most of the outputs do not change

45
Max Pooling Produces Invariance To Translation
• View of middle of output of a convolutional layer
Outputs of maxpooling

Outputs of nonlinearity

• Same network after the input has been shifted by one pixel

• Every input value has changed, but only half of the output values have changed, because max pooling units are sensitive only to the maximum value in the neighborhood, not its exact location
46
Why Is Translation Invariance IMPORTANT?
• Invariance to translation is important if we care about whether a feature is present

rather than exactly where it is


• For detecting a face we just need to know that an eye is present in a region, not its exact location

47
Pooling with Downsampling
• Downsampling allows the features to be flexibly positioned

Downsampling

• Downsampling the pixels does not change the object (the bird is still a bird)

• We can downsample the pixels to make the image smaller:
• fewer parameters are needed to characterize the image
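A minimal sketch of this kind of downsampling (Python; stride-2 subsampling of an illustrative 8×8 image):

```python
import numpy as np

image = np.arange(64, dtype=float).reshape(8, 8)   # toy 8x8 image
downsampled = image[::2, ::2]                      # keep every 2nd row and every 2nd column
print(image.shape, "->", downsampled.shape)        # (8, 8) -> (4, 4): 4x fewer values to describe
```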
48
Outline
• Overview
• Motivation
• The Convolution Operation
• Pooling
• Convolution and Pooling as an Infinitely Strong Prior

49
Prior Parameter Distribution

• Role of a prior probability distribution over the parameters of a model is:


• Encode our beliefs about which models are reasonable, before seeing the data

50
Weak and Strong Priors
• A weak prior
• A distribution with high entropy
• e.g., Gaussian with high variance
• Data can move parameters freely

• A strong prior
• It has very low entropy
• e.g., a Gaussian with low variance
• Such a prior plays a more active role in determining
where the parameters end up

51
Infinitely Strong Prior

• An infinitely strong prior places zero probability on some parameters

• It says that some parameter values are forbidden regardless of support from data

• With an infinitely strong prior, the forbidden parameter values cannot be changed, irrespective of the data

52
Convolution As Infinitely Strong Prior
• Convolutional net is similar to a fully connected net but with an infinitely
strong prior over its weights
• weights for one hidden unit must be identical to the weights of its neighbor, but shifted
in space
• weights must be zero, except for in the small spatially contiguous receptive field
assigned to that hidden unit
• The function the layer should learn contains only local interactions and is equivariant to
translation

53
Pooling As Infinitely Strong Prior
• The use of pooling is an infinitely strong prior that each unit should be invariant to
small translations
• Maxpooling example:

54
Key Insight: Underfitting
• Convolution and pooling can cause underfitting
• Underfitting happens when the model has high bias

• Convolution and pooling are only useful when the


assumptions made by the prior are reasonably
accurate
• Pooling may be inappropriate in some cases
• If the task relies on preserving spatial information
• Using pooling on all features can increase training error

55
Architecture and the training process
• Each layer produces values that are obtained from the previous layer by performing a matrix multiplication

A CNN is composed of a stack of several building blocks:


• Convolutional layer
• Pooling layer
• Fully connected layer
56
Architecture and the training process

A model’s performance under particular kernels and weights is calculated with a loss
function through forward propagation on a training dataset

57
Architecture and the training process

Learnable parameters (kernels and weights) are updated according to the loss value through backpropagation with a gradient descent optimization algorithm
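The update performed in this step is the usual gradient descent rule (written generically here; η denotes the learning rate and L the loss):

```latex
% Gradient descent update of each learnable parameter w with learning rate \eta:
w \leftarrow w - \eta \,\frac{\partial L}{\partial w}
```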

58
Classification with Convolutional Networks

59
Conclusion
• Scale up neural networks to process very large images/video sequences

• Sparse Interactions

• Parameter Sharing

• Automatically generalize across spatial translations of inputs

• Computationally efficient compared to general matrix multiplication

• Invariant to small changes

• Infinitely Strong Prior

• Applicable to any input that is laid out on a grid (1-D, 2-D, 3-D, …)
60
THANK YOU VERY MUCH

QUESTIONS?
61
