Convolutional Neural Networks (Part I)
• SHAHRULLOHON LUTFILLOHONOV
• shahrullo@pusan.ac.kr
• PNU DataLab
• 2020/10/09
Outline
• Overview
• Motivation
• The Convolution Operation
• Pooling
• Convolution and Pooling as an Infinitely Strong Prior
Overview of Convolutional Networks
• Convolutional Networks, a.k.a. Convolutional Neural Networks (CNNs), are a specialized kind of neural network
• For processing data that has a known grid-like topology
• Ex: time-series data, which is a 1-D grid of samples taken at regular time intervals
• CNNs are trained with the same machinery as other neural networks:
  • Maximum likelihood
  • Back-propagation
  • Etc.
Overview of Convolutional Networks
• A computer sees an image as an array of numbers
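To make this concrete, here is a minimal sketch (assuming NumPy and Pillow are available; the file name is a placeholder, not a file from the slides) that loads an image and inspects the array a computer actually works with:

```python
import numpy as np
from PIL import Image  # Pillow

# "photo.png" is a placeholder path, not a file from the slides.
img = Image.open("photo.png").convert("RGB")
pixels = np.asarray(img)      # shape (height, width, 3), dtype uint8

print(pixels.shape)           # e.g. (480, 640, 3)
print(pixels[0, 0])           # the three 0-255 colour values of the top-left pixel
```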
Why Convolution Instead of Matrix Multiplication?
• We saw before: in a fully connected layer, the output is computed by a general matrix multiplication (e.g., h = g(Wᵀx + b)), so every output unit interacts with every input unit through its own parameter
A Problem
• General matrix multiplication does not scale: the number of weights grows with (number of inputs) × (number of outputs), which becomes enormous for images with many thousands of pixels
Motivation: Sparse Connectivity
• Connections in CNNs are sparse: each unit connects only to a small neighborhood of the layer below
• Units in deeper layers are still (indirectly) connected to most or all of the input, because receptive field sizes grow with depth
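The growth of the receptive field can be illustrated with a small NumPy sketch (not from the slides): each unit connects to only 3 neighbours, yet after two layers a single input position influences 5 output positions:

```python
import numpy as np

kernel = np.ones(3)                        # 3-element kernel: sparse, local connections
x = np.zeros(11)
x[5] = 1.0                                 # perturb a single input position

h1 = np.convolve(x, kernel, mode="same")   # first convolutional layer
h2 = np.convolve(h1, kernel, mode="same")  # second convolutional layer

print(np.nonzero(h1)[0])   # [4 5 6]       -> influence width 3 after one layer
print(np.nonzero(h2)[0])   # [3 4 5 6 7]   -> influence width 5 after two layers
```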
Practical Example
• Parameter sharing refers to using the same parameter for more than one function in a model
• Plain vanilla NN: each element of the weight matrix is used exactly once when computing the output of a layer
• Convolution: each member of the kernel is used at every position of the input (except possibly at the boundary)
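A rough back-of-the-envelope comparison (layer sizes chosen only for illustration) of how many parameters the two schemes need:

```python
# Hypothetical 1-D layer with 1,000 inputs and 1,000 outputs
n_in, n_out, kernel_size = 1000, 1000, 3

dense_params = n_in * n_out       # every weight is used exactly once
conv_params = kernel_size         # the same few weights are reused at every position

print(dense_params)               # 1000000
print(conv_params)                # 3
```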
How Parameter Sharing Works
• 1. Convolutional model: black arrows indicate uses of the central element of a 3-element kernel; due to parameter sharing, this single parameter is used at every input position
• 2. Fully connected model: a single black arrow indicates the one use of the central element of the weight matrix; this model has no parameter sharing, so the parameter is used only once
Motivation: Equivariance To Translation
• The particular form of parameter sharing leads to equivariance to translation
• Equivariant means that if the input changes, the output changes in the same way
• A function f is equivariant to a function g if f(g(x)) = g(f(x))
• If g is a function that translates the input, i.e., shifts it, then the convolution function is equivariant to g
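A quick NumPy check of this property (an illustrative sketch, not part of the slides): convolving a shifted signal gives the shifted version of the convolved signal, up to boundary effects:

```python
import numpy as np

kernel = np.array([1.0, -1.0])            # a simple edge-detecting kernel
x = np.random.rand(20)

def conv(signal):
    return np.convolve(signal, kernel, mode="valid")

def shift(signal):
    return np.roll(signal, 3)             # translate by 3 positions (circularly)

a = conv(shift(x))                        # shift first, then convolve
b = shift(conv(x))                        # convolve first, then shift

# Away from the boundary the two orders agree: f(g(x)) = g(f(x))
print(np.allclose(a[5:-5], b[5:-5]))      # True
```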
• Figure: multiple instances of cats are detected at different positions in the image, thanks to translational equivariance
Absence of Equivariance
• In some cases, we may not wish to share parameters across the entire image
  • e.g., if images are cropped to be centered on a face, we want to extract different features at different locations of the face
• Convolution is not naturally equivariant to some other transformations, such as changes in scale or rotation
A Problem: The Dropped Ball
• Drop a ball, let it travel some distance, then drop it again from where it landed: what is the probability that it ends up a total distance c from the start?
What is Convolution?
• Total likelihood that the ball ends a total distance c away: sum f(a)·g(b) over all pairs with a + b = c, where f and g are the distributions of the two drops
• If we substitute b = c − a: (f ∗ g)(c) = ∑ₐ f(a)·g(c − a), which is exactly the convolution of f and g
What is Convolution?
• Convolution – an operation on two functions of a real-valued argument
• In CNN terminology:
  • Input x: a multidimensional array of data (the measurements)
  • Kernel w: additional parameters adapted by the learning algorithm
  • Output s: the feature map
• Continuous: s(t) = (x ∗ w)(t) = ∫ x(a) w(t − a) da
• Discrete: s(t) = (x ∗ w)(t) = ∑ₐ x(a) w(t − a), summing a over all integers
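The discrete formula can be written directly as code; the following sketch (illustrative only) implements s(t) = ∑ₐ x(a) w(t − a) for finite signals and checks it against NumPy's built-in routine:

```python
import numpy as np

def convolve1d(x, w):
    """Discrete convolution: s[t] = sum over a of x[a] * w[t - a]."""
    s = np.zeros(len(x) + len(w) - 1)
    for t in range(len(s)):
        for a in range(len(x)):
            if 0 <= t - a < len(w):
                s[t] += x[a] * w[t - a]
    return s

x = np.array([1.0, 2.0, 3.0, 4.0])
w = np.array([0.5, 0.25])
print(convolve1d(x, w))       # [0.5  1.25 2.   2.75 1.  ]
print(np.convolve(x, w))      # same result from NumPy's built-in convolution
```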
Two-Dimensional Convolution
• Convolution can be used over more than one axis at a time, e.g., a 2-D image I with a 2-D kernel K:
  S(i, j) = (I ∗ K)(i, j) = ∑ₘ ∑ₙ I(m, n) K(i − m, j − n)
• Convolution is commutative: (I ∗ K)(i, j) = (K ∗ I)(i, j)
• Commutativity arises because we have flipped the kernel relative to the input
Cross-Correlation
• Same as convolution, but without flipping the kernel
• In ML, the learning algorithm will learn the appropriate values of the kernel in the appropriate places, so it makes little practical difference whether the kernel is flipped; many libraries implement cross-correlation and call it convolution
Convolution vs Cross-Correlation
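The difference can be seen in a few lines of NumPy (an illustrative sketch): cross-correlation slides the kernel as-is, while convolution is the same operation with the kernel flipped on both axes:

```python
import numpy as np

def cross_correlate2d(image, kernel):
    """'Valid' 2-D cross-correlation: slide the kernel over the image without flipping it."""
    kh, kw = kernel.shape
    out = np.zeros((image.shape[0] - kh + 1, image.shape[1] - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

image = np.arange(16, dtype=float).reshape(4, 4)
kernel = np.array([[1.0, 2.0],
                   [3.0, 4.0]])

corr = cross_correlate2d(image, kernel)           # cross-correlation
conv = cross_correlate2d(image, np.flip(kernel))  # convolution = correlation with a flipped kernel
print(corr)
print(conv)   # differs from corr unless the kernel is symmetric
```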
Overall Architecture
• A typical layer of a convolutional network consists of three stages: convolution, detector (nonlinearity), and pooling
Convolution
• First stage: the layer performs several convolutions in parallel to produce a set of linear activations
The detector (activation) layer
• Second stage: each linear activation is run through a nonlinear activation function, such as ReLU
What is Pooling?
• Pooling in a CNN is a subsampling step
• It replaces the output at a location with a summary statistic of the nearby outputs
Types of Pooling Functions
• 1. Max pooling: the maximum within a rectangular neighborhood
• 2. Average of a rectangular neighborhood
• 3. L² norm of a rectangular neighborhood
• 4. Weighted average based on the distance from the central pixel
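As an illustration (a minimal sketch, not from the slides), 2×2 max pooling with stride 2, the most common choice in practice:

```python
import numpy as np

def max_pool2d(x, size=2, stride=2):
    """Replace each size x size block with its maximum."""
    oh = (x.shape[0] - size) // stride + 1
    ow = (x.shape[1] - size) // stride + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = x[i * stride:i * stride + size,
                          j * stride:j * stride + size].max()
    return out

x = np.array([[1, 3, 2, 4],
              [5, 6, 1, 2],
              [7, 2, 9, 1],
              [3, 4, 1, 8]], dtype=float)
print(max_pool2d(x))
# [[6. 4.]
#  [7. 9.]]
```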
Pooling causes Translation Invariance
Max Pooling Produces Invariance To Translation
• Top: view of the middle of the output of a convolutional layer (lower row: outputs of the nonlinearity; upper row: outputs of max pooling)
• Bottom: the same network after the input has been shifted by one pixel
• Every value in the nonlinearity outputs has changed, but only half of the max pooling outputs have changed, because a max pooling unit is only sensitive to the maximum value in its neighborhood, not to its exact location
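A small 1-D sketch of the same effect (illustrative values, not the ones on the slide): after shifting the detector outputs by one position, only a small fraction of the pooled values change:

```python
import numpy as np

def max_pool1d(x, size=3):
    """Max over a sliding window of width `size` (stride 1)."""
    return np.array([x[i:i + size].max() for i in range(len(x) - size + 1)])

detector = np.array([0.1, 1.0, 0.2, 0.1, 0.8, 0.3, 0.2, 0.9, 0.4, 0.1])
shifted = np.roll(detector, 1)        # shift every detector output by one position

p_before = max_pool1d(detector)
p_after = max_pool1d(shifted)
print(p_before)
print(p_after)
print(np.mean(p_before != p_after))   # 0.25 -> only a quarter of the pooled outputs changed
```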
Why Is Translation Invariance Important?
• Invariance to translation is important if we care more about whether a feature is present than exactly where it is
Pooling with Downsampling
• Downsampling (pooling with a stride greater than 1) allows the features to be positioned flexibly and reduces the size of the representation passed to the next layer
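A sketch of pooling with downsampling (illustrative), using a pool width of 3 and a stride of 2 so the representation is roughly halved:

```python
import numpy as np

def max_pool1d_strided(x, size=3, stride=2):
    """Max pooling with pools of width `size`, spaced `stride` apart."""
    return np.array([x[i:i + size].max()
                     for i in range(0, len(x) - size + 1, stride)])

x = np.array([0.1, 1.0, 0.2, 0.1, 0.8, 0.3, 0.2])
print(max_pool1d_strided(x))   # [1.  0.8 0.8] -> 7 inputs summarised by 3 outputs
```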
Prior Parameter Distribution
• A prior probability distribution over the parameters of a model encodes our beliefs about which models are reasonable before we have seen any data
Weak and Strong Priors
• A weak prior
• A distribution with high entropy
• e.g., Gaussian with high variance
• Data can move parameters freely
• A strong prior
• It has very low entropy
• e.g., a Gaussian with low variance
  • Such a prior plays a more active role in determining where the parameters end up
Infinitely Strong Prior
• An infinitely strong prior places zero probability on some parameter values: those values are completely forbidden, regardless of how much support the data gives them
• No amount of data can move the parameters onto values the prior forbids
Convolution As Infinitely Strong Prior
• A convolutional net is similar to a fully connected net but with an infinitely strong prior over its weights:
  • the weights for one hidden unit must be identical to the weights of its neighbor, but shifted in space
  • the weights must be zero, except in the small, spatially contiguous receptive field assigned to that hidden unit
• This prior says that the function the layer should learn contains only local interactions and is equivariant to translation
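One way to make this prior concrete (a NumPy sketch, not from the slides) is to write a 1-D convolution as a fully connected layer whose weight matrix is forced to be sparse, with the same kernel values shifted along each row:

```python
import numpy as np

def conv_as_matrix(kernel, n_in):
    """Dense weight matrix equivalent to a 'valid' 1-D cross-correlation."""
    k = len(kernel)
    n_out = n_in - k + 1
    W = np.zeros((n_out, n_in))
    for i in range(n_out):
        W[i, i:i + k] = kernel    # identical weights, shifted in space; all other entries zero
    return W

kernel = np.array([1.0, -2.0, 1.0])
x = np.random.rand(8)
W = conv_as_matrix(kernel, len(x))

direct = np.correlate(x, kernel, mode="valid")  # slide the kernel directly
via_matrix = W @ x                              # equivalent constrained matrix multiplication
print(np.allclose(direct, via_matrix))          # True
print(W)                                        # mostly zeros, with tied values along each row
```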
Pooling As Infinitely Strong Prior
• The use of pooling is an infinitely strong prior that each unit should be invariant to small translations
• Max pooling example: the unit reports only the largest detector response in its neighborhood, so small shifts of the input leave its output unchanged
Key Insight: Underfitting
• Convolution and pooling can cause underfitting when the prior they impose does not match the task
  • e.g., if a task relies on preserving precise spatial information, pooling all features can increase the training error
• Underfitting happens when the model has high bias, i.e., its assumptions are too strong for the data
Architecture and the training process
• Each layer produces values that are obtained from the previous layer by performing a matrix multiplication
• A model's performance under particular kernels and weights is calculated with a loss function through forward propagation on a training dataset
Architecture and the training process
• Learnable parameters (kernels and weights) are updated according to the loss value through backpropagation with a gradient descent optimization algorithm
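The whole loop can be sketched on a toy problem (illustrative only: a single 1-D kernel fitted by plain gradient descent rather than a full CNN):

```python
import numpy as np

rng = np.random.default_rng(0)

true_kernel = np.array([0.5, -1.0, 2.0])           # the kernel we hope to recover
x = rng.normal(size=50)
y = np.correlate(x, true_kernel, mode="valid")     # training targets

w = np.zeros(3)                                    # learnable kernel
lr = 0.05
for step in range(500):
    y_hat = np.correlate(x, w, mode="valid")       # forward propagation
    residual = y_hat - y
    loss = np.mean(residual ** 2)                  # loss function
    grad = 2.0 * np.correlate(x, residual, mode="valid") / len(residual)  # analytic gradient
    w -= lr * grad                                 # gradient descent update

print(w)      # close to [ 0.5 -1.   2. ]
print(loss)   # near zero
```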
Classification with Convolutional Networks
Conclusion
• Convolution and pooling let us scale up neural networks to process very large images and video sequences
• Sparse Interactions
• Parameter Sharing
• Applicable to any input that is laid out on a grid (1-D, 2-D, 3-D, …)
THANK YOU VERY MUCH
QUESTIONS?