Neural Network
What is Deep Learning
[Figure: a labeled dataset, e.g. images with the labels "Cat" and "Dog"]
It is called deep learning when we use a neural network as the model in supervised learning.
What is Deep Learning
Neural network
A neural network can have millions of neurons that analyse a given dataset, learn its patterns (features), and memorize them.
NN Basics and Concepts
Neural Network components
A neural network is composed of 4 main components:
1) Layers
2) Input and output
3) Loss function
4) Optimizer
Layer
[Figure: a layer of nodes; a node is also known as a neuron (e.g. a layer with 4 neurons)]
A layer in which every node is connected to every node of the next layer is called a fully connected (dense) layer.
Input & Output
Input layer (features) → Output layer (classes/labels)
Input & Output
[Figure: an m×n image flattened into inputs x11 … xmn, feeding an output layer with Class 1 and Class 2]
The whole image is input to the first layer at once. E.g. if you have 100 images, each image is inserted into the NN one by one.
Input & Output
[Figure: Iris example with four features x1–x4 as inputs and the classes "Setosa" and "others" as outputs]
The four features of the first row are input to the first (blue) layer at once.
Loss Function
Understanding the loss function involves 4 main concepts:
1) Neuron
2) Weights and biases
3) Activation function
4) Feedforward
Neuron
[Figure: a single neuron with weight w1, bias b1, summation S and activation F]
An artificial neuron is also referred to as a perceptron.
Weight & Bias
[Figure: input layer → layer 1]
x : input
w : weight
b : bias
s : summation
F(s) : output
s = w*x + b
Activation function
[Figure: input layer → layer 1]
x : input
w : weight
b : bias
F(s) : output
s = w*x + b
F(s) = 1 / (1 + e^(-s))
Feed forward
[Figure: input layer → layer 1, traversed in the forward direction]
With x = 1, w = 0.3, b = -0.3:
s = w*x + b = 0.3*1 + (-0.3) = 0
F(s) = 1 / (1 + e^(-s)) = 1 / (1 + e^0) = 0.5
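As a quick check, this forward pass can be computed in a few lines of Python (a minimal sketch; the values follow the slide):

    import math

    x, w, b = 1.0, 0.3, -0.3    # input, weight, bias from the slide

    s = w * x + b               # summation: 0.3*1 + (-0.3) = 0
    F = 1 / (1 + math.exp(-s))  # sigmoid activation

    print(s, F)                 # 0.0 0.5  ->  F(s) = 0.5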
Loss
[Figure: forward direction to reach F(s), which is then compared with the target value (label)]
Loss = target value - F(s)
Loss is the difference between the target value and F(s). It is also called the error. A common way to measure it is the MSE (mean squared error).
Loss
[Figure: forward direction to reach F(s), with w = 0.3 and b = -0.3]
Let x = 1, 2, 3, 4:

x | s = w*x + b          | target value | loss = (target - s)^2
1 | 0.3*1 + (-0.3) = 0   | 0            | 0
2 | 0.3*2 + (-0.3) = 0.3 | -1           | 1.69
3 | 0.3*3 + (-0.3) = 0.6 | -2           | 6.76
4 | 0.3*4 + (-0.3) = 0.9 | -3           | 15.21
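A minimal sketch reproducing the table (assuming, as the numbers suggest, the squared error (target - s)^2 is used as the loss):

    w, b = 0.3, -0.3
    targets = {1: 0, 2: -1, 3: -2, 4: -3}

    for x, target in targets.items():
        s = w * x + b
        loss = (target - s) ** 2   # squared error, as in the table
        print(x, round(s, 2), target, round(loss, 2))
    # 1 0.0 0 0.0
    # 2 0.3 -1 1.69
    # 3 0.6 -2 6.76
    # 4 0.9 -3 15.21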
Optimizer
Understanding the optimizer involves 4 main concepts:
1) Backpropagation
2) Optimizer
3) Learning rate
4) Epoch & Accuracy
Back propagation
[Figure: backward direction through the neuron, from the loss back toward x]
We go backward in an effort to minimize the loss.
Loss = target value - F(s)
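The slides do not show the gradient formulas backpropagation uses. As a hedged sketch, for the squared loss L = (target - s)^2 of the single linear neuron (ignoring the activation, as the table above does), the chain rule gives dL/dw = -2(target - s)*x and dL/db = -2(target - s):

    x, w, b, target = 1.0, 0.3, -0.3, 0.0

    s = w * x + b                   # forward pass
    grad_w = -2 * (target - s) * x  # dL/dw for L = (target - s)**2
    grad_b = -2 * (target - s)      # dL/db

    print(grad_w, grad_b)           # gradients used by backpropagation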
Optimizer (Reducing the loss)
[Figure: backward direction; the values of w and b are changed]
We go backward in an effort to minimize the loss using the optimizer. The optimizer is a function that changes w and b so that the loss is zero.
What are w and b so that the loss is zero? w = -1, b = 1:

x | s = w*x + b   | target value | loss
1 | -1*1 + 1 = 0  | 0            | 0
2 | -1*2 + 1 = -1 | -1           | 0
3 | -1*3 + 1 = -2 | -2           | 0
4 | -1*4 + 1 = -3 | -3           | 0
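A quick sketch verifying the slide's solution w = -1, b = 1:

    w, b = -1, 1
    for x, target in [(1, 0), (2, -1), (3, -2), (4, -3)]:
        s = w * x + b
        print(x, s, target, (target - s) ** 2)  # loss is 0 for every sample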
Learning rate
[Figure: backward direction; the values of w and b are changed]
The optimizer updates the weights and biases toward zero loss. The learning rate is the rate at which the optimizer changes the weights and biases.
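A minimal sketch of one optimizer step. The slides name the concept but not the update rule; this assumes the standard gradient-descent rule w ← w - lr * dL/dw:

    lr = 0.1                        # learning rate (hyperparameter)
    w, b = 0.3, -0.3
    x, target = 2.0, -1.0

    s = w * x + b
    grad_w = -2 * (target - s) * x
    grad_b = -2 * (target - s)

    w -= lr * grad_w                # a small lr means small steps toward zero loss
    b -= lr * grad_b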
Epoch
[Figure: one forward direction from the dataset (inputs x1–x6) to reach the target value, then one backward direction through the optimizer]
One epoch consists of one forward pass followed by one backward pass, with the optimizer executed once for each sample in the dataset. Feeding all 6 inputs through the neuron in one cycle is one epoch. You need to run several epochs until the loss is zero.
Now let's add more layers (multilayer)
Adding more neurons
[Figure: inputs x1 and x2 connected to two neurons S1 and S2]
w1, w2, …, w4 are the weights (every path has a weight).
Every input must have a path to every neuron (node).
b1 and b2 are the biases (every node has a bias).
Every node has the activation function F(s) = 1 / (1 + e^(-s)).
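A sketch of this two-neuron layer using NumPy. The shapes follow the slide (2 inputs, 2 neurons, weights w1–w4, biases b1–b2); the numeric values are made up for illustration:

    import numpy as np

    def sigmoid(s):
        return 1 / (1 + np.exp(-s))

    x = np.array([0.5, -0.2])    # inputs x1, x2 (example values)
    W = np.array([[0.1, 0.3],    # weights into neuron 1
                  [0.2, 0.4]])   # weights into neuron 2
    b = np.array([0.1, -0.1])    # biases b1, b2

    s = W @ x + b                # summation for both neurons at once
    out = sigmoid(s)             # F(S1), F(S2)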
Adding more layers
[Figure: input layer → layer 1 → layer 2 → output layer, with weights w1–w16 and biases b1–b5]
2 inputs; the model has 2 hidden layers; layer 1 has 2 nodes; layer 2 has 3 nodes; the output layer has two labels (target value 1 and target value 2).

Every input has a path to every node.
Every node has a path to every node in the next layer.
Every path has a weight, and the values are different.
Every node has a different bias (b), except the output nodes.
Every node has the same activation function (F).
Every node has a different output F(s).
Every node has to do a summation (s).
F(s) is the activation function
[Figure: a node in layer 1 computing s and then F(s)]
The final value F(s) that comes out of a node is determined by the activation function. In this example we use the sigmoid function as the activation function:
F(s) = 1 / (1 + e^(-s))
Other common activation functions: sigmoid vs. tanh, ReLU, leaky ReLU.
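For reference, the four activations mentioned above as a short NumPy sketch:

    import numpy as np

    def sigmoid(s):
        return 1 / (1 + np.exp(-s))

    def tanh(s):
        return np.tanh(s)

    def relu(s):
        return np.maximum(0, s)

    def leaky_relu(s, alpha=0.01):
        return np.where(s > 0, s, alpha * s)  # small slope for negative s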
Feed forward
[Figure: from x, the forward direction through layer 1 and layer 2 reaches the outputs F1(S) and F2(S)]
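A hedged sketch of the full forward pass for the slide's 2-2-3-2 network. The weight values are random placeholders; the output nodes get no bias, following the slide's convention:

    import numpy as np

    def sigmoid(s):
        return 1 / (1 + np.exp(-s))

    x = np.array([1.0, 0.5])                           # 2 inputs
    W1 = np.random.rand(2, 2); b1 = np.random.rand(2)  # layer 1: 2 nodes
    W2 = np.random.rand(3, 2); b2 = np.random.rand(3)  # layer 2: 3 nodes
    W3 = np.random.rand(2, 3)                          # output: 2 nodes, no bias

    h1 = sigmoid(W1 @ x + b1)    # layer 1 outputs
    h2 = sigmoid(W2 @ h1 + b2)   # layer 2 outputs
    out = sigmoid(W3 @ h2)       # F1(S), F2(S)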
Loss
[Figure: forward direction to reach F(s) at the two output nodes, each compared with its target value (label)]
Loss = target value - F for every output. Every output has a loss; the network's loss is the average of the per-output losses.
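A small sketch of the averaged loss over the two outputs (assuming squared error per output, matching the earlier table; the values are illustrative):

    import numpy as np

    out = np.array([0.7, 0.2])       # F1(S), F2(S) from the forward pass
    targets = np.array([1.0, 0.0])   # target value 1, target value 2

    loss = np.mean((targets - out) ** 2)  # average of per-output losses (MSE)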
Back propagation
[Figure: from F(s), the backward direction through layer 2 and layer 1 back to x, covering weights w1–w16 and biases b1–b5]
We go backward in an effort to minimize the loss by changing the values of the weights and biases.
Optimizer
[Figure: the same multilayer network, traversed in the backward direction]
We go backward in an effort to minimize the loss. The optimizer is a function that changes the weights and biases so that the loss is zero.
Optimizer
[Figure: backward direction; values are changed along every path]
The optimizer works on every path.
Objective: Loss = 0
Types of optimizer
GradientDescentOptimizer
AdadeltaOptimizer
MomentumOptimizer
AdamOptimizer
FtrlOptimizer
RMSPropOptimizer
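These names correspond to TensorFlow's optimizer classes. As a sketch (not from the slides), in current TensorFlow/Keras an optimizer is selected like this:

    import tensorflow as tf

    optimizer = tf.keras.optimizers.Adam(learning_rate=0.001)
    # alternatives: SGD (gradient descent / momentum), Adadelta, RMSprop, Ftrl
    # e.g. tf.keras.optimizers.SGD(learning_rate=0.01, momentum=0.9)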
Learning rate
[Figure: backward direction; values are changed according to the learning rate]
The optimizer updates the weights and biases toward zero loss. The learning rate is the rate at which the optimizer changes the weights and biases.
Loss = target value - F
Learning rate: the rule the optimizer has to follow when changing w and b.
Epoch
[Figure: one forward direction from the dataset through the multilayer network to reach the target value, then one backward direction]
One epoch consists of one forward pass and then one backward pass, with the optimizer executed once for all samples in the dataset.
Loss = target value - F
Epoch, batch & iterations
Epoch: one pass of the whole dataset through the network.
Batch: a subset of the dataset processed before the optimizer updates the weights.
Iteration: one such update; the number of iterations per epoch equals the number of batches.
This approach is called "mini-batch gradient descent".
Epoch, batch & iterations
Dataset is 100 samples
Epoch = 40
Num_of_batch (iterations) = 5
Batch_size = 20

for i in range(Epoch):
    for j in range(Num_of_batch):
        compute the loss and optimize on one batch of Batch_size samples
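A runnable sketch of this loop for a toy linear model (NumPy; the data and model are made up, only the epoch/batch/iteration structure follows the slide):

    import numpy as np

    X = np.random.rand(100, 1)   # dataset: 100 samples
    y = 3 * X - 1                # toy targets

    epochs, num_of_batch, batch_size = 40, 5, 20
    w, b, lr = 0.0, 0.0, 0.1

    for i in range(epochs):                     # one epoch = whole dataset
        for j in range(num_of_batch):           # one iteration = one batch
            xb = X[j*batch_size:(j+1)*batch_size]
            yb = y[j*batch_size:(j+1)*batch_size]
            pred = w * xb + b
            grad_w = np.mean(-2 * (yb - pred) * xb)  # MSE gradients
            grad_b = np.mean(-2 * (yb - pred))
            w -= lr * grad_w                         # optimizer step per batch
            b -= lr * grad_b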
Epoch, batch & iterations
What is happening during an epoch?
Dataset is 1 sample
Epoch = 4
Num_of_batch = 1
Batch_size = 1

for i in range(Epoch):
    for j in range(Num_of_batch):
        compute the loss and optimize on one batch of Batch_size samples

After 4 epochs the optimizer achieves 0 error.
Parameters and hyperparameters
Parameter: any value that is changed by the computer. These are the weights and biases, automatically updated by the optimizer.
Hyperparameter: any value that is changed by a human. These are the learning rate, epochs, batch size, number of layers, number of nodes, and dropout rate.
Tutorial
How many parameters are in this model?
[Figure: the 2-input network with hidden layers of 2 and 3 nodes, an output layer of 2 nodes, weights w1–w16, and biases b1–b5]
How many layers?
How many nodes?
How many inputs?
How many activation functions?
How many classes?
How many weights?
How many biases?
How many optimizers?
How many parameters?
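A sketch that counts the parameters for the slide's architecture (2 inputs → 2 nodes → 3 nodes → 2 outputs), using the slide's convention that output nodes have no bias:

    layers = [2, 2, 3, 2]   # inputs, layer 1, layer 2, output

    weights = sum(layers[i] * layers[i+1] for i in range(len(layers) - 1))
    biases = sum(layers[1:-1])   # hidden nodes only (no bias on output nodes)

    print(weights, biases, weights + biases)  # 16 weights, 5 biases, 21 parameters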
Assessing performance
Assessing the performance
Train data (80%) | Test data (20%)
Dataset is 100 samples
Epoch = 40
Num_of_batch = 5
Batch_size = 20

for i in range(Epoch):
    for j in range(Num_of_batch):
        compute the loss and optimize on one batch of Batch_size samples

Validation phase: in each single epoch we run the train data; at the end of each single epoch we run the test data.
Accuracy is the percentage of right predictions over the number of samples in the test data. It is used during validation (at every epoch) or the testing phase (at the end of all epochs).
Loss is measured over each single epoch.
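A minimal sketch of the accuracy computation on the test data (the arrays are illustrative):

    import numpy as np

    predictions = np.array([0, 1, 1, 0, 1])  # model outputs on the test data
    labels = np.array([0, 1, 0, 0, 1])       # true labels

    accuracy = np.mean(predictions == labels) * 100  # % of right predictions
    print(accuracy)  # 80.0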
Assessing the performance
Overfitting is when the loss in the validation phase is much bigger than in the training phase.
Underfitting is when the loss is much bigger during the training phase.
In other words, overfitting is when training goes very well but the validation/testing results are noticeably worse.
Dropout
Randomly pick nodes and disable them. We give every node a probability of staying alive. E.g. if the probability is 0.5, every node will be 50% alive or 50% dead.
Dropout is used to overcome overfitting.
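A sketch of dropout applied to a layer's outputs, with keep probability 0.5 as in the example. This is the common "inverted" variant, which rescales the surviving activations (a detail not on the slide):

    import numpy as np

    def dropout(activations, keep_prob=0.5):
        # each node is kept with probability keep_prob, otherwise set to 0
        mask = np.random.rand(*activations.shape) < keep_prob
        return activations * mask / keep_prob  # rescale to keep the expected value

    h = np.array([0.7, 0.2, 0.9, 0.4])
    print(dropout(h))  # roughly half the nodes are disabled at random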
Thank you