
Introduction to Machine Learning (Lecture Notes)

Multi-layer Perceptron
Lecturer: Barnabas Poczos

Disclaimer: These notes have not been subjected to the usual scrutiny reserved for formal publications.
They may be distributed outside this class only with the permission of the Instructor.

1 The Multi-layer Perceptron

1.1 Matlab demos

Matlab tutorials for neural network design:

nnd9sd    % Steepest descent

nnd9sdq   % Steepest descent for quadratic

Character recognition with MLP:

appcr1

Structure of the MLP:

Noise-free input: 26 different letters of size 7×5. Prediction errors:


1.2 An example and notations

Here we will always assume that the activation function is differentiable; this allows us to optimize the
cost function with gradient descent. However, non-differentiable activation functions (for example the ReLU,
which is not differentiable at zero) have also become popular.
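For concreteness, here is a minimal sketch of one common differentiable choice, the logistic sigmoid, together with the derivative that gradient descent will need. The notes do not fix a particular $f$; the function names below are illustrative.

```python
import numpy as np

def sigmoid(s):
    """Logistic activation f(s) = 1 / (1 + exp(-s))."""
    return 1.0 / (1.0 + np.exp(-s))

def sigmoid_prime(s):
    """Its derivative f'(s) = f(s) (1 - f(s)), used by back-propagation."""
    fs = sigmoid(s)
    return fs * (1.0 - fs)
```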

2 The back-propagation algorithm

2.1 The gradient of the error

The current error (for a network with two outputs):

$$\varepsilon^2 = \varepsilon_1^2 + \varepsilon_2^2 = (\hat{y}_1 - y_1)^2 + (\hat{y}_2 - y_2)^2.$$

More generally:

$$\varepsilon^2 = \sum_{p=1}^{N_L} \varepsilon_p^2 = \sum_{p=1}^{N_L} (\hat{y}_p - y_p)^2.$$

We want to calculate

$$\frac{\partial \varepsilon^2(k)}{\partial W_{ij}^l(k)} = \ ?$$

2.2 Notation
• $W_{ij}^l(k)$: At time step $k$, the strength of the connection from neuron $j$ on layer $l-1$ to neuron $i$ on layer $l$ ($i = 1, 2, \dots, N_l$, $j = 1, 2, \dots, N_{l-1}$).

• $s_i^l(k)$: The summed input of neuron $i$ on layer $l$ before applying the activation function $f$, at time step $k$ ($i = 1, \dots, N_l$).

• $x^l(k) \in \mathbb{R}^{N_{l-1}}$: The input of layer $l$ at time step $k$.

• $\hat{y}^l(k) \in \mathbb{R}^{N_l}$: The output of layer $l$ at time step $k$.

• $N_1, N_2, \dots, N_l, \dots, N_L$: The number of neurons in layers $1, 2, \dots, l, \dots, L$.

2.3 Some observations

$$x^l = \hat{y}^{l-1} \in \mathbb{R}^{N_{l-1}}$$

$$s_i^l = W_{i\cdot}^l\, \hat{y}^{l-1} = \sum_{j=1}^{N_{l-1}} W_{ij}^l\, x_j^l = \sum_{j=1}^{N_{l-1}} W_{ij}^l\, f(s_j^{l-1})$$

$$s_j^{l+1} = \sum_{i=1}^{N_l} W_{ji}^{l+1}\, f(s_i^l)$$
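These observations describe the forward pass of the network. Below is a minimal sketch in NumPy, assuming an element-wise activation (tanh, for concreteness) and one weight matrix per layer of adaptive weights, so that `weights[m][i, j]` plays the role of $W_{ij}^l$; the function and variable names are illustrative, not part of the notes.

```python
import numpy as np

def forward_pass(weights, x, f=np.tanh):
    """Forward pass: x^l = y_hat^{l-1}, s^l = W^l x^l, y_hat^l = f(s^l)."""
    s_list, y_list = [], []
    y_prev = x                        # input of the first weighted layer
    for W in weights:                 # W has shape (N_l, N_{l-1})
        s = W @ y_prev                # s_i^l = sum_j W_ij^l x_j^l
        y_prev = f(s)                 # y_hat^l = f(s^l), element-wise
        s_list.append(s)
        y_list.append(y_prev)
    return s_list, y_list             # pre-activations and outputs of every layer
```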

2.4 The back-propagated error


Recall that
$$\frac{\partial}{\partial x} f(g(x), h(x)) = \frac{\partial f(g(x), h(x))}{\partial g}\,\frac{\partial g(x)}{\partial x} + \frac{\partial f(g(x), h(x))}{\partial h}\,\frac{\partial h(x)}{\partial x}.$$

Introduce the notation:

$$\delta_i^l(k) = \frac{-\partial \varepsilon^2(k)}{\partial s_i^l(k)} = -\sum_{p=1}^{N_L} \frac{\partial \varepsilon_p^2(k)}{\partial s_i^l(k)},$$

where $i = 1, 2, \dots, N_l$.
As a special case, we have that

$$\delta_i^L(k) = -\sum_{p=1}^{N_L} \frac{\partial \big(y_p(k) - f(s_p^L(k))\big)^2}{\partial s_i^L(k)} = 2\,\varepsilon_i(k)\, f'(s_i^L(k)).$$
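Spelling this derivative out (only the $p = i$ term depends on $s_i^L(k)$, and writing $\varepsilon_i(k) = y_i(k) - f(s_i^L(k))$, the sign convention under which the result carries a plus sign):

$$-\frac{\partial \big(y_i(k) - f(s_i^L(k))\big)^2}{\partial s_i^L(k)} = -2\big(y_i(k) - f(s_i^L(k))\big)\cdot\big(-f'(s_i^L(k))\big) = 2\,\varepsilon_i(k)\, f'(s_i^L(k)).$$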

Lemma 1. $\delta_i^l(k)$ can be calculated from $\{\delta_1^{l+1}(k), \dots, \delta_{N_{l+1}}^{l+1}(k)\}$ using backward recursion.

$$\delta_i^l(k) = -\sum_{p=1}^{N_L} \frac{\partial \varepsilon_p^2}{\partial s_i^l}
= -\sum_{p=1}^{N_L} \sum_{j=1}^{N_{l+1}} \frac{\partial \varepsilon_p^2}{\partial s_j^{l+1}}\, \frac{\partial s_j^{l+1}}{\partial s_i^l}
= -\sum_{j=1}^{N_{l+1}} \sum_{p=1}^{N_L} \frac{\partial \varepsilon_p^2}{\partial s_j^{l+1}}\, W_{ji}^{l+1}\, f'(s_i^l)$$

Therefore,

$$\delta_i^l(k) = \Big( \sum_{j=1}^{N_{l+1}} \delta_j^{l+1}(k)\, W_{ji}^{l+1}(k) \Big) f'(s_i^l(k)),$$

where $\delta_i^l(k)$ is the back-propagated error.
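This recursion translates directly into code. Below is a minimal sketch, assuming the pre-activations `s_list` come from a forward pass like the one sketched earlier, and taking $\varepsilon_i = y_i - \hat{y}_i$ at the output layer so the sign matches the formula for $\delta_i^L(k)$; the names are illustrative.

```python
import numpy as np

def backward_deltas(weights, s_list, y_hat, y, f_prime):
    """Back-propagate the errors delta^l, one array per layer.

    weights and s_list are indexed the same way (entry m belongs to the same
    layer); y_hat is the network output, y the target, f_prime = f'.
    """
    L = len(weights)
    deltas = [None] * L
    # Output layer: delta_i^L = 2 eps_i f'(s_i^L), with eps = y - y_hat
    deltas[-1] = 2.0 * (y - y_hat) * f_prime(s_list[-1])
    # Hidden layers: delta_i^l = (sum_j delta_j^{l+1} W_ji^{l+1}) f'(s_i^l)
    for m in range(L - 2, -1, -1):
        deltas[m] = (weights[m + 1].T @ deltas[m + 1]) * f_prime(s_list[m])
    return deltas
```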


Now using that

$$s_i^l(k) = \sum_{j=1}^{N_{l-1}} W_{ij}^l(k)\, x_j^l(k),$$

we get

$$\frac{\partial \varepsilon^2(k)}{\partial W_{ij}^l(k)} = \frac{\partial \varepsilon^2(k)}{\partial s_i^l(k)}\, \frac{\partial s_i^l(k)}{\partial W_{ij}^l(k)} = -\delta_i^l(k)\, x_j^l(k).$$

The back-propagation algorithm:

$$W_{ij}^l(k+1) = W_{ij}^l(k) + \mu\, \delta_i^l(k)\, x_j^l(k)$$

In vector form:

$$W_{i\cdot}^l(k+1) = W_{i\cdot}^l(k) + \mu\, \delta_i^l(k)\, x^l(k)^{\top}.$$
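Putting the pieces together, here is a minimal sketch of one back-propagation update, assuming a tanh activation and the same layer-by-layer indexing as in the sketches above (one weight matrix, pre-activation vector, and input vector per layer); the names and the choice of tanh are illustrative.

```python
import numpy as np

def backprop_step(weights, x, y, mu=0.1,
                  f=np.tanh, f_prime=lambda s: 1.0 - np.tanh(s) ** 2):
    """One update W^l(k+1) = W^l(k) + mu * delta^l(k) x^l(k)^T for every layer."""
    # Forward pass: x^l = y_hat^{l-1}, s^l = W^l x^l, y_hat^l = f(s^l)
    xs, ss = [x], []
    for W in weights:
        ss.append(W @ xs[-1])
        xs.append(f(ss[-1]))
    # Output layer: delta^L = 2 (y - y_hat^L) * f'(s^L)
    deltas = [2.0 * (y - xs[-1]) * f_prime(ss[-1])]
    # Backward recursion: delta^l = (W^{l+1 T} delta^{l+1}) * f'(s^l)
    for m in range(len(weights) - 1, 0, -1):
        deltas.insert(0, (weights[m].T @ deltas[0]) * f_prime(ss[m - 1]))
    # Update: W_ij^l(k+1) = W_ij^l(k) + mu * delta_i^l(k) * x_j^l(k)
    for W, delta, x_l in zip(weights, deltas, xs):
        W += mu * np.outer(delta, x_l)
    return weights
```

Iterating this step over training pairs $(x, y)$ with a small learning rate $\mu$ performs steepest descent on the squared error.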
