Backpropagation Example With Numbers Step by Step
When I come across a new mathematical concept or before I use a canned software package, I like to replicate the calculations in order to get a
deeper understanding of what is going on. This type of computation-based approach from first principles helped me greatly when I first came
across material on artificial neural networks.
In this post, I go through a detailed example of one iteration of the backpropagation algorithm using full formulas from basic principles and actual
values. The neural network I use has three input neurons, one hidden layer with two neurons, and an output layer with two neurons.
The following are the (very) high-level steps that I will take in this post. Details on each step will follow after.
(1) Specify the input and target values and initialize the weights
(2) Forward propagate through the network to get the output values
(3) Calculate the error using the output values and the target values
(4) Backpropagate through the network to compute the error derivatives with respect to the parameters
(5) Update the parameter estimates using the error derivative and the current value
Step 1
The input and target values for this problem are and . I will initialize the weights as shown in the diagram
below. Generally, you would assign them randomly, but for illustration purposes I've chosen these numbers.
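To make Step 1 concrete, here is a minimal sketch of how the setup could look in code. The layer sizes follow the network described above (three input neurons, two hidden neurons, two output neurons); the variable names, the random seed, and the placeholder input/target vectors are my own assumptions rather than the values from the diagram.

import numpy as np

np.random.seed(0)  # assumed seed, for reproducibility only

# Network shape from the post: 3 input neurons, 2 hidden neurons, 2 output neurons
n_input, n_hidden, n_output = 3, 2, 2

# Placeholder input and target vectors (the actual numbers come from the diagram)
x = np.array([0.1, 0.2, 0.7])  # hypothetical inputs
t = np.array([1.0, 0.0])       # hypothetical targets

# Randomly initialized weights, as one would normally do
W1 = np.random.rand(n_hidden, n_input)   # weights from input layer to hidden layer
W2 = np.random.rand(n_output, n_hidden)  # weights from hidden layer to output layer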
Step 2
Mathematically, we have the following relationships between nodes in the network. For the hidden and output layers, I will use the somewhat strange
convention of denoting the values before the activation function is applied as , , , and , and the values after the activation function is applied as
, , , and .
We can use the formulas above to forward propagate through the network. I've shown values up to four decimal places below, but maintained full
precision in the actual calculations.
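As a rough sketch of what the forward pass computes, the code below applies the weights and the sigmoid activation layer by layer. It reuses the hypothetical x, W1, and W2 from the earlier snippet; bias terms are omitted, which is an assumption on my part since the diagram is not reproduced in the text.

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Hidden layer: weighted sum of the inputs, then the sigmoid activation
z_hidden = W1 @ x      # values before the activation function
h = sigmoid(z_hidden)  # values after the activation function

# Output layer: weighted sum of the hidden activations, then the sigmoid activation
z_output = W2 @ h
o = sigmoid(z_output)  # network outputs

print(np.round(o, 4))  # shown to four decimal places, as in the post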
Step 3
We now define the sum of squares error using the target values and the results from the last layer of forward propagation.
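For reference, the sum of squares error typically takes the form below; the factor of one half is a common convention that cancels when differentiating, and I am assuming it here since the original formula is not reproduced in the text.

E = \frac{1}{2}\sum_{i}\left(t_i - o_i\right)^2

where the t_i are the target values and the o_i are the outputs of the last layer.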
Step 4
We are now ready to backpropagate through the network to compute all the error derivatives with respect to the parameters. Note that although
there will be many long formulas, we are not doing anything fancy here. We are just using the basic principles of calculus such as the chain rule.
First we go over some derivatives we will need in this step. The derivative of the sigmoid function is given here. Also, given that
and , we have , , , , , and .
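The sigmoid derivative has a particularly convenient closed form, and it is the identity relied on throughout this step:

\sigma(x) = \frac{1}{1 + e^{-x}}, \qquad \frac{d\sigma(x)}{dx} = \sigma(x)\left(1 - \sigma(x)\right)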
We are now ready to calculate , , , and using the derivatives we have already discussed.
I will omit the details on the next three computations since they are very similar to the one above. Feel free to leave a comment if you are unable to
replicate the numbers below.
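Although the specific symbols are not reproduced here, each of these output-layer derivatives follows the same chain-rule pattern: the derivative of the error with respect to a weight is the product of the derivative of the error with respect to the output, the derivative of the output with respect to its pre-activation value, and the derivative of the pre-activation value with respect to the weight. In generic notation (my own, for illustration), for an output o with target t fed by a hidden activation h:

\frac{\partial E}{\partial w} = \frac{\partial E}{\partial o}\cdot\frac{\partial o}{\partial z}\cdot\frac{\partial z}{\partial w} = (o - t)\cdot o\,(1 - o)\cdot h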
The error derivative of is a little bit more involved since changes to affect the error through both and .
To summarize, we have computed numerical values for the error derivatives with respect to , , , , and . We will now backpropagate
one layer to compute the error derivatives of the parameters connecting the input layer to the hidden layer. These error derivatives are , ,
, , , , and .
I will calculate , , and first since they all flow through the node.
The calculation of the first term on the right-hand side of the equation above is a bit more involved than previous calculations since affects the
error through both and .
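The reason this term needs extra care is that a hidden-layer activation feeds into both output neurons, so its effect on the error has to be summed over both paths. In generic notation (again my own, since the post's symbols are not reproduced here), for a hidden activation h feeding outputs o_1 and o_2:

\frac{\partial E}{\partial h} = \frac{\partial E}{\partial o_1}\cdot\frac{\partial o_1}{\partial h} + \frac{\partial E}{\partial o_2}\cdot\frac{\partial o_2}{\partial h}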
Now I will proceed with the numerical values for the error derivatives above. These derivatives have either already been calculated or are similar in
style to those calculated earlier. If anything is unclear, please leave a comment.
I will now calculate , , and since they all flow through the node.
The calculation of the first term on the right-hand side of the equation above is a bit more involved since affects the error through both and .
So what do we do now? We repeat this process over and over many times until the error goes down and the parameter estimates stabilize or converge to
some values. We obviously won't be going through all these calculations manually. I've provided Python code below that codifies the calculations
above. Nowadays, we wouldn't do any of this manually but would instead use a machine learning package that is already readily available.
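Since the full listing is not reproduced here, the following is a minimal sketch of what one iteration (repeated in a loop) could look like for this 3-2-2 network. It reuses the hypothetical x, t, W1, W2, and sigmoid from the earlier snippets, omits bias terms, and uses a learning rate of 0.5 purely as an assumed value.

learning_rate = 0.5  # assumed value, not taken from the post

for iteration in range(10000):
    # Forward pass
    h = sigmoid(W1 @ x)                 # hidden activations
    o = sigmoid(W2 @ h)                 # output activations
    error = 0.5 * np.sum((t - o) ** 2)  # sum of squares error

    # Backward pass (repeated application of the chain rule, as above)
    delta_output = (o - t) * o * (1 - o)                # dE/dz at the output layer
    delta_hidden = (W2.T @ delta_output) * h * (1 - h)  # dE/dz at the hidden layer
    grad_W2 = np.outer(delta_output, h)                 # dE/dW2
    grad_W1 = np.outer(delta_hidden, x)                 # dE/dW1

    # Update the parameter estimates using the error derivatives and the current values
    W2 -= learning_rate * grad_W2
    W1 -= learning_rate * grad_W1

    if iteration % 1000 == 0:
        print(iteration, round(error, 6))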
I ran 10,000 iterations and we see below that the sum of squares error has dropped significantly after the first thousand or so iterations.
FIRST DERIVATIVE OF THE SIGMOID FUNCTION