BackPropagation

There will be some functions that start with the word "grader", e.g. grader_sigmoid(), grader_forwardprop(),
grader_backprop(), etc. You should not change those function definitions.

Every Grader function has to return True.

Loading data
In [1]:
import pickle
import numpy as np
from tqdm import tqdm
import matplotlib.pyplot as plt

with open('data.pkl', 'rb') as f:
    data = pickle.load(f)

print(data.shape)
X = data[:, :5]
y = data[:, -1]
print(X.shape, y.shape)

(506, 6)
(506, 5) (506,)

Computational graph

If you observe the graph, it has five input features [f1, f2, f3, f4, f5] and 9 weights [w1, w2, w3, w4, w5, w6, w7, w8, w9].

The final output of this graph is a value L which is computed as (Y-Y')^2
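The graph image itself is not reproduced here; based on the forward_propagation implementation later in this notebook, the function the graph computes is

$$Y' = \tanh\left(e^{(w_1 f_1 + w_2 f_2)^2 + w_6} + w_7\right) + w_9 \cdot \sigma\left(\sin(w_3 f_3)\cdot(w_4 f_4 + w_5 f_5) + w_8\right), \qquad L = (Y - Y')^2$$

where $\sigma$ denotes the sigmoid function.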

Task 1: Implementing backpropagation and Gradient checking

Check this video for a better understanding of computational graphs and back propagation

In [2]:

from IPython.display import YouTubeVideo


YouTubeVideo('i94OvYb6noo',width="1000",height="500")

Out[2]:

Gradient checking
Check this blog link for more details on Gradient checking

We know that the derivative of any function is

$$\lim_{\epsilon\to0}\frac{f(x+\epsilon)-f(x-\epsilon)}{2\epsilon}$$

The definition above can be used as a numerical approximation of the derivative. Taking an epsilon small enough, the
calculated approximation has an error on the order of epsilon squared.
In other words, if epsilon is 0.001, the approximation will be off by about 0.000001.

Therefore, we can use this to approximate the gradient, and in turn make sure that backpropagation is implemented
properly. This forms the basis of gradient checking!
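As a quick illustration (a minimal sketch, not part of the assignment, using a hypothetical test function f(x) = x^3), the central-difference approximation and its error can be checked in a few lines:

f = lambda x: x**3            # hypothetical test function; true derivative is 3*x**2
x = 2.0
true_grad = 3 * x**2          # = 12.0

for eps in (1e-1, 1e-2, 1e-3):
    approx = (f(x + eps) - f(x - eps)) / (2 * eps)   # central difference
    print(eps, approx, abs(approx - true_grad))      # for x**3 the error is exactly eps**2

Shrinking eps by a factor of 10 shrinks the error by a factor of 100, which is the epsilon-squared behaviour described above.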

Gradient checking example


Let's understand the concept with a simple example: $f(w_1,w_2,x_1,x_2)=w_{1}^{2} \cdot x_{1} + w_{2} \cdot x_{2}$

From the above function, let's assume $w_{1}=1$, $w_{2}=2$, $x_{1}=3$, $x_{2}=4$. The gradient of $f$ w.r.t. $w_{1}$ is
\begin{array} {lcl} \frac{df}{dw_{1}} = dw_{1} &=&2 \cdot w_{1} \cdot x_{1} \\& = &2 \cdot 1 \cdot 3\\& = &6 \end{array}

Let's calculate the approximate gradient of $w_{1}$ as given by the formula above, taking $\epsilon=0.0001$:
\begin{array} {lcl} dw_1^{approx} & = & \frac{f(w_1+\epsilon,w_2,x_1,x_2)-f(w_1-\epsilon,w_2,x_1,x_2)}{2\epsilon} \\ & = & \frac{((1+0.0001)^{2} \cdot 3 + 2 \cdot 4) - ((1-0.0001)^{2} \cdot 3 + 2 \cdot 4)}{2 \cdot 0.0001} \\ & = & \frac{(1.00020001 \cdot 3 + 2 \cdot 4) - (0.99980001 \cdot 3 + 2 \cdot 4)}{0.0002} \\ & = & \frac{11.00060003 - 10.99940003}{0.0002} \\ & = & 5.99999999999 \end{array}

Then, we apply the following formula for the gradient check: gradient_check = $\frac{\left\Vert dW-dW^{approx}\right\Vert_2}{\left\Vert dW\right\Vert_2+\left\Vert dW^{approx}\right\Vert_2}$

The equation above is basically the Euclidean distance normalized by the sum of the norms of the vectors. We normalize
in case one of the vectors is very small. As a threshold we usually opt for 1e-7: if the gradient check returns a value
less than 1e-7, it means that backpropagation was implemented correctly. Otherwise, there is potentially a mistake in
your implementation. If the value exceeds 1e-3, you can be sure that the code is not correct.

In our example: gradient_check $ = \frac{(6 - 5.999999999994898)}{(6 + 5.999999999994898)} = 4.2514140356330737 \times 10^{-13}$

You can mathematically derive the same result like this:


\begin{array} {lcl} dw_1^{approx} & = & \frac{f(w_1+\epsilon,w_2,x_1,x_2)-f(w_1-\epsilon,w_2,x_1,x_2)}{2\epsilon} \\ & = & \frac{((w_{1}+\epsilon)^{2} \cdot x_{1} + w_{2} \cdot x_{2}) - ((w_{1}-\epsilon)^{2} \cdot x_{1} + w_{2} \cdot x_{2})}{2\epsilon} \\ & = & \frac{4 \cdot \epsilon \cdot w_{1} \cdot x_{1}}{2\epsilon} \\ & = & 2 \cdot w_{1} \cdot x_{1} \end{array}
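The same numbers can be reproduced with a short snippet (a minimal sketch of the example above, using the function and values assumed in this example):

def f(w1, w2, x1, x2):
    return w1**2 * x1 + w2 * x2

w1, w2, x1, x2 = 1, 2, 3, 4
eps = 0.0001

dw1 = 2 * w1 * x1                                                       # analytical gradient = 6
dw1_approx = (f(w1 + eps, w2, x1, x2) - f(w1 - eps, w2, x1, x2)) / (2 * eps)

gradient_check = abs(dw1 - dw1_approx) / (abs(dw1) + abs(dw1_approx))
print(dw1, dw1_approx, gradient_check)                                  # 6, ~5.999999999995, ~4.25e-13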

Implement Gradient checking


(Write your code in def gradient_checking())

Algorithm

Task 2 : Optimizers
As a part of this task, you will be implementing 3 types of optimizers (methods to update the weights)
Use the same computational graph that was mentioned above to do this task
Initialize the 9 weights from a normal distribution with mean=0 and std=0.01

Check below video and this blog

In [3]:
from IPython.display import YouTubeVideo
YouTubeVideo('gYpoJMlgyXA',width="1000",height="500")

Out[3]:
Algorithm

for each epoch(1-100):

    for each data point in your data:
        compute the gradients of the weights using the functions forward_propagation() and backward_propagation()
        update the weights with the help of the gradients, ex: w1 = w1 - learning_rate*dw1

Implement the tasks below:

Task 2.1: you will be implementing the above algorithm with Vanilla update of weights

Task 2.2: you will be implementing the above algorithm with Momentum update of weights

Task 2.3: you will be implementing the above algorithm with Adam update of weights
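For reference, the update rules implemented in the cells below are (with learning rate $\eta$; $\gamma$, $\beta_1$, $\beta_2$ and $\epsilon$ take the values used later in this notebook, and the Adam cell below omits bias correction):

$$\begin{array}{lcl} \text{Vanilla:} & & w \leftarrow w - \eta \, dw \\ \text{Momentum:} & & v \leftarrow \gamma v - \eta \, dw, \quad w \leftarrow w + v \\ \text{Adam:} & & m \leftarrow \beta_1 m + (1-\beta_1)\, dw, \quad v \leftarrow \beta_2 v + (1-\beta_2)\, dw^{2}, \quad w \leftarrow w - \eta \, \frac{m}{\sqrt{v}+\epsilon} \end{array}$$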

Note: If you get any assertion error while running the grader functions, please print the variables inside the grader function and check
which variable is returning False. Recheck your logic for that variable.

Write two functions

Forward propagation (Write your code in def forward_propagation())

For easy debugging, we will break the computational graph into 3 parts.

Part 1

Part 2

Part 3
def forward_propagation(X, y, W):
    # X: input data point; note that in this assignment the data points are 5-dimensional
    # y: output variable
    # W: weight array of length 9; W[0] corresponds to w1 in the graph, W[1] corresponds to w2 in the graph, ..., W[8] corresponds to w9 in the graph
    # you have to return the following variables
    # exp  = part1 (compute the forward propagation until exp and then store the value in exp)
    # tanh = part2 (compute the forward propagation until tanh and then store the value in tanh)
    # sig  = part3 (compute the forward propagation until sigmoid and then store the value in sig)
    # now compute the remaining values from the computational graph and get y'
    # write code to compute the value of L = (y-y')^2
    # compute the derivative of L w.r.t. y' and store it in dl
    # create a dictionary to store all the intermediate values
    # store the L, exp, tanh, sig, dl variables

    return dictionary  # which you might need to use for back propagation

Backward propagation (Write your code in def backward_propagation())

def backward_propagation(L, W, dictionary):
    # L: the loss we calculated for the current point
    # dictionary: the outputs of the forward_propagation() function
    # write code to compute the gradients of each weight [w1, w2, w3, ..., w9]
    # Hint: you can use a dict to store the required variables
    # return dW, a dictionary with the gradients of all the weights

    return dW

Task 1

Forward propagation
In [4]:
def sigmoid(z):
    '''In this function, we will compute the sigmoid(z)'''
    # we can use this function in forward and backward propagation
    return 1/(1 + np.exp(-z))

def forward_propagation(x, y, w):
    '''Compute the forward pass of the computational graph for one data point.'''
    # part 1: exp((w1*f1 + w2*f2)^2 + w6)
    t1 = w[0]*x[0]
    t2 = w[1]*x[1]
    t3 = t1 + t2
    t4 = t3*t3
    t5 = t4 + w[5]
    exp = np.exp(t5)

    # part 2: tanh(exp + w7)
    t7 = exp + w[6]
    tanh = np.tanh(t7)

    # part 3: sigmoid(sin(w3*f3)*(w4*f4 + w5*f5) + w8)
    t9 = w[2]*x[2]
    t10 = np.sin(t9)
    t11 = w[3]*x[3]
    t12 = w[4]*x[4]
    t13 = t11 + t12
    t14 = t13*t10
    t15 = t14 + w[7]
    sig = sigmoid(t15)

    # combine both branches to get y', the squared loss and dL/dy'
    t16 = sig*w[8]
    y_hat = t16 + tanh

    L = np.square(y - y_hat)
    dl = -2*(y - y_hat)

    dic = {
        'exp': exp,
        'sigmoid': sig,
        'tanh': tanh,
        'loss': L,
        'dy_pr': dl,
        'sin': t10,
        'cos': np.cos(t9)
    }

    return dic

Grader function - 1

In [5]:
def grader_sigmoid(z):
    val = sigmoid(z)
    assert(val == 0.8807970779778823)
    return True

grader_sigmoid(2)

Out[5]:
True

In [6]:

def grader_forwardprop(data):
    dl = (data['dy_pr'] == -1.9285278284819143)
    loss = (data['loss'] == 0.9298048963072919)
    part1 = (data['exp'] == 1.1272967040973583)
    part2 = (data['tanh'] == 0.8417934192562146)
    part3 = (data['sigmoid'] == 0.5279179387419721)
    assert(dl and loss and part1 and part2 and part3)
    return True

w = np.ones(9)*0.1
d1 = forward_propagation(X[0], y[0], w)
grader_forwardprop(d1)

Out[6]:

True

In [7]:
print(d1)

{'exp': 1.1272967040973583, 'sigmoid': 0.5279179387419721, 'tanh': 0.8417934192562146, 'loss': 0.9298048963072919, 'dy_pr': -1.9285278284819143, 'sin': -0.14538296400984968, 'cos': 0.9893754564247643}

Backward propagation
In [8]:

def backward_propagation(L, W, dic):
    '''In this function, we will compute the backward propagation '''
    # note: the first argument passed in below is the data point (its features),
    # since the gradients depend on the inputs; dic holds the intermediate values
    # returned by forward_propagation()

    # gradients of the exp/tanh branch (w1, w2) and of the sin/sigmoid branch (w3, w4, w5)
    dw1 = dic['dy_pr']*(1-np.square(dic['tanh']))*dic['exp']*2*((W[0]*L[0]+W[1]*L[1])*L[0])
    dw2 = dic['dy_pr']*(1-np.square(dic['tanh']))*dic['exp']*2*((W[1]*L[1]+W[0]*L[0])*L[1])
    dw3 = dic['dy_pr']*W[8]*dic['sigmoid']*(1-dic['sigmoid'])*(L[3]*W[3]+L[4]*W[4])*L[2]*dic['cos']
    dw4 = dic['dy_pr']*W[8]*dic['sigmoid']*(1-dic['sigmoid'])*L[3]*dic['sin']
    dw5 = dic['dy_pr']*W[8]*dic['sigmoid']*(1-dic['sigmoid'])*L[4]*dic['sin']

    # gradients of the additive weights w6, w7, w8 and of w9 (the weight on the sigmoid output)
    dw6 = dic['dy_pr']*(1-np.square(dic['tanh']))*dic['exp']
    dw7 = dic['dy_pr']*(1-np.square(dic['tanh']))
    dw8 = dic['dy_pr']*W[8]*dic['sigmoid']*(1-dic['sigmoid'])
    dw9 = dic['sigmoid']*dic['dy_pr']

    dW = {
        'dw1': dw1,
        'dw2': dw2,
        'dw3': dw3,
        'dw4': dw4,
        'dw5': dw5,
        'dw6': dw6,
        'dw7': dw7,
        'dw8': dw8,
        'dw9': dw9
    }

    return dW
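For reference, these expressions follow from applying the chain rule to the graph computed in forward_propagation, with $\frac{\partial L}{\partial y'} = -2(y-y')$ stored as dy_pr. A few representative terms (here $\tanh$ and $\sigma$ are the values stored in the dictionary):

$$\begin{array}{lcl} \frac{\partial L}{\partial w_1} & = & \frac{\partial L}{\partial y'} \cdot (1-\tanh^2) \cdot e^{(w_1 f_1 + w_2 f_2)^2 + w_6} \cdot 2\,(w_1 f_1 + w_2 f_2) \cdot f_1 \\ \frac{\partial L}{\partial w_6} & = & \frac{\partial L}{\partial y'} \cdot (1-\tanh^2) \cdot e^{(w_1 f_1 + w_2 f_2)^2 + w_6} \\ \frac{\partial L}{\partial w_8} & = & \frac{\partial L}{\partial y'} \cdot w_9 \cdot \sigma\,(1-\sigma) \\ \frac{\partial L}{\partial w_9} & = & \frac{\partial L}{\partial y'} \cdot \sigma \end{array}$$

The remaining gradients follow the same pattern through the sin/sigmoid branch.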

In [9]:

def grader_backprop(data):
    dw1 = (data['dw1'] == -0.22973323498702003)
    dw2 = (data['dw2'] == -0.021407614717752925)
    dw3 = (data['dw3'] == -0.005625405580266319)
    dw4 = (data['dw4'] == -0.004657941222712423)
    dw5 = (data['dw5'] == -0.0010077228498574246)
    dw6 = (data['dw6'] == -0.6334751873437471)
    dw7 = (data['dw7'] == -0.561941842854033)
    dw8 = (data['dw8'] == -0.04806288407316516)
    dw9 = (data['dw9'] == -1.0181044360187037)
    assert(dw1 and dw2 and dw3 and dw4 and dw5 and dw6 and dw7 and dw8 and dw9)
    return True

w = np.ones(9)*0.1
d1 = forward_propagation(X[0], y[0], w)
d1 = backward_propagation(X[0], w, d1)
grader_backprop(d1)

Out[9]:

True

Implement gradient checking

W = initialize_randomly
def gradient_checking(data_point, W):
    # compute the L value using forward_propagation()
    # compute the gradients of W using backward_propagation()
    approx_gradients = []
    for each weight wi in W:
        # add a small value to weight wi, and then find the value of L with the updated weights
        # subtract a small value from weight wi, and then find the value of L with the updated weights
        # compute the approximate gradient of weight wi
        approx_gradients.append(approximate gradient of weight wi)
    # compare the gradients of the weights W from backward_propagation() with the approximate
    # gradients of the weights using the gradient_check formula
    return gradient_check

NOTE: you can do a sanity check by looking at the return values of gradient_checking();
they should all be (very close to) zero. If not, you have a bug in your code.

In [20]:
W = np.random.rand(9)
eps = 0.0001

def gradient_checking(data_point, W):
    # compute the loss and the analytical gradients of W using
    # forward_propagation() and backward_propagation()
    d1 = forward_propagation(data_point, y[0], W)
    grad = backward_propagation(data_point, W, d1)
    grad = list(grad.values())

    approx_gradients = []
    for i in range(len(W)):
        # add a small value to weight wi and recompute the loss
        w_plus = W.copy()
        w_plus[i] += eps
        L_plus = forward_propagation(data_point, y[0], w_plus)['loss']

        # subtract a small value from weight wi and recompute the loss
        w_minus = W.copy()
        w_minus[i] -= eps
        L_minus = forward_propagation(data_point, y[0], w_minus)['loss']

        # central-difference approximation of the gradient of weight wi
        approx_gradients.append((L_plus - L_minus)/(2*eps))

    # compare the analytical gradients with the approximate gradients
    # using the gradient_check formula
    gradient_check = []
    for i in range(len(W)):
        num = np.linalg.norm(grad[i] - approx_gradients[i])
        den = np.linalg.norm(grad[i]) + np.linalg.norm(approx_gradients[i])
        gradient_check.append(num/den)

    return gradient_check

g = gradient_checking(X[0], W)

In [21]:
g

Out[21]:
[1.446076278481714e-08,
1.2264881537525243e-10,
9.564345727570781e-11,
1.9892512093219975e-10,
7.8985457914592e-12,
1.2992058661307797e-09,
4.214611098858496e-09,
8.936748335692261e-10,
1.2188639210214776e-13]
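All nine values are well below the 1e-7 threshold discussed above, so the analytical gradients from backward_propagation() agree with the numerical approximations and the implementation passes the gradient check.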

Task 2: Optimizers

Algorithm with Vanilla update of weights

In [12]:
w=list(np.random.normal(0.0, 0.01, 9))
w

Out[12]:

[-0.001345358300390486,
0.013196112781944023,
0.0019041861049636545,
-0.011440720702484709,
0.017669818548239284,
0.006658934006215388,
0.0068233799771498585,
0.00656258256147008,
0.017630243873985256]

In [13]:
vw = list(w)      # work on a copy so the shared initial weights w stay unchanged
Loss = []
for epoch in range(100):
    for i, j in zip(X, y):
        d1 = forward_propagation(i, j, vw)
        loss = d1['loss']
        dw = backward_propagation(i, vw, d1)
        dw = list(dw.values())
        # vanilla update: w = w - learning_rate * dw, with learning_rate = 0.01
        dw = [k * 0.01 for k in dw]
        vw = np.subtract(vw, dw)
    # record the loss of the last data point of each epoch
    Loss.append(loss)

Plot between epochs and loss

In [14]:
import matplotlib.pyplot as plt
epoch=list(range(1,101))
plt.plot(epoch,Loss)
plt.xlabel('Epoch')
plt.ylabel('Loss')

Out[14]:
Text(0, 0.5, 'Loss')

Algorithm with Momentum update of weights

In [15]:
mw = list(w)      # again start from a copy of the same initial weights
v = list(np.zeros(9))
Loss_momentum = []
for epoch in range(100):
    for i, j in zip(X, y):
        d1 = forward_propagation(i, j, mw)
        loss_momentum = d1['loss']
        dw = backward_propagation(i, mw, d1)
        dw = list(dw.values())
        # momentum update: v = gamma*v - learning_rate*dw, then w = w + v
        for k in range(len(dw)):
            v[k] = 0.9*v[k] - 0.01*dw[k]
        mw = np.add(mw, v)
    Loss_momentum.append(loss_momentum)

Plot between epochs and loss

In [16]:

epoch=list(range(1,101))
plt.plot(epoch,Loss_momentum)
plt.xlabel('Epoch')
plt.ylabel('Loss')

Out[16]:

Text(0, 0.5, 'Loss')

Algorithm with Adam update of weights

In [33]:

aw = list(w)      # copy so the in-place updates below do not modify w itself
av = list(np.zeros(9))
am = list(np.zeros(9))
Loss_adam = []
for epoch in range(100):
    for i, j in zip(X, y):
        d1 = forward_propagation(i, j, aw)
        loss_adam = d1['loss']
        dw = backward_propagation(i, aw, d1)
        dw = list(dw.values())
        # Adam update (without bias correction): exponential moving averages of
        # the gradient (am) and the squared gradient (av)
        for k in range(len(dw)):
            am[k] = 0.9*am[k] + (1 - 0.9)*dw[k]
            av[k] = 0.999*av[k] + (1 - 0.999)*(dw[k]**2)
            aw[k] += -0.1*am[k]/(np.sqrt(av[k]) + 1e-8)
    Loss_adam.append(loss_adam)

Plot between epochs and loss

In [34]:
epoch=list(range(1,101))
plt.plot(epoch,Loss_adam)
plt.xlabel('Epoch')
plt.ylabel('Loss')

Out[34]:

Text(0, 0.5, 'Loss')

Comparison plot between epochs and loss with different optimizers

In [35]:

plt.plot(epoch, Loss, 'r--')
plt.plot(epoch, Loss_momentum, 'b--')
plt.plot(epoch, Loss_adam, 'g--')
plt.legend(['vanilla', 'momentum', 'adam'])
plt.xlabel('Epoch')
plt.ylabel('Loss')
plt.show();

In [ ]:
