Continuing from where we left off last post, we were discussing the batch gradient descent algorithm. The iterative formula is

$$\theta_i := \theta_i - \frac{\alpha}{m} \sum_{j=1}^{m} \left(h_\theta(x^{(j)}) - y^{(j)}\right) x_i^{(j)}$$

where $h_\theta(x) = \theta^T x$ is the hypothesis, $m$ is the number of training examples, $n$ is the number of features, and $\alpha$ is the learning rate.
The most obvious way to do this in plain Python would be to run a double nested loop: an inner one over the m examples to sum up the partial derivative term, and an outer one over the n + 1 parameters to update each $\theta_i$. Something like this:
import numpy as np

theta_init = np.zeros(n + 1)
grad = np.zeros(n + 1)
for i in range(n + 1):                # outer loop: one pass per parameter theta_i
    total = 0.0
    for j in range(m):                # inner loop: sum the error term over all m examples
        total += (np.dot(theta_init, X[j]) - y[j]) * X[j][i]
    grad[i] = total
theta_init -= (alpha / m) * grad      # simultaneous update; alpha is the learning rate
This loop has to be repeated till theta converges to within a certain tolerance, or for a fixed number of iterations. So basically there are three nested for loops.
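As a rough sketch, the three loops together might look like this; the function signature, the tolerance tol, and the iteration cap max_iters are my own illustrative choices, not from the original formula:

import numpy as np

def batch_gd_loops(X, y, alpha, max_iters, tol):
    # X: (m, n + 1) array, y: length-m 1-D array
    m, n_plus_1 = X.shape
    theta = np.zeros(n_plus_1)
    for _ in range(max_iters):            # outer loop: repeat till convergence
        theta_old = theta.copy()
        grad = np.zeros(n_plus_1)
        for i in range(n_plus_1):         # middle loop: over the parameters
            for j in range(m):            # inner loop: sum over the examples
                grad[i] += (np.dot(theta, X[j]) - y[j]) * X[j, i]
        theta -= (alpha / m) * grad
        if np.linalg.norm(theta - theta_old) < tol:   # change in theta below tolerance
            return theta
    return theta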
However, NumPy makes things easier. Let's look at the vectorised way of doing this.
Consider theta to be a matrix of size (n + 1, 1).
X is a matrix of size (m, n + 1), with one example per row.
y is a matrix of size (m, 1), with one label per example.
The inner product for a single example is computed as

np.dot(theta.T, X[i])
Instead of doing it individually for each example, we could do it for all the examples at once and store the result in a matrix of size (m, 1):

errors = np.dot(X, theta) - y
This gives a matrix like this:

$$\begin{bmatrix} h_\theta(x^{(1)}) - y^{(1)} \\ h_\theta(x^{(2)}) - y^{(2)} \\ \vdots \\ h_\theta(x^{(m)}) - y^{(m)} \end{bmatrix}$$

Let's call this the error matrix, where each element is the error in predicting the output of the corresponding example.
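As a quick concrete check (the numbers here are made up purely for illustration):

import numpy as np

X = np.array([[1., 1.],
              [1., 2.],
              [1., 3.]])        # m = 3 examples, bias column plus one feature
y = np.array([[1.], [2.], [3.]])
theta = np.zeros((2, 1))        # theta starts at zero, so every prediction is 0
errors = np.dot(X, theta) - y   # shape (3, 1)
print(errors.ravel())           # -> [-1. -2. -3.], one error per example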
Now each error term $(h_\theta(x^{(j)}) - y^{(j)})$ has to be multiplied with $x_i^{(j)}$ and summed over the examples to get $\sum_{j=1}^{m} (h_\theta(x^{(j)}) - y^{(j)})\, x_i^{(j)}$. Instead of doing this one term at a time:

1] Simply multiply the error matrix with $X^T$, i.e. np.dot(X.T, errors), to get the above mentioned term for all parameters at once. How does this work (figure it out yourself)?

2] Then scale by $\alpha$, divide by $m$ and subtract it from $\theta$, to get the updated value of $\theta$.

3] Repeat till convergence. (A sketch of the whole vectorised loop is given after this list.)
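Putting the three steps together, a minimal sketch of the vectorised batch version could look like this (alpha, tol, and max_iters are illustrative choices, not fixed by the derivation above):

import numpy as np

def batch_gradient_descent(X, y, alpha=0.01, max_iters=1000, tol=1e-6):
    # X: (m, n + 1) with a leading column of ones, y: (m, 1)
    m = X.shape[0]
    theta = np.zeros((X.shape[1], 1))
    for _ in range(max_iters):
        errors = np.dot(X, theta) - y          # (m, 1) error matrix
        grad = np.dot(X.T, errors) / m         # (n + 1, 1): steps 1 and 2
        theta_new = theta - alpha * grad       # simultaneous update of all parameters
        if np.linalg.norm(theta_new - theta) < tol:   # step 3: stop on convergence
            return theta_new
        theta = theta_new
    return theta

The two np.dot calls replace both of the explicit Python loops from earlier, which is where the speed-up comes from.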
Stochastic gradient descent
Sometimes, when you have a large number of examples, summing over all $m$ of them for every single update becomes computationally expensive. In that case you replace the batch update with this,

$$\theta_i := \theta_i - \alpha \left(h_\theta(x^{(j)}) - y^{(j)}\right) x_i^{(j)}$$

so that theta is updated for each example (i.e. run the loop m times, once per example).
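A minimal sketch of that per-example update, under the same shape assumptions as before (alpha and the number of passes over the data are illustrative):

import numpy as np

def stochastic_gradient_descent(X, y, alpha=0.01, n_passes=10):
    # X: (m, n + 1), y: (m, 1); theta is updated after every single example
    theta = np.zeros((X.shape[1], 1))
    for _ in range(n_passes):
        for j in range(X.shape[0]):             # the loop that runs m times
            x_j = X[j].reshape(-1, 1)           # (n + 1, 1) column vector
            error = np.dot(theta.T, x_j) - y[j] # error for example j alone
            theta = theta - alpha * error * x_j # update theta immediately
    return theta

In practice the examples are usually shuffled before each pass; that detail is omitted here to keep the sketch short.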
That’s it for now. (I tried implementing it on my own; my attempt can be seen here: https://github.com/Manoj-Kumar-S/Machine_Learning/blob/master/regression.py)