DLP Lab
DLP Lab
class Perceptron:
def __init__(self, eta, epochs):
self.weights = np.random.randn(3) * 1e-4 # RANDOM WEIGHT ASSIGNMENT
print(f"initial weights before training: n{self.weights}")
self.eta = eta # LEARNING RATE
self.epochs = epochs
def total_loss(self):
total_loss = np.sum(self.error)
print(f"total loss: {total_loss}")
return total_loss
def prepare_data(df):
X=df.drop("y",axis=1)
y=df["y"]
return X,y
AND = {
"x1": [0,0,1,1],
"x2": [0,1,0,1],
"y": [0,0,0,1],
}
df = pd.DataFrame(AND)
X,y = prepare_data(df)
model.total_loss()
OUTPUT
2. Write a program to implement multi-layer perceptron using TensorFlow. Apply multi-layer
perceptron (MLP) on the Iris dataset.
import numpy as np
class Perceptron:
Here we have initialized some instance variables including, weights, bias, learning rate, and
epochs(iteration). Next, we are going to define the activation function method
The Heaviside activation method only takes one parameter, which is the weighted sum of inputs z, and returns
the corresponding output.
What happens here is really simple, first, we'll find the number of features in the training instance for assigning
the weights. Then the initial weights and biases are assigned randomly. After that, we find the weighted sum of
inputs and passed through the Heaviside activation function. Finally, the weights and bias are updated in each
case and the optimal values are returned.
Prediction method
That's it, we are done with the Perceptron class, So let's do the main part which is to classify the Iris dataset
Classifying Iris dataset using Perceptron
The iris data consisted of 150 samples of three species of Iris including Setosa, Versicolor, and Virginica. The
first column of the dataset represented sepal length, the second column represented sepal width, the third
column represented petal length, and the fourth column represented petal width. For this classification
purpose, we are only using the petal length and petal width
iris = load_iris()
Here we are only considering petal length and petal width, So we transformed the training data which only
contains the petal length and width and not sepal. After that, the X and y are split into training and testing sets.
perceptron.fit(X_train, y_train)
pred = perceptron.predict(X_test)
Now let's see how much accuracy we have got,
accuracy_score(pred, y_test)
-------
0.96
That's great we got an accuracy of 96%, changing the learning rate or the number of epochs will result in more
accurate results.
Classification report
------
accuracy 0.96 75
macro avg 0.97 0.95 0.96 75
weighted avg 0.96 0.96 0.96 75
3. (a) Write a program to implement a Convolution Neural Network (CNN) in Keras. Perform
predictions using the trained Convolution Neural Network (CNN).
Note: We don’t need to install the keras package because it now comes bundled with TensorFlow as its
official high-level API! Using TensorFlow’s Keras is now recommended over the
standalone keras package.
You should now be able to import these packages and poke around the MNIST dataset:
import numpy as np
import mnist
from tensorflow import keras
# The first time you run this might be a bit slow, since the
# mnist package has to download and cache the data.
train_images = mnist.train_images()
train_labels = mnist.train_labels()
Before we begin, we’ll normalize the image pixel values from [0, 255] to [-0.5, 0.5] to make our
network easier to train (using smaller, centered values usually leads to better results). We’ll also
reshape each image from (28, 28) to (28, 28, 1) because Keras requires the third dimension.
import numpy as np
import mnist
train_images = mnist.train_images()
train_labels = mnist.train_labels()
test_images = mnist.test_images()
test_labels = mnist.test_labels()
Every Keras model is either built using the Sequential class, which represents a linear stack of layers,
or the functional Model class, which is more customizeable. We’ll be using the
simpler Sequential model, since our CNN will be a linear stack of layers.
# WIP
model = Sequential([
# layers...
])
The Sequential constructor takes an array of Keras Layers. We’ll use 3 types of layers for our
CNN: Convolutional, Max Pooling, and Softmax.
This is the same CNN setup we used in my introduction to CNNs. Read that post if you’re not
comfortable with any of these 3 types of layers.
num_filters = 8
filter_size = 3
pool_size = 2
model = Sequential([
Conv2D(num_filters, filter_size, input_shape=(28, 28, 1)),
MaxPooling2D(pool_size=pool_size),
Flatten(),
Dense(10, activation='softmax'),
])
num_filters, filter_size, and pool_size are self-explanatory variables that set the hyperparameters for our
CNN.
The first layer in any Sequential model must specify the input_shape, so we do so on Conv2D. Once this input
shape is specified, Keras will automatically infer the shapes of inputs for later layers.
The output Softmax layer has 10 nodes, one for each class.
Before we can begin training, we need to configure the training process. We decide 3 key factors
during the compilation step:
The optimizer. We’ll stick with a pretty good default: the Adam gradient-based optimizer. Keras has many
other optimizers you can look into as well.
The loss function. Since we’re using a Softmax output layer, we’ll use the Cross-Entropy loss. Keras
distinguishes between binary_crossentropy (2 classes) and categorical_crossentropy (>2 classes), so we’ll
use the latter. See all Keras losses.
A list of metrics. Since this is a classification problem, we’ll just have Keras report on the accuracy metric.
model.compile(
'adam',
loss='categorical_crossentropy',
metrics=['accuracy'],
)
Onwards!
Training a model in Keras literally consists only of calling fit() and specifying some parameters. There
are a lot of possible parameters, but we’ll only supply these:
The training data (images and labels), commonly known as X and Y, respectively.
The number of epochs (iterations over the entire dataset) to train for.
The validation data (or test data), which is used during training to periodically measure the network’s
performance against data it hasn’t seen before.
There’s one thing we have to be careful about: Keras expects the training targets to be 10-dimensional
vectors, since there are 10 nodes in our Softmax output layer. Right now,
our train_labels and test_labels arrays contain single integers representing the class for each image:
import mnist
train_labels = mnist.train_labels()
print(train_labels[0]) # 5
Conveniently, Keras has a utility method that fixes this exact issue: to_categorical. It turns our array of
class integers into an array of one-hot vectors instead. For example, 2 would become [0, 0, 1, 0, 0,
0, 0, 0, 0, 0] (it’s zero-indexed).
model.fit(
train_images,
to_categorical(train_labels),
epochs=3,
validation_data=(test_images, to_categorical(test_labels)),
)
train_images = mnist.train_images()
train_labels = mnist.train_labels()
test_images = mnist.test_images()
test_labels = mnist.test_labels()
num_filters = 8
filter_size = 3
pool_size = 2
Running that code on the full MNIST dataset gives us results like this:
Epoch 1
loss: 0.2433 - acc: 0.9276 - val_loss: 0.1176 - val_acc: 0.9634
Epoch 2
loss: 0.1184 - acc: 0.9648 - val_loss: 0.0936 - val_acc: 0.9721
Epoch 3
loss: 0.0930 - acc: 0.9721 - val_loss: 0.0778 - val_acc: 0.9744
Now that we have a working, trained model, let’s put it to use. The first thing we’ll do is save it to disk
so we can load it back up anytime:
model.save_weights('cnn.h5')
We can now reload the trained model whenever we want by rebuilding it and loading in the saved
weights:
num_filters = 8
filter_size = 3
pool_size = 2
Using the trained model to make predictions is easy: we pass an array of inputs to predict() and it
returns an array of outputs. Keep in mind that the output of our network is 10 probabilities (because
of softmax), so we’ll use np.argmax() to turn those into actual digits.
There’s much more we can do to experiment with and improve our network - in this official Keras
MNIST CNN example, they achieve 99 test accuracy after 15 epochs. Some examples of modifications
you could make to our CNN include:
Network Depth
What happens if we add or remove Convolutional layers? How does that affect training and/or the
model’s final performance?
model = Sequential([
Conv2D(num_filters, filter_size, input_shape=(28, 28, 1)),
Conv2D(num_filters, filter_size), MaxPooling2D(pool_size=pool_size),
Flatten(),
Dense(10, activation='softmax'),
])
Dropout
What if we tried adding Dropout layers, which are commonly used to prevent overfitting?
What if we add fully-connected layers between the Convolutional outputs and the final Softmax layer?
This is something commonly done in CNNs used for Computer Vision.
model = Sequential([
# See https://keras.io/layers/convolutional/#conv2d for more info.
Conv2D(
num_filters,
filter_size,
input_shape=(28, 28, 1),
strides=2, padding='same', activation='relu', ),
MaxPooling2D(pool_size=pool_size),
Flatten(),
Dense(10, activation='softmax'),
])
3. (b) Write a program to build an Image Classifier with CIFAR-10 Data.
Evaluation:
We have 10 classes, so if we pick a image and we randomly gues it class, we have 1/10 probability to be true.
In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
import tensorflow as tf
from tensorflow.keras.datasets import cifar10
from tensorflow.keras.utils import to_categorical
plt.subplots_adjust(hspace=0.4)
In [4]:
classes_name = ['Airplane', 'Automobile', 'Bird', 'Cat', 'Deer', 'Dog', 'Frog', 'Horse', 'Ship', '
Truck']
In [5]:
Data Preprocessing
In [6]:
y_cat_train
Out[7]:
linkcode
# Convolutional Layer
model.add(Conv2D(filters=32, kernel_size=KERNEL_SIZE, input_shape=INPUT_SHAPE, activation='relu',
padding='same'))
model.add(BatchNormalization())
model.add(Conv2D(filters=32, kernel_size=KERNEL_SIZE, input_shape=INPUT_SHAPE, activation='relu',
padding='same'))
model.add(BatchNormalization())
# Pooling layer
model.add(MaxPool2D(pool_size=(2, 2)))
# Dropout layers
model.add(Dropout(0.25))
model.add(Flatten())
# model.add(Dropout(0.2))
model.add(Dense(128, activation='relu'))
model.add(Dropout(0.25))
model.add(Dense(10, activation='softmax'))
METRICS = [
'accuracy',
tf.keras.metrics.Precision(name='precision'),
tf.keras.metrics.Recall(name='recall')
]
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=METRICS)
Model Evaluation
In [12]:
plt.figure(figsize=(12, 16))
plt.subplot(4, 2, 1)
plt.plot(r.history['loss'], label='Loss')
plt.plot(r.history['val_loss'], label='val_Loss')
plt.title('Loss Function Evolution')
plt.legend()
plt.subplot(4, 2, 2)
plt.plot(r.history['accuracy'], label='accuracy')
plt.plot(r.history['val_accuracy'], label='val_accuracy')
plt.title('Accuracy Function Evolution')
plt.legend()
plt.subplot(4, 2, 3)
plt.plot(r.history['precision'], label='precision')
plt.plot(r.history['val_precision'], label='val_precision')
plt.title('Precision Function Evolution')
plt.legend()
plt.subplot(4, 2, 4)
plt.plot(r.history['recall'], label='recall')
plt.plot(r.history['val_recall'], label='val_recall')
plt.title('Recall Function Evolution')
plt.legend()
Out[12]:
<matplotlib.legend.Legend at 0x7fc610996bd0>
4. (a) Write a program to perform face detection using CNN.
Also, since this is a multi-class classification problem, we are counting the number of unique faces, as that
will be used as the number of output neurons in the output layer of fully connected ANN classifier.
# Storing the face and the numeric tag for future reference
ResultMap={}
for faceValue,faceName in zip(TrainClasses.values(),TrainClasses.keys()):
ResultMap[faceValue]=faceName
# The number of neurons for the output layer is equal to the number of faces
OutputNeurons=len(ResultMap)
print('\n The Number of output neurons: ', OutputNeurons)
Just keep in mind, the more layers/neurons you add, the slower the model becomes.
Also, when you have large amount of images, in the tune of 50K and above, then your laptop’ CPU might
not be efficient to learn those many images. You will have to get a GPU enabled laptop, or use cloud
services like AWS or Google Cloud.
Since the data we have used for the demonstration is small containing only 244 images for training, you
Apart from selecting the best number of layers and the number of neurons in it, for each layer, there are
some hyper parameters which needs to be tuned as well.
Filters=32: This number indicates how many filters we are using to look at the image pixels during
the convolution step. Some filters may catch sharp edges, some filters may catch color variations
some filters may catch outlines, etc. In the end, we get important information from the images. In
the first layer the number of filters=32 is commonly used, then increasing the power of 2. Like in
the next layer it is 64, in the next layer, it is 128 so on and so forth.
kernel_size=(5,5): This indicates the size of the sliding window during convolution, in this case
study we are using 5X5 pixels sliding window.
strides=(1, 1): How fast or slow should the sliding window move during convolution. We are using
the lowest setting of 1X1 pixels. Means slide the convolution window of 5X5 (kernal_size) by 1
pixel in the x-axis and 1 pixel in the y-axis until the whole image is scanned.
input_shape=(64,64,3): Images are nothing but matrix of RGB color codes. during our data pre-
processing we have compressed the images to 64X64, hence the expected shape is 64X64X3.
Means 3 arrays of 64X64, one for RGB colors each.
kernel_initializer=’uniform’: When the Neurons start their computation, some algorithm has to
decide the value for each weight. This parameter specifies that. You can choose different values
for it like ‘normal’ or ‘glorot_uniform’.
activation=’relu’: This specifies the activation function for the calculations inside each neuron.
You can choose values like ‘relu’, ‘tanh’, ‘sigmoid’, etc.
optimizer=’adam’: This parameter helps to find the optimum values of each weight in the neural
network. ‘adam’ is one of the most useful optimizers, another one is ‘rmsprop’
batch_size=10: This specifies how many rows will be passed to the Network in one go after which
the SSE calculation will begin and the neural network will start adjusting its weights based on the
errors.
When all the rows are passed in the batches of 10 rows each as specified in this parameter, then
we call that 1-epoch. Or one full data cycle. This is also known as mini-batch gradient descent. A
small value of batch_size will make the LSTM look at the data slowly, like 2 rows at a time or 4
rows at a time which could lead to overfitting, as compared to a large value like 20 or 50 rows at a
time, which will make the LSTM look at the data fast which could lead to underfitting. Hence a
proper value must be chosen using hyperparameter tuning.
Epochs=10: The same activity of adjusting weights continues for 10 times, as specified by this
parameter. In simple terms, the LSTM looks at the full training data 10 times and adjusts its
weights.
'''######################## Create CNN deep learning model ########################'''
from keras.models import Sequential
from keras.layers import Convolution2D
from keras.layers import MaxPool2D
from keras.layers import Flatten
from keras.layers import Dense
classifier.add(MaxPool2D(pool_size=(2,2)))
classifier.add(Dense(OutputNeurons, activation='softmax'))
###########################################################
import time
# Measuring the time taken by the model to train
StartTime=time.time()
# Starting the model training
classifier.fit_generator(
training_set,
steps_per_epoch=30,
epochs=10,
validation_data=test_set,
validation_steps=10)
EndTime=time.time()
print("###### Total Time Taken: ", round((EndTime-StartTime)/60), 'Minutes ######')
4. (b) Write a program to demonstrate hyperparameter tuning in CNN.
# Libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import tensorflow as tf
import cv2
from PIL import Image
0 27 26 5 5 22 20 20 Train/20/00020_00000_00000.png
1 28 27 5 6 23 22 20 Train/20/00020_00000_00001.png
2 29 26 6 5 24 21 20 Train/20/00020_00000_00002.png
3 28 27 5 6 23 22 20 Train/20/00020_00000_00003.png
4 28 26 5 5 23 21 20 Train/20/00020_00000_00004.png
39204 52 56 5 6 47 51 42 Train/42/00042_00007_00025.png
39205 56 58 5 5 51 53 42 Train/42/00042_00007_00026.png
39206 58 62 5 6 53 57 42 Train/42/00042_00007_00027.png
39207 63 69 5 7 58 63 42 Train/42/00042_00007_00028.png
39208 68 69 7 6 62 63 42 Train/42/00042_00007_00029.png
In [3]:
data_dir = "../input/gtsrb-german-traffic-sign"
train_imgpath= list((data_dir + '/' + str(train.Path[i])) for i in range(len(train.Path)))
In [4]:
for i in range(0,9):
plt.subplot(331+i)
seed=np.random.randint(0,39210)
im = Image.open(train_imgpath[seed])
plt.imshow(im)
plt.show()
Preprocessing image-
converting images into arrays of the form (28,28,3)
In [5]:
train_data=[]
train_labels=[]
path = "../input/gtsrb-german-traffic-sign/"
for i in range(len(train.Path)):
image=cv2.imread(train_imgpath[i])
image_from_array = Image.fromarray(image, 'RGB')
size_image = image_from_array.resize((28,28))
train_data.append(np.array(size_image))
train_labels.append(train.ClassId[i])
X=np.array(train_data)
y=np.array(train_labels)
In [6]:
X_train = X_train.astype('float32')/255
X_val = X_val.astype('float32')/255
#Using one hote encoding for the train and validation labels
from keras.utils import to_categorical
y_train = to_categorical(y_train, 43)
y_val = to_categorical(y_val, 43)
CNN Model-
Grid Search to determine the layers and neurons in each layer in the sequential model.
In [8]:
linkcode
def create_model(layers):
cnn = tf.keras.models.Sequential()
cnn.add(tf.keras.layers.Conv2D(filters=32, kernel_size=3, padding="same", activation="relu", i
nput_shape=[28, 28, 3]))
cnn.add(tf.keras.layers.MaxPool2D(pool_size=2, strides=2, padding='valid'))
cnn.add(tf.keras.layers.Conv2D(filters=32, kernel_size=3, padding="same", activation="relu"))
cnn.add(tf.keras.layers.MaxPool2D(pool_size=2, strides=2, padding='valid'))
cnn.add(tf.keras.layers.Flatten())
cnn.add(tf.keras.layers.Dense(units=43, activation='softmax'))
%matplotlib inline
#%config InlineBackend.figure_format = 'retina'
%qtconsole
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
data_path = 'Bike-Sharing-Dataset/hour.csv'
rides = pd.read_csv(data_path)
In [35]:
rides.head()
Out[35]:
Checking out the data
This dataset has the number of riders for each hour of each day from January 1 2011 to December 31 2012. The
number of riders is split between casual and registered, summed up in the cnt column. You can see the first few
rows of the data above.
Below is a plot showing the number of bike riders over the first 10 days in the data set. You can see the hourly
rentals here. This data is pretty complicated! The weekends have lower over all ridership and there are spikes when
people are biking to and from work during the week. Looking at the data above, we also have information about
temperature, humidity, and windspeed, all of these likely affecting the number of riders. You'll be trying to capture all
this with your model.
In [36]:
rides[:24*10].plot(x='dteday', y='cnt')
Out[36]:
<matplotlib.axes._subplots.AxesSubplot at 0x8ea8a20>
Dummy variables
Here we have some categorical variables like season, weather, month. To include these in our model, we'll need to
make binary dummy variables. This is simple to do with Pandas thanks to get_dummies().
In [37]:
5 rows × 59 columns
To make training the network easier, we'll standardize each of the continuous variables. That is, we'll shift and scale
the variables such that they have zero mean and a standard deviation of 1.
The scaling factors are saved so we can go backwards when we use the network for predictions.
In [38]:
data.head()
Out[39]:
5 rows × 59 columns
Splitting the data into training, testing, and validation sets
We'll save the last 21 days of the data to use as a test set after we've trained the network. We'll use this set to make
predictions and compare them with the actual number of riders.
In [40]:
# Hold out the last 60 days of the remaining data as a validation set
train_features, train_targets = features[:-60*24], targets[:-60*24]
val_features, val_targets = features[-60*24:], targets[-60*24:]
We use the weights to propagate signals forward from the input to the output layers in a neural network. We use the
weights to also propagate error backwards from the output back into the network to update our weights. This is
called backpropagation.
Hint: You'll need the derivative of the output activation function ( f(x)=x�(�)=�) for the backpropagation
implementation. If you aren't familiar with calculus, this function is equivalent to the equation y=x�=�. What is the
slope of that equation? That is the derivative of f(x)�(�).
In [42]:
class NeuralNetwork(object):
def __init__(self, input_nodes, hidden_nodes, output_nodes, learning_rate):
# Set number of nodes in input, hidden and output layers.
self.input_nodes = input_nodes
self.hidden_nodes = hidden_nodes
self.output_nodes = output_nodes
# Initialize weights
self.weights_input_to_hidden = np.random.normal(0.0, self.hidden_nodes**-0.5,
(self.hidden_nodes, self.input_nodes))
#returns a numpy array of hidden nodes elements each containing input nodes eleme
nts. These are the weights to
#multiply with each feature=input node
# It returns a shape of (2,30)
# Output layer error is the difference between desired target and actual output.
# deducts from the desired feature value and return a (1,1) array
output_errors = targets - final_outputs
return final_outputs
In [43]:
# Calculate the difference between the loss of the training and the validation set
def MSE(y, Y):
return np.mean((y-Y)**2)
You'll also be using a method know as Stochastic Gradient Descent (SGD) to train the network. The idea is that for
each training pass, you grab a random sample of the data instead of using the whole data set. You use many more
training passes than with normal gradient descent, but each pass is much faster. This ends up training the network
more efficiently. You'll learn more about SGD later.
This is the number of times the dataset will pass through the network, each time updating the weights. As the
number of epochs increases, the network becomes better and better at predicting the targets in the training set.
You'll need to choose enough epochs to train the network well but not too many or you'll be overfitting.
This scales the size of weight updates. If this is too big, the weights tend to explode and the network fails to fit the
data. A good choice to start at is 0.1. If the network has problems fitting the data, try reducing the learning rate. Note
that the lower the learning rate, the smaller the steps are in the weight updates and the longer it takes for the neural
network to converge.
The more hidden nodes you have, the more accurate predictions the model will make. Try a few different numbers
and see how it affects the performance. You can look at the losses dictionary for a metric of the network
performance. If the number of hidden units is too low, then the model won't have enough space to learn and if it is
too high there are too many options for the direction that the learning can take. The trick here is to find the right
balance in number of hidden units you choose.
In [45]:
import sys
### Set the hyperparameters here ###
epochs = 2000
learning_rate = 0.05
hidden_nodes = 28
output_nodes = 1
#get the number of input nodes from the shape of the first row of the train_features
N_i = train_features.shape[1]
#Initiate the network
network = NeuralNetwork(N_i, hidden_nodes, output_nodes, learning_rate)
losses['train'].append(train_loss)
losses['validation'].append(val_loss)
Progress: 99.9% ... Training loss: 0.049 ... Validation loss: 0.155
In [46]:
(0.0, 0.5)
dates = pd.to_datetime(rides.ix[test_data.index]['dteday'])
dates = dates.apply(lambda d: d.strftime('%b %d'))
ax.set_xticks(np.arange(len(dates))[12::24])
_ = ax.set_xticklabels(dates[12::24], rotation=45)
The validation loss for our model and these hyper parameters ranges from o.130 to 0.143. The model does much
better on the normal days versus the weekends and holiday season. The large variance of the bike sharing users
over the weekend and especially on the holiday season at the end of December does not allow the model to
acurately predict these values without overfitting. Probably more training data , over a larger period of time would
improve the prediction for these outlyiing values.
Having tried various hyperarameter combinations, I have noticed that after around 2000 epochs the validation and
training stop converging and start diverging slightly , indicating that overfitting may be taking place after this number.
I did not observe any improvement in the validation loss or the model for learning rates below 0.09 or above a
number of hidden nodes around 30.
Unit tests
Run these unit tests to check the correctness of your network implementation. These tests must all be successful to
pass the project.
In [15]:
import unittest
class TestMethods(unittest.TestCase):
##########
# Unit tests for data loading
##########
def test_data_path(self):
# Test that file path to dataset has been unaltered
self.assertTrue(data_path.lower() == 'bike-sharing-dataset/hour.csv')
def test_data_loaded(self):
# Test that data frame loaded
self.assertTrue(isinstance(rides, pd.DataFrame))
##########
# Unit tests for network functionality
##########
def test_activation(self):
network = NeuralNetwork(3, 2, 1, 0.5)
# Test that the activation function is a sigmoid
self.assertTrue(np.all(network.activation_function(0.5) == 1/(1+np.exp(-0.5))))
def test_train(self):
# Test that weights are updated correctly on training
network = NeuralNetwork(3, 2, 1, 0.5)
network.weights_input_to_hidden = test_w_i_h.copy()
network.weights_hidden_to_output = test_w_h_o.copy()
network.train(inputs, targets)
self.assertTrue(np.allclose(network.weights_hidden_to_output,
np.array([[ 0.37275328, -0.03172939]])))
self.assertTrue(np.allclose(network.weights_input_to_hidden,
np.array([[ 0.10562014, 0.39775194, -0.29887597],
[-0.20185996, 0.50074398, 0.19962801]])))
def test_run(self):
# Test correctness of run method
network = NeuralNetwork(3, 2, 1, 0.5)
network.weights_input_to_hidden = test_w_i_h.copy()
network.weights_hidden_to_output = test_w_h_o.copy()
self.assertTrue(np.allclose(network.run(inputs), 0.09998924))
suite = unittest.TestLoader().loadTestsFromModule(TestMethods())
unittest.TextTestRunner().run(suite)
.....
----------------------------------------------------------------------
Ran 5 tests in 0.016s
OK
5. Write a program to build auto-encoder in Keras.
import keras
from keras import layers
First, we'll configure our model to use a per-pixel binary crossentropy loss, and the Adam optimizer:
autoencoder.compile(optimizer='adam', loss='binary_crossentropy')
Let's prepare our input data. We're using MNIST digits, and we're discarding the labels (since we're only interested
in encoding/decoding the input images).
autoencoder.fit(x_train, x_train,
epochs=50,
batch_size=256,
shuffle=True,
validation_data=(x_test, x_test))
After 50 epochs, the autoencoder seems to reach a stable train/validation loss value of about 0.09. We can try to
visualize the reconstructed inputs and the encoded representations. We will use Matplotlib.
# Encode and decode some digits
# Note that we take them from the *test* set
encoded_imgs = encoder.predict(x_test)
decoded_imgs = decoder.predict(encoded_imgs)
# Use Matplotlib (don't ask)
import matplotlib.pyplot as plt
# Display reconstruction
ax = plt.subplot(2, n, i + 1 + n)
plt.imshow(decoded_imgs[i].reshape(28, 28))
plt.gray()
ax.get_xaxis().set_visible(False)
ax.get_yaxis().set_visible(False)
plt.show()
Here's what we get. The top row is the original digits, and the bottom row is the reconstructed digits. We are losing
quite a bit of detail with this basic approach.
encoding_dim = 32
input_img = keras.Input(shape=(784,))
# Add a Dense layer with a L1 activity regularizer
encoded = layers.Dense(encoding_dim, activation='relu',
activity_regularizer=regularizers.l1(10e-5))(input_img)
decoded = layers.Dense(784, activation='sigmoid')(encoded)
Deep autoencoder
We do not have to limit ourselves to a single layer as encoder or decoder, we could instead use a stack of layers, such
as:
input_img = keras.Input(shape=(784,))
encoded = layers.Dense(128, activation='relu')(input_img)
encoded = layers.Dense(64, activation='relu')(encoded)
encoded = layers.Dense(32, activation='relu')(encoded)
autoencoder.fit(x_train, x_train,
epochs=100,
batch_size=256,
shuffle=True,
validation_data=(x_test, x_test))
After 100 epochs, it reaches a train and validation loss of ~0.08, a bit better than our previous models. Our
reconstructed digits look a bit better too:
Convolutional autoencoder
Since our inputs are images, it makes sense to use convolutional neural networks (convnets) as encoders and
decoders. In practical settings, autoencoders applied to images are always convolutional autoencoders --they simply
perform much better.
Let's implement one. The encoder will consist in a stack of Conv2D and MaxPooling2D layers (max pooling being used
for spatial down-sampling), while the decoder will consist in a stack of Conv2D and UpSampling2D layers.
import keras
from keras import layers
autoencoder.fit(x_train, x_train,
epochs=50,
batch_size=128,
shuffle=True,
validation_data=(x_test, x_test),
callbacks=[TensorBoard(log_dir='/tmp/autoencoder')])
This allows us to monitor training in the TensorBoard web interface (by navighating to http://0.0.0.0:6006):
The model converges to a loss of 0.094, significantly better than our previous models (this is in large part due to the
higher entropic capacity of the encoded representation, 128 dimensions vs. 32 previously). Let's take a look at the
reconstructed digits:
decoded_imgs = autoencoder.predict(x_test)
n = 10
plt.figure(figsize=(20, 4))
for i in range(1, n + 1):
# Display original
ax = plt.subplot(2, n, i)
plt.imshow(x_test[i].reshape(28, 28))
plt.gray()
ax.get_xaxis().set_visible(False)
ax.get_yaxis().set_visible(False)
# Display reconstruction
ax = plt.subplot(2, n, i + n)
plt.imshow(decoded_imgs[i].reshape(28, 28))
plt.gray()
ax.get_xaxis().set_visible(False)
ax.get_yaxis().set_visible(False)
plt.show()
We can also have a look at the 128-dimensional encoded representations. These representations are 8x4x4, so we
reshape them to 4x32 in order to be able to display them as grayscale images.
n = 10
plt.figure(figsize=(20, 8))
for i in range(1, n + 1):
ax = plt.subplot(1, n, i)
plt.imshow(encoded_imgs[i].reshape((4, 4 * 8)).T)
plt.gray()
ax.get_xaxis().set_visible(False)
ax.get_yaxis().set_visible(False)
plt.show()
Application to image denoising
Let's put our convolutional autoencoder to work on an image denoising problem. It's simple: we will train the
autoencoder to map noisy digits images to clean digits images.
Here's how we will generate synthetic noisy digits: we just apply a gaussian noise matrix and clip the images between
0 and 1.
noise_factor = 0.5
x_train_noisy = x_train + noise_factor * np.random.normal(loc=0.0, scale=1.0, size=x_train.shape)
x_test_noisy = x_test + noise_factor * np.random.normal(loc=0.0, scale=1.0, size=x_test.shape)
n = 10
plt.figure(figsize=(20, 2))
for i in range(1, n + 1):
ax = plt.subplot(1, n, i)
plt.imshow(x_test_noisy[i].reshape(28, 28))
plt.gray()
ax.get_xaxis().set_visible(False)
ax.get_yaxis().set_visible(False)
plt.show()
If you squint you can still recognize them, but barely. Can our autoencoder learn to recover the original digits? Let's
find out.
Compared to the previous convolutional autoencoder, in order to improve the quality of the reconstructed, we'll use
a slightly different model with more filters per layer:
autoencoder.fit(x_train_noisy, x_train,
epochs=100,
batch_size=128,
shuffle=True,
validation_data=(x_test_noisy, x_test),
callbacks=[TensorBoard(log_dir='/tmp/tb', histogram_freq=0, write_graph=False)])
Now let's take a look at the results. Top, the noisy digits fed to the network, and bottom, the digits are reconstructed
by the network.
It seems to work pretty well. If you scale this process to a bigger convnet, you can start building document denoising
or audio denoising models. Kaggle has an interesting dataset to get you started.
Sequence-to-sequence autoencoder
If you inputs are sequences, rather than vectors or 2D images, then you may want to use as encoder and decoder a
type of model that can capture temporal structure, such as a LSTM. To build a LSTM-based autoencoder, first use a
LSTM encoder to turn your input sequences into a single vector that contains information about the entire sequence,
then repeat this vector n times (where n is the number of timesteps in the output sequence), and run a LSTM decoder
to turn this constant sequence into the target sequence.
We won't be demonstrating that one on any specific dataset. We will just put a code example here for future
reference for the reader!
decoded = layers.RepeatVector(timesteps)(encoded)
decoded = layers.LSTM(input_dim, return_sequences=True)(decoded)
sequence_autoencoder = keras.Model(inputs, decoded)
encoder = keras.Model(inputs, encoded)
What is a variational autoencoder, you ask? It's a type of autoencoder with added constraints on the encoded
representations being learned. More precisely, it is an autoencoder that learns a latent variable model for its input
data. So instead of letting your neural network learn an arbitrary function, you are learning the parameters of a
probability distribution modeling your data. If you sample points from this distribution, you can generate new input
data samples: a VAE is a "generative model".
How does a variational autoencoder work?
First, an encoder network turns the input samples x into two parameters in a latent space, which we will
note z_mean and z_log_sigma. Then, we randomly sample similar points z from the latent normal distribution that is
assumed to generate the data, via z = z_mean + exp(z_log_sigma) * epsilon, where epsilon is a random normal
tensor. Finally, a decoder network maps these latent space points back to the original input data.
The parameters of the model are trained via two loss functions: a reconstruction loss forcing the decoded samples to
match the initial inputs (just like in our previous autoencoders), and the KL divergence between the learned latent
distribution and the prior distribution, acting as a regularization term. You could actually get rid of this latter term
entirely, although it does help in learning well-formed latent spaces and reducing overfitting to the training data.
Because a VAE is a more complex example, we have made the code available on Github as a standalone script. Here
we will review step by step how the model is created.
First, here's our encoder network, mapping inputs to our latent distribution parameters:
original_dim = 28 * 28
intermediate_dim = 64
latent_dim = 2
inputs = keras.Input(shape=(original_dim,))
h = layers.Dense(intermediate_dim, activation='relu')(inputs)
z_mean = layers.Dense(latent_dim)(h)
z_log_sigma = layers.Dense(latent_dim)(h)
We can use these parameters to sample new similar points from the latent space:
def sampling(args):
z_mean, z_log_sigma = args
epsilon = K.random_normal(shape=(K.shape(z_mean)[0], latent_dim),
mean=0., stddev=0.1)
return z_mean + K.exp(z_log_sigma) * epsilon
z = layers.Lambda(sampling)([z_mean, z_log_sigma])
Finally, we can map these sampled latent points back to reconstructed inputs:
# Create encoder
encoder = keras.Model(inputs, [z_mean, z_log_sigma, z], name='encoder')
# Create decoder
latent_inputs = keras.Input(shape=(latent_dim,), name='z_sampling')
x = layers.Dense(intermediate_dim, activation='relu')(latent_inputs)
outputs = layers.Dense(original_dim, activation='sigmoid')(x)
decoder = keras.Model(latent_inputs, outputs, name='decoder')
vae.fit(x_train, x_train,
epochs=100,
batch_size=32,
validation_data=(x_test, x_test))
Because our latent space is two-dimensional, there are a few cool visualizations that can be done at this point. One is
to look at the neighborhoods of different classes on the latent 2D plane:
Because the VAE is a generative model, we can also use it to generate new digits! Here we will scan the latent plane,
sampling latent points at regular intervals, and generating the corresponding digit for each of these points. This
gives us a visualization of the latent manifold that "generates" the MNIST digits.
for i, yi in enumerate(grid_x):
for j, xi in enumerate(grid_y):
z_sample = np.array([[xi, yi]])
x_decoded = decoder.predict(z_sample)
digit = x_decoded[0].reshape(digit_size, digit_size)
figure[i * digit_size: (i + 1) * digit_size,
j * digit_size: (j + 1) * digit_size] = digit
plt.figure(figsize=(10, 10))
plt.imshow(figure)
plt.show()
6. Write a program to implement basic reinforcement learning algorithm to teach a bot to reach its
destination.
import numpy as np
import pylab as pl
import networkx as nx
edges = [(0, 1), (1, 5), (5, 6), (5, 4), (1, 2),
(1, 3), (9, 10), (2, 4), (0, 6), (6, 7),
goal = 10
G = nx.Graph()
G.add_edges_from(edges)
pos = nx.spring_layout(G)
nx.draw_networkx_nodes(G, pos)
nx.draw_networkx_edges(G, pos)
nx.draw_networkx_labels(G, pos)
pl.show()
Note: The above graph may not look the same on reproduction of the code because
the networkx library in python produces a random graph from the given edges.
Step 3: Defining the reward the system for the bot
MATRIX_SIZE = 11
M *= -1
print(point)
if point[1] == goal:
M[point] = 100
else:
M[point] = 0
if point[0] == goal:
M[point[::-1]] = 100
else:
M[point[::-1]]= 0
# reverse of point
print(M)
Q = np.matrix(np.zeros([MATRIX_SIZE, MATRIX_SIZE]))
gamma = 0.75
# learning parameter
initial_state = 1
def available_actions(state):
current_state_row = M[state, ]
return available_action
available_action = available_actions(initial_state)
def sample_next_action(available_actions_range):
return next_action
action = sample_next_action(available_action)
if max_index.shape[0] > 1:
else:
max_index = int(max_index)
else:
return (0)
scores = []
for i in range(1000):
available_action = available_actions(current_state)
action = sample_next_action(available_action)
scores.append(score)
# print("Trained Q matrix:")
# print(Q / np.max(Q)*100)
# You can uncomment the above two lines to view the trained Q matrix
# Testing
current_state = 0
steps = [current_state]
if next_step_index.shape[0] > 1:
else:
next_step_index = int(next_step_index)
steps.append(next_step_index)
current_state = next_step_index
print(steps)
pl.plot(scores)
pl.xlabel('No of iterations')
pl.ylabel('Reward gained')
pl.show()
Now, Let’s bring this bot to a more realistic setting. Let us imagine that the bot is a detective
and is trying to find out the location of a large drug racket. He naturally concludes that the
drug sellers will not sell their products in a location which is known to be frequented by the
police and the selling locations are near the location of the drug racket. Also, the sellers
leave a trace of their products where they sell it and this can help the detective in finding out
the required location. We want to train our bot to find the location using
these Environmental Clues.
Step 6: Defining and visualizing the new graph with the environmental clues
police = [2, 4, 5]
drug_traces = [3, 8, 9]
G = nx.Graph()
G.add_edges_from(edges)
H = nx.relabel_nodes(G, mapping)
pos = nx.spring_layout(H)
nx.draw_networkx_nodes(H, pos, node_size =[200, 200, 200, 200, 200, 200, 200, 200])
nx.draw_networkx_edges(H, pos)
nx.draw_networkx_labels(H, pos)
pl.show()
Note: The above graph may look a bit different from the previous graph but they, in fact, are
the same graphs. This is due to the random placement of nodes by the networkx library.
Step 7: Defining some utility functions for the training process
Q = np.matrix(np.zeros([MATRIX_SIZE, MATRIX_SIZE]))
initial_state = 1
# Same as above
def available_actions(state):
current_state_row = M[state, ]
return av_action
# Same as above
def sample_next_action(available_actions_range):
return next_action
def collect_environmental_data(action):
found = []
if action in police:
found.append('p')
if action in drug_traces:
found.append('d')
return (found)
available_action = available_actions(initial_state)
action = sample_next_action(available_action)
else:
max_index = int(max_index)
environment = collect_environmental_data(action)
if 'p' in environment:
env_police[current_state, action] += 1
if 'd' in environment:
env_drugs[current_state, action] += 1
return(np.sum(Q / np.max(Q)*100))
else:
return (0)
# Same as above
def available_actions_with_env_help(state):
current_state_row = M[state, ]
temp_av_action = av_action[np.array(env_pos_row)[0]>= 0]
if len(temp_av_action) > 0:
av_action = temp_av_action
return av_action
scores = []
for i in range(1000):
available_action = available_actions(current_state)
action = sample_next_action(available_action)
print(env_police)
print('')
print(env_drugs)
scores = []
for i in range(1000):
available_action = available_actions_with_env_help(current_state)
action = sample_next_action(available_action)
pl.plot(scores)
pl.xlabel('Number of iterations')
pl.ylabel('Reward gained')
pl.show()
7. (a) Write a program to implement a Recurrent Neural Network
Our next task is defining all the necessary variables and functions we’ll use in the RNN model.
Our model will take in the input sequence, process it through a hidden layer of 100 units, and
learning_rate = 0.0001
nepoch = 25
T = 50 # length of sequence
hidden_dim = 100
output_dim = 1
bptt_truncate = 5
min_clip_value = -10
max_clip_value = 10
Here,
U is the weight matrix for weights between input and hidden layers
V is the weight matrix for weights between hidden and output layers
W is the weight matrix for shared weights in the RNN layer (hidden layer)
Finally, we will define the activation function, sigmoid, to be used in the hidden layer:
def sigmoid(x):
return 1 / (1 + np.exp(-x))
Step 2: Train the Model
Now that we have defined our model, we can finally move on with training it on our sequence
data. We can subdivide the training process into smaller steps, namely:
We need to repeat these steps until convergence. If the model starts to overfit, stop! Or simply
We will do a forward pass through our RNN model and calculate the squared error for the
# calculate error
loss_per_record = (y - mulv)**2 / 2
loss += loss_per_record
loss = loss / float(y.shape[0])
Step 2.2: Check the loss on validation data
We will do the same thing for calculating the loss on validation data (in the same loop):
loss_per_record = (y - mulv)**2 / 2
val_loss += loss_per_record
val_loss = val_loss / float(y.shape[0])
print('Epoch: ', epoch + 1, ', Loss: ', loss, ', Val Loss: ', val_loss)
We will now start with the actual training of the network. In this, we will first do a forward pass to
calculate the errors and a backward pass to calculate the gradients and update them. Let me
show you these step-by-step so you can visualize how it works in your mind.
Step 2.3.1: Forward Pass
We first multiply the input with the weights between input and hidden layers
Add this with the multiplication of weights in the RNN layer. This is because we want to capture
Multiply this with the weights between hidden and output layers
At the output layer, we have a linear activation of the values so we do not explicitly pass the value
Save the state at the current layer and also the state at the previous timestep in a dictionary
Here is the code for doing a forward pass (note that it is in continuation of the above loop):
# train model
for i in range(Y.shape[0]):
x, y = X[i], Y[i]
layers = []
prev_s = np.zeros((hidden_dim, 1))
dU = np.zeros(U.shape)
dV = np.zeros(V.shape)
dW = np.zeros(W.shape)
dU_t = np.zeros(U.shape)
dV_t = np.zeros(V.shape)
dW_t = np.zeros(W.shape)
dU_i = np.zeros(U.shape)
dW_i = np.zeros(W.shape)
# forward pass
for t in range(T):
new_input = np.zeros(x.shape)
new_input[t] = x[t]
mulu = np.dot(U, new_input)
mulw = np.dot(W, prev_s)
add = mulw + mulu
s = sigmoid(add)
mulv = np.dot(V, s)
layers.append({'s':s, 'prev_s':prev_s})
prev_s = s
Step 2.3.2 : Backpropagate Error
After the forward propagation step, we calculate the gradients at each layer, and backpropagate
the errors. We will use truncated back propagation through time (TBPTT), instead of vanilla
backprop. It may sound complex but its actually pretty straight forward.
The core difference in BPTT versus backprop is that the backpropagation step is done for all the
time steps in the RNN layer. So if our sequence length is 50, we will backpropagate for all the
If you have guessed correctly, BPTT seems very computationally expensive. So instead of
computational power. Consider this ideologically similar to stochastic gradient descent, where we
# derivative of pred
dmulv = (mulv - y)
# backward pass
for t in range(T):
dV_t = np.dot(dmulv, np.transpose(layers[t]['s']))
dsv = np.dot(np.transpose(V), dmulv)
ds = dsv
dadd = add * (1 - add) * ds
new_input = np.zeros(x.shape)
new_input[t] = x[t]
dU_i = np.dot(U, new_input)
dx = np.dot(np.transpose(U), dmulu)
dU_t += dU_i
dW_t += dW_i
dV += dV_t
dU += dU_t
dW += dW_t
Step 2.3.3 : Update weights
Lastly, we update the weights with the gradients of weights calculated. One thing we have to
keep in mind that the gradients tend to explode if you don’t keep them in check.This is a
fundamental issue in training neural networks, called the exploding gradient problem. So we
have to clamp them in a range so that they dont explode. We can do it like this
# update
U -= learning_rate * dU
V -= learning_rate * dV
W -= learning_rate * dW
Looking good! Time to get the predictions and plot them to get a visual sense of what we’ve
designed.
We will do a forward pass through the trained weights to get our predictions:
preds = []
for i in range(Y.shape[0]):
x, y = X[i], Y[i]
prev_s = np.zeros((hidden_dim, 1))
# Forward pass
for t in range(T):
mulu = np.dot(U, x)
mulw = np.dot(W, prev_s)
add = mulw + mulu
s = sigmoid(add)
mulv = np.dot(V, s)
prev_s = s
preds.append(mulv)
preds = np.array(preds)
This was on the training data. How do we know if our model didn’t overfit? This is where the validation
set, which we created earlier, comes into play:
preds = []
for i in range(Y_val.shape[0]):
x, y = X_val[i], Y_val[i]
prev_s = np.zeros((hidden_dim, 1))
# For each time step...
for t in range(T):
mulu = np.dot(U, x)
mulw = np.dot(W, prev_s)
add = mulw + mulu
s = sigmoid(add)
mulv = np.dot(V, s)
prev_s = s
preds.append(mulv)
preds = np.array(preds)
Not bad. The predictions are looking impressive. The RMSE score on the validation data is respectable
as well:
Data Preparation
Before a univariate series can be modeled, it must be prepared.
The LSTM model will learn a function that maps a sequence of past observations as input to an output
observation. As such, the sequence of observations must be transformed into multiple examples from
which the LSTM can learn.
We can divide the sequence into multiple input/output patterns called samples, where three time steps are
used as input and one time step is used as output for the one-step prediction that is being learned.
1 X, y
2 10, 20, 30 40
3 20, 30, 40 50
4 30, 40, 50 60
5 ...
The split_sequence() function below implements this behavior and will split a given univariate sequence
into multiple samples where each sample has a specified number of time steps and the output is a single
time step.
1 # split a univariate sequence into samples
3 X, y = list(), list()
4 for i in range(len(sequence)):
6 end_ix = i + n_steps
9 break
12 X.append(seq_x)
13 y.append(seq_y)
6 X, y = list(), list()
7 for i in range(len(sequence)):
9 end_ix = i + n_steps
12 break
15 X.append(seq_x)
16 y.append(seq_y)
18
20 raw_seq = [10, 20, 30, 40, 50, 60, 70, 80, 90]
22 n_steps = 3
24 X, y = split_sequence(raw_seq, n_steps)
26 for i in range(len(X)):
27 print(X[i], y[i])
Running the example splits the univariate series into six samples where each sample has three input time
steps and one output time step.
1 [10 20 30] 40
2 [20 30 40] 50
3 [30 40 50] 60
4 [40 50 60] 70
5 [50 60 70] 80
6 [60 70 80] 90
8. (a) Write a program to perform object detection using Deep Learning
$ python deep_learning_object_detection.py \
--prototxt MobileNetSSD_deploy.prototxt.txt \
--model MobileNetSSD_deploy.caffemodel --image images/example_01.jpg
[INFO] loading model...
[INFO] computing object detections...
[INFO] loading model...
[INFO] computing object detections...
[INFO] car: 99.78%
[INFO] car: 99.25%
8. (b) Dog-Breed Classifier – Design and train a convolutional neural network to analyze images
of dogs and correctly identify their breeds. Use transfer learning and well-known architectures to
improve this model.
import numpy as np
def sigmoid(x):
return 1 / (1 + np.exp(-x))
def tanh(x):
return np.tanh(x)
def relu(x):
return np.maximum(0, x)
def softmax(x):
exp_x = np.exp(x)
return exp_x / np.sum(exp_x)
x = np.array([-1, 0, 1])
# Sigmoid
print(sigmoid(x)) # [0.26894142 0.5 0.73105858]
# Tanh
print(tanh(x)) # [-0.76159416 0. 0.76159416]
# ReLU
print(relu(x)) # [0 0 1]
# Leaky ReLU
print(leaky_relu(x)) # [-0.01 0. 1. ]
# Softmax
print(softmax(x)) # [0.09003057 0.24472847 0.66524096]
Output
[0.26894142 0.5 0.73105858]
[-0.76159416 0. 0.76159416]
[0 0 1]
[-0.01 0. 1. ]
[0.09003057 0.24472847 0.66524096]
9. (b) Write a program in TensorFlow to demonstrate different Loss functions.
import tensorflow as tf
from tensorflow.keras.losses import MeanAbsoluteError
mae_loss = MeanAbsoluteError()
print(mae_loss(y_true, y_pred).numpy())
import tensorflow as tf
from tensorflow.keras.losses import CategoricalCrossentropy
cross_entropy_loss = CategoricalCrossentropy()
print(cross_entropy_loss(y_true, y_pred).numpy())
model = Sequential([
Input(shape=(28,28,1,)),
Flatten(),
Dense(units=84, activation="relu"),
Dense(units=10, activation="softmax"),
])
print (model.summary())
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
flatten_1 (Flatten) (None, 784) 0
=================================================================
Total params: 66,790
Trainable params: 66,790
Non-trainable params: 0
Epoch 1/10
235/235 [==============================] - 2s 6ms/step - loss: 7.8607 - acc: 0.8184 - val_loss: 1.7445
- val_acc: 0.8789
Epoch 2/10
235/235 [==============================] - 1s 6ms/step - loss: 1.1011 - acc: 0.8854 - val_loss: 0.9082
- val_acc: 0.8821
Epoch 3/10
235/235 [==============================] - 1s 6ms/step - loss: 0.5729 - acc: 0.8998 - val_loss: 0.6689
- val_acc: 0.8927
Epoch 4/10
235/235 [==============================] - 1s 5ms/step - loss: 0.3911 - acc: 0.9203 - val_loss: 0.5406
- val_acc: 0.9097
Epoch 5/10
235/235 [==============================] - 1s 6ms/step - loss: 0.3016 - acc: 0.9306 - val_loss: 0.5024
- val_acc: 0.9182
Epoch 6/10
235/235 [==============================] - 1s 6ms/step - loss: 0.2443 - acc: 0.9405 - val_loss: 0.4571
- val_acc: 0.9242
Epoch 7/10
235/235 [==============================] - 1s 5ms/step - loss: 0.2076 - acc: 0.9469 - val_loss: 0.4173
- val_acc: 0.9282
Epoch 8/10
235/235 [==============================] - 1s 5ms/step - loss: 0.1852 - acc: 0.9514 - val_loss: 0.4335
- val_acc: 0.9287
Epoch 9/10
235/235 [==============================] - 1s 6ms/step - loss: 0.1576 - acc: 0.9577 - val_loss: 0.4217
- val_acc: 0.9342
Epoch 10/10
235/235 [==============================] - 1s 5ms/step - loss: 0.1455 - acc: 0.9597 - val_loss: 0.4151
- val_acc: 0.9344
10. Write a program to build an Artificial Neural Network by implementing the Back
propagation algorithm and test the same using appropriate data sets
1. Initialize Network
# Initialize a network
def initialize_network(n_inputs, n_hidden, n_outputs):
network = list()
hidden_layer = [{'weights':[random() for i in range(n_inputs + 1)]} for i in range(n_hidden)]
network.append(hidden_layer)
output_layer = [{'weights':[random() for i in range(n_hidden + 1)]} for i in range(n_outputs)]
network.append(output_layer)
return network
from random import seed
from random import random
# Initialize a network
def initialize_network(n_inputs, n_hidden, n_outputs):
network = list()
hidden_layer = [{'weights':[random() for i in range(n_inputs + 1)]} for i in range(n_hidden)]
network.append(hidden_layer)
output_layer = [{'weights':[random() for i in range(n_hidden + 1)]} for i in range(n_outputs)]
network.append(output_layer)
return network
seed(1)
network = initialize_network(2, 1, 2)
for layer in network:
print(layer)
[{'weights': [0.13436424411240122, 0.8474337369372327, 0.763774618976614]}]
[{'weights': [0.2550690257394217, 0.49543508709194095]}, {'weights': [0.4494910647887381,
0.651592972722763]}]
2. Forward Propagate
We can break forward propagation down into three parts:
1. Neuron Activation.
2. Neuron Transfer.
3. Forward Propagation.
[0.6629970129852887, 0.7253160725279748]