DLP Lab

The document outlines a series of practical programming tasks focused on implementing various neural network models using Python libraries such as TensorFlow and Keras. It includes detailed instructions for creating perceptrons, multi-layer perceptrons, convolutional neural networks, and other advanced models, along with examples and code snippets. Each task emphasizes the application of these models on datasets like Iris and MNIST for classification and prediction purposes.

Uploaded by

zoro39708
Copyright
© All Rights Reserved

List of Practical

S.No. Detailed Statement Date Signature


1. Write a program for creating a perceptron.
2. Write a program to implement multi-layer perceptron using
TensorFlow. Apply multi-layer perceptron (MLP) on the Iris dataset.
3. (a) Write a program to implement a Convolution Neural Network
(CNN) in Keras. Perform predictions using the trained Convolution
Neural Network (CNN).
(b) Write a program to build an Image Classifier with CIFAR-10 Data.
4. (a) Write a program to perform face detection using CNN.
(b) Write a program to demonstrate hyperparameter tuning in CNN.
(c) Predicting Bike-Sharing Patterns – Build and train neural
networks from scratch to predict the number of bike share users on
a given day.
5. Write a program to build auto-encoder in Keras.

6. Write a program to implement basic reinforcement learning algorithm to
teach a bot to reach its destination.
7. (a) Write a program to implement a Recurrent Neural Network
(b) Write a program to implement LSTM and perform time series
analysis using LSTM.
8. (a) Write a program to perform object detection using Deep Learning
(b) Dog-Breed Classifier – Design and train a convolutional neural
network to analyze images of dogs and correctly identify their breeds.
Use transfer learning and well-known architectures to improve this
model.
9. (a) Write a program to demonstrate different activation functions.
(b) Write a program in TensorFlow to demonstrate different Loss
functions.
10. Write a program to build an Artificial Neural Network by
implementing the Back propagation algorithm and test the same using
appropriate data sets
PROGRAMS

1. Write a program for creating a perceptron.

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import joblib
from matplotlib.colors import ListedColormap

plt.style.use("fivethirtyeight")

class Perceptron:
    def __init__(self, eta, epochs):
        self.weights = np.random.randn(3) * 1e-4  # small random weights: two inputs plus bias
        print(f"initial weights before training: \n{self.weights}")
        self.eta = eta  # learning rate
        self.epochs = epochs

    def activationFunction(self, inputs, weights):
        z = np.dot(inputs, weights)  # z = X . W
        return np.where(z > 0, 1, 0)  # step activation

    def fit(self, X, y):
        self.X = X
        self.y = y

        # Append a constant -1 column so that the last weight acts as the bias
        X_with_bias = np.c_[self.X, -np.ones((len(self.X), 1))]
        print(f"X with bias: \n{X_with_bias}")

        for epoch in range(self.epochs):
            print("--" * 10)
            print(f"for epoch: {epoch}")
            print("--" * 10)

            y_hat = self.activationFunction(X_with_bias, self.weights)  # forward pass
            print(f"predicted value after forward pass: \n{y_hat}")
            self.error = self.y - y_hat
            print(f"error: \n{self.error}")
            # Perceptron update rule: w <- w + eta * X^T (y - y_hat)
            self.weights = self.weights + self.eta * np.dot(X_with_bias.T, self.error)
            print(f"updated weights after epoch:\n{epoch}/{self.epochs} : \n{self.weights}")
            print("#####" * 10)

    def predict(self, X):
        X_with_bias = np.c_[X, -np.ones((len(X), 1))]
        return self.activationFunction(X_with_bias, self.weights)  # prediction

    def total_loss(self):
        total_loss = np.sum(self.error)  # sum of the signed errors from the last epoch
        print(f"total loss: {total_loss}")
        return total_loss

def prepare_data(df):
    X = df.drop("y", axis=1)
    y = df["y"]
    return X, y

AND = {
    "x1": [0, 0, 1, 1],
    "x2": [0, 1, 0, 1],
    "y": [0, 0, 0, 1],
}

df = pd.DataFrame(AND)

X, y = prepare_data(df)

ETA = 0.3  # learning rate, between 0 and 1
EPOCHS = 10

model = Perceptron(eta=ETA, epochs=EPOCHS)
model.fit(X, y)  # train the perceptron on the AND data

model.total_loss()
OUTPUT
2. Write a program to implement multi-layer perceptron using TensorFlow. Apply multi-layer
perceptron (MLP) on the Iris dataset.

import numpy as np

class Perceptron:

    def __init__(self, learning_rate, epochs):
        self.weights = None
        self.bias = None
        self.learning_rate = learning_rate
        self.epochs = epochs

Here we have initialized some instance variables: weights, bias, learning rate, and
epochs (iterations). Next, we define the activation function method.

    # Heaviside activation function
    def activation(self, z):
        return np.heaviside(z, 0)  # 1 where z > 0, otherwise 0 (the second argument is the value at z == 0)

The Heaviside activation method takes a single parameter, the weighted sum of inputs z, and returns
the corresponding output: 1 when z is positive and 0 otherwise.
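As a small illustration of the step behaviour described above, the sketch below applies np.heaviside to a few sample values. The sample array is made up for illustration and is not taken from the Iris data.

```python
import numpy as np

# Sample weighted sums (illustrative values only)
z = np.array([-2.0, 0.0, 0.5, 3.0])

# np.heaviside(z, h0) returns 0 where z < 0, h0 where z == 0, and 1 where z > 0
step_output = np.heaviside(z, 0)
print(step_output)
```

With the second argument set to 0, an input of exactly zero is mapped to class 0, which matches the activation method defined above.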

Now for the main part, training the Perceptron:

    def fit(self, X, y):
        n_features = X.shape[1]

        # Initializing weights and bias to zero
        self.weights = np.zeros((n_features))
        self.bias = 0

        # Iterating until the number of epochs
        for epoch in range(self.epochs):

            # Traversing through the entire training set
            for i in range(len(X)):
                # Dot product of sample i with the weights, plus the bias
                z = np.dot(X[i], self.weights) + self.bias
                y_pred = self.activation(z)  # Passing through an activation function

                # Updating weights and bias
                self.weights = self.weights + self.learning_rate * (y[i] - y_pred) * X[i]
                self.bias = self.bias + self.learning_rate * (y[i] - y_pred)

        return self.weights, self.bias

What happens here is simple. First, we find the number of features in the training data so that we know how
many weights to create. Then the weights and bias are initialized to zero. After that, for each training sample
we compute the weighted sum of inputs and pass it through the Heaviside activation function. Finally, the
weights and bias are updated after each sample, and the learned values are returned.
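To make the update loop concrete, here is a minimal pure-Python sketch of the same per-sample rule. It is an illustration under assumptions rather than the listing above: it is trained on the AND gate data from Program 1, and the learning rate of 1.0 and the 20 epochs are arbitrary choices.

```python
def step(z):
    """Step activation: 1 if z > 0, else 0."""
    return 1 if z > 0 else 0


def train_perceptron(samples, targets, learning_rate=1.0, epochs=20):
    """Online perceptron rule: weights and bias are updated after every sample."""
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(epochs):
        for x, target in zip(samples, targets):
            z = sum(w * xi for w, xi in zip(weights, x)) + bias
            error = target - step(z)  # 0 when the prediction is already correct
            weights = [w + learning_rate * error * xi for w, xi in zip(weights, x)]
            bias += learning_rate * error
    return weights, bias


# AND gate truth table (assumed example data)
and_inputs = [(0, 0), (0, 1), (1, 0), (1, 1)]
and_targets = [0, 0, 0, 1]

weights, bias = train_perceptron(and_inputs, and_targets)
predictions = [step(sum(w * xi for w, xi in zip(weights, x)) + bias) for x in and_inputs]
print(predictions)
```

Because the AND data is linearly separable, the updates stop changing the weights once every sample is classified correctly.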

Prediction method

    def predict(self, X):
        z = np.dot(X, self.weights) + self.bias
        return self.activation(z)

That's it, we are done with the Perceptron class. Now let's do the main part, which is to classify the Iris dataset.

Classifying Iris dataset using Perceptron

The Iris data consists of 150 samples from three species of Iris: Setosa, Versicolor, and Virginica. The
first column of the dataset represents sepal length, the second sepal width, the third petal length, and the
fourth petal width. For this classification we use only two of the four features.

Loading the dataset

from sklearn.datasets import load_iris

iris = load_iris()

Splitting the dataset

from sklearn.model_selection import train_test_split

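# Columns 0 and 1 are sepal length and sepal width; petal length and width are columns (2, 3)
X = iris.data[:, (0, 1)]
# 1 for Setosa, 0 otherwise (np.int was removed in NumPy 1.24, so the built-in int is used)
y = (iris.target == 0).astype(int)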

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.5, random_state=42)

Here we keep only two of the four feature columns and turn the target into a binary label (1 for Setosa, 0 for
the other two species). After that, X and y are split into training and testing sets of equal size.

Training and making predictions

Alright, now let's train our Perceptron algorithm,

perceptron = Perceptron(0.001, 100)

perceptron.fit(X_train, y_train)

pred = perceptron.predict(X_test)

Now let's see how much accuracy we have got:

from sklearn.metrics import accuracy_score

accuracy_score(y_test, pred)

-------

0.96

That's great: we got an accuracy of 96%. Changing the learning rate or the number of epochs may improve the
result further.

Classification report

from sklearn.metrics import classification_report

report = classification_report(pred, y_test, digits=2)


print(report)

------

              precision    recall  f1-score   support

         0.0       0.93      1.00      0.97        43
         1.0       1.00      0.91      0.95        32

    accuracy                           0.96        75
   macro avg       0.97      0.95      0.96        75
weighted avg       0.96      0.96      0.96        75
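One detail worth checking when reading this report: scikit-learn documents the signature as classification_report(y_true, y_pred), while the call above passes pred first. Accuracy is unaffected by the order, but precision and recall trade places. The pure-Python sketch below shows the effect on made-up labels; the helper function and the data are illustrative assumptions, not part of scikit-learn.

```python
def precision_recall(y_true, y_pred, positive=1):
    """Precision and recall for one class, computed from raw label lists."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Made-up labels: three true positives in the data, one of them is missed
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 0, 0]

print(precision_recall(y_true, y_pred))  # intended order
print(precision_recall(y_pred, y_true))  # swapped order
```

In the swapped call the two numbers are exchanged, so the precision and recall columns of a report produced that way should be read with the roles reversed.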
3. (a) Write a program to implement a Convolution Neural Network (CNN) in Keras. Perform
predictions using the trained Convolution Neural Network (CNN).

Let’s first download some packages we’ll need:

$ pip install tensorflow numpy mnist

Note: We don’t need to install the keras package because it now comes bundled with TensorFlow as its
official high-level API! Using TensorFlow’s Keras is now recommended over the
standalone keras package.

You should now be able to import these packages and poke around the MNIST dataset:

import numpy as np
import mnist
from tensorflow import keras

# The first time you run this might be a bit slow, since the
# mnist package has to download and cache the data.
train_images = mnist.train_images()
train_labels = mnist.train_labels()

print(train_images.shape) # (60000, 28, 28)


print(train_labels.shape) # (60000,)
2. Preparing the Data

Before we begin, we’ll normalize the image pixel values from [0, 255] to [-0.5, 0.5] to make our
network easier to train (using smaller, centered values usually leads to better results). We’ll also
reshape each image from (28, 28) to (28, 28, 1) because Keras requires the third dimension.

import numpy as np
import mnist

train_images = mnist.train_images()
train_labels = mnist.train_labels()
test_images = mnist.test_images()
test_labels = mnist.test_labels()

# Normalize the images.


train_images = (train_images / 255) - 0.5
test_images = (test_images / 255) - 0.5

# Reshape the images.


train_images = np.expand_dims(train_images, axis=3)
test_images = np.expand_dims(test_images, axis=3)

print(train_images.shape) # (60000, 28, 28, 1)


print(test_images.shape) # (10000, 28, 28, 1)

We’re ready to start building our CNN!


3. Building the Model

Every Keras model is either built using the Sequential class, which represents a linear stack of layers,
or the functional Model class, which is more customizable. We'll be using the
simpler Sequential model, since our CNN will be a linear stack of layers.

We start by instantiating a Sequential model:

from tensorflow.keras.models import Sequential

# WIP
model = Sequential([
# layers...
])

The Sequential constructor takes an array of Keras Layers. We’ll use 3 types of layers for our
CNN: Convolutional, Max Pooling, and Softmax.

This is the same CNN setup we used in my introduction to CNNs. Read that post if you’re not
comfortable with any of these 3 types of layers.

from tensorflow.keras.layers import Conv2D, MaxPooling2D, Dense, Flatten

num_filters = 8
filter_size = 3
pool_size = 2

model = Sequential([
Conv2D(num_filters, filter_size, input_shape=(28, 28, 1)),
MaxPooling2D(pool_size=pool_size),
Flatten(),
Dense(10, activation='softmax'),
])

• num_filters, filter_size, and pool_size are self-explanatory variables that set the hyperparameters for our
  CNN.
• The first layer in any Sequential model must specify the input_shape, so we do so on Conv2D. Once this input
  shape is specified, Keras will automatically infer the shapes of inputs for later layers.
• The output Softmax layer has 10 nodes, one for each class.
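The shapes that Keras infers for this model can be traced by hand. The sketch below is an illustrative calculation, assuming the Keras defaults of no padding and stride 1 for Conv2D and a stride equal to the pool size for MaxPooling2D; the helper name is hypothetical.

```python
def valid_output_size(input_size, window, stride=1):
    """Spatial size after an unpadded ('valid') convolution or pooling window."""
    return (input_size - window) // stride + 1


num_filters, filter_size, pool_size = 8, 3, 2

conv_side = valid_output_size(28, filter_size)                         # after Conv2D
pool_side = valid_output_size(conv_side, pool_size, stride=pool_size)  # after MaxPooling2D
flat_units = pool_side * pool_side * num_filters                       # after Flatten

# Trainable parameters: each filter has 3x3x1 weights plus one bias,
# and the Dense layer has one weight per (input, output) pair plus one bias per output.
conv_params = num_filters * (filter_size * filter_size * 1 + 1)
dense_params = flat_units * 10 + 10
total_params = conv_params + dense_params

print(conv_side, pool_side, flat_units, total_params)
```

These figures can be compared against the output of model.summary() for the model defined above.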

4. Compiling the Model

Before we can begin training, we need to configure the training process. We decide 3 key factors
during the compilation step:

• The optimizer. We'll stick with a pretty good default: the Adam gradient-based optimizer. Keras has many
  other optimizers you can look into as well.
• The loss function. Since we're using a Softmax output layer, we'll use the Cross-Entropy loss. Keras
  distinguishes between binary_crossentropy (2 classes) and categorical_crossentropy (>2 classes), so we'll
  use the latter. See all Keras losses.
• A list of metrics. Since this is a classification problem, we'll just have Keras report on the accuracy metric.
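To see what the cross-entropy loss measures, here is a small NumPy sketch of softmax followed by categorical cross-entropy for a single sample. It is a hand-written illustration, not the Keras implementation; the logits and the eps constant are assumptions chosen for the example.

```python
import numpy as np


def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    shifted = logits - np.max(logits)  # subtract the max for numerical stability
    exps = np.exp(shifted)
    return exps / np.sum(exps)


def categorical_crossentropy(one_hot_target, probabilities, eps=1e-12):
    """Negative log-probability assigned to the true class."""
    return float(-np.sum(one_hot_target * np.log(probabilities + eps)))


logits = np.array([2.0, 1.0, 0.1])   # illustrative scores for 3 classes
target = np.array([1.0, 0.0, 0.0])   # the true class is class 0

probs = softmax(logits)
loss = categorical_crossentropy(target, probs)
print(probs, loss)

# A 10-class model that guesses uniformly has a loss of ln(10), roughly 2.30
uniform_loss = categorical_crossentropy(np.eye(10)[3], np.full(10, 0.1))
print(uniform_loss)
```

The uniform case gives a useful reference point: a 10-class model whose loss stays near 2.30 has not learned anything beyond guessing.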

Here’s what that compilation looks like:

model.compile(
'adam',
loss='categorical_crossentropy',
metrics=['accuracy'],
)

Onwards!

5. Training the Model

Training a model in Keras literally consists only of calling fit() and specifying some parameters. There
are a lot of possible parameters, but we’ll only supply these:

• The training data (images and labels), commonly known as X and Y, respectively.
• The number of epochs (iterations over the entire dataset) to train for.
• The validation data (or test data), which is used during training to periodically measure the network's
  performance against data it hasn't seen before.

There’s one thing we have to be careful about: Keras expects the training targets to be 10-dimensional
vectors, since there are 10 nodes in our Softmax output layer. Right now,
our train_labels and test_labels arrays contain single integers representing the class for each image:

import mnist

train_labels = mnist.train_labels()
print(train_labels[0]) # 5

Conveniently, Keras has a utility method that fixes this exact issue: to_categorical. It turns our array of
class integers into an array of one-hot vectors instead. For example, 2 would become [0, 0, 1, 0, 0,
0, 0, 0, 0, 0] (it’s zero-indexed).
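The same conversion can be sketched in plain NumPy, which makes the one-hot layout easy to inspect. This is an illustrative equivalent rather than the Keras implementation; the function name and the sample labels are assumptions.

```python
import numpy as np


def one_hot(labels, num_classes):
    """Row i of the identity matrix is the one-hot vector for class i."""
    return np.eye(num_classes, dtype="float32")[labels]


labels = np.array([2, 5, 0])  # illustrative class integers
encoded = one_hot(labels, 10)

print(encoded[0])                  # one-hot vector for class 2
print(np.argmax(encoded, axis=1))  # argmax recovers the original integers
```

Taking the argmax of each row reverses the encoding, which is the same trick used later to turn softmax outputs into digit predictions.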

Here’s what that looks like:

from tensorflow.keras.utils import to_categorical

model.fit(
train_images,
to_categorical(train_labels),
epochs=3,
validation_data=(test_images, to_categorical(test_labels)),
)

We can now put everything together to train our network:


import numpy as np
import mnist
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Dense, Flatten
from tensorflow.keras.utils import to_categorical

train_images = mnist.train_images()
train_labels = mnist.train_labels()
test_images = mnist.test_images()
test_labels = mnist.test_labels()

# Normalize the images.


train_images = (train_images / 255) - 0.5
test_images = (test_images / 255) - 0.5

# Reshape the images.


train_images = np.expand_dims(train_images, axis=3)
test_images = np.expand_dims(test_images, axis=3)

num_filters = 8
filter_size = 3
pool_size = 2

# Build the model.


model = Sequential([
Conv2D(num_filters, filter_size, input_shape=(28, 28, 1)),
MaxPooling2D(pool_size=pool_size),
Flatten(),
Dense(10, activation='softmax'),
])

# Compile the model.


model.compile(
'adam',
loss='categorical_crossentropy',
metrics=['accuracy'],
)

# Train the model.


model.fit(
train_images,
to_categorical(train_labels),
epochs=3,
validation_data=(test_images, to_categorical(test_labels)),
)

Running that code on the full MNIST dataset gives us results like this:

Epoch 1
loss: 0.2433 - acc: 0.9276 - val_loss: 0.1176 - val_acc: 0.9634
Epoch 2
loss: 0.1184 - acc: 0.9648 - val_loss: 0.0936 - val_acc: 0.9721
Epoch 3
loss: 0.0930 - acc: 0.9721 - val_loss: 0.0778 - val_acc: 0.9744

We achieve 97.4% test accuracy with this simple CNN!


6. Using the Model

Now that we have a working, trained model, let’s put it to use. The first thing we’ll do is save it to disk
so we can load it back up anytime:

model.save_weights('cnn.h5')

We can now reload the trained model whenever we want by rebuilding it and loading in the saved
weights:

from tensorflow.keras.models import Sequential


from tensorflow.keras.layers import Conv2D, MaxPooling2D, Dense, Flatten

num_filters = 8
filter_size = 3
pool_size = 2

# Build the model.


model = Sequential([
Conv2D(num_filters, filter_size, input_shape=(28, 28, 1)),
MaxPooling2D(pool_size=pool_size),
Flatten(),
Dense(10, activation='softmax'),
])

# Load the model's saved weights.


model.load_weights('cnn.h5')

Using the trained model to make predictions is easy: we pass an array of inputs to predict() and it
returns an array of outputs. Keep in mind that the output of our network is 10 probabilities (because
of softmax), so we’ll use np.argmax() to turn those into actual digits.

# Predict on the first 5 test images.


predictions = model.predict(test_images[:5])

# Print our model's predictions.


print(np.argmax(predictions, axis=1)) # [7, 2, 1, 0, 4]

# Check our predictions against the ground truths.


print(test_labels[:5]) # [7, 2, 1, 0, 4]
7. Extensions

There's much more we can do to experiment with and improve our network. The official Keras
MNIST CNN example, for instance, reaches about 99% test accuracy after 15 epochs. Some examples of
modifications you could make to our CNN include:

Network Depth

What happens if we add or remove Convolutional layers? How does that affect training and/or the
model’s final performance?

model = Sequential([
Conv2D(num_filters, filter_size, input_shape=(28, 28, 1)),
Conv2D(num_filters, filter_size), MaxPooling2D(pool_size=pool_size),
Flatten(),
Dense(10, activation='softmax'),
])
Dropout

What if we tried adding Dropout layers, which are commonly used to prevent overfitting?

from tensorflow.keras.layers import Dropout


model = Sequential([
Conv2D(num_filters, filter_size, input_shape=(28, 28, 1)),
MaxPooling2D(pool_size=pool_size),
Dropout(0.5), Flatten(),
Dense(10, activation='softmax'),
])
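For intuition about what a Dropout layer does while training, the NumPy sketch below implements inverted dropout: a random subset of activations is zeroed and the survivors are scaled by 1 / (1 - rate) so the expected sum stays the same. It is a simplified illustration with assumed names, not the Keras source, and at inference time dropout is simply switched off.

```python
import numpy as np


def dropout(activations, rate, rng):
    """Inverted dropout: zero a fraction `rate` of units, rescale the rest."""
    keep_mask = rng.random(activations.shape) >= rate
    return activations * keep_mask / (1.0 - rate)


rng = np.random.default_rng(0)    # fixed seed so the run is repeatable
activations = np.ones((4, 5))     # illustrative layer output
dropped = dropout(activations, rate=0.5, rng=rng)
print(dropped)
```

With a rate of 0.5, every entry of the result is either 0 (dropped) or 2 (kept and rescaled).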
Fully-connected Layers

What if we add fully-connected layers between the Convolutional outputs and the final Softmax layer?
This is something commonly done in CNNs used for Computer Vision.

from tensorflow.keras.layers import Dense


model = Sequential([
Conv2D(num_filters, filter_size, input_shape=(28, 28, 1)),
MaxPooling2D(pool_size=pool_size),
Flatten(),
Dense(64, activation='relu'), Dense(10, activation='softmax'),
])
Convolution Parameters

What if we play with the Conv2D parameters? For example:

# These can be changed, too!


num_filters = 8
filter_size = 3

model = Sequential([
# See https://keras.io/layers/convolutional/#conv2d for more info.
Conv2D(
num_filters,
filter_size,
input_shape=(28, 28, 1),
strides=2, padding='same', activation='relu', ),
MaxPooling2D(pool_size=pool_size),
Flatten(),
Dense(10, activation='softmax'),
])
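The effect of strides and padding on the layer's output size can be worked out before training. The sketch below uses the standard output-size rules for 'valid' and 'same' padding; the helper function is a hypothetical name introduced for illustration.

```python
import math


def conv_output_size(input_size, kernel_size, stride=1, padding="valid"):
    """Spatial output size of a convolution along one dimension."""
    if padding == "same":
        # Input is padded so that only the stride shrinks the output
        return math.ceil(input_size / stride)
    if padding == "valid":
        # No padding: the window must fit entirely inside the input
        return (input_size - kernel_size) // stride + 1
    raise ValueError(f"unknown padding: {padding}")


print(conv_output_size(28, 3))                            # default Conv2D settings
print(conv_output_size(28, 3, stride=2, padding="same"))  # settings used above
print(conv_output_size(28, 3, stride=2, padding="valid"))
```

With strides=2 and padding='same', the 28x28 input becomes 14x14 after the convolution, so the following 2x2 max pooling reduces it to 7x7.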
3. (b) Write a program to build an Image Classifier with CIFAR-10 Data.

Evaluation:
We have 10 classes, so if we pick an image and randomly guess its class, we have a 1/10 probability of being
right. This 10% accuracy is the baseline that any trained model should beat.
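The 1/10 baseline can be checked with a quick simulation. The sketch below is illustrative only: it uses synthetic, evenly distributed labels rather than the CIFAR-10 data, and the function name, sample count and seed are assumptions.

```python
import random


def random_guess_accuracy(num_classes, num_samples, seed=0):
    """Accuracy of uniform random guessing on evenly distributed labels."""
    rng = random.Random(seed)
    labels = [i % num_classes for i in range(num_samples)]
    guesses = [rng.randrange(num_classes) for _ in range(num_samples)]
    correct = sum(1 for label, guess in zip(labels, guesses) if label == guess)
    return correct / num_samples


baseline = random_guess_accuracy(num_classes=10, num_samples=100_000)
print(f"random-guess accuracy: {baseline:.3f}")
```

The simulated accuracy lands close to 0.10, which is the number to compare the trained classifier against.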

In [1]:

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

import tensorflow as tf
from tensorflow.keras.datasets import cifar10
from tensorflow.keras.utils import to_categorical

from tensorflow.keras.models import Sequential


from tensorflow.keras.layers import Dense, Conv2D, MaxPool2D, Flatten, Dropout, BatchNormalization
from tensorflow.keras.callbacks import EarlyStopping
from tensorflow.keras.preprocessing.image import ImageDataGenerator

from sklearn.metrics import ConfusionMatrixDisplay


from sklearn.metrics import classification_report, confusion_matrix
Load the data
In [2]:

(X_train, y_train), (X_test, y_test) = cifar10.load_data()

print(f"X_train shape: {X_train.shape}")


print(f"y_train shape: {y_train.shape}")
print(f"X_test shape: {X_test.shape}")
print(f"y_test shape: {y_test.shape}")
Downloading data from https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz
170500096/170498071 [==============================] - 6s 0us/step
170508288/170498071 [==============================] - 6s 0us/step
X_train shape: (50000, 32, 32, 3)
y_train shape: (50000, 1)
X_test shape: (10000, 32, 32, 3)
y_test shape: (10000, 1)
Data Visualization
In [3]:

# Define the labels of the dataset
labels = ['airplane', 'automobile', 'bird', 'cat', 'deer',
          'dog', 'frog', 'horse', 'ship', 'truck']

# Let's view more images in a grid format
# Define the dimensions of the plot grid
W_grid = 10
L_grid = 10

# plt.subplots returns the figure object and an array of axes objects;
# we can use the axes to plot specific images at various locations
fig, axes = plt.subplots(L_grid, W_grid, figsize=(17, 17))

axes = axes.ravel()  # flatten the 10 x 10 grid of axes into a 100-element array

n_train = len(X_train)  # get the length of the train dataset

# Fill every cell of the grid with a randomly chosen training image
for i in np.arange(0, W_grid * L_grid):
    # Select a random index from 0 to n_train - 1
    index = np.random.randint(0, n_train)
    # Read and display the image with the selected index
    axes[i].imshow(X_train[index])
    label_index = int(y_train[index][0])  # y_train has shape (n, 1)
    axes[i].set_title(labels[label_index], fontsize=8)
    axes[i].axis('off')

plt.subplots_adjust(hspace=0.4)

In [4]:

classes_name = ['Airplane', 'Automobile', 'Bird', 'Cat', 'Deer', 'Dog', 'Frog', 'Horse', 'Ship', 'Truck']

classes, counts = np.unique(y_train, return_counts=True)


plt.barh(classes_name, counts)
plt.title('Class distribution in training set')
Out[4]:

Text(0.5, 1.0, 'Class distribution in training set')

In [5]:

classes, counts = np.unique(y_test, return_counts=True)


plt.barh(classes_name, counts)
plt.title('Class distribution in testing set')
Out[5]:

Text(0.5, 1.0, 'Class distribution in testing set')

The classes are equally distributed.

Data Preprocessing
In [6]:

# Scale the data


X_train = X_train / 255.0
X_test = X_test / 255.0

# Transform target variable into one-hot encoding


y_cat_train = to_categorical(y_train, 10)
y_cat_test = to_categorical(y_test, 10)
In [7]:

y_cat_train
Out[7]:

array([[0., 0., 0., ..., 0., 0., 0.],


[0., 0., 0., ..., 0., 0., 1.],
[0., 0., 0., ..., 0., 0., 1.],
...,
[0., 0., 0., ..., 0., 0., 1.],
[0., 1., 0., ..., 0., 0., 0.],
[0., 1., 0., ..., 0., 0., 0.]], dtype=float32)
Model Building
In [8]:


INPUT_SHAPE = (32, 32, 3)


KERNEL_SIZE = (3, 3)
model = Sequential()

# Convolutional Layer
model.add(Conv2D(filters=32, kernel_size=KERNEL_SIZE, input_shape=INPUT_SHAPE, activation='relu',
padding='same'))
model.add(BatchNormalization())
model.add(Conv2D(filters=32, kernel_size=KERNEL_SIZE, input_shape=INPUT_SHAPE, activation='relu',
padding='same'))
model.add(BatchNormalization())
# Pooling layer
model.add(MaxPool2D(pool_size=(2, 2)))
# Dropout layers
model.add(Dropout(0.25))

model.add(Conv2D(filters=64, kernel_size=KERNEL_SIZE, input_shape=INPUT_SHAPE, activation='relu',


padding='same'))
model.add(BatchNormalization())
model.add(Conv2D(filters=64, kernel_size=KERNEL_SIZE, input_shape=INPUT_SHAPE, activation='relu',
padding='same'))
model.add(BatchNormalization())
model.add(MaxPool2D(pool_size=(2, 2)))
model.add(Dropout(0.25))

model.add(Conv2D(filters=128, kernel_size=KERNEL_SIZE, input_shape=INPUT_SHAPE, activation='relu',


padding='same'))
model.add(BatchNormalization())
model.add(Conv2D(filters=128, kernel_size=KERNEL_SIZE, input_shape=INPUT_SHAPE, activation='relu',
padding='same'))
model.add(BatchNormalization())
model.add(MaxPool2D(pool_size=(2, 2)))
model.add(Dropout(0.25))

model.add(Flatten())
# model.add(Dropout(0.2))
model.add(Dense(128, activation='relu'))
model.add(Dropout(0.25))
model.add(Dense(10, activation='softmax'))

METRICS = [
'accuracy',
tf.keras.metrics.Precision(name='precision'),
tf.keras.metrics.Recall(name='recall')
]
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=METRICS)

Model Evaluation
In [12]:

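# r is the History object returned by model.fit(...); the training cells are not included in this listing
plt.figure(figsize=(12, 16))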

plt.subplot(4, 2, 1)
plt.plot(r.history['loss'], label='Loss')
plt.plot(r.history['val_loss'], label='val_Loss')
plt.title('Loss Function Evolution')
plt.legend()

plt.subplot(4, 2, 2)
plt.plot(r.history['accuracy'], label='accuracy')
plt.plot(r.history['val_accuracy'], label='val_accuracy')
plt.title('Accuracy Function Evolution')
plt.legend()

plt.subplot(4, 2, 3)
plt.plot(r.history['precision'], label='precision')
plt.plot(r.history['val_precision'], label='val_precision')
plt.title('Precision Function Evolution')
plt.legend()

plt.subplot(4, 2, 4)
plt.plot(r.history['recall'], label='recall')
plt.plot(r.history['val_recall'], label='val_recall')
plt.title('Recall Function Evolution')
plt.legend()
Out[12]:

<matplotlib.legend.Legend at 0x7fc610996bd0>
4. (a) Write a program to perform face detection using CNN.

# Deep Learning CNN model to recognize face


'''This script uses a database of images and creates CNN model on top of it to test
if the given image is recognized correctly or not'''

'''####### IMAGE PRE-PROCESSING for TRAINING and TESTING data #######'''

# Specifying the folder where images are present


TrainingImagePath='/Users/farukh/Python Case Studies/Face Images/Final Training Images'

from keras.preprocessing.image import ImageDataGenerator


# Understand more about ImageDataGenerator at below link
# https://blog.keras.io/building-powerful-image-classification-models-using-very-little-data.html

# Defining pre-processing transformations on raw images of training data


# These hyper parameters helps to generate slightly twisted versions
# of the original image, which leads to a better model, since it learns
# on the good and bad mix of images
train_datagen = ImageDataGenerator(
shear_range=0.1,
zoom_range=0.1,
horizontal_flip=True)

# Defining pre-processing transformations on raw images of testing data


# No transformations are done on the testing images
test_datagen = ImageDataGenerator()

# Generating the Training Data


training_set = train_datagen.flow_from_directory(
TrainingImagePath,
target_size=(64, 64),
batch_size=32,
class_mode='categorical')

# Generating the Testing Data


test_set = test_datagen.flow_from_directory(
TrainingImagePath,
target_size=(64, 64),
batch_size=32,
class_mode='categorical')

# Printing class labels for each face


test_set.class_indices

Creating a mapping for index and face names


The above class_index dictionary has face names as keys and the numeric mapping as values. We need
to swap it, because the classifier model will return the answer as the numeric mapping and we need to get
the face-name out of it.

Also, since this is a multi-class classification problem, we are counting the number of unique faces, as that
will be used as the number of output neurons in the output layer of fully connected ANN classifier.

'''############ Creating lookup table for all faces ############'''
# class_indices have the numeric tag for each face
TrainClasses=training_set.class_indices

# Storing the face and the numeric tag for future reference
ResultMap={}
for faceValue,faceName in zip(TrainClasses.values(),TrainClasses.keys()):
    ResultMap[faceValue]=faceName

# Saving the face map for future reference
import pickle
with open("ResultsMap.pkl", 'wb') as fileWriteStream:
    pickle.dump(ResultMap, fileWriteStream)

# The model will give answer as a numeric tag
# This mapping will help to get the corresponding face name for it
print("Mapping of Face and its ID",ResultMap)

# The number of neurons for the output layer is equal to the number of faces
OutputNeurons=len(ResultMap)
print('\n The Number of output neurons: ', OutputNeurons)
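The key/value swap performed by the loop above can also be written as a dict comprehension. The sketch below is a standalone illustration: the folder names in class_indices are hypothetical placeholders, since the real names come from the training image directory.

```python
# Hypothetical class_indices, shaped like the dictionary returned by flow_from_directory
class_indices = {"face1": 0, "face2": 1, "face3": 2}

# Invert the mapping: numeric tag -> face name
result_map = {index: name for name, index in class_indices.items()}

# A classifier returns the numeric tag; the inverted map recovers the name
predicted_index = 1
print(result_map[predicted_index])

# The size of the map is the number of output neurons needed
print(len(result_map))
```

The inversion is safe here because every class has a unique numeric tag, so no two names collide on the same key.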

Creating the CNN face recognition model


In the code snippet below, I have created a CNN model with:

• 2 hidden layers of convolution
• 2 hidden layers of max pooling
• 1 layer of flattening
• 1 hidden ANN layer
• 1 output layer with 16 neurons (one for each face)

You can increase or decrease the number of convolution, max pooling, and hidden ANN layers, and the number of
neurons in them.

Just keep in mind, the more layers/neurons you add, the slower the model becomes.

Also, when you have a large number of images, on the order of 50K and above, your laptop's CPU might
not be able to train on that many images efficiently. You will need a GPU-enabled laptop, or cloud
services like AWS or Google Cloud.

Since the data we have used for the demonstration is small, containing only 244 images for training, you
can run it on your laptop easily.

Apart from selecting the best number of layers and the number of neurons in each layer, there are
some hyperparameters which need to be tuned as well.

Take a quick look at some of the important hyperparameters:

• Filters=32: This number indicates how many filters we use to look at the image pixels during
  the convolution step. Some filters may catch sharp edges, some may catch color variations,
  some may catch outlines, and so on. In the end, we get important information from the images. In
  the first layer 32 filters is a common choice, and the count is then doubled in each following layer:
  64 in the next layer, 128 in the one after, and so on.
• kernel_size=(5,5): The size of the sliding window during convolution. In this case
  study we use a 5x5 pixel sliding window.
• strides=(1, 1): How far the sliding window moves at each step during convolution. We use
  the lowest setting of 1x1 pixels, which means the 5x5 convolution window (kernel_size) slides by 1
  pixel along the x-axis and 1 pixel along the y-axis until the whole image is scanned.
• input_shape=(64,64,3): Images are matrices of RGB color codes. During data pre-processing
  we resized the images to 64x64, hence the expected shape is 64x64x3: three arrays of 64x64,
  one for each of the red, green, and blue channels.
• kernel_initializer='uniform': When the neurons start their computation, some algorithm has to
  decide the initial value of each weight. This parameter specifies it. You can choose other values
  such as 'normal' or 'glorot_uniform'.
• activation='relu': This specifies the activation function for the calculations inside each neuron.
  You can choose values like 'relu', 'tanh', 'sigmoid', etc.
• optimizer='adam': The optimizer searches for the optimum value of each weight in the neural
  network. 'adam' is one of the most widely used optimizers; another one is 'rmsprop'.
• batch_size=10: This specifies how many rows are passed to the network in one go, after which
  the loss is calculated and the network adjusts its weights based on the errors.
  When all the rows have been passed in batches of 10 rows each, we call that one epoch, or one
  full data cycle. This is also known as mini-batch gradient descent. A small batch_size makes the
  network look at the data slowly, like 2 or 4 rows at a time, which could lead to overfitting; a
  large value like 20 or 50 rows at a time makes the network look at the data fast, which could
  lead to underfitting. Hence a proper value must be chosen using hyperparameter tuning.
• Epochs=10: The same activity of adjusting weights is repeated 10 times, as specified by this
  parameter. In simple terms, the network looks at the full training data 10 times and adjusts its
  weights.
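The relationship between batch size and epochs described above comes down to simple arithmetic: one epoch needs enough batches to cover every training image once. The sketch below works this out for the 244 training images mentioned earlier, using both the batch size of 10 from the text and the batch size of 32 passed to flow_from_directory in the listing. The helper name is an assumption.

```python
import math


def steps_per_epoch(num_samples, batch_size):
    """Number of batches needed to see every sample once (last batch may be smaller)."""
    return math.ceil(num_samples / batch_size)


num_images = 244  # training-set size quoted in the text

for batch_size in (10, 32):
    print(batch_size, steps_per_epoch(num_images, batch_size))
```

Comparing these figures with the steps_per_epoch value passed to the training call shows how many passes over the data a single reported epoch actually contains.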
'''######################## Create CNN deep learning model ########################'''
from keras.models import Sequential
from keras.layers import Convolution2D
from keras.layers import MaxPool2D
from keras.layers import Flatten
from keras.layers import Dense

'''Initializing the Convolutional Neural Network'''


classifier= Sequential()

''' STEP--1 Convolution


# Adding the first layer of CNN
# we are using the format (64,64,3) because we are using TensorFlow backend
# It means 3 matrix of size (64X64) pixels representing Red, Green and Blue components of pixels
'''
classifier.add(Convolution2D(32, kernel_size=(5, 5), strides=(1, 1), input_shape=(64,64,3),
activation='relu'))

'''# STEP--2 MAX Pooling'''


classifier.add(MaxPool2D(pool_size=(2,2)))

'''############## ADDITIONAL LAYER of CONVOLUTION for better accuracy #################'''


classifier.add(Convolution2D(64, kernel_size=(5, 5), strides=(1, 1), activation='relu'))

classifier.add(MaxPool2D(pool_size=(2,2)))

'''# STEP--3 FLattening'''


classifier.add(Flatten())

'''# STEP--4 Fully Connected Neural Network'''


classifier.add(Dense(64, activation='relu'))

classifier.add(Dense(OutputNeurons, activation='softmax'))

'''# Compiling the CNN'''


#classifier.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
classifier.compile(loss='categorical_crossentropy', optimizer = 'adam', metrics=["accuracy"])

###########################################################
import time
# Measuring the time taken by the model to train
StartTime=time.time()
# Starting the model training
# fit_generator() is deprecated in recent Keras releases; fit() accepts generators directly
classifier.fit(
    training_set,
    steps_per_epoch=30,
    epochs=10,
    validation_data=test_set,
    validation_steps=10)

EndTime=time.time()
print("###### Total Time Taken: ", round((EndTime-StartTime)/60), 'Minutes ######')
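To see how many values reach the Flatten step in the model above, the layer sizes can be traced by hand. The sketch below assumes the Keras defaults (no padding on the convolutions, pooling stride equal to the pool size); the helper names conv_out and pool_out are made up for this illustration.

```python
def conv_out(size, kernel, stride=1):
    """Output width of an unpadded ('valid') convolution."""
    return (size - kernel) // stride + 1

def pool_out(size, pool):
    """Output width of non-overlapping max pooling."""
    return size // pool

size = 64                    # input images are 64x64
size = conv_out(size, 5)     # first 5x5 convolution
size = pool_out(size, 2)     # first 2x2 max pooling
size = conv_out(size, 5)     # second 5x5 convolution
size = pool_out(size, 2)     # second 2x2 max pooling
flat_features = size * size * 64   # 64 filters in the second convolution
print(size, flat_features)
```

The first Dense layer therefore receives one flattened vector of flat_features values per image.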
4. (b) Write a program to demonstrate hyperparameter tuning in CNN.

# Libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import tensorflow as tf
import cv2
from PIL import Image

from sklearn.model_selection import GridSearchCV


from sklearn.model_selection import KFold
from keras.models import Sequential
from keras.layers import Dense
from keras.wrappers.scikit_learn import KerasClassifier
from keras.optimizers import Adam
from keras.layers import Dropout
In [2]:

#importing training dataset


train=pd.read_csv('../input/gtsrb-german-traffic-sign/Train.csv')
X_train=train['Path']
y_train=train.ClassId
train
Out[2]:

       Width  Height  Roi.X1  Roi.Y1  Roi.X2  Roi.Y2  ClassId  Path
0         27      26       5       5      22      20       20  Train/20/00020_00000_00000.png
1         28      27       5       6      23      22       20  Train/20/00020_00000_00001.png
2         29      26       6       5      24      21       20  Train/20/00020_00000_00002.png
3         28      27       5       6      23      22       20  Train/20/00020_00000_00003.png
4         28      26       5       5      23      21       20  Train/20/00020_00000_00004.png
...      ...     ...     ...     ...     ...     ...      ...  ...
39204     52      56       5       6      47      51       42  Train/42/00042_00007_00025.png
39205     56      58       5       5      51      53       42  Train/42/00042_00007_00026.png
39206     58      62       5       6      53      57       42  Train/42/00042_00007_00027.png
39207     63      69       5       7      58      63       42  Train/42/00042_00007_00028.png
39208     68      69       7       6      62      63       42  Train/42/00042_00007_00029.png

39209 rows × 8 columns

In [3]:

data_dir = "../input/gtsrb-german-traffic-sign"
train_imgpath= list((data_dir + '/' + str(train.Path[i])) for i in range(len(train.Path)))
In [4]:

for i in range(0,9):
    plt.subplot(331+i)
    # pick a random valid index (the upper bound of randint is exclusive)
    seed=np.random.randint(0,len(train_imgpath))
    im = Image.open(train_imgpath[seed])
    plt.imshow(im)

plt.show()

Preprocessing image-
converting images into arrays of the form (28,28,3)

In [5]:

train_data=[]
train_labels=[]

path = "../input/gtsrb-german-traffic-sign/"
for i in range(len(train.Path)):
    image=cv2.imread(train_imgpath[i])
    # OpenCV loads images as BGR; convert to RGB before handing them to PIL
    image=cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
    image_from_array = Image.fromarray(image, 'RGB')
    size_image = image_from_array.resize((28,28))
    train_data.append(np.array(size_image))
    train_labels.append(train.ClassId[i])
X=np.array(train_data)
y=np.array(train_labels)
In [6]:

#Spliting the images into train and validation sets

from sklearn.model_selection import train_test_split


X_train, X_val, y_train, y_val = train_test_split( X, y, test_size=0.20, random_state=7777)
In [7]:

X_train = X_train.astype('float32')/255
X_val = X_val.astype('float32')/255

#Using one-hot encoding for the train and validation labels
from keras.utils import to_categorical
y_train = to_categorical(y_train, 43)
y_val = to_categorical(y_val, 43)
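What one-hot encoding does to the labels can be sketched in plain Python. This is an illustration of the idea only, not the Keras implementation; the function name one_hot and the three-class example are assumptions.

```python
def one_hot(labels, num_classes):
    """Return one row per label with a single 1.0 at the label's index."""
    rows = []
    for label in labels:
        row = [0.0] * num_classes
        row[label] = 1.0
        rows.append(row)
    return rows

encoded = one_hot([0, 2, 1], num_classes=3)
print(encoded)
```

With the traffic-sign data the same idea gives one row of 43 values per image, matching the 43 output neurons of the softmax layer.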
CNN Model-
Grid Search to determine the layers and neurons in each layer in the sequential model.

In [8]:


def create_model(layers):
    cnn = tf.keras.models.Sequential()
    cnn.add(tf.keras.layers.Conv2D(filters=32, kernel_size=3, padding="same", activation="relu",
                                   input_shape=[28, 28, 3]))
    cnn.add(tf.keras.layers.MaxPool2D(pool_size=2, strides=2, padding='valid'))
    cnn.add(tf.keras.layers.Conv2D(filters=32, kernel_size=3, padding="same", activation="relu"))
    cnn.add(tf.keras.layers.MaxPool2D(pool_size=2, strides=2, padding='valid'))
    cnn.add(tf.keras.layers.Flatten())

    # one Dense layer per entry in `layers`
    for i, nodes in enumerate(layers):
        cnn.add(tf.keras.layers.Dense(units=nodes, activation='relu'))

    cnn.add(tf.keras.layers.Dense(units=43, activation='softmax'))

    # 43 mutually exclusive classes with one-hot labels, so use categorical cross-entropy
    cnn.compile(optimizer = 'Adam', loss = 'categorical_crossentropy', metrics = ['accuracy'])

    return cnn

model = KerasClassifier(build_fn=create_model, verbose=1)

layers = [[128],(256, 128),(200, 150, 120)]
param_grid = dict(layers=layers)
grid = GridSearchCV(estimator=model, param_grid=param_grid, verbose=1)
grid_results = grid.fit(X_train,y_train, validation_data=(X_val, y_val))
print("Best: {0}, using {1}".format(grid_results.best_score_, grid_results.best_params_))
means = grid_results.cv_results_['mean_test_score']
stds = grid_results.cv_results_['std_test_score']
params = grid_results.cv_results_['params']
for mean, stdev, param in zip(means, stds, params):
    print('{0} ({1}) with: {2}'.format(mean, stdev, param))
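A rough way to estimate the cost of the grid search is to count the model fits it triggers. The sketch below assumes 5-fold cross-validation, which is the scikit-learn default when cv is not given (as of version 0.22); it does not count the final refit on the full training set.

```python
layer_options = [[128], (256, 128), (200, 150, 120)]
cv_folds = 5  # assumption: default cv of GridSearchCV in scikit-learn >= 0.22

# every candidate architecture is trained once per fold
fits = [(tuple(layers), fold) for layers in layer_options for fold in range(cv_folds)]
total_fits = len(fits)
print(total_fits)
```

Each extra hyperparameter added to param_grid multiplies this count, so grids grow expensive quickly.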
4. (c) Predicting Bike-Sharing Patterns – Build and train neural networks from scratch to predict
the number of bike share users on a given day.

Bike Rental Ridership Prediction with a Deep Neural Network in Python


In this project, you'll build your first neural network and use it to predict daily bike rental ridership. We've provided
some of the code, but left the implementation of the neural network up to you (for the most part). After you've
submitted this project, feel free to explore the data and the model more.
In [33]:

%matplotlib inline
#%config InlineBackend.figure_format = 'retina'
%qtconsole

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

Load and prepare the data


A critical step in working with neural networks is preparing the data correctly. Variables on different scales make it
difficult for the network to efficiently learn the correct weights. Below, we've written the code to load and prepare the
data. You'll learn more about this soon!
In [34]:

data_path = 'Bike-Sharing-Dataset/hour.csv'

rides = pd.read_csv(data_path)
In [35]:

rides.head()
Out[35]:
Checking out the data
This dataset has the number of riders for each hour of each day from January 1 2011 to December 31 2012. The
number of riders is split between casual and registered, summed up in the cnt column. You can see the first few
rows of the data above.
Below is a plot showing the number of bike riders over the first 10 days in the data set. You can see the hourly
rentals here. This data is pretty complicated! The weekends have lower over all ridership and there are spikes when
people are biking to and from work during the week. Looking at the data above, we also have information about
temperature, humidity, and windspeed, all of these likely affecting the number of riders. You'll be trying to capture all
this with your model.
In [36]:

rides[:24*10].plot(x='dteday', y='cnt')
Out[36]:

<matplotlib.axes._subplots.AxesSubplot at 0x8ea8a20>

Dummy variables

Here we have some categorical variables like season, weather, month. To include these in our model, we'll need to
make binary dummy variables. This is simple to do with Pandas thanks to get_dummies().
In [37]:

dummy_fields = ['season', 'weathersit', 'mnth', 'hr', 'weekday']


for each in dummy_fields:
    dummies = pd.get_dummies(rides[each], prefix=each, drop_first=False)
    rides = pd.concat([rides, dummies], axis=1)

fields_to_drop = ['instant', 'dteday', 'season', 'weathersit',
                  'weekday', 'atemp', 'mnth', 'workingday', 'hr']
data = rides.drop(fields_to_drop, axis=1)
data.head()
Out[37]:

5 rows × 59 columns

Scaling target variables

To make training the network easier, we'll standardize each of the continuous variables. That is, we'll shift and scale
the variables such that they have zero mean and a standard deviation of 1.

The scaling factors are saved so we can go backwards when we use the network for predictions.
In [38]:

quant_features = ['casual', 'registered', 'cnt', 'temp', 'hum', 'windspeed']


# Store scalings in a dictionary so we can convert back later
scaled_features = {}
for each in quant_features:
    mean, std = data[each].mean(), data[each].std()
    scaled_features[each] = [mean, std]
    data.loc[:, each] = (data[each] - mean)/std
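The reason for saving the scaling factors is that a standardized value can be converted back to a ride count. A minimal round-trip sketch, using made-up counts and the standard library instead of pandas:

```python
import statistics

counts = [16.0, 40.0, 32.0, 13.0, 1.0]   # made-up hourly ride counts
mean = statistics.mean(counts)
std = statistics.stdev(counts)            # sample standard deviation, like pandas' .std()

scaled = [(c - mean) / std for c in counts]      # zero mean, unit standard deviation
restored = [s * std + mean for s in scaled]      # undo the scaling
print(scaled)
print(restored)
```

The same multiplication by std and addition of mean is what turns the network's predictions back into rider numbers later on.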
In [39]:

data.head()
Out[39]:

5 rows × 59 columns
Splitting the data into training, testing, and validation sets

We'll save the last 21 days of the data to use as a test set after we've trained the network. We'll use this set to make
predictions and compare them with the actual number of riders.
In [40]:

# Save the last 21 days


test_data = data[-21*24:]
data = data[:-21*24]

# Separate the data into features and targets


target_fields = ['cnt', 'casual', 'registered']
features, targets = data.drop(target_fields, axis=1), data[target_fields]
test_features, test_targets = test_data.drop(target_fields, axis=1), test_data[target_fields]
We'll split the data into two sets, one for training and one for validating as the network is being trained. Since this is
time series data, we'll train on historical data, then try to predict on future data (the validation set).
In [41]:

# Hold out the last 60 days of the remaining data as a validation set
train_features, train_targets = features[:-60*24], targets[:-60*24]
val_features, val_targets = features[-60*24:], targets[-60*24:]

Time to build the network


Below you'll build your network. We've built out the structure and the backwards pass. You'll implement the forward
pass through the network. You'll also set the hyperparameters: the learning rate, the number of hidden units, and
the number of training passes.
The network has two layers, a hidden layer and an output layer. The hidden layer will use the sigmoid function for
activations. The output layer has only one node and is used for the regression, the output of the node is the same as
the input of the node. That is, the activation function is $f(x) = x$. A function that takes the input signal and
generates an output signal, but takes into account the threshold, is called an activation function. We work through
each layer of our network calculating the outputs for each neuron. All of the outputs from one layer become inputs to
the neurons on the next layer. This process is called forward propagation.

We use the weights to propagate signals forward from the input to the output layers in a neural network. We use the
weights to also propagate error backwards from the output back into the network to update our weights. This is
called backpropagation.
Hint: You'll need the derivative of the output activation function ($f(x) = x$) for the backpropagation
implementation. If you aren't familiar with calculus, this function is equivalent to the equation $y = x$. What is the
slope of that equation? That is the derivative of $f(x)$.
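If you want to check a derivative without doing the calculus, you can estimate the slope numerically. The sketch below does this for the identity output activation and for the sigmoid; the helper numerical_slope and the test point 0.5 are choices made for this illustration.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def numerical_slope(f, x, eps=1e-6):
    """Central-difference estimate of df/dx at x."""
    return (f(x + eps) - f(x - eps)) / (2 * eps)

x = 0.5
identity_slope = numerical_slope(lambda v: v, x)   # slope of y = x
sigmoid_slope = numerical_slope(sigmoid, x)
analytic = sigmoid(x) * (1 - sigmoid(x))           # closed form of the sigmoid derivative
print(identity_slope, sigmoid_slope, analytic)
```

The numerical and closed-form sigmoid slopes should agree closely, which is a handy sanity check when writing the backward pass.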

Below, you have these tasks:

1. Implement the sigmoid function to use as the activation function. Set self.activation_function in __init__ to
your sigmoid function.
2. Implement the forward pass in the train method.
3. Implement the backpropagation algorithm in the train method, including calculating the output error.
4. Implement the forward pass in the run method.

In [42]:
class NeuralNetwork(object):
    def __init__(self, input_nodes, hidden_nodes, output_nodes, learning_rate):
        # Set number of nodes in input, hidden and output layers.
        self.input_nodes = input_nodes
        self.hidden_nodes = hidden_nodes
        self.output_nodes = output_nodes

        # Initialize weights
        # weights_input_to_hidden has shape (hidden_nodes, input_nodes): one row of input
        # weights per hidden node, e.g. (2, 30) for 2 hidden nodes and 30 input features
        self.weights_input_to_hidden = np.random.normal(0.0, self.hidden_nodes**-0.5,
                                       (self.hidden_nodes, self.input_nodes))

        self.weights_hidden_to_output = np.random.normal(0.0, self.output_nodes**-0.5,
                                       (self.output_nodes, self.hidden_nodes))
        self.lr = learning_rate

        # Activation function is the sigmoid function
        self.activation_function = lambda x: self.sigmoid(x)

    def sigmoid(self, x):
        return 1/(1 + np.exp(-x))

    def train(self, inputs_list, targets_list):
        # Convert inputs list to a 2d column array, e.g. shape (30, 1)
        inputs = np.array(inputs_list, ndmin=2).T
        targets = np.array(targets_list, ndmin=2).T

        ### Forward pass ###
        # signals into hidden layer: (2, 30) dot (30, 1) gives (2, 1)
        hidden_inputs = np.dot(self.weights_input_to_hidden, inputs)
        # signals from hidden layer: the activation keeps the (2, 1) shape
        hidden_outputs = self.activation_function(hidden_inputs)

        # signals into final output layer: (1, 2) dot (2, 1) gives (1, 1)
        final_inputs = np.dot(self.weights_hidden_to_output, hidden_outputs)
        # signals from final output layer: identity activation, so what comes in goes out
        final_outputs = final_inputs

        ### Backward pass ###
        # Output layer error is the difference between desired target and actual output, shape (1, 1)
        output_errors = targets - final_outputs

        # errors propagated to the hidden layer: transpose the (1, 2) weights so that
        # (2, 1) dot (1, 1) gives (2, 1)
        hidden_errors = np.dot(self.weights_hidden_to_output.T, output_errors)

        # hidden layer gradients: derivative of the sigmoid evaluated at its output, shape (2, 1)
        hidden_grad = hidden_outputs * (1 - hidden_outputs)

        # update hidden-to-output weights with gradient descent step
        self.weights_hidden_to_output += self.lr * np.dot(output_errors, hidden_outputs.T)

        # update input-to-hidden weights with gradient descent step: scale the hidden errors
        # by the gradient, then dot with the inputs to get the weight-matrix shape
        self.weights_input_to_hidden += self.lr * np.dot(hidden_grad*hidden_errors, inputs.T)

    def run(self, inputs_list):
        # Run a forward pass through the network
        inputs = np.array(inputs_list, ndmin=2).T

        # signals into hidden layer
        hidden_inputs = np.dot(self.weights_input_to_hidden, inputs)
        # signals from hidden layer
        hidden_outputs = self.activation_function(hidden_inputs)

        # signals into final output layer
        final_inputs = np.dot(self.weights_hidden_to_output, hidden_outputs)
        # signals from final output layer
        final_outputs = final_inputs

        return final_outputs
In [43]:

# Mean squared error between the predictions y and the targets Y
def MSE(y, Y):
    return np.mean((y-Y)**2)

Training the network


Here you'll set the hyperparameters for the network. The strategy here is to find hyperparameters such that the error
on the training set is low, but you're not overfitting to the data. If you train the network too long or have too many
hidden nodes, it can become overly specific to the training set and will fail to generalize to the validation set. That is,
the loss on the validation set will start increasing as the training set loss drops.

You'll also be using a method known as Stochastic Gradient Descent (SGD) to train the network. The idea is that for
each training pass, you grab a random sample of the data instead of using the whole data set. You use many more
training passes than with normal gradient descent, but each pass is much faster. This ends up training the network
more efficiently. You'll learn more about SGD later.

Choose the number of epochs

This is the number of times the dataset will pass through the network, each time updating the weights. As the
number of epochs increases, the network becomes better and better at predicting the targets in the training set.
You'll need to choose enough epochs to train the network well but not too many or you'll be overfitting.

Choose the learning rate

This scales the size of weight updates. If this is too big, the weights tend to explode and the network fails to fit the
data. A good choice to start at is 0.1. If the network has problems fitting the data, try reducing the learning rate. Note
that the lower the learning rate, the smaller the steps are in the weight updates and the longer it takes for the neural
network to converge.
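The effect of the learning rate is easy to see on a toy problem. The sketch below minimises $f(w) = w^2$ with plain gradient descent; the function descend and the two rates are chosen purely for illustration and have nothing to do with the bike-sharing network itself.

```python
def descend(lr, steps=20, w=1.0):
    """Minimise f(w) = w**2 by gradient descent; the gradient is 2*w."""
    for _ in range(steps):
        w -= lr * 2 * w
    return w

small = descend(lr=0.1)   # each step multiplies w by 0.8, so w shrinks towards 0
large = descend(lr=1.1)   # each step multiplies w by -1.2, so w grows in magnitude
print(abs(small), abs(large))
```

With the small rate the weight settles near the minimum; with the large rate every step overshoots further, which is the "exploding" behaviour described above.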

Choose the number of hidden nodes

The more hidden nodes you have, the more accurate predictions the model will make. Try a few different numbers
and see how it affects the performance. You can look at the losses dictionary for a metric of the network
performance. If the number of hidden units is too low, then the model won't have enough space to learn and if it is
too high there are too many options for the direction that the learning can take. The trick here is to find the right
balance in number of hidden units you choose.
In [45]:

import sys
### Set the hyperparameters here ###
epochs = 2000
learning_rate = 0.05
hidden_nodes = 28
output_nodes = 1

#get the number of input nodes from the shape of the first row of the train_features
N_i = train_features.shape[1]
#Initiate the network
network = NeuralNetwork(N_i, hidden_nodes, output_nodes, learning_rate)

#record the losses in a dictionary of lists


losses = {'train':[], 'validation':[]}

#train the network


for e in range(epochs):
    # Go through a random batch of 128 records from the training data set
    # train_features.index holds the row labels of the DataFrame
    batch = np.random.choice(train_features.index, size=128)
    # .ix was removed in pandas 1.0; .loc selects the same rows by label
    for record, target in zip(train_features.loc[batch].values,
                              train_targets.loc[batch]['cnt']):
        network.train(record, target)

    # Printing out the training progress
    train_loss = MSE(network.run(train_features), train_targets['cnt'].values)
    val_loss = MSE(network.run(val_features), val_targets['cnt'].values)
    sys.stdout.write("\rProgress: " + str(100 * e/float(epochs))[:4] \
                     + "% ... Training loss: " + str(train_loss)[:5] \
                     + " ... Validation loss: " + str(val_loss)[:5])

    losses['train'].append(train_loss)
    losses['validation'].append(val_loss)
Progress: 99.9% ... Training loss: 0.049 ... Validation loss: 0.155
In [46]:

plt.plot(losses['train'], label='Training loss')


plt.plot(losses['validation'], label='Validation loss')
plt.legend()
plt.ylim(ymax=0.5)
Out[46]:

(0.0, 0.5)

Check out your predictions


Here, use the test data to view how well your network is modeling the data. If something is completely wrong here,
make sure each step in your network is implemented correctly.
In [47]:
fig, ax = plt.subplots(figsize=(8,4))

mean, std = scaled_features['cnt']


predictions = network.run(test_features)*std + mean
ax.plot(predictions[0], label='Prediction')
ax.plot((test_targets['cnt']*std + mean).values, label='Data')
# predictions has shape (1, n_hours), so use the second axis for the plot width
ax.set_xlim(right=predictions.shape[1])
ax.legend()

dates = pd.to_datetime(rides.loc[test_data.index]['dteday'])
dates = dates.apply(lambda d: d.strftime('%b %d'))
ax.set_xticks(np.arange(len(dates))[12::24])
_ = ax.set_xticklabels(dates[12::24], rotation=45)

Thinking about your results


Answer these questions about your results. How well does the model predict the data? Where does it fail? Why
does it fail where it does?
Note: You can edit the text in this cell by double clicking on it. When you want to render the text, press control +
enter

Your answer below

The validation loss for our model and these hyperparameters ranges from 0.130 to 0.143. The model does much
better on normal days than on weekends and during the holiday season. The large variance in the number of
bike-sharing users over weekends, and especially during the holiday season at the end of December, does not allow
the model to accurately predict these values without overfitting. More training data, covering a longer period of
time, would probably improve the predictions for these outlying values.

Having tried various hyperparameter combinations, I have noticed that after around 2000 epochs the validation and
training losses stop converging and start diverging slightly, indicating that overfitting may be taking place after this
number. I did not observe any improvement in the validation loss or the model for learning rates below 0.09 or for
more than about 30 hidden nodes.

Unit tests
Run these unit tests to check the correctness of your network implementation. These tests must all be successful to
pass the project.
In [15]:

import unittest

inputs = [0.5, -0.2, 0.1]
targets = [0.4]
test_w_i_h = np.array([[0.1, 0.4, -0.3],
                       [-0.2, 0.5, 0.2]])
test_w_h_o = np.array([[0.3, -0.1]])

class TestMethods(unittest.TestCase):

    ##########
    # Unit tests for data loading
    ##########

    def test_data_path(self):
        # Test that file path to dataset has been unaltered
        self.assertTrue(data_path.lower() == 'bike-sharing-dataset/hour.csv')

    def test_data_loaded(self):
        # Test that data frame loaded
        self.assertTrue(isinstance(rides, pd.DataFrame))

    ##########
    # Unit tests for network functionality
    ##########

    def test_activation(self):
        network = NeuralNetwork(3, 2, 1, 0.5)
        # Test that the activation function is a sigmoid
        self.assertTrue(np.all(network.activation_function(0.5) == 1/(1+np.exp(-0.5))))

    def test_train(self):
        # Test that weights are updated correctly on training
        network = NeuralNetwork(3, 2, 1, 0.5)
        network.weights_input_to_hidden = test_w_i_h.copy()
        network.weights_hidden_to_output = test_w_h_o.copy()

        network.train(inputs, targets)
        self.assertTrue(np.allclose(network.weights_hidden_to_output,
                                    np.array([[ 0.37275328, -0.03172939]])))
        self.assertTrue(np.allclose(network.weights_input_to_hidden,
                                    np.array([[ 0.10562014, 0.39775194, -0.29887597],
                                              [-0.20185996, 0.50074398, 0.19962801]])))

    def test_run(self):
        # Test correctness of run method
        network = NeuralNetwork(3, 2, 1, 0.5)
        network.weights_input_to_hidden = test_w_i_h.copy()
        network.weights_hidden_to_output = test_w_h_o.copy()

        self.assertTrue(np.allclose(network.run(inputs), 0.09998924))

suite = unittest.TestLoader().loadTestsFromModule(TestMethods())
unittest.TextTestRunner().run(suite)
.....
----------------------------------------------------------------------
Ran 5 tests in 0.016s

OK
5. Write a program to build auto-encoder in Keras.

Let's build the simplest possible autoencoder


We'll start simple, with a single fully-connected neural layer as encoder and as decoder:

import keras
from keras import layers

# This is the size of our encoded representations


encoding_dim = 32 # 32 floats -> compression of factor 24.5, assuming the input is 784 floats

# This is our input image


input_img = keras.Input(shape=(784,))
# "encoded" is the encoded representation of the input
encoded = layers.Dense(encoding_dim, activation='relu')(input_img)
# "decoded" is the lossy reconstruction of the input
decoded = layers.Dense(784, activation='sigmoid')(encoded)

# This model maps an input to its reconstruction


autoencoder = keras.Model(input_img, decoded)
Let's also create a separate encoder model:

# This model maps an input to its encoded representation


encoder = keras.Model(input_img, encoded)
As well as the decoder model:

# This is our encoded (32-dimensional) input


encoded_input = keras.Input(shape=(encoding_dim,))
# Retrieve the last layer of the autoencoder model
decoder_layer = autoencoder.layers[-1]
# Create the decoder model
decoder = keras.Model(encoded_input, decoder_layer(encoded_input))
Now let's train our autoencoder to reconstruct MNIST digits.

First, we'll configure our model to use a per-pixel binary crossentropy loss, and the Adam optimizer:

autoencoder.compile(optimizer='adam', loss='binary_crossentropy')
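To make "per-pixel binary crossentropy" concrete, here is a small sketch that computes it by hand for a four-pixel image. The function below is an illustration written for this example (including the eps clipping value), not the Keras implementation.

```python
import math

def binary_crossentropy(targets, predictions, eps=1e-7):
    """Mean per-pixel binary cross-entropy for intensities in [0, 1]."""
    total = 0.0
    for t, p in zip(targets, predictions):
        p = min(max(p, eps), 1 - eps)   # keep log() away from 0
        total += -(t * math.log(p) + (1 - t) * math.log(1 - p))
    return total / len(targets)

pixels = [0.0, 1.0, 0.5, 0.0]                              # original image
good = binary_crossentropy(pixels, [0.1, 0.9, 0.5, 0.1])   # close reconstruction
bad = binary_crossentropy(pixels, [0.9, 0.1, 0.5, 0.9])    # poor reconstruction
print(good, bad)
```

A reconstruction closer to the original pixels gives a lower loss, which is what the optimizer drives towards during training.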
Let's prepare our input data. We're using MNIST digits, and we're discarding the labels (since we're only interested
in encoding/decoding the input images).

from keras.datasets import mnist


import numpy as np
(x_train, _), (x_test, _) = mnist.load_data()
We will normalize all values between 0 and 1 and we will flatten the 28x28 images into vectors of size 784.

x_train = x_train.astype('float32') / 255.


x_test = x_test.astype('float32') / 255.
x_train = x_train.reshape((len(x_train), np.prod(x_train.shape[1:])))
x_test = x_test.reshape((len(x_test), np.prod(x_test.shape[1:])))
print(x_train.shape)
print(x_test.shape)
Now let's train our autoencoder for 50 epochs:

autoencoder.fit(x_train, x_train,
                epochs=50,
                batch_size=256,
                shuffle=True,
                validation_data=(x_test, x_test))
After 50 epochs, the autoencoder seems to reach a stable train/validation loss value of about 0.09. We can try to
visualize the reconstructed inputs and the encoded representations. We will use Matplotlib.
# Encode and decode some digits
# Note that we take them from the *test* set
encoded_imgs = encoder.predict(x_test)
decoded_imgs = decoder.predict(encoded_imgs)
# Use Matplotlib (don't ask)
import matplotlib.pyplot as plt

n = 10 # How many digits we will display


plt.figure(figsize=(20, 4))
for i in range(n):
    # Display original
    ax = plt.subplot(2, n, i + 1)
    plt.imshow(x_test[i].reshape(28, 28))
    plt.gray()
    ax.get_xaxis().set_visible(False)
    ax.get_yaxis().set_visible(False)

    # Display reconstruction
    ax = plt.subplot(2, n, i + 1 + n)
    plt.imshow(decoded_imgs[i].reshape(28, 28))
    plt.gray()
    ax.get_xaxis().set_visible(False)
    ax.get_yaxis().set_visible(False)
plt.show()
Here's what we get. The top row is the original digits, and the bottom row is the reconstructed digits. We are losing
quite a bit of detail with this basic approach.

Adding a sparsity constraint on the encoded representations
In the previous example, the representations were only constrained by the size of the hidden layer (32). In such a
situation, what typically happens is that the hidden layer is learning an approximation of PCA (principal component
analysis). But another way to constrain the representations to be compact is to add a sparsity constraint on the
activity of the hidden representations, so fewer units would "fire" at a given time. In Keras, this can be done by
adding an activity_regularizer to our Dense layer:
from keras import regularizers

encoding_dim = 32

input_img = keras.Input(shape=(784,))
# Add a Dense layer with a L1 activity regularizer
encoded = layers.Dense(encoding_dim, activation='relu',
                       activity_regularizer=regularizers.l1(10e-5))(input_img)
decoded = layers.Dense(784, activation='sigmoid')(encoded)

autoencoder = keras.Model(input_img, decoded)


Let's train this model for 100 epochs (with the added regularization the model is less likely to overfit and can be
trained longer). The models ends with a train loss of 0.11 and test loss of 0.10. The difference between the two is
mostly due to the regularization term being added to the loss during training (worth about 0.01).
Here's a visualization of our new results:
They look pretty similar to the previous model, the only significant difference being the sparsity of the encoded
representations. encoded_imgs.mean() yields a value 3.33 (over our 10,000 test images), whereas with the previous
model the same quantity was 7.30. So our new model yields encoded representations that are roughly twice as sparse.
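The idea behind an L1 activity penalty can be sketched in a few lines: the sum of the absolute activations, scaled by a small rate, is added to the loss, so codes with many zeros are cheaper. The example activations below are made up, and the sketch ignores details of how Keras aggregates the penalty over a batch.

```python
import math

def l1_activity_penalty(activations, rate):
    """Penalty added to the loss: rate times the sum of absolute activations."""
    return rate * sum(abs(a) for a in activations)

rate = 10e-5  # the value passed to regularizers.l1 above; note that 10e-5 is 1e-4, not 1e-5

dense_code = [7.3] * 32                 # every unit active
sparse_code = [7.3] * 8 + [0.0] * 24    # only a quarter of the units active
dense_penalty = l1_activity_penalty(dense_code, rate)
sparse_penalty = l1_activity_penalty(sparse_code, rate)
print(dense_penalty, sparse_penalty)
```

Because the penalty grows with every active unit, training is nudged towards encodings in which fewer units fire.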

Deep autoencoder
We do not have to limit ourselves to a single layer as encoder or decoder, we could instead use a stack of layers, such
as:

input_img = keras.Input(shape=(784,))
encoded = layers.Dense(128, activation='relu')(input_img)
encoded = layers.Dense(64, activation='relu')(encoded)
encoded = layers.Dense(32, activation='relu')(encoded)

decoded = layers.Dense(64, activation='relu')(encoded)


decoded = layers.Dense(128, activation='relu')(decoded)
decoded = layers.Dense(784, activation='sigmoid')(decoded)
Let's try this:

autoencoder = keras.Model(input_img, decoded)


autoencoder.compile(optimizer='adam', loss='binary_crossentropy')

autoencoder.fit(x_train, x_train,
                epochs=100,
                batch_size=256,
                shuffle=True,
                validation_data=(x_test, x_test))
After 100 epochs, it reaches a train and validation loss of ~0.08, a bit better than our previous models. Our
reconstructed digits look a bit better too:

Convolutional autoencoder
Since our inputs are images, it makes sense to use convolutional neural networks (convnets) as encoders and
decoders. In practical settings, autoencoders applied to images are always convolutional autoencoders --they simply
perform much better.
Let's implement one. The encoder will consist of a stack of Conv2D and MaxPooling2D layers (max pooling being used
for spatial down-sampling), while the decoder will consist of a stack of Conv2D and UpSampling2D layers.
import keras
from keras import layers

input_img = keras.Input(shape=(28, 28, 1))

x = layers.Conv2D(16, (3, 3), activation='relu', padding='same')(input_img)


x = layers.MaxPooling2D((2, 2), padding='same')(x)
x = layers.Conv2D(8, (3, 3), activation='relu', padding='same')(x)
x = layers.MaxPooling2D((2, 2), padding='same')(x)
x = layers.Conv2D(8, (3, 3), activation='relu', padding='same')(x)
encoded = layers.MaxPooling2D((2, 2), padding='same')(x)

# at this point the representation is (4, 4, 8) i.e. 128-dimensional

x = layers.Conv2D(8, (3, 3), activation='relu', padding='same')(encoded)


x = layers.UpSampling2D((2, 2))(x)
x = layers.Conv2D(8, (3, 3), activation='relu', padding='same')(x)
x = layers.UpSampling2D((2, 2))(x)
x = layers.Conv2D(16, (3, 3), activation='relu')(x)
x = layers.UpSampling2D((2, 2))(x)
decoded = layers.Conv2D(1, (3, 3), activation='sigmoid', padding='same')(x)

autoencoder = keras.Model(input_img, decoded)


autoencoder.compile(optimizer='adam', loss='binary_crossentropy')
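It can help to trace the spatial size of the feature maps through the model above, in particular to see why one decoder convolution is left without padding='same'. The helper names in this sketch are made up for illustration; the arithmetic follows the layer settings shown in the code.

```python
import math

def same_pool(size, pool=2):
    """MaxPooling2D with padding='same' rounds the halved size up."""
    return math.ceil(size / pool)

def valid_conv(size, kernel=3):
    """A convolution without padding trims kernel - 1 pixels."""
    return size - kernel + 1

# Encoder: 'same' convolutions keep the size, each pooling step halves it
size = 28
encoder_sizes = []
for _ in range(3):
    size = same_pool(size)
    encoder_sizes.append(size)

# Decoder: two upsampling steps, one unpadded convolution, one more upsampling
size = size * 2          # after the first UpSampling2D
size = size * 2          # after the second UpSampling2D
size = valid_conv(size)  # the Conv2D(16, (3, 3)) without padding
size = size * 2          # after the third UpSampling2D
print(encoder_sizes, size)
```

Without the unpadded convolution the decoder would end at 32x32 rather than 28x28, and the output could not be compared pixel by pixel with the input.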
To train it, we will use the original MNIST digits reshaped to (samples, 28, 28, 1), and we will just normalize pixel
values between 0 and 1.
from keras.datasets import mnist
import numpy as np

(x_train, _), (x_test, _) = mnist.load_data()

x_train = x_train.astype('float32') / 255.


x_test = x_test.astype('float32') / 255.
x_train = np.reshape(x_train, (len(x_train), 28, 28, 1))
x_test = np.reshape(x_test, (len(x_test), 28, 28, 1))
Let's train this model for 50 epochs. For the sake of demonstrating how to visualize the results of a model during
training, we will be using the TensorFlow backend and the TensorBoard callback.
First, let's open up a terminal and start a TensorBoard server that will read logs stored at /tmp/autoencoder.
tensorboard --logdir=/tmp/autoencoder
Then let's train our model. In the callbacks list we pass an instance of the TensorBoard callback. After every epoch,
this callback will write logs to /tmp/autoencoder, which can be read by our TensorBoard server.
from keras.callbacks import TensorBoard

autoencoder.fit(x_train, x_train,
                epochs=50,
                batch_size=128,
                shuffle=True,
                validation_data=(x_test, x_test),
                callbacks=[TensorBoard(log_dir='/tmp/autoencoder')])
This allows us to monitor training in the TensorBoard web interface (by navigating to http://0.0.0.0:6006):
The model converges to a loss of 0.094, significantly better than our previous models (this is in large part due to the
higher entropic capacity of the encoded representation, 128 dimensions vs. 32 previously). Let's take a look at the
reconstructed digits:

decoded_imgs = autoencoder.predict(x_test)

n = 10
plt.figure(figsize=(20, 4))
for i in range(1, n + 1):
    # Display original
    ax = plt.subplot(2, n, i)
    plt.imshow(x_test[i].reshape(28, 28))
    plt.gray()
    ax.get_xaxis().set_visible(False)
    ax.get_yaxis().set_visible(False)

    # Display reconstruction
    ax = plt.subplot(2, n, i + n)
    plt.imshow(decoded_imgs[i].reshape(28, 28))
    plt.gray()
    ax.get_xaxis().set_visible(False)
    ax.get_yaxis().set_visible(False)
plt.show()

We can also have a look at the 128-dimensional encoded representations. These representations are 8x4x4, so we
reshape them to 4x32 in order to be able to display them as grayscale images.

encoder = keras.Model(input_img, encoded)


encoded_imgs = encoder.predict(x_test)

n = 10
plt.figure(figsize=(20, 8))
for i in range(1, n + 1):
    ax = plt.subplot(1, n, i)
    plt.imshow(encoded_imgs[i].reshape((4, 4 * 8)).T)
    plt.gray()
    ax.get_xaxis().set_visible(False)
    ax.get_yaxis().set_visible(False)
plt.show()
Application to image denoising
Let's put our convolutional autoencoder to work on an image denoising problem. It's simple: we will train the
autoencoder to map noisy digits images to clean digits images.

Here's how we will generate synthetic noisy digits: we just apply a gaussian noise matrix and clip the images between
0 and 1.

from keras.datasets import mnist


import numpy as np

(x_train, _), (x_test, _) = mnist.load_data()

x_train = x_train.astype('float32') / 255.


x_test = x_test.astype('float32') / 255.
x_train = np.reshape(x_train, (len(x_train), 28, 28, 1))
x_test = np.reshape(x_test, (len(x_test), 28, 28, 1))

noise_factor = 0.5
x_train_noisy = x_train + noise_factor * np.random.normal(loc=0.0, scale=1.0, size=x_train.shape)
x_test_noisy = x_test + noise_factor * np.random.normal(loc=0.0, scale=1.0, size=x_test.shape)

x_train_noisy = np.clip(x_train_noisy, 0., 1.)


x_test_noisy = np.clip(x_test_noisy, 0., 1.)
Here's what the noisy digits look like:

n = 10
plt.figure(figsize=(20, 2))
for i in range(1, n + 1):
    ax = plt.subplot(1, n, i)
    plt.imshow(x_test_noisy[i].reshape(28, 28))
    plt.gray()
    ax.get_xaxis().set_visible(False)
    ax.get_yaxis().set_visible(False)
plt.show()

If you squint you can still recognize them, but barely. Can our autoencoder learn to recover the original digits? Let's
find out.
Compared to the previous convolutional autoencoder, in order to improve the quality of the reconstructed digits, we'll
use a slightly different model with more filters per layer:

input_img = keras.Input(shape=(28, 28, 1))

x = layers.Conv2D(32, (3, 3), activation='relu', padding='same')(input_img)


x = layers.MaxPooling2D((2, 2), padding='same')(x)
x = layers.Conv2D(32, (3, 3), activation='relu', padding='same')(x)
encoded = layers.MaxPooling2D((2, 2), padding='same')(x)

# At this point the representation is (7, 7, 32)

x = layers.Conv2D(32, (3, 3), activation='relu', padding='same')(encoded)


x = layers.UpSampling2D((2, 2))(x)
x = layers.Conv2D(32, (3, 3), activation='relu', padding='same')(x)
x = layers.UpSampling2D((2, 2))(x)
decoded = layers.Conv2D(1, (3, 3), activation='sigmoid', padding='same')(x)

autoencoder = keras.Model(input_img, decoded)


autoencoder.compile(optimizer='adam', loss='binary_crossentropy')
Let's train it for 100 epochs:

autoencoder.fit(x_train_noisy, x_train,
epochs=100,
batch_size=128,
shuffle=True,
validation_data=(x_test_noisy, x_test),
callbacks=[TensorBoard(log_dir='/tmp/tb', histogram_freq=0, write_graph=False)])
Now let's take a look at the results. Top: the noisy digits fed to the network. Bottom: the digits reconstructed
by the network.

It seems to work pretty well. If you scale this process to a bigger convnet, you can start building document denoising
or audio denoising models. Kaggle has an interesting dataset to get you started.

Sequence-to-sequence autoencoder
If your inputs are sequences, rather than vectors or 2D images, then you may want to use as encoder and decoder a
type of model that can capture temporal structure, such as an LSTM. To build an LSTM-based autoencoder, first use an
LSTM encoder to turn your input sequences into a single vector that contains information about the entire sequence,
then repeat this vector n times (where n is the number of timesteps in the output sequence), and run an LSTM decoder
to turn this constant sequence into the target sequence.
We won't be demonstrating that one on any specific dataset. We will just put a code example here for future
reference for the reader!

timesteps = ... # Length of your sequences


input_dim = ...
latent_dim = ...

inputs = keras.Input(shape=(timesteps, input_dim))


encoded = layers.LSTM(latent_dim)(inputs)

decoded = layers.RepeatVector(timesteps)(encoded)
decoded = layers.LSTM(input_dim, return_sequences=True)(decoded)
sequence_autoencoder = keras.Model(inputs, decoded)
encoder = keras.Model(inputs, encoded)

Variational autoencoder (VAE)


Variational autoencoders are a slightly more modern and interesting take on autoencoding.

What is a variational autoencoder, you ask? It's a type of autoencoder with added constraints on the encoded
representations being learned. More precisely, it is an autoencoder that learns a latent variable model for its input
data. So instead of letting your neural network learn an arbitrary function, you are learning the parameters of a
probability distribution modeling your data. If you sample points from this distribution, you can generate new input
data samples: a VAE is a "generative model".
How does a variational autoencoder work?

First, an encoder network turns the input samples x into two parameters in a latent space, which we will
denote z_mean and z_log_sigma. Then, we randomly sample similar points z from the latent normal distribution that is
assumed to generate the data, via z = z_mean + exp(z_log_sigma) * epsilon, where epsilon is a random normal
tensor. Finally, a decoder network maps these latent space points back to the original input data.
The parameters of the model are trained via two loss functions: a reconstruction loss forcing the decoded samples to
match the initial inputs (just like in our previous autoencoders), and the KL divergence between the learned latent
distribution and the prior distribution, acting as a regularization term. You could actually get rid of this latter term
entirely, although it does help in learning well-formed latent spaces and reducing overfitting to the training data.
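
The sampling step and the KL regularization term described above can be checked numerically. The following is a minimal NumPy sketch, not the Keras code used in this program. It assumes the encoder outputs a mean and a log-variance per latent dimension (the names mu and log_var, and both helper functions, are hypothetical), and it draws epsilon from a standard normal distribution.

```python
import numpy as np

def sample_latent(mu, log_var, rng):
    """Reparameterization: z = mu + sigma * epsilon, with sigma = exp(0.5 * log_var)."""
    epsilon = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * log_var) * epsilon

def kl_to_standard_normal(mu, log_var):
    """Closed-form KL(N(mu, sigma^2) || N(0, 1)), summed over latent dimensions."""
    return -0.5 * np.sum(1.0 + log_var - np.square(mu) - np.exp(log_var), axis=-1)

rng = np.random.default_rng(0)

# Two example points in a 2-D latent space (assumed values, for illustration only)
mu = np.array([[0.0, 0.0], [1.0, -1.0]])
log_var = np.zeros_like(mu)  # unit variance in every dimension

z = sample_latent(mu, log_var, rng)
kl = kl_to_standard_normal(mu, log_var)
print(z.shape, kl)
```

The first point already matches the prior, so its KL term is zero. The second is penalized only for its shifted mean, which is how this term pulls encodings toward a well-formed latent space.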

Because a VAE is a more complex example, we have made the code available on GitHub as a standalone script. Here
we will review step by step how the model is created.
First, here's our encoder network, mapping inputs to our latent distribution parameters:

original_dim = 28 * 28
intermediate_dim = 64
latent_dim = 2

inputs = keras.Input(shape=(original_dim,))
h = layers.Dense(intermediate_dim, activation='relu')(inputs)
z_mean = layers.Dense(latent_dim)(h)
z_log_sigma = layers.Dense(latent_dim)(h)
We can use these parameters to sample new similar points from the latent space:

from keras import backend as K

def sampling(args):
    z_mean, z_log_sigma = args
    epsilon = K.random_normal(shape=(K.shape(z_mean)[0], latent_dim),
                              mean=0., stddev=0.1)
    return z_mean + K.exp(z_log_sigma) * epsilon

z = layers.Lambda(sampling)([z_mean, z_log_sigma])
Finally, we can map these sampled latent points back to reconstructed inputs:

# Create encoder
encoder = keras.Model(inputs, [z_mean, z_log_sigma, z], name='encoder')

# Create decoder
latent_inputs = keras.Input(shape=(latent_dim,), name='z_sampling')
x = layers.Dense(intermediate_dim, activation='relu')(latent_inputs)
outputs = layers.Dense(original_dim, activation='sigmoid')(x)
decoder = keras.Model(latent_inputs, outputs, name='decoder')

# instantiate VAE model


outputs = decoder(encoder(inputs)[2])
vae = keras.Model(inputs, outputs, name='vae_mlp')
What we've done so far allows us to instantiate 3 models:
• an end-to-end autoencoder mapping inputs to reconstructions
• an encoder mapping inputs to the latent space
• a generator that can take points on the latent space and will output the corresponding reconstructed samples.
We train the model using the end-to-end model, with a custom loss function: the sum of a reconstruction term, and
the KL divergence regularization term.

reconstruction_loss = keras.losses.binary_crossentropy(inputs, outputs)


reconstruction_loss *= original_dim
kl_loss = 1 + z_log_sigma - K.square(z_mean) - K.exp(z_log_sigma)
kl_loss = K.sum(kl_loss, axis=-1)
kl_loss *= -0.5
vae_loss = K.mean(reconstruction_loss + kl_loss)
vae.add_loss(vae_loss)
vae.compile(optimizer='adam')
We train our VAE on MNIST digits:

(x_train, y_train), (x_test, y_test) = mnist.load_data()

x_train = x_train.astype('float32') / 255.


x_test = x_test.astype('float32') / 255.
x_train = x_train.reshape((len(x_train), np.prod(x_train.shape[1:])))
x_test = x_test.reshape((len(x_test), np.prod(x_test.shape[1:])))

vae.fit(x_train, x_train,
epochs=100,
batch_size=32,
validation_data=(x_test, x_test))
Because our latent space is two-dimensional, there are a few cool visualizations that can be done at this point. One is
to look at the neighborhoods of different classes on the latent 2D plane:

# The encoder returns [z_mean, z_log_sigma, z]; we plot the means
x_test_encoded, _, _ = encoder.predict(x_test, batch_size=32)
plt.figure(figsize=(6, 6))
plt.scatter(x_test_encoded[:, 0], x_test_encoded[:, 1], c=y_test)
plt.colorbar()
plt.show()
Each of these colored clusters is a type of digit. Close clusters are digits that are structurally similar (i.e. digits that
share information in the latent space).

Because the VAE is a generative model, we can also use it to generate new digits! Here we will scan the latent plane,
sampling latent points at regular intervals, and generating the corresponding digit for each of these points. This
gives us a visualization of the latent manifold that "generates" the MNIST digits.

# Display a 2D manifold of the digits
n = 15  # figure with 15x15 digits
digit_size = 28
figure = np.zeros((digit_size * n, digit_size * n))
# We will sample n points within [-15, 15] standard deviations
grid_x = np.linspace(-15, 15, n)
grid_y = np.linspace(-15, 15, n)

for i, yi in enumerate(grid_x):
    for j, xi in enumerate(grid_y):
        z_sample = np.array([[xi, yi]])
        x_decoded = decoder.predict(z_sample)
        digit = x_decoded[0].reshape(digit_size, digit_size)
        figure[i * digit_size: (i + 1) * digit_size,
               j * digit_size: (j + 1) * digit_size] = digit

plt.figure(figsize=(10, 10))
plt.imshow(figure)
plt.show()
6. Write a program to implement basic reinforcement learning algorithm to teach a bot to reach its
destination.

Step 1: Importing the required libraries

import numpy as np

import pylab as pl

import networkx as nx

Step 2: Defining and visualising the graph

edges = [(0, 1), (1, 5), (5, 6), (5, 4), (1, 2),

(1, 3), (9, 10), (2, 4), (0, 6), (6, 7),

(8, 9), (7, 8), (1, 7), (3, 9)]

goal = 10

G = nx.Graph()

G.add_edges_from(edges)

pos = nx.spring_layout(G)

nx.draw_networkx_nodes(G, pos)

nx.draw_networkx_edges(G, pos)

nx.draw_networkx_labels(G, pos)

pl.show()
Note: The above graph may not look the same on reproduction of the code because
the spring layout used by the networkx library places the nodes at random positions.
Step 3: Defining the reward system for the bot

MATRIX_SIZE = 11

# Reward matrix: -1 marks pairs of nodes with no edge between them
M = np.matrix(np.ones(shape=(MATRIX_SIZE, MATRIX_SIZE)))
M *= -1

for point in edges:
    print(point)
    if point[1] == goal:
        M[point] = 100
    else:
        M[point] = 0

    # reverse of point
    if point[0] == goal:
        M[point[::-1]] = 100
    else:
        M[point[::-1]] = 0

# add goal point round trip
M[goal, goal] = 100

print(M)
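
To see what the reward matrix encodes, it can help to build one for a graph small enough to read at a glance. The sketch below follows the same rules as the loop above (-1 for no edge, 0 for an ordinary edge, 100 for a move into the goal) on a three-node chain. The function name build_reward_matrix and the tiny edge list are illustrative assumptions, and a plain NumPy array is used instead of np.matrix.

```python
import numpy as np

def build_reward_matrix(edges, goal, size):
    """Return a size x size reward matrix for an undirected graph."""
    rewards = -np.ones((size, size))          # -1: no edge between the two nodes
    for a, b in edges:
        rewards[a, b] = 100 if b == goal else 0   # move a -> b
        rewards[b, a] = 100 if a == goal else 0   # move b -> a
    rewards[goal, goal] = 100                 # staying at the goal is rewarded
    return rewards

# Chain 0 - 1 - 2 with node 2 as the goal (assumed example)
tiny_edges = [(0, 1), (1, 2)]
R = build_reward_matrix(tiny_edges, goal=2, size=3)
print(R)
```

Row i lists the reward for moving from node i to each other node, so only the row for node 1 (the goal's neighbour) and the goal's own row contain 100.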

Step 4: Defining some utility functions to be used in the training

Q = np.matrix(np.zeros([MATRIX_SIZE, MATRIX_SIZE]))

# learning parameter
gamma = 0.75

initial_state = 1

# Determines the available actions for a given state
def available_actions(state):
    current_state_row = M[state, ]
    available_action = np.where(current_state_row >= 0)[1]
    return available_action

available_action = available_actions(initial_state)

# Chooses one of the available actions at random
def sample_next_action(available_actions_range):
    # use the argument rather than the global variable
    next_action = int(np.random.choice(available_actions_range))
    return next_action

action = sample_next_action(available_action)

# Updates the Q-Matrix according to the path chosen
def update(current_state, action, gamma):
    max_index = np.where(Q[action, ] == np.max(Q[action, ]))[1]
    if max_index.shape[0] > 1:
        max_index = int(np.random.choice(max_index))
    else:
        max_index = int(max_index[0])
    max_value = Q[action, max_index]
    Q[current_state, action] = M[current_state, action] + gamma * max_value

    if np.max(Q) > 0:
        return np.sum(Q / np.max(Q) * 100)
    else:
        return 0

update(initial_state, action, gamma)
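
The update rule used above is Q[state, action] = reward[state, action] + gamma * max(Q[action, :]). The following is a small worked sketch of two such updates on a three-node chain 0 - 1 - 2 with node 2 as the goal. The reward values and the helper name q_update are assumptions made for illustration, and plain NumPy arrays stand in for np.matrix.

```python
import numpy as np

# Rewards for the chain 0 - 1 - 2 with goal node 2 (assumed example values)
R = np.array([[-1.0,  0.0,  -1.0],
              [ 0.0, -1.0, 100.0],
              [-1.0,  0.0, 100.0]])
Q = np.zeros_like(R)
gamma = 0.75

def q_update(Q, R, state, action, gamma):
    """Apply one update and return the new Q[state, action]."""
    Q[state, action] = R[state, action] + gamma * Q[action].max()
    return Q[state, action]

# Moving 1 -> 2 reaches the goal: 100 + 0.75 * 0
first = q_update(Q, R, state=1, action=2, gamma=gamma)
# Moving 0 -> 1 earns no reward itself, but inherits the discounted value of node 1
second = q_update(Q, R, state=0, action=1, gamma=gamma)
print(first, second)
```

The second update shows how value propagates backwards from the goal: node 0 learns that stepping to node 1 is worthwhile even though that step has a reward of 0.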

Step 5: Training and evaluating the bot using the Q-Matrix

scores = []
for i in range(1000):
    current_state = np.random.randint(0, int(Q.shape[0]))
    available_action = available_actions(current_state)
    action = sample_next_action(available_action)
    score = update(current_state, action, gamma)
    scores.append(score)

# print("Trained Q matrix:")
# print(Q / np.max(Q)*100)
# You can uncomment the above two lines to view the trained Q matrix

# Testing
current_state = 0
steps = [current_state]

while current_state != 10:
    # follow the action with the highest Q value, breaking ties at random
    next_step_index = np.where(Q[current_state, ] == np.max(Q[current_state, ]))[1]
    if next_step_index.shape[0] > 1:
        next_step_index = int(np.random.choice(next_step_index))
    else:
        next_step_index = int(next_step_index[0])
    steps.append(next_step_index)
    current_state = next_step_index

print("Most efficient path:")
print(steps)

pl.plot(scores)
pl.xlabel('No of iterations')
pl.ylabel('Reward gained')
pl.show()
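
One way to sanity-check the path printed by the bot is to compare it with a shortest path found by a plain breadth-first search over the same edge list. The sketch below is independent of the Q-learning code; the function name shortest_path is a hypothetical helper, not part of this program.

```python
from collections import deque

edges = [(0, 1), (1, 5), (5, 6), (5, 4), (1, 2),
         (1, 3), (9, 10), (2, 4), (0, 6), (6, 7),
         (8, 9), (7, 8), (1, 7), (3, 9)]

def shortest_path(edges, start, goal):
    """Breadth-first search on an undirected graph; returns a list of nodes or None."""
    neighbours = {}
    for a, b in edges:
        neighbours.setdefault(a, []).append(b)
        neighbours.setdefault(b, []).append(a)

    parent = {start: None}            # also serves as the visited set
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            break
        for nxt in neighbours.get(node, []):
            if nxt not in parent:
                parent[nxt] = node
                queue.append(nxt)

    if goal not in parent:
        return None
    path = []
    node = goal
    while node is not None:           # walk back from the goal to the start
        path.append(node)
        node = parent[node]
    return path[::-1]

reference_path = shortest_path(edges, 0, 10)
print(reference_path)
```

For this graph the search returns [0, 1, 3, 9, 10], so a well-trained Q-matrix should also yield a path with four moves.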
Now, let's bring this bot to a more realistic setting. Let us imagine that the bot is a detective
trying to find out the location of a large drug racket. He naturally concludes that the
drug sellers will not sell their products in a location which is known to be frequented by the
police, and that the selling locations are near the location of the drug racket. Also, the sellers
leave a trace of their products where they sell them, and this can help the detective in finding
the required location. We want to train our bot to find the location using
these environmental clues.
Step 6: Defining and visualizing the new graph with the environmental clues

# Defining the locations of the police and the drug traces
police = [2, 4, 5]
drug_traces = [3, 8, 9]

G = nx.Graph()
G.add_edges_from(edges)

mapping = {0: '0 - Detective', 1: '1', 2: '2 - Police', 3: '3 - Drug traces',
           4: '4 - Police', 5: '5 - Police', 6: '6', 7: '7', 8: '8 - Drug traces',
           9: '9 - Drug traces', 10: '10 - Drug racket location'}

H = nx.relabel_nodes(G, mapping)
pos = nx.spring_layout(H)
# a single size applies to all 11 nodes
nx.draw_networkx_nodes(H, pos, node_size=200)
nx.draw_networkx_edges(H, pos)
nx.draw_networkx_labels(H, pos)
pl.show()

Note: The above graph may look a bit different from the previous graph but they, in fact, are
the same graphs. This is due to the random placement of nodes by the networkx library.
Step 7: Defining some utility functions for the training process

Q = np.matrix(np.zeros([MATRIX_SIZE, MATRIX_SIZE]))
env_police = np.matrix(np.zeros([MATRIX_SIZE, MATRIX_SIZE]))
env_drugs = np.matrix(np.zeros([MATRIX_SIZE, MATRIX_SIZE]))
initial_state = 1

# Same as above
def available_actions(state):
    current_state_row = M[state, ]
    av_action = np.where(current_state_row >= 0)[1]
    return av_action

# Same as above
def sample_next_action(available_actions_range):
    # use the argument rather than the global variable
    next_action = int(np.random.choice(available_actions_range))
    return next_action

# Exploring the environment
def collect_environmental_data(action):
    found = []
    if action in police:
        found.append('p')
    if action in drug_traces:
        found.append('d')
    return found

available_action = available_actions(initial_state)
action = sample_next_action(available_action)

# Same as above, but also records what was found at the chosen node
def update(current_state, action, gamma):
    max_index = np.where(Q[action, ] == np.max(Q[action, ]))[1]
    if max_index.shape[0] > 1:
        max_index = int(np.random.choice(max_index))
    else:
        max_index = int(max_index[0])
    max_value = Q[action, max_index]
    Q[current_state, action] = M[current_state, action] + gamma * max_value

    environment = collect_environmental_data(action)
    if 'p' in environment:
        env_police[current_state, action] += 1
    if 'd' in environment:
        env_drugs[current_state, action] += 1

    if np.max(Q) > 0:
        return np.sum(Q / np.max(Q) * 100)
    else:
        return 0

update(initial_state, action, gamma)

# Determines the available actions according to the environment
def available_actions_with_env_help(state):
    current_state_row = M[state, ]
    av_action = np.where(current_state_row >= 0)[1]

    # if there are multiple routes, dis-favor anything negative
    env_pos_row = env_matrix_snap[state, av_action]

    if (np.sum(env_pos_row < 0)):
        # can we remove the negative directions from av_act?
        temp_av_action = av_action[np.array(env_pos_row)[0] >= 0]
        if len(temp_av_action) > 0:
            av_action = temp_av_action
    return av_action

Step 8: Visualising the Environmental matrices

scores = []
for i in range(1000):
    current_state = np.random.randint(0, int(Q.shape[0]))
    available_action = available_actions(current_state)
    action = sample_next_action(available_action)
    score = update(current_state, action, gamma)

# print environmental matrices
print('Police Found')
print(env_police)
print('')
print('Drug traces Found')
print(env_drugs)

Step 9: Training and evaluating the model

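# Assumption: env_matrix_snap is not defined anywhere in this listing. Here it is
# taken to be the drug-trace counts minus the police counts, so that moves towards
# police are negative, matching the "dis-favor anything negative" check in Step 7.
env_matrix_snap = env_drugs - env_police

scores = []
for i in range(1000):
    current_state = np.random.randint(0, int(Q.shape[0]))
    available_action = available_actions_with_env_help(current_state)
    action = sample_next_action(available_action)
    score = update(current_state, action, gamma)
    scores.append(score)

pl.plot(scores)
pl.xlabel('Number of iterations')
pl.ylabel('Reward gained')
pl.show()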
7. (a) Write a program to implement a Recurrent Neural Network

Step 1: Create the Architecture for our RNN model

Our next task is defining all the necessary variables and functions we’ll use in the RNN model.

Our model will take in the input sequence, process it through a hidden layer of 100 units, and

produce a single valued output:

learning_rate = 0.0001
nepoch = 25
T = 50 # length of sequence
hidden_dim = 100
output_dim = 1

bptt_truncate = 5
min_clip_value = -10
max_clip_value = 10

We will then define the weights of the network:

U = np.random.uniform(0, 1, (hidden_dim, T))


W = np.random.uniform(0, 1, (hidden_dim, hidden_dim))
V = np.random.uniform(0, 1, (output_dim, hidden_dim))

Here,

• U is the weight matrix for weights between input and hidden layers
• V is the weight matrix for weights between hidden and output layers
• W is the weight matrix for shared weights in the RNN layer (hidden layer)

Finally, we will define the activation function, sigmoid, to be used in the hidden layer:

def sigmoid(x):
    return 1 / (1 + np.exp(-x))
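
The training code in the following steps refers to the arrays X, Y, X_val and Y_val, which are not defined in this listing. The following is a minimal sketch of how such arrays could be prepared, assuming (this is an assumption, not something stated here) that the data is a sine wave cut into windows of T = 50 values, each followed by the next value as the target. The helper name make_windows and the record counts are hypothetical.

```python
import math
import numpy as np

T = 50  # length of each input sequence, matching the setting above

# Assumed data source: 200 samples of a sine wave
sin_wave = np.array([math.sin(x) for x in np.arange(200)])

def make_windows(series, start, count, seq_len):
    """Cut `count` windows of length seq_len out of series, each with its next value."""
    X, Y = [], []
    for i in range(start, start + count):
        X.append(series[i:i + seq_len])
        Y.append(series[i + seq_len])
    X = np.expand_dims(np.array(X), axis=2)  # (records, seq_len, 1)
    Y = np.expand_dims(np.array(Y), axis=1)  # (records, 1)
    return X, Y

X, Y = make_windows(sin_wave, start=0, count=100, seq_len=T)
X_val, Y_val = make_windows(sin_wave, start=100, count=50, seq_len=T)

print(X.shape, Y.shape, X_val.shape, Y_val.shape)
```

With these shapes, each record x has shape (T, 1), so np.dot(U, new_input) in the forward pass produces a (hidden_dim, 1) column vector.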
Step 2: Train the Model

Now that we have defined our model, we can finally move on with training it on our sequence

data. We can subdivide the training process into smaller steps, namely:

Step 2.1 : Check the loss on training data

Step 2.1.1 : Forward Pass

Step 2.1.2 : Calculate Error

Step 2.2 : Check the loss on validation data

Step 2.2.1 : Forward Pass

Step 2.2.2 : Calculate Error

Step 2.3 : Start actual training

Step 2.3.1 : Forward Pass

Step 2.3.2 : Backpropagate Error

Step 2.3.3 : Update weights

We need to repeat these steps until convergence. If the model starts to overfit, stop! Or simply

pre-define the number of epochs.

Step 2.1: Check the loss on training data

We will do a forward pass through our RNN model and calculate the squared error for the

predictions for all records in order to get the loss value.

for epoch in range(nepoch):
    # check loss on train
    loss = 0.0

    # do a forward pass to get prediction
    for i in range(Y.shape[0]):
        x, y = X[i], Y[i]  # get input, output values of each record
        # prev_s is the value of the previous activation of the hidden layer,
        # which is initialized as all zeroes
        prev_s = np.zeros((hidden_dim, 1))
        for t in range(T):
            # we then do a forward pass for every timestep in the sequence
            new_input = np.zeros(x.shape)
            new_input[t] = x[t]  # for this, we define a single input for that timestep
            mulu = np.dot(U, new_input)
            mulw = np.dot(W, prev_s)
            add = mulw + mulu
            s = sigmoid(add)
            mulv = np.dot(V, s)
            prev_s = s

        # calculate error
        loss_per_record = (y - mulv)**2 / 2
        loss += loss_per_record
    loss = loss / float(y.shape[0])
Step 2.2: Check the loss on validation data

We will do the same thing for calculating the loss on validation data (in the same loop):

    # check loss on val
    val_loss = 0.0
    for i in range(Y_val.shape[0]):
        x, y = X_val[i], Y_val[i]
        prev_s = np.zeros((hidden_dim, 1))
        for t in range(T):
            new_input = np.zeros(x.shape)
            new_input[t] = x[t]
            mulu = np.dot(U, new_input)
            mulw = np.dot(W, prev_s)
            add = mulw + mulu
            s = sigmoid(add)
            mulv = np.dot(V, s)
            prev_s = s

        loss_per_record = (y - mulv)**2 / 2
        val_loss += loss_per_record
    val_loss = val_loss / float(y.shape[0])

    print('Epoch: ', epoch + 1, ', Loss: ', loss, ', Val Loss: ', val_loss)

You should get the below output:

Epoch: 1 , Loss: [[101185.61756671]] , Val Loss: [[50591.0340148]]


...
...
Step 2.3: Start actual training

We will now start with the actual training of the network. In this, we will first do a forward pass to

calculate the errors and a backward pass to calculate the gradients and update them. Let me

show you these step-by-step so you can visualize how it works in your mind.
Step 2.3.1: Forward Pass

In the forward pass:

• We first multiply the input with the weights between input and hidden layers
• Add this to the product of the RNN-layer weights and the previous hidden state. This is because we want to
capture the knowledge of the previous timestep
• Pass it through a sigmoid activation function
• Multiply this with the weights between hidden and output layers
• At the output layer, we have a linear activation of the values so we do not explicitly pass the value
through an activation layer
• Save the state at the current layer and also the state at the previous timestep in a dictionary

Here is the code for doing a forward pass (note that it is in continuation of the above loop):

    # train model
    for i in range(Y.shape[0]):
        x, y = X[i], Y[i]

        layers = []
        prev_s = np.zeros((hidden_dim, 1))
        dU = np.zeros(U.shape)
        dV = np.zeros(V.shape)
        dW = np.zeros(W.shape)

        dU_t = np.zeros(U.shape)
        dV_t = np.zeros(V.shape)
        dW_t = np.zeros(W.shape)

        dU_i = np.zeros(U.shape)
        dW_i = np.zeros(W.shape)

        # forward pass
        for t in range(T):
            new_input = np.zeros(x.shape)
            new_input[t] = x[t]
            mulu = np.dot(U, new_input)
            mulw = np.dot(W, prev_s)
            add = mulw + mulu
            s = sigmoid(add)
            mulv = np.dot(V, s)
            layers.append({'s': s, 'prev_s': prev_s})
            prev_s = s
Step 2.3.2 : Backpropagate Error

After the forward propagation step, we calculate the gradients at each layer, and backpropagate
the errors. We will use truncated backpropagation through time (TBPTT) instead of vanilla
backprop. It may sound complex, but it's actually pretty straightforward.

The core difference between BPTT and backprop is that the backpropagation step is done for all the
time steps in the RNN layer. So if our sequence length is 50, we will backpropagate through all the
timesteps previous to the current timestep.

As you may have guessed, BPTT is computationally expensive. So instead of backpropagating through
all previous timesteps, we backpropagate through only a fixed number of them (bptt_truncate, set to
5 above) to save computational power. Consider this conceptually similar to mini-batch gradient
descent, where we include a batch of data points instead of all the data points.
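
To make the truncation concrete, the short sketch below lists which earlier timesteps are visited for a given current timestep t. It reuses the range expression from the inner loop of the backward-pass code in this step; the function name truncated_steps is a hypothetical helper added only for illustration.

```python
def truncated_steps(t, bptt_truncate):
    """Earlier timesteps visited when backpropagating from timestep t."""
    return list(range(t - 1, max(-1, t - bptt_truncate - 1), -1))

bptt_truncate = 5

late = truncated_steps(10, bptt_truncate)   # far enough in: exactly 5 steps back
early = truncated_steps(2, bptt_truncate)   # near the start: stops at timestep 0

print(late)   # [9, 8, 7, 6, 5]
print(early)  # [1, 0]
```

However long the sequence is, at most bptt_truncate earlier timesteps are visited per step, which is what bounds the cost of the backward pass.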

Here is the code for backpropagating the errors:

        # derivative of pred
        dmulv = (mulv - y)

        # backward pass
        for t in range(T):
            dV_t = np.dot(dmulv, np.transpose(layers[t]['s']))
            dsv = np.dot(np.transpose(V), dmulv)

            ds = dsv
            dadd = add * (1 - add) * ds

            dmulw = dadd * np.ones_like(mulw)

            dprev_s = np.dot(np.transpose(W), dmulw)

            for i in range(t-1, max(-1, t-bptt_truncate-1), -1):
                ds = dsv + dprev_s
                dadd = add * (1 - add) * ds

                dmulw = dadd * np.ones_like(mulw)
                dmulu = dadd * np.ones_like(mulu)

                dW_i = np.dot(W, layers[t]['prev_s'])
                dprev_s = np.dot(np.transpose(W), dmulw)

                new_input = np.zeros(x.shape)
                new_input[t] = x[t]
                dU_i = np.dot(U, new_input)
                dx = np.dot(np.transpose(U), dmulu)

                dU_t += dU_i
                dW_t += dW_i

            dV += dV_t
            dU += dU_t
            dW += dW_t
Step 2.3.3 : Update weights

Lastly, we update the weights with the gradients calculated above. One thing we have to
keep in mind is that the gradients tend to explode if you don't keep them in check. This is a
fundamental issue in training neural networks, called the exploding gradient problem. So we
have to clamp them to a range so that they don't explode. We can do it like this:

        if dU.max() > max_clip_value:
            dU[dU > max_clip_value] = max_clip_value
        if dV.max() > max_clip_value:
            dV[dV > max_clip_value] = max_clip_value
        if dW.max() > max_clip_value:
            dW[dW > max_clip_value] = max_clip_value

        if dU.min() < min_clip_value:
            dU[dU < min_clip_value] = min_clip_value
        if dV.min() < min_clip_value:
            dV[dV < min_clip_value] = min_clip_value
        if dW.min() < min_clip_value:
            dW[dW < min_clip_value] = min_clip_value

        # update
        U -= learning_rate * dU
        V -= learning_rate * dV
        W -= learning_rate * dW
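
The element-wise clamping above can also be written with NumPy's np.clip, which limits every entry of an array to a given range in one call. The sketch below shows the equivalence on a small made-up gradient array; the values in grad are assumed for illustration.

```python
import numpy as np

min_clip_value = -10
max_clip_value = 10

# Made-up gradient values, two of which lie outside the allowed range
grad = np.array([[-25.0, 3.0],
                 [0.5, 42.0]])

# Same effect as the pair of masked assignments used for dU, dV and dW
clipped = np.clip(grad, min_clip_value, max_clip_value)

print(clipped)
```

Values already inside the range are left unchanged, so only the magnitude of extreme gradients is limited.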

On training the above model, we get this output:

Epoch: 1 , Loss: [[101185.61756671]] , Val Loss: [[50591.0340148]]


Epoch: 2 , Loss: [[61205.46869629]] , Val Loss: [[30601.34535365]]
Epoch: 3 , Loss: [[31225.3198258]] , Val Loss: [[15611.65669247]]
Epoch: 4 , Loss: [[11245.17049551]] , Val Loss: [[5621.96780111]]
Epoch: 5 , Loss: [[1264.5157739]] , Val Loss: [[632.02563908]]
Epoch: 6 , Loss: [[20.15654115]] , Val Loss: [[10.05477285]]
Epoch: 7 , Loss: [[17.13622839]] , Val Loss: [[8.55190426]]
Epoch: 8 , Loss: [[17.38870495]] , Val Loss: [[8.68196484]]
Epoch: 9 , Loss: [[17.181681]] , Val Loss: [[8.57837827]]
Epoch: 10 , Loss: [[17.31275313]] , Val Loss: [[8.64199652]]
Epoch: 11 , Loss: [[17.12960034]] , Val Loss: [[8.54768294]]
Epoch: 12 , Loss: [[17.09020065]] , Val Loss: [[8.52993502]]
Epoch: 13 , Loss: [[17.17370113]] , Val Loss: [[8.57517454]]
Epoch: 14 , Loss: [[17.04906914]] , Val Loss: [[8.50658127]]
Epoch: 15 , Loss: [[16.96420184]] , Val Loss: [[8.46794248]]
Epoch: 16 , Loss: [[17.017519]] , Val Loss: [[8.49241316]]
Epoch: 17 , Loss: [[16.94199493]] , Val Loss: [[8.45748739]]
Epoch: 18 , Loss: [[16.99796892]] , Val Loss: [[8.48242177]]
Epoch: 19 , Loss: [[17.24817035]] , Val Loss: [[8.6126231]]
Epoch: 20 , Loss: [[17.00844599]] , Val Loss: [[8.48682234]]
Epoch: 21 , Loss: [[17.03943262]] , Val Loss: [[8.50437328]]
Epoch: 22 , Loss: [[17.01417255]] , Val Loss: [[8.49409597]]
Epoch: 23 , Loss: [[17.20918888]] , Val Loss: [[8.5854792]]
Epoch: 24 , Loss: [[16.92068017]] , Val Loss: [[8.44794633]]
Epoch: 25 , Loss: [[16.76856238]] , Val Loss: [[8.37295808]]

Looking good! Time to get the predictions and plot them to get a visual sense of what we’ve

designed.

Step 3: Get predictions

We will do a forward pass through the trained weights to get our predictions:

preds = []
for i in range(Y.shape[0]):
    x, y = X[i], Y[i]
    prev_s = np.zeros((hidden_dim, 1))
    # Forward pass
    for t in range(T):
        mulu = np.dot(U, x)
        mulw = np.dot(W, prev_s)
        add = mulw + mulu
        s = sigmoid(add)
        mulv = np.dot(V, s)
        prev_s = s

    preds.append(mulv)

preds = np.array(preds)

Plotting these predictions alongside the actual values:

plt.plot(preds[:, 0, 0], 'g')


plt.plot(Y[:, 0], 'r')
plt.show()

This was on the training data. How do we know if our model didn’t overfit? This is where the validation
set, which we created earlier, comes into play:

preds = []
for i in range(Y_val.shape[0]):
    x, y = X_val[i], Y_val[i]
    prev_s = np.zeros((hidden_dim, 1))
    # For each time step...
    for t in range(T):
        mulu = np.dot(U, x)
        mulw = np.dot(W, prev_s)
        add = mulw + mulu
        s = sigmoid(add)
        mulv = np.dot(V, s)
        prev_s = s

    preds.append(mulv)

preds = np.array(preds)

plt.plot(preds[:, 0, 0], 'g')
plt.plot(Y_val[:, 0], 'r')
plt.show()

Not bad. The predictions are looking impressive. The RMSE score on the validation data is respectable
as well:

import math
from sklearn.metrics import mean_squared_error

math.sqrt(mean_squared_error(Y_val[:, 0] * max_val, preds[:, 0, 0] * max_val))


0.127191931509431
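
For reference, RMSE is the square root of the mean of the squared differences between actual and predicted values. The sketch below computes it directly with NumPy on a few made-up numbers, as an illustration of the metric rather than a replacement for the scikit-learn call above; the function name rmse is a hypothetical helper.

```python
import numpy as np

def rmse(actual, predicted):
    """Root mean squared error between two equally sized sequences."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.sqrt(np.mean((actual - predicted) ** 2)))

# Made-up values: only the last prediction is off, by 2
score = rmse([1.0, 2.0, 3.0], [1.0, 2.0, 5.0])
print(score)
```

Here the squared errors are 0, 0 and 4, so the result is the square root of 4/3.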
7. (b) Write a program to implement LSTM and perform time series analysis using LSTM.

Data Preparation
Before a univariate series can be modeled, it must be prepared.

The LSTM model will learn a function that maps a sequence of past observations as input to an output
observation. As such, the sequence of observations must be transformed into multiple examples from
which the LSTM can learn.

Consider a given univariate sequence:

[10, 20, 30, 40, 50, 60, 70, 80, 90]

We can divide the sequence into multiple input/output patterns called samples, where three time steps are
used as input and one time step is used as output for the one-step prediction that is being learned.

X,              y
10, 20, 30      40
20, 30, 40      50
30, 40, 50      60
...

The split_sequence() function below implements this behavior and will split a given univariate sequence
into multiple samples where each sample has a specified number of time steps and the output is a single
time step.
# split a univariate sequence into samples
def split_sequence(sequence, n_steps):
    X, y = list(), list()
    for i in range(len(sequence)):
        # find the end of this pattern
        end_ix = i + n_steps
        # check if we are beyond the sequence
        if end_ix > len(sequence)-1:
            break
        # gather input and output parts of the pattern
        seq_x, seq_y = sequence[i:end_ix], sequence[end_ix]
        X.append(seq_x)
        y.append(seq_y)
    return array(X), array(y)

We can demonstrate this function on our small contrived dataset above.

The complete example is listed below.

# univariate data preparation
from numpy import array

# split a univariate sequence into samples
def split_sequence(sequence, n_steps):
    X, y = list(), list()
    for i in range(len(sequence)):
        # find the end of this pattern
        end_ix = i + n_steps
        # check if we are beyond the sequence
        if end_ix > len(sequence)-1:
            break
        # gather input and output parts of the pattern
        seq_x, seq_y = sequence[i:end_ix], sequence[end_ix]
        X.append(seq_x)
        y.append(seq_y)
    return array(X), array(y)

# define input sequence
raw_seq = [10, 20, 30, 40, 50, 60, 70, 80, 90]
# choose a number of time steps
n_steps = 3
# split into samples
X, y = split_sequence(raw_seq, n_steps)
# summarize the data
for i in range(len(X)):
    print(X[i], y[i])
Running the example splits the univariate series into six samples where each sample has three input time
steps and one output time step.

[10 20 30] 40
[20 30 40] 50
[30 40 50] 60
[40 50 60] 70
[50 60 70] 80
[60 70 80] 90
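
Before samples like these can be fed to an LSTM layer in Keras, they have to be reshaped from [samples, timesteps] to the three-dimensional form [samples, timesteps, features]. The following is a minimal NumPy sketch of that reshape for the univariate case (one feature per time step). It repeats split_sequence so that it runs on its own, and the names n_features and X_lstm are illustrative assumptions.

```python
import numpy as np

def split_sequence(sequence, n_steps):
    """Split a univariate sequence into (input window, next value) samples."""
    X, y = [], []
    for i in range(len(sequence) - n_steps):
        X.append(sequence[i:i + n_steps])
        y.append(sequence[i + n_steps])
    return np.array(X), np.array(y)

raw_seq = [10, 20, 30, 40, 50, 60, 70, 80, 90]
n_steps = 3
n_features = 1  # univariate series: one value per time step

X, y = split_sequence(raw_seq, n_steps)

# [samples, timesteps] -> [samples, timesteps, features]
X_lstm = X.reshape((X.shape[0], X.shape[1], n_features))

print(X.shape, X_lstm.shape, y.shape)
```

The reshape adds only a trailing axis of length 1; the values and their order are unchanged.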
8. (a) Write a program to perform object detection using Deep Learning

# import the necessary packages
import numpy as np
import argparse
import cv2

# construct the argument parse and parse the arguments
ap = argparse.ArgumentParser()
ap.add_argument("-i", "--image", required=True,
    help="path to input image")
ap.add_argument("-p", "--prototxt", required=True,
    help="path to Caffe 'deploy' prototxt file")
ap.add_argument("-m", "--model", required=True,
    help="path to Caffe pre-trained model")
ap.add_argument("-c", "--confidence", type=float, default=0.2,
    help="minimum probability to filter weak detections")
args = vars(ap.parse_args())

# initialize the list of class labels MobileNet SSD was trained to
# detect, then generate a set of bounding box colors for each class
CLASSES = ["background", "aeroplane", "bicycle", "bird", "boat",
    "bottle", "bus", "car", "cat", "chair", "cow", "diningtable",
    "dog", "horse", "motorbike", "person", "pottedplant", "sheep",
    "sofa", "train", "tvmonitor"]
COLORS = np.random.uniform(0, 255, size=(len(CLASSES), 3))

# load our serialized model from disk
print("[INFO] loading model...")
net = cv2.dnn.readNetFromCaffe(args["prototxt"], args["model"])

# load the input image and construct an input blob for the image
# by resizing to a fixed 300x300 pixels and then normalizing it
# (note: normalization is done via the authors of the MobileNet SSD
# implementation)
image = cv2.imread(args["image"])
(h, w) = image.shape[:2]
blob = cv2.dnn.blobFromImage(cv2.resize(image, (300, 300)), 0.007843,
    (300, 300), 127.5)

# pass the blob through the network and obtain the detections and
# predictions
print("[INFO] computing object detections...")
net.setInput(blob)
detections = net.forward()

# loop over the detections
for i in np.arange(0, detections.shape[2]):
    # extract the confidence (i.e., probability) associated with the
    # prediction
    confidence = detections[0, 0, i, 2]

    # filter out weak detections by ensuring the `confidence` is
    # greater than the minimum confidence
    if confidence > args["confidence"]:
        # extract the index of the class label from the `detections`,
        # then compute the (x, y)-coordinates of the bounding box for
        # the object
        idx = int(detections[0, 0, i, 1])
        box = detections[0, 0, i, 3:7] * np.array([w, h, w, h])
        (startX, startY, endX, endY) = box.astype("int")

        # display the prediction
        label = "{}: {:.2f}%".format(CLASSES[idx], confidence * 100)
        print("[INFO] {}".format(label))
        cv2.rectangle(image, (startX, startY), (endX, endY),
            COLORS[idx], 2)
        y = startY - 15 if startY - 15 > 15 else startY + 15
        cv2.putText(image, label, (startX, y),
            cv2.FONT_HERSHEY_SIMPLEX, 0.5, COLORS[idx], 2)

# show the output image
cv2.imshow("Output", image)
cv2.waitKey(0)

$ python deep_learning_object_detection.py \
--prototxt MobileNetSSD_deploy.prototxt.txt \
--model MobileNetSSD_deploy.caffemodel --image images/example_01.jpg
[INFO] loading model...
[INFO] computing object detections...
[INFO] car: 99.78%
[INFO] car: 99.25%
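
The post-processing loop can be tried without a model file or an input image. The sketch below assumes the detection array has shape (1, 1, N, 7), with each row holding [image id, class id, confidence, x1, y1, x2, y2] and the box corners normalized to the range 0 to 1. The two rows and the 640x480 image size are made-up values for illustration, not real network output.

```python
import numpy as np

# Synthetic stand-in for the array returned by net.forward() (assumed layout).
detections = np.array([[[
    [0, 7, 0.98, 0.125, 0.25, 0.5, 0.75],   # class 7, strong detection
    [0, 15, 0.05, 0.25, 0.25, 0.5, 0.5],    # class 15, weak detection
]]])

w, h = 640, 480          # hypothetical image width and height
min_confidence = 0.2     # same default as the --confidence argument

kept = []
for i in range(detections.shape[2]):
    confidence = float(detections[0, 0, i, 2])
    # Skip detections at or below the threshold.
    if confidence > min_confidence:
        idx = int(detections[0, 0, i, 1])
        # Scale normalized corners back to pixel coordinates.
        box = detections[0, 0, i, 3:7] * np.array([w, h, w, h])
        kept.append((idx, tuple(int(v) for v in box.astype("int"))))

print(kept)  # [(7, (80, 120, 320, 360))]
```

Only the first row survives the threshold, and its corners are scaled by the image width for x and by the image height for y.
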
8. (b) Dog-Breed Classifier – Design and train a convolutional neural network to analyze images
of dogs and correctly identify their breeds. Use transfer learning and well-known architectures to
improve this model.

from sklearn.datasets import load_files
from keras.utils import to_categorical
import numpy as np
from glob import glob

# define function to load train, test, and validation datasets
def load_dataset(path):
    data = load_files(path)
    dog_files = np.array(data['filenames'])
    # one-hot encode the integer labels over the 133 dog breeds
    dog_targets = to_categorical(np.array(data['target']), 133)
    return dog_files, dog_targets

# load train, test, and validation datasets
train_files, train_targets = load_dataset('/data/dog_images/train')
valid_files, valid_targets = load_dataset('/data/dog_images/valid')
test_files, test_targets = load_dataset('/data/dog_images/test')

# load list of dog names
# (item[27:-1] drops the 23-character prefix "/data/dog_images/train/",
# a 4-character numeric prefix such as "001.", and the trailing "/")
dog_names = [item[27:-1] for item in sorted(glob("/data/dog_images/train/*/"))]

# print statistics about the dataset
print('There are %d total dog categories.' % len(dog_names))
print('There are %s total dog images.\n' % len(np.hstack([train_files, valid_files,
    test_files])))
print('There are %d training dog images.' % len(train_files))
print('There are %d validation dog images.' % len(valid_files))
print('There are %d test dog images.' % len(test_files))
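
The targets returned by `load_dataset` are one-hot vectors of length 133, one position per breed. The following NumPy sketch shows what such an encoding looks like. The function `one_hot` is a hypothetical stand-in written for illustration and is not the Keras implementation; the three labels are arbitrary examples.

```python
import numpy as np

def one_hot(labels, num_classes):
    """Illustrative one-hot encoder: row i has a 1 at column labels[i]."""
    labels = np.asarray(labels, dtype=int)
    encoded = np.zeros((labels.size, num_classes), dtype=np.float32)
    encoded[np.arange(labels.size), labels] = 1.0
    return encoded


targets = one_hot([0, 2, 132], 133)
print(targets.shape)            # (3, 133)
print(targets.sum(axis=1))      # every row sums to 1
print(targets.argmax(axis=1))   # recovers the labels 0, 2, 132
```

Taking `argmax` over the class axis converts a one-hot row (or a predicted probability row) back to a breed index.
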

import random
random.seed(8675309)

# load filenames in shuffled human dataset
human_files = np.array(glob("/data/lfw/*/*"))
random.shuffle(human_files)

# print statistics about the dataset
print('There are %d total human images.' % len(human_files))

import cv2
import matplotlib.pyplot as plt
%matplotlib inline

# extract pre-trained face detector
face_cascade = cv2.CascadeClassifier('haarcascades/haarcascade_frontalface_alt.xml')

# load color (BGR) image
img = cv2.imread(human_files[100])
# convert BGR image to grayscale
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

# find faces in image
faces = face_cascade.detectMultiScale(gray)

# print number of faces detected in the image
print('Number of faces detected:', len(faces))

# get bounding box for each detected face
for (x,y,w,h) in faces:
    # add bounding box to color image
    cv2.rectangle(img,(x,y),(x+w,y+h),(255,0,0),2)

# convert BGR image to RGB for plotting
cv_rgb = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)

# display the image, along with bounding box
plt.imshow(cv_rgb)
plt.show()

9. (a) Write a program to demonstrate different activation functions.

import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def tanh(x):
    return np.tanh(x)

def relu(x):
    return np.maximum(0, x)

def leaky_relu(x, alpha=0.01):
    return np.maximum(alpha * x, x)

def softmax(x):
    exp_x = np.exp(x)
    return exp_x / np.sum(exp_x)

x = np.array([-1, 0, 1])

# Sigmoid
print(sigmoid(x)) # [0.26894142 0.5 0.73105858]

# Tanh
print(tanh(x)) # [-0.76159416 0. 0.76159416]

# ReLU
print(relu(x)) # [0 0 1]

# Leaky ReLU
print(leaky_relu(x)) # [-0.01 0. 1. ]

# Softmax
print(softmax(x)) # [0.09003057 0.24472847 0.66524096]

Output
[0.26894142 0.5 0.73105858]
[-0.76159416 0. 0.76159416]
[0 0 1]
[-0.01 0. 1. ]
[0.09003057 0.24472847 0.66524096]
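
The `softmax` above exponentiates the raw inputs, so large values (for example 1000) overflow `np.exp` and the result becomes `nan`. A common remedy is to subtract the maximum before exponentiating, which leaves the result unchanged because the common factor cancels in the ratio. The sketch below is one possible variant; the name `stable_softmax` and the test vectors are chosen here for illustration.

```python
import numpy as np

def stable_softmax(x):
    """Softmax with the maximum subtracted first to avoid overflow in exp."""
    shifted = x - np.max(x)      # largest entry becomes 0, the rest negative
    exp_x = np.exp(shifted)
    return exp_x / np.sum(exp_x)


small = np.array([-1.0, 0.0, 1.0])
large = np.array([1000.0, 1001.0, 1002.0])

print(stable_softmax(small))   # same values as the plain softmax for [-1, 0, 1]
print(stable_softmax(large))   # finite, and identical to the result for `small`
```

Both inputs give the same probabilities because softmax depends only on the differences between the entries.
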
9. (b) Write a program in TensorFlow to demonstrate different Loss functions.

import tensorflow as tf
from tensorflow.keras.losses import MeanAbsoluteError

y_true = [1., 0.]
y_pred = [2., 3.]

mae_loss = MeanAbsoluteError()

print(mae_loss(y_true, y_pred).numpy())
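
The mean absolute error for these values can be checked by hand: (|1 - 2| + |0 - 3|) / 2 = 2.0. A short NumPy sketch of the same arithmetic, independent of TensorFlow:

```python
import numpy as np

y_true = np.array([1.0, 0.0])
y_pred = np.array([2.0, 3.0])

# Mean absolute error: average of |y_true - y_pred| over the elements.
mae = np.mean(np.abs(y_true - y_pred))
print(mae)  # (|1 - 2| + |0 - 3|) / 2 = 2.0
```
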

import tensorflow as tf
from tensorflow.keras.losses import CategoricalCrossentropy

# using one hot vector representation
y_true = [[0, 1, 0], [1, 0, 0]]
y_pred = [[0.15, 0.75, 0.1], [0.75, 0.15, 0.1]]

cross_entropy_loss = CategoricalCrossentropy()

print(cross_entropy_loss(y_true, y_pred).numpy())
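
For one-hot targets, categorical cross-entropy reduces to the negative log of the probability assigned to the true class. Both samples here assign 0.75 to the true class, so each contributes -ln(0.75). The NumPy sketch below reproduces that arithmetic and assumes the loss is averaged over the batch.

```python
import numpy as np

y_true = np.array([[0, 1, 0], [1, 0, 0]], dtype=float)
y_pred = np.array([[0.15, 0.75, 0.1], [0.75, 0.15, 0.1]])

# Per-sample loss: -sum over classes of y_true * log(y_pred).
per_sample = -np.sum(y_true * np.log(y_pred), axis=1)

# Assumed reduction: mean over the two samples.
loss = per_sample.mean()
print(round(loss, 4))  # -ln(0.75), about 0.2877
```
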

import tensorflow.keras as keras

(trainX, trainY), (testX, testY) = keras.datasets.mnist.load_data()

from tensorflow.keras import Sequential
from tensorflow.keras.layers import Dense, Input, Flatten

model = Sequential([
    Input(shape=(28,28,1,)),
    Flatten(),
    Dense(units=84, activation="relu"),
    Dense(units=10, activation="softmax"),
])

# summary() prints the table itself and returns None
model.summary()

_________________________________________________________________
 Layer (type)                Output Shape              Param #
=================================================================
 flatten_1 (Flatten)         (None, 784)               0

 dense_2 (Dense)             (None, 84)                65940

 dense_3 (Dense)             (None, 10)                850

=================================================================
Total params: 66,790
Trainable params: 66,790
Non-trainable params: 0
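
The parameter counts in the summary can be verified by hand: a dense layer has one weight per input-unit pair plus one bias per unit. The helper name `dense_params` below is introduced only for this check.

```python
def dense_params(n_inputs, n_units):
    """Weights plus one bias per unit for a fully connected layer."""
    return n_inputs * n_units + n_units


flattened = 28 * 28                      # 784 inputs after Flatten
hidden = dense_params(flattened, 84)     # 784 * 84 + 84
output = dense_params(84, 10)            # 84 * 10 + 10
print(hidden, output, hidden + output)   # 65940 850 66790
```

The Flatten layer only reshapes its input, so it contributes no parameters.
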

Epoch 1/10
235/235 [==============================] - 2s 6ms/step - loss: 7.8607 - acc: 0.8184 - val_loss: 1.7445 - val_acc: 0.8789
Epoch 2/10
235/235 [==============================] - 1s 6ms/step - loss: 1.1011 - acc: 0.8854 - val_loss: 0.9082 - val_acc: 0.8821
Epoch 3/10
235/235 [==============================] - 1s 6ms/step - loss: 0.5729 - acc: 0.8998 - val_loss: 0.6689 - val_acc: 0.8927
Epoch 4/10
235/235 [==============================] - 1s 5ms/step - loss: 0.3911 - acc: 0.9203 - val_loss: 0.5406 - val_acc: 0.9097
Epoch 5/10
235/235 [==============================] - 1s 6ms/step - loss: 0.3016 - acc: 0.9306 - val_loss: 0.5024 - val_acc: 0.9182
Epoch 6/10
235/235 [==============================] - 1s 6ms/step - loss: 0.2443 - acc: 0.9405 - val_loss: 0.4571 - val_acc: 0.9242
Epoch 7/10
235/235 [==============================] - 1s 5ms/step - loss: 0.2076 - acc: 0.9469 - val_loss: 0.4173 - val_acc: 0.9282
Epoch 8/10
235/235 [==============================] - 1s 5ms/step - loss: 0.1852 - acc: 0.9514 - val_loss: 0.4335 - val_acc: 0.9287
Epoch 9/10
235/235 [==============================] - 1s 6ms/step - loss: 0.1576 - acc: 0.9577 - val_loss: 0.4217 - val_acc: 0.9342
Epoch 10/10
235/235 [==============================] - 1s 5ms/step - loss: 0.1455 - acc: 0.9597 - val_loss: 0.4151 - val_acc: 0.9344

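The `235/235` on each epoch line is the number of batches per epoch. The training call is not part of the listing, so the batch size is an assumption: if all 60,000 MNIST training images are used, a batch size of 256 gives exactly 235 batches, as the sketch below checks.

```python
import math

n_train = 60000      # size of the MNIST training split
batch_size = 256     # assumed; the fit call is not shown in the listing

# The last batch is smaller than the rest, so the division is rounded up.
steps_per_epoch = math.ceil(n_train / batch_size)
print(steps_per_epoch)  # 235
```
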
10. Write a program to build an Artificial Neural Network by implementing the Backpropagation
algorithm and test the same using appropriate data sets.

1. Initialize Network
Each neuron holds one weight per input plus one extra weight for the bias.

# Initialize a network
def initialize_network(n_inputs, n_hidden, n_outputs):
    network = list()
    hidden_layer = [{'weights':[random() for i in range(n_inputs + 1)]} for i in range(n_hidden)]
    network.append(hidden_layer)
    output_layer = [{'weights':[random() for i in range(n_hidden + 1)]} for i in range(n_outputs)]
    network.append(output_layer)
    return network

The complete example, with imports and a small test network of 2 inputs, 1 hidden neuron, and 2 outputs:

from random import seed
from random import random

# Initialize a network
def initialize_network(n_inputs, n_hidden, n_outputs):
    network = list()
    hidden_layer = [{'weights':[random() for i in range(n_inputs + 1)]} for i in range(n_hidden)]
    network.append(hidden_layer)
    output_layer = [{'weights':[random() for i in range(n_hidden + 1)]} for i in range(n_outputs)]
    network.append(output_layer)
    return network

seed(1)
network = initialize_network(2, 1, 2)
for layer in network:
    print(layer)

Running the example prints each layer:

[{'weights': [0.13436424411240122, 0.8474337369372327, 0.763774618976614]}]
[{'weights': [0.2550690257394217, 0.49543508709194095]}, {'weights': [0.4494910647887381, 0.651592972722763]}]

2. Forward Propagate
We can break forward propagation down into three parts:

1. Neuron Activation.
2. Neuron Transfer.
3. Forward Propagation.

2.1. Neuron Activation
activation = sum(weight_i * input_i) + bias

# Calculate neuron activation for an input
def activate(weights, inputs):
    # the bias is stored as the last weight
    activation = weights[-1]
    for i in range(len(weights)-1):
        activation += weights[i] * inputs[i]
    return activation

2.2. Neuron Transfer
output = 1 / (1 + e^(-activation))

# Transfer neuron activation
def transfer(activation):
    return 1.0 / (1.0 + exp(-activation))
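
The `transfer` function calls `exp(-activation)`, and Python's `math.exp` raises `OverflowError` once its argument exceeds roughly 709, which can happen for strongly negative activations. The sketch below is one possible guard, not part of the original program: it picks the algebraically equivalent form whose exponent is never positive. The name `stable_transfer` is chosen here for illustration.

```python
from math import exp

def stable_transfer(activation):
    """Sigmoid that never calls exp on a large positive argument."""
    if activation >= 0:
        return 1.0 / (1.0 + exp(-activation))
    z = exp(activation)          # activation < 0, so z lies in [0, 1)
    return z / (1.0 + z)         # equals 1 / (1 + exp(-activation))


print(stable_transfer(0.0))       # 0.5
print(stable_transfer(-1000.0))   # 0.0, no OverflowError
print(stable_transfer(1000.0))    # 1.0
```

For moderate activations the two forms agree, so this can replace `transfer` without changing the results of the example.
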

2.3. Forward Propagation

# Forward propagate input to a network output
def forward_propagate(network, row):
    inputs = row
    for layer in network:
        new_inputs = []
        for neuron in layer:
            activation = activate(neuron['weights'], inputs)
            neuron['output'] = transfer(activation)
            new_inputs.append(neuron['output'])
        inputs = new_inputs
    return inputs

from math import exp

# Calculate neuron activation for an input
def activate(weights, inputs):
    activation = weights[-1]
    for i in range(len(weights)-1):
        activation += weights[i] * inputs[i]
    return activation

# Transfer neuron activation
def transfer(activation):
    return 1.0 / (1.0 + exp(-activation))

# Forward propagate input to a network output
def forward_propagate(network, row):
    inputs = row
    for layer in network:
        new_inputs = []
        for neuron in layer:
            activation = activate(neuron['weights'], inputs)
            neuron['output'] = transfer(activation)
            new_inputs.append(neuron['output'])
        inputs = new_inputs
    return inputs

# test forward propagation
network = [[{'weights': [0.13436424411240122, 0.8474337369372327, 0.763774618976614]}],
    [{'weights': [0.2550690257394217, 0.49543508709194095]},
     {'weights': [0.4494910647887381, 0.651592972722763]}]]
row = [1, 0, None]
output = forward_propagate(network, row)
print(output)

[0.6629970129852887, 0.7253160725279748]
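
The printed output can be traced by hand through the two layers. The sketch below repeats the arithmetic for the input row [1, 0] using the weights from the test network; the local name `sigmoid` is just a shorthand for the transfer function.

```python
from math import exp

def sigmoid(a):
    return 1.0 / (1.0 + exp(-a))


# Hidden neuron: weights [w1, w2, bias], inputs 1 and 0.
hidden = sigmoid(0.13436424411240122 * 1 + 0.8474337369372327 * 0
                 + 0.763774618976614)

# Output neurons: weights [w, bias], single input = hidden output.
out1 = sigmoid(0.2550690257394217 * hidden + 0.49543508709194095)
out2 = sigmoid(0.4494910647887381 * hidden + 0.651592972722763)

print(hidden)        # about 0.7106
print(out1, out2)    # about 0.6630 and 0.7253
```

The `None` at the end of `row` stands in for the class label and is never read, because each hidden neuron uses only its first two weights as input weights.
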
