Google AIML
Consider the following problem: You're building a system that performs activity recognition for
fitness tracking. You might have access to the speed at which a person is walking and attempt to
infer their activity based on that speed using a conditional.
if(speed<4){
  status=WALKING;
}
You could extend that with an else to distinguish running:
if(speed<4){
  status=WALKING;
} else {
  status=RUNNING;
}
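In the full example, you'd keep extending the conditional with more speed thresholds, such as one for biking (the cutoff values here are illustrative):
if(speed<4){
  status=WALKING;
} else if(speed<12){
  status=RUNNING;
} else {
  status=BIKING;
}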
Now, consider what happens when you want to include an activity, like golf. It's less obvious how
to create a rule to determine the activity.
// Now what?
It's extremely difficult to write a program that will recognize the golfing activity, so what do you
do? You can use ML to solve the problem!
Prerequisites
Before attempting this codelab, you'll want a working Python environment; the easiest option is to run the code in Colab. If you're using a different IDE, make sure you have Python installed. You'll also need TensorFlow and the NumPy library. You can learn more about and install TensorFlow here. Install NumPy here.
2. What is ML?
Consider the traditional manner of building apps, as represented in the following diagram:
You express rules in a programming language. They act on data, and your program provides answers.
In the case of the activity detection, the rules (the code you wrote to define activity types) acted upon the
data (the person's movement speed) to produce an answer: the return value from the function for
determining the activity status of the user (whether they were walking, running, biking, or doing something
else).
The process for detecting that activity status via ML is very similar, only the axes are different.
Instead of trying to define the rules and express them in a programming language, you provide the answers
(typically called labels) along with the data, and the machine infers the rules that determine the
relationship between the answers and data. For example, your activity detection scenario might look like
this in an ML context:
You gather lots of data and label it to effectively say, "This is what walking looks like," or "This is what
running looks like." Then, the computer can infer the rules that determine, from the data, what the distinct
patterns that denote a particular activity are.
Beyond being an alternative method to programming that scenario, that approach also gives you the ability
to open new scenarios, such as the golfing one that may not have been possible under the rules-based
traditional programming approach.
In traditional programming, your code compiles into a binary that is typically called a program. In ML, the item that you create from the data and labels is called a model. At runtime, you use the model like this:
You pass the model some data and the model uses the rules that it inferred from the training to make a
prediction, such as, "That data looks like walking," or "That data looks like biking."
Consider the following sets of numbers. Can you see the relationship between them?
X: -1 0 1 2 3 4
Y: -2 1 4 7 10 13
As you look at them, you might notice that the value of X is increasing by 1 as you read left to right and
the corresponding value of Y is increasing by 3. You probably think that Y equals 3X plus or minus
something. Then, you'd probably look at the 0 on X and see that Y is 1, and you'd come up with the
relationship Y=3X+1.
That's almost exactly how you would use code to train a model to spot the patterns in the data!
How would you train a neural network to do the equivalent task? Using data! By feeding it with a set of X's
and a set of Y's, it should be able to figure out the relationship between them.
Imports
> Note: If you're not using Colab and have your own Python environment set up with TensorFlow installed
and ready to use, then create a new Python file before continuing.
Start with your imports. Here, you're importing TensorFlow and calling it tf for ease of use.
Next, import a library called numpy, which represents your data as lists easily and quickly.
The framework for defining a neural network as a set of sequential layers is called keras, so import that,
too.
import tensorflow as tf
import numpy as np
from tensorflow import keras
Next, create the simplest possible neural network. It has one layer, that layer has one neuron, and the input
shape to it is only one value.
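In Keras, that model can be written like this:
model = tf.keras.Sequential([keras.layers.Dense(units=1, input_shape=[1])])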
Next, write the code to compile your neural network. When you do so, you need to specify two functions—
a loss and an optimizer.
In this example, you know that the relationship between the numbers is Y=3X+1.
When the computer is trying to learn that, it makes a guess, maybe Y=10X+10. The loss function measures
the guessed answers against the known correct answers and measures how well or badly it did.
Next, the model uses the optimizer function to make another guess. Based on the loss function's result, it
tries to minimize the loss. At this point, maybe it will come up with something like Y=5X+5. While that's
still pretty bad, it's closer to the correct result (the loss is lower).
The model repeats that for the number of epochs, which you'll see shortly.
First, here's how to tell it to use mean_squared_error for the loss and stochastic gradient descent (sgd) for
the optimizer. You don't need to understand the math for those yet, but you can see that they work!
Over time, you'll learn the different and appropriate loss and optimizer functions for different scenarios.
model.compile(optimizer='sgd', loss='mean_squared_error')
Next, feed in some data. In this case, you take the six X and six Y values from earlier. You can see that the relationship between those is Y=3X+1, so where X is -1, Y is -2.
A Python library called NumPy provides lots of array-type data structures that make this easy. Specify the values as arrays with np.array():
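Using the values from the table earlier:
xs = np.array([-1.0, 0.0, 1.0, 2.0, 3.0, 4.0], dtype=float)
ys = np.array([-2.0, 1.0, 4.0, 7.0, 10.0, 13.0], dtype=float)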
Now you have all the code you need to define the neural network. The next step is to train it to see if it can
infer the patterns between those numbers and use them to create a model.
The process of training the neural network, where it learns the relationship between the X's and Y's, is in the model.fit call. That's where it goes through a loop of making a guess, measuring how good or bad it is (the loss), and then using the optimizer to make another guess. It does that for the number of epochs that you specify. When you run that code, you'll see the loss printed out for each epoch.
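Here's that training call, using the 500 epochs discussed below:
model.fit(xs, ys, epochs=500)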
For example, you can see that for the first few epochs, the loss value is quite large, but it's getting smaller
with each step.
You probably don't need all 500 epochs and can experiment with different amounts. As you can see from
the example, the loss is really small after only 50 epochs, so that might be enough!
You have a model that has been trained to learn the relationship between X and Y. You can use
the model.predict method to have it figure out the Y for a previously unknown X. For example, if X is 10,
what do you think Y will be? Take a guess before you run the following code:
print(model.predict([10.0]))
You might have thought 31, but it ended up being a little over. Why do you think that is?
Neural networks deal with probabilities, so it calculated that there is a very high probability that the
relationship between X and Y is Y=3X+1, but it can't know for sure with only six data points. The result is
very close to 31, but not necessarily 31.
As you work with neural networks, you'll see that pattern recurring. You will almost always deal with
probabilities, not certainties, and will do a little bit of coding to figure out what the result is based on the
probabilities, particularly when it comes to classification.
Module 2
2. Start coding
First, walk through the executable Colab notebook.
import tensorflow as tf
print(tf.__version__)
You'll train a neural network to recognize items of clothing from a common dataset called Fashion MNIST. It contains 70,000 items of clothing in 10 different categories. Each item of clothing is a 28x28 grayscale image. You can see some examples here:
0 T-shirt/top
1 Trouser
2 Pullover
3 Dress
4 Coat
5 Sandal
6 Shirt
7 Sneaker
8 Bag
9 Ankle boot
The Fashion MNIST data is available in the tf.keras.datasets API. Load it like this:
mnist = tf.keras.datasets.fashion_mnist
Calling load_data on that object gives you two sets of two lists: training values and testing values,
which represent graphics that show clothing items and their labels.
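For example:
(training_images, training_labels), (test_images, test_labels) = mnist.load_data()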
What do those values look like? Print a training image and a training label to see. You can experiment with different indices in the array. You may also want to look at index 42, which holds a different boot than the one at index 0.
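For example (matplotlib is available by default in Colab; index 0 is an ankle boot):
import matplotlib.pyplot as plt
plt.imshow(training_images[0])
plt.show()
print(training_labels[0])
print(training_images[0])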
Now, you might be wondering why there are two datasets—training and testing.
The idea is to have one set of data for training and another set of data that the model hasn't yet
encountered to see how well it can classify values. After all, when you're done, you'll want to use
the model with data that it hadn't previously seen! Also, without separate testing data, you'll run
the risk of the network only memorizing its training data without generalizing its knowledge.
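One more preparation step before designing the model: the pixel values are in the range 0 through 255, and neural networks train more easily on small input values, so scale the images to the range 0 through 1 (Exercise 6 later revisits why this matters):
training_images = training_images / 255.0
test_images = test_images / 255.0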
3. Design the model
Now design the model. You'll have three layers. Go through them one-by-one and explore the
different types of layers and the parameters used for each.
model = tf.keras.models.Sequential([
  tf.keras.layers.Flatten(),
  tf.keras.layers.Dense(128, activation=tf.nn.relu),
  tf.keras.layers.Dense(10, activation=tf.nn.softmax)
])
Notice the use of metrics= as a parameter, which allows TensorFlow to report on the accuracy of
the training by checking the predicted results against the known answers (the labels).
model.compile(optimizer=tf.keras.optimizers.Adam(),
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])
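Then train the model by calling model.fit, which produces output like the following:
model.fit(training_images, training_labels, epochs=5)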
Epoch 1/5
60000/60000 [=======] - 6s 101us/sample - loss: 0.4964 - acc: 0.8247
Epoch 2/5
60000/60000 [=======] - 5s 86us/sample - loss: 0.3720 - acc: 0.8656
Epoch 3/5
60000/60000 [=======] - 5s 85us/sample - loss: 0.3335 - acc: 0.8780
Epoch 4/5
60000/60000 [=======] - 6s 103us/sample - loss: 0.3134 - acc: 0.8844
Epoch 5/5
60000/60000 [=======] - 6s 94us/sample - loss: 0.2931 - acc: 0.8926
When the model is done training, you will see an accuracy value at the end of the final epoch. It
might look something like 0.8926 as above. This tells you that your neural network is about 89%
accurate in classifying the training data. In other words, it figured out a pattern match between
the image and the labels that worked 89% of the time. Not great, but not bad considering it was
only trained for five epochs and done quickly.
Next, see how the model performs on data it hasn't seen, by evaluating it on the test set:
model.evaluate(test_images, test_labels)
That example returned an accuracy of .8789, meaning it was about 88% accurate. (You might
have slightly different values.)
As expected, the model is not as accurate with the unknown data as it was with the data it was
trained on! As you learn more about TensorFlow, you'll find ways to improve that.
Exercise 1
For this first exercise, run the following code:
classifications = model.predict(test_images)
print(classifications[0])
It creates a set of classifications for each of the test images, then prints the first entry in the
classifications. The output after you run it is a list of numbers. Why do you think that is and what
do those numbers represent?
Try running print(test_labels[0]) and you'll get a 9. Does that help you understand why the list looks
the way it does?
The output of the model is a list of 10 numbers. Those numbers are the probabilities that the item being classified matches each of the corresponding labels. For example, the first value in the list is the probability that the clothing is of class 0, the next is the probability that it's class 1, and so on. Notice that they are all very low probabilities except one. Also, because of softmax, all the probabilities in the list sum to 1.0.
The list and the labels are 0 based, so the ankle boot having label 9 means that it is the 10th of
the 10 classes. The list having the 10th element being the highest value means that the neural
network has predicted that the item it is classifying is most likely an ankle boot.
Exercise 2
Look at the layers in your model. Experiment with different values for the dense layer with 512
neurons.
What different results do you get for loss and training time? Why do you think that's the case?
For example, if you increase to 1,024 neurons, you have to do more calculations, slowing down the process. But in this case the extra neurons have a good impact because the model becomes more accurate. That doesn't mean more is always better. You can hit the law of diminishing returns very quickly.
Exercise 3
What would happen if you removed the Flatten() layer? Why do you think that's the case?
You get an error about the shape of the data. The details of the error may seem vague right now,
but it reinforces the rule of thumb that the first layer in your network should be the same shape
as your data. Right now your data is 28x28 images, and 28 layers of 28 neurons would be
infeasible, so it makes more sense to flatten that 28,28 into a 784x1.
Instead of writing all the code, add the Flatten() layer at the beginning. When the arrays are loaded
into the model later, they'll automatically be flattened for you.
Exercise 4
Consider the final (output) layers. Why are there 10 of them? What would happen if you had a
different amount than 10?
Try training the network with 5. You get an error as soon as it finds an unexpected value. Another rule of thumb: the number of neurons in the last layer should match the number of classes you are classifying. In this case, there are 10 clothing classes labeled 0 through 9, so you should have 10 neurons in your final layer.
Exercise 5
Consider the effects of additional layers in the network. What will happen if you add another layer
between the one with 512 and the final layer with 10?
There isn't a significant impact because this is relatively simple data. For far more complex data,
extra layers are often necessary.
Exercise 6
Before you trained, you normalized the data, going from values that were 0 through 255 to values
that were 0 through 1. What would be the impact of removing that? Here's the complete code to
give it a try (note that the two lines that normalize the data are commented out).
Why do you think you get different results? There's a great answer here on Stack Overflow.
import tensorflow as tf
print(tf.__version__)
mnist = tf.keras.datasets.fashion_mnist
(training_images, training_labels), (test_images, test_labels) = mnist.load_data()
#training_images=training_images/255.0
#test_images=test_images/255.0
model = tf.keras.models.Sequential([
  tf.keras.layers.Flatten(),
  tf.keras.layers.Dense(512, activation=tf.nn.relu),
  tf.keras.layers.Dense(10, activation=tf.nn.softmax)
])
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy')
model.fit(training_images, training_labels, epochs=5)
model.evaluate(test_images, test_labels)
classifications = model.predict(test_images)
print(classifications[0])
print(test_labels[0])
7. Explore callbacks
Earlier, when you trained for extra epochs, you had an issue where your loss might change. It
might have taken a bit of time for you to wait for the training to do that and you might have
thought that it'd be nice if you could stop the training when you reach a desired value, such as
95% accuracy. If you reach that after 3 epochs, why sit around waiting for it to finish a lot more
epochs?
Like any other program, you have callbacks! See them in action:
import tensorflow as tf
class myCallback(tf.keras.callbacks.Callback):
  def on_epoch_end(self, epoch, logs={}):
    # Stop training once the accuracy metric passes 95%
    if logs.get('accuracy') > 0.95:
      print("\nReached 95% accuracy so cancelling training!")
      self.model.stop_training = True
callbacks = myCallback()
mnist = tf.keras.datasets.fashion_mnist
(training_images, training_labels), (test_images, test_labels) = mnist.load_data()
training_images=training_images/255.0
test_images=test_images/255.0
model = tf.keras.models.Sequential([
  tf.keras.layers.Flatten(),
  tf.keras.layers.Dense(512, activation=tf.nn.relu),
  tf.keras.layers.Dense(10, activation=tf.nn.softmax)
])
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
model.fit(training_images, training_labels, epochs=5, callbacks=[callbacks])
Module 3: Build convolutions and perform pooling
2. What are convolutions?
A convolution is a filter that passes over an image, processes it, and extracts the important
features.
Let's say you have an image of a person wearing a sneaker. How would you detect that a sneaker
is present in the image? In order for your program to "see" the image as a sneaker, you'll have to
extract the important features, and blur the inessential features. This is called feature mapping.
The feature mapping process is theoretically simple. You'll scan every pixel in the image and
then look at its neighboring pixels. You multiply the values of those pixels by the equivalent
weights in a filter.
For example:
The current pixel value is 192. You can calculate the value of the new pixel by looking at the
neighbor values, multiplying them by the values specified in the filter, and making the new pixel
value the final amount.
Now it's time to explore how convolutions work by creating a basic convolution on a 2D
grayscale image.
You'll demonstrate that with the ascent image from SciPy. It's a nice built-in picture with lots of
angles and lines.
3. Start coding
Start by importing some Python libraries and the ascent picture:
import cv2
import numpy as np
from scipy import misc
i = misc.ascent()
Next, use the Pyplot library matplotlib to draw the image so that you know what it looks like:
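For example (matplotlib comes preinstalled in Colab):
import matplotlib.pyplot as plt
plt.grid(False)
plt.gray()
plt.axis('off')
plt.imshow(i)
plt.show()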
You can see that it's an image of a stairwell. There are lots of features you can try and isolate.
For example, there are strong vertical lines.
The image is stored as a NumPy array, so you can create the transformed image by just copying that array. The size_x and size_y variables will hold the dimensions of the image so you can loop over it later.
i_transformed = np.copy(i)
size_x = i_transformed.shape[0]
size_y = i_transformed.shape[1]
4. Create the convolution matrix
First, make a convolution matrix (or kernel) as a 3x3 array:
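Here's one kernel that works well on this image; it emphasizes edges. The weight leaves the result unscaled; by convention, if your filter values don't sum to 0 or 1, the weight should compensate:
filter = [[0, 1, 0], [1, -4, 1], [0, 1, 0]]
weight = 1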
Now, calculate the output pixels. Iterate over the image, leaving a 1-pixel margin, and multiply
each of the neighbors of the current pixel by the value defined in the filter.
That means that the current pixel's neighbor above it and to the left of it will be multiplied by the
top-left item in the filter. Then, multiply the result by the weight and ensure that the result is in the
range 0 through 255.
for x in range(1, size_x-1):
  for y in range(1, size_y-1):
    output_pixel = 0.0
    output_pixel = output_pixel + (i[x-1, y-1] * filter[0][0])
    output_pixel = output_pixel + (i[x, y-1] * filter[0][1])
    output_pixel = output_pixel + (i[x+1, y-1] * filter[0][2])
    output_pixel = output_pixel + (i[x-1, y] * filter[1][0])
    output_pixel = output_pixel + (i[x, y] * filter[1][1])
    output_pixel = output_pixel + (i[x+1, y] * filter[1][2])
    output_pixel = output_pixel + (i[x-1, y+1] * filter[2][0])
    output_pixel = output_pixel + (i[x, y+1] * filter[2][1])
    output_pixel = output_pixel + (i[x+1, y+1] * filter[2][2])
    output_pixel = output_pixel * weight
    # Clamp the result to the valid 0-255 range
    if output_pixel < 0:
      output_pixel = 0
    if output_pixel > 255:
      output_pixel = 255
    i_transformed[x, y] = output_pixel
5. Examine the results
Now, plot the image to see the effect of passing the filter over it:
# Plot the image. Note the size of the axes -- they are 512 by 512
plt.gray()
plt.grid(False)
plt.imshow(i_transformed)
#plt.axis('off')
plt.show()
Consider the following filter values and their impact on the image.
Explore different values! Also, try differently sized filters, such as 5x5 or 7x7.
6. Understanding Pooling
Now that you've identified the essential features of the image, what do you do? How do you use
the resulting feature map to classify images?
Similar to convolutions, pooling greatly helps with detecting features. Pooling layers reduce the
overall amount of information in an image while maintaining the features that are detected as
present.
There are a number of different types of pooling, but you'll use one called Maximum (Max)
Pooling.
Iterate over the image and, at each point, consider the pixel and its immediate neighbors to the
right, beneath, and right-beneath. Take the largest of those (hence max pooling) and load it into
the new image. Thus, the new image will be one-fourth the size of the
old.
7. Write code for pooling
The following code will show a (2, 2) pooling. Run it to see the output.
You'll see that while the image is one-fourth the size of the original, it kept all the features.
new_x = int(size_x/2)
new_y = int(size_y/2)
newImage = np.zeros((new_x, new_y))
for x in range(0, size_x, 2):
  for y in range(0, size_y, 2):
    # Gather the 2x2 block of pixels and keep only the largest value
    pixels = []
    pixels.append(i_transformed[x, y])
    pixels.append(i_transformed[x+1, y])
    pixels.append(i_transformed[x, y+1])
    pixels.append(i_transformed[x+1, y+1])
    newImage[int(x/2), int(y/2)] = max(pixels)
# Plot the image. Note the size of the axes -- now 256 pixels instead of 512
plt.gray()
plt.grid(False)
plt.imshow(newImage)
#plt.axis('off')
plt.show()
Note the axes of that plot. The image is now 256x256, one-fourth of its original size, and the
detected features have been enhanced despite less data now being in the image.
Module 4: Convolutional neural networks (CNNs)
For convenience, here's the entire code again. Run it and take note of the test accuracy printed at the end.
import tensorflow as tf
mnist = tf.keras.datasets.fashion_mnist
(training_images, training_labels), (test_images, test_labels) = mnist.load_data()
training_images=training_images/255.0
test_images=test_images/255.0
model = tf.keras.models.Sequential([
  tf.keras.layers.Flatten(),
  tf.keras.layers.Dense(128, activation='relu'),
  tf.keras.layers.Dense(10, activation='softmax')
])
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
model.fit(training_images, training_labels, epochs=5)
test_loss, test_accuracy = model.evaluate(test_images, test_labels)
print('Test loss: {}, Test accuracy: {}'.format(test_loss, test_accuracy*100))
Your accuracy is probably about 89% on training and 87% on validation. You can make that even
better using convolutions, which narrows down the content of the image to focus on specific,
distinct details.
If you've ever done image processing using a filter, then convolutions will look very familiar.
In short, you take an array (usually 3x3 or 5x5) and pass it over the image. By changing the
underlying pixels based on the formula within that matrix, you can perform operations like edge
detection. For example, typically a 3x3 is defined for edge detection where the middle cell is 8,
and all of its neighbors are -1. In this case, for each pixel, you would multiply its value by 8, then
subtract the value of each neighbor. Do this for every pixel, and you'll end up with a new image
that has its edges enhanced.
This is perfect for computer vision, because enhancing features like edges helps the computer
distinguish one item from another. Better still, the amount of information needed is much less,
because you'll train only on the highlighted features.
That's the concept of Convolutional Neural Networks. Add some layers to do convolution before
you have the dense layers, and then the information going to the dense layers becomes more
focused and possibly more accurate.
3. Try the code
Run the following code. It's the same neural network as earlier, but this time with convolutional
layers added first. It will take longer, but look at the impact on the accuracy:
import tensorflow as tf
print(tf.__version__)
mnist = tf.keras.datasets.fashion_mnist
(training_images, training_labels), (test_images, test_labels) = mnist.load_data()
training_images=training_images.reshape(60000, 28, 28, 1)
training_images=training_images / 255.0
test_images = test_images.reshape(10000, 28, 28, 1)
test_images=test_images / 255.0
model = tf.keras.models.Sequential([
  tf.keras.layers.Conv2D(64, (3, 3), activation='relu', input_shape=(28, 28, 1)),
  tf.keras.layers.MaxPooling2D(2, 2),
  tf.keras.layers.Conv2D(64, (3, 3), activation='relu'),
  tf.keras.layers.MaxPooling2D(2, 2),
  tf.keras.layers.Flatten(),
  tf.keras.layers.Dense(128, activation='relu'),
  tf.keras.layers.Dense(10, activation='softmax')
])
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
model.summary()
model.fit(training_images, training_labels, epochs=5)
test_loss, test_accuracy = model.evaluate(test_images, test_labels)
print('Test loss: {}, Test accuracy: {}'.format(test_loss, test_accuracy*100))
It's likely gone up to about 93% on the training data and 91% on the validation data.
Now try running it for more epochs—say about 20—and explore the results. While the training
results might seem really good, the validation results may actually go down due to a
phenomenon called overfitting.
Overfitting occurs when the network learns the data from the training set too well, so it's specialized to recognize only that data and, as a result, is less effective with data it hasn't seen before. For example, if you trained only on heels, then the network might be very good at identifying heels, but sneakers might confuse it.
Look at the code again, and see step-by-step how the convolutions were built.
4. Gather the data
The first step is to gather the data.
You'll notice that there's a change here and the training data needed to be reshaped. That's
because the first convolution expects a single tensor containing everything, so instead of 60,000
28x28x1 items in a list, you have a single 4D list that is 60,000x28x28x1, and the same for the
test images. If you don't do that, then you'll get an error when training because the convolutions
do not recognize the shape.
import tensorflow as tf
mnist = tf.keras.datasets.fashion_mnist
(training_images, training_labels), (test_images, test_labels) = mnist.load_data()
training_images=training_images.reshape(60000, 28, 28, 1)
training_images = training_images/255.0
test_images = test_images.reshape(10000, 28, 28, 1)
test_images = test_images/255.0
Next, define the model. Instead of starting with a dense layer, you'll begin with a convolutional layer, whose parameters are the following (the full model code appears after the summary below):
• The number of convolutions you want to generate. A value like 32 is a good starting point.
• The size of the convolutional matrix, in this case a 3x3 grid.
• The activation function to use, in this case relu.
• In the first layer, the shape of the input data.
You'll follow the convolution with a max pooling layer, which is designed to compress the image
while maintaining the content of the features that were highlighted by the convolution. By
specifying (2,2) for the max pooling, the effect is to reduce the size of the image by a factor of 4.
It creates a 2x2 array of pixels and picks the largest pixel value, turning 4 pixels into 1. It repeats
this computation across the image, and in so doing halves the number of horizontal pixels and
halves the number of vertical pixels.
You can call model.summary() to see the size and shape of the network. Notice that after every
max pooling layer, the image size is reduced in the following way:
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
conv2d_2 (Conv2D) (None, 26, 26, 64) 640
_________________________________________________________________
max_pooling2d_2 (MaxPooling2 (None, 13, 13, 64) 0
_________________________________________________________________
conv2d_3 (Conv2D) (None, 11, 11, 64) 36928
_________________________________________________________________
max_pooling2d_3 (MaxPooling2 (None, 5, 5, 64) 0
_________________________________________________________________
flatten_2 (Flatten) (None, 1600) 0
_________________________________________________________________
dense_4 (Dense) (None, 128) 204928
_________________________________________________________________
dense_5 (Dense) (None, 10) 1290
=================================================================
model = tf.keras.models.Sequential([
  # 64 filters of size 3x3, matching the summary above
  tf.keras.layers.Conv2D(64, (3, 3), activation='relu', input_shape=(28, 28, 1)),
  tf.keras.layers.MaxPooling2D(2, 2),
  # Add another convolution
  tf.keras.layers.Conv2D(64, (3, 3), activation='relu'),
  tf.keras.layers.MaxPooling2D(2, 2),
  # Now flatten the output. After this, you'll have the same DNN structure as the non-convolutional version
  tf.keras.layers.Flatten(),
  # The same 128-neuron dense layer and 10-neuron output layer as in the pre-convolution example:
  tf.keras.layers.Dense(128, activation='relu'),
  tf.keras.layers.Dense(10, activation='softmax')
])
Code:
print(test_labels[:100])
[9 2 1 1 6 1 4 6 5 7 4 5 7 3 4 1 2 4 8 0 2 5 7 9 1 4 6 0 9 3 8 8 3 3 8 0 7
 5 7 9 6 1 3 7 6 7 2 1 2 2 4 4 5 8 2 2 8 4 8 0 7 7 8 5 1 1 2 3 9 8 7 0 2 6
 2 3 1 2 8 4 1 8 5 9 5 0 3 2 0 6 5 3 6 7 1 8 0 1 4 2]
Now you can select some of the corresponding images for those labels and render what they look like going through the convolutions. In the following code, FIRST_IMAGE, SECOND_IMAGE, and THIRD_IMAGE are all indexes of images whose label is 9, an ankle boot.
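Here's a sketch of that rendering code. The index values 0, 23, and 28 are read off the label printout above; CONVOLUTION_NUMBER is an arbitrary choice of which of the 64 filters to display:
import matplotlib.pyplot as plt
FIRST_IMAGE = 0
SECOND_IMAGE = 23
THIRD_IMAGE = 28
CONVOLUTION_NUMBER = 1
layer_outputs = [layer.output for layer in model.layers]
activation_model = tf.keras.models.Model(inputs=model.input, outputs=layer_outputs)
f, axarr = plt.subplots(3, 4)
for x in range(0, 4):  # the first four layers: conv, pool, conv, pool
  for row, image_index in enumerate([FIRST_IMAGE, SECOND_IMAGE, THIRD_IMAGE]):
    f_map = activation_model.predict(test_images[image_index].reshape(1, 28, 28, 1))[x]
    axarr[row, x].imshow(f_map[0, :, :, CONVOLUTION_NUMBER], cmap='inferno')
    axarr[row, x].grid(False)
plt.show()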
And you should see something like the following, where the convolution is taking the essence of
the sole of the shoe, effectively spotting that as a common feature across all shoes.
8. Exercises
Exercise 1
Try editing the convolutions. Change the number of convolutions from 32 to either 16 or 64.
What impact does that have on accuracy and training time?
Exercise 2
Remove the final convolution. What impact does that have on accuracy or training time?
Exercise 3
Add more convolutions. What impact does that have?
Exercise 4
Remove all convolutions but the first. What impact does that have? Experiment with it.
Module 5: Complex images
Start by downloading the horses-or-humans dataset:
!wget \
https://storage.googleapis.com/learning-datasets/horse-or-human.zip \
-O /tmp/horse-or-human.zip
The following Python code uses the os library to access the file system and the zipfile library to unzip the data.
import os
import zipfile
local_zip = '/tmp/horse-or-human.zip'
zip_ref = zipfile.ZipFile(local_zip, 'r')
zip_ref.extractall('/tmp/horse-or-human')
zip_ref.close()
The contents of the zip file are extracted to the base directory /tmp/horse-or-human, which contains horses and humans subdirectories.
In short, the training set is the data that is used to tell the neural network model that "this is what
a horse looks like" and "this is what a human looks like."
3. Use the ImageGenerator to label and prepare the data
You do not explicitly label the images as horses or humans.
Later you'll see something called an ImageDataGenerator being used. It reads images from
subdirectories and automatically labels them from the name of that subdirectory. For example,
you have a training directory containing a horses directory and a humans
directory. ImageDataGenerator will label the images appropriately for you, reducing a coding step.
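Before exploring the files, point a couple of variables at the training subdirectories (the paths follow from the extraction step above):
train_horse_dir = os.path.join('/tmp/horse-or-human/horses')
train_human_dir = os.path.join('/tmp/horse-or-human/humans')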
Now, see what the filenames look like in the horses and humans training directories:
train_horse_names = os.listdir(train_horse_dir)
print(train_horse_names[:10])
train_human_names = os.listdir(train_human_dir)
print(train_human_names[:10])
Find the total number of horse and human images in the directories:
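For example:
print('total training horse images:', len(os.listdir(train_horse_dir)))
print('total training human images:', len(os.listdir(train_human_dir)))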
%matplotlib inline
Now, display a batch of eight horse pictures and eight human pictures. You can rerun the cell to
see a fresh batch each time.
import matplotlib.pyplot as plt
import matplotlib.image as mpimg
nrows, ncols = 4, 4  # display the pictures in a 4x4 grid
pic_index = 0        # index for iterating over images

pic_index += 8
next_horse_pix = [os.path.join(train_horse_dir, fname)
                  for fname in train_horse_names[pic_index-8:pic_index]]
next_human_pix = [os.path.join(train_human_dir, fname)
                  for fname in train_human_names[pic_index-8:pic_index]]
for i, img_path in enumerate(next_horse_pix + next_human_pix):
  sp = plt.subplot(nrows, ncols, i + 1)
  sp.axis('Off')
  img = mpimg.imread(img_path)
  plt.imshow(img)
plt.show()
Here are some example images showing horses and humans in different poses and orientations:
5. Define the model
Start defining the model.
import tensorflow as tf
Then, add convolutional layers and flatten the final result to feed into the densely connected
layers. Finally, add the densely connected layers.
model = tf.keras.models.Sequential([
  # Note the input shape is the desired size of the image: 300x300 with 3 bytes of color
  # This is the first convolution
  tf.keras.layers.Conv2D(16, (3, 3), activation='relu', input_shape=(300, 300, 3)),
  tf.keras.layers.MaxPooling2D(2, 2),
  # The second convolution
  tf.keras.layers.Conv2D(32, (3, 3), activation='relu'),
  tf.keras.layers.MaxPooling2D(2, 2),
  # The third convolution
  tf.keras.layers.Conv2D(64, (3, 3), activation='relu'),
  tf.keras.layers.MaxPooling2D(2, 2),
  # The fourth convolution
  tf.keras.layers.Conv2D(64, (3, 3), activation='relu'),
  tf.keras.layers.MaxPooling2D(2, 2),
  # The fifth convolution
  tf.keras.layers.Conv2D(64, (3, 3), activation='relu'),
  tf.keras.layers.MaxPooling2D(2, 2),
  # Flatten the results to feed into a DNN
  tf.keras.layers.Flatten(),
  # 512-neuron hidden layer
  tf.keras.layers.Dense(512, activation='relu'),
  # Only 1 output neuron. It will contain a value from 0 to 1, where 0 is for one class ('horses') and 1 is for the other ('humans')
  tf.keras.layers.Dense(1, activation='sigmoid')
])
model.summary()
You can see the results here:
The output shape column shows how the size of your feature map evolves in each successive layer. Because the convolutions use no padding, each convolution layer trims a 1-pixel border from the feature map (reducing the width and height by 2), and each pooling layer halves the dimensions.
Note: In this case, using the RMSprop optimization algorithm is preferable to stochastic
gradient descent (SGD) because RMSprop automates learning-rate tuning for you. (Other
optimizers, such as Adam and Adagrad, also automatically adapt the learning rate during
training and would work equally well here.)
Code:
from tensorflow.keras.optimizers import RMSprop
model.compile(loss='binary_crossentropy',
              optimizer=RMSprop(learning_rate=0.001),
              metrics=['acc'])
You'll have one generator for the training images and one for the validation images. Your
generators will yield batches of images of size 300x300 and their labels (binary).
As you may already know, data that goes into neural networks should usually be normalized in
some way to make it more amenable to processing by the network. (It's uncommon to feed raw
pixels into a CNN.) In your case, you'll preprocess your images by normalizing the pixel values to
be in the [0, 1] range (originally all values are in the [0, 255] range).
In Keras, that can be done via the keras.preprocessing.image.ImageDataGenerator class using the
rescale parameter. That ImageDataGenerator class allows you to instantiate generators of
augmented image batches (and their labels) via .flow(data, labels) or
.flow_from_directory(directory). Those generators can then be used with the Keras model methods that accept data generators as inputs. (Older code called fit_generator, evaluate_generator, and predict_generator; in recent TensorFlow versions, fit, evaluate, and predict accept generators directly, as shown below.)
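Here's a sketch of the training generator (the batch size of 128 is an assumption to tune; a validation generator would be built the same way from its own directory):
from tensorflow.keras.preprocessing.image import ImageDataGenerator
# All images will be rescaled by 1/255
train_datagen = ImageDataGenerator(rescale=1/255)
train_generator = train_datagen.flow_from_directory(
    '/tmp/horse-or-human/',  # directory containing the horses and humans subdirectories
    target_size=(300, 300),  # resize all images to 300x300 as they're loaded
    batch_size=128,
    class_mode='binary')     # binary labels, matching the binary_crossentropy loss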
history = model.fit(
    train_generator,
    steps_per_epoch=8,
    epochs=15,
    verbose=1)
The loss and accuracy are a great indication of training progress. The model makes a guess at the classification of each training image, measures the guess against the known label, and calculates the loss. Accuracy is the fraction of correct guesses.
Epoch 1/15
9/9 [==============================] - 9s 1s/step - loss: 0.8662 - acc: 0.5151
Epoch 2/15
9/9 [==============================] - 8s 927ms/step - loss: 0.7212 - acc: 0.5969
Epoch 3/15
9/9 [==============================] - 8s 921ms/step - loss: 0.6612 - acc: 0.6592
Epoch 4/15
9/9 [==============================] - 8s 925ms/step - loss: 0.3135 - acc: 0.8481
Epoch 5/15
9/9 [==============================] - 8s 919ms/step - loss: 0.4640 - acc: 0.8530
Epoch 6/15
9/9 [==============================] - 8s 896ms/step - loss: 0.2306 - acc: 0.9231
Epoch 7/15
9/9 [==============================] - 8s 915ms/step - loss: 0.1464 - acc: 0.9396
Epoch 8/15
9/9 [==============================] - 8s 935ms/step - loss: 0.2663 - acc: 0.8919
Epoch 9/15
9/9 [==============================] - 8s 883ms/step - loss: 0.0772 - acc: 0.9698
Epoch 10/15
9/9 [==============================] - 9s 951ms/step - loss: 0.0403 - acc: 0.9805
Epoch 11/15
9/9 [==============================] - 8s 891ms/step - loss: 0.2618 - acc: 0.9075
Epoch 12/15
9/9 [==============================] - 8s 902ms/step - loss: 0.0434 - acc: 0.9873
Epoch 13/15
9/9 [==============================] - 8s 904ms/step - loss: 0.0187 - acc: 0.9932
Epoch 14/15
9/9 [==============================] - 9s 951ms/step - loss: 0.0974 - acc: 0.9649
Epoch 15/15
9/9 [==============================] - 8s 877ms/step - loss: 0.2859 - acc: 0.9338
9. Test the model
Now actually run a prediction using the model. The code will allow you to choose one or more
files from your file system. It will then upload them and run them through the model, giving an
indication of whether the object is a horse or a human.
You can download images from the internet to your file system to try them out! Note that you
might see that the network makes a lot of mistakes despite the fact that the training accuracy is
above 99%.
That's due to something called overfitting, which means that the neural network is trained with
very limited data (there are only roughly 500 images of each class). So it's very good at
recognizing images that look like those in the training set, but it can fail a lot at images that are
not in the training set.
That's a datapoint proving that the more data that you train on, the better your final network will
be!
There are many techniques that can be used to make your training better, despite limited data,
including something called image augmentation, but that's beyond the scope of this codelab.
import numpy as np
from google.colab import files
from keras.preprocessing import image
uploaded = files.upload()
for fn in uploaded.keys():
  # predicting images
  path = '/content/' + fn
  img = image.load_img(path, target_size=(300, 300))
  x = image.img_to_array(img)
  x = np.expand_dims(x, axis=0)
  images = np.vstack([x])
  classes = model.predict(images, batch_size=10)
  print(classes[0])
  if classes[0] > 0.5:
    print(fn + " is a human")
  else:
    print(fn + " is a horse")
For example, say that you want to test with this image:
Here's what the colab produces:
Pick a random image from the training set, then generate a figure where each row is the output
of a layer and each image in the row is a specific filter in that output feature map. Rerun that cell
to generate intermediate representations for a variety of training images.
import numpy as np
import random
from tensorflow.keras.preprocessing.image import img_to_array, load_img
# Let's define a new Model that will take an image as input, and will output
# intermediate representations for all layers in the previous model after
# the first.
successive_outputs = [layer.output for layer in model.layers[1:]]
#visualization_model = Model(img_input, successive_outputs)
visualization_model = tf.keras.models.Model(inputs = model.input, outputs = successive_outputs)
# Let's prepare a random input image from the training set.
horse_img_files = [os.path.join(train_horse_dir, f) for f in train_horse_names]
human_img_files = [os.path.join(train_human_dir, f) for f in train_human_names]
img_path = random.choice(horse_img_files + human_img_files)
# Load the image and turn it into a batch of one: shape (1, 300, 300, 3)
img = load_img(img_path, target_size=(300, 300))
x = img_to_array(img)
x = x.reshape((1,) + x.shape)
# Rescale by 1/255
x /= 255
# Let's run our image through our network, thus obtaining all
# intermediate representations for this image.
successive_feature_maps = visualization_model.predict(x)
# These are the names of the layers, so you can have them as part of the plot.
# Skip the first layer's name to stay aligned with successive_outputs above.
layer_names = [layer.name for layer in model.layers[1:]]
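A sketch of the display loop follows: it tiles every filter's output for a layer into one horizontal strip and normalizes the values so they're visible:
import matplotlib.pyplot as plt
for layer_name, feature_map in zip(layer_names, successive_feature_maps):
  if len(feature_map.shape) != 4:
    continue  # only plot the conv/pool outputs, not the dense layers
  n_features = feature_map.shape[-1]  # number of filters in the layer
  size = feature_map.shape[1]         # feature map shape is (1, size, size, n_features)
  display_grid = np.zeros((size, size * n_features))
  for i in range(n_features):
    feat = feature_map[0, :, :, i]
    feat = (feat - feat.mean()) / (feat.std() + 1e-5)  # normalize for display
    feat = np.clip(feat * 64 + 128, 0, 255).astype('uint8')
    display_grid[:, i * size:(i + 1) * size] = feat   # tile into the strip
  plt.figure(figsize=(20, 2))
  plt.title(layer_name)
  plt.grid(False)
  plt.imshow(display_grid, aspect='auto', cmap='viridis')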
As you can see, you go from the raw pixels of the images to increasingly abstract and compact
representations. The representations downstream start highlighting what the network pays
attention to, and they show fewer and fewer features being "activated." Most are set to zero.
That's called sparsity. Representation sparsity is a key feature of deep learning.
Those representations carry increasingly less information about the original pixels of the image,
but increasingly refined information about the class of the image. You can think of a CNN (or a
deep network in general) as an information distillation pipeline.
Module 6: Use convolutional neural networks (CNNs) with large datasets to avoid overfitting
First, set up your development environment with the requisite libraries that you'll need.
import os
import zipfile
import random
import tensorflow as tf
from tensorflow.keras.optimizers import RMSprop
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from shutil import copyfile
try:
  os.mkdir('/tmp/cats-v-dogs')
  os.mkdir('/tmp/cats-v-dogs/training')
  os.mkdir('/tmp/cats-v-dogs/testing')
  os.mkdir('/tmp/cats-v-dogs/training/cats')
  os.mkdir('/tmp/cats-v-dogs/training/dogs')
  os.mkdir('/tmp/cats-v-dogs/testing/cats')
  os.mkdir('/tmp/cats-v-dogs/testing/dogs')
except OSError:
  pass
CAT_SOURCE_DIR = "/tmp/PetImages/Cat/"
TRAINING_CATS_DIR = "/tmp/cats-v-dogs/training/cats/"
TESTING_CATS_DIR = "/tmp/cats-v-dogs/testing/cats/"
DOG_SOURCE_DIR = "/tmp/PetImages/Dog/"
TRAINING_DOGS_DIR = "/tmp/cats-v-dogs/training/dogs/"
TESTING_DOGS_DIR = "/tmp/cats-v-dogs/testing/dogs/"
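The split_data helper isn't shown in this excerpt; here's a sketch of what it does. It skips zero-length files (hence the expected output below) and copies a random split_size fraction of the images into the training directory and the rest into testing:
def split_data(SOURCE, TRAINING, TESTING, SPLIT_SIZE):
  files = []
  for filename in os.listdir(SOURCE):
    if os.path.getsize(os.path.join(SOURCE, filename)) > 0:
      files.append(filename)
    else:
      print(filename + " is zero length, so ignoring")
  shuffled = random.sample(files, len(files))
  split_point = int(len(files) * SPLIT_SIZE)
  for filename in shuffled[:split_point]:
    copyfile(os.path.join(SOURCE, filename), os.path.join(TRAINING, filename))
  for filename in shuffled[split_point:]:
    copyfile(os.path.join(SOURCE, filename), os.path.join(TESTING, filename))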
split_size = .9
split_data(CAT_SOURCE_DIR, TRAINING_CATS_DIR, TESTING_CATS_DIR, split_size)
split_data(DOG_SOURCE_DIR, TRAINING_DOGS_DIR, TESTING_DOGS_DIR, split_size)
# Expected output
# 666.jpg is zero length, so ignoring
# 11702.jpg is zero length, so ignoring
You can check to see if your data is properly unpacked using the following code:
print(len(os.listdir('/tmp/cats-v-dogs/training/cats/')))
print(len(os.listdir('/tmp/cats-v-dogs/training/dogs/')))
print(len(os.listdir('/tmp/cats-v-dogs/testing/cats/')))
print(len(os.listdir('/tmp/cats-v-dogs/testing/dogs/')))
# Expected output:
# 11250
# 11250
# 1250
# 1250
model = tf.keras.models.Sequential([
  tf.keras.layers.Conv2D(16, (3, 3), activation='relu', input_shape=(150, 150, 3)),
  tf.keras.layers.MaxPooling2D(2, 2),
  tf.keras.layers.Conv2D(32, (3, 3), activation='relu'),
  tf.keras.layers.MaxPooling2D(2, 2),
  tf.keras.layers.Conv2D(64, (3, 3), activation='relu'),
  tf.keras.layers.MaxPooling2D(2, 2),
  tf.keras.layers.Flatten(),
  tf.keras.layers.Dense(512, activation='relu'),
  tf.keras.layers.Dense(1, activation='sigmoid')
])
model.compile(optimizer=RMSprop(learning_rate=0.001), loss='binary_crossentropy', metrics=['accuracy'])
6. Train the model
Now that the model is defined, you can train the model using an ImageDataGenerator.
TRAINING_DIR = "/tmp/cats-v-dogs/training/"
train_datagen = ImageDataGenerator(rescale=1.0/255.)
train_generator = train_datagen.flow_from_directory(TRAINING_DIR,
                                                    batch_size=100,
                                                    class_mode='binary',
                                                    target_size=(150, 150))
VALIDATION_DIR = "/tmp/cats-v-dogs/testing/"
validation_datagen = ImageDataGenerator(rescale=1.0/255.)
validation_generator = validation_datagen.flow_from_directory(VALIDATION_DIR,
                                                              batch_size=100,
                                                              class_mode='binary',
                                                              target_size=(150, 150))
# Expected Output:
# Found 22498 images belonging to 2 classes.
# Found 2500 images belonging to 2 classes.
To train the model, you now call model.fit (older versions used model.fit_generator), passing it the generators that you created.
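For example (15 epochs is a reasonable starting point; history is used by the plotting code below):
history = model.fit(train_generator,
                    epochs=15,
                    verbose=1,
                    validation_data=validation_generator)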
%matplotlib inline
import matplotlib.image as mpimg
import matplotlib.pyplot as plt
#-----------------------------------------------------------
# Retrieve a list of list results on training and test data
# sets for each training epoch
#-----------------------------------------------------------
acc=history.history['accuracy']
val_acc=history.history['val_accuracy']
loss=history.history['loss']
val_loss=history.history['val_loss']
epochs=range(len(acc)) # Get number of epochs
#------------------------------------------------
# Plot training and validation accuracy per epoch
#------------------------------------------------
plt.plot(epochs, acc, 'r', label="Training Accuracy")
plt.plot(epochs, val_acc, 'b', label="Validation Accuracy")
plt.title('Training and validation accuracy')
plt.legend()
plt.figure()
#------------------------------------------------
# Plot training and validation loss per epoch
#------------------------------------------------
plt.plot(epochs, loss, 'r', label="Training Loss")
plt.plot(epochs, val_loss, 'b', label="Validation Loss")
plt.title('Training and validation loss')
plt.legend()
plt.figure()
If you want to take the model for a spin, you can use the following code. Upload images to see how it
classifies them!
# Here's a codeblock just for fun. You should be able to upload an image here
# and have it classified without crashing
import numpy as np
from google.colab import files
from keras.preprocessing import image
uploaded = files.upload()
for fn in uploaded.keys():
  # predicting images
  path = '/content/' + fn
  img = image.load_img(path, target_size=(150, 150))
  x = image.img_to_array(img)
  x = np.expand_dims(x, axis=0)
  images = np.vstack([x])
  classes = model.predict(images, batch_size=10)
  print(classes[0])
  if classes[0] > 0.5:
    print(fn + " is a dog")
  else:
    print(fn + " is a cat")