[go: up one dir, main page]

0% found this document useful (0 votes)
14 views5 pages

D1 - Deep Learning Workshop Session 3

This document outlines the step-by-step process for preparing an image for digit prediction using a CNN trained on the MNIST dataset. It covers loading or creating an image, resizing, normalizing pixel values, reshaping the image, and displaying it before prediction, followed by defining, compiling, training, and evaluating the CNN model. The entire process ensures that the input image is correctly formatted for accurate handwritten digit classification.

Uploaded by

nuhayd.m.k
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
14 views5 pages

D1 - Deep Learning Workshop Session 3

This document outlines the step-by-step process for preparing an image for digit prediction using a CNN trained on the MNIST dataset. It covers loading or creating an image, resizing, normalizing pixel values, reshaping the image, and displaying it before prediction, followed by defining, compiling, training, and evaluating the CNN model. The entire process ensures that the input image is correctly formatted for accurate handwritten digit classification.

Uploaded by

nuhayd.m.k
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 5

CNN for MNIST Dataset: Handwritten

Digit Prediction (Step-by-Step)


This document explains the step-by-step process of preparing an image for
prediction using a Convolutional Neural Network (CNN) trained on the MNIST
dataset to predict handwritten digits.

1. Loading or Creating the Image

Loading an Image File (Typical Scenario)


●​ The commented-out line img = cv2.imread('student_image.jpg',
cv2.IMREAD_GRAYSCALE) demonstrates how you would typically load an
image file.
○​ cv2.imread(): Reads the image.
○​ 'student_image.jpg': Placeholder for the actual path to your
image file.
○​ cv2.IMREAD_GRAYSCALE: Specifies that the image should be loaded
as a grayscale image, which is the expected format for our MNIST
model.

Creating a Dummy Image (for Demonstration)


●​ Since a specific image file isn't available in this environment, a simple
dummy image is created using NumPy to simulate a handwritten digit.
○​ img = np.zeros((28, 28), dtype=np.uint8): Creates a 28x28
pixel image filled with zeros, representing black for the uint8 data
type.
○​ img[5:20, 10:18] = 255 and img[5:8, 8:20] = 255: These
lines draw a simple shape resembling the digit '7' by setting specific
pixel values to 255 (representing white). This dummy image is used
for testing the prediction process.

2. Resizing the Image


●​ img_resized = cv2.resize(img, (28, 28)): This commented-out line
shows how to resize a loaded image to the required 28x28 pixel size using
cv2.resize().
●​ For the dummy image, this step is skipped (img_resized = img) as it is
already 28x28.
●​ Importance: For a real image, this step is crucial to ensure the image
matches the input size the model was trained on.

3. Normalize the Pixel Values


●​ img_normalized = img_resized.astype('float32') / 255.0: This
crucial step converts the image data type to float32 and scales all pixel
values.
○​ Purpose: It divides all pixel values by 255.0, scaling them from their
original range (0-255) to a range between 0 and 1. This normalization
is essential because the CNN model was trained on data in this
scaled format.

4. Reshape the Normalized Image


●​ img_preprocessed = img_normalized.reshape((1, 28, 28, 1)):
The CNN model expects input in a specific shape.
○​ (number_of_images, height, width, channels): This is the required
input shape for the CNN.
○​ 1: Represents a single image being processed.
○​ 28, 28: Represent the height and width of the image.
○​ 1: Represents the number of color channels. Since it's a grayscale
image, there's only one channel.
○​ Purpose: This reshaping ensures the image has the correct
dimensions for the CNN model's input layer.
5. Display the Preprocessed Image (Optional)
●​ plt.imshow(img_preprocessed[0, :, :, 0], cmap='gray'): This
line uses Matplotlib to display the preprocessed image.
○​ [0, :, :, :]: Selects the first (and only) image.
○​ [:, :, 0]: Selects the single channel to display it as a 2D grayscale
image.
○​ cmap='gray': Specifies a grayscale colormap.
●​ plt.title("Preprocessed Image"): Adds a title to the plot.
●​ plt.show(): Displays the plot.
●​ Purpose: This step is helpful for visually confirming that the image has
been loaded and preprocessed correctly before feeding it to the model for
prediction.

In essence, this entire process ensures that any input image, whether loaded from
a file or created as a sample, is transformed into the precise size, format
(grayscale), and shape that the trained CNN model requires for accurate
handwritten digit prediction. This meticulously prepared image can then be
effectively used by the model for classification.

Model Steps:

1. Model Definition
●​ Sequential Model: A sequential model is defined, meaning layers are
stacked linearly.
●​ Convolutional Layers (Conv2D):
○​ Conv2D(32, kernel_size=(3, 3), activation='relu',
input_shape=(28, 28, 1)): The first convolutional layer with 32
filters, a 3x3 kernel, ReLU activation, and an input shape matching
the preprocessed image.
○​ Conv2D(64, kernel_size=(3, 3), activation='relu'): A
second convolutional layer with 64 filters.
●​ Max Pooling Layer (MaxPooling2D): MaxPooling2D(pool_size=(2, 2)):
Reduces spatial dimensions, helping to reduce computation and extract
dominant features.
●​ Dropout Layer: Dropout(0.25): Randomly sets a fraction of input units to
0 at each update during training, which helps prevent overfitting.
●​ Flatten Layer: Flatten(): Flattens the 2D output of the convolutional
layers into a 1D vector to be fed into fully connected layers.
●​ Dense (Fully Connected) Layers:
○​ Dense(128, activation='relu'): A fully connected layer with
128 neurons and ReLU activation.
○​ Dropout(0.5): Another dropout layer.
○​ Dense(10, activation='softmax'): The output layer with 10
neurons (one for each digit 0-9) and a softmax activation function to
produce probabilities for each digit.

2. Model Compilation
●​ Loss Function: model.compile(loss='categorical_crossentropy',
...): categorical_crossentropy is used as the loss function, suitable
for multi-class classification problems where the output is a probability
distribution.
●​ Optimizer: optimizer='adam': The Adam optimizer is chosen for its
efficiency in gradient descent.
●​ Metrics: metrics=['accuracy']: Accuracy is selected as the metric to
evaluate the model's performance during training and testing.

3. Model Training
●​ model.fit(): The fit method is used to train the model on the
preprocessed training data.
○​ X_train: The training images.
○​ y_train: The one-hot encoded training labels.
○​ batch_size=128: The number of samples per gradient update.
○​ epochs=10: The number of times the model will iterate over the entire
training dataset.
○​ verbose=1: Displays training progress.
○​ validation_data=(X_test, y_test): The model's performance
is evaluated on the test set after each epoch.
4. Model Evaluation
●​ model.evaluate(): After training, the model's performance is evaluated
on the unseen test data.
○​ X_test: The test images.
○​ y_test: The one-hot encoded test labels.
○​ Returns the loss and accuracy on the test set.

You might also like