D2 Deep Learning Workshop Session 2

This document discusses the importance of data augmentation and preprocessing for flower classification using deep learning models. It outlines the steps to load a flower dataset, apply data augmentation techniques, and visualize the results, emphasizing how these methods enhance model performance and generalization. Key benefits include reducing overfitting, increasing dataset size, and improving model robustness against variations in input data.

Uploaded by

nuhayd.m.k

Data Augmentation and Preprocessing for Flower Classification

Data augmentation and preprocessing are crucial steps in preparing image
datasets for deep learning models, especially when dealing with limited data.
These techniques help improve model generalization and reduce overfitting. This
document outlines the process of applying these techniques to a flower dataset.

1. Import Necessary Libraries


To begin, we import the required libraries. TensorFlow and Keras provide the
core functionalities for building and training deep learning models, while
Matplotlib is used for visualization.

import tensorflow as tf
from tensorflow.keras import layers
import matplotlib.pyplot as plt
import numpy as np

2. Load the Flowers Dataset


We load the flowers dataset using
tf.keras.utils.image_dataset_from_directory. This utility automatically
infers the labels from the directory structure, making it convenient for image
classification tasks.

# Assuming the flower dataset is organized into subdirectories for each class
# Example directory structure:
# flowers/
#     daisy/
#     dandelion/
#     roses/
#     sunflowers/
#     tulips/

data_dir = 'flowers'  # Replace with the actual path to your dataset
image_size = (180, 180)
batch_size = 32

train_ds = tf.keras.utils.image_dataset_from_directory(
    data_dir,
    validation_split=0.2,
    subset="training",
    seed=123,
    image_size=image_size,
    batch_size=batch_size
)

val_ds = tf.keras.utils.image_dataset_from_directory(
    data_dir,
    validation_split=0.2,
    subset="validation",
    seed=123,  # Same seed as the training call so the two subsets do not overlap
    image_size=image_size,
    batch_size=batch_size
)

class_names = train_ds.class_names
print(f"Class names: {class_names}")
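
As a rough illustration of what "inferring the labels from the directory structure" means, the sketch below builds a throwaway directory tree and maps each subdirectory name to an integer index in sorted order, which is how image_dataset_from_directory orders class_names when none are passed explicitly. The helper infer_class_names and the temporary folder are hypothetical stand-ins, not the utility's actual implementation.

```python
import os
import tempfile

def infer_class_names(root):
    """Return the sorted subdirectory names of root (hypothetical helper)."""
    return sorted(
        entry for entry in os.listdir(root)
        if os.path.isdir(os.path.join(root, entry))
    )

# Build a temporary tree that mirrors the example layout, in arbitrary order.
root = tempfile.mkdtemp()
for name in ["tulips", "daisy", "roses", "dandelion", "sunflowers"]:
    os.makedirs(os.path.join(root, name))

class_names_demo = infer_class_names(root)
# Each class name gets the integer label equal to its position in the sorted list.
label_of = {name: index for index, name in enumerate(class_names_demo)}

print(class_names_demo)
print(label_of["roses"])
```

Because the mapping depends only on the folder names, renaming or adding a class folder changes the integer labels, so class_names should be saved alongside any trained model.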

3. Define Data Augmentation Layers


Keras preprocessing layers offer a straightforward way to define data
augmentation transformations. These layers are applied on-the-fly during training,
ensuring that the model sees new variations of the images in each
epoch.

data_augmentation = tf.keras.Sequential([
    layers.RandomFlip("horizontal_and_vertical"),
    layers.RandomRotation(0.2),
    layers.RandomZoom(0.2),
])
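
To make the numeric arguments concrete: RandomRotation takes its factor as a fraction of a full turn (2π), and RandomZoom takes its factor as a fraction of the image size. The arithmetic below converts the two 0.2 factors into more familiar units; the variable names are illustrative only.

```python
import math

rotation_factor = 0.2  # value passed to RandomRotation
zoom_factor = 0.2      # value passed to RandomZoom

# A factor of 0.2 allows rotations of up to 0.2 of a full turn in either direction.
max_angle_rad = rotation_factor * 2 * math.pi
max_angle_deg = rotation_factor * 360

# A factor of 0.2 allows zooming in or out by up to 20%.
zoom_percent = zoom_factor * 100

print(f"Rotation range: -{max_angle_deg:.0f} to +{max_angle_deg:.0f} degrees")
print(f"Zoom range: up to {zoom_percent:.0f}% in or out")
```

Whether a range this wide is appropriate depends on the data. Flowers photographed from arbitrary angles tolerate large rotations and vertical flips, while datasets with a fixed orientation may need smaller factors.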

4. Visualize Original vs. Augmented Images
To understand the effect of the augmentation layers, it's helpful to visualize
augmented samples. The code below takes the first image of one training batch
and draws nine random augmentations of it in a 3x3 grid.

plt.figure(figsize=(10, 10))
for images, _ in train_ds.take(1):
    for i in range(9):
        # training=True makes sure the random transformations are active
        augmented_images = data_augmentation(images, training=True)
        ax = plt.subplot(3, 3, i + 1)
        plt.imshow(augmented_images[0].numpy().astype("uint8"))
        plt.axis("off")
plt.suptitle("Augmented Images")
plt.show()
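
For a direct comparison, originals and augmented versions can also be drawn in two rows. The following is a minimal sketch under stated assumptions: it repeats the augmentation stack so that it is self-contained, uses a batch of random pixels as a stand-in for real flower images, and introduces a hypothetical helper named plot_original_vs_augmented.

```python
import tensorflow as tf
from tensorflow.keras import layers
import matplotlib.pyplot as plt

# Same augmentation stack as in section 3, repeated so the sketch is self-contained.
augment = tf.keras.Sequential([
    layers.RandomFlip("horizontal_and_vertical"),
    layers.RandomRotation(0.2),
    layers.RandomZoom(0.2),
])

def plot_original_vs_augmented(images, n=3):
    """Draw n originals (top row) above one augmented version of each (bottom row)."""
    augmented = augment(images, training=True)
    fig, axes = plt.subplots(2, n, figsize=(3 * n, 6), squeeze=False)
    for i in range(n):
        axes[0, i].imshow(images[i].numpy().astype("uint8"))
        axes[0, i].set_title("Original")
        axes[0, i].axis("off")
        axes[1, i].imshow(augmented[i].numpy().astype("uint8"))
        axes[1, i].set_title("Augmented")
        axes[1, i].axis("off")
    return fig

# Stand-in batch of random pixels in [0, 255). With the real dataset, use a batch
# taken before any rescaling, e.g. images, _ = next(iter(train_ds)).
demo_images = tf.random.uniform((4, 180, 180, 3), minval=0, maxval=255)
fig = plot_original_vs_augmented(demo_images, n=3)
```

Calling plt.show() afterwards displays the figure. Each bottom-row image is one random draw, so rerunning the function gives a different result.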

5. Apply Data Augmentation to the Dataset


Data augmentation layers can be integrated directly into the tf.data.Dataset
pipeline. This ensures that the augmentation is applied efficiently during training
without needing to generate augmented images and save them to disk.

def prepare(ds, shuffle=False, augment=False):
    # Resize and rescale images
    ds = ds.map(lambda x, y: (tf.image.resize(x, image_size) / 255.0, y),
                num_parallel_calls=tf.data.AUTOTUNE)

    if shuffle:
        # The datasets are already batched, so this shuffles whole batches
        ds = ds.shuffle(1000)

    if augment:
        ds = ds.map(lambda x, y: (data_augmentation(x, training=True), y),
                    num_parallel_calls=tf.data.AUTOTUNE)

    # Use buffered prefetching to load images from disk without blocking I/O.
    return ds.prefetch(buffer_size=tf.data.AUTOTUNE)

train_ds = prepare(train_ds, shuffle=True, augment=True)
val_ds = prepare(val_ds)
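
One quick way to sanity-check a pipeline like prepare() is to run its preprocessing step on a small synthetic dataset and inspect one batch. The sketch below does this for the resize-and-rescale step only; the random tensors and the names demo_ds and resize_and_rescale are stand-ins, not part of the workshop code.

```python
import tensorflow as tf

image_size = (180, 180)

# Synthetic stand-in for a batched image dataset: 8 images in batches of 4.
# Pixel values are drawn from [0, 255), like decoded image files.
images = tf.random.uniform((8, 200, 200, 3), minval=0, maxval=255)
labels = tf.constant([0, 1, 2, 3, 4, 0, 1, 2])
demo_ds = tf.data.Dataset.from_tensor_slices((images, labels)).batch(4)

def resize_and_rescale(x, y):
    """Resize to image_size and map pixel values from [0, 255] to [0, 1]."""
    return tf.image.resize(x, image_size) / 255.0, y

demo_ds = (
    demo_ds
    .map(resize_and_rescale, num_parallel_calls=tf.data.AUTOTUNE)
    .prefetch(tf.data.AUTOTUNE)
)

# Pull a single batch and check its shape and value range.
batch_images, batch_labels = next(iter(demo_ds))
min_value = float(tf.reduce_min(batch_images))
max_value = float(tf.reduce_max(batch_images))
print(batch_images.shape)
print(min_value, max_value)
```

If the maximum comes out far below 1, the images were probably rescaled twice, for example once in the pipeline and again by a Rescaling layer inside the model.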

6. (Optional) Demonstrate Training with Augmentation
While a full training demonstration has been conducted previously, here’s a
conceptual overview of how the augmented dataset would be used. The
augmented train_ds would be passed directly to the model.fit() method.

# Create a simple model for demonstration
num_classes = len(class_names)

model = tf.keras.Sequential([
    # prepare() has already rescaled and augmented train_ds, so the model
    # starts from the plain input. Alternatively, layers.Rescaling(1./255) and
    # data_augmentation can be placed here as layers instead; use one approach,
    # not both, to avoid rescaling and augmenting twice.
    layers.Input(shape=(image_size[0], image_size[1], 3)),
    layers.Conv2D(16, 3, padding='same', activation='relu'),
    layers.MaxPooling2D(),
    layers.Conv2D(32, 3, padding='same', activation='relu'),
    layers.MaxPooling2D(),
    layers.Conv2D(64, 3, padding='same', activation='relu'),
    layers.MaxPooling2D(),
    layers.Flatten(),
    layers.Dense(128, activation='relu'),
    layers.Dense(num_classes)
])

model.compile(optimizer='adam',
              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
              metrics=['accuracy'])

# This is where the augmented dataset would be used for training
# history = model.fit(train_ds, validation_data=val_ds, epochs=10)

print("Model compiled. The augmented 'train_ds' would be used for training via model.fit().")
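
The final Dense layer has no activation, so the model outputs raw scores (logits), and from_logits=True tells the loss to apply softmax itself. As a sketch of the underlying formula for a single example, written in plain Python rather than TensorFlow and using made-up logits:

```python
import math

def sparse_categorical_crossentropy_from_logits(logits, label):
    """Cross-entropy of one example: softmax over logits, then -log p[label]."""
    # Subtract the max logit for numerical stability; softmax is unchanged.
    largest = max(logits)
    exps = [math.exp(z - largest) for z in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    return -math.log(probs[label])

# Hypothetical raw scores for the five flower classes; the true class index is 2.
logits = [1.0, 0.5, 3.0, -1.0, 0.0]
loss = sparse_categorical_crossentropy_from_logits(logits, label=2)
print(f"loss = {loss:.4f}")

# Equal logits mean a uniform prediction, so the loss equals log(5).
uniform_loss = sparse_categorical_crossentropy_from_logits([0.0] * 5, label=0)
print(f"uniform loss = {uniform_loss:.4f}")  # uniform loss = 1.6094
```

The uniform case gives a useful reference point: a five-class model whose training loss stays near log(5) has not yet learned anything beyond guessing.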

7. Finish Task: Summarize the Importance of Data Augmentation
Data augmentation is a powerful technique for improving the robustness and
performance of deep learning models, especially in image classification tasks. Its
importance can be summarized as follows:

● Reduces Overfitting: By creating new, varied training samples from existing
ones, data augmentation helps prevent the model from memorizing the
training data. This leads to better generalization on unseen data.
● Increases Dataset Size: It effectively expands the training dataset without
storing additional images, because each epoch sees freshly transformed
versions of the originals. This is particularly beneficial when the original
dataset is small. A larger and more diverse dataset helps the model learn
more robust features.
● Improves Model Robustness: Exposing the model to different orientations,
scales, and perspectives of images makes it less sensitive to minor
variations in input data, thereby increasing its ability to correctly classify
images under varying conditions.
● Better Generalization: Models trained with augmented data tend to perform
better on real-world, diverse inputs because they have learned to recognize
patterns regardless of minor transformations.

In essence, data augmentation acts as a regularization technique, enabling deep
learning models to achieve higher accuracy and reliability in practical applications.
