
Week 6: Introduction to Artificial Intelligence for Mechatronics Engineering

Title: Train a Convolutional Neural Network (CNN) for Image Classification

Objective:
1. Understand the architecture and working principles of Convolutional Neural
Networks (CNNs).
2. Implement a CNN model using TensorFlow/Keras for image classification.
3. Train the CNN on a robotic gripper dataset to classify different grip poses.
4. Evaluate the CNN model using accuracy and loss metrics.
5. Visualize training performance and classified images using Matplotlib.

Key Skills:

• Understanding CNN architecture, convolution layers, pooling, and activation functions.
• Preprocessing image data for deep learning models.
• Implementing a CNN using TensorFlow/Keras.
• Training, evaluating, and fine-tuning CNN models.
• Visualizing model training and classification results.

Problem Statement:

A robotic arm equipped with a camera is used to classify different gripper poses based on images. Your task is to train a CNN model that classifies gripper positions into categories: Open, Closed, and Partially Open.
You will use a small dataset of gripper images, preprocess the images, train a CNN model, and evaluate its classification performance.
Instructions

Step 1: Import Required Libraries (5 minutes)

Start by importing all necessary libraries.

import os
import random  # Used later to pick a random test image

import numpy as np
import tensorflow as tf
import matplotlib.pyplot as plt
from tensorflow import keras
from tensorflow.keras import layers, models
from tensorflow.keras.preprocessing.image import ImageDataGenerator

Step 2: Load and Preprocess the Dataset (20 minutes)

• Download and Load Dataset
• For this lab, use an open-source dataset or create a folder with images of robotic grippers in different poses.

# Load dataset from TensorFlow Datasets (or use a local dataset)
import tensorflow_datasets as tfds

# Load dataset (using "rock_paper_scissors" as an example dataset);
# replace this with your own gripper dataset if you have one
dataset_name = "rock_paper_scissors"
dataset, info = tfds.load(dataset_name, as_supervised=True, with_info=True)

# Split into training and testing sets
train_dataset, test_dataset = dataset['train'], dataset['test']

• Preprocess Images

o Resize images and normalize pixel values for CNN training.

# Preprocess Images
IMG_SIZE = (64, 64)
BATCH_SIZE = 32

# Normalize pixel values to [0, 1]; use raw strings for Windows paths
train_generator = ImageDataGenerator(rescale=1.0/255.0).flow_from_directory(
    r"C:\Users\global\Desktop\Gripper\Train",  # Replace with your local path
    target_size=IMG_SIZE,
    batch_size=BATCH_SIZE,
    class_mode="categorical"
)

test_generator = ImageDataGenerator(rescale=1.0/255.0).flow_from_directory(
    r"C:\Users\global\Desktop\Gripper\Test",  # Replace with your local path; test images are assumed to be in a Test folder
    target_size=IMG_SIZE,
    batch_size=BATCH_SIZE,
    class_mode="categorical"
)

Step 3: Build a CNN Model Using TensorFlow/Keras (25 minutes)

o Define the CNN Architecture

# Define the CNN model
model = models.Sequential([
    layers.Conv2D(32, (3, 3), activation='relu', input_shape=(64, 64, 3)),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(64, (3, 3), activation='relu'),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(128, (3, 3), activation='relu'),
    layers.MaxPooling2D((2, 2)),
    layers.Flatten(),
    layers.Dense(128, activation='relu'),
    layers.Dense(3, activation='softmax')  # 3 output classes (Open, Closed, Partially Open)
])

# Compile the model
model.compile(
    optimizer='adam',
    loss='categorical_crossentropy',
    metrics=['accuracy']
)

# Display model architecture
model.summary()

o Train the model

# Train the model
history = model.fit(
    train_generator,
    epochs=10,
    validation_data=test_generator
)

Step 4: Evaluate and Visualize CNN Performance (20 minutes)

• Plot Accuracy and Loss Curves

# Plot training accuracy and loss
plt.figure(figsize=(12, 5))

# Plot Accuracy
plt.subplot(1, 2, 1)
plt.plot(history.history['accuracy'], label="Training Accuracy")
plt.plot(history.history['val_accuracy'], label="Validation Accuracy")
plt.xlabel("Epochs")
plt.ylabel("Accuracy")
plt.title("Model Accuracy")
plt.legend()

# Plot Loss
plt.subplot(1, 2, 2)
plt.plot(history.history['loss'], label="Training Loss")
plt.plot(history.history['val_loss'], label="Validation Loss")
plt.xlabel("Epochs")
plt.ylabel("Loss")
plt.title("Model Loss")
plt.legend()
plt.show()

# Test the Model with Sample Images
# Load one batch of test images and labels
test_images, test_labels = next(test_generator)

# Pick a random index within the batch
random_index = random.randint(0, len(test_images) - 1)
test_image = test_images[random_index]

# Predict class for the sample image
prediction = model.predict(np.expand_dims(test_image, axis=0))
predicted_class = np.argmax(prediction)

# Display the image with the predicted class index
plt.imshow(test_image)
plt.title(f"Predicted Class: {predicted_class}")
plt.axis('off')
plt.show()
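The title above shows a numeric class index. If you want the human-readable label instead, one option (a minimal sketch, relying on the class_indices mapping that flow_from_directory builds in Step 2) is to invert that dictionary:

# Invert the {class_name: index} mapping built by flow_from_directory
index_to_class = {v: k for k, v in test_generator.class_indices.items()}
print(f"Predicted Class: {index_to_class[predicted_class]}")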

Evaluation Questions:

Conceptual Questions:

1. What are the roles of convolution layers and pooling layers in CNNs?
Convolution layers and pooling layers are the backbone of CNNs, each serving distinct
but complementary purposes:

Convolution Layers: These layers are responsible for feature extraction. They apply a
set of learnable filters (or kernels) to the input image (or feature map) through a sliding
window operation called convolution. This process detects local patterns like edges,
textures, or shapes by computing dot products between the filter and the input. The
filters are updated during training via backpropagation, allowing the network to learn
increasingly complex features (e.g., from edges in early layers to object parts in deeper
layers). The output is a feature map that highlights where specific features appear in the
input.
Pooling Layers: Pooling layers reduce spatial dimensions (width and height) of the
feature maps while preserving important information. This downsizing, often done via
max pooling (taking the maximum value in a region) or average pooling, achieves two
key goals: (1) it reduces computational complexity by shrinking the data size, making the
network faster and less prone to overfitting, and (2) it introduces a degree of translation
invariance, meaning the network can recognize features regardless of their exact
position in the image. Together, convolution and pooling layers enable CNNs to focus on
relevant patterns while managing resource demands.
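To make the dimension changes concrete, here is a minimal sketch (using the lab's 64x64 RGB input size; the filter count is illustrative) that prints the shapes produced by one convolution and one pooling layer:

import tensorflow as tf
from tensorflow.keras import layers

x = tf.random.normal((1, 64, 64, 3))                           # One dummy 64x64 RGB image
feature_map = layers.Conv2D(32, (3, 3), activation='relu')(x)  # Feature extraction
pooled = layers.MaxPooling2D((2, 2))(feature_map)              # Spatial downsampling

print(feature_map.shape)  # (1, 62, 62, 32): 32 feature maps; valid padding trims the border
print(pooled.shape)       # (1, 31, 31, 32): 2x2 max pooling halves width and height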

2. Why do we use activation functions like ReLU and Softmax in a CNN?


Activation functions introduce non-linearity and decision-making into CNNs, and their
roles depend on where they’re used:
ReLU (Rectified Linear Unit): Applied after convolution (and sometimes pooling)
layers, ReLU (f(x) = max(0, x)) turns negative values to zero while keeping positive
values unchanged. This non-linearity allows the network to model complex patterns—
without it, stacking linear operations (like convolution) would just produce another linear
function, limiting the network’s expressive power. ReLU also helps with faster
convergence during training by mitigating the vanishing gradient problem (common in
older functions like sigmoid) and promotes sparsity, as negative outputs are zeroed out,
reducing redundant computations.
Softmax: Typically used in the output layer of a classification CNN, Softmax converts
raw scores (logits) into probabilities that sum to 1. For example, in a 10-class problem
(like digit recognition), Softmax ensures the network outputs a probability distribution
(e.g., [0.1, 0.7, 0.05, ...]) where the highest value indicates the predicted class. This is
crucial for multi-class tasks, as it provides interpretable confidence scores and aligns
with cross-entropy loss, a common training objective.
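As a quick numerical illustration of the Softmax behaviour described above, a minimal sketch:

import numpy as np

logits = np.array([2.0, 1.0, 0.1])               # Raw scores for 3 classes
probs = np.exp(logits) / np.sum(np.exp(logits))  # Softmax

print(probs)             # approx. [0.659, 0.242, 0.099]; the values sum to 1
print(np.argmax(probs))  # 0 -> the predicted class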

Practical Questions:

1. Modify the CNN to use four convolution layers instead of three. What is the
impact on accuracy?
Modification: Add a fourth convolution layer, say with 64 filters and a 3x3 kernel, followed by a pooling layer (e.g., max pooling); see the sketch after this answer. This increases the depth of the network.
Impact on Accuracy:
• Positive: A deeper network can learn more hierarchical features—early layers might detect edges, while the fourth could capture higher-level patterns (e.g., object parts or textures). This could boost accuracy, especially on complex datasets (e.g., ImageNet vs. MNIST), where additional capacity helps.
• Negative: However, adding a layer increases the risk of overfitting, particularly if the dataset is small or lacks diversity. It also raises the chance of vanishing gradients if not mitigated (e.g., with ReLU or batch normalization). Accuracy might drop if the model becomes too complex for the task or if training isn’t tuned (e.g., insufficient epochs or regularization).
• Net Effect: Accuracy likely improves with proper tuning (e.g., dropout, more data) but could degrade without it. Empirical testing on your specific dataset is key.
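For reference, a minimal sketch of the four-convolution-layer variant (the 64-filter fourth block is an assumption consistent with the answer above; layers and models come from the Step 1 imports):

model4 = models.Sequential([
    layers.Conv2D(32, (3, 3), activation='relu', input_shape=(64, 64, 3)),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(64, (3, 3), activation='relu'),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(128, (3, 3), activation='relu'),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(64, (3, 3), activation='relu'),  # Added fourth convolution layer
    layers.MaxPooling2D((2, 2)),
    layers.Flatten(),
    layers.Dense(128, activation='relu'),
    layers.Dense(3, activation='softmax')
])
model4.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])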

2. Change the batch size from 32 to 64. How does it affect training time and performance?
Modification: Increase the batch size from 32 to 64 during training (e.g., in a framework like PyTorch or TensorFlow). A sketch of the change follows this answer.
Impact on Training Time:
• Larger batch sizes (64 vs. 32) allow more samples to be processed per forward/backward pass, leveraging parallelism (e.g., GPU memory). This typically reduces training time per epoch because fewer iterations are needed (e.g., for 10,000 samples, about 313 iterations per epoch at batch size 32 vs. about 157 at 64). However, if GPU memory is maxed out, it might slow down due to overhead.
Impact on Performance:
• Gradient Stability: A batch size of 64 provides a smoother gradient estimate than 32, potentially leading to more stable convergence. However, it might miss some of the "noise" in smaller batches that helps escape local minima, possibly lowering generalization (accuracy on test data).
• Memory Trade-off: Larger batches use more memory, which could force a reduction in model size or precision (e.g., FP16), indirectly affecting performance.
• Net Effect: Training time per epoch decreases, but total epochs to converge might increase if generalization suffers. Performance (e.g., accuracy) could drop slightly unless the learning rate is adjusted (often increased with larger batches).
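In this lab the change amounts to one constant; a minimal sketch (assuming the Step 2 generators are re-created with the new value before calling fit):

import time

BATCH_SIZE = 64  # Was 32; re-run the Step 2 flow_from_directory calls with this value

start = time.time()
history_64 = model.fit(train_generator, epochs=10, validation_data=test_generator)
print(f"Batch size {BATCH_SIZE}: {time.time() - start:.1f} s total training time")

Compare the printed time and the final validation accuracy against the batch-size-32 run to observe the trade-off described above.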

Application-Based Questions:

1. How can CNNs be used in robotic vision systems for industrial applications?
CNNs are transformative in robotic vision for tasks like quality control, assembly,
and navigation:
• Object Detection: CNNs (e.g., YOLO, Faster R-CNN) identify and localize parts on an assembly line, enabling robots to pick and place items accurately (e.g., sorting defective widgets).
• Quality Inspection: Trained on images of good vs. defective products, CNNs classify anomalies (e.g., scratches on metal surfaces) in real time, improving throughput over human inspection.
• Path Planning: In autonomous navigation, CNNs process camera feeds to detect obstacles or recognize landmarks, guiding robots in dynamic environments (e.g., warehouse bots).

2. What are the challenges in deploying CNN models on embedded systems in mechatronics?
Deploying CNNs on embedded systems (e.g., microcontrollers, FPGAs in robots) is
tricky due to resource constraints:
• Limited Compute Power: Embedded devices (e.g., Raspberry Pi, Arduino) lack the GPU horsepower of servers, slowing inference. Solutions like model pruning (removing redundant weights) or quantization (reducing precision from 32-bit to 8-bit) help but may degrade accuracy; see the sketch after this list.
• Memory Constraints: CNNs with millions of parameters (e.g., ResNet-50) exceed the RAM/ROM of embedded systems. Lightweight models (e.g., MobileNet) or on-device compression are needed.
• Real-Time Requirements: Mechatronic systems (e.g., robotic arms) demand low-latency inference (milliseconds), but deep CNNs can be too slow without optimization (e.g., TensorRT, hardware accelerators).
• Power Consumption: Battery-powered devices can’t sustain energy-hungry CNNs. Efficient architectures or offloading to edge servers mitigate this.
• Environmental Robustness: Industrial settings (e.g., varying lighting, vibrations) challenge model generalization, requiring retraining or domain adaptation.
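To make the quantization point concrete, here is a minimal sketch (assuming the trained Keras model from Step 3) of post-training quantization with the TensorFlow Lite converter:

import tensorflow as tf

# Post-training quantization: shrink the trained Keras model for embedded targets
converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]  # Enables weight quantization
tflite_model = converter.convert()

# Save the compact model for deployment (e.g., on a Raspberry Pi)
with open("gripper_cnn.tflite", "wb") as f:
    f.write(tflite_model)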
