Indian Institute of Technology Kharagpur
Advanced Technology Development Centre
Assignment: Different CNN Architectures
Course: Embedded Machine Learning
Professor: Dr. Ayantika Chatterjee
1 Introduction
In this assignment, you will explore the progression of Convolutional Neural Network
(CNN) architectures from LeNet-5 to more advanced models, implemented in
TensorFlow/Keras. The task focuses on training and evaluating these architectures on
two benchmark datasets:
• MNIST: Handwritten digit recognition dataset.
• CIFAR-10: Object classification dataset with 10 categories.
The primary objective is to understand how network depth, convolutional filter size,
and architectural complexity affect model performance, training time, and resource
utilization.
2 Learning Objectives
By completing this assignment, you will be able to:
1. Understand the design principles of CNN architectures.
2. Implement LeNet-5 and progressively advanced CNN models in TensorFlow.
3. Train models on MNIST and CIFAR-10 datasets.
4. Compare accuracy, training time, and computational requirements across architectures.
5. Visualize learning curves and feature maps to interpret model behavior.
3 Datasets
3.1 MNIST
The MNIST dataset consists of 70,000 grayscale images of handwritten digits (0–9) of
size 28 × 28 pixels.
Training set size: 60,000; test set size: 10,000.
3.2 CIFAR-10
The CIFAR-10 dataset consists of 60,000 color images (32 × 32 × 3) across 10 classes such
as airplane, automobile, bird, cat, deer, dog, frog, horse, ship, and truck.
Training set size: 50,000; test set size: 10,000.
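Both datasets are available directly through tf.keras.datasets. A minimal loading
sketch is given below; scaling pixel values to [0, 1] and adding a channel axis for
MNIST are assumptions you may replace with your own preprocessing:

    import tensorflow as tf

    # MNIST: 60,000 training / 10,000 test images, 28 x 28 grayscale
    (x_train_m, y_train_m), (x_test_m, y_test_m) = tf.keras.datasets.mnist.load_data()

    # CIFAR-10: 50,000 training / 10,000 test images, 32 x 32 x 3 color
    (x_train_c, y_train_c), (x_test_c, y_test_c) = tf.keras.datasets.cifar10.load_data()

    # Scale pixels to [0, 1]; add a trailing channel axis so MNIST becomes 28 x 28 x 1
    x_train_m = x_train_m[..., None].astype("float32") / 255.0
    x_test_m = x_test_m[..., None].astype("float32") / 255.0
    x_train_c = x_train_c.astype("float32") / 255.0
    x_test_c = x_test_c.astype("float32") / 255.0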
4 CNN Architectures to be Implemented
The assignment is structured to start from the most basic CNN and advance step-by-step.
4.1 Model 1: LeNet-5 (Baseline)
Originally proposed by Yann LeCun for handwritten digit recognition; a Keras sketch follows the layer list:
• Input: 28 × 28 grayscale (MNIST) / 32 × 32 × 3 (CIFAR-10, resized if necessary)
• Conv1: 6 filters, 5 × 5, ReLU
• AvgPool: 2 × 2
• Conv2: 16 filters, 5 × 5, ReLU
• AvgPool: 2 × 2
• FC1: 120 neurons, ReLU
• FC2: 84 neurons, ReLU
• Output: Softmax (10 classes)
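One way to express this layer list as a Keras Sequential model. This is a sketch
assuming valid (no) padding and the MNIST input shape; pass input_shape=(32, 32, 3)
for CIFAR-10:

    from tensorflow.keras import layers, models

    def build_lenet5(input_shape=(28, 28, 1), num_classes=10):
        # Two 5 x 5 convolution blocks with average pooling, then two dense layers
        return models.Sequential([
            layers.Input(shape=input_shape),
            layers.Conv2D(6, kernel_size=5, activation="relu"),
            layers.AveragePooling2D(pool_size=2),
            layers.Conv2D(16, kernel_size=5, activation="relu"),
            layers.AveragePooling2D(pool_size=2),
            layers.Flatten(),
            layers.Dense(120, activation="relu"),
            layers.Dense(84, activation="relu"),
            layers.Dense(num_classes, activation="softmax"),
        ])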
4.2 Model 2: Deeper Custom CNN
Increase depth and add more filters (see the sketch after this list):
• 3–4 convolutional layers
• MaxPooling instead of AveragePooling
• Dropout layers for regularization
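A possible instantiation of this deeper design is sketched below; the filter counts,
dropout rates, and dense-layer width are assumptions to be tuned in your experiments:

    from tensorflow.keras import layers, models

    def build_deep_cnn(input_shape=(32, 32, 3), num_classes=10):
        # Four 3 x 3 convolution layers with max pooling and dropout for regularization
        return models.Sequential([
            layers.Input(shape=input_shape),
            layers.Conv2D(32, 3, padding="same", activation="relu"),
            layers.Conv2D(32, 3, padding="same", activation="relu"),
            layers.MaxPooling2D(2),
            layers.Dropout(0.25),
            layers.Conv2D(64, 3, padding="same", activation="relu"),
            layers.Conv2D(64, 3, padding="same", activation="relu"),
            layers.MaxPooling2D(2),
            layers.Dropout(0.25),
            layers.Flatten(),
            layers.Dense(128, activation="relu"),
            layers.Dropout(0.5),
            layers.Dense(num_classes, activation="softmax"),
        ])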
4.3 Model 3: VGG-like Architecture
Inspired by VGGNet (a builder sketch follows the list):
• Stacks of 3 × 3 conv layers
• MaxPooling after 2–3 conv layers
• Fully connected layers with Dropout
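One way to build such stacked 3 x 3 blocks is with a small helper loop. The block
configuration below (two, two, then three convolutions with 64/128/256 filters) is an
assumption, not the full VGG-16 layout:

    from tensorflow.keras import layers, models

    def build_vgg_like(input_shape=(32, 32, 3), num_classes=10,
                       blocks=((64, 2), (128, 2), (256, 3))):
        # Each block stacks several 3 x 3 convolutions, then halves the resolution
        model = models.Sequential([layers.Input(shape=input_shape)])
        for filters, n_convs in blocks:
            for _ in range(n_convs):
                model.add(layers.Conv2D(filters, 3, padding="same", activation="relu"))
            model.add(layers.MaxPooling2D(2))
        model.add(layers.Flatten())
        model.add(layers.Dense(256, activation="relu"))
        model.add(layers.Dropout(0.5))
        model.add(layers.Dense(num_classes, activation="softmax"))
        return model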
4.4 Model 4: ResNet-inspired CNN
Incorporate residual connections:
y = F(x, {W_i}) + x
where F represents the stacked convolution operations (the residual mapping) and W_i
are the learnable parameters.
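A minimal residual block in the Keras functional API, following the identity-shortcut
form of the equation above; the 1 x 1 projection used when shapes differ and the
omission of batch normalization are simplifying assumptions:

    from tensorflow.keras import layers

    def residual_block(x, filters, stride=1):
        # y = F(x, {W_i}) + x: F is two 3 x 3 convolutions, x is the shortcut
        shortcut = x
        y = layers.Conv2D(filters, 3, strides=stride, padding="same",
                          activation="relu")(x)
        y = layers.Conv2D(filters, 3, padding="same")(y)
        # Project the shortcut with a 1 x 1 convolution when the shapes differ
        if stride != 1 or x.shape[-1] != filters:
            shortcut = layers.Conv2D(filters, 1, strides=stride, padding="same")(x)
        y = layers.Add()([y, shortcut])
        return layers.Activation("relu")(y)

A complete model can stack several such blocks after an initial convolution and finish
with global average pooling followed by a 10-way softmax layer.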
5 Assignment Tasks
Task 1: Implement all four CNN architectures in TensorFlow/Keras.
Task 2: Train each model separately on MNIST and CIFAR-10 datasets.
Task 3: Record the following metrics for each model-dataset pair:
• Training accuracy and loss
• Validation accuracy and loss
• Total training time
Task 4: Plot accuracy and loss curves for each architecture.
Task 5: Visualize first-layer feature maps for at least two architectures (a code sketch
covering Tasks 3–5 follows this list).
Task 6: Prepare a comparative analysis table.
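For Tasks 3–5, the sketch below shows one way to record training time, plot learning
curves, and visualize first-layer feature maps. The names model, x_train, y_train, and
x_test are placeholders for your own variables; the optimizer, batch size, and epoch
count are assumptions:

    import time
    import matplotlib.pyplot as plt
    import tensorflow as tf

    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])

    # Task 3: record total training time alongside accuracy and loss
    start = time.time()
    history = model.fit(x_train, y_train, epochs=10,
                        batch_size=128, validation_split=0.1)
    print(f"Total training time: {time.time() - start:.1f} s")

    # Task 4: accuracy curves (repeat analogously for loss)
    plt.plot(history.history["accuracy"], label="train")
    plt.plot(history.history["val_accuracy"], label="validation")
    plt.xlabel("Epoch"); plt.ylabel("Accuracy"); plt.legend(); plt.show()

    # Task 5: feature maps of the first convolutional layer for one test image
    first_conv = next(l for l in model.layers
                      if isinstance(l, tf.keras.layers.Conv2D))
    feature_model = tf.keras.Model(model.input, first_conv.output)
    maps = feature_model.predict(x_test[:1])   # shape: (1, H, W, num_filters)
    for i in range(min(6, maps.shape[-1])):
        plt.subplot(1, 6, i + 1)
        plt.imshow(maps[0, :, :, i], cmap="viridis")
        plt.axis("off")
    plt.show()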
6 Analysis and Submission Requirements
1. Submit source code (.ipynb or .py) for all models.
2. Provide a report (PDF) containing:
• Implementation details and architectural diagrams
• Training/validation plots
• Comparative performance table
• Observations on architecture depth versus accuracy and training time
3. Ensure code is well-commented and reproducible.
References
1. LeCun, Y., Bottou, L., Bengio, Y., & Haffner, P. (1998). Gradient-based learning
applied to document recognition. Proceedings of the IEEE.
2. Krizhevsky, A., & Hinton, G. (2009). Learning Multiple Layers of Features from
Tiny Images.
3. He, K., Zhang, X., Ren, S., & Sun, J. (2016). Deep Residual Learning for Image
Recognition. CVPR.