Assignment 3 (Hand-written)
1. Given a 5x5 image and a 3x3 filter, with a stride of 1 and no padding, what
is the output size after applying the convolution operation?
2. What will be the output shape of a convolutional layer with an input size of
(64x64), a filter size of (7x7), stride 2, and padding 3?
3. For an input size of (128x128), filter size of (3x3), padding 1, and stride 2,
what is the output size of the convolutional layer?
4. Calculate the output size of a convolution layer given an input size of
(28x28), filter size of (5x5), stride 1, and padding 2.
5. In a CNN model, if the input image is 50x50 pixels, and a convolution is
performed with a 4x4 filter, stride of 1, and no padding, what is the output
size?
6. Given a stride of 3, a filter size of 5x5, padding of 1, and an input size of
20x20, calculate the output size after convolution.
7. For a 3x3 kernel applied to a 10x10 image, with a stride of 2 and padding of
1, calculate the output size of the convolution layer.
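Questions 1 to 7 all rely on the standard output-size relation for a convolution along one spatial dimension: O = floor((N + 2P - F) / S) + 1, where N is the input size, F the filter size, P the padding added to each side, and S the stride. The following is a minimal Python sketch. It assumes square inputs and filters and symmetric padding, and the function name `conv_output_size` is only illustrative. The example values are deliberately different from those in the questions.

```python
def conv_output_size(n: int, f: int, stride: int = 1, padding: int = 0) -> int:
    """Spatial output size of a convolution along one dimension.

    n: input size, f: filter size, padding: zeros added on EACH side.
    Floor division drops any leftover rows/columns that the filter
    cannot fully cover, which is the usual convention.
    """
    if stride < 1:
        raise ValueError("stride must be >= 1")
    effective = n + 2 * padding - f
    if effective < 0:
        raise ValueError("filter is larger than the padded input")
    return effective // stride + 1


# Example: a 32x32 input with a 3x3 filter.
print(conv_output_size(32, 3, stride=1, padding=1))  # 32, so the size is preserved
print(conv_output_size(32, 3, stride=2, padding=1))  # 16
```

For a square input the output is O x O. For a non-square input, apply the function to the height and the width separately.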
8. Explain the role of the kernel (filter) in a convolutional neural network
(CNN) and how it helps in feature extraction.
9. What are the advantages of using smaller convolutional filters (such as 3x3)
compared to larger ones (5x5, 7x7)?
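For Question 9, a common argument (popularized by VGG-style networks) is that a stack of 3x3 layers covers the same receptive field as one larger filter while using fewer weights and adding more nonlinearities. The sketch below counts the weights per layer for C input and C output channels, ignoring biases. The channel count C = 64 is an arbitrary assumption.

```python
def conv_weights(k: int, c_in: int, c_out: int) -> int:
    """Number of weights in a k x k convolution layer (biases ignored)."""
    return k * k * c_in * c_out


def stacked_receptive_field(num_layers: int, k: int = 3) -> int:
    """Receptive field of num_layers stacked k x k convs with stride 1."""
    return 1 + num_layers * (k - 1)


C = 64  # assumed channel count, equal for input and output

for layers, big_k in [(2, 5), (3, 7)]:
    rf = stacked_receptive_field(layers)
    small = layers * conv_weights(3, C, C)
    large = conv_weights(big_k, C, C)
    print(f"{layers} x 3x3: receptive field {rf}x{rf}, {small} weights; "
          f"one {big_k}x{big_k}: {large} weights")
```

With these assumptions, two 3x3 layers need 18C^2 weights where one 5x5 layer needs 25C^2, and three 3x3 layers need 27C^2 where one 7x7 layer needs 49C^2.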
10. How does the stride in a convolution operation affect the output size? What
is the impact of increasing the stride on the resolution of the feature map?
11. What is the role of padding in a convolution operation, and how does it help
maintain the size of the output feature map?
12. What is the difference between "valid" and "same" padding in convolutional
layers?
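A direct implementation makes Questions 8 and 10 to 12 concrete. The sketch below slides a kernel over a small grayscale image stored as nested lists, so no deep-learning library is needed. Like most CNN libraries, it computes cross-correlation, meaning the kernel is not flipped. The toy image and the vertical-edge kernel are illustrative choices and do not come from the assignment.

```python
from typing import List

Matrix = List[List[float]]


def conv2d(image: Matrix, kernel: Matrix, stride: int = 1, padding: int = 0) -> Matrix:
    """Single-channel 2D convolution (cross-correlation, as in CNN layers)."""
    # Zero-pad the image on all four sides.
    h, w = len(image), len(image[0])
    padded = [[0.0] * (w + 2 * padding) for _ in range(h + 2 * padding)]
    for i in range(h):
        for j in range(w):
            padded[i + padding][j + padding] = image[i][j]

    kh, kw = len(kernel), len(kernel[0])
    out_h = (h + 2 * padding - kh) // stride + 1
    out_w = (w + 2 * padding - kw) // stride + 1

    out = []
    for oi in range(out_h):
        row = []
        for oj in range(out_w):
            top, left = oi * stride, oj * stride
            # Weighted sum of the window currently under the kernel.
            acc = 0.0
            for a in range(kh):
                for b in range(kw):
                    acc += kernel[a][b] * padded[top + a][left + b]
            row.append(acc)
        out.append(row)
    return out


# 5x5 image: dark left part, bright right part (a vertical edge).
image = [[0, 0, 1, 1, 1] for _ in range(5)]
# Vertical-edge kernel: responds where intensity rises from left to right.
kernel = [[-1, 0, 1],
          [-1, 0, 1],
          [-1, 0, 1]]

valid = conv2d(image, kernel)                         # no padding: 3x3 output
same = conv2d(image, kernel, padding=1)               # padding (k-1)//2 keeps 5x5
strided = conv2d(image, kernel, stride=2, padding=1)  # stride 2: 3x3 output

print(valid[0])  # [3.0, 3.0, 0.0]: nonzero only where the window spans the edge
print(len(same), len(same[0]), len(strided), len(strided[0]))  # 5 5 3 3
```

Raising the stride from 1 to 2 with the same padding shrinks the map from 5x5 to 3x3. This is the loss of spatial resolution that Question 10 asks about.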
13. Explain why convolutional layers are preferred over fully connected layers
in image processing tasks. Illustrate your answer with an example.
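One way to answer Question 13 with an example is to compare parameter counts. The sketch below assumes a 224x224 RGB input, a fully connected layer with 1,000 units, and a convolutional layer with 64 filters of size 3x3. These sizes are illustrative assumptions, and biases are ignored. Because the convolutional layer reuses the same small kernels at every position (weight sharing), its weight count does not depend on the image size.

```python
def dense_weights(h: int, w: int, c: int, units: int) -> int:
    """Weights of a fully connected layer on a flattened h x w x c input."""
    return h * w * c * units


def conv_weights(k: int, c_in: int, num_filters: int) -> int:
    """Weights of a conv layer: one k x k x c_in kernel per filter."""
    return k * k * c_in * num_filters


H, W, C = 224, 224, 3  # assumed RGB input size

fc = dense_weights(H, W, C, units=1000)
conv = conv_weights(3, C, num_filters=64)
print(f"fully connected: {fc:,} weights")    # 150,528,000
print(f"convolutional:   {conv:,} weights")  # 1,728
```

Doubling the input resolution would quadruple the fully connected count and leave the convolutional count unchanged.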
14. What is the significance of pooling layers in CNNs, and how do
max-pooling and average-pooling differ?
15. How does the pooling layer reduce the spatial dimensions of the feature
maps in a CNN?
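For Questions 14 and 15, the sketch below applies 2x2 pooling with stride 2, a common setting, to a 4x4 feature map. Each spatial dimension is halved. Max pooling keeps the strongest activation in each window, and average pooling keeps the mean. The feature-map values are made up for illustration.

```python
from typing import Callable, List, Sequence

Matrix = List[List[float]]


def pool2d(fmap: Matrix, size: int, stride: int,
           reduce: Callable[[Sequence[float]], float]) -> Matrix:
    """Apply `reduce` (e.g. max or mean) over size x size windows."""
    out_h = (len(fmap) - size) // stride + 1
    out_w = (len(fmap[0]) - size) // stride + 1
    out = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            window = [fmap[i * stride + a][j * stride + b]
                      for a in range(size) for b in range(size)]
            row.append(reduce(window))
        out.append(row)
    return out


def mean(values: Sequence[float]) -> float:
    return sum(values) / len(values)


fmap = [[1, 3, 2, 0],
        [5, 6, 1, 2],
        [0, 2, 9, 4],
        [1, 1, 3, 8]]

print(pool2d(fmap, 2, 2, max))   # [[6, 2], [2, 9]]
print(pool2d(fmap, 2, 2, mean))  # [[3.75, 1.25], [1.0, 6.0]]
```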
16. What is the purpose of using multiple convolutional layers stacked on top of
each other in a CNN?
17. In a CNN, how does the depth (number of filters) of the convolutional layer
influence the feature representation of the input image?
18. What is the purpose of the sigmoid activation function, and where is it
typically used in neural networks?
19. What is the difference between the sigmoid and softmax functions in terms
of output range and use cases?
20. Why is sigmoid used for binary classification and softmax used for
multi-class classification?
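Questions 18 to 20 contrast two output activations. Sigmoid maps a single score into (0, 1) independently of any other score. This suits a single yes/no probability, or several independent labels. Softmax maps a vector of K scores to K positive values that sum to 1, which suits picking exactly one of K classes. The sketch below uses only the standard library. Subtracting the maximum score before exponentiating is a common numerical-stability trick and does not change the result.

```python
import math
from typing import List


def sigmoid(z: float) -> float:
    """Logistic sigmoid: squashes a real score into (0, 1)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)  # avoids overflow of exp(-z) for very negative z
    return e / (1.0 + e)


def softmax(scores: List[float]) -> List[float]:
    """Turns K scores into a probability distribution over K classes."""
    m = max(scores)  # shift for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.1]
print([round(sigmoid(z), 3) for z in logits])  # [0.881, 0.731, 0.525], no sum constraint
print([round(p, 3) for p in softmax(logits)])  # [0.659, 0.242, 0.099], sums to 1

# With two classes, softmax over [z, 0] gives the same value as sigmoid(z).
print(softmax([1.5, 0.0])[0], sigmoid(1.5))
```

The two-class identity in the last line helps explain why a single sigmoid output is enough for binary classification.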
21. What is binary cross-entropy, and why is it used in binary classification
problems?
22. What is categorical cross-entropy, and when should it be used in multi-class
classification tasks?
23. Why is cross-entropy loss generally preferred over mean squared error
(MSE) for classification problems?
24. How would you calculate binary cross-entropy for a predicted probability of
0.8 and a true label of 1?
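For Questions 21 to 24, binary cross-entropy for a true label y in {0, 1} and a predicted probability p is BCE = -[y log p + (1 - y) log(1 - p)], using the natural logarithm. Categorical cross-entropy extends this to a one-hot target over K classes: CCE = -sum_k y_k log p_k. The sketch below also compares cross-entropy with squared error on a confidently wrong prediction. As the predicted probability of the true class approaches 0, the cross-entropy penalty grows without bound, while squared error on a probability never exceeds 1. The clipping constant `EPS` is an assumption added to avoid log(0). To answer Question 24, call `binary_cross_entropy` with the values from that question.

```python
import math
from typing import List

EPS = 1e-12  # assumed clipping constant to avoid log(0)


def binary_cross_entropy(y: int, p: float) -> float:
    """BCE for one example: y in {0, 1}, p = predicted P(y = 1)."""
    p = min(max(p, EPS), 1 - EPS)
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))


def categorical_cross_entropy(y_onehot: List[int], probs: List[float]) -> float:
    """CCE for one example with a one-hot target over K classes."""
    return -sum(y * math.log(max(p, EPS)) for y, p in zip(y_onehot, probs))


def squared_error(y: int, p: float) -> float:
    return (y - p) ** 2


print(round(binary_cross_entropy(1, 0.9), 4))                           # 0.1054
print(round(categorical_cross_entropy([0, 1, 0], [0.2, 0.7, 0.1]), 4))  # 0.3567

# A confidently wrong prediction: true label 1, predicted p = 0.01.
print(round(binary_cross_entropy(1, 0.01), 3))  # 4.605
print(round(squared_error(1, 0.01), 3))         # 0.98
```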
25. What is precision in classification, and how is it calculated from the
confusion matrix?
26. What is recall, and how does it differ from precision in terms of the
confusion matrix?
27. Why is accuracy not always the best metric for evaluating models, especially
in imbalanced datasets?
28. What is the relationship between precision and recall, and how can adjusting
the classification threshold impact these metrics?
29. In a binary classification task, what would happen to precision and recall if
you lower the classification threshold?
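A few lines of code are enough to explore Questions 25 to 29. Precision = TP / (TP + FP) and recall = TP / (TP + FN), and both are read off the confusion matrix. The scores and labels below are made-up toy data. The sketch counts the confusion-matrix cells at two thresholds to show the usual trade-off: lowering the threshold labels more examples as positive, which can never decrease recall and often lowers precision. The last lines show how accuracy can look good on an imbalanced set even when no positive is ever found.

```python
from typing import List, Tuple


def confusion_counts(labels: List[int], preds: List[int]) -> Tuple[int, int, int, int]:
    """Return (TP, FP, FN, TN) for binary labels and predictions."""
    tp = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 0)
    tn = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 0)
    return tp, fp, fn, tn


def precision_recall(labels: List[int], preds: List[int]) -> Tuple[float, float]:
    tp, fp, fn, _ = confusion_counts(labels, preds)
    # 0/0 is reported as 0.0 here; this is a convention, not a rule.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Toy data (made up): model scores and true labels.
scores = [0.95, 0.85, 0.70, 0.60, 0.45, 0.40, 0.30, 0.20, 0.10, 0.05]
labels = [1, 1, 0, 1, 1, 0, 0, 1, 0, 0]

for threshold in (0.5, 0.25):
    preds = [1 if s >= threshold else 0 for s in scores]
    p, r = precision_recall(labels, preds)
    print(f"threshold {threshold}: precision {p:.3f}, recall {r:.3f}")
# threshold 0.5: precision 0.750, recall 0.600
# threshold 0.25: precision 0.571, recall 0.800

# Accuracy on an imbalanced set: always predicting "negative".
imb_labels = [0] * 95 + [1] * 5
all_neg = [0] * 100
accuracy = sum(y == p for y, p in zip(imb_labels, all_neg)) / len(imb_labels)
print(accuracy, precision_recall(imb_labels, all_neg))  # 0.95 (0.0, 0.0)
```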
30. Explain the concept of R-CNN (Region-based CNN) and how it is used in
object detection tasks.
31. What are the key differences between R-CNN and traditional CNNs used for
image classification?
32. In R-CNN, how are region proposals generated, and what role do they play
in object detection?
33. What are the limitations of R-CNN, and how does Faster R-CNN address
some of these issues?
34. What is the role of the Region Proposal Network (RPN) in Faster R-CNN,
and how does it improve the object detection process?
35. How does Faster R-CNN improve upon R-CNN in terms of speed and
accuracy?
36. Explain how the Region of Interest (RoI) pooling layer works in Faster
R-CNN and why it is important for object detection.
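For Question 36, the core of RoI pooling fits in a short function. The region is divided into a fixed grid of bins (7x7 in the Fast/Faster R-CNN papers) and each bin is max-pooled. As a result, proposals of any size produce a feature of the same size for the fully connected head. The sketch below works on a single-channel feature map, takes an RoI already expressed in feature-map coordinates (inclusive corners), and uses a 2x2 output grid. The bin-boundary rounding (floor for starts, ceil for ends) is one common choice. Projecting boxes from image coordinates onto the feature map is omitted.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def roi_max_pool(fmap: Matrix, roi: Tuple[int, int, int, int],
                 out_h: int, out_w: int) -> Matrix:
    """Max-pool roi = (x0, y0, x1, y1), inclusive, into out_h x out_w bins."""
    x0, y0, x1, y1 = roi
    bin_h = (y1 - y0 + 1) / out_h  # bins may have fractional size
    bin_w = (x1 - x0 + 1) / out_w
    out = []
    for i in range(out_h):
        # Rows of this bin: floor the start, ceil the end (at least one row).
        r0 = y0 + math.floor(i * bin_h)
        r1 = max(r0 + 1, y0 + math.ceil((i + 1) * bin_h))
        row = []
        for j in range(out_w):
            c0 = x0 + math.floor(j * bin_w)
            c1 = max(c0 + 1, x0 + math.ceil((j + 1) * bin_w))
            row.append(max(fmap[r][c] for r in range(r0, r1) for c in range(c0, c1)))
        out.append(row)
    return out


fmap = [[0, 1, 2, 3, 4, 5],
        [1, 9, 2, 0, 7, 1],
        [2, 3, 4, 8, 1, 0],
        [0, 5, 1, 2, 6, 3],
        [4, 4, 4, 4, 4, 4]]

# Two RoIs of different sizes (4x4 and 4 wide x 3 high) both give 2x2 outputs.
print(roi_max_pool(fmap, (1, 0, 4, 3), 2, 2))  # [[9, 7], [5, 8]]
print(roi_max_pool(fmap, (1, 1, 4, 3), 2, 2))  # [[9, 8], [5, 8]]
```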
37. In the context of Faster R-CNN, explain how the model generates bounding
box predictions for detected objects.
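For Question 37: in the R-CNN family, the detection head outputs class scores and, for each class, four regression offsets (dx, dy, dw, dh). These offsets refine a proposal box with center (px, py), width pw and height ph. Using the parameterization from the R-CNN papers, the refined box is x = px + pw*dx, y = py + ph*dy, w = pw*exp(dw), h = ph*exp(dh). The sketch below decodes one box. The offset values are made up, and implementation details such as target normalization and clipping boxes to the image are omitted.

```python
import math
from typing import Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def decode_box(proposal: Box, deltas: Tuple[float, float, float, float]) -> Box:
    """Apply (dx, dy, dw, dh) regression offsets to a proposal box."""
    x_min, y_min, x_max, y_max = proposal
    pw, ph = x_max - x_min, y_max - y_min
    px, py = x_min + 0.5 * pw, y_min + 0.5 * ph  # proposal center

    dx, dy, dw, dh = deltas
    cx, cy = px + pw * dx, py + ph * dy          # shifted center
    w, h = pw * math.exp(dw), ph * math.exp(dh)  # rescaled size (log-space offsets)

    return (cx - 0.5 * w, cy - 0.5 * h, cx + 0.5 * w, cy + 0.5 * h)


proposal = (50.0, 40.0, 150.0, 120.0)  # 100 x 80 box centered at (100, 80)
print(decode_box(proposal, (0.0, 0.0, 0.0, 0.0)))  # unchanged box
# Shift right by 10% of the width, up by 5% of the height, widen by 20%:
print(decode_box(proposal, (0.1, -0.05, math.log(1.2), 0.0)))  # about (50, 36, 170, 116)
```

The width and height offsets are predicted in log space, so exp() keeps the decoded sizes positive whatever value the network outputs.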
38. What is the key difference between object detection and image segmentation
tasks, and why is segmentation considered more complex?
39. How does semantic segmentation differ from instance segmentation, and
which one is typically used for detecting individual objects in an image?
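For Question 39, the difference shows up in the output format. A semantic mask stores one class label per pixel, so two objects of the same class share a label. An instance mask also gives each object its own ID. The sketch below illustrates only that difference, using a made-up 4x6 mask that contains two separate "car" regions. It derives instance IDs by flood fill (4-connected components), which works here only because the two cars do not touch. Real instance-segmentation models do not work this way; many of them predict a separate mask for each detected object.

```python
from collections import deque
from typing import List

Grid = List[List[int]]

CAR = 1  # assumed class ID for the toy example (0 = background)

# Semantic mask: one class label per pixel; both cars carry the same label.
semantic = [
    [0, 1, 1, 0, 0, 0],
    [0, 1, 1, 0, 1, 1],
    [0, 0, 0, 0, 1, 1],
    [0, 0, 0, 0, 0, 0],
]


def instances_from_mask(mask: Grid, cls: int) -> Grid:
    """Give each 4-connected region of class `cls` its own ID (1, 2, ...)."""
    h, w = len(mask), len(mask[0])
    ids = [[0] * w for _ in range(h)]
    next_id = 0
    for r in range(h):
        for c in range(w):
            if mask[r][c] != cls or ids[r][c]:
                continue
            next_id += 1
            ids[r][c] = next_id
            queue = deque([(r, c)])
            while queue:  # breadth-first flood fill
                y, x = queue.popleft()
                for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                    if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] == cls and not ids[ny][nx]:
                        ids[ny][nx] = next_id
                        queue.append((ny, nx))
    return ids


instance = instances_from_mask(semantic, CAR)
for row in instance:
    print(row)
# [0, 1, 1, 0, 0, 0]
# [0, 1, 1, 0, 2, 2]
# [0, 0, 0, 0, 2, 2]
# [0, 0, 0, 0, 0, 0]
```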
40. Why is the segmentation task critical in applications like medical image
analysis or autonomous driving?