Deep Learning

The document outlines various deep learning architectures and their applications, including LeNet for handwritten digit recognition, ZF-Net for image classification, and VGGNet for style transfer. It discusses challenges in training models with limited data, the architecture of RNNs and LSTMs for sequential data processing, and the use of GANs and DBNs in generating and classifying images. Additionally, it addresses issues like adversarial examples, gradient flow in deep networks, and the importance of transfer learning in medical imaging.


Unit III

1. Consider the MNIST dataset and assume a LeNet model is being trained on it. Explain the architecture of LeNet suitable for recognizing handwritten digits, and explain its application in sorting mail by identifying handwritten postal code numbers.
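
For reference, a minimal LeNet-5 sketch in PyTorch, assuming 32×32 grayscale inputs as in the original paper (28×28 MNIST digits are typically padded to 32×32):

```python
import torch
import torch.nn as nn

class LeNet5(nn.Module):
    """Classic LeNet-5: two conv/pool stages followed by three fully connected layers."""
    def __init__(self, num_classes=10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 6, kernel_size=5),   # 32x32x1 -> 28x28x6
            nn.Tanh(),
            nn.AvgPool2d(2),                  # -> 14x14x6
            nn.Conv2d(6, 16, kernel_size=5),  # -> 10x10x16
            nn.Tanh(),
            nn.AvgPool2d(2),                  # -> 5x5x16
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(16 * 5 * 5, 120), nn.Tanh(),
            nn.Linear(120, 84), nn.Tanh(),
            nn.Linear(84, num_classes),
        )

    def forward(self, x):
        return self.classifier(self.features(x))

model = LeNet5()
logits = model(torch.randn(1, 1, 32, 32))  # one padded MNIST-sized digit
print(logits.shape)                        # torch.Size([1, 10])
```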

2. Explain the architecture of ZF-Net. Using ZF-Net, identify and recognize the following elements in the image:
• Fox or foxes
• Grass

3. Explain the process of fine-tuning the ZF-Net architecture through visualization of intermediate layers by adjusting the filter sizes and strides.

4. Explain the architecture of VGGNet with a diagram. Explain the importance of using 3×3 convolutional filters and 2×2 pooling layers throughout the network.

5. Name the network that uses skip connections, which allow gradients to flow through the network without passing through multiple layers at once. Explain the architecture of this network.
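
For reference, a minimal PyTorch sketch of the basic residual block such a network (ResNet) is built from; the identity shortcut gives gradients a path that bypasses the convolutions:

```python
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Basic block: output = ReLU(F(x) + x), so gradients also flow through the shortcut."""
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return self.relu(out + x)  # skip connection: identity added before the final ReLU
```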

6. A CNN model correctly classifies a panda image with 95% confidence. After adding imperceptible noise (invisible to humans), the same model classifies it as a gibbon with 99% confidence.
• Explain this phenomenon in the context of fooling CNNs.
• Describe how adversarial examples are generated using the Fast Gradient Sign Method (FGSM); a sketch follows below.
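
For reference, a minimal FGSM sketch in PyTorch; it assumes `model` returns logits, `y` is the true label, and pixel values lie in [0, 1]:

```python
import torch
import torch.nn.functional as F

def fgsm_attack(model, x, y, epsilon=0.007):
    """Perturb x along the sign of the gradient of the loss w.r.t. the input."""
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    loss.backward()
    x_adv = x + epsilon * x.grad.sign()  # x_adv = x + eps * sign(grad_x J(theta, x, y))
    return x_adv.clamp(0, 1).detach()    # keep pixels in the valid range
```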

7. Explain the Deep Art (Neural Style Transfer) process:
• The role of content loss and style loss
• How Gram matrices capture style information (see the sketch below)
• Why VGGNet layers are typically used for style transfer
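
For reference, a minimal Gram-matrix computation in PyTorch (normalizing by c·h·w is one common convention; others divide by h·w only):

```python
import torch

def gram_matrix(features):
    """features: (channels, height, width) activations from one VGG layer."""
    c, h, w = features.shape
    f = features.view(c, h * w)        # flatten the spatial dimensions
    return (f @ f.t()) / (c * h * w)   # (c, c) channel co-activation statistics
```

The inner products between channel activations discard spatial layout, which is why the Gram matrix captures texture/style rather than content.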

8. A medical imaging company wants to classify chest X-rays but has limited labeled data (only 5,000 images). They have three architecture choices: AlexNet (60M parameters), VGGNet-16 (138M parameters), or ResNet-18 (11M parameters).
• Which architecture would be most suitable, and why?
• Explain how the number of parameters affects training with limited data.

9. Explain the architecture of GoogLeNet (Inception v1).

10. When training from scratch on ImageNet:
• ResNet-34 achieves 73% accuracy
• ResNet-50 achieves 76% accuracy
• ResNet-110 achieves 71% accuracy (worse than ResNet-34)
i. The gradient norms are healthy across all three networks, so vanishing gradients is NOT the problem.
ii. Explain why the 110-layer network performs worse than the 34-layer network on the training data itself.

11. VGGNet uses a uniform architecture with 3×3 convolutions and 2×2 pooling throughout. Show that two 3×3 convolutional layers have the same receptive field as one 5×5 layer but with fewer parameters. Calculate the number of parameters for both cases assuming C input and C output channels (a worked calculation follows below). Why is having two ReLU activations (one after each 3×3 layer) beneficial compared to one ReLU after a 5×5 layer?
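
For reference, two stacked 3×3 convolutions cover a 5×5 receptive field, and the parameter counts (ignoring biases) work out as:

```latex
\underbrace{2 \times (3 \cdot 3 \cdot C \cdot C)}_{\text{two } 3\times3 \text{ layers}} = 18C^2
\qquad
\underbrace{5 \cdot 5 \cdot C \cdot C}_{\text{one } 5\times5 \text{ layer}} = 25C^2
```

So the stacked design uses 28% fewer parameters while also inserting an extra non-linearity between the two layers.
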
Unit IV
1. A tech company is developing a real-time translation app that aims to provide users with immediate spoken language translation in various environments, including noisy settings like airports or conference halls. The app uses RNNs to process audio input, translate it, and then output the translation in the target language almost instantaneously. Explain the architecture of an RNN suitable for this application, and explain the challenges of working with audio data in real-time noisy environments.

2. Explain the internal architecture and functioning of an LSTM cell:
a. Describe the purpose of each gate (forget, input, output) with equations (reference equations follow below)
b. Explain how the cell state acts as a "memory highway"
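
For reference, the standard LSTM gate equations, where σ is the logistic sigmoid, ⊙ is element-wise multiplication, and [h_{t-1}, x_t] concatenates the previous hidden state with the current input:

```latex
f_t &= \sigma(W_f\,[h_{t-1}, x_t] + b_f) && \text{forget gate} \\
i_t &= \sigma(W_i\,[h_{t-1}, x_t] + b_i) && \text{input gate} \\
\tilde{c}_t &= \tanh(W_c\,[h_{t-1}, x_t] + b_c) && \text{candidate cell state} \\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t && \text{cell state (the ``memory highway'')} \\
o_t &= \sigma(W_o\,[h_{t-1}, x_t] + b_o) && \text{output gate} \\
h_t &= o_t \odot \tanh(c_t) && \text{hidden state}
```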

3. Explain the architecture and working of a basic Recurrent Neural Network (RNN). Derive the forward pass equations for computing hidden states and outputs (reference equations follow below). Why are RNNs suitable for sequential data compared to feedforward networks? Illustrate with an example of processing the sentence "The cat sat".
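
For reference, the standard vanilla RNN forward pass; for "The cat sat", the same weights are reused at t = 1, 2, 3 with each word's embedding as x_t:

```latex
h_t = \tanh(W_{xh}\,x_t + W_{hh}\,h_{t-1} + b_h), \qquad y_t = W_{hy}\,h_t + b_y
```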

4. Consider the following sequence of images depicting a badminton player in action, from the stride to the completion of the smash. The frames capture different lengths of movement. Explain how sequence learning can be applied to analyze this sequence of movements. Describe the process of performing sequence learning on the images.

5. A company is building a real-time speech recognition system for a call center. They notice that:
• Short utterances (2-3 seconds) are recognized accurately.
• Long conversations (>30 seconds) show degraded performance.
• The RNN model has 3 layers with 512 hidden units each.
a. Explain why performance degrades for long sequences from a gradient perspective.
b. Compare how a vanilla RNN, an LSTM, and a GRU would handle this scenario.

6. Consider the following sequence of images depicting a bowler in action, from the stride to the delivery of the ball. The frames capture different lengths of movement. Explain how sequence learning can be applied to analyze this sequence of movements. Describe the process of performing sequence learning on the images.

7. Consider a sentiment analysis task on movie reviews where some reviews are 50 words and others are 500 words:
a. Explain why Truncated BPTT would be beneficial here (a sketch follows below)
b. Design a truncation strategy: what values of k1 (forward length) and k2 (backward length) would you choose, and why?
c. Discuss the trade-offs between k1=k2 (online update) vs k1>k2 (delayed update).
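
For illustration, a minimal truncated-BPTT sketch in PyTorch with k1 = k2 = k; the function name and the regression-style loss are assumptions, and the key step is detaching the hidden state at each chunk boundary so gradients flow at most k steps back:

```python
import torch

def truncated_bptt(model, optimizer, loss_fn, inputs, targets, k=50):
    """inputs/targets: (seq_len, batch, dim); model is e.g. nn.RNN returning (output, hidden)."""
    hidden = None
    for start in range(0, inputs.size(0), k):
        if hidden is not None:
            hidden = hidden.detach()              # truncate: no gradient past this boundary
        output, hidden = model(inputs[start:start + k], hidden)
        loss = loss_fn(output, targets[start:start + k])
        optimizer.zero_grad()
        loss.backward()                           # BPTT over at most k steps
        optimizer.step()
```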

8. Explain the internal architecture of an LSTM cell with all three gates: forget gate, input gate, and output gate.
a. For each gate, give the equation(s), and explain how the cell state is updated using these gates.

9. Explain encoder-decoder models with an example.

10. Explain the basic structure and operation of a Gated Recurrent Unit (GRU).
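
For reference, one common form of the GRU equations (conventions differ between sources on which way the update gate interpolates):

```latex
z_t &= \sigma(W_z\,[h_{t-1}, x_t] + b_z) && \text{update gate} \\
r_t &= \sigma(W_r\,[h_{t-1}, x_t] + b_r) && \text{reset gate} \\
\tilde{h}_t &= \tanh(W_h\,[r_t \odot h_{t-1}, x_t] + b_h) && \text{candidate state} \\
h_t &= z_t \odot h_{t-1} + (1 - z_t) \odot \tilde{h}_t && \text{here } z_t \to 1 \text{ keeps the old state}
```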

11. Explain Truncated Backpropagation Through Time.

Unit V
1. Suppose a model has been developed to recognize audio recordings from
a dataset. You are now tasked with developing a model to
simultaneously detect and recognize both video and audio components
in a dataset. Explain how transfer learning can be applied in this
scenario and describe the architecture commonly used in transfer
learning models.

2. Explain the architecture and training process of Generative Adversarial Networks (GANs).
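
For illustration, a minimal sketch of one alternating GAN training step in PyTorch; the names G, D, opt_G, opt_D and a discriminator ending in a sigmoid (output shape (batch, 1)) are assumptions:

```python
import torch
import torch.nn.functional as F

def gan_step(G, D, opt_G, opt_D, real, latent_dim=100):
    """One alternating update: D learns to separate real from fake, then G learns to fool D."""
    batch = real.size(0)
    fake = G(torch.randn(batch, latent_dim))

    # Discriminator step: push D(real) toward 1 and D(fake) toward 0.
    opt_D.zero_grad()
    loss_d = (F.binary_cross_entropy(D(real), torch.ones(batch, 1)) +
              F.binary_cross_entropy(D(fake.detach()), torch.zeros(batch, 1)))
    loss_d.backward()
    opt_D.step()

    # Generator step (non-saturating loss): push D(fake) toward 1.
    opt_G.zero_grad()
    loss_g = F.binary_cross_entropy(D(fake), torch.ones(batch, 1))
    loss_g.backward()
    opt_G.step()
```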

3. Explain the architecture and training of Deep Belief Networks (DBNs).

4. A GAN is trained to generate faces. Training dynamics show: discriminator loss decreases to 0.05 (nearly zero), generator loss increases to 4.5, and the generated images show only 5-6 distinct faces repeated. This is mode collapse.
Explain the following:
a. Why does the generator produce limited variety despite the discriminator being fooled?
b. Why is an overly powerful discriminator a problem?

5. A hospital wants to develop a medical image classifier but has only 500 labeled X-ray images. They have access to:
• Inception v3 pre-trained on ImageNet (1.4M natural images)
• A medical imaging expert
a. Explain the training procedure, including learning rates, data augmentation, and fine-tuning strategy.
b. Explain how to handle the domain shift from natural images to X-rays.
c. Should they freeze layers? Which ones, and why? (A sketch follows below.)
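
For illustration, a minimal layer-freezing sketch using torchvision (assuming a recent torchvision with the weights API; the 2-class head is an illustrative choice):

```python
import torch.nn as nn
from torchvision import models

# Load ImageNet-pre-trained Inception v3 and freeze the backbone.
model = models.inception_v3(weights=models.Inception_V3_Weights.IMAGENET1K_V1)
for param in model.parameters():
    param.requires_grad = False

# Replace the classifier head (and the auxiliary head used in training mode).
model.fc = nn.Linear(model.fc.in_features, 2)
model.AuxLogits.fc = nn.Linear(model.AuxLogits.fc.in_features, 2)
# A common follow-up: unfreeze the last Inception blocks and fine-tune them
# with a learning rate ~10x smaller than the one used for the new head.
```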

6. A DBN is trained on MNIST digits with the architecture 784 (input) → 500 → 300 → 10 (classes). Without pre-training (random initialization), accuracy is 87%. With RBM pre-training, accuracy is 96%.
Explain the following:
a. Unsupervised pre-training helps supervised learning. Justify this statement.
b. What do the intermediate layers learn during pre-training that random initialization doesn't capture?
c. During fine-tuning, should you use the same learning rate for pre-trained and randomly initialized layers? Justify.

7. Explain the architecture of autoencoder models and their applications.
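
For reference, a minimal fully connected autoencoder sketch in PyTorch (the 784-dimensional input and 32-dimensional code are illustrative choices):

```python
import torch.nn as nn

class AutoEncoder(nn.Module):
    """Encoder compresses the input to a small code; decoder reconstructs the input."""
    def __init__(self, input_dim=784, code_dim=32):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(input_dim, 128), nn.ReLU(),
            nn.Linear(128, code_dim),              # bottleneck
        )
        self.decoder = nn.Sequential(
            nn.Linear(code_dim, 128), nn.ReLU(),
            nn.Linear(128, input_dim), nn.Sigmoid(),
        )

    def forward(self, x):
        return self.decoder(self.encoder(x))

# Trained with a reconstruction loss, e.g. nn.MSELoss()(model(x), x).
```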

8. Restricted Boltzmann Machines can be used to pre-train deep neural networks. Is this statement true? Justify your answer.

9. Explain the functionality of Generative Adversarial Networks and their applications.
