Algorithmic Advances

1) What are the algorithmic advances that made modern deep learning successful?

Algorithms

Algorithms define how neural networks learn from data through mathematical computations
and optimization. They ensure efficient learning, improve convergence speed and model
accuracy, and provide the flexibility to solve a wide range of deep learning tasks.

Key components involved in deep learning algorithms

a) Types of Neural Networks

1. Feedforward Neural Network (FNN) - The simplest type of artificial neural network,
where connections between nodes do not form a cycle. Often used for basic pattern
recognition.

2. Convolutional Neural Network (CNN) - CNNs are a class of deep neural networks most
commonly used for image analysis. They use convolutional layers to extract spatial
features from input images.

3. Recurrent Neural Network (RNN) - RNNs use sequential information to build a model.

Ideal for tasks involving time-series data, speech recognition, and natural language processing
as they can memorize past inputs using hidden states.

4. Generative Adversarial Network (GAN) - GANs consist of two neural networks: a
generator and a discriminator. They generate synthetic instances of data that resemble
real data (e.g., generating realistic images).

5. Deep Belief Network (DBN) - DBNs are generative graphical models composed of multiple
layers of hidden units (latent variables). Each layer is interconnected, though the individual
units within a layer are not connected to each other. Commonly used for feature extraction and
unsupervised learning tasks.

b) Activation Functions:

• ReLU and its variants largely solved the vanishing gradient problem, making deep networks trainable.
• Functions like ReLU, Sigmoid, and Tanh introduce non-linearity into the model.
• Non-linearity is crucial for enabling the network to learn complex patterns (a small sketch follows below).
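A minimal NumPy sketch of these three activation functions; the example input values are illustrative and not part of the notes:

import numpy as np

def relu(x):
    # ReLU: keeps positive values, zeroes out negatives; does not saturate for x > 0
    return np.maximum(0.0, x)

def sigmoid(x):
    # Sigmoid: squashes inputs into (0, 1); gradients vanish for large |x|
    return 1.0 / (1.0 + np.exp(-x))

def tanh(x):
    # Tanh: squashes inputs into (-1, 1), zero-centred
    return np.tanh(x)

x = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
print(relu(x), sigmoid(x), tanh(x))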

c) Advanced Optimization Algorithms:

Optimizers such as Adam, RMSProp, AdaGrad, and Momentum play a crucial role in training
deep networks by using adaptive learning rates and momentum techniques. They improve
convergence speed and training stability and reduce sensitivity to hyperparameters, making
deep learning more efficient and reliable.
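As one concrete example, here is a minimal sketch of a single Adam update step; the hyperparameter values are the commonly cited defaults and the toy problem is an illustrative assumption, not taken from the notes:

import numpy as np

def adam_step(theta, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    # One Adam update: momentum-like first moment plus adaptive per-parameter scaling
    m = beta1 * m + (1 - beta1) * grad          # first moment (mean of gradients)
    v = beta2 * v + (1 - beta2) * grad ** 2     # second moment (uncentred variance)
    m_hat = m / (1 - beta1 ** t)                # bias correction
    v_hat = v / (1 - beta2 ** t)
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v

# Toy usage: minimise f(theta) = theta^2, whose gradient is 2 * theta
theta = np.array([5.0])
m = v = np.zeros_like(theta)
for t in range(1, 201):
    grad = 2 * theta
    theta, m, v = adam_step(theta, grad, m, v, t, lr=0.1)
print(theta)  # approaches 0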
d) Loss Functions:

• Functions that measure the difference between the predicted output and the actual
target.
• Examples include Mean Squared Error (MSE) for regression and Cross-Entropy Loss for
classification tasks (see the sketch below).
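A small NumPy sketch of both losses; the example arrays and the clipping constant are illustrative assumptions:

import numpy as np

def mse(y_true, y_pred):
    # Mean Squared Error: average squared difference, used for regression
    return np.mean((y_true - y_pred) ** 2)

def cross_entropy(y_true, y_prob, eps=1e-12):
    # Cross-Entropy for one-hot targets and predicted class probabilities
    y_prob = np.clip(y_prob, eps, 1.0)          # avoid log(0)
    return -np.mean(np.sum(y_true * np.log(y_prob), axis=1))

print(mse(np.array([1.0, 2.0]), np.array([1.5, 1.0])))                    # 0.625
print(cross_entropy(np.array([[0, 1, 0]]), np.array([[0.2, 0.7, 0.1]])))  # -log(0.7)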

e) Regularization Techniques:

• Reduce overfitting and improve generalization.

• Allow training of complex models without requiring excessively large datasets.

• Methods such as Dropout, L1/L2 regularization, Batch Normalization, and Early Stopping
are used to prevent overfitting.

• These building blocks form the foundation of deep learning.

f) Backpropagation Algorithm

Backpropagation efficiently computes gradients using the chain rule, enabling deep neural
networks to learn from data and making end-to-end training of deep models feasible.

1) Forward Propagation:

• The process of passing inputs through the network to obtain the output.

• Involves computing the weighted sum of inputs and applying the activation function at
each neuron.

2) Backward Propagation:

• The process of updating the weights and biases based on the error.
• Involves computing the gradient of the loss function with respect to each weight and
bias and adjusting them to minimize the loss (see the sketch after this list).
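A minimal worked sketch of forward and backward propagation for a tiny one-hidden-layer network trained with MSE; the layer sizes, data, and learning rate are illustrative assumptions:

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(8, 3))                      # 8 samples, 3 features
y = rng.normal(size=(8, 1))                      # regression targets

W1, b1 = rng.normal(scale=0.5, size=(3, 4)), np.zeros(4)
W2, b2 = rng.normal(scale=0.5, size=(4, 1)), np.zeros(1)
lr = 0.1

for step in range(200):
    # Forward propagation: weighted sums plus activations
    z1 = X @ W1 + b1
    h = np.maximum(0.0, z1)                      # ReLU hidden layer
    y_hat = h @ W2 + b2                          # linear output
    loss = np.mean((y_hat - y) ** 2)

    # Backward propagation: chain rule from the output layer back to the input layer
    d_yhat = 2 * (y_hat - y) / len(X)            # dLoss/dy_hat
    dW2 = h.T @ d_yhat
    db2 = d_yhat.sum(axis=0)
    dh = d_yhat @ W2.T
    dz1 = dh * (z1 > 0)                          # ReLU derivative
    dW1 = X.T @ dz1
    db1 = dz1.sum(axis=0)

    # Gradient descent update: adjust weights and biases to reduce the loss
    W1 -= lr * dW1; b1 -= lr * db1
    W2 -= lr * dW2; b2 -= lr * db2

print(round(loss, 4))  # the loss shrinks over training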

g) Weight Initialization Strategies

Weight initialization plays a critical role in the successful training of deep neural networks. If
weights are not initialized properly, it can lead to problems like vanishing gradients (where
gradients shrink and slow learning) or exploding gradients (where gradients grow
uncontrollably), especially in deep architectures.
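The notes do not name specific schemes, so as a hedged illustration here is a sketch of two widely used strategies (Xavier/Glorot and He initialization, named here as an assumption), which scale the random initial weights by the layer's fan-in/fan-out so that activations and gradients stay in a reasonable range:

import numpy as np

rng = np.random.default_rng(42)

def xavier_init(fan_in, fan_out):
    # Xavier/Glorot: variance 2 / (fan_in + fan_out), suited to tanh/sigmoid layers
    std = np.sqrt(2.0 / (fan_in + fan_out))
    return rng.normal(0.0, std, size=(fan_in, fan_out))

def he_init(fan_in, fan_out):
    # He: variance 2 / fan_in, suited to ReLU layers
    std = np.sqrt(2.0 / fan_in)
    return rng.normal(0.0, std, size=(fan_in, fan_out))

W = he_init(256, 128)
print(W.std())  # close to sqrt(2/256) ~ 0.088, so signals neither vanish nor explode early on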
h) Batch Normalization

Batch Normalization is a widely used technique in deep learning that normalizes the inputs to
each layer during training, helping the network train faster, more efficiently, and with greater
stability. Its main purpose is to address internal covariate shift: the change in the input
distribution to a layer as the parameters of earlier layers change. By stabilizing these inputs,
batch normalization accelerates convergence, allows the use of higher learning rates, and
improves generalization by reducing overfitting to some extent.
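A minimal sketch of the batch-norm computation for one training mini-batch; the epsilon value and the learnable gamma/beta parameters follow the standard formulation, and the input shape is an illustrative assumption:

import numpy as np

def batch_norm_train(x, gamma, beta, eps=1e-5):
    # Normalize each feature over the mini-batch, then rescale with learnable gamma/beta
    mean = x.mean(axis=0)                 # per-feature batch mean
    var = x.var(axis=0)                   # per-feature batch variance
    x_hat = (x - mean) / np.sqrt(var + eps)
    return gamma * x_hat + beta

x = np.random.default_rng(1).normal(loc=5.0, scale=3.0, size=(32, 4))  # mini-batch of 32
out = batch_norm_train(x, gamma=np.ones(4), beta=np.zeros(4))
print(out.mean(axis=0).round(3), out.std(axis=0).round(3))  # roughly 0 mean, 1 std per feature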

Advantages of Batch Normalization:

1. Faster Training

2. Improved Gradient Flow

3. Allows Higher Learning Rates

4. Acts as a Regularizer (Reduces Overfitting)

i) Residual Connections (ResNets)

Residual Connections, introduced in ResNets, solve the degradation problem in deep networks
by adding shortcut (identity) connections that allow the input to skip one or more layers. Instead
of learning a full transformation, the network learns the residual (difference), making training
easier. This helps gradients flow more effectively during backpropagation.
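A minimal sketch of a residual block: the layer output is added to its own input, so the layers only need to learn the residual F(x), and the identity path gives gradients a direct route backwards. The layer sizes and random weights are illustrative assumptions:

import numpy as np

def residual_block(x, W1, W2):
    # y = x + F(x), where F is two small layers (ReLU then linear)
    f = np.maximum(0.0, x @ W1)   # first transformation + ReLU
    f = f @ W2                    # second transformation (the learned residual)
    return x + f                  # identity shortcut: the input skips the layers

rng = np.random.default_rng(0)
x = rng.normal(size=(2, 16))
W1 = rng.normal(scale=0.1, size=(16, 16))
W2 = rng.normal(scale=0.1, size=(16, 16))
print(residual_block(x, W1, W2).shape)  # (2, 16): same shape, so blocks can be stacked deeply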

j) Dropout Regularization

Dropout helps to prevent overfitting. During training, it randomly drops a fraction of neurons
in a layer along with their connections, so each iteration trains a different subset of the network.
This process mimics training an ensemble of smaller networks, which improves generalization.
By preventing neurons from co-adapting too much, dropout forces the network to learn
redundant, robust features, making it more resilient to noise and better at handling unseen data.
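A minimal sketch of (inverted) dropout at training time: a random mask zeroes a fraction of the activations, and the survivors are rescaled so the expected activation matches test time. The drop rate and shapes are illustrative assumptions:

import numpy as np

def dropout(h, drop_rate=0.5, training=True, seed=0):
    if not training or drop_rate == 0.0:
        return h                                   # at test time the full network is used
    rng = np.random.default_rng(seed)
    keep = 1.0 - drop_rate
    mask = rng.random(h.shape) < keep              # keep each neuron with probability `keep`
    return h * mask / keep                         # inverted dropout: rescale the survivors

h = np.ones((4, 6))                                # toy layer activations
print(dropout(h, drop_rate=0.5))                   # roughly half the entries are zeroed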

k) Attention Mechanism and Transformers


The attention mechanism addresses the limitations of sequential processing in traditional
models like RNNs. RNNs process data one step at a time, which makes it difficult to learn long-
range dependencies and often leads to memory bottlenecks, where earlier input information is
forgotten or weakened. In contrast, attention allows the model to analyze all input positions
simultaneously, assigning attention weights based on the relevance of each input
element to the current output. This enables the model to focus on the most important words,
pixels, or features, greatly improving context understanding and performance.
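A minimal sketch of scaled dot-product attention, the core operation of the Transformer: every query attends to all key positions at once, and the softmax weights determine how much each value contributes. The sequence length and dimensions are illustrative assumptions:

import numpy as np

def scaled_dot_product_attention(Q, K, V):
    # Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                           # relevance of every key to every query
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)   # softmax over key positions
    return weights @ V, weights

rng = np.random.default_rng(0)
Q = rng.normal(size=(5, 8))   # 5 query positions, dimension 8
K = rng.normal(size=(5, 8))   # 5 key positions
V = rng.normal(size=(5, 8))   # values to be mixed according to the weights
out, attn = scaled_dot_product_attention(Q, K, V)
print(out.shape, attn.sum(axis=-1).round(3))  # (5, 8); each row of weights sums to 1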
