Unit 1 Neural Networks

The syllabus outlines a course on Deep Learning and Predictive Modelling, covering fundamental techniques of Neural Networks, Convolutional Neural Networks, Recurrent Neural Networks, Deep Reinforcement Learning, and applications in various domains. It includes detailed units on machine learning mechanics, neural architectures, and practical implementations using TensorFlow. The course aims to equip students with the necessary skills to apply deep learning concepts in real-world scenarios, supported by recommended textbooks and references.

SYLLABUS

DEEP LEARNING AND PREDICTIVE MODELLING

COURSE OBJECTIVES:
• To introduce the fundamental techniques and principles of Neural Networks.
• To study the different models in ANN and their applications.
• To familiarize deep learning concepts with Computer Vision case studies.

UNIT I: NEURAL NETWORK 12

Mechanics of Machine Learning - Neuron - Linear Perceptron - Feed-Forward Neural Networks - Sigmoid,
Tanh, and ReLU Neurons - Training Feed-Forward Neural Networks - Fast-Food Problem - Gradient Descent -
Delta Rule and Learning Rates.

UNIT II: CONVOLUTIONAL NEURAL NETWORKS 12

TensorFlow: Creating and Manipulating TensorFlow Variables - TensorFlow Operations - Neurons in Human
Vision - Convolutional Layer - Building a Convolutional Network - Visualizing Learning in Convolutional
Networks - Learning Lower Dimensional Representations - Principal Component Analysis - Autoencoder
Architecture - Implementing an Autoencoder in TensorFlow.

UNIT III: RECURRENT NEURAL NETWORKS 12

Recurrent Neural Networks - Challenges with Vanishing Gradients - Long Short-Term Memory (LSTM) Units -
TensorFlow Primitives for RNN Models - Implementing a Sentiment Analysis Model - Solving seq2seq Tasks
with Recurrent Neural Networks - Memory-Augmented Neural Networks: Neural Turing Machines,
Attention-Based Memory Access, Differentiable Neural Computers (DNC) - Memory Reuse - Temporal Linking -
DNC Controller Network - Visualizing - Implementing the DNC in TensorFlow.

UNIT IV: DEEP REINFORCEMENT LEARNING 12

Deep Reinforcement Learning - Masters Atari Games - Markov Decision Processes - Policy Versus Value
Learning, Pole-Cart with Policy Gradients - Q-Learning and Deep Recurrent Q-Networks.

UNIT V: APPLICATIONS OF DEEP LEARNING 12

Applications in Object Recognition and Computer Vision - Unsupervised or generative feature learning -
Supervised feature learning and classification - Applications in Multimodal and Multi-task Learning -
Multimodalities: Text and image - Speech and image - Multi-task learning within the speech, NLP or
image domain

TOTAL HOURS: 60

TEXT BOOKS

1. Nikhil Buduma, Nicholas Locascio, “Fundamentals of Deep Learning: Designing Next-Generation
Machine Intelligence Algorithms”, O'Reilly Media, 2017.

2. Li Deng and Dong Yu, “Deep Learning: Methods and Applications”, Foundations and Trends in Signal
Processing, 2013. http://link.springer.com/openurl?genre=book&isbn=978-3-319-73004-2

REFERENCES
1. Ian Goodfellow, Yoshua Bengio, Aaron Courville, “Deep Learning” (Adaptive Computation and Machine
Learning series), MIT Press, 2017.
2. Sandro Skansi, “Introduction to Deep Learning: From Logical Calculus to Artificial Intelligence”, Springer,
2018.
3. Michael Nielsen, “Neural Networks and Deep Learning”, Determination Press, 2015.
Weblinks
https://www.oreilly.com/ai/free/files/fundamentals-of-deep-learning-sampler.pdf
NOTES
UNIT-I
UNIT I: NEURAL NETWORK 12
Mechanics of Machine Learning - Neuron - Linear Perceptron - Feed-Forward Neural
Networks - Sigmoid, Tanh, and ReLU Neurons - Training Feed-Forward Neural Networks -
Fast-Food Problem - Gradient Descent - Delta Rule and Learning Rates.

I. MECHANICS OF MACHINE LEARNING

Machine learning (ML) is a subset of artificial intelligence (AI) that focuses on the
development of algorithms and statistical models that enable computers to perform tasks
without explicit instructions. Instead, these systems learn from and make decisions based on
data. Here are the fundamental mechanics of machine learning:

1. Data Collection and Preparation

• Data Collection: Gathering raw data from various sources, which can include
databases, web scraping, IoT devices, and more.
• Data Cleaning: Removing or correcting errors, handling missing values, and eliminating
duplicates to ensure the data is accurate and usable.
• Data Transformation: Converting data into a suitable format or structure for analysis,
such as normalizing numerical data or encoding categorical variables.
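
The two transformations named above, normalizing numerical data and encoding categorical variables, can be sketched in plain Python. This is only an illustration: the function names and the sample values (`ages`, `colors`) are hypothetical, and min-max scaling is just one of several normalization schemes.

```python
def min_max_normalize(values):
    """Rescale a list of numbers to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values identical: nothing to scale.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def one_hot_encode(labels):
    """Map each categorical label to a 0/1 indicator vector."""
    categories = sorted(set(labels))
    return [[1 if label == c else 0 for c in categories] for label in labels]


ages = [20, 30, 40]               # hypothetical numerical feature
colors = ["red", "blue", "red"]   # hypothetical categorical feature

print(min_max_normalize(ages))    # [0.0, 0.5, 1.0]
print(one_hot_encode(colors))     # [[0, 1], [1, 0], [0, 1]]  (columns: blue, red)
```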

2. Feature Engineering
• Feature Selection: Identifying the most relevant variables (features) that will contribute
to the model's predictions.
• Feature Creation: Creating new features from existing data to provide more informative
inputs to the model.

3. Model Selection
• Choosing an appropriate machine learning algorithm based on the problem type (e.g.,
regression, classification, clustering).
• Common algorithms include linear regression, decision trees, support vector machines,
and neural networks.

4. Training the Model

• Training Data: Using a subset of the data to train the model. This involves feeding the
data into the algorithm to learn patterns and relationships.
• Optimization: Adjusting model parameters (e.g., weights in neural networks) to
minimize the error (difference between predicted and actual values).
• Loss Function: A function that measures how well the model's predictions match the
actual outcomes. The goal is to minimize this loss.

5. Evaluation
• Validation Data: Using a separate subset of data to evaluate the model's performance
and detect overfitting.
• Metrics: Common metrics include accuracy, precision, recall, F1-score, mean squared
error (MSE), and area under the ROC curve (AUC-ROC).
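
As a rough illustration of how several of these metrics are computed, the sketch below uses their standard definitions for binary labels (1 = positive, 0 = negative). The function names and the sample labels are assumptions made for the example.

```python
def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1 for binary labels (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


def mean_squared_error(y_true, y_pred):
    """Average squared difference between targets and predictions."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical labels: 2 true positives, 1 false positive, 1 false negative.
metrics = classification_metrics([1, 0, 1, 1, 0, 0], [1, 0, 0, 1, 1, 0])
print(metrics)
print(mean_squared_error([1.0, 2.0], [1.5, 2.5]))  # 0.25
```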

6. Hyperparameter Tuning
• Hyperparameters: Parameters that are not learned during training but are set before
the learning process begins (e.g., learning rate, number of layers in a neural network).
• Tuning: Adjusting these hyperparameters to find the optimal model configuration, often
using techniques like grid search or random search.

7. Deployment
• Model Integration: Incorporating the trained model into a production environment where
it can make predictions on new data.
• Monitoring: Continuously tracking the model’s performance to ensure it remains
accurate over time, and updating the model as needed.

8. Iterative Improvement
• Machine learning is an iterative process. As more data becomes available or as the
problem changes, the model is retrained and refined to improve its performance.

Types of Machine Learning


1. Supervised Learning: The model is trained on labeled data, which means the input
comes with corresponding output labels. Examples include regression and classification
tasks.
2. Unsupervised Learning: The model is trained on unlabeled data and must find patterns
and relationships within the data. Examples include clustering and dimensionality
reduction.
3. Semi-Supervised Learning: A combination of supervised and unsupervised learning
where the model is trained on a small amount of labeled data and a larger amount of
unlabeled data.
4. Reinforcement Learning: The model learns by interacting with an environment and
receiving feedback in the form of rewards or penalties.

Key Concepts
• Overfitting: When a model performs well on training data but poorly on unseen data due
to being too complex and capturing noise in the training data.
• Underfitting: When a model is too simple to capture the underlying patterns in the data,
leading to poor performance on both training and test data.
• Cross-Validation: A technique to evaluate the model’s performance by splitting the data
into multiple subsets and training/testing the model on different combinations of these
subsets.
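
The splitting step of cross-validation can be sketched as follows. This assumes the common k-fold variant without shuffling; the helper name `k_fold_splits` is hypothetical, and the model training that would use each split is left out.

```python
def k_fold_splits(n_samples, k):
    """Yield (train_indices, test_indices) pairs for k-fold cross-validation."""
    indices = list(range(n_samples))
    fold_size, remainder = divmod(n_samples, k)
    start = 0
    for fold in range(k):
        # The first `remainder` folds get one extra sample.
        size = fold_size + (1 if fold < remainder else 0)
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size


for train, test in k_fold_splits(6, 3):
    print("train:", train, "test:", test)
# train: [2, 3, 4, 5] test: [0, 1]
# train: [0, 1, 4, 5] test: [2, 3]
# train: [0, 1, 2, 3] test: [4, 5]
```

Each sample appears in exactly one test fold, so every data point is used for evaluation once and for training k - 1 times.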

Understanding these mechanics provides a foundational knowledge for building and deploying
effective machine learning models.
II. Neuron and Linear Perceptron
Neuron:-
A human brain has billions of neurons. Neurons are interconnected nerve cells in the human
brain that are involved in processing and transmitting chemical and electrical signals. Dendrites
are branches that receive information from other neurons.

Cell nucleus or Soma processes the information received from dendrites. Axon is a
cable that is used by neurons to send information. Synapse is the connection between
an axon and other neuron dendrites.
What is an Artificial Neuron?
An artificial neuron is a mathematical function based on a model of biological neurons,
where each neuron takes inputs, weighs them separately, sums them up and passes
this sum through a nonlinear function to produce output.

In the next section, let us compare the biological neuron with the artificial neuron.
Biological Neuron vs. Artificial Neuron
The biological neuron is analogous to artificial neurons in the following terms:

Biological Neuron        Artificial Neuron
Cell Nucleus (Soma)      Node
Dendrites                Input
Synapse                  Weights or interconnections
Axon                     Output

Artificial Neuron at a Glance:-


The artificial neuron has the following characteristics:

• A neuron is a mathematical function modeled on the working of biological neurons
• It is an elementary unit in an artificial neural network
• One or more inputs are separately weighted
• Inputs are summed and passed through a nonlinear function to produce
output
• Every neuron holds an internal state called activation signal
• Each connection link carries information about the input signal
• Every neuron is connected to another neuron via connection link
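
The characteristics above (separately weighted inputs, a sum, and a nonlinear function) can be sketched in a few lines. The function name `neuron`, the optional bias term, and the choice of tanh as the nonlinearity are assumptions made for illustration; the notes do not prescribe a particular function here.

```python
import math


def neuron(inputs, weights, bias=0.0, activation=math.tanh):
    """Weigh each input, sum the results, and apply a nonlinear function."""
    weighted_sum = sum(w * x for w, x in zip(weights, inputs)) + bias
    return activation(weighted_sum)


# Two inputs with hypothetical weights 0.5 and -0.5.
print(neuron([1.0, 0.0], [0.5, -0.5]))  # tanh(0.5), roughly 0.46
```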

LINEAR PERCEPTRON
In the context of deep learning, the term "linear perceptron" typically refers to the simplest form
of neuron: a unit that computes a weighted sum of its inputs plus a bias, without a non-linear
activation function. Its role in deep learning and its limitations are outlined below.

Role in Deep Learning


In deep learning, perceptrons serve as the basic computational units arranged in layers to form
neural networks. Here’s how they fit into the broader context:

• Neural Networks: A neural network consists of multiple layers of interconnected
neurons (perceptrons). In a fully connected network, each neuron in one layer is connected to
every neuron in the subsequent layer. The first layer (input layer) receives the raw input data,
while the final layer (output layer) produces the network's predictions or outputs.
• Training: During the training process, the network learns the optimal values of the weights
w and the bias b through methods like gradient descent and backpropagation. This
enables the network to make accurate predictions or classifications based on the input
data.
• Non-linearity: While individual perceptrons (especially linear perceptrons) are linear in
nature, the stacking of multiple layers and the use of non-linear activation functions (like
ReLU, sigmoid, tanh) in modern neural networks introduce non-linearity, enabling them
to model complex relationships in data.

Limitations
• Linear Separability: Linear perceptrons can only learn linearly separable patterns.
Complex patterns that are not linearly separable require more sophisticated
architectures (e.g., multi-layer perceptrons with non-linear activation functions).
• Depth: Deep learning refers to neural networks with many layers. Deep networks can learn
hierarchical representations of data, extracting higher-level features from lower-level ones.

In summary, while the term "linear perceptron" specifically denotes a simple form of a neuron
without non-linear activation, its application in deep learning is foundational, forming the basis
upon which more complex neural network architectures are built to handle intricate tasks such
as image recognition, natural language processing, and more.

Perceptron
The perceptron was introduced by Frank Rosenblatt in 1957. He proposed a perceptron
learning rule based on the original McCulloch-Pitts (MCP) neuron. A perceptron is an algorithm
for supervised learning of binary classifiers. The algorithm enables a neuron to learn by
processing the elements of the training set one at a time.
Basic Components of Perceptron

A perceptron is a type of artificial neural network, which is a fundamental concept in
machine learning. The basic components of a perceptron are:

1. Input Layer: The input layer consists of one or more input neurons, which
receive input signals from the external world or from other layers of the neural
network.
2. Weights: Each input neuron is associated with a weight, which represents the
strength of the connection between the input neuron and the output neuron.
3. Bias: A bias term is added to the weighted sum of the inputs to provide the
perceptron with additional flexibility in modeling patterns in the input data.
4. Activation Function: The activation function determines the output of the
perceptron based on the weighted sum of the inputs and the bias term.
Common activation functions used in perceptrons include the step function,
sigmoid function, and ReLU function.
5. Output: The output of the perceptron is a single binary value, either 0 or 1,
which indicates the class or category to which the input data belongs.
6. Training Algorithm: The perceptron is typically trained using a supervised
learning algorithm such as the perceptron learning algorithm or
backpropagation. During training, the weights and biases of the perceptron are
adjusted to minimize the error between the predicted output and the true
output for a given set of training examples.

Overall, the perceptron is a simple yet powerful algorithm that can be used to
perform binary classification tasks and has paved the way for more complex
neural networks used in deep learning today.
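
The components listed above can be put together in a short sketch of the classic perceptron learning algorithm, here trained on the logical AND function. The function names, the learning rate of 0.5, the zero initial weights and the AND data set are assumptions chosen for the example.

```python
def predict(weights, bias, x):
    """Step activation applied to the weighted sum of the inputs plus the bias."""
    total = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 if total > 0 else 0


def train_perceptron(samples, learning_rate=0.5, epochs=20):
    """Perceptron learning rule: shift weights by learning_rate * error * input."""
    n_inputs = len(samples[0][0])
    weights, bias = [0.0] * n_inputs, 0.0
    for _ in range(epochs):
        for x, target in samples:          # one training example at a time
            error = target - predict(weights, bias, x)
            weights = [w + learning_rate * error * xi
                       for w, xi in zip(weights, x)]
            bias += learning_rate * error
    return weights, bias


# Logical AND is linearly separable, so the rule converges.
AND = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
weights, bias = train_perceptron(AND)
print([predict(weights, bias, x) for x, _ in AND])  # [0, 0, 0, 1]
```

The weights change only when a prediction is wrong, which is why training stops changing anything once every example is classified correctly.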
Types of Perceptron:

1. Single layer: A single-layer perceptron can learn only linearly separable
patterns.
2. Multilayer: A multilayer perceptron has two or more layers and therefore
greater processing power.

PERCEPTRON
• Perceptron is a commonly used term in the arena of
Machine Learning and Artificial Intelligence. Being the
most basic component of Machine Learning and Deep
Learning technologies, the perceptron is the elementary
unit of an Artificial Neural Network.

What is Perceptron?
• A perceptron is the smallest element of a neural network.
It is a single-layer linear neural network, that is, a
machine learning algorithm used for supervised learning
of binary classifiers.
• It works as an artificial neuron: it performs computations
on the input data, learning from examples in order to detect
features or patterns (for example, business intelligence)
in that data.
• A perceptron network is a group of simple logical
statements that come together to create an array of
complex logical statements, known as the neural network.
HUMAN NEURAL SYSTEM:

• The human brain is a complex network of billions of
interconnected cells known as neurons. These cells
process and transmit signals. Biological neurons respond
to both chemical and electrical signals to create the
Biological Neural Network (BNN). The input and output
signals can be either excitatory or inhibitory, meaning
that they can either increase or decrease the potential of
the neuron to fire.
• The structure of a biological neuron consists of synapses,
dendrites, the soma (cell body), and an axon. All these
components participate in the neural processing
performed by neurons. A synapse connects an axon to
another neuron and also processes the inputs. Dendrites
receive the signals, while the soma sums up all the
incoming signals. The transmission of signals to other
neurons is carried by the axon. A Biological Neural
Network slowly yet efficiently processes highly complex
parallel inputs.
• An artificial neuron is based on a model of biological
neurons, but it is a mathematical function. In the simplest
model, the neuron takes inputs in the form of binary values,
i.e. 1 or 0, meaning that each input is either ON or OFF. The
output of an artificial neuron is usually calculated by applying
a threshold function to the sum of its input values.
• The threshold function can be either linear or nonlinear. A
linear threshold function produces an output of 1 if the
sum of the input values is greater than or equal to a
certain threshold, and an output of 0 if the sum of the
input values is less than that threshold. A nonlinear
threshold function (such as a sigmoid), on the other hand,
can produce any output value between 0 and 1, depending
on the inputs.
• An Artificial Neural Network (ANN) is built from artificial
neurons. In its simplest form it follows a feed-forward
strategy: information flows through the nodes in one
direction and stops only after reaching the output node.
Such a network can be trained whether the data are linear
or nonlinear.
• The structure of artificial neurons is derived from
biological neurons and the network is also formed on a
similar principle, but there are some differences between
a biological neural network and an artificial neural
network.
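
The two kinds of threshold function described in the list above can be sketched side by side. The sigmoid is used here as one example of a nonlinear threshold function with outputs between 0 and 1; the function names and the default threshold of 0 are assumptions.

```python
import math


def step_threshold(total, threshold=0.0):
    """Linear threshold unit: 1 if the summed input reaches the threshold, else 0."""
    return 1 if total >= threshold else 0


def sigmoid(total):
    """Smooth alternative: any value strictly between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-total))


for total in (-2.0, 0.0, 2.0):
    print(total, step_threshold(total), round(sigmoid(total), 3))
# -2.0 0 0.119
# 0.0 1 0.5
# 2.0 1 0.881
```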
Perceptron Vs Neuron
• The perceptron is a mathematical model of the biological
neuron. It produces binary outputs from input values
while taking into consideration weights and threshold
values. Though created to imitate the working of
biological neurons, the perceptron model has since been
superseded by more advanced models, such as artificial
neural networks trained with backpropagation.
Perceptrons use a hard threshold (step) activation function
to give a positive or negative output based on a specific value.
• A neuron, also known as a node, in a backpropagation
artificial neural network produces graded values between
0 and 1. It is a generalization of the idea of the perceptron,
as the neuron also adds weighted inputs. However, it does
not produce a binary output but a graded value based on
the proximity of the input to the desired value of 1. The
results are biased towards the extreme values of 0 or 1
because the node uses a sigmoidal output function. The graded
values can be interpreted as the probability of the
input’s category.

Components of a Perceptron:

Each perceptron comprises five different parts:

1. Input Values: A set of values used to predict the
output value. They are also described as the features
of a dataset.
2. Weights: Each feature is assigned a real-valued
weight. It tells the importance of that feature in
predicting the final value.
3. Bias: The bias shifts the activation function towards the
left or right. You may understand it simply
as the y-intercept in the line equation.
4. Summation Function: The summation function binds
the weights and inputs together. It computes
their weighted sum.
5. Activation Function: It introduces non-linearity in
the perceptron model.

Why do we Need Weight and Bias?

Weight and bias are two important aspects of the perceptron
model. These are learnable parameters: as the network
gets trained, it adjusts both parameters to achieve the desired
values and the correct output.

• Weights are used to measure the importance of each
feature in predicting the output value. Features whose weights
are close to zero are said to have lesser weight or
significance.
• These have less importance in the prediction process
than features whose weights are further from zero,
known as weights with a larger value.
• Besides high-weighted features having greater predictive
power than low-weighted ones, a weight can also be
positive or negative.
• If the weight of a feature is positive then it has a direct
relation with the target value, and if it is negative then it
has an inverse relationship with the target value.

• A weight scales how strongly its input drives the
activation function, whereas the bias shifts the point at which
the activation function is triggered, so it can delay or
advance the trigger.
• It acts like an intercept in a linear equation. Simply
stated, bias is a constant used to adjust the output and
help the model provide the best-fit output for the given
data.

Perceptron Learning Rule

The late 1950s saw the development of a new type of neural
network called perceptrons, which were similar to the neurons
from an earlier work by McCulloch and Pitts. One key
contribution by Frank Rosenblatt was his work on training
these networks with perceptron learning rules. According to
the rule, a perceptron can learn automatically to generate the
desired results by finding optimal weight coefficients.

Four perceptron learning rules are commonly described. They can be
classified as follows:

1. Supervised Learning Algorithms

Gradient Descent
To optimize the weights of a perceptron, the model's output
must be an adjustable, differentiable function of its weights.
Weights and activation functions both play a part in error
reduction. The activation function matters because it
determines how the prediction error is attributed to the
weight of each input. The smoother (more differentiable) the
activation function, the more precisely the weights can be
corrected from the errors observed on the training data.
In this type of learning, the gradient of the error E with respect
to the weights determines the adjustment of the weights. An
example of this learning is the backpropagation rule.
Stochastic
The term stochastic is a mathematical term that refers to a
process or an outcome that involves randomness and
uncertainty. Under this rule, the perceptron adjusts its
weights in a probabilistic fashion.

2. Unsupervised Learning Algorithms

Hebbian
The Hebbian learning rule was proposed by Hebb in 1949. It
uses a weight matrix W to perform correlative adjustment of
weights: each weight changes according to the correlation
between its input and the output. The weight adjustment is
computed using the transposed output.
Competitive
In competitive learning, when an input pattern is
presented to an entire layer of neurons, all the units compete
to respond to it. The neuron with the most effective weights
for that pattern wins the competition.

Perceptron in Machine Learning

In machine learning, the perceptron is an algorithm for the
supervised learning of binary classification tasks. Also referred
to as an artificial neuron or neural network unit, a perceptron
can learn to detect features in its input data, for example in
business intelligence applications.
The perceptron model is one of the simplest artificial neural
networks, and the perceptron learning algorithm is a type of
supervised machine learning that uses binary classifiers for
decision-making.

Binary Classifiers in Machine Learning

In machine learning, a binary classifier decides whether an
input, represented as a vector of numbers, belongs to a
specific category. The perceptron is a linear binary classifier
because its decision is based on a weighted combination of the
features. The weights help the algorithm determine the
classification value or the probability distribution around the
prediction point.

The Perceptron in Neural Network

Neural networks are computational algorithms or models that
process information and learn from data. As these
artificial neural networks are designed as per the structure of
the human brain, the role of neurons in the brain is played by
the perceptron in a neural network.
The perceptron model in a neural network is a convenient
model of supervised machine learning. As an early binary
classification algorithm, it takes numerical or visual inputs and
assigns each of them to one of two classes. Machine learning
algorithms exploit the crucial element of classification to
process, identify and analyze patterns. Perceptron algorithms
help in the linear separation of classes and patterns based on
the numerical or visual input data.

Perceptron Model (OR) Types of Perceptron
The perceptron model was first developed in 1957 at the Cornell
Aeronautical Laboratory, United States, where it was used for
machine-driven image recognition. As one of the earliest
artificial neural networks, it was claimed to be the most notable
AI-based innovation of its time.
The perceptron algorithm, however, had some technical
constraints. Being single-layered, the perceptron model was
only applicable to linearly separable classes. The issue was
later resolved by the discovery of multi-layered perceptron
algorithms. Here is a detailed look at the types of
perceptron models:

1. Single Layer Perceptron Model

A single-layer perceptron model is the simplest type of
artificial neural network. It is a feed-forward network
that can analyze only linearly separable objects and depends
on a threshold transfer function. The model returns
only binary outcomes (targets), i.e. 1 and 0.
The algorithm in a single-layered perceptron model has no
prior information at the start. The weights are initialized
arbitrarily (for example, at random), and the algorithm simply
adds up all the weighted inputs. If the sum exceeds the
threshold (a predetermined value), the output is 1 and the
single-layer perceptron is considered to be activated.

When the predicted output matches the desired output, the
perceptron has performed satisfactorily. If there is any
difference between the expected and the obtained output, the
weights need to be adjusted to reduce the error on future
predictions.
However, the single-layer perceptron is a linear classifier
and cannot classify cases that are not linearly separable.
For such problems the learning process will never reach a
point where all cases are properly classified. This
limitation was brought to light by Minsky & Papert in 1969.
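
The limitation can be observed with a small experiment, sketched below under stated assumptions: a two-input perceptron with a step activation, zero initial weights and a learning rate of 0.5. It is trained once on OR (linearly separable) and once on XOR (not linearly separable). The function names and data sets are hypothetical.

```python
def predict(w, b, x):
    """Two-input perceptron with a step activation."""
    return 1 if w[0] * x[0] + w[1] * x[1] + b > 0 else 0


def errors_after_training(samples, learning_rate=0.5, epochs=100):
    """Train with the perceptron rule, then count misclassified samples."""
    w, b = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for x, target in samples:
            error = target - predict(w, b, x)
            w = [w[0] + learning_rate * error * x[0],
                 w[1] + learning_rate * error * x[1]]
            b += learning_rate * error
    return sum(1 for x, target in samples if predict(w, b, x) != target)


OR = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]
XOR = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]

print(errors_after_training(OR))   # 0: a separating line exists
print(errors_after_training(XOR))  # never 0: no single line separates XOR
```

No matter how many epochs are used, the XOR count stays above zero, because no straight line puts (0, 1) and (1, 0) on one side and (0, 0) and (1, 1) on the other.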

2. Multilayer Perceptron Model

A multi-layer perceptron model uses the backpropagation
algorithm. It has the same basic structure as a single-layer
perceptron, but with one or more hidden layers added.

The backpropagation algorithm is executed in two phases:

• Forward phase: Activations propagate from
the input layer to the output layer. In each unit, all
weighted inputs are added and passed through the
sigmoid activation to compute the output.
• Backward phase: The errors between the observed
actual value and the desired target value in the
output layer are propagated backward. The weights
and bias values are modified to move the output
towards the target value. The modification is done by
apportioning the error to each unit's weights and bias
according to their impact on the error.
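
The two phases can be sketched for a very small network with one hidden layer, trained on XOR. This is a minimal illustration, not a reference implementation: the layer size, learning rate, epoch count, random seed, squared-error loss and all function names are assumptions, and the sketch only reports the loss before and after training rather than promising a particular result.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(x, W1, b1, W2, b2):
    """Forward phase: input -> sigmoid hidden layer -> sigmoid output unit."""
    hidden = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(W1, b1)]
    output = sigmoid(sum(w * h for w, h in zip(W2, hidden)) + b2)
    return hidden, output


def mean_loss(data, W1, b1, W2, b2):
    """Mean squared error of the network over the data set."""
    return sum((t - forward(x, W1, b1, W2, b2)[1]) ** 2
               for x, t in data) / len(data)


def train(data, n_hidden=3, learning_rate=0.5, epochs=5000, seed=0):
    rng = random.Random(seed)
    n_in = len(data[0][0])
    W1 = [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_hidden)]
    b1 = [0.0] * n_hidden
    W2 = [rng.uniform(-1, 1) for _ in range(n_hidden)]
    b2 = 0.0
    initial = mean_loss(data, W1, b1, W2, b2)
    for _ in range(epochs):
        for x, t in data:
            hidden, out = forward(x, W1, b1, W2, b2)
            # Backward phase: error signal at the output unit
            # (derivative of 0.5 * (out - t)^2 through the sigmoid).
            delta_out = (out - t) * out * (1.0 - out)
            # Apportion the error to each hidden unit via its outgoing weight.
            delta_hidden = [delta_out * W2[j] * hidden[j] * (1.0 - hidden[j])
                            for j in range(n_hidden)]
            # Gradient-descent updates of weights and biases.
            for j in range(n_hidden):
                W2[j] -= learning_rate * delta_out * hidden[j]
                for i in range(n_in):
                    W1[j][i] -= learning_rate * delta_hidden[j] * x[i]
                b1[j] -= learning_rate * delta_hidden[j]
            b2 -= learning_rate * delta_out
    return initial, mean_loss(data, W1, b1, W2, b2)


XOR = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]
initial_loss, final_loss = train(XOR)
print("loss before:", initial_loss, "loss after:", final_loss)
```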

Perceptron Function
The perceptron function f(x) is generated by multiplying the
input x by the learned weight coefficient w. The same can
be expressed through the following mathematical equation:
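
A common way to write this function is given below. It assumes a bias term b and a decision threshold at zero, neither of which is stated in the sentence above; w · x denotes the sum of the products w_i x_i over all inputs.

```latex
f(x) =
\begin{cases}
1, & \text{if } w \cdot x + b > 0 \\
0, & \text{otherwise}
\end{cases}
\qquad \text{where } w \cdot x = \sum_{i=1}^{n} w_i x_i
```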
Advantages of the multi-layered perceptron model:

• A multi-layered perceptron model can solve complex non-linear problems.
• It works well with both small and large input data.
• Helps us to obtain quick predictions after the training.
• Helps us obtain the same accuracy ratio with big and small data.

Disadvantages of the multi-layered perceptron model:

• In a multi-layered perceptron model, computations are time-consuming and
complex.
• It is tough to predict how much each independent variable affects the
dependent variable.
• The model functioning depends on the quality of training.

Characteristics of the Perceptron Model

The following are the characteristics of a Perceptron Model:

1. It is a machine learning algorithm that uses supervised learning of binary
classifiers.
2. In Perceptron, the weight coefficient is automatically learned.
3. Initially, weights are multiplied with input features, and then the decision is
made whether the neuron is fired or not.
4. The activation function applies a step rule to check whether the weighted
sum is greater than zero.
5. The linear decision boundary is drawn, enabling the distinction between the
two linearly separable classes +1 and -1.
6. If the added sum of all input values is more than the threshold value, the
neuron produces an output signal; otherwise, no output is produced.

Limitations of the Perceptron Model

A perceptron model has the following limitations:

• The input vectors must be presented to the network
one at a time or in batches so that the corrections can
be made to the network based on the results of each
presentation.
• The perceptron generates only a binary number (0 or
1) as an output due to the hard limit transfer function.
• It can classify linearly separable sets of inputs easily,
whereas sets of input vectors that are not linearly
separable cannot be classified properly.

Future of Perceptron
Machine learning is an artificial intelligence technique that has
been rapidly evolving for many years. Perceptron has been
supporting the growth of artificial intelligence and machine
learning technologies even during its development phase. It
will continue to aid analytical behavior by processing data
through pattern recognition algorithms.

FEED FORWARD NEURAL NETWORKS

• Definition: Feed-forward neural networks transmit data in one
direction, from input to output, without feedback loops, making
them suitable for tasks like pattern recognition and classification.
Feedback neural networks, on the other hand, incorporate
feedback connections, allowing output to affect subsequent
processing.

What is a feed forward neural network?

• Feed forward neural networks are artificial neural networks in which the connections
between nodes do not form loops. When such a network has one or more hidden layers it is
also known as a multi-layer neural network. In every case, information is only passed forward.
• During data flow, input nodes receive data, which travels through the hidden layers
and exits through the output nodes. No links exist in the network that could be used to
send information back from the output nodes.
• A feed forward neural network approximates functions in the following way:
• A classifier is described by a function y = f*(x).
• This function assigns input x to category y.
• The feed forward model defines a mapping y = f(x; θ) and learns the value of the
parameters θ that gives the closest approximation of f*.
• Feed forward neural networks serve as the basis for object detection in
photos, as shown in the Google Photos app.
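
The mapping y = f(x; θ) can be sketched as a forward pass in which θ is simply the list of weights and biases of each layer. The representation of θ as a list of (weights, biases) pairs, the tanh activation and the numeric values are all assumptions made for this example.

```python
import math


def feed_forward(x, theta):
    """Compute y = f(x; theta) by passing x through each layer in turn."""
    activation = x
    for weights, biases in theta:
        # Each row of `weights` holds the incoming weights of one neuron.
        activation = [
            math.tanh(sum(w * a for w, a in zip(row, activation)) + b)
            for row, b in zip(weights, biases)
        ]
    return activation


# Hypothetical parameters for a network with 2 inputs, 2 hidden units, 1 output.
theta = [
    ([[0.5, -0.5], [1.0, 1.0]], [0.0, 0.0]),  # hidden layer
    ([[1.0, -1.0]], [0.0]),                   # output layer
]
y = feed_forward([1.0, 1.0], theta)
print(y)  # a single value, roughly -0.746
```

Data only moves from one layer to the next inside the loop, and no layer's output is ever fed back to an earlier layer, which is the defining property of a feed forward network.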

What is the working principle of a feed forward neural


network?

In its simplest form, a feed forward neural network is a single-layer
perceptron.
This model multiplies the inputs by weights as they enter the layer. The
weighted input values are then added together to get a sum. If the sum
rises above a certain threshold, usually set at zero, the output value is typically 1; if
it falls below the threshold, the output is typically -1.
As a feed forward neural network model, the single-layer perceptron is often used for
classification, and it can be trained with machine learning.
During training, the network compares its outputs with the intended values and
adjusts its weights using a rule called the delta rule.
This training procedure is a form of gradient descent. Multi-layer
perceptrons update their weights in a similar way, but the process is known as
back-propagation. In that case, the weights of the hidden layers are adjusted according
to the error in the output values produced by the final layer.
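
The threshold behaviour and error-driven weight adjustment described above can be sketched as follows. This is a minimal illustration, not a reference implementation: the function names, the learning rate and the toy AND data are assumptions made for the example, and the update shown is the classic perceptron learning rule, a close relative of the delta rule.

```python
def predict(weights, bias, inputs):
    """Threshold unit: +1 if the weighted sum is above zero, otherwise -1."""
    total = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1 if total > 0 else -1


def train_perceptron(samples, learning_rate=0.1, epochs=20):
    """Adjust the weights after every sample, in proportion to the error."""
    n_inputs = len(samples[0][0])
    weights = [0.0] * n_inputs
    bias = 0.0
    for _ in range(epochs):
        for inputs, target in samples:
            # error is 0 when the prediction is already correct
            error = target - predict(weights, bias, inputs)
            weights = [w + learning_rate * error * x
                       for w, x in zip(weights, inputs)]
            bias += learning_rate * error
    return weights, bias


# Toy data (assumed for illustration): logical AND with targets in {-1, +1}
and_samples = [((0, 0), -1), ((0, 1), -1), ((1, 0), -1), ((1, 1), 1)]
weights, bias = train_perceptron(and_samples)
predictions = [predict(weights, bias, x) for x, _ in and_samples]
print(predictions)
```

Because AND is linearly separable, a single threshold unit is enough here; a problem such as XOR would need a hidden layer.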
Layers of feed forward neural network

 Input layer:

The neurons of this layer receive input and pass it on to the other layers of the network.
Feature or attribute numbers in the dataset must match the number of neurons in the
input layer.

 Output layer:

This layer produces the predicted value. Its form depends on the type of model being
built, for example class scores for classification or a numeric value for regression.

 Hidden layer:

Hidden layers sit between the input and output layers. Depending on the type of
model, there may be several hidden layers.
Each hidden layer contains several neurons that transform their input before
passing it to the next layer. The weights of the network are updated continually during
training in order to improve its predictions.

 Neuron weights:
Connections between neurons carry a weight, which measures the strength or magnitude
of the connection. Input weights play a role similar to linear regression coefficients.
Weights can take any real value, positive or negative. They are often initialized to
small values, but they are not restricted to the range 0 to 1.

 Neurons:

Feed forward networks are built from artificial neurons, which are modelled on
biological neurons.
A neuron works in two steps: first, it computes a weighted sum of its inputs, and second, it
applies an activation function to that sum to normalize it.
Activation functions can be either linear or nonlinear. Each neuron has a weight for
each of its inputs. During the learning phase, the network learns these weights.

 Activation Function:

The activation function determines the output of a neuron, that is, how strongly it
responds to its weighted input.
Depending on the activation function, a neuron makes a linear or
nonlinear decision. Because a signal passes through many layers, a bounded activation
function also keeps the neuron outputs from growing in a cascading way.
Three commonly used activation functions are sigmoid, Tanh, and
Rectified Linear Unit (ReLU).

 Sigmoid:

Input values get mapped to output values between 0 and 1.

 Tanh:

Input values get mapped to output values between -1 and 1.

 Rectified linear Unit:

Only positive values are allowed to flow through this function. Negative values get
mapped to 0.
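
As a small sketch of the three functions just listed, the following uses only the Python standard library. The function names and sample inputs are chosen for illustration.

```python
import math


def sigmoid(x):
    """Maps any real input to an output between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-x))


def tanh(x):
    """Maps any real input to an output between -1 and 1."""
    return math.tanh(x)


def relu(x):
    """Passes positive values through and maps negative values to 0."""
    return max(0.0, x)


for x in (-2.0, 0.0, 2.0):
    print(x, round(sigmoid(x), 3), round(tanh(x), 3), relu(x))
```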
Function in feed forward neural network
Cost function
In a feed forward neural network, the cost function plays an important role. The
number of correctly categorized data points is little affected by minor adjustments to
weights and biases, so it gives little guidance on how to change them.
A smooth cost function is therefore used to determine how to adjust the weights
and biases to improve performance.
Following is a definition of the mean square error cost function:

\[ C(w, b) = \frac{1}{2n} \sum_{x} \| y(x) - a \|^2 \]

Where,
w = the weights gathered in the network
b = biases
n = number of inputs for training
a = output vectors
x = input
y(x) = desired output for input x
‖v‖ = the usual length (norm) of vector v
Loss function
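The loss function of a neural network measures how far the predictions are from the
targets, and so determines how the weights need to be adjusted in the learning process.
For classification, the number of neurons in the output layer equals the number of
classes, and the loss measures the difference between the predicted and actual probability
distributions. Following is the cross-entropy loss for binary classification, where y is
the true label (0 or 1) and p is the predicted probability of class 1:

\[ L = -\bigl[\, y \log p + (1 - y) \log (1 - p) \,\bigr] \]

For multiclass classification with M classes, the cross-entropy loss is:

\[ L = -\sum_{c=1}^{M} y_c \log p_c \]

where y_c is 1 for the true class and 0 otherwise, and p_c is the predicted probability of class c.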
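
The cost and loss functions discussed in this part can be sketched in a few lines. This is an illustrative sketch under simplifying assumptions: outputs are plain Python numbers (scalar outputs for the quadratic cost), predicted probabilities lie strictly between 0 and 1, and the function names are made up for the example.

```python
import math


def quadratic_cost(targets, outputs):
    """Mean square error with the 1/(2n) factor, for scalar outputs."""
    n = len(targets)
    return sum((y - a) ** 2 for y, a in zip(targets, outputs)) / (2 * n)


def binary_cross_entropy(y, p):
    """y is the true label (0 or 1), p the predicted probability of class 1."""
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))


def categorical_cross_entropy(one_hot, probs):
    """one_hot marks the true class; probs are the predicted probabilities."""
    return -sum(y * math.log(p) for y, p in zip(one_hot, probs) if y > 0)


print(quadratic_cost([1.0, 0.0], [0.8, 0.3]))
print(binary_cross_entropy(1, 0.9), binary_cross_entropy(1, 0.1))
print(categorical_cross_entropy([0, 1, 0], [0.2, 0.5, 0.3]))
```

A confident correct prediction (p = 0.9 for a true label of 1) gives a much smaller loss than a confident wrong one (p = 0.1), which is the behaviour that makes cross-entropy useful for classification.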

Gradient learning algorithm


In the gradient descent algorithm, the next point is calculated by scaling the gradient
at the current position by a learning rate and subtracting the result from the current
position.
To decrease the function, the scaled gradient is subtracted (to increase it, it would be
added). The procedure can be written as:

\[ p_{n+1} = p_n - \eta \, \nabla f(p_n) \]

The parameter η scales the gradient and so determines the step size.
In machine learning it is called the learning rate, and it significantly affects performance.
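
A minimal sketch of this update rule, applied to a one-dimensional function chosen for illustration, f(p) = (p - 3)^2 with gradient 2(p - 3). The function name and the default learning rate are assumptions of the example.

```python
def gradient_descent(gradient, start, learning_rate=0.1, steps=50):
    """Repeatedly step against the gradient, scaled by the learning rate."""
    position = start
    for _ in range(steps):
        position = position - learning_rate * gradient(position)
    return position


# f(p) = (p - 3)^2 has its minimum at p = 3
minimum = gradient_descent(lambda p: 2 * (p - 3), start=0.0)
print(minimum)
```

With a learning rate of 0.1 each step shrinks the distance to the minimum by a factor of 0.8. A learning rate above 1.0 would make the steps overshoot and diverge for this function.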
Output units
In the output layer, output units are those units that provide the desired output or
prediction, thereby fulfilling the task that the neural network needs to complete.
There is a close relationship between the choice of output units and the cost function.
Any unit that can serve as a hidden unit can also serve as an output unit in a neural
network.
Advantages of feed forward Neural Networks
 The simplified architecture of feed forward neural networks makes them a
convenient starting point for machine learning.
 Multiple feed forward networks can operate independently, with a
moderating intermediary that combines their outputs.
 Complex tasks need several neurons in the network.
 Multi-layer neural networks can handle and process nonlinear data, which
single perceptrons and single sigmoid neurons cannot model well.
 A neural network can learn complicated decision boundaries.
 Depending on the data, the neural network architecture can vary. For
example, convolutional neural networks (CNNs) perform exceptionally well in
image processing, whereas recurrent neural networks (RNNs) perform well in
text and voice processing.
 Neural networks benefit from graphics processing units (GPUs) to handle large
datasets, which demand heavy computation. GPU-backed
environments such as Kaggle Notebooks and Google
Colab notebooks are widely used.

Architecture of Feedforward Neural Networks:-

The architecture of a feedforward neural network consists of three types of layers: the input
layer, hidden layers, and the output layer. Each layer is made up of units known as neurons,
and the layers are interconnected by weights.

 Input Layer: This layer consists of neurons that receive inputs and pass them on to the
next layer. The number of neurons in the input layer is determined by the dimensions of
the input data.
 Hidden Layers:
These layers are not exposed to the input or output and can be considered as the
computational engine of the neural network. Each hidden layer's neurons take the
weighted sum of the outputs from the previous layer, apply an activation function, and
pass the result to the next layer. The network can have zero or more hidden layers.
 Output Layer: The final layer that produces the output for the given inputs. The number
of neurons in the output layer depends on the number of possible outputs the network is
designed to produce.

Each neuron in one layer is connected to every neuron in the next layer, making this a fully
connected network. The strength of the connection between neurons is represented by weights,
and learning in a neural network involves updating these weights based on the error of the
output.
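
The fully connected structure described above can be sketched as a forward pass through a list of layers. The layer sizes (3 inputs, 2 hidden neurons, 1 output), the weight values and the choice of the sigmoid activation are arbitrary assumptions made for the example.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def dense_layer(inputs, weights, biases, activation):
    """weights[j][i] connects input i to neuron j of this layer."""
    return [activation(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]


def forward(inputs, layers):
    """Pass the inputs through each (weights, biases) layer in turn."""
    activations = inputs
    for weights, biases in layers:
        activations = dense_layer(activations, weights, biases, sigmoid)
    return activations


# 3 inputs -> 2 hidden neurons -> 1 output neuron (weights chosen arbitrarily)
layers = [
    ([[0.2, -0.4, 0.1], [0.5, 0.3, -0.2]], [0.0, 0.1]),
    ([[0.7, -0.6]], [0.05]),
]
output = forward([1.0, 0.5, -1.0], layers)
print(output)
```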
How Feedforward Neural Networks Work
The working of a feedforward neural network involves two phases: the feedforward phase and
the backpropagation phase.

 Feedforward Phase: In this phase, the input data is fed into the network, and it
propagates forward through the network. At each hidden layer, the weighted sum of the
inputs is calculated and passed through an activation function, which introduces non-
linearity into the model. This process continues until the output layer is reached, and a
prediction is made.
 Backpropagation Phase: Once a prediction is made, the error (difference between the
predicted output and the actual output) is calculated. This error is then propagated back
through the network, and the weights are adjusted to minimize this error. The process of
adjusting weights is typically done using a gradient descent optimization algorithm.
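
The two phases can be sketched for a very small network: one hidden layer, one output neuron, sigmoid activations and a squared-error loss. Biases are left out to keep the sketch short, and all names, weights and the learning rate are assumptions made for illustration.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def forward(x, w_hidden, w_out):
    """Feedforward phase: hidden activations, then the single output."""
    hidden = [sigmoid(sum(w * xi for w, xi in zip(row, x))) for row in w_hidden]
    output = sigmoid(sum(w * h for w, h in zip(w_out, hidden)))
    return hidden, output


def loss(x, target, w_hidden, w_out):
    _, output = forward(x, w_hidden, w_out)
    return 0.5 * (output - target) ** 2


def backprop(x, target, w_hidden, w_out):
    """Backpropagation phase: gradients of the loss for every weight."""
    hidden, output = forward(x, w_hidden, w_out)
    # error signal at the output neuron: dLoss/dOutput * sigmoid'(z)
    delta_out = (output - target) * output * (1 - output)
    grad_out = [delta_out * h for h in hidden]
    # propagate the error signal back to each hidden neuron
    delta_hidden = [delta_out * w * h * (1 - h) for w, h in zip(w_out, hidden)]
    grad_hidden = [[d * xi for xi in x] for d in delta_hidden]
    return grad_hidden, grad_out


def train_step(x, target, w_hidden, w_out, learning_rate=0.5):
    """One gradient descent update of all weights."""
    grad_hidden, grad_out = backprop(x, target, w_hidden, w_out)
    new_hidden = [[w - learning_rate * g for w, g in zip(w_row, g_row)]
                  for w_row, g_row in zip(w_hidden, grad_hidden)]
    new_out = [w - learning_rate * g for w, g in zip(w_out, grad_out)]
    return new_hidden, new_out


x, target = [1.0, 0.5], 1.0
w_hidden, w_out = [[0.1, -0.2], [0.4, 0.3]], [0.3, -0.1]
initial_loss = loss(x, target, w_hidden, w_out)
for _ in range(100):
    w_hidden, w_out = train_step(x, target, w_hidden, w_out)
final_loss = loss(x, target, w_hidden, w_out)
print(initial_loss, final_loss)
```

A common way to check a hand-written backward pass like this one is to compare its gradients with finite-difference estimates of the loss.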

Activation Functions
Activation functions play a crucial role in feedforward neural networks. They introduce non-linear
properties to the network, which allows the model to learn more complex patterns. Common
activation functions include the sigmoid, tanh, and ReLU (Rectified Linear Unit).
Training Feedforward Neural Networks
Training a feedforward neural network involves using a dataset to adjust the weights of the
connections between neurons. This is done through an iterative process where the dataset is
passed through the network multiple times, and each time, the weights are updated to reduce
the error in prediction. This process is known as gradient descent, and it continues until the
network performs satisfactorily on the training data.
Applications of Feedforward Neural Networks
The feed forward principle is used in a variety of fields, and feedforward neural
networks are used in a variety of machine learning tasks. The following are a few of them.

1. Physiological feed forward system:-


Feed forward control can be identified here because the central autonomic (involuntary)
nervous system adjusts the heartbeat in anticipation of exercise, before the exercise begins.

2. Gene regulation and feed forward:-


In gene regulatory networks, the feed forward motif acts as a system for detecting
non-temporary (persistent) changes in the environment. This motif occurs frequently in
known regulatory networks.
3. Automation and machine management:-
Feed forward control is one of the established techniques in automation and machine control.

4. Parallel feed forward compensation with derivative:-


This technique converts the open-loop transfer function of a non-minimum phase system
into that of a minimum phase system.

 5. Pattern recognition
 6. Classification tasks
 7. Regression analysis
 8. Image recognition
 9. Time series prediction

Despite their simplicity, feedforward neural networks can model complex relationships in data
and have been the foundation for more complex neural network architectures.
Challenges and Limitations
While feedforward neural networks are powerful, they come with their own set of challenges and
limitations. One of the main challenges is the choice of the number of hidden layers and the
number of neurons in each layer, which can significantly affect the performance of the network.
Overfitting is another common issue where the network learns the training data too well,
including the noise, and performs poorly on new, unseen data.
In conclusion, feedforward neural networks are a foundational concept in the field of neural
networks and deep learning. They provide a straightforward approach to modeling data and
making predictions and have paved the way for more advanced neural network architectures
used in modern artificial intelligence applications.

Sigmoid, Tanh, and ReLU Neurons

Comparison of Sigmoid, Tanh and ReLU Neurons:

 Artificial neural networks (ANNs) are the part of deep learning built from artificial
neurons. To understand them, we first have to understand how
neurons work.
 In biology, neurons carry the signals sensed by the sense organs to the
brain, so that the brain can take appropriate decisions based on what was
sensed. The brain performs some operations and calculations on these signals
and produces an appropriate output, which the body then acts on.
 When this brain functionality is implemented artificially, the resulting
network is known as an Artificial Neural Network. In it, a single
node (a replica of the neuron) is divided into two parts. The first
part is known as the Summation and the second is a function
known as the Activation Function.

Summation
The summation collects all the input signals together with their weights. For
example, if the first input signal is x1 and its weight is ω1, the first weighted signal is
x1 ω1. Similarly, we calculate the weighted values for the second input, the third
input and so on. Finally, we take the sum of all the weighted inputs. So the total
weighted sum is calculated as

\[ x_1 \omega_1 + x_2 \omega_2 + x_3 \omega_3 + \dots + x_n \omega_n \]
Activation Function
The activation function generates the output of a node from the input it
receives. That means we apply the activation function to the summation result:

\[ Y = f\Bigl(\sum_i x_i \omega_i + \text{Bias}\Bigr) \]

There are multiple types of activation function, such as the Linear Activation Function,
Heaviside Activation Function, Sigmoid Function, Tanh Function and ReLU Activation Function.
In deep learning, activation functions are a very important part of any neural
network, because they let the network perform complicated tasks such as object
detection, image classification and language translation. Without a nonlinear activation
function, a network could only represent linear functions of its input, which is not
enough for such tasks.
Here we concentrate on the Sigmoid Activation Function, Tanh
Activation Function and ReLU (Rectified Linear Unit) Activation Function, because these
are the activation functions most used in ANNs (Artificial Neural Networks) and deep
learning.
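
The two parts of an artificial neuron, summation followed by activation, can be sketched as one small function. The `neuron` helper, the activation table and the sample numbers are hypothetical and chosen so the weighted sum comes out to exactly zero.

```python
import math


def neuron(inputs, weights, bias, activation):
    """Summation of the weighted inputs plus bias, followed by the activation."""
    summation = sum(x * w for x, w in zip(inputs, weights)) + bias
    return activation(summation)


activations = {
    "linear": lambda s: s,
    "heaviside": lambda s: 1.0 if s >= 0 else 0.0,
    "sigmoid": lambda s: 1.0 / (1.0 + math.exp(-s)),
    "tanh": math.tanh,
    "relu": lambda s: max(0.0, s),
}

inputs, weights, bias = [1.0, 2.0], [0.5, -0.25], 0.0  # weighted sum is 0.0
results = {name: neuron(inputs, weights, bias, f) for name, f in activations.items()}
print(results)
```

The same weighted sum produces different outputs depending only on the activation function, which is why the choice of activation matters.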

1. Sigmoid Activation Function


The sigmoid function, also known as the logistic function, normalizes any input to an
output in the range 0 to 1. The main purpose of this activation function is to
keep the output or predicted value within a particular range, which helps the
efficiency and accuracy of the model.

fig: sigmoid function

Equation of the sigmoid activation function is given by:
\[ y = \frac{1}{1 + e^{-x}} \]
Range: 0 to 1
Here the input x of a neuron can be anything in the range -infinity to +infinity, so we
have to bound the output to get the desired prediction or generalized results.
The major drawbacks of the sigmoid activation function are:

 It is not a zero-centered activation function.
 Learning can be slow.
 It creates a vanishing gradient problem.

Vanishing Gradient Problem


The vanishing gradient problem occurs during backpropagation, when the
weights are updated. To understand the problem, consider increasing the
input to the activation function. The output always stays within the range of the
selected activation function and cannot pass its limits.
For the sigmoid function, the range is from 0 to 1: the upper limit is 1 and the
lower limit is 0. So when we increase the input values, the output moves
closer to the upper limit of 1 but never exceeds it.
If we increase the input further, the output saturates near the upper limit
and stays there. In this saturated region the slope of the sigmoid is close to zero, and
even at its steepest point the slope is only 0.25. During backpropagation these small
derivatives are multiplied together layer by layer, so the gradients become smaller and
smaller towards the layers nearest the input. This makes the learning process very slow
and can stop the weights from converging to their optimum. This problem is known as the
Vanishing Gradient Problem.

\[ Y = \text{Activation function}\Bigl(\sum (\text{weights} \times \text{input}) + \text{bias}\Bigr) \]

In a nutshell, a neural network is a widely used method in machine
learning which mimics how a brain perceives and operates.
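
The shrinking effect can be seen numerically. The sketch below multiplies sigmoid derivatives along a chain of layers; as a simplifying assumption for illustration, every connecting weight is taken to be 1 so that only the effect of the activation function is visible.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def sigmoid_derivative(x):
    """Slope of the sigmoid; its largest value is 0.25, reached at x = 0."""
    s = sigmoid(x)
    return s * (1.0 - s)


def chained_gradient(pre_activations):
    """Product of sigmoid slopes along a chain of layers (weights taken as 1)."""
    product = 1.0
    for z in pre_activations:
        product *= sigmoid_derivative(z)
    return product


shallow = chained_gradient([0.0] * 2)    # best case through 2 layers
deep = chained_gradient([0.0] * 10)      # best case through 10 layers
saturated = sigmoid_derivative(10.0)     # slope for a large input
print(shallow, deep, saturated)
```

Even in the best case the gradient shrinks by a factor of four per layer, and a saturated neuron passes back almost nothing.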

2. Hyperbolic Tangent Activation Function


The Tanh activation function is often preferred to the sigmoid activation function because
its output range is wider and is centered on zero. This is the
major difference between the Sigmoid and Tanh activation functions. The rest of the
functionality is the same as for the sigmoid function; for example, both can be used in a
feed-forward network.

Equation: \( y = \tanh(x) \)

Range: -1 to 1

fig: Hyperbolic Tangent Activation function
Advantage of Tanh Activation function
Negative values are also produced: in the sigmoid the minimum of the range is 0,
but in Tanh the minimum of the range is -1. This is why the Tanh activation function is
known as a zero-centered activation function.
Disadvantage
It faces the same Vanishing Gradient Problem as the sigmoid function.

3. ReLu (Rectified Linear Unit) Activation Function


ReLU is currently one of the most widely used activation functions. Compared to the
sigmoid and Tanh, it largely avoids the Vanishing Gradient Problem for positive inputs,
because its slope there is 1.

Range: 0 to infinity
Equation:
\[ f(x) = \max(0, x) = \begin{cases} x & \text{if } x \ge 0 \\ 0 & \text{if } x < 0 \end{cases} \]

fig: ReLu Activation function

Advantage of ReLu:
 All negative values are converted to 0, so no negative
values are output.
 The output has no upper limit, so the function does not saturate for positive
inputs and the Vanishing Gradient problem is much reduced, which helps
prediction accuracy and training efficiency.
 It is fast to compute compared to the other activation functions.
Comparison

FAST FOOD PROBLEM

Introduction
 Fast-Food Problem: A term used to describe the challenges of optimizing and updating
model parameters efficiently during training in deep learning. The analogy is drawn to
the fast-food industry, where quick and efficient service is crucial.
Components of the Fast-Food Problem
1. Efficiency in Training:
o Batch Processing: Processing data in mini-batches to speed up training.
o Parallelism: Utilizing GPUs or TPUs for parallel processing to handle multiple
operations simultaneously.
2. Optimization Techniques:
o Gradient Descent Variants: Different methods to update model parameters
efficiently.
o Learning Rate Scheduling: Dynamically adjusting the learning rate during
training for improved convergence.
3. Model Architecture and Complexity:
o Efficient Network Design: Creating architectures that reduce computational
load while maintaining performance.
o Parameter Sharing and Pruning: Techniques to reduce model complexity and
speed up training and inference.

Addressing the Fast-Food Problem with Gradient Descent and Learning Rates
Gradient Descent

Gradient descent is used to minimize the loss function by iteratively adjusting the model
parameters. The efficiency of gradient descent can be improved through various techniques:

 Stochastic Gradient Descent (SGD): Instead of using the entire dataset, SGD updates
model parameters using a single data point or a mini-batch, speeding up the training
process.
 Momentum: Momentum helps accelerate the gradient vectors in the right direction, thus
leading to faster convergence.
 Adaptive Methods: Algorithms like Adam and RMSprop adjust the learning rates based
on the gradients, providing a more efficient and stable training process.

Learning Rates

The learning rate (η) determines the step size at each iteration while moving towards a
minimum of the loss function. The choice of learning rate is crucial:

 Fixed Learning Rate: A constant learning rate throughout training.


 Adaptive Learning Rates: Methods like Adam adjust the learning rate dynamically
based on the gradients.
 Learning Rate Schedules: Reduce the learning rate at predefined points in training,
helping in fine-tuning the model.

Gradient Descent
Overview

 Gradient Descent: An optimization algorithm used to minimize the cost function by
iteratively adjusting the weights.

Variants of Gradient Descent

1. Batch Gradient Descent:


o Uses the entire dataset to compute the gradient of the cost function.
o Pros: Stable and accurate.
o Cons: Computationally expensive and slow for large datasets.
2. Stochastic Gradient Descent (SGD):
o Uses one sample at a time to compute the gradient.
o Pros: Faster updates, can escape local minima.
o Cons: More variance in the updates, less stable.
3. Mini-Batch Gradient Descent:
o Uses a small batch of samples to compute the gradient.
o Pros: Balances the efficiency and stability of Batch and SGD.
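
The three variants differ only in how many samples are used for each update, which a single `batch_size` parameter can express. The sketch below fits one weight w to noise-free data y = 2x; the data, learning rate and function names are assumptions made for the example.

```python
import random


def gradient(w, batch):
    """Gradient of the squared error 0.5 * (w*x - y)^2, averaged over the batch."""
    return sum((w * x - y) * x for x, y in batch) / len(batch)


def train(data, batch_size, learning_rate=0.05, epochs=200, seed=0):
    """batch_size = len(data): batch GD; 1: SGD; in between: mini-batch GD."""
    rng = random.Random(seed)
    w = 0.0
    for _ in range(epochs):
        shuffled = data[:]
        rng.shuffle(shuffled)
        for start in range(0, len(shuffled), batch_size):
            batch = shuffled[start:start + batch_size]
            w -= learning_rate * gradient(w, batch)
    return w


data = [(float(x), 2.0 * x) for x in range(1, 6)]   # samples of y = 2x
w_batch = train(data, batch_size=len(data))
w_sgd = train(data, batch_size=1)
w_mini = train(data, batch_size=2)
print(w_batch, w_sgd, w_mini)
```

On this noise-free toy problem all three variants reach the same weight. The differences in stability and speed listed above show up on larger, noisier datasets.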

Enhancements to Gradient Descent

1. Momentum:
o Accelerates gradient vectors in the right direction, leading to faster convergence.
o Formula: \( v_t = \gamma v_{t-1} + \eta \nabla J(\theta) \)
o Update Rule: \( \theta = \theta - v_t \)
2. Adaptive Methods:
o AdaGrad: Adapts the learning rate based on the past gradients.
o RMSprop: Modifies AdaGrad to reduce aggressive, monotonically decreasing
learning rates.
o Adam: Combines the advantages of both AdaGrad and RMSprop.
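
The momentum formula and update rule can be sketched directly. The objective J(θ) = (θ - 3)^2, the default values of η and γ, and the function name are assumptions chosen for illustration; adaptive methods such as Adam are not shown.

```python
def momentum_descent(gradient, theta, learning_rate=0.1, gamma=0.9, steps=200):
    """v_t = gamma * v_{t-1} + eta * grad J(theta); theta = theta - v_t."""
    velocity = 0.0
    for _ in range(steps):
        velocity = gamma * velocity + learning_rate * gradient(theta)
        theta = theta - velocity
    return theta


# J(theta) = (theta - 3)^2 has gradient 2 * (theta - 3) and its minimum at 3
theta = momentum_descent(lambda t: 2 * (t - 3), theta=0.0)
print(theta)
```

The velocity term accumulates past gradients, so the parameter can overshoot and oscillate around the minimum before it settles.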

Learning Rates
Importance of Learning Rates

 Learning Rate (η): Determines the step size at each iteration while moving
towards a minimum of the loss function.
 Choosing an appropriate learning rate is crucial for efficient training.

Strategies for Learning Rates

1. Fixed Learning Rate: Constant throughout training.


2. Adaptive Learning Rates: Algorithms like Adam adjust the learning rate dynamically
based on the gradients.
3. Learning Rate Schedules:
o Step Decay: Reduces the learning rate at specific intervals.
o Exponential Decay: Reduces the learning rate exponentially over time.
o Polynomial Decay: Reduces the learning rate following a polynomial function.
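
The three schedules can be written as small functions of the epoch number. The parameter names and default values below (drop factor, decay rate, power) are assumptions for illustration; deep learning libraries provide their own schedulers with different signatures.

```python
import math


def step_decay(initial_lr, epoch, drop=0.5, epochs_per_drop=30):
    """Multiply the learning rate by `drop` every `epochs_per_drop` epochs."""
    return initial_lr * drop ** (epoch // epochs_per_drop)


def exponential_decay(initial_lr, epoch, decay_rate=0.05):
    """Shrink the learning rate exponentially with the epoch number."""
    return initial_lr * math.exp(-decay_rate * epoch)


def polynomial_decay(initial_lr, epoch, total_epochs=100, end_lr=0.0, power=2.0):
    """Move from initial_lr to end_lr along a polynomial curve."""
    fraction = min(epoch, total_epochs) / total_epochs
    return (initial_lr - end_lr) * (1 - fraction) ** power + end_lr


for epoch in (0, 30, 60, 100):
    print(epoch,
          step_decay(0.01, epoch),
          round(exponential_decay(0.01, epoch), 6),
          round(polynomial_decay(0.01, epoch), 6))
```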
Model Architecture and Complexity
1. Efficient Network Design:
o Use architectures like ResNet, Inception, MobileNet to reduce computational
load.
o Techniques like skip connections and depthwise separable convolutions help in
efficiency.
2. Parameter Sharing and Pruning:
o Parameter Sharing: Reusing the same parameters across different parts of the
model (e.g., convolutional layers).
o Pruning: Removing unnecessary connections and weights to simplify the model.

Practical Implementation
Example Code for Gradient Descent with Learning Rate Scheduling
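import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader, TensorDataset

# Define a simple neural network: 784 inputs -> 128 hidden units -> 10 classes
class SimpleNN(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(784, 128)
        self.fc2 = nn.Linear(128, 10)

    def forward(self, x):
        x = torch.relu(self.fc1(x))   # hidden layer with ReLU activation
        x = self.fc2(x)               # raw class scores (logits)
        return x

# Placeholder data (an assumption, not part of the original example):
# random 784-dimensional inputs with random labels stand in for a real
# dataset such as flattened 28x28 images.
train_data = TensorDataset(torch.randn(256, 784), torch.randint(0, 10, (256,)))
train_loader = DataLoader(train_data, batch_size=32, shuffle=True)

# Initialize the network, loss function, and optimizer
model = SimpleNN()
criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(model.parameters(), lr=0.01, momentum=0.9)

# Training loop with learning rate scheduling
for epoch in range(100):
    # Step decay: halve the learning rate every 30 epochs
    if epoch % 30 == 0 and epoch != 0:
        for param_group in optimizer.param_groups:
            param_group['lr'] /= 2

    for data, target in train_loader:
        optimizer.zero_grad()             # clear gradients from the previous step
        output = model(data)              # forward pass
        loss = criterion(output, target)  # compare predictions with labels
        loss.backward()                   # backpropagate the error
        optimizer.step()                  # update the weights

    # Loss of the last mini-batch in this epoch
    print(f'Epoch {epoch}, Loss: {loss.item()}')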

Summary
 The Fast-Food Problem in deep learning emphasizes the need for efficient training and
optimization techniques to handle large datasets and complex models.
 Key strategies include using mini-batches, parallel processing, gradient descent
variants, learning rate scheduling, and efficient network architectures.
 Practical implementation involves combining these strategies to achieve faster and more
stable training processes.

Delta Learning Rule & Gradient Descent | Neural Networks

 The development of the perceptron was a big step towards
the goal of creating useful connectionist networks capable
of learning complex relations between inputs and outputs.
 In the late 1950’s, the connectionist community understood
that what was needed for further development of
connectionist models was a mathematically-derived (and
thus potentially more flexible and powerful) rule for
learning.
 By early 1960’s, the Delta Rule [also known as the Widrow
& Hoff Learning rule or the Least Mean Square (LMS) rule]
was invented by Widrow and Hoff.
 This rule is similar to the perceptron learning rule
(McClelland & Rumelhart, 1988), but is also characterized by
a mathematical utility and elegance missing in the
perceptron and other early learning rules.
 The Delta Rule uses the difference between target
activation (i.e., target output values) and obtained
activation to drive learning.
 For reasons discussed below, the use of a threshold
activation function (as used in both the McCulloch-Pitts
network and the perceptron) is dropped & instead a linear
sum of products is used to calculate the activation of the
output neuron (alternative activation functions can also be
applied).
 Thus, the activation function is called a Linear Activation
function, in which the output node’s activation is simply
equal to the sum of the network’s respective input/weight
products.
 The strength of network connections (i.e., the values of the
weights) are adjusted to reduce the difference between
target and actual output activation (i.e., error).
 A graphical depiction of a simple two-layer network capable
of deploying the Delta Rule is given in the figure below
(Such a network is not limited to having only one output
node):

 During forward propagation through a network, the
output (activation) of a given node is a function of its inputs.
The inputs to a node, which are simply the products of the
output of preceding nodes with their associated weights,
are summed and then passed through an activation function
before being sent out from the node. Thus, we have the
following:

\[ S_j = \sum_i w_{ij} a_i \]

and

\[ a_j = f(S_j) \]

where ‘Sj’ is the sum of all relevant products of weights and
outputs from the previous layer i, ‘wij’ represents the relevant
weights connecting layer i with layer j, ‘ai’ represents the
activation of nodes in the previous layer i, ‘aj’ is the activation of
the node at hand, and ‘f’ is the activation function.

Error function with just 2 weights w1 and w2


For any given set of input data and weights, there will be an
associated magnitude of error, which is measured by an error
function (also known as a cost function) (e.g., Oh, 1997; Yam and
Chow, 1997). The Delta Rule employs the error function for what
is known as Gradient Descent learning, which involves the
‘modification of weights along the most direct path in weight-
space to minimize error’, so change applied to a given weight is
proportional to the negative of the derivative of the error with
respect to that weight (McClelland and Rumelhart 1988, pp.126–
130). The Error/Cost function is commonly given as the sum of
the squares of the differences between all target and actual node
activation for the output layer. For a particular training pattern
(i.e., training case), error is thus given by:

\[ E_p = \frac{1}{2} \sum_n \bigl( t_{j_n} - a_{j_n} \bigr)^2 \]

 where ‘Ep’ is total error over the training pattern, ½ is a
value applied to simplify the function’s derivative, ’n’
represents all output nodes for a given training pattern, ‘tj’
sub n represents the Target value for node n in output layer
j, and ‘aj’ sub n represents the actual activation for the
same node. This particular error measure is attractive
because its derivative, whose value is needed in the
employment of the Delta Rule, is easily calculated.
 Error over an entire set of training patterns (i.e., over one
iteration, or epoch) is calculated by summing all ‘Ep’:

\[ E = \sum_p E_p = \frac{1}{2} \sum_p \sum_n \bigl( t_{j_n} - a_{j_n} \bigr)^2 \]

Error/Cost Function
where ‘E’ is total error, and ‘p’ represents all training patterns.
An equivalent term for E in the earlier equation is Sum-of-squares
error. A normalized version of this equation is given by the
Mean Squared Error (MSE) equation:

\[ MSE = \frac{1}{2PN} \sum_p \sum_n \bigl( t_{j_n} - a_{j_n} \bigr)^2 \]

 where ‘P’ and ’N’ are the total number of training patterns
and output nodes, respectively. It is the error of both
previous equations that gradient descent attempts to
minimize (this is not strictly true if weights are changed after each
input pattern is submitted to the network).
 Error over a given training pattern is commonly expressed
in terms of the Total Sum of Squares (‘tss’) error, which is
simply equal to the sum of all squared errors over all output
nodes and all training patterns.
 ‘The negative of the derivative of the error function is
required in order to perform Gradient Descent Learning’.
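 The derivative of the equation for ‘Ep’ above (which measures error for a
given pattern ‘p’), with respect to a particular weight
‘wij’ sub ‘x’, is given by the chain rule as:

\[ \frac{\partial E_p}{\partial w_{ij_x}} = \frac{\partial E_p}{\partial a_{j_z}} \cdot \frac{\partial a_{j_z}}{\partial w_{ij_x}} \]

where ‘aj’ sub ‘z’ is activation of the node in the output layer that
corresponds to weight ‘wij’ sub x (subscripts refer to particular
layers of nodes or weights, and the ‘sub-subscripts’ simply refer
to individual weights and nodes within these layers). It follows
that:

\[ \frac{\partial E_p}{\partial a_{j_z}} = -\bigl( t_{j_z} - a_{j_z} \bigr) \]

and, because the activation is a linear sum of input/weight products,

\[ \frac{\partial a_{j_z}}{\partial w_{ij_x}} = a_{i_x} \]

where ‘ai’ sub x is the activation of the node in layer i that feeds weight ‘wij’ sub x.
Thus, the derivative of the error over an individual training
pattern is given by the product of these two derivatives:

\[ \frac{\partial E_p}{\partial w_{ij_x}} = -\bigl( t_{j_z} - a_{j_z} \bigr) \, a_{i_x} \]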

Because Gradient Descent learning requires that any change in a
particular weight be proportional to the negative of the
derivative of the error, the change in a given weight must be
proportional to the negative of the prior equation. Replacing the
difference between the target and actual activation of the
relevant output node by δ, and introducing a learning rate
ε, that equation can be re-written in the final form of the
Delta Rule:

\[ \Delta w_{ij_x} = \varepsilon \, \delta_{j_z} \, a_{i_x}, \qquad \delta_{j_z} = t_{j_z} - a_{j_z} \]

where ‘ai’ sub x is the activation of the input node attached to the weight.

Delta Rule for Perceptrons


The reasoning behind the use of a Linear Activation function
here instead of a Threshold Activation function can now be
justified: the Threshold activation function that characterizes both
the McCulloch and Pitts network and the perceptron is not
differentiable at the transition between the activations of 0 and 1
(slope = infinity), and its derivative is 0 over the remainder of
the function. Hence, the Threshold activation function cannot be
used in Gradient Descent learning, whereas a Linear
Activation function (or any other function that is differentiable)
allows the derivative of the error to be calculated.
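
A minimal sketch of the Delta Rule for a single linear output node, updating the weights after each training pattern. The training patterns (generated from target = 2*a1 - a2), the learning rate and the function name are assumptions made for the example.

```python
def train_delta_rule(patterns, learning_rate=0.05, epochs=500):
    """Delta Rule: change in weight = learning rate * delta * input activation."""
    n_inputs = len(patterns[0][0])
    weights = [0.0] * n_inputs
    for _ in range(epochs):
        for inputs, target in patterns:
            # linear activation: the sum of the input/weight products
            activation = sum(w * a for w, a in zip(weights, inputs))
            delta = target - activation
            weights = [w + learning_rate * delta * a
                       for w, a in zip(weights, inputs)]
    return weights


# Hypothetical training patterns generated from target = 2*a1 - 1*a2
patterns = [((1.0, 0.0), 2.0), ((0.0, 1.0), -1.0),
            ((1.0, 1.0), 1.0), ((2.0, 1.0), 3.0)]
weights = train_delta_rule(patterns)
print(weights)
```

Since the targets here are an exact linear function of the inputs, the weights approach 2 and -1 and the error approaches zero. With noisy targets the rule would settle near the least-mean-square solution instead.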
Three-dimensional depiction of an Actual error surface (Leverington, 2001)
Two-dimensional depiction of the error surface

UNIT-1
TWO MARKS
1) What is deep learning?
Deep learning is a part of machine learning with algorithms inspired by the structure
and function of the brain, which are called artificial neural networks. In the mid-1960s,
Alexey Grigorevich Ivakhnenko published the first general working learning algorithm for
deep networks. Deep learning is suited to a range of fields such as computer vision,
speech recognition, natural language processing, etc.

2) What are the main differences between AI, Machine Learning, and Deep Learning?

 AI stands for Artificial Intelligence. It is a technique which enables machines to
mimic human behavior.

 Machine Learning is a subset of AI which uses statistical methods to enable
machines to improve with experience.

 Deep learning is a part of Machine learning, which makes the computation of
multi-layer neural networks feasible. It takes advantage of neural networks to
simulate human-like decision making.

3) Differentiate supervised and unsupervised deep learning procedures.

 Supervised learning is a system in which both input and desired output data are
provided. Input and output data are labeled to provide a learning basis for future
data processing.

 An unsupervised procedure does not need labeling information explicitly, and the
operations can be carried out without it. The common unsupervised
learning method is cluster analysis. It is used for exploratory data analysis to
find hidden patterns or grouping in data.

4) What are the applications of deep learning?


There are various applications of deep learning:

 Computer vision

 Natural language processing and pattern recognition

 Image recognition and processing

 Machine translation

 Sentiment analysis

 Question Answering system

 Object Classification and Detection

 Automatic Handwriting Generation

 Automatic Text Generation.

5) What are the deep learning frameworks or tools?

Deep learning frameworks or tools are:

TensorFlow, Keras, Chainer, PyTorch, Theano and its ecosystem, Caffe2, CNTK, DyNet, Gensim,
DSSTNE, Gluon, Paddle, MXNet, BigDL

6. What are the advantages and disadvantages of deep learning?


Advantages

 Feature Learning - Automatic feature extraction reduces need for domain expertise.
 High Accuracy and Performance - Often achieves higher accuracy, especially in complex tasks.
 Handles Large and Complex Data - Excels with large datasets and unstructured data like images and text.
 End-to-End Learning - Learns directly from raw data to output.
 Scalability - Performance often improves with more data and resources.
 Versatility - Applicable to a wide range of fields like NLP, computer vision, bioinformatics.
 Potential for Innovation - Continuous advancements and new architectures keep improving capabilities.
 Reduced Need for Feature Engineering - Minimizes the need for manual feature design.

Disadvantages

 Data Requirements - Requires large amounts of labeled data.
 Computational Resources - Needs powerful hardware (GPUs/TPUs) which can be costly.
 Complexity and Interpretability - Seen as "black boxes" with difficult-to-interpret decision processes.
 Overfitting - Easily overfits to training data without sufficient data diversity.
 Hyperparameter Tuning - Involves selecting many hyperparameters, requiring extensive experimentation.
 Dependency on High-Quality Data - Sensitive to the quality of the data (noisy, biased, or unbalanced datasets can degrade performance).
 Training Time - Can take a long time to train deep networks.
 Energy Consumption - High computational intensity results in high energy consumption.

7. What are the prerequisites for starting in Deep Learning?

The basic prerequisites for starting in Deep Learning are:

• Machine Learning
• Mathematics
• Python Programming

8. What are the supervised learning algorithms in Deep learning?

• Artificial neural network
• Convolutional neural network
• Recurrent neural network


9. What are the unsupervised learning algorithms in Deep learning?

• Self-Organizing Maps
• Deep belief networks (Boltzmann Machine)
• Autoencoders

10. How many layers are there in a neural network?

A neural network has three types of layers:

• Input Layer: contains the input neurons, which send information to the hidden layer.
• Hidden Layer: processes the data received from the input layer and sends the result to the output layer. A network may have one or more hidden layers.
• Output Layer: makes the final result available.

11. What is the use of the Activation function?

The activation function introduces non-linearity into the neural network so that it can
learn more complex functions. Without an activation function, the network could only
learn functions that are linear combinations of its input data. Each neuron computes the
weighted sum of its inputs and adds a bias; the activation function then transforms this
value into the neuron's output, which decides whether, and how strongly, the neuron is
activated.
Purpose of Activation Functions
1. Introducing Non-Linearity:
o Without activation functions, neural networks would be limited to learning
only linear relationships. Activation functions allow networks to model
complex, non-linear relationships.
2. Enabling Network Depth:
o Activation functions enable deep networks to learn hierarchical
representations. By stacking multiple layers with activation functions,
networks can learn features at various levels of abstraction.
3. Squashing Output:
o Many activation functions squash the output into a specific range, making
it easier to handle and interpret. For example, sigmoid squashes outputs
to the range (0, 1) and tanh to the range (-1, 1).
4. Providing Differentiability:
o Activation functions are designed to be differentiable (ReLU, for example,
is differentiable everywhere except at zero), allowing the backpropagation
algorithm to compute gradients and update weights effectively during
training.
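
The first point can be checked with a minimal sketch. The weights, biases, and function names below are illustrative assumptions: two stacked one-input linear layers with no activation between them always collapse into a single linear unit, whereas inserting ReLU between them produces a function that no single linear unit can reproduce.

```python
def linear(w, b, x):
    """A one-input linear unit: weighted input plus bias."""
    return w * x + b


def relu(z):
    """ReLU activation: passes positive values, zeroes out negatives."""
    return max(0.0, z)


# Illustrative weights and biases (assumptions, not from the text).
w1, b1 = 2.0, -1.0
w2, b2 = -3.0, 0.5


def two_linear_layers(x):
    """Two stacked layers with no activation in between."""
    return linear(w2, b2, linear(w1, b1, x))


def collapsed_layer(x):
    """A single linear unit that matches two_linear_layers exactly."""
    return linear(w1 * w2, w2 * b1 + b2, x)


def with_relu(x):
    """The same two layers with ReLU applied after the first one."""
    return linear(w2, b2, relu(linear(w1, b1, x)))


for x in [-2.0, 0.0, 1.0, 3.0]:
    print(x, two_linear_layers(x), collapsed_layer(x), with_relu(x))
```

The first two output columns agree for every input, while the third is flat for inputs where the first layer's output is negative and sloped elsewhere, so it is no longer a straight line.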

12. What types of activation function are available?

Commonly used activation functions include:

• Binary Step
• Sigmoid
• Tanh
• ReLU
• Leaky ReLU
• Softmax
• Swish
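
The functions listed above can be written out directly. This is a plain-Python sketch using only the standard `math` module; the step threshold at zero and the Leaky ReLU slope `alpha=0.01` are common conventions assumed here rather than values given in the text.

```python
import math


def binary_step(z):
    """1 if the input reaches the threshold (assumed to be 0), else 0."""
    return 1.0 if z >= 0 else 0.0


def sigmoid(z):
    """Squashes z into (0, 1); written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def tanh(z):
    """Squashes z into (-1, 1)."""
    return math.tanh(z)


def relu(z):
    """Passes positive values unchanged and maps negatives to 0."""
    return max(0.0, z)


def leaky_relu(z, alpha=0.01):
    """Like ReLU, but keeps a small slope (alpha) for negative inputs."""
    return z if z > 0 else alpha * z


def swish(z):
    """The input scaled by its own sigmoid."""
    return z * sigmoid(z)


def softmax(zs):
    """Turns a list of scores into probabilities that sum to 1."""
    largest = max(zs)  # subtracting the max keeps exp() from overflowing
    exps = [math.exp(z - largest) for z in zs]
    total = sum(exps)
    return [e / total for e in exps]


print(sigmoid(0.0), tanh(0.0), relu(-2.0), leaky_relu(-2.0))
print(softmax([1.0, 2.0, 3.0]))
```

Unlike the others, softmax acts on a whole vector of scores rather than on a single value, which is why it is typically used in the output layer of a multi-class classifier.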

13. What is a perceptron?

A perceptron is a simplified model of a biological neuron in the human brain. It receives several inputs, computes their weighted sum plus a bias, and applies an activation function to produce the output.

A perceptron is mainly used for binary classification: given an input, it computes a function of the weighted inputs and outputs one of two classes.

14. What are the steps involved in training a perceptron in Deep Learning?

There are five main steps that determine the learning of a perceptron:

1. Initialize the thresholds and weights
2. Provide the inputs
3. Calculate the outputs
4. Update the weights in each step, based on the error between the calculated and desired outputs
5. Repeat steps 2 to 4
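
The five steps can be sketched as a short training loop. The sketch below assumes a step activation, zero initial weights, a learning rate of 1.0, and the AND gate as training data; none of these choices come from the text, and the function names are hypothetical.

```python
def predict(weights, bias, inputs):
    """Step activation applied to the weighted sum plus bias."""
    total = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1 if total >= 0 else 0


def train_perceptron(samples, learning_rate=1.0, epochs=20):
    # Step 1: initialize weights and threshold (bias); zeros are an assumption.
    weights = [0.0] * len(samples[0][0])
    bias = 0.0
    for _ in range(epochs):                          # Step 5: repeat steps 2 to 4
        for inputs, target in samples:               # Step 2: provide inputs
            output = predict(weights, bias, inputs)  # Step 3: calculate output
            error = target - output
            # Step 4: perceptron learning rule, move weights toward the target
            weights = [w + learning_rate * error * x
                       for w, x in zip(weights, inputs)]
            bias += learning_rate * error
    return weights, bias


# AND gate: linearly separable, so a single perceptron can learn it.
and_gate = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
weights, bias = train_perceptron(and_gate)
predictions = [predict(weights, bias, x) for x, _ in and_gate]
print(weights, bias, predictions)
```

Once every sample is classified correctly the error is zero, so further passes leave the weights unchanged.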

15. Differentiate between a single-layer perceptron and a multi-layer perceptron.

Single-layer Perceptron | Multi-layer Perceptron
Cannot classify non-linearly separable data points | Can classify non-linearly separable data
Takes in a limited number of parameters | Handles a large number of parameters
Less efficient with large datasets | Highly efficient with large datasets
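
The first row can be illustrated with XOR, a standard example of data that is not linearly separable. The sketch below uses hand-picked weights (an assumption for illustration, not learned values and not from the text) to show that one hidden layer of step-activated neurons is enough to compute XOR, which a single perceptron cannot do.

```python
def step(z):
    """Binary step activation with the threshold at 0."""
    return 1 if z >= 0 else 0


def neuron(weights, bias, inputs):
    """One perceptron unit: step(weighted sum + bias)."""
    return step(sum(w * x for w, x in zip(weights, inputs)) + bias)


def xor_mlp(x1, x2):
    """Two-layer network computing XOR with hand-picked weights."""
    # Hidden layer: an OR unit and a NAND unit.
    h_or = neuron([1, 1], -1, [x1, x2])      # fires if at least one input is 1
    h_nand = neuron([-1, -1], 1, [x1, x2])   # fires unless both inputs are 1
    # Output layer: AND of the two hidden units.
    return neuron([1, 1], -2, [h_or, h_nand])


for a in (0, 1):
    for b in (0, 1):
        print(a, b, xor_mlp(a, b))
```

Each hidden unit draws one straight boundary, and the output unit combines the two, which is how the extra layer handles a region that no single line can separate.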
PART-B
1. What are the Mechanics of Machine Learning?
2. What is deep learning? Explain its uses and applications.
3. Explain the structure of a Feed-Forward Neural Network with a diagram.
4. What are the components of a Neural Network? Explain with a diagram.
5. Explain Activation Functions with a diagram.
6. Explain the Fast-Food Problem with proper examples.
7. Explain in detail Gradient Descent, the Delta Rule and Learning Rates.
8. Compare i) Supervised Learning ii) Unsupervised Learning.
9. Distinguish between Machine Learning and Deep Learning.
10. Discuss the Perceptron in detail.
11. Describe the following:
i) Linear Perceptron
ii) Multi-Layer Perceptron
12. Explain Sigmoid, Tanh and ReLU.
13. What is Deep Learning? List the major architectures of Deep networks.
14. Explain the following terms, giving their notations and equations (where necessary),
with respect to deep neural networks (any four):
1) Connection weights and Biases 2) Epoch 3) Layers and Parameters 4) Activation
Functions 5) Loss/Cost Functions 6) Learning rate.
15. What are the applications of Machine Learning, and when is it used?
