Unit 1 Neural Networks
COURSE OBJECTIVES:
To introduce the fundamental techniques and principles of Neural Networks.
To study the different models in ANN and their applications.
To familiarize deep learning concepts with Computer Vision case studies.
UNIT III RECURRENT NEURAL NETWORKS 12 Recurrent Neural Networks - Challenges with Vanishing
Gradients - Long Short-Term Memory (LSTM) Units - TensorFlow Primitives for RNN Models - Implementing a
Sentiment Analysis Model - Solving seq2seq Tasks with Recurrent Neural Networks - Memory-Augmented
Neural Networks: Neural Turing Machines, Attention-Based Memory Access, Differentiable Neural
Computers (DNC) - Memory Reuse - Temporal Linking - DNC Controller Network - Visualizing -
Implementing the DNC in TensorFlow.
UNIT IV DEEP REINFORCEMENT LEARNING 12 Deep Reinforcement Learning - Masters Atari Games -
Markov Decision Processes - Policy Versus Value Learning - Pole-Cart with Policy Gradients - Q-Learning and
Deep Recurrent Q-Networks.
TEXT BOOKS
2. Li Deng and Dong Yu, “Deep Learning Methods and Applications”, Foundations and Trends in Signal
Processing, 2013. http://link.springer.com/openurl?genre=book&isbn=978-3-319-73004-2
REFERENCES
1. Ian Goodfellow, Yoshua Bengio, Aaron Courville, “Deep Learning” (Adaptive Computation and Machine
Learning series), MIT Press, 2017.
2. Sandro Skansi, “Introduction to Deep Learning: From Logical Calculus to Artificial Intelligence”, Springer,
2018.
3. Michael Nielsen, “Neural Networks and Deep Learning”, Determination Press, 2015.
Weblinks
https://www.oreilly.com/ai/free/files/fundamentals-of-deep-learning-sampler.pdf
NOTES
UNIT-I
UNIT I : NEURAL NETWORK 12
Mechanics of Machine Learning - Neuron - Linear Perceptron - Feed-Forward Neural
Networks - Sigmoid, Tanh, and ReLU Neurons - Training Feed-Forward Neural Networks -
Fast-Food Problem - Gradient Descent - Delta Rule and Learning Rates.
I) Machine learning (ML) is a subset of artificial intelligence (AI) that focuses on the
development of algorithms and statistical models that enable computers to perform tasks
without explicit instructions. Instead, these systems learn from and make decisions based on
data. Here are the fundamental mechanics of machine learning:
2. Feature Engineering
Feature Selection: Identifying the most relevant variables (features) that will contribute
to the model's predictions.
Feature Creation: Creating new features from existing data to provide more informative
inputs to the model.
3. Model Selection
Choosing an appropriate machine learning algorithm based on the problem type (e.g.,
regression, classification, clustering).
Common algorithms include linear regression, decision trees, support vector machines,
and neural networks.
5. Evaluation
Validation Data: Using a separate subset of data to evaluate the model's performance
and prevent overfitting.
Metrics: Common metrics include accuracy, precision, recall, F1-score, mean squared
error (MSE), and area under the ROC curve (AUC-ROC).
6. Hyperparameter Tuning
Hyperparameters: Parameters that are not learned during training but are set before
the learning process begins (e.g., learning rate, number of layers in a neural network).
Tuning: Adjusting these hyperparameters to find the optimal model configuration, often
using techniques like grid search or random search.
7. Deployment
Model Integration: Incorporating the trained model into a production environment where
it can make predictions on new data.
Monitoring: Continuously tracking the model’s performance to ensure it remains
accurate over time, and updating the model as needed.
8. Iterative Improvement
Machine learning is an iterative process. As more data becomes available or as the
problem changes, the model is retrained and refined to improve its performance.
Key Concepts
Overfitting: When a model performs well on training data but poorly on unseen data due
to being too complex and capturing noise in the training data.
Underfitting: When a model is too simple to capture the underlying patterns in the data,
leading to poor performance on both training and test data.
Cross-Validation: A technique to evaluate the model’s performance by splitting the data
into multiple subsets and training/testing the model on different combinations of these
subsets.
Understanding these mechanics provides a foundational knowledge for building and deploying
effective machine learning models.
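The cross-validation idea described under Key Concepts can be sketched in a few lines. This is a minimal illustration, not a prescribed implementation: the function name `k_fold_indices` is hypothetical, no shuffling is done, and only sample indices are split (a real workflow would train and score a model on each pair).

```python
def k_fold_indices(n_samples, k):
    """Split range(n_samples) into k (train, test) index pairs."""
    indices = list(range(n_samples))
    # Spread any remainder over the first folds so fold sizes differ by at most 1.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in fold_sizes:
        test = indices[start:start + size]                      # held-out subset
        train = indices[:start] + indices[start + size:]        # everything else
        folds.append((train, test))
        start += size
    return folds


folds = k_fold_indices(10, 3)
for train, test in folds:
    print("train:", train, "test:", test)
```

Each sample lands in exactly one test fold, so averaging a metric over the folds uses every sample once for evaluation.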
II. Neuron and Linear Perceptron
Neuron:-
A human brain has billions of neurons. Neurons are interconnected nerve cells in the human
brain that are involved in processing and transmitting chemical and electrical signals. Dendrites
are branches that receive information from other neurons.
Cell nucleus or Soma processes the information received from dendrites. Axon is a
cable that is used by neurons to send information. Synapse is the connection between
an axon and other neuron dendrites.
What is an Artificial Neuron?
An artificial neuron is a mathematical function based on a model of biological neurons,
where each neuron takes inputs, weighs them separately, sums them up and passes
this sum through a nonlinear function to produce output.
In the next section, let us compare the biological neuron with the artificial neuron.
Biological Neuron vs. Artificial Neuron
The biological neuron is analogous to artificial neurons in the following terms:
Biological neuron    Artificial neuron
Dendrites            Input
Axon                 Output
LINEAR PERCEPTRON
In the context of deep learning, the term "linear perceptron" typically refers to a basic building
block in neural networks, specifically the simplest form of a neuron called a perceptron. Here's a
breakdown of what a linear perceptron is and its role in deep learning:
Limitations
Linear Separability: Linear perceptrons can only learn linearly separable patterns.
Complex patterns that are not linearly separable require more sophisticated
architectures (e.g., multi-layer perceptrons with non-linear activation functions).
Depth: Deep learning refers to neural networks with many layers. Deep networks can learn
hierarchical representations of data, extracting higher-level features from lower-level ones.
In summary, while the term "linear perceptron" specifically denotes a simple form of a neuron
without non-linear activation, its application in deep learning is foundational, forming the basis
upon which more complex neural network architectures are built to handle intricate tasks such
as image recognition, natural language processing, and more.
Perceptron
The perceptron was introduced by Frank Rosenblatt in 1957. He proposed a perceptron
learning rule based on the original McCulloch-Pitts (MCP) neuron. A perceptron is an
algorithm for supervised learning of binary classifiers. This algorithm enables neurons to
learn, and it processes elements in the training set one at a time.
Basic Components of Perceptron
1. Input Layer: The input layer consists of one or more input neurons, which
receive input signals from the external world or from other layers of the neural
network.
2. Weights: Each input neuron is associated with a weight, which represents the
strength of the connection between the input neuron and the output neuron.
3. Bias: A bias term is added to the weighted sum of the inputs to give the perceptron
additional flexibility in modeling patterns in the input data.
4. Activation Function: The activation function determines the output of the
perceptron based on the weighted sum of the inputs and the bias term.
Common activation functions used in perceptrons include the step function,
sigmoid function, and ReLU function.
5. Output: The output of the perceptron is a single binary value, either 0 or 1,
which indicates the class or category to which the input data belongs.
6. Training Algorithm: The perceptron is typically trained using a supervised
learning algorithm such as the perceptron learning algorithm or
backpropagation. During training, the weights and biases of the perceptron are
adjusted to minimize the error between the predicted output and the true
output for a given set of training examples.
Overall, the perceptron is a simple yet powerful algorithm that can be used to
perform binary classification tasks and has paved the way for more complex
neural networks used in deep learning today.
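The components listed above (inputs, weights, bias, step activation, binary output and a training algorithm) can be put together in a short sketch of the perceptron learning rule. The function names and the AND-gate data set are illustrative assumptions. Because the weights start at zero and the threshold is zero, the learning rate only scales the weights, so `lr=1.0` is used to keep the arithmetic exact.

```python
def step(z):
    """Step activation: fire (1) only when the weighted sum is positive."""
    return 1 if z > 0 else 0


def train_perceptron(samples, lr=1.0, max_epochs=100):
    """Perceptron learning rule: w <- w + lr * (target - output) * x."""
    weights = [0.0] * len(samples[0][0])
    bias = 0.0
    for _ in range(max_epochs):
        errors = 0
        for inputs, target in samples:
            z = sum(w * x for w, x in zip(weights, inputs)) + bias
            error = target - step(z)
            if error != 0:
                weights = [w + lr * error * x for w, x in zip(weights, inputs)]
                bias += lr * error
                errors += 1
        if errors == 0:          # every sample classified correctly: stop
            break
    return weights, bias


# Logical AND is linearly separable, so the rule converges.
and_gate = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
weights, bias = train_perceptron(and_gate)
predictions = [
    step(sum(w * x for w, x in zip(weights, inputs)) + bias)
    for inputs, _ in and_gate
]
print(weights, bias, predictions)
```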
Types of Perceptron:
1. Single layer: Single layer perceptron can learn only linearly separable
patterns.
2. Multilayer: Multilayer perceptrons have two or more layers and therefore
greater processing power.
PERCEPTRON
Perceptron is a commonly used term in the arena of
Machine Learning and Artificial Intelligence. Being the
most basic component of Machine Learning and Deep
Learning technologies, the perceptron is the elementary
unit of an Artificial Neural Network.
What is Perceptron?
A perceptron is the smallest element of a neural network.
A perceptron is a single-layer, linear neural network: a
machine learning algorithm used for supervised learning
of binary classifiers.
It works as an artificial neuron that performs computations
on the input data in order to detect features or patterns
in it.
A perceptron network is a group of simple logical
statements that come together to create an array of
complex logical statements, known as the neural network.
HUMAN NEURAL SYSTEM:
Components of a Perceptron:
When the perceptron's output matches the desired output,
we can say that the perceptron has performed satisfactorily.
If there is any difference between what was expected and what
was obtained, the weights need adjusting so that the error is
smaller on future predictions.
However, the single-layer perceptron is a linear classifier,
so it cannot correctly classify cases that are not linearly
separable. For such cases the learning process will never
reach a point where all cases are properly classified. This
limitation was brought to light by Minsky & Papert in 1969.
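The limitation on linearly non-separable cases can be observed directly. The sketch below, with hypothetical names and a fixed budget of 50 epochs, applies the perceptron learning rule to the XOR truth table and records the number of misclassified samples in each epoch. Since no single line separates the XOR classes, no epoch can finish without at least one error.

```python
def step(z):
    return 1 if z > 0 else 0


def errors_per_epoch(samples, epochs=50, lr=1.0):
    """Run the perceptron rule and return the misclassification count per epoch."""
    weights, bias = [0.0, 0.0], 0.0
    history = []
    for _ in range(epochs):
        errors = 0
        for (x1, x2), target in samples:
            output = step(weights[0] * x1 + weights[1] * x2 + bias)
            delta = target - output
            if delta != 0:
                weights[0] += lr * delta * x1
                weights[1] += lr * delta * x2
                bias += lr * delta
                errors += 1
        history.append(errors)
    return history


xor_gate = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]
xor_history = errors_per_epoch(xor_gate)
print(min(xor_history))  # never reaches 0
```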
Perceptron Function
The perceptron function f(x) is generated by multiplying the
input x by the learned weight coefficient w. The same can
be expressed through the following mathematical equation:
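A commonly used form of the perceptron function is given below. It is stated here as an assumption about the intended equation, with w · x denoting the dot product of the weight and input vectors and b the bias:

```latex
f(x) =
\begin{cases}
1 & \text{if } w \cdot x + b > 0 \\
0 & \text{otherwise}
\end{cases}
```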
Advantages:
Disadvantages:
Future of Perceptron
Machine learning is an artificial intelligence technique that has
been rapidly evolving for many years. Perceptron has been
supporting the growth of artificial intelligence and machine
learning technologies even during its development phase. It
will continue to aid analytical behavior by processing data
through pattern recognition algorithms.
In its simplest form, a feed forward neural network reduces to a single-layer
perceptron.
This model multiplies the inputs by weights as they enter the layer. The
weighted input values are then added together to get the sum. If the sum
rises above a certain threshold, usually set at zero, the output value is typically 1; if
it falls below the threshold, the output is typically -1.
As a feed forward neural network model, the single-layer perceptron is often used for
classification, and it can be trained with machine learning.
During training, the network adjusts its weights using the delta rule, which
compares its outputs with the intended values.
This training procedure is a form of gradient descent. Multi-layered
perceptrons update their weights in a similar way, but the process is known as
back-propagation. In this case, the weights of the network's hidden layers are adjusted
according to the error in the output values produced by the final layer.
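As a rough illustration of the delta rule mentioned above, the following sketch trains a single linear neuron (no bias, identity activation) with the update w <- w + lr * (target - output) * x. The data set is made up for the example: its targets were generated from y = 2*x1 - 1*x2, so the learned weights should approach 2 and -1.

```python
def train_delta_rule(samples, lr=0.1, epochs=200):
    """Delta rule for one linear neuron: move each weight along (target - output) * x."""
    weights = [0.0] * len(samples[0][0])
    for _ in range(epochs):
        for inputs, target in samples:
            output = sum(w * x for w, x in zip(weights, inputs))
            error = target - output            # compare output with intended value
            weights = [w + lr * error * x for w, x in zip(weights, inputs)]
    return weights


# Hypothetical training data generated from y = 2*x1 - 1*x2.
samples = [((1.0, 0.0), 2.0), ((0.0, 1.0), -1.0), ((1.0, 1.0), 1.0), ((2.0, 1.0), 3.0)]
weights = train_delta_rule(samples)
print(weights)
```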
Layers of feed forward neural network
Input layer:
The neurons of this layer receive input and pass it on to the other layers of the network.
The number of neurons in the input layer must match the number of features (attributes)
in the dataset.
Output layer:
This layer represents the predicted feature, which depends on the type of model being built.
Hidden layer:
Hidden layers separate the input and output layers. Depending on the type of
model, there may be several hidden layers.
Hidden layers contain neurons that transform the input before passing it to the
next layer. The weights of the network are updated continually during training in
order to improve its predictions.
Neuron weights:
Neurons are connected by weights, which measure the strength or magnitude of each connection.
Input weights are comparable to the coefficients in linear regression.
Weights can take any real value, although they are often initialized to small values, for
example between 0 and 1.
Neurons:
Feed forward networks are built from artificial neurons, which are adapted from
biological neurons.
Neurons function in two steps: first, they compute the weighted sum of their inputs, and
second, they apply an activation function to that sum to normalize it.
Activation functions can be either linear or nonlinear. Each input to a neuron has an
associated weight, and the network learns these weights during the learning phase.
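Activation Function:
Sigmoid:
Input values get mapped to output values between 0 and 1.
Tanh:
Input values get mapped to output values between -1 and 1.
Rectified linear unit (ReLU):
Only positive values are allowed to flow through this function. Negative values get
mapped to 0.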
Function in feed forward neural network
Cost function
In a feed forward neural network, the cost function plays an important role. Minor
adjustments to weights and biases have little effect on the categorized data points.
Thus, a smooth cost function is used to determine how to adjust the weights
and biases to improve performance.
Following is a definition of the mean square error cost function:
$$C(w, b) = \frac{1}{2n} \sum_x \lVert y(x) - a \rVert^2$$
Where,
w = the weights gathered in the network
b = biases
n = number of inputs for training
a = output vector of the network for input x
y(x) = desired output for input x
x = input
‖v‖ = the usual length (norm) of a vector v
Loss function
The loss function of a neural network is used to determine whether an adjustment needs to
be made in the learning process.
The number of neurons in the output layer equals the number of classes. The cross-entropy
loss measures the difference between the predicted and actual probability distributions.
Following is the cross-entropy loss for binary classification, where y is the true label
(0 or 1) and p is the predicted probability of class 1:
$$L = -\bigl(y \log p + (1 - y) \log(1 - p)\bigr)$$
For multiclass classification with M classes, the cross-entropy loss is:
$$L = -\sum_{c=1}^{M} y_c \log p_c$$
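The binary and multiclass cross-entropy losses can be sketched in plain Python as follows. The function names and the small clipping constant `eps` (used to avoid taking the logarithm of zero) are assumptions made for this illustration.

```python
import math


def binary_cross_entropy(y_true, p_pred, eps=1e-12):
    """Loss for one example: y_true is 0 or 1, p_pred is the predicted P(class 1)."""
    p = min(max(p_pred, eps), 1.0 - eps)       # clip to keep log() finite
    return -(y_true * math.log(p) + (1 - y_true) * math.log(1.0 - p))


def categorical_cross_entropy(one_hot, probs, eps=1e-12):
    """Loss for one example: one_hot is the true class vector, probs the prediction."""
    return -sum(t * math.log(max(p, eps)) for t, p in zip(one_hot, probs))


print(binary_cross_entropy(1, 0.9))   # confident and correct: small loss
print(binary_cross_entropy(1, 0.1))   # confident and wrong: large loss
print(categorical_cross_entropy([0, 1, 0], [0.2, 0.5, 0.3]))
```

The loss grows as the predicted probability of the true class shrinks, which is what drives the weight updates during training.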
The gradient gets adjusted by the parameter η, which also determines the step size.
Performance is significantly affected by the learning rate in machine learning.
Output units
In the output layer, output units are those units that provide the desired output or
prediction, thereby fulfilling the task that the neural network needs to complete.
There is a close relationship between the choice of output units and the cost function.
Any unit that can serve as a hidden unit can also serve as an output unit in a neural
network.
Advantages of feed forward Neural Networks
The simplified architecture of feed forward neural networks can benefit machine
learning.
Multiple networks in a feed forward arrangement operate independently, with a
moderated intermediary.
Complex tasks need several neurons in the network.
Compared with single perceptrons and sigmoid neurons, neural networks can handle
and process nonlinear data more easily.
A neural network can deal with complicated decision boundaries.
Depending on the data, the neural network architecture can vary. For
example, convolutional neural networks (CNNs) perform exceptionally well in
image processing, whereas recurrent neural networks (RNNs) perform well in
text and voice processing.
Neural networks benefit from graphics processing units (GPUs) when handling large
datasets, because of the computational performance required. GPUs are widely
accessible through hosted services such as Kaggle Notebooks and Google
Colab notebooks.
The architecture of a feedforward neural network consists of three types of layers: the input
layer, hidden layers, and the output layer. Each layer is made up of units known as neurons,
and the layers are interconnected by weights.
Input Layer: This layer consists of neurons that receive inputs and pass them on to the
next layer. The number of neurons in the input layer is determined by the dimensions of
the input data.
Hidden Layers:
These layers are not exposed to the input or output and can be considered as the
computational engine of the neural network. Each hidden layer's neurons take the
weighted sum of the outputs from the previous layer, apply an activation function, and
pass the result to the next layer. The network can have zero or more hidden layers.
Output Layer: The final layer that produces the output for the given inputs. The number
of neurons in the output layer depends on the number of possible outputs the network is
designed to produce.
Each neuron in one layer is connected to every neuron in the next layer, making this a fully
connected network. The strength of the connection between neurons is represented by weights,
and learning in a neural network involves updating these weights based on the error of the
output.
How Feedforward Neural Networks Work
The working of a feedforward neural network involves two phases: the feedforward phase and
the backpropagation phase.
Feedforward Phase: In this phase, the input data is fed into the network, and it
propagates forward through the network. At each hidden layer, the weighted sum of the
inputs is calculated and passed through an activation function, which introduces non-
linearity into the model. This process continues until the output layer is reached, and a
prediction is made.
Backpropagation Phase: Once a prediction is made, the error (difference between the
predicted output and the actual output) is calculated. This error is then propagated back
through the network, and the weights are adjusted to minimize this error. The process of
adjusting weights is typically done using a gradient descent optimization algorithm.
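The feedforward phase can be sketched as a loop over layers, each computing a weighted sum followed by an activation. The 2-3-1 network below uses hand-picked, purely illustrative weights, and the helper names `dense` and `forward` are hypothetical.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def dense(inputs, weights, biases, activation):
    """One fully connected layer: activation(weighted sum + bias) for each neuron."""
    return [
        activation(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]


def forward(inputs, layers):
    """Propagate the input through every layer in order and return the prediction."""
    activations = inputs
    for weights, biases, activation in layers:
        activations = dense(activations, weights, biases, activation)
    return activations


# Illustrative network: 2 inputs -> 3 hidden neurons -> 1 output neuron.
hidden = ([[0.5, -0.5], [1.0, 1.0], [-1.0, 0.5]], [0.0, -1.0, 0.5], sigmoid)
output = ([[1.0, -1.0, 0.5]], [0.0], sigmoid)
prediction = forward([1.0, 2.0], [hidden, output])
print(prediction)
```

The backpropagation phase would then compare `prediction` with the target and adjust the weights; it is omitted here to keep the sketch short.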
Activation Functions
Activation functions play a crucial role in feedforward neural networks. They introduce non-linear
properties to the network, which allows the model to learn more complex patterns. Common
activation functions include the sigmoid, tanh, and ReLU (Rectified Linear Unit).
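For reference, the three activation functions named above can be written directly from their definitions. This is a minimal sketch using only the standard library; it is not hardened against overflow for very large negative inputs to `sigmoid`.

```python
import math


def sigmoid(x):
    """Squashes any real input into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def tanh(x):
    """Squashes any real input into the range (-1, 1); zero-centered."""
    return math.tanh(x)


def relu(x):
    """Passes positive values through unchanged and maps negatives to 0."""
    return max(0.0, x)


for x in (-2.0, 0.0, 2.0):
    print(x, sigmoid(x), tanh(x), relu(x))
```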
Training Feedforward Neural Networks
Training a feedforward neural network involves using a dataset to adjust the weights of the
connections between neurons. This is done through an iterative process where the dataset is
passed through the network multiple times, and each time, the weights are updated to reduce
the error in prediction. This process is known as gradient descent, and it continues until the
network performs satisfactorily on the training data.
Applications of Feedforward Neural Networks
Feedforward neural networks are used in a variety of machine learning tasks, including:
1. Pattern recognition
2. Classification tasks
3. Regression analysis
4. Image recognition
5. Time series prediction
Despite their simplicity, feedforward neural networks can model complex relationships in data
and have been the foundation for more complex neural network architectures.
Challenges and Limitations
While feedforward neural networks are powerful, they come with their own set of challenges and
limitations. One of the main challenges is the choice of the number of hidden layers and the
number of neurons in each layer, which can significantly affect the performance of the network.
Overfitting is another common issue where the network learns the training data too well,
including the noise, and performs poorly on new, unseen data.
In conclusion, feedforward neural networks are a foundational concept in the field of neural
networks and deep learning. They provide a straightforward approach to modeling data and
making predictions and have paved the way for more advanced neural network architectures
used in modern artificial intelligence applications.
An ANN (artificial neural network) is the part of deep learning that deals with artificial
neurons. To understand it, we first have to understand how biological neurons
work.
In biology, neurons carry the signals sensed by the sense organs to the
brain, so that the brain can take appropriate decisions based on what was
sensed. The brain performs some operations and calculations on these signals
and produces an appropriate output, which is then acted on by
the organs.
If we implement this brain functionality artificially, the resulting
network is known as an Artificial Neural Network. It is built from single
nodes (each a replica of the neuron), and each node is divided into two parts. The first
part is known as the Summation, and the second is a function
known as the Activation Function.
Summation
The summation collects all the input signals along with their weights. For
example, if the first neuron's signal is x1 and its weight is ω1, the first weighted signal is
x1ω1. Similarly, we calculate the weighted signals for the second neuron, the third
neuron, and so on. Finally, we take the sum of all of them, so the total
weighted summation is calculated as
$$x_1\omega_1 + x_2\omega_2 + x_3\omega_3 + \dots + x_n\omega_n$$
Activation Function
The activation function generates the output of a node
based on the input it is given. That means we apply the activation function
to the summation result:
$$Y = f\Bigl(\sum_i x_i \omega_i + \text{Bias}\Bigr)$$
There are multiple types of activation function, such as the Linear Activation Function, Heaviside
Activation Function, Sigmoid Function, Tanh Function and ReLU Activation Function.
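The two parts of a node, summation and activation, map naturally onto a small function. The sketch below is illustrative only: the input values, weights and bias are made up, and `neuron` and `heaviside` are hypothetical helper names.

```python
def neuron(inputs, weights, bias, activation):
    """Y = f(sum_i x_i * w_i + bias)."""
    summation = sum(x * w for x, w in zip(inputs, weights))   # summation part
    return activation(summation + bias)                       # activation part


def heaviside(z):
    """Heaviside step activation: 1 for non-negative input, else 0."""
    return 1.0 if z >= 0 else 0.0


inputs, weights, bias = [1.0, 2.0, 3.0], [0.5, -1.0, 0.5], 0.25

# Weighted sum is 0.5 - 2.0 + 1.5 = 0.0, so the pre-activation value is 0.25.
y_linear = neuron(inputs, weights, bias, lambda z: z)   # linear activation: 0.25
y_step = neuron(inputs, weights, bias, heaviside)       # step activation: 1.0
print(y_linear, y_step)
```

Swapping the `activation` argument is all that is needed to compare the behaviour of different activation functions on the same summation.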
In deep learning, activation functions are a very important part of any neural
network, because they enable it to perform complicated tasks such as object
detection, image classification and language translation. It is hard to imagine
performing these tasks without deep learning.
This section concentrates on the Sigmoid Activation Function, the Tanh
Activation Function and the ReLU (Rectified Linear Unit) Activation Function, because these
are the activation functions most commonly used in ANNs (Artificial Neural Networks) and deep
learning.
$$Y = \text{ActivationFunction}\Bigl(\sum (\text{weights} \times \text{input}) + \text{bias}\Bigr)$$
In a nutshell, a neural network is a very prominent technique in machine
learning that mimics how a brain perceives and operates.
y=tanh(x)
fig: Hyperbolic Tangent Activation function
Advantage of TanH Activation function
Negative values are also produced: the minimum of the sigmoid's range is 0,
whereas the minimum of the Tanh range is -1. This is why the Tanh activation function is also
known as a zero-centered activation function.
Disadvantage
Tanh faces the same Vanishing Gradient Problem as the sigmoid function.
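The vanishing gradient issue shared by sigmoid and tanh can be made concrete by looking at their derivatives. In the sketch below (helper names are hypothetical), the sigmoid slope peaks at 0.25 and both slopes collapse towards zero for large inputs. Multiplying ten such peak factors together, as backpropagation through ten sigmoid layers would at best, already gives a value below one millionth.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def sigmoid_grad(x):
    """Derivative of the sigmoid: s(x) * (1 - s(x))."""
    s = sigmoid(x)
    return s * (1.0 - s)


def tanh_grad(x):
    """Derivative of tanh: 1 - tanh(x)^2."""
    return 1.0 - math.tanh(x) ** 2


peak = sigmoid_grad(0.0)                 # largest possible sigmoid slope (0.25)
saturated_sigmoid = sigmoid_grad(10.0)   # nearly flat for large inputs
saturated_tanh = tanh_grad(10.0)         # tanh saturates as well
ten_layer_factor = peak ** 10            # best-case product over 10 sigmoid layers
print(peak, saturated_sigmoid, saturated_tanh, ten_layer_factor)
```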
Range: 0 to infinity
The equation can be written as:
$$f(x) = \max(0, x) = \begin{cases} x & \text{if } x \ge 0 \\ 0 & \text{if } x < 0 \end{cases}$$
Advantage of ReLU:
All negative values are converted to 0, so the output contains no negative
values.
The output has no upper bound, so the function does not saturate for positive
inputs. This mitigates the Vanishing Gradient problem and tends to improve
training efficiency and prediction accuracy.
It is fast to compute compared with other activation functions.
Comparison
Introduction
Fast-Food Problem: A term used to describe the challenges of optimizing and updating
model parameters efficiently during training in deep learning. The analogy is drawn to
the fast-food industry, where quick and efficient service is crucial.
Components of the Fast-Food Problem
1. Efficiency in Training:
o Batch Processing: Processing data in mini-batches to speed up training.
o Parallelism: Utilizing GPUs or TPUs for parallel processing to handle multiple
operations simultaneously.
2. Optimization Techniques:
o Gradient Descent Variants: Different methods to update model parameters
efficiently.
o Learning Rate Scheduling: Dynamically adjusting the learning rate during
training for improved convergence.
3. Model Architecture and Complexity:
o Efficient Network Design: Creating architectures that reduce computational
load while maintaining performance.
o Parameter Sharing and Pruning: Techniques to reduce model complexity and
speed up training and inference.
Gradient descent is used to minimize the loss function by iteratively adjusting the model
parameters. The efficiency of gradient descent can be improved through various techniques:
Stochastic Gradient Descent (SGD): Instead of using the entire dataset, SGD updates
model parameters using a single data point or a mini-batch, speeding up the training
process.
Momentum: Momentum helps accelerate gradient vectors in the right direction, thus
leading to faster convergence.
Adaptive Methods: Algorithms like Adam and RMSprop adjust the learning rates based
on the gradients, providing a more efficient and stable training process.
Learning Rates
The learning rate (η) determines the step size at each iteration while moving towards a
minimum of the loss function. The choice of learning rate is crucial:
Gradient Descent
Overview
1. Momentum:
o Accelerates gradient vectors in the right direction, leading to faster convergence.
o Formula: $v_t = \gamma v_{t-1} + \eta \nabla J(\theta)$
o Update Rule: $\theta = \theta - v_t$
2. Adaptive Methods:
o AdaGrad: Adapts the learning rate based on the past gradients.
o RMSprop: Modifies AdaGrad to reduce aggressive, monotonically decreasing
learning rates.
o Adam: Combines the advantages of both AdaGrad and RMSprop.
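The momentum formula and update rule listed above can be tried on a one-dimensional toy problem. The sketch assumes the loss J(θ) = θ², whose gradient is 2θ, and uses illustrative values for the learning rate η (`lr`) and the momentum coefficient γ (`gamma`).

```python
def grad(theta):
    """Gradient of the toy loss J(theta) = theta**2."""
    return 2.0 * theta


def momentum_descent(theta, lr=0.1, gamma=0.9, steps=300):
    """v_t = gamma * v_{t-1} + lr * grad(theta);  theta = theta - v_t."""
    velocity = 0.0
    for _ in range(steps):
        velocity = gamma * velocity + lr * grad(theta)
        theta = theta - velocity
    return theta


theta_final = momentum_descent(5.0)
print(theta_final)  # close to the minimum at theta = 0
```

The velocity term accumulates past gradients, so the iterate can overshoot and oscillate around the minimum before settling; the adaptive methods in the list adjust the step size per parameter instead.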
Learning Rates
Importance of Learning Rates
Learning Rate (η): Determines the step size at each iteration while moving
towards a minimum of the loss function.
Choosing an appropriate learning rate is crucial for efficient training.
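Why the choice matters can be seen on the toy loss J(θ) = θ² (an assumption made for this sketch), where each gradient descent step multiplies θ by (1 - 2η). The three learning rates below are illustrative: one too small, one reasonable, and one too large.

```python
def gradient_descent(theta, lr, steps=50):
    """Plain gradient descent on J(theta) = theta**2, whose gradient is 2 * theta."""
    for _ in range(steps):
        theta = theta - lr * 2.0 * theta
    return theta


too_small = gradient_descent(1.0, lr=0.001)   # barely moves in 50 steps
reasonable = gradient_descent(1.0, lr=0.1)    # converges towards 0
too_large = gradient_descent(1.0, lr=1.1)     # overshoots and diverges
print(too_small, reasonable, too_large)
```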
Practical Implementation
Example Code for Gradient Descent with Learning Rate Scheduling
import torch
import torch.nn as nn
import torch.optim as optim
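Only the import lines of the PyTorch listing are present here. As a framework-free illustration of the same idea, and not a reconstruction of the original code, the sketch below applies a step-decay schedule to gradient descent on the toy loss J(θ) = θ². The schedule parameters (`drop`, `epochs_per_drop`) and function names are assumptions.

```python
def step_decay(initial_lr, epoch, drop=0.5, epochs_per_drop=10):
    """Multiply the learning rate by `drop` every `epochs_per_drop` epochs."""
    return initial_lr * (drop ** (epoch // epochs_per_drop))


def train(theta=5.0, initial_lr=0.4, epochs=30):
    """Gradient descent on J(theta) = theta**2 with a scheduled learning rate."""
    for epoch in range(epochs):
        lr = step_decay(initial_lr, epoch)
        theta -= lr * 2.0 * theta        # gradient of theta**2 is 2 * theta
    return theta


schedule = [step_decay(0.4, e) for e in (0, 9, 10, 20)]
theta_final = train()
print(schedule, theta_final)
```

Large early steps make fast progress, and the smaller later steps reduce the risk of bouncing around the minimum.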
Summary
The Fast-Food Problem in deep learning emphasizes the need for efficient training and
optimization techniques to handle large datasets and complex models.
Key strategies include using mini-batches, parallel processing, gradient descent
variants, learning rate scheduling, and efficient network architectures.
Practical implementation involves combining these strategies to achieve faster and more
stable training processes.
Error/Cost Function
where ‘E’ is the total error, and ‘p’ ranges over all training patterns.
An equivalent term for E in the earlier equation is the Sum-of-squares
error. A normalized version of this equation is given by the
Mean Squared Error (MSE) equation:
where ‘P’ and ‘N’ are the total number of training patterns
and output nodes, respectively. It is the error of both
previous equations that gradient descent attempts to
minimize (this is not strictly true if weights are changed after each
input pattern is submitted to the network).
Error over a given training pattern is commonly expressed
in terms of the Total Sum of Squares (‘tss’) error, which is
simply equal to the sum of all squared errors over all output
nodes and all training patterns.
The negative of the derivative of the error function is
required in order to perform Gradient Descent Learning.
The derivative of our equation (which measures error for a
given pattern ‘p’) above, with respect to a particular weight
‘wij’ sub ‘x’, is given by the chain rule as:
where ‘aj’ sub ‘z’ is the activation of the node in the output layer that
corresponds to weight ‘wij’ sub ‘x’ (subscripts refer to particular
layers of nodes or weights, and the ‘sub-subscripts’ simply refer
to individual weights and nodes within these layers). It follows
that:
and
Thus, the derivative of the error over an individual training
pattern is given by the product of the derivatives of our prior
equation:
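The chain-rule steps described above can be written out for the simplest case. The derivation below is a sketch under stated assumptions, not the original equations: it uses a linear output node, drops the layer sub-subscripts, and writes t_j for the target of output node j, a_j for its activation, x_i for the i-th input and η for the learning rate.

```latex
\begin{aligned}
E_p &= \tfrac{1}{2} \sum_{j} (t_j - a_j)^2, \qquad a_j = \sum_i w_{ij} x_i \\
\frac{\partial E_p}{\partial w_{ij}} &= \frac{\partial E_p}{\partial a_j} \cdot \frac{\partial a_j}{\partial w_{ij}} \\
\frac{\partial E_p}{\partial a_j} &= -(t_j - a_j), \qquad \frac{\partial a_j}{\partial w_{ij}} = x_i \\
\frac{\partial E_p}{\partial w_{ij}} &= -(t_j - a_j)\, x_i \\
\Delta w_{ij} &= -\eta \frac{\partial E_p}{\partial w_{ij}} = \eta\, (t_j - a_j)\, x_i
\end{aligned}
```

The last line is the delta rule: each weight moves in proportion to the output error and the input that fed it.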
UNIT-1
TWO MARKS
1) What is deep learning?
Deep learning is a part of machine learning with algorithms inspired by the structure
and function of the brain, called artificial neural networks. In the mid-1960s,
Alexey Grigorevich Ivakhnenko published the first general, working learning algorithm for
deep networks. Deep learning is suited to a range of fields such as computer vision,
speech recognition, natural language processing, etc.
provided. Input and output data are labeled to provide a learning basis for future
data processing.
Unsupervised procedure does not need labeling information explicitly, and the
operations can be carried out without the same. The common unsupervised
learning method is cluster analysis. It is used for exploratory data analysis to
Computer vision
Machine translation
Sentiment analysis
Potential for Innovation - Continuous advancements and new architectures keep improving capabilities.
Training Time - Can take a long time to train deep networks.
Machine Learning
Mathematics
Python Programming
learning?
Auto Encoders
Input Layer
The input layer contains input neurons which send information to the hidden
layer.
Hidden Layer
Output Layer
The activation function is used to introduce nonlinearity into the neural network so that it
can learn more complex functions. Without the activation function, the neural network
would only be able to learn functions that are linear combinations of its input data.
The activation function translates the inputs into outputs. It is
responsible for deciding whether a neuron should be activated or not, based on
the weighted sum of the inputs plus the bias. The basic
purpose of the activation function is to introduce non-linearity into the output of a
neuron.
Purpose of Activation Functions
1. Introducing Non-Linearity:
o Without activation functions, neural networks would be limited to learning
only linear relationships. Activation functions allow networks to model
complex, non-linear relationships by introducing non-linearity into the
network.
2. Enabling Network Depth:
o Activation functions enable deep networks to learn hierarchical
representations. By stacking multiple layers with activation functions,
networks can learn features at various levels of abstraction.
3. Squashing Output:
o Activation functions often squash the output into a specific range, making
it easier to handle and interpret. For example, the sigmoid and tanh functions
squash outputs to the ranges (0, 1) and (-1, 1), respectively.
4. Providing Differentiability:
o Activation functions are designed to be differentiable, allowing for the
backpropagation algorithm to compute gradients and update weights
effectively during training.
Sigmoid
Tanh
ReLU
Leaky ReLU
Softmax
Swish
13. What is a perceptron?
A perceptron is similar to the actual neuron in the human brain. It receives inputs from
various entities and applies functions to these inputs, which transform them to be the
output.
A perceptron is mainly used to perform binary classification where it sees an input,
computes functions based on the weights of the input, and outputs the required
transformation.
2. Provide inputs
3. Calculate outputs
5. Repeat steps 2 to 4
parameters
Less efficient with large data Highly efficient with large datasets
PART-B
1. What are the Mechanics of Machine Learning?
2. What is deep learning? Explain its uses and application.
3. Explain the structure of Feed Forward Neural Network with
diagram.
4. What are the components of a Neural Network? Explain
with diagram.
5. Explain Activation Functions with a diagram.
6. Explain Fast-Food Problem with proper examples.
7. Explain in detail about the Gradient Descent Delta Rule and
Learning Rates.
8. Compare i) Supervised Learning ii) Unsupervised Learning.
9. Distinguish between Machine Learning and Deep Learning.
10. Discuss Perceptron in detail.
11. Describe the Following:
i) Linear Perceptron
ii) MultiLayer Perceptron
12. Explain Sigmoid, Tanh and ReLU.
13. What is Deep Learning? List the major architectures of Deep networks.
14. Explain the following terms, giving their notations and equations (where necessary),
with respect to deep neural networks (any four):
1) Connection weights and Biases 2) Epoch 3) Layers and Parameters 4) Activation
Functions 5) Loss/Cost Functions 6) Learning rate.
15. What are the applications of Machine Learning, and when is it used?