Unit I: Introduction to ANN
Architecture of Artificial Neural Network (ANN)
An Artificial Neural Network (ANN) is inspired by the structure of the human
brain and consists of interconnected layers of artificial neurons. The architecture
of an ANN can be divided into three main layers:
1. Input Layer
o This is the first layer of the ANN.
o It receives raw data or features as input.
o Each neuron in this layer represents an individual feature of the
input data.
2. Hidden Layer(s)
o These layers perform the computation and extract patterns from the
input data.
o Each hidden layer consists of multiple neurons, which apply
activation functions to introduce non-linearity.
o The more hidden layers an ANN has, the deeper the network; networks with many hidden layers are called Deep Neural Networks (DNNs).
3. Output Layer
o This layer provides the final result of the network.
o The number of neurons in this layer depends on the type of
problem:
▪ Regression: A single neuron outputs a continuous value.
▪ Binary Classification: A single neuron with a sigmoid
activation function outputs a probability.
▪ Multi-class Classification: Multiple neurons with a softmax
activation function provide class probabilities.
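The following is a minimal NumPy sketch of this three-layer structure. The layer sizes, weights, and input values are illustrative assumptions, not values fixed by the text:

```python
import numpy as np

def relu(z):
    return np.maximum(0.0, z)           # hidden-layer non-linearity

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))     # output probability for binary classification

# Assumed sizes: 3 input features, 4 hidden neurons, 1 output neuron
rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(3, 4)), np.zeros(4)   # input layer -> hidden layer
W2, b2 = rng.normal(size=(4, 1)), np.zeros(1)   # hidden layer -> output layer

x = np.array([0.5, -1.2, 3.0])   # one input sample (one input neuron per feature)
h = relu(x @ W1 + b1)            # hidden layer extracts patterns
y = sigmoid(h @ W2 + b2)         # output layer: probability in (0, 1)
print(y)
```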
Working of ANN
1. Forward Propagation:
o Input data passes through the network, layer by layer, until the
output is generated.
o Each neuron computes a weighted sum of its inputs and applies an activation function to it.
2. Loss Calculation:
o The difference between the predicted and actual values is
computed using a loss function.
3. Backpropagation & Weight Update:
o The error is propagated backward through the network to compute how much each weight contributed to it.
o An optimization algorithm (e.g., Gradient Descent) then updates the weights to reduce the error and improve accuracy.
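As a concrete illustration of these three steps, here is a sketch of gradient-descent training for a single sigmoid neuron on a toy OR-gate dataset; the data, learning rate, and epoch count are assumptions for demonstration:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Toy dataset: 4 samples, 2 features; target is the logical OR of the inputs
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
t = np.array([0.0, 1.0, 1.0, 1.0])

w, b, lr = np.zeros(2), 0.0, 1.0

for epoch in range(5000):
    y = sigmoid(X @ w + b)                      # 1. forward propagation
    loss = np.mean((y - t) ** 2)                # 2. loss calculation (MSE)
    grad = 2 * (y - t) * y * (1 - y) / len(X)   # 3. backpropagation through the sigmoid
    w -= lr * (X.T @ grad)                      #    gradient-descent weight update
    b -= lr * grad.sum()

print(np.round(sigmoid(X @ w + b), 2))  # predictions close to the targets [0, 1, 1, 1]
```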
Common Activation Functions in ANN
• Sigmoid: Used for binary classification.
• ReLU (Rectified Linear Unit): Used in hidden layers to introduce non-
linearity.
• Tanh: Similar to sigmoid but ranges from -1 to 1.
• Softmax: Used in the output layer for multi-class classification.
Neural Network Activation Function
What is an Activation Function?
• An activation function is a mathematical function applied to the output
of a neuron.
• It introduces non-linearity, allowing the neural network to learn complex
patterns.
• Without activation functions, the network would behave like a linear
regression model, even with multiple layers.
• The neuron computes the weighted sum of its inputs plus a bias term; the activation function then decides whether, and how strongly, the neuron is activated.
• This helps the model make complex decisions and improve prediction
accuracy.
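In symbols, a neuron with inputs x1, ..., xn, weights w1, ..., wn, and bias b produces its output a by applying the activation function f:

```latex
a = f\left( \sum_{i=1}^{n} w_i x_i + b \right)
```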
Types of Activation Functions
1. Linear Activation Function
o The output is directly proportional to the input.
o Limitation: Stacked linear layers collapse into a single linear layer, so it cannot model complex patterns and is rarely used in deep learning.
2. Sigmoid Function
o Output values range between 0 and 1, making it useful for binary
classification (e.g., spam detection).
o Limitation: Can cause the vanishing gradient problem, making training slow.
3. Tanh (Hyperbolic Tangent) Function
o Output ranges between -1 and 1; being zero-centered, it often works better than sigmoid when inputs or activations can be negative.
o Limitation: Also suffers from the vanishing gradient problem.
4. ReLU (Rectified Linear Unit) Function
o Most commonly used activation function in deep learning.
o Allows faster training and mitigates the vanishing gradient problem for positive inputs.
o Limitation: Some neurons can become inactive (dead neurons),
always outputting 0.
5. Softmax Function
o Used for multi-class classification problems.
o Converts raw outputs into probabilities that sum to 1, helping in
predicting multiple categories (e.g., image classification).
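All five functions are simple to implement. Here is a minimal NumPy sketch (the sample input is an arbitrary assumption):

```python
import numpy as np

def linear(z):
    return z                            # output directly proportional to input

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))     # squashes values into (0, 1)

def tanh(z):
    return np.tanh(z)                   # squashes values into (-1, 1), zero-centered

def relu(z):
    return np.maximum(0.0, z)           # 0 for negative inputs, identity otherwise

def softmax(z):
    e = np.exp(z - np.max(z))           # subtract max for numerical stability
    return e / e.sum()                  # probabilities that sum to 1

z = np.array([-2.0, 0.0, 3.0])
print(sigmoid(z))   # approx. [0.119 0.5 0.953]
print(relu(z))      # [0. 0. 3.]
print(softmax(z))   # three probabilities summing to 1
```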
Comparison of Biological Neural Networks (BNN) and Artificial Neural Networks (ANN)

| Biological Neural Network (BNN) | Artificial Neural Network (ANN) |
|---|---|
| Natural neural network in the human brain. | Computer-based model inspired by the BNN. |
| Made of biological cells. | Made of artificial nodes (mathematical functions). |
| Slower but highly parallel. | Faster but limited by hardware. |
| Learns from experience and adapts over time. | Trained using algorithms and large datasets. |
| Very energy-efficient; consumes little power. | Requires high computational power. |
| Extremely complex, with billions of neurons. | Simpler, with far fewer neurons and layers. |
| Can recover from damage (neuroplasticity). | Prone to errors; requires retraining. |
| Can handle multiple tasks at once. | Designed for specific tasks like classification or prediction. |
| Reliability: robust and fault-tolerant. | Reliability: vulnerable to faults. |
Biological Neural Network (BNN)
Structure of BNN:
1. Neurons (Nerve Cells):
o The basic unit of the nervous system.
o Responsible for processing and transmitting information.
2. Dendrites:
o Branch-like structures that receive signals from other neurons.
o Carry the signal to the cell body.
3. Cell Body (Soma):
o Processes the received signals.
o If the signal strength is sufficient, it triggers an action potential.
4. Axon:
o A long fiber that carries the signal away from the cell body.
o Covered with a myelin sheath for faster transmission.
5. Synapse:
o A gap between two neurons where signals are transmitted.
o Uses chemical neurotransmitters to pass the signal to the next
neuron.
Working of BNN:
1. Signal Reception: Dendrites receive signals from other neurons.
2. Processing: The cell body integrates all incoming signals.
3. Signal Transmission: If the signal is strong, an electrical impulse (action
potential) is generated.
4. Propagation: The impulse travels down the axon.
5. Communication: At the synapse, neurotransmitters are released to
transfer the signal to the next neuron.
McCulloch & Pitts Model
The McCulloch-Pitts (M-P) model is the first mathematical model of an
artificial neuron, proposed in 1943. It simplifies how a biological neuron
processes information using binary values (0 and 1).
Structure of M-P Model:
1. Inputs (x1, x2, ..., xn): Represent signals received from other neurons.
2. Weights (w1, w2, ..., wn): Each input has a weight that defines its importance.
3. Summation Function: Adds all weighted inputs.
4. Threshold (θ): A fixed value that decides whether the neuron activates.
5. Activation Function: If the total sum is greater than or equal to θ,
output = 1; otherwise, output = 0.
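Combining the summation and threshold steps, the output y of the M-P neuron is:

```latex
y =
\begin{cases}
1 & \text{if } \sum_{i=1}^{n} w_i x_i \ge \theta \\
0 & \text{otherwise}
\end{cases}
```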
Example (AND gate):
• Weights: w1 = 1, w2 = 1; threshold: θ = 2.
• The neuron activates only when both inputs are 1, mimicking the AND logic gate.
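A short Python sketch of this AND-gate example (the function name is ours; the weights and threshold come from the example above):

```python
def mp_neuron(inputs, weights, theta):
    """McCulloch-Pitts neuron: outputs 1 if the weighted input sum reaches theta."""
    total = sum(w * x for w, x in zip(weights, inputs))
    return 1 if total >= theta else 0

# AND gate: w1 = 1, w2 = 1, theta = 2
for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, "->", mp_neuron((x1, x2), (1, 1), theta=2))
# Outputs 1 only for the input pair (1, 1)
```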
Comparison of Neuron Models: M-P, Perceptron, and Adaline
1. McCulloch-Pitts (M-P) Model
• Proposed by Warren McCulloch and Walter Pitts in 1943.
• It is the first mathematical model of a neuron, inspired by the working
of the human brain.
• Works on binary inputs (0 and 1) and fixed weights (1 or -1).
• Uses a step activation function, where output is either 0 or 1 based on a
threshold.
• Cannot learn from data since weights are fixed.
• Used for implementing simple logic gates like AND, OR, and NOT but
cannot solve complex problems.
2. Perceptron Model
• Introduced by Frank Rosenblatt in 1958 as an improvement over the M-
P model.
• Uses real-valued inputs and weights, making it more flexible.
• Applies a step function as an activation function for binary classification.
• Can learn from data using a weight update rule, allowing it to adjust
based on errors.
• Works only for linearly separable problems; it cannot classify non-linearly separable ones like XOR.
• Considered the foundation of modern neural networks.
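A minimal sketch of Rosenblatt's weight update rule on a linearly separable problem (the AND gate); the learning rate and epoch count are assumed values:

```python
import numpy as np

def train_perceptron(X, t, lr=0.1, epochs=20):
    """Perceptron rule: adjust weights only when the prediction is wrong."""
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(epochs):
        for x, target in zip(X, t):
            y = 1 if x @ w + b >= 0 else 0   # step activation
            w += lr * (target - y) * x       # weight update rule
            b += lr * (target - y)
    return w, b

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
t = np.array([0, 0, 0, 1])                   # AND gate (linearly separable)
w, b = train_perceptron(X, t)
print([1 if x @ w + b >= 0 else 0 for x in X])  # [0, 0, 0, 1]
```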
3. Adaline (Adaptive Linear Neuron) Model
• Developed by Bernard Widrow and Marcian Hoff in 1960 as an
enhancement to the perceptron.
• Uses real-valued inputs and weights, making it more powerful.
• Unlike the perceptron, it trains on the linear (pre-threshold) output, applying a threshold only when producing the final class.
• Uses gradient descent to optimize weights, leading to better learning
efficiency.
• Can only solve linearly separable problems, but training is smoother because weights are adjusted using the continuous pre-activation error.
• Forms the basis for advanced neural networks like Multilayer
Perceptron (MLP) and deep learning models.
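The Widrow-Hoff "delta rule" can be sketched as follows; note that the error is computed on the linear output, before any thresholding (the learning rate, epoch count, and 0.5 decision threshold are assumptions for this demonstration):

```python
import numpy as np

def train_adaline(X, t, lr=0.1, epochs=500):
    """Adaline: gradient descent on the squared error of the linear output."""
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(epochs):
        y_lin = X @ w + b                # linear activation (no step yet)
        error = t - y_lin                # continuous error signal
        w += lr * X.T @ error / len(X)   # gradient-descent weight update
        b += lr * error.mean()
    return w, b

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
t = np.array([0.0, 0.0, 0.0, 1.0])       # AND gate targets
w, b = train_adaline(X, t)
print([1 if x @ w + b >= 0.5 else 0 for x in X])  # thresholded predictions: [0, 0, 0, 1]
```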