A) Explanation of Two Tensor Operations With Examp

Uploaded by

hijaw72603


a) Explanation of Two Tensor Operations with Examples

1. Tensor Addition
Tensor Addition is an element-wise operation performed between two tensors of the same
shape. For two tensors $ A $ and $ B $, the sum $ C = A + B $ is calculated as $ C_{ij} = A_{ij} + B_{ij} $.
Example:
If

and

Then

The tensors must have identical dimensions to be added. [1] [2]
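
The rule above can be sketched in a few lines of plain Python, with 2-D tensors stored as nested lists. The function name `tensor_add` and the sample values are assumptions made for this illustration, not taken from the original example.

```python
def tensor_add(a, b):
    """Element-wise sum of two 2-D tensors given as nested lists."""
    # Addition is only defined when both tensors have the same shape.
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        raise ValueError("tensors must have identical dimensions")
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


# Illustrative values (assumed for this sketch).
A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
C = tensor_add(A, B)
print(C)  # [[6, 8], [10, 12]]
```

Passing tensors of different shapes, for example a 2x2 and a 2x3 tensor, raises `ValueError` instead of silently truncating.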

2. Hadamard Product (Element-wise Multiplication)

The Hadamard Product is the element-wise multiplication of two tensors (or matrices) of the
same shape, producing a tensor where each element is the product of the corresponding
elements in the original tensors. It is NOT matrix multiplication.
Example:
If

and
Then

The operation is element-wise—each element in the resulting tensor is the product of the
corresponding elements from the input tensors. [3] [2] [1]
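
To make the difference from matrix multiplication concrete, here is a minimal sketch that computes both on the same inputs. The helper names `hadamard` and `matmul` and the sample values are assumptions for illustration only.

```python
def hadamard(a, b):
    """Element-wise (Hadamard) product of two same-shape 2-D tensors."""
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        raise ValueError("tensors must have identical dimensions")
    return [[x * y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def matmul(a, b):
    """Ordinary matrix product, shown only for contrast."""
    cols_b = list(zip(*b))  # columns of b
    return [[sum(x * y for x, y in zip(row, col)) for col in cols_b] for row in a]


# Illustrative values (assumed for this sketch).
A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
print(hadamard(A, B))  # [[5, 12], [21, 32]]
print(matmul(A, B))    # [[19, 22], [43, 50]]
```

The two results differ in every entry, which is why the Hadamard product must not be confused with matrix multiplication.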

b) Matrix Calculations
Given:

Addition ($ A + B $):

Subtraction ($ A - B $):

Hadamard Product ($ A \odot B $):

These operations are performed element-wise and require both tensors/matrices to have the
same dimensions. [4] [3] [1]
⁂

2. Explain the following activation functions with their formulas, advantages and limitations. What are the vanishing gradient and dying ReLU problems?
a) Sigmoid
b) Tanh
c) ReLU
d) Leaky ReLU
Each activation function is covered below with its formula, advantages, and limitations,
followed by a discussion of the vanishing gradient and dying ReLU problems.
a) Sigmoid Activation Function
Formula:

$ \sigma(x) = \frac{1}{1 + e^{-x}} $

Advantages:
Smooth, differentiable, and provides a gradient everywhere.
Outputs values between 0 and 1, making it useful for probability and binary classification
tasks.
Historically popular and easy to interpret as “probability”. [11] [12] [13]
Limitations:
Not zero-centered; all outputs are positive, which can slow network convergence.
Prone to the vanishing gradient problem: for large input values (positive or negative),
gradients become very small, causing weights to update slowly or not at all.
Can lead to saturation where further learning nearly stops for large activations. [12] [14] [11]
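
A short sketch of the sigmoid and its derivative, $ \sigma'(x) = \sigma(x)(1 - \sigma(x)) $, makes the saturation visible. The function names and the sample inputs are assumptions chosen for illustration.

```python
import math


def sigmoid(x):
    """Logistic sigmoid: squashes any real input into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def sigmoid_grad(x):
    """Derivative of the sigmoid, sigma(x) * (1 - sigma(x))."""
    s = sigmoid(x)
    return s * (1.0 - s)


# The gradient peaks at 0.25 for x = 0 and collapses toward 0 as |x| grows.
for x in (-10.0, -2.0, 0.0, 2.0, 10.0):
    print(f"x={x:6.1f}  sigmoid={sigmoid(x):.5f}  gradient={sigmoid_grad(x):.5f}")
```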

b) Tanh (Hyperbolic Tangent) Function

Formula:

$ \tanh(x) = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}} $

Advantages:
Output is zero-centered, ranging from -1 to 1.
Enables mapping of inputs to strongly negative, neutral, or positive outputs.
Facilitates faster convergence than sigmoid for many tasks. [15] [11]
Limitations:
Still suffers from the vanishing gradient problem (like sigmoid): gradients vanish for inputs
far from zero.
Can slow down learning in deep networks if not managed properly. [11] [15]
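
The zero-centered output and the vanishing derivative, $ 1 - \tanh^2(x) $, can be checked with the standard library's `math.tanh`. The helper name `tanh_grad` and the sample inputs are assumptions for this sketch.

```python
import math


def tanh_grad(x):
    """Derivative of tanh: 1 - tanh(x)^2."""
    return 1.0 - math.tanh(x) ** 2


# Outputs are symmetric around 0; the gradient is 1 at x = 0 and shrinks for large |x|.
for x in (-5.0, -1.0, 0.0, 1.0, 5.0):
    print(f"x={x:5.1f}  tanh={math.tanh(x):+.5f}  gradient={tanh_grad(x):.5f}")
```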

c) ReLU (Rectified Linear Unit) Function

Formula:

$ f(x) = \max(0, x) $

Advantages:
Simple, non-linear function with fast computation.
Speeds up convergence because the gradient stays at 1 wherever the unit is active (positive inputs).
Helps mitigate the vanishing gradient problem compared to sigmoid/tanh. [15]
Limitations:
Not zero-centered.
Can cause the dying ReLU problem: neurons may get stuck outputting only zero (never
activate) if inputs are negative and never update again, effectively becoming “dead”. [15]
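
A minimal sketch of ReLU and its gradient follows. The derivative is undefined at exactly 0; treating it as 0 there is a common convention and an assumption of this example, as are the function names.

```python
def relu(x):
    """Rectified linear unit: passes positive inputs, zeroes out the rest."""
    return max(0.0, x)


def relu_grad(x):
    """Gradient of ReLU; the value at x == 0 is set to 0 by convention (assumed)."""
    return 1.0 if x > 0 else 0.0


for x in (-3.0, 0.0, 2.5):
    print(f"x={x:5.1f}  relu={relu(x):.1f}  gradient={relu_grad(x):.1f}")
```

The zero gradient for negative inputs is what makes the dying ReLU problem possible.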

d) Leaky ReLU
Formula:

$ f(x) = x $ if $ x > 0 $; else $ f(x) = \alpha x $

where $ \alpha $ is a small constant (e.g., 0.01).

Advantages:
Fixes the dying ReLU problem by allowing a small, non-zero gradient when the unit is not
active.
Like ReLU, computationally efficient and non-linear.
Allows some negative values to pass through. [11] [15]
Limitations:
The negative slope ($ \alpha $) is chosen arbitrarily and not learned by default (unless using
Parametric ReLU).
May still lead to instability if $ \alpha $ is not selected properly. [11] [15]
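
As a sketch, Leaky ReLU differs from ReLU only in the negative branch. The default `alpha=0.01` mirrors the example value given above; the function names are assumptions for illustration.

```python
def leaky_relu(x, alpha=0.01):
    """Leaky ReLU: identity for positive inputs, slope alpha for the rest."""
    return x if x > 0 else alpha * x


def leaky_relu_grad(x, alpha=0.01):
    """Gradient of Leaky ReLU: 1 on the positive side, alpha otherwise."""
    return 1.0 if x > 0 else alpha


for x in (-2.0, 3.0):
    print(f"x={x:5.1f}  leaky_relu={leaky_relu(x):+.3f}  gradient={leaky_relu_grad(x):.2f}")
```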

Vanishing Gradient Problem

This occurs primarily with sigmoid and tanh functions, whose derivatives are small over most of
their input range (the sigmoid derivative never exceeds 0.25). When error gradients are
propagated back through many layers, they shrink (or “vanish”) exponentially because they are
multiplied by these small derivatives at each layer. As a result, early layers in deep networks
learn extremely slowly or not at all, which makes training deep networks ineffective. Modern
solutions include switching to ReLU family functions or specialized architectures. [15] [11]
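
The exponential shrinkage can be illustrated with a deliberately simplified sketch: it multiplies sigmoid derivatives along a single path through the layers and assumes all weights equal 1, which is not how a real network behaves but isolates the effect of the activation. The name `chained_gradient` is hypothetical.

```python
import math


def sigmoid_grad(x):
    """Derivative of the logistic sigmoid."""
    s = 1.0 / (1.0 + math.exp(-x))
    return s * (1.0 - s)


def chained_gradient(pre_activations):
    """Product of sigmoid derivatives along one path (weights assumed to be 1)."""
    grad = 1.0
    for z in pre_activations:
        grad *= sigmoid_grad(z)
    return grad


# Best case for sigmoid: every pre-activation is 0, where the derivative peaks at 0.25.
for depth in (1, 5, 10, 20):
    print(f"depth={depth:2d}  gradient factor={chained_gradient([0.0] * depth):.3e}")
```

Even in this best case the factor is $ 0.25^n $ after $ n $ layers, so it falls below one in a trillion by 20 layers.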

Dying ReLU Problem

The dying ReLU problem happens when some ReLU neurons only output zero for any input
(because their weights shifted during training to produce negative outputs only). Since the
gradient of ReLU is zero for negative values, such neurons never update afterward, causing
information loss and reduced model capacity. Leaky ReLU, Parametric ReLU, and similar variants
solve this by maintaining a small non-zero slope for negative inputs. [15]
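
The following sketch shows a single "dead" neuron under simplifying assumptions: one weight, one bias, a squared-error loss, and hand-picked values that make the pre-activation negative for every input. All names and numbers are illustrative.

```python
def relu(z):
    return max(0.0, z)


def relu_grad(z):
    return 1.0 if z > 0 else 0.0


def leaky_relu(z, alpha=0.01):
    return z if z > 0 else alpha * z


def leaky_relu_grad(z, alpha=0.01):
    return 1.0 if z > 0 else alpha


def weight_gradients(inputs, w, b, target, act, act_grad):
    """dLoss/dw for loss = 0.5 * (a - target)^2, one value per input."""
    grads = []
    for x in inputs:
        z = w * x + b          # pre-activation
        a = act(z)             # neuron output
        grads.append((a - target) * act_grad(z) * x)
    return grads


# Assumed values: w and b make z negative for every input in the data.
inputs = [0.5, 1.0, 2.0]
w, b, target = -1.0, -0.5, 1.0

relu_grads = weight_gradients(inputs, w, b, target, relu, relu_grad)
leaky_grads = weight_gradients(inputs, w, b, target, leaky_relu, leaky_relu_grad)
print("ReLU gradients:      ", relu_grads)
print("Leaky ReLU gradients:", leaky_grads)
```

With ReLU every gradient is zero, so gradient descent can never move `w`. With Leaky ReLU the gradients are small but non-zero, so the weight can still recover.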
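
| Activation | Formula | Main Advantage | Main Limitation |
|---|---|---|---|
| Sigmoid | $ \sigma(x) = \frac{1}{1 + e^{-x}} $ | Probabilistic output, smooth gradient | Vanishing gradient, not zero-centered |
| Tanh | $ \tanh(x) = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}} $ | Zero-centered, strong negative/positive | Still vanishing gradient |
| ReLU | $ \max(0, x) $ | Simplicity, mitigates vanishing gradient | Dying ReLU problem |
| Leaky ReLU | $ x $ if $ x > 0 $; else $ \alpha x $ | Fixes dying ReLU, small negative gradient | Alpha is arbitrary |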

Gradient-Based Optimization in Deep Learning

Gradient-based optimization is a foundational technique in deep learning for minimizing loss
(cost) functions and updating model parameters (weights and biases) to improve performance.
The most common method is gradient descent, which iteratively adjusts parameters in the
direction opposite to the gradient of the loss function with respect to the parameters. [17] [18] [19]

How Gradient Descent Works

1. Initialize parameters (randomly or otherwise).
2. Compute loss (“cost”) for current parameters on training data.
3. Calculate gradients (partial derivatives) of the loss with respect to each parameter.
4. Update parameters:

$ w = w - \eta \nabla L(w) $

where $ w $ are parameters, $ \nabla L(w) $ is the gradient, and $ \eta $ is the learning
rate.
5. Repeat steps 2-4 until convergence (loss stops changing significantly).
Variants like Stochastic Gradient Descent (SGD), Mini-batch Gradient Descent, and
momentum/adaptive algorithms exist, but all follow the above core principles. [18]
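
The five steps can be sketched on a toy problem. The loss $ L(w) = (w - 3)^2 $, the starting point, the learning rate, and the stopping tolerance are all assumptions chosen so the minimum (at $ w = 3 $) is known in advance.

```python
def loss(w):
    """Toy loss with its minimum at w = 3 (assumed for illustration)."""
    return (w - 3.0) ** 2


def grad(w):
    """Analytic gradient of the toy loss."""
    return 2.0 * (w - 3.0)


def gradient_descent(w, eta, max_steps=200, tol=1e-12):
    """Repeat the update w = w - eta * grad(w) until the loss stops changing."""
    for _ in range(max_steps):
        previous = loss(w)            # step 2: compute loss
        w = w - eta * grad(w)         # steps 3-4: gradient and update
        if abs(previous - loss(w)) < tol:
            break                     # step 5: convergence check
    return w


w_final = gradient_descent(w=0.0, eta=0.1)  # step 1: initialize at w = 0
print(f"w = {w_final:.6f}, loss = {loss(w_final):.3e}")
```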

Effect of Learning Rate

1. Small Learning Rate ($ \eta $ is small)

Behavior: Updates are tiny; parameter changes are slow and cautious, possibly taking a
long time (many epochs) to reach the minimum.
Advantage: Precise and less likely to overshoot the minimum.
Limitation: Training can be very slow and may get stuck in small local minima or plateaus.
2. Large Learning Rate ($ \eta $ is large)
Behavior: Updates are large; parameter changes are drastic.
Advantage: Fast initial progress—can rapidly escape shallow minima.
Limitation: May overshoot the minimum, causing oscillation or divergence. The model may
never settle on a good solution.
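
Both behaviors can be reproduced on the toy loss $ L(w) = w^2 $, whose gradient is $ 2w $. The specific rates 0.01 and 1.1, the starting point, and the step count are assumptions picked to make the contrast obvious.

```python
def descend(eta, w=1.0, steps=20):
    """Run gradient descent on L(w) = w^2 and return every visited w."""
    path = [w]
    for _ in range(steps):
        w = w - eta * 2.0 * w   # gradient of w^2 is 2w
        path.append(w)
    return path


small = descend(eta=0.01)   # tiny, cautious steps
large = descend(eta=1.1)    # each step overshoots the minimum at w = 0

print(f"small rate: w after 20 steps = {small[-1]:.4f}")
print(f"large rate: first steps = {[round(v, 3) for v in large[:4]]}")
print(f"large rate: |w| after 20 steps = {abs(large[-1]):.2f}")
```

With the small rate, `w` creeps toward 0 and is still far from it after 20 steps. With the large rate, `w` flips sign on every step and grows in magnitude, which is the oscillation and divergence described above.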

Diagram Descriptions

Small vs. Large Learning Rate Illustration

Small Learning Rate: Shows a smooth, slow path that spirals or steps gently down to the
minimum.
Large Learning Rate: Shows big, skipping steps that might “jump” over the minimum and
possibly oscillate or diverge.
The illustration can be pictured as a bowl-shaped loss landscape with two sets of arrows:
Red for large, erratic steps ("zig-zagging"), possibly overshooting.
Blue for small, steady steps, moving slowly but steadily to the bottom. [19] [17] [18]

Summary Table
| Learning Rate | Effect | Typical Path on Loss Curve |
|---|---|---|
| Small $ \eta $ | Slow, precise convergence | Smooth, gradual descent |
| Large $ \eta $ | Fast, risk of divergence/oscillation | Big jumps, possibly unstable |

In practice, choosing the right learning rate is crucial for effective deep neural network
optimization: too small a rate wastes training time, and too large a rate prevents stable learning. [17] [18] [19]
⁂

Definition of a Perceptron
A Perceptron is a type of artificial neuron and the simplest neural network that can perform
binary classification. It takes several weighted inputs, sums them, and passes the result through
an activation function (typically a step function) to produce a binary output: 0 or 1. [24] [25] [26]
Perceptron Model (Mathematical Formulation)

$ y = f(w_1 x_1 + w_2 x_2 + b) $

where $ w_1, w_2 $ are weights, $ x_1, x_2 $ are inputs, $ b $ is a bias, and $ f $ is the activation function. [26] [24]

Perceptron Learning the OR Gate

OR Gate Truth Table

| $ x_1 $ | $ x_2 $ | Output (OR) |
|---|---|---|
| 0 | 0 | 0 |
| 0 | 1 | 1 |
| 1 | 0 | 1 |
| 1 | 1 | 1 |

Initialization for Learning

Initial Weights: $ W_1 = 0 $, $ W_2 = 1 $
Threshold ($ \theta $): 1
Learning Rate ($ \eta $): 0.6
Bias ($ b $) = $ -\theta $ = -1

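Step 1: Calculate Weighted Sum

$ net = W_1 x_1 + W_2 x_2 + b $

Step 2: Activation (Binary Step Function)

$ y = 1 $ if $ net \ge 0 $, otherwise $ y = 0 $

Step 3: Learning Rule

$ W_i = W_i + \eta (t - y) x_i $, where $ t $ is the target output and $ y $ is the perceptron output

Training Example
Let's step through one epoch:
For $ x_1 = 0, x_2 = 1 $:

$ net = (0)(0) + (1)(1) - 1 = 0 $
Activation: Output = 1 (correct)

For $ x_1 = 0, x_2 = 0 $:

$ net = (0)(0) + (1)(0) - 1 = -1 $
Activation: Output = 0 (correct)

For $ x_1 = 1, x_2 = 0 $:

$ net = (0)(1) + (1)(0) - 1 = -1 $
Output = 0, but Target = 1 (error)

Update $ W_1 $:

$ W_1 = 0 + 0.6 (1 - 0)(1) = 0.6 $
$ W_2 $ remains 1 (since $ x_2=0 $)

For $ x_1 = 1, x_2 = 1 $:

$ net = (0.6)(1) + (1)(1) - 1 = 0.6 $
Output = 1 (correct)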
The weights continue adjusting after each epoch until all outputs match OR gate behavior.
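
The walkthrough can be continued to convergence with a short sketch. It uses the initial values stated above ($ W_1 = 0 $, $ W_2 = 1 $, $ \theta = 1 $, $ \eta = 0.6 $) and the same sample order. Two details are assumptions of this sketch rather than statements from the original: the unit fires when the weighted sum reaches the threshold (net >= 0), and the bias stays fixed at $ -\theta $ while only the weights are updated.

```python
def step(net):
    """Binary step: fire when the net input is at or above zero."""
    return 1 if net >= 0 else 0


def train_or(w1=0.0, w2=1.0, theta=1.0, eta=0.6, max_epochs=10):
    """Perceptron learning rule on the OR gate; returns (w1, w2, epochs used)."""
    b = -theta  # bias kept fixed (assumption), only the weights are learned
    samples = [((0, 1), 1), ((0, 0), 0), ((1, 0), 1), ((1, 1), 1)]
    for epoch in range(1, max_epochs + 1):
        errors = 0
        for (x1, x2), target in samples:
            y = step(w1 * x1 + w2 * x2 + b)
            if y != target:
                errors += 1
                w1 += eta * (target - y) * x1
                w2 += eta * (target - y) * x2
        if errors == 0:  # a full pass with no mistakes: training is done
            return w1, w2, epoch
    return w1, w2, max_epochs


w1, w2, epochs = train_or()
print(f"W1 = {w1:.1f}, W2 = {w2:.1f}, converged in epoch {epochs}")
```

Under these assumptions $ W_1 $ grows from 0 to 0.6 in the first epoch and to 1.2 in the second, and the third epoch passes without errors.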

Binary Activation Function

The binary step activation function outputs 1 if the input is at or above the threshold and 0
otherwise. It is suited to logic gates such as AND and OR.

Applications of Perceptron
Binary Classification: Spam detection (spam/not spam), simple pattern detection.
Logic gate simulation: Implementation of logical functions when data are linearly separable.
Feature selection: As a building block for more complex networks in early stages. [25] [26]
Limitation: Single-layer perceptrons cannot learn non-linearly separable functions (like XOR
gate).

In summary, a Perceptron with appropriate weights can learn the OR gate using binary
activation, adjusting weights via the perceptron learning rule, and is mainly used for simple
binary classification tasks where classes are linearly separable. [24] [25] [26]
⁂

Neural Network Sketch and Explanation of Backpropagation

Neural Network Architecture

Input layer: 2 nodes (say, $ x_1 $ and $ x_2 $)
Hidden layer: 3 nodes (say, $ h_1 $, $ h_2 $, $ h_3 $)
Output layer: 1 node (say, $ y $)
The nodes in the input layer connect to all nodes in the hidden layer with weights $ w_{ij} $, and each
hidden node has its bias $ b_j $. The hidden nodes connect to the output node with weights $ v_j $ and
bias $ b_o $.

Input Layer     Hidden Layer     Output Layer

x1 -----\       h1 ----\
        |      /        \
x2 -----|------ h2 ------- y (output)
        |      \        /
        \------ h3 ----/

Backpropagation Algorithm
Backpropagation is the algorithm used to train neural networks by minimizing the error (loss)
between the predicted output and actual output using gradient descent.

Key Steps in Backpropagation

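1. Forward Pass:
Compute output from input through each layer.
For hidden neurons:

$ net_j = \sum_i w_{ij} x_i + b_j $, $ h_j = f(net_j) $

For output neuron:

$ net_o = \sum_j v_j h_j + b_o $, $ y = f(net_o) $

$ f $ is the activation function (e.g., sigmoid, ReLU).

2. Calculate Error:

$ E = \frac{1}{2} (y - y_{true})^2 $

3. Backward Pass (Gradient Calculation):

Compute gradients of the error with respect to weights and biases using the chain rule.
For output layer weights $ v_j $:

$ \frac{\partial E}{\partial v_j} = (y - y_{true}) f'(net_o) h_j = \delta_o h_j $, where $ \delta_o = (y - y_{true}) f'(net_o) $

For hidden layer weights $ w_{ij} $:

$ \frac{\partial E}{\partial w_{ij}} = \delta_o v_j f'(net_j) x_i $

4. Update Weights and Biases:

Using learning rate $ \eta $:

$ v_j = v_j - \eta \frac{\partial E}{\partial v_j} $, $ w_{ij} = w_{ij} - \eta \frac{\partial E}{\partial w_{ij}} $

Biases updated similarly:

$ b_o = b_o - \eta \delta_o $, $ b_j = b_j - \eta \delta_o v_j f'(net_j) $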

Summary of Weight and Bias Updates in Backpropagation

| Parameter | Update Rule | Description |
|---|---|---|
| Output layer weight | $ v_j = v_j - \eta (y - y_{true}) f'(net_o) h_j $ | Adjust weights from hidden to output layer |
| Hidden layer weight | $ w_{ij} = w_{ij} - \eta [\delta_o v_j f'(net_j)] x_i $ | Adjust weights from input to hidden layer |
| Output bias | $ b_o = b_o - \eta (y - y_{true}) f'(net_o) $ | Update output layer bias |
| Hidden bias | $ b_j = b_j - \eta [\delta_o v_j f'(net_j)] $ | Update hidden layer bias |

Here $ \delta_o = (y - y_{true}) f'(net_o) $ is the error term of the output neuron.

This process repeats iteratively for multiple epochs over the training data, gradually reducing
error by improving weights and biases, enabling the network to learn to map inputs to desired
outputs effectively.
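
The update rules in the table can be sketched for the 2-3-1 network in plain Python. This is a minimal illustration under stated assumptions: sigmoid activation in both layers (so $ f'(net) = f(net)(1 - f(net)) $), squared error $ \frac{1}{2}(y - y_{true})^2 $, a single training sample, and arbitrary initial weights. All function names and numeric values are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(x, W, b_h, v, b_o):
    """Forward pass; W[i][j] is the weight from input i to hidden node j."""
    net_h = [sum(W[i][j] * x[i] for i in range(len(x))) + b_h[j]
             for j in range(len(b_h))]
    h = [sigmoid(n) for n in net_h]
    y = sigmoid(sum(v[j] * h[j] for j in range(len(h))) + b_o)
    return h, y


def error(x, y_true, W, b_h, v, b_o):
    _, y = forward(x, W, b_h, v, b_o)
    return 0.5 * (y - y_true) ** 2


def train_step(x, y_true, W, b_h, v, b_o, eta):
    """One backpropagation update; returns the new parameters."""
    h, y = forward(x, W, b_h, v, b_o)
    delta_o = (y - y_true) * y * (1.0 - y)            # output error term
    delta_h = [delta_o * v[j] * h[j] * (1.0 - h[j])   # hidden error terms
               for j in range(len(h))]
    new_v = [v[j] - eta * delta_o * h[j] for j in range(len(v))]
    new_b_o = b_o - eta * delta_o
    new_W = [[W[i][j] - eta * delta_h[j] * x[i] for j in range(len(h))]
             for i in range(len(x))]
    new_b_h = [b_h[j] - eta * delta_h[j] for j in range(len(h))]
    return new_W, new_b_h, new_v, new_b_o


# Assumed sample and initial parameters (illustrative only).
x, y_true, eta = [1.0, 0.0], 1.0, 0.5
W = [[0.1, -0.2, 0.3], [0.4, 0.1, -0.1]]
b_h, v, b_o = [0.0, 0.0, 0.0], [0.2, -0.1, 0.3], 0.0

error_before = error(x, y_true, W, b_h, v, b_o)
for _ in range(500):
    W, b_h, v, b_o = train_step(x, y_true, W, b_h, v, b_o, eta)
error_after = error(x, y_true, W, b_h, v, b_o)
print(f"error before: {error_before:.5f}, after 500 updates: {error_after:.5f}")
```

Note that the hidden error terms are computed from the old output weights before any parameter is changed, which is what the chain rule requires.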

⁂

1. https://blog.rlamsal.com.np/basic-operations-on-tensors/
2. https://blog.langformers.com/basic-operations-on-tensors/
3. https://www.machinelearningmastery.com/introduction-to-tensors-for-machine-learning/
4. https://forums.developer.nvidia.com/t/implementing-hadamard-operations-with-tensors-in-cuda-c/328910
5. https://www.youtube.com/watch?v=_MaVzNUjMPk
6. https://msbrijuniversity.ac.in/assets/uploads/newsupdate/ALGEBRA OF TENSORS.pdf
7. https://www.sciencedirect.com/topics/engineering/tensor
8. https://en.wikipedia.org/wiki/Hadamard_product_(matrices)
9. https://www.youtube.com/watch?v=fC46YoysPDU
10. https://u-next.com/blogs/machine-learning/what-is-a-tensor/
11. https://www.v7labs.com/blog/neural-networks-activation-functions
12. https://www.shiksha.com/online-courses/articles/all-that-you-need-to-know-about-sigmoid-function/
13. https://www.coursera.org/articles/sigmoid-activation-function
14. https://www.geeksforgeeks.org/machine-learning/derivative-of-the-sigmoid-function/
15. https://www.geeksforgeeks.org/machine-learning/activation-functions-neural-networks/
16. https://www.linkedin.com/pulse/top-10-activation-functions-advantages-disadvantages-dash
17. https://www.aionlinecourse.com/ai-basics/gradient-based-optimization
18. https://neptune.ai/blog/deep-learning-optimization-algorithms
19. https://www.mastersindatascience.org/learning/machine-learning-algorithms/gradient-descent/
20. https://www.geeksforgeeks.org/dsa/optimization-techniques-for-gradient-descent/
21. http://www.cedar.buffalo.edu/~srihari/CSE676/4.2 Gradient-based Optimization.pdf
22. https://arxiv.org/abs/2309.04877
23. https://studyglance.in/dl/display.php?tno=4&topic=Gradient-Based-Learning
24. https://www.simplilearn.com/tutorials/deep-learning-tutorial/perceptron
25. https://www.geeksforgeeks.org/machine-learning/what-is-perceptron-the-simplest-artificial-neural-network/
26. https://en.wikipedia.org/wiki/Perceptron
27. https://www.analytixlabs.co.in/blog/what-is-perceptron/
28. https://www.w3schools.com/ai/ai_perceptrons.asp
29. https://www.mathworks.com/help/deeplearning/ug/perceptron-neural-networks.html
30. https://www.v7labs.com/blog/neural-network-architectures-guide
31. https://www.quantstart.com/articles/introduction-to-artificial-neural-networks-and-the-perceptron/
32. https://www.ijimai.org/journal/sites/default/files/files/2016/02/ijimai20164_1_5_pdf_30533.pdf
33. https://web.stanford.edu/~jurafsky/slp3/7.pdf
34. https://en.wikipedia.org/wiki/Neural_network_(machine_learning)
35. https://www.coursera.org/articles/neural-network-architecture
36. https://www.geeksforgeeks.org/machine-learning/introduction-to-ann-set-4-network-architectures/
37. https://github.com/kennethleungty/Neural-Network-Architecture-Diagrams
