Activation Functions
3 Types of Activation Functions
1. Binary Step Function
2. Linear Activation Function
3. Non-Linear Activation Function
Sigmoid/Logistic Activation Function
• $\hat{y} = \sigma(w^{T}x + b)$, where $\sigma(z) = \dfrac{1}{1 + e^{-z}}$
• If z is very large, then $e^{-z}$ is close to zero and $\sigma(z) = \dfrac{1}{1 + 0} \approx 1$
• If z is very small (a large negative number), then $e^{-z}$ is large and $\sigma(z) = \dfrac{1}{1 + \text{large number}} \approx 0$
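A minimal NumPy sketch (not from the slides) that checks the limiting behaviour described above:

```python
import numpy as np

def sigmoid(z):
    """Logistic function sigma(z) = 1 / (1 + e^(-z))."""
    return 1.0 / (1.0 + np.exp(-z))

# Large positive z: e^(-z) -> 0, so sigma(z) -> 1
print(sigmoid(10.0))   # ~0.99995
# Large negative z: e^(-z) is huge, so sigma(z) -> 0
print(sigmoid(-10.0))  # ~0.000045
```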
Sigmoid and Hard Sigmoid
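As a rough illustrative sketch (not from the slides), a "hard sigmoid" replaces the smooth curve with a cheaper piecewise-linear approximation; exact constants vary between libraries, and the variant below clips the line 0.2z + 0.5 to [0, 1]:

```python
import numpy as np

def hard_sigmoid(z):
    """Piecewise-linear approximation of the sigmoid: clip(0.2*z + 0.5, 0, 1)."""
    return np.clip(0.2 * z + 0.5, 0.0, 1.0)

print(hard_sigmoid(np.array([-3.0, 0.0, 3.0])))  # [0.  0.5 1. ]
```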
Activation Functions: tanh
The tanh (hyperbolic tangent) activation function is generally preferred over the sigmoid function. It is a shifted version of the sigmoid with a mean of zero, which gives it a better centering effect when used in the hidden layers. For a binary classification problem, we still use the sigmoid function at the output layer.
$$\tanh(z) = \frac{e^{z} - e^{-z}}{e^{z} + e^{-z}}$$
A problem with both sigmoid and tanh is that the slope of the curve, outside the middle region, is very small and goes close to zero. This creates a serious issue for gradient descent, and learning becomes very slow.
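A small sketch (assuming NumPy) illustrating why the slope vanishes away from the middle region for tanh (the sigmoid behaves similarly):

```python
import numpy as np

def tanh_grad(z):
    """Derivative of tanh(z) = (e^z - e^(-z)) / (e^z + e^(-z)), i.e. 1 - tanh(z)^2."""
    return 1.0 - np.tanh(z) ** 2

for z in [0.0, 2.0, 5.0]:
    print(z, tanh_grad(z))
# slope is 1.0 at z = 0, ~0.07 at z = 2, ~0.0002 at z = 5: the gradient nearly vanishes
```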
ReLU and Leaky ReLU
ReLU: max(0, z)        Leaky ReLU: max(0.1z, z)
The Rectified Linear Unit (ReLU) is the default activation function used now. Because its output is exactly zero for negative inputs, some neurons can stop updating ("dead neurons"). To resolve this issue, people use Leaky ReLU: if you look at the left part of its curve, it is not zero but almost zero (a small slope).
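A minimal sketch of both functions as defined above (the 0.1 slope follows the slide; 0.01 is also common):

```python
import numpy as np

def relu(z):
    """ReLU: max(0, z)."""
    return np.maximum(0.0, z)

def leaky_relu(z, slope=0.1):
    """Leaky ReLU: max(slope * z, z); keeps a small gradient for z < 0."""
    return np.maximum(slope * z, z)

z = np.array([-2.0, 0.0, 3.0])
print(relu(z))        # [0. 0. 3.]
print(leaky_relu(z))  # [-0.2  0.   3. ]
```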
Parametric ReLU
• Allows the negative slope to be learned: unlike Leaky ReLU, this function provides the slope of the negative part of the function as a parameter. It is, therefore, possible to perform backpropagation and learn the most appropriate value of α (see the sketch below).
• Otherwise behaves like ReLU.
• May perform differently for different problems.
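A rough NumPy sketch (not from the slides) of how α can be adjusted by gradient descent. The forward pass is z for z > 0 and αz otherwise, so the derivative with respect to α is z on the negative side and 0 elsewhere:

```python
import numpy as np

def prelu(z, alpha):
    """Parametric ReLU: z for z > 0, alpha * z otherwise."""
    return np.where(z > 0, z, alpha * z)

def prelu_grad_alpha(z):
    """d prelu / d alpha: z on the negative side, 0 elsewhere."""
    return np.where(z > 0, 0.0, z)

# One illustrative gradient step on a tiny squared-error objective
z, target = np.array([-2.0, 1.0]), np.array([-0.5, 1.0])
alpha, lr = 0.1, 0.05
err = prelu(z, alpha) - target                      # error signal per element
alpha -= lr * np.sum(err * prelu_grad_alpha(z))     # chain rule through alpha
print(alpha)  # moves from 0.1 toward 0.25, the slope that best fits the negative input
```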
Softmax Function
When we have to classify into multiple categories, the softmax function is useful. For example, if we want to categorize pictures into A) Scenes, B) Animals, C) Humans, D) Vehicles, then we will have four outputs from the softmax function, giving the probability of each of these categories. The probabilities sum to one, and the category with the highest probability is given as the answer.
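A minimal sketch (assuming NumPy) of softmax over four category scores; the outputs sum to one and the largest one is taken as the predicted class:

```python
import numpy as np

def softmax(z):
    """Softmax: e^(z_i) / sum_j e^(z_j); shift by max(z) for numerical stability."""
    t = np.exp(z - np.max(z))
    return t / np.sum(t)

scores = np.array([2.0, 1.0, 0.5, -1.0])   # e.g. Scenes, Animals, Humans, Vehicles
probs = softmax(scores)
print(probs, probs.sum())       # four probabilities summing to 1.0
print(int(np.argmax(probs)))    # index of the predicted category
```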
Understanding softmax
$$z^{[L]} = \begin{bmatrix} 5 \\ 2 \\ -1 \\ 3 \end{bmatrix}, \qquad t = \begin{bmatrix} e^{5} \\ e^{2} \\ e^{-1} \\ e^{3} \end{bmatrix}$$

$$a^{[L]} = g^{[L]}(z^{[L]}) = \begin{bmatrix} e^{5}/(e^{5}+e^{2}+e^{-1}+e^{3}) \\ e^{2}/(e^{5}+e^{2}+e^{-1}+e^{3}) \\ e^{-1}/(e^{5}+e^{2}+e^{-1}+e^{3}) \\ e^{3}/(e^{5}+e^{2}+e^{-1}+e^{3}) \end{bmatrix} = \begin{bmatrix} 0.842 \\ 0.042 \\ 0.002 \\ 0.114 \end{bmatrix}$$

A "hard max" would instead output $\begin{bmatrix} 1 \\ 0 \\ 0 \\ 0 \end{bmatrix}$, putting all the mass on the largest entry.
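A short check of the worked example (a sketch, not part of the slides):

```python
import numpy as np

z = np.array([5.0, 2.0, -1.0, 3.0])
t = np.exp(z)
a = t / t.sum()
print(np.round(a, 3))            # [0.842 0.042 0.002 0.114], matching the numbers above
print(np.eye(4)[np.argmax(z)])   # "hard max": [1. 0. 0. 0.]
```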
Softmax regression generalizes logistic regression to C classes.
If C = 2, softmax reduces to logistic regression.
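To see why, take two logits $z_1, z_2$ and divide the numerator and denominator of the first softmax output by $e^{z_1}$:

$$\frac{e^{z_1}}{e^{z_1} + e^{z_2}} = \frac{1}{1 + e^{-(z_1 - z_2)}} = \sigma(z_1 - z_2)$$

so with two classes the softmax output is exactly a sigmoid applied to the difference of the logits.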
Popular Ones
Resources
https://towardsdatascience.com/activation-functions-neural-networks-1cbd9f8d91d6
https://medium.com/the-theory-of-everything/understanding-activation-functions-in-neural-networks-9491262884e0
https://medium.com/@snaily16/what-why-and-which-activation-functions-b2bf748c0441
https://medium.com/@vinodhb95/activation-functions-and-its-types-8750f1287464
Thank You
For more information, please use the following contacts and links:
gauravsingal789@gmail.com
gaurav.singal@nsut.ac.in
https://www.linkedin.com/in/gauravsingal789/
http://www.gauravsingal.in