Widely Used Activation Functions Inside Neurons
Moin Mostakim
Department of Computer Science and Engineering
Faculty of School of Data Science
October 2023
Contents
1 Sigmoid Activation Function
2 Hyperbolic Tangent (Tanh) Activation Function
3 Rectified Linear Unit (ReLU) Activation Function
4 Leaky Rectified Linear Unit (Leaky ReLU) Activation Function
5 Exponential Linear Unit (ELU) Activation Function
6 Swish Activation Function
7 Gated Linear Unit (GLU) Activation Function
8 Softmax Activation Function
Sigmoid Activation Function
Formula: σ(x) = 1 / (1 + e^(−x))
Range: (0, 1)
First-order Derivative: σ′(x) = σ(x) · (1 − σ(x))
[Plot: σ(x) versus x over [−5, 5]]
Output:
• Shape: S-shaped curve.
• Use Cases: Binary classification, sigmoid neurons in the output layer.
• Benefits: Smooth gradient, suitable for converting network outputs to probabilities.
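A minimal NumPy sketch of the sigmoid and its derivative, for illustration only; the use of NumPy and the function names here are assumptions, not part of the slides.

import numpy as np

def sigmoid(x):
    # sigma(x) = 1 / (1 + e^(-x)), computed elementwise
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_grad(x):
    # sigma'(x) = sigma(x) * (1 - sigma(x))
    s = sigmoid(x)
    return s * (1.0 - s)

# Example: outputs lie in (0, 1); the gradient peaks at 0.25 when x = 0
print(sigmoid(np.array([-5.0, 0.0, 5.0])))   # ~[0.0067, 0.5, 0.9933]
print(sigmoid_grad(np.array([0.0])))         # [0.25]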
Hyperbolic Tangent (Tanh) Activation Function
Formula: tanh(x) = (e^x − e^(−x)) / (e^x + e^(−x))
Range: (−1, 1)
First-order Derivative: tanh′(x) = 1 − tanh²(x)
[Plot: tanh(x) versus x over [−2, 2]]
Output:
• Shape: S-shaped curve similar to the sigmoid.
• Use Cases: Regression, classification.
• Benefits: Zero-centered outputs (unlike the sigmoid) and smooth gradients, which often help optimization, although tanh still saturates for large |x|.
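A minimal NumPy sketch of tanh and its derivative, for illustration; NumPy and the helper names are assumptions, not part of the slides.

import numpy as np

def tanh(x):
    # tanh(x) = (e^x - e^(-x)) / (e^x + e^(-x)); np.tanh is the stable built-in
    return np.tanh(x)

def tanh_grad(x):
    # tanh'(x) = 1 - tanh(x)^2
    return 1.0 - np.tanh(x) ** 2

print(tanh(np.array([-2.0, 0.0, 2.0])))   # ~[-0.964, 0.0, 0.964]
print(tanh_grad(np.array([0.0])))         # [1.0] -- steepest at the origin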
Rectified Linear Unit (ReLU) Activation Function
Formula: ReLU(x) = max(0, x)
Range: [0, ∞)
First-order Derivative: ReLU′(x) = 0 if x < 0, 1 if x ≥ 0
Output:
• Shape: Linear for positive values, zero for negatives.
• Use Cases: Hidden layers in most neural networks.
• Benefits: Efficient, mitigates vanishing gradient, induces sparsity.
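A minimal NumPy sketch of ReLU and its derivative, for illustration; NumPy and the function names are assumptions, not part of the slides.

import numpy as np

def relu(x):
    # ReLU(x) = max(0, x), elementwise
    return np.maximum(0.0, x)

def relu_grad(x):
    # derivative: 0 for x < 0, 1 for x >= 0 (the convention used on the slide)
    return np.where(x >= 0, 1.0, 0.0)

x = np.array([-2.0, -0.5, 0.0, 3.0])
print(relu(x))       # [0.  0.  0.  3.]
print(relu_grad(x))  # [0.  0.  1.  1.]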
Leaky Rectified Linear Unit (Leaky ReLU) Activation
Function
Formula: LeakyReLU(x, α) = x if x ≥ 0, αx if x < 0
Range: (−∞, ∞)
First-order Derivative: LeakyReLU′(x, α) = 1 if x ≥ 0, α if x < 0
Output:
• Shape: Linear for positive values, non-zero slope for negatives.
• Use Cases: Alternative to ReLU to prevent the "dying ReLU" problem.
• Benefits: Addresses the "dying ReLU" issue, retains sparsity.
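A minimal NumPy sketch of Leaky ReLU and its derivative, for illustration; the default α = 0.01 is a commonly used value assumed here, not taken from the slides.

import numpy as np

def leaky_relu(x, alpha=0.01):
    # x for x >= 0, alpha * x otherwise; alpha = 0.01 is an assumed default
    return np.where(x >= 0, x, alpha * x)

def leaky_relu_grad(x, alpha=0.01):
    # derivative: 1 for x >= 0, alpha otherwise
    return np.where(x >= 0, 1.0, alpha)

x = np.array([-3.0, 0.0, 2.0])
print(leaky_relu(x))       # [-0.03  0.    2.  ]
print(leaky_relu_grad(x))  # [0.01  1.    1.  ]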
Exponential Linear Unit (ELU) Activation Function
Formula: ELU(x, α) = x if x ≥ 0, α(e^x − 1) if x < 0
Range: (−α, ∞)
First-order Derivative: ELU′(x, α) = 1 if x ≥ 0, α·e^x if x < 0
Output:
• Shape: Linear for positive values, with a smooth exponential saturation toward −α for negative values.
• Use Cases: An alternative to ReLU with smoother gradients.
• Benefits: Smoother gradients; non-zero outputs for negative inputs can push mean activations toward zero and aid training.
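A minimal NumPy sketch of ELU and its derivative, for illustration; the default α = 1.0 is a commonly used value assumed here, not taken from the slides.

import numpy as np

def elu(x, alpha=1.0):
    # x for x >= 0, alpha * (e^x - 1) otherwise; alpha = 1.0 is an assumed default
    return np.where(x >= 0, x, alpha * (np.exp(x) - 1.0))

def elu_grad(x, alpha=1.0):
    # derivative: 1 for x >= 0, alpha * e^x otherwise
    return np.where(x >= 0, 1.0, alpha * np.exp(x))

x = np.array([-2.0, 0.0, 2.0])
print(elu(x))       # ~[-0.865  0.     2.   ]
print(elu_grad(x))  # ~[ 0.135  1.     1.   ]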
Swish Activation Function
Formula: Swish(x) = x · σ(x)
Range: approximately [−0.278, ∞) (bounded below by the minimum of x · σ(x))
First-order Derivative: Swish′(x) = Swish(x) + σ(x) · (1 − Swish(x))
Output:
• Shape: Smooth, non-monotonic curve.
• Use Cases: Considered in some architectures as an alternative to
ReLU.
• Benefits: Smoothness, performance improvements observed in
deep networks.
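A minimal NumPy sketch of Swish and its derivative, for illustration; NumPy and the helper names are assumptions, not part of the slides.

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def swish(x):
    # Swish(x) = x * sigma(x)
    return x * sigmoid(x)

def swish_grad(x):
    # Swish'(x) = Swish(x) + sigma(x) * (1 - Swish(x))
    s = sigmoid(x)
    return x * s + s * (1.0 - x * s)

x = np.array([-5.0, -1.278, 0.0, 5.0])
print(swish(x))       # ~[-0.034, -0.278 (near its minimum), 0., 4.967]
print(swish_grad(np.array([0.0])))  # [0.5]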
Gated Linear Unit (GLU) Activation Function
Formula: GLU(x) = x · σ(g(x))
Range: (−∞, ∞)
First-order Derivative:
GLU′(x) = σ(g(x)) + x · g′(x) · σ(g(x)) · (1 − σ(g(x)))
Output:
• Shape: Complex, involving a sigmoid gate.
• Use Cases: Used in architectures like the Transformer and other
sequence-to-sequence models.
• Benefits: The sigmoid gate controls how much of each input passes through, enabling sequence models that can outperform standard RNNs.
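A minimal NumPy sketch of the GLU formula above, for illustration. The slides leave the gate g unspecified, so the caller supplies it here; the linear gate with weights w and b below is purely hypothetical.

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def glu(x, g):
    # GLU(x) = x * sigma(g(x)); g is the (unspecified) gate function
    return x * sigmoid(g(x))

# Hypothetical gate: a fixed linear map g(x) = w * x + b (weights made up for the example)
w, b = 0.5, 0.1
gate = lambda x: w * x + b

x = np.array([-2.0, 0.0, 2.0])
print(glu(x, gate))  # elementwise x * sigma(0.5 * x + 0.1)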
Softmax Activation Function
Formula (for class i): Softmax(x)_i = e^(x_i) / Σ_j e^(x_j)
Range: (0, 1)
First-order Derivative: ∂Softmax(x)_j / ∂x_i = Softmax(x)_i · (δ_ij − Softmax(x)_j)
Output:
• Shape: Probability distribution over classes.
• Use Cases: Used in the output layer of multi-class classification
for probability distribution over classes.
• Benefits: Converts scores to class probabilities, essential for
classification tasks.
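A minimal NumPy sketch of the softmax, for illustration; NumPy and the max-subtraction trick for numerical stability are assumptions, not part of the slides.

import numpy as np

def softmax(x):
    # Subtract the max for numerical stability; the result is unchanged
    # because softmax is invariant to adding a constant to every entry.
    z = x - np.max(x)
    e = np.exp(z)
    return e / np.sum(e)

x = np.array([2.0, 1.0, 0.1])
p = softmax(x)
print(p)        # ~[0.659, 0.242, 0.099]
print(p.sum())  # 1.0 -- a valid probability distribution over classes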