
Activation Functions

An activation function is a mathematical function applied to the output of a neuron.

It introduces non-linearity into the model, allowing the network to learn and
represent complex patterns in the data.

Without this non-linearity feature, a neural network would behave like a linear
regression model, no matter how many layers it has.

The activation function decides whether a neuron should be activated: the neuron first
computes the weighted sum of its inputs plus a bias term, and the activation function is
then applied to that result.

This helps the model make complex decisions and predictions by introducing
non-linearity into the output of each neuron.
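
To make this concrete, here is a minimal sketch (hypothetical NumPy code, not taken from
the slides) of a single neuron: the weighted sum plus bias is computed first, and the
chosen activation function is then applied to that value.

    import numpy as np

    def neuron(x, w, b, activation):
        # weighted sum of the inputs plus a bias term
        z = np.dot(w, x) + b
        # the activation function decides the neuron's output
        return activation(z)

    # example: a neuron using a sigmoid activation
    sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
    print(neuron(np.array([0.5, -1.2]), np.array([0.8, 0.3]), 0.1, sigmoid))  # ~0.535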
1. SIGMOID ACTIVATION FUNCTION

Sigmoidal functions are frequently used in machine learning, specifically in the
testing of artificial neural networks, as a way of understanding the output of a
node or “neuron.”

A sigmoid function is a type of activation function, and more specifically defined
as a squashing function. Squashing functions limit the output to a range
between 0 and 1.
Pros And Cons of Sigmoid Activation function

Pros
1. The performance for binary classification is very good compared to other
   activation functions.
2. Clear predictions, i.e. outputs very close to 1 or 0.

Cons
1. The calculation in the sigmoid function is complex.
2. It is not useful for multiclass classification.
3. For large negative values of x the output saturates to 0.
4. It becomes constant and saturates to 1 for high positive values.
5. The function output is not zero-centered.
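
A minimal NumPy sketch, assuming the standard definition σ(x) = 1 / (1 + e^(-x)),
which is not written out on the slide; it squashes any input into (0, 1) and
saturates for large positive or negative inputs.

    import numpy as np

    def sigmoid(x):
        # squashes inputs into the range (0, 1); saturates for large |x|
        return 1.0 / (1.0 + np.exp(-x))

    print(sigmoid(np.array([-5.0, 0.0, 5.0])))   # ~ [0.0067, 0.5, 0.9933]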
Hyperbolic Tangent (Tanh) Activation Function

This function is defined as the ratio between the hyperbolic sine and the
hyperbolic cosine functions.
Pros And Cons of Tanh Activation function

Pros
1. The gradient is stronger for tanh than for sigmoid (the derivatives are steeper).
2. The output interval of tanh is (-1, 1), and the whole function is zero-centered,
   which is better than sigmoid.

Cons
1. Tanh also has the vanishing gradient problem.
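
A minimal NumPy sketch, using the library's built-in tanh (equivalent to
sinh(x) / cosh(x)); note that the outputs are zero-centered in (-1, 1).

    import numpy as np

    def tanh(x):
        # ratio of hyperbolic sine to hyperbolic cosine; output in (-1, 1)
        return np.tanh(x)

    print(tanh(np.array([-2.0, 0.0, 2.0])))      # ~ [-0.964, 0.0, 0.964]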
ReLu Activation Function

The ReLU function takes the maximum of zero and its input: f(x) = max(0, x).
Pros And Cons of ReLu function

Pros
1. When the input is positive, there is no gradient saturation problem.
2. The calculation speed is much faster.
3. The ReLU function involves only a simple linear relationship.
4. Whether forward or backward, it is much faster than sigmoid and tanh.

Cons
1. When the input is negative, ReLU is completely inactive, which means that once a
   negative number is entered, ReLU will die (the dying ReLU problem).
2. The output of the ReLU function is either 0 or a positive number, which means that
   ReLU is not a zero-centered function.
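
A minimal NumPy sketch of ReLU, f(x) = max(0, x).

    import numpy as np

    def relu(x):
        # zero for negative inputs, identity for positive inputs
        return np.maximum(0.0, x)

    print(relu(np.array([-3.0, 0.0, 3.0])))      # -> [0.0, 0.0, 3.0]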
Leaky ReLu Function

It is an attempt to solve the dying ReLU problem.
The leak helps to increase the range of the ReLU function.
Usually, the value of α is 0.01 or so.
Pros And Cons of Leaky ReLu Activation function

Pros
1. There will be no problems with dead ReLU.
2. A parameter-based variant, Parametric ReLU: f(x) = max(αx, x), where α can be
   learned through backpropagation.

Cons
1. It has not been fully proved that Leaky ReLU is always better than ReLU.
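
A minimal NumPy sketch of Leaky ReLU with the slide's suggested α = 0.01; Parametric
ReLU has the same form, except that α would be a learnable parameter rather than a
fixed constant.

    import numpy as np

    def leaky_relu(x, alpha=0.01):
        # f(x) = max(alpha * x, x): small slope for negative inputs instead of zero
        return np.where(x > 0, x, alpha * x)

    print(leaky_relu(np.array([-3.0, 0.0, 3.0])))  # -> [-0.03, 0.0, 3.0]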
ELU (Exponential Linear Units) Function

ELU is very similar to ReLU except for negative inputs. Both take the form of the
identity function for non-negative inputs. For negative inputs, however, ELU smoothly
saturates towards -α, whereas ReLU changes sharply at zero.
Pros And Cons of ELU Activation function

Pros
1. ELU smoothly saturates towards -α for negative inputs, whereas ReLU changes sharply
   at zero.
2. ELU is a strong alternative to ReLU.
3. Unlike ReLU, ELU can produce negative outputs.

Cons
1. For x > 0, it can blow up the activation, with an output range of [0, inf).
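
A minimal NumPy sketch, assuming the standard ELU definition f(x) = x for x > 0 and
f(x) = α(e^x − 1) otherwise, with α = 1.0 as an illustrative default (the slide does
not fix a value).

    import numpy as np

    def elu(x, alpha=1.0):
        # identity for x > 0; smoothly saturates towards -alpha for very negative x
        return np.where(x > 0, x, alpha * (np.exp(x) - 1.0))

    print(elu(np.array([-5.0, 0.0, 2.0])))       # ~ [-0.993, 0.0, 2.0]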
Softmax Function

The softmax function calculates the probability distribution of an event over ‘n’
different events. In other words, this function calculates the probability of each
target class over all possible target classes.
Pros And Cons of Softmax Activation function

Pros
1. It mimics one-hot encoded labels better than the absolute values.
2. If we used the absolute (modulus) values we would lose information, while the
   exponential intrinsically takes care of this.

Cons
1. The softmax function should not be used for multi-label classification.
2. The sigmoid function (discussed earlier) is preferred for multi-label classification.
3. The softmax function should not be used for a regression task either.
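
A minimal NumPy sketch of softmax over a vector of class scores; subtracting the
maximum score before exponentiating is a common numerical-stability trick and does
not change the resulting probabilities.

    import numpy as np

    def softmax(z):
        # exponentiate shifted scores, then normalise so the outputs sum to 1
        e = np.exp(z - np.max(z))
        return e / e.sum()

    probs = softmax(np.array([2.0, 1.0, 0.1]))
    print(probs, probs.sum())                    # probabilities summing to 1.0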
Swish Function

Swish's design was inspired by the use of sigmoid functions for gating in LSTMs and
highway networks. Swish uses the same value for gating to simplify the gating
mechanism, which is called self-gating.
Pros And Cons of Swish Activation function

Pros
1. No dying ReLU problem.
2. Increase in accuracy over ReLU.
3. Outperforms ReLU at every batch size.

Cons
1. Slightly more computationally expensive.
2. More problems with the algorithm will probably arise given time.
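
A minimal NumPy sketch, assuming the usual self-gated form f(x) = x · σ(x) (the slide
describes the gating idea but does not write out the formula).

    import numpy as np

    def swish(x):
        # the input gates itself through a sigmoid ("self-gating")
        return x / (1.0 + np.exp(-x))

    print(swish(np.array([-3.0, 0.0, 3.0])))     # ~ [-0.142, 0.0, 2.858]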
Maxout Function

The Maxout activation is a generalization of the ReLU and the leaky ReLU functions:
it outputs the maximum over a set of learned linear functions of the input. It is a
learnable activation function.
Pros And Cons of Maxout Activation function

Pros
1. It is a learnable activation function.

Cons
1. It doubles the total number of parameters for each neuron, and hence a higher total
   number of parameters needs to be trained.
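
A minimal NumPy sketch of a Maxout unit with k = 2 linear pieces (an illustrative
choice); each piece has its own weights and bias, which is why the parameter count
doubles, and the unit outputs the maximum of the pieces.

    import numpy as np

    def maxout(x, W, b):
        # W has shape (k, d) and b has shape (k,): one affine piece per row
        return np.max(W @ x + b, axis=0)

    W = np.array([[1.0, -0.5],    # piece 1 (learned in practice)
                  [-1.0, 0.5]])   # piece 2
    b = np.array([0.0, 0.1])
    print(maxout(np.array([0.3, 0.7]), W, b))    # -> 0.15, the larger of the two pieces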
Softplus Activation Function

The softplus function is similar to the ReLU function, but it is relatively smooth.
It is unilateral suppression like ReLU. It has a wide acceptance range (0, +inf).

Softplus function: f(x) = ln(1 + exp(x))
Pros And Cons of Softplus Activation function

Pros
1. It is relatively smooth.
2. It is unilateral suppression like ReLU.
3. It has a wide acceptance range (0, +inf).

Cons
1. Leaky ReLU is a piecewise linear function, just like ReLU, so it is quick to
   compute, whereas softplus requires an exponential and a logarithm. ELU has the
   advantage over softplus and ReLU that its mean output is closer to zero, which
   improves learning.
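
A minimal NumPy sketch of softplus, f(x) = ln(1 + exp(x)), the smooth counterpart
of ReLU.

    import numpy as np

    def softplus(x):
        # smooth approximation of ReLU; output range is (0, +inf)
        return np.log1p(np.exp(x))

    print(softplus(np.array([-3.0, 0.0, 3.0])))  # ~ [0.049, 0.693, 3.049]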
