UNIT 1
1. INTRODUCTION TO ARTIFICIAL NEURAL NETWORK
Artificial Neural Networks are computational models inspired by the human
brain. Many recent advances in the field of Artificial Intelligence, including
voice recognition, image recognition, and robotics, have been made using
Artificial Neural Networks.
Artificial Neural Networks are biologically inspired simulations performed on a
computer to carry out certain specific tasks such as:
Clustering
Classification
Pattern Recognition
The term ‘Neural’ is derived from the basic functional unit of the human (animal)
nervous system, the ‘neuron’ or nerve cell, found in the brain and other parts of the
human (animal) body.
A neural network is a group of algorithms that identifies the underlying relationships
in a set of data, in a way similar to the human brain.
An Artificial Neural Network (ANN) is an efficient computing system whose central theme
is borrowed from the analogy of biological neural networks.
ANNs are also known as “artificial neural systems,” “parallel distributed processing
systems,” or “connectionist systems.”
An ANN comprises a large collection of units that are interconnected in some pattern to
allow communication between the units.
These units, also referred to as nodes or neurons, are simple processors which operate in
parallel.
Every neuron is connected to other neurons through connection links.
Each connection link is associated with a weight that carries information about the input signal.
Each neuron has an internal state, which is called an activation signal.
Output signals, which are produced after combining the input signals with the activation rule,
may be sent to other units.
ANN during 1940s to 1960s
Some key developments of this era are as follows −
1943 − The concept of neural networks is generally taken to have started with the work of
physiologist Warren McCulloch and mathematician Walter Pitts, who in 1943
modeled a simple neural network using electrical circuits in order to describe how neurons
in the brain might work.
1949 − Donald Hebb’s book, The Organization of Behavior, put forth the idea that repeated
activation of one neuron by another increases the strength of the connection between them
each time they are used.
1956 − An associative memory network was introduced by Taylor.
1958 − A learning method for McCulloch and Pitts neuron model named Perceptron was
invented by Rosenblatt.
1960 − Bernard Widrow and Marcian Hoff developed models called "ADALINE" and
“MADALINE.”
ANN during 1960s to 1980s
Some key developments of this era are as follows −
1961 − Rosenblatt proposed a “back propagation” scheme for multilayer networks,
although his attempt was unsuccessful.
1969 − Minsky and Papert published Perceptrons, analyzing the multilayer perceptron (MLP) and demonstrating the limitations of single-layer perceptrons.
1971 − Kohonen developed associative memories.
1976 − Stephen Grossberg and Gail Carpenter developed Adaptive Resonance Theory.
ANN from 1980s till Present
Some key developments of this era are as follows −
1982 − The major development was Hopfield’s energy approach.
1985 − The Boltzmann machine was developed by Ackley, Hinton, and Sejnowski.
1986 − Rumelhart, Hinton, and Williams introduced the Generalised Delta Rule.
1988 − Kosko developed the Bidirectional Associative Memory (BAM) and also introduced
the concept of fuzzy logic in ANNs.
The following table shows the correspondence between the Biological Neural Network (BNN) and the ANN:

BNN          ANN
Soma         Node
Dendrites    Input
Axon         Output
2. Model of ANN
The Artificial Neural Network receives information from the external world in the form of
patterns and images in vector form.
These inputs are mathematically designated by the notation x(n) for n inputs.
Each input is multiplied by its corresponding weights.
Weights are the information used by the neural network to solve a problem.
Typically, a weight represents the strength of the interconnection between neurons inside the
Neural Network.
If the weighted sum is zero, a bias is added to make the output non-zero.
For the above general model of artificial neural network, the net input can be calculated as
follows −
yin = x1.w1 + x2.w2 + x3.w3 + … + xm.wm

i.e., Net input yin = Σ (i = 1 to m) xi.wi
The output can be calculated by applying the activation function over the net input.
Y = F(yin)
i.e., Output = F(Net input)
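As a minimal sketch (not from the notes; the input, weight, and bias values below are invented for illustration), this net input and output computation can be expressed in Python:

import numpy as np

x = np.array([0.5, 0.2, 0.1])      # inputs x1, x2, x3 (hypothetical values)
w = np.array([0.4, 0.3, 0.6])      # corresponding weights w1, w2, w3
b = 0.1                            # bias, keeps the output non-zero when the sum is zero

y_in = np.dot(x, w) + b            # net input: sum of xi.wi, plus bias
Y = 1.0 if y_in >= 0.5 else 0.0    # F: a simple threshold activation, as an example
print(y_in, Y)                     # 0.42 0.0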
3. Activation Functions
An activation function may be defined as the extra force or effort applied over the input to
obtain an exact output.
Definition of activation function: An activation function decides whether a neuron should
be activated or not by calculating the weighted sum and further adding a bias to it. The purpose
of the activation function is to introduce non-linearity into the output of a neuron.
The neuron calculates a “weighted sum” of its inputs, adds a bias, and then decides whether it
should be “fired” or not.
Y= ∑ (weight * input) + bias.
Now, the value of Y can be anything ranging from -inf to +inf; the neuron itself does not
know the bounds of this value. So how do we decide whether the neuron should fire or not?
The composition of two linear functions is itself a linear function, so a neuron cannot learn
with just a linear function attached to it. A non-linear activation function lets it learn
according to the difference with respect to the error; this is why we need an activation function.
We need to apply an activation function f(x) to make the network more powerful and give
it the ability to learn something complex and complicated from data.
Hence, using a non-linear activation we are able to generate non-linear mappings from
inputs to outputs.
Non-linear functions are those of degree more than one, and they have a curvature
when plotted.
The question arises: why can’t we do this without activating the input signal?
If we do not apply an activation function, then the output signal would simply be a
linear function.
A linear function is just a polynomial of degree one. A linear equation is easy to
solve, but it is limited in its complexity, has less power to learn complex
functional mappings from data, and does not perform well most of the time.
Also, without an activation function our neural network would not be able to learn and model
other complicated kinds of data such as images, videos, audio, speech, etc.
Hidden layer, i.e. layer 1:
z(1) = W(1)X + b(1)
a(1) = z(1)
Here,
z(1) is the output of layer 1,
W(1) are the weights assigned to the neurons of the hidden layer, i.e. w1, w2, w3 and w4,
X are the input features, i.e. i1 and i2,
b(1) are the biases assigned to the neurons in the hidden layer, i.e. b1 and b2, and
a(1) is the activation of layer 1, which here is simply a linear function of the input.
Layer 2, i.e. the output layer:
z(2) = W(2)a(1) + b(2)
a(2) = z(2)
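A minimal Python sketch of this two-layer forward pass, assuming two inputs i1 and i2, two hidden neurons, and one output neuron (all numeric values below are hypothetical):

import numpy as np

X  = np.array([0.05, 0.10])        # input features i1, i2 (made-up values)
W1 = np.array([[0.15, 0.20],       # hidden-layer weights w1..w4
               [0.25, 0.30]])
b1 = np.array([0.35, 0.35])        # hidden-layer biases b1, b2
W2 = np.array([[0.40, 0.45]])      # output-layer weights
b2 = np.array([0.60])              # output-layer bias

z1 = W1 @ X + b1                   # z(1) = W(1)X + b(1)
a1 = z1                            # a(1) = z(1): the activation is the identity here
z2 = W2 @ a1 + b2                  # z(2) = W(2)a(1) + b(2)
a2 = z2                            # a(2) = z(2), the network output
print(a2)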
The following are some activation functions of interest:
Threshold Function
Piecewise-linear Function
Sigmoid Activation Function
Threshold Function:
Activation functions are the decision-making units of neural networks; they calculate the net
output of a neural node. The threshold function outputs 1 when the net input reaches a fixed
threshold and outputs 0 otherwise.
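A short sketch of this behaviour, assuming the conventional unit-step form (the notes do not state the threshold value, so 0 is used here as a placeholder):

def threshold(x, theta=0.0):
    # Unit-step activation: output 1 when the net input reaches the
    # threshold theta, otherwise output 0.
    return 1.0 if x >= theta else 0.0

print(threshold(0.7))    # 1.0: the neuron fires
print(threshold(-0.3))   # 0.0: the neuron stays silent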
Piecewise-linear Function:
Equation: The linear function has an equation similar to that of a straight line, i.e. y = ax.
No matter how many layers we have, if all of them are linear in nature, the final activation
function of the last layer is nothing but a linear function of the input of the first layer.
Range: -inf to +inf.
Uses: The linear activation function is used in just one place, i.e. the output layer.
However, a linear activation function has two major problems:
It is not possible to use back propagation to train the model, since the derivative of the
function is a constant and has no relation to the input X. So it is not possible to go back and
understand which weights in the input neurons can provide a better prediction.
All layers of the neural network collapse into one: with linear activation functions, no
matter how many layers the neural network has, the last layer will be a linear function of the
first layer (because a linear combination of linear functions is still a linear function). So a
linear activation function turns the neural network into just one layer.
A neural network with a linear activation function is simply a linear regression model. It
has limited power and little ability to handle complexity in the varying parameters of the
input data.
Equation: Examples of linear functions include y = −2x + 5, y = 2x + 1, y = 7, and y = −x + 12.
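A quick numeric check of the collapse argument above, as a sketch with made-up weights: two stacked linear layers reduce to a single linear layer with weights W2.W1 and bias W2.b1 + b2:

import numpy as np

X = np.array([1.0, 2.0])                       # example input
W1 = np.array([[1.0, -1.0],
               [0.5,  2.0]])
b1 = np.array([0.1, 0.2])
W2 = np.array([[2.0, 1.0]])
b2 = np.array([0.3])

two_layers = W2 @ (W1 @ X + b1) + b2           # layer-by-layer computation
one_layer  = (W2 @ W1) @ X + (W2 @ b1 + b2)    # equivalent single linear layer
print(two_layers, one_layer)                   # both print [3.2]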
Sigmoid Activation Function:
Equation: y = 1 / (1 + e^(-x)).
The function is differentiable; that means we can find the slope of the sigmoid curve at
any point.
If x goes to minus infinity, y goes to 0 (the neuron tends not to fire).
As x goes to infinity, y goes to 1 (the neuron tends to fire).
At x = 0, y = 1/2.
The threshold is set to 0.5: if the value is above 0.5 it is scaled towards 1, and if it is
below 0.5 it is scaled towards 0.
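A small Python sketch of the sigmoid and the 0.5 decision rule described above (the test inputs are arbitrary):

import numpy as np

def sigmoid(x):
    # Squashes any real input into the range (0, 1)
    return 1.0 / (1.0 + np.exp(-x))

for x in (-10.0, 0.0, 10.0):
    y = sigmoid(x)
    fired = y > 0.5                 # above 0.5: scaled towards 1; below: towards 0
    print(x, round(float(y), 4), fired)
# -10.0 gives ~0.0 (does not fire); 0.0 gives exactly 0.5; 10.0 gives ~1.0 (fires)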
4. Feed Forward Neural Network
In this network, the information moves only from the input layer, directly through any
hidden layers, to the output layer, without cycles or loops.
Feed forward networks can be constructed with various types of units, such as
binary McCulloch-Pitts neurons, the simplest of which is the perceptron.
Continuous neurons, frequently with sigmoidal activation, are used in the context of back
propagation.
Feed forward neural networks are used in technologies like face recognition and computer
vision, because the target classes in these applications are hard to classify.
A Feed Forward Artificial Neural Network, as the name suggests, consists of several layers
of processing units, where each layer feeds input to the next layer in a feed-forward
manner. A simple two-layer network is an example of a feed forward ANN.
The following is a simple structure of a three-layered feed forward ANN.
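In place of the original figure, here is a hedged Python sketch of such a three-layered feed forward pass (input, one hidden layer, output) with sigmoidal hidden units; the layer sizes and weights are invented for illustration:

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(0)                   # fixed seed for a reproducible example
W1, b1 = rng.normal(size=(3, 2)), np.zeros(3)    # input layer (2) -> hidden layer (3)
W2, b2 = rng.normal(size=(1, 3)), np.zeros(1)    # hidden layer (3) -> output layer (1)

x = np.array([0.6, -0.2])                        # example input vector
h = sigmoid(W1 @ x + b1)                         # hidden layer: continuous sigmoidal units
y = sigmoid(W2 @ h + b2)                         # output layer
print(y)                                         # information flows forward only, no loops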