
MACHINE LEARNING

(20BT60501)

COURSE DESCRIPTION:
Concept learning, General to specific ordering, Decision tree
learning, Support vector machine, Artificial neural networks,
Multilayer neural networks, Bayesian learning, Instance based
learning, reinforcement learning.
Subject: MACHINE LEARNING (20BT60501)

Topic: Unit III – ARTIFICIAL NEURAL NETWORKS

Prepared By:
Dr.J.Avanija
Professor
Dept. of CSE
Sree Vidyanikethan Engineering College
Tirupati.
Unit III – ARTIFICIAL NEURAL NETWORKS

 Neural network representations
 Appropriate problems for neural network learning
 Perceptrons
 Multilayer networks and Backpropagation algorithm
 Convergence and local minima
 Representational power of feedforward networks
 Hypothesis space search and inductive bias
 Hidden layer representations, Generalization
 Overfitting, Stopping criterion
 An Example - Face Recognition.
What is Artificial Neural Network?
The term "Artificial Neural Network" is derived from biological neural networks, which shape the structure of the human brain. Just as the human brain has neurons interconnected with one another, artificial neural networks also have neurons interconnected with one another in the various layers of the network. These neurons are known as nodes.

What is Artificial Neural Network?
Biological Neural Network    Artificial Neural Network
Dendrites                    Inputs
Cell nucleus                 Nodes
Synapse                      Weights
Axon                         Output

Artificial Neural Network Representation

 Dendrites from the biological neural network represent inputs in artificial neural networks, the cell nucleus represents nodes, synapses represent weights, and the axon represents the output.
 The relationship between outputs and inputs keeps changing because the neurons in our brain are "learning."

Architecture of Artificial Neural Network

Artificial Neural Network primarily consists of three layers:
 Input Layer
 Hidden Layer
 Output Layer

 The artificial neural network takes the inputs, computes the weighted sum of the inputs, and includes a bias.
 This computation is represented in the form of a transfer function.
A Simple Neural Network

When to consider Neural Network ?

 Input is high-dimensional, discrete or real-valued (e.g., raw sensor input)
 Output is discrete or real-valued
 Output is a vector of values
 Possibly noisy training data
 Form of target function is unknown
 Human readability of the result is unimportant

Appropriate Problems for Neural Network
Learning
 Instances are represented by many attribute-value pairs (e.g.,
the pixels of a picture)
 The target function output may be discrete-valued, real-valued,
or a vector of several real- or discrete-valued attributes.
 The training examples may contain errors.
 Long training times are acceptable.
 Fast evaluation of the learned target function may be required.
 The ability for humans to understand the learned target function is
not important.

Artificial Neural Network - Components

 Nodes – interconnected processing elements (units or neurons).
 Each neuron is connected to other neurons by connection links.
 Each connection link is associated with a weight, which carries information about the input signal.
 ANN processing elements are called neurons or artificial neurons, since they have the capability to model networks of biological neurons as found in the brain.
 The internal state of a neuron is called its activation or activity level, which is a function of the inputs the neuron receives.
 A neuron can send only one signal at a time.
Artificial Neural Network - Components

 X1 and X2 – input neurons
 Y – output neuron
 Weighted interconnection links – w1 and w2
 Net input calculation: y_in = x1·w1 + x2·w2
 Output: y = f(y_in)
 The function f applied over the net input is called the activation function.

ACTIVATION FUNCTIONS
 To make the network's work more efficient and to obtain an exact output, some force, or activation, is applied.
 Accordingly, an activation function is applied over the net input to calculate the output of an ANN.
 Information processing of a processing element has two major parts: input and output.
 An integration function (f) is associated with the input of the processing element.
 Several activation functions are in common use.
1. Identity function:
 A linear function, defined as f(x) = x for all x.
 The output is the same as the input.
2. Binary step function:
 Defined as f(x) = 1 if x ≥ θ, and f(x) = 0 if x < θ,
 where θ represents the threshold value.
 It is used in single-layer nets to convert the net input to an output that is binary (0 or 1).
contd..
3. Bipolar step function:
• Defined as f(x) = +1 if x ≥ θ, and f(x) = -1 if x < θ,
• where θ represents the threshold value.
• Used in single-layer nets to convert the net input to an output that is bipolar (+1 or -1).
4. Sigmoid function
• Used in backpropagation nets. There are two types:
a) Binary sigmoid function (also called logistic sigmoid or unipolar sigmoid function):
• Defined as f(x) = 1 / (1 + e^(-λx)), where λ is the steepness parameter.
• Its derivative is f'(x) = λ·f(x)·[1 - f(x)]. The range of the sigmoid function is 0 to 1.
contd..
b) Bipolar sigmoid function
• Defined as f(x) = 2 / (1 + e^(-λx)) - 1,
• where λ is the steepness parameter and the sigmoid range is between -1 and +1.
• The derivative of this function is f'(x) = (λ/2)·[1 + f(x)]·[1 - f(x)].
• It is closely related to the hyperbolic tangent function, which is written as h(x) = (e^x - e^(-x)) / (e^x + e^(-x)).
contd..
 The derivative of the hyperbolic tangent function is h'(x) = [1 + h(x)]·[1 - h(x)]
5. Ramp function
 Defined as f(x) = 0 if x < 0, f(x) = x if 0 ≤ x ≤ 1, and f(x) = 1 if x > 1.
 The graphical representation of all these functions is given in the accompanying figure.
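As an illustration, the activation functions above can be sketched in Python as follows (a minimal sketch; the function names and the default λ = 1 are our own choices):

```python
import math

def identity(x):
    return x

def binary_step(x, theta=0.0):
    # Output 1 when the net input reaches the threshold, else 0.
    return 1 if x >= theta else 0

def bipolar_step(x, theta=0.0):
    # Output +1 when the net input reaches the threshold, else -1.
    return 1 if x >= theta else -1

def binary_sigmoid(x, lam=1.0):
    # Logistic (unipolar) sigmoid; range (0, 1).
    return 1.0 / (1.0 + math.exp(-lam * x))

def bipolar_sigmoid(x, lam=1.0):
    # Range (-1, +1); equal to tanh(lam * x / 2).
    return 2.0 / (1.0 + math.exp(-lam * x)) - 1.0

def ramp(x):
    # Linear between 0 and 1, clipped outside that interval.
    return 0.0 if x < 0 else (x if x <= 1 else 1.0)
```

The bipolar sigmoid's relation to the hyperbolic tangent can be checked directly: bipolar_sigmoid(x) agrees with math.tanh(x / 2).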
IMPORTANT TERMINOLOGIES
 Weight
 The weights contain information about the input signal.
 They are used by the net to solve the problem.
 Weights are represented in terms of a matrix, called the connection matrix.
 If a weight matrix W contains all the elements of an ANN, then the set of all W matrices determines the set of all possible information-processing configurations.
 The ANN can be realized by finding an appropriate matrix W.
 Weights encode long-term memory (LTM) and the activation states of the network encode short-term memory (STM) in a neural network.
Contd..
 Bias
 Bias has an impact on calculating the net input.
 Bias is included by adding a component x0 = 1 to the input vector X.
 The net input is then calculated as y_in = b + Σ xi·wi, where the bias b acts as the weight on x0.
 The bias is of two types:
 Positive bias – increases the net input
 Negative bias – decreases the net input
Contd..
 Threshold
 A set value based upon which the final output is calculated.
 The calculated net input and the threshold are compared to obtain the network output.
 The activation function based on the threshold is defined as f(net) = 1 if net ≥ θ, and f(net) = -1 if net < θ,
 where θ is the fixed threshold value.
Perceptron

 A Perceptron unit is used to build the Artificial Neural Network


System
 A perceptron takes a vector of real-valued inputs , calculates a
linear combination of these inputs then outputs 1, if the result
is greater than some threshold and -1 otherwise.
 Given inputs x1 through xn the output o(x1,…xn) computed by
the perceptron is

 where wi is a real-valued constant or weight that determines the


contribution of input xi to the perceptron output
21
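This computation can be sketched in a few lines of Python (the function name and argument layout are our own; weights[0] plays the role of the bias weight w0):

```python
def perceptron_output(weights, inputs):
    """Threshold unit: weights[0] is the bias weight w0,
    weights[1:] pair with the inputs x1..xn."""
    net = weights[0] + sum(w * x for w, x in zip(weights[1:], inputs))
    return 1 if net > 0 else -1
```

With weights [-0.8, 0.5, 0.5], for example, the unit computes the AND of two 0/1 inputs, outputting -1 for "false".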
Perceptron Training Rule

 One way to learn an acceptable weight vector is to begin with random weights, then iteratively apply the perceptron to each training example, modifying the perceptron weights whenever it misclassifies the training example.
 This process is repeated through the training examples until the perceptron correctly classifies all of them.
 Weights are modified at each step according to the perceptron training rule, which updates the weight wi associated with input xi as
wi ← wi + Δwi, where Δwi = η(t - o)·xi
 where t is the target output, o is the perceptron output, and η is the learning rate.
Perceptron Training Example
Designing AND Gate
 Consider w1=1.2, w2=0.6, Threshold =1
Learning rate n=0.5
 First Input:
 A=0, B=0 , Target=0
 wi.xi =1.2*0 + 0.6*0=0
 Apply activation function
 Since the value is not greater than the threshold 1 the output is 0.
 Actual output is same as target so no need to perform weight
updation
 Second Input:
 A=0, B=1, Target=0
 Σ wi.xi = 1.2*0 + 0.6*1 = 0.6
 Since the value is not greater than the threshold 1, the output is 0.
 The actual output is the same as the target, so no weight updation is needed.
Perceptron Training Example
Designing AND Gate
 Consider w1=1.2, w2=0.6, Threshold =1
Learning rate n=0.5
 Third Input:
 A=1 B=0 , Target=0
 wi.xi =1.2*1+ 0.6*0=1.2
 Since the value is greater than the threshold 1 the output is 1.
 Actual output is not same as target so perform weight updation
 wi=wi+n(t-o)xi
 w1=1.2+0.5(0-1)1=0.7
 w2=0.6+0.5(0-1)0=0.6

Perceptron Training Example
Designing AND Gate
 Consider w1=0.7, w2=0.6, Threshold =1
Learning rate n=0.5
 First Input:
 A=0, B=0 , Target=0
 wi.xi =0.7*0 + 0.6*0=0
 Since the value is not greater than the threshold 1 the output is 0.
 Actual output is same as target so no need to perform weight
updation
 Second Input:
 A=0, B=1, Target=0
 Σ wi.xi = 0.7*0 + 0.6*1 = 0.6
 Since the value is not greater than the threshold 1, the output is 0.
 The actual output is the same as the target, so no weight updation is needed.
Perceptron Training Example
Designing AND Gate
 Consider w1=0.7, w2=0.6, Threshold =1
Learning rate n=0.5
 Third Input:
 A=1 B=0 , Target=0
 wi.xi =0.7*1+ 0.6*0=0.7
 Since the value is not greater than the threshold 1 the output is 0.
 Actual output is same as target so no need to perform weight
updation

Perceptron Training Example
Designing AND Gate
 Consider w1=0.7, w2=0.6, Threshold =1
Learning rate n=0.5
 Fourth Input:
 A=1 B=1 , Target=1
 wi.xi =0.7*1+ 0.6*1=1.3
 Since the value is greater than the threshold 1 the output is 1.
 Actual output is same as target so no need to perform weight
updation

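The whole worked example above can be reproduced with a short script, a sketch under the same assumptions (step activation, η = 0.5, threshold 1; the function name and the convergence check are our own):

```python
def train_and_gate(epochs=10, w1=1.2, w2=0.6, theta=1.0, lr=0.5):
    # Truth table for AND: ((A, B), target)
    data = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
    for _ in range(epochs):
        updated = False
        for (a, b), t in data:
            net = w1 * a + w2 * b
            o = 1 if net > theta else 0       # step activation with threshold
            if o != t:                        # misclassified: apply the rule
                w1 += lr * (t - o) * a
                w2 += lr * (t - o) * b
                updated = True
        if not updated:                       # all examples classified correctly
            break
    return w1, w2
```

Starting from w1 = 1.2, w2 = 0.6, only the third input triggers an update, leaving the final weights at w1 = 0.7, w2 = 0.6, exactly as in the hand trace above.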
Gradient Descent and Delta Rule
 Perceptron rule finds the successful weight vector when the training
examples are linearly separable and fails to converge if the examples are
not linearly separable.
 Delta rule can be used in training to classify non-linearly separable
training examples.
 The delta rule converges towards a best fit approximation to the target
concept if the training examples are not linearly separable.
 The delta rule uses gradient descent to search the hypothesis space of possible weight vectors and finds the weights that best fit the training examples.
 This rule is important because gradient descent provides the basis for the Backpropagation algorithm, which can learn networks with many interconnected units.
 Gradient descent can also serve as the basis for learning algorithms that search through hypothesis spaces containing many different types of continuously parameterized hypotheses.
Gradient Descent and Delta Rule
 Visualizing the hypothesis space

Gradient Descent and Delta Rule
 The delta rule can be best understood by considering the task of training an unthresholded perceptron, i.e., a linear unit, for which the output o is given by
o(x) = w0 + w1·x1 + ... + wn·xn
 Thus a linear unit corresponds to the first stage of a perceptron, without the threshold.
 To derive a weight learning rule for linear units, begin by specifying a measure for the training error of a hypothesis (weight vector) relative to the training examples.
 Although there are many ways to define this error, one common measure is
E(w) = (1/2) Σ_{d∈D} (td - od)²
 where D is the set of training examples, td is the target output for training example d, and od is the output of the linear unit for training example d.
Derivation of Gradient Descent Rule
 How can we find the direction of steepest descent along the error surface?
 This direction can be found by computing the derivative of E with respect to each component of the weight vector w.
 This vector derivative is called the gradient of E with respect to w, written ∇E(w):
∇E(w) = [∂E/∂w0, ∂E/∂w1, ..., ∂E/∂wn]
 Since the gradient specifies the direction of steepest increase of E, the training rule for gradient descent is
w ← w + Δw, where Δw = -η·∇E(w)
 Here η is a positive constant called the learning rate, which determines the step size in the gradient descent search. The negative sign is present because the weight vector should move in the direction that decreases E.
Derivation of Gradient Descent Rule

 This training rule can also be written in its component form:
wi ← wi + Δwi, where Δwi = -η·∂E/∂wi
 Differentiating E from the definition above gives
∂E/∂wi = Σ_{d∈D} (td - od)(-xid)
 so the update becomes Δwi = η Σ_{d∈D} (td - od)·xid.
Gradient Descent Algorithm
 Gradient descent and the delta rule can be applied even when the training data are not linearly separable, converging toward a best-fit approximation.
 Weights are updated using the following rule:
wi ← wi + Δwi, where Δwi = η Σ_{d∈D} (td - od)·xid
 where η is the learning rate, td the target output, and od the linear unit output for training example d.
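The batch weight-update rule above can be sketched as a gradient-descent trainer for a linear unit (the function name, learning rate, and epoch count are our own choices):

```python
def gradient_descent(examples, n_weights, lr=0.05, epochs=500):
    """Batch delta rule for a linear unit.
    examples: list of (x, t) where x is the input vector and t the target."""
    w = [0.0] * n_weights
    for _ in range(epochs):
        delta = [0.0] * n_weights
        for x, t in examples:
            o = sum(wi * xi for wi, xi in zip(w, x))   # linear unit output
            for i, xi in enumerate(x):
                delta[i] += lr * (t - o) * xi          # accumulate eta*(t-o)*xid
        w = [wi + di for wi, di in zip(w, delta)]      # update after the full pass
    return w
```

For instance, training on examples of the target function t = 2x drives the single weight toward 2.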
Gradient Descent Algorithm
 Gradient descent is an important general paradigm for learning.
 It is a strategy for searching through a large or infinite hypothesis space that can be applied whenever
 the hypothesis space contains continuously parameterized hypotheses (e.g., the weights in a linear unit), and
 the error can be differentiated with respect to these hypothesis parameters.
Gradient Descent Algorithm
 The key practical difficulties in applying gradient descent are:
 Converging to a local minimum can be slow (i.e., it can require many thousands of gradient descent steps).
 If there are multiple local minima in the error surface, there is no guarantee that the procedure will find the global minimum.
 Differences between the Delta Rule and the Perceptron Learning Rule:
 The error (t - o) in the delta rule is not restricted to the values 0, 1 and -1, as in the Perceptron Learning Rule, but may take any value.
 The Delta Rule can be derived for any differentiable output/activation function f, whereas the Perceptron Learning Rule works only for a threshold output function.
Multilayer Networks

 Multilayer networks learned by the Backpropagation algorithm are capable of expressing a rich variety of nonlinear decision surfaces.
 A multilayer network can learn using the gradient descent algorithm, provided a differentiable threshold unit is used.
 Multiple layers of linear units still produce only linear functions.
 Perceptrons have a discontinuous threshold, which is not differentiable and therefore unsuitable for gradient descent.
Multilayer Networks

 What is needed is a unit whose output is a nonlinear, differentiable function of its inputs. One solution is the sigmoid unit.
 Like the perceptron, it computes a linear combination of its inputs and then applies a threshold to the result; but the thresholded output is a continuous function of its input, ranging from 0 to 1. It is often referred to as a squashing function.
 The sigmoid unit computes its output as
o = σ(w · x), where σ(y) = 1 / (1 + e^(-y))
Backpropagation Algorithm
 The Backpropagation algorithm learns the weights for a multilayer network, given a network with a fixed set of units and interconnections.
 It uses gradient descent to minimize the error between the network output values and the target values.
 Redefine the error E by summing the errors over all of the network output units:
E(w) = (1/2) Σ_{d∈D} Σ_{k∈outputs} (tkd - okd)²
 where outputs is the set of output units in the network, tkd is the target value associated with the kth output unit and training example d, and okd is the output value associated with the kth output unit and training example d.
 The learning problem faced by Backpropagation is to search the large hypothesis space defined by all possible weight values for all units in the network.
 Gradient descent can be used to find the hypothesis that minimizes E.
Backpropagation Algorithm

1. Inputs X arrive through the preconnected paths.
2. The input is modeled using real weights W. The weights are usually randomly selected.
3. Calculate the output for every neuron, from the input layer through the hidden layers to the output layer.
4. Calculate the error in the outputs.
5. Travel back from the output layer to the hidden layers, adjusting the weights so that the error is decreased.
6. Repeat until the desired output is achieved.
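These steps can be sketched for a tiny one-hidden-layer network. This is a minimal sketch, not the full algorithm: all names, the network size, and the training-loop details are our own, while the error terms follow the standard sigmoid-unit formulas δ_o = o(1-o)(t-o) and δ_h = h(1-h)·w·δ_o.

```python
import math, random

def sigmoid(y):
    return 1.0 / (1.0 + math.exp(-y))

def train_backprop(data, n_hidden=2, lr=0.5, epochs=5000, seed=0):
    """2-input, 1-output network with one hidden layer and bias weights."""
    rnd = random.Random(seed)
    # w_h[j] = [bias, w_from_x1, w_from_x2]; w_o = [bias, w_from_h1, ...]
    w_h = [[rnd.uniform(-0.5, 0.5) for _ in range(3)] for _ in range(n_hidden)]
    w_o = [rnd.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)]
    for _ in range(epochs):
        for (x1, x2), t in data:
            # Steps 1-3: forward pass, layer by layer
            h = [sigmoid(w[0] + w[1] * x1 + w[2] * x2) for w in w_h]
            o = sigmoid(w_o[0] + sum(w * hj for w, hj in zip(w_o[1:], h)))
            # Step 4: output error term  delta_o = o(1-o)(t-o)
            d_o = o * (1 - o) * (t - o)
            # Step 5: propagate back to hidden units, then adjust weights
            d_h = [hj * (1 - hj) * w_o[j + 1] * d_o for j, hj in enumerate(h)]
            w_o[0] += lr * d_o
            for j, hj in enumerate(h):
                w_o[j + 1] += lr * d_o * hj
            for j, w in enumerate(w_h):
                w[0] += lr * d_h[j]
                w[1] += lr * d_h[j] * x1
                w[2] += lr * d_h[j] * x2
    return w_h, w_o

def predict(w_h, w_o, x1, x2):
    h = [sigmoid(w[0] + w[1] * x1 + w[2] * x2) for w in w_h]
    return sigmoid(w_o[0] + sum(w * hj for w, hj in zip(w_o[1:], h)))
```

Training this sketch on the AND truth table, for example, yields outputs above 0.5 only for the input (1, 1).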
Backpropagation Algorithm
(Figure: example network with an input layer, hidden layers, and output units O6 and O7.)
Backpropagation Algorithm Example

 Assume that each neuron has a sigmoid activation function. Perform a forward pass and a backward pass on the network, assuming that the target output y is 0.5 and the learning rate η is 1. Then perform another forward pass.
Backpropagation Algorithm Example
 Forward Pass:
Compute output for y3,y4,y5

Target y = 0.5

Backpropagation Algorithm Example
 Each weight is changed by
Δwji = η·δj·xji
 where η is the learning rate, xji the input along the connection, and δj the error term of unit j, derived from the error measure E = (1/2)(t - o)².
 For an output unit, δj = oj·(1 - oj)·(tj - oj); for a hidden unit, δj = oj·(1 - oj) Σk wkj·δk.
Backpropagation Algorithm Example

 Similarly update all other weights

Backpropagation Algorithm Example
 Forward Pass:
Compute output for y3,y4,y5

Adding Momentum

 Backpropagation is a widely used algorithm, and many variations exist. A common variation is to alter the weight update rule so that the update on the nth iteration depends partially on the update of the (n-1)th iteration:
Δwji(n) = η·δj·xji + α·Δwji(n-1)
 where 0 ≤ α < 1 is the momentum term.
 Momentum helps to speed up convergence and to ride over small local minima on the error surface.
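A single momentum-augmented update can be sketched as follows (the names are our own; grad_step stands for the gradient term δj·xji that the learning rate would normally multiply):

```python
def momentum_update(weight, grad_step, prev_delta, lr=0.1, alpha=0.9):
    """One momentum-augmented weight update:
    delta(n) = lr * grad_step + alpha * delta(n-1)."""
    delta = lr * grad_step + alpha * prev_delta
    return weight + delta, delta
```

The caller keeps the returned delta and feeds it back in as prev_delta on the next iteration, so successive updates in the same direction accumulate speed.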
Derivation of the Backpropagation Rule

 To derive the equation for updating weights in the Backpropagation algorithm, the gradient descent rule is used.
 Stochastic gradient descent involves iterating through the training examples one at a time, for each training example d descending the gradient of the error Ed with respect to the weights.
 For each training example d, every weight wji is updated by adding to it
Δwji = -η·∂Ed/∂wji
Derivation of the Backpropagation Rule

 The error on training example d is
Ed(w) = (1/2) Σ_{k∈outputs} (tk - ok)²
 where outputs is the set of output units in the network, tk is the target value of unit k for training example d, and ok is the output of unit k given training example d.
 Notation used:
xji – the ith input to unit j
wji – the weight associated with the ith input to unit j
netj = Σi wji·xji – the weighted sum of inputs for unit j
oj – the output computed by unit j
σ – the sigmoid function
Downstream(j) – the set of units whose immediate inputs include the output of unit j
Derivation of the Backpropagation Rule

 Notice that weight wji can influence the rest of the network only through netj.
 Therefore we can use the chain rule:
∂Ed/∂wji = (∂Ed/∂netj)·(∂netj/∂wji) = (∂Ed/∂netj)·xji
Derivation of the Backpropagation Rule

 To derive a convenient expression for ∂Ed/∂netj, we consider two cases:
 Case 1, where unit j is an output unit of the network
 Case 2, where unit j is an internal (hidden) unit of the network
 In Case 1, netj can influence the network only through oj, so we can use the chain rule:
∂Ed/∂netj = (∂Ed/∂oj)·(∂oj/∂netj)
Derivation of the Backpropagation Rule

 Case 1, training rule for output unit weights:
∂Ed/∂oj = -(tj - oj) and ∂oj/∂netj = oj·(1 - oj)
 so ∂Ed/∂netj = -(tj - oj)·oj·(1 - oj), giving
Δwji = η·(tj - oj)·oj·(1 - oj)·xji = η·δj·xji, where δj = (tj - oj)·oj·(1 - oj)
Derivation of the Backpropagation Rule

 Case 2, training rule for hidden unit weights:
 netj influences the network outputs only through the units in Downstream(j), so
∂Ed/∂netj = Σ_{k∈Downstream(j)} (∂Ed/∂netk)·(∂netk/∂oj)·(∂oj/∂netj) = Σ_{k∈Downstream(j)} (-δk)·wkj·oj·(1 - oj)
 giving δj = oj·(1 - oj) Σ_{k∈Downstream(j)} wkj·δk and Δwji = η·δj·xji.
