NEURAL NETWORKS AND DEEP LEARNING
Subject Code - CSE_3273
CREDITS – 03
Module -3
DYNAMICALLY DRIVEN RECURRENT NETWORKS AND RECURRENT HOPFIELD NETWORKS
Topics
Introduction
Recurrent Network Architectures
Universal Approximation Theorem
Controllability and Observability
Computational Power of Recurrent Networks
Learning Algorithms
Back Propagation Through Time
Real-Time Recurrent Learning
Operating Principles of the Hopfield Network
Stability Conditions of the Hopfield Network, Associative Memories
Outer Product Method, Pseudoinverse Matrix Method
Storage Capacity of Memories, Design Aspects of the Hopfield Network
Case studies
Text Book 1: 15.1-15.8, relevant journals. Text Book 2: 7.1-7.5, relevant journals
INTRODUCTION
Global feedback is a facilitator of computational intelligence.
Global feedback in a recurrent network makes it possible to perform several useful tasks:
➢ Content-addressable memory, exemplified by the Hopfield network;
➢ Autoassociation, exemplified by Anderson’s brain-state-in-a-box model;
➢ Dynamic reconstruction of a chaotic process (a series of actions), using feedback built around
a regularized one-step predictor.
Content-addressable memory (CAM) is a type of computer memory that searches for data based on its content rather than its address.
An autoassociative network is any type of memory that can retrieve a complete piece of data from only a small sample of itself.
Dynamic reconstruction of a chaotic process refers to mathematically recreating the full dynamics of a chaotic system by analyzing only a limited set of observed data points from that system.
Important application of recurrent networks: input–output mapping
Consider, for example, a multilayer perceptron with a single hidden layer as the basic building
block of a recurrent network. The application of global feedback around the multilayer
perceptron can take a variety of forms. We may apply feedback from the outputs of the hidden
layer of the multilayer perceptron to the input layer. Alternatively, we may apply the
feedback from the output layer to the input of the hidden layer. We may even go one step
further and combine all these possible feedback loops in a single recurrent network
structure. We may also, of course, consider other neural network configurations as the building
blocks for the construction of recurrent networks. The important point is that recurrent networks
have a very rich repertoire of architectural layouts, which makes them all the more powerful in
computational terms.
By definition, the input space of a mapping network is mapped onto an output space. For this
kind of application, a recurrent network responds temporally to an externally applied input
signal. We may therefore speak of the recurrent networks considered in this chapter as
dynamically driven recurrent networks—hence the title of the chapter.
Moreover, the application of feedback enables recurrent networks to acquire state
representations, which makes them desirable tools for such diverse applications as nonlinear
prediction and modeling, adaptive equalization of communication channels, speech
processing, and plant control, to name just a few.
RECURRENT NETWORK ARCHITECTURES
We consider four specific network architectures, each of which highlights a specific form of global feedback. They share the following common features:
• They all incorporate a static multilayer perceptron or parts thereof.
• They all exploit the nonlinear mapping capability of the multilayer perceptron.
➢ Input–Output Recurrent Model
➢ State-Space Model
➢ Recurrent Multilayer Perceptrons
➢ Second-Order Network
Input–Output Recurrent Model
Figure 15.1 shows the architecture of a generic recurrent network that follows naturally from a multilayer perceptron. The model has a single input that is applied to a tapped-delay-line memory of q units. It has a single output that is fed back to the input via another tapped-delay-line memory, also of q units. The contents of these two tapped-delay-line memories are used to feed the input layer of the multilayer perceptron. The present value of the model input is denoted by u_n, and the corresponding value of the model output is denoted by y_{n+1}; that is, the output is ahead of the input by one time unit.
Thus, the signal vector applied to the input layer of the multilayer perceptron consists of a data window made up of the following components: the present and past values of the input, namely u_n, u_{n-1}, ..., u_{n-q+1}, and the delayed values of the output fed back from the model, namely y_n, y_{n-1}, ..., y_{n-q+1}.
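As a concrete illustration of maintaining this data window (an assumed minimal setup, not the textbook's model; `mlp` here is a hypothetical stand-in for the trained multilayer perceptron), the two tapped-delay lines can be kept as fixed-length queues:

```python
from collections import deque

q = 3
u_line = deque([0.0] * q, maxlen=q)   # u_n, ..., u_{n-q+1}
y_line = deque([0.0] * q, maxlen=q)   # y_n, ..., y_{n-q+1}

def mlp(window):
    # stand-in for the multilayer perceptron: any nonlinear map would do here
    return sum(window) / len(window)

stream = [0.5, -0.2, 0.9, 0.1, 0.4]
outputs = []
for u_n in stream:
    u_line.appendleft(u_n)                 # newest input enters the delay line
    window = list(u_line) + list(y_line)   # 2q-dimensional input vector
    y_next = mlp(window)                   # predicted y_{n+1}
    y_line.appendleft(y_next)              # output fed back through its delay line
    outputs.append(y_next)
print(len(outputs), len(window))
```

Because `maxlen=q`, the oldest value automatically drops off the end of each line as a new value is pushed in, which is exactly the behavior of a bank of unit-time delays.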
State-Space Model
Recurrent Multilayer Perceptrons
Second-Order Network
The term “order” refers to the number of hidden neurons whose outputs are fed back to the input layer via a bank of unit-time delays.
In light of this relationship, second-order networks are readily used for representing and learning deterministic finite-state automata (DFA); a DFA is an information-processing system with a finite number of states.
UNIVERSAL APPROXIMATION THEOREM
The universal approximation theorem states that a neural network with a single hidden layer containing a sufficient number of neurons can approximate any continuous function to any desired accuracy. It is a fundamental result in machine learning and neural networks.
➢ Increased the number of neurons in the hidden layer: changed from 10 to 50 neurons.
➢ Changed the activation function: replaced ReLU() with Tanh().
➢ Increased the number of training epochs: increased from 2000 to 5000 epochs.
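The changes above evidently refer to a small training demo whose code is not reproduced here. As a self-contained illustration of the theorem (an assumed minimal setup, not the original demo), the following pure-Python sketch trains a one-hidden-layer tanh network by stochastic gradient descent to approximate f(x) = x² on [-1, 1]:

```python
import math, random

# One-hidden-layer network: y = sum_j v[j]*tanh(w[j]*x + b[j]) + c
random.seed(0)
H = 10                                    # number of hidden neurons
w = [random.uniform(-1, 1) for _ in range(H)]
b = [random.uniform(-1, 1) for _ in range(H)]
v = [random.uniform(-1, 1) for _ in range(H)]
c = 0.0

def forward(x):
    h = [math.tanh(w[j] * x + b[j]) for j in range(H)]
    return sum(v[j] * h[j] for j in range(H)) + c, h

def mse(xs, ys):
    return sum((forward(x)[0] - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

xs = [i / 10 for i in range(-10, 11)]     # 21 sample points in [-1, 1]
ys = [x * x for x in xs]                  # target function f(x) = x^2

lr = 0.05
loss0 = mse(xs, ys)
for epoch in range(2000):
    for x, y in zip(xs, ys):
        out, h = forward(x)
        err = out - y
        c -= lr * err
        for j in range(H):
            g = err * v[j] * (1 - h[j] ** 2)   # gradient at the pre-activation
            v[j] -= lr * err * h[j]
            w[j] -= lr * g * x
            b[j] -= lr * g
print(loss0, mse(xs, ys))
```

Widening the hidden layer (as in the 10-to-50 change above) gives the network more tanh "bumps" to compose, which is precisely the resource the universal approximation theorem assumes is available.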
CONTROLLABILITY AND OBSERVABILITY
Many recurrent networks can be represented by the state-space model, where the state is defined by the
output of the hidden layer fed back to the input layer via a set of unit-time delays. In this context, it is
insightful to know whether the recurrent network is controllable and observable or not.
Controllability is concerned with whether we can control the dynamic behavior of the recurrent
network. Observability is concerned with whether we can observe the result of the control applied to the
recurrent network.
Formally, a dynamic system is said to be controllable if any initial state of the system is steerable to any
desired state within a finite number of time-steps; the output of the system is irrelevant to this definition.
Correspondingly, the system is said to be observable if the state of the system can be determined from a
finite set of input–output measurements.
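For a linearized system, both definitions reduce to the classical Kalman rank conditions. The following sketch checks them for an assumed two-state linear system (a toy example, not Eqs. (15.16)–(15.20)): the system is controllable if [B, AB] has full rank, and observable if the matrix with rows C and CA has full rank.

```python
# Toy linear system: x_{k+1} = A x_k + B u_k,  y_k = C x_k  (assumed example)
A = [[0.0, 1.0],
     [0.0, 0.0]]
B = [0.0, 1.0]          # the control enters the second state
C = [1.0, 0.0]          # we measure only the first state

def matvec(M, v):
    return [sum(M[i][j] * v[j] for j in range(2)) for i in range(2)]

AB = matvec(A, B)
controllability = [[B[0], AB[0]],
                   [B[1], AB[1]]]                              # columns B, AB

CA = [sum(C[j] * A[j][i] for j in range(2)) for i in range(2)]  # row vector C*A
observability = [C, CA]                                          # rows C, CA

def det2(M):
    return M[0][0] * M[1][1] - M[0][1] * M[1][0]

controllable = abs(det2(controllability)) > 1e-12   # full rank <=> nonzero det
observable = abs(det2(observability)) > 1e-12
print(controllable, observable)
```

Here the input drives the second state, which in turn drives the first through A, so the whole state is steerable; symmetrically, measuring the first state over two steps reveals both states.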
Local controllability and local observability are local in the sense that both properties apply in the neighborhood of an equilibrium state of the network.
Local Controllability
Let a recurrent network be defined by Eqs. (15.16) and (15.17), and let its linearized
version around the origin (i.e., equilibrium point) be defined by Eqs. (15.19) and (15.20). If
the linearized system is controllable, then the recurrent network is locally controllable
around the origin.
Local Observability
Let a recurrent network be defined by Eqs. (15.16) and (15.17), and let its linearized version
around the origin (i.e., equilibrium point) be defined by Eqs. (15.19) and (15.20). If the
linearized system is observable, then the recurrent network is locally observable around the
origin.
NARX: nonlinear autoregressive network with exogenous inputs.
COMPUTATIONAL POWER OF RECURRENT NETWORKS
Finite automata are abstract machines used to recognize patterns in input sequences. The early work on recurrent networks used hard threshold logic for the activation function of a neuron rather than soft sigmoid functions.
A nonlinear autoregressive network with exogenous inputs (NARX) is a type of artificial neural network (ANN) used to model nonlinear systems. It is often used for time-series prediction.
Figure 15.8 presents a portrayal of Theorems I and II and this corollary. It should, however, be noted that
when the network architecture is constrained, the computational power of a recurrent network may no
longer hold, as described in Sperduti (1997).
A Turing machine is a mathematical model of a computing device that manipulates symbols on a tape.
LEARNING ALGORITHMS
There are two modes of training an ordinary (static) multilayer perceptron: batch mode and
stochastic (sequential) mode.
In the batch mode, the sensitivity of the network is computed for the entire training sample before
adjusting the free parameters of the network.
In the stochastic mode, on the other hand, parameter adjustments are made after the presentation of
each pattern in the training sample.
Likewise, we have two modes of training a recurrent network, described as follows (Williams and
Zipser, 1995):
Epochwise training.
➢ For a given epoch, the recurrent network uses a temporal sequence of input–target response pairs
and starts running from some initial state until it reaches a new state, at which point the training is
stopped and the network is reset to an initial state for the next epoch.
➢ The initial state doesn’t have to be the same for each epoch of training. Rather, what is important
is for the initial state for the new epoch to be different from the state reached by the network at the
end of the previous epoch. Consider, for example, the use of a recurrent network to emulate the
operation of a finite-state machine.
Epochwise training is a machine learning process where a model is trained on the entire dataset
multiple times (in "epochs") to improve performance. An epoch is one complete pass over the
entire training dataset.
➢ In such a situation, it is reasonable to use epochwise training, since there is a good possibility that
a number of distinct initial states and a set of distinct final states in the machine will be emulated
by the recurrent network. In epochwise training for recurrent networks, the term “epoch” is used
in a sense different from that for an ordinary multilayer perceptron.
➢ Although an epoch in the training of a multilayer perceptron involves the entire training sample of
input–target response pairs, an epoch in the training of a recurrent neural network involves a
single string of temporally consecutive input–target response pairs.
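The epochwise mode can be sketched as follows (an assumed toy scalar network, not from the text; the per-step gradient is deliberately crude and ignores the recurrent dependence of the state on the weight, since the point here is the reset-to-an-initial-state structure of each epoch):

```python
import math

W_REC = 0.5                                  # fixed recurrent weight
xs = [0.1, 0.4, 0.9, -0.2, 0.6]              # one temporal input sequence

def make_targets(u_true):
    # target responses generated by a "true" network with input weight u_true
    h, ys = 0.0, []
    for x in xs:
        h = math.tanh(W_REC * h + u_true * x)
        ys.append(h)
    return ys

ys = make_targets(0.8)

def run_epoch(u, lr=None):
    """One pass over the sequence from a RESET initial state.
    Returns (updated u, sum of squared errors); lr=None means evaluate only."""
    h, sse = 0.0, 0.0                        # state reset at the start of the epoch
    for x, y in zip(xs, ys):
        h = math.tanh(W_REC * h + u * x)
        err = h - y
        sse += err * err
        if lr is not None:
            # crude per-step gradient (ignores recurrent dependence of h on u)
            u -= lr * err * (1 - h * h) * x
    return u, sse

u = 0.2                                      # start away from the true value 0.8
_, sse_before = run_epoch(u)
for epoch in range(100):
    u, _ = run_epoch(u, lr=0.2)
_, sse_after = run_epoch(u)
print(sse_before, sse_after)
```

Each call to `run_epoch` corresponds to one epoch in the recurrent-network sense: a single string of temporally consecutive input–target pairs, processed from a freshly reset state.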
Continuous training. This second method of training is suitable for situations where there are
no reset states available or on-line learning is required. The distinguishing feature of
continuous training is that the network learns while performing signal processing.
Simply put, the learning process never stops. Consider, for example, the use of a recurrent
network to model a nonstationary process such as a speech signal. In this kind of situation,
continuous operation of the network offers no convenient times at which to stop the training and begin anew with different values for the free parameters of the network.
Keeping these two modes of training in mind, in the next two sections we will describe two
different learning algorithms for recurrent networks, summarized as follows:
➢ The back-propagation-through-time (BPTT) algorithm operates on the premise that the temporal operation of a recurrent network may be unfolded into a multilayer perceptron. This would then pave the way for application of the standard back-propagation algorithm. The back-propagation-through-time algorithm can be implemented in the epochwise mode, the continuous (real-time) mode, or a combination thereof.
➢ The real-time recurrent learning (RTRL) algorithm, which adjusts the synaptic weights in real time, while the network continues to perform its signal-processing function.
Basically, BPTT and RTRL involve the propagation of derivatives, one in the backward direction and
the other in the forward direction. They can be used in any training process that requires the use of
derivatives. BPTT requires less computation than RTRL does, but the memory space required by BPTT
increases fast as the length of a sequence of consecutive input–target response pairs increases. Generally
speaking, we therefore find that BPTT is better for off-line training, and RTRL is more suitable for
on-line continuous training.
In any event, these two algorithms share many common features. First, they are both based on the
method of gradient descent, whereby the instantaneous value of a cost function (based on a squared-
error criterion) is minimized with respect to the synaptic weights of the network. Second, they are
both relatively simple to implement, but can be slow to converge. Third, they are related in that the
signal-flow graph representation of the back-propagation-through-time algorithm can be obtained from
transposition of the signal-flow graph representation of a certain form of the real-time recurrent learning
algorithm (Lefebvre, 1991; Beaufays and Wan, 1994).
BACK PROPAGATION THROUGH TIME
Backpropagation Through Time (BPTT) is an extension of the standard backpropagation algorithm used for training Recurrent Neural Networks (RNNs). Since RNNs process sequential data, BPTT is designed to handle time-dependent relationships by unrolling the network over multiple time steps and propagating errors backward through time.
"Unfolding"
•Instead of treating the RNN as a single network with loops, we expand or unroll it into multiple
layers.
•Each "layer" in this unfolded network represents the same RNN cell, but at a different time step.
•The weights are shared across these layers because the same RNN cell is used at every time
step.
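The unfolding idea can be sketched with an assumed toy cell (not from the text): the same weights are reused at every time step, so each loop iteration plays the role of one "layer" of the unfolded network.

```python
import math

w, u = 0.7, 0.3                     # ONE shared set of weights

def cell(h_prev, x):
    # the same RNN cell applied at every time step
    return math.tanh(w * h_prev + u * x)

xs = [0.5, -0.2, 0.9, 0.1]
h = 0.0
states = []
for x in xs:                        # each iteration = one layer of the unfolded net
    h = cell(h, x)
    states.append(h)
print(states)
```

Unrolling this loop literally (cell applied to cell applied to cell, ...) produces a feedforward network four layers deep whose layers all share the parameters w and u, which is what makes standard backpropagation applicable.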
Application of the unfolding procedure leads to two basically different implementations of back propagation through time, depending on whether epochwise training or continuous (real-time) training is used.
Epochwise Back Propagation Through Time
Backpropagation Through Time (BPTT) is an extension of the backpropagation algorithm used for
training Recurrent Neural Networks (RNNs). Since RNNs process sequential data, standard
backpropagation cannot be directly applied due to the temporal dependencies between time steps.
BPTT overcomes this by unrolling the RNN over time and applying backpropagation across the
unfolded network.
BPTT is widely used in applications like speech recognition, natural language processing, and
time-series prediction.
Unlike feedforward networks, RNNs have hidden states that retain memory of previous inputs. This
makes the standard backpropagation method unsuitable because weight updates must consider past
dependencies.
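The unroll-then-backpropagate procedure can be sketched for a scalar RNN h_t = tanh(w·h_{t-1} + u·x_t) with a squared-error cost (an assumed toy model, not the textbook's network); the analytic BPTT gradient is checked against a finite-difference estimate:

```python
import math

def forward(w, u, xs):
    # unroll the recurrence forward in time, keeping all hidden states
    h, hs = 0.0, []
    for x in xs:
        h = math.tanh(w * h + u * x)
        hs.append(h)
    return hs

def loss(w, u, xs, ys):
    return 0.5 * sum((h - y) ** 2 for h, y in zip(forward(w, u, xs), ys))

def bptt_grad(w, u, xs, ys):
    hs = forward(w, u, xs)
    dw = du = 0.0
    dh_next = 0.0                        # error flowing back from the future
    for t in range(len(xs) - 1, -1, -1):
        dh = (hs[t] - ys[t]) + dh_next   # total dE/dh_t
        da = dh * (1 - hs[t] ** 2)       # back through the tanh nonlinearity
        h_prev = hs[t - 1] if t > 0 else 0.0
        dw += da * h_prev                # shared weight: accumulate over time
        du += da * xs[t]
        dh_next = da * w                 # pass the error back to h_{t-1}
    return dw, du

xs = [0.5, -0.3, 0.8, 0.1]
ys = [0.2, 0.1, 0.4, 0.3]
w, u = 0.6, 0.4
dw, du = bptt_grad(w, u, xs, ys)

# central finite-difference check of the analytic gradient
eps = 1e-6
dw_num = (loss(w + eps, u, xs, ys) - loss(w - eps, u, xs, ys)) / (2 * eps)
du_num = (loss(w, u + eps, xs, ys) - loss(w, u - eps, xs, ys)) / (2 * eps)
print(dw, dw_num, du, du_num)
```

Note that the gradient with respect to the shared weight w is *accumulated* across all time steps of the unfolded network, which is the defining feature of BPTT.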
REAL-TIME RECURRENT LEARNING
In this section, we describe the second learning algorithm, real-time recurrent learning (RTRL),
which was briefly described in Section 15.6. The algorithm derives its name from the fact that
adjustments are made to the synaptic weights of a fully connected recurrent network in real
time—that is, while the network continues to perform its signal-processing function (Williams
and Zipser, 1989). Figure 15.10 shows the layout of such a recurrent network. It consists of q
neurons with m external inputs. The network has two distinct layers: a concatenated input-
feedback layer and a processing layer of computation nodes. Correspondingly, the synaptic
connections of the network are made up of feedforward and feedback connections; the feedback
connections are shown in red in Fig. 15.10.
Real-Time Recurrent Learning (RTRL) is an algorithm used to train recurrent neural networks
(RNNs) in an online, real-time manner. Unlike backpropagation through time (BPTT), which
requires unrolling the network over multiple time steps, RTRL updates network weights
incrementally as each new input is received.
RTRL is particularly useful for applications requiring continuous learning and adaptation, such
as robotics, control systems, and speech processing.
•Handles Time Dependencies: It effectively updates weights for sequences without needing to
store past activations.
•Online Learning: Unlike BPTT, which requires waiting for a sequence to finish before
updating weights, RTRL updates the model in real-time.
•No Need for Truncation: Truncated BPTT approximates gradient updates by stopping at a
certain sequence length, while RTRL accounts for the full history.
•Handles Streaming Data: Works well with continuously arriving data, making it suitable for
real-world applications.
RTRL computes the exact gradient of a recurrent neural network with respect to its weights at
every time step. It does this by maintaining and updating a Jacobian matrix that captures how each
hidden unit’s state depends on the network’s parameters.
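The sensitivity-propagation idea can be sketched for a scalar RNN h_t = tanh(w·h_{t-1} + u·x_t) (an assumed toy model, not Fig. 15.10's fully connected network): the sensitivities p_w = dh/dw and p_u = dh/du are carried *forward* in time, and the weights are updated immediately at every step.

```python
import math

def rtrl_step(h_prev, x, y, w, u, p_w, p_u, lr):
    a = w * h_prev + u * x
    h = math.tanh(a)
    d = 1 - h * h                     # tanh'(a)
    # forward sensitivity recursions: new dh/dw and dh/du
    p_w = d * (h_prev + w * p_w)
    p_u = d * (x + w * p_u)
    err = h - y
    w -= lr * err * p_w               # immediate (real-time) weight updates
    u -= lr * err * p_u
    return h, w, u, p_w, p_u

def target_stream(xs, w_true=0.5, u_true=0.8):
    # stream generated by an assumed "true" network to be identified online
    h, ys = 0.0, []
    for x in xs:
        h = math.tanh(w_true * h + u_true * x)
        ys.append(h)
    return ys

def sse(w_, u_, xs, ys):
    h, s = 0.0, 0.0
    for x, y in zip(xs, ys):
        h = math.tanh(w_ * h + u_ * x)
        s += (h - y) ** 2
    return s

xs = [0.1 * ((i % 7) - 3) for i in range(500)]
ys = target_stream(xs)
h, w, u, p_w, p_u = 0.0, 0.0, 0.0, 0.0, 0.0
for x, y in zip(xs, ys):
    h, w, u, p_w, p_u = rtrl_step(h, x, y, w, u, p_w, p_u, lr=0.2)
print(round(w, 2), round(u, 2))
```

In contrast to BPTT, nothing is stored per time step here: the running sensitivities summarize the entire past, which is what makes the algorithm suitable for streams with no natural reset points.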
Operating Principles of the Hopfield Network
One example of a recurrent network is the Hopfield network.
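To make the storage-and-recall cycle concrete, here is a minimal pure-Python sketch (an assumed toy example, not from the textbook): two orthogonal bipolar patterns are stored with the outer-product (Hebbian) rule, and one of them is recovered from a corrupted probe by repeated asynchronous sign updates.

```python
import random

random.seed(1)
N = 16
p0 = [1] * 8 + [-1] * 8          # two orthogonal stored patterns
p1 = [1, -1] * 8

# Outer-product rule: w_ij = (1/N) * sum over patterns of x_i * x_j, zero diagonal
W = [[0.0] * N for _ in range(N)]
for p in (p0, p1):
    for i in range(N):
        for j in range(N):
            if i != j:
                W[i][j] += p[i] * p[j] / N

def recall(probe, steps=300):
    s = probe[:]
    for _ in range(steps):
        i = random.randrange(N)                          # asynchronous update
        field = sum(W[i][j] * s[j] for j in range(N))    # local induced field
        s[i] = 1 if field >= 0 else -1
    return s

probe = p0[:]
for i in (0, 5, 11):             # corrupt three bits of the first pattern
    probe[i] = -probe[i]
print(recall(probe) == p0)
```

Each asynchronous update can only lower the network's energy, so the state settles into the nearest stored pattern; the storage rule used here is the outer-product method discussed later in the module.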
Stability Conditions of the Hopfield Network
Associative Memories
Pseudoinverse Matrix Method
Storage Capacity of Memories
Design Aspects of the Hopfield Network