NEURAL NETWORKS AND DEEP LEARNING

Subject Code - CSE_ 3273

CREDITS – 03

Module -3
DYNAMICALLY DRIVEN RECURRENT NETWORKS AND RECURRENT HOPFIELD NETWORKS

Topics


Introduction
Recurrent Network Architectures
Universal Approximation Theorem
Controllability and Observability
Computational Power of Recurrent Networks
Learning Algorithms
Back Propagation Through Time
Real-Time Recurrent Learning
Operating Principles of the Hopfield Network
Stability Conditions of the Hopfield Network, Associative Memories
Outer Product Method, Pseudoinverse Matrix Method
Storage Capacity of Memories, Design Aspects of the Hopfield Network
Case studies

Text Book 1: 15.1-15.8, relevant journals. Text Book 2: 7.1-7.5, relevant journals
INTRODUCTION

Global feedback is a facilitator of computational intelligence.

Global feedback in a recurrent network makes it possible to achieve some useful tasks:

➢ Content-addressable memory, exemplified by the Hopfield network;


➢ Autoassociation, exemplified by Anderson’s brain-state-in-a-box model;
➢ Dynamic reconstruction of a chaotic process, using feedback built around a regularized one-step predictor.
Content-addressable memory (CAM) is a type of computer memory that searches for data based on its content rather than its address.

An autoassociative network is a memory that is able to retrieve a stored piece of data from only a small sample of it.

Dynamic reconstruction of a chaotic process refers to mathematically recreating the full dynamics of a chaotic system from only a limited set of observed data points.
INTRODUCTION

Important application of recurrent networks: input–output mapping

Consider, for example, a multilayer perceptron with a single hidden layer as the basic building
block of a recurrent network. The application of global feedback around the multilayer
perceptron can take a variety of forms. We may apply feedback from the outputs of the hidden
layer of the multilayer perceptron to the input layer. Alternatively, we may apply the
feedback from the output layer to the input of the hidden layer. We may even go one step
further and combine all these possible feedback loops in a single recurrent network
structure. We may also, of course, consider other neural network configurations as the building
blocks for the construction of recurrent networks. The important point is that recurrent networks
have a very rich repertoire of architectural layouts, which makes them all the more powerful in
computational terms.
INTRODUCTION

By definition, the input space of a mapping network is mapped onto an output space. For this
kind of application, a recurrent network responds temporally to an externally applied input
signal. We may therefore speak of the recurrent networks considered in this chapter as
dynamically driven recurrent networks—hence the title of the chapter.
Moreover, the application of feedback enables recurrent networks to acquire state
representations, which makes them desirable tools for such diverse applications as nonlinear
prediction and modeling, adaptive equalization of communication channels, speech
processing, and plant control, to name just a few.
RECURRENT NETWORK ARCHITECTURES

Four specific network architectures, each of which highlights a specific form of global
feedback. They share the following common features:

• They all incorporate a static multilayer perceptron or parts thereof.


• They all exploit the nonlinear mapping capability of the multilayer perceptron.

➢ Input–Output Recurrent Model

➢ State-Space Model
➢ Recurrent Multilayer Perceptrons

➢ Second-Order Network
Input–Output Recurrent Model

Figure 15.1 shows the architecture of a generic recurrent network that follows naturally from a multilayer perceptron. The model has a single input that is applied to a tapped-delay-line memory of q units. It has a single output that is fed back to the input via another tapped-delay-line memory, also of q units. The contents of these two tapped-delay-line memories are used to feed the input layer of the multilayer perceptron.

The present value of the model input is denoted by u(n), and the corresponding value of the model output is denoted by y(n + 1); that is, the output is ahead of the input by one time unit.
Input–Output Recurrent Model

Thus, the signal vector applied to the input layer of the multilayer perceptron consists of a data window made up of the following components:

➢ present and past values of the input, namely u(n), u(n - 1), ..., u(n - q + 1), which represent exogenous inputs originating from outside the network;

➢ delayed values of the output, namely y(n), y(n - 1), ..., y(n - q + 1), on which the model output y(n + 1) is regressed.
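The loop below is a minimal sketch (not taken from the slides) of this input–output recurrent model: a small static MLP is fed by the two tapped-delay lines, and its one-step-ahead prediction is fed back into the output delay line. The value q = 4, the layer sizes, and the random input sequence are illustrative assumptions.

```python
# Sketch of the input-output recurrent model: an MLP fed by two tapped-delay lines.
import torch
import torch.nn as nn

q = 4                        # length of each tapped-delay-line memory (assumed)
mlp = nn.Sequential(         # static MLP used as the building block
    nn.Linear(2 * q, 10),
    nn.Tanh(),
    nn.Linear(10, 1),
)

u = torch.randn(100)                  # example input sequence u(0), ..., u(99)
u_buf = torch.zeros(q)                # tapped-delay line holding past inputs
y_buf = torch.zeros(q)                # tapped-delay line holding fed-back outputs
outputs = []

for n in range(len(u)):
    u_buf = torch.cat([u[n].view(1), u_buf[:-1]])               # shift in u(n)
    window = torch.cat([u_buf, y_buf])                           # data window fed to the MLP
    y_next = mlp(window)                                         # prediction y(n + 1)
    y_buf = torch.cat([y_next.detach().view(1), y_buf[:-1]])     # feed the output back
    outputs.append(y_next)
```

The fed-back output is detached here because the sketch only illustrates the architecture; training the model with back propagation through time would keep the computation graph instead.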
State-Space Model

In the state-space model, the state of the network is defined by the output of the hidden layer, which is fed back to the input layer via a set of unit-time delays.
Recurrent Multilayer Perceptrons
Second-Order Network

The term “order” has so far referred to the number of hidden neurons whose outputs are fed back to the input layer via a bank of unit-time delays. In a second-order network, by contrast, the hidden neurons are second-order neurons: their induced local fields combine state signals and input signals multiplicatively, so each weight couples a (state, input) pair to a next state.

In light of this relationship, second-order networks are readily used for representing and learning deterministic finite-state automata (DFA); a DFA is an information-processing system with a finite number of states.
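As a hedged illustration (the dimensions, random weights, and logistic activation are assumptions, not taken from the slides), the sketch below implements one state transition of a second-order recurrent unit, x_k(n + 1) = φ(b_k + Σ_i Σ_j w_kij x_i(n) u_j(n)), where each weight w_kij plays the role of a DFA transition entry for the pair (state i, input symbol j).

```python
# One state transition of a second-order recurrent unit.
import numpy as np

rng = np.random.default_rng(0)
K, J = 4, 2                      # number of state neurons and of input lines (assumed)
w = rng.normal(size=(K, K, J))   # second-order weights w[k, i, j]
b = np.zeros(K)                  # biases

def step(x, u):
    """One state transition x(n) -> x(n + 1) for input vector u(n)."""
    v = b + np.einsum('kij,i,j->k', w, x, u)   # v_k = b_k + sum_ij w_kij x_i u_j
    return 1.0 / (1.0 + np.exp(-v))            # logistic activation

x = rng.uniform(size=K)          # initial state
u = np.array([1.0, 0.0])         # one-hot encoding of the current input symbol
x = step(x, u)
```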
UNIVERSAL APPROXIMATION THEOREM

The universal approximation theorem states that a feedforward network with a single hidden layer containing a sufficient number of neurons can approximate any continuous function on a compact input domain to any desired accuracy. It is a fundamental result in machine learning and neural networks.
UNIVERSAL APPROXIMATION THEOREM

➢ Increased the number of neurons in the hidden layer: changed from 10 neurons to 50 neurons.

➢ Changed the activation function: replaced ReLU() with Tanh().

➢ Increased the number of training epochs: increased from 2000 epochs to 5000 epochs.
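The following is a runnable sketch of the kind of experiment the slide describes, assuming PyTorch and an illustrative target function sin(x): a single hidden layer of 50 Tanh neurons trained for 5000 epochs approximates the continuous target closely, in line with the theorem.

```python
# Single-hidden-layer network approximating a continuous function (universal approximation).
import torch
import torch.nn as nn

torch.manual_seed(0)
x = torch.linspace(-3.0, 3.0, 200).unsqueeze(1)   # inputs on a compact interval
y = torch.sin(x)                                   # continuous target function (assumed)

model = nn.Sequential(nn.Linear(1, 50), nn.Tanh(), nn.Linear(50, 1))
optimizer = torch.optim.Adam(model.parameters(), lr=1e-2)
loss_fn = nn.MSELoss()

for epoch in range(5000):
    optimizer.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    optimizer.step()

print(f"final MSE: {loss.item():.6f}")   # small error illustrates the theorem in practice
```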
CONTROLLABILITY AND OBSERVABILITY

Many recurrent networks can be represented by the state-space model, where the state is defined by the
output of the hidden layer fed back to the input layer via a set of unit-time delays. In this context, it is
insightful to know whether the recurrent network is controllable and observable or not.

Controllability is concerned with whether we can control the dynamic behavior of the recurrent
network. Observability is concerned with whether we can observe the result of the control applied to the
recurrent network.

Formally, a dynamic system is said to be controllable if any initial state of the system is steerable to any
desired state within a finite number of time-steps; the output of the system is irrelevant to this definition.
Correspondingly, the system is said to be observable if the state of the system can be determined from a
finite set of input–output measurements.
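As a minimal sketch of these two properties for a linear (or linearized) state-space model x(n + 1) = Ax(n) + Bu(n), y(n) = Cx(n), the classical rank tests below check controllability and observability. The matrices are random placeholders, not taken from the slides; the connection to the recurrent network is through the linearization used in the local results that follow.

```python
# Kalman rank tests for a linear (or linearized) state-space model.
import numpy as np

rng = np.random.default_rng(1)
q, m, p = 3, 1, 1                       # state, input, and output dimensions (assumed)
A = rng.normal(size=(q, q))             # placeholder system matrices
B = rng.normal(size=(q, m))
C = rng.normal(size=(p, q))

ctrl = np.hstack([np.linalg.matrix_power(A, k) @ B for k in range(q)])   # [B, AB, A^2 B]
obsv = np.vstack([C @ np.linalg.matrix_power(A, k) for k in range(q)])   # [C; CA; CA^2]

print("controllable:", np.linalg.matrix_rank(ctrl) == q)   # full rank => controllable
print("observable:  ", np.linalg.matrix_rank(obsv) == q)   # full rank => observable
```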
CONTROLLABILITY AND OBSERVABILITY

Local controllability and local observability—local in the sense that both properties apply in the
neighborhood of an equilibrium state of the network

Local Controllability

Let a recurrent network be defined by Eqs. (15.16) and (15.17), and let its linearized
version around the origin (i.e., equilibrium point) be defined by Eqs. (15.19) and (15.20). If
the linearized system is controllable, then the recurrent network is locally controllable
around the origin.
CONTROLLABILITY AND OBSERVABILITY

Local Observability

Let a recurrent network be defined by Eqs. (15.16) and (15.17), and let its linearized version
around the origin (i.e., equilibrium point) be defined by Eqs. (15.19) and (15.20). If the
linearized system is observable, then the recurrent network is locally observable around the
origin.
NARX: Nonlinear Autoregressive Network with Exogenous Inputs
COMPUTATIONAL POWER OF RECURRENT NETWORKS

Finite automata are abstract machines used to recognize patterns in input sequences.

The early work on recurrent networks used hard threshold logic for the activation
function of a neuron rather than soft sigmoid functions.
COMPUTATIONAL POWER OF RECURRENT NETWORKS

The computational power of recurrent networks is summarized by two theorems and a corollary:

➢ Theorem I (Siegelmann and Sontag, 1991): All Turing machines may be simulated by fully connected recurrent networks built on neurons with sigmoidal activation functions.

➢ Theorem II (Siegelmann et al., 1997): NARX networks with one layer of hidden neurons with bounded, one-sided saturated (BOSS) activation functions and a linear output neuron can simulate fully connected recurrent networks with bounded, one-sided saturated activation functions, except for a linear slowdown.

➢ Corollary: NARX networks with one hidden layer of neurons with BOSS activation functions and a linear output neuron are Turing equivalent.

A nonlinear autoregressive network with exogenous inputs (NARX) is a type of artificial neural network (ANN) that is used to model nonlinear systems. It is often used for time-series prediction.
COMPUTATIONAL POWER OF RECURRENT NETWORKS

Figure 15.8 presents a portrayal of Theorems I and II and this corollary. It should, however, be noted that
when the network architecture is constrained, the computational power of a recurrent network may no
longer hold, as described in Sperduti (1997).

A Turing machine is a mathematical model of a computing device that manipulates symbols on a tape.
LEARNING ALGORITHMS

There are two modes of training an ordinary (static) multilayer perceptron: batch mode and
stochastic (sequential) mode.

In the batch mode, the sensitivity of the network is computed for the entire training sample before
adjusting the free parameters of the network.

In the stochastic mode, on the other hand, parameter adjustments are made after the presentation of
each pattern in the training sample.
LEARNING ALGORITHMS

Likewise, we have two modes of training a recurrent network, described as follows (Williams and
Zipser, 1995):

Epochwise training.

➢ For a given epoch, the recurrent network uses a temporal sequence of input–target response pairs
and starts running from some initial state until it reaches a new state, at which point the training is
stopped and the network is reset to an initial state for the next epoch.

➢ The initial state doesn’t have to be the same for each epoch of training. Rather, what is important
is for the initial state for the new epoch to be different from the state reached by the network at the
end of the previous epoch. Consider, for example, the use of a recurrent network to emulate the
operation of a finite-state machine.

In conventional machine-learning usage, epochwise training means training a model on the entire dataset multiple times (in “epochs”), where an epoch is one complete pass over the entire training dataset; as explained below, the term is used in a different sense for recurrent networks.
LEARNING ALGORITHMS

Epochwise training.

➢ In such a situation, it is reasonable to use epochwise training, since there is a good possibility that
a number of distinct initial states and a set of distinct final states in the machine will be emulated
by the recurrent network. In epochwise training for recurrent networks, the term “epoch” is used
in a sense different from that for an ordinary multilayer perceptron.

➢ Although an epoch in the training of a multilayer perceptron involves the entire training sample of
input–target response pairs, an epoch in the training of a recurrent neural network involves a
single string of temporally consecutive input–target response pairs.
LEARNING ALGORITHMS

Continuous training. This second method of training is suitable for situations where there are
no reset states available or on-line learning is required. The distinguishing feature of
continuous training is that the network learns while performing signal processing.

Simply put, the learning process never stops. Consider, for example, the use of a recurrent network to model a nonstationary process such as a speech signal. In this kind of situation, continuous operation of the network offers no convenient times at which to stop the training and begin anew with different values for the free parameters of the network.
LEARNING ALGORITHMS


Keeping these two modes of training in mind, in the next two sections we will describe two
different learning algorithms for recurrent networks, summarized as follows:

➢ The back-propagation-through-time (BPTT) algorithm operates on the premise that the temporal operation of a recurrent network may be unfolded into a multilayer perceptron. This condition would then pave the way for application of the standard back-propagation algorithm. The back-propagation-through-time algorithm can be implemented in the epochwise mode, the continuous (real-time) mode, or a combination thereof.

➢ The real-time recurrent learning (RTRL) algorithm derives its name from the fact that adjustments are made to the synaptic weights of a fully connected recurrent network in real time, that is, while the network continues to perform its signal-processing function.

LEARNING ALGORITHMS


Basically, BPTT and RTRL involve the propagation of derivatives, one in the backward direction and
the other in the forward direction. They can be used in any training process that requires the use of
derivatives. BPTT requires less computation than RTRL does, but the memory space required by BPTT
increases fast as the length of a sequence of consecutive input–target response pairs increases. Generally
speaking, we therefore find that BPTT is better for off-line training, and RTRL is more suitable for
on-line continuous training.

In any event, these two algorithms share many common features. First, they are both based on the
method of gradient descent, whereby the instantaneous value of a cost function (based on a squared-
error criterion) is minimized with respect to the synaptic weights of the network. Second, they are
both relatively simple to implement, but can be slow to converge. Third, they are related in that the
signal-flow graph representation of the back-propagation-through-time algorithm can be obtained from
transposition of the signal-flow graph representation of a certain form of the real-time recurrent learning
algorithm (Lefebvre, 1991; Beaufays and Wan, 1994).
BACK PROPAGATION THROUGH TIME

Backpropagation Through Time (BPTT) is an extension of the standard backpropagation algorithm used for training Recurrent Neural Networks (RNNs). Since RNNs process sequential data, BPTT is designed to handle time-dependent relationships by unrolling the network over multiple time steps and propagating errors backward through time.
BACK PROPAGATION THROUGH TIME

"Unfolding"
•Instead of treating the RNN as a single network with loops, we expand or unroll it into multiple
layers.
•Each "layer" in this unfolded network represents the same RNN cell, but at a different time step.
•The weights are shared across these layers because the same RNN cell is used at every time
step.
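The loop below is a minimal sketch of this unfolding (the layer sizes and the use of PyTorch's RNNCell are illustrative assumptions): the same cell object is applied at every time step, so the weights are shared across the unrolled layers, and a single backward() call propagates the error back through all the steps.

```python
# Unfolding an RNN over time: one shared cell applied at every time step.
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
cell = nn.RNNCell(input_size=3, hidden_size=5)    # one cell reused at every time step
readout = nn.Linear(5, 1)
optimizer = torch.optim.SGD(list(cell.parameters()) + list(readout.parameters()), lr=0.01)

inputs = torch.randn(10, 3)        # a sequence of 10 time steps (placeholder data)
targets = torch.randn(10, 1)

h = torch.zeros(1, 5)              # initial state for this epoch (sequence)
loss = 0.0
for t in range(10):                # "unfolding": one layer per time step, shared weights
    h = cell(inputs[t].unsqueeze(0), h)
    loss = loss + F.mse_loss(readout(h), targets[t].unsqueeze(0))

optimizer.zero_grad()
loss.backward()                    # errors propagate backward through all 10 steps (BPTT)
optimizer.step()
```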
BACK PROPAGATION THROUGH TIME

Application of the unfolding procedure leads to two basically different implementations of back propagation through time, depending on whether epochwise training or continuous (real-time) training is used.
BACK PROPAGATION THROUGH TIME

Epochwise Back Propagation Through Time

In the epochwise mode, the network is unfolded over the whole epoch: the forward pass stores the data for the entire interval, a single backward pass over the unfolded network computes the local gradients for every time step, and each weight adjustment is the sum of the adjustments computed over all the time steps of the epoch.
BACK PROPAGATION THROUGH TIME

Backpropagation Through Time (BPTT) is an extension of the backpropagation algorithm used for
training Recurrent Neural Networks (RNNs). Since RNNs process sequential data, standard
backpropagation cannot be directly applied due to the temporal dependencies between time steps.
BPTT overcomes this by unrolling the RNN over time and applying backpropagation across the
unfolded network.

BPTT is widely used in applications like speech recognition, natural language processing, and
time-series prediction.

Unlike feedforward networks, RNNs have hidden states that retain memory of previous inputs. This
makes the standard backpropagation method unsuitable because weight updates must consider past
dependencies.
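For the continuous (real-time) mode, a common approximation is the truncated BPTT mentioned in the RTRL comparison later in this module: the state is detached every few steps so that gradients flow back over only a fixed window and memory stays bounded. The sketch below assumes PyTorch, illustrative sizes, and a truncation depth of 4.

```python
# Truncated BPTT for continuous (on-line) training of an RNN.
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
cell = nn.RNNCell(input_size=3, hidden_size=5)
readout = nn.Linear(5, 1)
optimizer = torch.optim.SGD(list(cell.parameters()) + list(readout.parameters()), lr=0.01)

h_trunc = 4                         # truncation depth (window length, assumed)
state = torch.zeros(1, 5)
stream = torch.randn(100, 3)        # a long (conceptually endless) input stream
targets = torch.randn(100, 1)

for start in range(0, 100, h_trunc):
    state = state.detach()          # cut the graph: errors do not flow past this point
    loss = 0.0
    for t in range(start, start + h_trunc):
        state = cell(stream[t].unsqueeze(0), state)
        loss = loss + F.mse_loss(readout(state), targets[t].unsqueeze(0))
    optimizer.zero_grad()
    loss.backward()                 # BPTT over the last h_trunc steps only
    optimizer.step()
```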
REAL-TIME RECURRENT LEARNING

In this section, we describe the second learning algorithm, real-time recurrent learning (RTRL),
which was briefly described in Section 15.6. The algorithm derives its name from the fact that
adjustments are made to the synaptic weights of a fully connected recurrent network in real
time—that is, while the network continues to perform its signal-processing function (Williams
and Zipser, 1989). Figure 15.10 shows the layout of such a recurrent network. It consists of q
neurons with m external inputs. The network has two distinct layers: a concatenated input-
feedback layer and a processing layer of computation nodes. Correspondingly, the synaptic
connections of the network are made up of feedforward and feedback connections; the feedback
connections are shown in red in Fig. 15.10.
REAL-TIME RECURRENT LEARNING

Real-Time Recurrent Learning (RTRL) is an algorithm used to train recurrent neural networks
(RNNs) in an online, real-time manner. Unlike backpropagation through time (BPTT), which
requires unrolling the network over multiple time steps, RTRL updates network weights
incrementally as each new input is received.

RTRL is particularly useful for applications requiring continuous learning and adaptation, such
as robotics, control systems, and speech processing.
REAL-TIME RECURRENT LEARNING

•Handles Time Dependencies: It effectively updates weights for sequences without needing to
store past activations.

•Online Learning: Unlike BPTT, which requires waiting for a sequence to finish before
updating weights, RTRL updates the model in real-time.

•No Need for Truncation: Truncated BPTT approximates gradient updates by stopping at a
certain sequence length, while RTRL accounts for the full history.

•Handles Streaming Data: Works well with continuously arriving data, making it suitable for
real-world applications.

RTRL computes the exact gradient of a recurrent neural network with respect to its weights at
every time step. It does this by maintaining and updating a Jacobian matrix that captures how each
hidden unit’s state depends on the network’s parameters.
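The sketch below is one way to realize this, under assumed details (a small fully connected network of the form x(n + 1) = tanh(W [x(n); u(n); 1]), the first neuron taken as the output, and an illustrative target signal): the sensitivity array S[k, i, j] = dx_k/dw_ij is propagated forward in time alongside the state and used to adjust the weights at every step.

```python
# Real-time recurrent learning (RTRL) for a small fully connected recurrent network.
import numpy as np

rng = np.random.default_rng(0)
q, m = 5, 2                              # number of neurons and of external inputs (assumed)
W = 0.1 * rng.normal(size=(q, q + m + 1))
S = np.zeros((q, q, q + m + 1))          # sensitivities dx_k / dw_ij
x = np.zeros(q)                          # network state
eta = 0.05

for n in range(200):                     # a streaming sequence of (input, target) pairs
    u = rng.normal(size=m)
    d = np.sin(0.1 * n)                  # illustrative target for the output neuron

    xi = np.concatenate([x, u, [1.0]])   # concatenated state-input (plus bias) vector
    x_new = np.tanh(W @ xi)

    # Forward propagation of the sensitivities (the "Jacobian" the text refers to):
    recur = np.einsum('kl,lij->kij', W[:, :q], S)       # effect through the fed-back state
    direct = np.einsum('ki,j->kij', np.eye(q), xi)      # direct effect of w_ij on neuron k
    S = (1.0 - x_new**2)[:, None, None] * (recur + direct)

    e = d - x_new[0]                     # error at the designated output neuron
    W += eta * e * S[0]                  # real-time gradient step using dx_0/dW
    x = x_new
```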
Operating Principles of the Hopfield Network

The Hopfield network is one example of a recurrent network.
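As a hedged sketch of the operating principles (the pattern contents, the number of stored patterns, and the amount of corruption are illustrative assumptions), the code below stores a few bipolar patterns with the outer-product (Hebbian) rule and then recalls one of them from a corrupted probe by repeated asynchronous sign updates, which never increase the energy E = -1/2 sᵀWs.

```python
# Discrete Hopfield network acting as a content-addressable memory.
import numpy as np

rng = np.random.default_rng(0)
N = 100                                       # number of neurons
patterns = rng.choice([-1, 1], size=(3, N))   # a few bipolar patterns to store

# Outer-product (Hebbian) storage rule, with zero self-connections.
W = sum(np.outer(p, p) for p in patterns) / N
np.fill_diagonal(W, 0.0)

def energy(s):
    return -0.5 * s @ W @ s

# Start from a corrupted version of the first stored pattern.
s = patterns[0].copy()
flip = rng.choice(N, size=15, replace=False)
s[flip] *= -1

for _ in range(5):                            # a few asynchronous sweeps
    for i in rng.permutation(N):
        s[i] = 1 if W[i] @ s >= 0 else -1     # sign update of one neuron at a time

print("recovered first pattern:", np.array_equal(s, patterns[0]))
print("final energy:", energy(s))
```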


Stability Conditions of the Hopfield Network

A Hopfield network with symmetric weights and zero self-connections is stable under asynchronous updating: each single-neuron update can only decrease (or leave unchanged) the energy function E = -1/2 Σ_i Σ_j w_ij s_i s_j, so the state converges to a fixed point, which acts as an attractor of the associative memory.
Associative Memories
Associative Memories: Pseudoinverse Matrix Method

In the pseudoinverse (projection) method, the weight matrix is computed as W = ΞΞ⁺, where Ξ is the matrix whose columns are the stored patterns and Ξ⁺ is its pseudoinverse; this allows correct recall of up to N linearly independent patterns.
Associative Memories: Storage Capacity of Memories

For the outer-product (Hebbian) storage rule, the number of patterns that can be stored reliably is only a small fraction of the number of neurons N: roughly 0.14N for a small probability of recall error, or about N/(2 ln N) for essentially error-free recall.
Design Aspects of the Hopfield Network
