CCS355 Neural Networks and Deep learning
UNIT II
ASSOCIATIVE MEMORY AND
UNSUPERVISED LEARNING NETWORKS
Training Algorithms for Pattern Association-Auto associative Memory
Network-Hetero associative Memory Network-Bidirectional Associative
Memory (BAM)-Hopfield Networks-Iterative Auto associative Memory
Networks-Temporal Associative Memory Network-Fixed Weight Competitive
Nets-Kohonen Self-Organizing Feature Maps-Learning Vector Quantization-
Counter propagation Networks-Adaptive Resonance Theory Network.
Arunachala College of Engineering for Women 27
2.1 ASSOCIATIVE MEMORY NETWORKS
An associative memory network can store a set of patterns as memories. When the
associative memory is presented with a key pattern, it responds by producing the stored
pattern which most closely resembles or relates to the key pattern. Thus, the recall is
through association of the key pattern with the information memorized. These types
of memories are also called content-addressable memories (CAM). The CAM can be
viewed as associating data to addresses, i.e., for every data item in the memory there is a
corresponding unique address. It can also be viewed as a data correlator.
Here the input data is correlated with the stored data in the CAM. It should be
noted that the stored patterns must be unique, i.e., different patterns in each location. If the
same pattern exists in more than one location in the CAM, then, even though the correlation
is correct, the address is ambiguous. Associative memory makes a parallel search
within a stored data file. The concept behind this search is to output any one or all stored
items that match the given search argument.
TRAINING ALGORITHMS FOR PATTERN ASSOCIATION
There are two algorithms developed for training of pattern association nets.
1. Hebb Rule
2. Outer Products Rule
1. Hebb Rule
The Hebb rule is widely used for finding the weights of an associative memory neural
network. The training vector pairs here are denoted as s:t. The weights are updated until there
is no weight change.
Hebb Rule Algorithm
Step 0: Set all the initial weights to zero, i.e.,
wij = 0 (i = 1 to n, j = 1 to m)
Step 1: For each training input-target output vector pair s:t, perform Steps 2-4.
Step 2: Activate the input layer units to the current training input,
xi = si (for i = 1 to n)
Step 3: Activate the output layer units to the current target output,
yj = tj (for j = 1 to m)
Step 4: Start the weight adjustment,
wij(new) = wij(old) + xi yj (i = 1 to n, j = 1 to m)
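As a concrete sketch, the steps above can be written in a few lines of Python; the training pair used here is a hypothetical bipolar example, not one from the text.

```python
import numpy as np

def hebb_train(pairs):
    """Hebb rule: start from zero weights (Step 0) and accumulate
    wij(new) = wij(old) + xi*yj over all training pairs s:t (Steps 1-4)."""
    n, m = len(pairs[0][0]), len(pairs[0][1])
    W = np.zeros((n, m), dtype=int)
    for s, t in pairs:
        x = np.array(s)            # Step 2: xi = si
        y = np.array(t)            # Step 3: yj = tj
        W += np.outer(x, y)        # Step 4: weight adjustment
    return W

# One bipolar training pair (example values)
W = hebb_train([([1, -1, 1], [1, -1])])
```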
2. Outer Products Rule
Outer products rule is a method for finding weights of an associative net.
Input=> s = (s1, ... ,si, ... ,sn)
Output=> t= (t1, ... ,tj, ... ,tm)
The outer product of the two vectors is the matrix product of S = sT, an [n x 1] matrix,
and T = t, a [1 x m] matrix. The transpose is taken of the input vector:
W = sT t, i.e., wij = si tj
This weight matrix is the same as the weight matrix obtained by the Hebb rule to store the pattern
association s:t. For storing a set of associations, s(p):t(p), p = 1 to P,
wherein,
s(p) = (s1(p), ..., si(p), ..., sn(p))
t(p) = (t1(p), ..., tj(p), ..., tm(p))
the weight matrix W = {wij} can be given as
wij = Σp si(p) tj(p), with the sum taken over p = 1 to P
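Storing a set of associations by the outer products rule amounts to a single matrix product; a minimal NumPy sketch follows (the pattern values are invented for illustration).

```python
import numpy as np

# P = 2 bipolar association pairs s(p):t(p) (example values)
S = np.array([[ 1, -1, 1],
              [-1,  1, 1]])   # P x n matrix, one input vector per row
T = np.array([[ 1, -1],
              [-1,  1]])      # P x m matrix, one target vector per row

# W = sum over p of s(p)^T t(p), i.e. the [n x 1][1 x m] outer products summed
W = S.T @ T
```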
There are two types of associative memories:
Auto Associative Memory
Hetero Associative memory
2.2 AUTO ASSOCIATIVE MEMORY NETWORK
An auto-associative memory recovers a previously stored pattern that most closely relates to
the current pattern. It is also known as an auto-associative correlator. In the auto associative
memory network, the training input vector and training output vector are the same.
Auto Associative Memory Algorithm
Training Algorithm
For training, this network uses the Hebb or delta learning rule.
Step 1 − Initialize all the weights to zero: wij = 0 (i = 1 to n, j = 1 to n)
Step 2 − Perform Steps 3-4 for each input vector.
Step 3 − Activate each input unit as follows −
xi = si (i = 1 to n)
Step 4 − Activate each output unit as follows −
yj = sj (j = 1 to n)
Step 5 − Adjust the weights as follows −
wij(new) = wij(old) + xi yj
The weights can also be determined from the Hebb rule or the outer products rule.
Testing Algorithm
Step 1 − Set the weights obtained during training by Hebb's rule.
Step 2 − Perform Steps 3-5 for each input vector.
Step 3 − Set the activation of the input units equal to that of the input vector.
Step 4 − Calculate the net input to each output unit (j = 1 to n):
yinj = Σi xi wij
Step 5 − Apply the activation function over the net input to calculate the output:
yj = +1 if yinj > 0; yj = −1 if yinj ≤ 0
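Both phases can be sketched compactly; the stored bipolar pattern below is an invented example, and the recall uses a bipolar step activation with the tie broken toward +1.

```python
import numpy as np

def auto_store(patterns):
    """Training: W = sum_p s(p)^T s(p) (input and target are the same)."""
    S = np.array(patterns)
    return S.T @ S

def auto_recall(W, x):
    """Testing: net input yinj = sum_i xi*wij, then yj = +1 if yinj >= 0 else -1."""
    return np.where(np.array(x) @ W >= 0, 1, -1)

stored = [1, 1, -1, -1]                  # a bipolar pattern (example)
W = auto_store([stored])
noisy = [1, -1, -1, -1]                  # same pattern with one bit flipped
```

Presenting the noisy key recovers the stored pattern, which is the point of auto-association.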
2.3 HETERO ASSOCIATIVE MEMORY NETWORK
In a hetero-associate memory, the training input and the target output vectors are different.
The weights are determined in a way that the network can store a set of pattern associations.
The association here is a set of training input-target output vector pairs (s(p), t(p)), with p =
1, 2, ..., P. Each vector s(p) has n components and each vector t(p) has m components. The
determination of weights is done either by using the Hebb rule or the delta rule. The net finds an
appropriate output vector, which corresponds to an input vector x, that may be either one of
the stored patterns or a new pattern.
Hetero Associative Memory Algorithm
Training Algorithm
Step 1 − Initialize all the weights to zero: wij = 0 (i = 1 to n, j = 1 to m)
Step 2 − Perform Steps 3-4 for each input vector.
Step 3 − Activate each input unit as follows −
xi = si (i = 1 to n)
Step 4 − Activate each output unit to the target output as follows −
yj = tj (j = 1 to m)
Step 5 − Adjust the weights as follows −
wij(new) = wij(old) + xi yj
The weights can also be determined from the Hebb rule or the outer products rule.
Testing Algorithm
Step 1 − Set the weights obtained during training by Hebb's rule.
Step 2 − Perform Steps 3-5 for each input vector.
Step 3 − Set the activation of the input units equal to that of the input vector.
Step 4 − Calculate the net input to each output unit (j = 1 to m):
yinj = Σi xi wij
Step 5 − Apply the activation function over the net input to calculate the output:
yj = +1 if yinj > 0; yj = 0 if yinj = 0; yj = −1 if yinj < 0
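The testing phase can be sketched as follows, reusing weights from the outer-product rule; the pair values are illustrative, and the activation here maps a zero net input to +1 for simplicity.

```python
import numpy as np

def hetero_recall(W, x):
    """Net input yinj = sum_i xi*wij; bipolar step activation on each output."""
    return np.where(np.array(x) @ W >= 0, 1, -1)

# Store one n=4 -> m=2 association s:t by the outer-product rule
s, t = np.array([1, -1, 1, -1]), np.array([1, -1])
W = np.outer(s, t)
y = hetero_recall(W, s)                  # presenting the stored key
```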
2.4 BIDIRECTIONAL ASSOCIATIVE MEMORY (BAM)
Bidirectional associative memory (BAM) was first proposed by Bart Kosko in 1988. The
BAM network performs forward and backward associative searches for stored stimulus
responses. The BAM is a recurrent hetero associative pattern-matching network that encodes
binary or bipolar patterns using the Hebbian learning rule. It associates patterns from set A
to patterns from set B, and vice versa. BAM neural nets can respond to input
from either layer (input layer or output layer).
Bidirectional Associative Memory Architecture
The architecture of the BAM network consists of two layers of neurons connected by
directed weighted path interconnections. The network dynamics involve two layers of
interaction. The BAM network iterates by sending signals back and forth between the two
layers until all the neurons reach equilibrium. The weights associated with the network are
bidirectional. Thus, BAM can respond to the inputs in either layer.
Figure shows a BAM network consisting of n units in the X layer and m units in the Y layer. The
layers are connected in both directions (bidirectional); the weight matrix for signals
sent from the X layer to the Y layer is W, and the weight matrix for signals sent from the Y
layer to the X layer is WT. Thus, the weight matrix is calculated in both directions.
Determination of Weights
Let the input vectors be denoted by s(p) and target vectors by t(p). p = 1, ... , P. Then the
weight matrix to store a set of input and target vectors, where
s(p) = (s1(p), .. , si(p), ... , sn(p))
t(p) = (t1(p), .. , tj(p), ... , tm(p))
can be determined by the Hebb rule training algorithm. In case of binary input vectors, the
weight matrix W = {wij} is given by
wij = Σp [2si(p) − 1][2tj(p) − 1]
When the input vectors are bipolar, the weight matrix W = {wij} can be defined as
wij = Σp si(p) tj(p)
The activation function is based on whether the input-target vector pairs used are binary or
bipolar.
Testing Algorithm for Discrete Bidirectional Associative Memory
Step 0: Initialize the weights to store p vectors. Also initialize all the activations to zero.
Step 1: Perform Steps 2-6 for each testing input.
Step 2: Set the activations of the X layer to the current input pattern, i.e., present the input pattern
x to the X layer, and similarly present the input pattern y to the Y layer. Even though it is a
bidirectional memory, at one time step signals can be sent from only one layer; so one of
the input patterns may be the zero vector.
Step 3: Perform Steps 4-6 when the activations are not converged.
Step 4: Update the activations of units in the Y layer. Calculate the net input,
yinj = Σi xi wij
Applying the activations over the net input, we obtain yj.
Send this signal to the X layer.
Step 5: Update the activations of units in the X layer. Calculate the net input,
xini = Σj yj wij
Applying the activations over the net input, we obtain xi.
Send this signal to the Y layer.
Step 6: Test for convergence of the net. The convergence occurs if the activation vectors x
and y reach equilibrium. If this occurs, then stop; otherwise, continue.
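The back-and-forth recall of Steps 2-6 can be sketched as below; the stored bipolar pair is invented, and a fixed iteration cap stands in for the equilibrium test.

```python
import numpy as np

def bam_recall(W, x, steps=5):
    """Bounce signals between the layers: y = sgn(x W), then x = sgn(y W^T)."""
    sgn = lambda v: np.where(v >= 0, 1, -1)
    for _ in range(steps):                # iterate toward equilibrium (fixed cap here)
        y = sgn(x @ W)                    # Step 4: update Y layer, send to X
        x = sgn(y @ W.T)                  # Step 5: update X layer, send to Y
    return x, y

s, t = np.array([1, -1, 1]), np.array([-1, 1])
W = np.outer(s, t)                        # Hebb storage of the pair s:t
x_out, y_out = bam_recall(W, s)
```

Presenting s on the X layer settles at the stored pair (x_out = s, y_out = t), illustrating the bidirectional recall.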
2.5 HOPFIELD NEURAL NETWORK
The Hopfield neural network was proposed by John J. Hopfield in 1982. It is an auto-associative,
fully interconnected, single-layer feedback network. It is a symmetrically weighted network
(i.e., wij = wji). The Hopfield network is commonly used for auto-association and
optimization tasks.
The Hopfield network is of two types
1. Discrete Hopfield Network
2. Continuous Hopfield Network
Discrete Hopfield Network
When operated in a discrete-time fashion, it is called a discrete Hopfield network. The
network takes two-valued inputs, binary (0, 1) or bipolar (+1, −1); the use of bipolar inputs
makes the analysis easier. The network has symmetrical weights with no self-connections,
i.e.,
wij = wji
wij = 0 if i = j
Architecture of Discrete Hopfield Network
The Hopfield's model consists of processing elements with two outputs, one inverting and the
other non-inverting. The outputs from each processing element are fed back to the input of
other processing elements but not to itself.
Training Algorithm of Discrete Hopfield Network
During training of the discrete Hopfield network, the weights are determined. The input
vectors can be binary as well as bipolar.
Let the input vectors be denoted by s(p), p = 1, ..., P. Then the weight matrix W to store the set
of input vectors is determined as follows.
In case of binary input vectors, the weight matrix W = {wij} is given by
wij = Σp [2si(p) − 1][2sj(p) − 1] for i ≠ j
When the input vectors are bipolar, the weight matrix W = {wij} can be defined as
wij = Σp si(p) sj(p) for i ≠ j, with wii = 0
Testing Algorithm of Discrete Hopfield Net
Step 0: Initialize the weights to store patterns, i.e., weights obtained from training algorithm
using Hebb rule.
Step 1: When the activations of the net are not converged, then perform Steps 2-8.
Step 2: Perform Steps 3-7 for each input vector X.
Step 3: Make the initial activations of the net equal to the external input vector X:
yi = xi (for i = 1 to n)
Step 4: Perform Steps 5-7 for each unit yi. (Here, the units are updated in random order.)
Step 5: Calculate the net input of the network:
yini = xi + Σj yj wji
Step 6: Apply the activations over the net input to calculate the output:
yi = 1 if yini > θi; yi unchanged if yini = θi; yi = 0 if yini < θi
where θi is the threshold and is normally taken as zero.
Step 7: Now feed back the obtained output yi to all other units. Thus, the activation vectors
are updated.
Step 8: Finally, test the network for convergence.
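A minimal recall sketch, using the bipolar convention and asynchronous single-unit updates; the stored pattern and the noisy probe are invented examples.

```python
import numpy as np

def hopfield_store(patterns):
    """W = sum_p s(p)^T s(p), with zero diagonal (no self-connections)."""
    S = np.array(patterns)
    W = S.T @ S
    np.fill_diagonal(W, 0)
    return W

def hopfield_recall(W, x, sweeps=5):
    """Asynchronous recall: update one unit at a time with threshold zero."""
    y = np.array(x)
    for _ in range(sweeps):
        for i in range(len(y)):              # units taken one at a time
            net = y @ W[:, i]                # yin_i = sum_j yj * wji
            y[i] = 1 if net >= 0 else -1     # bipolar activation, theta_i = 0
    return y

W = hopfield_store([[1, 1, -1, -1]])         # one stored bipolar pattern
recalled = hopfield_recall(W, [-1, 1, -1, -1])
```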
Continuous Hopfield Network
The continuous network has time as a continuous variable, and can be used for associative
memory problems or optimization problems like the traveling salesman problem. The nodes of
this network have a continuous, graded output rather than a two-state binary output. Thus, the
energy of the network decreases continuously with time.
Model − The model or architecture can be built up by adding electrical components such as
amplifiers, which map the input voltage to the output voltage over a sigmoid activation
function.
Energy Function Evaluation
Here λ is the gain parameter and gri the input conductance.
2.6 FIXED WEIGHT COMPETITIVE NETS
In these competitive networks, the weights remain fixed even during the training process. The
idea of competition among neurons is used to enhance the contrast in their activation
functions. Two such networks are considered here: Maxnet and the Hamming network.
Maxnet
The Maxnet network was developed by Lippmann in 1987. The Maxnet serves as a subnet for
picking the node whose input is the largest. All the nodes present in this subnet are fully
interconnected, and symmetrical weights exist in all these weighted interconnections.
Architecture of Maxnet
In the architecture of Maxnet, fixed symmetrical weights are present over the weighted
interconnections. The weights between the neurons are inhibitory and fixed. The Maxnet with
this structure can be used as a subnet to select a particular node whose net input is the largest.
Testing Algorithm of Maxnet
The Maxnet uses the following activation function:
f(x) = x if x ≥ 0; f(x) = 0 if x < 0
Testing algorithm
Step 0: Initial weights and initial activations are set. The weight is set as ε, where 0 < ε < 1/m and
"m" is the total number of nodes. Let
xj(0) = input to node Xj
and
wij = 1 if i = j; wij = −ε if i ≠ j
Step 1: Perform Steps 2-4, when stopping condition is false.
Step 2: Update the activations of each node. For j = 1 to m,
xj(new) = f[xj(old) − ε Σk≠j xk(old)]
Step 3: Save the activations obtained for use in the next iteration. For j = 1 to m,
xj(old) = xj(new)
Step 4: Finally, test the stopping condition for convergence of the network. The following is
the stopping condition: If more than one node has a nonzero activation, continue; else stop.
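The iteration above can be sketched as follows; the initial activations and the value of ε are invented examples satisfying 0 < ε < 1/m.

```python
import numpy as np

def maxnet(activations, eps=0.15, max_iter=100):
    """Mutual inhibition: xj <- f(xj - eps * sum of the other activations),
    with f(x) = max(x, 0), until at most one node stays nonzero."""
    x = np.array(activations, dtype=float)
    for _ in range(max_iter):
        if np.count_nonzero(x) <= 1:           # stopping condition
            break
        x = np.maximum(0.0, x - eps * (x.sum() - x))
    return x

# Example initial activations; eps must satisfy 0 < eps < 1/m (here m = 4)
out = maxnet([0.2, 0.4, 0.6, 0.8])
```

Only the node with the largest initial input survives with a nonzero activation.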
Hamming Network
The Hamming network is a two-layer feedforward neural network for classification of binary/
bipolar n-tuple input vectors using the minimum Hamming distance, denoted DH (Lippmann,
1987). The first layer is the input layer for the n-tuple input vectors. The second layer (also
called the memory layer) stores p memory patterns. A p-class Hamming network has p output
neurons in this layer. The strongest response of a neuron is indicative of the minimum
Hamming distance between the stored pattern and the input vector.
Hamming Distance
For two bipolar vectors x and y of dimension n,
x · y = a − d
where a is the number of bits in which x and y agree (the similar bits) and d is the
number of bits in which they differ (the dissimilar bits).
The value d is the Hamming distance between the two vectors. Since the total
number of components is n, we have
n = a + d
i.e., d = n − a
On simplification, we get
x · y = a − (n − a)
x · y = 2a − n
2a = x · y + n
a = (x · y)/2 + n/2
From the above equation, it is clearly understood that the number of agreeing bits a can be
computed by setting the weights to one-half the exemplar vector and the bias initially to n/2.
Testing Algorithm of Hamming Network
Step 0: Initialize the weights. For i = 1 to n and j = 1 to m,
wij = ei(j)/2, where e(j) is the jth exemplar vector.
Initialize the bias for storing the "m" exemplar vectors. For j = 1 to m,
bj = n/2
Step 1: Perform Steps 2-4 for each input vector x.
Step 2: Calculate the net input to each unit Yj, i.e.,
yinj = bj + Σi xi wij
Step 3: Initialize the activations for Maxnet, i.e.,
yj(0) = yinj
Step 4: Maxnet is found to iterate for finding the exemplar that best matches the input
patterns.
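A compact sketch of Steps 0-3 with two invented bipolar exemplars; the final Maxnet competition is replaced here by a direct argmax, which selects the same winner.

```python
import numpy as np

def hamming_net(exemplars, x):
    """Each output unit computes yinj = bj + sum_i xi*wij with weights
    wij = ei(j)/2 and bias bj = n/2, which equals the number of agreeing
    bits a between x and exemplar e(j). The largest response wins."""
    E = np.array(exemplars)
    n = E.shape[1]
    net = np.array(x) @ E.T / 2 + n / 2    # a = (x . e(j))/2 + n/2
    return int(np.argmax(net))             # Maxnet would select this unit

# Two bipolar exemplars and a test vector (example values)
winner = hamming_net([[1, 1, 1, -1], [-1, -1, 1, 1]], [1, -1, 1, -1])
```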
2.7 KOHONEN SELF-ORGANIZING FEATURE MAPS
Self-Organizing Feature Maps (SOM) were developed by Dr. Teuvo Kohonen in 1982.
The Kohonen self-organizing feature map (KSOM) refers to a neural network which is trained
using competitive learning. Basic competitive learning implies that the competition process
takes place before the cycle of learning. The competition process suggests that some criteria
select a winning processing element. After the winning processing element is selected, its
weight vector is adjusted according to the learning law used.
Feature mapping is a process which converts patterns of arbitrary dimensionality into a
response of a one- or two-dimensional array of neurons. The network performing such a mapping
is called a feature map. The reason for reducing the higher dimensionality is the ability to
preserve the neighbor topology.
Training Algorithm
Step 0: Initialize the weights with random values and set the learning rate.
Step 1: Perform Steps 2-8 when stopping condition is false.
Step 2: Perform Steps 3-5 for each input vector x.
Step 3: Compute the square of the Euclidean distance, i.e., for each j = 1 to m,
D(j) = Σi (xi − wij)²
Step 4: Find the winning unit index J, so that D(J) is minimum.
Step 5: For all units j within a specific neighborhood of J, and for all i, calculate the new
weights:
wij(new) = wij(old) + α[xi − wij(old)]
Step 6: Update the learning rate α using the formula (t is the time step):
α(t + 1) = 0.5 α(t)
Step 7: Reduce radius of topological neighborhood at specified time intervals.
Step 8: Test for stopping condition of the network.
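One iteration of Steps 3-5 can be sketched on a tiny one-dimensional map; the weight values, input vector, and neighborhood radius are invented for illustration.

```python
import numpy as np

def som_step(W, x, alpha=0.5, radius=1):
    """One KSOM update on a 1-D map: pick winner J by squared Euclidean
    distance (Steps 3-4), then pull the neighborhood toward x (Step 5)."""
    D = ((W - x) ** 2).sum(axis=1)                 # D(j) = sum_i (wij - xi)^2
    J = int(np.argmin(D))                          # winning unit
    lo, hi = max(0, J - radius), min(len(W), J + radius + 1)
    W[lo:hi] += alpha * (x - W[lo:hi])             # w(new) = w(old) + alpha*(x - w(old))
    return W, J

# Three units with 2-D weight vectors (example values)
W = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]])
W, J = som_step(W, np.array([1.9, 2.1]), alpha=0.5, radius=0)
```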
2.8 LEARNING VECTOR QUANTIZATION
In 1980, the Finnish professor Kohonen discovered that some areas of the brain develop
structures with different areas, each of them with high sensitivity to a specific input pattern.
It is based on competition among neural units based on a principle called winner-takes-all.
Learning Vector Quantization (LVQ) is a prototype-based supervised classification
algorithm. A prototype is an early sample, model, or release of a product built to test a
concept or process. One or more prototypes are used to represent each class in the dataset.
New (unknown) data points are then assigned the class of the prototype that is nearest to
them. In order for "nearest" to make sense, a distance measure has to be defined. There is no
limitation on how many prototypes can be used per class, the only requirement being that
there is at least one prototype for each class. LVQ is a special case of an artificial neural
network and it applies a winner-take-all Hebbian learning-based approach. It is similar to
the Self-Organizing Map (SOM) algorithm, with a small difference. SOM and LVQ were both
invented by Teuvo Kohonen.
An LVQ system is represented by prototypes W = (w1, ..., wn). In winner-take-all training
algorithms, the winner is moved closer to the data point if it classifies the data point correctly,
or moved away if it classifies the data point incorrectly. An advantage of LVQ is that it creates
prototypes that are easy to interpret for experts in the respective application domain.
Training Algorithm
Step 0: Initialize the reference vectors. This can be done using one of the following methods.
1. From the given set of training vectors, take the first "m" (number of clusters) training
vectors and use them as weight vectors; the remaining vectors can be used for training.
2. Assign the initial weights and classifications randomly.
3. Use the K-means clustering method.
Also set the initial learning rate α.
Step 1: Perform Steps 2-6 if the stopping condition is false.
Step 2: Perform Steps 3-4 for each training input vector x
Step 3: Calculate the Euclidean distance: for i = 1 to n, j = 1 to m,
D(j) = Σi (xi − wij)²
Find the winning unit index J for which D(J) is minimum.
Step 4: Update the weights of the winning unit wJ using the following conditions:
If T = cJ (correct class), then wJ(new) = wJ(old) + α[x − wJ(old)]
If T ≠ cJ (wrong class), then wJ(new) = wJ(old) − α[x − wJ(old)]
Step 5: Reduce the learning rate α
Step 6: Test for the stopping condition of the training process. (The stopping condition may
be a fixed number of epochs, or the learning rate reducing to a negligible value.)
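One update of Steps 3-4 can be sketched as follows; the prototypes, their class labels, and the training sample are invented example values.

```python
import numpy as np

def lvq_step(W, labels, x, target, alpha=0.1):
    """One LVQ update: the nearest prototype moves toward x if its class
    matches the target, and away from x otherwise."""
    J = int(np.argmin(((W - x) ** 2).sum(axis=1)))   # winning prototype
    if labels[J] == target:
        W[J] += alpha * (x - W[J])                   # correct class: attract
    else:
        W[J] -= alpha * (x - W[J])                   # wrong class: repel
    return W, J

# Two prototypes with known classes (example values)
W = np.array([[0.0, 0.0], [4.0, 4.0]])
labels = [0, 1]
W, J = lvq_step(W, labels, np.array([3.5, 3.5]), target=1)
```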
2.9 COUNTER PROPAGATION NETWORK
Counter propagation networks (CPN) were proposed by Hecht-Nielsen in 1987. They are
multilayer networks based on combinations of the input, output, and clustering layers. The
applications of counter propagation networks include data compression, function approximation
and pattern association. The counter propagation network is basically constructed from an
instar-outstar model. This model is a three-layer neural network that performs input-output data
mapping, producing an output vector y in response to an input vector x, on the basis of
competitive learning. The three layers in an instar-outstar model are the input layer, the
hidden (competitive) layer and the output layer.
There are two stages involved in the training process of a counter propagation net. The input
vectors are clustered in the first stage. In the second stage of training, the weights from the
cluster layer units to the output units are tuned to obtain the desired response.
There are two types of counter propagation network:
1. Full counter propagation network
2. Forward-only counter propagation network
Full counter propagation network
The full CPN efficiently represents a large number of vector pairs x:y by adaptively constructing a
look-up table. The full CPN works best if the inverse function exists and is continuous. The
vectors x and y propagate through the network in a counterflow manner to yield output vectors
x* and y*.
Architecture of Full Counter propagation Network
The four major components of the instar-outstar model are the input layer, the instar, the
competitive layer and the outstar. For each node in the input layer there is an input value xi.
All the instars are grouped into a layer called the competitive layer. Each of the instars
responds maximally to a group of input vectors in a different region of space. The outstar
model has all the nodes in the output layer and a single node in the competitive
layer. The outstar looks like the fan-out of a node.
Training Algorithm for Full Counter propagation Network:
Step 0: Set the initial weights and the initial learning rate.
Step 1: Perform Steps 2-7 if stopping condition is false for phase-I training.
Step 2: For each of the training input vector pair x: y presented, perform Steps 3-5.
Step 3: Set the X-input layer activations to vector x. Set the Y-input layer activations to
vector y.
Step 4: Find the winning cluster unit. If the dot product method is used, find the cluster unit ZJ
with the largest net input: for j = 1 to p,
zinj = Σi xi vij + Σk yk wkj
If the Euclidean distance method is used, find the cluster unit ZJ whose squared distance from the
input vectors is the smallest:
D(j) = Σi (xi − vij)² + Σk (yk − wkj)²
If there occurs a tie in case of selection of winner unit, the unit with the smallest index is the
winner. Take the winner unit index as J.
Step 5: Update the weights of the calculated winner unit ZJ:
viJ(new) = viJ(old) + α[xi − viJ(old)] (i = 1 to n)
wkJ(new) = wkJ(old) + β[yk − wkJ(old)] (k = 1 to m)
Step 6: Reduce the learning rates α and β
Step 7: Test stopping condition for phase-I training.
Step 8: Perform Steps 9-15 when stopping condition is false for phase-II training.
Step 9: Perform Steps 10-13 for each training input vector pair x:y.
Step 10: Set the X-input layer activations to vector x. Set the Y-input layer activations
to vector y.
Step 11: Find the winning cluster unit (use formulas from Step 4). Take the winner unit index
as J.
Step 12: Update the weights entering into unit ZJ
Step 13: Update the weights from unit Zj to the output layers.
Step 14: Reduce the learning rates α and β.
Step 15: Test stopping condition for phase-II training.
Forward-only Counter propagation network:
A simplified version of the full CPN is the forward-only CPN. The forward-only CPN uses only the
x vectors to form the clusters on the Kohonen units during phase-I training. In case of forward-
only CPN, the input vectors are first presented to the input units. First, the weights between the
input layer and cluster layer are trained. Then the weights between the cluster layer and
output layer are trained. This is a specific competitive network, with the target known.
Architecture of forward-only CPN
It consists of three layers: input layer, cluster layer and output layer. Its architecture
resembles the back-propagation network, but in CPN there exist interconnections between
the units in the cluster layer.
Training Algorithm for Forward-only Counter propagation network:
Step 0: Initialize the weights and learning rate.
Step 1: Perform Steps 2-7 if stopping condition is false for phase-I training.
Step 2: Perform Steps 3-5 for each training input x.
Step 3: Set the X-input layer activations to vector X.
Step 4: Compute the winning cluster unit J. If the dot product method is used, find the cluster
unit ZJ with the largest net input.
If the Euclidean distance method is used, find the cluster unit ZJ whose squared distance from the
input pattern is the smallest:
D(j) = Σi (xi − wij)²
If there exists a tie in the selection of winner unit, the unit with the smallest index is chosen
as the winner.
Step 5: Perform the weight update for unit ZJ. For i = 1 to n,
wiJ(new) = wiJ(old) + α[xi − wiJ(old)]
Step 6: Reduce the learning rate α.
Step 7: Test stopping condition for phase-I training.
Step 8: Perform Steps 9-15 when stopping condition is false for phase-II training.
Step 9: Perform Steps 10-13 for each training input pair x:y.
Step 10: Set the X-input layer activations to vector x. Set the Y-output layer activations to
vector y.
Step 11: Find the winning cluster unit (use formulas from Step 4). Take the winner unit index
as J.
Step 12: Update the weights entering into unit ZJ.
Step 13: Update the weights from unit ZJ to the output layer:
vJk(new) = vJk(old) + β[yk − vJk(old)]
Step 14: Reduce the learning rate β.
Step 15: Test stopping condition for phase-II training.
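The two training phases above can be sketched as follows. This is a simplified illustration: the data, cluster count, learning rates, and epoch count are invented, and cluster weights start from the first m training vectors (one of the initialization options from the LVQ section).

```python
import numpy as np

def cpn_train(X, Y, m=2, alpha=0.3, beta=0.1, epochs=20):
    """Forward-only CPN sketch. Phase I clusters the x vectors (Kohonen);
    phase II tunes cluster-to-output weights toward the targets (outstar)."""
    W = X[:m].copy()                         # cluster-layer weights
    V = np.zeros((m, Y.shape[1]))            # cluster -> output weights
    nearest = lambda x: int(np.argmin(((W - x) ** 2).sum(axis=1)))
    for _ in range(epochs):                  # phase I: train input->cluster weights
        for x in X:
            J = nearest(x)
            W[J] += alpha * (x - W[J])
    for _ in range(epochs):                  # phase II: train cluster->output weights
        for x, y in zip(X, Y):
            J = nearest(x)
            V[J] += beta * (y - V[J])
    return W, V

def cpn_predict(W, V, x):
    J = int(np.argmin(((W - x) ** 2).sum(axis=1)))
    return V[J]

X = np.array([[0.0, 0.0], [1.0, 1.0], [0.1, 0.0], [0.9, 1.0]])
Y = np.array([[0.0], [1.0], [0.0], [1.0]])
W, V = cpn_train(X, Y)
```

After training, inputs near [1, 1] map to an output close to 1, and inputs near [0, 0] map to 0.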
2.10 ADAPTIVE RESONANCE THEORY NETWORK
This network was developed by Stephen Grossberg and Gail Carpenter in 1987. It is based on
competition and uses an unsupervised learning model. Adaptive Resonance Theory (ART)
networks, as the name suggests, are always open to new learning (adaptive) without losing the
old patterns (resonance). Basically, an ART network is a vector classifier which accepts an input
vector and classifies it into one of the categories depending upon which of the stored patterns
it resembles the most.
The Adaptive Resonance Theory addresses the stability-plasticity dilemma (stability can be
defined as the nature of memorizing the learning, and plasticity refers to the ability to
flexibly gain new information): how can a system keep learning in response to new input
patterns without losing the stability of the patterns it has already learned?
Types of Adaptive Resonance Theory (ART)
Carpenter and Grossberg developed different ART architectures as a result of 20 years of
research. The ARTs can be classified as follows:
ART1 – It is the simplest and the basic ART architecture. It is capable of clustering
binary input values.
ART2 – It is an extension of ART1 that is capable of clustering continuous-valued input
data.
Fuzzy ART – It is the augmentation of fuzzy logic and ART.
ARTMAP – It is a supervised form of ART learning where one ART learns based on
the previous ART module. It is also known as predictive ART.
FARTMAP – This is a supervised ART architecture with Fuzzy logic included.
Operating Principle
The main operation of ART classification can be divided into the following phases −
Recognition phase − The input vector is compared with the classification presented
at every node in the output layer. The output of the neuron becomes “1” if it best
matches with the classification applied, otherwise it becomes “0”.
Comparison phase − In this phase, a comparison of the input vector with the
comparison layer vector is done. The condition for reset is that the degree of similarity
is less than the vigilance parameter.
Search phase − In this phase, the network searches for a reset as well as for the match
done in the above phases. If there is no reset and the match is quite
good, then the classification is over. Otherwise, the process is repeated and the
other stored patterns must be searched to find the correct match.
ART1
It is a type of ART, which is designed to cluster binary vectors. We can understand about this
with the architecture of it.
Architecture of ART1
It consists of the following two units
Computational Unit − It is made up of the following
Input unit (F1 layer) − It further has the following two portions:
F1a layer (input portion) − In ART1, there is no processing in this portion; it simply holds
the input vectors. It is connected to the F1b layer (interface portion).
F1b layer (interface portion) − This portion combines the signal from the input portion with
that of the F2 layer. The F1b layer is connected to the F2 layer through bottom-up weights bij,
and the F2 layer is connected to the F1b layer through top-down weights tji.
Cluster Unit (F2 layer) − This is a competitive layer. The unit having the largest net input is
selected to learn the input pattern. The activations of all other cluster units are set to 0.
Reset Mechanism − The work of this mechanism is based upon the similarity between the
top-down weight and the input vector. If the degree of this similarity is less than the
vigilance parameter, then the cluster is not allowed to learn the pattern and a reset
happens.
Supplement Unit − The issue with the reset mechanism is that the layer F2 must
be inhibited under certain conditions and must also be available when some learning
happens. That is why two supplemental units, namely G1 and G2, are added along with the reset
unit R. They are called gain control units. These units receive and send signals to the other
units present in the network. '+' indicates an excitatory signal, while '−' indicates an
inhibitory signal.
Parameters Used
Following parameters are used −
n − Number of components in the input vector
m − Maximum number of clusters that can be formed
bij − Weight from F1b to F2 layer, i.e. bottom-up weights
tji − Weight from F2 to F1b layer, i.e. top-down weights
ρ − Vigilance parameter
||x|| − Norm of vector x
Algorithm
Step 1 − Initialize the learning rate, the vigilance parameter, and the weights as follows −
α > 1 and 0 < ρ ≤ 1
0 < bij(0) < α/(α − 1 + n)
tji(0) = 1
Step 2 − Continue Steps 3-9 when the stopping condition is not true.
Step 3 − Continue Steps 4-6 for every training input.
Step 4 − Set the activations of all F2 units to zero and the activations of the F1a units to the
input vector s.
Step 5 − The input signal must be sent from the F1a to the F1b layer:
xi = si
Step 6 − For every F2 node that is not inhibited (yj ≠ −1), compute
yj = Σi bij xi
Step 7 − Perform Steps 8-10 while reset is true.
Step 8 − Find J such that yJ ≥ yj for all nodes j.
Step 9 − Again calculate the activations on F1b as follows −
xi = si tJi
Step 10 − Now, after calculating the norm of vector x and vector s, we need to check the
reset condition as follows −
If ||x||/||s|| < vigilance parameter ρ, then inhibit node J (set yJ = −1) and go to Step 7.
Else, if ||x||/||s|| ≥ vigilance parameter ρ, then proceed further.
Step 11 − Weight updating for node J can be done as follows −
biJ(new) = α xi / (α − 1 + ||x||)
tJi(new) = xi
Step 12 − The stopping condition for the algorithm must be checked; it may be one of the following:
No change in weights.
No reset performed for any unit.
Maximum number of epochs reached.
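A heavily simplified fast-learning sketch of the clustering behavior described above; the input vectors, the vigilance value, and the bottom-up ranking formula are illustrative assumptions, not the full ART1 dynamics with gain control units.

```python
import numpy as np

def art1(inputs, rho=0.7):
    """Simplified fast-learning ART1 sketch for binary vectors.
    Bottom-up response ranks candidate clusters; the vigilance test
    ||x AND t|| / ||s|| >= rho decides acceptance; on acceptance the
    top-down prototype shrinks to x AND t."""
    protos = []                                   # top-down weight vectors tj
    labels = []
    for s in inputs:
        s = np.array(s)
        # rank clusters by a bottom-up response, roughly |s AND tj| / (0.5 + |tj|)
        order = sorted(range(len(protos)),
                       key=lambda j: -(s & protos[j]).sum() / (0.5 + protos[j].sum()))
        for j in order:
            x = s & protos[j]                     # F1b activation: xi = si * tJi
            if x.sum() / s.sum() >= rho:          # vigilance (reset) test
                protos[j] = x                     # fast learning: tJ <- x
                labels.append(j)
                break
        else:                                     # every candidate was reset: new cluster
            protos.append(s.copy())
            labels.append(len(protos) - 1)
    return protos, labels

protos, labels = art1([[1, 1, 0, 0], [1, 1, 0, 0], [0, 0, 1, 1]], rho=0.7)
```

The first two identical inputs share one cluster, while the dissimilar third input fails the vigilance test and opens a new cluster, which is the plasticity half of the stability-plasticity dilemma.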
Advantage of Adaptive Resonance Theory (ART)
It exhibits stability and is not disturbed by a wide variety of inputs provided to the
network.
It can be integrated and used with various other techniques to give better results.
It can be used in various fields such as mobile robot control, face recognition, land
cover classification, target recognition, medical diagnosis, signature verification,
clustering web users, etc.
It has advantages over plain competitive learning, which lacks the capability to add
new clusters when needed and does not guarantee stability in forming clusters.
Limitations of Adaptive Resonance Theory
Some ART networks are inconsistent (such as Fuzzy ART and ART1), as their results
depend on the order of the training data or on the learning rate.
PART A (Two Marks)
1. What is associative memory network?
Associative memory network can store a set of patterns as memories. When the
associative memory is being presented with a key pattern, it responds by producing one of
the stored patterns, which closely resembles or relates to the key pattern.
2. List the two algorithms developed for training of pattern association?
There are two algorithms developed for training of pattern association nets.
1. Hebb Rule
2. Outer Products Rule
3. Define Hebb rule.
The Hebb rule is widely used for finding the weights of an associative memory neural
network. The training vector pairs here are denoted as s:t. The weights are updated until
there is no weight change.
4. List the types of associative memory?
Auto Associative Memory
Hetero Associative memory
5. What is BAM?
BAM network performs forward and backward associative searches for stored
stimulus responses. The BAM is a recurrent hetero-associative pattern-matching network
that encodes binary or bipolar patterns using the Hebbian learning rule. It associates
patterns from set A to patterns from set B, and vice versa. BAM neural nets can respond
to input from either layer (input layer or output layer).
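As a sketch, the forward and backward searches described above can be implemented with an outer-product (Hebbian) weight matrix. The pattern pairs here are made-up examples, and treating sgn(0) as +1 is an assumption (some formulations keep the previous activation at zero net input):

```python
import numpy as np

def bam_weights(pairs):
    """Hebbian (outer-product) BAM weight matrix from bipolar pattern pairs."""
    return sum(np.outer(a, b) for a, b in pairs)

def sgn(v):
    # Sign function; mapping 0 to +1 is a simplifying assumption in this sketch.
    return np.where(v >= 0, 1, -1)

pairs = [(np.array([1, -1, 1]), np.array([1, 1])),
         (np.array([-1, 1, -1]), np.array([-1, 1]))]
W = bam_weights(pairs)

a = np.array([1, -1, 1])
b = sgn(a @ W)          # forward search: layer A -> layer B
a_back = sgn(b @ W.T)   # backward search: layer B -> layer A
```

Here the forward pass recovers the stored response for `a`, and the backward pass recovers `a` again, illustrating the bidirectional recall.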
6. Mention the types of Hopfield network?
Discrete Hopfield Network
Continuous Hopfield Network
7. What is hamming network?
The Hamming network is a two-layer feed-forward neural network for classification
of binary/bipolar n-tuple input vectors using minimum Hamming distance, denoted
DH (Lippmann, 1987). The first layer is the input layer for the n-tuple input vectors. The
second layer (also called the memory layer) stores p memory patterns.
8. Define hamming distance?
For two bipolar vectors x and y of dimension n,
x · y = a − d
where a is the number of bits in agreement in x and y (similar bits), and d is the
number of bits that differ in x and y (dissimilar bits). The Hamming distance between the
two vectors is d. Since the total number of components is n, we have
n = a + d
so the Hamming distance can be computed as d = (n − x · y) / 2.
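The relation above can be checked with a small helper (illustrative, for bipolar ±1 vectors):

```python
def hamming_distance(x, y):
    """Number of positions where bipolar (+1/-1) vectors x and y differ."""
    dot = sum(a * b for a, b in zip(x, y))   # x · y = a − d
    n = len(x)                                # n = a + d
    return (n - dot) // 2                     # d = (n − x · y) / 2

# x and y differ in exactly one position:
hamming_distance([1, -1, 1, 1], [1, -1, -1, 1])   # → 1
```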
9. What is Learning Vector Quantization?
Learning Vector Quantization (LVQ) is a prototype-based supervised classification
algorithm. Each class in the dataset is represented by one or more prototype (codebook)
vectors. New (unknown) data points are then assigned the class of the prototype that is
nearest to them.
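A minimal sketch of nearest-prototype classification, together with the LVQ1 training step that moves the winning prototype toward a training point when the labels match and away otherwise; the data and parameter names are assumptions for illustration:

```python
import math

def lvq_classify(x, prototypes):
    """Return the label of the prototype nearest to x (Euclidean distance).
    prototypes: list of (vector, label) pairs."""
    vec, label = min(prototypes, key=lambda pl: math.dist(x, pl[0]))
    return label

def lvq1_update(x, label, prototypes, lr=0.1):
    """LVQ1 step: move the winning prototype toward x if labels match, else away."""
    i = min(range(len(prototypes)), key=lambda k: math.dist(x, prototypes[k][0]))
    vec, plabel = prototypes[i]
    sign = 1 if plabel == label else -1
    prototypes[i] = ([v + sign * lr * (xj - v) for v, xj in zip(vec, x)], plabel)

protos = [([0.0, 0.0], "A"), ([1.0, 1.0], "B")]
lvq_classify([0.9, 0.8], protos)   # → "B"
```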
10. Give the types of counter propagation network?
There are two types of counter propagation network:
1. Full counter propagation network
2. Forward-only counter propagation network
11. Define full counter propagation network?
Full CPN efficiently represents a large number of vector pair x:y by adaptively
constructing a look-up-table. The full CPN works best if the inverse function exists and is
continuous. The vector x and y propagate through the network in a counterflow manner to
yield output vector x* and y*.
12. What is Forward-only Counter propagation network?
A simplified version of the full CPN is the forward-only CPN, which uses only the
x vectors to form the clusters on the Kohonen units during phase I training. Input vectors
are first presented to the input units. The weights between the input layer and the cluster
layer are trained first; then the weights between the cluster layer and the output layer are
trained.
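The two training phases can be sketched as follows. This is a simplified illustration; the initialization scheme, learning rates, and epoch counts are assumptions, not values from the text:

```python
import numpy as np

def train_forward_cpn(X, Y, n_clusters, alpha=0.5, beta=0.5, epochs=20, seed=0):
    """Forward-only CPN sketch: Phase I trains the Kohonen (cluster) layer on
    the x vectors; Phase II trains the Grossberg (output) weights toward the y
    vectors associated with each winning cluster unit."""
    rng = np.random.default_rng(seed)
    # Initialize cluster weights from randomly chosen training inputs (assumed).
    V = X[rng.choice(len(X), n_clusters, replace=False)].astype(float).copy()
    W = np.zeros((n_clusters, Y.shape[1]))
    for _ in range(epochs):                              # Phase I (Kohonen)
        for x in X:
            j = np.argmin(((V - x) ** 2).sum(axis=1))    # winning cluster unit
            V[j] += alpha * (x - V[j])
    for _ in range(epochs):                              # Phase II (Grossberg)
        for x, y in zip(X, Y):
            j = np.argmin(((V - x) ** 2).sum(axis=1))
            W[j] += beta * (y - W[j])
    return V, W
```

After training, an unseen x is classified to its winning cluster unit j, and the network outputs W[j].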
13. What is auto associative memory?
An auto-associative memory recovers a previously stored pattern that most closely
relates to the current pattern. It is also known as an auto-associative correlator.
14. State the limits of ART?
Some ART networks are inconsistent (such as Fuzzy ART and ART1), as their
results depend on the order of the training data or on the learning rate.
15. List the types of ART?
• ART1 – It is the simplest and the basic ART architecture. It is capable of
clustering binary input values.
• ART2 – It is extension of ART1 that is capable of clustering continuous-valued
input data.
• Fuzzy ART – It is the augmentation of fuzzy logic and ART.
• ARTMAP – It is a supervised form of ART learning where one ART learns
based on the previous ART module. It is also known as predictive ART.
• FARTMAP – This is a supervised ART architecture with Fuzzy logic included.
16. What are the two units of ART1?
It consists of the following two units
Computational unit − made up of the input unit (F1 layer, with input portion F1(a)
and interface portion F1(b)), the cluster unit (F2 layer), and the reset mechanism
Supplemental unit − the gain control units G1 and G2
17. Mention the phases of ART?
Recognition phase
Comparison phase
Search phase
18. What are the different layers of forward-only CPN?
Input layer
Cluster layer
Output layer
19. Write the Hebb Rule Algorithm?
Step 0: Set all the initial weights to zero, i.e.,
Wij = 0 (i = 1 to n, j = 1 to m)
Step 1: For each training input and target output vector pair s:t, perform Steps 2-4.
Step 2: Activate the input layer units to the current training input,
xi = si (for i = 1 to n)
Step 3: Activate the output layer units to the current target output,
yj = tj (for j = 1 to m)
Step 4: Adjust the weights,
wij(new) = wij(old) + xi yj (i = 1 to n, j = 1 to m)
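The algorithm above reduces to accumulating outer products of the training pairs; a minimal sketch:

```python
def hebb_train(pairs, n, m):
    """Hebb rule for a pattern-association net (Steps 0-4 above).
    pairs: list of (s, t) training vector pairs; returns the n x m weight matrix."""
    w = [[0] * m for _ in range(n)]           # Step 0: zero initial weights
    for s, t in pairs:                         # Step 1: each pair s:t
        for i in range(n):                     # Steps 2-4: x = s, y = t,
            for j in range(m):                 # then w_ij += x_i * y_j
                w[i][j] += s[i] * t[j]
    return w

# Single bipolar pair: the weight matrix is the outer product of s and t.
hebb_train([([1, -1], [1, 1])], 2, 2)   # → [[1, 1], [-1, -1]]
```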
20. Define hetero-associative memory network?
In a hetero-associative memory, the training input and the target output vectors are
different. The weights are determined so that the network can store a set of pattern
associations, each a pair of training input and target output vectors
(s(p), t(p)), with p = 1, 2, ..., P. Each vector s(p) has n components and each vector t(p) has
m components.
PART B (Possible Questions)
1. Explain the algorithm of BAM with its Architecture.
2. Explain the algorithm of Continuous Hopfield network with its Architecture
3. Show the step-by-step training algorithm for basic pattern association problem using
Hebb rule
4. Give the step-by-step training and testing procedure of Discrete BAM
5. What is Hopfield Memory? Explain briefly.
6. What is bi-directional memory (BAM)? Explain briefly with its architecture
7. Distinguish between Auto associative Memory and Hetero Associative Memory.
8. Distinguish between Continuous BAM and Discrete BAM
9. Explain the Counter propagation Networks in detail
10. Discuss about Adaptive Resonance Theory Network.
PART C (Possible Questions)
1. A hetero associative network is given. Find the weight matrix and test the network
with the training input vectors.
S1 = (1,1,0,0) , S2 = (0,1,0,0), S3 = (0,0,1,1), S4 = (0,0,1,0)
t1 = (1,0), t2 = (1,0), t3 = (0,1), t4 = (0,1)
2. A hetero associative network is trained by Hebb outer product rule for input vector S=
[x1, x2, x3, x4] to output row vectors t = [t1,t2]. Find the weight matrix:
S1 = (1,1,0,0) , S2 = (1,1,1,0), S3 = (0,0,1,1), S4 = (0,1,0,0)
t1 = (1,0), t2 = (0,1), t3 = (1,0), t4 = (1,0)
3. Describe the Characteristics of Continuous Hopfield memory and discuss how it
can be used to solve Traveling salesman Problem
4. a) Draw the architecture of Continuous Hopfield network and derive the energy
function for the same.
b) Discuss any one application of Continuous Hopfield network.
5. Explain the algorithm of discrete Hopfield network with its Architecture.
6. With neat diagram explain the pattern matching in ART1 network
7. Discuss in detail architecture , working principle of ART2
8. Consider a Kohonen network with 2 inputs and 5 cluster units. The initial weight matrix
is w = [0.1 0.2 0.3 0.4 0.5
        0.5 0.4 0.3 0.2 0.1]
Calculate the new weights when the network is presented with the input vector
[0.35 0.65]
9. Explain the algorithm needed to propagate information in a network while implementing
a SOM simulator
10. Give the architecture of the Kohonen self-organizing feature map and explain how it is
used to cluster the input vectors