CHAPTER 4
ASSOCIATIVE MEMORY
NETWORKS
PATTERN ASSOCIATION
Associating patterns which are
• similar,
• contrary,
• in close proximity (spatial),
• in close succession (temporal).
Associative recall
• evoke associated patterns,
• recall a pattern by part of it,
• evoke/recall with incomplete/noisy patterns.
ASSOCIATIVE MEMORY (AM) NETWORK
Two types of associations exist. For two patterns s and t
• hetero-association (s != t): relating two different patterns (s –
input, t – target).
• auto-association (s = t): relating parts of a pattern with other
parts.
Architectures of NN associative memory:
• single layer (with or without an input layer),
• two layers (for bidirectional association)
Learning algorithms for AM:
• Hebbian learning rule and its variations,
• gradient descent.
ASSOCIATIVE MEMORY NETWORK
WORKING PROCESS
• Recall a stored pattern by a noisy input pattern.
• Using the weights that capture the association.
• Stored patterns are viewed as “attractors”, each has its
“attraction basin”.
• This type of NN is often called "associative memory" (recall is by
association, not by explicit indexing/addressing).
TRAINING ALGORITHM FOR ASSOCIATIVE
MEMORY NETWORK
Network structure: single layer
• one output layer of non-linear units and one input layer.
[Figure: single-layer network. Inputs s_1, ..., s_n feed input units x_1, ..., x_n, which connect through weights w_11, ..., w_nm to output units y_1, ..., y_m with targets t_1, ..., t_m.]
Goal of learning:
• to obtain a set of weights w_ij from a set of training pattern
pairs {s:t} such that when s is applied to the input layer, t is
computed at the output layer,
• for all training pairs s:t, $t_j = f(\mathbf{s}^T \mathbf{w}_j)$ for all j.
HEBB RULE FOR PATTERN ASSOCIATION
Algorithm (bipolar or binary patterns):
• For each training sample s:t: $w_{ij} \leftarrow w_{ij} + s_i\,t_j$
• $w_{ij}$ increases if both $s_i$ and $t_j$ are ON (binary) or have the same
sign (bipolar).
If initially $w_{ij} = 0$, then, after updates for all P training patterns,
$$w_{ij} = \sum_{p=1}^{P} s_i(p)\,t_j(p), \qquad W = \{w_{ij}\}$$
• Instead of obtaining W by iterative updates, it can be
computed from the training set by calculating the outer
product of s and t.
OUTER PRODUCT FOR PATTERN ASSOCIATION
Let s and t be row vectors.
Then for a particular training pair s:t
$$W(p) = \mathbf{s}^T(p)\,\mathbf{t}(p) = \begin{pmatrix} s_1 \\ \vdots \\ s_n \end{pmatrix} (t_1, \ldots, t_m) = \begin{pmatrix} s_1 t_1 & \cdots & s_1 t_m \\ \vdots & \ddots & \vdots \\ s_n t_1 & \cdots & s_n t_m \end{pmatrix} = \begin{pmatrix} w_{11} & \cdots & w_{1m} \\ \vdots & \ddots & \vdots \\ w_{n1} & \cdots & w_{nm} \end{pmatrix}$$
and, over all P training pairs,
$$W = \sum_{p=1}^{P} \mathbf{s}^T(p)\,\mathbf{t}(p)$$
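A minimal sketch of this outer-product computation (assuming NumPy, with patterns stored as rows; the function name is illustrative):

```python
import numpy as np

def hebbian_weights(S, T):
    """Sum of outer products over all P training pairs.

    S: (P, n) array, one input pattern s(p) per row.
    T: (P, m) array, one target pattern t(p) per row.
    Returns the (n, m) weight matrix W = sum_p s(p)^T t(p).
    """
    return S.T @ T  # equals sum(np.outer(S[p], T[p]) for p in range(len(S)))
```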
HETERO-ASSOCIATIVE MEMORY NETWORK
• Binary pattern pairs s:t with |s| = 4 and |t| = 2.
• Total weighted input to output units: $y\_in_j = \sum_i x_i\,w_{ij}$
• Activation function: threshold
$$y_j = \begin{cases} 1 & \text{if } y\_in_j > 0 \\ 0 & \text{if } y\_in_j \le 0 \end{cases}$$
• Weights are computed by the Hebbian rule (sum of outer products of all
training pairs):
$$W = \sum_{p=1}^{P} \mathbf{s}^T(p)\,\mathbf{t}(p)$$
• Training samples:
s(p) t(p)
p=1 (1 0 0 0) (1, 0)
p=2 (1 1 0 0) (1, 0)
p=3 (0 0 0 1) (0, 1)
p=4 (0 0 1 1) (0, 1)
COMPUTING THE WEIGHTS
$$\mathbf{s}^T(1)\,\mathbf{t}(1) = \begin{pmatrix}1\\0\\0\\0\end{pmatrix}(1\ \ 0) = \begin{pmatrix}1&0\\0&0\\0&0\\0&0\end{pmatrix} \qquad \mathbf{s}^T(2)\,\mathbf{t}(2) = \begin{pmatrix}1\\1\\0\\0\end{pmatrix}(1\ \ 0) = \begin{pmatrix}1&0\\1&0\\0&0\\0&0\end{pmatrix}$$
$$\mathbf{s}^T(3)\,\mathbf{t}(3) = \begin{pmatrix}0\\0\\0\\1\end{pmatrix}(0\ \ 1) = \begin{pmatrix}0&0\\0&0\\0&0\\0&1\end{pmatrix} \qquad \mathbf{s}^T(4)\,\mathbf{t}(4) = \begin{pmatrix}0\\0\\1\\1\end{pmatrix}(0\ \ 1) = \begin{pmatrix}0&0\\0&0\\0&1\\0&1\end{pmatrix}$$
$$W = \begin{pmatrix}2&0\\1&0\\0&1\\0&2\end{pmatrix}$$
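Applying the hebbian_weights sketch from above to the four training pairs reproduces this W (a quick NumPy check):

```python
import numpy as np

S = np.array([[1, 0, 0, 0],
              [1, 1, 0, 0],
              [0, 0, 0, 1],
              [0, 0, 1, 1]])
T = np.array([[1, 0],
              [1, 0],
              [0, 1],
              [0, 1]])

W = S.T @ T   # sum of outer products over the four pairs
print(W)      # [[2 0]
              #  [1 0]
              #  [0 1]
              #  [0 2]]
```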
TEST/RECALL THE NETWORK
x = (1 0 0 0): xW = (2, 0), so y_1 = 1, y_2 = 0; recalled as class (1, 0).
x = (0 1 0 0): xW = (1, 0), so y_1 = 1, y_2 = 0. (0 1 0 0) is similar to s(1) = (1 0 0 0) and s(2) = (1 1 0 0), both of class (1, 0).
x = (0 1 1 0): xW = (1, 1), so y_1 = 1, y_2 = 1. (0 1 1 0) is not sufficiently similar to either class: (1 0 0 0) and (1 1 0 0) map to class (1, 0); (0 0 0 1) and (0 0 1 1) map to class (0, 1).
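A sketch of the recall step with the threshold activation defined earlier (NumPy assumed; W is the matrix computed above):

```python
import numpy as np

W = np.array([[2, 0], [1, 0], [0, 1], [0, 2]])

def recall(x, W):
    """Hetero-associative recall: threshold the total weighted input."""
    return (x @ W > 0).astype(int)

print(recall(np.array([1, 0, 0, 0]), W))  # [1 0] -> class (1, 0)
print(recall(np.array([0, 1, 0, 0]), W))  # [1 0] -> class (1, 0)
print(recall(np.array([0, 1, 1, 0]), W))  # [1 1] -> matches no stored target
```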
AUTO-ASSOCIATIVE MEMORY NETWORK
• Same as hetero-associative nets, except t(p) = s(p).
• Used to recall a pattern from its noisy or incomplete version
(pattern completion/pattern recovery).
• A single pattern s = (1, 1, 1, -1) is stored (weights computed
by the Hebbian rule or the outer product rule).
$$W = \mathbf{s}^T\mathbf{s} = \begin{pmatrix}1&1&1&-1\\1&1&1&-1\\1&1&1&-1\\-1&-1&-1&1\end{pmatrix}$$
training pattern: (1 1 1 -1) W = (4 4 4 -4) → (1 1 1 -1)
noisy pattern: (-1 1 1 -1) W = (2 2 2 -2) → (1 1 1 -1)
missing info: (0 0 1 -1) W = (2 2 2 -2) → (1 1 1 -1)
more noisy: (-1 -1 1 -1) W = (0 0 0 0) → not recognized
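The same auto-associative example as a minimal NumPy sketch (np.sign plays the role of the bipolar threshold, leaving 0 as 0):

```python
import numpy as np

s = np.array([1, 1, 1, -1])
W = np.outer(s, s)                # Hebbian/outer-product storage

for x in ([1, 1, 1, -1],          # training pattern
          [-1, 1, 1, -1],         # noisy pattern (one bit flipped)
          [0, 0, 1, -1],          # missing information
          [-1, -1, 1, -1]):       # more noise (two bits flipped)
    print(x, '->', np.sign(np.array(x) @ W))
```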
AUTO-ASSOCIATIVE MEMORY NETWORK –
DIAGONAL ELEMENTS
• Diagonal elements will dominate the computation when
multiple patterns are stored (each $w_{ii}$ equals the number of stored patterns P).
• When P is large, W is close to an identity matrix. This causes
output = input, which may not be any stored pattern. The
pattern correction power is lost.
• Replace diagonal elements by zero.
$$W_0 = \begin{pmatrix}0&1&1&-1\\1&0&1&-1\\1&1&0&-1\\-1&-1&-1&0\end{pmatrix}$$
(1 1 1 -1) W_0 = (3 3 3 -3) → (1 1 1 -1)
(-1 1 1 -1) W_0 = (3 1 1 -1) → (1 1 1 -1)
(0 0 1 -1) W_0 = (2 2 1 -1) → (1 1 1 -1)
(-1 -1 1 -1) W_0 = (1 1 -1 1) → wrong
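Zeroing the diagonal is a one-line change to the storage step (a sketch using NumPy's fill_diagonal):

```python
import numpy as np

s = np.array([1, 1, 1, -1])
W0 = np.outer(s, s)
np.fill_diagonal(W0, 0)                         # remove self-connections
print(np.sign(np.array([-1, 1, 1, -1]) @ W0))   # [ 1  1  1 -1] -> corrected
print(np.sign(np.array([-1, -1, 1, -1]) @ W0))  # [ 1  1 -1  1] -> wrong
```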
STORAGE CAPACITY
• Number of patterns that can be correctly stored & recalled by a
network.
• More patterns can be stored if they are not similar to each
other (e.g., orthogonal).
• Non-orthogonal: storing s(1) = (1, -1, 1, 1) and s(2) = (1, 1, 1, 1) (inner product 2) gives
$$W_0 = \begin{pmatrix}0&0&2&2\\0&0&0&0\\2&0&0&2\\2&0&2&0\end{pmatrix}$$
(1, -1, 1, 1) W_0 = (4, 0, 4, 4) → (1, 0, 1, 1): s(1) is not stored correctly.
• Orthogonal: storing three mutually orthogonal patterns, e.g. (1, 1, -1, -1), (1, -1, 1, -1), (1, -1, -1, 1), gives
$$W_0 = \begin{pmatrix}0&-1&-1&-1\\-1&0&-1&-1\\-1&-1&0&-1\\-1&-1&-1&0\end{pmatrix}$$
All three patterns can be correctly recalled.
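A quick check of the orthogonal case (a sketch; the three bipolar patterns are the reconstructed examples used above):

```python
import numpy as np

def store(patterns):
    """Outer-product storage with zeroed diagonal."""
    W0 = sum(np.outer(p, p) for p in patterns)
    np.fill_diagonal(W0, 0)
    return W0

ortho = [np.array(p) for p in ([1, 1, -1, -1], [1, -1, 1, -1], [1, -1, -1, 1])]
W0 = store(ortho)
for p in ortho:
    assert (np.sign(p @ W0) == p).all()  # each orthogonal pattern is recalled
```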
BIDIRECTIONAL ASSOCIATIVE MEMORY (BAM)
NETWORK
Architecture:
• Two layers of non-linear units: X-layer, Y-layer.
• Units: discrete threshold or continuous sigmoid (patterns can be either
binary or bipolar).
Weights:
$$W_{n \times m} = \sum_{p=1}^{P} \mathbf{s}^T(p)\,\mathbf{t}(p) \qquad \text{(Hebbian/outer product)}$$
Symmetric: the same weights serve both directions (W from the X-layer to the Y-layer, $W^T$ from the Y-layer to the X-layer).
Convert binary patterns to bipolar when constructing
W.
RECALL OF BAM NETWORK
Bidirectional, either by X (to recall Y) or by Y (to recall X).
Recurrent:
$$\mathbf{y}(t) = [f(y\_in_1(t)), \ldots, f(y\_in_m(t))], \quad \text{where } y\_in_j(t) = \sum_{i=1}^{n} w_{ij}\,x_i(t-1)$$
$$\mathbf{x}(t+1) = [f(x\_in_1(t+1)), \ldots, f(x\_in_n(t+1))], \quad \text{where } x\_in_i(t+1) = \sum_{j=1}^{m} w_{ij}\,y_j(t)$$
Update can be either asynchronous (as in hetero-associative memory)
or synchronous (change all Y units at one time, then all X units the
next time).
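A minimal sketch of bipolar BAM recall (NumPy assumed; synchronous updates that alternate layers until the X pattern stabilizes):

```python
import numpy as np

def bam_recall(x, W, max_steps=10):
    """Recall Y from X, then X back from Y, until the pair stops changing."""
    for _ in range(max_steps):
        y = np.sign(x @ W)         # X-layer drives the Y-layer
        x_new = np.sign(W @ y)     # Y-layer drives the X-layer back
        if (x_new == x).all():     # reached a stable bidirectional pair
            break
        x = x_new
    return x, y
```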
ACTIVATION FUNCTIONS IN BAM
The activation function depends on whether the input/target vector pairs used are binary or bipolar. The standard threshold forms are given below for the Y-layer; the X-layer uses the same forms with $x\_in_i$ in place of $y\_in_j$.
• With binary input vectors:
$$y_j = \begin{cases} 1 & \text{if } y\_in_j > 0 \\ y_j & \text{if } y\_in_j = 0 \\ 0 & \text{if } y\_in_j < 0 \end{cases}$$
• With bipolar input vectors (threshold $\theta_j$):
$$y_j = \begin{cases} 1 & \text{if } y\_in_j > \theta_j \\ y_j & \text{if } y\_in_j = \theta_j \\ -1 & \text{if } y\_in_j < \theta_j \end{cases}$$
DISCRETE HOPFIELD NETWORK (DHN)
A single-layer network
• each node serves as both an input and an output unit.
More than an associative memory, the Discrete Hopfield Network can
be used in several other applications, such as combinatorial optimization.
Different forms: discrete & continuous.
Major contribution of John Hopfield to NN:
• Treating a network as a dynamic system.
• Introduction of energy function into Neural Network Research.
ARCHITECTURE OF DHN
Architecture
• Single-layer (units serve as both input and output):
nodes are threshold units (binary or bipolar).
weights: fully connected, symmetric, and zero diagonal.
$$w_{ij} = w_{ji}, \qquad w_{ii} = 0$$
$x_i$ are external inputs, which may be transient or permanent.
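A hedged sketch of asynchronous recall in a DHN (bipolar threshold units, one randomly chosen unit updated at a time; names and step count are illustrative):

```python
import numpy as np

def dhn_recall(x, W, theta=None, steps=200, seed=0):
    """Asynchronous update: each step changes one unit using all current states."""
    rng = np.random.default_rng(seed)
    x = np.array(x)
    theta = np.zeros(len(x)) if theta is None else theta
    for _ in range(steps):
        i = rng.integers(len(x))      # pick a single unit
        net = W[i] @ x + theta[i]     # its net input from the other units
        if net != 0:                  # keep the previous state on a tie
            x[i] = 1 if net > 0 else -1
    return x
```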
STORAGE CAPACITY OF DHN
P: the maximum number of random patterns of dimension n that can be stored
in a DHN of n nodes.
Hopfield's observation: $P \approx 0.15\,n$, i.e. $\frac{P}{n} \approx 0.15$
Theoretical analysis:
$$P = \frac{n}{2 \log_2 n}, \qquad \frac{P}{n} = \frac{1}{2 \log_2 n}$$
P/n decreases because larger n leads to more interference between
stored patterns.
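Evaluating both estimates at a few network sizes makes the shrinking ratio concrete (a quick check):

```python
import math

for n in (100, 1000, 10000):
    hopfield = 0.15 * n              # Hopfield's empirical estimate
    theory = n / (2 * math.log2(n))  # theoretical estimate
    print(n, hopfield, round(theory, 1))
# 100 15.0 7.5
# 1000 150.0 50.2
# 10000 1500.0 376.3
```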
Some work modifies the Hopfield memory to increase its capacity to close to n;
in these approaches W is trained rather than computed by the Hebbian rule.
CONTINUOUS HOPFIELD NET
Architecture
• Continuous node outputs and continuous time.
• Fully connected with symmetric weights: $w_{ij} = w_{ji}$, $w_{ii} = 0$
• Internal activation $u_i$:
$$\frac{du_i(t)}{dt} = \sum_{j=1}^{n} w_{ij}\,x_j(t) + \theta_i = net_i(t)$$
• Output (state): $x_i(t) = f(u_i(t))$
where f is a sigmoid function that keeps the output in the binary/bipolar range. E.g., for
bipolar outputs, use the hyperbolic tangent:
$$f(x) = \tanh(x) = \frac{e^x - e^{-x}}{e^x + e^{-x}}$$
CONTINUOUS HOPFIELD NET
Computation: all units change their output (states) at the same time,
based on states of all others.
• Compute net: $net_i(t) = \sum_{j=1}^{n} w_{ij}\,x_j(t) + \theta_i$
• Compute the internal activation $u_i(t)$ by a first-order Taylor expansion (step size $\delta$):
$$u_i(t+\delta) \approx u_i(t) + \frac{du_i(t)}{dt}\,\delta = u_i(t) + net_i(t)\,\delta$$
• Compute output: $x_i(t) = f(u_i(t))$
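A minimal sketch of this synchronous update loop (Euler integration; the step size dt and all names are illustrative assumptions):

```python
import numpy as np

def continuous_hopfield_step(u, W, theta, dt=0.01):
    """One synchronous Euler step of the dynamics du/dt = net."""
    x = np.tanh(u)          # outputs from the current activations
    net = W @ x + theta     # net input to every unit
    return u + dt * net     # u(t + dt) ~ u(t) + net(t) * dt

# Example: run the dynamics from a random state; the outputs x = tanh(u)
# saturate toward a bipolar pattern as the activations grow.
rng = np.random.default_rng(0)
n = 4
W = rng.standard_normal((n, n))
W = (W + W.T) / 2           # symmetric weights
np.fill_diagonal(W, 0)      # zero diagonal
u = rng.standard_normal(n)
for _ in range(1000):
    u = continuous_hopfield_step(u, W, np.zeros(n))
```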
ITERATIVE ASSOCIATIVE MEMORY NETWORK
Example
Stored pattern x = (1, 1, 1, -1); output units are threshold units. Weights (zero-diagonal outer product):
$$W = \begin{pmatrix} 0 & 1 & 1 & -1 \\ 1 & 0 & 1 & -1 \\ 1 & 1 & 0 & -1 \\ -1 & -1 & -1 & 0 \end{pmatrix}$$
An incomplete recall input: x' = (1, 0, 0, 0)
Wx' = (0, 1, 1, -1) = x''
Wx'' = (3, 2, 2, -2) → (1, 1, 1, -1) = x
In general: use the current output as the input of the next iteration:
x(0) = initial recall input
x(I) = f(W x(I-1)), I = 1, 2, ...
until x(N) = x(K) for some K < N
Dynamic system: state vector x(I).
If K = N-1, x(N) is a stable state (fixed point):
f(W x(N)) = f(W x(N-1)) = x(N)
If x(K) is one of the stored patterns, then x(K) is called a genuine
memory.
Otherwise, x(K) is a spurious memory (caused by cross-
talk/interference between genuine memories).
Each fixed point (genuine or spurious memory) is an attractor (each with
its own attraction basin).
If K != N-1, the network is in a limit cycle:
it will repeat x(K), x(K+1), ..., x(N) = x(K) as iteration continues.
Iteration will eventually stop (such a K exists) because the total number of
distinct states is finite (3^n if threshold units are used). If states are
continuous, the system may keep evolving forever (chaos) when no such K exists.
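A sketch of the iterate-until-repeat loop (bipolar threshold units via np.sign, applied to the stored-pattern example above):

```python
import numpy as np

def iterative_recall(x0, W, max_iter=100):
    """Iterate x(I) = f(W x(I-1)) until a state repeats (fixed point or cycle)."""
    seen = [tuple(x0)]
    x = np.array(x0)
    for _ in range(max_iter):
        x = np.sign(W @ x)       # threshold units
        if tuple(x) in seen:     # x(N) = x(K) for some K < N
            return x
        seen.append(tuple(x))
    return x

s = np.array([1, 1, 1, -1])
W = np.outer(s, s)
np.fill_diagonal(W, 0)
print(iterative_recall([1, 0, 0, 0], W))  # [ 1  1  1 -1]
```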
SUMMARY
This chapter discussed the various associative memory networks:
• Auto-associative Network
• Hetero-associative Network
• Bidirectional Associative Memory Network
• Hopfield Nets
• Iterative Associative Nets