# Self-Organizing Networks:
● In unsupervised learning, there is no feedback from the environment to indicate desired
outputs.
● The network discovers relationships within the input data on its own, identifying features,
patterns, contours, correlations, categories, or classifications.
● Such networks are termed self-organizing networks.
# Unsupervised Learning:
● The network judges the similarity of new input patterns to previously seen patterns.
● The network learns to measure similarity, performing tasks such as principal component
analysis (PCA), clustering, adaptive vector quantization, and feature mapping.
Competition in Neural Networks:
● When trained to classify input patterns into distinct output classes (e.g., P, Q, R, S, or T),
the network must decide on a single class to which a new input belongs.
● The network employs a process called competition to ensure only one neuron (or unit)
responds.
● The principle of competition in neural networks is analogous to a "winner-take-all"
scenario, where only one neuron outputs a non-zero signal.
● This is similar to evaluating students and selecting the one with the highest score as the
winner.
● When a net is trained to classify the input signal into one of the output categories A, B,
C, D, E, J, or K, it may sometimes respond that the signal is both a C and a K, or
both an E and a K, or both a J and a K, because these character pairs are similar. In such
cases it is better to include additional structure in the net to force it to make a
definitive decision. The mechanism that accomplishes this is called
competition.
● The most extreme form of competition among a group of neurons is called
Winner-Take-All, where only one neuron (the winner) in the group will have a nonzero
output signal when the competition is completed.
# Competitive Nets:
● Competitive nets are a type of self-organizing network that includes:
1. Max net
2. Mexican hat
3. Hamming net
4. Kohonen self-organizing feature map
5. Counter propagation net
6. Learning vector quantization (LVQ)
7. Adaptive resonance theory (ART)
● The extreme form of these competitive nets is known as winner-take-all.
# Kohonen Learning:
● Most of these networks use a learning algorithm called Kohonen learning.
● During learning, the network updates weights by forming a new weight vector that is a
linear combination of the old weight vector and the new input vector.
● Only the unit whose weight vector is closest to the input vector (the winner) is allowed to
learn, i.e., to update its weights.
Weight Update Formula in Kohonen Learning:
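● For output cluster unit j:
$$w_j(\text{new}) = w_j(\text{old}) + \alpha\,[x - w_j(\text{old})] = (1 - \alpha)\,w_j(\text{old}) + \alpha\,x$$
where,
○ x: Input vector
○ w_j: Weight vector for unit j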
○ α: Learning rate, which decreases as training progresses.
Determining the Winner:
1. Euclidean Distance Method: The winner is the unit whose weight vector has the
smallest Euclidean distance from the input vector.
2. Dot Product Method: The winner is the unit with the largest dot product between the
input vector and weight vector. This corresponds to the smallest angle between the two
vectors when they are of unit length.
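The two winner-selection methods and the Kohonen update can be sketched in a few lines. This is a minimal illustration, not a reference implementation: the function names and the sample vectors are assumptions made for the example, and the vectors are normalized to unit length so that both methods are expected to agree.

```python
import math

def normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(c * c for c in v))
    return [c / norm for c in v]

def winner_by_distance(x, weights):
    """Index of the unit with the smallest squared Euclidean distance to x."""
    def sq_dist(w):
        return sum((xi - wi) ** 2 for xi, wi in zip(x, w))
    return min(range(len(weights)), key=lambda j: sq_dist(weights[j]))

def winner_by_dot(x, weights):
    """Index of the unit with the largest dot product with x."""
    def dot(w):
        return sum(xi * wi for xi, wi in zip(x, w))
    return max(range(len(weights)), key=lambda j: dot(weights[j]))

def kohonen_update(w, x, alpha):
    """New weight vector: a linear combination of the old weights and the input."""
    return [wi + alpha * (xi - wi) for wi, xi in zip(w, x)]

# Hypothetical input and weight vectors, all of unit length.
x = normalize([0.9, 0.1])
weights = [normalize([1.0, 0.0]), normalize([0.0, 1.0]), normalize([1.0, 1.0])]

j_dist = winner_by_distance(x, weights)
j_dot = winner_by_dot(x, weights)
updated = kohonen_update(weights[j_dist], x, alpha=0.5)
print(j_dist, j_dot, updated)
```

Only the winning unit's weight vector is moved toward the input; the other units are left unchanged.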
Fixed Weight Competitive Nets
● These are competitive nets in which the weights remain fixed, even during the training
process.
● The idea of competition is used among neurons to enhance the contrast in their
activations. These nets are
○ Maxnet,
○ Mexican hat and
○ Hamming net.
❖Maxnet
● The Maxnet network was developed by Lippmann in 1987.
● The Maxnet serves as a subnet for picking the node whose input is the largest.
● All the nodes present in this subnet are fully interconnected and there exist symmetrical
weights in all these weighted interconnections.
Architecture of Maxnet
● In the Maxnet architecture, fixed symmetrical weights are present on all the
weighted interconnections.
● The weights between the neurons are inhibitory and fixed.
● The Maxnet with this structure can be used as a subnet to select a particular node whose
net input is the largest.
The Maxnet uses the following activation function:
$$f(x) = \begin{cases} x & \text{if } x > 0 \\ 0 & \text{if } x \le 0 \end{cases}$$
Testing algorithm
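Step 0: Initial weights and initial activations are set. The inhibitory weight ε is chosen such that 0 < ε < 1/m, where
"m" is the total number of nodes. Let
$$a_j(0) = \text{input to node } j$$
and
$$w_{ij} = \begin{cases} 1 & \text{if } i = j \\ -\varepsilon & \text{if } i \ne j \end{cases}$$
Step 1: Perform Steps 2-4 while the stopping condition is false.
Step 2: Update the activations of each node. For j = 1 to m,
$$a_j(\text{new}) = f\Big[a_j(\text{old}) - \varepsilon \sum_{k \ne j} a_k(\text{old})\Big]$$
where f is the Maxnet activation function.
Step 3: Save the activations obtained for use in the next iteration. For j = 1 to m,
$$a_j(\text{old}) = a_j(\text{new})$$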
Step 4: Finally, test the stopping condition for convergence of the network. The following is the
stopping condition:
If more than one node has a nonzero activation, continue; else stop.
Example:
Construct a Maxnet with four neurons, given -
Inhibitory weight Ɛ= 0.2,
Initial activations (input signals) as follows:
a1(0) = 0.3 a2(0) = 0.5 a3(0) = 0.7 a4(0) = 0.9
Maxnet Example
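The example can be worked through with a short sketch of the Maxnet iteration. The function name `maxnet` and the `max_iters` safety limit are assumptions added for illustration; the update itself follows the testing algorithm, where each node keeps its own activation and is inhibited by ε times the sum of the other activations.

```python
def maxnet(activations, epsilon, max_iters=100):
    """Iterate the Maxnet until at most one node has a nonzero activation."""
    a = list(activations)
    iterations = 0
    while sum(1 for v in a if v > 0) > 1 and iterations < max_iters:
        total = sum(a)
        # Each node subtracts epsilon times the sum of all *other* activations,
        # then negative results are clipped to zero by the activation function.
        a = [max(0.0, v - epsilon * (total - v)) for v in a]
        iterations += 1
    return a, iterations

final, iterations = maxnet([0.3, 0.5, 0.7, 0.9], epsilon=0.2)
winner = max(range(len(final)), key=lambda j: final[j])
print(final, iterations, winner + 1)
```

For these inputs the competition ends after five iterations with only the fourth node (initial activation 0.9) still active. Note that ε = 0.2 satisfies 0 < ε < 1/m for m = 4.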
❖Hamming Network
● The Hamming network is a two-layer feedforward neural network for classification of
binary bipolar n-tuple input vectors using the minimum Hamming distance, denoted as
H (Lippmann, 1987).
● The first layer is the input layer for the n-tuple input vectors.
● The second layer (also called the memory layer) stores p memory patterns.
● A p-class Hamming network has p output neurons in this layer.
● The strongest response of a neuron is indicative of the minimum Hamming distance
between the stored pattern and the input vector.
Structural diagram of the Hamming network:
Hamming Distance
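The Hamming distance between two bipolar vectors x and y of dimension n is the number of components in which they differ. Their dot product is
$$x \cdot y = a - d$$
where a is the number of bits in agreement in x and y (number of similar bits), and d is the number
of bits that differ in x and y (number of dissimilar bits).
The Hamming distance between the two vectors is therefore d. Since the total number
of components is n, we have
$$n = a + d, \quad \text{i.e.,} \quad d = n - a$$
$$x \cdot y = a - (n - a) = 2a - n$$
On simplification, we get
$$a = \tfrac{1}{2}(x \cdot y) + \tfrac{n}{2}$$
From the above equation, it is clear that the weights can be set to one-half the
exemplar vector and the bias can be set initially to n/2, so that the net input to a unit equals the number of agreeing components a.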
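The relation between the dot product and the agreement count can be checked numerically. The sketch below uses two hypothetical bipolar vectors chosen only for illustration; the helper name `agreement_counts` is likewise an assumption.

```python
def agreement_counts(x, y):
    """Return (a, d): number of agreeing and differing components."""
    a = sum(1 for xi, yi in zip(x, y) if xi == yi)
    d = len(x) - a
    return a, d

# Hypothetical bipolar vectors of dimension n = 4.
x = [1, -1, 1, 1]
y = [1, -1, -1, 1]

n = len(x)
a, d = agreement_counts(x, y)
dot = sum(xi * yi for xi, yi in zip(x, y))

print(a, d, dot)            # agreements, differences, dot product
print(0.5 * dot + n / 2)    # recovers the agreement count a
```

Here the vectors differ in one component, so the dot product is a - d = 3 - 1 = 2, and one-half the dot product plus n/2 gives back a = 3.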
Testing Algorithm of Hamming Network
Given a bipolar input vector x and a set of "m" bipolar exemplar vectors e(1), ...,
e(j), ..., e(m), the Hamming network is used to determine the exemplar vector that is closest to the
input vector x.
The net input entering unit Yj gives the measure of the similarity between the input vector and
the exemplar vector. The parameters used here are the following:
n = number of input units (number of components of the input vector)
m = number of output units (number of exemplar vectors)
e(j) = jth exemplar vector, i.e.,
e(j) = [e1(j), ..., ei(j), ..., en(j)]
The testing algorithm for the Hamming Net is as follows:
Step 0: Initialize the weights. For i = 1 to n and j = 1 to m,
$$w_{ij} = \frac{e_i(j)}{2}$$
Initialize the bias for storing the exemplar vectors. For j = 1 to m,
$$b_j = \frac{n}{2}$$
Step 1: Perform Steps 2-4 for each input vector x.
Step 2: Calculate the net input to each unit Yj, i.e.,
$$y_{\text{in}j} = b_j + \sum_{i=1}^{n} x_i w_{ij}, \quad j = 1 \text{ to } m$$
Step 3: Initialize the activations for Maxnet, i.e.,
$$y_j(0) = y_{\text{in}j}, \quad j = 1 \text{ to } m$$
Step 4: Maxnet iterates to find the exemplar that best matches the input pattern.
Hamming Network Example
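A small sketch of the Hamming net, assuming weights equal to one-half of each exemplar and a bias of n/2, followed by a Maxnet competition. The two exemplar vectors, the input vector, and the value ε = 0.3 are hypothetical values picked for illustration, and the function names are not taken from any library.

```python
def hamming_net_inputs(x, exemplars):
    """Net input to each output unit: bias n/2 plus x dotted with e(j)/2."""
    bias = len(x) / 2
    return [bias + sum(xi * ei / 2 for xi, ei in zip(x, e)) for e in exemplars]

def maxnet_winner(activations, epsilon, max_iters=100):
    """Run the Maxnet competition and return the index of the surviving unit."""
    a = list(activations)
    for _ in range(max_iters):
        if sum(1 for v in a if v > 0) <= 1:
            break
        total = sum(a)
        a = [max(0.0, v - epsilon * (total - v)) for v in a]
    return max(range(len(a)), key=lambda j: a[j])

# Hypothetical exemplars e(1), e(2) and input vector x (all bipolar, n = 4).
exemplars = [[1, -1, -1, -1], [-1, -1, -1, 1]]
x = [1, 1, -1, -1]

net = hamming_net_inputs(x, exemplars)          # number of agreeing components
best = maxnet_winner(net, epsilon=0.3)          # epsilon < 1/m with m = 2
print(net, best + 1)
```

Each net input equals the number of components in which x agrees with the corresponding exemplar (3 and 1 here), so the Maxnet selects the first exemplar as the closest match. If two exemplars tie, this simple Maxnet cannot separate them.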
Kohonen Self-Organizing Feature Maps (KSOFM)
➔Introduction to Feature Mapping
○ Converts patterns of arbitrary dimensionality into a one- or two-dimensional array of
neurons.
○ Purpose: Reduces higher-dimensional input space into a typical feature space while
preserving neighborhood relations of input patterns, creating a topology-preserving
map.
○ Network Structure:
■ One-Dimensional Mapping: Each component of the input vector is connected to
each node in a one-dimensional array of neurons.
■ Two-Dimensional Mapping: The neurons are arranged in a two-dimensional array, and each
input component is connected to every node, maintaining the input space's topology.
➔Kohonen Self-Organizing Feature Maps (KSOFM):
● The Self-Organizing Feature Map (SOM) was developed by Dr. Teuvo Kohonen in 1982.
● A Kohonen self-organizing feature map (KSOFM) is a neural network that is
trained using competitive learning.
● Feature mapping is a process that converts patterns of arbitrary dimensionality into
the response of a one- or two-dimensional array of neurons. The network performing such a
mapping is called a feature map. Besides reducing the higher dimensionality, the feature map
is able to preserve the neighborhood topology of the input patterns.
➔Architecture of Kohonen Self-Organizing Feature Maps
● Linear Array of Cluster Units:
○ Units are arranged in a linear array with neighborhoods of varying radii,
designated by Ni(k1), Ni(k2), and Ni(k3), with k1 > k2 > k3, where k1 = 2, k2 = 1, k3 =
0.
○ Winning Unit: The unit with the symbol “#” is the winning unit, and other units
are indicated by "o".
Linear array of cluster units
● Grid Structures:
○ Rectangular Grid:
■ Neighborhood radius is defined by k1, k2, and k3.
■ Each unit has eight nearest neighbors.
Rectangular grid
○ Hexagonal Grid:
■ Similar to the rectangular grid but with six nearest neighbors for each unit.
■ In all the three cases, the unit with “#” symbol is the winning unit and the
other units are indicated by "o."
■ In both rectangular and hexagonal grids, k1 >k2 > k3, where k1 = 2, k2 =
1, k3 = 0.
Hexagonal grid
Architectural Representation: Typical architecture of KSOFM shows the arrangement and
interaction of units in either grid configuration.
Kohonen self organizing feature map architecture
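The neighborhoods for the linear array and the rectangular grid can be sketched as follows. This is an illustrative sketch only: the function names are assumptions, the rectangular case assumes that radius is measured so that a radius of 1 covers the eight surrounding units, and the hexagonal grid is not covered.

```python
def linear_neighborhood(winner, radius, size):
    """Indices of units within `radius` of the winner in a linear array."""
    return [j for j in range(size) if abs(j - winner) <= radius]

def rectangular_neighborhood(winner, radius, rows, cols):
    """Grid positions within `radius` of the winner on a rectangular grid.

    Distance is the larger of the row and column offsets, so radius 1
    yields the winner plus its eight nearest neighbors.
    """
    r0, c0 = winner
    return [(r, c)
            for r in range(rows)
            for c in range(cols)
            if max(abs(r - r0), abs(c - c0)) <= radius]

# Radii k1 = 2, k2 = 1, k3 = 0 around winner index 4 in a 10-unit linear array.
for k in (2, 1, 0):
    print(k, linear_neighborhood(4, k, 10))

# Winner at the center of a 5 x 5 rectangular grid.
print(len(rectangular_neighborhood((2, 2), 1, 5, 5)) - 1)  # neighbors, winner excluded
```

Units near the edge of the array or grid simply have fewer neighbors, because positions outside the array are ignored.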
➔Flowchart of Kohonen Self-Organizing Feature Maps
➔Training Algorithm of Kohonen Self-Organizing Feature Maps
● Step 0: Initialization:
○ Weights (wij): Initialized with random values, within the range of input vector
components.
○ Topological Neighborhood Parameters: Initially set to a broad range; the radius
decreases as clustering progresses.
○ Learning Rate (α): Initialized as a slowly decreasing function of time to allow
gradual convergence.
● Step 1: Continue iterating Steps 2 to 8 until the stopping condition is met.
● Step 2: Process each input vector through the network, performing subsequent steps for
each vector.
● Step 3: Compute the squared Euclidean distance for each unit j, defined as:
$$D(j) = \sum_{i} (x_i - w_{ij})^2$$
● Step 4: Identify the unit with the minimum distance D(J) as the winning unit J.
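● Step 5: For all units j within the neighborhood of J, update the weights using:
$$w_{ij}(\text{new}) = w_{ij}(\text{old}) + \alpha\,[x_i - w_{ij}(\text{old})]$$
Or
$$w_{ij}(\text{new}) = (1 - \alpha)\,w_{ij}(\text{old}) + \alpha\,x_i$$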
● Step 6: Update the learning rate α using: α(t + 1) = 0.5 α(t)
● Step 7: Gradually reduce the radius of the topological neighborhood at specific time
intervals to refine the network's focus.
● Step 8: Test for the stopping condition, which typically involves criteria like convergence
or reaching a predefined number of iterations.
KSOFM Example
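The training steps can be sketched for a linear array of cluster units. This is a minimal sketch under stated assumptions: the function name `train_som`, the `radius_step` parameter for shrinking the neighborhood, and the sample inputs and initial weights are all hypothetical choices for illustration. The stopping condition is simply a fixed number of epochs.

```python
def train_som(inputs, weights, alpha, epochs, radius=0, radius_step=None):
    """Train a one-dimensional Kohonen map; weights[j] is the vector of unit j."""
    weights = [list(w) for w in weights]
    m = len(weights)
    for epoch in range(epochs):
        for x in inputs:
            # Step 3: squared Euclidean distance from x to every unit.
            dist = [sum((xi - wi) ** 2 for xi, wi in zip(x, w)) for w in weights]
            # Step 4: the winner J has the minimum distance.
            J = min(range(m), key=lambda j: dist[j])
            # Step 5: update every unit within the neighborhood of J.
            for j in range(max(0, J - radius), min(m, J + radius + 1)):
                weights[j] = [wi + alpha * (xi - wi) for wi, xi in zip(weights[j], x)]
        # Step 6: halve the learning rate after each epoch.
        alpha *= 0.5
        # Step 7: optionally shrink the neighborhood radius every radius_step epochs.
        if radius_step and radius > 0 and (epoch + 1) % radius_step == 0:
            radius -= 1
    return weights

# Hypothetical data: four binary vectors, two cluster units, radius 0.
inputs = [[1, 1, 0, 0], [0, 0, 0, 1], [1, 0, 0, 0], [0, 0, 1, 1]]
initial = [[0.2, 0.6, 0.5, 0.9], [0.8, 0.4, 0.7, 0.3]]

trained = train_som(inputs, initial, alpha=0.6, epochs=10)
for w in trained:
    print([round(v, 3) for v in w])
```

Because the learning rate is halved every epoch, most of the movement happens in the first few epochs; a slower decay schedule would let the map keep adapting for longer.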
❖Learning Vector Quantization (LVQ)
➔ Overview of LVQ:
◆ LVQ is a process for classifying patterns, where each output unit represents a particular
class.
◆ Several output units can represent each class.
◆ The output unit’s weight vector is called the reference vector or codebook vector for the
class it represents.
◆ LVQ is a special case of competitive nets but uses supervised learning.
◆ During training, output units are adjusted to approximate the decision boundaries of a
Bayesian classifier.
◆ LVQ is used for tasks such as optical character recognition and speech-to-phoneme
conversion.
➔ Classification in LVQ:
◆ The network classifies an input vector by assigning it to the same class as the output unit
whose weight vector is closest to the input vector.
◆ LVQ adjusts boundaries between categories to minimize misclassification.
➔ LVQ Architecture:
◆ The LVQ architecture consists of:
● Input layer: With "n" units.
● Output layer: With "m" units.
◆ The input and output layers are fully interconnected with weighted links.
➔ Flowchart of LVQ:
➔ Training Algorithm of LVQ:
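As an illustration of supervised competitive learning, the sketch below uses the commonly described LVQ1 rule, which is an assumption here because the training steps are not spelled out in these notes: the winning reference vector is moved toward the input when its class matches the target and away from it otherwise. The function names, sample data, and learning rate are hypothetical.

```python
def nearest(x, codebook):
    """Index of the reference vector closest to x (squared Euclidean distance)."""
    return min(range(len(codebook)),
               key=lambda j: sum((xi - wi) ** 2 for xi, wi in zip(x, codebook[j])))

def train_lvq(samples, codebook, classes, alpha, epochs=1, decay=1.0):
    """LVQ1-style training; classes[j] is the class represented by codebook[j]."""
    codebook = [list(w) for w in codebook]
    for _ in range(epochs):
        for x, target in samples:
            J = nearest(x, codebook)
            # Move toward x on a correct match, away from x on a mismatch.
            sign = 1.0 if classes[J] == target else -1.0
            codebook[J] = [wi + sign * alpha * (xi - wi)
                           for wi, xi in zip(codebook[J], x)]
        alpha *= decay
    return codebook

def classify(x, codebook, classes):
    """Assign x to the class of its nearest reference vector."""
    return classes[nearest(x, codebook)]

# Hypothetical data: two reference vectors, one per class.
codebook = [[1, 1, 0, 0], [0, 0, 0, 1]]
classes = [1, 2]
samples = [([0, 0, 1, 1], 2), ([1, 0, 0, 0], 1), ([0, 1, 1, 0], 2)]

trained = train_lvq(samples, codebook, classes, alpha=0.1)
print(trained)
print(classify([1, 1, 0, 0], trained, classes))
```

In this run the third sample is closest to the class-1 reference vector although its target is class 2, so that vector is pushed away from the sample.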
❖Counter Propagation Networks (CPNs)
● Counter Propagation Networks (CPNs) are multi-layer neural networks designed for tasks
such as data compression, function approximation, and pattern association.
● CPNs are constructed using an instar-outstar model, a three-layer neural network that
maps input to output using competitive learning.
● The three layers in this model are:
○ Input Layer
○ Competitive (Hidden) Layer
○ Output Layer
➔Working Mechanism
● Input to Competitive Layer: The connection between the input and competitive layers
follows an instar structure.
● Competitive to Output Layer: The connection between the competitive and output
layers follows an outstar structure.
● Training Process:
○ Stage 1: Input vectors are clustered using either Euclidean distance or dot product
methods.
○ Stage 2: Weights from the cluster units to the output units are adjusted to produce
the desired response.
➔Types of Counter Propagation Networks
1. Full Counter Propagation Network (Full CPN):
○ Efficiently represents large numbers of vector pairs.
○ Works best if the inverse function exists and is continuous.
○ Approximates the input-output pair using a look-up-table.
○ Learning Speed: Faster than traditional backpropagation networks, requiring
fewer training steps.
2. Forward-Only Counter Propagation Network:
○ Similar structure but differs in how inputs propagate through the network.
➢Full Counter Propagation Net (CPN)
➔ Efficiency
● Full CPN is as efficient as back-propagation networks in approximating continuous
functions, though it requires more hidden nodes for the same level of accuracy.
● Its key advantage is the speed of learning, as it combines:
○ Unsupervised learning (e.g., instar learning) and
○ Supervised learning (e.g., outstar learning).
➔ Learning in CPN
➔ Architecture
● Components:
1. Input Layer: Holds input vectors xi.
2. Competitive Layer (Instar): Contains nodes that compete based on the input
vectors.
■ Maximal Response: Each node maximally responds to inputs in specific
regions, classifying the input space.
■ The winning node is activated while others are suppressed (winner-take-all
mechanism).
3. Output Layer (Outstar): Receives fan-out from the competitive layer and
adjusts according to the winning node.
General Structure of full CPN
Architecture of full CPN
First phase of training of full CPN
Second phase of training of full CPN
➔ Training Algorithm
➔ Testing Algorithm
➢Forward-Only Counter Propagation Net (CPN)
● The Forward-Only CPN is a simplified version of the Full Counter Propagation Net
(CPN).
● It approximates the function y=f(x), but not x=f(y). This means it works when mapping
from x to y is well-defined, but the reverse is not.
● Only the x-vectors are used for forming clusters on the Kohonen units during the training
process.
➔ Training Process:
● Phase 1 (Cluster Formation):
○ Input vectors are presented to the input layer.
○ Cluster layer units compete to represent the input vector using the
"winner-take-all" approach.
○ The learning rate decreases over time, and multiple iterations are performed with
the input vectors.
○ Weights between the input layer and cluster layer are trained first.
● Phase 2 (Mapping to Output Layer):
○ Weights between the cluster layer and the output layer are then trained.
○ The target values corresponding to each input are presented to the output layer.
○ The winning cluster unit sends a signal to the output layer, where output units
compute signals.
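➔ The weights from the input units to the cluster units are updated using the learning rule given
below, where v_iJ denotes the weight from input unit i to the winning cluster unit J:
$$v_{iJ}(\text{new}) = v_{iJ}(\text{old}) + \alpha\,[x_i - v_{iJ}(\text{old})]$$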
➔ Architecture of Forward Only Counter propagation Net:
Layers:
● Input Layer: Takes input vectors.
● Cluster (Competitive) Layer: Forms clusters based on the input vectors, using
competitive learning.
● Output Layer: Produces the target output corresponding to the input vector.
Competition:
● After the competition in the cluster layer, only one unit is activated, sending a signal to
the output layer.
● The architecture resembles that of a backpropagation network, except that the cluster-layer units are interconnected so that they can compete.
➔ Training Algorithm of Forward Only Counter propagation Net:
➔ Testing Algorithm of Forward Only Counter propagation Net:
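The two training phases and the testing step of a forward-only CPN can be sketched as below. This is a simplified sketch, not the full algorithm: the weight names `v` (input to cluster) and `w` (cluster to output), the learning rates, the decay factor, and the toy data are assumptions, and the winner is chosen by Euclidean distance.

```python
def nearest_cluster(x, v):
    """Index of the cluster unit whose weight vector is closest to x."""
    return min(range(len(v)),
               key=lambda j: sum((xi - vi) ** 2 for xi, vi in zip(x, v[j])))

def train_forward_only_cpn(samples, v, w, alpha=0.5, a=0.5, epochs=20, decay=0.9):
    """Phase 1 clusters the x-vectors; phase 2 learns the output for each cluster."""
    v = [list(row) for row in v]
    w = [list(row) for row in w]
    rate = alpha
    for _ in range(epochs):                      # Phase 1: input -> cluster weights
        for x, _target in samples:
            J = nearest_cluster(x, v)
            v[J] = [vi + rate * (xi - vi) for vi, xi in zip(v[J], x)]
        rate *= decay
    rate = a
    for _ in range(epochs):                      # Phase 2: cluster -> output weights
        for x, y in samples:
            J = nearest_cluster(x, v)
            w[J] = [wk + rate * (yk - wk) for wk, yk in zip(w[J], y)]
        rate *= decay
    return v, w

def predict(x, v, w):
    """Testing: the winning cluster unit sends its output weights as the response."""
    return w[nearest_cluster(x, v)]

# Hypothetical one-dimensional data forming two clusters.
samples = [([0.1], [1.0]), ([0.2], [1.2]), ([0.8], [3.0]), ([0.9], [3.2])]
v, w = train_forward_only_cpn(samples, v=[[0.0], [1.0]], w=[[0.0], [0.0]])
print(predict([0.15], v, w), predict([0.85], v, w))
```

The network behaves like a look-up table: every input that falls in the same cluster receives the same output, which is roughly an average of the targets seen for that cluster.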
❖ Adaptive Resonance Theory (ART) Network
● Adaptive Resonance Theory (ART) was developed by Stephen Grossberg and Gail Carpenter in 1987.
● ART networks are consistent with behavioral models and focus on unsupervised learning.
● They solve the problem of instability found in traditional feed-forward systems.
➔ Types of ART Networks:
Carpenter and Grossberg developed different ART architectures as a result of 20 years of
research. The ARTs can be classified as follows:
1. ART 1: The simplest variety of ART network, which accepts only binary inputs. It is designed for
clustering binary vectors.
2. ART 2: Extends the network's capabilities to support continuous inputs. It is designed to accept
continuous-valued vectors.
3. Fuzzy ART: Incorporates fuzzy logic into ART's pattern recognition, thus enhancing its
ability to generalize. One very useful feature of fuzzy ART is complement coding, a
means of incorporating the absence of features into pattern classifications, which goes a
long way towards preventing inefficient and unnecessary category proliferation.
4. ARTMAP: Also known as Predictive ART. It combines two slightly modified ART units
(two ART 1 or two ART 2 units) into a supervised learning structure, where the first unit
takes the input data and the second unit takes the correct output data. The two are then used to make
the minimum possible adjustment of the vigilance parameter in the first unit in order to
make the correct classification.
➔ Learning Process:
● Unsupervised Learning: Based on competition, where categories (clusters) are
autonomously found and new categories are created if needed.
● Order of Input Patterns: Input patterns can be presented in any sequence. The
appropriate cluster unit is chosen for each pattern, and its weights are adjusted to learn
the pattern.
● Similarity Control: The network controls the degree of similarity between patterns
placed on the same cluster unit, ensuring each input pattern is handled properly without
repeating clusters.
➔ Stability and Plasticity:
● Stability: The property that a pattern does not oscillate among different cluster
units when it is presented multiple times during training.
● Plasticity: Refers to the network's ability to respond to new patterns at any stage of
learning. It means the ART network can still learn new categories without losing previous
ones.
● Stability-Plasticity Dilemma: ART networks are designed to solve the conflict between
stability (preserving past learning) and plasticity (adapting to new information).
Traditional methods, like reducing learning rates to zero, freeze learned categories but
compromise plasticity. ART networks aim to maintain both stability and adaptability.
➔ Key Concepts:
● Bottom-up Learning: Input to output learning, similar to competitive learning.
● Top-down Learning: Output to input learning, which helps the system in stabilizing its
learned categories.
➔ Main Advantage of ART Networks:
● ART networks balance stability (preserving important past knowledge) and plasticity
(being flexible enough to learn new information). They remain adaptable and responsive
to new input patterns without losing previously learned categories.
➔Fundamental Architecture of ART:
An ART network is constructed using three main groups of neurons:
1. Input Processing Neurons (F1 Layer):
○ This layer is divided into two parts:
■ Input Portion: Receives and processes the input signals. In ART 2, more
processing is done compared to ART 1.
■ Interface Portion: Combines input from the input portion and the F2
layer to compare the similarity between the input signal and the weight
vector.
2. Clustering Units (F2 Layer):
○ Also known as the competitive layer. The clustering units compete using a
winner-take-all mechanism, and the unit with the highest net input is selected to
learn the input pattern. Other F2 units are set to zero (inactive).
3. Control Mechanism:
○ This mechanism regulates the degree of similarity of patterns placed on the same
cluster unit. It uses the signals from the interface and input portions of the F1
layer to decide whether the current cluster unit should learn the pattern or if a new
cluster unit should be selected.
Weight Interconnections in ART:
● Bottom-up Weights (F1 → F2): These connect the F1(b) layer to the F2 layer and control
how input signals affect the cluster layer. Represented by δij (i-th F1 unit to j-th F2 unit).
● Top-down Weights (F2 → F1): These connect the F2 layer back to the F1(b) layer. They
determine how the cluster layer influences the input layer's activation. Represented by μji
(j-th F2 unit to i-th F1 unit).
The similarity between the top-down weight vector and input vector determines whether the
cluster unit will learn the input pattern. If not, it is inhibited, and another cluster unit is chosen
for learning.
➔ Fundamental Algorithm of ART:
This is the step-by-step process followed by an ART network during training:
Step 0: Initialize the necessary parameters for the network (weights, learning rate, etc.).
Step 1: Begin the loop. Perform the following steps (2-9) until the stopping condition is met.
Step 2: For each input vector, perform steps 3-8.
Step 3: Process the input vector in the F1 layer (input layer).
Step 4: While the reset condition is true (meaning the current cluster unit doesn't sufficiently match
the input), perform steps 5-7.
Step 5: Identify the victim unit in the F2 layer to learn the current input pattern. The victim is the
F2 unit (not inhibited) with the largest net input.
Step 6: F1(b) units combine their inputs from both the F1(a) (input) and F2 (cluster) layers.
Step 7: Check the reset condition:
○ If reset is true, the current victim unit is rejected (inhibited), and another cluster
unit is chosen (go back to step 4).
○ If reset is false, the victim unit is accepted for learning (proceed to step 8).
Step 8: Update the weights. The network adjusts the weights to learn the input pattern.
Step 9: Check for the stopping condition (whether all input vectors have been learned, or a
predefined number of iterations has been reached).
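The loop of candidate selection, reset check, and weight update can be sketched with a simplified ART 1 style network for binary inputs. Everything specific here is an assumption for illustration rather than part of the notes: the vigilance test (matched ones divided by input ones), the fast-learning weight formulas with parameter `L`, the initial weights, and the sample patterns. The sketch makes a single pass over the patterns.

```python
def art1_cluster(patterns, max_clusters, vigilance=0.7, L=2.0):
    """Single-pass, simplified ART 1 clustering of binary vectors."""
    n = len(patterns[0])
    bottom_up = [[1.0 / (1 + n)] * n for _ in range(max_clusters)]  # F1 -> F2
    top_down = [[1] * n for _ in range(max_clusters)]               # F2 -> F1
    assignments = []
    for s in patterns:
        norm_s = sum(s)
        inhibited = set()
        chosen = None
        while len(inhibited) < max_clusters:
            # Candidate: the non-inhibited F2 unit with the largest net input.
            candidates = [j for j in range(max_clusters) if j not in inhibited]
            J = max(candidates,
                    key=lambda j: sum(b * si for b, si in zip(bottom_up[j], s)))
            # F1 interface: input combined with the candidate's top-down weights.
            x = [si * tj for si, tj in zip(s, top_down[J])]
            norm_x = sum(x)
            if norm_s > 0 and norm_x / norm_s >= vigilance:
                # Reset is false: the candidate learns the pattern.
                bottom_up[J] = [L * xi / (L - 1 + norm_x) for xi in x]
                top_down[J] = x
                chosen = J
                break
            inhibited.add(J)  # Reset is true: reject this unit, try another.
        assignments.append(chosen)
    return assignments, top_down

# Hypothetical binary patterns.
patterns = [[1, 1, 0, 0], [0, 0, 0, 1], [1, 0, 0, 0], [0, 0, 1, 1]]
loose, _ = art1_cluster(patterns, max_clusters=3, vigilance=0.4)
strict, _ = art1_cluster(patterns, max_clusters=3, vigilance=0.7)
print(loose, strict)
```

With the lower vigilance the last pattern is accepted by an existing cluster, while the higher vigilance rejects that match and places the pattern on a new cluster unit. A pattern that no unit accepts is reported as `None`.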