ARRAY PROCESSORS
SIMD Computer Organization
SIMD Computer Organization:
I/O Instruction and data
CU memory
CU
CP
Control
Data bus
PE0
PEM0
PE1
PEM1
PEN-1
PEMN-1
Interconnection N/W CONFIG I
I/O Config II: CU memory CU
PE0
PE1
PEN-1
Alignment Network Data Bus
M0
M1
Mp-1
Formally, an SIMD computer is characterized by the following set of parameters: C = ( N, F, I, M)
N: Number of PEs in the system. Illiac IV ___ 64 MPP __ 16,384 (Massively parallel processing) F: Set of data routing functions provided by the interconnection network. Example: Mesh, star, Omega and butterfly.
data routing and network manipulation operations.
I: The set of instructions for scalar, vector,
M: Set of masking schemes, where each mask partitions the PEs into 2 disjoint subnets of enabled PEs & disable PEs.
Components in a PE:
Index Register Destination Register Ai Di Bi Ii Ci Data Routing Register
Ri
Si
ALU
Status Register
Ai, Bi and Ci are general purpose registers.
Only the contents of Ri are transformed to other PEs during data transfer.
If N = 2^m ( m = no. of bits required to identify a PE) PEs are
there, then Di will hold m bit address of the destined PE.
Each PEi is either active or inactive during instruction cycle:
Si = 1 Active
Si = 0 Inactive
Necessity of Data Routing:
Consider an Array: A = (A0, A1,..,An-1)
n-1
Now, for computing: i=0 S (n) = Ai For n = 8 with N = 8, addition is performed in log2 N steps i.e., 3 steps.
In SISD, the same thing would take 8 steps/loops by formula:
sum = sum + A[i]
If you want to make it faster, youll burn this loop into hardware steps.
But the SAME thing would take ONLY 3 steps in SIMD!!
SIMD:
1 2 3 1 1 + 2=3 2 + 3=5 3 + 4=7 4 + 5=9 5 + 6=11 6 + 7=13 7 + 8=15 1 3 5(1+5=6) 7(3+7=10) 9(5+9=14) 11(7+11=18) 13(9+13=22) 15(11+15=26) 1 3 6 10 1+!4 3+18
S(0) S(1)
S(2)
S(3) S(4) S(5) S(6) S(7)
4
5 6 7 8
6+22
10+26=36
Step 1
Step 2
Step 3
Step # 1: Ai would transfer data in Ri Ai Ri Ri Ri+1 Ai + Ri Step # 2: Ai Ri Ri Ri+2
Algorithm:
i = 0-6 i = 0-6
Ai
i = 1-7
i = 0-5 i = 0-5
Ai + Ri
Step # 3: Ai Ri Ri Ri+4 Ai + Ri
Ai
i = 2-7
i = 0-3 i = 0-3 Ai i = 4-7
During Data Routing:
Masking Scheme:
Step # 1: PE7 is disabled. Step # 2: PE6 and PE7 are disabled. Step # 3: PE4 PE7 are disabled.
During Addition:
Step # 1: PE0 is not involved.
Step # 2: PE0, PE1 are not involved.
Step # 3: PE0-PE3 are not involved.
SIMD Interconnection network
Interconnection networks
Static network
Dynamic network
1-D
2-D
3-D and hypercube
Bus Based
Switch Based
Single Stage
Multistage
crossbar
Static Networks (1D) Linear Array
N nodes connected by n-1 links; Internal nodes have degree 2 End nodes have degree 1. Diameter = n-1
2D Ring Network
Like a linear array, but the two end nodes are connected by an n
th
link.
Can be unidirectional or bi-directional.
Node degree (d)= 2 Diameter (D)
Unidirectional = n-1 Bidirectional = n/2
Star network
Internal node degree = n-1
External nodes have degree = 1 Network diameter = 2
Static Interconnection Networks (2D)
A n-dimensional
mesh [torus or wraparound mesh] is an extension of the linear array [ring]. Degree: 2-4 Examples: Intel Paragon (2D mesh),
Mesh interconnection network (ILLIAC IV N/W)
It is a chordal ring network.
a e 0 b 1 c 2 d 3 f
10
11
12 a
13 b
14 c
15 d
In the Illiac IV, each processor i was connected to processors: Ex: N=16 {i+1, i1, i+4, and i4} (mod 16). Here are the routing functions:
R+1(i) (i + 1) mod N R1(i) (i 1) mod N R+r (i) (i + r) mod N Rr (i) (i r) mod N
where r = N Where 0 i N-1. The diameter of an Illiac-IV mesh is N 1.
Chordal Ring N/W is ILLIAC-IV n/w. Also called as partially connected n/w. (Diag.)
15 14 13 12 11 10 9
1 2 3 4 5 6
By adding additional links (e.g. chords in a circle), the
node degree is increased, and the network diameter is reduced.
Barrel shifting network:
A barrel shifter is sometimes called a plus-minus- 2i network. Routing functions:
B+i (j ) = (j + 2i) mod N Bi (j ) = (j 2i) mod N
where 0 j N 1 and 0 i < log2N.
B+0, B-0, B+1, B-1, B+2, B-2, B+3 and B-3
In general, the diameter of a barrel shifter is (log2N)/2.
How Barrel Shifter Network is an enhancement over Mesh Interconnection network?
In Mesh Network: R+1=(0 1 2 3 .15) R-1 =(15 14 13.0) R+4=(0 4 8 12) (1 5 9 12) (2 6 10 14) (3 7 11 15) R-4 =(15 11 7 3) (14 10 6 2) (12 9 5 1) (12 8 4 0) In Barrel Shifting Network: B+0=(0 1 2 3 .15) B-0=(15 14 13.0) B+1=(0 2 4 6 8..14)(1 3 5 .15) B-1=(15 13 11.1) (14 12 ..0) B+2=(0 4 8 12) (1 5 9 13) (2 6 10 14) (3 7 11 15) B-2=(15 11 7 3) (14 10 6 2) (13 9 5 1) (12 8 4 0) B+3= (0 8) (1 9) (2 10) (3 11) (4 12) (5 13) (6 14) (7 15) B-3= (15 7) (14 6) (13 5) (12 4) (11 3) (10 2) (9 1) (8 0)
Contd
Comparing the two networks
B+0= R+1 B-0 = R-1 B+2= R+4 B-2 =R-4
Which means in general B+n/2 = R+r B-n/2 = R-r where n=log 2N
3D- Fully Connected
In the limit, we obtain a fully-connected network, with a node degree of n -1 and a diameter of 1.
3D networks :
000
3- cube
001 010 011 101
100 110
111
A hypercube is a generalized cube. In a hypercube, there are 2n nodes, for some n. Each node is connected to all other nodes whose
numbers differ from it in only one bit position. The node degree of n cube equals n and so does the
network diameter.
3-cube connected cyclic (CCC) network:
Is obtained from 3-cube n/w. The idea is to cutoff corner nodes of 3-cube and replace each by a cycle of 3 nodes. CCC n/w is constructed from k-cube with n = 2k cycles
nodes.
Hence a 3-cube can be transformed to a 3-CCC with k x 2k nodes.
4D hypercube
4D hypercube = two 3D hypercubes with an additional link connecting corresponding processors
Multistage Interconnection Network
Switch Modules
A x B switch module A inputs and B outputs In practice, A = B = power of 2 Each input is connected to one or more outputs
(conflicts must be avoided)
One-to-one (permutation) and one-to-many are allowed
Multistage Interconnection Network
Switch Modules
A 2 2 switch can be configured for
Straight-through Crossover Upper broadcast (upper input to both outputs) Lower broadcast (lower input to both outputs)
Binary Switch
2x2 Switch Legitimate States = 4
Perfect-shuffle interconnection:
This interconnection network is defined by the routing function S (an1 a1a0)2 = (an2 a1a0 an1)2
7 6 5 4 3 2 1 0 7 6 5 4 3 2 1 0 7 6 5 4 3 2 1 0 7 6 5 4 3 2 1 0
Perfect Shuffle
Inverse Shuffle
a shuffle network is not a complete interconnection network. This can be seen by looking at what happens as data is reci rculated through the network.
An exchange permutation can be added to a shuffle network to make
it into a complete interconnection structure.
E (an1 a1 a0)2 = an1 a1 0
with a shuffle-exchange network, arbitrary cyclic shifts of an N-element array can be performed in log N steps. Here
is a diagram of a multistage omega network for N = 8.
Exch. 1
0 1 2 3 4 5 6 7 0 4 1 5 2 6 3 7
Exch. 2
0 2 4 6 1 3 5 7
Exch. 3
0 1 2 3 4 5 6 7
Shuffle 1
Shuffle 2
Shuffle 3
Omega network features
There are log p stages each with p/2 switching elements each = p/2 * log p total Simple routing algorithm At each stage, look at the corresponding bit (starting with the msb) of the
source and destination address
If the bits are the same, messages passes through, otherwise is crossedover
Path Contention
0 1 4 2 3 5 4 0 1 2 3 4
5
6
5
6 7
Path Contention
0 1 2 3 4 4 0 1 2 3 4
5
6 5
5
6 7
Path Contention
0 1 2 3 4 0 1 2
3 4
5
6
5
6 5 7
Path Contention
0 1 2 3 4 0 1 2 3 4
5
6 4 5
5
6 7
Path Contention
0 1 2 3 4 0 1 2 3 4
5
6
5
6 7
Path Contention
0 1 2 3 4 0 1 2 3 4
5
6 5
5
6 7
Path Contention
0 1 2 3 4 0 1 2 3 4 5
5
6
5
6 7
Path Contention
0 1 2 3 4 0 1 2 3 4 5 5 6 7
5
6
Extra Problems
. Find the following in a 16x16 omega network
a) b)
Number of stages. Number of 2 x 2 switches needed in each stage?.
c)
Draw a 16-input Omega network using 2 x 2 switches as building
blocks.
d)
Show switch settings for routing a message
from node 1101 to node 0101 and from node 0111 to node 1001
simultaneously.
From node 2 to 7 and 6 to 4 simultaneously.
e)
Does blocking exist in above two case?
Show the necessity of data routing and masking mechanisms during
the addition of 15 numbers. Assume each PE holds one element.
Find the number of steps required to add 16 elements. Calculate the different routing functions. Show the routing and addition in each step.