Cco Unit 5
Examples of RISC processors are Sun's SPARC, PowerPC, Microchip PIC processors, and RISC-V.
Advantages of RISC Processor
1. The RISC processor's performance is better due to its simple and limited
instruction set.
2. It requires fewer transistors, which makes it cheaper to design.
3. The simplicity of RISC instructions frees up space on the microprocessor for
other components, such as registers and cache.
4. A RISC processor is simpler than a CISC processor because of its simple, quick design,
and most instructions complete in one clock cycle.
Disadvantages of RISC Processor
1. The RISC processor's performance may vary according to the code executed because
subsequent instructions may depend on the previous instruction for their execution in
a cycle.
2. Operations that CISC performs in one complex instruction must be built from several RISC instructions, which places more work on programmers and compilers.
3. RISC processors require very fast memory to supply instructions, which calls for a
large cache to respond to instruction fetches in a short time.
RISC Architecture
RISC architecture uses a highly customized set of instructions. Due to its power efficiency and reliability, it is used in portable devices
such as the Apple iPod, mobiles/smartphones, and the Nintendo DS.
Characteristics of RISC
Here are the important characteristics of RISC:
1. One cycle execution time: RISC processors aim for one CPI (clocks per
instruction) when executing each instruction. Each instruction cycle includes the fetch, decode,
and execute steps.
2. Pipelining technique: The pipelining technique is used in RISC processors to
execute multiple stages of different instructions in an overlapped fashion, performing more efficiently.
3. A large number of registers: RISC processors provide many registers, which are
used to hold operands and intermediate results and to
minimize interactions with main memory.
4. It supports simple addressing modes and a fixed instruction length to keep the
pipeline flowing.
5. It uses LOAD and STORE instructions to access memory locations.
6. Simple and limited instructions reduce the execution time of a process on a RISC.
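The load/store discipline in point 5 can be sketched as a toy machine. This is an illustrative sketch, not any real ISA: the mnemonics, register count, and addresses are made up. Only LOAD and STORE touch memory; arithmetic works register-to-register.

```python
# Toy sketch of a load/store (RISC-style) machine. Names and addresses
# are illustrative, not a real instruction set.
memory = {0x10: 7, 0x14: 5, 0x18: 0}
regs = [0] * 4

def LOAD(rd, addr):          # register <- memory
    regs[rd] = memory[addr]

def STORE(rs, addr):         # memory <- register
    memory[addr] = regs[rs]

def ADD(rd, rs1, rs2):       # arithmetic is register-to-register only
    regs[rd] = regs[rs1] + regs[rs2]

# C = A + B, expressed as separate simple instructions:
LOAD(0, 0x10)   # R0 <- A
LOAD(1, 0x14)   # R1 <- B
ADD(2, 0, 1)    # R2 <- R0 + R1
STORE(2, 0x18)  # C  <- R2
print(memory[0x18])  # 12
```

Each of these simple steps is the kind of operation that can complete in one clock cycle, which is what makes the single-cycle goal above plausible.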
CISC Processor
CISC stands for Complex Instruction Set Computer; Intel's x86 family is a well-known example. It has a
large collection of instructions that range from simple to very complex and
specialized at the assembly-language level, some of which take a long time to execute.
The CISC approach tries to reduce the number of instructions in each program while
accepting more cycles per instruction. It emphasizes building complex instructions
directly into the hardware, because hardware is generally faster than software. CISC
chips are relatively slower per instruction than RISC chips, but they use fewer instructions than RISC.
Examples of CISC processors are VAX, AMD, Intel x86 and the System/360.
Disadvantages of CISC Processor
1. CISC chips are slower than RISC chips at executing each instruction cycle of a
program.
2. The performance of the machine decreases because of the slower clock speed.
3. Pipelining is complicated to implement in a CISC processor.
4. CISC chips require more transistors than a RISC design.
5. Typically only about 20% of the available CISC instructions are used in a program.
RISC vs CISC
RISC: uses LOAD and STORE as independent instructions; a program's other
operations interact register-to-register.
CISC: incorporates LOAD and STORE into instructions; a program interacts
memory-to-memory.
Parallel Processing
Parallel processing can be described as a class of techniques that enables a system to carry out simultaneous data-
processing tasks, increasing the computational speed of a computer system.
A parallel processing system can carry out simultaneous data-processing to achieve faster execution time.
For instance, while an instruction is being processed in the ALU component of the CPU, the next instruction can be
read from memory.
A parallel processing system can be achieved by having multiple functional units that
perform identical or different operations simultaneously. The data can be distributed among
the various functional units.
The following diagram shows one possible way of separating the execution unit into eight
functional units operating in parallel.
The operation performed in each functional unit is indicated in each block of the diagram:
o The adder and integer multiplier perform arithmetic operations on integer
numbers.
o The floating-point operations are separated into three circuits operating in parallel.
o The logic, shift, and increment operations can be performed concurrently on different
data. All units are independent of each other, so one number can be shifted while
another number is being incremented.
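A rough software analogy of these independent functional units, sketched with a thread pool. The unit names and operand values below are illustrative; the point is that each "unit" works on its own data at the same time.

```python
# Sketch: independent "functional units" operating in parallel on
# different data, modeled with a thread pool.
from concurrent.futures import ThreadPoolExecutor

def integer_add(a, b):   return a + b
def integer_mul(a, b):   return a * b
def shifter(x, n):       return x << n      # shift one number...
def incrementer(x):      return x + 1       # ...while another is incremented

with ThreadPoolExecutor(max_workers=4) as pool:
    # Each unit receives its own operands; all run concurrently.
    f1 = pool.submit(integer_add, 3, 4)
    f2 = pool.submit(integer_mul, 3, 4)
    f3 = pool.submit(shifter, 1, 3)
    f4 = pool.submit(incrementer, 9)
    results = [f.result() for f in (f1, f2, f3, f4)]

print(results)  # [7, 12, 8, 10]
```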
Flynn's Classification of Computers
M.J. Flynn proposed a classification for the organization of a computer system by the number of instructions and data
items that are manipulated simultaneously.
The sequence of instructions read from memory constitutes an instruction stream, and the operations performed on the data in the processor constitute a data stream.
Parallel processing may occur in the instruction stream, in the data stream, or in both.
Flynn's classification divides computers into four major groups:
1. Single-instruction, single-data (SISD) systems –
An SISD computing system is a uniprocessor machine which is capable of executing a single instruction,
operating on a single data stream. In SISD, machine instructions are processed in a sequential manner and
computers adopting this model are popularly called sequential computers. Most conventional computers have
SISD architecture. All the instructions and data to be processed have to be stored in primary memory.
The speed of the processing element in the SISD model is limited by the rate at which the computer can
transfer information internally. Representative SISD systems include the IBM PC and workstations.
2. Single-instruction, multiple-data (SIMD) systems –
An SIMD system is a multiprocessor machine capable of executing the same instruction on all the CPUs but
operating on different data streams. Machines based on an SIMD model are well suited to scientific computing,
which involves lots of vector and matrix operations. So that information can be passed to all the processing
elements (PEs), the organized data elements of a vector are divided into multiple sets (N sets for an N-PE system), and
each PE processes one data set.
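A minimal SIMD sketch in plain Python, following the dividing-into-N-sets scheme described above. The function and variable names are illustrative; the single broadcast "instruction" here is simply doubling each element.

```python
# SIMD sketch: one instruction (double each element) broadcast to N
# "processing elements", each holding its own slice of the vector.
def simd_execute(instruction, data_sets):
    # Same instruction for every PE, different data stream per PE.
    return [[instruction(x) for x in ds] for ds in data_sets]

vector = list(range(8))                      # the full vector [0..7]
n_pe = 4
# Divide the vector into N sets, one per PE.
sets = [vector[i::n_pe] for i in range(n_pe)]
out = simd_execute(lambda x: 2 * x, sets)
print(out)  # [[0, 8], [2, 10], [4, 12], [6, 14]]
```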
3. Multiple-instruction, single-data (MISD) systems –
An MISD system performs different operations on the same data set, for example Z = sin(x) + cos(x) + tan(x).
Machines built using the MISD model are not useful in most applications; a few machines have been built, but none of them are available commercially.
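The Z = sin(x) + cos(x) + tan(x) example can be sketched as follows; the three "processing elements" are modeled as ordinary function calls applied to the same input value.

```python
# MISD sketch (illustrative): three different operations applied to the
# SAME data item x, then combined: Z = sin(x) + cos(x) + tan(x).
import math

x = 0.5
operations = [math.sin, math.cos, math.tan]   # different instruction streams
partials = [op(x) for op in operations]       # one shared data stream
Z = sum(partials)
print(round(Z, 4))
```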
4. Multiple-instruction, multiple-data (MIMD) systems –
An MIMD system is a multiprocessor machine which is capable of executing multiple instructions on multiple
data sets. Each PE in the MIMD model has separate instruction and data streams; therefore, machines built using
this model can handle any kind of application. Unlike SIMD and MISD machines, the PEs in MIMD machines
work asynchronously.
MIMD machines are broadly categorized into shared-memory MIMD and distributed-memory MIMD based on
the way PEs are coupled to the main memory.
In the shared memory MIMD model (tightly coupled multiprocessor systems), all the PEs are connected to a single
global memory and they all have access to it. Communication between PEs in this model takes place through the
shared memory; a modification of the data stored in the global memory by one PE is visible to all other PEs. Representative
shared memory MIMD systems are Silicon Graphics machines and Sun/IBM's SMP (Symmetric
Multi-Processing).
In distributed memory MIMD machines (loosely coupled multiprocessor systems), all PEs have a local memory.
Communication between PEs in this model takes place through the interconnection network (the inter-process
communication channel, or IPC). The network connecting the PEs can be configured as a tree, mesh, or other topology according to
the requirements.
The shared-memory MIMD architecture is easier to program but is less tolerant to failures and harder to extend with
respect to the distributed memory MIMD model. Failures in a shared-memory MIMD affect the entire system,
whereas this is not the case of the distributed model, in which each of the PEs can be easily isolated. Moreover,
shared memory MIMD architectures are less likely to scale because the addition of more PEs leads to memory
contention. This is a situation that does not happen in the case of distributed memory, in which each PE has its own
memory. As a result of practical outcomes and user requirements, distributed memory MIMD architecture is
generally considered superior to the other existing models.
Pipelining
The term pipelining refers to a technique of decomposing a sequential process into sub-operations, where each sub-
operation is executed in a dedicated segment that operates concurrently with all other segments.
The most important characteristic of a pipeline technique is that several computations can be in progress in
distinct segments at the same time.
The overlapping of computation is made possible by associating a register with each segment in the pipeline.
The registers provide isolation between each segment so that each can operate on distinct data simultaneously.
The structure of a pipeline organization can be represented simply by including an input register for each segment
followed by a combinational circuit.
Example: consider a combined multiplication and addition operation, Ai * Bi + Ci, to get a better
understanding of the pipeline organization.
The combined multiplication and addition operation is performed on a stream of numbers, one for each index i.
The operation is decomposed into sub-operations, each implemented in a segment within the pipeline:
Segment 1: load Ai and Bi into the input registers.
Segment 2: multiply the two operands and bring in Ci.
Segment 3: add Ci to the product.
The following block diagram represents the combined as well as the sub-operations performed in each segment of the
pipeline.
Registers R1, R2, R3, and R4 hold the data for the combinational circuits that operate in
each segment.
The output generated by the combinational circuit in a given segment is applied to the input
register of the next segment. For instance, in the block diagram, register R3 is used as
one of the input registers for the combinational adder circuit.
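The segment behavior described above can be traced with a simple loop; the register names follow the block diagram, and the data values are illustrative.

```python
# Sketch of the multiply-add pipeline segments for A[i]*B[i] + C[i].
# Register names follow the block diagram; values are illustrative.
A = [1, 2, 3]; B = [4, 5, 6]; C = [7, 8, 9]

results = []
for i in range(len(A)):
    # Segment 1: load the operands into the input registers.
    R1, R2 = A[i], B[i]
    # Segment 2: multiply, latching the product in R3 and C[i] in R4.
    R3, R4 = R1 * R2, C[i]
    # Segment 3: the adder takes R3 and R4 and latches the result.
    R5 = R3 + R4
    results.append(R5)
print(results)  # [11, 18, 27]
```

In real hardware these three segments work on three different i values at once; the loop above shows only the data path each item follows.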
In general, pipeline organization applies to two areas of computer design:
1. Arithmetic Pipeline
2. Instruction Pipeline
Arithmetic Pipeline
Arithmetic pipelines are mostly used in high-speed computers. They are used to implement
floating-point operations. Consider the two normalized floating-point numbers:
X = A x 10^a = 0.9504 x 10^3
Y = B x 10^b = 0.8200 x 10^2
where A and B are fractions that represent the mantissas, and a and b are the exponents.
The combined operation of floating-point addition and subtraction is divided into four
segments. Each segment performs one suboperation of the
pipeline:
1. Compare the exponents.
2. Align the mantissas.
3. Add or subtract the mantissas.
4. Normalize the result.
The following block diagram represents the suboperations performed in each segment of the
pipeline.
Note: Registers are placed after each suboperation to store the intermediate results.
1. Compare exponents: the larger exponent, 3, becomes the exponent of the result.
2. Align mantissas: the mantissa of Y is shifted right so the exponents match:
X = 0.9504 x 10^3
Y = 0.0820 x 10^3
3. Add mantissas: the two mantissas are added in segment three:
Z = X + Y = 1.0324 x 10^3
4. Normalize the result:
Z = 0.10324 x 10^4
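The four suboperations can be traced in code. This is a simplified decimal sketch of the worked example with X = 0.9504 x 10^3 and Y = 0.8200 x 10^2, not a real floating-point unit (for brevity it normalizes only the overflow case).

```python
# Decimal sketch of the four-segment floating-point adder.
def fp_add(mA, eA, mB, eB):
    # Segment 1: compare the exponents.
    diff = eA - eB
    # Segment 2: align the mantissa of the smaller-exponent operand.
    if diff > 0:
        mB, eB = mB / 10**diff, eA
    elif diff < 0:
        mA, eA = mA / 10**(-diff), eB
    # Segment 3: add the mantissas (0.9504 + 0.0820 = 1.0324).
    m, e = mA + mB, eA
    # Segment 4: normalize the result (overflow case only, for brevity).
    while abs(m) >= 1.0:
        m, e = m / 10, e + 1
    return m, e

m, e = fp_add(0.9504, 3, 0.8200, 2)
print(m, e)  # approximately 0.10324 x 10^4
```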
Instruction Pipeline
Pipeline processing can occur not only in the data stream but in the instruction stream as
well.
Most digital computers with complex instructions require an instruction pipeline to carry
out operations like fetching, decoding, and executing instructions.
In general, the computer needs to process each instruction with the following sequence of
steps:
1. Fetch the instruction from memory.
2. Decode the instruction.
3. Calculate the effective address.
4. Fetch the operands from memory.
5. Execute the instruction.
6. Store the result in the proper place.
Each step is executed in a particular segment, and different segments may
take different times to operate on the incoming information.
Moreover, two or more segments may sometimes require memory access at the same time,
causing one segment to wait until another is finished with the memory.
The organization of an instruction pipeline will be more efficient if the instruction cycle is
divided into segments of equal duration.
A four-segment instruction pipeline combines two or more of these steps into a
single segment.
For instance, the decoding of the instruction can be combined with the calculation of the
effective address into one segment.
The following block diagram shows a typical example of a four-segment instruction pipeline.
The instruction cycle is completed in four segments.
Diagram: a four-segment instruction pipeline.
Segment 1:
The instruction fetch segment can be implemented using a first-in, first-out (FIFO) buffer.
Segment 2:
The instruction fetched from memory is decoded in the second segment, and eventually, the
effective address is calculated in a separate arithmetic circuit.
Segment 3:
An operand is fetched from memory in the third segment.
Segment 4:
The instructions are finally executed in the last segment of the pipeline organization.
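An idealized space-time view of this four-segment pipeline can be printed as follows. FI, DA, FO, and EX are the conventional names for the four segments (fetch instruction, decode/address, fetch operand, execute); the sketch assumes no branches or memory conflicts.

```python
# Space-time sketch of an ideal 4-segment instruction pipeline: each row
# shows which instruction occupies a segment in each clock cycle.
SEGMENTS = ["FI", "DA", "FO", "EX"]
n_instr = 5
total_cycles = n_instr + len(SEGMENTS) - 1   # k + n - 1 = 8 cycles

for s, name in enumerate(SEGMENTS):
    row = ["I%d" % (c - s + 1) if 0 <= c - s < n_instr else "--"
           for c in range(total_cycles)]
    print(name, " ".join(row))
```

Five instructions finish in 8 cycles instead of the 20 a non-pipelined four-step machine would need.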
RISCs Pipeline
RISC stands for Reduced Instruction Set Computers. It was introduced to execute as fast as one instruction per clock
cycle. This RISC pipeline helps to simplify the computer architecture’s design.
The motivation for RISC relates to what is known as the semantic gap: the difference between the operations provided in high-
level languages (HLLs) and those provided in computer architectures.
To bridge this gap, the conventional response of computer architects was to add layers of complexity to
newer architectures. This increased the number and complexity of instructions, together with an increase in the
number of addressing modes. The architectures that resulted from this "add more complexity" approach are
known as Complex Instruction Set Computers (CISC).
The main benefit of RISC, executing one instruction per clock cycle, is not always achievable,
because not every instruction can be fetched from memory and executed in one clock cycle under all
circumstances.
The way to approach one instruction per clock cycle is to start a new instruction with each
clock cycle and to pipeline the processor so as to achieve the goal of single-cycle instruction execution.
The RISC compiler translates the high-level language program into a machine language program. Issues
such as data conflicts and branch penalties are handled by RISC
processors, relying on the compiler's ability to identify and reduce the delays caused by
these issues.
Principles of RISC Pipelining
There are various principles behind the RISC pipeline. The key terms are as follows:
» Parallel processing: execution of concurrent events in the computing process to achieve faster
computational speed.
» Instruction stream: the sequence of instructions read from memory.
» Data stream: the operations performed on the data in the processor.
What is Pipelining?
Pipelining is a process in which instructions flow through the processor in a
pipeline. It allows instructions to be stored and executed in an orderly sequence. It is also known
as pipeline processing.
Pipelining is a technique where multiple instructions are overlapped during execution. Pipeline
is divided into stages and these stages are connected with one another to form a pipe like
structure. Instructions enter from one end and exit from another end.
Pipelining increases the overall instruction throughput.
In a pipeline system, each segment consists of an input register followed by a combinational
circuit. The register holds the data, and the combinational circuit performs operations on it. The
output of the combinational circuit is applied to the input register of the next segment.
A pipeline system is like a modern-day assembly line in a factory. For example, in car
manufacturing, huge assembly lines are set up, and at each point there are robotic arms
to perform a certain task; the car then moves on to the next arm.
Types of Pipeline
It is divided into 2 categories:
1. Arithmetic Pipeline
2. Instruction Pipeline
Arithmetic Pipeline
Arithmetic pipelines are found in most computers. They are used for floating
point operations, multiplication of fixed-point numbers, etc. For example, the input to a
floating point adder pipeline is:
X = A*2^a
Y = B*2^b
Here A and B are mantissas (significant digit of floating point numbers), while a and b
are exponents.
The floating point addition and subtraction is done in 4 parts: compare the exponents, align the mantissas, add or subtract the mantissas, and normalize the result.
Registers are used for storing the intermediate results between the above operations.
Instruction Pipeline
In an instruction pipeline, a stream of instructions is executed by overlapping the fetch, decode, and execute phases
of the instruction cycle. This technique is used to increase the throughput of the computer
system.
An instruction pipeline reads instructions from memory while previous instructions are being
executed in other segments of the pipeline. Thus we can execute multiple instructions
simultaneously. The pipeline will be more efficient if the instruction cycle is divided into
segments of equal duration.
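Dividing the cycle into k equal segments leads to the standard speedup estimate S = n*tn / ((k + n - 1)*tp), where n tasks take tn each without the pipeline and the pipeline clock period is tp. A quick check with illustrative numbers (4 stages, 20 ns clock):

```python
# Pipeline speedup estimate: k segments, n tasks, clock period tp,
# versus n*tn for the non-pipelined case. Values are illustrative.
k, n, tp = 4, 100, 20e-9
tn = k * tp                              # one task without pipelining
speedup = (n * tn) / ((k + n - 1) * tp)  # = 400/103, about 3.88
print(round(speedup, 2))                 # approaches k as n grows
```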
Advantages of Pipelining
1. The cycle time of the processor is reduced.
2. It increases the throughput of the system
3. It increases the overall efficiency of the system.
Disadvantages of Pipelining
1. The design of a pipelined processor is complex and costly to manufacture.
2. The latency of an individual instruction increases.
Vector Processing
There is a class of computational problems that are beyond the capabilities of a
conventional computer. These problems require a vast number of computations on multiple data
items, and they would take a conventional computer (with a scalar processor) days or even weeks to
complete.
Such complex workloads, which operate on multiple data items at the same time, require a better
way of instruction execution, which was achieved by vector processors.
Scalar CPUs can manipulate one or two data items at a time, which is not very efficient. Also,
simple instructions like "ADD A to B, and store into C" are not practically efficient.
Addresses are used to point to the memory location where the data to be operated on will be found,
which adds the overhead of data lookup. Until the data is found, the CPU would be
sitting idle, which is a big performance issue.
Hence, the concept of the instruction pipeline comes into the picture, in which the instruction passes
through several sub-units in turn. These sub-units perform various independent functions; for
example, the first one decodes the instruction, the second sub-unit fetches the data, and
the third sub-unit performs the arithmetic itself. Therefore, while the data is fetched for one
instruction, the CPU does not sit idle; rather, it works on decoding the next instruction, ending up
working like an assembly line.
A vector processor not only uses an instruction pipeline but also pipelines the data, working on
multiple data items at the same time.
A normal scalar processor instruction would be ADD A, B, which adds two
operands. But what if we could instruct the processor to ADD a group of
numbers (from memory locations 0 to n) to another group of numbers (say, memory locations n to k)?
This can be achieved by vector processors.
In a vector processor, a single instruction can specify multiple data operations, which saves time:
the instruction is decoded once and then keeps operating on different data items.
Applications of Vector Processors
Computers with vector processing capabilities are in demand
in specialized applications. The following are some areas
where vector processing is used:
1. Petroleum exploration.
2. Medical diagnosis.
3. Data analysis.
4. Weather forecasting.
5. Aerodynamics and space flight simulations.
6. Image processing.
7. Artificial intelligence.
- Arrays provide efficient access to any element, but their size cannot be modified or
increased.
- A vector is efficient for insertion and deletion and can grow in size.
- An array's size is fixed, whereas a vector's size can increase.
Vector processor
A vector processor is basically a central processing unit that has the ability to process
a complete vector input with a single instruction.
A vector (array) is an ordered set of data items; a vector V of length n is
written V = [V1, V2, V3, ..., Vn].
Vector processing performs an arithmetic operation on a large array of integers or
floating-point numbers. Vector processing operates on all the elements of the array in
parallel, provided each pass is independent of the others.
Vector processing avoids the overhead of the loop control mechanism that occurs in
general-purpose computers.
Each element of the vector operand is a scalar quantity, which can be an
integer, a floating-point number, a logical value, or a character. Vector instructions can be
classified into four types. Here, V represents a vector operand and S a scalar operand; O1 and
O2 are unary operations and O3 and O4 are binary operations:
O1: V -> V (e.g., complement all elements)
O2: V -> S (e.g., sum the elements)
O3: V x V -> V (e.g., elementwise addition)
O4: V x S -> V (e.g., scale each element)
Vector Instruction
1. Operation Code
Operation code indicates the operation that has to be performed in the given
instruction.
2. Base Address
Base address field refers to the memory location from where the operands are to be
fetched or to where the result has to be stored. The base address is found in the
memory reference instructions. In the vector instruction, the operand and the result
both are stored in the vector registers. Here, the base address refers to the
designated vector register.
3. Address Increment
A vector operand has several data elements, and the address increment specifies how to
find the address of the next element in the operand. Some computers store the data
elements consecutively in main memory, in which case the increment is always 1. Computers
that do not store the data elements consecutively require a variable
address increment.
4. Address Offset
The address offset is always specified relative to the base address. The effective memory
address is calculated using the address offset.
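Taken together, the base address, address offset, and address increment fields determine where each element of the operand lives. A small sketch with illustrative values (the specific numbers are made up):

```python
# Effective addresses of a vector operand's elements, combining the
# base address, offset, and (strided) increment fields. Illustrative values.
base, offset, increment = 0x1000, 8, 4   # increment > 1: strided storage
vector_length = 5

addresses = [base + offset + i * increment for i in range(vector_length)]
print([hex(a) for a in addresses])
```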
5. Vector Length
Most of the vector instructions are pipelined as vector instruction performs the same
operation on the different data sets repeatedlyThe pipelined vector processors can be
classified into two types based on from where the operand is being fetched for vector
processing.
In the memory-to-memory architecture, the operands and results are fetched directly from
memory rather than from registers. In the register-to-register architecture, the operands are
fetched from vector registers.
A vector processor uses vector instructions, by which the code density of a program
can be improved.
The sequential arrangement of data helps to handle the data by the hardware in a
better way.
It offers a reduction in instruction bandwidth.
Array Processor
Array processors are also known as multiprocessors or vector processors. They
perform computations on large arrays of data and are thus used to improve the
performance of the computer.
The main memory is used for storing the program. The control unit is responsible for
fetching the instructions. Vector instructions are sent to all PEs simultaneously, and the
results are returned to memory.
The best known SIMD array processor is the ILLIAC IV computer developed by
the Burroughs Corporation. SIMD processors are highly specialized computers. They are
only suitable for numerical problems that can be expressed in vector or matrix form;
they are not suitable for other types of computations.
Modes of I/O Data Transfer
Data transfer between the CPU and I/O devices can generally be handled in
three modes, which are given below:
1. Programmed I/O
2. Interrupt Initiated I/O
3. Direct Memory Access
Programmed I/O
In programmed I/O, data transfers are the result of I/O instructions written in the computer
program. Each data item transfer is initiated by an instruction in the program.
Usually the program controls the data transfer to and from the CPU and the peripheral.
Transferring data under programmed I/O requires constant monitoring of the
peripherals by the CPU.
Interrupt Initiated I/O
In the programmed I/O method, the CPU stays in a program loop until the I/O unit
indicates that it is ready for data transfer. This is a time-consuming process because it
keeps the processor busy needlessly.
This problem can be overcome by using interrupt-initiated I/O. In this mode, when the
interface determines that the peripheral is ready for data transfer, it generates an
interrupt. After receiving the interrupt signal, the CPU suspends the task it is
processing, services the I/O transfer, and then returns to its previous
processing task.
Direct Memory Access (DMA)
Removing the CPU from the path and letting the peripheral device manage the
memory buses directly improves the speed of transfer. This technique is known
as DMA.
In DMA, the interface transfers data to and from memory through the memory bus. A
DMA controller manages the transfer of data between peripherals and the memory unit.
Many hardware systems use DMA, such as disk drive controllers, graphics cards,
network cards, and sound cards. It is also used for intra-chip data transfer in
multicore processors. In DMA, the CPU initiates the transfer, does other operations
while the transfer is in progress, and receives an interrupt from the DMA controller
when the transfer has been completed.
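The sequence (CPU initiates, keeps working on other operations, then takes a completion interrupt) can be sketched conceptually with threads. This models only the idea, not real DMA hardware; the names and data are illustrative.

```python
# Conceptual DMA sketch: the CPU starts a transfer, keeps working, and a
# "controller" thread raises a flag standing in for the completion interrupt.
import threading

memory = [0] * 8
peripheral_data = [10, 20, 30, 40, 50, 60, 70, 80]
transfer_done = threading.Event()   # models the completion interrupt

def dma_controller(src, dst):
    for i, word in enumerate(src):  # the transfer bypasses the CPU
        dst[i] = word
    transfer_done.set()             # "interrupt": transfer complete

t = threading.Thread(target=dma_controller, args=(peripheral_data, memory))
t.start()                           # CPU initiates the transfer...
busy_work = sum(range(1000))        # ...and does other operations meanwhile
transfer_done.wait()                # interrupt received
t.join()
print(memory)
```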
Multiprocessor Systems
Each CPU in a distributed memory multiprocessor has its own private memory. Each
processor can use local data to accomplish its computational tasks. The processor
may use the bus to communicate with other processors or to access the main memory if
remote data is required.
Advantages
There are various advantages of the multiprocessor system. Some advantages of the
multiprocessor system are as follows:
1. It is a very reliable system because multiple processors may share their work
between the systems, and the work is completed with collaboration.
2. Parallel processing is achieved via multiprocessing.
3. If multiple processors work at the same time, the throughput may increase.
4. Multiple processors may execute multiple processes simultaneously.
Disadvantages
1. It requires a complex configuration.
Multiprocessor vs Multicore
Traffic: A multiprocessor system has higher traffic; a multicore system has less traffic.
Interconnection structures
The processors must be able to share a set of main memory modules and I/O devices in a
multiprocessor system. This sharing capability can be provided through interconnection
structures. The interconnection structures that are commonly used are as follows –
1. Time-shared / Common Bus
2. Cross bar Switch
3. Multiport Memory
4. Multistage Switching Network (Covered in 2nd part)
5. Hypercube System
1. Time-shared / Common Bus
To communicate with any functional unit, a processor needs the bus to transfer data.
To do so, the processor first checks whether the bus is available by
checking its status: if the bus is being used by some other functional unit, the
status is busy; otherwise it is free.
A processor can use the bus only when the bus is free. The sending processor puts the
address of the destination on the bus, and the destination unit identifies it. In order to
communicate with any functional unit, a command is issued to tell that unit what
work is to be done. The other processors at that time will either be busy with internal
operations or will sit idle, waiting to get the bus.
We can use a bus controller to resolve conflicts, if any. (Bus controller can set priority
of different functional units)
This Single-Bus Multiprocessor Organization is easiest to reconfigure & is simple.
This interconnection structure contains only passive elements.
The bus interfaces of sender & receiver units controls the transfer operation here.
To decide access to the common bus without conflicts, methods such as static and fixed
priorities, first-in first-out (FIFO) queues, and daisy chaining can be used.
Advantages –
Inexpensive as no extra hardware is required such as switch.
Simple & easy to configure as the functional units are directly connected to the
bus .
Disadvantages –
The major flaw with this kind of configuration is that if a malfunction occurs in any
of the bus interface circuits, the complete system will fail.
Decreased throughput —
At a time, only one processor can communicate with any other functional unit.
Increased arbitration logic —
As the number of processors and memory units increases, the bus contention problem increases.
When two buses are used, both buses are required in a single transfer operation. Here the system complexity
is increased and the reliability is decreased; the solution is to use multiple bi-
directional buses.
Apart from the organization, there are many factors affecting the performance of bus.
They are –
Number of active devices on the bus.
Data width
Error Detection method
Synchronization of data transfer etc.
Crossbar Switch :
If the number of buses in a common bus system is increased, a point is reached at which
there is a separate path available for each memory module. The crossbar
switch (for multiprocessors) provides a separate path for each module.
A crossbar switch system permits simultaneous transfers from all memory modules
because there is a separate path associated with each module. Thus, the hardware
needed to implement the switch may become quite large and complex.
4. Hypercube Interconnection :
This is a binary n-cube architecture. Here we can connect 2^n processors, and each
processor forms a node of the cube. A node can also be a memory module or an I/O
interface, not necessarily a processor. The processor at a node has direct communication
paths to n other nodes. In total there are 2^n nodes, with 2^n distinct n-
bit binary addresses.
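The addressing scheme can be sketched directly: two nodes are neighbors exactly when their n-bit addresses differ in one bit, so flipping each bit of a node's address enumerates its n neighbors. A small illustrative helper:

```python
# Neighbors of a node in a binary n-cube: flip each of the n address bits.
def hypercube_neighbors(node, n):
    return [node ^ (1 << k) for k in range(n)]

n = 3                                   # 2**3 = 8 nodes, addresses 000..111
print(hypercube_neighbors(0b000, n))    # [1, 2, 4]
print(hypercube_neighbors(0b101, n))    # [4, 7, 1]
```

Routing between any two nodes follows the same idea: correct the differing address bits one at a time, so any message needs at most n hops.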