
UNIT 5

Reduced Instruction Set Computer: CISC Characteristics, RISC Characteristics.
Pipeline and Vector Processing: Parallel Processing, Pipelining, Arithmetic Pipeline, Instruction Pipeline, RISC Pipeline, Vector Processing, Array Processor.
Multiprocessors: Characteristics of Multiprocessors, Interconnection Structures, Cache Coherence.

Difference Between RISC and CISC

RISC Processor
RISC stands for Reduced Instruction Set Computer, a microprocessor architecture with a small, highly optimized set of instructions. It is built to minimize instruction execution time by simplifying and limiting the instruction set, so that each instruction completes in one clock cycle, with each cycle covering three stages: fetch, decode, and execute. Complex operations are carried out by combining several of these simple instructions. RISC chips require fewer transistors, which makes them cheaper to design and reduces instruction execution time.

Examples of RISC processors are SUN's SPARC, PowerPC, Microchip PIC processors, and RISC-V.

Advantages of RISC Processor

1. The RISC processor's performance is better due to its simple and limited instruction set.
2. It requires fewer transistors, which makes it cheaper to design.
3. The simplicity of the instructions frees up space on the microprocessor for other features.
4. A RISC processor is simpler than a CISC processor because of its simple, quick design, and it can complete an instruction in one clock cycle.
Disadvantages of RISC Processor

1. The RISC processor's performance may vary with the code being executed, because subsequent instructions may depend on a previous instruction still completing in the pipeline.
2. Programmers and compilers often need complex operations, which RISC must build from many simple instructions.
3. RISC processors require very fast memory to keep the instruction stream supplied, which in practice means a large cache that can respond in a short time.

RISC Architecture
RISC is a highly streamlined instruction set architecture used in portable devices, where its reliability and efficiency matter, such as the Apple iPod, mobiles/smartphones, and the Nintendo DS.

Characteristics of RISC
Important characteristics of RISC include:

 Simpler instruction decoding
 A number of general-purpose registers
 Simple addressing modes
 Fewer data types
 A pipeline can be achieved easily
 One instruction per cycle
 Register-to-register operations
 Simple instruction format
 Faster instruction execution
 Smaller programs: fewer instructions are needed to write an application
 It provides easier programming in assembly language
 Support for complex data structures and easy compilation of high-level languages
 It emphasizes software: the compiler, rather than complex hardware, carries the burden, which keeps the hardware simple and fast

Features of RISC Processor

Some important features of RISC processors are listed below; a short sketch of the load/store discipline follows the list.

1. One-cycle execution time: RISC processors aim for a CPI (clock cycles per instruction) of one, with each cycle covering the fetch, decode, and execute steps of an instruction.
2. Pipelining technique: pipelining is used in RISC processors to execute multiple stages of different instructions at the same time, improving efficiency.
3. A large number of registers: RISC processors provide many registers so that operands can be kept close to the ALU, minimizing interaction with the slower main memory.
4. Simple addressing modes and a fixed instruction length make the pipeline easy to keep full.
5. Only LOAD and STORE instructions access memory locations.
6. The simple, limited instruction set reduces the execution time of a process.
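To make point 5 concrete, here is a minimal C sketch (hypothetical memory cells and register names, not real assembly) of how a single CISC-style memory-to-memory add decomposes under the RISC load/store discipline: only LOAD and STORE touch memory, and arithmetic stays register-to-register.

#include <stdio.h>

int memory[3] = {10, 20, 0};   /* hypothetical memory cells A, B, C */

int main(void) {
    /* CISC view: one instruction "ADD C, A, B" touches memory directly. */
    /* RISC view: only LOAD/STORE touch memory; arithmetic is register-only. */
    int r1 = memory[0];        /* LOAD  R1, A  */
    int r2 = memory[1];        /* LOAD  R2, B  */
    int r3 = r1 + r2;          /* ADD   R3, R1, R2 (register-to-register) */
    memory[2] = r3;            /* STORE R3, C  */
    printf("C = %d\n", memory[2]);
    return 0;
}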

CISC Processor
CISC stands for Complex Instruction Set Computer, an approach popularized by Intel. It has a large collection of instructions, ranging from simple to very complex and specialized, at the assembly-language level, and these instructions can take a long time to execute. The CISC approach tries to minimize the number of instructions per program at the cost of more cycles per instruction. It emphasizes building complex instructions directly into the hardware, on the view that hardware is always faster than software. CISC chips are relatively slower than RISC chips per instruction, but they use fewer instructions than RISC.
Examples of CISC processors are the VAX, AMD and Intel x86 CPUs, and the System/360.

CISC Processor Architecture

The CISC architecture helps reduce program code by embedding multiple operations in each program instruction, which makes the CISC processor more complex. The CISC architecture-based computer was designed to decrease memory cost: large programs need large memory space to store their instructions and data, and a large memory is expensive, so shortening programs keeps the memory requirement, and thus the cost, down.

Advantages of CISC Processors

1. The compiler requires little effort to translate high-level programs or statements into assembly or machine language on CISC processors.
2. The code length is quite short, which minimizes the memory requirement.
3. Storing a program requires less RAM because the instructions are compact.
4. The execution of a single instruction accomplishes several low-level tasks.
5. CISC processors manage power usage by adjusting clock speed and voltage.
6. It uses fewer instructions than RISC to perform the same task.

Disadvantages of CISC Processors

1. CISC chips are slower than RISC chips at executing each instruction cycle of a program.
2. The machine's performance decreases due to the slower clock speed.
3. Pipelining in a CISC processor is complicated to implement.
4. CISC chips require more transistors than a RISC design.
5. Typically only about 20% of the existing instructions are used in a given program.
Characteristics of CISC

 A large number of instructions.
 Complex instruction-decoding logic.
 Instructions for special tasks that are used infrequently.
 A large variety of addressing modes.
 Variable-length instruction formats.
 Instructions may be larger than one word.
 An instruction may take more than a single clock cycle to execute.
 Fewer general-purpose registers, since operations can be performed in memory itself.
 Various CISC designs include special registers, such as a stack pointer and registers for managing interrupts.

Difference between the RISC and CISC Processors

RISC: It is a Reduced Instruction Set Computer.
CISC: It is a Complex Instruction Set Computer.

RISC: It emphasizes software to optimize the instruction set.
CISC: It emphasizes hardware to optimize the instruction set.

RISC: It uses a hardwired control unit.
CISC: It uses a microprogrammed control unit.

RISC: It requires multiple register sets to store instructions and operands.
CISC: It requires a single register set.

RISC: Instruction decoding is simple.
CISC: Instruction decoding is complex.

RISC: Pipelining is simple to use.
CISC: Pipelining is difficult to use.

RISC: It uses a limited number of instructions that require less time to execute.
CISC: It uses a large number of instructions that require more time to execute.

RISC: LOAD and STORE are independent instructions; all other operations are register-to-register.
CISC: LOAD and STORE are part of memory-to-memory instructions.

RISC: It spends more transistors on memory registers.
CISC: It spends transistors on storing complex instructions.

RISC: The execution time is very short.
CISC: The execution time is longer.

RISC: The architecture is used in high-end applications such as telecommunication, image processing, and video processing.
CISC: The architecture is used in low-end applications such as home automation and security systems.

RISC: It has fixed-format instructions.
CISC: It has variable-format instructions.

RISC: A program written for RISC architecture tends to take more space in memory.
CISC: A program written for CISC architecture tends to take less space in memory.

Examples of RISC: ARM, PA-RISC, Power Architecture, Alpha, AVR, ARC, and SPARC.
Examples of CISC: VAX, Motorola 68000 family, System/360, AMD and Intel x86 CPUs.

Parallel Processing
Parallel processing is a class of techniques that enables a system to carry out simultaneous data-processing tasks in order to increase the computational speed of a computer system.

A parallel processing system can carry out simultaneous data-processing to achieve a faster execution time. For instance, while an instruction is being processed in the ALU of the CPU, the next instruction can be read from memory.

The primary purposes of parallel processing are:

 to speed up the computer's processing capability, and
 to increase its throughput, i.e. the amount of processing that can be accomplished during a given interval of time.

A parallel processing system can be achieved by having multiple functional units that perform identical or different operations simultaneously. The data can be distributed among the various functional units.

The following diagram shows one possible way of separating the execution unit into eight functional units operating in parallel. The operation performed in each functional unit is indicated in each block of the diagram:

o The adder and integer multiplier perform arithmetic operations on integer numbers.
o The floating-point operations are separated into three circuits operating in parallel.
o The logic, shift, and increment operations can be performed concurrently on different data. All units are independent of each other, so one number can be shifted while another number is being incremented.
Flynn's Classification of Computers

M.J. Flynn proposed a classification for the organization of a computer system by the number of instructions and data
items that are manipulated simultaneously.

The sequence of instructions read from memory constitutes an instruction stream.

The operations performed on the data in the processor constitute a data stream.

Note: The term 'Stream' refers to the flow of instructions or data.

Parallel processing may occur in the instruction stream, in the data stream, or both.


Flynn's classification divides computers into four major groups that are:

1. Single instruction stream, single data stream (SISD)


2. Single instruction stream, multiple data stream (SIMD)
3. Multiple instruction stream, single data stream (MISD)
4. Multiple instruction stream, multiple data stream (MIMD)

Flynn's classification:

1. Single-instruction, single-data (SISD) systems:
An SISD computing system is a uniprocessor machine capable of executing a single instruction operating on a single data stream. In SISD, machine instructions are processed sequentially, and computers adopting this model are popularly called sequential computers. Most conventional computers have SISD architecture. All the instructions and data to be processed have to be stored in primary memory. The speed of the processing element in the SISD model is limited by the rate at which the computer can transfer information internally. Representative SISD systems are the IBM PC and workstations.

2. Single-instruction, multiple-data (SIMD) systems:
An SIMD system is a multiprocessor machine capable of executing the same instruction on all its CPUs while operating on different data streams. Machines based on the SIMD model are well suited to scientific computing, since it involves many vector and matrix operations. The data elements of vectors can be divided into multiple sets (N sets for an N-PE system) so that each processing element (PE) can process one data set. A representative SIMD system is Cray's vector processing machine.

3. Multiple-instruction, single-data (MISD) systems:
An MISD computing system is a multiprocessor machine capable of executing different instructions on different PEs, all of them operating on the same data set. Example: Z = sin(x) + cos(x) + tan(x), where each PE applies a different operation to the same x. Machines built using the MISD model are not useful in most applications; a few machines have been built, but none of them are available commercially.

4. Multiple-instruction, multiple-data (MIMD) systems:
An MIMD system is a multiprocessor machine capable of executing multiple instructions on multiple data sets. Each PE in the MIMD model has separate instruction and data streams; therefore machines built using this model can handle any kind of application. Unlike SIMD and MISD machines, the PEs in MIMD machines work asynchronously.

MIMD machines are broadly categorized into shared-memory MIMD and distributed-memory MIMD based on the way the PEs are coupled to the main memory.

In the shared-memory MIMD model (tightly coupled multiprocessor systems), all the PEs are connected to a single global memory and all have access to it. Communication between PEs in this model takes place through the shared memory; a modification of the data stored in the global memory by one PE is visible to all other PEs. Representative shared-memory MIMD systems are Silicon Graphics machines and Sun/IBM SMP (Symmetric Multi-Processing) systems.

In distributed-memory MIMD machines (loosely coupled multiprocessor systems), all PEs have a local memory. Communication between PEs in this model takes place through the interconnection network (the inter-process communication channel, or IPC). The network connecting the PEs can be configured as a tree, a mesh, or whatever topology the requirement dictates.

The shared-memory MIMD architecture is easier to program but is less tolerant of failures and harder to extend than the distributed-memory MIMD model. Failures in a shared-memory MIMD system affect the entire system, whereas this is not the case in the distributed model, where each PE can easily be isolated. Moreover, shared-memory MIMD architectures are less likely to scale, because the addition of more PEs leads to memory contention, a situation that does not arise with distributed memory, where each PE has its own memory. As a result of practical outcomes and users' requirements, the distributed-memory MIMD architecture is generally considered superior.

Pipelining

The term pipelining refers to a technique of decomposing a sequential process into sub-operations, where each sub-operation is executed in a dedicated segment that operates concurrently with all other segments.

 The most important characteristic of a pipeline technique is that several computations can be in progress in
distinct segments at the same time.

 The overlapping of computation is made possible by associating a register with each segment in the pipeline.
The registers provide isolation between each segment so that each can operate on distinct data simultaneously.

The structure of a pipeline organization can be represented simply by including an input register for each segment
followed by a combinational circuit.

Example: consider a combined multiplication and addition operation to get a better understanding of pipeline organization.

The combined multiplication and addition operation is performed on a stream of numbers:

Ai * Bi + Ci    for i = 1, 2, 3, ..., 7

The operation to be performed on the numbers is decomposed into sub-operations with each sub-operation to be
implemented in a segment within a pipeline.

The sub-operations performed in each segment of the pipeline are defined as:

R1 ← Ai, R2 ← Bi Input Ai, and Bi


R3 ← R1 * R2, R4 ← Ci Multiply, and input Ci
R5 ← R3 + R4 Add Ci to product

The following block diagram represents the combined as well as the sub-operations performed in each segment of the
pipeline.
Registers R1 through R5 hold the data, and the combinational circuits operate within a particular segment.

The output generated by the combinational circuit in a given segment is applied to the input register of the next segment. For instance, in the block diagram, register R3 is used as one of the input registers for the combinational adder circuit.
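As a concrete illustration, here is a small C simulation (illustrative only; the array values are made up) of this three-segment pipeline. Plain variables R1 through R5 model the segment registers, and each loop iteration models one clock cycle, so a new Ai * Bi + Ci result emerges every cycle once the pipeline is full.

#include <stdio.h>

#define N 7

int main(void) {
    int A[N] = {1, 2, 3, 4, 5, 6, 7};
    int B[N] = {7, 6, 5, 4, 3, 2, 1};
    int C[N] = {1, 1, 1, 1, 1, 1, 1};
    int R1 = 0, R2 = 0, R3 = 0, R4 = 0, R5 = 0;

    /* N items need N + 2 cycles to drain a three-segment pipeline.
     * Segments are updated back-to-front so every register latches
     * the PREVIOUS cycle's value, as real pipeline latches do. */
    for (int clock = 0; clock < N + 2; clock++) {
        if (clock >= 2)                       /* segment 3: add          */
            R5 = R3 + R4;
        if (clock >= 1 && clock <= N) {       /* segment 2: multiply, Ci */
            R3 = R1 * R2;
            R4 = C[clock - 1];
        }
        if (clock < N) {                      /* segment 1: latch Ai, Bi */
            R1 = A[clock];
            R2 = B[clock];
        }
        if (clock >= 2)
            printf("clock %d: A[%d]*B[%d]+C[%d] = %d\n",
                   clock, clock - 2, clock - 2, clock - 2, R5);
    }
    return 0;
}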

In general, the pipeline organization is applicable for two areas of computer design which
includes:

1. Arithmetic Pipeline
2. Instruction Pipeline

Arithmetic Pipeline
Arithmetic pipelines are mostly used in high-speed computers. They are used to implement:

- floating-point operations,
- multiplication of fixed-point numbers, and
- computations encountered in scientific problems.

To understand the concept of the arithmetic pipeline more conveniently, consider an example of a pipeline unit for floating-point addition and subtraction.
The inputs to the floating-point adder pipeline are two normalized floating-point binary
numbers defined as:

X = A * 10^a = 0.9504 * 10^3
Y = B * 10^b = 0.8200 * 10^2

where A and B are two fractions that represent the mantissas, and a and b are the exponents.

The combined operation of floating-point addition and subtraction is divided into four
segments. Each segment contains the corresponding suboperation to be performed in the
given pipeline. The suboperations that are shown in the four segments are:

1. Compare the exponents by subtraction.


2. Align the mantissas.
3. Add or subtract the mantissas.
4. Normalize the result.

The following block diagram represents the suboperations performed in each segment of the
pipeline.
Note: Registers are placed after each suboperation to store the intermediate results.

1. Compare exponents by subtraction:


The exponents are compared by subtracting them to determine their difference. The larger
exponent is chosen as the exponent of the result.
The difference of the exponents, i.e., 3 - 2 = 1 determines how many times the mantissa
associated with the smaller exponent must be shifted to the right.

2. Align the mantissas:


The mantissa associated with the smaller exponent is shifted according to the difference of
exponents determined in segment one.

X = 0.9504 * 10^3
Y = 0.08200 * 10^3

3. Add mantissas:
The two mantissas are added in segment three.

Z = X + Y = 1.0324 * 10^3

4. Normalize the result:

After normalization, the result is written as:

Z = 0.10324 * 10^4

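The four suboperations can be checked numerically. Below is an illustrative C walk-through (decimal arithmetic on the example values above, not IEEE 754 hardware) of compare, align, add, and normalize:

#include <stdio.h>
#include <math.h>

int main(void) {
    double mx = 0.9504; int ex = 3;   /* X = 0.9504 * 10^3 */
    double my = 0.8200; int ey = 2;   /* Y = 0.8200 * 10^2 */

    /* 1. Compare exponents by subtraction; keep the larger exponent. */
    int diff = ex - ey;               /* 3 - 2 = 1 */
    int ez = (diff >= 0) ? ex : ey;

    /* 2. Align the mantissa of the smaller exponent (shift right). */
    if (diff >= 0) my /= pow(10.0, diff);    /* Y becomes 0.08200 * 10^3 */
    else           mx /= pow(10.0, -diff);

    /* 3. Add the mantissas. */
    double mz = mx + my;              /* 1.0324 */

    /* 4. Normalize: mantissa must be < 1, so shift right, bump exponent. */
    if (fabs(mz) >= 1.0) { mz /= 10.0; ez += 1; }   /* 0.10324 * 10^4 */

    printf("Z = %.5f * 10^%d\n", mz, ez);
    return 0;
}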

Instruction Pipeline
Pipeline processing can occur not only in the data stream but in the instruction stream as
well.

Most of the digital computers with complex instructions require instruction pipeline to carry
out operations like fetch, decode and execute instructions.

In general, the computer needs to process each instruction with the following sequence of
steps.

1. Fetch instruction from memory.


2. Decode the instruction.
3. Calculate the effective address.
4. Fetch the operands from memory.
5. Execute the instruction.
6. Store the result in the proper place.

Each step is executed in a particular segment, and different segments may take different times to operate on the incoming information. Moreover, two or more segments may sometimes require memory access at the same time, causing one segment to wait until another is finished with the memory.

The organization of an instruction pipeline will be more efficient if the instruction cycle is
divided into segments of equal duration.

One of the most common examples of this type of organization is a four-segment instruction pipeline.

A four-segment instruction pipeline combines two or more of the steps above into a single segment. For instance, the decoding of the instruction can be combined with the calculation of the effective address in one segment.

The following block diagram shows a typical example of a four-segment instruction pipeline.
The instruction cycle is completed in four segments.
Diagram: a four-segment instruction pipeline.

Segment 1:
The instruction fetch segment can be implemented using a first-in, first-out (FIFO) buffer.

Segment 2:

The instruction fetched from memory is decoded in the second segment, and eventually, the
effective address is calculated in a separate arithmetic circuit.

Segment 3:

An operand from memory is fetched in the third segment.

Segment 4:

The instructions are finally executed in the last segment of the pipeline organization.

RISC Pipeline

RISC stands for Reduced Instruction Set Computer. It was introduced to execute instructions as fast as one per clock cycle, and the RISC pipeline helps to simplify the computer architecture's design.
The motivation relates to what is known as the semantic gap: the difference between the operations provided in high-level languages (HLLs) and those provided in computer architectures. The conventional response of computer architects to this gap was to add layers of complexity to newer architectures, increasing the number and complexity of instructions together with the number of addressing modes. The architectures which resulted from this "add more complexity" approach are known as Complex Instruction Set Computers (CISC).
The RISC goal of one instruction per clock cycle is not automatically achievable, because not every instruction can be fetched from memory and executed in one clock cycle under all circumstances.
The method used to approach one instruction per clock cycle is to initiate a new instruction on every clock cycle and to pipeline the processor, so that the objective of single-cycle instruction execution is effectively met.
The RISC compiler translates the high-level language program into a machine language program. Issues such as data conflicts and branch penalties are taken care of by RISC processors in cooperation with the compiler, which must identify and reduce the delays these issues cause.
Principles of the RISC Pipeline
The main principles of the RISC pipeline are as follows:

 Keep the most frequently accessed operands in CPU registers.
 Minimize register-to-memory operations.
 Use a large number of registers to enhance operand referencing and decrease processor-memory traffic.
 Optimize the design of instruction pipelines so that minimal compiler code generation is needed.
 Use a simplified instruction set and leave out complex and unnecessary instructions.

Consider a three-segment instruction pipeline to see how a compiler can optimize the machine language program to compensate for pipeline conflicts.
The most frequent instructions for a RISC processor fall into three types:

 Data Manipulation Instructions − Manage the data in processor registers.


 Data Transfer Instructions − These are load and store instructions that use an effective address that is
obtained by adding the contents of two registers or a register and a displacement constant provided in the
instruction.
 Program Control Instructions − These instructions use register values and a constant to evaluate the branch
address, which is transferred to a register or the program counter (PC).

Pipelining and Vector Processing

Parallel processing: execution of concurrent events in the computing process to achieve faster computational speed.

Levels of parallel processing:
- Job or program level
- Task or procedure level
- Inter-instruction level
- Intra-instruction level

Parallel computers, architectural classification:
Flynn's classification is based on the multiplicity of instruction streams and data streams.
- Instruction stream: the sequence of instructions read from memory.
- Data stream: the operations performed on the data in the processor.

What is Pipelining?
Pipelining is the process of feeding instructions to the processor through a pipeline. It allows storing and executing instructions in an orderly process, and is also known as pipeline processing.
Pipelining is a technique in which multiple instructions are overlapped during execution. The pipeline is divided into stages, and these stages are connected with one another to form a pipe-like structure. Instructions enter from one end and exit from the other end.
Pipelining increases the overall instruction throughput.
In a pipelined system, each segment consists of an input register followed by a combinational circuit. The register is used to hold data, and the combinational circuit performs operations on it. The output of the combinational circuit is applied to the input register of the next segment.
A pipelined system is like a modern-day assembly line in a factory. For example, in a car manufacturing plant, huge assembly lines are set up with robotic arms performing a certain task at each station, after which the car moves on to the next arm.
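The benefit of overlapping can be quantified with the standard textbook timing formulas: with k segments of equal delay tp, n tasks take n * k * tp without a pipeline but only (k + n - 1) * tp with one, so the speedup approaches k for large n. The short C check below uses example values (k = 4 segments, tp = 20 ns) that are assumptions, not figures from this text:

#include <stdio.h>

int main(void) {
    int k = 4;                       /* pipeline segments (example)   */
    double tp = 20.0;                /* clock period in ns (example)  */
    for (int n = 1; n <= 100; n *= 10) {
        double serial    = (double)n * k * tp;        /* n*k*tp       */
        double pipelined = (double)(k + n - 1) * tp;  /* (k+n-1)*tp   */
        printf("n=%3d: serial=%7.0f ns, pipelined=%6.0f ns, speedup=%.2f\n",
               n, serial, pipelined, serial / pipelined);
    }
    return 0;
}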

Types of Pipeline
It is divided into 2 categories:

1. Arithmetic Pipeline
2. Instruction Pipeline

Arithmetic Pipeline
Arithmetic pipelines are usually found in most of the computers. They are used for floating
point operations, multiplication of fixed point numbers etc. For example: The input to the
Floating Point Adder pipeline is:
X = A*2^a
Y = B*2^b
Here A and B are mantissas (significant digit of floating point numbers), while a and b
are exponents.
The floating point addition and subtraction is done in 4 parts:

1. Compare the exponents.


2. Align the mantissas.
3. Add or subtract mantissas
4. Produce the result.

Registers are used for storing the intermediate results between the above operations.
Instruction Pipeline
In this, a stream of instructions can be executed by overlapping the fetch, decode and execute phases
of an instruction cycle. This type of technique is used to increase the throughput of the computer
system.
An instruction pipeline reads instruction from the memory while previous instructions are being
executed in other segments of the pipeline. Thus we can execute multiple instructions
simultaneously. The pipeline will be more efficient if the instruction cycle is divided into
segments of equal duration.
Advantages of Pipelining
1. The cycle time of the processor is reduced.
2. It increases the throughput of the system.
3. It makes the system reliable.

Disadvantages of Pipelining
1. The design of a pipelined processor is complex and costly to manufacture.
2. The latency of an individual instruction is higher.

Vector Processing
There is a class of computational problems that is beyond the capabilities of a conventional computer. These problems require a vast number of computations on multiple data items and would take a conventional computer (with a scalar processor) days or even weeks to complete.
Such complex operations, which work on multiple data items at the same time, require a better way of executing instructions, which is what vector processors provide.
Scalar CPUs can manipulate only one or two data items at a time, which is not very efficient. Also, simple instructions like "ADD A to B and store into C" are not efficient in practice: addresses are used to point to the memory location where the data will be found, which adds the overhead of data lookup. Until the data is found, the CPU sits idle, which is a big performance issue.
Hence the concept of the instruction pipeline, in which an instruction passes through several sub-units in turn. These sub-units perform various independent functions: for example, the first decodes the instruction, the second fetches the data, and the third performs the arithmetic itself. Therefore, while the data is being fetched for one instruction, the CPU does not sit idle; it works on decoding the next instruction, ending up working like an assembly line.
A vector processor not only uses an instruction pipeline but also pipelines the data, working on multiple data items at the same time.
A normal scalar processor instruction would be ADD A, B, which adds two operands. But what if we could instruct the processor to add a whole group of numbers (from memory locations 0 to n) to another group of numbers (say, memory locations n to k)? This is what vector processors achieve: a single instruction can ask for multiple data operations, which saves time, since the instruction is decoded once and then keeps operating on different data items.
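The sketch below contrasts the two views in C (the function names and array sizes are illustrative, not any real instruction set): the scalar loop re-fetches and re-decodes an ADD for every element, whereas a vector processor would issue conceptually one instruction for the whole array.

#include <stdio.h>

#define N 8

/* Scalar view: one ADD per element, each fetched and decoded separately. */
void scalar_add(const int *a, const int *b, int *c, int n) {
    for (int i = 0; i < n; i++)
        c[i] = a[i] + b[i];
}

/* Vector view: conceptually a single instruction, decoded once,
 * e.g. "VADD C, A, B, length = n", streaming all element pairs
 * through the pipelined adder. */
void vector_add(const int *a, const int *b, int *c, int n) {
    for (int i = 0; i < n; i++)     /* hardware would overlap these */
        c[i] = a[i] + b[i];
}

int main(void) {
    int a[N] = {1, 2, 3, 4, 5, 6, 7, 8};
    int b[N] = {8, 7, 6, 5, 4, 3, 2, 1};
    int c[N];
    scalar_add(a, b, c, N);         /* n fetch-decode-execute rounds */
    vector_add(a, b, c, N);         /* one vector instruction        */
    for (int i = 0; i < N; i++)
        printf("%d ", c[i]);
    printf("\n");
    return 0;
}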
Applications of Vector Processors
Computers with vector processing capabilities are in demand in specialized applications. The following are some areas where vector processing is used:

1. Petroleum exploration.
2. Medical diagnosis.
3. Data analysis.
4. Weather forecasting.
5. Aerodynamics and space flight simulations.
6. Image processing.
7. Artificial intelligence.

Vector and Array processor


Vector Processor
A vector processor is basically a central processing unit that has the ability to execute a complete vector input in a single instruction.

A vector is an ordered, one-dimensional array of data items; a vector V of length n is written V = [v1, v2, v3, ..., vn].

A vector processor is a central processing unit (CPU) that implements an instruction set whose instructions are designed to operate efficiently and effectively on large one-dimensional arrays of data called vectors.

Vector processing performs arithmetic operations on large arrays of integers or floating-point numbers, operating on all the elements of the array in parallel, provided each operation is independent of the others.

Vector processing avoids the overhead of the loop-control mechanism that occurs in general-purpose computers.

Vector processors stand in contrast to scalar processors, whose instructions operate on single data items only, and also to those scalar processors that have additional single instruction, multiple data (SIMD) extensions.

Characteristics of Vector Processing

Each element of a vector operand is a scalar quantity, which can be an integer, a floating-point number, a logical value, or a character. Vector instructions can be classified into four types according to whether their operands and results are vectors (V) or scalars (S); O1 and O2 denote unary operations, and O3 and O4 denote binary operations.

In a register-to-register vector processor, the source operands for an instruction, the intermediate results, and the final result are all held in vector or scalar registers. The Cray-1 and Fujitsu VP-200 use the register-to-register format for vector instructions.

Vector Instruction

A vector instruction has the following fields:

1. Operation Code

Operation code indicates the operation that has to be performed in the given
instruction.

2. Base Address

Base address field refers to the memory location from where the operands are to be
fetched or to where the result has to be stored. The base address is found in the
memory reference instructions. In the vector instruction, the operand and the result
both are stored in the vector registers. Here, the base address refers to the
designated vector register.

3. Address Increment

A vector operand has several data elements, and the address increment specifies the address difference between consecutive elements of the operand. Some computers store the data elements consecutively in main memory, for which the increment is always 1; computers that do not store the data elements consecutively require a variable address increment.

4. Address Offset

The address offset is always specified relative to the base address. The effective memory address is calculated using the address offset.

5. Vector Length

The vector length specifies the number of elements in a vector operand. It identifies the termination of a vector instruction.
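Putting these fields together, here is a toy C model (an assumed layout for illustration, not any real machine's encoding) of a vector instruction and the loop a vector unit conceptually performs. Note how the length field terminates the instruction and the address increment (stride) steps between elements; an address offset would simply be added to each base.

#include <stdio.h>

typedef enum { V_ADD, V_SUB } Opcode;   /* 1. operation code */

typedef struct {
    Opcode op;        /* 1. operation code                     */
    int    base_a;    /* 2. base address of first operand      */
    int    base_b;    /*    base address of second operand     */
    int    base_c;    /*    base address of the result         */
    int    stride;    /* 3. address increment between elements */
    int    length;    /* 5. vector length                      */
} VectorInstr;        /* (4. address offset omitted: add it to each base) */

int mem[32] = {0};    /* a tiny flat "memory" */

void execute(VectorInstr vi) {
    for (int i = 0; i < vi.length; i++) {   /* length terminates the loop */
        int a = mem[vi.base_a + i * vi.stride];
        int b = mem[vi.base_b + i * vi.stride];
        mem[vi.base_c + i * vi.stride] =
            (vi.op == V_ADD) ? a + b : a - b;
    }
}

int main(void) {
    for (int i = 0; i < 8; i++) { mem[i] = i; mem[8 + i] = 10 * i; }
    VectorInstr vi = { V_ADD, 0, 8, 16, 1, 8 };   /* C[0..7] = A + B */
    execute(vi);
    for (int i = 0; i < 8; i++)
        printf("%d ", mem[16 + i]);
    printf("\n");
    return 0;
}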

Most vector instructions are pipelined, since a vector instruction performs the same operation on different data sets repeatedly. Pipelined vector processors can be classified into two types based on where the operands are fetched from for vector processing.

Register-to-Register Architecture

In this architecture, operands and previous results are fetched indirectly from main memory through the use of registers.

Memory-to-Memory Architecture

In memory-to-memory architecture, the operands and the results are fetched directly from memory instead of using registers.

Advantages of Vector Processor

 A vector processor uses vector instructions, which improves code density.
 The sequential arrangement of data helps the hardware to handle the data in a better way.
 It offers a reduction in instruction bandwidth.
Array Processor

Array processors are also known as multiprocessors or vector processors. They perform computations on large arrays of data and are thus used to improve the performance of the computer.

Why use an Array Processor?

 Array processors increase the overall instruction processing speed.
 Most array processors operate asynchronously from the host CPU, which improves the overall capacity of the system.
 Array processors have their own local memory, providing extra memory to systems with little memory.

Types of Array Processors

There are basically two types of array processors:

1. Attached Array Processors


2. SIMD Array Processors

Attached Array Processors


An attached array processor is a processor which is attached to a general purpose
computer and its purpose is to enhance and improve the performance of that computer
in numerical computational tasks. It achieves high performance by means of parallel
processing with multiple functional units.

SIMD Array Processors

SIMD is the organization of a single computer containing multiple processors


operating in parallel. The processing units are made to operate under the control of a
common control unit, thus providing a single instruction stream and multiple data
streams.

A general block diagram of an array processor is shown below. It contains a set of identical processing elements (PEs), each having a local memory M. Each processing element includes an ALU and registers. The master control unit controls all the operations of the processing elements. It also decodes the instructions and determines how each instruction is to be executed.

The main memory is used for storing the program. The control unit is responsible for fetching the instructions. Vector instructions are sent to all PEs simultaneously, and the results are returned to memory.
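The following C sketch models this organization in miniature (the PE count, local-memory size, and broadcast operation are all made-up illustrations): a master loop broadcasts one instruction, and every PE applies it to data in its own local memory, giving a single instruction stream over multiple data streams.

#include <stdio.h>

#define NUM_PE 4
#define LOCAL  4

typedef struct {
    int M[LOCAL];      /* local memory of this PE */
    int acc;           /* one ALU register        */
} PE;

/* Broadcast "acc = acc + M[addr]" to all PEs: one instruction,
 * applied to a different data stream in each PE. */
void broadcast_add(PE pe[], int addr) {
    for (int p = 0; p < NUM_PE; p++)      /* conceptually simultaneous */
        pe[p].acc += pe[p].M[addr];
}

int main(void) {
    PE pe[NUM_PE];
    for (int p = 0; p < NUM_PE; p++) {    /* different data per PE */
        pe[p].acc = 0;
        for (int j = 0; j < LOCAL; j++)
            pe[p].M[j] = p + j;
    }
    for (int j = 0; j < LOCAL; j++)       /* sum each PE's local vector */
        broadcast_add(pe, j);
    for (int p = 0; p < NUM_PE; p++)
        printf("PE%d acc = %d\n", p, pe[p].acc);
    return 0;
}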

The best-known SIMD array processor is the ILLIAC IV computer developed by the Burroughs Corporation. SIMD processors are highly specialized computers: they are suitable only for numerical problems that can be expressed in vector or matrix form, and they are not suitable for other types of computation.
Modes of I/O Data Transfer
Data transfer between the central unit and I/O devices can generally be handled in three modes, which are given below:

1. Programmed I/O
2. Interrupt Initiated I/O
3. Direct Memory Access

Programmed I/O

Programmed I/O data transfers are the result of I/O instructions written in the computer program; each data-item transfer is initiated by an instruction in the program.

Usually the program controls data transfer between the CPU and the peripheral. Transferring data under programmed I/O requires constant monitoring of the peripherals by the CPU.
Interrupt Initiated I/O

In the programmed I/O method, the CPU stays in a program loop until the I/O unit indicates that it is ready for data transfer. This is a time-consuming process because it keeps the processor busy needlessly.

This problem can be overcome by using interrupt-initiated I/O. Here, when the interface determines that the peripheral is ready for data transfer, it generates an interrupt. After receiving the interrupt signal, the CPU stops the task it is processing, services the I/O transfer, and then returns to its previous processing task.

Direct Memory Access

Removing the CPU from the path and letting the peripheral device manage the memory buses directly improves the speed of transfer. This technique is known as DMA.

Here, the interface transfers data to and from memory over the memory bus. A DMA controller manages the data transfer between peripherals and the memory unit.

Many hardware systems use DMA, such as disk drive controllers, graphics cards, network cards, and sound cards. It is also used for intra-chip data transfer in multicore processors. With DMA, the CPU initiates the transfer, performs other operations while the transfer is in progress, and receives an interrupt from the DMA controller when the transfer has been completed.

Characteristics of Multiprocessors

A multiprocessor has multiple CPUs or processors in the system. Multiple instructions are executed simultaneously by these systems; as a result, throughput is increased. If one CPU fails, the other processors will continue to work normally, so multiprocessors are more reliable.

Shared memory or distributed memory can be used in multiprocessor systems. Each processor in a shared-memory multiprocessor shares the main memory and peripherals in order to execute instructions concurrently. In these systems, all CPUs access the main memory over the same bus, so as bus traffic increases, most CPUs will sit idle waiting for it. This type of multiprocessor is also known as a symmetric multiprocessor; it provides a single memory space for all processors.

Each CPU in a distributed memory multiprocessor has its own private memory. Each
processor can use local data to accomplish the computational tasks. The processor
may use the bus to communicate with other processors or access the main memory if
remote data is required.

Advantages and disadvantages of Multiprocessor System


There are various advantages and disadvantages of the multiprocessor system. Some
advantages and disadvantages of the multiprocessor system are as follows:

Advantages

There are various advantages of the multiprocessor system. Some advantages of the multiprocessor system are as follows:

1. It is a very reliable system, because multiple processors may share the work between them and complete it in collaboration.
2. Parallel processing is achieved via multiprocessing.
3. When multiple processors work at the same time, the throughput increases.
4. Multiple processes can be executed simultaneously.

Disadvantages

There are various disadvantages of the multiprocessor system. Some disadvantages of the multiprocessor system are as follows:

1. It requires a complex configuration.
2. Multiprocessors work with different systems, so the processors require more memory space.
3. If one of the processors fails, its work must be shared among the remaining processors, slowing them down.
4. These types of systems are very expensive.
5. If a processor is already using an I/O device, other processors cannot use the same I/O device, which can create deadlock.
6. The operating system implementation is complicated, because multiple processors must communicate with each other.

What is a Multicore System?

A single computing component with multiple cores (independent processing units) is known as a multicore processor. It denotes the presence of a single CPU with several cores in the system. Individually, these cores may read and run computer instructions. They work in such a way that the computer system appears to have several processors, although they are cores, not separate processors. These cores may execute normal processor instructions, including add, move data, and branch.

A single processor in a multicore system may run many instructions simultaneously, increasing the overall speed of program execution. It decreases the amount of heat generated by the CPU while enhancing the speed with which instructions are executed. Multicore processors are used in various applications, including general-purpose, embedded, network, and graphics (GPU) processing.
The software techniques used to exploit the cores in a multicore system determine the system's performance. Extra focus has been put on developing software that can execute in parallel, because the goal is to achieve parallel execution with the help of the many cores.

Advantages and disadvantages of Multicore System


There are various advantages and disadvantages of the multicore system. Some
advantages and disadvantages of the multicore system are as follows:

Advantages

There are various advantages of the multicore system. Some advantages of the
multicore system are as follows:

1. Multicore processors can execute more data than single-core processors.
2. With multicore processors, the PCB requires less space.
3. It will have less traffic.
4. The cores are often integrated onto a single integrated-circuit die, or onto multiple dies packaged as a single chip, which increases cache coherency.
5. These systems are energy efficient because they provide increased performance while using less energy.

Differences between Multiprocessor and Multicore Systems

The main differences between multiprocessor and multicore systems are as follows:

Definition:
Multiprocessor: a system with multiple CPUs that allows processing of programs simultaneously.
Multicore: a single processor that contains multiple independent processing units, known as cores, which may read and execute program instructions.

Execution:
Multiprocessor: runs multiple programs faster than a multicore system.
Multicore: executes a single program faster.

Reliability:
Multiprocessor: more reliable; if one processor fails, the other processors are not affected.
Multicore: less reliable than a multiprocessor system.

Traffic:
Multiprocessor: higher traffic than a multicore system.
Multicore: less traffic than multiprocessors.

Cost:
Multiprocessor: more expensive than a multicore system.
Multicore: cheaper than a multiprocessor system.

Configuration:
Multiprocessor: requires a complex configuration.
Multicore: does not need a complex configuration.

Interconnection Structures

The processors must be able to share a set of main memory modules and I/O devices in a multiprocessor system. This sharing capability can be provided through interconnection structures. The interconnection structures that are commonly used are as follows:
1. Time-shared / common bus
2. Crossbar switch
3. Multiport memory
4. Multistage switching network (covered in the second part)
5. Hypercube system

Time-shared / Common Bus (Interconnection structure in Multiprocessor System) :


In a multiprocessor system, the time shared bus interconnection provides a common
communication path connecting all the functional units like processor, I/O processor,
memory unit etc. The figure below shows the multiple processors with common
communication path (single bus).
Single-Bus Multiprocessor Organization

To communicate with any functional unit, a processor needs the bus to transfer the data. To do so, the processor first needs to check whether the bus is available by examining its status: if the bus is being used by some other functional unit, the status is busy; otherwise it is free.

A processor can use the bus only when it is free. The sender processor puts the address of the destination on the bus, and the destination unit identifies it. To communicate with any functional unit, a command is issued to tell that unit what work is to be done. The other processors at that time will either be busy with internal operations or will sit idle, waiting to get the bus.
We can use a bus controller to resolve conflicts, if any. (Bus controller can set priority
of different functional units)
This Single-Bus Multiprocessor Organization is easiest to reconfigure & is simple.
This interconnection structure contains only passive elements.

The bus interfaces of the sender and receiver units control the transfer operation here. To decide access to the common bus without conflicts, methods such as static and fixed priorities, First-In First-Out (FIFO) queues, and daisy chains can be used.

Advantages:
 Inexpensive, as no extra hardware such as a switch is required.
 Simple and easy to configure, as the functional units are directly connected to the bus.

Disadvantages:
 The major drawback of this configuration is that if a malfunction occurs in any of the bus interface circuits, the complete system will fail.
 Decreased throughput: at any one time, only one processor can communicate with any other functional unit.
 Increased arbitration logic: as the number of processors and memory units grows, the bus contention problem worsens.

To solve the above disadvantages, we can use two uni-directional buses as :


Multiprocessor System with unidirectional buses

Both buses are required in a single transfer operation. Here the system complexity is increased and the reliability is decreased. The solution is to use multiple bi-directional buses.

Multiple bi-directional buses:

A system with multiple bi-directional buses permits as many simultaneous transfers as there are buses available, but the complexity of the system is again increased.

Multiple Bi-Directional Multiprocessor System

Apart from the organization, there are many factors affecting the performance of bus.
They are –
 Number of active devices on the bus.
 Data width
 Error Detection method
 Synchronization of data transfer etc.

Advantages of Multiple bi-directional buses –


 Lowest cost for hardware as no extra device is needed such as switch.
 Modifying the hardware system configuration is easy.
 Less complex when compared to other interconnection schemes as there are only
2 buses & all the components are connected via that buses.

Disadvantages of Multiple bi-directional buses –


 System expansion degrades performance: as the number of functional units increases, more communication is required, but only one transfer can happen on each bus at a time.
 The overall system capacity limits the transfer rate, and if a bus fails, the whole system fails.
 Suitable for small systems only.

Crossbar Switch:
If the number of buses in a common bus system is increased, a point is reached at which there is a separate path available for each memory module. The crossbar switch (for multiprocessors) provides a separate path for each module.

A crossbar switch system consists of a number of crosspoints placed at the intersections between the memory-module paths and the processor-bus paths. At each crosspoint, a small square represents a switch that establishes the path from a processor to a memory module. Each switch point has control logic to set up the transfer path between a memory module and a processor. It examines the address placed on the bus to determine whether its particular module is being addressed, and it also resolves multiple requests for access to the same memory module on a predetermined priority basis.
Functional design of a crossbar switch connected to one memory module is shown
in figure. The circuit contains multiplexers which choose the data, address, and
control from one CPU for communication with the memory module. Arbitration
logic established priority levels to select one CPU when two or more CPUs attempt
to access the same memory. The multiplexers can be handled by the binary code
which is produced by a priority encoder within the arbitration logic.

A crossbar switch system permits simultaneous transfers from all memory modules
because there is a separate path associated with each module. Thus, the hardware
needed to implement the switch may become quite large and complex.
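As a flavour of the arbitration logic, here is a C sketch (fixed priorities and four CPUs are assumptions chosen for illustration) of a priority encoder: it picks the highest-priority requesting CPU, and its output would drive the multiplexers that connect that CPU's bus to the memory module.

#include <stdio.h>

#define NUM_CPU 4

/* Returns the index of the highest-priority CPU whose request bit
 * is set (CPU 0 highest), or -1 if no CPU is requesting the module. */
int priority_encode(unsigned request_bits) {
    for (int cpu = 0; cpu < NUM_CPU; cpu++)
        if (request_bits & (1u << cpu))
            return cpu;          /* becomes the mux select code */
    return -1;
}

int main(void) {
    unsigned requests = 0x6;     /* CPU 1 and CPU 2 both request */
    int granted = priority_encode(requests);
    printf("module granted to CPU %d\n", granted);   /* CPU 1 wins */
    return 0;
}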

Multiport Memory:

In a multiport memory system, the control, switching, and priority arbitration logic are distributed at the interfaces to the memory modules, rather than concentrated in a switch matrix.
A multiport memory system employs separate buses between each memory module and each CPU. A processor bus comprises the address, data, and control lines necessary to communicate with memory, and each memory module connects to each processor bus. At any given time, the memory module must have internal control logic to determine which port will have access to memory.
A memory module can be said to have four ports, each port accommodating one of the buses. Memory-access conflicts are resolved by assigning fixed priorities to each memory port: the priority for memory access associated with each processor is established by the physical port position that its bus occupies in each module. Thus CPU 1 has priority over CPU 2, CPU 2 has priority over CPU 3, and CPU 4 has the lowest priority.
Advantage:
 A high transfer rate can be achieved because of the multiple paths.
Disadvantages:
 It requires expensive memory control logic and a large number of cables and connectors.
 It is only suitable for systems with a small number of processors.

Hypercube Interconnection:

This is a binary n-cube architecture. Here we can connect 2^n processors, and each processor forms a node of the cube. A node can also be a memory module or an I/O interface, not necessarily a processor. Each node has a direct communication path to n other nodes (there are 2^n nodes in total), and there are 2^n distinct n-bit binary addresses, with two nodes directly connected exactly when their addresses differ in a single bit.
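This addressing rule is easy to check in code. The helper below (an illustrative sketch, with n = 3 chosen arbitrarily) enumerates each node's n neighbours by flipping one bit of its binary address at a time:

#include <stdio.h>

void print_neighbours(unsigned node, int n) {
    printf("node %u:", node);
    for (int bit = 0; bit < n; bit++)
        printf(" %u", node ^ (1u << bit));   /* flip one address bit */
    printf("\n");
}

int main(void) {
    int n = 3;                               /* 2^3 = 8 nodes */
    for (unsigned node = 0; node < (1u << n); node++)
        print_neighbours(node, n);
    return 0;
}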

Note: the interconnection structure can decide the overall system's performance in a multiprocessor environment. Although a common bus system is easy and simple, the availability of only one path is its major drawback, and if the bus fails the whole system fails. To overcome this and improve overall performance, the crossbar, multiport, hypercube, and then multistage switching networks evolved.

Cache Coherence in Shared-Memory Multiprocessors

In a multiprocessor system with a separate cache memory for each processor, it is possible to have many copies of shared data: one copy in the main memory and one in the local cache of each processor that requested it. When one of the copies of the data is changed, the other copies must reflect that change. Cache coherence is the discipline which ensures that changes in the values of shared operands (data) are propagated throughout the system in a timely fashion.[1]
The following are the requirements for cache coherence:[2]

Write propagation: changes to the data in any cache must be propagated to the other copies (of that cache line) in the peer caches.

Transaction serialization: reads and writes to a single memory location must be seen by all processors in the same order.

Theoretically, coherence can be enforced at load/store granularity. However, in practice it is generally enforced at the granularity of cache blocks.[3]
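The toy C model below (a deliberate simplification, not a full MSI/MESI protocol; the two-cache setup and write-through policy are assumptions for illustration) shows write propagation by invalidation: when one processor writes, the peer copies are invalidated, so every later read observes the new value.

#include <stdio.h>

#define NUM_CACHE 2

typedef struct { int valid; int value; } CacheLine;

int main_mem = 100;                        /* the shared memory location */
CacheLine cache[NUM_CACHE] = {{0, 0}, {0, 0}};

int read_line(int c) {
    if (!cache[c].valid) {                 /* miss: fetch from main memory */
        cache[c].value = main_mem;
        cache[c].valid = 1;
    }
    return cache[c].value;
}

void write_line(int c, int v) {
    for (int i = 0; i < NUM_CACHE; i++)    /* write propagation:           */
        if (i != c) cache[i].valid = 0;    /* invalidate peer copies       */
    cache[c].value = v;
    cache[c].valid = 1;
    main_mem = v;                          /* write-through for simplicity */
}

int main(void) {
    printf("P0 reads %d\n", read_line(0)); /* both caches load 100 */
    printf("P1 reads %d\n", read_line(1));
    write_line(0, 200);                    /* P0 writes; P1's copy dies */
    printf("P1 reads %d\n", read_line(1)); /* P1 re-fetches, sees 200 */
    return 0;
}

Real protocols such as MESI refine this idea with per-line states and snooping or directory hardware, but the two requirements above are exactly what they enforce.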
