
Parallel Processing and Pipelining
Parallel Processing
• Parallel processing is a term used to denote a large class of
techniques that are used to provide simultaneous data-
processing tasks for the purpose of increasing the
computational speed of a computer system.
• Instead of processing each instruction sequentially as in a
conventional computer, a parallel processing system is able to
perform concurrent data processing to achieve faster
execution time.
• For example, while an instruction is being executed in the ALU, the next instruction
can be read from memory.
• The purpose of parallel processing is to speed up the
computer processing capability and increase its throughput,
that is, the amount of processing that can be accomplished
during a given interval of time.
• Parallel processing can be viewed from various levels of
complexity.
• At the lowest level, we distinguish between parallel and serial operations by the
type of registers used. Shift registers operate in serial fashion one bit at a time,
while registers with parallel load operate with all the bits of the word
simultaneously.
• Parallel processing at a higher level of complexity can be
achieved by having a multiplicity of functional units that
perform identical or different operations simultaneously.
• Parallel processing is established by distributing the data among the multiple
functional units.
Processor with multiple functional units
[Figure: a common set of processor registers feeding separate functional units (adder-subtractor, integer multiplier, logic unit, shifter, incrementer, and floating-point add-subtract, multiply, and divide circuits)]
• The operands in the registers are applied to one of the units
depending on the operation specified by the instruction
associated with the operands.
• The adder and integer multiplier perform the arithmetic
operations with integer numbers.
• The floating-point operations are separated into three circuits
operating in parallel.
• The logic, shift, and increment operations can be performed
concurrently on different data.
• All units are independent of each other.
• So one number can be shifted while another number is being
incremented.
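As a software analogy (a hypothetical sketch in C, not a description of the hardware), the dispatch of operands to a functional unit selected by the instruction's operation code can be pictured like this; the unit names and opcodes are illustrative:

#include <stdio.h>

/* Hypothetical sketch: each functional unit is modeled as a function,
   and the operation code of an instruction selects the unit that
   receives the operands. In hardware, all units could run at once. */
typedef double (*unit_fn)(double, double);

static double adder(double a, double b)      { return a + b; }
static double multiplier(double a, double b) { return a * b; }
static double shifter(double a, double b)    { return (double)((long)a << (int)b); }

int main(void) {
    unit_fn units[]      = { adder, multiplier, shifter };
    const char *name[]   = { "add", "mul", "shl" };
    double operands[][2] = { {3, 4}, {3, 4}, {3, 2} };

    for (int op = 0; op < 3; op++)   /* the opcode selects the unit */
        printf("%s(%g, %g) = %g\n", name[op],
               operands[op][0], operands[op][1],
               units[op](operands[op][0], operands[op][1]));
    return 0;
}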
• The amount of hardware increases with parallel processing,
and with it, the cost of the system increases.
However, technological developments have reduced hardware costs to the point
where parallel processing techniques are economically feasible.
• There are a variety of ways that parallel processing can be
classified. It can be considered from the internal organization
of the processors, from the interconnection structure
between processors, or from the flow of information through
the system.
• One classification introduced by M. J. Flynn considers the
organization of a computer system by the number of
instructions and data items that are manipulated
simultaneously (Flynn’s Classification).
Flynn’s Classification
• The Von Neumann architecture is characterized by a CPU and
central memory system, with instructions and data being read
from memory.
• The single data path between the CPU and memory over
which both instructions and data must pass, and the
sequential nature of instruction execution together limit the
performance possible from the computer. This is sometimes
known as the Von Neumann bottleneck.
• This bottleneck is alleviated in uniprocessors by pipelining and superscalar design enhancements.
• In 1972 Michael J Flynn introduced a classification of various
computer architectures based on notions of instruction and
data streams.
• Machines can thus be classified as SISD, MISD, SIMD and
MIMD.

 Single instruction stream, single data stream (SISD)


 A single-processor computer system is called SISD.
 A program executed by the processor constitutes the single
instruction stream, and the sequence of data items that it
operates on constitutes the single data stream.
 Instructions are executed sequentially and the system may or
may not have internal parallel processing capabilities.
 Parallel processing in this case may be achieved by means of
pipeline processing.
Flynn’s
• SISD – Single Instruction, Single Data
[Diagram: a single instruction stream (SI) drives one processing element (PE), which operates on a single data stream (SD)]
 Single instruction stream, multiple data stream (SIMD)
 A single stream of instructions is broadcast (or distributed) to
a number of processors.
 Each processor operates on its own data.
 All processors execute the same program but operate on
different data.
 The multiple data streams are the sequences of data items
accessed by the individual processors in their own memories.
Flynn’s
• SIMD – Single Instruction stream, Multiple Data stream
[Diagram: one instruction stream (SI) is broadcast to several processing elements (PEs), each operating on its own data stream (SD)]
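As a software illustration of the SIMD idea (a sketch, not any particular machine's instruction set): one "add" instruction is applied by every conceptual PE to its own pair of data items.

#include <stdio.h>

#define N 4   /* number of conceptual processing elements */

int main(void) {
    int a[N] = {1, 2, 3, 4};        /* each PE's first operand  */
    int b[N] = {10, 20, 30, 40};    /* each PE's second operand */
    int c[N];

    /* One instruction stream, many data streams: every PE
       executes the same add on its own elements. */
    for (int pe = 0; pe < N; pe++)
        c[pe] = a[pe] + b[pe];

    for (int pe = 0; pe < N; pe++)
        printf("PE %d: %d + %d = %d\n", pe, a[pe], b[pe], c[pe]);
    return 0;
}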
 Multiple instruction stream, multiple data stream (MIMD)
 Involves a number of independent processors, each executing
a different program and accessing its own sequence of data
items.
 Most multiprocessors fit into this category.
 A MIMD machine is usually a number of separate processors
connected together through some interconnection network.
Flynn’s
• MIMD – Multiple Instructions, Multiple Data
[Diagram: several processing elements (PEs), each driven by its own instruction stream (SI) and operating on its own data stream (SD)]
 Multiple instruction stream, single data stream (MISD)
 MISD architecture is the most uncommon one.
 It is chiefly of theoretical interest, since no practical system has
been constructed using this organization.
 In this architecture, the same data stream flows through a
linear array of processors, executing different instructions on
the stream.
Flynn’s
• MISD – Multiple Instructions / Single Data
[Diagram: a single data stream (SD) flows through several processing elements (PEs), each executing a different instruction stream (SI)]
Pipelining
1. Pipelining is a technique of decomposing a sequential process
into suboperations, with each subprocess being executed in a
special dedicated segment that operates concurrently with all
other segments. A pipeline can be visualized as a collection of
processing segments through which binary information flows.
2. Each segment performs partial processing dictated by the way
the task is partitioned. The result obtained from the
computation in each segment is transferred to the next
segment in the pipeline. The final result is obtained after the
data have passed through all segments.
3. It is characteristic of pipelines that several computations can be
in progress in distinct segments at the same time. The
overlapping of computation is made possible by associating a
register with each segment in the pipeline. The registers provide
isolation between each segment so that each can operate on
distinct data simultaneously.
• The simplest way of viewing the pipeline structure is to imagine
that each segment consists of an input register followed by a
combinational circuit. The register holds the data and the
combinational circuit performs the suboperation in the
particular segment. The output of the combinational circuit in a
given segment is applied to the input register of the next
segment. A clock is applied to all registers after enough time has
elapsed to perform all segment activity. In this way the
information flows through the pipeline one step at a time.
Suppose that we want to perform the combined multiply and add
operations Ai * Bi + Ci, for i = 1, 2, ..., 7, with a stream of numbers.
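A minimal C simulation of this pipeline, following the three-segment partition (segment 1: R1 <- Ai, R2 <- Bi; segment 2: R3 <- R1 * R2, R4 <- Ci; segment 3: R5 <- R3 + R4); the data values are illustrative:

#include <stdio.h>

#define N 7   /* number of tasks Ai * Bi + Ci */

int main(void) {
    double A[N] = {1, 2, 3, 4, 5, 6, 7};
    double B[N] = {7, 6, 5, 4, 3, 2, 1};
    double C[N] = {1, 1, 1, 1, 1, 1, 1};
    double R1 = 0, R2 = 0, R3 = 0, R4 = 0, R5 = 0;

    /* A 3-segment pipe completes N tasks in 3 + (N - 1) clocks.
       Registers are updated back to front, so each segment sees
       the value its predecessor latched on the previous clock. */
    for (int clock = 0; clock < N + 2; clock++) {
        R5 = R3 + R4;                                    /* segment 3 */
        R3 = R1 * R2;                                    /* segment 2 */
        R4 = (clock >= 1 && clock <= N) ? C[clock - 1] : 0;
        if (clock < N) { R1 = A[clock]; R2 = B[clock]; } /* segment 1 */
        if (clock >= 2)   /* the first result emerges on the third clock */
            printf("clock %d: A[%d]*B[%d] + C[%d] = %g\n",
                   clock + 1, clock - 2, clock - 2, clock - 2, R5);
    }
    return 0;
}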
Now consider the case where a k-segment pipeline
with a clock cycle time tp is used to execute n tasks.
The first task T1 requires a time equal to ktp to
complete its operation since there are k segments in
the pipe. The remaining n - 1 tasks emerge from the
pipe at the rate of one task per clock cycle and they
will be completed after a time equal to (n - 1)tp.
Therefore, to complete n tasks using a k-segment
pipeline requires k + (n - 1) clock cycles. For example,
the diagram of Fig. 9-4 shows four segments and six
tasks. The time required to complete all the
operations is 4 + (6 - 1) = 9 clock cycles, as
indicated in the diagram.
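A quick check of this count in C, together with the resulting speedup over a nonpipelined unit (assuming, as is conventional, that a nonpipelined unit takes k * tp per task):

#include <stdio.h>

int main(void) {
    int k = 4, n = 6;     /* segments and tasks, as in Fig. 9-4 */
    double tp = 1.0;      /* clock cycle time (arbitrary units) */

    double pipelined    = (k + (n - 1)) * tp;  /* k + (n - 1) cycles */
    double nonpipelined = (double)n * k * tp;  /* assumes tn = k*tp  */

    printf("pipelined:    %g cycles\n", pipelined);      /* 9  */
    printf("nonpipelined: %g cycles\n", nonpipelined);   /* 24 */
    printf("speedup:      %g\n", nonpipelined / pipelined);
    return 0;
}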
Arithmetic Pipeline
• Pipeline arithmetic units are usually found in very high speed
computers. They are used to implement floating-point operations,
multiplication of fixed-point numbers, and similar computations
encountered in scientific problems.
• The inputs to the floating-point adder pipeline are two normalized
floating-point binary numbers, X = A × 2^a and Y = B × 2^b.

• A and B are two fractions that represent the mantissas and a and b
are the exponents. The floating-point addition and subtraction can
be performed in four segments, as shown in Fig. 9-6. The registers
labeled R are placed between the segments to store intermediate
results. The suboperations that are performed in the four segments
are:
1. Compare the exponents.
2. Align the mantissas.
3. Add or subtract the mantissas.
4. Normalize the result.
The exponents are compared by subtracting them to determine
their difference. The larger exponent is chosen as the exponent of
the result. The exponent difference determines how many times
the mantissa associated with the smaller exponent must be shifted
to the right. This produces an alignment of the two mantissas. It
should be noted that the shifter must be designed as a combinational
circuit to reduce the shift time. The two mantissas are added or
subtracted in segment 3. The result is normalized in segment 4.
When an overflow occurs, the mantissa of the sum or difference is
shifted right and the exponent incremented by one. If an
underflow occurs, the number of leading zeros in the mantissa
determines the number of left shifts in the mantissa and the
number that must be subtracted from the exponent.
Consider the two normalized floating-point numbers:
X = 0.9504 × 10^3
Y = 0.8200 × 10^2
The exponent difference is 3 − 2 = 1, so the mantissa of Y is shifted right once to give 0.0820 × 10^3. Adding the aligned mantissas gives 0.9504 + 0.0820 = 1.0324, which overflows the mantissa range, so the sum is shifted right and the exponent incremented, producing the normalized result Z = 0.10324 × 10^4.
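A simplified C model of the four suboperations, using decimal mantissas and base-10 exponents so that it reproduces the example above (a sketch of the algorithm, not of the binary hardware):

#include <stdio.h>
#include <math.h>

typedef struct { double m; int e; } fpnum;   /* mantissa x 10^exponent */

fpnum fp_add(fpnum x, fpnum y) {
    /* Segment 1: compare exponents; the larger becomes the result exponent. */
    int d = x.e - y.e;
    int e = (d >= 0) ? x.e : y.e;
    /* Segment 2: align the mantissa that has the smaller exponent. */
    if (d >= 0) y.m /= pow(10, d);
    else        x.m /= pow(10, -d);
    /* Segment 3: add the mantissas. */
    double m = x.m + y.m;
    /* Segment 4: normalize so that 0.1 <= |m| < 1. */
    while (fabs(m) >= 1.0)          { m /= 10; e++; }   /* overflow  */
    while (m != 0 && fabs(m) < 0.1) { m *= 10; e--; }   /* underflow */
    return (fpnum){m, e};
}

int main(void) {
    fpnum x = {0.9504, 3}, y = {0.8200, 2};
    fpnum z = fp_add(x, y);
    printf("Z = %.5f x 10^%d\n", z.m, z.e);   /* 0.10324 x 10^4 */
    return 0;
}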
Instruction Pipeline
In the most general case, the computer needs to process each
instruction with the following sequence of steps:-
1. Fetch the instruction from memory.
2. Decode the instruction.
3. Calculate the effective address.
4. Fetch the operands from memory.
5. Execute the instruction.
6. Store the result in the proper place.
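In practice these six steps are often collapsed into four pipeline segments (FI: fetch instruction, DA: decode and calculate effective address, FO: fetch operand, EX: execute and store the result). The short C sketch below prints the resulting space-time diagram for six instructions; the four-segment grouping follows Mano's instruction-pipeline example.

#include <stdio.h>

int main(void) {
    const char *seg[] = {"FI", "DA", "FO", "EX"};
    int k = 4, n = 6;   /* pipeline segments, instructions */

    /* Instruction i occupies segment s during clock cycle i + s,
       so each row is shifted one cycle to the right of the last. */
    for (int i = 0; i < n; i++) {
        printf("I%d:", i + 1);
        for (int clock = 0; clock < k + n - 1; clock++) {
            int s = clock - i;
            printf(" %s", (s >= 0 && s < k) ? seg[s] : ". ");
        }
        printf("\n");
    }
    return 0;
}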
RISC Pipeline
CISC
Major characteristics of CISC (Complex instruction set computer)
architecture are:
1. A large number of instructions—typically from 100 to 250
instructions.
2. Some instructions that perform specialized tasks and are used
infrequently.
3. A large variety of addressing modes—typically from 5 to 20
different modes.
4. Variable-length instruction formats.
5. Instructions that manipulate operands in memory.
Reduced Instruction Set Computer
In the early 1980s, a number of computer designers
recommended that computers use fewer instructions with simple
constructs so they can be executed much faster within the CPU
without having to use memory as often. This RISC type of
computer is classified as a reduced instruction set computer or
RISC.
The concept of RISC architecture involves an attempt to reduce
execution time by simplifying the instruction set of the computer.
The major characteristics of a RISC processor are:
1. Relatively few instructions
2. Relatively few addressing modes
3. Memory access limited to load and store instructions
4. All operations done within the registers of the CPU
5. Fixed-length, easily decoded instruction format
6. Single-cycle instruction execution
7. Hardwired rather than microprogrammed control
8. A relatively large number of registers in the processor unit
9. Use of overlapped register windows to speed-up procedure call
and return
10. Efficient instruction pipeline
11. Compiler support for efficient translation of high-level language
programs into machine language programs
Among the characteristics attributed to RISC is its ability to use
an efficient instruction pipeline. The simplicity of the
instruction set can be utilized to implement an instruction
pipeline using a small number of suboperations, with each
being executed in one clock cycle. Because of the fixed-length
instruction format, the decoding of the operation can occur at
the same time as the register selection. All data manipulation
instructions have register-to-register operations. Since all
operands are in registers, there is no need for calculating an
effective address or fetching of operands from memory.
Therefore, the instruction pipeline can be implemented with
two or three segments. One segment fetches the instruction
from program memory, and the other segment executes the
instruction in the ALU. A third segment may be used to store the
result of the ALU operation in a destination register.
The data transfer instructions in RISC are limited to load and
store instructions. These instructions use register indirect
addressing. They usually need three or four stages in the
pipeline. One of the major advantages of RISC is its ability to
execute instructions at the rate of one per clock cycle. The
advantage of RISC over CISC is that RISC can achieve pipeline
segments that each require just one clock cycle, while CISC uses
many segments in its pipeline, with the longest segment requiring
two or more clock cycles.
Another characteristic of RISC is the support given by the
compiler that translates the high-level language program into
machine language program. Instead of designing hardware to
handle the difficulties associated with data conflicts and branch
penalties, RISC processors rely on the efficiency of the compiler to
detect and minimize the delays encountered with these
problems.
Example: Three-Segment Instruction Pipeline
A typical three-segment RISC pipeline consists of:
1. I (Instruction fetch): the instruction is fetched from program memory.
2. A (ALU operation): the instruction is decoded and an arithmetic, logic, or address calculation is performed in the ALU.
3. E (Execute instruction): the result is routed to a destination register, to memory, or to the program counter in the case of a branch.
Delayed Branch: It was shown in Fig. 9-8 that a branch instruction
delays the pipeline operation until the instruction at the branch
address is fetched. Several techniques for reducing branch
penalties were discussed in the preceding section. The method
used in most RISC processors is to rely on the compiler to redefine
the branches so that they take effect at the proper time in the
pipeline. This method is referred to as delayed branch. For example,
the compiler can move an instruction that logically precedes the
branch into the slot immediately following it (the delay slot), so
that useful work is done while the branch target is being fetched;
if no such instruction can be found, a no-op is inserted instead.
Vector Processing
There is a class of computational problems that are beyond the
capabilities of a conventional computer. These problems are
characterized by the fact that they require a vast number of
computations that will take a conventional computer days or even
weeks to complete. In many science and engineering applications,
the problems can be formulated in terms of vectors and matrices
that lend themselves to vector processing. Computers with vector
processing capabilities are in demand in specialized applications.
The following are representative application areas where vector
processing is of the utmost importance:
Long-range weather forecasting
Petroleum explorations
Seismic data analysis
Medical diagnosis
Aerodynamics and space flight simulations
Artificial intelligence and expert systems
Mapping the human genome
Image processing.
Vector Operations:
A vector is an ordered set of data items arranged as a one-dimensional array. A conventional scalar processor adds two 100-element vectors with a program loop that fetches and executes the same instructions 100 times over, whereas a vector processor performs the entire operation with a single composite vector instruction of the form C(1:100) = A(1:100) + B(1:100), which specifies the operand addresses, the vector length, and the operation all at once and thus eliminates the loop-control overhead.
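A sketch of the contrast in C, where vector_add is a hypothetical stand-in for a single composite vector instruction (it carries the operand addresses, the length, and the operation in one call):

#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for one vector instruction such as
   C(1:100) = A(1:100) + B(1:100): base addresses, length, and
   operation are all specified at once. */
void vector_add(const double *a, const double *b, double *c, size_t n) {
    for (size_t i = 0; i < n; i++)
        c[i] = a[i] + b[i];
}

int main(void) {
    double A[100], B[100], C[100];
    for (size_t i = 0; i < 100; i++) { A[i] = i; B[i] = 2.0 * i; }

    /* Scalar form: loop-control instructions are fetched and
       executed on every one of the 100 iterations. */
    for (size_t i = 0; i < 100; i++)
        C[i] = A[i] + B[i];

    /* Vector form: one "instruction" covers all 100 elements. */
    vector_add(A, B, C, 100);
    printf("C[99] = %g\n", C[99]);
    return 0;
}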
Memory Interleaving
• Pipeline and vector processors often require simultaneous
access to memory from two or more sources.
• An instruction pipeline may require the fetching of an
instruction and an operand at the same time from two
different segments.
• Similarly, an arithmetic pipeline usually requires two or more
operands to enter the pipeline at the same time.
• Instead of using two memory buses for simultaneous access,
the memory can be partitioned into a number of modules
connected to a common memory address and data buses.
• A memory module is a memory array together with its own
address and data registers.
Interleaving
 Divides the memory system into a number of
memory modules. Each module has its own address buffer register
(ABR) and data buffer register (DBR).
 Arranges addressing so that successive words in
the address space are placed in different
modules.
 When requests for memory access involve
consecutive addresses, the access will be to
different modules.
 Since parallel access to these modules is
possible, the average rate of fetching words from
the Main Memory can be increased.
Methods of address layouts
[Diagram: multiple-module memory organization, with 2^k modules, each having its own ABR and DBR, connected to common address and data buses]
 High-order interleaving: the MM address is laid out as [Module | Address in module]. The high-order k bits of a memory address determine the module, and the low-order m bits determine the word within the module. Consecutive words are placed in the same module, so when a block of words is transferred from main memory to cache, only one module is busy at a time.
 Low-order interleaving: the MM address is laid out as [Address in module | Module]. The low-order k bits of a memory address determine the module, so consecutive addresses are located in consecutive modules. While transferring a block of data, several memory modules can be kept busy at the same time.
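A small C sketch of the two layouts (here k = 2, so four modules, and m = 4 bits for the word within a module; the widths are illustrative):

#include <stdio.h>

#define K 2   /* module-number bits: 2^K = 4 modules       */
#define M 4   /* bits for the word address within a module */

int main(void) {
    for (unsigned addr = 0; addr < 8; addr++) {
        /* High-order interleaving: top K bits select the module,
           so consecutive addresses stay in the same module. */
        unsigned hi_module = addr >> M;
        unsigned hi_word   = addr & ((1u << M) - 1);
        /* Low-order interleaving: bottom K bits select the module,
           so consecutive addresses rotate through the modules. */
        unsigned lo_module = addr & ((1u << K) - 1);
        unsigned lo_word   = addr >> K;
        printf("addr %u: high-order (module %u, word %2u) | "
               "low-order (module %u, word %u)\n",
               addr, hi_module, hi_word, lo_module, lo_word);
    }
    return 0;
}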
