Parallelism
Why Parallel Architecture?
Parallel computer architecture adds a new dimension to the development of computer systems by using a greater number of processors.
In principle, the performance achieved by utilizing a large number of processors is higher than the performance of a single processor at a given point in time.
Parallelism
Parallel processing can be described as a class of techniques that enables a system to perform multiple data-processing tasks simultaneously in order to increase its computational speed.
A parallel processing system can carry out simultaneous data-processing to achieve faster execution time.
For instance, while an instruction is being processed in the ALU component of the CPU, the next instruction can be read from
memory.
The primary purpose of parallel processing is to enhance the computer's processing capability and increase its throughput.
A parallel processing system can be achieved by having a multiplicity of functional units that perform identical or different operations
simultaneously.
The data can be distributed among the multiple functional units.
The following diagram shows one possible way of separating the execution unit into eight functional units operating in parallel.
The operation performed in each functional unit is indicated in each block of the diagram.
Parallelism
The adder and the integer multiplier perform arithmetic operations on integer numbers.
The floating-point operations are separated into three circuits operating in
parallel.
The logic, shift, and increment operations can be performed concurrently on
different data.
All units are independent of each other, so one number can be shifted while
another number is being incremented
Parallelism
Advantages of Parallel Computing over Serial Computing are as follows:
1. It saves time and money as many resources working together will reduce
the time and cut potential costs.
2. It can be impractical to solve larger problems with Serial Computing.
3. It can take advantage of non-local resources when the local resources
are finite.
4. Serial Computing ‘wastes’ the potential computing power, whereas Parallel
Computing makes better use of the hardware.
Parallelism
Types of Parallelism:
1. Bit-level parallelism:
• It is the form of parallel computing that is based on increasing the processor’s word size. It reduces the
number of instructions that the system must execute in order to perform a task on large-sized data.
Example: Consider a scenario where an 8-bit processor must compute the sum of two 16-bit integers. It
must first add the 8 lower-order bits and then the 8 higher-order bits, thus requiring two instructions
to perform the operation. A 16-bit processor can perform the operation with just one instruction (see the sketch after this list).
2. Instruction-level parallelism:
• In each clock cycle phase, a processor can issue only a limited number of instructions. These instructions
can be re-ordered and grouped, and later executed concurrently, without affecting the result of the
program. This is called instruction-level parallelism.
3. Task Parallelism:
• Task parallelism employs the decomposition of a task into subtasks and the allocation of each
subtask for execution. The processors execute the subtasks concurrently.
4. Data-level parallelism (DLP)
• Instructions from a single stream operate concurrently on several data items. DLP is limited by non-regular
data-manipulation patterns and by memory bandwidth.
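To make the 8-bit versus 16-bit scenario above concrete, here is a minimal C sketch (illustrative only, not part of the original material): the helper add16_on_8bit_alu is a hypothetical name that mimics the two dependent steps an 8-bit datapath must perform, while a 16-bit datapath completes the same sum in a single addition.

    #include <stdint.h>
    #include <stdio.h>

    /* Hypothetical illustration: adding two 16-bit integers on an 8-bit-wide
     * datapath takes two dependent steps (low bytes, then high bytes plus the
     * carry), whereas a 16-bit datapath needs only one addition.             */
    static uint16_t add16_on_8bit_alu(uint16_t a, uint16_t b)
    {
        uint8_t lo    = (uint8_t)((a & 0xFF) + (b & 0xFF));     /* step 1: low bytes   */
        uint8_t carry = lo < (uint8_t)(a & 0xFF);               /* carry out of step 1 */
        uint8_t hi    = (uint8_t)((a >> 8) + (b >> 8) + carry); /* step 2: high bytes  */
        return (uint16_t)(((uint16_t)hi << 8) | lo);
    }

    int main(void)
    {
        uint16_t a = 0x1234, b = 0x0FCD;
        printf("two 8-bit steps: 0x%04X\n", (unsigned)add16_on_8bit_alu(a, b));
        printf("one 16-bit step: 0x%04X\n", (unsigned)(uint16_t)(a + b));  /* same result */
        return 0;
    }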
Parallelism-Applications
• Numerical Weather Prediction
• Socioeconomics
• Finite Element Analysis
• AI and Automation
• Genetic Engineering
• Weapon Research and Defense
• Medical Applications
• Remote Sensing Applications
• Energy Resource Exploration
Parallelism
• Architectural Trends
• When multiple operations are executed in parallel, the number of cycles needed to execute
the program is reduced.
• However, resources are needed to support each of the concurrent activities.
• Resources are also needed to allocate local storage.
• The best performance is achieved by an intermediate action plan that uses resources to
utilize a degree of parallelism and a degree of locality.
• Generally, the history of computer architecture has been divided into four generations
based on the following basic technologies: vacuum tubes, transistors, integrated circuits, and VLSI.
• Until 1985, this era was dominated by growth in bit-level parallelism:
• 4-bit microprocessors were followed by 8-bit, 16-bit, and so on.
• To reduce the number of cycles needed to perform a full 32-bit operation, the
width of the data path was doubled. Later on, 64-bit operations were introduced.
• The growth in instruction-level parallelism dominated the mid-80s to the mid-90s.
• The RISC approach showed that it was simple to pipeline the steps of instruction processing
so that, on average, an instruction is executed in almost every cycle.
Instruction Level Parallelism
• Almost all processors since 1985 use pipelining to overlap the
execution of instructions and improve performance. This potential
overlap among instructions is called instruction level parallelism
• First introduced in the IBM Stretch (Model 7030) in about 1959
• Later the CDC 6600 incorporated pipelining and the use of multiple
functional units
• The Intel i486 was the first pipelined implementation of the IA32
architecture
Instruction Level Parallelism
• Instruction level parallel processing is the concurrent processing of
multiple instructions
• Difficult to achieve within a basic code block
• Typical MIPS programs have a dynamic branch frequency of between 15% and
25%
• That is, between three and six instructions execute between a pair of
branches, and data hazards usually exist within these instructions as they are
likely to be dependent
• Given the small size of a basic code block in number of instructions, ILP must be
exploited across multiple blocks
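As a rough illustration (not from the original slides), the short C function below is a single basic block whose statements form a true-dependence chain; each value depends on the previous one, so the hardware finds little instruction-level parallelism inside the block and must look across blocks instead.

    /* A single basic block: no branches inside, one entry, one exit.
     * t2 depends on t1 and t3 depends on t2, so the three operations
     * cannot be overlapped; the dependence chain serializes them.    */
    double basic_block(double a, double b, double c)
    {
        double t1 = a * b;     /* depends only on the inputs */
        double t2 = t1 + c;    /* true dependence on t1      */
        double t3 = t2 * t2;   /* true dependence on t2      */
        return t3;
    }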
Instruction Level Parallelism
• The current trend is toward very deep pipelines, increasing from a
depth of < 10 to > 20.
• With more stages, each stage can be smaller and simpler, with less gate delay;
therefore very high clock rates are possible.
Loop Level Parallelism
Exploitation among Iterations of a Loop
• Loop adding two 1000 element arrays
• Code
for (i=1; i<= 1000; i=i+1)
x[i] = x[i] + y[i];
• If we look at the generated code, within a loop there may be little
opportunity for overlap of instructions, but each iteration of the loop
can overlap with any other iteration
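One way to exploit this overlap across iterations, sketched here under stated assumptions rather than as the slides' own solution, is to unroll the loop above by a factor of four: the four additions in each unrolled iteration touch distinct elements and are independent, so they can be overlapped, and three of every four loop branches disappear (1000 is a multiple of 4, so no cleanup loop is needed).

    /* Unrolled-by-4 version of the loop above; each statement is independent
     * of the others in the same iteration, exposing ILP across iterations.  */
    for (i = 1; i <= 1000; i = i + 4) {
        x[i]     = x[i]     + y[i];
        x[i + 1] = x[i + 1] + y[i + 1];
        x[i + 2] = x[i + 2] + y[i + 2];
        x[i + 3] = x[i + 3] + y[i + 3];
    }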
Concepts and Challenges
Approaches to Exploiting ILP
• Two major approaches
• Dynamic – these approaches depend upon the hardware to locate the
parallelism
• Static – fixed solutions generated by the compiler, and thus bound at compile
time
• These approaches are not totally disjoint, some requiring both
• Limitations are imposed by data and control hazards
Features Limiting Exploitation of Parallelism
• Program features
• Instruction sequences
• Processor features
• Pipeline stages and their functions
• Interrelationships
• How do program properties limit performance? Under what circumstances?
Approaches to Exploiting ILP
Dynamic Approach
• Hardware intensive approach
• Dominate desktop and server markets
• Pentium III, 4, Athlon
• MIPS R10000/12000
• Sun UltraSPARC III
• PowerPC 603, G3, G4
• Alpha 21264
Approaches to Exploiting ILP
Static Approach
• Compiler intensive approach
• Embedded market and IA-64
Terminology and Ideas
• Cycles Per Instruction
• Pipeline CPI = Ideal Pipeline CPI + Structural Stalls + Data Hazard Stalls +
Control Stalls
• Ideal Pipeline CPI is a measure of the maximum performance attainable by a given
architecture; stalls and/or their impacts must be minimized.
• During the 1980s, CPI = 1 was the target objective for single-chip
microprocessors
• 1990s objective: reduce CPI below 1
• Scalar processors are pipelined processors that are designed to fetch
and issue at most one instruction every machine cycle
• Superscalar processors are those that are designed to fetch and issue
multiple instructions every machine cycle
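To show how the Pipeline CPI equation is used, here is a small worked example in C with made-up stall contributions (the numbers are assumptions chosen only to exercise the formula):

    #include <stdio.h>

    int main(void)
    {
        /* Hypothetical per-instruction stall contributions, in cycles. */
        double ideal_cpi      = 1.00;   /* one instruction per cycle, ideally */
        double structural     = 0.10;
        double data_hazard    = 0.25;
        double control_stalls = 0.15;

        /* Pipeline CPI = Ideal CPI + structural + data hazard + control stalls */
        double pipeline_cpi = ideal_cpi + structural + data_hazard + control_stalls;
        printf("Pipeline CPI = %.2f\n", pipeline_cpi);   /* 1.50 for these inputs */
        return 0;
    }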
Approaches to Exploiting ILP
That We Will Explore
Technique – Reduces
Forwarding and bypassing – Potential data hazards and stalls
Delayed branches and simple branch scheduling – Control hazard stalls
Basic dynamic scheduling (scoreboarding) – Data hazard stalls from true dependences
Dynamic scheduling with renaming – Data hazard stalls and stalls from antidependences and output dependences
Branch prediction – Control stalls
Issuing multiple instructions per cycle – Ideal CPI
Hardware speculation – Data hazard and control hazard stalls
Dynamic memory disambiguation – Data hazard stalls with memory
Loop unrolling – Control hazard stalls
Basic compiler pipeline scheduling – Data hazard stalls
Compiler dependence analysis, software pipelining, trace scheduling – Ideal CPI, data hazard stalls
Hardware support for compiler speculation – Ideal CPI, data hazard stalls, control stalls
Flynn’s Classification