Multiple Issue (2)
Static multiple issue - functional units are scheduled at compile time.
Dynamic multiple issue - functional units are scheduled at run time.
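For example, the two statements in this minimal sketch are independent, so a multiple issue processor can execute them simultaneously on separate functional units, whether the scheduling happens at compile time or at run time (the function name is illustrative):

double multi_issue(double x, double y, double a, double b) {
    double z = x + y;   /* can go to the adder */
    double w = a * b;   /* no dependence on z: can go to the multiplier at the same time */
    return z + w;
}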
Speculation (1)
In order to make use of multiple issue, the
system must find instructions that can be
executed simultaneously.
In speculation, the compiler or
the processor makes a guess
about an instruction, and then
executes the instruction on the
basis of the guess.
Speculation (2)
z = x + y;
if (z > 0)    /* speculation: z will be positive */
    w = x;
else
    w = y;
If the system speculates incorrectly,
it must go back and recalculate w = y.
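The same guess-and-recover idea can be sketched directly in C (the function name speculate is illustrative):

double speculate(double x, double y) {
    double z = x + y;
    double w = x;   /* guess: z will be positive, so take the true branch */
    if (z <= 0)
        w = y;      /* wrong guess: go back and recalculate w = y */
    return w;
}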
Hardware multithreading (1)
There aren’t always good opportunities for
simultaneous execution of different
threads.
Hardware multithreading provides a means for systems to continue doing useful work when the currently executing task has stalled.
Hardware multithreading (2)
Fine-grained - the processor switches
between threads after each instruction,
skipping threads that are stalled.
Pros: potential to avoid wasted machine time
due to stalls.
Cons: a thread that’s ready to execute a long
sequence of instructions may have to wait to
execute every instruction.
Hardware multithreading (3)
Simultaneous multithreading (SMT) - a
variation on fine-grained multithreading.
Allows multiple threads to make use of the
multiple functional units.
PARALLEL HARDWARE
Flynn’s Taxonomy
SISD (the classic von Neumann machine): single instruction stream, single data stream.
SIMD: single instruction stream, multiple data streams.
MISD (not covered here): multiple instruction streams, single data stream.
MIMD: multiple instruction streams, multiple data streams.
SIMD
Parallelism achieved by dividing data
among the processors.
Applies the same instruction to multiple
data items.
Called data parallelism.
SIMD example
[Figure: a single control unit drives n ALUs, one per data item x[1] ... x[n].]

for (i = 0; i < n; i++)
    x[i] += y[i];
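The same loop written with x86 AVX intrinsics makes the SIMD step explicit: one instruction adds eight floats at once. This is a hedged sketch, assuming an AVX-capable processor and a compiler flag such as -mavx; the function name add_avx is illustrative.

#include <immintrin.h>

/* x[i] += y[i] in SIMD fashion: each _mm256_add_ps applies one
   instruction to eight data items at once. */
void add_avx(float *x, const float *y, int n) {
    int i = 0;
    for (; i + 8 <= n; i += 8) {
        __m256 vx = _mm256_loadu_ps(&x[i]);
        __m256 vy = _mm256_loadu_ps(&y[i]);
        _mm256_storeu_ps(&x[i], _mm256_add_ps(vx, vy));
    }
    for (; i < n; i++)   /* any leftover items are handled serially */
        x[i] += y[i];
}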
SIMD
What if we don’t have as many ALUs as
data items?
Divide the work and process iteratively.
Ex. m = 4 ALUs and n = 15 data items.
Round   ALU1    ALU2    ALU3    ALU4
1       x[0]    x[1]    x[2]    x[3]
2       x[4]    x[5]    x[6]    x[7]
3       x[8]    x[9]    x[10]   x[11]
4       x[12]   x[13]   x[14]   (idle)
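In plain C, this round-by-round scheme is a strip-mined loop. A minimal sketch (the function name add_in_rounds is illustrative; the inner loop stands in for one simultaneous step across the m ALUs):

void add_in_rounds(float *x, const float *y, int n, int m) {
    for (int i = 0; i < n; i += m) {             /* one round per iteration */
        int active = (n - i < m) ? (n - i) : m;  /* last round may leave ALUs idle */
        for (int j = 0; j < active; j++)
            x[i + j] += y[i + j];                /* conceptually one SIMD instruction */
    }
}

With m = 4 and n = 15, this performs the four rounds shown in the table, with one ALU idle in the last round.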
Graphics Processing Units (GPU)
Real-time graphics application programming interfaces (APIs) use points, lines, and triangles to internally represent the surface of an object.
GPUs
A graphics processing pipeline converts
the internal representation into an array of
pixels that can be sent to a computer
screen.
Several stages of this pipeline
(called shader functions) are
programmable.
Typically just a few lines of C code.
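To make the per-element model concrete, here is a hedged sketch in plain C (real shaders are written in C-like GPU languages such as GLSL; Pixel, darken, and apply_shader are hypothetical names):

#include <stddef.h>

typedef struct { float r, g, b; } Pixel;

/* A shader-style function: conceptually, the GPU applies this to every
   element of the graphics stream in parallel. */
static Pixel darken(Pixel in) {
    Pixel out = { 0.5f * in.r, 0.5f * in.g, 0.5f * in.b };
    return out;
}

void apply_shader(Pixel *buf, size_t n) {
    for (size_t i = 0; i < n; i++)   /* iterations are independent */
        buf[i] = darken(buf[i]);
}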
GPUs
Shader functions are also implicitly
parallel, since they can be applied to
multiple elements in the graphics stream.
GPUs can often optimize performance by using SIMD parallelism.
The current generation of GPUs uses SIMD parallelism, although they are not pure SIMD systems.
MIMD
Supports multiple simultaneous instruction
streams operating on multiple data
streams.
Typically consists of a collection of fully
independent processing units or cores,
each of which has its own control unit and
its own ALU.
Shared Memory System (1)
A collection of autonomous processors is
connected to a memory system via an
interconnection network.
Each processor can access each memory
location.
The processors usually communicate
implicitly by accessing shared data
structures.
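For example, with POSIX threads, several threads can communicate simply by updating the same variable, with a mutex serializing the updates. A minimal sketch (counter and work are illustrative names):

#include <pthread.h>
#include <stdio.h>

int counter = 0;   /* shared data structure: visible to every thread */
pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

void *work(void *arg) {
    for (int i = 0; i < 1000; i++) {
        pthread_mutex_lock(&lock);
        counter++;                 /* implicit communication through shared memory */
        pthread_mutex_unlock(&lock);
    }
    return NULL;
}

int main(void) {
    pthread_t t[4];
    for (int i = 0; i < 4; i++) pthread_create(&t[i], NULL, work, NULL);
    for (int i = 0; i < 4; i++) pthread_join(t[i], NULL);
    printf("counter = %d\n", counter);   /* prints 4000 */
    return 0;
}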
Shared Memory System (2)
Most widely available shared memory
systems use one or more multicore
processors.
Shared Memory System (Figure 2.3)
Distributed Memory System
Clusters (most popular)
A collection of commodity systems.
Connected by a commodity interconnection
network.
Nodes of a cluster are individual computation units joined by a communication network.
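Since the nodes share no address space, they must communicate explicitly over the network. A minimal sketch using MPI message passing (run with at least two processes, e.g. mpiexec -n 2):

#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[]) {
    int rank;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {            /* node 0 sends a value over the network */
        int msg = 42;
        MPI_Send(&msg, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {     /* node 1 receives it: explicit communication */
        int msg;
        MPI_Recv(&msg, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        printf("node 1 received %d\n", msg);
    }
    MPI_Finalize();
    return 0;
}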
Distributed Memory System (Figure 2.4)
Interconnection networks
Affects performance of both distributed
and shared memory systems.
Two categories:
Shared memory interconnects
Distributed memory interconnects
Shared memory interconnects
Bus interconnect
A collection of parallel communication wires
together with some hardware that controls
access to the bus.
Communication wires are shared by the devices that are connected to the bus.
As the number of devices connected to the
bus increases, contention for use of the bus
increases, and performance decreases.
Shared memory interconnects
Switched interconnect
Uses switches to control the routing of data
among the connected devices.
Crossbar - allows simultaneous communication among different devices.
Faster than buses, but the cost of the switches and links is relatively high.
Distributed memory interconnects
Two groups
Direct interconnect
Each switch is directly connected to a processor-memory pair, and the switches are connected to each other.
Indirect interconnect
Switches may not be directly connected to a
processor.
Bisection width
A measure of “number of simultaneous
communications” or “connectivity”.
How many simultaneous communications
can take place “across the divide” between
the halves?
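For example, a ring of p nodes can be bisected by cutting just two links, so its bisection width is 2; a fully connected network of p nodes (p even) has bisection width p^2/4, since each of the p/2 nodes in one half links directly to each of the p/2 nodes in the other.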
Two bisections of a ring (Figure 2.9)
Fully connected network
Each switch is directly connected to every
other switch.
(Figure 2.11)