Chapter 9

This document discusses scalable multithreaded and dataflow architectures. It covers topics like latency hiding techniques, shared virtual memory, page swapping, prefetching, distributed coherent caches, relaxed memory consistency models, multiple context support, and dataflow architectures. It provides examples of scalable architectures like the Stanford DASH multiprocessor, KSR-1, ETL-4 machine, and MIT dataflow prototype.

Uploaded by

K S Sanath Kashyap

Scalable multithreaded and dataflow architecture
by
Prajwala T R
Dept. of CSE
PESIT
• Latency hiding techniques
– Prefetching caches
– Coherent caches
– Relaxed memory models
– Multiple context support
Shared virtual memory
• Architecture environment
– Large scale cache coherent NUMA system
– Physical memory is distributed among clusters
– Cache coherence mechanism
– Multilevel caches
– Write buffers
SVM concept
Page swapping
• Access rights
• Page fault-memory manager, page frame
• Message passing environment
• Improves scalability
A. Prefetching techniques
• Binding prefetch
• Nonbinding prefetch
• Software-controlled: prefetch of remote data
• Hardware-controlled: prefetch into the cache
• Benefits
– Reduce latency
– Reduce network traffic
B. Distributed coherent caches
• Snoopy cache coherence protocols
• Hardware coherent caches-DASH
• Benefits
– Reduction in number of cycles wasted due to read misses
Scalable Coherent Interface (SCI)
• Point-to-point connections
• Interface between nodes and external interconnect
• Backplane bus
• Interconnection network is a ring or crossbar
• A converter is used to bridge SCI and the bus
Sharing list structures
Sharing list creation
• Memory state: cached or uncached
• Head node
• Steps
– Initially memory is in the home (uncached) state and all cached copies are invalid
– Sharing list creation: the requesting cache entry goes from invalid to pending state
– Read issued: memory state goes from uncached to cached
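The creation steps above can be sketched as a small linked-list model. This is only an illustration under stated assumptions: the class names, the `read_miss` helper, and the final "valid" state label are hypothetical and are not taken from the SCI standard. The sketch assumes that each new reader is attached at the head of the list and that memory keeps a pointer to the current head.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class MemoryLine:
    """Home-memory view of one line (hypothetical model)."""
    state: str = "uncached"          # "uncached" (home) or "cached"
    head: Optional[int] = None       # node id of the current sharing-list head


@dataclass
class CacheEntry:
    """One node's cache entry for the line; state names are illustrative."""
    state: str = "invalid"
    next_node: Optional[int] = None  # forward pointer toward the list tail


def read_miss(mem: MemoryLine, caches: Dict[int, CacheEntry], node: int) -> None:
    """Attach `node` to the sharing list as the new head."""
    entry = caches[node]
    entry.state = "pending"          # invalid -> pending while the request is outstanding
    old_head = mem.head
    mem.state = "cached"             # uncached -> cached on the first read
    mem.head = node                  # memory now points at the newest head
    entry.next_node = old_head       # new head links to the previous head, if any
    entry.state = "valid"            # assumed label for a completed read


def sharing_list(mem: MemoryLine, caches: Dict[int, CacheEntry]) -> List[int]:
    """Walk the list from the head and return node ids in order."""
    order, node = [], mem.head
    while node is not None:
        order.append(node)
        node = caches[node].next_node
    return order


if __name__ == "__main__":
    caches = {i: CacheEntry() for i in range(3)}
    mem = MemoryLine()
    read_miss(mem, caches, 2)
    read_miss(mem, caches, 0)
    print(mem.state, sharing_list(mem, caches))
```

Keeping the head pointer in memory means a new sharer only has to contact the home node and the previous head, which is the property the point-to-point design relies on.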
Strong consistency model
• Sequential consistency model
– The result of execution appears as an interleaving of the operations of all processors, as if executed on a multithreaded sequential machine
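To make the interleaving idea concrete, the sketch below enumerates every program-order-preserving interleaving of two tiny programs and collects the possible results. The two programs (a common store/load litmus test) and all helper names are assumptions chosen for illustration; they do not come from the slides.

```python
from itertools import combinations

# Each operation is (kind, variable, value-or-register).
# P1: x = 1; r1 = y        P2: y = 1; r2 = x
P1 = [("write", "x", 1), ("read", "y", "r1")]
P2 = [("write", "y", 1), ("read", "x", "r2")]


def interleavings(a, b):
    """Yield every merge of a and b that keeps each program's own order."""
    total = len(a) + len(b)
    for slots in combinations(range(total), len(a)):
        ai, bi = iter(a), iter(b)
        yield [next(ai) if i in slots else next(bi) for i in range(total)]


def run(schedule):
    """Execute one interleaving on a single shared memory."""
    memory = {"x": 0, "y": 0}
    regs = {}
    for kind, var, arg in schedule:
        if kind == "write":
            memory[var] = arg
        else:
            regs[arg] = memory[var]
    return regs["r1"], regs["r2"]


outcomes = {run(s) for s in interleavings(P1, P2)}
print(sorted(outcomes))
```

Under this model the pair (r1, r2) = (0, 0) never appears, because some write always comes first in any interleaving. A relaxed model that lets reads bypass earlier writes could produce it.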
Relaxed memory consistency
• Processor consistency
– Writes issued by an individual processor are always observed in program order
– Writes from different processors may be observed out of order
– Before a read is allowed to perform with respect to any other processor, all previous reads by the issuing processor must be performed
– Before a write is allowed to perform with respect to any other processor, all previous reads and writes by the issuing processor must be performed
• Weak consistency
– Uses synchronization operations to enforce the ordering of instructions
– Lock
– Test-and-set atomic operations
• Release consistency
– Acquire (R) and release (W) accesses correspond to lock and unlock
– Before an ordinary read or write access is allowed to perform w.r.t. any other processor, all previous acquire accesses must be performed
– Before a release access is allowed to perform w.r.t. any other processor, all previous read and write (store) accesses must be performed
– Special (acquire and release) accesses must be processor consistent with one another
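The acquire/release pairing can be illustrated at the programming level with an ordinary lock. This is a minimal sketch, assuming Python's `threading.Lock` as a stand-in for acquire and release accesses. It shows the usage pattern only and does not model hardware memory ordering; `run_counter` is a hypothetical helper.

```python
import threading


def run_counter(num_threads: int = 4, iterations: int = 1000) -> int:
    """Increment a shared counter from several threads under one lock."""
    shared = {"count": 0}
    lock = threading.Lock()

    def worker() -> None:
        for _ in range(iterations):
            lock.acquire()            # acquire: the accesses below wait for this
            try:
                shared["count"] += 1  # ordinary read and write of shared data
            finally:
                lock.release()        # release: issued only after the update is done

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return shared["count"]


if __name__ == "__main__":
    print(run_counter())
```

Only the acquire and release points need to be ordered with respect to other processors; the ordinary accesses between them are free to be buffered or pipelined, which is where the latency benefit comes from.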
Principles of multithreading
• Thread
• Multithreading environment
• Context switch
Architecture environment
• Parameters
– Latency
– Number of threads
– Context switching overhead
– Interval between switches.
Multithreaded computations
1. Computation starts with sequential thread
2. Thread scheduling
3. Processors begin computation
4. Inter computer messages
5. Synchronization operations
Problems of asynchrony
Solutions to multithreaded problems
• When a remote load is issued, the processor begins to do the work of another thread
• Messages carry continuations
• Each remote load and its response are tagged with an identifier
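The identifier-and-continuation idea can be sketched as a split-phase load: the request returns at once, and the response is matched back to the waiting thread by its identifier. The `Processor` class, its method names, and the sample addresses are hypothetical and exist only for this illustration.

```python
from itertools import count


class Processor:
    """Toy model of a processor issuing split-phase remote loads."""

    def __init__(self):
        self._ids = count()
        self.pending = {}              # identifier -> continuation to resume

    def remote_load(self, address, continuation):
        """Issue a load without waiting; return the request message."""
        ident = next(self._ids)
        self.pending[ident] = continuation
        return ident, address

    def on_response(self, ident, value):
        """Resume the thread whose identifier matches the response."""
        continuation = self.pending.pop(ident)
        continuation(value)


remote_memory = {0x10: 10, 0x20: 20}   # assumed contents of a remote node
results = []
cpu = Processor()
req_a = cpu.remote_load(0x10, lambda v: results.append(("thread A", v)))
req_b = cpu.remote_load(0x20, lambda v: results.append(("thread B", v)))

# Responses may arrive out of order; here B's answer is delivered first.
for ident, address in (req_b, req_a):
    cpu.on_response(ident, remote_memory[address])

print(results)
```

Because each response carries its identifier, arrival order does not matter: the right continuation is resumed with the right value.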
Distributed caching
• Each node has
– Memory location has owner
– Cache memory
– Directory: import/export, shared (read), exclusive (write)
Multiple context(threads) processors
• Enhanced processor model
• Single threaded, multithreaded: when a remote request arrives
• Idle condition
• Efficiency
• States: ready, running, leaving, blocked and busy state
Context switching policies
• Switch on cache miss
• Switch on every load
• Switch on every instruction
• Switch on block of instructions
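Processor efficiencies
• R: interval between switches (run length between remote references)
• L: latency of a remote reference
• C: context switching overhead
• N: number of threads (contexts)
• Efficiency with a single thread = R/(R + L)
• Saturation region
• Linear region
• Efficiency in the saturation region = R/(R + C)
• Efficiency in the linear region = N·R/(R + C + L)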
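The efficiency formulas can be evaluated numerically with the short sketch below. It reads R as the run length between switches, L as the remote latency, C as the context switching overhead and N as the number of contexts, following the parameter list given earlier. Taking the minimum of the linear and saturated expressions, and the `saturation_point` helper, are assumptions of this sketch; the sample numbers are made up.

```python
def efficiency(n: int, run_length: float, latency: float, switch_cost: float) -> float:
    """Processor efficiency for n contexts under a simple deterministic model."""
    if n < 1:
        raise ValueError("need at least one context")
    if n == 1:
        # Single thread: the processor idles for the whole latency.
        return run_length / (run_length + latency)
    linear = n * run_length / (run_length + switch_cost + latency)
    saturated = run_length / (run_length + switch_cost)
    # Efficiency grows linearly with n until it hits the saturation ceiling.
    return min(linear, saturated)


def saturation_point(run_length: float, latency: float, switch_cost: float) -> float:
    """Number of contexts at which the linear and saturated formulas meet."""
    return (run_length + switch_cost + latency) / (run_length + switch_cost)


if __name__ == "__main__":
    R, L, C = 10, 80, 10   # illustrative cycle counts only
    for n in (1, 3, 5, 8):
        print(n, round(efficiency(n, R, L, C), 3))
    print("saturates at N =", saturation_point(R, L, C))
```

With these sample values the efficiency rises by 0.1 per added context and levels off at 0.5 once five contexts are available, so adding further contexts beyond that point gains nothing.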
Multidimensional architecture
• Wisconsin multicube
– First level cache-processor cache-SRAM
– Second level cache-snooping cache
• Orthogonal multiprocessor
– n × n memory mesh
– 2n logical buses
Fine grain Multicomputers
• Fine grain instructions
• Communication latency
• Tc + Ts = total time required for IPC
• Grain size is a function of execution time
The MIT J machine
• Machine architecture
– k-ary n-cube architecture
– Maximum network size is 65536 nodes
• ISA: fixed-format, 3-address instructions
– 3 execution levels
• Communication support
– Provides end to end delivery of messages
• Send 4 word message
• SEND R0,0
• SEND R1,R2,0
• SEND R3,[A3],0
Scalable and multithreaded architecture
• Stanford DASH multiprocessor
• Prototype architecture
– 2 × 2 mesh network
– 4 processors per cluster, 16 processors in total across the 2 × 2 mesh
• Memory hierarchy
– Invalidation-based coherence protocol: uncached, shared, dirty
• Directory-based protocol
– Presence bit and state bit
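A directory entry with presence bits and a state field can be sketched as follows. This is a simplified illustration of an invalidation-based directory using the three states named above. The `DirectoryEntry` class and its methods are hypothetical and do not reproduce the actual DASH protocol messages.

```python
from typing import List


class DirectoryEntry:
    """Directory record for one memory block (simplified model)."""

    def __init__(self, num_clusters: int):
        self.state = "uncached"                 # uncached, shared or dirty
        self.presence = [False] * num_clusters  # one presence bit per cluster

    def read(self, cluster: int) -> None:
        """Record a read: the reader gets a shared copy."""
        # If the block was dirty, the previous owner keeps a copy that is
        # now shared (the write-back itself is not modeled here).
        self.state = "shared"
        self.presence[cluster] = True

    def write(self, cluster: int) -> List[int]:
        """Record a write: return the clusters whose copies are invalidated."""
        invalidated = [i for i, bit in enumerate(self.presence)
                       if bit and i != cluster]
        self.presence = [i == cluster for i in range(len(self.presence))]
        self.state = "dirty"
        return invalidated


if __name__ == "__main__":
    entry = DirectoryEntry(num_clusters=4)
    entry.read(0)
    entry.read(2)
    print(entry.state, entry.presence)
    print("invalidate:", entry.write(1))
    print(entry.state, entry.presence)
```

The presence bits tell the home node exactly which clusters hold a copy, so invalidations go only to those clusters instead of being broadcast.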
Stanford DASH multiprocessor
Kendall Square Research KSR-1
• Architecture
– Ring architecture
• Memory structure
– No hierarchy of memory
– ALLCACHE
• Programming model
– Sequential consistency model
– Dynamic memory management
Dataflow and hybrid architectures
• Dataflow architecture: MIT tagged-token architecture
• Hybrid = von Neumann + dataflow architecture
• Static dataflow computers
• Dynamic dataflow computers
ETL/EM-4 machine
• Architecture
• Omega network
• Crossbar network within each node
MIT prototype
• Architecture
• Single address space system
• Cube network for interconnection
• 16 X 16 switches
