Chapter 9

This document discusses scalable multithreaded and dataflow architectures. It covers topics like latency hiding techniques, shared virtual memory, page swapping, prefetching, distributed coherent caches, relaxed memory consistency models, multiple context support, and dataflow architectures. It provides examples of scalable architectures like the Stanford DASH multiprocessor, KSR-1, ETL-4 machine, and MIT dataflow prototype.
Chapter 9: Scalable Multithreaded and Dataflow Architectures
by Prajwala T R
Dept. of CSE
PESIT
• Latency hiding techniques
– Prefetching techniques
– Coherent caches
– Relaxed memory models
– Multiple context support
Shared virtual memory
• Architecture environment
– Large-scale cache-coherent NUMA system
– Physical memory is distributed among clusters
– Cache coherence mechanism
– Multilevel caches
– Write buffers
SVM concept
Page swapping
• Access rights
• Page fault: the memory manager brings the missing page into a local page frame
• Message passing environment
• Improves scalability
A. Prefetching techniques
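• Binding prefetch: the value is bound at prefetch time (for example, loaded into a register), so a later remote write can leave it stale
• Nonbinding prefetch: the data is brought into the cache and stays visible to the coherence protocol until it is actually read
• Software-controlled: explicit prefetch instructions fetch remote data ahead of use
• Hardware-controlled: the cache hardware issues the prefetches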
• Benefits
– Reduce latency
– Reduce network traffic
B. Distributed coherent caches
• Snoopy cache coherence protocols
• Hardware-coherent caches, as in DASH
• Benefits
– Reduction in the number of cycles wasted on read misses
Scalable Coherent Interface (SCI)
• Point-to-point connections
• Interface between nodes and the external interconnect
• Backplane bus
• Interconnection network is a ring or a crossbar
• A converter bridges SCI and the bus
Sharing list structures
Sharing list creation
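• Memory state: cached or uncached
• Header node
• Steps
– Initially memory is in the home (uncached) state and all cache copies are invalid
– Sharing list creation: the requesting cache entry goes from invalid to pending
– Read issued: memory goes from uncached to cached and the requester becomes the head of the list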
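The sharing list steps can be sketched as a small doubly linked list in Python. This is a simplified illustration, not the SCI specification: the class names, the `read_miss` helper, and the reduced state set ("invalid", "pending", "head", "mid", "tail") are assumptions made for the example. It assumes that each new reader is prepended to the list and becomes the new head.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class CacheEntry:
    """One cache's copy of a line (hypothetical, simplified model)."""
    node_id: int
    state: str = "invalid"                     # invalid -> pending -> head
    forward: Optional["CacheEntry"] = None     # next sharer, toward the tail
    backward: Optional["CacheEntry"] = None    # previous sharer, toward the head


@dataclass
class MemoryLine:
    """Home memory's view of the line: its state plus a pointer to the list head."""
    state: str = "uncached"
    head: Optional[CacheEntry] = None


def read_miss(memory: MemoryLine, entry: CacheEntry) -> None:
    """Add `entry` to the sharing list as the new head."""
    entry.state = "pending"                    # request is in flight
    old_head = memory.head
    memory.state = "cached"                    # memory now has at least one sharer
    memory.head = entry
    if old_head is not None:                   # link in front of the previous head
        entry.forward = old_head
        old_head.backward = entry
        old_head.state = "mid" if old_head.forward else "tail"
    entry.state = "head"


def sharers(memory: MemoryLine) -> List[int]:
    """Walk the list from head to tail and collect node ids."""
    ids = []
    entry = memory.head
    while entry is not None:
        ids.append(entry.node_id)
        entry = entry.forward
    return ids
```

Because memory only stores a pointer to the head, the directory cost per line stays constant however many caches share it; the list itself is distributed across the sharing caches.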
Strong consistency model
• Sequential consistency model
– The result of any execution is the same as if the operations of all processors were interleaved in some sequential order on a multithreaded sequential machine, with each processor's operations appearing in program order
Relaxed memory consistency
• Processor consistency
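– Writes issued by one processor are always observed in program order
– Writes from different processors may be observed in different orders
– Before a read is allowed to perform with respect to any other processor, all previous reads must be performed
– Before a write is allowed to perform with respect to any other processor, all previous reads and writes must be performed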
• Weak consistency
– Uses synchronization operations to enforce sequential ordering where the program needs it
– Lock
– Test-and-set atomic operations
• Release consistency
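– Synchronization accesses are split into acquires (a read operation, such as lock) and releases (a write operation, such as unlock)
– Before an ordinary read or write access is allowed to perform with respect to any other processor, all previous acquire accesses must be performed
– Before a release access is allowed to perform with respect to any other processor, all previous ordinary read and write accesses must be performed
– Acquire and release accesses must be processor consistent with respect to one another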
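The acquire/release pairing can be illustrated at the software level with a lock. The sketch below uses Python's `threading.Lock` as a stand-in: `acquire()` plays the role of the acquire access and `release()` the role of the release access. It is only an analogy, since Python does not expose hardware memory ordering; the `worker` and `run` names and the shared `counter` are assumptions made for the example.

```python
import threading

counter = 0                 # shared data, touched only inside the critical section
lock = threading.Lock()


def worker(iterations: int) -> None:
    """Increment the shared counter, bracketing each update with acquire/release."""
    global counter
    for _ in range(iterations):
        lock.acquire()      # acquire: ordinary accesses below must not start before this
        try:
            counter += 1    # ordinary read and write of shared data
        finally:
            lock.release()  # release: the update above is complete before the lock is freed


def run(num_threads: int = 4, iterations: int = 1000) -> int:
    """Start the workers, wait for them, and return the final counter value."""
    global counter
    counter = 0
    threads = [threading.Thread(target=worker, args=(iterations,))
               for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter
```

Ordinary accesses inside the bracket may be reordered or buffered relative to one another; only the acquire and release points constrain the order, which is what gives release consistency its latency-hiding freedom.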
Principles of multithreading
• Thread
• Multithreading environment
• Context switch
Architecture environment
• Parameters
– Latency
– Number of threads
– Context switching overhead
– Interval between switches.
Multithreaded computations
1. Computation starts with sequential thread
2. Thread scheduling
3. Processors begin computation
4. Inter computer messages
5. Synchronization operations
Problems of asynchrony
Solutions to multithreaded problems
• When a remote load is issued, the processor switches to the work of another thread
• Messages carry continuations
• Each remote load and its response carry an identifier
Distributed caching
• Every memory location has an owner node
• Each node has
– Cache memory
– Directory: import/export lists, with each entry marked shared (read) or exclusive (write)
Multiple context(threads) processors
• Enhanced processor model
• Single-threaded versus multithreaded behavior when a remote request arrives
• Idle condition
• Efficiency
• States: ready, running, leaving, blocked, and busy
Context switching policies
• Switch on cache miss
• Switch on every load
• Switch on every instruction
• Switch on block of instructions
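Processor efficiencies
• R: run length, the interval between switches (cycles between remote references)
• L: latency of a remote reference
• C: context switching overhead
• N: number of threads (contexts)
• Single-thread efficiency = R / (R + L)
• Saturation region: enough threads to hide the latency completely
• Linear region: efficiency grows in proportion to N
• Efficiency in the saturation region = R / (R + C)
• Efficiency in the linear region = N · R / (R + C + L)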
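The efficiency formulas can be checked numerically with a few lines of Python. In this sketch R is the run length between switches, L the remote latency, C the context switching overhead, and N the number of threads, all in cycles. The function names are made up for the example, and picking the region with `min()` is an assumption: it simply caps the linear growth at the saturation value.

```python
def single_thread_efficiency(R: float, L: float) -> float:
    """One thread: the processor is busy for R cycles, then idle for L."""
    return R / (R + L)


def saturated_efficiency(R: float, C: float) -> float:
    """Latency fully hidden: only the switch overhead C is lost per run."""
    return R / (R + C)


def linear_efficiency(N: int, R: float, C: float, L: float) -> float:
    """Too few threads to hide L: useful work grows in proportion to N."""
    return N * R / (R + C + L)


def multithreaded_efficiency(N: int, R: float, C: float, L: float) -> float:
    """Linear growth capped by the saturation value (assumed region rule)."""
    return min(linear_efficiency(N, R, C, L), saturated_efficiency(R, C))


if __name__ == "__main__":
    R, C, L = 16, 2, 128
    for n in (1, 2, 4, 8, 16):
        print(n, round(multithreaded_efficiency(n, R, C, L), 3))
```

With R = 16, C = 2 and L = 128, efficiency rises linearly up to about eight threads and then flattens at R / (R + C), so adding more contexts beyond that point buys nothing.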
Multidimensional architecture
• Wisconsin multicube
– First level cache-processor cache-SRAM
– Second level cache-snooping cache
• Orthogonal multiprocessor
– n × n memory mesh
– 2n logical buses
Fine grain Multicomputers
• Fine grain instructions
• Communication latency
• Tc + Ts = total time required for IPC
• Grain size ts is function of execution time
The MIT J machine
• Machine architecture
– k-ary n-cube architecture
– 65536 nodes is maximum network size
• ISA fixed format 3 address instruction
– 3 execution levels
• Communication support
– Provides end to end delivery of messages
• Send 4 word message
• SEND R0,0
• SEND R1,R2,0
• SEND R3,[A3],0
Scalable and multithreaded architecture
• Stanford DASH multiprocessor
• Prototype architecture
– 2 × 2 mesh network of clusters
– 4 processors in each cluster (16 processors in total)
• Memory hierarchy
– Invalidation-based coherence protocol; block states: uncached, shared, dirty
• Directory-based protocol
– Presence bits and a state bit in each directory entry
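A directory entry of this kind can be sketched in Python as one presence flag per cluster plus a block state. This is a simplified model under stated assumptions, not the DASH hardware protocol: the `DirectoryEntry` class, its `read` and `write` methods, and their return values are invented for illustration, and message traffic is reduced to return values.

```python
from typing import List, Optional


class DirectoryEntry:
    """Directory state for one memory block (hypothetical, simplified)."""

    def __init__(self, num_clusters: int) -> None:
        self.presence = [False] * num_clusters   # one presence flag per cluster
        self.state = "uncached"                  # uncached, shared, or dirty

    def sharers(self) -> List[int]:
        """Clusters whose presence flag is set."""
        return [i for i, present in enumerate(self.presence) if present]

    def read(self, cluster: int) -> Optional[int]:
        """Record a read. Returns the cluster that must supply dirty data, if any."""
        owner = None
        if self.state == "dirty":
            (current,) = self.sharers()          # a dirty block has exactly one holder
            if current == cluster:
                return None                      # owner rereads its own copy
            owner = current                      # owner supplies data and keeps a copy
        self.presence[cluster] = True
        self.state = "shared"
        return owner

    def write(self, cluster: int) -> List[int]:
        """Record a write. Returns the clusters whose copies are invalidated."""
        invalidated = [c for c in self.sharers() if c != cluster]
        self.presence = [False] * len(self.presence)
        self.presence[cluster] = True
        self.state = "dirty"
        return invalidated
```

The presence flags are what make the protocol scale: invalidations go only to the clusters that actually hold a copy instead of being broadcast to every cache.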
Stanford DASH multiprocessor
Kendall Square Research KSR-1
• Architecture
– Ring architecture
• Memory structure
– No hierarchy of memory
– ALLCACHE
• Programming model
– Sequential consistency model
– Dynamic memory management
Dataflow and hybrid architectures
• Dataflow architecture: MIT tagged-token architecture
• Hybrid = von Neumann + dataflow architecture
• Static dataflow computers
• Dynamic dataflow computers
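The data-driven firing rule behind these machines can be shown with a toy interpreter in Python: a node fires as soon as all of its operand tokens are available, with no program counter involved. This is a minimal sketch of the general idea only. It does not model the MIT tagged-token mechanism (there are no tags or iteration contexts), and the `run_dataflow` function and its graph encoding are assumptions made for the example.

```python
import operator
from typing import Callable, Dict, List, Tuple

Graph = Dict[str, Tuple[Callable[..., float], List[str]]]


def run_dataflow(nodes: Graph, inputs: Dict[str, float]):
    """Fire every node whose operands are available until none remain.

    nodes maps a node name to (function, operand names); inputs supplies
    the initial tokens. Returns all tokens and the order in which nodes fired.
    """
    tokens = dict(inputs)          # values produced so far
    pending = dict(nodes)          # nodes that have not fired yet
    fired_order = []
    while pending:
        ready = [name for name, (_, operands) in pending.items()
                 if all(op in tokens for op in operands)]
        if not ready:
            raise ValueError("no node has all of its operands available")
        for name in ready:         # these nodes could fire in parallel
            fn, operands = pending.pop(name)
            tokens[name] = fn(*(tokens[op] for op in operands))
            fired_order.append(name)
    return tokens, fired_order


# Graph for (a + b) * (a - b); listing order does not dictate execution order.
graph: Graph = {
    "prod": (operator.mul, ["sum", "diff"]),
    "sum": (operator.add, ["a", "b"]),
    "diff": (operator.sub, ["a", "b"]),
}
```

Calling `run_dataflow(graph, {"a": 5, "b": 3})` fires `sum` and `diff` first, since both have their operands, and `prod` only afterwards, even though `prod` is listed first in the graph.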
ETL 4 machine
• Architecture
• Omega network
• Crossbar network within each node
MIT prototype
• Architecture
• Single address space system
• Cube network for interconnection
• 16 × 16 switches
