
DISTRIBUTED SYSTEMS [COMP9243]
Lecture 3b: Distributed Shared Memory

➀ DSM
➁ Case study
➂ Design issues
➃ Implementation issues

DISTRIBUTED SHARED MEMORY (DSM)

DSM: shared memory + multicomputer

[Figure: a shared global address space of pages 0 to 16, with the pages distributed across the local memories of CPU 1 to CPU 4]

SHARED ADDRESS SPACE

DSM consists of two components:
➀ Shared address space
➁ Replication and consistency of memory objects

Shared address space:

[Figure: Node 1 and Node 2, connected by a network, each map the shared addresses 0x1000 and 0x2000 in their own address spaces]

➜ Shared addresses are valid in all processes

Transparent remote access:

[Figure: Node 1 and Node 2 each hold some of the shared pages; an access to a page held by the other node is served transparently across the network]

Properties:
➜ Remote access is expensive compared to local memory access
➜ Individual operations can have very low overhead
➜ Threads cannot distinguish between local and remote access



Why DSM?:
➜ Shared memory model: easiest to program to
➜ Physical shared memory not possible on a multicomputer
➜ DSM emulates shared memory

Benefits of DSM:
➜ Ease of programming (shared memory model)
➜ Eases porting of existing code
➜ Pointer handling
  • Shared pointers refer to shared memory
  • Share complex data (lists, etc.)
➜ No marshalling

APPLICATIONS OF DSM

➜ Scientific parallel computing
  • Bioinformatics (gene sequence analysis)
  • Simulations (climate modeling, economic modeling)
  • Data processing (physics, astronomy)
➜ Graphics (image processing, rendering)
➜ Data server (distributed FS, Web server)
➜ Data storage

REQUIREMENTS OF DSM

Transparency:
➜ Location, migration, replication, concurrency

Reliability:
➜ Computations depend on availability of data

Performance:
➜ Important in high-performance computing
➜ Important for transparency

Scalability:
➜ Important in wide-area settings
➜ Important for large computations

Consistency:
➜ Access to DSM should be consistent
➜ According to a consistency model

Programmability:
➜ Easy to program
➜ Communication transparency



DSM ENVIRONMENTS

➜ Multiprocessor
  • NUMA
➜ Multicomputer
  • Supercomputer
  • Cluster
  • Network of Workstations
  • Wide-area

DSM IMPLEMENTATIONS

Hardware:
➜ Multiprocessor
➜ Examples: MIT Alewife, DASH

OS with hardware support:
➜ SCI network cards (SCI = Scalable Coherent Interconnect)
➜ SCI maps an extended physical address space to remote nodes
➜ The OS maps the shared virtual address space to the SCI range

OS and virtual memory:
➜ Virtual memory (page faults, paging)
➜ Local address space vs large address space

Middleware:
➜ Library
  • Library routines to create/access shared memory
  • Examples: MPI-2, CRL
➜ Language
  • Shared memory encapsulated in language constructs
  • Extend language with annotations
  • Examples: Orca, Linda, JavaSpaces, JavaParty, Jackal

Typical implementation:
➜ Provided by some research OSes (e.g., Mach and Chorus)
➜ Most often implemented in user space (e.g., TreadMarks, CVM)
➜ User space: what is needed from the kernel? (see the sketch below)
  • User-level fault handler [e.g., Unix signals]
  • User-level VM page mapping and protection [e.g., mmap() and mprotect()]
  • Message-passing layer [e.g., socket API]
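As a rough illustration of those three kernel facilities, the sketch below (plain C on a Unix-like system) reserves a region with mmap(), traps the first access to each page with a SIGSEGV handler, and leaves a stub where a real DSM would fetch the page over its message-passing layer. The function fetch_page_from_manager() is a hypothetical placeholder, not part of any real DSM API.

    /* Minimal sketch of the user-space mechanics of a page-based DSM:
     * mmap() reserves the shared region with no access rights, so the
     * first touch of each page raises SIGSEGV; the handler stands in
     * for the DSM fault handler.  fetch_page_from_manager() is a
     * hypothetical stub: a real DSM would copy the page contents from
     * a remote node over its message-passing layer (e.g., a socket). */
    #include <signal.h>
    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>
    #include <sys/mman.h>
    #include <unistd.h>

    static size_t page_size;

    static void fetch_page_from_manager(void *page) {
        /* Placeholder: request the page from its manager node here. */
        memset(page, 0, page_size);
    }

    static void fault_handler(int sig, siginfo_t *si, void *ctx) {
        (void)sig; (void)ctx;
        /* Round the faulting address down to its page boundary. */
        char *page = (char *)((uintptr_t)si->si_addr & ~(page_size - 1));
        /* Make the page accessible, then install its current contents;
         * the faulting instruction restarts when the handler returns. */
        mprotect(page, page_size, PROT_READ | PROT_WRITE);
        fetch_page_from_manager(page);
    }

    int main(void) {
        page_size = (size_t)sysconf(_SC_PAGESIZE);

        struct sigaction sa;
        memset(&sa, 0, sizeof sa);
        sa.sa_sigaction = fault_handler;
        sa.sa_flags = SA_SIGINFO;
        sigaction(SIGSEGV, &sa, NULL);

        /* Reserve four pages of "shared" memory with no access rights. */
        char *region = mmap(NULL, 4 * page_size, PROT_NONE,
                            MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

        region[0] = 42;   /* write fault: handler runs, then we resume */
        printf("region[0] = %d\n", region[0]);
        return 0;
    }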



Example: two processes sharing memory pages:

[Figure: Node 1 and Node 2 each map the shared region at 0x1000; the underlying pages are spread across the two nodes, connected by a network]

Occurrence of a read fault:

[Figure: a process on Node 1 reads an address whose page resides only on Node 2; the access faults]

Page migration and replication:

[Figure: the faulted page is moved (migrated) or copied (replicated) from Node 2 to Node 1 across the network]

Recovery from read fault:

[Figure: with a local copy of the page installed, the faulting process on Node 1 resumes]



DSM MODELS

Shared page (coarse-grained):
➜ Traditional model
➜ Ideal page size?
✗ False sharing
➜ Examples: Ivy, TreadMarks

Shared region (fine-grained):
➜ More fine-grained than sharing pages
✓ Prevents false sharing
✗ Not regular memory access (transparency)
➜ Examples: CRL (C Region Library), MPI-2 one-sided communication, Shasta

Shared variable:
➜ Release and entry based consistency
➜ Annotations
✓ Fine-grained
✗ More complex for the programmer
➜ Examples: Munin, Midway

Shared structure:
➜ Encapsulate shared data
➜ Access only through predefined procedures (e.g., methods)
✓ Tightly integrated synchronisation
✓ Encapsulates (hides) the consistency model
✗ Lose the familiar shared memory model
➜ Examples: Orca (shared object), Linda (tuple space)

Tuple space:

[Figure: processes A and B write tuples, inserting copies of A and B into the tuple space; process T reads with a template, which looks for a matching tuple, returns C, and optionally removes it. A JavaSpace is an example of a tuple space.]

LINDA EXAMPLE

    main() {
        ...
        eval("function", f());
        eval("function", f());
        ...
        for (i = 0; i < 100; i++)
            out("data", i);
        ...
    }

    f() {
        ...
        in("data", ?x);
        y = g(x);
        out("function", x, y);
        ...
    }
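For readers unfamiliar with Linda, the semantics of the three primitives used above are: out() deposits a tuple into the tuple space; in() blocks until a tuple matching its template exists, withdraws it, and binds formal parameters such as ?x to the matching fields; eval() deposits an active tuple whose fields are computed by newly spawned processes. So the two eval() calls spawn two workers executing f(), main() emits one hundred "data" tuples, and each worker (presumably looping in the elided parts) withdraws a "data" tuple, computes g(x), and deposits the result as a "function" tuple.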



CASE STUDY

TreadMarks:
➜ 1992, Rice University
➜ Page-based DSM library
➜ C, C++, Java, Fortran
➜ Lazy release consistency model
➜ Heterogeneous environment

DESIGN ISSUES

➜ Structure and granularity
➜ Replication
➜ Synchronisation primitives
➜ Heterogeneity
➜ Scalability
➜ Fault tolerance
➜ Security

How TreadMarks addresses them:

Granularity:
➜ Page based; page size: minimum system page size

Replication:
➜ Lazy release consistency

Scalability:
➜ Meant for a cluster or NOW

Synchronisation primitives:
➜ Locks (acquire and release), barriers

Heterogeneity:
➜ Limited (doesn't address endianness or mismatched word sizes)

Fault tolerance:
➜ Research

No security

USING DSM

Initialisation:
➜ Compiling
➜ Starting the program
➜ Starting services
➜ Specifying nodes
➜ Finding nodes

Shared memory:
➜ Allocating
➜ Accessing
➜ Freeing

Synchronisation:
➜ Locks
➜ Barriers

Processes:
➜ Start
➜ Stop
➜ Process IDs



USING TREADMARKS

Compiling:
➜ Compile
➜ Link with the TreadMarks libraries

Starting a TreadMarks application:

    app -- -h host1 -h host2 -h host3 -h host4

TREADMARKS APPLICATION EXAMPLE

full code example

Anatomy of a TreadMarks program (a complete sketch follows below):

➜ Starting remote processes

    Tmk_startup(argc, argv);

➜ Allocating and sharing memory

    shared = (struct shared*) Tmk_malloc(sizeof(shared));
    Tmk_distribute(&shared, sizeof(shared));

➜ Barriers

    Tmk_barrier(0);

➜ Acquire/Release

    Tmk_lock_acquire(0);
    shared->sum += mySum;
    Tmk_lock_release(0);
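The deck links to a full code example; as a stand-in, here is a hedged sketch of a complete summation program assembled from the four fragments above. The Tmk_proc_id and Tmk_nprocs globals, the Tmk_exit() call, and the "Tmk.h" header name follow the published TreadMarks API but should be treated as assumptions here.

    /* Hedged sketch of a complete TreadMarks-style program: each process
     * sums a slice of a shared array, then adds its partial sum under a
     * lock.  Tmk_proc_id / Tmk_nprocs (process rank and count), Tmk_exit()
     * and the "Tmk.h" header are assumptions based on the TreadMarks API. */
    #include <stdio.h>
    #include "Tmk.h"

    #define N 1024

    struct shared { int sum; int data[N]; };
    struct shared *shared;

    int main(int argc, char **argv) {
        int i, start, chunk, mySum = 0;

        Tmk_startup(argc, argv);          /* start the remote processes */

        if (Tmk_proc_id == 0) {
            /* Process 0 allocates the shared block and distributes the
             * pointer so that all processes address the same memory. */
            shared = (struct shared *) Tmk_malloc(sizeof(*shared));
            Tmk_distribute(&shared, sizeof(shared));
            shared->sum = 0;
            for (i = 0; i < N; i++)
                shared->data[i] = i;
        }
        Tmk_barrier(0);                   /* wait until the data is ready */

        /* Each process sums its own contiguous slice of the array. */
        chunk = N / Tmk_nprocs;
        start = Tmk_proc_id * chunk;
        for (i = start; i < start + chunk; i++)
            mySum += shared->data[i];

        Tmk_lock_acquire(0);              /* serialise updates to the sum */
        shared->sum += mySum;
        Tmk_lock_release(0);

        Tmk_barrier(1);                   /* wait for all partial sums */
        if (Tmk_proc_id == 0)
            printf("sum = %d\n", shared->sum);

        Tmk_exit(0);
        return 0;
    }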
IMPLEMENTATION ISSUES

➜ Consistency protocol
➜ Update propagation
➜ Communication
➜ Data location
➜ Modification detection
➜ Memory management
➜ Thrashing
➜ Heterogeneity
➜ Implementation of synchronisation primitives
➜ Implementation of fault tolerance
➜ Implementation of security



TREADMARKS IMPLEMENTATION

Consistency protocol:
➜ Multiple writer
➜ Twins
➜ Reduce false sharing

[Figure: the twinning protocol on a shared page x]
1. Write causes page fault: the page is mapped read-only (R)
2. After page fault: the page is remapped read-write (RW) and a twin (a copy of the page) is created
3. Write is executed: the working copy now differs from the twin
4. At release or barrier: the page is compared with its twin to produce a diff of the modified words
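To make the twin/diff mechanism concrete, here is a rough sketch (plain C, not TreadMarks' actual code) of the three operations the diagram implies: twinning a page on the first write fault, diffing it against the twin at release (or on request, for lazy diffs), and applying a received diff to another copy of the page.

    /* Rough sketch of the twin/diff idea: on the first write fault,
     * copy the page (the "twin"); later, compare the page with its
     * twin word by word and record only the changes. */
    #include <stdlib.h>
    #include <string.h>

    #define PAGE_WORDS 1024

    typedef struct { unsigned offset; unsigned value; } DiffEntry;

    /* First write fault: save a pristine copy of the page. */
    unsigned *make_twin(const unsigned *page) {
        unsigned *twin = malloc(PAGE_WORDS * sizeof *twin);
        memcpy(twin, page, PAGE_WORDS * sizeof *twin);
        return twin;
    }

    /* At release, or only when requested (lazy diffs): encode every
     * word that differs from the twin.  The diff is typically much
     * smaller than the page, which keeps update propagation cheap. */
    size_t make_diff(const unsigned *page, const unsigned *twin,
                     DiffEntry *out) {
        size_t n = 0;
        for (unsigned i = 0; i < PAGE_WORDS; i++)
            if (page[i] != twin[i])
                out[n++] = (DiffEntry){ i, page[i] };
        return n;
    }

    /* A remote node applies the diff to its own copy of the page. */
    void apply_diff(unsigned *page, const DiffEntry *diff, size_t n) {
        for (size_t i = 0; i < n; i++)
            page[diff[i].offset] = diff[i].value;
    }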

Update propagation:
➜ Modified pages are invalidated at acquire
➜ A page is updated at access time
➜ Updates are transferred as diffs

Lazy diffs:
➜ Normally, diffs are made at release time
➜ Lazy: make diffs only when they are requested

Communication:
➜ UDP/IP or AAL3/4 (ATM)
➜ Light-weight, user-level protocols to ensure message delivery
➜ SIGIO is used for message receive notification

Data location:
➜ Know who has diffs because of invalidations
➜ Each page has a statically assigned manager

Modification detection:
➜ Page fault
➜ If the page is read-only, run the consistency protocol
➜ If the page is not in local memory, get it from the manager

Memory management:
➜ Garbage collection of diffs

Initialisation:
➜ Processes set up communication channels between themselves
➜ Register a SIGIO handler for communication
➜ Allocate a large block of memory
  • Same (virtual) address on each machine
  • Mark it as non-accessible
  • Assign a manager process for each page, lock, and barrier (round robin)
➜ Register a SEGV handler



READING LIST

Distributed Shared Memory: A Survey of Issues and Algorithms
  An overview of DSM and key issues, as well as older DSM implementations.

TreadMarks: Shared Memory Computing on Networks of Workstations
  An overview of TreadMarks, its design decisions, and its implementation.

