DISTRIBUTED SYSTEMS [COMP9243]

Lecture 3b: Distributed Shared Memory

Slide 1
➀ DSM
➁ Case study
➂ Design issues
➃ Implementation issues

DISTRIBUTED SHARED MEMORY (DSM)

Slide 2
DSM: shared memory + multicomputer

[Figure: a shared global address space of pages 0 to 16, distributed across the local memories of CPU 1 to CPU 4]

SHARED ADDRESS SPACE

Slide 3
DSM consists of two components:
➀ Shared address space
➁ Replication and consistency of memory objects

Shared address space:

[Figure: Node 1 and Node 2, connected by a network, each map the shared addresses 0x1000 and 0x2000 in their local address spaces]

➜ Shared addresses are valid in all processes

Slide 4
Transparent remote access:

[Figure: an access to 0x1000 on Node 2 is resolved, over the network, to the memory backing 0x1000 on Node 1]

Properties:
➜ Remote access is expensive compared to local memory access
➜ Individual operations can have very low overhead
➜ Threads can distinguish between local and remote access
Slide 5
Why DSM?:
➜ Shared memory model: easiest to program to
➜ Physical shared memory not possible on a multicomputer
➜ DSM emulates shared memory

Benefits of DSM:
➜ Ease of programming (shared memory model)
➜ Eases porting of existing code
➜ Pointer handling
  • Shared pointers refer to shared memory
  • Share complex data (lists, etc.)
➜ No marshalling (see the sketch below)
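Why raw pointers survive under DSM: because the shared region sits at the same virtual address on every node (as the implementation slides later show), a pointer stored in shared memory is meaningful everywhere, so linked structures need no marshalling. A minimal sketch; dsm_malloc is a hypothetical allocator, not part of any real DSM API:

/* Sketch: sharing a linked list through DSM without marshalling.
 * 'dsm_malloc' is assumed to return memory inside the shared region,
 * which is mapped at the same virtual address on all nodes, so the
 * 'next' pointers below are valid on every node. */
struct node {
    int          value;
    struct node *next;      /* raw pointer, usable on every node */
};

extern void *dsm_malloc(unsigned long size);  /* assumed DSM allocator */

struct node *prepend(struct node *head, int value)
{
    struct node *n = dsm_malloc(sizeof *n);
    n->value = value;
    n->next  = head;        /* a plain store: no serialisation step */
    return n;
}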
APPLICATIONS OF DSM

Slide 6
➜ Scientific parallel computing
  • Bioinformatics (gene sequence analysis)
  • Simulations (climate modeling, economic modeling)
  • Data processing (physics, astronomy)
➜ Graphics (image processing, rendering)
➜ Data server (distributed FS, Web server)
➜ Data storage

REQUIREMENTS OF DSM

Slide 7
Transparency:
➜ Location, migration, replication, concurrency

Reliability:
➜ Computations depend on availability of data

Performance:
➜ Important in high-performance computing
➜ Important for transparency

Scalability:
➜ Important in wide-area settings
➜ Important for large computations

Slide 8
Consistency:
➜ Access to DSM should be consistent
➜ According to a consistency model

Programmability:
➜ Easy to program
➜ Communication transparency
DSM ENVIRONMENTS

Slide 9
➜ Multiprocessor
  • NUMA
➜ Multicomputer
  • Supercomputer
  • Cluster
  • Network of Workstations
  • Wide-area

DSM IMPLEMENTATIONS

Slide 10
Hardware:
➜ Multiprocessor
➜ Examples: MIT Alewife, DASH

OS with hardware support:
➜ SCI network cards (SCI = Scalable Coherent Interface)
➜ SCI maps an extended physical address space to remote nodes
➜ OS maps the shared virtual address space to the SCI range

OS and Virtual Memory:
➜ Virtual memory (page faults, paging)
➜ Local address space vs large address space

Slide 11
Middleware:
➜ Library
  • Library routines to create/access shared memory
  • Examples: MPI-2, CRL
➜ Language
  • Shared memory encapsulated in language constructs
  • Extend language with annotations
  • Examples: Orca, Linda, JavaSpaces, JavaParty, Jackal

Slide 12
Typical Implementation:
➜ Provided by some research OSes (e.g., Mach and Chorus)
➜ Most often implemented in user space (e.g., TreadMarks, CVM)
➜ User space: what's needed from the kernel? (see the sketch below)
  • User-level fault handler [e.g., Unix signals]
  • User-level VM page mapping and protection [e.g., mmap() and mprotect()]
  • Message passing layer [e.g., socket API]
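The first two kernel facilities are enough to trap accesses to DSM pages entirely in user space. A minimal sketch (POSIX/Linux assumptions; not taken from any DSM system): a page is mapped with no access rights, and a user-level SIGSEGV handler runs when it is touched. A real DSM would fetch the page from a remote node in the handler; this sketch just grants access so the faulting instruction can be restarted. The message-passing layer is left out.

/* Minimal sketch of user-space fault handling for DSM (POSIX/Linux). */
#include <signal.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

static long  page_size;
static char *shared_page;    /* stand-in for one page of the DSM region */

static void fault_handler(int sig, siginfo_t *info, void *ctx)
{
    (void)sig; (void)ctx;
    /* Round the faulting address down to its page boundary. */
    char *page = (char *)((uintptr_t)info->si_addr
                          & ~((uintptr_t)page_size - 1));
    /* A real DSM would: ask the page's manager for the data, copy it
     * in, then set whatever protection the consistency protocol needs. */
    if (mprotect(page, page_size, PROT_READ | PROT_WRITE) != 0)
        _exit(1);
}

int main(void)
{
    page_size = sysconf(_SC_PAGESIZE);

    /* Mapped but inaccessible: every first access traps. */
    shared_page = mmap(NULL, page_size, PROT_NONE,
                       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (shared_page == MAP_FAILED)
        return 1;

    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = fault_handler;
    sa.sa_flags = SA_SIGINFO;
    sigaction(SIGSEGV, &sa, NULL);

    shared_page[0] = 42;     /* write faults, handler grants access */
    printf("after fault: %d\n", shared_page[0]);
    return 0;
}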
Slide 13
Example: two processes sharing memory pages:

[Figure: Node 1 and Node 2 each map shared memory at 0x1000 across the network; each node holds some of the pages locally]

Slide 14
Occurrence of a read fault:

[Figure: a process on Node 2 reads a page at 0x1000 that is resident only on Node 1; the access traps ("Fault!")]

Slide 15
Page migration and replication:

[Figure: the faulted page is transferred over the network to Node 2, either migrating it or creating a replica]

Slide 16
Recovery from read fault:

[Figure: Node 2 maps the newly arrived copy at 0x1000 and the faulting process resumes ("Resume")]
DSM MODELS

Slide 17
Shared page (coarse-grained):
➜ Traditional model
➜ Ideal page size?
✘ False sharing
➜ Examples: Ivy, TreadMarks

Shared region (fine-grained):
➜ More fine grained than sharing pages
✔ Prevents false sharing
✘ Not regular memory access (transparency)
➜ Examples: CRL (C Region Library), MPI-2 one-sided communication, Shasta

Slide 18
Shared variable:
➜ Release and entry based consistency
➜ Annotations
✔ Fine grained
✘ More complex for the programmer
➜ Examples: Munin, Midway

Shared structure:
➜ Encapsulate shared data
➜ Access only through predefined procedures (e.g., methods)
✔ Tightly integrated synchronisation
✔ Encapsulates (hides) the consistency model
✘ Loses the familiar shared memory model
➜ Examples: Orca (shared object), Linda (tuple space)

Slide 19
Tuple Space:

[Figure: a tuple space (here a JavaSpace) holding tuple instances A, B, C. "Write A" and "Write B" insert copies of tuples A and B; "Read T" looks for a tuple that matches the template T and returns the match, C (and optionally removes it)]

LINDA EXAMPLE

Slide 20
Here out() deposits a tuple, in() withdraws a matching tuple (blocking until one exists; the formal ?x binds the matched field to x), and eval() creates an active tuple whose fields are evaluated by newly spawned processes:

main() {
    ...
    eval("function", f());    /* spawn two worker processes */
    eval("function", f());
    ...
    for (i = 0; i < 100; i++)
        out("data", i);       /* deposit 100 "data" tuples */
    ...
}

f() {
    ...
    in("data", ?x);           /* withdraw one "data" tuple */
    y = g(x);
    out("function", x, y);    /* deposit the result as a tuple */
    ...
}
CASE STUDY

Slide 21
TreadMarks:
➜ 1992, Rice University
➜ Page based DSM library
➜ C, C++, Java, Fortran
➜ Lazy release consistency model
➜ Heterogeneous environment

DESIGN ISSUES

Slide 22
➜ Structure and granularity
➜ Replication
➜ Synchronisation primitives
➜ Heterogeneity
➜ Scalability
➜ Fault tolerance
➜ Security

Slide 23
Design issues in TreadMarks:

Granularity:
➜ Page based; page size: minimum system page size

Replication:
➜ Lazy release consistency

Scalability:
➜ Meant for a cluster or NOW

Synchronisation primitives:
➜ Locks (acquire and release), barriers

Heterogeneity:
➜ Limited (doesn't address endianness or mismatched word sizes)

Fault tolerance:
➜ Research

Security:
➜ No security

USING DSM

Slide 24
Initialisation:
➜ Compiling
➜ Starting program
➜ Starting services
➜ Specifying nodes
➜ Finding nodes

Processes:
➜ Start
➜ Stop
➜ Process IDs

Shared Memory:
➜ Allocating
➜ Accessing
➜ Freeing

Synchronisation:
➜ Locks
➜ Barriers
USING TREADMARKS

Slide 25
Compiling:
➜ Compile
➜ Link with the TreadMarks libraries

Starting a TreadMarks Application:
    app -- -h host1 -h host2 -h host3 -h host4

Anatomy of a TreadMarks Program:
➜ Starting remote processes
    Tmk_startup(argc, argv);
➜ Allocating and sharing memory
    shared = (struct shared*) Tmk_malloc(sizeof(struct shared));
    Tmk_distribute(&shared, sizeof(shared));

Slide 26
➜ Barriers
    Tmk_barrier(0);
➜ Acquire/Release
    Tmk_lock_acquire(0);
    shared->sum += mySum;
    Tmk_lock_release(0);

TREADMARKS APPLICATION EXAMPLE

Slide 27
Full code example:
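The deck points to a full code example here. The following is a reconstruction sketch rather than the original program: the Tmk_startup/Tmk_malloc/Tmk_distribute/Tmk_barrier/Tmk_lock_* calls are taken from the fragments on the previous slides, while the header name Tmk.h, the use of Tmk_proc_id, Tmk_nprocs and Tmk_exit(), and the work distribution are assumptions, not a verbatim copy.

/* Sketch of a complete TreadMarks program: all processes cooperatively
 * sum the integers 0..99 into a shared counter. */
#include <stdio.h>
#include "Tmk.h"                       /* assumed TreadMarks header */

#define N 100

struct shared {
    int sum;
};

struct shared *shared;                 /* same pointer on every process */

int main(int argc, char **argv)
{
    int i, mySum = 0;

    Tmk_startup(argc, argv);           /* start the remote processes */

    if (Tmk_proc_id == 0) {            /* assumed: id of this process */
        shared = (struct shared *) Tmk_malloc(sizeof(struct shared));
        shared->sum = 0;
        Tmk_distribute(&shared, sizeof(shared));  /* share the pointer */
    }
    Tmk_barrier(0);                    /* wait until 'shared' is valid */

    /* Each process sums its own slice (Tmk_nprocs is assumed). */
    for (i = Tmk_proc_id; i < N; i += Tmk_nprocs)
        mySum += i;

    Tmk_lock_acquire(0);               /* protect the shared counter */
    shared->sum += mySum;
    Tmk_lock_release(0);

    Tmk_barrier(1);                    /* wait for all contributions */
    if (Tmk_proc_id == 0)
        printf("sum = %d\n", shared->sum);

    Tmk_exit(0);                       /* assumed shutdown call */
    return 0;
}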
IMPLEMENTATION ISSUES

Slide 28
➜ Consistency protocol
➜ Update propagation
➜ Communication
➜ Data location
➜ Modification detection
➜ Memory management
➜ Thrashing
➜ Heterogeneity
➜ Implementation of synchronisation primitives
➜ Implementation of fault tolerance
➜ Implementation of security
TREADMARKS IMPLEMENTATION

Slide 29
Consistency Protocol:
➜ Multiple writer
➜ Twins (see the sketch below)
➜ Reduce false sharing

[Figure: twinning and diffing on P1. 1. A write (x = 1) to a read-only (R) page holding x(0) causes a page fault. 2. After the fault the page is made writable (RW) and a twin copy of the page is created. 3. The write is executed: the page now holds x(1) while the twin still holds x(0). 4. At release or barrier time the page is compared with its twin to produce a diff (x: 0 to 1)]
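A compact sketch of the twin/diff mechanism in C (illustrative only; make_twin, encode_diff, and apply_diff are hypothetical names, and a real implementation works on raw pages and uses a more compact diff encoding):

#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define PAGE_WORDS 1024          /* a 4 KiB page of 32-bit words */

/* One entry per modified word: offset within page + new value. */
struct diff_entry { uint32_t offset, value; };

/* Called from the write-fault handler: save a copy of the page
 * before the application modifies it. */
uint32_t *make_twin(const uint32_t *page)
{
    uint32_t *twin = malloc(PAGE_WORDS * sizeof *twin);
    if (twin)
        memcpy(twin, page, PAGE_WORDS * sizeof *twin);
    return twin;
}

/* Called at release/barrier time (or lazily, only when requested):
 * compare the dirty page against its twin and record just the changed
 * words.  'out' must have room for PAGE_WORDS entries; returns the
 * number of entries produced. */
size_t encode_diff(const uint32_t *page, const uint32_t *twin,
                   struct diff_entry *out)
{
    size_t n = 0;
    for (uint32_t i = 0; i < PAGE_WORDS; i++)
        if (page[i] != twin[i])
            out[n++] = (struct diff_entry){ i, page[i] };
    return n;
}

/* Applied by another node to bring its copy up to date.  Because a
 * diff touches only the words this writer changed, concurrent writers
 * of different words on the same page do not conflict: this is what
 * makes the multiple-writer protocol reduce false-sharing ping-pong. */
void apply_diff(uint32_t *page, const struct diff_entry *d, size_t n)
{
    for (size_t i = 0; i < n; i++)
        page[d[i].offset] = d[i].value;
}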
Slide 30
Update Propagation:
➜ Modified pages are invalidated at acquire
➜ A page is updated at access time
➜ Updates are transferred as diffs

Lazy Diffs:
➜ Normally, diffs are made at release time
➜ Lazy: make diffs only when they are requested

Communication:
➜ UDP/IP or AAL3/4 (ATM)
➜ Light-weight, user-level protocols to ensure message delivery
➜ SIGIO used for message receive notification

Slide 31
Data Location:
➜ Know who has diffs because of invalidations
➜ Each page has a statically assigned manager

Modification Detection:
➜ Page fault
➜ If the page is read-only, run the consistency protocol (twinning)
➜ If the page is not in local memory, get it from the manager

Memory Management:
➜ Garbage collection of diffs

Slide 32
Initialisation (see the sketch below):
➜ Processes set up communication channels between themselves
➜ Register SIGIO handler for communication
➜ Allocate a large block of memory
  • Same (virtual) address on each machine
  • Mark as non-accessible
  • Assign a manager process for each page, lock, and barrier (round robin)
➜ Register SEGV handler
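A sketch of this initialisation sequence (Linux-specific and illustrative: the fixed base address, sizes, handler bodies, and function names are assumptions, and the socket setup is omitted):

/* Sketch of TreadMarks-style initialisation. */
#define _GNU_SOURCE              /* for MAP_FIXED_NOREPLACE (Linux) */
#include <signal.h>
#include <string.h>
#include <sys/mman.h>

#define DSM_SIZE   (64 * 1024 * 1024)   /* size of the shared region */
#define PAGE_SIZE  4096
#define NPAGES     (DSM_SIZE / PAGE_SIZE)
/* An agreed-upon virtual address, identical on every machine, so that
 * shared pointers are valid everywhere (hypothetical value). */
#define DSM_BASE   ((void *)0x200000000000UL)

static int page_manager[NPAGES];        /* manager node for each page */

static void segv_handler(int sig, siginfo_t *si, void *ctx)
{
    (void)sig; (void)si; (void)ctx;
    /* would run the consistency protocol / fetch the page from its
     * manager (see the fault-handling sketch earlier) */
}

static void sigio_handler(int sig)
{
    (void)sig;
    /* would receive and dispatch incoming DSM protocol messages */
}

int dsm_init(int nprocs)
{
    /* 1. Map the shared region at the same address on every machine,
     *    initially inaccessible so that every access faults.
     *    MAP_FIXED_NOREPLACE fails instead of clobbering an existing
     *    mapping at that address. */
    void *base = mmap(DSM_BASE, DSM_SIZE, PROT_NONE,
                      MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED_NOREPLACE,
                      -1, 0);
    if (base != DSM_BASE)
        return -1;

    /* 2. Assign a manager process to each page, round robin
     *    (locks and barriers would be assigned the same way). */
    for (int i = 0; i < NPAGES; i++)
        page_manager[i] = i % nprocs;

    /* 3. Register handlers: SIGSEGV drives modification detection,
     *    SIGIO signals arriving protocol messages. */
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = segv_handler;
    sa.sa_flags = SA_SIGINFO;
    sigaction(SIGSEGV, &sa, NULL);

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = sigio_handler;
    sigaction(SIGIO, &sa, NULL);

    /* (Setting up the sockets between processes is omitted.) */
    return 0;
}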
READING LIST

Slide 33
➜ Distributed Shared Memory: A Survey of Issues and Algorithms.
  An overview of DSM and its key issues, as well as older DSM implementations.
➜ TreadMarks: Shared Memory Computing on Networks of Workstations.
  An overview of TreadMarks, its design decisions, and its implementation.