Multiprocessor
Operating Systems
Why Choose a Multiprocessor?
• A single CPU can only go so fast; using more than one CPU improves performance
• Multiple users
• Multiple applications
• Multi-tasking within an application
• Responsiveness and/or throughput
• Share hardware between CPUs
Multiprocessor Symmetry
• In a multiprocessing system, all CPUs may be equal, or some
may be reserved for special purposes.
• A combination of hardware and operating-system software
design considerations determine the symmetry.
• Systems that treat all CPUs equally are called symmetric
multiprocessing (SMP) systems.
• If all CPUs are not equal, system resources may be divided in a
number of ways, including asymmetric multiprocessing
(ASMP), non-uniform memory access (NUMA)
multiprocessing, and clustered multiprocessing.
Processor Coupling
Tightly-coupled multiprocessor systems:
• Contain multiple CPUs that are connected at the bus level.
• These CPUs may have access to a central shared memory
(Symmetric Multiprocessing, or SMP), or may participate in a
memory hierarchy with both local and shared memory (Non-
Uniform Memory Access, or NUMA).
• Example: IBM p690 Regatta, Chip multiprocessors, also known as
multi-core computing.
Loosely-coupled multiprocessor systems:
• Often referred to as clusters
• Based on multiple standalone single- or dual-processor commodity
computers interconnected via a high-speed communication system,
such as Gigabit Ethernet.
• Example: Linux Beowulf cluster
Multiprocessor Communication
Architectures
Message Passing
• Separate address space for each processor
• Processors communicate via message passing
• Processors have private memories
• Focuses attention on costly non-local operations
Shared Memory
• Processors communicate with shared address space
• Processors communicate by memory read/write
• Easy on small-scale machines
• Lower latency (minimum delay)
• SMP or NUMA
• The most commonly used model
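A minimal sketch contrasting the two communication models using Python threads (an analogy, not a real multiprocessor): message passing sends through an explicit channel, while shared memory communicates by ordinary reads and writes to a common location.

```python
import queue
import threading

# --- Message passing: private state, explicit send/receive ---
channel = queue.Queue()          # stands in for the interconnect
received = []

t = threading.Thread(target=lambda: received.append(channel.get()))
t.start()
channel.put("hello")             # the costly non-local operation is explicit
t.join()

# --- Shared memory: communicate via reads/writes to a shared location ---
shared = {"x": 0}                # the shared address space
lock = threading.Lock()

def writer():
    with lock:                   # memory writes need mutual exclusion
        shared["x"] = 42

w = threading.Thread(target=writer)
w.start()
w.join()

print(received, shared["x"])     # ['hello'] 42
```

Note how the message-passing side makes communication visible as a send/receive pair, while the shared-memory side hides it inside an ordinary assignment; this is exactly why message passing "focuses attention on costly non-local operations".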
Shared-Memory Processors
• Single copy of the OS (although some parts might be parallel)
• Relatively easy to program and port sequential code to
• Difficult to scale to large numbers of processors
Figure: UMA machine block diagram: processors 1..N, each with a private
cache, connected through an interconnection network to memory modules 1..M.
Types of Shared-Memory Architectures
UMA
• Uniform Memory Access
• Access to all memory occurs at the same speed for all processors.
NUMA
• Non-Uniform Memory Access, also called “Distributed Shared Memory”.
• The interconnect is typically a grid or hypercube.
• Access to some parts of memory is faster for some processors
than other parts of memory.
• Harder to program, but scales to more processors
Multiple Processor Systems
Figure: (a) A shared-memory multiprocessor. (b) A message-
passing multicomputer. (c) A wide area distributed system.
(a) is a Uniform Memory Access (UMA) architecture, while (b) and
(c) are Non-Uniform Memory Access (NUMA) architectures
UMA Multiprocessors with
Bus-Based Architectures
Figure: Three bus-based multiprocessors. (a) Without caching. (b)
With caching. (c) With caching and private memories.
Interconnection Technology (1)
Figure: Various interconnect topologies.
(a) A single switch. (b) A ring. (c) A grid.
Interconnection Technology (2)
Figure: Various interconnect topologies.
(d) A double torus. (e) A cube. (f) A 4D hypercube.
UMA Multiprocessors
Using Crossbar Switches
Figure: The “Dance Hall” approach: (a) An 8 × 8 crossbar switch.
(b) An open crosspoint. (c) A closed crosspoint.
NUMA Multiprocessors (1)
Characteristics of NUMA machines:
1. Access to remote memory is via LOAD and
STORE instructions.
2. Access to remote memory is slower than
access to local memory.
NUMA Multiprocessors (2)
Figure: (a) A 256-node directory-based multiprocessor.
UMA vs NUMA
• Memory access time: equal (balanced) in UMA; not balanced in NUMA.
• Types of buses: UMA uses single, multiple, and crossbar buses; NUMA uses tree and hierarchical buses.
• Application: UMA suits time-sharing and general-purpose applications; NUMA suits real-time and time-critical applications.
• Bandwidth: UMA has less bandwidth than NUMA.
• Memory controllers: UMA has a single memory controller; NUMA has multiple memory controllers.
• Speed: UMA is slower than NUMA.
Multiprocessor Operating Systems
• Multiprocessing systems offer increased throughput due to multiple
processors, are economical, and have increased reliability due to fault
tolerance.
• There are three structures for multiprocessor operating system:
▪ Separate kernel configuration
▪ Master-slave configuration
▪ Symmetric configuration.
Separate kernel configuration
• This configuration statically partitions the system's resources into
separate domains of control and assigns each domain to a processor in
the system.
• Each processor executes its own operating system. Each processor
has its own I/O devices and file system.
Separate kernel configuration
Each CPU has its own OS
• Statically allocate physical memory to each CPU
• Each CPU runs its own independent OS
• Share peripherals
• Each CPU handles the system calls of its own processes
• Used in early multiprocessor systems
• Simple to implement
• Avoids concurrency issues by not sharing
• Issues: 1. Each processor has its own scheduling queue.
2. Each processor has its own memory partition.
3. Consistency is an issue with independent disk buffer caches and
potentially shared files.
Master-slave configuration
• This configuration assigns one processor as master and other
processors in the system as slave processors.
• The master processor runs the operating system and processes, while
slave processors run user processes only.
Master-slave configuration
Master-Slave Multiprocessors
• OS mostly runs on a single fixed CPU.
• User-level applications run on the other CPUs.
• All system calls are passed to the Master CPU for processing
• Very little synchronisation required
• Simple to implement
• Single centralised scheduler to keep all processors busy
• Memory can be allocated as needed to all CPUs.
• Issues: Master CPU becomes the bottleneck.
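The master-slave structure above can be sketched with threads: the slave "CPUs" do not execute system calls themselves but enqueue each request to the single master, which serializes all kernel work (and so becomes the bottleneck). The request format and `slave`/`master` helpers are illustrative assumptions, not a real kernel interface.

```python
import queue
import threading

requests = queue.Queue()          # all system calls funnel through here

def master():
    """The one CPU that runs the OS: services every request in turn."""
    while True:
        caller, call, reply = requests.get()
        if call == "shutdown":
            break
        reply.put(f"{call} handled for {caller}")  # master does the kernel work

def slave(name):
    """A user-level CPU: passes its system call to the master and waits."""
    reply = queue.Queue()
    requests.put((name, "read", reply))
    return reply.get()            # blocks until the master answers

m = threading.Thread(target=master)
m.start()
result = slave("cpu1")
print(result)                     # read handled for cpu1
requests.put((None, "shutdown", None))
m.join()
```

Because every call queues behind the others, very little synchronisation is needed, but the master's queue is exactly where the bottleneck forms as CPUs are added.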
Symmetric configuration
• Symmetry is established by treating all processors the same.
• All the processors are able to execute the operating system in this
configuration.
• Any processor can access any device and can handle any interrupts
generated on it.
Symmetric configuration
Symmetric Multiprocessors (SMP)
• OS kernel runs on all processors, while load and resources are balanced between
all processors.
• One alternative: a single mutex (mutual exclusion object) that makes the entire
kernel one large critical section; only one CPU can be in the kernel at a time; only
slightly better than master-slave
• Better alternative: Identify independent parts of the kernel and make each of
them their own critical section, which allows parallelism in the kernel
• Issues: A difficult task; Code is mostly similar to uniprocessor code; hard part is
identifying independent parts that don’t interfere with each other
Process synchronization
• Spin-locks
– Spin-locks are useful when the number of processes is less than
the number of processors in the system.
• Queued locks
– A global queue is maintained where all waiting processes
reside.
– When the process holding the lock releases it, the next process in
the queue is woken up.
• Special Hardware for process synchronization
– System link and interface controller (SLIC) chip.
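A hedged sketch of a spin-lock in Python: `acquire()` busy-waits ("spins") on a non-blocking try-acquire, which stands in here for the atomic test-and-set instruction a real spin-lock would use.

```python
import threading

class SpinLock:
    """Minimal spin-lock sketch (illustrative, not production code)."""

    def __init__(self):
        self._flag = threading.Lock()

    def acquire(self):
        # Spin: repeatedly try to grab the flag without blocking.
        while not self._flag.acquire(blocking=False):
            pass                      # busy waiting burns CPU cycles here

    def release(self):
        self._flag.release()

counter = 0
lock = SpinLock()

def worker():
    global counter
    for _ in range(10_000):
        lock.acquire()
        counter += 1                  # critical section kept deliberately short
        lock.release()

threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter)                        # 40000
```

The short critical section matters: spinning is only cheaper than a context switch when the lock is held briefly, which is the trade-off discussed in the busy-waiting slide below.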
Semaphore Usage - Synchronization
• Semaphores can also be used to solve synchronization
problems.
• Say there are 2 concurrently executing processes: P1 with a
statement S1 and P2 with S2. S2 needs to be executed only after
S1.
• Solution: Let P1 and P2 share a semaphore, synch, initialized to
0.
Process P1: S1
signal (synch);
Process P2: wait (synch);
S2;
Because synch is initialized to 0, P2 will execute S2 only
after P1 has invoked signal (synch) after executing S1.
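The P1/P2 ordering above maps directly onto a counting semaphore initialized to 0; here `wait` and `signal` correspond to Python's `acquire` and `release`.

```python
import threading

synch = threading.Semaphore(0)    # initialized to 0, as in the slide
log = []

def p1():
    log.append("S1")              # statement S1
    synch.release()               # signal(synch)

def p2():
    synch.acquire()               # wait(synch): blocks until P1 signals
    log.append("S2")              # statement S2 runs only after S1

t2 = threading.Thread(target=p2)
t1 = threading.Thread(target=p1)
t2.start()                        # start P2 first to show the wait really blocks
t1.start()
t1.join()
t2.join()
print(log)                        # ['S1', 'S2']
```

Even though P2 is started first, it cannot pass `wait(synch)` until P1 has executed S1 and signalled, so S2 always follows S1.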
Disadvantage of these mutual exclusion
solutions
• These solutions all require busy waiting, i.e., while a process is in its
CS, all other processes trying to enter the CS must loop continuously in
the entry code (wait on mutex).
• Thus, CPU cycles are wasted. This is also called spinlock (Pi “spins”
waiting on mutex lock).
• Advantage of spin lock is that no context switch is required. So, this
can be used where applications need to hold the lock for a short
duration.
• Note that applications may spend lots of time in critical sections
and therefore this may not be a good solution.
Process scheduling
• There are various factors to be considered while scheduling the
processes in multiprocessor systems:
• to run processes simultaneously
• cache corruption
• context switch time
• There are two types of scheduling algorithms:
– Job-blind scheduling algorithms : These algorithms schedule
processes on any processor without considering any parallel
computation of processes or any other preferences.
– Job-aware scheduling algorithms : These algorithms
consider various factors while scheduling processes or
threads.
Various time scheduling algorithms
a. Timesharing scheduling : This type of scheduling considers
the independent processes of a program to be executed in
parallel.
b. Affinity based scheduling : If a process has shown affinity to
a processor, it is scheduled to be executed on the same
processor.
c. Space sharing : Scheduling multiple processes or threads at
the same time across multiple processors is known as space
sharing.
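Affinity-based scheduling (item b above) can be sketched as a tiny policy that remembers the last CPU each process ran on and prefers it, so warm cache contents are reused and cache corruption is avoided. The `schedule` function and its pid/CPU representation are hypothetical illustrations.

```python
# Maps each process id to the CPU it last ran on (its "affinity").
last_cpu = {}

def schedule(pid, idle_cpus):
    """Pick a CPU for pid, preferring its affinity CPU if it is idle."""
    preferred = last_cpu.get(pid)
    cpu = preferred if preferred in idle_cpus else idle_cpus[0]
    last_cpu[pid] = cpu           # remember the choice for next time
    return cpu

print(schedule("P1", [0, 1, 2])) # first run: no affinity yet -> CPU 0
print(schedule("P1", [1, 0, 2])) # affinity to CPU 0 -> CPU 0 reused
```

The second call picks CPU 0 even though CPU 1 is listed first, because P1's cached state is presumed to still be on CPU 0.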
Space Sharing
Figure: A set of 32 CPUs split into four partitions,
with two CPUs available.
Various time scheduling algorithms
d. Gang Scheduling : A group or gang of threads from the same
process is scheduled as a unit, i.e. all threads in the gang are
scheduled at the same time and start running on the processors.
e. Smallest number of threads first scheduling : Schedules the
process that has the smallest number of threads in the system first.
Multiprocessor Scheduling
• Problem with communication between two
threads
– both belong to process A
– both running out of phase
Multiprocessor Scheduling
Solution: Gang Scheduling
1. Groups of related threads scheduled as a unit (a
gang)
2. All members of gang run simultaneously
• on different timeshared CPUs
3. All gang members start and end time slices together
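The "all members start time slices together" rule can be sketched with a barrier: no gang member enters a time slice until every member has arrived, which mimics the scheduler releasing the whole gang at once. The slice count and gang size are illustrative choices.

```python
import threading

GANG_SIZE = 3
slice_start = threading.Barrier(GANG_SIZE)  # gang starts each slice together
events = []                                 # (slice, member) in execution order

def member(i):
    for s in range(2):                      # two time slices
        slice_start.wait()                  # block until the whole gang arrives
        events.append((s, i))               # "run" during slice s

threads = [threading.Thread(target=member, args=(i,)) for i in range(GANG_SIZE)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Because each member must re-join the barrier before slice 1, every slice-0 event happens before any slice-1 event: the gang stays in phase, which is exactly the out-of-phase communication problem gang scheduling fixes.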
Gang Scheduling
Figure: Gang scheduling.
Memory Sharing
• Memory or Cache Coherence
– Invalidate
– Update
– Home Node
Figure: directory-based coherence at the home node.
When the page is clean: (1) P sends a request to the home node; (2) the
home node replies with a copy.
When the page is dirty: (1) P sends a request to the home node; (2) the
request is forwarded to P1, the modifier; (3) P1 supplies the modified copy.
Process migration
• While migrating a process from one node to another, its residual
dependency (the process's dependency on its former node) must be
checked before migration.
• There are two types of migration strategy supported by
multiprocessor systems :
➢eager migration strategy
➢dirty eager migration strategy
Fault Tolerance
• FAULT DETECTION
• FAULT RECOVERY