MODULE 2: Distributed Operating Systems
2.1 Introduction to Distributed Systems
A distributed system is a collection of independent computers that appears to its
users as a single coherent system.
Each node has its own memory and CPU, and nodes communicate by passing messages over a network.
Key Features:
Resource sharing
Transparency (location, access, concurrency, fault, replication, etc.)
Openness and scalability
Fault tolerance and reliability
2.2 System Architectures in Distributed OS
1. Client-Server Model
o A centralized server provides services to multiple clients.
o Simple, but the server is a single point of failure.
o Example: NFS (Network File System); a minimal socket-based sketch follows this list.
2. Peer-to-Peer Model
o All nodes are equal and share resources without centralized coordination.
o More robust and scalable than a pure client-server design.
3. Middleware-Based Systems
o Middleware acts as glue to connect different distributed applications.
o Examples: CORBA, RMI.
4. Hybrid Architectures
o Combines client-server and peer-to-peer features.
o Used in modern cloud and IoT systems.
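Example: a minimal sketch of the client-server model in Python. The socket module is standard, but the host, port, and message are illustrative, and error handling is omitted.

# server.py -- a minimal echo server handling one client at a time
import socket

with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as srv:
    srv.bind(("localhost", 5000))      # illustrative address and port
    srv.listen()
    conn, addr = srv.accept()          # wait for a client to connect
    with conn:
        data = conn.recv(1024)         # receive the client's request
        conn.sendall(data)             # reply by echoing it back

# client.py -- sends a message and prints the echoed reply
import socket

with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as cli:
    cli.connect(("localhost", 5000))
    cli.sendall(b"hello")
    print(cli.recv(1024))              # prints b'hello'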
2.3 Design Issues in Distributed OS
1. Transparency:
o Location Transparency: Users need not know where resources are located.
o Access Transparency: Uniform access regardless of resource location.
o Concurrency Transparency: Several users/processes can use resources
concurrently.
o Failure Transparency: System should handle failures gracefully.
2. Resource Management:
o Load balancing
o Process migration
o Global scheduling
3. Fault Tolerance:
o Redundancy and replication
o Failure detection and recovery
4. Scalability:
o Efficient algorithms for thousands of nodes
2.4 Communication Models in Distributed OS
1. Message Passing:
o Send/receive primitives
o Blocking and Non-blocking calls
o Socket Programming, MPI
2. Remote Procedure Calls (RPC):
o Allows a process to call a procedure on a remote system as if it were local
o Hides marshalling and communication details from the programmer
o Examples: Sun RPC, Java RMI (see the sketch after this list)
3. Remote Method Invocation (RMI):
o Object-oriented variant of RPC
o Used in Java-based distributed applications
4. Group Communication:
o Multicast messages to a group of nodes
o Used in collaborative applications
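To see how RPC hides communication details, here is a minimal sketch using Python's standard xmlrpc modules, chosen purely for illustration; Sun RPC and Java RMI follow the same pattern. The procedure name add and the port are assumptions.

# rpc_server.py -- registers an ordinary function for remote callers
from xmlrpc.server import SimpleXMLRPCServer

def add(a, b):
    return a + b                      # runs on the server

server = SimpleXMLRPCServer(("localhost", 8000))
server.register_function(add, "add")
server.serve_forever()

# rpc_client.py -- the remote call looks like a local function call
from xmlrpc.client import ServerProxy

proxy = ServerProxy("http://localhost:8000/")
print(proxy.add(2, 3))                # executed on the server; prints 5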
2.5 Clock Synchronization
In distributed systems, there’s no global clock, so synchronization is critical.
2.5.1 Problems:
Events can happen concurrently on different machines.
Ordering events becomes difficult without synchronized clocks.
2.5.2 Algorithms:
a) Cristian's Algorithm
One machine acts as a time server.
Client sends a request; the server replies with its current time.
Client measures the round-trip time (RTT) of the exchange and sets its clock to the server time plus half the RTT, assuming symmetric network delay.
Formula:
Client Time = Server Time + (RTT / 2)
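A minimal sketch of the client-side adjustment; the timestamps are made-up values in seconds, and a real client would read them from its local clock:

def cristian_adjust(request_time, reply_time, server_time):
    rtt = reply_time - request_time    # round-trip time measured by the client
    return server_time + rtt / 2.0     # assume symmetric network delay

t0 = 100.000            # client sends the request
t1 = 100.020            # client receives the reply, so RTT = 20 ms
server_time = 100.515   # time stamped into the reply by the server
print(cristian_adjust(t0, t1, server_time))   # -> 100.525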
b) Berkeley Algorithm
Master polls the slaves for their clock values.
It computes the average of the reported times (typically discarding outliers) and sends each node the adjustment it must apply to its own clock.
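A minimal sketch of the master's computation, assuming it already holds each node's reported clock value; real implementations also compensate for polling delays and discard outliers:

def berkeley_adjustments(reported):
    # Return the offset each node must apply so all clocks meet at the average
    average = sum(reported.values()) / len(reported)
    return {node: average - value for node, value in reported.items()}

# Master plus two slaves, clock values in seconds
print(berkeley_adjustments({"master": 3.00, "slave1": 3.25, "slave2": 2.75}))
# -> {'master': 0.0, 'slave1': -0.25, 'slave2': 0.25}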
c) Network Time Protocol (NTP)
An Internet-standard protocol that synchronizes computer clocks to within a few milliseconds of UTC, using a hierarchy of time servers.
2.6 Mutual Exclusion in Distributed Systems
Problem:
Ensure only one process accesses the critical section (shared resource) at a time across
nodes.
Algorithms:
a) Centralized Algorithm
A single coordinator grants permission to enter the critical section (request, grant, release messages).
Simple and efficient, but the coordinator is a single point of failure.
b) Ricart-Agrawala Algorithm
Based on Lamport-timestamped request messages.
A node multicasts a timestamped request to all other nodes and enters the CS only after every node has replied.
A node defers its reply while it is in the CS, or while it has an earlier (lower-timestamp) pending request of its own; otherwise it replies immediately (see the sketch below).
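The core of Ricart-Agrawala is the decision to reply immediately or defer. A minimal sketch of that rule, using (Lamport timestamp, node ID) pairs to break ties; the surrounding message handling is omitted:

def should_defer(my_state, my_request, incoming_request):
    # my_state is 'RELEASED', 'WANTED', or 'HELD'
    # my_request and incoming_request are (timestamp, node_id) tuples
    if my_state == "HELD":
        return True                           # in the CS: defer the reply
    if my_state == "WANTED":
        return my_request < incoming_request  # our request is earlier: defer
    return False                              # not interested: reply at once

# Node 1 wants the CS with timestamp 5; node 2 requests with timestamp 7
print(should_defer("WANTED", (5, 1), (7, 2)))   # True  -> node 2 waits
print(should_defer("WANTED", (5, 1), (3, 2)))   # False -> reply immediately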
c) Token Ring Algorithm
Nodes are arranged in a logical ring around which a token circulates.
A node may enter the CS only while it holds the token.
Efficient, but the failure of a node (or loss of the token) breaks the ring and requires recovery.
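A minimal single-process simulation of token circulation, assuming node 0 starts with the token and ignoring failures:

def token_ring(num_nodes, wants_cs, rounds=1):
    holder = 0                                # node currently holding the token
    for _ in range(rounds * num_nodes):
        if holder in wants_cs:
            print(f"node {holder} enters and leaves the critical section")
            wants_cs.discard(holder)
        holder = (holder + 1) % num_nodes     # pass the token to the next node

token_ring(4, wants_cs={1, 3})
# node 1 enters and leaves the critical section
# node 3 enters and leaves the critical section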
2.7 Election Algorithms
Used to choose a coordinator among distributed nodes.
Common Algorithms:
a) Bully Algorithm (by Garcia-Molina):
All processes have unique IDs.
Any process can initiate an election.
Process with highest ID wins and becomes the coordinator.
Steps:
1. Process P detects failure of coordinator.
2. P sends an ELECTION message to every process with a higher ID.
3. If no higher-ID process responds, P wins and announces itself as the new coordinator to all processes.
4. If a higher-ID process responds, it takes over the election and P waits for the COORDINATOR message (a sketch of the outcome follows these steps).
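A minimal sketch of the election outcome, assuming every process knows the set of live process IDs; the ELECTION/OK/COORDINATOR message exchange is condensed into one function:

def bully_election(initiator, alive_ids):
    higher = [p for p in alive_ids if p > initiator]   # ELECTION sent upward
    if not higher:
        return initiator        # no higher-ID process answered: initiator wins
    return max(higher)          # a higher process takes over; the highest wins

print(bully_election(initiator=2, alive_ids={1, 2, 3, 5}))   # -> 5
print(bully_election(initiator=5, alive_ids={1, 2, 5}))      # -> 5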
b) Ring Algorithm:
All nodes arranged in logical ring.
An ELECTION message circulates around the ring, with each node appending its ID.
After one full circuit, the node with the highest collected ID is announced as the coordinator (see the sketch below).
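A minimal sketch of one circulation of the election message, ignoring node failures:

def ring_election(ring, starter):
    start = ring.index(starter)
    collected = []
    for i in range(len(ring)):                  # one full pass around the ring
        collected.append(ring[(start + i) % len(ring)])
    return max(collected)                       # highest ID becomes coordinator

print(ring_election(ring=[3, 7, 2, 9, 4], starter=2))   # -> 9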
2.8 Distributed Deadlock Detection
Deadlock: A set of processes is blocked, each waiting for a resource held by another process in the set.
Detection Techniques:
1. Centralized Deadlock Detection:
o One site maintains the global Wait-For Graph (WFG).
o A cycle in the WFG indicates a deadlock (a cycle-detection sketch follows this list).
2. Distributed Deadlock Detection:
o Each site keeps local WFG.
o Messages are passed between sites to detect cycles that span multiple sites.
o Example: path-pushing algorithms, in which sites forward parts of their local WFGs to one another.
3. Hierarchical Detection:
o Sites are organized in a hierarchy.
o Each level detects deadlocks among the sites beneath it; detection is escalated up the hierarchy only for cycles that span clusters.
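In the centralized scheme, detection reduces to finding a cycle in the global wait-for graph. A minimal sketch, assuming the WFG is given as a dictionary mapping each process to the processes it waits for:

def has_deadlock(wfg):
    visiting, done = set(), set()

    def dfs(p):
        if p in done:
            return False
        if p in visiting:
            return True                        # back edge -> cycle -> deadlock
        visiting.add(p)
        if any(dfs(q) for q in wfg.get(p, [])):
            return True
        visiting.discard(p)
        done.add(p)
        return False

    return any(dfs(p) for p in wfg)

# P1 waits for P2, P2 for P3, P3 for P1: a cycle, hence a deadlock
print(has_deadlock({"P1": ["P2"], "P2": ["P3"], "P3": ["P1"]}))   # True
print(has_deadlock({"P1": ["P2"], "P2": ["P3"], "P3": []}))       # False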