Unit - IV Notes
Distributed Shared Memory (DSM)
Distributed shared memory (DSM) is a mechanism that manages memory across multiple nodes and makes inter-process communication transparent to end-users. Applications behave as if they are running on shared memory. DSM allows user processes to access shared data without using explicit inter-process communication. In DSM, every node has its own memory, provides memory read and write services, and supports consistency protocols. DSM implements the shared memory model in distributed systems even though there is no physical shared memory. All the nodes share the virtual address space provided by the shared memory model, and data moves between the main memories of different nodes.
DSM implements the shared memory model in a distributed system that has no physically shared memory. The shared memory model provides a virtual address space shared among all nodes, which helps overcome the high cost of communication in distributed systems. DSM systems move data to the location of access: data moves between main memory and secondary memory (within a node) and between the main memories of different nodes. Each data object is owned by a node; the initial owner is the node that created the object, and ownership can change as the object moves from node to node. When a process accesses data in the shared address space, the mapping manager maps the shared memory address to physical memory (local or remote).
Architecture of Distributed Shared Memory
1. Nodes: Each node in the distributed system consists of one or more CPUs and a memory unit. These nodes are connected via a high-speed communication network.
2. Memory Mapping Manager Unit: The memory mapping manager routine in each node is responsible for mapping the local memory onto the shared memory space. This involves dividing the shared memory space into blocks and managing the mapping of these blocks to the physical memory of the node.
Caching is employed to reduce operation latency. Each node uses its local memory to
cache portions of the shared memory space. The memory mapping manager treats the
local memory as a cache for the shared memory space, with memory blocks as the basic
unit of caching.
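To make the block mapping concrete, here is a minimal sketch of a mapping manager that treats local memory as a cache of block-sized units; BLOCK_SIZE, MappingManager, and fetch_remote_block are invented names used purely for illustration, not part of any standard DSM implementation.

```python
# Illustrative sketch only: shared address space divided into fixed-size blocks,
# with local memory used as a cache of those blocks.
BLOCK_SIZE = 4096  # assumed block size

class MappingManager:
    def __init__(self, node_id, directory):
        self.node_id = node_id
        self.directory = directory      # block number -> owner node id
        self.local_cache = {}           # block number -> block data (local memory as cache)

    def read(self, shared_addr):
        block_no, offset = divmod(shared_addr, BLOCK_SIZE)
        if block_no not in self.local_cache:                    # cache miss
            owner = self.directory[block_no]
            self.local_cache[block_no] = self.fetch_remote_block(owner, block_no)
        return self.local_cache[block_no][offset]

    def fetch_remote_block(self, owner, block_no):
        # A real DSM would send a network request to the owner node here.
        return bytearray(BLOCK_SIZE)

manager = MappingManager(node_id=1, directory={0: 2})   # block 0 owned by node 2
print(manager.read(10))                                  # miss, fetch, then local read
```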
Algorithms for implementing DSM include the following.
1. Central Server Algorithm:
• In this, a central server maintains all shared data. It services read requests from other nodes by returning the data items to them, and write requests by updating the data and returning acknowledgement messages (a small sketch follows).
• It is simpler to implement, but the central server can become a bottleneck. To overcome this, the shared data can be distributed among several servers; this distribution can be by address or by using a mapping function to locate the appropriate server.
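A minimal sketch of the central-server idea, assuming a single in-process server object that serialises requests; the class and method names are invented for illustration.

```python
# Sketch of the central-server algorithm: one server holds all shared data and
# serialises every read and write request (illustrative names only).
import threading

class CentralServer:
    def __init__(self):
        self.data = {}                 # all shared data items live here
        self.lock = threading.Lock()   # serialise concurrent requests

    def read(self, key):
        with self.lock:
            return self.data.get(key)

    def write(self, key, value):
        with self.lock:
            self.data[key] = value
            return "ACK"               # acknowledgement message for the writer

# Every client node simply forwards its accesses to the server:
server = CentralServer()
server.write("x", 42)
print(server.read("x"))                # -> 42
```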
2. Migration Algorithm:
• In contrast to the central server algorithm, where every data access request is forwarded to the location of the data, in this algorithm the data is shipped to the location of the data access request, which allows subsequent accesses to be performed locally.
• It allows only one node to access a shared data item at a time, and the whole block containing the data item migrates instead of only the individual item requested.
• This algorithm provides an opportunity to integrate DSM with the virtual memory provided by the operating system at individual nodes.
3. Read Replication Algorithm:
o This extends the migration algorithm by replicating data blocks and allowing multiple nodes to have read access, or one node to have both read and write access.
o DSM must keep track of the location of all copies of data blocks in this algorithm.
4. Full Replication Algorithm:
o Since many nodes can write shared data concurrently, access to the shared data must be controlled to maintain its consistency.
o To maintain consistency, a gap-free sequencer can be used: all nodes wishing to modify shared data send the modification to the sequencer, which assigns a sequence number and multicasts the modification with the sequence number to all nodes that have a copy of the shared data item (a small sketch follows below).
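The sequencer-based write ordering can be sketched as below; the multicast is simulated with a plain loop over replica objects, and all names are invented for illustration.

```python
# Sketch of write ordering in full replication: a sequencer assigns gap-free
# sequence numbers and "multicasts" each modification to every replica.
class Replica:
    def __init__(self):
        self.data = {}
        self.last_seq = 0

    def apply(self, seq, key, value):
        assert seq == self.last_seq + 1   # gap-free delivery keeps replicas identical
        self.data[key] = value
        self.last_seq = seq

class Sequencer:
    def __init__(self, replicas):
        self.replicas = replicas
        self.seq = 0

    def modify(self, key, value):
        self.seq += 1                     # assign the next sequence number
        for replica in self.replicas:     # multicast simulated by a loop
            replica.apply(self.seq, key, value)

replicas = [Replica(), Replica(), Replica()]
seq = Sequencer(replicas)
seq.modify("x", 1)
seq.modify("x", 2)
print(replicas[0].data, replicas[2].last_seq)   # identical state on every replica
```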
Advantages of DSM:
• Simpler Abstraction: Programmers need not be concerned about data movement; as the address space is the same, it is easier to implement than RPC.
• Easier Portability: The access protocols used in DSM allow for a natural transition
from sequential to distributed systems. DSM programs are portable as they use a
common programming interface.
• On-Demand Data Movement: The on-demand data movement provided by DSM eliminates the separate data exchange phase.
• Larger Memory Space: DSM provides a large virtual memory space; the total memory size is the sum of the memory sizes of all the nodes, so paging activity is reduced.
Disadvantages of DSM:
• Message Passing: DSM uses asynchronous message passing and hence is not as efficient as other message passing implementations.
• Data Redundancy: DSM allows simultaneous access to data, so maintaining consistency and handling data redundancy are common disadvantages.
• Lower Performance: The CPUs are slowed down, and even cache memory does not help the situation.
Consistency Models
Consistency models in distributed computing are fundamental for ensuring that data
remains coherent and predictable across multiple nodes in a network. These models define
the guarantees that a distributed system provides regarding the order and visibility of
operations on shared data. Understanding these models is crucial for designing and
implementing reliable distributed systems.
1. Data-Centric Consistency Models
Data-centric consistency models focus on the properties of the data itself and the operations performed on it. These models ensure that the data remains consistent across all nodes in the system.
1.1 Linearizability
Linearizability ensures that all operations appear to have executed atomically in some total
order. Every operation is either completed before or after any other operation, and the order
is consistent with the real-time execution order. This model provides strong consistency but
requires high coordination, making it challenging for performance and availability.
• Strength: Strong
1.2 Sequential Consistency
Sequential consistency ensures that the result of any execution is the same as if all operations were executed in some single sequential order, with each client's operations appearing in the order in which they were issued. Unlike linearizability, this order need not respect real-time ordering, which offers a balance between strict ordering and performance.
• Strength: Strong
• Use Cases: Distributed databases, shared memory
• Challenges: Requires significant coordination, which may impact performance.
1.3 Causal Consistency
Causal consistency ensures that operations are ordered based on their causal dependencies.
If operation A causally precedes operation B, then all nodes will see the effects of A before
B. However, operations that are independent of each other can be seen in different orders
by different nodes.
• Strength: Moderate
• Use Cases: Social media platforms, messaging systems
• Challenges: Complex to implement and does not provide strict ordering.
1.4 Eventual Consistency
Eventual consistency ensures that all nodes will eventually converge to the same state, but
there are no guarantees regarding the order or timing of operations. This model is widely
used in large-scale distributed systems where immediate consistency is not a priority.
• Strength: Weak
• Use Cases: Content delivery networks, replicated cloud storage
• Challenges: Potential for stale data and weak consistency guarantees.
2. Client-Centric Consistency Models
Client-centric consistency models focus on the perspective of the client and the guarantees provided to the client about the state of the data.
2.1 Read-Your-Writes Consistency
This model guarantees that if a client writes a value, all subsequent reads from the same client will reflect that write. This ensures that a user always sees their most recent changes.
• Strength: Moderate
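One common way to approximate read-your-writes consistency is to track, per client session, the version of the client's latest write and reject reads from replicas that have not caught up. The sketch below uses invented names and is only an illustration of the idea, not any particular database's mechanism.

```python
# Sketch: per-session version tracking to approximate read-your-writes consistency.
class Replica:
    def __init__(self):
        self.value, self.version = None, 0

class Session:
    def __init__(self):
        self.last_written_version = 0

    def write(self, primary, value):
        primary.version += 1
        primary.value = value
        self.last_written_version = primary.version   # remember what this client wrote

    def read(self, replica):
        if replica.version < self.last_written_version:
            raise RuntimeError("replica is stale for this session; retry elsewhere")
        return replica.value

primary, lagging_replica = Replica(), Replica()
session = Session()
session.write(primary, "new profile photo")
print(session.read(primary))          # the client always sees its own write
# session.read(lagging_replica) would raise, because it has not seen the write yet
```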
2.2 Monotonic Reads
Monotonic reads ensure that once a client reads a value, it will never see an older value in
subsequent reads. This is useful for applications that require increasing consistency over
time.
• Strength: Moderate
2.3 Monotonic Writes
This model ensures that writes by a client are executed in order. If a client writes a value
and then writes another value, the system guarantees that the first write is visible before
the second.
• Strength: Moderate
3. Hybrid Consistency Models
Hybrid consistency models combine aspects of both data-centric and client-centric models
to provide a balance between strong consistency and performance.
3.1 Bounded Staleness
Bounded staleness ensures that reads return values that are no older than a specified time
or number of operations. This provides a trade-off between strong consistency and
performance.
• Strength: Moderate
3.2 Timeline Consistency
Timeline consistency ensures that operations are ordered based on a timeline, which can be
defined by the system or the client. This allows for more flexible ordering guarantees while
still providing some level of consistency.
• Strength: Moderate
Thrashing
Thrashing in distributed computing refers to a performance degradation phenomenon
where excessive resource contention leads to a system spending more time on management
activities rather than executing actual processes. This issue arises when processes
continuously compete for memory, CPU, or network resources, leading to frequent page
swaps, excessive task migrations, and increased communication overhead. Thrashing can
significantly degrade system performance, making it crucial to implement proper resource
management techniques.
Thrashing occurs when the system spends a major portion of its time transferring shared data blocks from one node to another, compared with the time spent doing the useful work of executing application processes. If thrashing is not handled carefully, it degrades system performance considerably.
Causes of thrashing include:
1. Ping-pong effect: It occurs when processes make interleaved data accesses on two or more nodes, which may cause a data block to move back and forth from one node to another in quick succession; this is known as the ping-pong effect.
2. Repeated invalidation: Blocks with read-only permission are repeatedly invalidated soon after they are replicated. This is caused by poor locality of reference.
Methods to reduce thrashing include the following (a small sketch of the second method follows this list):
• Application-controlled locks: an application-controlled lock can be associated with each data block.
• Pinning a block to a node for a minimum amount of time t: on the basis of past access patterns, the time t can be fixed statically or dynamically. There are two ways to tune the value of t:
o based on the past access pattern of the block, and
o based on the length of the queue of processes waiting to access that block.
• Tailoring the coherence algorithm: different coherence protocols, suited to shared data with different characteristics, can be used to minimize thrashing.
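The second method above (keeping a block at a node for a minimum time t) can be sketched as follows; the time value and class name are assumptions made for the example.

```python
# Sketch: once a block moves to a node, it is "pinned" there for at least t seconds,
# so interleaved accesses cannot ping-pong it between nodes (illustrative values).
import time

class BlockHolder:
    def __init__(self, t_seconds=0.05):
        self.t = t_seconds
        self.owner = None
        self.acquired_at = 0.0

    def request(self, node):
        now = time.monotonic()
        if self.owner is not None and now - self.acquired_at < self.t:
            return False                 # deny migration until the block has stayed for t
        self.owner, self.acquired_at = node, now
        return True

block = BlockHolder()
print(block.request("node_A"))   # True: block moves to node A
print(block.request("node_B"))   # False: must remain at node A for at least t
time.sleep(0.06)
print(block.request("node_B"))   # True: minimum residence time has elapsed
```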
Effects of Thrashing
• Reduced Throughput: As the system spends more time on resource swapping rather
than actual computation, the overall number of completed tasks decreases.
• Lower System Efficiency: The system becomes inefficient as CPU cycles and memory
bandwidth are wasted on overhead tasks rather than productive execution.
Load Balancing
The primary objective of load balancing is to distribute the workload evenly across all
available processors to prevent certain processors from being overloaded while others
remain underutilized. This ensures efficient resource utilization, prevents bottlenecks, and
enhances system responsiveness.
Desirable Features of Global Scheduling Algorithms
Global scheduling algorithms aim to reduce the overall execution time of processes by
efficiently assigning tasks to processors while considering task dependencies,
communication overhead, and resource availability. This enhances system performance and
user experience.
Fault Tolerance
An essential feature of global scheduling algorithms is fault tolerance, ensuring that the
scheduling mechanism continues to function even when certain system nodes crash or
become temporarily unavailable. This enhances system reliability and availability.
Scalability
As the number of nodes in a distributed system increases, the scheduling algorithm must
efficiently handle the growing workload without performance degradation. Scalable global
scheduling algorithms adapt to system growth, ensuring optimal resource utilization.
User Knowledge
A good scheduling algorithm requires minimal prior knowledge from users regarding process
characteristics and resource requirements. This reduces user overhead and makes the
system more user-friendly.
Quick Decision-Making Capability
Global scheduling algorithms must make quick decisions about process allocation with
minimal computational overhead to prevent the scheduling process itself from becoming a
bottleneck.
Types of Global Scheduling Algorithms
Dynamic Priority Scheduling
Tasks are assigned priorities dynamically based on their current state and requirements.
• Example: Ali et al. and Taherin et al. propose algorithms for periodic tasks with dual
criticality levels, optimizing performance and resource utilization through online
scheduling mechanisms.
Fixed Priority Scheduling
Tasks are assigned fixed priorities, with the scheduler always selecting the highest-priority
task that is ready to run.
• Example: Völp et al. and Wägemann et al. present algorithms for scheduling sporadic
and periodic tasks under energy constraints, ensuring optimal execution based on
predefined priority levels.
Global Code Scheduling
This approach involves rearranging the order of execution of code segments to improve
performance by reducing execution time and maximizing resource utilization. Techniques
include:
• Primitive Code Motion: Moves code segments outside basic blocks or loops to
reduce memory accesses.
• Code Sinking: Moves code segments inside loops when outputs change with each
iteration.
Global code scheduling techniques are extensively used in compiler optimization to enhance
program performance.
Applications of Global Scheduling Algorithms
Operating Systems
• Linux Kernel:
o Preemption: Yes
• Windows OS:
o Preemption: Yes
Real-Time Systems
• Earliest Deadline First (EDF): Schedules processes based on their deadlines, ensuring
that the process with the nearest deadline executes first.
Distributed Systems
Resource Constraints
Scheduling algorithms must account for system resource limitations, ensuring that tasks are
allocated efficiently within available CPU, memory, and energy constraints.
Task Dependencies
Tasks often have dependencies that must be respected to ensure correct execution and
prevent deadlocks.
• Example: The Dynamic Critical Path (DCP) algorithm prioritizes critical tasks, reducing
overall schedule length.
Scalability
As the number of tasks and processors grows, scheduling complexity increases, requiring
efficient heuristics for optimal task allocation.
A load balancer is a device that acts as a reverse proxy and distributes network or application traffic across a number of servers. Load balancing is the approach of distributing load units (i.e., jobs/tasks) across the network that connects the nodes of the distributed system. Load balancing can be done by the load balancer, a system that handles the load and is used to distribute the tasks to the servers. For example, the load balancer allocates the first task to the first server and the second task to the second server.
Key capabilities of a load balancer include:
• Protect applications from emerging threats: The Web Application Firewall (WAF) in the load balancer shields your site from attacks.
• Authenticate user access: The load balancer can request a username and password before granting access to your site, protecting against unauthorized access.
• Protect against DDoS attacks: The load balancer can detect and drop distributed denial-of-service (DDoS) traffic before it reaches your site.
• Performance: Load balancers reduce the load on your web servers and optimize traffic for a better client experience.
• SSL Offload: Terminating SSL (Secure Sockets Layer) traffic on the load balancer removes that overhead from the web servers, leaving more resources available for your web application.
• Traffic Compression: A load balancer can compress site traffic, giving your clients a much better experience with your site.
Common load balancing algorithms include the following (a small sketch of two of them follows this list):
• Round Robin
• Least Connections
• Least Time
• Hash
• IP Hash
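Two of these strategies can be sketched in a few lines; the classes below are illustrative only and not taken from any specific load balancer.

```python
# Illustrative sketches of Round Robin and Least Connections server selection.
from itertools import cycle

class RoundRobin:
    def __init__(self, servers):
        self._cycle = cycle(servers)

    def pick(self):
        return next(self._cycle)                          # rotate through servers in order

class LeastConnections:
    def __init__(self, servers):
        self.active = {s: 0 for s in servers}             # active connections per server

    def pick(self):
        server = min(self.active, key=self.active.get)    # fewest active connections wins
        self.active[server] += 1
        return server

    def release(self, server):
        self.active[server] -= 1

rr = RoundRobin(["s1", "s2", "s3"])
print([rr.pick() for _ in range(4)])                      # ['s1', 's2', 's3', 's1']
lc = LeastConnections(["s1", "s2"])
print(lc.pick(), lc.pick(), lc.pick())                    # s1 s2 s1 (ties broken by order)
```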
Following are some of the different classes of load balancing algorithms:
• Static: In this model, if any node is found to have a heavy load, a task can be picked at random and moved to some other randomly chosen node. Static algorithms do not use current state information.
• Dynamic: These algorithms use the current state information of the system for load balancing. They generally perform better than static algorithms.
• Load balancer ensures high availability and reliability by sending requests only to servers that are online.
Load Sharing
Load sharing basically denotes the process by which a router shares the forwarding of traffic across multiple paths, if such paths are available in the routing table. If there are equal paths, the forwarding process follows the load-sharing algorithm. In load sharing systems, all nodes share the overall workload, and the failure of some nodes increases the pressure on the rest of the nodes. The load sharing approach ensures that no node is kept idle so that each node can share the load.
For example, suppose there are two server connections with different bandwidths of 500 Mbps and 250 Mbps, and there are 2 packets to forward. Instead of sending both packets over the same 500 Mbps connection, one packet is forwarded over the 500 Mbps connection and the other over the 250 Mbps connection. Here the goal is not to use the same amount of bandwidth on the two connections but to share the load so that each connection can handle it reasonably without congestion.
There are several issues in designing load balancing algorithms; to overcome these issues we use the load-sharing approach. The issues are:
State information exchange: It decides how the system load information is exchanged among the nodes.
Location policy: It decides the selection of a destination node during process migration.
Priority assignment: It decides the priority of execution of a set of local and remote processes on a particular node.
Migration limiting policy: It decides the total number of times a process can migrate from one node to another.
Load Sharing algorithm includes policies like location policy, process transfer policy, state
information exchange policy, load estimation policy, priority assignment policy, and
migration limiting policy.
1. Location Policies: The location policy decides the sender node or the receiver node of a process that is to be moved within the system for load sharing. Depending upon the type of node that takes the initiative and searches globally for a suitable node for the process, the location policies are of the following kinds:
Sender-initiated policy: Here the sender node of the process takes the initiative in deciding where the process is to be sent. Heavily loaded nodes search for lightly loaded nodes to which the workload can be transferred to balance the pressure of traffic. Whenever a node's load rises above the threshold value, it either broadcasts a message or randomly probes other nodes one by one to find a lightly loaded node that can accept one or more of its processes (a sketch of this policy follows the receiver-initiated policy below). If a suitable receiver node is not found, the node on which the process originated must execute that process.
Receiver-initiated policy: Here the receiver node of the process takes the initiative in deciding from where to receive a process. In this policy, lightly loaded nodes search for heavily loaded nodes from which the execution of processes can be accepted. Whenever the load on a node falls below the threshold value, it broadcasts a message to all nodes, or probes nodes one by one, to search for heavily loaded nodes. A heavily loaded node may transfer one of its processes if such a transfer does not reduce its load below the normal threshold.
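A rough sketch of the sender-initiated policy, assuming an invented threshold value, probe limit, and Node class purely for illustration:

```python
# Sketch of a sender-initiated location policy with an assumed threshold and probe limit.
import random

THRESHOLD = 5        # assumed load threshold (number of processes)
PROBE_LIMIT = 3      # how many nodes to probe before giving up

class Node:
    def __init__(self, load):
        self.load = load

def try_transfer(sender, nodes):
    """If the sender is overloaded, probe random nodes for a lightly loaded receiver."""
    if sender.load <= THRESHOLD:
        return None                                        # not overloaded, do nothing
    candidates = [n for n in nodes if n is not sender]
    for node in random.sample(candidates, min(PROBE_LIMIT, len(candidates))):
        if node.load < THRESHOLD:                          # lightly loaded receiver found
            sender.load -= 1
            node.load += 1
            return node
    return None                                            # none found: execute locally

nodes = [Node(8), Node(2), Node(6)]
receiver = try_transfer(nodes[0], nodes)
print("transferred" if receiver else "executed locally")
```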
2. Process Transfer Policy: An all-or-nothing approach is used in this policy. The threshold value of all the nodes is set to 1. A node becomes a receiver node if it has no process, and a node becomes a sender node if it has more than one process. If nodes turn idle, they cannot accept a new process immediately, which wastes processing power. To overcome this problem, the process is transferred to a node that is expected to become idle in the near future. Sometimes, to avoid wasting the processing power of the nodes, the load-sharing algorithm changes the threshold value from 1 to 2.
3. State Information Exchange Policy: In a load-sharing algorithm, nodes are not required to exchange state information periodically; however, a node needs to know the state of other nodes when it is either underloaded or overloaded. Thus two sub-policies are used here:
Broadcast when the state changes: A node broadcasts a state information request only when its state changes. In the sender-initiated location policy, the state information request is broadcast by a node only when it becomes overloaded. In the receiver-initiated location policy, the state information request is broadcast by a node only when it becomes underloaded.
Poll when the state changes: In a large network, a polling operation is performed instead. A node randomly asks other nodes for state information until it finds a suitable one or reaches the probe limit.
4. Load Estimation Policy: Load-sharing algorithms aim to avoid nodes being idle, and for this it is sufficient to know whether a node is busy or idle. Consequently, these algorithms typically use the simplest load estimation policy of counting the total number of processes on a node.
5. Priority Assignment Policy: It uses one of the following rules to determine the priority of processes on a particular node (a small sketch of the rules follows this list):
Selfish: Higher priority is given to local processes than to remote processes. Thus, it has the worst response time performance for remote processes and the best response time performance for local processes.
Altruistic: Higher priority is given to remote processes than to local processes. It has the best overall response time performance.
Intermediate: The numbers of local and remote processes on a node decide the priority. When the number of local processes is greater than or equal to the number of remote processes, local processes are given higher priority; otherwise, remote processes are given higher priority than local processes.
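The three rules can be expressed as a small function; this is only an illustration of the rules stated above.

```python
# Sketch of the selfish, altruistic and intermediate priority assignment rules.
def priority(rule, n_local, n_remote):
    """Return which class of processes gets higher priority on a node."""
    if rule == "selfish":
        return "local"                      # local processes always win
    if rule == "altruistic":
        return "remote"                     # remote processes always win
    if rule == "intermediate":
        return "local" if n_local >= n_remote else "remote"
    raise ValueError("unknown rule")

print(priority("intermediate", n_local=4, n_remote=2))   # -> local
print(priority("intermediate", n_local=1, n_remote=3))   # -> remote
```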
6. Migration Limiting Policy: This policy decides the total number of times a process can migrate. One of the following two strategies may be used.
Uncontrolled: There is no limit on the number of times a process may migrate, which can lead to instability.
Controlled: A migration count parameter is used to fix the limit on the number of migrations of a process. Thus, a process can migrate only a fixed number of times. This removes the instability of the uncontrolled strategy.
Advantages of load sharing include:
• Fault Tolerance: Improves system resilience by allowing other nodes to take over if a
node fails.
Load balancing is widely applied in distributed computing to maintain system efficiency and
reliability. Some of its key use cases include:
Cloud Computing
• Example: Auto-scaling groups in AWS, Azure Load Balancer, Google Cloud Load Balancing.
Content Delivery Networks (CDNs)
• Example: CDNs like Akamai or Cloudflare using load balancing to deliver content efficiently from edge servers.
Big Data Processing
• Analyzes large volumes of data by splitting tasks into smaller chunks for distributed processing.
Distributed Databases
• Example: NoSQL databases like Cassandra or MongoDB, where data partitions are
spread across nodes.
Process Migration
Process migration in distributed systems refers to the relocation of a running process from
one node to another within a network. This technique is widely used to optimize resource
utilization, balance workload distribution, improve fault tolerance, and enhance overall
system performance and reliability. By dynamically transferring processes between different
nodes, process migration ensures efficient system operations, minimizes response time, and
enhances fault recovery.
Process migration is useful in scenarios such as the following:
System Maintenance: During planned system maintenance, processes from affected nodes
can be transferred to other nodes, preventing service interruptions.
Data Locality Optimization: Migrating processes closer to relevant data sources improves
data access speeds, reduces communication overhead, and enhances processing efficiency.
Fault Recovery: Provides mechanisms to pause, transport, and resume processes in case of
failures, ensuring system resilience and stability.
Key Concepts in Process Migration
• Checkpointing: The act of saving the current state of a process to enable resumption from that point after migration. Checkpoints can be taken manually or automatically at regular intervals (a checkpoint/restore sketch follows this list).
• Migration Overhead: The resources and time required to transfer the process state
from one node to another, including network bandwidth and computational
resources.
• Consistency : Ensuring that the process state remains consistent and valid during and
after migration, avoiding data corruption or inconsistencies.
• Transparency: Making the migration process seamless so that the process and its
users do not notice the transition, which involves hiding the complexities of
migration from the user.
• Fault Tolerance: Mechanisms to handle failures during migration, ensuring that the
process can be restarted or resumed without loss of critical data.
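Checkpointing, the first concept above, can be illustrated by serialising a process's state so that it can be restored elsewhere. The pickle-based sketch below is a toy illustration with invented names; real systems checkpoint the full address space and execution context.

```python
# Toy sketch of checkpoint/restore: save a process's state as bytes, ship the
# bytes to another node, and resume from the saved state there.
import pickle

class Worker:
    def __init__(self):
        self.counter = 0

    def step(self):
        self.counter += 1

    def checkpoint(self):
        return pickle.dumps(self.__dict__)          # capture the current state

    @classmethod
    def restore(cls, blob):
        worker = cls()
        worker.__dict__.update(pickle.loads(blob))  # resume from the saved state
        return worker

w = Worker()
w.step(); w.step()
blob = w.checkpoint()           # bytes that could be sent over the network
resumed = Worker.restore(blob)  # on the destination node
print(resumed.counter)          # -> 2, execution continues where it left off
```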
Types of Process Migration
• Static Migration:
o Definition: The entire process is moved to a new node, and it starts execution
from the point where it was suspended.
o Pros: Simple to implement; the process state is saved and restored in full.
o Cons: High overhead due to the transfer of the entire process state; not ideal
for processes with large memory footprints.
• Dynamic Migration:
o Definition: The process migrates while it is still running, often by migrating its
active state incrementally.
o Pros: Reduces downtime and allows for more fluid load balancing.
• Preemptive Migration:
o Definition: The process is temporarily paused, its state is saved, and it is then
moved to a new node where it resumes execution.
o Pros: Allows for planned migrations with minimal disruption.
• Non-Preemptive Migration:
o Definition: The process continues to run until it reaches a natural stopping point or checkpoint, at which point it is migrated.
o Pros: Avoids disruption during migration; can be more efficient for long-
running processes.
o Cons: Requires processes to reach suitable stopping points, which may not
always align with optimal migration times.
• Incremental Migration:
o Definition: The process state is transferred in stages rather than all at once.
o Pros: Can reduce the impact of migration on system performance and allows
for smoother transitions.
Each type of process migration has its own advantages and trade-offs, and the choice of
method depends on factors like the system’s architecture, the nature of the processes, and
performance requirements.
Steps in Process Migration
1. Decision to Migrate
o Details: Evaluate the process’s resource usage, current load on the source node, and potential benefits of migration.
2. Selection of the Destination Node
o Description: Select the appropriate destination node where the process will be relocated.
3. Transfer of the Process
o Description: Transfer the process from the source node to the destination node.
• Suspend and Resume
o Pause the process on the source node, transfer its state, and then restart it on the destination node.
o The process is temporarily halted to save its state, which is then restored and execution resumes on the new node.
• Address Space Transfer
o Move the process’s address space, including memory and execution context, from the source node to the destination node.
o The entire address space or significant portions are transferred to ensure that the process can resume exactly where it left off.
• Message Forwarding
o Messages sent to the process at its old location are forwarded to its new location so that communication continues correctly during and after migration.
By following these steps and subcategories, process migration can be effectively managed to
achieve optimal performance and system stability in distributed environments.
Process migration techniques are strategies used to transfer a process from one node to
another in a distributed system. These techniques aim to balance load, optimize resource
utilization, and improve fault tolerance. The primary techniques include:
1. Total (Full-State) Migration
• Description: The entire process state is captured and transferred to the destination node in a single operation.
• Steps:
1. Suspend: Pause the process and capture its complete state.
2. Transfer: Move the captured state to the destination node.
3. Restore: Load the state into the process’s new environment and resume execution.
• Pros: Simplifies the migration process as it deals with the entire state at once.
• Cons: High overhead due to the large volume of data to transfer; downtime may
occur during migration.
2. Incremental Migration
• Description: The process state is transferred in stages rather than all at once.
• Steps:
• Pros: Reduces the impact on system performance and allows for a more gradual
transfer.
3. Lazy Migration
• Description: Only part of the process state is transferred initially; the remaining state is left on the source node and fetched on demand after the process resumes at the destination.
• Steps:
3. Transfer and Restore: Move and restore the process state at the destination
node.
4. Preemptive Migration
• Description: The process is paused, its state is saved, and then it is migrated to the
new node where it resumes execution.
• Steps:
1. Pause and Save: Suspend the process and capture its current state.
2. Transfer: Move the saved state to the destination node.
3. Restore and Resume: Load the state at the destination node and resume
execution.
• Cons: Requires pausing the process, which can affect performance and
responsiveness.
5. Non-Preemptive Migration
• Description: The process continues to run until it reaches a natural stopping point or
checkpoint, at which point it is migrated.
• Steps:
1. Execution: Allow the process to run until a suitable stopping point is reached.
2. Save and Transfer: Capture the process state at the stopping point and move it to the destination node.
3. Restore and Resume: Load the state at the destination node and resume
execution.
• Pros: Avoids the need for pausing the process, reducing performance impact.
6. Snapshot-Based Migration
• Description: A snapshot of the process state is taken at a point in time and used to recreate the process on the destination node.
• Steps:
• Pros: Allows for point-in-time migrations and can simplify state management.
• Cons: Requires mechanisms to ensure consistency and handle potential snapshot
inconsistencies.
Threads
Threads are the smallest units of execution within a process, enabling parallel and
concurrent task execution. They share process resources, making them efficient for handling
multiple operations simultaneously, such as client requests or data processing. Threads
improve system responsiveness and throughput, essential for real-time applications and
microservices.
Distributed systems are collections of independent computers that appear to the users as a
single coherent system. These systems work together to achieve a common goal by sharing
resources and coordinating tasks across different nodes. The main characteristics of
distributed systems include:
• Scalability: They can be expanded easily by adding more nodes to handle increased
load.
• Fault Tolerance: They can continue to operate even if some components fail.
• Transparency: The complexities of the system are hidden from users, making it
appear as a single, unified entity.
Threads offer significant benefits in distributed systems, such as improving performance and
enabling concurrent task execution. However, they also present several challenges:
• Scalability: While threads can improve performance, they can also lead to scalability
issues. Too many threads can overwhelm the system, causing context-switching
overhead and reduced performance.
• Security Concerns: Threads sharing the same memory space pose security risks, as
one thread can potentially access the data of another thread. Ensuring secure data
handling and access control is crucial.
3. Load Balancing: Distributing workloads evenly across threads and nodes prevents
bottlenecks and ensures optimal resource utilization. Load balancing algorithms
dynamically allocate tasks based on current load and system capacity.
4. Resource Allocation: Allocating CPU time, memory, and other resources effectively to
threads prevents contention and ensures fair usage. Mechanisms like priority
scheduling and quotas help manage resource distribution.
7. Monitoring and Debugging: Tools for monitoring thread activity and debugging
issues are vital. Profiling tools, logging, and visualization can help identify
performance bottlenecks and concurrency issues.
Synchronization Techniques
o Locks: Ensure that only one thread can access a resource at a time.
Distributed locks can be implemented using coordination services like
Zookeeper.
o Mutexes: A mutual exclusion object that allows only one thread to hold the
lock at a time, ensuring serialized access to resources.
• Semaphores:
o Counters that control how many threads can access a resource or a pool of resources at the same time.
• Barriers:
o Synchronization points at which each thread waits until all participating threads have arrived before any of them proceeds.
• Condition Variables:
o Used to block a thread until a particular condition is met. They are usually used in conjunction with mutexes to avoid race conditions (a small sketch using a mutex and a condition variable follows this list).
• Monitors:
o High-level constructs that combine a mutex with condition variables, encapsulating shared data together with the operations on it.
• Consensus Algorithms:
o Protocols like Paxos or Raft ensure that multiple nodes agree on a single value
or course of action, providing consistency in the face of network partitions
and failures.
• Quorum-Based Techniques:
o Operations are considered successful only when acknowledged by a minimum number (quorum) of nodes, balancing consistency and availability.
• Token Passing:
o A token circulates among nodes, and only the node holding the token can
perform certain operations, ensuring mutual exclusion without requiring
locks.
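For a concrete single-node illustration of mutexes and condition variables, here is a small producer/consumer sketch using Python's threading module; in a distributed setting the local lock would be replaced by a coordination service such as ZooKeeper, as noted above.

```python
# Single-node sketch: a shared buffer protected by a mutex, with a condition
# variable used to block the consumer until data is available.
import threading

buffer = []
lock = threading.Lock()
not_empty = threading.Condition(lock)

def producer():
    for item in range(3):
        with not_empty:                  # acquire the underlying mutex
            buffer.append(item)
            not_empty.notify()           # wake a waiting consumer

def consumer():
    consumed = 0
    while consumed < 3:
        with not_empty:
            while not buffer:            # re-check the condition after each wakeup
                not_empty.wait()         # releases the mutex while waiting
            print("got", buffer.pop(0))
            consumed += 1

threads = [threading.Thread(target=consumer), threading.Thread(target=producer)]
for t in threads: t.start()
for t in threads: t.join()
```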
Communication and coordination between threads in distributed systems are crucial for
ensuring that tasks are performed efficiently and correctly. Here are the primary methods
and techniques used for thread communication and coordination in such environments:
Communication Mechanisms
• Message Passing:
o Threads on different nodes exchange data by sending and receiving messages, for example over sockets or message queues.
o RPCs allow threads to invoke methods on remote nodes as if they were local.
Frameworks like gRPC, Apache Thrift, and CORBA support RPC
communication by handling the complexities of network communication and
serialization.
• Shared Memory:
o Threads within the same process or node communicate through shared memory regions; across nodes, distributed shared memory abstractions can provide a similar model.
Coordination Techniques
• Consensus Algorithms:
o Algorithms like Paxos and Raft are used to achieve agreement among
distributed nodes. These protocols ensure that nodes agree on a single value
or state, which is critical for maintaining consistency.
• Leader Election:
o A single node is elected to coordinate an activity on behalf of the others; if the leader fails, a new leader is elected.
• Quorum-Based Coordination:
o Decisions take effect only when a quorum (for example, a majority) of nodes agrees, which keeps the system consistent even if some nodes are unavailable.
• Event Coordination:
o Systems like Apache Kafka use a publish-subscribe model where threads can
publish events to a topic, and other threads can subscribe to these topics to
receive notifications. This allows for decoupled and scalable event-driven
coordination.
Fault tolerance and resilience are crucial for ensuring that threads in distributed systems can
continue operating correctly despite failures. Here are key strategies and techniques used to
achieve fault tolerance and resilience:
• Replication: Data replication stores copies of data across multiple nodes, ensuring that if one node fails, the data can still be accessed from another node.
• Task Replication: Running the same task on multiple nodes allows the system to
continue functioning if one node fails. Results from multiple nodes can be compared
or merged to ensure correctness.
Resilience Strategies
• Load Balancing: Distributing workloads evenly across nodes and threads to prevent
overloading any single component. This helps in managing failures by ensuring that
no single node becomes a bottleneck or point of failure.
1. Load Balancing
• Dynamic Load Balancing: Distribute tasks dynamically across nodes and threads
based on current load. This helps prevent any single node from becoming a
bottleneck. Use load balancers that can adjust to changing workloads in real-time,
ensuring even distribution of tasks.
• Task Partitioning: Divide tasks into smaller, manageable units that can be distributed
across multiple threads and nodes. Ensure that tasks are independent to avoid
excessive synchronization overhead.
2. Resource Management
• Thread Pools: Use thread pools to manage a fixed number of threads that are reused for executing tasks; this reduces the overhead of creating and destroying threads. Adjust the size of thread pools based on system load and resource availability to optimize performance (a short sketch follows).
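A thread pool of the kind described above is available in Python's standard concurrent.futures module; the task function below is just a placeholder.

```python
# Sketch: reuse a fixed pool of threads for many small tasks instead of creating
# one thread per task; pool size would be tuned to the system's load in practice.
from concurrent.futures import ThreadPoolExecutor

def handle_request(request_id):
    return f"handled {request_id}"          # stand-in for real work

with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(handle_request, range(10)))

print(results[:3])
```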
3. Concurrency Control
4. Communication Efficiency
• Efficient Messaging: Use efficient messaging protocols and libraries that minimize
latency and overhead for inter-thread communication. Asynchronous messaging can
help decouple threads and improve scalability. Implement batching and aggregation
techniques to reduce the frequency and size of messages.
Consensus Algorithms
What are Consensus Algorithms?
Consensus algorithms in distributed systems are protocols that enable multiple computers
or nodes within a network to agree on a single data value or decision,
ensuring consistency and reliability across the system despite potential failures or malicious
behavior of some nodes. These algorithms are foundational for maintaining data integrity
and synchrony, especially in environments where nodes operate independently and may
experience different states or updates.
• Key examples include Paxos and Raft, which ensure that all non-faulty nodes
agree on the same value through a series of proposals and acceptances, and
Practical Byzantine Fault Tolerance (PBFT), which can handle malicious nodes
by requiring a majority consensus from honest nodes.
• Data Consistency:
o Consensus algorithms ensure that all nodes agree on the same data
values or state, providing a consistent view of the system.
• Fault Tolerance:
o Distributed systems must be resilient to failures, whether they are due
to hardware malfunctions, network issues, or software bugs.
• Scalability:
o Consensus must be reached efficiently even as the number of nodes grows, since the communication overhead of agreement can limit throughput.
Types of Consensus Algorithms
1. Crash Fault Tolerance (CFT) Algorithms:
• Paxos: A family of protocols that achieve consensus despite network delays, node
failures, and message losses. Paxos is known for its robustness but is often
considered complex to understand and implement.
• Raft: Designed to be more understandable and easier to implement than Paxos, Raft
achieves consensus by electing a leader that manages the replication of log entries to
other nodes.
2. Byzantine Fault Tolerance (BFT) Algorithms:
• Practical Byzantine Fault Tolerance (PBFT): Handles Byzantine failures, where nodes
can act arbitrarily or maliciously. PBFT requires a supermajority of honest nodes to
reach consensus and is used in systems requiring high security.
• Tendermint: A BFT consensus algorithm designed for blockchain networks,
combining fast finality with high throughput. It uses a combination of voting rounds
and is optimized for performance and security.
3. Proof-Based Algorithms:
• Proof of Work (PoW): Used in Bitcoin and other cryptocurrencies, PoW requires
nodes (miners) to solve complex cryptographic puzzles to validate transactions and
add new blocks to the blockchain. It is energy-intensive but provides robust security.
• Proof of Stake (PoS): Validators are chosen based on the number of tokens they hold
and are willing to "stake" as collateral. PoS is more energy-efficient than PoW and is
used in cryptocurrencies like Ethereum 2.0.
• Delegated Proof of Stake (DPoS): Token holders vote for a small number of delegates
to validate transactions and create blocks. DPoS aims to achieve faster consensus and
is used in platforms like EOS and TRON.
4. Leader-Based Algorithms:
• Viewstamped Replication (VR): Similar to Paxos and Raft, VR involves a primary node
(leader) that coordinates the replication of logs to backup nodes. If the leader fails, a
new leader is elected to continue operations.
5. Voting-Based Algorithms:
• Federated Byzantine Agreement (FBA): Used in systems like Stellar, FBA allows each
node to choose its quorum slices, leading to decentralized consensus formation.
Nodes reach agreement through overlapping quorums, ensuring security and
scalability.
Here are some of the most popular consensus algorithms in distributed systems, each with
its unique features and applications:
1. Paxos
Paxos is a family of protocols developed by Leslie Lamport for achieving consensus in
distributed systems despite network delays, node failures, and message losses. Paxos
ensures that all nodes agree on a single value even if some nodes fail.
• The protocol involves proposers, acceptors, and learners. Proposers suggest values,
acceptors agree on a value, and learners learn the agreed value.
• Paxos operates in two main phases: the prepare phase, where proposers seek
agreement from a majority of acceptors, and the accept phase, where they finalize
the agreement. Although Paxos is robust, it is often considered complex to
understand and implement.
• It is used in systems like Google’s Chubby, Microsoft’s Azure Storage, and Yahoo’s
ZooKeeper.
2. Raft
Raft is a consensus algorithm designed to be more understandable and easier to implement than Paxos. It achieves consensus by electing a leader that manages the replication of log entries to the other nodes.
• The leader receives log entries from clients and replicates them to follower nodes,
ensuring all nodes have the same log entries.
• If the leader fails, a new leader is elected. Raft’s design focuses on simplicity and has
been widely adopted in systems like etcd, Consul, and CockroachDB.
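The heart of Raft's leader election, winning votes from a majority of nodes for a given term, can be caricatured in a few lines. This is a heavily simplified, single-process sketch with invented names, not an implementation of the Raft protocol.

```python
# Heavily simplified sketch of majority voting in a Raft-like leader election.
class Node:
    def __init__(self, node_id):
        self.node_id = node_id
        self.current_term = 0
        self.voted_for = None

    def request_vote(self, term, candidate_id):
        if term > self.current_term:               # newer term: forget any old vote
            self.current_term, self.voted_for = term, None
        if term == self.current_term and self.voted_for in (None, candidate_id):
            self.voted_for = candidate_id          # grant at most one vote per term
            return True
        return False

nodes = [Node(i) for i in range(5)]
candidate = nodes[0]
term = candidate.current_term + 1
votes = sum(n.request_vote(term, candidate.node_id) for n in nodes)
print("leader elected" if votes > len(nodes) // 2 else "no majority")
```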
3. Practical Byzantine Fault Tolerance (PBFT)
PBFT is designed to handle Byzantine faults, where nodes may fail or act maliciously. It
ensures consensus as long as less than one-third of the nodes are faulty. PBFT operates in
three phases: pre-prepare, prepare, and commit. In the pre-prepare phase, the leader
proposes a value.
• In the prepare phase, nodes exchange messages to agree on the proposal. In the
commit phase, nodes commit the proposal once a supermajority consensus is
reached.
• PBFT is used in high-security applications like Hyperledger Fabric and Zilliqa due to its
ability to handle arbitrary failures.
4. Proof of Work (PoW)
PoW, used in Bitcoin and other cryptocurrencies, requires nodes (miners) to solve complex cryptographic puzzles in order to validate transactions and add new blocks to the blockchain.
• PoW’s security relies on the computational effort required, making it difficult for
attackers to alter the blockchain.
• However, PoW is criticized for its high energy consumption. Despite this, PoW
remains a cornerstone of many cryptocurrencies due to its robust security.
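The PoW puzzle, finding a nonce whose block hash has a required number of leading zeros, can be shown directly; the difficulty below is kept tiny so the loop finishes quickly, and the block data is a placeholder string.

```python
# Sketch of the Proof-of-Work puzzle: find a nonce so that the block hash starts
# with a required number of leading zeros (difficulty kept tiny here).
import hashlib

def mine(block_data, difficulty=4):
    prefix = "0" * difficulty
    nonce = 0
    while True:
        digest = hashlib.sha256(f"{block_data}{nonce}".encode()).hexdigest()
        if digest.startswith(prefix):        # puzzle solved
            return nonce, digest
        nonce += 1

nonce, digest = mine("block with some transactions")
print(nonce, digest[:12])
```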
5. Proof of Stake (PoS)
PoS is a more energy-efficient consensus algorithm where validators are chosen based on
the number of tokens they hold and are willing to stake as collateral. Validators create and
propose new blocks, and their stake incentivizes them to act honestly.
• If they validate malicious transactions, they risk losing their staked tokens.
• PoS is used in various cryptocurrencies, including Ethereum 2.0 and Cardano, due to
its lower energy requirements compared to PoW while maintaining security.
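Stake-weighted selection of the next block proposer can be sketched with a weighted random choice; the validator names and stakes below are made up for the example.

```python
# Sketch: choose the next block proposer with probability proportional to stake.
import random

stakes = {"validator_a": 50, "validator_b": 30, "validator_c": 20}   # made-up stakes

def pick_proposer(stakes):
    validators = list(stakes)
    weights = [stakes[v] for v in validators]
    return random.choices(validators, weights=weights, k=1)[0]

print(pick_proposer(stakes))   # validator_a is picked about half the time
```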
Comparison of popular consensus algorithms:
Paxos
o Description: Achieves consensus despite network delays and node failures.
o Fault Tolerance: Crash Fault Tolerant (CFT)
o Use Cases: Google’s Chubby, Microsoft’s Azure
o Benefits: Robust and proven; high fault tolerance
o Challenges: Complex to understand and implement
Raft
o Description: Leader-based log replication for consensus.
o Fault Tolerance: Crash Fault Tolerant (CFT)
o Use Cases: etcd, Consul, CockroachDB
o Benefits: Easier to understand and implement than Paxos
o Challenges: Leader election can cause delays
Proof of Stake (PoS)
o Description: Validators are chosen based on stake to propose new blocks.
o Fault Tolerance: Byzantine Fault Tolerant (BFT)
o Use Cases: Ethereum 2.0, Cardano
o Benefits: Energy efficient; scalable
o Challenges: Wealth concentration; potential centralization
1. Fault Tolerance
Fault tolerance is the ability of a system to continue operating correctly even when some of
its components fail. In distributed systems, failures can include node crashes, network
partitions, and even malicious behavior. Consensus algorithms must be designed to handle
these failures gracefully.
• Crash Fault Tolerance (CFT): Algorithms like Paxos and Raft are designed to handle
node crashes and recover without data loss.
• Byzantine Fault Tolerance (BFT): Algorithms like PBFT and Tendermint are designed
to handle arbitrary failures, including malicious behavior, which is more complex and
resource-intensive.
2. Scalability
As the number of nodes in the system grows, the communication and coordination needed to reach consensus increases, so the algorithm must continue to perform without excessive overhead.
3. Security
Security is crucial to protect the integrity and confidentiality of data in distributed systems.
Consensus algorithms must be robust against various attacks, including Sybil attacks, double-
spending, and Denial-of-Service (DoS) attacks.
• Sybil Attacks: Attackers create multiple fake identities to gain influence over the
network. PoW and PoS address this by requiring computational work or stake,
respectively, making it costly to mount such attacks.
• Double-Spending: Ensuring that a digital currency cannot be spent more than once is
critical in blockchain systems, requiring mechanisms to detect and prevent double-
spending.
4. Synchronization
Synchronization ensures that all nodes in the distributed system have a consistent view of
the state and agree on the same data.
• Network Latency: Variations in network latency can cause delays in message delivery,
leading to nodes having different views of the system state.
5. Configuration Management
Choosing the right consensus algorithm for a distributed system depends on various factors
specific to the system's requirements, environment, and constraints. Here's a detailed guide
to help you make an informed decision:
• Fault Tolerance: Assess the types of faults your system must handle (e.g., crash
faults, Byzantine faults). PBFT and Tendermint handle Byzantine faults, while Paxos
and Raft handle crash faults.
• Scalability: Determine the number of nodes your system will need to support.
Algorithms like PoS and DPoS are more scalable than PoW and PBFT.
• Attack Resistance: Identify potential security threats (e.g., Sybil attacks, double-
spending, DoS attacks). PoW is robust against Sybil attacks, while PoS and PBFT
provide different security guarantees.
• Energy Efficiency: Consider the environmental impact and operational costs. PoS and
DPoS are more energy-efficient compared to PoW.
• Step 1: Define System Requirements: List down the specific needs regarding
throughput, latency, fault tolerance, and security.
• Step 5: Make an Informed Decision: Choose the algorithm that best fits your
requirements, supported by test results and thorough evaluation.