[go: up one dir, main page]

0% found this document useful (0 votes)
22 views71 pages

Aks Replication Control

The document discusses replication control in distributed computing, focusing on how to manage operations across multiple servers. It highlights the benefits of replication, such as fault tolerance and load balancing, and explains the challenges of maintaining replication transparency and consistency. Additionally, it covers transaction management, including atomic commitment protocols and the two-phase commit process to ensure all servers either commit or abort transactions collectively.

Uploaded by

rushabhvekariya8
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
22 views71 pages

Aks Replication Control

The document discusses replication control in distributed computing, focusing on how to manage operations across multiple servers. It highlights the benefits of replication, such as fault tolerance and load balancing, and explains the challenges of maintaining replication transparency and consistency. Additionally, it covers transaction management, including atomic commitment protocols and the two-phase commit process to ensure all servers either commit or abort transactions collectively.

Uploaded by

rushabhvekariya8
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 71

Distributed Computing

Replication Control
A K Singh
Computer Engineering
N I T Kurukshetra
Slides adapted from Indy@UIUC © IG
Server-side Focus

• Concurrency Control = how to coordinate


multiple concurrent clients executing operations
(or transactions) with a server
Next:
• Replication Control = how to handle operations
(or transactions) when the objects are stored at
multiple servers, with or without replication

2
Replication: What and Why

• Replication = An object has identical copies,


each maintained by a separate server
– Copies are called “replicas”
• Why replication?
– Fault-tolerance: With k replicas of each object, can
tolerate failure of any (k-1) servers in the system
– Load balancing: Spread read/write operations out over
the k replicas => load lowered by a factor of k compared
to a single replica
– Replication => Higher Availability

3
Rollback Recovery vs. Rollforward
Recovery
Rollback to
Recovery
Last Checkpoint
Comleted

Checkpoint Failed Failure Restarted Replay Handle


Detected logged new
messages messages

Repair interval Recovery


Comleted message  request
Rollforward
to Checkpoint
R1
Failed Failure Restarted Handle new
Detected messages
R2
Take a
Checkpoint

R3
4
Availability

• If each of k servers has an independent


probability p of failing or becoming unreachable
– Server’s failure probability
• With no replication, availability of object =
= Probability that single copy is up
= (1 – p)
• With k replicas, availability of object =
Probability that at least one replicas is up
= 1 – Probability that all replicas are down
= (1 – p k)
5
Nines Availability

• With no replication, availability of object =


= (1 – p)
• With k replicas, availability of object =
= (1 – p k)
Availability Table
p = failure No replication k=3 replicas k=5 replicas
probability
0.1 90% 99.9% 99.999%
0.05 95% 99.9875% 6 Nines
0.01 99% 99.9999% 10 Nines 6
What’s the Catch?

• Challenge is to maintain two properties


1. Replication Transparency
– A client ought not to be aware of multiple copies of
objects existing on the server side
2. Replication Consistency
– All clients see single consistent copy of data, in spite
of replication
– For transactions, guarantee ACID

7
Replication Transparency
Replicas of an
Front ends Replica 1 object O
provide replication
transparency
Client Front End
Replica 2
Client
Front End
Client Replica 3

Requests
(replies flow opposite) 8
Replication Consistency

• Two ways to forward updates from front-ends State machine replication:


(FEs) to replica group Each replica is modeled as
a state machine =>
– Passive Replication: uses a primary replica (leader or
sequential & deterministic
previously a.k.a. “master”) state changes
– Active Replication: treats all replicas identically •State variables
•Interfaces accessible by
• Both approaches use the concept of “Replicated the clients that operate on
State Machines” state variables
•Deterministic state change
– Each replica’s code runs the same state machine
via interface
– Multiple copies of the same State Machine begun in the
Start state, and receiving the same Inputs in the same
order will arrive at the same State having generated the
same Outputs. [Schneider 1990] 9
Passive Replication

• Leader => total ordering of all updates


Replica 1 • On leader failure, run election

Client Front End Leader (elected leader)


Replica 2
Client
Front End
Client Replica 3

Requests
(replies flow opposite) 10
Active Replication

Multicast
Front ends Replica 1 inside
provide replication Replica group
transparency
Client Front End
Replica 2
Client
Front End
Client Replica 3

Requests
(replies flow opposite) 11
Active Replication Using Concepts
You’ve Learnt earlier
• Can use any flavor of multicast ordering,
depending on application
– FIFO ordering
– Causal ordering
– Total ordering
– Hybrid ordering
• Total or Hybrid (*-Total) ordering + Replicated
State machines approach
– => all replicas reflect the same sequence of updates to
the object
12
Active Replication Using Concepts
You’ve Learnt earlier (2)
• What about failures?
– Use virtual synchrony (i.e., view synchrony)
• Virtual synchrony with total ordering for
multicasts =>
– All replicas see all failures/joins/leaves and all
multicasts in the same order
– Could also use causal (or even FIFO) ordering if
application can tolerate it

13
Transactions and Replication
• One-copy serializability
– A concurrent execution of transactions in a replicated database
is one-copy-serializable if it is equivalent to a serial execution of
these transactions over a single logical copy of the database
– (Or) The effect of transactions performed by clients on
replicated objects should be the same as if they had been
performed one at a time on a single set of objects (i.e., one
replica per object)
• In a non-replicated system, transactions appear to
be performed one at a time in some order
– Correctness means serial equivalence of transactions
• When objects are replicated, transaction systems
for correctness need one-copy serializability 14
Next

• Committing transactions with distributed servers

15
Transactions with Distributed
Servers
Server 1
Transaction T Object A
write(A,1);
write(B,2); Object B
… .
write(Y, 25); .
write(Z, 26); .
commit
Server 13
Object Y

Object Z
16
Transactions With Distributed
Servers

• Transaction T may touch objects that reside on


different servers
• When T tries to commit, need to ensure
– Either all these servers commit their updates from T =>
T will commit
– Or none of these servers commit => T will abort
• What problem is this?

17
Transactions With Distributed
Servers

• Transaction T may touch objects that reside on


different servers
• When T tries to commit
– Need to ensure all these servers commit their updates
from T => T will commit
– Or none of these servers commit => T will abort
• What problem is this?
– Consensus!
– (It’s also called the “Atomic Commit problem”)

18
Atomic Commitment Protocol

• An atomic commitment protocol (ACP)


is an algorithm for the coordinator and
participants such that either the
coordinator and all participants commit
the transaction or they all abort it
• Each process may cast exactly one of
two votes: Yes or No, and can reach
exactly one of two decisions: Commit or
19
Abort
Atomic Commitment Protocol (2)

Two important consequence of


• An ACP is an algorithm for processes to reach decisions such AC3
that: • Each process can unilaterally
– AC1: All processes that reach a decision reach the same one (consistent) decide Abort at any time, if it has
– AC2: A process can’t reverse its decision after it has reached one (irrevocable) not yet voted Yes
– AC3: The Commit decision can only be reached if all processes voted Yes •After voting Yes a process can’t
– AC4: If there are no failures and all processes voted Yes, then the decision take unilateral action
will be to Commit • The period between the moment a
process votes Yes and the moment it
– AC5: Consider any execution containing only failures that the algorithm is
has received sufficient information
designed to tolerate. At any point in this execution, if all existing failures
to know what the decision will be is
are repaired and no new failures occur for sufficiently long, then all
called the uncertainty period for that
processes will eventually reach a decision
process
• A process is called uncertain while it
is in its uncertainty period

20
Atomic Commitment Protocol (3)

• Scenario I: A failure disables communication between a


process p and all other processes, while p is uncertain. By the
definition of uncertainty period, p can’t reach a decision until
after the communication failure has been repaired
• Scenario II: p fails while in its uncertainty period. When p
recovers, it can’t reach a decision on its own unless it
communicates with other processes to find out what the
decision was

21
Atomic Commitment Protocol (4)

• Independent recovery: The ability of a recovering process


to reach a decision without communicating with other
processes
• This ability is very attractive, because
– It makes recovery cheaper and simpler
– The lack of independent recovery in conjunction with total failures (when
all processes fail) gives rise to blocking
• Example
– Say p, in Scenario II, is the first process to recover from a total failure
– Since p is uncertain, it must communicate with other processes before it
can reach a decision
– However, it can’t communicate with them, since they all are down
– Thus, p is blocked
22
Atomic Commitment Protocol (5)

• Failures while a process is uncertain  serious problems


• Can we design an ACP that eliminates uncertainty periods?
– Unfortunately, not
• Doing so would essentially require that a process cast its vote
and learn the votes of all other processes all at once
– In general, this is impossible
• Thus, we have the following important observations:
• Proposition 1: If communication failures or total failures are
possible, then every ACP may cause processes to become
blocked
– However, a non-blocking ACP may exist if only site failures occur
• Proposition 2: No ACP can guarantee independent recovery 23
of failed processes
One-phase Commit

Coordinator Server 1 1-phase atomic commit


Transaction T Server Object A protocol:
The coordinator
write(A,1); communicates the commit
. Object B
write(B,2); or abort request to all of the
.
… . .
participants in the
transaction and keep on
write(Y, 25); . repeating the request until
write(Z, 26); . all of them have
commit acknowledged that they
Server 13 have carried it out
• Special server called “Coordinator” Object Y
initiates atomic commit
• Tells other servers to either
Object Z
commit or abort 24
One-phase Commit: Issues

Reasons that prevent a server from


• One-phase atomic commit protocol is being able to commit its part of a
transaction (i) If locking is in use,
inadequate, though, because it does not allow a the resolution of a deadlock can lead
server to make a unilateral decision to abort, in to the aborting of a transaction
without the client being aware (ii) If
case it can’t commit, a transaction when the optimistic concurrency control is in
client requests a commit use, the failure of validation at a
– Server with object has no say in whether transaction server would cause it to decide to
abort the transaction (iii) The
commits or aborts
coordinator may not know if a
• If object corrupted, it just can’t commit (while other servers have server has crashed and been
committed)
replaced during the progress of a
– Server may crash, before receiving commit message, distributed transaction – such a
with some updates still in memory server will need to abort the
transaction 25
Two-phase Commit
In the 2nd phase of the protocol, every
• The two-phase commit protocol is designed to allow any participant in the transaction carries out
the joint decision
participant to abort its part of a transaction •If any one participant votes to abort, then the
– Due to the requirement for atomicity, if one part of a transaction is aborted, decision must be to abort the transaction. If all
then the whole transaction must be aborted the participants vote to commit, then the
• In the 1st phase of the protocol, each participant votes for the decision is to commit the transaction
Problem: To ensure that all of the
transaction to be committed or aborted participants vote and that they all reach
– Once a participant has voted to commit, it is not allowed to abort
the same decision
– Therefore, before a participant votes to commit a transaction, it must •Simple if no errors occur, but the protocol
ensure that it will eventually be able to carry out its part of the commit must work correctly even when some of the
protocol, even if it fails and is replaced in the interim servers fail, messages are lost or servers are
– Such participant in a transaction is said to be in a prepared state temporarily unable to communicate with one
another
• To make sure of this, each participant saves, in permanent
storage, all of the objects that it has altered in the transaction,
together with its status – prepared 26
Two-phase Commit
Coordinator

Server Server 1 Server 13
Prepare

27
Two-phase Commit
Coordinator

Server Server 1 Server 13
Prepare

• Save updates to disk


• Respond with “Yes” or “No”
Two-phase Commit
Coordinator

Server Server 1 Server 13
Prepare

• Save updates to disk


• Respond with “Yes” or “No”
If any
“No” vote Abort
or timeout
before all
(13) votes
Two-phase Commit
Coordinator

Server Server 1 Server 13
Prepare

• Save updates to disk


• Respond with “Yes” or “No”
All (13)
“Yes” Commit
votes
received
within
timeout?
Two-phase Commit
Coordinator

Server Server 1 Server 13
Prepare

• Save updates to disk


• Respond with “Yes” or “No”
All (13)
“Yes” Commit
votes • Wait! Can’t commit or abort
received before receiving next message!
within
timeout?
Two-phase Commit
Coordinator

Server Server 1 Server 13
Prepare

• Save updates to disk


• Respond with “Yes” or “No”
All (13)
“Yes” Commit
votes • Commit updates from disk
received to store
within
OK
timeout?
Two-phase Commit: An Example

• e-commerce website handling an order


transaction
– When a customer places an order, the e-commerce
system needs to check the inventory (whether items are
in stock), payment processing (funds availability),
and delivery system (delivery partner availability)
– The coordinator (order system) sends a Prepare request
to each component
– If all components confirm they are ready, the
coordinator proceeds with Commit
– If any component indicates an issue (e.g., insufficient
stock), the coordinator aborts the transaction to maintain 33
consistency
Failures in Two-phase Commit

• If server voted Yes, it can’t commit unilaterally before


receiving Commit message
• If server voted No, can abort right away (why?)
• To deal with server crashes
– Each server saves tentative updates into permanent storage, right
before replying Yes/No in 1st phase. Retrievable after crash
recovery
• To deal with coordinator crashes
– Coordinator logs all decisions and received/sent messages on
disk
– After recovery or new election => new coordinator takes over
34
Failures in Two-phase Commit (2)

• To deal with Prepare message loss


– The server may decide to abort unilaterally after a timeout for 1st
phase (server will vote No, and so coordinator will also
eventually abort)
• To deal with Yes/No message loss
– The coordinator aborts the transaction after a timeout
(pessimistic!). It must announce Abort message to all
• To deal with Commit/Abort message loss
– Server can poll coordinator (repeatedly)

35
Failure Model for Commit
protocols
• Commit protocols are designed to work in an asynchronous
system in which servers may crash, msgs may be lost, and no
Byzantine faults
• The 2-phase commit is a protocol for reaching consensus
– FLP – consensus can’t be reached in an asynchronous system if processes
sometimes fail
• However, the 2-phase commit protocol does reach consensus
under those conditions
– Because crash failures of processes are masked by replacing a crashed
process with a new process whose state is set from information saved in
permanent storage and information held by other processes

36
Using Paxos in Distributed
Servers
Atomic Commit set of multiple operations to be executed as a single operation
• Can instead use Paxos to decide whether to
commit a transaction or not
• But need to ensure that if any server votes No,
everyone aborts
Ordering updates
• Paxos can also be used by replica group (for an
object) to order all updates – iteratively do:
– Server proposes message for next sequence number
– Group reaches consensus (or not) 37
Two-phase Commit: Issues

• If the coordinator has failed,


• 2PC does not handle network failures very well the participant will not be able
to get the decision until the
• If the coordinator fails during the Commit Phase, some coordinator is replaced, which
participants may not receive the final decision can result in extensive delays
• It can result in blocking for participants in the
– A participant must await the repair of failures before proceeding uncertain state
– Participants wait for an arbitrarily long period of time, especially if the • Further, if all the active
coordinator fails to send a response participants are uncertain,
– Such a participant is uncertain of the outcome (commit/abort) and can’t delays will occur even if a
proceed any further until it gets the outcome of the vote from the cooperative protocol allows
coordinator participants to make requests
– The participant can’t decide unilaterally what to do next, and meanwhile to other participants
the transaction may remain unterminated, uselessly consuming resources • “How 3PC protocol alleviate
(e.g. holding locks), for arbitrarily long at a blocked process’s site
such delays”, we’ll see later
– Consequently, the objects used by its transaction can’t be released for use
by other transactions
38
In step (3), if the coordinator waiting for
YES or NO messages from all the

2PC: Steps and Timeouts participants timeouts, it can decide Abort


but must send ABORT to every
participant from which it received a YES

In step (2), if a participant waiting for a VOTE-REQ from the coordinator timeouts, it can simply
decide Abort and stop • In step (4), if a participant
1. The coordinator sends a VOTE-REQ (i.e., vote request) message to all voted Yes and waiting for a
participants COMMIT or ABORT from the
2. When a participant receives a VOTE-REQ, it responds by sending to the coordinator timeouts, it is
coordinator a message containing that participant’s vote: YES or NO. If uncertain; now, it can consult
the participant votes No, it decides Abort and stops other processes to find out
what to decide
3. The coordinator collects the vote messages from all participants, If all of
them were YES and the coordinator’s vote is also Yes, then the • An insight for termination
– Say, there are two participants p and q
coordinator decides Commit and sends COMMIT messages to all
– The coordinator might send a
participants; otherwise, the coordinator decides Abort and sends COMMIT or ABORT to q but fail
ABORT messages to all participants that voted Yes (those that voted No just before sending it to p. Thus, even
though p is uncertain, q is not. If p
already decided Abort in step (2)). In either case, the coordinator then can communicate with q, it can find
stops out the decision from q. It need not
block waiting for the coordinator’s
4. Each participant that voted Yes waits for a COMMIT or ABORT recovery
message from the coordinator. When it receives the message, it decides 39
accordingly and stops
2PC: The Cooperative Termination

• With this protocol, if p can


• A participant p, which times out while in its uncertainty communicate with some q for
which either (1) or (2) holds,
period, sends a DECISION-REQ message to every other then p can reach a decision
process q, to inquire whether q either knows the without blocking
decision or can unilaterally reach one. In this scenario, p • On the other hand, if (3) holds
is the initiator and q a responder in the termination for all processes with which p
protocol can communicate, then p is
blocked
• Three cases: • In summary, even though the
1. q has already decided Commit (or Abort): q simply sends a cooperative termination
COMMIT (or ABORT) to p, and p decides accordingly protocol reduces the
2. q has not voted yet: q can unilaterally decide Abort. It then sends probability of blocking, it does
an ABORT to p, and p therefore decides Abort not eliminate it
3. q has voted Yes but has not yet reached a decision: q is also 40
uncertain and therefore can’t help p reach a decision
2PC: Time Complexity
A round is the maximum time for a message to reach its destination
• This is independent of the
• In the absence of failures, 2PC requires three rounds:
number of failures! The catch
– (1) the coordinator broadcasts VOTE-REQS; (2) the participants reply with
is that some processes may be
their votes; and (3) the coordinator broadcasts the decision
blocked
• If failures happen, then the termination protocol may need • By definition, a blocked
two additional rounds: process may remain blocked
– (4) for a participant, which timed out, to send a DECISION-REQ, and (5) for an unbounded period of
for a process that receives that message and is outside its uncertainty period time
to reply
• Therefore, to get meaningful
• Several participants may independently invoke the results, we must exclude
termination protocol blocked processes from
– However, the two rounds of different invocations can overlap, so the consideration in measuring
combined effect of all invocations of the termination protocol is only two time complexity
rounds
– Thus, in the presence of failures it will take up to five rounds for all 41
processes that aren’t blocked or failed to reach a decision
2PC: Message Complexity

• Let n be the number of participants (so the total number of processes


is n + 1)
– In each round of 2PC, n messages are sent
– Thus, in the absence of failures, the protocol uses 3n messages
• The cooperative termination protocol is invoked by all participants
that voted Yes but didn’t receive COMMIT or ABORT from the
coordinator
• Let, there be m such uncertain participants, 0 ≤ m ≤ n
– Thus, m processes will initiate the termination protocol, each sending n
DECISION-REQ messages
– At most (n - m) + 1 processes (the maximum that might not be in their
uncertainty period) will respond to the 1st DECISION-REQ message
• As a result of these responses, one more process may move outside its
uncertainty period and thus respond to the DECISION-REQ message
42
of another initiator of the termination protocol
2PC: Message Complexity (2)

43
2PC: Lessons Learned

• 3PC is a protocol that satisfies


• In 2PC, if all operational processes are uncertain, they are NB
blocked • The idea is simple
• They can’t decide Abort, even if they know that processes • Consider why 2PC violates
they can’t communicate with have failed, because some NB
failed process could have decided Commit before failing • The coordinator sends
• Suppose we’ve managed to design an ACP with the COMMITS to the participants
while the latter are uncertain
following “non-blocking property”:
• Thus, if participant p receives
• NB: If any operational process is uncertain then no process a COMMIT before participant
(whether operational or failed) can have decided to Commit q, the former will decide
– If the operational sites discover they are all uncertain, they can decide Commit while the latter is still
Abort, safe in their knowledge that the failed processes had not decided uncertain
Commit
• How 3PC avoids this?
– When the failed processes recover they can be told to decide Abort too 44
• This way blocking can be prevented
Three-phase Commit: Key Idea

• After the coordinator has found that all votes were Yes, it sends PRE- • If a process votes No, then
COMMIT messages to the participants 3PC behaves just like 2PC
• When a participant p receives that message, it knows that al1 • The coordinator sends ABORT
processes voted Yes and is thereby moved outside its uncertainty to all processes
period
– However, p does not decide Commit yet
• At this point, p knows that it will decide Commit provided it does not
fail
• Each participant acknowledges the receipt of PRE-COMMIT
• When the coordinator has received all the acknowledgments to PRE-
COMMITs, it knows that no participant is uncertain anymore
• It then sends COMMIT to all participants
• When a participant receives a COMMIT it can decide Commit
45
• This decision satisfies NB since no process is uncertain any longer
Three-phase Commit: Assumptions

• The first version of 3PC is designed to handle only (non-


total) site failures
– Consequently, we assume that communication failures do not happen
• Although, site failures may cause communication failures as
a side effect (Fig.)
• Two major implications of assuming only site failures
• if sites A and C fail, then
1. All operational processes can communicate with each other
sites B and E can’t
2. A process that times out waiting for a message from process q knows that q
is down and therefore that no processing can be taking place there; in communicate even though
particular, no other process can be communicating with q both are operational
• Neither of these statements is true if communication failures
are possible
46
Three-phase Commit: Steps

1. The coordinator sends a VOTE-REQ to all participants


2. When a participant receives a VOTE-REQ, it responds with a YES or NO message, depending on
its vote. If a participant sends NO, it decides Abort and stops
3. The coordinator collects the vote messages from all participants. If any of them was NO or if the
coordinator voted No, then the coordinator decides Abort, sends ABORT to all participants that
voted Yes, and stops. Otherwise, the coordinator sends PRE-COMMIT messages to all participants
4. A participant that voted Yes waits for a PRE-COMMIT or ABORT message from the coordinator
I. If participant receives an ABORT, it decides Abort and stops
II. If participant receives a PRE-COMMIT, it responds with an ACK (i.e., acknowledgment) message to the coordinator
5. The coordinator collects the ACKs. When they have all been received, it decides Commit, sends
COMMIT to all participants, and stops
6. A participant waits for a COMMIT from the coordinator. When it receives that message,
47it decides
Commit and stops
3PC: Timeout Actions

• Cases (1) & (2) are handled


• On timeout, “what a process should do” depends exactly as in 2PC
on the message it was waiting for • In both cases the process that
times out knows that no
• There are five places in which a process waits for process can have decided
Commit
some message in 3PC
• Thus, it can unilaterally decide
1. In Step (2) participants wait for VOTE-REQ Abort
2. In step (3) the coordinator waits for the votes • In case (1) the participant can
3. In step (4) participants wait for a PRE-COMMIT or simply stop once it has
ABORT decided Abort
• In case (2) the coordinator
4. In step (5) the coordinator waits for ACKs
should also send ABORTs to
5. In Step (6) participants wait for COMMIT all participants that had voted
Yes 48
Timeout Actions (during W1 & W2)
Coordinator Participant p
no process can have decided
W1 Commit, decide Abort
no process can have decided
Commit, decide Abort W2

W3

W4

W5

49
3PC: Timeout Actions (2)

• The failed participants, when they • In case (4) the coordinator


• On timeout, “what a process recover,should
do”
are responsible
depends
that the decision was to Commit
for finding out times out because one or more
participants failed before
on the message it was •waitingWith this fortimeout action, processes sending an ACK

• might decide Commit while some The coordinator does not know
There are five places in which a process waits for whether these participants failed
failed participant is uncertain before or after receiving a PRE-
some message in 3PC • This will happen if some participant COMMIT

1. In Step (2) participants wait


thatfor
did VOTE-REQ
not send an ACK had actually • But it does know that these
participants had voted Yes and
2. In step (3) the coordinatorfailed
waitsbefore
for theeven receiving the PRE-
votes were therefore prepared to
COMMIT from the coordinator (refer
3. In step (4) participants wait
stepfor
4 ofa 3PC)
PRE-COMMIT or decide Commit
ABORT • This does not violate NB, which • Thus, it ignores the failures
4. In step (5) the coordinatorrequires
waits for onlyACKs
that while any and proceeds to send
operational process is uncertain (let failed COMMIT to the operational
5. In Step (6) participants wait for COMMIT participants as if it had
process be uncertain), no operational or
failed process has decided Commit received all ACKs50
Timeout Action (during W4)

Participant q Coordinator Participant p

W1
W2

W3

W3
q uncertain, but failed W4 p might decide Commit

W5

51
3PC: Timeout Actions (3)

• In case (3) and (5) processes


• On timeout, “what a process should do” depends can’t act autonomously in
response to the timeout
on the message it was waiting for
• They must communicate with
• There are five places in which a process waits for other processes to reach a
consistent decision
some message in 3PC
• It is clear why such
1. In Step (2) participants wait for VOTE-REQ communication is necessary in
2. In step (3) the coordinator waits for the votes case (3)
3. In step (4) participants wait for a PRE-COMMIT or • p is operational and has not
ABORT received any PRE-COMMIT
and is therefore uncertain
4. In step (5) the coordinator waits for ACKs
• The timeout of an operational
5. In Step (6) participants wait for COMMIT uncertain participant can’t be
52
handled unilaterally
Timeout Actions (during W3)
Coordinator Participant p

W1
W2

W3 p uncertain, but operational

W4

W5

53
3PC: Timeout Actions (4)

• But, in case (5), why does a


• On timeout, “what a process should do” depends timed out participant p need to
communicate with others?
on the message it was waiting for
• p is operational and has
• There are five places in which a process waits for already received a PRE-
COMMIT and is therefore not
some message in 3PC uncertain
1. In Step (2) participants wait for VOTE-REQ • After all, p knows that only
2. In step (3) the coordinator waits for the votes COMMIT could possibly
3. In step (4) participants wait for a PRE-COMMIT or arrive from the coordinator
ABORT • Why can’t p ignore the
timeout and simply decide
4. In step (5) the coordinator waits for ACKs Commit?
5. In Step (6) participants wait for COMMIT • By deciding Commit, p could
violate NB 54
Timeout Action (during W5)
Coordinator Participant p

W1
W2

W3

W4

W5 p not uncertain

55
3PC: Timeout Actions (5)

Example:
• On timeout, “what a process should do”• depends Say, the coordinator failed after having sent the
on the message it was waiting for PRE-COMMIT to p but before sending it to some
other participant q
• There are five places in which a process• waits Thus p for
will time out in case (5) – outside its
some message in 3PC uncertainty period, while q will time out in case (3)
– inside its uncertainty period
1. In Step (2) participants wait for VOTE-REQ
• If p, on timeout, were to decide Commit while q
2. In step (3) the coordinator waits for the votes (which is operational) is still uncertain, it would
3. In step (4) participants wait for a PRE-COMMIT violate
or NB
ABORT • This suggests that before deciding Commit, p
should make sure that all operational participants
4. In step (5) the coordinator waits for ACKs
have received a PRE-COMMIT, and have therefore
5. In Step (6) participants wait for COMMIT moved outside their uncertainty periods
56
Timeout Actions (during W3 & W5)
Participant q Coordinator Participant p

W1
W2

W3
q uncertain, but
operational W3
W4

p not uncertain, on timeout


W5 might decide Commit

57
3PC: Termination Protocol

• Let us define states of a process relative to the


messages it has sent or received Note:-
• Four possible states: 1. Any process is in precisely
one state at any time
1. Aborted: the process has not voted, has voted No, or has
received an ABORT (i.e., it has either decided Abort or 2. Some pairs of states can’t
coexist, i.e., can’t be occupied
can unilaterally decide so)
at the same time by two
2. Uncertain: the process is in its uncertainty period operational processes
3. Committable: the process has received a PRE-COMMIT
but has not yet received a COMMIT
4. Committed: the process has received a COMMIT (i.e., it
has decided Commit) 58
3PC: Termination Protocol (2)

• Fig. shows which states can


coexist and which can’t
• Two operational sites can
occupy two states s and s' at
the same time iff the entry
whose row corresponds to s
and whose column
corresponds to s' contains a
“Y” while an “N” means they
can’t
• e.g., as per NB, if any
operational site is “Uncertain”
then no site (operational or
failed) can be “Committed”
59
3PC: Termination Protocol (3)

• The first step of termination protocol is coordinator


election, which works as follows:
• When a participant times out in case (3) or (5), it
initiates an election protocol
– Election protocol involves all processes that are operational and
results in the “election” of a new coordinator
– The old one must have failed; otherwise, no participant would
have timed out!
• Election Rule: If the present coordinator fails, the
smallest of the operational processes will become the
new coordinator
60
3PC: Termination Protocol (4)

• Once the new coordinator has been elected, the


termination protocol proceeds as follows:
• The coordinator sends a STATE-REQ message to all
processes that participated in the election
– By our assumption about failures, all operational sites will
participate
• A participant in the termination protocol (i.e., any
operational process other than the new coordinator)
responds to this message by sending its present state to
the coordinator
• The coordinator collects these states and proceeds 61
according to the following termination rule:
3PC: Termination Protocol (5)

• TR1: If some process is Aborted, the coordinator decides Abort, sends


ABORT messages to all participants, and stops
• TR2: If some process is Committed, the coordinator decides Commit,
sends COMMIT messages to all participants, and stops
• TR3: If all processes that reported their state are Uncertain, the coordinator
decides Abort, sends ABORT messages to all participants, and stops
• TR4: If some process is Committable but none is Committed, the
coordinator first sends PRE-COMMIT messages to all processes that
reported Uncertain, and waits for acknowledgments from these processes.
After having received these acknowledgments the coordinator decides
Commit, sends COMMIT messages to all processes, and stops
• A participant that receives a COMMIT (or ABORT) message, decides 62
Commit (or Abort), and stops
3PC: Time Complexity

• Resiliency and Blocking: 3PC is resilient to site failures


only. It is non-blocking except for a total site failure. By
Propositions 1 & 2 this is the maximal level of fault tolerance • This may appear deceptively
a non-blocking ACP can attain worse than the five rounds
required, in the worst case, for
• Time Complexity: In the absence of failures, 3PC uses at 2PC (independent of the
most five rounds of messages: number of failures!)
– (1) to distribute VOTE-REQs (2) to deliver votes (3) to distribute PRE- • Nevertheless, recall that 2PC
COMMITs (4) to acknowledge the PRE-COMMITs (5) to distribute may cause blocking and the
COMMITs
five round bound concerned
– If the decision is Abort, only three rounds are needed
only non-blocked processes
• Each invocation of the termination protocol contributes at
most five more rounds, plus the election protocol, which
requires only one round to send UR-ELECTEDs 63
• Thus, if f processes fail, at most 6f + 5 rounds are needed
3PC: Message Complexity

= 3(f + 1)(2n - 1) - n

64
3PC and Communication Failures

• Up to this point we have assumed that no communication


failures occur
• Let us now remove this assumption. Unfortunately, the
termination protocol for 3PC, we’ve presented, may result in
processes reaching inconsistent decisions
• Example:
– Say, the processes are partitioned into two components A and B
– It is possible that, at the time the partition occurred, all processes in A are
Uncertain while all those in B are Committable (refer table)
– According to the termination protocol, the processes in A will decide Abort
(case TR3), while those in B will decide Commit (case TR4)
• We need new termination protocol that avoids such
inconsistencies and guarantees correctness even in the 65
presence of communication failures
3PC and Communication Failure (2)

• The overall structure of the termination protocol is as


before
– A coordinator is elected that collects the states of participants
and decides how to proceed on the basis of the states it has
received
• The termination protocol, in short:
– After being elected, a coordinator sends STATE-REQ messages
to all processes
– A process that receives this message from its coordinator
responds by sending its present state
– The coordinator waits for a period of time, collecting responses
– It then proceeds by using the Majority Termination Rule (self
study) 66
3PC and Communication Failure (3)

• Therefore, the majority termination rule guarantees


that a decision will be reached • The crucial property of
– Note:- It is not necessary for all failures to be repaired majorities, that is needed here,
is “any two majorities must
• In general, if there is a set of processes A that forms a intersect”
majority and such that every process in A can • A quorum is a generalization
communicate with the smallest process in A, then all of the concept of majority that
processes in A will reach a decision (i.e., will become also satisfies this intersection
property
unblocked)
• Note:- Certain actions can be taken by the coordinator
only if it can communicate with a majority of sites
67
Three-phase Commit: In Short

• Enhanced version of 2PC


• 3PC mitigates blocking
– Introduces an additional phase between Prepare and Commit, known as
the Pre-Commit phase, to ensure participants have enough time to respond, issues by adding a timeout
reducing the risk of indefinite blocking mechanism, allowing
• 1. Prepare Phase participants to autonomously
– Similar to 2PC, the coordinator sends a Prepare request to participants,
decide to commit or abort if
asking if they can commit the transaction; participants respond Yes or No they don’t receive further
messages from the
• 2. Pre-Commit Phase
coordinator
– If all participants respond Yes, the coordinator sends a Pre-
Commit message, indicating the transaction is likely to be committed; • 3PC reduces the likelihood of
participants acknowledge the Pre-Commit, preparing to commit the indefinite blocking,
transaction but not yet completing it particularly helpful in
• 3. Commit Phase situations with unreliable
– If all participants acknowledge the Pre-Commit, the coordinator sends networks
the Commit command; if any participant fails to acknowledge the Pre- 68
Commit, the transaction is Aborted
2PC vs. 3PC: Closing Remarks

• The 1st version of 3PC has the advantage over 2PC that it
completely eliminates blocking (except, unavoidably, in the
event of total failures)
– Useful for systems built to tolerate only site failures
• The 2nd version of 3PC can be used in systems designed to
tolerate both site and communication failures
– It does not completely eliminate blocking but causes blocking less
frequently than 2PC
• For instance, in 2PC processes may be blocked even if just
one process – the coordinator – fails
• In the 2nd version of 3PC no process will be blocked (in the
absence of communication failures), as long as a majority of
the processes are still operational 69
2PC vs. 3PC: Closing Remarks (2)

• In most practical applications, the circumstances under which


2PC causes blocking are sufficiently rare, and thus blocking
is usually not considered a big problem
• Secondly, 3PC has greater message and round complexity
than 2PC
• Consequently, almost all systems we know of that employ
atomic commitment protocols use some version of 2PC
• Even though, generally, 3PC is not used in practice, it is an
interesting protocol both in its own right and also because it
illustrates a number of important techniques used in the
design of fault-tolerant communication protocols
• For these reasons, we feel that its study is a worthwhile 70
endeavor
Summary

• Multiple servers in cloud


– Replication for Fault-tolerance
– Load balancing across objects
• Replication Flavors using concepts we learnt
earlier
– Active replication
– Passive replication
• Transactions and distributed servers
– Two phase commit
– Three phase commit
71

You might also like