Topic 4
Optimistic Locking
With multi-user systems it is quite common to be in a situation where collisions are
infrequent. Although the two of us are both working with Customer objects, you are working with
the Wayne Miller object while I work with the John Berg object, and therefore we will not collide.
When this is the case, optimistic locking becomes a viable concurrency control strategy. The idea
is that you accept that collisions occur infrequently, and instead of trying to prevent them you
simply choose to detect them and resolve them when they do occur.
The application reads the object into memory. To do this a read lock is obtained on the data, the
data is read into memory, and the lock is released. At this point the row(s) may be marked
to facilitate detection of a collision. The application then manipulates the object until the point that
it needs to be updated. The application then obtains a write lock on the data and reads the original
source back to determine whether there has been a collision. If there has been no collision, the
application updates the data and unlocks it. Had a collision been detected, i.e., the data had been
updated by another process after it was originally read into memory, then the collision would
need to be resolved.
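As a concrete illustration, the following is a minimal sketch of optimistic locking with a version
column, written in Python against SQLite. The table name (customer), its columns (id, name,
version), and the function name are hypothetical, not taken from the text above; the essential point
is that the update succeeds only if the version is still the one that was read.

    import sqlite3

    def update_customer(conn: sqlite3.Connection, cust_id: int, new_name: str) -> bool:
        # Read the row and remember the version it carried at read time.
        row = conn.execute(
            "SELECT name, version FROM customer WHERE id = ?", (cust_id,)
        ).fetchone()
        if row is None:
            raise KeyError(cust_id)
        _, read_version = row

        # ... the application manipulates the object in memory here ...

        # Write back only if no other process changed the row in the meantime.
        cur = conn.execute(
            "UPDATE customer SET name = ?, version = version + 1 "
            "WHERE id = ? AND version = ?",
            (new_name, cust_id, read_version),
        )
        conn.commit()
        # rowcount == 0 means the version moved on: a collision was detected
        # and must now be resolved (re-read, merge, or report to the user).
        return cur.rowcount == 1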
Single Lock-Manager Approach: In the single lock-manager approach, the system maintains a
single lock manager that resides at a single chosen site, say Si. All lock and unlock requests are
made at site Si. When a transaction needs to lock a data item, it sends a lock request to Si. The lock
manager determines whether the lock can be granted immediately. If the lock can be granted, the
lock manager sends a message to that effect to the site at which the lock request was initiated.
Otherwise, the request is delayed until it can be granted, at which time a message is sent to the site
at which the lock request was initiated. The transaction can read the data item from any one of the
sites at which a replica of the data item resides. In the case of a write, all the sites where a replica
of the data item resides must be involved in the writing.
The scheme has these advantages:
i. Simple implementation. This scheme requires two messages for handling lock
requests, and one message for handling unlock requests.
ii. Simple deadlock handling. Since all lock and unlock requests are made at one site,
the deadlock-handling algorithms can be applied directly to this environment.
The disadvantages of the scheme are:
i. Bottleneck. The site Si becomes a bottleneck, since all requests must be processed
there.
ii. Vulnerability. If the site Si fails, the concurrency controller is lost. Either processing
must stop, or a recovery scheme must be used so that a backup site can take over lock
management from Si.
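To make the lock table at site Si concrete, here is a toy, in-memory sketch of a single lock
manager; the class and method names are illustrative and not part of the scheme's definition. A
request that cannot be granted immediately is queued, which corresponds to the delayed response
described above.

    from collections import defaultdict

    class LockManager:
        def __init__(self):
            self.table = {}                      # item -> (mode, set of holders)
            self.waiting = defaultdict(list)     # item -> queued (txn, mode) requests

        def request(self, txn, item, mode):
            """Return True if granted now; otherwise the request is delayed."""
            held = self.table.get(item)
            if held is None:
                self.table[item] = (mode, {txn})
                return True
            held_mode, holders = held
            if held_mode == "S" and mode == "S":
                holders.add(txn)                 # shared locks are compatible
                return True
            self.waiting[item].append((txn, mode))
            return False

        def release(self, txn, item):
            """Drop txn's lock; a full manager would now retry queued waiters."""
            mode, holders = self.table[item]
            holders.discard(txn)
            if not holders:
                del self.table[item]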
Distributed Lock Manager: In this approach, the lock-manager function is distributed over several
sites. Each site maintains a local lock manager whose function is to administer the lock and unlock
requests for those data items that are stored at that site. When a transaction wishes to lock data
item Q, which is not replicated and resides at site Si, a message is sent to the lock manager at site
Si requesting a lock (in a particular lock mode). If data item Q is locked in an incompatible mode,
then the request is delayed until it can be granted. Once it has determined that the lock request can
be granted, the lock manager sends a message back to the initiator indicating that it has granted
the lock request.
The distributed lock manager scheme has the advantage of simple implementation, and reduces
the degree to which the coordinator is a bottleneck. It has a reasonably low overhead, requiring
two message transfers for handling lock requests, and one message transfer for handling unlock
requests. However, deadlock handling is more complex, since the lock and unlock requests are no
longer made at a single site:
There may be intersite deadlocks even when there is no deadlock within a single site.
Primary Copy: When a system uses data replication, we can choose one of the replicas as the
primary copy. Thus, for each data item Q, the primary copy of Q must reside in precisely one
site, which we call the primary site of Q.
When a transaction needs to lock a data item Q, it requests a lock at the primary site of Q. As
before, the response to the request is delayed until it can be granted.
Thus, the primary copy enables concurrency control for replicated data to be handled like that for
unreplicated data. This similarity allows for a simple implementation.
However, if the primary site of Q fails, Q is inaccessible, even though other sites containing a
replica may be accessible.
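A minimal sketch of primary-copy locking follows; the primary_site map and the
send_lock_request callable are hypothetical stand-ins for the system catalog and the messaging
layer. All lock requests for a replicated item go to its single primary site, so the item is locked
exactly as unreplicated data would be.

    # Chosen once per replicated data item: precisely one primary site each.
    primary_site = {"Q": "S1", "R": "S3"}

    def lock_primary_copy(item, mode, send_lock_request):
        site = primary_site[item]                # the primary site of the item
        send_lock_request(site, item, mode)      # blocks until the lock is granted
        return site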
Majority Protocol: The majority protocol works this way: If data item Q is replicated in n
different sites, then a lock-request message must be sent to more than one-half of the n sites in
which
Q is stored. Each lock manager determines whether the lock can be granted immediately (as far as
it is concerned). As before, the response is delayed until the request can be granted. The transaction
does not operate on Q until it has successfully obtained a lock on a majority of the replicas of Q.
This scheme deals with replicated data in a decentralized manner, thus avoiding the drawbacks of
central control.
However, it suffers from these disadvantages:
i. Implementation. The majority protocol is more complicated to implement than are
the previous schemes. It requires 2(n/2 + 1) messages for handling lock requests, and
(n/2 + 1) messages for handling unlock requests.
ii. Deadlock handling. In addition to the problem of global deadlocks due to the use of a
distributed lock-manager approach, it is possible for a deadlock to occur even if only
one data item is being locked. As an illustration, consider a system with four sites and
full replication. Suppose that transactions T1 and T2 wish to lock data item Q in
exclusive mode. Transaction T1 may succeed in locking Q at sites S1 and S3, while
transaction T2 may succeed in locking Q at sites S2 and S4. Each then must wait to
acquire the third lock; hence, a deadlock has occurred. Luckily, we can avoid such
deadlocks with relative ease by requiring all transactions to request locks on the replicas
of a data item in the same predetermined order, as in the sketch below.
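The following sketch shows lock acquisition under the majority protocol, assuming a hypothetical
blocking send_lock_request(site, item, mode) that returns once that site's lock manager grants the
lock. Visiting the replica sites in a fixed, predetermined order is what rules out the T1/T2 deadlock
described above.

    def lock_majority(item, mode, replica_sites, send_lock_request):
        needed = len(replica_sites) // 2 + 1     # strictly more than half of the replicas
        granted = []
        for site in sorted(replica_sites):       # same predetermined order for every transaction
            send_lock_request(site, item, mode)  # delayed here until this site grants
            granted.append(site)
            if len(granted) >= needed:
                break                            # majority obtained; Q may now be operated on
        return granted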
Biased Protocol: The difference from the majority protocol is that requests for shared locks are
given more favorable treatment than requests for exclusive locks.
i. Shared locks. When a transaction needs to lock data item Q, it simply requests a lock
on Q from the lock manager at one site that contains a replica of Q.
ii. Exclusive locks. When a transaction needs to lock data item Q, it requests a lock on Q
from the lock manager at all sites that contain a replica of Q.
As before, the response to the request is delayed until it can be granted.
The biased scheme has the advantage of imposing less overhead on read operations than does the
majority protocol. This savings is especially significant in common cases in which the frequency
of read is much greater than the frequency of write.
However, the additional overhead on writes is a disadvantage. Furthermore, the biased protocol
shares the majority protocol’s disadvantage of complexity in handling deadlock.
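A corresponding sketch of the biased protocol, under the same hypothetical send_lock_request as
above: shared locks are requested at a single replica site, while exclusive locks must be obtained
at every site holding a replica.

    def lock_biased(item, mode, replica_sites, send_lock_request):
        if mode == "S":
            site = replica_sites[0]              # any one site that contains a replica
            send_lock_request(site, item, "S")
            return [site]
        granted = []
        for site in sorted(replica_sites):       # fixed order, as with the majority protocol
            send_lock_request(site, item, "X")   # must be granted at every replica site
            granted.append(site)
        return granted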
4.4.2 Timestamping
The principal idea behind the timestamping scheme is that each transaction is given a unique
timestamp that the system uses in deciding the serialization order. Our first task is to develop a
scheme for generating unique timestamps. The various protocols can then be applied directly in
the nonreplicated environment.
There are two primary methods for generating unique timestamps, one centralized and one
distributed. In the centralized scheme, a single site distributes the timestamps.
The site can use a logical counter or its own local clock for this purpose.
In the distributed scheme, each site generates a unique local timestamp by using either a logical
counter or the local clock.
We obtain the unique global timestamp by concatenating the unique local timestamp with the site
identifier, which also must be unique. The order of concatenation is important: we use the site
identifier in the least significant position to ensure that the global timestamps generated at one site
are not always greater than those generated at another site.
We may still have a problem if one site generates local timestamps at a rate faster than that of the
other sites. In such a case, the fast site’s logical counter will be larger than that of other sites.
Therefore, all timestamps generated by the fast site will be larger than those generated by other
sites. What we need is a mechanism to ensure that local timestamps are generated fairly across the
system. We define within each site Si a logical clock (LCi), which generates the unique local
timestamp. The logical clock can be implemented as a counter that is incremented after a new local
timestamp is generated. To ensure that the various logical clocks are synchronized, we require that
a site Si advance its logical clock whenever a transaction Ti with timestamp
<x,y> visits that site and x is greater than the current value of LCi. In this case, site Si advances its
logical clock to the value x + 1.
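A small sketch of this timestamping machinery, with illustrative names: each site keeps a logical
clock LC_i, a global timestamp is the pair (local timestamp, site identifier) with the site identifier
in the least significant position, and a site advances its clock when it sees a larger timestamp.

    class SiteClock:
        def __init__(self, site_id: int):
            self.site_id = site_id
            self.lc = 0                          # logical clock LC_i

        def new_timestamp(self):
            self.lc += 1                         # counter bumped for each new timestamp
            return (self.lc, self.site_id)       # compares as <x, y>: x first, then site id

        def observe(self, ts):
            # A transaction with timestamp <x, y> visits this site; if x exceeds
            # LC_i, advance the logical clock to x + 1 so this site does not lag.
            x, _ = ts
            if x > self.lc:
                self.lc = x + 1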
If the system clock is used to generate timestamps, then timestamps will be assigned fairly,
provided that no site has a system clock that runs fast or slow. Since clocks may not be perfectly
accurate, a technique similar to that for logical clocks must be used to ensure that no clock gets far
ahead of or behind another clock.