
Topic 4: Concurrency control in databases

4.1 Concurrency Control in Databases


Concurrency control is the procedure in a DBMS for managing simultaneous operations without
conflicting with one another. Concurrent access is straightforward if all users are only reading
data, since readers cannot interfere with one another. Any practical database, however, has a
mix of read and write operations, and this makes concurrency a challenge.
Concurrency control is used to address the conflicts that mostly occur in multi-user systems.
It helps you ensure that database transactions are performed concurrently without violating
the data integrity of the respective databases. Concurrency control is therefore an essential
element for the proper functioning of a system where two or more database transactions that
require access to the same data are executed simultaneously. The reasons for concurrency control
in DBMS include:
i. To enforce isolation through mutual exclusion between conflicting transactions
ii. To resolve read-write and write-write conflicts
iii. To preserve database consistency through consistency-preserving execution of transactions
iv. To control the interaction among concurrent transactions; this control is achieved using
concurrency-control schemes
v. To help ensure serializability

4.2 Concurrency control techniques


4.2.1 Lock-Based techniques
A lock is a mechanism to control concurrent access to a data item
Data items can be locked in two modes:
1. exclusive (X) mode: the data item can be both read and written. An X-lock is requested
using the lock-X instruction.
2. shared (S) mode: the data item can only be read. An S-lock is requested using the lock-S
instruction.
Lock requests are made to the concurrency-control manager. A transaction can proceed only after
the request is granted.
A transaction may be granted a lock on an item if the requested lock is compatible with locks
already held on the item by other transactions
Any number of transactions can hold shared locks on an item, but if any transaction holds an
exclusive lock on the item, no other transaction may hold any lock on the item.
If a lock cannot be granted, the requesting transaction is made to wait until all incompatible locks
held by other transactions have been released. The lock is then granted.
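These rules can be summarized in the standard lock-compatibility matrix ("yes" means the two
lock modes can be held on the same item concurrently):

         S     X
   S    yes    no
   X    no     no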
Example of a transaction performing locking:
T2: lock-S(A);
read(A);
unlock(A);
lock-S(B);
read(B);
unlock(B);
display(A+B)
Locking as above is not sufficient to guarantee serializability: if A and B get updated in
between the read of A and B, the displayed sum would be wrong.
A locking protocol is a set of rules followed by all transactions while requesting and releasing
locks. Locking protocols restrict the set of possible schedules.

4.2.1.1 Locking Strategies


Pessimistic Locking
Pessimistic locking is an approach where an entity is locked in the database for the entire time
that it is in application memory (often in the form of an object). A lock either limits or prevents
other users from working with the entity in the database. A write lock indicates that the holder
of the lock intends to update the entity and disallows anyone from reading, updating, or deleting
the entity. A read lock indicates that the holder of the lock does not want the entity to change
while they hold the lock, allowing others to read the entity but not update or delete it. The scope
of a lock might be the entire database, a table, a collection of rows, or a single row. These types
of locks are called database locks, table locks, page locks, and row locks respectively.
The advantages of pessimistic locking are that it is easy to implement and guarantees that your
changes to the database are made consistently and safely. The primary disadvantage is that this
approach isn’t scalable. When a system has many users, or when the transactions involve a
greater number of entities, or when transactions are long lived, then the chance of having to wait
for a lock to be released increases. Therefore, this limits the practical number of simultaneous
users that your system can support.

Optimistic Locking
With multi-user systems it is quite common to be in a situation where collisions are
infrequent. For example, although the two of us are both working with Customer objects, you are
working with the Wayne Miller object while I work with the John Berg object, so we will not collide. When
this is the case optimistic locking becomes a viable concurrency control strategy. The idea is that
you accept the fact that collisions occur infrequently, and instead of trying to prevent them you
simply choose to detect them and then resolve the collision when it does occur.
The application reads the object into memory. To do this a read lock is obtained on the data, the
data is read into memory, and the lock is released. At this point in time the row(s) may be marked
to facilitate detection of a collision. The application then manipulates the object until the point that
it needs to be updated. The application then obtains a write lock on the data and reads the original
source back so as to determine if there has been a collision. If it determines that there has
not been a collision, it updates the data and unlocks it. Had a collision been detected, e.g.
the data had been updated by another process after it had originally been read into memory, then
the collision would need to be resolved.
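A common way to realize the marking step above is a version number that is re-checked at
update time. The following is a minimal Python sketch of that idea, assuming a simple
in-memory dictionary standing in for the database (the names store, read, and update are
illustrative, not a real API):

class CollisionError(Exception):
    pass

# In-memory stand-in for a table: object id -> (version, data)
store = {42: (1, "Wayne Miller")}

def read(obj_id):
    # A read lock would be held only for the duration of this read
    version, data = store[obj_id]
    return version, data

def update(obj_id, expected_version, new_data):
    # A write lock would be held only for this check-and-set
    version, _ = store[obj_id]
    if version != expected_version:
        # Another process updated the row since we read it
        raise CollisionError("collision detected; resolve and retry")
    store[obj_id] = (version + 1, new_data)

# Usage: read, manipulate in memory, then attempt the update
v, data = read(42)
update(42, v, data.upper())   # succeeds and bumps the version to 2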

Overly Optimistic Locking


With this strategy you neither try to avoid nor detect collisions, assuming that they will never
occur. This strategy is appropriate for single user systems, systems where the system of record is
guaranteed to be accessed by only one user or system process at a time, or read-only tables. These
situations do occur. It is important to recognize that this strategy is completely inappropriate for
multi-user systems.

4.2.1.2 Pitfalls of Lock-Based Protocols


Consider the following partial schedule:
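(A representative reconstruction; the original figure is not reproduced here.)

T3: lock-X(B)
T3: read(B)
T3: write(B)
T4: lock-S(A)
T4: read(A)
T4: lock-S(B)    <- T4 must wait for T3 to release its lock on B
T3: lock-X(A)    <- T3 must wait for T4 to release its lock on A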
Neither T3 nor T4 can make progress: executing lock-S(B) causes T4 to wait for T3 to
release its lock on B, while executing lock-X(A) causes T3 to wait for T4 to release its lock
on A.
Such a situation is called a deadlock.
To handle a deadlock, one of T3 or T4 must be rolled back and its locks released.
The potential for deadlock exists in most locking protocols. Deadlocks are a necessary evil.
Starvation is also possible if the concurrency-control manager is badly designed. For example:
A transaction may be waiting for an X-lock on an item while a sequence of other transactions
request and are granted an S-lock on the same item.
The same transaction is repeatedly rolled back due to deadlocks.
The concurrency-control manager can be designed to prevent starvation.

4.2.2 The Two-Phase Locking Protocol


This is a protocol which ensures conflict-serializable schedules.
Phase 1: Growing Phase
transaction may obtain locks
transaction may not release locks
Phase 2: Shrinking Phase
transaction may release locks
transaction may not obtain locks
The protocol assures serializability. It can be proved that the transactions can be serialized in
the order of their lock points (i.e. the point where a transaction acquired its final lock).
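For example, the transaction T2 from 4.2.1 can be rewritten to follow two-phase locking by
acquiring both locks before releasing either, which also removes the serializability problem
noted there:

T2: lock-S(A);
read(A);
lock-S(B);
read(B);
display(A+B);
unlock(A);
unlock(B)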
Two-phase locking does not ensure freedom from deadlocks
Cascading roll-back is possible under two-phase locking. To avoid this, follow a modified
protocol called strict two-phase locking. Here a transaction must hold all its exclusive locks till
it commits/aborts.
Rigorous two-phase locking is even stricter: here all locks are held till commit/abort. In this
protocol transactions can be serialized in the order in which they commit.
There can be conflict serializable schedules that cannot be obtained if two-phase locking is used.
However, in the absence of extra information (e.g., ordering of access to data), two-phase
locking is needed for conflict serializability in the following sense:
Given a transaction T i that does not follow two-phase locking, we can find a transaction T j
that uses two-phase locking, and a schedule for T i and T j that is not conflict serializable.

4.2.3 Implementation of Locking


A Lock manager can be implemented as a separate process to which transactions send lock and
unlock requests
The lock manager replies to a lock request by sending a lock-grant message (or a message
asking the transaction to roll back, in case of a deadlock)
The requesting transaction waits until its request is answered
The lock manager maintains a data structure called a lock table to record granted locks and
pending requests
The lock table is usually implemented as an in-memory hash table indexed on the name of the
data item being locked

4.2.4 Lock Table


The lock table also records the type of lock granted or requested
A new request is added to the end of the queue of requests for the data item, and granted if it is
compatible with all earlier locks
Unlock requests result in the request being deleted, and later requests are checked to see if they
can now be granted
If a transaction aborts, all waiting or granted requests of the transaction are deleted
The lock manager may keep a list of locks held by each transaction, to implement this efficiently
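A minimal Python sketch of such a lock table (an illustration of the rules above, not a
production design): a hash table maps each data-item name to a FIFO queue of requests, and a
request is granted if it is compatible with every earlier request in the queue.

from collections import defaultdict, deque

# Compatibility of a requested mode (second) with an earlier mode (first)
COMPATIBLE = {("S", "S"): True, ("S", "X"): False,
              ("X", "S"): False, ("X", "X"): False}

class LockTable:
    def __init__(self):
        # data-item name -> queue of [transaction, mode, granted] entries
        self.table = defaultdict(deque)

    def request(self, item, txn, mode):
        queue = self.table[item]
        # Granted only if compatible with all earlier locks on the item
        granted = all(COMPATIBLE[(entry[1], mode)] for entry in queue)
        queue.append([txn, mode, granted])
        return granted          # False means the transaction must wait

    def release(self, item, txn):
        # Delete this transaction's requests, then re-check later waiters
        queue = deque(e for e in self.table[item] if e[0] != txn)
        self.table[item] = queue
        earlier = []
        for entry in queue:
            if not entry[2]:
                entry[2] = all(COMPATIBLE[(e[1], entry[1])] for e in earlier)
            earlier.append(entry)

lt = LockTable()
lt.request("A", "T1", "S")      # True: granted
lt.request("A", "T2", "S")      # True: shared locks are compatible
lt.request("A", "T3", "X")      # False: T3 waits behind the shared locks
lt.release("A", "T1")
lt.release("A", "T2")           # T3's request is now granted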

4.3 Timestamp technique


Each transaction is assigned a unique timestamp when it begins (can be from a physical or logical
clock).
Each object in the system has a read timestamp and a write timestamp associated with it (two
timestamps per object). The read timestamp is the timestamp of the last committed transaction that
read the object. The write timestamp is the timestamp of the last committed transaction that
modified the object (note: both timestamps are taken from the transaction timestamp, i.e., the
start of that transaction).
The rule of timestamp ordering is:
- if a transaction wants to write an object, it compares its own timestamp with the object’s read
and write timestamps. If the object’s timestamps are older, then the ordering is good.
- if a transaction wants to read an object, it compares its own timestamp with the object’s write
timestamp. If the object’s write timestamp is older than the current transaction’s timestamp, then
the ordering is good.
If a transaction attempts to access an object and does not detect proper ordering, the transaction is
aborted and restarted (improper ordering means that a newer transaction came in and modified
data before the older one could access the data or read data that the older one wants to modify).
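A sketch of the two checks in Python (the class and function names are illustrative; a real
implementation would also fold these checks into the transaction's commit machinery):

class AbortAndRestart(Exception):
    pass

class Obj:
    def __init__(self):
        self.read_ts = 0     # timestamp of the last committed reader
        self.write_ts = 0    # timestamp of the last committed writer

def ts_read(txn_ts, obj):
    # A newer transaction already wrote the object, so the value this
    # transaction should have seen is gone: abort and restart.
    if obj.write_ts > txn_ts:
        raise AbortAndRestart
    obj.read_ts = max(obj.read_ts, txn_ts)

def ts_write(txn_ts, obj):
    # A newer transaction already read or wrote the object, so writing
    # now would violate timestamp order: abort and restart.
    if obj.read_ts > txn_ts or obj.write_ts > txn_ts:
        raise AbortAndRestart
    obj.write_ts = txn_ts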

4.4 Concurrency Control in Distributed Databases


We assume that each site participates in the execution of a commit protocol to ensure global
transaction atomicity. If any site containing a replica of a data item has failed, updates to the data
item cannot be processed.

4.4.1 Locking Protocols


Various locking protocols can be used in a distributed environment. The only change that needs to
be incorporated is in the way the lock manager deals with replicated data.

Single Lock-Manager Approach: In the single lock-manager approach, the system maintains a
single lock manager that resides in a single chosen site—say Si. All lock and unlock requests are
made at site Si. When a transaction needs to lock a data item, it sends a lock request to Si. The lock
manager determines whether the lock can be granted immediately. If the lock can be granted, the
lock manager sends a message to that effect to the site at which the lock request was initiated.
Otherwise, the request is delayed until it can be granted, at which time a message is sent to the site
at which the lock request was initiated. The transaction can read the data item from any one of the
sites at which a replica of the data item resides. In the case of a write, all the sites where a replica
of the data item resides must be involved in the writing.
The scheme has these advantages:
i. Simple implementation. This scheme requires two messages for handling lock
requests, and one message for handling unlock requests.
ii. Simple deadlock handling. Since all lock and unlock requests are made at one site,
the deadlock-handling algorithms can be applied directly to this environment.
The disadvantages of the scheme are:
i. Bottleneck. The site Si becomes a bottleneck, since all requests must be processed
there.
ii. Vulnerability. If the site Si fails, the concurrency controller is lost. Either processing
must stop, or a recovery scheme must be used so that a backup site can take over lock
management from Si.

Distributed Lock Manager: the lock-manager function is distributed over several sites.
Each site maintains a local lock manager whose function is to administer the lock and unlock
requests for those data items that are stored in that site. When a transaction wishes to lock data
item Q, which is not replicated and resides at site Si, a message is sent to the lock manager at site
Si requesting a lock (in a particular lock mode). If data item Q is locked in an incompatible mode,
then the request is delayed until it can be granted. Once it has determined that the lock request can
be granted, the lock manager sends a message back to the initiator indicating that it has granted
the lock request.
The distributed lock manager scheme has the advantage of simple implementation, and reduces
the degree to which the coordinator is a bottleneck. It has a reasonably low overhead, requiring
two message transfers for handling lock requests, and one message transfer for handling unlock
requests. However, deadlock handling is more complex, since the lock and unlock requests are no
longer made at a single site:
There may be intersite deadlocks even when there is no deadlock within a single site.

Primary Copy: When a system uses data replication, we can choose one of the replicas as the
primary copy. Thus, for each data item Q, the primary copy of Q must reside in precisely one
site, which we call the primary site of Q.
When a transaction needs to lock a data item Q, it requests a lock at the primary site of Q. As
before, the response to the request is delayed until it can be granted.
Thus, the primary copy enables concurrency control for replicated data to be handled like that for
unreplicated data. This similarity allows for a simple implementation.
However, if the primary site of Q fails, Q is inaccessible, even though other sites containing a
replica may be accessible.

Majority Protocol: The majority protocol works this way: If data item Q is replicated in n
different sites, then a lock-request message must be sent to more than one-half of the n sites
in which Q is stored. Each lock manager determines whether the lock can be granted immediately (as far as
it is concerned). As before, the response is delayed until the request can be granted. The transaction
does not operate on Q until it has successfully obtained a lock on a majority of the replicas of Q.
This scheme deals with replicated data in a decentralized manner, thus avoiding the drawbacks of
central control.
However, it suffers from these disadvantages:
i. Implementation. The majority protocol is more complicated to implement than are
the previous schemes. It requires 2(n/2 + 1) messages for handling lock requests, and
(n/2 + 1) messages for handling unlock requests.
ii. Deadlock handling. In addition to the problem of global deadlocks due to the use of a
distributed lock-manager approach, it is possible for a deadlock to occur even if only
one data item is being locked. As an illustration, consider a system with four sites and
full replication. Suppose that transactions T1 and T2 wish to lock data item Q in
exclusive mode. Transaction T1 may succeed in locking Q at sites S1 and S3, while
transaction T2 may succeed in locking Q at sites S2 and S4. Each then must wait to
acquire the third lock; hence, a deadlock has occurred. Luckily, we can avoid such
deadlocks with relative ease, by requiring all sites to request locks on the replicas of a
data item in the same predetermined order.
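A Python sketch of the majority rule together with the ordered-acquisition fix (the site object
and its lock method are placeholders, not a real API):

def lock_majority(item, replica_sites, txn):
    # Acquire the lock at more than half of the n replicas. Requesting in a
    # fixed site order prevents the deadlock described above, because two
    # competing transactions contend for the same first replica instead of
    # each grabbing a disjoint half.
    needed = len(replica_sites) // 2 + 1
    granted = []
    for site in sorted(replica_sites, key=lambda s: s.site_id):
        site.lock(item, txn)        # placeholder call; waits until granted
        granted.append(site)
        if len(granted) == needed:
            break                   # majority held; safe to operate on item
    return granted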

Biased Protocol: The difference from the majority protocol is that requests for shared locks are
given more favorable treatment than requests for exclusive locks.
i. Shared locks. When a transaction needs to lock data item Q, it simply requests a lock
on Q from the lock manager at one site that contains a replica of Q.
ii. Exclusive locks. When a transaction needs to lock data item Q, it requests a lock on Q
from the lock manager at all sites that contain a replica of Q.
As before, the response to the request is delayed until it can be granted.
The biased scheme has the advantage of imposing less overhead on read operations than does the
majority protocol. This savings is especially significant in common cases in which the frequency
of read is much greater than the frequency of write.
However, the additional overhead on writes is a disadvantage. Furthermore, the biased protocol
shares the majority protocol’s disadvantage of complexity in handling deadlock.

Quorum Consensus Protocol: The quorum consensus protocol is a generalization of the
majority protocol. It assigns each site a nonnegative weight, and it
assigns read and write operations on an item x two integers, called read quorum Qr and write
quorum Qw, that must satisfy the following condition, where S is the total weight of all sites at
which x resides:
Qr + Qw > S and 2 ∗ Qw > S
To execute a read operation, enough replicas must be read that their total weight is ≥ Qr. To execute
a write operation, enough replicas must be written so that their total weight is ≥ Qw.
The benefit of the quorum consensus approach is that it can permit the cost of either reads or writes
to be selectively reduced by appropriately defining the read and write quorums. For instance, with
a small read quorum, reads need to read fewer replicas, but the write quorum will be higher, hence
writes can succeed only if correspondingly more replicas are available. Also, if higher weights are
given to some sites (for example, those less likely to fail), fewer sites need to be accessed for
acquiring locks.
In fact, by setting weights and quorums appropriately, the quorum consensus protocol can simulate
the majority protocol and the biased protocols.
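The two conditions are easy to check, and choosing quorums appropriately recovers the earlier
protocols. A small Python illustration (the weights and quorum values are examples only):

def valid_quorums(weights, qr, qw):
    s = sum(weights)
    # Qr + Qw > S forbids disjoint read and write quorums;
    # 2 * Qw > S forbids two disjoint write quorums.
    return qr + qw > s and 2 * qw > s

weights = [1, 1, 1, 1]                      # four equally weighted sites, S = 4
print(valid_quorums(weights, qr=3, qw=3))   # True: behaves like the majority protocol
print(valid_quorums(weights, qr=1, qw=4))   # True: behaves like the biased protocol
print(valid_quorums(weights, qr=2, qw=2))   # False: Qr + Qw = S would allow stale reads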

4.4.2 Timestamping
The principal idea behind the timestamping scheme is that each transaction is given a unique
timestamp that the system uses in deciding the serialization order. Our first task is to develop a
scheme for generating unique timestamps. Then, the various protocols can be applied directly to
the nonreplicated environment.
There are two primary methods for generating unique timestamps, one centralized and one
distributed. In the centralized scheme, a single site distributes the timestamps.
The site can use a logical counter or its own local clock for this purpose.
In the distributed scheme, each site generates a unique local timestamp by using either a logical
counter or the local clock.
We obtain the unique global timestamp by concatenating the unique local timestamp with the site
identifier, which also must be unique. The order of concatenation is important! We use the site
identifier in the least significant position to ensure that the global timestamps generated in one
site are not always greater than those generated in another site.
We may still have a problem if one site generates local timestamps at a rate faster than that of the
other sites. In such a case, the fast site’s logical counter will be larger than that of other sites.
Therefore, all timestamps generated by the fast site will be larger than those generated by other
sites. What we need is a mechanism to ensure that local timestamps are generated fairly across the
system. We define within each site Si a logical clock (LCi), which generates the unique local
timestamp. The logical clock can be implemented as a counter that is incremented after a new local
timestamp is generated. To ensure that the various logical clocks are synchronized, we require that
a site Si advance its logical clock whenever a transaction Ti with timestamp
<x,y> visits that site and x is greater than the current value of LCi. In this case, site Si advances its
logical clock to the value x + 1.
If the system clock is used to generate timestamps, then timestamps will be assigned fairly,
provided that no site has a system clock that runs fast or slow. Since clocks may not be perfectly
accurate, a technique similar to that for logical clocks must be used to ensure that no clock gets far
ahead of or behind another clock.
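A Python sketch of the distributed scheme, with the site identifier in the least significant
position and the clock-advance rule described above (the class name is illustrative):

class SiteClock:
    def __init__(self, site_id):
        self.site_id = site_id      # must be unique across sites
        self.counter = 0            # the logical clock LCi

    def new_timestamp(self):
        # Global timestamp = (local counter, site id); tuple comparison
        # places the site id in the least significant position.
        self.counter += 1
        return (self.counter, self.site_id)

    def observe(self, ts):
        # A transaction with timestamp <x, y> visits this site: if x is
        # greater than LCi, advance the logical clock to x + 1.
        x, _ = ts
        if x > self.counter:
            self.counter = x + 1

s1, s2 = SiteClock(1), SiteClock(2)
t1 = s1.new_timestamp()     # (1, 1)
s2.observe(t1)              # s2's clock advances to 2
t2 = s2.new_timestamp()     # (3, 2), which compares greater than (1, 1)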

4.4.3 Deadlock Handling


The deadlock-prevention and deadlock-detection algorithms can be used in a distributed system,
provided that modifications are made. For example, we can use the tree protocol by defining a
global tree among the system data items.
Similarly, the timestamp-ordering approach could be directly applied to a distributed environment.
Deadlock prevention may result in unnecessary waiting and rollback. Furthermore, certain
deadlock-prevention techniques may require more sites to be involved in the execution of a
transaction than would otherwise be the case.
If we allow deadlocks to occur and rely on deadlock detection, the main problem in a distributed
system is deciding how to maintain the wait-for graph. Common techniques for dealing with this
issue require that each site keep a local wait-for graph. The nodes of the graph correspond to all
the transactions (local as well as nonlocal) that are currently either holding or requesting any of
the items local to that site.
These local wait-for graphs are constructed in the usual manner for local transactions and data
items. When a transaction Ti on site S1 needs a resource in site S2, it sends a request message to
site S2. If the resource is held by transaction Tj , the system inserts an edge Ti → Tj in the local
wait-for graph of site S2.
Clearly, if any local wait-for graph has a cycle, deadlock has occurred. On the other hand, the fact
that there are no cycles in any of the local wait-for graphs does not mean that there are no
deadlocks.
In the centralized deadlock detection approach, the system constructs and maintains a global
wait-for graph (the union of all the local graphs) in a single site: the deadlock-detection
coordinator. Since there is communication delay in the system, we must distinguish between two
types of wait-for graphs. The real graph describes the real but unknown state of the system at any
instance in time, as would be seen by an omniscient observer. The constructed graph is an
approximation generated by the controller during the execution of the controller’s algorithm.
Obviously, the controller must generate the constructed graph in such a way that, whenever the
detection algorithm is invoked, the reported results are correct. Correct means in this case that, if
a deadlock exists, it is reported promptly, and if the system reports a deadlock, it is indeed in a
deadlock state.
The global wait-for graph can be reconstructed or updated under these conditions:
i. Whenever a new edge is inserted in or removed from one of the local wait-for graphs.
ii. Periodically, when a number of changes have occurred in a local wait-for graph.
iii. Whenever the coordinator needs to invoke the cycle-detection algorithm.
When the coordinator invokes the deadlock-detection algorithm, it searches its global graph. If it
finds a cycle, it selects a victim to be rolled back. The coordinator must notify all the sites that a
particular transaction has been selected as victim. The sites, in turn, roll back the victim
transaction.
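A sketch of the coordinator's cycle check over the constructed global wait-for graph, using a
depth-first search (the graph encoding is illustrative):

def find_cycle(wait_for):
    # wait_for maps each transaction to the set of transactions it waits for
    # (the union of all the local wait-for graphs).
    WHITE, GRAY, BLACK = 0, 1, 2
    color = {t: WHITE for t in wait_for}

    def dfs(t):
        color[t] = GRAY                     # t is on the current DFS path
        for u in wait_for.get(t, ()):
            if color.get(u, WHITE) == GRAY:
                return u                    # back edge: u lies on a cycle
            if color.get(u, WHITE) == WHITE:
                found = dfs(u)
                if found is not None:
                    return found
        color[t] = BLACK
        return None

    for t in list(wait_for):
        if color[t] == WHITE:
            victim = dfs(t)
            if victim is not None:
                return victim               # a candidate victim to roll back
    return None                             # no deadlock in the constructed graph

# The deadlock from 4.2.1.2: T3 waits for T4 and T4 waits for T3.
print(find_cycle({"T3": {"T4"}, "T4": {"T3"}}))   # prints a transaction on the cycle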
