[go: up one dir, main page]

0% found this document useful (0 votes)
37 views59 pages

Transaction Management and Concurrency Control

This document discusses transaction management and concurrency control in database systems, highlighting the importance of transactions, their properties (ACID), and the challenges posed by concurrent transactions. It describes various types of transaction failures, the need for concurrency control, and different protocols such as locking methods to ensure serializability. Additionally, it explains the concepts of interleaved and simultaneous concurrency, along with the implications of transaction scheduling and execution order.

Uploaded by

makureya1997
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
37 views59 pages

Transaction Management and Concurrency Control

This document discusses transaction management and concurrency control in database systems, highlighting the importance of transactions, their properties (ACID), and the challenges posed by concurrent transactions. It describes various types of transaction failures, the need for concurrency control, and different protocols such as locking methods to ensure serializability. Additionally, it explains the concepts of interleaved and simultaneous concurrency, along with the implications of transaction scheduling and execution order.

Uploaded by

makureya1997
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 59

Database Systems

Course Code: INSY122

1
Topic 6: Transaction Management and
Concurrency Control

2
Objectives
• At the end of this chapter you should be able to:
• Describe the nature of transactions and the reasons for designing
database systems around transactions.
• Explain the causes of transaction failure.
• Analyse the problems of data management in a concurrent
environment.
• Critically compare the relative strengths of different concurrency
control approaches.

3
Transaction
Definition
• A transaction is the execution of a program that accesses or changes
the contents of a database.
• Logical unit of work that must be entirely completed or aborted
Consists of:
• SELECT statement
• Series of related UPDATE statements
• Series of INSERT statements
• Combination of SELECT, UPDATE, and INSERT statements
• It is either completed in its entirety (COMMIT) or not done at all
(ROLLBACK).

4
Example

5
Example

6
• A transaction may be distributed (available on different physical
systems or organised into different logical subsystems) and/or use
data concurrently with multiple users for different purposes.
• Online transaction processing (OLTP) systems support a large
number of concurrent transactions without imposing excessive
delays.
• OLTP, or online transactional processing, enables the real-time
execution of large numbers of database transactions by large
numbers of people, typically over the internet.
• Eg Online banking and ATM transactions, e-commerce and in-store
purchases, and hotel and airline bookings, to name a very few

7
Transaction Properties (ACID)
• The acronym ACID indicates the properties of any well-formed
transaction.
• Any transaction that violates these principles will cause failures of
concurrency.
Atomicity
• A transaction is an atomic unit of processing; it is either performed
in its entirety or not performed at all.
• All operations of a transaction must be completed. If not, the
transaction is aborted
• Atomicity is maintained by commitment and rollback.
Consistency
• The database state is consistent at the end of a transaction.
• All data integrity constraints are satisfied

8
Isolation
• A transaction should not make its updates visible to other
transactions until it is committed.
• Data used during transaction cannot be used by second transaction
until the first is completed.
• This property, when enforced strictly, solves the temporary update
problem and makes cascading rollbacks of transactions unnecessary.
Durability
• When a transaction has made a change to the database state and
the change is committed, this change is permanent and should be
available to all other transactions.

9
Transaction states

BEGIN_TRANSTRACTION
• This marks the beginning of transaction execution.
READ or WRITE
• These specify read or write operations on the database items that are executed
as part of a transaction.
END_TRANSTRACTION
This specifies that read and write operations have ended and marks the end limit
of transaction execution. However, at this point it may be necessary to check
whether the changes introduced by the transaction can be permanently applied
to the database (committed) or whether the transaction has to be aborted
because it violates concurrency control, or for some other reason (rollback).
COMMIT_TRANSTRACTION
This signals a successful end of the transaction so that any changes (updates)
executed by the transaction can be safely committed to the database and will not
be undone.
ROLLBACK (or ABORT)
This signals the transaction has ended unsuccessfully, so that any changes or
10
effects that the transaction may have applied to the database must be undone.
Concurrency Control

• Concurrency is the ability of the DBMS to process more than one


transaction at a time.
• Coordination of the simultaneous transactions execution in a multiuser
database system.
• Objective - Ensures serializability of transactions in a multiuser
database environment.
• Concurrency is everywhere in modern programming, whether we like it or not:
• Multiple computers in a network
• Multiple applications running on one computer
• Multiple processors in a computer (today, often multiple processor cores on a
single chip)
• In fact, concurrency is essential in modern programming:
• Web sites must handle multiple simultaneous users.
• Mobile apps need to do some of their processing on servers (“in the cloud”).
• Graphical user interfaces almost always require background work that does
not interrupt the user. For example, Eclipse compiles your Java code while 11
you’re still editing it.
Interleaved concurrency
• Many computer systems, including DBMSs, are used
simultaneously by more than one user.
• This means the computer runs multiple transactions
(programs) at the same time. Eg an airline reservations
system, Systems in banks, insurance agencies, stock
exchanges, etc.
• To avoid excessive delays, concurrent systems execute some
commands from one program (transaction), then suspended
that program and execute some commands from the next
program, and so on.
• A program is resumed at the point where it was suspended
when it gets its turn to use the CPU again. This is known as
interleaving. 12
13
Simultaneous concurrency
• If the computer system has multiple hardware processors
(CPUs), simultaneous processing of multiple programs is
possible, leading to simultaneous rather than interleaved
concurrency a process known as Multiprocessing.
• Most of the theory concerning concurrency control in
databases is developed in terms of interleaved concurrency,
although it may be adapted to simultaneous concurrency.

14
Need for concurrency control
Lost update
• Occurs in two concurrent transactions when:
• Same data element is updated (operations interleaved)
• One of the updates is lost
• That is, interleaved use of the same data item would cause some
problems when an update operation from one transaction
overwrites another update from a second transaction.

15
Example
• Suppose the two transactions T1 and T2 are submitted at approximately the same
time in an airline reservation system.

• If X = 80, originally there were 80 reservations on the flight


• If N = 5, T1 cancels 5 seats on the flight corresponding to X and reserves them on
the flight corresponding to Y, and
• If M = 4, T2 reserves 4 seats on X.
• The final result should be X = 80 – 5 + 4 = 79; but in the concurrent operations of
the figure above, it is X = 84 because the update that cancelled 5 seats in T1 was
lost. 16
• The above interleaved operation will lead to an incorrect value for data item X,
because at time step 3, T2 reads in the original value of X which is before T1
changes it in the database, and hence the updated value resulting from T1 is lost.
Uncommitted dependency (or dirty read / temporary update)
Occurs when:
• Two transactions are executed concurrently
• First transaction is rolled back after the second transaction has already
accessed uncommitted data
• transaction is allowed to retrieve or(worse) update a record that
has been updated by another transaction, but which has not yet
been committed by that other transaction.
• Because it has not yet been committed, there is always a
possibility that it will never be committed but rather rolled back,
in which case, the first transaction will have used some data that
is now incorrect - a dirty read for the first transaction.

17
Example

• T1 updates item X and then fails before completion, so the system must change
X back to its original value.
• Before it can do so, however, transaction T2 reads the ‘temporary’ value of X,
which will not be recorded permanently in the database because of the failure
of T1.
• The value of item X that is read by T2 is called dirty data, because it has been
created by a transaction that has not been completed and committed yet
• Hence this problem is also known as the dirty read problem. Since the dirty 18
data read in by T2 is only a temporary value of X, the problem is sometimes
called temporary update too.
Inconsistent analysis
• Occurs when a transaction accesses data before and after one or more
other transactions finish working with such data

• For example, suppose that a transaction T3 is calculating the total number


of reservations on all the flights; meanwhile, transaction T1 is executing. If
the interleaving of operations shown below occurs, the result of T3 will be
off by amount N, because T3 reads the value of X after N seats have been
subtracted from it, but reads the value of Y before those N seats have been
added to it.
19
Transaction problems
The practical aspects of transactions are about keeping control. There are
a variety of causes of transaction failure. These may include:
1. Concurrency control enforcement: Concurrency control method may
abort the transaction, to be restarted later, because it violates
serialisability (the need for transactions to be executed in an equivalent
as would have resulted if they had been executed sequentially), or
because several transactions are in a state of deadlock.
2.Local error detected by the transaction: During transaction executions,
certain conditions may occur that necessitate cancellation of the
transaction (e.g. an account with insufficient funds may cause a
withdrawal transaction from that account to be cancelled). This may be
done by a programmed ABORT in the transaction itself.
3. A transaction or system error: Some operation in the transaction may
cause it to fail, such as integer overflow, division by zero, erroneous
parameter values or logical programming errors.

20
4. System software errors that result in abnormal termination or
destruction of the database management system.
5. Crashes due to hardware malfunction, resulting in loss of internal
(main and cache) memory (otherwise known as system crashes).
6. Disk malfunctions such as read or write malfunction, or a disk
read/write head crash. This may happen during a read or write operation
of the transaction.
7. Natural physical disasters and catastrophes such as fires, earthquakes
or power surges; sabotages, intentional contamination with computer
viruses, or destruction of data or facilities by operators or users.

21
Serialisability

• This is a criterion that most concurrency control methods


enforce
• When transactions are executed concurrently in an
interleaved fashion, the order of execution of operations from
the various transactions forms what is known as a transaction
schedule (or history).
• The scheduler is the DBMS component that establishes the
order in which concurrent database operations are executed.
• The scheduler interleaves the execution of the database
operations (belonging to several concurrent transactions) to
ensure the serializability of transactions.

22
• In other words, the scheduler guarantees that the execution
of concurrent transactions will yield the same result as though
the transactions were executed one after another.
• The scheduler is important because it is the DBMS component
that will ensure transaction serializability.
• In other words, the scheduler allows the concurrent execution
of transactions, giving end users the impression that they are
the DBMS's only users.

23
Schedule
• A series of operation from one transaction to another
transaction is known as schedule.
• It is used to preserve the order of the operation in each of the
individual transaction.

24
Serial Schedule
• The serial schedule is a type of schedule where one
transaction is executed completely before starting another
transaction.
• In the serial schedule, when the first transaction completes its
cycle, then the next transaction is executed.
• For example: Suppose there are two transactions T1 and T2
which have some operations.
• If it has no interleaving of operations, then there are the
following two possible outcomes:
• Execute all the operations of T1 followed by all the operations
of T2.
• Execute all the operations of T1 followed by all the operations
of T2.
• In the given (a) figure, Schedule A shows the serial schedule
where T1 followed by T2.
• In the given (b) figure, Schedule B shows the serial schedule
25
where T2 followed by T1.
26
Non-serial Schedule
• If interleaving of operations is allowed, then there will be non-
serial schedule.
• It contains many possible orders in which the system can
execute the individual operations of the transactions.
• In the given figure (c) and (d), Schedule C and Schedule D are
the non-serial schedules. It has interleaving of operations.

27
Serializable schedule
• The serializability of schedules is used to find non-serial
schedules that allow the transaction to execute concurrently
without interfering with one another.
• It identifies which schedules are correct when executions of
the transaction have interleaving of their operations.
• A non-serial schedule will be serializable if its result is equal to
the result of its transactions executed serially.

28
Explanation: In the given scenario, scheduleC is serializable if
the output obtained from both ScheduleC and ScheduleA are
equivalent to one another. In a nutshell, a transaction within a
given non-serial schedule is serializable if its outcome is
equivalent to the outcome of the same transaction when
executed serially.

29
Concurrency Control Protocols/Techniques

Why use Concurrency method?


• Reasons for using Concurrency control method is DBMS:
• To apply Isolation through mutual exclusion between conflicting
transactions
• To resolve read-write and write-write conflict issues
• To preserve database consistency through constantly preserving
execution obstructions
• The system needs to control the interaction among the concurrent
transactions. This control is achieved using concurrent-control
schemes.
• Concurrency control helps to ensure serializability
30
Example
• Assume that two people who go to electronic kiosks at the
same time to buy a movie ticket for the same movie and the
same show time.
• However, there is only one seat left in for the movie show in
that particular theatre. Without concurrency control in DBMS,
it is possible that both moviegoers will end up purchasing a
ticket. However, concurrency control method does not allow
this to happen. Both moviegoers can still access information
written in the movie seating database. But concurrency
control only provides a ticket to the buyer who has completed
the transaction process first.

31
Different concurrency control protocols offer different benefits
between the amount of concurrency they allow and the amount of
overhead that they impose.
• Lock-Based Protocols
• Two Phase
• Timestamp-Based Protocols
• Validation-Based Protocols

32
Concurrency Control with Locking Methods
• One of the main techniques used to control concurrency execution
of transactions (that is, to provide serialisable execution of
transactions) is based on the concept of locking data items.
• A lock is a variable associate with a data item in the database and
describes the status of that data item with respect to possible
operations that can be applied to the item.
• Lock: Guarantees exclusive use of a data item to a current
transaction. The lock prevents other transactions from accessing the
object.
• The overall purpose of locking is to obtain maximum concurrency
and minimum delay in processing transactions.
• Lock manager: Responsible for assigning and policing the locks used
by the transactions. The DBMS has a lock manager subsystem to
keep track of and control access to locks.

33
Lock Types
Binary locks
• Has two states, locked (1) and unlocked (0)
• If an object is locked by a transaction, no other transaction can
use that object
• If an object is unlocked, any transaction can lock the object for its
use
• Binary lock enforces mutual exclusion (prevents simultaneous access
to a shared resource) on the data item.

Shared Lock
• Exists when concurrent transactions are granted read access on the
basis of a common lock
• It is also called read lock, used for reading data items only.
• Shared locks support read integrity.
• They ensure that a record is not in process of being updated during a
read-only request.
• Shared locks can also be used to prevent any kind of updates of
34
record.
Exclusive lock
• Exists when access is reserved for the transaction that locked the
object
• With the Exclusive Lock, a data item can be read as well as written.
Also called write lock.
• An exclusive lock prevents any other locker from obtaining any sort
of a lock on the object.
• They can be owned by only one transaction at a time.

35
Pre-claiming Locking :
Pre-claiming lock protocol helps to evaluate operations and create a
list of required data items which are needed to initiate an execution
process.
In the situation when all locks are granted, the transaction executes.
After that, all locks release when all of its operations are over.
Starvation : Starvation is the situation when a transaction needs to
wait for an indefinite period to acquire a lock . Following are the
reasons for Starvation:
• When waiting scheme for locked items is not properly managed
• In the case of resource leak
• a resource leak is a particular type of resource consumption by a
computer program where the program does not release resources
it has acquired
• The same transaction is selected as a victim repeatedly
Deadlock : Deadlock refers to a specific situation where two or more
processes are waiting for each other to release a resource or more
36
than two processes are waiting for the resource in a circular chain
Example
• If a transaction A holds a shared lock on item X, then a request from
another transaction B for an exclusive lock on X will cause B to go
into a wait state (and B will wait until A’s lock is released). A request
from transaction B for a shared lock on X will be granted (that is, B
will now also hold a shared lock on X).
• If transaction A holds an exclusive lock on record X, then a request
from transaction B for a lock of either type on X will cause B to go
into a wait state (and B will wait until A’s lock is released).

• When transaction A holds an exclusive (X) lock on data item X, the


request from transaction B for an exclusive lock on X will not be
granted.
• If transaction A holds a shared (S) lock on data item X, the request
from transaction B for a shared lock will be granted (two
transactions can read access the same item simultaneously) but not
37
for an exclusive lock.
Guaranteeing serialisability by Two-Phase Locking (2PL) Two
Phase Locking Protocol
• Is a method of concurrency control in DBMS that ensures
serializability by applying a lock to the transaction data which
blocks other transactions to access the same data
simultaneously.
• Two Phase Locking protocol helps to eliminate the
concurrency problem in DBMS.
• This locking protocol divides the execution phase of a
transaction into three different parts.
• In the first phase, when the transaction begins to execute, it
requires permission for the locks it needs.
• The second part is where the transaction obtains all the locks.
When a transaction releases its first lock, the third phase
starts.
• In this third phase, the transaction cannot demand any new 38
locks. Instead, it only releases the acquired locks.
• The Two-Phase Locking protocol allows each transaction to
make a lock or unlock request in two steps:
• Growing Phase: In this phase transaction may obtain locks but
may not release any locks.
• Shrinking Phase: In this phase, a transaction may release locks
but not obtain any new lock

39
• Two-Phase Locking Protocol

40
• It is true that the 2PL protocol offers serializability. However, it
does not ensure that deadlocks do not happen.
• In the above-given diagram, you can see that local and global
deadlock detectors are searching for deadlocks and solve
them with resuming transactions to their initial states.

Strict Two-Phase Locking


• The first phase of Strict-2PL is same as 2PL.
• After acquiring all the locks in the first phase, the transaction
continues to execute normally.
• But in contrast to 2PL, Strict-2PL does not release a lock after
using it.
• Strict-2PL holds all the locks until the commit point and
releases all the locks at a time.

41
Timestamp-based Protocols
• Timestamp based Protocol in DBMS is an algorithm which uses
the System Time or Logical Counter as a timestamp to serialize
the execution of concurrent transactions.
• The Timestamp-based protocol ensures that every conflicting
read and write operations are executed in a timestamp order.
• The older transaction is always given priority in this method.
• It uses system time to determine the time stamp of the
transaction.
• This is the most commonly used concurrency protocol.
• Lock-based protocols help you to manage the order between
the conflicting transactions when they will execute.
Timestamp-based protocols manage conflicts as soon as an
operation is created.

42
Example:
Suppose there are there transactions T1, T2, and T3.
T1 has entered the system at time 0010
T2 has entered the system at 0020
T3 has entered the system at 0030
Priority will be given to transaction T1, then transaction T2 and
lastly Transaction T3.
Advantages:
• Schedules are serializable just like 2PL protocols
• No waiting for the transaction, which eliminates the possibility
of deadlocks!
Disadvantages:
• Starvation is possible if the same transaction is restarted and
continually aborted 43
Validation Based Protocol
• Validation based Protocol in DBMS also known as Optimistic
Concurrency Control Technique is a method to avoid
concurrency in transactions.
• In this protocol, the local copies of the transaction data are
updated rather than the data itself, which results in less
interference while execution of the transaction.
• The Validation based Protocol is performed in the following
three phases:
• Read Phase
• Validation Phase
• Write Phase

44
Validation Based Protocol
Read Phase
• In the Read Phase, the data values from the database can be
read by a transaction but the write operation or updates are
only applied to the local data copies, not the actual database.
Validation Phase
• In Validation Phase, the data is checked to ensure that there is
no violation of serializability while applying the transaction
updates to the database.
Write Phase
• In the Write Phase, the updates are applied to the database if
the validation is successful, else; the updates are not applied,
and the transaction is rolled back.

45
Levels of locking
Database-level lock
• In a database-level lock, the entire database is locked, thus preventing the
use of any tables in the database by transaction T2 while transaction Tl is
being executed.
• This level of locking is good for batch processes, but it is unsuitable for
multiuser DBMSs
Table-level lock
• In a table-level lock, the entire table is locked, preventing access to any row
by transaction T2 while transaction T1 is using the table.
• Consequently, table-level locks are not suitable for multiuser DBMSs.

46
Page-level lock
• In a page-level lock, the DBMS will lock an entire disk page. A
disk page, or page, is the equivalent of a disk block, which can
be described as a directly addressable section of a disk.
• A page has a fixed size, such as 4K, 8K, or 16K. For example, if
you want to write only 73 bytes to a 4K page, the entire 4K
page must be read from disk, updated in memory, and written
back to disk.
• A table can span several pages, and a page can contain several
rows of one or more tables.
• Page-level locks are currently the most frequently used
multiuser DBMS locking method

47
Row-level lock
• The DBMS allows concurrent transactions to access different rows of
the same table even when the rows are located on the same page.
• Transactions can execute concurrently, even when the requested
rows are on the same page. T2 must wait only if it requests the same
row as T1.

48
• MySQL uses row-level locking for InnoDB tables to support
simultaneous write access by multiple sessions, making them
suitable for multi-user, highly concurrent, and OLTP applications.
Field-level lock
• The field-level lock allows concurrent transactions to access the
same row as long as they require the use of different fields
(attributes) within that row.
• Although field-level locking clearly yields the most flexible multiuser
data access, it is rarely implemented in a DBMS because it requires
an extremely high level of computer overhead (combination of
excess or indirect computation time, memory, bandwidth, or other
resources that are required to perform a specific task).

49
Characteristics of Good Concurrency Protocol
• An ideal concurrency control DBMS mechanism has the
following objectives:
• Must be resilient to site and communication failures.
• It allows the parallel execution of transactions to achieve
maximum concurrency.
• Its storage mechanisms and computational methods should be
modest to minimize overhead.
• It must enforce some constraints on the structure of atomic
actions of transactions.

50
Problems in Using Locks
• Resulting transaction schedule might not be serializable
• Schedule might create deadlocks

51
Deadlocks
• Occurs when two transactions wait indefinitely for each other
to unlock data
How deadlock is created

52
Deadlock represented by a wait graph

53
Deadlock Control Techniques
Deadlock prevention
• The approach does not allow any transaction to acquire locks
that will lead to deadlocks.
• The convention is that when more than one transactions
request for locking the same data item, only one of them is
granted the lock.
• One of the most popular deadlock prevention methods is pre-
acquisition of all the locks.
• In this method, a transaction acquires all the locks before
starting to execute and retains the locks for the entire
duration of transaction. If another transaction needs any of
the already acquired locks, it has to wait until all the locks it
needs are available.
• Using this approach, the system is prevented from being
deadlocked since none of the waiting transactions are holding 54
any lock.
Deadlock Avoidance
• The approach handles deadlocks before they occur.
• It analyzes the transactions and the locks to determine
whether or not waiting leads to a deadlock.
• Transactions start executing and request data items that they
need to lock.
• The lock manager checks whether the lock is available.
• If it is available, the lock manager allocates the data item and
the transaction acquires the lock.
• However, if the item is locked by some other transaction in
incompatible mode, the lock manager runs an algorithm to
test whether keeping the transaction in waiting state will
cause a deadlock or not.
• Accordingly, the algorithm decides whether the transaction
can wait or one of the transactions should be aborted.
55
• These algorithms use the concept of transaction timestamp
TS(T), which is a unique identifier assigned to each
transaction.
• The timestamps are ordered based on the order in which
transactions are started; hence, if transaction T1 starts before
transaction T2, then TS(T1) < TS(T2).
• Notice that the older transaction has a smaller timestamp
value.

• Eg For an older transaction T which is ‘born’ earlier, its


birthday (i.e. TS(T) = 10am) is smaller than a younger
transaction T’, which is ‘born’ at 11am (i.e. TS(T’) = 11am, and
TS(T) < TS(T’) ).

56
• There are two algorithms are :
• wait-die
• wound-wait.
• Let us assume that there are two transactions, T1 and T2, where T1
tries to lock a data item which is already locked by T2.
• The algorithms are as follows −
Wait-Die − If T1 is older ( has smaller timestamp) than T2, T1 is
allowed to wait. Otherwise, if T1 is younger( higher timestamp)
than T2, T1 is aborted and later restarted with the same timestamp.

57
Wound-Wait − If T1 is older (has smaller timestamp) than T2, T2 is
aborted and later restarted with the same timestamp,. Otherwise, if
T1 is younger ( has higher timestamp) than T2, T1 is allowed to
wait.

58
Summary
• In wait-die, an older transaction is allowed to wait on a younger
transaction, whereas a younger transaction requesting an item held by an
older transaction is aborted and restarted.
• The wound-die approach does the opposite: a younger transaction is
allowed to wait on an older one, whereas an older transaction requesting
an item held by a younger transaction pre-empts the younger transaction
by aborting it.

59

You might also like