Chapter 3
Transaction Processing Concepts
Difference between single-user and multi-user systems?
• Single-User System
– At most one user at a time can use the DBMS.
• E.g., a personal computer system.
• Multi-user System
– Many users can access the DBMS
concurrently.
• E.g., airline reservation and banking systems are operated by many
users who submit transactions concurrently to the system.
– This is achieved by multiprogramming, which allows the computer
to execute multiple programs/processes at the same time.
Concurrency of Transactions
• Interleaved processing
– Concurrent execution of processes is interleaved on a single CPU,
using, for example, a round-robin scheduling algorithm.
– The advantages of interleaving are that it
• keeps the CPU busy when a process requires I/O, by switching to another
process rather than remaining idle during I/O time,
• increases system throughput (the average number of transactions
completed within a given time), and
• prevents long processes from delaying other processes (minimizes
unpredictable delays in response time).
• Parallel processing: processes are executed concurrently on
multiple CPUs.
What is a Transaction?
• It is a unit of program execution that
accesses and possibly updates various
data items.
• A transaction is initiated by a user program
written in
• A high-level data-manipulation language (typically SQL), or
• Programming language (for example, C++, or Java), with
embedded database accesses in JDBC or ODBC.
Cont’d
• It is a logical unit of database processing that
includes one or more access operations (read -
retrieval, write - insert or update, delete).
– Examples include ATM transactions, credit card
approvals, flight reservations, hotel check-in,
phone calls, supermarket scanning, academic
registration and billing.
• It is a collection of operations that form a single
logical unit of work.
Transaction boundaries
• Any single transaction in an application
program is bounded by Begin and End
transaction statements.
– An application program may contain several
transactions separated by the Begin and End
transaction boundaries.
• This collection of steps must appear to the
user as a single, indivisible unit.
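• As an illustration (not part of the original slides), the following minimal Python sketch uses the standard-library sqlite3 module to mark transaction boundaries explicitly; the account table and its values are hypothetical.

    import sqlite3

    # Open an in-memory database and manage transactions explicitly.
    conn = sqlite3.connect(":memory:", isolation_level=None)
    conn.execute("CREATE TABLE account (id TEXT PRIMARY KEY, balance INTEGER)")
    conn.execute("INSERT INTO account VALUES ('A', 1000), ('B', 2000)")

    # Begin ... End (COMMIT) bounds one indivisible unit of work.
    conn.execute("BEGIN")
    conn.execute("UPDATE account SET balance = balance - 50 WHERE id = 'A'")
    conn.execute("UPDATE account SET balance = balance + 50 WHERE id = 'B'")
    conn.execute("COMMIT")   # both updates become visible together, or neither does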
Cont’d
• Since a transaction is indivisible, it either
executes in its entirety or not at all.
– Thus, if a transaction begins to execute but
fails for whatever reason, any changes to the
database that the transaction may have made
must be undone.
• This requirement holds regardless of the
cause of the failure; for example,
– the transaction itself may have failed (e.g., it divided by zero),
– the operating system may have crashed, or
– the computer itself may have stopped operating.
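• To make the all-or-nothing behaviour concrete, here is a minimal sketch (again using Python's sqlite3; the table is hypothetical) in which the transaction fails partway through (a division by zero) and its partial change is rolled back.

    import sqlite3

    conn = sqlite3.connect(":memory:", isolation_level=None)
    conn.execute("CREATE TABLE account (id TEXT PRIMARY KEY, balance INTEGER)")
    conn.execute("INSERT INTO account VALUES ('A', 1000), ('B', 2000)")

    try:
        conn.execute("BEGIN")
        conn.execute("UPDATE account SET balance = balance - 50 WHERE id = 'A'")
        rate = 50 / 0        # the transaction fails here (division by zero)
        conn.execute("UPDATE account SET balance = balance + 50 WHERE id = 'B'")
        conn.execute("COMMIT")
    except ZeroDivisionError:
        conn.execute("ROLLBACK")   # undo the change already made to A

    print(conn.execute("SELECT id, balance FROM account ORDER BY id").fetchall())
    # -> [('A', 1000), ('B', 2000)]: no partial effect remains in the database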
ACID properties of the transactions
• Atomicity.
– Either all operations of the transaction are reflected properly in the
database, or none are.
– This “all-or-none” property is referred to as atomicity.
• Consistency.
– Execution of a transaction in isolation (with no other transaction
executing concurrently) preserves the consistency of the database.
– A transaction must preserve database consistency—if a transaction
is run atomically in isolation starting from a consistent database, the
database must again be consistent at the end of the transaction.
Cont’d
• Isolation
– Even though multiple transactions may execute
concurrently,
• The system guarantees that, for every pair of
transactions Ti and Tj, it appears to Ti that either Tj
finished execution before Ti started or Tj started
execution after Ti finished.
– Thus, each transaction is unaware of other
transactions executing concurrently in the system.
Cont’d
• The database system must take special actions
– to ensure that transactions operate properly without
interference from concurrently executing database
statements. This property is referred to as isolation.
• Durability
– After a transaction completes successfully, the changes it
has made to the database persist, even if there are
system failures.
Simple Model of a Database
• A database is a collection of data items.
– Because SQL is a powerful and complex language, we use a simple
model here and focus on when data are moved from disk to main
memory and from main memory to disk.
• Granularity of data: the size of a data item may be a field, a
record, or a whole disk block.
• Basic operations that a transaction can perform are
read and write.
Cont’d
• Transactions access data using two
operations.
– read_item(X):
• Reads a database item named X into a program variable.
• To simplify our notation, we assume that the program
variable is also named X.
– write_item(X):
• Writes the value of program variable X into the database
item named X.
Cont’d
• The basic unit of data transfer between the disk and main
memory is one block.
• read_item(X) command includes the following steps:
1. Find the address of the disk block that contains item X.
2. Copy that disk block into a buffer in main memory (if
that disk block is not already in some main memory buffer).
3. Copy item X from the buffer to the program variable
named X.
Cont’d
• write_item(X) command includes the following steps:
1. Find the address of the disk block that contains item X.
2. Copy that disk block into a buffer in main memory (if
that disk block is not already in some main memory buffer).
3. Copy item X from the program variable named X into its
correct location in the buffer.
4. Store the updated block from the buffer back to disk (either
immediately or later).
Cont’d
• The DBMS maintains a number of buffers in main
memory that hold database disk blocks containing the
database items being processed.
– When all of these buffers are occupied and an additional
database block needs to be copied into main memory,
• some buffer replacement policy is used to choose a buffer to reuse;
if the chosen buffer has been modified, it must be written back to
disk before it is reused.
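• The read_item/write_item steps above can be sketched in Python as follows. This is only an illustration: the one-item-per-field blocks, the dict standing in for the disk, and the LRU replacement policy are assumptions, not how a real DBMS buffer manager is implemented.

    from collections import OrderedDict

    DISK = {0: {"X": 100, "Y": 7}, 1: {"Z": 42}}      # block_no -> block contents
    ADDRESS = {"X": 0, "Y": 0, "Z": 1}                # item name -> block address

    class BufferPool:
        def __init__(self, capacity=1):
            self.capacity = capacity
            self.buffers = OrderedDict()              # block_no -> block (LRU order)
            self.dirty = set()

        def _fetch(self, block_no):
            if block_no not in self.buffers:          # step 2: copy block if not buffered
                if len(self.buffers) >= self.capacity:
                    victim, block = self.buffers.popitem(last=False)
                    if victim in self.dirty:          # write a modified buffer back to disk
                        DISK[victim] = block
                        self.dirty.discard(victim)
                self.buffers[block_no] = dict(DISK[block_no])
            self.buffers.move_to_end(block_no)
            return self.buffers[block_no]

        def read_item(self, name):
            block = self._fetch(ADDRESS[name])        # step 1: find the block address
            return block[name]                        # step 3: copy item into the program variable

        def write_item(self, name, value):
            block = self._fetch(ADDRESS[name])
            block[name] = value                       # step 3: copy the variable into the buffer
            self.dirty.add(ADDRESS[name])             # step 4: write back later (deferred)

    pool = BufferPool(capacity=1)
    X = pool.read_item("X")
    pool.write_item("X", X + 1)
    print(pool.read_item("Z"), DISK[0]["X"])          # fetching Z evicts block 0, flushing X = 101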
Transaction Atomicity and Durability
• A transaction may not always complete its
execution successfully.
– Such a transaction is aborted.
• If we are to ensure the atomicity property, an
aborted transaction must have no effect on the
state of the database.
– Thus, any changes that the aborted transaction made to
the database must be undone.
Cont’d
– Once the changes caused by an aborted transaction have
been undone, we say that the transaction has been rolled
back.
• It is part of the responsibility of the recovery scheme
to manage transaction aborts.
– This is done typically by maintaining a log.
• Each database modification made by a transaction is
first recorded in the log.
Cont’d
• We record
– the identifier of the transaction performing the
modification,
– the identifier of the data item being modified, and
– both the old value (prior to modification) and the
new value (after modification) of the data item.
• Only then is the database itself modified.
Cont’d
• Maintaining a log provides
– the possibility of redoing a modification to ensure
atomicity and durability.
– the possibility of undoing a modification to ensure
atomicity in case of a failure during transaction
execution.
• A transaction that completes its execution
successfully is said to be committed.
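• A minimal sketch of the idea (names and structures are illustrative, not a real recovery manager): each log record stores the transaction identifier, the data item, and its old and new values, and is appended before the database is changed; undo and redo then replay the log.

    db = {"A": 1000, "B": 2000}
    log = []                                   # appended to *before* the database is modified

    def write_item(txn_id, item, new_value):
        log.append((txn_id, item, db[item], new_value))   # <Ti, X, old value, new value>
        db[item] = new_value                              # only then is the database modified

    def undo(txn_id):                          # restore old values: atomicity of an aborted Ti
        for tid, item, old, _new in reversed(log):
            if tid == txn_id:
                db[item] = old

    def redo(txn_id):                          # reapply new values: durability of a committed Ti
        for tid, item, _old, new in log:
            if tid == txn_id:
                db[item] = new

    write_item("T1", "A", 950)
    write_item("T1", "B", 2050)
    undo("T1")                                 # roll T1 back
    print(db)                                  # -> {'A': 1000, 'B': 2000}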
Cont’d
• A committed transaction that has performed updates
transforms the database into a new consistent state,
which must persist even if there is a system failure.
– Once a transaction has committed, we cannot undo its effects by
aborting it.
• The only way to undo the effects of a committed transaction
is to execute a compensating transaction.
– For instance, if a transaction added $20 to an account, the
compensating transaction would subtract $20 from the account.
Transaction States
– Active: the transaction stays in this state while it is
executing.
– Partially committed, after the final statement has been
executed.
– Failed, after the discovery that normal execution can no longer
proceed.
– Aborted, after the transaction has been rolled back and the
database has been restored to its state prior to the start of
the transaction.
– Committed, after successful completion.
Cont’d
• A transaction has committed only if it has entered the
committed state.
• A transaction has aborted only if it has entered the
aborted state.
• A transaction is said to have terminated if it has either
committed or aborted.
Cont’d
Figure 1: State diagram of a transaction
Cont’d
• A transaction starts in the active state.
• When it finishes its final statement, it enters the
partially committed state.
– At this point, the transaction has completed its
execution, but it is still possible that it may have to be
aborted, since
• The actual output may still be temporarily residing in main memory,
and
• Thus a hardware failure may prevent its successful completion.
Cont’d
• The database system then writes out enough
information to disk that,
– even in the event of a failure, the updates
performed by the transaction can be re-created
when the system restarts after the failure.
• When the last of this information is written out,
the transaction enters the committed state.
Cont’d
• A transaction enters the failed state
after the system determines that
– The transaction can no longer proceed
with its normal execution (example,
because of hardware or logical errors).
• Such a transaction must be rolled back.
– Then, it enters the aborted state.
Cont’d
• At this point(Aborted state), the system has two options:
1. It can restart the transaction, but only if the transaction was
aborted as a result of some hardware or software error that
was not created through the internal logic of the transaction.
– A restarted transaction is considered to be a new transaction.
2. It can kill the transaction.
– It usually does so because of some internal logical error that can be
corrected only by rewriting the application program, or because the input
was bad, or because the desired data were not found in the database.
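• Putting the states and transitions just described together, a small Python sketch (the transition table below is simply read off the description above and Figure 1):

    from enum import Enum, auto

    class State(Enum):
        ACTIVE = auto()
        PARTIALLY_COMMITTED = auto()
        FAILED = auto()
        ABORTED = auto()
        COMMITTED = auto()

    TRANSITIONS = {
        State.ACTIVE:              {State.PARTIALLY_COMMITTED, State.FAILED},
        State.PARTIALLY_COMMITTED: {State.COMMITTED, State.FAILED},
        State.FAILED:              {State.ABORTED},
        State.ABORTED:             set(),   # terminated (may be restarted as a *new* transaction)
        State.COMMITTED:           set(),   # terminated
    }

    def can_move(current, target):
        return target in TRANSITIONS[current]

    print(can_move(State.ACTIVE, State.COMMITTED))              # False: must pass through partially committed
    print(can_move(State.PARTIALLY_COMMITTED, State.COMMITTED)) # True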
Transaction Isolation
• Transaction-processing systems usually allow
multiple transactions to run concurrently.
• Allowing multiple transactions to update data
concurrently causes several problems with
consistency of the data.
Cont’d
• There are two good reasons for allowing concurrency:
1. Improved throughput and resource utilization.
– Concurrency increases the throughput of the system—that is, the
number of transactions executed in a given amount of time.
– Correspondingly, the processor (CPU) and disk
utilization also increase;
– In other words, the processor and disk spend less time idle,
or not performing any useful work.
Cont’d
2. Reduced waiting time
– If transactions run serially, a short transaction may have to wait for a
preceding long transaction to complete, which can lead to unpredictable
delays in running a transaction.
– If the transactions are operating on different parts of the database, it is
better to let them run concurrently, sharing the CPU cycles and disk
accesses among them.
– Concurrent execution reduces the unpredictable delays in running
transactions. Moreover, it also reduces the average response time:
the average time for a transaction to be completed after it has been
submitted.
Cont’d
• The database system must control
– The interaction among the concurrent
transactions to prevent them from destroying
the consistency of the database.
• It does so through a variety of
mechanisms called concurrency-control
schemes.
Cont’d
• Example: - Let T1 and T2 be two transactions
that transfer money from one account to
another.
Cont’d
• Suppose the current values of accounts A and B are 1000
birr and 2000 birr, respectively.
• Suppose two transactions are executed one at a time in
the order T1 followed by T2.
• In the figure, the sequence of instruction steps is in
chronological order from top to bottom.
Cont’d
• The final values of accounts A and B, after the
execution in Figure 2 takes place, are 855 birr
and 2145 birr, respectively.
• Thus, the total amount of money in accounts A
and B—that is, the sum A + B—is preserved after
the execution of both transactions.
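• The figures with the full instruction sequences are not reproduced here; the sketch below therefore assumes the usual textbook bodies for the two transactions (T1 transfers 50 birr from A to B, and T2 transfers 10% of A from A to B), which reproduce the final values quoted above.

    # Assumed transaction bodies (inferred from the quoted final values).
    def T1(db):
        A = db["A"]; A = A - 50; db["A"] = A
        B = db["B"]; B = B + 50; db["B"] = B

    def T2(db):
        A = db["A"]; temp = A // 10; A = A - temp; db["A"] = A   # 10% of A
        B = db["B"]; B = B + temp; db["B"] = B

    db = {"A": 1000, "B": 2000}
    T1(db); T2(db)                         # schedule 1: T1 followed by T2
    print(db, db["A"] + db["B"])           # -> A = 855, B = 2145, sum still 3000

    db = {"A": 1000, "B": 2000}
    T2(db); T1(db)                         # schedule 2: T2 followed by T1
    print(db, db["A"] + db["B"])           # -> A = 850, B = 2150, sum still 3000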
Cont’d
Figure 2: Schedule 1, a serial schedule in which T1 is followed by T2
Cont’d
• Schedule 1 is a serial schedule in which T1 is
followed by T2.
• Similarly, if the transactions are executed one at a
time in the order T2 followed by T1, then the
corresponding execution sequence is that of Figure 3.
– Again, as expected, the sum A + B is preserved, and the
final values of accounts A and B are 850 birr and 2150
birr, respectively.
Cont’d
Figure 3: Schedule 2, a serial schedule in which T2 is followed by T1
Cont’d
• The execution sequences just described are called
schedules.
– They represent the chronological order in which instructions
are executed in the system.
– These schedules are serial.
• Each serial schedule consists of a sequence of
instructions from various transactions, where the
instructions belonging to one single transaction appear
together in that schedule.
Cont’d
• When the database system executes several transactions
concurrently, the corresponding schedule no longer needs to
be serial.
– If two transactions are running concurrently,
• the operating system may execute one transaction for a little while, then
perform a context switch, execute the second transaction for some time,
and then switch back to the first transaction for some time, and so on.
• With multiple transactions, the CPU time is shared among all
the transactions.
Cont’d
• Not all concurrent executions result in a correct state.
• Consider the schedule of Figure 5.
– After the execution of this schedule, we arrive at a state where
the final values of accounts A and B are 950 birr and 2100
birr, respectively.
– This final state is an inconsistent state, since we have gained
50 birr in the process of the concurrent execution.
– Indeed, the sum A + B is not preserved by the execution of the
two transactions.
Cont’d
• It is the job of the database system to ensure that
any schedule that is executed will leave the
database in a consistent state.
• The concurrency-control component of the
database system carries out this task.
Cont’d
• We can ensure consistency of the database under
concurrent execution
– by making sure that any schedule that is executed has
the same effect as a schedule that could have occurred
without any concurrent execution.
• That is, the schedule should, in some sense, be
equivalent to a serial schedule.
• Such schedules are called serializable schedules.
Serializable
• Serial schedules are serializable, but if steps of multiple
transactions are interleaved, it is harder to determine whether a
schedule is serializable or not.
• Since transactions are programs, it is difficult to determine
exactly
– What operations a transaction performs and
– How operations of various transactions interact.
• For this reason, we shall consider only two types of operations
that a transaction can perform on a data item: read and write.
Cont’d
• We assume that, between a read (Q) instruction
and a write (Q) instruction on a data item Q,
– A transaction may perform an arbitrary sequence of
operations on the copy of Q that is residing in the local
buffer of the transaction.
• Let us consider a schedule S in which there are two
consecutive instructions, I and J, of transactions
Ti and Tj, respectively (i ≠ j).
Cont’d
• If I and J refer to different data items, then we can swap (exchange)
I and J without affecting the results of any instruction in the schedule.
• If I and J refer to the same data item Q, then the order of the two
steps may matter.
1. I = read (Q), J = read (Q). The order of I and J does not matter.
2. I = read (Q), J = write (Q).
– If I comes before J, then Ti does not read the value of Q that is written by Tj
in instruction J .
– If J comes before I, then Ti reads the value of Q that is written by Tj.
– Thus, the order of I and J matters.
Cont’d
3. I = write (Q), J = read (Q).
– The order of I and J matters for reasons similar to those of the previous case.
4. I = write (Q), J = write (Q).
– Since both instructions are write operations, the order of these instructions does
not affect either Ti or Tj.
– However, the value obtained by the next read (Q) instruction of S is affected,
since the result of only the latter of the two write instructions is preserved in the
database.
• If there is no other write (Q) instruction after I and J in S, then the order
of I and J directly affects the final value of Q in the database state that
results from schedule S.
Cont’d
• We say that I and J conflict
– if they are operations by different transactions on the same data item,
and at least one of these instructions is a write operation.
• To illustrate the concept of conflicting instructions, we consider
schedule 3.
– The write (A) instruction of T1 conflicts with the read (A)
instruction of T2.
– However, the write (A) instruction of T2 does not conflict with the
read (B) instruction of T1, because the two instructions access
different data items.
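• The definition of conflicting instructions can be written as a small helper (a sketch only; operations are modeled here as (transaction, action, data item) triples), illustrated with the two pairs from schedule 3 just discussed.

    def conflicts(op1, op2):
        """Two operations conflict if they belong to different transactions,
        access the same data item, and at least one of them is a write."""
        t1, a1, x1 = op1
        t2, a2, x2 = op2
        return t1 != t2 and x1 == x2 and "write" in (a1, a2)

    print(conflicts(("T1", "write", "A"), ("T2", "read", "A")))   # True
    print(conflicts(("T2", "write", "A"), ("T1", "read", "B")))   # False: different data items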
Cont’d
Cont’d
Cont’d
• Let I and J be consecutive instructions of a schedule S.
– If I and J are instructions of different transactions and I and J do not
conflict, then we can swap the order of I and J to produce a new
schedule S′.
• S′ is equivalent to S, since all instructions appear in the same
order in both schedules except for I and J, whose order does not
matter.
– Since the write (A) instruction of T2 in schedule 3 does not conflict
with the read (B) instruction of T1, we can swap these instructions
to generate an equivalent schedule, schedule 5, in Figure 7.
Cont’d
• Regardless of the initial system state, schedules 3 and 5 both
produce the same final system state.
– We continue to swap nonconflicting instructions:
– Swap the read (B) instruction of T1 with the read (A) instruction of T2.
– Swap the write (B) instruction of T1 with the write (A) instruction of T2.
• Swap the write (B) instruction of T1 with the read (A) instruction
of T2.
• The final result of these swaps, schedule 6 of Figure 8, is a serial
schedule.
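• Putting the swap argument together: a schedule is conflict serializable exactly when its precedence graph has no cycle (see also the Summary). Below is a sketch that reuses the conflicts() helper above; the encoding of schedule 3 is reconstructed from the discussion in these slides.

    def conflict_serializable(schedule):
        txns = {op[0] for op in schedule}
        edges = {t: set() for t in txns}
        for i, op1 in enumerate(schedule):
            for op2 in schedule[i + 1:]:
                if conflicts(op1, op2):
                    edges[op1[0]].add(op2[0])      # precedence edge Ti -> Tj

        visiting, done = set(), set()
        def cyclic(node):                          # depth-first search for a cycle
            visiting.add(node)
            for nxt in edges[node]:
                if nxt in visiting or (nxt not in done and cyclic(nxt)):
                    return True
            visiting.remove(node)
            done.add(node)
            return False

        return not any(cyclic(t) for t in txns if t not in done)

    schedule3 = [("T1", "read", "A"), ("T1", "write", "A"),
                 ("T2", "read", "A"), ("T2", "write", "A"),
                 ("T1", "read", "B"), ("T1", "write", "B"),
                 ("T2", "read", "B"), ("T2", "write", "B")]
    print(conflict_serializable(schedule3))        # True: equivalent to T1 followed by T2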
Transaction Isolation and Atomicity
• Effect of transaction failures during concurrent
execution.
Cont’d
• If a transaction Ti fails, for whatever reason, we need to
undo the effect of this transaction to ensure the
atomicity property of the transaction.
– In a system that allows concurrent execution, the atomicity
property requires that any transaction Tj that is dependent on Ti
(that is, Tj has read data written by Ti) is also aborted.
– To achieve this, we need to place restrictions on the type of
schedules permitted in the system. In the following two
subsections, we address the issue of what schedules are acceptable
from the viewpoint of recovery from transaction failure.
Recoverable Schedules
• A recoverable schedule is one
where,
– for each pair of transactions Ti and Tj
such that Tj reads a data item
previously written by Ti, the commit
operation of Ti appears before the
commit operation of Tj.
Cascadeless Schedules
• Even if a schedule is recoverable, to recover
correctly from the failure of a transaction Ti, we
may have to roll back several transactions.
• Such situations occur if transactions have read
data written by Ti.
Cont’d
• Example
– transaction T8 writes a value of A that is read by transaction T9.
– Transaction T9 writes a value of A that is read by transaction T10.
• Suppose that, at this point, T8 fails. T8 must be rolled back. Since T9 is
dependent on T8, T9 must be rolled back. Since T10 is dependent on T9,
T10 must be rolled back.
• This phenomenon, in which a single transaction failure leads to
a series of transaction rollbacks, is called cascading rollback.
Cont’d
Cont’d
• Cascading rollback is undesirable, since it leads to the undoing of
a significant amount of work.
• It is desirable to restrict the schedules to those where cascading
rollbacks cannot occur.
• Such schedules are called Cascadeless schedules.
– Formally, a Cascadeless schedule is one where, for each pair of
transactions Ti and Tj such that Tj reads a data item previously written
by Ti, the commit operation of Ti appears before the read operation of Tj.
– It is easy to verify that every Cascadeless schedule is also
recoverable.
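• Both definitions can be checked mechanically. The sketch below uses the same operation encoding as the earlier sketches, with an explicit ("Ti", "commit", None) record; this encoding is an assumption, not a standard format.

    def reads_from(schedule):
        """Yield (Tj, Ti, read_position) where Tj reads an item last written by Ti."""
        last_writer = {}
        for pos, (t, action, item) in enumerate(schedule):
            if action == "read" and item in last_writer and last_writer[item] != t:
                yield t, last_writer[item], pos
            if action == "write":
                last_writer[item] = t

    def commit_pos(schedule, txn):
        return next((i for i, op in enumerate(schedule)
                     if op[0] == txn and op[1] == "commit"), None)

    def recoverable(schedule):
        for tj, ti, _ in reads_from(schedule):
            cj, ci = commit_pos(schedule, tj), commit_pos(schedule, ti)
            if cj is not None and (ci is None or ci > cj):
                return False                  # Tj committed before the Ti it read from
        return True

    def cascadeless(schedule):
        for tj, ti, read_pos in reads_from(schedule):
            ci = commit_pos(schedule, ti)
            if ci is None or ci > read_pos:
                return False                  # Tj read data that Ti had not yet committed
        return True

    s = [("T8", "write", "A"), ("T9", "read", "A"),
         ("T8", "commit", None), ("T9", "commit", None)]
    print(recoverable(s), cascadeless(s))     # True, False: T9 read A before T8 committed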
Transaction Isolation Levels
• Serializability allows programmers to ignore issues
related to concurrency when they code transactions.
• The SQL standard also allows a transaction to specify that it
may be executed in such a way that it becomes non-
serializable with respect to other transactions.
– For instance, a transaction may operate at the isolation level of
read uncommitted, which permits the transaction to read a data
item even if it was written by a transaction that has not been
committed.
The isolation levels specified by the SQL
standard are as follows:
• Serializable usually ensures serializable execution.
• Repeatable read allows only committed data to be read and
further requires that, between two reads of a data item by a
transaction, no other transaction is allowed to update it.
• Read committed allows only committed data to be read, but
does not require repeatable reads.
– For instance, between two reads of a data item by the transaction,
another transaction may have updated the data item and
committed.
Cont’d
• Read uncommitted allows uncommitted data to be
read.
– It is the lowest isolation level allowed by SQL.
• All the isolation levels above additionally disallow
dirty writes, that is,
– they disallow writes to a data item that has already been
written by another transaction that has not yet
committed or aborted.
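• The isolation level is chosen with the standard SQL statement SET TRANSACTION ISOLATION LEVEL .... Below is a hedged sketch through a generic Python DB-API connection: the connect() call is whatever driver is in use and is not shown; sqlite3, for instance, does not accept this statement, while PostgreSQL and MySQL drivers do.

    def report_high_salaries(conn):
        """Run the instructor query twice inside one repeatable-read transaction."""
        cur = conn.cursor()
        cur.execute("SET TRANSACTION ISOLATION LEVEL REPEATABLE READ")
        cur.execute("SELECT ID, name FROM instructor WHERE salary > 90000")
        first = cur.fetchall()
        cur.execute("SELECT ID, name FROM instructor WHERE salary > 90000")
        second = cur.fetchall()   # repeatable read: the rows read cannot have been updated in between
        conn.commit()
        return first, second

    # usage (driver-specific): report_high_salaries(some_dbapi_driver.connect(...))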
Locking
• Instead of locking the entire database, a transaction
could lock only those data items that it accesses.
• Under such a policy, the transaction must hold locks long
enough to ensure Serializability, but for a period short
enough not to harm performance excessively.
• There are two kinds of locks.
1. Shared locks are used for data that the transaction reads and
2. Exclusive locks are used for those it writes.
Cont’d
• Many transactions can hold shared locks on the
same data item at the same time,
– but a transaction is allowed an exclusive lock on a data
item only if no other transaction holds any lock (regardless
of whether shared or exclusive) on the data item.
• This use of two modes of locks along with two-
phase locking allows concurrent reading of
data while still ensuring Serializability.
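• A minimal sketch of the shared/exclusive compatibility rule follows. It is only a lock table for single data items, with no waiting, no deadlock handling, and no two-phase-locking rule; the structure is an assumption for illustration.

    class LockTable:
        def __init__(self):
            self.locks = {}                  # item -> {"mode": "S" or "X", "holders": set()}

        def acquire(self, txn, item, mode):
            entry = self.locks.get(item)
            if entry is None:
                self.locks[item] = {"mode": mode, "holders": {txn}}
                return True
            if mode == "S" and entry["mode"] == "S":
                entry["holders"].add(txn)    # many readers may share a lock
                return True
            if entry["holders"] == {txn}:    # a lone holder may keep or upgrade its lock
                entry["mode"] = "X" if "X" in (mode, entry["mode"]) else "S"
                return True
            return False                     # conflicting request: the caller must wait

        def release(self, txn, item):
            entry = self.locks.get(item)
            if entry:
                entry["holders"].discard(txn)
                if not entry["holders"]:
                    del self.locks[item]

    lt = LockTable()
    print(lt.acquire("T1", "A", "S"))   # True
    print(lt.acquire("T2", "A", "S"))   # True: shared locks are compatible
    print(lt.acquire("T2", "A", "X"))   # False: T1 also holds a lock on A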
Transactions as SQL Statements
• In SQL, insert statements create new data and delete
statements delete data.
– These two statements are, in effect, write operations, since they change
the database, but their interactions with the actions of other
transactions are different.
• Example, consider the following SQL query on our
university database that finds all instructors who earn more
than 90,000 birr.
– Select ID, name from instructor where salary > 90000;
Cont’d
• Using our sample instructor relation,
– we find that only Einstein and Brandt satisfy the
condition.
• Now assume that, around the same time we are
running our query, another user inserts a new
instructor named James whose salary is 100,000 birr.
– Insert into instructor values (’11111’, ’James’,
’Marketing’, 100000);
Cont’d
• The result of our query will be different depending on
whether this insert comes before or after our
query is run.
– In a concurrent execution of these transactions, it is
intuitively clear that they conflict, but this is a conflict not
captured by our simple model.
– This situation is referred to as the phantom phenomenon,
because a conflict may exist on phantom data.
Cont’d
• But in an SQL statement, the specific data
items (tuples) referenced may be determined
by a where clause predicate.
• So the same transaction, if run more than
once, might reference different data items
each time it is run if the values in the
database change between runs.
Cont’d
• One way of dealing with the above problem is
to recognize that it is not sufficient for
concurrency control to consider only the tuples
that are accessed by a transaction;
– the information used to find the tuples that are
accessed by the transaction must also be
considered for the purpose of concurrency control.
Cont’d
• The information used to find tuples could be updated by an
insertion or deletion, or in the case of an index, even by an
update to a search-key attribute.
– For example, if locking is used for concurrency control, the data
structures that track the tuples in a relation, as well as index structures,
must be appropriately locked.
• However, such locking can lead to poor concurrency in some
situations; index-locking protocols maximize concurrency while
ensuring Serializability in spite of inserts, deletes, and
predicates in queries.
Cont’d
• Let us consider again the query:
– Select ID, name from instructor where salary>
90000;
• And the following SQL update:
– Update instructor set salary = salary *0.9 where
name = ’Wu’;
Cont’d
• If our query reads the entire instructor relation,
then it reads the tuple with Wu’s data and conflicts
with the update.
– However, if an index were available that allowed our
query direct access to those tuples with salary > 90000,
• then our query would not have accessed Wu’s data at all
because Wu’s salary is initially 90,000 birr in our example
instructor relation, and reduces to 81,000 birr after the update.
Cont’d
– However, using the above approach, it would appear that
the existence of a conflict depends on a low-level query
processing decision by the system that is unrelated to a
user-level view of the meaning of the two SQL statements!
– An alternative approach to concurrency control treats an
insert, delete or update as conflicting with a predicate
on a relation, if it could affect the set of tuples selected by
a predicate.
Cont’d
• In our example query above,
– the predicate is “salary > 90000”, and an update of Wu’s
salary from 90,000 birr to a value greater than 90,000
birr, or an update of Einstein’s salary from a value
greater than 90,000 birr to a value less than or equal to
90,000 birr, would conflict with this predicate.
• Locking based on this idea is called predicate
locking; however predicate locking is expensive, and
not used in practice.
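• The idea behind a predicate conflict can be sketched as follows (the tuple representation and function names are illustrative): a modification conflicts with a query predicate if the affected tuple satisfies the predicate either before or after the change.

    def conflicts_with_predicate(predicate, old_tuple, new_tuple):
        """An insert/update/delete conflicts with a query predicate if the tuple
        satisfies the predicate before or after the change (None = tuple absent)."""
        before = old_tuple is not None and predicate(old_tuple)
        after = new_tuple is not None and predicate(new_tuple)
        return before or after

    high_paid = lambda t: t["salary"] > 90000

    wu_old = {"name": "Wu", "salary": 90000}
    wu_new = {"name": "Wu", "salary": 81000}
    print(conflicts_with_predicate(high_paid, wu_old, wu_new))              # False: never satisfies it

    einstein_old = {"name": "Einstein", "salary": 95000}
    einstein_new = {"name": "Einstein", "salary": 85000}
    print(conflicts_with_predicate(high_paid, einstein_old, einstein_new))  # True: crosses the predicate boundary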
Summary
• A transaction is a unit of program execution that accesses and
updates various data items.
• Transactions are required to have the ACID properties:
atomicity, consistency, isolation, and durability.
– Atomicity ensures that either all the effects of a transaction are
reflected in the database, or none are; a failure cannot leave the
database in a state where a transaction is partially executed.
– Consistency ensures that, if the database is initially consistent, the
execution of the transaction (by itself) leaves the database in a
consistent state.
Cont’d
– Isolation ensures that concurrently executing
transactions are isolated from one another, so
that each has the impression that no other
transaction is executing concurrently with it.
– Durability ensures that, once a transaction has
been committed, that transaction’s updates do
not get lost, even if there is a system failure.
Cont’d
• Concurrent execution of transactions improves throughput of transactions
and system utilization, and also reduces waiting time of transactions.
• The various types of storage in a computer are volatile storage, nonvolatile
storage, and stable storage.
• Data in volatile storage, such as in RAM, are lost when the computer crashes.
• Data in nonvolatile storage, such as disk, are not lost when the computer
crashes, but may occasionally be lost because of failures such as disk crashes.
• Data in stable storage are never lost.
Cont’d
• Stable storage that must be accessible online is approximated with mirrored
disks, or other forms of RAID, which provide redundant data storage.
• Offline, or archival, stable storage may consist of multiple tape copies of data
stored in physically secure locations.
• When several transactions execute concurrently on the database, the
consistency of data may no longer be preserved.
• It is therefore necessary for the system to control the interaction among the
concurrent transactions.
Cont’d
• Since a transaction is a unit that preserves consistency, a serial
execution of transactions guarantees that consistency is
preserved.
• A schedule captures the key actions of transactions that affect
concurrent execution, such as read and write operations, while
abstracting away internal details of the execution of the
transaction.
– We require that any schedule produced by concurrent processing of a set
of transactions will have an effect equivalent to a schedule produced
when these transactions are run serially in some order.
Cont’d
• A system that guarantees this property is said to ensure
serializability.
• There are several different notions of equivalence
leading to the concepts of conflict serializability and
view serializability.
• Serializability of schedules generated by concurrently executing
transactions can be ensured through one of a variety of mechanisms
called concurrency-control policies.
Cont’d
• We can test a given schedule for conflict
serializability by constructing a precedence graph
for the schedule, and by searching for absence of
cycles in the graph.
• However, there are more efficient concurrency-
control policies for ensuring serializability.
Cont’d
• Schedules must be recoverable, to make sure that if transaction a
sees the effects of transaction b, and b then aborts, then a also
gets aborted.
– Schedules should preferably be Cascadeless, so that the abort of a
transaction does not result in cascading aborts of other transactions.
– Cascadelessness is ensured by allowing transactions to only read
committed data.
• The concurrency-control–management component of the
database is responsible for handling the concurrency-control
policies.
Questions