DBMS Unit IV
Nuthanakanti Bhaskar
UNIT-IV
-------------------------------------------------------------------------------------------------------------------------------
Topics: Transaction Concept • ACID Properties • Serializability • Recoverability • Implementation of
Isolation • Lock-Based Protocols • Timestamp-Based Protocols • Multiple Granularity • Recovery and
Atomicity • ARIES • Remote Backup Systems
Transaction Concept
• A transaction is a unit of program execution that accesses and possibly updates various data
items.
• E.g. transaction to transfer $50 from account A to account B:
1. read(A)
2. A := A – 50
3. write(A)
4. read(B)
5. B := B + 50
6. write(B)
• Two main issues to deal with:
– Failures of various kinds, such as hardware failures and system crashes
– Concurrent execution of multiple transactions
ACID Properties
• Atomicity. Either all operations of the transaction are properly reflected in the database or none
are.
• Consistency. Execution of a transaction in isolation preserves the consistency of the database.
• Isolation. Although multiple transactions may execute concurrently, each transaction must be
unaware of other concurrently executing transactions. Intermediate transaction results must be
hidden from other concurrently executed transactions.
– That is, for every pair of transactions Ti and Tj, it appears to Ti that either Tj finished
execution before Ti started, or Tj started execution after Ti finished.
• Durability. After a transaction completes successfully, the changes it has made to the database
persist, even if there are system failures.
• Example of Fund Transfer
• Transaction to transfer $50 from account A to account B:
1. read(A)
2. A := A – 50
3. write(A)
4. read(B)
5. B := B + 50
6. write(B)
Atomicity requirement
– if the transaction fails after step 3 and before step 6, money will be “lost” leading to an
inconsistent database state
• Failure could be due to software or hardware
– the system should ensure that updates of a partially executed transaction are not reflected
in the database
Durability requirement
Once the user has been notified that the transaction has completed (i.e., the transfer of the $50 has taken
place), the updates to the database by the transaction must persist even if there are software or hardware
failures.
• Example of Fund Transfer (Cont.)
Consistency requirement
in above example:
– the sum of A and B is unchanged by the execution of the transaction
• In general, consistency requirements include
• Explicitly specified integrity constraints such as primary keys and foreign keys
• Implicit integrity constraints
– e.g. sum of balances of all accounts, minus sum of loan amounts must
equal value of cash-in-hand
– A transaction must see a consistent database.
– During transaction execution the database may be temporarily inconsistent.
– When the transaction completes successfully the database must be consistent.
• Erroneous transaction logic can lead to inconsistency
Isolation requirement
If between steps 3 and 6, another transaction T2 is allowed to access the partially updated database, it will
see an inconsistent database (the sum A + B will be less than it should be).
        T1                              T2
1. read(A)
2. A := A – 50
3. write(A)
                                        read(A), read(B), print(A+B)
4. read(B)
5. B := B + 50
6. write(B)
• Isolation can be ensured trivially by running transactions serially
– that is, one after the other.
• However, executing multiple transactions concurrently has significant benefits, as we will see
later.
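To make these requirements concrete, here is a minimal sketch of the fund transfer using Python's
sqlite3 module. The bank.db file and the account table are assumptions made for illustration; commit
provides durability, and rollback on failure provides atomicity.

    import sqlite3

    conn = sqlite3.connect("bank.db")  # hypothetical database file
    conn.execute("CREATE TABLE IF NOT EXISTS account (name TEXT PRIMARY KEY, balance INTEGER)")
    conn.execute("INSERT OR IGNORE INTO account VALUES ('A', 100), ('B', 100)")
    conn.commit()

    try:
        # steps 1-3: read(A); A := A - 50; write(A)
        conn.execute("UPDATE account SET balance = balance - 50 WHERE name = 'A'")
        # steps 4-6: read(B); B := B + 50; write(B)
        conn.execute("UPDATE account SET balance = balance + 50 WHERE name = 'B'")
        conn.commit()        # durability: once commit returns, the transfer persists
    except Exception:
        conn.rollback()      # atomicity: a failure between steps undoes the partial update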
Transaction State
• Active – the initial state; the transaction stays in this state while it is executing
• Partially committed – after the final statement has been executed.
• Failed – after the discovery that normal execution can no longer proceed.
• Aborted – after the transaction has been rolled back and the database restored to its state prior to
the start of the transaction. Two options after it has been aborted:
– restart the transaction
• can be done only if no internal logical error
– kill the transaction
• Committed – after successful completion.
Concurrent Executions
• Multiple transactions are allowed to run concurrently in the system. Advantages are:
– increased processor and disk utilization, leading to better transaction throughput
• E.g. one transaction can be using the CPU while another is reading from or
writing to the disk
– reduced average response time for transactions: short transactions need not wait behind
long ones.
Schedules
• Schedule – a sequence of instructions that specify the chronological order in which instructions
of concurrent transactions are executed
– a schedule for a set of transactions must consist of all instructions of those transactions
– must preserve the order in which the instructions appear in each individual transaction.
• A transaction that successfully completes its execution will have a commit instruction as the last
statement
– by default transaction assumed to execute commit instruction as its last step
• A transaction that fails to successfully complete its execution will have an abort instruction as the
last statement
• Schedule 1
• Let T1 transfer $50 from A to B, and T2 transfer 10% of the balance from A to B.
• A serial schedule in which T1 is followed by T2 :
• Schedule 2
• A serial schedule in which T2 is followed by T1 :
• Schedule 3
• Let T1 and T2 be the transactions defined previously. The following schedule is not a serial
schedule, but it is equivalent to Schedule 1.
• Schedule 4
• The following concurrent schedule does not preserve the value of (A + B ).
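A toy, single-threaded rendering of the two schedules (illustrative only, not real concurrency) shows
why Schedule 1 preserves A + B while Schedule 4 does not:

    def schedule1():
        # Serial: T1 then T2 -- preserves A + B
        db = {"A": 100, "B": 100}
        a = db["A"]; db["A"] = a - 50                  # T1: read(A); write(A - 50)
        b = db["B"]; db["B"] = b + 50                  # T1: read(B); write(B + 50)
        a = db["A"]; tmp = a * 0.1; db["A"] = a - tmp  # T2: read(A); write(A - temp)
        b = db["B"]; db["B"] = b + tmp                 # T2: read(B); write(B + temp)
        return db

    def schedule4():
        # Bad interleaving: T2 reads A before T1 writes it back
        db = {"A": 100, "B": 100}
        a1 = db["A"]                             # T1: read(A)
        a2 = db["A"]; tmp = a2 * 0.1             # T2: read(A), temp := A * 0.1
        db["A"] = a2 - tmp                       # T2: write(A)
        b2 = db["B"]                             # T2: read(B)
        db["A"] = a1 - 50                        # T1: write(A)  (clobbers T2's write)
        b1 = db["B"]; db["B"] = b1 + 50          # T1: read(B); write(B)
        db["B"] = b2 + tmp                       # T2: write(B)  (clobbers T1's write)
        return db

    print(sum(schedule1().values()))             # 200.0 -- A + B preserved
    print(sum(schedule4().values()))             # 160.0 -- money "lost"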
Serializability
• Basic Assumption – Each transaction preserves database consistency.
• Thus serial execution of a set of transactions preserves database consistency.
• A (possibly concurrent) schedule is serializable if it is equivalent to a serial schedule. Different
forms of schedule equivalence give rise to the notions of:
1. conflict serializability
2. view serializability
• Simplified view of transactions
– We ignore operations other than read and write instructions
– We assume that transactions may perform arbitrary computations on data in local buffers
in between reads and writes.
– Our simplified schedules consist of only read and write instructions.
Conflicting Instructions
• Instructions li and lj of transactions Ti and Tj respectively, conflict if and only if there exists some
item Q accessed by both li and lj, and at least one of these instructions wrote Q.
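Conflict serializability can be tested by building a precedence graph with an edge Ti → Tj whenever
an operation of Ti conflicts with, and precedes, an operation of Tj: the schedule is conflict serializable
if and only if the graph is acyclic. A small sketch (the record format is an assumption):

    def conflict_serializable(schedule):
        # schedule: list of (txn, op, item) with op in {"R", "W"}, in execution order
        graph = {}
        for i, (ti, oi, qi) in enumerate(schedule):
            for tj, oj, qj in schedule[i + 1:]:
                if ti != tj and qi == qj and "W" in (oi, oj):
                    graph.setdefault(ti, set()).add(tj)   # Ti must precede Tj
        # depth-first search for a cycle in the precedence graph
        WHITE, GRAY, BLACK = 0, 1, 2
        color = {t: WHITE for t, _, _ in schedule}
        def acyclic_from(u):
            color[u] = GRAY
            for v in graph.get(u, ()):
                if color[v] == GRAY or (color[v] == WHITE and not acyclic_from(v)):
                    return False                          # back edge: cycle found
            color[u] = BLACK
            return True
        return all(color[t] != WHITE or acyclic_from(t) for t in list(color))

    # T1 -> T2 (W-R on A) and T2 -> T1 (W-R on B): a cycle, so not serializable
    print(conflict_serializable([("T1", "W", "A"), ("T2", "R", "A"),
                                 ("T2", "W", "B"), ("T1", "R", "B")]))  # False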
View Serializability
• Let S and S′ be two schedules with the same set of transactions. S and S′ are view equivalent if
the following three conditions are met, for each data item Q:
– If in schedule S, transaction Ti reads the initial value of Q, then in schedule S′ also
transaction Ti must read the initial value of Q.
– If in schedule S transaction Ti executes read(Q), and that value was produced by
transaction Tj (if any), then in schedule S′ also transaction Ti must read the value of Q
that was produced by the same write(Q) operation of transaction Tj .
– The transaction (if any) that performs the final write(Q) operation in schedule S must
also perform the final write(Q) operation in schedule S′.
As can be seen, view equivalence is also based purely on reads and writes alone.
• A schedule S is view serializable if it is view equivalent to a serial schedule.
• Every conflict serializable schedule is also view serializable.
• Below is a schedule which is view-serializable but not conflict serializable.
• What serial schedule is above equivalent to?
• Every view serializable schedule that is not conflict serializable has blind writes.
• Other Notions of Serializability
• The schedule below produces same outcome as the serial schedule < T1, T5 >, yet is not conflict
equivalent or view equivalent to it.
• Determining such equivalence requires analysis of operations other than read and write.
Recoverable Schedules
• Recoverable schedule — if a transaction Tj reads a data item previously written by a transaction
Ti , then the commit operation of Ti appears before the commit operation of Tj.
• The following schedule (Schedule 11) is not recoverable if T9 commits immediately after the read
• If T8 should abort, T9 would have read (and possibly shown to the user) an inconsistent database
state. Hence, database must ensure that schedules are recoverable.
Cascading Rollbacks
• Cascading rollback – a single transaction failure leads to a series of transaction rollbacks.
Consider the following schedule where none of the transactions has yet committed (so the
schedule is recoverable)
• If T10 fails, T11 and T12 must also be rolled back.
• Can lead to the undoing of a significant amount of work
Cascadeless Schedules
Cascadeless schedules — cascading rollbacks cannot occur; for each pair of transactions Ti and Tj
such that Tj reads a data item previously written by Ti, the commit operation of Ti appears before the
read operation of Tj.
• Every cascadeless schedule is also recoverable
• It is desirable to restrict the schedules to those that are cascadeless
Concurrency Control
• A database must provide a mechanism that will ensure that all possible schedules are
– either conflict or view serializable, and
– are recoverable and preferably cascadeless
• A policy in which only one transaction can execute at a time generates serial schedules, but
provides a poor degree of concurrency
– Are serial schedules recoverable/cascadeless?
• Testing a schedule for serializability after it has executed is a little too late!
• Goal – to develop concurrency control protocols that will assure serializability.
Implementation of Isolation
• Schedules must be conflict or view serializable, and recoverable, for the sake of database
consistency, and preferably cascadeless.
• A policy in which only one transaction can execute at a time generates serial schedules, but
provides a poor degree of concurrency.
• Concurrency-control schemes tradeoff between the amount of concurrency they allow and the
amount of overhead that they incur.
• Some schemes allow only conflict-serializable schedules to be generated, while others allow
view-serializable schedules that are not conflict-serializable.
Lock-Based Protocols
• A lock is a mechanism to control concurrent access to a data item. Data items can be locked in
two modes:
– exclusive (X) mode: the data item can be both read and written. An X-lock is requested
using the lock-X instruction.
– shared (S) mode: the data item can only be read. An S-lock is requested using the
lock-S instruction.
Lock-compatibility matrix
            S       X
    S       true    false
    X       false   false
• A transaction may be granted a lock on an item if the requested lock is compatible with locks
already held on the item by other transactions
• Any number of transactions can hold shared locks on an item,
– but if any transaction holds an exclusive on the item no other transaction may hold any
lock on the item.
• If a lock cannot be granted, the requesting transaction is made to wait till all incompatible locks
held by other transactions have been released. The lock is then granted.
• Example of a transaction performing locking:
T2: lock-S(A);
read (A);
unlock(A);
lock-S(B);
read (B);
unlock(B);
display(A+B)
• Locking as above is not sufficient to guarantee serializability: if A and B get updated in between
the read of A and B, the displayed sum would be wrong.
• A locking protocol is a set of rules followed by all transactions while requesting and releasing
locks. Locking protocols restrict the set of possible schedules.
• The two-phase locking protocol, under which a transaction acquires all its locks (growing phase)
before releasing any (shrinking phase), is one such protocol; it guarantees conflict-serializable
schedules and is what is meant when a transaction is described as “two-phase” later.
• A transaction Ti issues the standard read/write instruction, without explicit locking calls.
• The operation read(D) is processed as:
if Ti has a lock on D
then
read(D)
else begin
if necessary wait until no other
transaction has a lock-X on D
grant Ti a lock-S on D;
read(D)
end
Implementation of Locking
• A lock manager can be implemented as a separate process to which transactions send lock and
unlock requests
• The lock manager replies to a lock request by sending a lock grant message (or a message asking
the transaction to roll back, in case of a deadlock)
• The requesting transaction waits until its request is answered
• The lock manager maintains a data-structure called a lock table to record granted locks and
pending requests
• The lock table is usually implemented as an in-memory hash table indexed on the name of the
data item being locked
Lock Table
• Black rectangles indicate granted locks, white ones indicate waiting requests
• Lock table also records the type of lock granted or requested
• New request is added to the end of the queue of requests for the data item, and granted if it is
compatible with all earlier locks
• Unlock requests result in the request being deleted, and later requests are checked to see if they
can now be granted
• If transaction aborts, all waiting or granted requests of the transaction are deleted
– lock manager may keep a list of locks held by each transaction, to implement this
efficiently
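A minimal sketch of such a lock table in Python (class and method names are assumptions): each data
item maps to a queue of requests, and a request is granted only if it is compatible with all earlier
requests in the queue.

    from collections import defaultdict

    class LockManager:
        def __init__(self):
            # lock table: data item -> queue of [txn, mode, granted] requests
            self.table = defaultdict(list)

        @staticmethod
        def compatible(m1, m2):
            return m1 == "S" and m2 == "S"      # S/S is the only compatible pair

        def lock(self, txn, item, mode):
            queue = self.table[item]
            granted = all(self.compatible(mode, m) for _, m, _ in queue)
            queue.append([txn, mode, granted])
            return granted                      # False: the transaction must wait

        def unlock(self, txn, item):
            queue = self.table[item]
            queue[:] = [r for r in queue if r[0] != txn]
            # later requests are re-checked to see if they can now be granted
            for i, req in enumerate(queue):
                if not req[2] and all(self.compatible(req[1], q[1]) for q in queue[:i]):
                    req[2] = True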
Graph-Based Protocols
Tree Protocol
• Only exclusive locks are allowed.
• The first lock by Ti may be on any data item. Subsequently, a data item Q can be locked by Ti
only if the parent of Q is currently locked by Ti.
• Data items may be unlocked at any time, but a data item that has been locked and unlocked by Ti
cannot subsequently be relocked by Ti.
• The tree protocol ensures conflict serializability and freedom from deadlock, but does not ensure
recoverability or cascadelessness.
Timestamp-Based Protocols
• Each transaction is issued a timestamp when it enters the system. If an old transaction Ti has
time-stamp TS(Ti), a new transaction Tj is assigned time-stamp TS(Tj) such that TS(Ti) < TS(Tj).
• The protocol manages concurrent execution such that the time-stamps determine the
serializability order.
• In order to assure such behavior, the protocol maintains for each data Q two timestamp values:
– W-timestamp(Q) is the largest time-stamp of any transaction that executed write(Q)
successfully.
– R-timestamp(Q) is the largest time-stamp of any transaction that executed read(Q)
successfully.
• The timestamp ordering protocol ensures that any conflicting read and write operations are
executed in timestamp order.
• Suppose a transaction Ti issues a read(Q)
– If TS(Ti) < W-timestamp(Q), then Ti needs to read a value of Q that was already
overwritten.
• Hence, the read operation is rejected, and Ti is rolled back.
– If TS(Ti) ≥ W-timestamp(Q), then the read operation is executed, and R-timestamp(Q) is
set to max(R-timestamp(Q), TS(Ti)).
• Suppose that transaction Ti issues write(Q).
– If TS(Ti) < R-timestamp(Q), then the value of Q that Ti is producing was needed
previously, and the system assumed that that value would never be produced.
• Hence, the write operation is rejected, and Ti is rolled back.
– If TS(Ti) < W-timestamp(Q), then Ti is attempting to write an obsolete value of Q.
• Hence, this write operation is rejected, and Ti is rolled back.
– Otherwise, the write operation is executed, and W-timestamp(Q) is set to TS(Ti).
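A sketch of these checks in Python (class and field names are illustrative; timestamps are integers,
with 0 meaning “never accessed”):

    class Rollback(Exception):
        """Ti must be rolled back and restarted with a new timestamp."""

    class TimestampOrdering:
        def __init__(self):
            self.r_ts = {}     # R-timestamp(Q)
            self.w_ts = {}     # W-timestamp(Q)

        def read(self, ts, q):
            if ts < self.w_ts.get(q, 0):       # Q was already overwritten
                raise Rollback()
            self.r_ts[q] = max(self.r_ts.get(q, 0), ts)

        def write(self, ts, q):
            if ts < self.r_ts.get(q, 0):       # a later reader needed the old value
                raise Rollback()
            if ts < self.w_ts.get(q, 0):       # obsolete write
                raise Rollback()
            self.w_ts[q] = ts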
• Example Use of the Protocol
A partial schedule for several data items for transactions with
timestamps 1, 2, 3, 4, 5
• The timestamp-ordering protocol guarantees serializability since all the arcs in the precedence
graph are of the form Ti → Tj with TS(Ti) < TS(Tj); thus there can be no cycles.
Thomas’ Write Rule
• Modified version of the timestamp-ordering protocol in which obsolete write operations may be
ignored under certain circumstances.
• When Ti attempts to write data item Q, if TS(Ti) < W-timestamp(Q), then Ti is attempting to write
an obsolete value of Q.
– Rather than rolling back Ti as the timestamp ordering protocol would have done, this
write operation can simply be ignored.
• Otherwise this protocol is the same as the timestamp ordering protocol.
• Thomas' Write Rule allows greater potential concurrency.
– Allows some view-serializable schedules that are not conflict-serializable.
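Under Thomas’ Write Rule, only the second check in write() from the sketch above changes: the
obsolete write is skipped instead of forcing a rollback. A drop-in replacement method for the
illustrative TimestampOrdering class:

    def write_thomas(self, ts, q):
        if ts < self.r_ts.get(q, 0):
            raise Rollback()
        if ts < self.w_ts.get(q, 0):
            return                             # ignore the obsolete write
        self.w_ts[q] = ts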
Validation-Based Protocol
• Execution of transaction Ti is done in three phases:
– Read and execution phase: Ti writes only to temporary local variables
– Validation phase: Ti performs a “validation test” to determine if its local updates can be
applied without violating serializability
– Write phase: if Ti is validated, the updates are applied to the database; otherwise, Ti is
rolled back.
• Also called optimistic concurrency control, since the transaction executes fully in the hope that
all will go well during validation.
Multiple Granularity
• Allow data items to be of various sizes and define a hierarchy of data granularities, where the
small granularities are nested within larger ones
• Can be represented graphically as a tree (but don't confuse with tree-locking protocol)
• When a transaction locks a node in the tree explicitly, it implicitly locks all the node's descendents
in the same mode.
• Granularity of locking (level in tree where locking is done):
– fine granularity (lower in tree): high concurrency, high locking overhead
– coarse granularity (higher in tree): low locking overhead, low concurrency
• Example of Granularity Hierarchy
The levels, starting from the coarsest (top) level are
– database
– area
– file
– record
• In addition to the S and X lock modes, multiple-granularity locking uses three additional lock
modes:
– intention-shared (IS): indicates explicit locking at a lower level of the tree but only with
shared locks.
– intention-exclusive (IX): indicates explicit locking at a lower level with exclusive or
shared locks
– shared and intention-exclusive (SIX): the subtree rooted by that node is locked explicitly
in shared mode and explicit locking is being done at a lower level with exclusive-mode
locks.
• Intention locks allow a higher-level node to be locked in S or X mode without having to check all
descendent nodes.
– The root of the tree must be locked first, and may be locked in any mode.
– A node Q can be locked by Ti in S or IS mode only if the parent of Q is currently locked
by Ti in either IX or IS mode.
– A node Q can be locked by Ti in X, SIX, or IX mode only if the parent of Q is currently
locked by Ti in either IX or SIX mode.
– Ti can lock a node only if it has not previously unlocked any node (that is, Ti is two-
phase).
– Ti can unlock a node Q only if none of the children of Q are currently locked by Ti.
• Observe that locks are acquired in root-to-leaf order, whereas they are released in leaf-to-root
order.
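The compatibility test across the five modes, and the parent-mode rules above, can be written down
directly. A sketch (the matrix encoded here is the standard multiple-granularity one):

    # mode -> set of modes it is compatible with
    COMPATIBLE = {
        "IS":  {"IS", "IX", "S", "SIX"},
        "IX":  {"IS", "IX"},
        "S":   {"IS", "S"},
        "SIX": {"IS"},
        "X":   set(),
    }

    def can_grant(requested, held_modes):
        # a lock may be granted only if compatible with all locks already held
        return all(h in COMPATIBLE[requested] for h in held_modes)

    def parent_ok(requested, parent_mode):
        # S or IS on a node requires the parent held in IX or IS mode;
        # X, SIX or IX requires the parent held in IX or SIX mode
        if requested in ("S", "IS"):
            return parent_mode in ("IX", "IS")
        return parent_mode in ("IX", "SIX")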
Recovery and Atomicity
• Modifying the database without ensuring that the transaction will commit may leave the database
in an inconsistent state.
• Consider transaction Ti that transfers $50 from account A to account B; goal is either to perform
all database modifications made by Ti or none at all.
• Several output operations may be required for Ti (to output A and B). A failure may occur after
one of these modifications has been made but before all of them are made.
• To ensure atomicity despite failures, we first output information describing the modifications to
stable storage without modifying the database itself.
• We study two approaches:
– log-based recovery, and
– shadow-paging
• We assume (initially) that transactions run serially, that is, one after the other.
Recovery Algorithms
• Recovery algorithms are techniques to ensure database consistency and transaction atomicity and
durability despite failures
– Focus of this chapter
• Recovery algorithms have two parts
– Actions taken during normal transaction processing to ensure enough information exists
to recover from failures
– Actions taken after a failure to recover the database contents to a state that ensures
atomicity, consistency and durability
Log-Based Recovery
• A log is kept on stable storage; it is a sequence of log records recording the update activities on
the database.
• When Ti finishes its last statement, the log record <Ti commit> is written.
• We assume for now that log records are written directly to stable storage (that is, they are not
buffered)
• Two approaches using logs
– Deferred database modification
– Immediate database modification
• The deferred database modification scheme records all modifications to the log, but defers all
the writes to after partial commit.
• Assume that transactions execute serially
• Transaction starts by writing <Ti start> record to log.
• A write(X) operation results in a log record <Ti, X, V> being written, where V is the new value
for X
– Note: old value is not needed for this scheme
• The write is not performed on X at this time, but is deferred.
• When Ti partially commits, <Ti commit> is written to the log
• Finally, the log records are read and used to actually execute the previously deferred writes.
• During recovery after a crash, a transaction needs to be redone if and only if both <Ti start>
and <Ti commit> are there in the log.
• Redoing a transaction Ti ( redoTi) sets the value of all data items updated by the transaction to the
new values.
• Crashes can occur while
– the transaction is executing the original updates, or
– while recovery action is being taken
• example transactions T0 and T1 (T0 executes before T1):
T0: read(A)                  T1: read(C)
    A := A – 50                  C := C – 100
    write(A)                     write(C)
    read(B)
    B := B + 50
    write(B)
• Below we show the log as it appears at three instances of time.
• If log on stable storage at time of crash is as in case:
(a) No redo actions need to be taken
(b) redo(T0) must be performed since <T0 commit> is present
(c) redo(T0) must be performed followed by redo(T1) since
<T0 commit> and <T1 commit> are present
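A sketch of redo-only recovery for deferred modification (log record shapes follow the text; only new
values are logged, so no undo is ever needed):

    def recover_deferred(log, db):
        committed = {rec[1] for rec in log if rec[0] == "commit"}
        for rec in log:                        # replay the log in order
            if rec[0] == "write" and rec[1] in committed:
                _, t, x, v = rec
                db[x] = v                      # redo(Ti): install the new value
        return db

    log = [("start", "T0"), ("write", "T0", "A", 950), ("write", "T0", "B", 2050),
           ("commit", "T0"), ("start", "T1"), ("write", "T1", "C", 600)]
    print(recover_deferred(log, {"A": 1000, "B": 2000, "C": 700}))
    # T0 is redone (both <start> and <commit> present); T1's write is ignored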
Checkpoints
• Checkpointing speeds up recovery: periodically, output all log records and all modified buffer
blocks to stable storage/disk, then write a log record <checkpoint L>, where L is a list of the
transactions active at the time of the checkpoint.
• When recovering after a crash, first scan the log backwards to the most recent <checkpoint L>
record, placing each transaction with a <Ti commit> record in redo-list, and every other
transaction found (including those in L) in undo-list. Recovery then continues as follows:
1. Scan the log backwards from the most recent record, stopping when <Ti start> records
have been found for every transaction Ti in undo-list.
• During the scan, perform undo for each log record that belongs to a transaction
in undo-list.
2. Locate the most recent <checkpoint L> record.
3. Scan log forwards from the <checkpoint L> record till the end of the log.
• During the scan, perform redo for each log record that belongs to a transaction
on redo-list
• Example of Recovery
• Go over the steps of the recovery algorithm on the following log:
<T0 start>
<T0, A, 0, 10>
<T0 commit>
<T1 start> /* Scan at step 1 comes up to here */
<T1, B, 0, 10>
<T2 start>
<T2, C, 0, 10>
<T2, C, 10, 20>
<checkpoint {T1, T2}>
<T3 start>
<T3, A, 10, 20>
<T3, D, 0, 10>
<T3 commit>
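A sketch of the checkpoint-based algorithm applied to a log in the format above (record shapes
assumed: ("update", Ti, X, old, new) plus start/commit/checkpoint records):

    def recover(log, db):
        cp = max(i for i, r in enumerate(log) if r[0] == "checkpoint")
        undo_list = set(log[cp][1])            # L from <checkpoint L>
        redo_list = set()
        for r in log[cp + 1:]:                 # classify transactions after checkpoint
            if r[0] == "start":
                undo_list.add(r[1])
            elif r[0] == "commit":
                undo_list.discard(r[1])
                redo_list.add(r[1])
        for r in reversed(log):                # step 1: undo incomplete transactions
            if r[0] == "update" and r[1] in undo_list:
                db[r[2]] = r[3]                # restore the old value
        for r in log[cp:]:                     # step 3: redo committed transactions
            if r[0] == "update" and r[1] in redo_list:
                db[r[2]] = r[4]                # reinstall the new value
        return db

    # On the example log above: T3 is redone; T1 and T2 are undone.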
• Log record buffering: log records are buffered in main memory, instead of being output
directly to stable storage.
– Log records are output to stable storage when a block of log records in the buffer is full,
or a log force operation is executed.
• Log force is performed to commit a transaction by forcing all its log records (including the
commit record) to stable storage.
• Several log records can thus be output using a single output operation, reducing the I/O cost.
• The rules below must be followed if log records are buffered:
– Log records are output to stable storage in the order in which they are created.
– Transaction Ti enters the commit state only when the log record
<Ti commit> has been output to stable storage.
– Before a block of data in main memory is output to the database, all log records
pertaining to data in that block must have been output to stable storage.
• This rule is called the write-ahead logging or WAL rule
– Strictly speaking WAL only requires undo information to be output
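A sketch of the WAL discipline in Python (names are assumptions): the log buffer is forced, in
creation order, before a commit completes and before any data block is output.

    class Buffer:
        def __init__(self):
            self.stable_log = []       # log records already on stable storage
            self.log_buffer = []       # log records still in main memory

        def log(self, record):
            self.log_buffer.append(record)

        def log_force(self):
            # output buffered records in the order in which they were created
            self.stable_log.extend(self.log_buffer)
            self.log_buffer.clear()

        def commit(self, txn):
            self.log((txn, "commit"))
            self.log_force()           # Ti commits only once <Ti commit> is stable

        def output_block(self, block, disk):
            self.log_force()           # WAL: log records first, then the data block
            disk[block["id"]] = dict(block)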
Database Buffering
• The database maintains an in-memory buffer of data blocks; when a new block is needed, an
existing block may have to be evicted and, if it has been updated, output to disk.
– Before writing a data item, transaction acquires exclusive lock on block containing the
data item
– Lock can be released once the write is completed.
• Such locks held for short duration are called latches.
– Before a block is output to disk, the system acquires an exclusive latch on the block
• Ensures no update can be in progress on the block
• Database buffer can be implemented either
– in an area of real main-memory reserved for the database, or
– in virtual memory
• Implementing buffer in reserved main-memory has drawbacks:
– Memory is partitioned before-hand between database buffer and applications, limiting
flexibility.
– Needs may change, and although operating system knows best how memory should be
divided up at any time, it cannot change the partitioning of memory.
• Database buffers are generally implemented in virtual memory in spite of some drawbacks:
– When operating system needs to evict a page that has been modified, the page is written
to swap space on disk.
– When database decides to write buffer page to disk, buffer page may be in swap space,
and may have to be read from swap space on disk and output to the database on disk,
resulting in extra I/O!
• Known as dual paging problem.
– Ideally when OS needs to evict a page from the buffer, it should pass control to database,
which in turn should
• Output the page to database instead of to swap space (making sure to output log
records first), if it is modified
• Release the page from the buffer, for the OS to use
Dual paging can thus be avoided, but common operating systems do not support such functionality.
Advanced Recovery: Key Features
• Support for high-concurrency locking techniques, such as those used for B+-tree concurrency
control, which release locks early
– Supports “logical undo”
• Recovery based on “repeating history”, whereby recovery executes exactly the same actions as
normal processing
– including redo of log records of incomplete transactions, followed by subsequent undo
– Key benefits
• supports logical undo
• easier to understand/show correctness
Advanced Recovery: Txn Rollback
Rollback of transaction Ti is done by scanning the log backwards:
1. If a log record <Ti, X, V1, V2> is found, perform the undo and log a special redo-only log
record <Ti, X, V1>.
2. If a <Ti, Oj, operation-end, U> record is found
• Rollback the operation logically using the undo information U.
– Updates performed during roll back are logged just like during normal
operation execution.
– At the end of the operation rollback, instead of logging an operation-
end record, generate a record
<Ti, Oj, operation-abort>.
• Skip all preceding log records for Ti until the record
<Ti, Oj operation-begin> is found
3. If a redo-only record is found, ignore it
4. If a <Ti, Oj, operation-abort> record is found:
• skip all preceding log records for Ti until the record
<Ti, Oj, operation-begin> is found.
5. Stop the scan when the record <Ti, start> is found
6. Add a <Ti, abort> record to the log
Some points to note:
• Cases 3 and 4 above can occur only if the database crashes while a transaction is being rolled
back.
• Skipping of log records as in case 4 is important to prevent multiple rollback of the same
operation.
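A sketch of this rollback in Python (record shapes follow the text; the logical undo information U is
modeled as a callable stored in the operation-end record, which is an assumption for illustration):

    def rollback(txn, log, db):
        i = len(log) - 1
        while i >= 0:
            r = log[i]
            if r[0] == txn:
                kind = r[1]
                if kind == "update":                    # case 1: <Ti, X, V1, V2>
                    _, _, x, v1, v2 = r
                    db[x] = v1
                    log.append((txn, "redo-only", x, v1))
                elif kind == "operation-end":           # case 2: logical undo
                    _, _, op, undo = r
                    undo(db)                            # undo updates are logged too
                    log.append((txn, "operation-abort", op))
                    while log[i] != (txn, "operation-begin", op):
                        i -= 1                          # skip the operation's records
                elif kind == "redo-only":
                    pass                                # case 3: ignore
                elif kind == "operation-abort":         # case 4: already rolled back
                    _, _, op = r
                    while log[i] != (txn, "operation-begin", op):
                        i -= 1                          # prevents repeated undo
                elif kind == "start":                   # case 5: stop the scan
                    break
            i -= 1
        log.append((txn, "abort"))                      # case 6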
• Advanced Recovery: Txn Rollback Example
• Example with a complete and an incomplete operation
ARIES
• ARIES is a state-of-the-art recovery method
– Incorporates numerous optimizations to reduce overheads during normal processing and
to speed up recovery
– The “advanced recovery algorithm” above is modeled after ARIES, but greatly
simplified by removing optimizations
ARIES Optimizations
• Physiological redo
– Affected page is physically identified, action within page can be logical
– Used to reduce logging overheads
• e.g. when a record is deleted and all other records have to be moved to
fill hole
» Physiological redo can log just the record deletion
» Physical redo would require logging of old and new values for
much of the page
– Requires page to be output to disk atomically
• Easy to achieve with hardware RAID, also supported by some disk
systems
• Incomplete page output can be detected by checksum techniques,
» But extra actions are required for recovery
» Treated as a media failure
ARIES Data Structures
• Log sequence number (LSN) identifies each log record
– LSNs must be sequentially increasing
• PageLSN, stored in each page, is the LSN of the last log record whose effects are reflected on
that page
• Each log record contains LSN of previous log record of the same transaction (PrevLSN)
– LSN in log record may be implicit
• Special redo-only log record called compensation log record (CLR) used to log actions taken
during recovery that never need to be undone
– Serves the role of operation-abort log records used in advanced recovery algorithm
– Has a field UndoNextLSN to note next (earlier) record to be undone
• Records in between would have already been undone
• Required to avoid repeated undo of already undone actions
• DirtyPageTable
– List of pages in the buffer that have been updated
– Contains, for each such page
• PageLSN of the page
• RecLSN, an LSN such that log records before this LSN have already been
applied to the page version on disk
– Set to current end of log when a page is inserted into dirty page table
(just before being updated)
– Recorded in checkpoints, helps to minimize redo work
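A sketch of this bookkeeping (class and method names are illustrative): each update assigns an LSN,
sets the PageLSN, and inserts the page into the DirtyPageTable with RecLSN = current end of log; the
redo test mirrors the skip conditions of the ARIES redo pass.

    class AriesState:
        def __init__(self):
            self.log = []                  # (lsn, txn, page, redo_info)
            self.dirty_page_table = {}     # page -> RecLSN
            self.page_lsn = {}             # page -> PageLSN (in-buffer copy)

        def update(self, txn, page, redo_info):
            lsn = len(self.log)                          # LSNs increase sequentially
            self.log.append((lsn, txn, page, redo_info))
            self.dirty_page_table.setdefault(page, lsn)  # RecLSN set on first dirtying
            self.page_lsn[page] = lsn                    # PageLSN stored in the page
            return lsn

        def redo_needed(self, lsn, page, disk_page_lsn):
            # redo pass skips a record if the page is not dirty, was dirtied only
            # after this record, or the disk version already reflects the record
            rec_lsn = self.dirty_page_table.get(page)
            return rec_lsn is not None and lsn >= rec_lsn and lsn > disk_page_lsn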
Remote Backup Systems
• Remote backup systems provide high availability by allowing transaction processing to continue
even if the primary site is destroyed.
Detection of failure: Backup site must detect when primary site has failed
– to distinguish primary site failure from link failure maintain several communication links
between the primary and the remote backup.
– Heart-beat messages
Transfer of control:
– To take over control, the backup site first performs recovery using its copy of the
database and all the log records it has received from the primary.
• Thus, completed transactions are redone and incomplete transactions are rolled
back.
– When the backup site takes over processing it becomes the new primary
– To transfer control back to old primary when it recovers, old primary must receive redo
logs from the old backup and apply all updates locally.
Time to recover: To reduce delay in takeover, backup site periodically processes the redo log records (in
effect, performing recovery from previous database state), performs a checkpoint, and can then delete
earlier parts of the log.
One-safe: commit as soon as transaction’s commit log record is written at primary
– Problem: updates may not arrive at the backup before it takes over, so committed
transactions may be lost.
Two-very-safe: commit when transaction’s commit log record is written at primary and backup
– Reduces availability since transactions cannot commit if either site fails.
Two-safe: proceed as in two-very-safe if both primary and backup are active. If only the primary is
active, the transaction commits as soon as its commit log record is written at the primary.
– Better availability than two-very-safe; avoids problem of lost transactions in one-safe.
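The three policies reduce to a small commit-point decision. A sketch (the boolean flags are
illustrative):

    def can_commit(policy, primary_logged, backup_logged, backup_alive):
        # returns True when the transaction may be declared committed
        if policy == "one-safe":
            return primary_logged
        if policy == "two-very-safe":
            return primary_logged and backup_logged
        if policy == "two-safe":
            if backup_alive:
                return primary_logged and backup_logged
            return primary_logged
        raise ValueError(policy)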