
UNIT III

TRANSACTIONS
Transaction Concepts – ACID Properties –
Schedules – Serializability – Concurrency Control
– Need for Concurrency – Locking Protocols –
Two Phase Locking – Deadlock – Transaction
Recovery.
TRANSACTIONS
o A transaction is a set of logically related operations; it contains a group of tasks.
o A transaction is an action, or a series of actions, performed by a single user to
access the contents of the database.
Example: Suppose an employee of a bank transfers Rs 800 from X's account to Y's
account. This small transaction consists of several low-level tasks:
X's Account: Open_Account(X); Old_Balance = X.balance;
New_Balance = Old_Balance - 800; X.balance = New_Balance;
Close_Account(X)
Y's Account: Open_Account(Y); Old_Balance = Y.balance;
New_Balance = Old_Balance + 800; Y.balance = New_Balance;
Close_Account(Y)
Operations of Transaction:
Following are the main operations of transaction:
Read(X): The read operation is used to read the value of X from the database and store it in a buffer in
main memory.
Write(X): The write operation is used to write the value back to the database from the buffer.
Let's take the example of a debit transaction on an account, which consists of the following
operations:
1. R(X);
2. X = X - 500;
3. W(X);
Let's assume the value of X before the transaction starts is 4000.
o The first operation reads X's value from the database and stores it in a buffer.
o The second operation decreases the value of X by 500, so the buffer will contain 3500.
o The third operation writes the buffer's value to the database, so X's final value will be 3500.
For example: if the debit transaction above fails after executing operation 2, then
X's value will remain 4000 in the database, which is not acceptable to the bank.
To solve this problem, we have two important operations:
• Commit: It is used to save the work done permanently.
• Rollback: It is used to undo the work done.
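As a rough illustration of how Read, Write, Commit and Rollback fit together, here is a minimal Python sketch of the debit transaction above. It models the database and the buffer as plain dictionaries; the names database, buffer, read_item, write_item, commit and rollback are invented for this illustration and are not DBMS APIs.

# Minimal sketch (illustrative only): debit 500 from X using a buffered
# read/write with explicit commit and rollback.
database = {"X": 4000}               # the "permanent" store
buffer = {}                          # per-transaction working copy

def read_item(name):                 # R(X): copy the value into the buffer
    buffer[name] = database[name]

def write_item(name, value):         # W(X): the change stays in the buffer
    buffer[name] = value

def commit():                        # make the buffered changes permanent
    database.update(buffer)
    buffer.clear()

def rollback():                      # undo: simply discard the buffer
    buffer.clear()

read_item("X")                          # 1. R(X)            -> buffer holds 4000
write_item("X", buffer["X"] - 500)      # 2-3. X = X - 500; W(X) -> buffer holds 3500
commit()                                # without this, a failure leaves X at 4000
print(database["X"])                    # 3500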
ACID PROPERTIES
A transaction has four properties, which are used to maintain consistency in the
database before and after the transaction.
Properties of a Transaction
• Atomicity
• Consistency
• Isolation
• Durability
Atomicity
o It states that all operations of the transaction take place at once; if not, the transaction is
aborted.
o There is no midway, i.e., the transaction cannot occur partially. Each transaction is
treated as one unit and either runs to completion or is not executed at all.
Atomicity involves the following two operations:
Abort: If a transaction aborts then all the changes made are not visible.
Commit: If a transaction commits then all the changes made are visible.
Consistency
o The integrity constraints are maintained so that the database is consistent before and after the
transaction.
o The execution of a transaction will leave a database in either its prior stable state or a new stable
state.
o The consistent property of database states that every transaction sees a consistent database
instance.
o The transaction is used to transform the database from one consistent state to another consistent
state.
Isolation
o It shows that the data which is used at the time of execution of a transaction cannot be used
by the second transaction until the first one is completed.
o In isolation, if the transaction T1 is being executed and using the data item X, then that data
item can't be accessed by any other transaction T2 until the transaction T1 ends.
o The concurrency control subsystem of the DBMS enforces the isolation property.
Durability
o The durability property ensures the permanence of the database's consistent state:
it states that the changes made by a committed transaction are permanent.
o They cannot be lost by the erroneous operation of a faulty transaction or by a system
failure. When a transaction is completed, the database reaches a state known as the
consistent state. That consistent state cannot be lost, even in the event of a system failure.
o The recovery subsystem of the DBMS is responsible for the durability property.

TRANSACTION STATE:
A transaction must be in one of the following states:
1. Active, the initial state; the transaction stays in this state while it is executing.
2. Partially committed, after the final statement has been executed.
3. Failed, after the discovery that normal execution can no longer proceed.
4. Aborted, after the transaction has been rolled back and the database has been restored to its state
prior to the start of the transaction.
5. Committed, after successful completion.
Figure: the state diagram corresponding to a transaction.
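The state diagram above can be summarised as a small transition table. The sketch below is only an illustration of the allowed state changes; the state names follow the list above and the move helper is invented for this example.

# Allowed transitions in the transaction state diagram (illustrative).
TRANSITIONS = {
    "active":              {"partially committed", "failed"},
    "partially committed": {"committed", "failed"},
    "failed":              {"aborted"},
    "committed":           set(),
    "aborted":             set(),
}

def move(state, new_state):
    if new_state not in TRANSITIONS[state]:
        raise ValueError(f"illegal transition: {state} -> {new_state}")
    return new_state

s = "active"
s = move(s, "partially committed")   # final statement executed
s = move(s, "committed")             # successful completion
print(s)                             # committed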
SCHEDULES
 Transaction-processing systems usually allow multiple transactions to run
concurrently.
 Allowing multiple transactions to update data concurrently causes several
complications with consistency of the data.
 There are two good reasons for allowing concurrency:
Improved throughput and resource utilization. A transaction consists of many
steps. Some involve I/O activity; others involve CPU activity. Therefore, I/O
activity can be done in parallel with processing at the CPU. All of this increases the
throughput of the system; correspondingly, the processor and disk utilization also
increase.
Reduced waiting time. There may be a mix of transactions running on a system,
some short and some long. Concurrent execution reduces the unpredictable delays
in running transactions. Moreover, it also reduces the average response time: the
average time for a transaction to be completed after it has been submitted.
Schedule 1—a serial schedule in which T1 is followed by T2.
Schedule 2—a serial schedule in which T2 is followed by T1
Schedule 3—a concurrent schedule equivalent to schedule 1.
Schedule 4—a concurrent schedule.
SERIALIZABILITY
• The database system must control concurrent execution of transactions, to ensure that the database state
remains consistent.
• Since transactions are programs, it is computationally difficult to determine exactly what operations a
transaction performs and how operations of various transactions interact.
Two operations:
Read and write. We thus assume that, between a read(Q) instruction and a write(Q) instruction on a data item
Q, a transaction may perform an arbitrary sequence of operations on the copy of Q that is residing in the local
buffer of the transaction.
• Thus, the only significant operations of a transaction, from a scheduling point of view, are its read and write
instructions. For example, a schedule of two transactions on data item A:
T1: read(A), write(A)
T2: read(A), write(A)
1. Conflict Serializability
• Let us consider a schedule S in which there are two consecutive instructions Ii and Ij, of transactions Ti and Tj,
respectively (i ≠ j). If Ii and Ij refer to different data items, then we can swap Ii and Ij without affecting the
results of any instruction in the schedule.
2. View Serializability
• Consider two schedules S and S′, where the same set of transactions participates
in both schedules. The schedules S and S′ are said to be view equivalent if three
conditions are met:
1. For each data item Q, if transaction Ti reads the initial value of Q in schedule S,
then transaction Ti must, in schedule S′, also read the initial value of Q.
2. For each data item Q, if transaction Ti executes read(Q) in schedule S, and if that
value was produced by a write(Q) operation executed by transaction Tj, then the
read(Q) operation of transaction Ti must, in schedule S′, also read the value of Q
that was produced by the same write(Q) operation of transaction Tj.
3. For each data item Q, the transaction (if any) that performs the final write(Q)
operation in schedule S must perform the final write(Q) operation in schedule S′.
TESTING FOR SERIALIZABILITY
• When designing concurrency control schemes, we must show that schedules generated by the scheme
are serializable. To do that, we must first understand how to determine, given a particular schedule S,
whether the schedule is serializable.
• We now present a simple and efficient method for determining conflict serializability of a schedule.
• Consider a schedule S. We construct a directed graph, called a precedence graph, from S.
• This graph consists of a pair G = (V, E), where V is a set of vertices and E is a set of edges. The set of
vertices consists of all the transactions participating in the schedule.
• The set of edges consists of all edges Ti →Tj for which one of three conditions holds:
1. Ti executes write(Q) before Tj executes read(Q).
2. Ti executes read(Q) before Tj executes write(Q).
3. Ti executes write(Q) before Tj executes write(Q).
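The three edge rules above can be turned into a small test. A sketch, assuming a schedule is given as a list of (transaction, operation, data item) steps (an invented representation): build the precedence graph and check it for a cycle; the schedule is conflict serializable exactly when the graph is acyclic.

# Sketch: build a precedence graph from a schedule and test it for cycles.
from collections import defaultdict

def precedence_graph(schedule):
    edges = defaultdict(set)
    for i, (ti, op_i, q_i) in enumerate(schedule):
        for tj, op_j, q_j in schedule[i + 1:]:
            if ti != tj and q_i == q_j and "write" in (op_i, op_j):
                edges[ti].add(tj)        # Ti -> Tj for W/R, R/W or W/W on Q
    return edges

def has_cycle(edges):
    WHITE, GREY, BLACK = 0, 1, 2
    colour = defaultdict(int)
    def dfs(u):
        colour[u] = GREY
        for v in edges[u]:
            if colour[v] == GREY or (colour[v] == WHITE and dfs(v)):
                return True
        colour[u] = BLACK
        return False
    return any(colour[u] == WHITE and dfs(u) for u in list(edges))

# Schedule-3-style interleaving of T1 and T2 on items A and B:
s = [("T1", "read", "A"), ("T1", "write", "A"),
     ("T2", "read", "A"), ("T2", "write", "A"),
     ("T1", "read", "B"), ("T1", "write", "B"),
     ("T2", "read", "B"), ("T2", "write", "B")]
print(has_cycle(precedence_graph(s)))    # False -> conflict serializable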
CONCURRENCY CONTROL
o Under concurrency control, multiple transactions can be executed simultaneously.
o Concurrent execution may affect the transaction results, so it is highly important to control the order of
execution of those transactions.
Problems of concurrency control
• Several problems can occur when concurrent transactions are executed in an uncontrolled manner.
Following are the three problems in concurrency control.
1. Lost updates
2. Dirty read
3. Unrepeatable read
1. Lost update problem
o When two transactions that access the same database items contain their operations in a way
that makes the value of some database item incorrect, then the lost update problem occurs.
o If two transactions T1 and T2 both read a record and then update it, the effect of the first update
will be overwritten by the second update, as the sketch below shows.
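A toy illustration of the lost update, with plain Python variables standing in for the database record and the transactions' local copies (the account dictionary and the interleaving are invented for this example):

# Sketch of the lost-update problem: two interleaved "transactions" each read
# the same balance, update a private copy, and write it back.
account = {"balance": 1000}

# T1 and T2 both read before either writes:
t1_copy = account["balance"]          # T1: read(X) -> 1000
t2_copy = account["balance"]          # T2: read(X) -> 1000

t1_copy -= 100                        # T1 debits 100
t2_copy += 200                        # T2 credits 200

account["balance"] = t1_copy          # T1: write(X) -> 900
account["balance"] = t2_copy          # T2: write(X) -> 1200, T1's update is lost

print(account["balance"])             # 1200 instead of the correct 1100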
2. Dirty Read
o The dirty read occurs in the case when one transaction updates an item of the database, and then the
transaction fails for some reason. The updated database item is accessed by another transaction before
it is changed back to the original value.
o A transaction T1 updates a record which is read by T2. If T1 aborts then T2 now has values which have
never formed part of the stable database.
3. Inconsistent Retrievals Problem
o Inconsistent Retrievals Problem is also known as unrepeatable read. When a transaction calculates
some summary function over a set of data while the other transactions are updating the data, then the
Inconsistent Retrievals Problem occurs.
o A transaction T1 reads a record and then does some other processing during which the transaction T2
updates the record. Now when the transaction T1 reads the record, then the new value will be
inconsistent with the previous value.
o Transaction-X is computing the sum of all balances while transaction-Y is transferring an amount of 50 from
Account-1 to Account-3.
o Here, transaction-X produces the result 550, which is incorrect. If we write this result to the
database, the database will be in an inconsistent state, because the actual sum is 600.
o Here, transaction-X has seen an inconsistent state of the database.
THE NEED FOR CONCURRENCY
CONTROL
A key purpose in developing a database is to facilitate multiple users to access shared
data in parallel (i.e., at the same time). Concurrent accessing of data is comparatively
easy when all users are only reading data, as there is no way for them to interfere
with one another. However, when multiple users are accessing the database at the same
time, and at least one is updating data, there may be the case of interference which can
result in data inconsistencies.
• Concurrency control technique implements some protocols which can be broadly
classified into two categories. These are:
1. Lock-based protocols: Database systems that use lock-based protocols employ a mechanism by which a
transaction cannot read or write data until it acquires a suitable lock on it.
2. Timestamp-based protocols: The most frequently used concurrency protocol is the timestamp-based
protocol. This protocol uses either system time or a logical counter as a timestamp.
LOCK-BASED PROTOCOLS
 One way to ensure serializability is to require that data items be accessed in a
mutually exclusive manner: while one transaction is accessing a data item, no
other transaction can modify it.
 Lock is the most common mechanism to implement this requirement.

Lock:
 Mechanism to control concurrent access to a data item.
 Lock requests are made to concurrency-control manager.
 Transaction can proceed only after request is granted. Data items can be locked
in two modes:
1. exclusive mode (X): Data item can be both read as well as written. X-lock is
requested using lock-X(A) instruction.
2. shared mode (S): Data item can only be read. S-lock is requested using lock-
S(A) instruction. Locks can be released: U-lock(A)
Locking protocol:
• A set of rules followed by all transactions while requesting and releasing locks.
Locking protocols restrict the set of possible schedules. Ensure serializable schedules by
delaying transactions that might violate serializability.
• The lock-compatibility matrix tells whether two locks are compatible or not. Any number of
transactions can hold shared locks on a data item; if any transaction holds an exclusive
lock on a data item, no other transaction may hold any lock on that item (see the sketch below).
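The matrix can be written down directly. In the sketch below, the can_grant helper (an invented name) grants a requested mode only if it is compatible with every lock already held on the item:

# Sketch of the lock-compatibility matrix: S is compatible only with S;
# X is compatible with nothing.
COMPATIBLE = {
    ("S", "S"): True,
    ("S", "X"): False,
    ("X", "S"): False,
    ("X", "X"): False,
}

def can_grant(requested, held_modes):
    """Grant `requested` only if compatible with every lock already held."""
    return all(COMPATIBLE[(held, requested)] for held in held_modes)

print(can_grant("S", ["S", "S"]))   # True  - any number of shared locks
print(can_grant("X", ["S"]))        # False - must wait for the S lock
print(can_grant("S", ["X"]))        # False - must wait for the X lock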
Locking Rules/Protocol:
 A transaction may be granted a lock on an item if the requested lock is compatible with
locks already held on the item by other transactions.
 If a lock cannot be granted, the requesting transaction is made to wait till all
incompatible locks held by other transactions have been released. The lock is then
granted.
Pitfalls of Lock-Based Protocols:
 Too early unlocking can lead to non-serializable schedules. Too late unlocking can lead
to deadlocks.
Example: Transaction T1 transfers $50 from account B to account A. Transaction T2
displays the total amount of money in accounts A and B, that is, the sum A + B.
• Early unlocking can cause incorrect results and non-serializable schedules: if A and B
get updated in between T2's reads of A and B, the displayed sum will be wrong.
e.g., A = $100, B = $200: display A + B shows $250 instead of $300.
Schedule with early unlocking (steps 1-5 and 13-17 are T1, steps 6-12 are T2):
1. T1: X-lock(B)
2. T1: read B
3. T1: B := B - 50
4. T1: write B
5. T1: U-lock(B)
6. T2: S-lock(A)
7. T2: read A
8. T2: U-lock(A)
9. T2: S-lock(B)
10. T2: read B
11. T2: U-lock(B)
12. T2: display A + B
13. T1: X-lock(A)
14. T1: read A
15. T1: A := A + 50
16. T1: write A
17. T1: U-lock(A)
• Late unlocking can cause deadlocks. In the schedule below, neither T1 nor T2 can make
progress: executing S-lock(B) causes T2 to wait for T1 to release its lock on B, and
executing X-lock(A) causes T1 to wait for T2 to release its lock on A. To
handle the deadlock, one of T1 or T2 must be rolled back and its locks
released.
Schedule with late unlocking (steps 1-4 and 8 are T1, steps 5-7 are T2):
1. T1: X-lock(B)
2. T1: read(B)
3. T1: B := B - 50
4. T1: write(B)
5. T2: S-lock(A)
6. T2: read(A)
7. T2: S-lock(B)
8. T1: X-lock(A)
Two-Phase Locking Protocol:
A locking protocol that ensures conflict-serializable schedules. It works in two phases:
o Phase 1 (Growing Phase): the transaction may obtain locks but may not release locks.
o Phase 2 (Shrinking Phase): the transaction may release locks but may not obtain locks.
 When the first lock is released, the transaction moves from phase 1 to phase 2.
 Properties of the Two-Phase Locking Protocol: it ensures serializability; it can be shown that the
transactions can be serialized in the order of their lock points (i.e., the point where a transaction
acquired its final lock).
 It does not ensure freedom from deadlocks, and cascading roll-back is possible.
Modifications of the two-phase locking protocol:
Strict two-phase locking
* A transaction must hold all its exclusive locks till it commits/aborts.
* Avoids cascading roll-back.
Rigorous two-phase locking
 All locks are held till commit/abort. Transactions can be serialized in the order in which they
commit.
 Two-phase locking with lock conversions refines the basic protocol:
• Phase 1: can acquire a lock-S on an item, can acquire a lock-X on an item, can convert a
lock-S to a lock-X (upgrade).
• Phase 2: can release a lock-S, can release a lock-X, can convert a lock-X to a
lock-S (downgrade).
* Ensures serializability, but still relies on the programmer to insert the various
locking instructions.
• Strict and rigorous two-phase locking (with lock conversions) are used
extensively in DBMSs.
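A minimal sketch of the two-phase discipline, assuming a simple transaction object that tracks its own locks; the TwoPhaseTransaction class and its methods are invented for illustration, and a real lock manager would also block conflicting requests from other transactions.

# Sketch: a transaction that enforces the two-phase rule on itself.
class TwoPhaseTransaction:
    def __init__(self, name):
        self.name = name
        self.held = set()            # (item, mode) pairs currently held
        self.shrinking = False       # True once the first lock is released

    def lock(self, item, mode):
        if self.shrinking:
            raise RuntimeError("growing phase is over: cannot acquire new locks")
        self.held.add((item, mode))

    def unlock(self, item, mode):
        self.shrinking = True        # the lock point has passed
        self.held.discard((item, mode))

    def commit(self):
        # Strict/rigorous 2PL: hold locks until commit, then release them all.
        for item, mode in list(self.held):
            self.unlock(item, mode)

t = TwoPhaseTransaction("T1")
t.lock("B", "X")
t.unlock("B", "X")                   # shrinking phase begins
try:
    t.lock("A", "X")                 # would violate two-phase locking
except RuntimeError as err:
    print(err)                       # growing phase is over: cannot acquire new locks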
Implementation of Locking:
 A lock manager can be implemented as a separate process to which transactions
send lock and unlock requests.
 The lock manager replies to a lock request by sending a lock grant message (or a
message asking the transaction to roll back, in case of a deadlock).
 The requesting transaction waits until its request is answered.
 The lock manager maintains a data structure called a lock table to record granted
locks and pending requests.
Lock table:
 Implemented as an in-memory hash table indexed on the data item being locked.
 Black rectangles indicate granted locks and white rectangles indicate waiting
requests; the table also records the type of lock granted/requested.
 Processing of requests:
 A new request is added to the end of the queue of requests for the data item,
and granted if it is compatible with all earlier locks.
 Unlock requests result in the request being deleted, and later requests are
checked to see if they can now be granted.
 If a transaction aborts, all waiting or granted requests of the transaction are
deleted. An index on the transaction is used to implement this efficiently.
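A rough sketch of such a lock table, assuming only S and X modes and an invented request/release interface; it keeps a FIFO queue of (transaction, mode, granted) entries per data item, and a real lock manager would additionally handle upgrades, timeouts and deadlock handling.

# Sketch of a lock table: item -> queue of granted and waiting requests.
from collections import defaultdict, deque

COMPATIBLE = {("S", "S")}            # only S/S pairs are compatible

lock_table = defaultdict(deque)      # item -> deque of [txn, mode, granted?]

def request(txn, item, mode):
    queue = lock_table[item]
    ok_with_granted = all((entry[1], mode) in COMPATIBLE
                          for entry in queue if entry[2])
    no_earlier_waiter = not any(not e[2] for e in queue)
    entry = [txn, mode, ok_with_granted and no_earlier_waiter]
    queue.append(entry)
    return entry[2]                  # True if granted, False if it must wait

def release(txn, item):
    old = lock_table[item]
    lock_table[item] = deque(e for e in old if e[0] != txn)
    for e in lock_table[item]:       # try to grant the earliest waiters
        if not e[2]:
            e[2] = all((g[1], e[1]) in COMPATIBLE
                       for g in lock_table[item] if g[2])
            if not e[2]:
                break                # keep FIFO order: stop at first blocked

print(request("T1", "A", "S"))       # True
print(request("T2", "A", "X"))       # False - waits behind T1's S lock
release("T1", "A")                   # T2's request can now be granted
print(lock_table["A"][0])            # ['T2', 'X', True]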


TIMESTAMP BASED PROTOCOL
 Timestamp based protocol is the locking protocols and the order between every pair of
conflicting transactions at execution time by the first that both members of the pairs request that
involves incompatible modes.
 Another method for determining the serializability order is to select an ordering among
transaction in advance. The most common method for doing so is to use a timestamp ordering
scheme.
Timestamps:
 With each transaction Ti in the system, we associate a unique timestamp, denoted by TS(Ti). This
timestamp is assigned by the database system before the transaction Ti starts execution.
 If a transaction Ti has been assigned timestamp TS(Ti) and a new transaction Tj enters the system,
then TS(Ti) < TS(Tj). There are two simple methods for implementing this scheme:
1. Use the value of the system clock as the timestamp; i.e., a transaction's timestamp is equal to the
value of the clock when the transaction enters the system.
2. Use a logical counter that is incremented after a new timestamp has been assigned; a transaction's
timestamp is equal to the value of the counter when the transaction enters the system.
To implement this scheme we associate with each data item Q two timestamp values.
 W-timestamp(Q) denotes the largest timestamp of any transaction that executed write(Q)
successfully.
 R-timestamp(Q) denotes the largest timestamp of any transaction that executed read(Q)
successfully.
Thomas’ Write Rule:
 Let us consider schedule 4 and apply the timestamp-ordering protocol. Since T16 starts before T17, we
shall assume that TS(T16) < TS(T17).
 The read(Q) operation of T16 succeeds, as does the write(Q) operation of T17. When T16 attempts its
write(Q) operation, we find that TS(T16) < W-timestamp(Q), since W-timestamp(Q) = TS(T17).
 Thus, the write(Q) by T16 is rejected and transaction T16 must be rolled back.
 Although the rollback of T16 is required by the timestamp-ordering protocol, it is unnecessary. Since
T17 has already written Q, the value that T16 is attempting to write is one that will never need to be
read: any transaction Ti with TS(Ti) < TS(T17) that attempts a read(Q) will be rolled back, since
TS(Ti) < W-timestamp(Q).
• The modification to the timestamp-ordering protocol, called Thomas’ write rule, is this. Suppose that
transaction Ti issues write(Q):
1. If TS(Ti) < R-timestamp(Q), then the value of Q that Ti is producing was previously needed, and it had been
assumed that the value would never be produced. Hence, the system rejects the write operation and rolls Ti back.
2. If TS(Ti) < W-timestamp(Q), then Ti is attempting to write an obsolete value of Q. Hence, this write operation can
be ignored.
3. Otherwise, the system executes the write operation and sets W-timestamp(Q) to TS(Ti).
• Under Thomas’ write rule, the write(Q) operation of T16 would be ignored. The result is a schedule
that is view equivalent to the serial schedule <T16, T17>.
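The write(Q) check, including Thomas’ write rule, can be sketched as one small function; the return values "rollback", "ignore" and "write" are symbolic labels for this illustration, and r_ts and w_ts stand for R-timestamp(Q) and W-timestamp(Q).

# Sketch of the timestamp checks for write(Q) with Thomas' write rule.
def write_q(ts_ti, r_ts, w_ts):
    if ts_ti < r_ts:
        return "rollback", w_ts       # a later transaction already read Q
    if ts_ti < w_ts:
        return "ignore", w_ts         # Thomas' rule: obsolete write, skip it
    return "write", ts_ti             # perform the write, advance W-timestamp(Q)

# T16 writes Q after it has read Q (R-ts = 16) and after T17 has written Q:
print(write_q(ts_ti=16, r_ts=16, w_ts=17))   # ('ignore', 17) - not rolled back
print(write_q(ts_ti=18, r_ts=16, w_ts=17))   # ('write', 18)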
DEADLOCK
• A deadlock is a condition where two or more transactions are waiting indefinitely for one another
to give up locks. Deadlock is one of the most feared complications in a DBMS, as no task
ever gets finished and every waiting transaction stays blocked forever.
For example: In the student table, transaction T1 holds a lock on some rows and needs to update
some rows in the grade table. Simultaneously, transaction T2 holds locks on some rows in the grade
table and needs to update the rows in the Student table held by Transaction T1.
• Now the main problem arises: transaction T1 is waiting for T2 to release its lock and,
similarly, transaction T2 is waiting for T1 to release its lock. All activity comes to a halt and
remains at a standstill until the DBMS detects the deadlock and aborts
one of the transactions.
Deadlock Prevention
• To prevent any deadlock situation in the system, the DBMS aggressively inspects the operations that
transactions are about to execute. The DBMS analyzes whether the operations can create a deadlock
situation. If it finds that a deadlock situation might occur, that transaction is never allowed to execute.
• There are deadlock prevention schemes that use timestamp ordering mechanism of transactions in order to
predetermine a deadlock situation.
Wait-Die Scheme
• In this scheme, if a transaction requests to lock a resource (data item) which is already held with a conflicting
lock by another transaction, then one of two possibilities may occur −
 If TS(Ti) < TS(Tj) − that is, Ti, which is requesting a conflicting lock, is older than Tj − then Ti is allowed to wait
until the data item is available.
 If TS(Ti) > TS(Tj) − that is, Ti is younger than Tj − then Ti dies. Ti is restarted later with a random delay but with the
same timestamp.
• This scheme allows the older transaction to wait but kills the younger one.
Wound-Wait Scheme
• In this scheme, if a transaction requests to lock a resource (data item) which is already held with a conflicting
lock by another transaction, one of two possibilities may occur −
• If TS(Ti) < TS(Tj), then Ti forces Tj to be rolled back − that is, Ti wounds Tj. Tj is restarted later with a random
delay but with the same timestamp.
• If TS(Ti) > TS(Tj), then Ti is forced to wait until the resource is available.
• This scheme allows the younger transaction to wait; but when an older transaction requests an item held by a
younger one, the older transaction forces the younger one to abort and release the item.
• In both the cases, the transaction that enters the system at a later stage is aborted.
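Both schemes reduce to a simple timestamp comparison when transaction Ti requests a lock held by Tj. The sketch below uses symbolic return values ("wait", "die", "wound Tj") purely for illustration.

# Sketch of the two timestamp-based prevention schemes.
# Lower timestamp = older transaction.
def wait_die(ts_i, ts_j):
    return "wait" if ts_i < ts_j else "die"       # older waits, younger dies

def wound_wait(ts_i, ts_j):
    return "wound Tj" if ts_i < ts_j else "wait"  # older wounds, younger waits

print(wait_die(ts_i=5, ts_j=9))     # 'wait'     - Ti is older, allowed to wait
print(wait_die(ts_i=9, ts_j=5))     # 'die'      - Ti is younger, restarted later
print(wound_wait(ts_i=5, ts_j=9))   # 'wound Tj' - Ti forces younger Tj to abort
print(wound_wait(ts_i=9, ts_j=5))   # 'wait'     - younger Ti waits for older Tj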
Deadlock Avoidance
• Aborting a transaction is not always a practical approach. Instead, deadlock avoidance
mechanisms can be used to detect any deadlock situation in advance. Methods like the "wait-for
graph" are available, but they are suitable only for systems where transactions are lightweight
and hold fewer instances of each resource. In a bulky system, deadlock prevention
techniques may work well.
Wait-for Graph
• This is a simple method available to track if any deadlock situation may arise. For each
transaction entering into the system, a node is created. When a transaction Ti requests for a lock
on an item, say X, which is held by some other transaction Tj, a directed edge is created from Ti
to Tj. If Tj releases item X, the edge between them is dropped and Ti locks the data item.
• The system maintains this wait-for graph for every transaction waiting for some data items held
by others. The system keeps checking if there's any cycle in the graph.
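Deadlock detection on the wait-for graph is an ordinary cycle check. A minimal sketch, assuming the graph is kept as a dictionary mapping each transaction to the list of transactions it is waiting for:

# Sketch: detect a deadlock as a cycle in the wait-for graph.
def find_cycle(wait_for):
    visiting, done = set(), set()
    def dfs(node):
        visiting.add(node)
        for nxt in wait_for.get(node, ()):
            if nxt in visiting:
                return True                      # back edge -> deadlock
            if nxt not in done and dfs(nxt):
                return True
        visiting.discard(node)
        done.add(node)
        return False
    return any(dfs(n) for n in wait_for if n not in done)

# T1 waits for T2 (which holds X) and T2 waits for T1 (which holds another item):
print(find_cycle({"T1": ["T2"], "T2": ["T1"]}))   # True  -> deadlock
print(find_cycle({"T1": ["T2"], "T2": []}))       # False -> no deadlock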
TRANSACTION RECOVERY
Recovery system
• An integral part of a database system is a recovery scheme that can restore the database to the
consistent state that existed before the failure. The recovery scheme must also provide high availability;
that is, it must minimize the time for which the database is not usable after a failure. Recovering a system
from a failure or crash is called crash recovery.
FAILURE CLASSIFICATION
• Various types of failure may occur in a system:
1. Transaction failure
a. Logical errors - the transaction cannot complete due to some internal error condition.
b. System errors - the database system must terminate an active transaction due to an error condition (e.g.,
deadlock).
2. System crash
• A power failure or other hardware or software failure causes the system to crash.
• Fail-stop assumption: non-volatile storage contents are assumed not to be corrupted by a system crash.
Database systems have numerous integrity checks to prevent corruption of disk data.
3. Disk failure
• A head crash or similar disk failure destroys all or part of the disk. Destruction is assumed to be detectable:
disk drives use checksums to detect failures.
RECOVERY AND ATOMICITY
 Modifying the database without ensuring that the transaction will commit
may leave the database in an inconsistent state.
 Consider a transaction Ti that transfers $50 from account A to account B; the goal
is either to perform all database modifications made by Ti or none at all.
 Several output operations may be required for Ti (to output A and B). A
failure may occur after one of these modifications has been made but before
all of them are made.
• To ensure atomicity despite failures, we first output information describing
the modifications to stable storage, without modifying the database itself.
Two approaches:
1. Log-based recovery
2. Shadow paging
• Assume that transactions run serially, that is, one after the other.
LOG BASED RECOVERY
 A log is kept on stable storage.
 The log is a sequence of log records, and maintains a record of update activities
on the database.
 When transaction Ti starts, it registers itself by writing a <Ti start> log record.
 Before Ti executes write(X), a log record <Ti, X, V1, V2> is written,
where V1 is the value of X before the write, and V2 is the value to be written to X.
 The log record notes that Ti has performed a write on data item X; X had value V1
before the write, and will have value V2 after the write.
 When Ti finishes its last statement, the log record <Ti commit> is written to the log.
 We assume for now that log records are written directly to stable storage (that is,
they are not buffered).
 Two approaches using logs:
 Deferred database modification
 Immediate database modification
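A toy sketch of the log records produced for the A-to-B transfer under immediate database modification; the tuple layout of the records and the log_record helper are invented for this illustration.

# Sketch: log records for transferring $50 from A to B.
log = []                                   # the log, newest record last
db = {"A": 1000, "B": 2000}

def log_record(*fields):                   # append one log record
    log.append(fields)

log_record("Ti", "start")                                    # <Ti start>
log_record("Ti", "A", db["A"], db["A"] - 50); db["A"] -= 50  # <Ti, A, 1000, 950>
log_record("Ti", "B", db["B"], db["B"] + 50); db["B"] += 50  # <Ti, B, 2000, 2050>
log_record("Ti", "commit")                                   # <Ti commit>

# Recovery rule of thumb: redo transactions whose <commit> record appears in
# the log; undo (restore the old values of) those with <start> but no <commit>.
for record in log:
    print(record)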


SHADOW PAGING
 Shadow paging is an alternative to log-based recovery.
 It is useful if transactions execute serially.
 It requires fewer disk accesses compared to log-based methods. The database is partitioned into
some number of fixed-length blocks, which are referred to as pages.
 A page table is used to locate the ith page of the database for a given i.
The scheme maintains two page tables during the lifetime of a transaction:
o the current page table, and
o the shadow page table.
 The shadow page table is stored in non-volatile storage, so that the state of the database prior to
transaction execution can be recovered.
 The shadow page table is never modified during execution. To start with, both page tables are
identical. Only the current page table is used for data-item accesses during execution of the
transaction.
Whenever any page is about to be written for the first time:
 A copy of this page is made onto an unused page.
 The current page table is then made to point to the copy.
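A small sketch of this copy-on-write step, using dictionaries for the pages and for the two page tables; all names (pages, shadow_table, current_table, write_page) are invented for illustration.

# Sketch of shadow paging's copy-on-write: on the first write to a page,
# copy it to a free page and repoint only the current page table.
pages = {0: "p0-data", 1: "p1-data", 2: "p2-data"}   # fixed-length "pages"
shadow_table = {0: 0, 1: 1, 2: 2}                    # never modified
current_table = dict(shadow_table)                   # used by the transaction
next_free = 3

def write_page(i, new_data):
    global next_free
    if current_table[i] == shadow_table[i]:          # first write to page i
        pages[next_free] = pages[shadow_table[i]]    # copy onto an unused page
        current_table[i] = next_free                 # repoint the current table only
        next_free += 1
    pages[current_table[i]] = new_data

write_page(1, "p1-updated")
print(shadow_table[1], current_table[1])   # 1 3 : shadow still sees the old page
print(pages[1], pages[3])                  # p1-data p1-updated
# Commit = atomically install current_table as the new shadow table;
# abort  = simply discard current_table.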
• Advantages of shadow-paging over log-based schemes
 no overhead of writing log records
 recovery is trivial
• Disadvantages:
 Copying the entire page table is very expensive
 Can be reduced by using a page table structured like a B+-tree

 No need to copy entire tree, only need to copy paths in the tree that lead to updated leaf
nodes
SAVEPOINTS
• When a system crash occurs, we must consult the log to determine those transactions that need to
be redone and those that need to be undone. In principle, we need to search the entire log to
determine this information. There are two major difficulties with this approach:
1. The search process is time-consuming.
2. Most of the transactions that, according to our algorithm, need to be redone have already written
their updates into the database. Although redoing them will cause no harm, it will nevertheless
cause recovery to take longer.
• To reduce these types of overhead, we introduce savepoints. The savepoint operation:
(a) does not permit any updates to be performed while the savepoint operation is in progress, and
(b) outputs all modified buffer blocks to disk when the savepoint is performed.
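A toy sketch of how a savepoint record limits the log scan during recovery. It assumes, for simplicity, that no transaction was active when the savepoint was taken, and it reuses the record layout of the earlier log-based-recovery sketch; the log contents are invented for illustration.

# Sketch: scan the log backwards only as far as the last savepoint record.
log = [
    ("T1", "start"), ("T1", "A", 100, 90), ("T1", "commit"),
    ("savepoint",),
    ("T2", "start"), ("T2", "B", 200, 210), ("T2", "commit"),
    ("T3", "start"), ("T3", "C", 300, 310),
    # crash here
]

last_sp = max(i for i, r in enumerate(log) if r[0] == "savepoint")
redo, undo = set(), set()
for record in log[last_sp + 1:]:
    if record[1:] == ("start",):
        undo.add(record[0])
    elif record[1:] == ("commit",):
        undo.discard(record[0]); redo.add(record[0])

print(redo)   # {'T2'} - committed after the savepoint, replay its updates
print(undo)   # {'T3'} - no commit record, restore its old values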
ISOLATION LEVELS
• The SQL standard defines the different isolation levels in terms of three phenomena that violate
serializability. The three phenomena are called dirty read, non-repeatable read, and phantom read,
and are defined as follows:
• Dirty read: The transaction reads values written by another transaction that hasn’t committed yet.
• Non-repeatable read: A transaction reads the same object twice during execution and finds a
different value the second time, although the transaction has not changed the value in the
meantime.
• Phantom read: A transaction re-executes a query returning a set of rows that satisfy a search
condition and finds that the set of rows satisfying the condition has changed as a result of another
recently committed transaction.
