ADVANCED DATABASE
Chapter 3- Transaction Processing Concepts
Introduction
► A transaction can be defined as a group of tasks.
► A single task is the minimum processing unit which
cannot be divided further. The concept of transaction
provides a mechanism for describing logical units of
database processing.
► Transaction processing systems are systems with large
databases and hundreds of concurrent users executing
database transactions.
► Examples of such systems include airline reservation,
banking, credit card processing, stock markets, and so
on.
Con…
► One criterion for classifying a database system is according
to the number of users who can use the system
concurrently.
► A DBMS is single-user if at most one user at a time can use
the system, and it is multiuser if many users can use the
system at a time.
► Multiple users can access databases simultaneously
because of the concept of multiprogramming, which allows
the computer to execute multiple programs or processes at
the same time.
► A transaction is an executing program that forms a logical
unit of database processing.
► A transaction includes one or more database access
operations – these can include insertion, deletion,
modification, or retrieval operations.
Read and write operations
The DBMS will generally maintain a number of buffers in main
memory that hold database disk blocks containing the database
items being processed.
► read_item(X): Reads a database item named X into a program
variable.
► write_item(X): Writes the value of program variable X into the
database item named X.
Steps involved in read_item(X):
1. Find the address of the disk block that contains item X.
2. Copy that disk block into a buffer in main memory.
3. Copy item X from the buffer to the program variable named X.
Con…
Steps involved in write_item(X):
1. Find the address of the disk block that contains item X.
2. Copy the disk block into a buffer in main memory.
3. Copy item X from the program variable named X into its correct
location in the buffer.
4. Store the updated block from the buffer back to disk.
States of a Transaction
► Active, the initial state; the transaction stays in this
state while it is executing.
► Partially committed, after the final statement has
been executed.
► Committed, after successful completion, when the
transaction's updates are made permanent.
► Failed, after the discovery that normal execution can
no longer proceed.
► Aborted, after the transaction has been rolled back
and the database restored to its state prior to the start
of the transaction. Two options after it has been
aborted:
► restart the transaction – only if there is no internal
logical error
► kill the transaction
Process in Transaction
► A transaction is an atomic unit of work that is either
completed in its entirety or not done at all.
► For recovery purposes, the system needs to keep track of
when the transaction starts, terminates, and commits or
aborts. Therefore, the recovery manager keeps track of the
following operations:
1. begin_transaction: This marks the beginning of transaction
execution.
2. read or write: These specify read or write operations on the
database items that are executed as part of a transaction.
3. end_transaction: This specifies that read and write
transaction operations have ended and marks the end of
transaction execution.
Con….
4. commit_transaction: This signals a successful end of the
transaction so that any changes (updates) executed by the
transaction can be safely committed to the database and will
not be undone.
5. rollback (or abort): This signals that the transaction has
ended unsuccessfully, so that any changes or effects that the
transaction may have applied to the database must be undone.
Properties of a transaction
► Four basic (ACID) properties of a transaction:
1. Atomicity
All changes to data are performed as if they are a single operation.
That is, all the changes are performed, or none of them are.
2. Consistency
Data is in a consistent state when a transaction starts and when it
ends.
3. Isolation
The intermediate state of a transaction is invisible to other
transactions. As a result, transactions that run concurrently appear
to be serialized.
4. Durability
After a transaction successfully completes, changes to data persist
and are not undone, even in the event of a system failure.
Example of Transaction
► Example :
Consider two transactions T1 and T2.
T1 : Deposits 1000 birr to both accounts X and Y.
T2 : Doubles the balance of account X and Y.
T1:
Read (X)
X ← X + 1000
Write (X)
Read (Y)
Y ← Y + 1000
Write (Y)
Transaction and schedule
► A schedule, as the name suggests, is a process of lining up the
transactions and executing them one by one.
► When multiple transactions are running concurrently and the
order of their operations needs to be set so that the operations do
not overlap each other, scheduling is brought into play and the
transactions are timed accordingly.
Types of Schedules
1. Serial Schedules:
A serial schedule is one in which the transactions are executed
non-interleaved, i.e., no transaction starts until the running
transaction has ended.
Example: Consider the following schedule involving two
transactions T1 and T2.
Types of Schedules
► 2. Non-Serial Schedule:
This is a type of scheduling where the operations of
multiple transactions are interleaved. This might give rise to
concurrency problems.
► The transactions are executed in a non-serial manner, while
keeping the end result correct and the same as that of a serial
schedule.
► Unlike a serial schedule, where one transaction must wait
for another to complete all its operations, in a non-serial
schedule one transaction proceeds without waiting for the
previous transaction to complete.
Types of Schedules
► A serial schedule, by contrast, does not provide any benefit of
concurrent execution. The non-serial schedule can be divided
further into two types: Serializable and Non-Serializable.
A) Serializable:
Serializability is used to maintain the consistency of the database.
It is mainly applied to non-serial schedules to verify whether the
schedule will lead to any inconsistency.
A serial schedule, on the other hand, does not need a
serializability check because it executes a transaction only when
the previous transaction is complete.
A non-serial schedule of n transactions is said to be serializable
only when it is equivalent to some serial schedule of those
transactions.
Types of Schedules
1. Conflict serializable
A schedule is called conflict serializable if it can be transformed
into a serial schedule by swapping non-conflicting operations.
Two operations are said to be conflicting if all of the following
conditions hold:
► They belong to different transactions
► They operate on the same data item
► At least one of them is a write operation
2. View serializable
A schedule is called view serializable if it is view equivalent to a
serial schedule (one with no overlapping transactions). Every
conflict serializable schedule is also view serializable, but a view
serializable schedule that contains blind writes may not be
conflict serializable.
Types of Schedules
B) Non-Serializable: The non-serializable schedule is divided into
two types, Recoverable and Non-recoverable Schedule.
1. Recoverable Schedule
Schedules in which transactions commit only after all
transactions whose changes they read have committed are called
recoverable schedules. In other words, if some transaction Tj is
reading a value updated or written by some other transaction Ti,
then the commit of Tj must occur after the commit of Ti.
Types of Schedules
► There can be three types of recoverable schedule:
1. Cascading Schedule: When a failure in one transaction leads to
the rolling back or aborting of other dependent transactions, such
scheduling is referred to as cascading rollback or cascading abort.
2. Cascade-less Schedule: Also called "avoids cascading aborts/
rollbacks" (ACA). Schedules in which transactions read values only
after all transactions whose changes they are going to read have
committed are called cascade-less schedules. This avoids a situation
where a single transaction abort leads to a series of transaction
rollbacks. A strategy to prevent cascading aborts is to disallow a
transaction from reading uncommitted changes of another
transaction in the same schedule.
Types of Schedules
► This schedule is cascade-less, since the updated value of A is read
by T2 only after the updating transaction, T1, commits.
Types of Schedules
► It is a recoverable schedule but it does not avoid cascading aborts.
It can be seen that if T1 aborts, T2 will have to be aborted too in
order to maintain the correctness of the schedule, as T2 has
already read the uncommitted value written by T1.
Con…
3. Strict Schedule:
A schedule is strict if, for any two transactions Ti and Tj, whenever a
write operation of Ti precedes a conflicting operation of Tj (either
read or write), the commit or abort event of Ti also precedes that
conflicting operation of Tj.
In other words, Tj can read or write a value updated or written by
Ti only after Ti commits/aborts.
Types of Schedules
2. Non-Recoverable Schedule: Consider the following schedule
involving two transactions T1 and T2.
T2 read the value of A written by T1 and committed. T1 later aborted;
therefore the value read by T2 is wrong, but since T2 committed, this
schedule is non-recoverable.
Con…
Transaction Support in SQL
► With SQL, there is no explicit Begin_Transaction statement.
Transaction initiation is done implicitly when particular SQL
statements are encountered. However, every transaction must
have an explicit end statement, which is either a COMMIT or a
ROLLBACK.
► Transaction Control
The following commands are used to control transactions.
1. COMMIT − to save the changes.
2. ROLLBACK − to roll back the changes.
3. SAVEPOINT − creates points within the groups of transactions in
which to ROLLBACK.
4. SET TRANSACTION − Places a name on a transaction.
Con…
1. The COMMIT Command
► The COMMIT command is the transactional command used to save
changes invoked by a transaction to the database. The COMMIT
command saves all the transactions to the database since the last
COMMIT or ROLLBACK command.
SQL > DELETE FROM CUSTOMERS WHERE AGE = 25;
SQL > COMMIT;
2. The ROLLBACK Command
► The ROLLBACK command is the transactional command used to undo
transactions that have not already been saved to the database. This
command can only be used to undo transactions since the last COMMIT
or ROLLBACK command was issued.
SQL > DELETE FROM CUSTOMERS WHERE AGE = 25;
SQL > ROLLBACK;
Con…
3. The SAVEPOINT Command
► A SAVEPOINT is a point in a transaction when you can roll the
transaction back to a certain point without rolling back the entire
transaction.
SQL > SAVEPOINT SP1; Savepoint created.
SQL > DELETE FROM CUSTOMERS WHERE ID=1; 1 row deleted.
SQL > SAVEPOINT SP2; Savepoint created.
SQL > DELETE FROM CUSTOMERS WHERE ID=2; 1 row deleted.
► The SAVEPOINT command is used only to create savepoints
within the transaction's operations.
► In general, ROLLBACK is used to undo a group of transactions.
Syntax for rolling back to Savepoint command:
ROLLBACK TO SAVEPOINT_NAME;
Thank you
ADVANCED DATABASE
Chapter 4- Concurrency control Technique
Introduction
► The concurrency control concept comes under transaction
management in a database management system (DBMS).
► It is a procedure in a DBMS that manages two simultaneous
processes so that they execute without conflicts with each
other; such conflicts occur in multi-user systems.
► It is required to increase time efficiency: if many
transactions try to access the same data, inconsistency
arises. Concurrency control is required to maintain data
consistency.
► For example, if ATM machines did not use concurrency
control, multiple persons could not draw money at a time in
different places. This is where we need concurrency.
Introduction
The advantages of concurrency control are as follows:
Waiting time will be decreased.
Response time will be decreased.
Resource utilization will be increased.
System performance and efficiency are increased.
Why Concurrency Control Is Needed:
1. The Lost Update Problem: This problem occurs when two
transactions that access the same database items have their
operations interleaved in a way that makes the value of some
database items incorrect.
2. The Temporary Update (or Dirty Read) Problem: This problem
occurs when one transaction updates a database item and then
the transaction fails for some reason. The updated item is
accessed by another transaction before it is changed back to its
original value.
3. The Incorrect Summary Problem: If one transaction is
calculating an aggregate summary function on a number of
records while other transactions are updating some of these
records, the aggregate function may calculate some values before
they are updated and others after they are updated.
The Lost Update Problem:
► Consider the below diagram, where two transactions TX and TY are
performed on the same account A, whose balance is $300.
The Temporary Update (or Dirty Read)
Problem
► Consider two transactions TX and TY in the below diagram performing
read/write operations on account A, where the available balance in
account A is $300:
The Incorrect Summary Problem:
Con..
• At time t1, transaction TX reads the value from account A, i.e.,
$300.
• At time t2, transaction TY reads the value from account A, i.e.,
$300.
• At time t3, transaction TY updates the value of account A by
adding $100 to the available balance, and then it becomes $400.
• At time t4, transaction TY writes the updated value, i.e., $400.
• After that, at time t5, transaction TX reads the available value of
account A, and that will be read as $400.
• It means that within the same transaction TX, two different
values of account A are read: $300 initially and, after the
update made by transaction TY, $400. This is an unrepeatable
read and is therefore known as the unrepeatable read
problem.
Concurrency Control Technique
► Some of the main techniques used to control concurrent
execution of transactions are based on the concept of locking data
items.
► A lock is a variable associated with a data item that describes the
status of the item with respect to possible operations that can be
applied to it. Generally, there is one lock for each data item in the
database.
► Locks are used as a means of synchronizing the access by
concurrent transactions to the database items.
Types of locks
► Several types of locks are used in concurrency control such as
binary locks and shared/exclusive locks.
Binary Locks:
► A binary lock can have two states or values: locked and unlocked
(or 1 and 0, for simplicity). A distinct lock is associated with each
database item X.
► If the value of the lock on X is 1, item X cannot be accessed by a
database operation that requests the item. If the value of the lock
on X is 0, the item can be accessed when requested.
► We refer to the current value (or state) of the lock associated with
item X as lock(X).
► Two operations, lock_item and unlock_item, are used with binary
locking.
Types of locks
Shared/Exclusive (or Read/Write) Lock:
► Shared lock: These locks are also referred to as read locks. If a
transaction T has obtained a shared lock on data item X, then T can
read X but cannot write X. Multiple shared locks can be placed
simultaneously on a data item.
► Exclusive lock: These locks are also referred to as write locks. If a
transaction T has obtained an exclusive lock on data item X, then T
can both read and write X. Only one exclusive lock can be placed
on a data item at a time; that is, a single transaction exclusively
holds the lock on the item.
Types of locks
Two-Phase Locking (2PL): A transaction is said to follow the two-
phase locking protocol if all locking operations (read_lock,
write_lock) precede the first unlock operation in the transaction.
Such a transaction can be divided into two phases: an expanding or
growing (first) phase, during which new locks on items can be
acquired but none can be released; and a shrinking (second) phase,
during which existing locks can be released but no new locks can be
acquired.
Two-Phase Locking
Two-Phase Locking
Transaction T1:
• The growing Phase is from steps 1-3.
• The shrinking Phase is from steps 5-7.
• Lock Point at 4
Transaction T2:
• The growing Phase is from steps 2-6.
• The shrinking Phase is from steps 8-9.
• Lock Point at 7
What is a LOCK POINT? The point at which the growing phase ends,
i.e., when a transaction takes the final lock it needs to carry on its
work.
Strict Two-phase locking (Strict-2PL)
The first phase of Strict-2PL is similar to 2PL. In the first phase,
after acquiring all the locks, the transaction continues to execute
normally.
The only difference between 2PL and Strict-2PL is that Strict-2PL
does not release a lock immediately after using it.
Strict-2PL waits until the whole transaction commits, and then it
releases all the locks at a time.
Thus, the Strict-2PL protocol does not have a gradual shrinking
phase of lock release.
1. Deadlock
► A deadlock is a condition where two or more transactions are
waiting indefinitely for one another to give up locks.
► Deadlock is said to be one of the most feared complications in
DBMS as no task ever gets finished and is in waiting state forever.
► For example, Transaction A might hold a lock on some rows in
the students table and need to update some rows in
the grade table to finish. Transaction B holds locks on those very
rows in the grade table but needs to update the rows in
the students table held by Transaction A. Transaction A cannot
complete because of the lock on grade; Transaction B cannot
complete because of the lock on students.
Example
Deadlock Avoidance
► When a database is stuck in a deadlock, it is better to avoid the
deadlock rather than abort or restart the transactions involved,
as this wastes time and resources.
► A deadlock avoidance mechanism is used to detect any deadlock
situation in advance. A method like the "wait-for graph" can be
used for detecting deadlock situations, but this method is suitable
only for smaller databases. For larger databases, deadlock
prevention methods can be used.
Deadlock Detection
► In a database, when a transaction waits indefinitely to obtain a
lock, the DBMS should detect whether the transaction is involved
in a deadlock or not. The lock manager maintains a wait-for
graph to detect deadlock cycles in the database.
► Wait-for Graph
This is a suitable method for deadlock detection. In this method, a
graph is created based on the transactions and their locks. If the
created graph has a cycle or closed loop, then there is a deadlock.
The wait-for graph is maintained by the system for every
transaction that is waiting for some data held by others. The
system keeps checking whether there is any cycle in the graph.
Deadlock detection
Deadlock Prevention
► The deadlock prevention method is suitable for a large database. If
the resources are allocated in such a way that deadlock never
occurs, then deadlock can be prevented.
► The database management system analyzes the operations of a
transaction to determine whether they can create a deadlock
situation. If they can, the DBMS never allows that transaction to be
executed.
Con….
Wait-Die scheme
► In this scheme, if a transaction requests a resource that is already
held with a conflicting lock by another transaction, the DBMS simply
checks the timestamps of both transactions and allows the older
transaction to wait until the resource is available.
► Assume there are two transactions Ti and Tj, and let TS(T) be the
timestamp of any transaction T. If Tj holds a lock on some resource
and Ti is requesting it, then the following actions are performed by
the DBMS:
1. If TS(Ti) < TS(Tj): Ti is the older transaction and Tj holds the
resource, so Ti is allowed to wait until the data item is
available. That is, if the older transaction is waiting for a
resource locked by the younger transaction, the older
transaction is allowed to wait until the resource is available.
2. If TS(Ti) > TS(Tj): Ti is the younger transaction and the older
transaction Tj holds the resource, so Ti is killed (it "dies") and
restarted later with a random delay but with the same
timestamp.
Con..
Wound-Wait scheme
► In the wound-wait scheme, if an older transaction requests a
resource held by a younger transaction, the older transaction
forces the younger one to abort ("wounds" it) and release the
resource. After a short delay, the younger transaction is restarted,
but with the same timestamp.
► If an older transaction holds a resource that is requested by a
younger transaction, then the younger transaction is asked to
wait until the older one releases it.
Con..
Timeouts
Another simple scheme to deal with deadlock is the use of timeouts.
This method is practical because of its low overhead and simplicity.
In this method, if a transaction waits for a period longer than a
system-defined timeout period, the system assumes that the
transaction may be deadlocked and aborts it—regardless of whether
a deadlock actually exists or not.
2. Starvation.
Another problem that may occur when we use locking is starvation,
which occurs when a transaction cannot proceed for an indefinite
period of time while other transactions in the system continue
normally. This may occur if the waiting scheme for locked items is
unfair, giving priority to some transactions over others. One solution
for starvation is to have a fair waiting scheme, such as using a first-
come-first-served queue; transactions are enabled to lock an item in
the order in which they originally requested the lock.
Concurrency Control Based on Timestamp Ordering
► The use of locks, combined with the 2PL protocol, guarantees
serializability of schedules.
► The serializable schedules produced by 2PL have their equivalent
serial schedules based on the order in which executing
transactions lock the items they acquire. If a transaction needs an
item that is already locked, it may be forced to wait until the item
is released.
► Some transactions may be aborted and restarted because of the
deadlock problem.
► A different approach that guarantees serializability involves
using transaction timestamps to order transaction execution for
an equivalent serial schedule.
Timestamps
► Recall that a timestamp is a unique identifier created by the
DBMS to identify a transaction. Typically, timestamp values are
assigned in the order in which the transactions are submitted to
the system, so a timestamp can be thought of as the transaction
start time.
► Timestamps can be generated in several ways. One possibility is
to use a counter that is incremented each time its value is
assigned to a transaction. The transaction time-stamps are
numbered 1, 2, 3, ... in this scheme.
► Another way to implement timestamps is to use the current date/
time value of the system clock and ensure that no two timestamp
values are generated during the same tick of the clock.
The Timestamp Ordering Algorithm
► Read/write proceeds only if last update on that data
item was carried out by an older transaction.
► Otherwise, transaction requesting read/write is
restarted and given a new timestamp.
► Also timestamps for data items:
► read-timestamp - timestamp of last transaction to
read item;
► write-timestamp - timestamp of last transaction to
write item.
Con..
► Consider a transaction T with timestamp ts(T):
► Read rule: check the last write on the data item.
If ts(T) < WTS(x):
► x has already been updated by a younger (later) transaction.
► The transaction must be aborted and restarted with a new
timestamp.
► Otherwise the read continues, and RTS(x) is set to
max(RTS(x), ts(T)).
Con..
Write rule: check the last read and the last write.
If ts(T) < RTS(x):
► x has already been read by a younger transaction.
► Hence it is an error to update x now; the transaction is aborted
and restarted with a new timestamp.
If ts(T) < WTS(x):
► x has already been written by a younger transaction.
► The write can safely be ignored – the "ignore obsolete write" rule
(there is no need to restart the transaction).
► Otherwise, the operation is accepted and executed, and WTS(x) is
set to ts(T).
Con…
► Whenever the basic TO algorithm detects two conflicting
operations that occur in the incorrect order, it rejects the later of
the two operations by aborting the transaction that issued it.
► The schedules produced by basic TO are hence guaranteed to be
conflict serializable, like the 2PL protocol.
► However, some schedules are possible under each protocol that
are not allowed under the other. Thus, neither protocol allows all
possible serializable schedules. As mentioned earlier, deadlock
does not occur with timestamp ordering. However, cyclic restart
(and hence starvation) may occur if a transaction is continually
aborted and restarted.
Multiversion Concurrency Control
► Other protocols for concurrency control keep the old values of a
data item when the item is updated. These are known
as multiversion concurrency control, because several versions
(values) of an item are maintained.
► When a transaction requires access to an item,
an appropriate version is chosen to maintain the serializability of
the currently executing schedule, if possible. The idea is that some
read operations that would be rejected in other techniques can
still be accepted by reading an older version of the item to
maintain serializability.
► When a transaction writes an item, it writes a new version and the
old version(s) of the item are retained. Some multiversion
concurrency control algorithms use the concept of view
serializability rather than conflict serializability.
Multiversion Technique Based on Timestamp
Ordering
► In this method, several versions X1, X2, ..., Xk of each data
item X are maintained. For each version, the value of
version Xi and the following two timestamps are kept:
► read_TS(Xi). The read timestamp of Xi is the largest of all the
timestamps of transactions that have successfully read
version Xi.
► write_TS(Xi). The write timestamp of Xi is the timestamp of
the transaction that wrote the value of version Xi.
► Whenever a transaction T is allowed to execute a write_item(X)
operation, a new version Xk+1 of item X is created, with both
the write_TS(Xk+1) and the read_TS(Xk+1) set to TS(T).
Multiversion Two-Phase Locking Using Certify
Locks
► In this multiple-mode locking scheme, there are three locking
modes for an item: read, write, and certify, instead of just the two
modes (read, write) discussed previously. Hence, the state
of LOCK(X) for an item X can be one of read-locked, write-locked,
certify-locked, or unlocked.
► In the standard locking scheme, once a transaction obtains a
write lock on an item, no other transaction can access that item.
The idea behind multiversion 2PL is to allow other
transactions T′ to read an item X while a single transaction T holds
a write lock on X. This is accomplished by allowing two
versions of each item X; one version must always have been
written by some committed transaction.
► The second version, X′, is created when a transaction T acquires a
write lock on the item. Other transactions can continue to read
the committed version of X while T holds the write lock.
Transaction T can write the value of X′ as needed, without
affecting the value of the committed version X.
Con…
► However, once T is ready to commit, it must obtain a certify
lock on all items that it currently holds write locks on before it
can commit. The certify lock is not compatible with read locks, so
the transaction may have to delay its commit until all its write-
locked items are released by any reading transactions in order to
obtain the certify locks.
► Once the certify locks – which are exclusive locks – are acquired,
the committed version X of the data item is set to the value of
version X′, version X′ is discarded, and the certify locks are then
released.
Validation (Optimistic) Concurrency Control
Techniques
► In all the concurrency control techniques we have discussed so
far, a certain degree of checking is done before a database
operation can be executed.
► In optimistic concurrency control techniques, also known as
validation or certification techniques, no checking is done while
the transaction is executing.
► There are three phases for this concurrency control protocol:
Read phase: A transaction can read values of committed data items
from the database. However, updates are applied only to local copies
(versions) of the data items kept in the transaction workspace.
Validation phase: Checking is performed to ensure that
serializability will not be violated if the transaction updates are
applied to the database.
Write phase: If the validation phase is successful, the transaction
updates are applied to the database; otherwise, the updates are
discarded and the transaction is restarted.
Con…
► The optimistic protocol we describe uses transaction timestamps
and also requires that the write_sets and read_sets of the
transactions be kept by the system. Additionally, start and end
times for some of the three phases need to be kept for each
transaction.
► Recall that the write_set of a transaction is the set of items it
writes, and the read_set is the set of items it reads.
► In the validation phase for transaction Ti, the protocol checks that
Ti does not interfere with any committed transactions or with any
other transactions currently in their validation phase.
Granularity of Data Items and Multiple
Granularity Locking
► All concurrency control techniques assume that the database is
formed of a number of named data items. A database item could
be chosen to be one of the following:
■ A database record
■ A field value of a database record
■ A disk block
■ A whole file
■ The whole database
Granularity of Data Items
► The size of data items is often called the data item granularity.
Fine granularity refers to small item sizes, whereas coarse
granularity refers to large item sizes. Several tradeoffs must be
considered in choosing the data item size.
► The larger the data item size is, the lower the degree of
concurrency permitted.
► On the other hand, the smaller the data item size is, the more the
number of items in the database. Because every item is associated
with a lock, the system will have a larger number of active locks
to be handled by the lock manager. More lock and unlock
operations will be performed, causing a higher overhead.
► Best item size depends on the types of transactions.
Hierarchy of Granularity
► Since the best granularity size depends on the given transaction,
it seems appropriate that a database system should support
multiple levels of granularity, where the granularity level can be
different for various mixes of transactions.
► The granularity of locks can be represented in a hierarchical
structure.
► The root node represents the entire database, level-1 nodes
represent files, and so on.
► When a node is locked, all its descendants are also locked.
► The DBMS should check the hierarchical path before granting a
lock.
Levels of Locking
Using Locks for Concurrency Control in
Indexes
► Two-phase locking can also be applied to indexes, where the
nodes of an index correspond to disk pages. However, holding
locks on index pages until the shrinking phase of 2PL could cause
an undue amount of transaction blocking, because searching an
index always starts at the root.
► Therefore, if a transaction wants to insert a record (a write
operation), the root would be locked in exclusive mode, so all
other conflicting lock requests for the index must wait until the
transaction enters its shrinking phase. This blocks all other
transactions from accessing the index, so in practice other
approaches to locking an index must be used.
Con…
► The tree structure of the index can be taken advantage of when
developing a concurrency control scheme. For example, when an
index search (read operation) is being executed, a path in the tree
is traversed from the root to a leaf. Once a lower-level node in the
path has been accessed, the higher-level nodes in that path will
not be used again.
► So once a read lock on a child node is obtained, the lock on the
parent can be released. When an insertion is being applied to a
leaf node (that is, when a key and a pointer are inserted), then the
specific leaf node must be locked in exclusive mode.
Other Concurrency Control Issues
Reading Assignment
Thank
you
Advanced Database
Chapter-5 Database Recovery Techniques
Database recovery techniques
► It is the process of restoring the database to a correct state in the
event of a failure.
► Database recovery techniques are used in database management
systems (DBMS) to restore a database to a consistent state after a
failure or error has occurred.
► The main goal of recovery techniques is to ensure data integrity
and consistency and prevent data loss.
► Example: If the system crashes before a fund transfer
transaction completes its execution, then either one or both
accounts may have incorrect value. Thus, the database must be
restored to the state before the transaction modified any of the
accounts.
Con….
► In every database system, the possibility of a system failure is
always present. Should system failure occur, you must recover
the database as quickly, and with as little detrimental impact on
users, as possible.
► Recovering from any type of system failure requires the following:
1. Determining which data structures are intact and which ones
need recovery.
2. Following the appropriate recovery steps.
3. Restarting the database so that it can resume normal operations.
4. Ensuring that no work has been lost nor incorrect data entered in
the database.
► The goal is to return to normal as quickly as possible while
insulating database users from any problems and the possibility
of losing or duplicating work.
Need for Recovery Control
Two types of storage: volatile (main memory) and nonvolatile.
Volatile storage does not survive system crashes. Stable storage
represents information that has been replicated in several
nonvolatile storage media with independent failure modes.
► To preserve transaction properties (Atomicity, Consistency,
Isolation and Durability).
► To bring the database into the last consistent state, which
existed prior to the failure
Types of Failure
► The database may become unavailable for use due to
► Transaction failure: Transactions may fail because of incorrect
input, deadlock, incorrect synchronization.
► System failure: System may fail because of addressing error,
application error, operating system fault, RAM failure, etc.
► Media failure: Disk head crash, power disruption, etc.
A catastrophic failure
► A catastrophic failure is one where a stable, secondary storage
device gets corrupted. With the failure of the storage device, all
the valuable data stored on it is lost.
► A catastrophic failure refers to a quick and sudden failure from
which it is impossible to recover. It shuts down the ability of the
business system to perform real-time tasks, and core operations
can fail as a result.
Typical strategy for recovery
1. If there is extensive damage to a wide portion of the database
due to catastrophic failure, such as a disk crash, the recovery
method restores a past copy of the database that was backed up to
archival storage (typically tape or other large-capacity offline
storage media) and reconstructs a more current state by
reapplying or redoing the operations of committed transactions
from the backed-up log, up to the time of failure.
Typical strategy for recovery
2. When the database on disk is not physically damaged and a
noncatastrophic failure has occurred, the recovery strategy is to identify
any changes that may cause an inconsistency in the database. For
example, a transaction that has updated some database items on disk but
has not been committed needs to have its changes reversed by undoing
its write operations. It may also be necessary to redo some operations in
order to restore a consistent state of the database; for example, if a
transaction has committed but some of its write operations have not yet
been written to disk.
Con……
► For noncatastrophic failures, the recovery protocol does not need
a complete archival copy of the database. Rather, the entries kept
in the online system log on disk are analyzed to determine the
appropriate actions for recovery.
Deferred update and Immediate
update
► Conceptually, we can distinguish two main techniques for
recovery from non-catastrophic transaction failures: deferred
update and immediate update.
Deferred Update
► The deferred update techniques do not physically update the
database on disk until after a transaction reaches its commit
point; then the updates are recorded in the database. Before
reaching commit, all transaction updates are recorded in the local
transaction workspace or in the main memory buffers that the
DBMS maintains (the DBMS main memory cache).
Deferred Update
► If a transaction fails before reaching its commit point, it will not
have changed the database in any way so UNDO is not needed. It
may be necessary to REDO the effect of the operations that are
recorded in the local transaction workspace, because their effect
may not yet have been written in the database. Hence, a deferred
update is also known as the No-undo/redo algorithm
Immediate update
► In the immediate update, the database may be updated by some
operations of a transaction before the transaction reaches its
commit point. However, these operations are recorded in a log on
disk before they are applied to the database, making recovery still
possible.
► If a transaction fails to reach its commit point, the effect of its
operations must be undone, i.e., the transaction must be rolled
back; hence we require both undo and redo. This technique is
known as the undo/redo algorithm.
Con….
► The UNDO and REDO operations are required to be
idempotent—that is, executing an operation multiple times is
equivalent to executing it just once. In fact, the whole recovery
process should be idempotent because if the system were to fail
during the recovery process, the next recovery attempt might
UNDO and REDO certain write_item operations that had already
been executed during the first recovery process.
Data Caching
► In this approach, one or more disk pages that include the data
items to be updated are cached into main memory buffers and
then updated in memory before being written back to disk. A
collection of in-memory buffers, called the DBMS cache, is kept
under the control of the DBMS for holding these buffers.
► Data items to be modified are first stored in the database cache by
the Cache Manager (CM), and after modification they are flushed
(written) to the disk.
► The flushing is controlled by the Modified and Pin/Unpin bits.
► Pin/Unpin: instructs the operating system not to flush the data
item yet.
► Modified: indicates that the buffer holds the AFIM (after image)
of the data item, i.e., it has been changed.
Transaction Log
► For recovery from any type of failure, the data values prior to
modification (BFIM – BeFore IMage) and the new values after
modification (AFIM – AFter IMage) are required.
► These values and other information are stored in a sequential file
called the transaction log. A sample log is given below. Back P and
Next P point to the previous and next log records of the same
transaction.
Transaction Roll-back (Undo) and Roll-
Forward (Redo)
► To maintain atomicity, a transaction’s operations are redone or
undone.
► Undo: Restore all BFIMs on to disk (Remove all AFIMs).
► Redo: Restore all AFIMs on to disk.
► Database recovery is achieved either by performing only Undos
or only Redos or by a combination of the two. These operations
are recorded in the log as they happen.
Write-Ahead Logging
► When in-place updating (immediate or deferred) is used, a log is
necessary for recovery, and it must be available to the recovery
manager. This is achieved by the Write-Ahead Logging (WAL)
protocol. WAL states that:
► For Undo: Before a data item’s AFIM is flushed to the database
disk (overwriting the BFIM) its BFIM must be written to the log
and the log must be saved on a stable store (log disk).
► For Redo: Before a transaction executes its commit operation, all
its AFIMs must be written to the log and the log must be saved on
a stable store
Checkpointing
► From time to time (randomly or under some criteria) the database
flushes its buffers to the database disk to minimize the task of
recovery. The following steps define a checkpoint operation:
1. Suspend execution of transactions temporarily.
2. Force-write modified buffer data to disk.
3. Write a [checkpoint] record to the log, and save the log to disk.
4. Resume normal transaction execution.
► During recovery, redo or undo is required only for transactions
appearing after the [checkpoint] record.
Steal/No-Steal and Force/No-Force
Possible ways for flushing database cache to database disk:
1. Steal: a cache buffer can be flushed before the transaction commits.
2. No-Steal: a cache buffer cannot be flushed before the transaction
commits.
3. Force: all modified buffers are immediately flushed (forced) to disk
when the transaction commits.
4. No-Force: flushing is deferred until after the transaction commits.
These give rise to four different ways for handling recovery:
Steal/No-Force (Undo/Redo)
Steal/Force (Undo/No-redo)
No-Steal/No-Force (Redo/No-undo)
No-Steal/Force (No-undo/No-redo)
Recovery Scheme
Deferred Update (No Undo/Redo)
► The data update goes as follows:
► A set of transactions records their updates in the log.
► At commit point under WAL scheme these updates are saved on
database disk.
► After reboot from a failure the log is used to redo all the
transactions affected by this failure. No undo is required because
no AFIM is flushed to the disk before a transaction commits.
Deferred Update
► Deferred Update in a single-user system: there is no concurrent
data sharing in a single-user system. The data update proceeds
exactly as described on the previous slide: updates are recorded in
the log, saved to the database disk at the commit point under the
WAL scheme, and, after a failure, redone from the log; no undo is
required because no AFIM is flushed to disk before a transaction
commits.
Con…..
► Deferred Update with concurrent users
► This environment requires some concurrency control mechanism
to guarantee the isolation property of transactions. During system
recovery, transactions that were recorded in the log after the last
checkpoint are redone. The recovery manager may scan some of
the transactions recorded before the checkpoint to get the AFIMs.
Con….
► Deferred Update with concurrent users
► Two tables are required for implementing this protocol:
Active table: All active transactions are entered in this table.
Commit table: Transactions to be committed are entered in this table.
► During recovery, all transactions in the commit table are redone
and all transactions in the active table are ignored, since none of
their AFIMs reached the database. It is possible that a commit-
table transaction may be redone twice, but this does not create
any inconsistency because redo is "idempotent"; that is, one redo
of an AFIM is equivalent to multiple redos of the same AFIM.
Recovery Techniques Based on Immediate
Update
Undo/No-redo Algorithm
► In this algorithm AFIMs of a transaction are flushed to the
database disk under WAL before it commits.
► For this reason, the recovery manager undoes all uncommitted
transactions during recovery.
► No transaction is redone.
► It is possible that a transaction might have completed execution
and be ready to commit; such a transaction is also undone.
Recovery Techniques Based on Immediate
Update
Undo/Redo Algorithm (Single-user environment)
► Recovery schemes of this category apply undo and also redo for
recovery.
► In a single-user environment no concurrency control is required
but a log is maintained under WAL.
► Note that at any time there will be one transaction in the system
and it will be either in the commit table or in the active table.
► The recovery manager performs:
► Undo of a transaction if it is in the active table.
► Redo of a transaction if it is in the commit table
Recovery Techniques Based on Immediate
Update
Undo/Redo Algorithm (Concurrent execution)
► Recovery schemes of this category applies undo and also redo to
recover the database from failure.
► In concurrent execution environment a concurrency control is
required and log is maintained under WAL.
► Commit table records transactions to be committed and active
table records active transactions. To minimize the work of the
recovery manager checkpointing is used.
► The recovery performs:
► Undo of a transaction if it is in the active table.
► Redo of a transaction if it is in the commit table.
Shadow Paging
► The AFIM does not overwrite its BFIM but is recorded at another
place on the disk. Thus, at any time a data item has its AFIM and
BFIM (the shadow copy of the data item) at two different places on
the disk.
► To manage access to data items by concurrent transactions, two
directories (current and shadow) are used.
The ARIES Recovery Algorithm
► The ARIES Recovery Algorithm is based on:
► WAL (Write Ahead Logging)
► Repeating history during redo:
► ARIES will retrace all actions of the database system prior to
the crash to reconstruct the database state when the crash
occurred.
► Logging changes during undo:
► It will prevent ARIES from repeating the completed undo
operations if a failure occurs during recovery, which causes a
restart of the recovery process.
The ARIES Recovery Algorithm
► The ARIES recovery algorithm consists of three steps:
1. Analysis: this step identifies the dirty (updated) pages in the
buffer and the set of transactions active at the time of the crash.
The appropriate point in the log where redo is to start is also
determined.
2. Redo: necessary redo operations are applied.
3. Undo: log is scanned backwards and the operations of
transactions active at the time of crash are undone in reverse
order.
The ARIES Recovery Algorithm
► The Log and Log Sequence Number (LSN)
A log record is written for:
(a) data update
(b) transaction commit
(c) transaction abort
(d) undo
(e) transaction end
In the case of undo a compensating log record is written.
The ARIES Recovery Algorithm
► A unique LSN is associated with every log record.
► LSN increases monotonically and indicates the disk address of the
log record it is associated with.
► In addition, each data page stores the LSN of the latest log record
corresponding to a change for that page.
► A log record stores
► (a) the previous LSN of that transaction
► (b) the transaction ID
► (c) the type of log record.
Con…
► A log record stores:
► 1. Previous LSN of that transaction: it links the log records of each
transaction, acting like a back pointer to the previous record of the
same transaction.
► 2. Transaction ID
► 3. Type of log record
► For a write operation the following additional information is
logged:
1. Page ID for the page that includes the item
2. Length of the updated item
3. Its offset from the beginning of the page
4. BFIM of the item
5. AFIM of the item
Recovery in multi-database system
► A multi-database system is a special distributed database system
where one node may be running a relational database system
under UNIX, another may be running an object-oriented system
under Windows, and so on.
► A transaction may run in a distributed fashion at multiple nodes.
► In this execution scenario the transaction commits only when all
these multiple nodes agree to commit individually the part of the
transaction they were executing.
► This commit scheme is referred to as “two-phase commit” (2PC).
► If any one of these nodes fails or cannot commit the part of the
transaction, then the transaction is aborted.
► Each node recovers the transaction under its own recovery
protocol.
Quiz (10%)
1. Write the concept of recovery in multi-database systems.
2. List and explain the steps of the ARIES recovery algorithm.
3. Write the difference between deferred update and immediate update.
4. Why do we need recovery control techniques?
5. List the recovery control techniques.
Advanced Database
Chapter 6&7- Database Security and Integrity Distributed Database Systems
Database Security and Integrity
► Database security encompasses hardware, software, people and
data.
► Multi-user database system - DBMS must provide a database
security and authorization subsystem to enforce limits on
individual and group access rights and privileges.
► Database misuse could be intentional or accidental, where
accidental misuse is easier to cope with than intentional misuse.
Accidental inconsistency could occur due to:
System crash during transaction processing
Anomalies due to concurrent access
Anomalies due to redundancy
Logical errors
Con…
► Likewise, although various threats fall in this group, intentional
misuse could be:
Unauthorized reading of data
Unauthorized modification of data, or
Unauthorized destruction of data
► Most systems implement good database integrity to protect the
system from accidental misuse, while there are many computer-
based measures to protect the system from intentional misuse,
which are termed database security measures.
Security Issues and general considerations
► Legal, ethical and social issues regarding the right to access
information
► Physical control
► Policy issues regarding privacy of individual level at enterprise
and national level
► Operational consideration on the techniques used (password, etc)
► System-level security including operating system and hardware
control
► Security levels and security policies at the enterprise level
► Database security - the mechanisms that protect the database
against intentional or accidental threats. And Database security
encompasses hardware, software, people and data
► Threat - any situation or event, whether intentional or accidental,
that may adversely affect a system and consequently the
organization
Con…
The harm to an organization may be tangible or intangible
► Tangible - loss of hardware, software, or data
► Intangible - loss of credibility or client confidence
Examples of threats:
* Using another person's means of access
* Unauthorized amendment/modification or copying of data
* Program alteration
* Inadequate policies and procedures that allow a mix of confidential and
normal output
* Wire-tapping
* Illegal entry by hacker
* Blackmail
Levels of Security Measures
Security measures can be implemented at several levels and for
different components of the system. These levels are:
1. Physical Level: concerned with physically securing the site
containing the computer system. The backup systems should also be
physically protected from access except for authorized users.
2. Human Level: concerned with the authorization of database users
for access to the content at different levels and privileges.
3. Operating System: concerned with the weaknesses and strengths of
operating system security on data files. A weakness may serve as a
means of unauthorized access to the database. This also includes
protection of data in primary and secondary memory from
unauthorized access.
4. Database System: concerned with data access limits enforced by
the database system, such as passwords, isolated transactions, and
so on.
Countermeasures: Computer based
controls
► The types of countermeasure to threats on computer systems
range from physical controls to administrative procedures
► Despite the range of computer-based controls that are available, it
is worth noting that, generally, the security of a DBMS is only as
good as that of the operating system, owing to their close
association
► The following are computer-based security controls for a multi-
user environment:
Computer based controls
Authorization
► The granting of a right or privilege that enables a subject to have
legitimate access to a system or a system’s object
► Authorization controls can be built into the software, and govern
not only what system or object a specified user can access, but
also what the user may do with it
► The process of authorization involves authentication of subjects
(i.e., a user or program) requesting access to objects (i.e., a database
table, view, procedure, trigger, or any other object that can be
created within the system)
Computer based controls
Views
► A view is the dynamic result of one or more relational operations
on the base relations to produce another relation
► A view is a virtual relation that does not actually exist in the
database, but is produced upon request by a particular user
► The view mechanism provides a powerful and flexible security
mechanism by hiding parts of the database from certain users
► Using a view is more restrictive than simply having certain
privileges granted to a user on the base relation(s)
Integrity
► Integrity constraints contribute to maintaining a secure database
system by preventing data from becoming invalid and hence
giving misleading or incorrect results
► Domain Integrity
► Entity integrity
► Referential integrity
► Key constraints
Backup and recovery
► Backup is the process of periodically taking a copy of the database
and log file (and possibly programs) on to offline storage media
► A DBMS should provide backup facilities to assist with the
recovery of a database following failure.
► Database recovery is the process of restoring the database to a
correct state in the event of a failure
► Journaling is the process of keeping and maintaining a log file (or
journal) of all changes made to the database to enable recovery to
be undertaken effectively in the event of a failure.
Encryption
► The encoding of the data by a special algorithm that renders the
data unreadable by any program without the decryption key
► If a database system holds particularly sensitive data, it may be
deemed necessary to encode it as a precaution against possible
external threats or attempts to access it.
► Encryption also protects data transmitted over
communication lines
► To transmit data securely over insecure networks requires the
use of a Cryptosystem, which includes Authentication.
Authentication
► All users of the database will have different access levels and
permission for different data objects, and authentication is
the process of checking whether the user is the one with the
privilege for the access level.
► It is the process of checking that users are who they say they are.
► Each user is given a unique identifier, which is used by the
operating system to determine who they are.
► Thus the system will check whether the user with a
specific username and password is trying to use the resource.
Authentication
Any database access request will have the following three
major components
1. Requested Operation: what kind of operation is requested by a
specific query?
2. Requested Object: on which resource or data of the database is the
operation sought to be applied?
3. Requesting User: who is the user requesting the operation on the
specified object?
Forms of user authorization
► There are different forms of user authorization on the resource of the
database. These forms are privileges on what operations are allowed on
a specific data object.
► User authorization on the data/extension
1. Read Authorization: the user with this privilege is allowed only
to read the content of the data object.
2. Insert Authorization: the user with this privilege is allowed only
to insert new records or items to the data object.
3. Update Authorization: users with this privilege are allowed to
modify content of attributes but are not authorized to delete the
records.
4. Delete Authorization: users with this privilege are only allowed
to delete a record and not anything else.
Example….
E.g., creating a user in an Oracle database (the password must begin
with a letter, so a purely numeric one is adjusted here):
CREATE USER abebe IDENTIFIED BY abebe123;
GRANT ALL PRIVILEGES TO abebe;
GRANT SELECT, INSERT, UPDATE, DELETE ON schema.test TO abebe;
Role of DBA in Database Security
1. Account Creation: involves creating different accounts for
different USERS as well as USER GROUPS.
2. Security Level Assignment: involves assigning different users to
different categories of access levels.
3. Privilege Grant: involves giving different levels of privileges to
different users and user groups.
4. Privilege Revocation: involves denying or canceling previously
granted privileges of users for various reasons.
5. Account Deletion: involves deleting an existing account of a user
or user group. It is similar to denying all privileges of the user on
the database.
RAID (Redundant Array of Independent Disks)
► The hardware that the DBMS is running on must be fault-tolerant, meaning
that the DBMS should continue to operate even if one of the hardware
components fails.
► One solution is the use of RAID technology.
► RAID works on having a large disk array comprising an arrangement of
several independent disks that are organized to improve reliability and at
the same time increase performance.
► Reliability is improved through storing redundant information across the
disks using a parity scheme or an error-correcting scheme
Distributed Database Systems
► Database development facilitates the integration of data available
in an organization and enforces security on data access. But it is
not always the case that organizational data reside in one site.
► In a distributed database system, the database is stored on
several computers. The computers in a distributed system
communicate with each other through various communication
media, such as high-speed buses or telephone lines.
► A distributed database system consists of loosely coupled sites
that share no physical component and database systems that run
on each site are independent of each other.
► Replication: System maintains multiple copies of data, stored in different
sites, for faster retrieval and fault tolerance.
► Fragmentation: Relation is partitioned into several fragments stored in
distinct sites
► Data transparency: Degree to which system user may remain unaware
of the details of how and where the data items are stored in a distributed
system.
► A distributed database system consists of a collection of sites, each of which
maintains a local database system and also participates in global transaction
where different databases are integrated together.
► Local Transaction: transactions that access data only in that single
site
► Global Transaction: transactions that access data in several sites.
Advantages of DDBMS
► Data sharing and distributed control: a user at one site may be
able to access data that is available at another site
Each site can retain some degree of control over local data
We will have local as well as global database administrators
► Reliability and availability of data
If one site fails, the rest can continue operating as long as
transactions do not demand data from the failed site, or the needed
data is replicated at other sites
► Speedup of query processing
If a query involves data from several sites, it may be possible to split
the query into sub-queries that can be executed at several sites
which is parallel processing
Disadvantages of DDBMS
► Software development cost
► Greater potential for bugs (parallel processing may endanger
correctness)
► Increased processing overhead (due to communication between sites)
► Communication problems
Homogeneous distributed database
► All sites have identical software
► Are aware of each other and agree to cooperate in processing
user requests.
► Each site surrenders part of its autonomy in terms of right to
change schemas or software
► Appears to user as a single system
Heterogeneous distributed database
► Different sites may use different schemas and software
► Difference in schema is a major problem for query
processing
► Difference in software is a major problem for transaction
processing
► Sites may not be aware of each other and may provide only
limited facilities for cooperation in transaction processing
Thank
You