UNIT-4
ACID Properties in DBMS
Transaction:
A transaction is a set of instructions used to perform a single logical
operation on a database.
Let’s take an example of a simple transaction. Suppose a bank employee transfers
Rs 500 from A's account to B's account.
This very simple and small transaction involves several low-level tasks.
A’s Account
Open_Account(A)
Old_Balance = A.balance
New_Balance = Old_Balance - 500
A.balance = New_Balance
Close_Account(A)
B’s Account
Open_Account(B)
Old_Balance = B.balance
New_Balance = Old_Balance + 500
B.balance = New_Balance
Close_Account(B)
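The low-level tasks above can be sketched in Python (a toy illustration; the `accounts` dictionary and the `transfer` function are invented stand-ins for the Open_Account/Close_Account steps):

```python
# A toy in-memory "database": account name -> balance.
accounts = {"A": 1000, "B": 500}

def transfer(db, src, dst, amount):
    # Debit the source account (A's low-level tasks).
    old_balance = db[src]
    new_balance = old_balance - amount
    db[src] = new_balance
    # Credit the destination account (B's low-level tasks).
    old_balance = db[dst]
    new_balance = old_balance + amount
    db[dst] = new_balance

transfer(accounts, "A", "B", 500)
print(accounts)  # {'A': 500, 'B': 1000}
```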
In order to maintain consistency in a database, before and after the transaction,
certain properties are followed.
These are called ACID properties.
Atomicity:
1. It means that either the entire transaction takes place at once or it does
not happen at all.
2. There is no midway, i.e., transactions do not occur partially.
3. Each transaction is treated as one unit and either runs to completion or
is not executed at all.
“In other words, either the transaction executes successfully in full, or it does
not start at all; no incomplete transaction is ever left in the database.”
It involves the following two operations.
—Abort: If a transaction aborts, changes made to the database are not visible.
—Commit: If a transaction commits, changes made are visible.
“Atomicity is also known as the ‘All or nothing rule’. “
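A minimal sketch of the "all or nothing" rule, assuming an in-memory dictionary stands in for the database (the `atomic_transfer` helper is invented for this example): changes are staged on a working copy and either committed as a whole or discarded on abort.

```python
def atomic_transfer(db, src, dst, amount):
    # Stage all changes on a working copy; the database itself is
    # touched only at commit time.
    working = dict(db)
    working[src] -= amount
    working[dst] += amount
    if working[src] < 0:
        # Abort: the working copy is discarded, db stays unchanged.
        raise ValueError("insufficient funds, transaction aborted")
    # Commit: all changes become visible at once.
    db.update(working)

accounts = {"A": 100, "B": 0}
try:
    atomic_transfer(accounts, "A", "B", 500)   # aborts
except ValueError:
    pass
print(accounts)                                # {'A': 100, 'B': 0}
atomic_transfer(accounts, "A", "B", 100)       # commits
print(accounts)                                # {'A': 0, 'B': 100}
```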
Consider the following transaction T consisting of T1 and T2: Transfer of 100
from account X to account Y.
Consistency:
1. This means that integrity constraints must be maintained so that the
database is consistent before and after the transaction.
2. It refers to the correctness of a database.
Referring to the example above,
The total amount before and after the transaction must be the same.
1. Total before T occurs: 500 + 200 = 700.
2. Total after T occurs: 400 + 300 = 700.
Therefore, the database is consistent: the total is not modified by the
transaction.
Isolation:
1. Isolation means that data being used by one transaction cannot be used
by another transaction until the first transaction completes; the
transactions appear to execute one after the other.
2. This property ensures that multiple transactions can occur concurrently
(execute at the same time) without leading to inconsistency of the database state.
3. All the Transactions occur independently without interference.
4. Changes occurring in a particular transaction will not be visible to any other
transaction until that particular change in that transaction is written to
memory or has been committed.
Durability:
1. This property ensures that once the transaction has completed execution, the
updates and modifications to the database are stored in and written to disk
and they persist even if a system failure occurs.
2. These updates now become permanent and are stored in non-volatile
memory. The effects of the transaction, thus, are never lost.
Transaction Operations
Read(X): The read operation reads the value of X from the database and stores it in a
buffer in main memory.
Write(X): The write operation writes the value back to the database from the buffer.
Let's take an example of a debit transaction on an account, which consists of the
following operations:
1. R(X);
2. X = X - 500;
3. W(X);
Let's assume the value of X before starting of the transaction is 4000.
o The first operation reads X's value from database and stores it in a buffer.
o The second operation will decrease the value of X by 500. So buffer will contain 3500.
o The third operation will write the buffer's value to the database. So X's final value will be
3500.
But it may happen that, because of a hardware, software or power failure, the
transaction fails before finishing all the operations in the set.
Commit: It is used to save the work done permanently.
Rollback: It is used to undo the work done.
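The Read, Write, Commit, and Rollback operations can be sketched together (a simplified model; the `MiniTxn` class is invented, and dictionaries stand in for the database and the main-memory buffer):

```python
class MiniTxn:
    """Sketch of Read/Write against a main-memory buffer, with
    Commit/Rollback deciding whether the buffer reaches the database."""
    def __init__(self, database):
        self.db = database
        self.buffer = {}          # uncommitted values live here

    def read(self, x):            # R(X): copy the value into the buffer
        if x not in self.buffer:
            self.buffer[x] = self.db[x]
        return self.buffer[x]

    def write(self, x, value):    # W(X): update the buffer only
        self.buffer[x] = value

    def commit(self):             # save the work permanently
        self.db.update(self.buffer)
        self.buffer.clear()

    def rollback(self):           # undo the work
        self.buffer.clear()

db = {"X": 4000}
t = MiniTxn(db)
v = t.read("X")        # reads 4000 into the buffer
t.write("X", v - 500)  # buffer now holds 3500
t.commit()             # buffer value reaches the database
print(db["X"])         # 3500
```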
Transaction states in DBMS
These are different types of Transaction States:
1. Active State
When the instructions of the transaction are running then the transaction is in active
state. If all the ‘read and write’ operations are performed without any error then it
goes to the “partially committed state”; if any instruction fails, it goes to the “failed
state”.
2. Partially Committed
After completion of all the read and write operations, the changes are held in main
memory or the local buffer. If the changes are made permanent on the database, the
state changes to the “committed state”; in case of failure, it goes to the
“failed state”.
3. Failed State
When any instruction of the transaction fails, or a failure occurs while making a
change of data permanent on the database, the transaction goes to the “failed state”.
4. Aborted State
After any type of failure, the transaction goes from the “failed state” to the
“aborted state”. Since in the previous states the changes were made only to the local
buffer or main memory, these changes are deleted or rolled back.
5. Committed State
It is the state when the changes have been made permanent on the database. The
transaction is complete and then moves to the “terminated state”.
6. Terminated State
If there isn’t any rollback, or the transaction comes from the “committed state”,
then the system is consistent and ready for a new transaction, and the old
transaction is terminated.
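The transitions between the states above can be captured as a small state machine (a sketch; the `TRANSITIONS` table simply encodes the rules described in this section):

```python
# Allowed transitions between transaction states, as described above.
TRANSITIONS = {
    "active":              {"partially_committed", "failed"},
    "partially_committed": {"committed", "failed"},
    "failed":              {"aborted"},
    "committed":           {"terminated"},
    "aborted":             {"terminated"},
    "terminated":          set(),
}

def run(path):
    """Check that a sequence of states is a legal transaction history."""
    for here, there in zip(path, path[1:]):
        if there not in TRANSITIONS[here]:
            return False
    return True

print(run(["active", "partially_committed", "committed", "terminated"]))  # True
print(run(["active", "committed"]))  # False: must pass through partially_committed
```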
Schedule
A schedule is a series of operations from one or more transactions. A schedule can be of
two types:
Serial Schedule: When one transaction completely executes before starting another
transaction, the schedule is called a serial schedule. A serial schedule is always
consistent. e.g., if a schedule S has a debit transaction T1 and a credit transaction T2,
the possible serial schedules are T1 followed by T2 (T1 -> T2) or T2 followed by T1
(T2 -> T1). A serial schedule has low throughput and less resource utilization.
Concurrent Schedule: When operations of a transaction are interleaved with
operations of other transactions of a schedule, the schedule is called a Concurrent
schedule. e.g.; the Schedule of debit and credit transactions shown in Table 1 is
concurrent. But concurrency can lead to inconsistency in the database. The above
example of a concurrent schedule is also inconsistent.
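One way to make the serial/concurrent distinction concrete is a check for whether a schedule interleaves transactions (a sketch; representing the schedule as (transaction, operation) pairs is an assumption of this example):

```python
def is_serial(schedule):
    """A schedule is serial if each transaction's operations form one
    contiguous block -- no transaction is interleaved with another.
    `schedule` is a list of (transaction_id, operation) pairs."""
    finished = set()
    current = None
    for txn, _op in schedule:
        if txn != current:
            if txn in finished:
                return False       # txn resumed after another ran: interleaved
            if current is not None:
                finished.add(current)
            current = txn
    return True

serial     = [("T1", "R(A)"), ("T1", "W(A)"), ("T2", "R(A)"), ("T2", "W(A)")]
concurrent = [("T1", "R(A)"), ("T2", "R(A)"), ("T1", "W(A)"), ("T2", "W(A)")]
print(is_serial(serial))      # True
print(is_serial(concurrent))  # False
```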
Difference between Serial Schedule and Serializable
Schedule
Serial Schedule vs Serializable Schedule:
1. In a serial schedule, transactions are executed one after the other; in a
serializable schedule, transactions are executed concurrently.
2. Serial schedules are less efficient; serializable schedules are more efficient.
3. In a serial schedule, only one transaction executes at a time; in a serializable
schedule, multiple transactions can execute at a time.
4. A serial schedule takes more time for execution; in a serializable schedule,
execution is fast.
Concurrency Control in DBMS
Executing a single transaction at a time will increase the waiting time of the other
transactions which may result in delay in the overall execution. Hence for increasing
the overall throughput and efficiency of the system, several transactions are
executed.
Concurrency control is a very important concept of DBMS which ensures the
simultaneous execution or manipulation of data by several processes or users without
resulting in data inconsistency.
Concurrency control provides a procedure that is able to control concurrent
execution of the operations in the database.
Concurrency Control Problems
There are several problems that arise when numerous transactions are executed
simultaneously in a random manner. A database transaction consists of two major
operations, “Read” and “Write”. It is very important to manage these operations in the
concurrent execution of the transactions in order to maintain the consistency of the
data.
Problem 1: Lost Update Problems (W - W Conflict)
The problem occurs when two different database transactions perform the read/write operations
on the same database items in an interleaved manner (i.e., concurrent execution) that makes the
values of the items incorrect hence making the database inconsistent.
For example:
Consider the following timeline, where two transactions, TX and TY, are performed on
the same account A, where the balance of account A is $300.
o At time t1, transaction TX reads the value of account A, i.e., $300 (read only).
o At time t2, transaction TX deducts $50 from account A, which becomes $250 (only
deducted in the buffer, not yet written).
o Meanwhile, at time t3, transaction TY reads the value of account A, which is still
$300, because TX hasn't written its update yet.
o At time t4, transaction TY adds $100 to account A, which becomes $400 (only added
in the buffer, not yet written).
o At time t6, transaction TX writes the value of account A, which is updated to $250.
o Then, at time t7, transaction TY writes its value of account A, computed at time
t4, i.e., $400. The value written by TX is lost, i.e., the $250 update is overwritten.
Hence the data becomes incorrect, and the database is left inconsistent.
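The timeline above can be replayed step by step in plain Python to see the lost update (the local variables are just stand-ins for each transaction's buffer):

```python
# Replaying the timeline: TX and TY each read A, compute, then write,
# with the writes interleaved so TY's write overwrites TX's.
A = 300

tx_local = A              # t1: TX reads A (300)
tx_local -= 50            # t2: TX deducts 50 in its buffer (250)
ty_local = A              # t3: TY reads A, still 300 (TX has not written)
ty_local += 100           # t4: TY adds 100 in its buffer (400)
A = tx_local              # t6: TX writes -> A = 250
A = ty_local              # t7: TY writes -> A = 400, TX's update is lost

print(A)  # 400, but applying both updates correctly would give 350
```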
Dirty Read Problems (W-R Conflict)
The dirty read problem occurs when one transaction updates an item of the database
and then fails, and before the update is rolled back, the updated item is read by
another transaction. This creates a Write-Read conflict between the two
transactions.
For example:
Consider two transactions TX and TY performing read/write
operations on account A, where the available balance in account A is $300:
o At time t1, transaction TX reads the value of account A, i.e., $300.
o At time t2, transaction TX adds $50 to account A, which becomes $350.
o At time t3, transaction TX writes the updated value to account A, i.e., $350.
o Then, at time t4, transaction TY reads account A, which is read as $350.
o Then, at time t5, transaction TX rolls back due to a server problem, and the value
changes back to $300 (as initially).
o But the value of account A remains $350 for transaction TY, which has read an
uncommitted value; this is the dirty read, and the situation is therefore known as
the Dirty Read Problem.
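The same timeline can be replayed to show the dirty read (again, plain variables stand in for the committed database value and TX's uncommitted buffer):

```python
# Replaying the timeline: TY reads TX's uncommitted write, then TX
# rolls back, leaving TY holding a value that was never committed.
committed_A = 300

tx_buffer = committed_A + 50      # t1-t3: TX reads A, adds 50, writes 350
                                  # (uncommitted)
ty_value = tx_buffer              # t4: TY reads the uncommitted 350 -- dirty read
tx_buffer = None                  # t5: TX rolls back; database stays at 300

print(ty_value)     # 350 -- TY acted on a value that never existed officially
print(committed_A)  # 300
```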
Unrepeatable Read Problem (R-W Conflict)
Also known as the Inconsistent Retrievals Problem, it occurs when a transaction reads
two different values for the same database item.
For example:
Consider two transactions, TX and TY, performing read/write operations on account A,
having an available balance of $300:
o At time t1, transaction TX reads the value from account A, i.e., $300.
o At time t2, transaction TY reads the value from account A, i.e., $300.
o At time t3, transaction TY updates the value of account A by adding $100 to the
available balance, which then becomes $400.
o At time t4, transaction TY writes the updated value, i.e., $400.
o After that, at time t5, transaction TX reads the available value of account A
again, which is now read as $400.
o So within the same transaction TX, two different values of account A are read:
$300 initially, and $400 after the update made by transaction TY. This is an
unrepeatable read and is therefore known as the Unrepeatable Read Problem.
Thus, in order to maintain consistency in the database and avoid such problems that take place in
concurrent execution, management is needed, and that is where the concept of Concurrency
Control comes into role.
Concurrency Control Protocols
Concurrency control protocols are the set of rules which are maintained in order to
solve the concurrency control problems in the database. It ensures that the concurrent
transactions can execute properly while maintaining the database consistency. The
concurrent execution of a transaction is provided with atomicity, consistency, isolation,
durability, and serializability via the concurrency control protocols.
Lock-based concurrency control protocol
Timestamp-based concurrency control protocol
Two-Phase Locking protocol (2PL)
Lock-based protocol
A transaction cannot read or write a data item until it acquires an appropriate lock
on it. There are
two types of lock:
1. Shared lock:
o It is also known as a Read-only lock. Under a shared lock, the data item can only be
read by the transaction.
o It can be shared between transactions, because a transaction holding a shared lock
cannot update the data item.
Ex: S(A)
S is a shared lock on data item A, held by the transaction.
2. Exclusive lock:
o Under an exclusive lock, the data item can be both read and written by the
transaction.
o This lock is exclusive: multiple transactions cannot modify the same data item
simultaneously.
Ex: X(A)
X is an exclusive lock on data item A, held by the transaction.
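The rules for shared and exclusive locks can be summarized as a compatibility table (a sketch; the `can_grant` helper is invented for this illustration):

```python
# Compatibility of shared (S) and exclusive (X) locks: two locks are
# compatible if different transactions may hold them on the same item
# at the same time.
COMPATIBLE = {
    ("S", "S"): True,    # many readers may share an item
    ("S", "X"): False,   # a writer excludes readers
    ("X", "S"): False,
    ("X", "X"): False,   # only one writer at a time
}

def can_grant(requested, held_locks):
    """Grant `requested` only if it is compatible with every held lock."""
    return all(COMPATIBLE[(held, requested)] for held in held_locks)

print(can_grant("S", ["S", "S"]))  # True: another reader is fine
print(can_grant("X", ["S"]))       # False: must wait for readers to finish
```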
2PL locking protocol
There are two phases of 2PL:
Growing phase: In the growing phase, a new lock on the data item may be acquired by the
transaction, but none can be released.
Shrinking phase: In the shrinking phase, existing locks held by the transaction may be
released, but no new locks can be acquired.
The two-phase locking protocol divides the execution phase of the transaction into three
parts.
In the first part, when the execution of the transaction starts, it seeks permission for the
lock it requires.
In the second part, the transaction acquires all the locks. The third phase is started as
soon as the transaction releases its first lock.
In the third phase, the transaction cannot demand any new locks. It only releases the
acquired locks.
In the growing phase, the transaction reaches a point where all the locks it may ever
need have been acquired. This point is called the LOCK POINT.
After the lock point has been reached, the transaction enters a shrinking phase.
Let’s see a transaction implementing 2-PL.
Step   T1           T2
1      lock-S(A)
2                   lock-S(A)
3      lock-X(B)
4      ……
5      Unlock(A)
6                   Lock-X(C)
7      Unlock(B)
8                   Unlock(A)
9                   Unlock(C)
10     ……           ……
This is just a skeleton schedule that shows how locking and unlocking work with 2-
PL. Note:
Transaction T1:
The growing phase is from steps 1-3.
The shrinking phase is from steps 5-7.
Lock point at step 3.
Transaction T2:
The growing phase is from steps 2-6.
The shrinking phase is from steps 8-9.
Lock point at step 6.
LOCK POINT is the point at which the growing phase ends
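A transaction's lock sequence can be checked against the two-phase rule mechanically (a sketch; the string encoding of operations is an assumption of this example):

```python
def follows_2pl(operations):
    """Check that a transaction's lock/unlock sequence obeys two-phase
    locking: once any lock is released, no new lock may be acquired.
    `operations` is a list like ["lock-S(A)", "lock-X(B)", "unlock(A)"]."""
    shrinking = False
    for op in operations:
        if op.lower().startswith("lock"):
            if shrinking:
                return False   # acquired a lock after releasing one
        elif op.lower().startswith("unlock"):
            shrinking = True   # the lock point has passed
    return True

t1  = ["lock-S(A)", "lock-X(B)", "unlock(A)", "unlock(B)"]
bad = ["lock-S(A)", "unlock(A)", "lock-X(B)", "unlock(B)"]
print(follows_2pl(t1))   # True
print(follows_2pl(bad))  # False: locks B after unlocking A
```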
Recoverability of Schedule
Sometimes a transaction may not execute completely due to a software issue, system crash or
hardware failure. In that case, the failed transaction has to be rolled back. But some
other transaction may also have used a value produced by the failed transaction. So we
also have to roll back those transactions.
The above table 1 shows a schedule with two transactions. T1 reads and writes the value
of A, and that value is read and written by T2. T2 commits, but later T1 fails. Due to
the failure, we have to roll back T1. T2 should also be rolled back, because it read
the value written by T1, but T2 cannot be rolled back as it has already committed. So
this type of schedule is known as an irrecoverable schedule.
Irrecoverable schedule: The schedule is irrecoverable if Tj reads a value updated by
Ti and Tj commits before Ti commits.
The above table 2 shows a schedule with two transactions. Transaction T1 reads and
writes A, and that value is read and written by transaction T2. But later, T1 fails, so
we have to roll back T1. T2 must also be rolled back because it has read the value
written by T1. Since T2 has not committed before T1 fails, we can roll back transaction
T2 as well. So this schedule is recoverable with cascading rollback.
Remote Backup Systems
Remote backup provides a sense of security in case the primary location where the database is
located gets destroyed. Remote backup can be offline or real-time or online.
1. In case it is offline, it is maintained manually.
2. Online backup systems are more real-time and lifesavers for database administrators
and investors.
3. Remote backup systems provide a wide range of availability, allowing the
transaction processing to continue even if the primary site is destroyed by a fire,
flood or earthquake.
4. Data and log records from a primary site are continuously backed up into a
remote backup site.
5. One can achieve ‘wide range availability’ of data by performing transaction
processing at one site, called the ‘primary site’, and having a ‘remote backup’
site where all the data from the primary site are duplicated.
6. The remote site is also called ‘secondary site’.
7. The remote site must be synchronized with the primary site, as updates are
performed at the primary.
In designing a remote backup system, the following points are important.
a) Detection of failure: It is important for the remote backup system to detect when the
primary site has failed.
b) Transfer of control: When the primary site fails, the backup site takes over the
processing and becomes the new primary site.
c) Time to recover: If the log at the remote backup becomes large, recovery will take a
long time.
d) Time to commit: To ensure that the updates of a committed transaction are durable, a
transaction should not be announced committed until its log records have reached the
backup site.
Granularity: It is the size of data item allowed to lock.
Multiple Granularity:
o It can be defined as hierarchically breaking up the database into blocks which can be
locked.
o The Multiple Granularity protocol enhances concurrency and reduces lock overhead.
o It keeps track of what to lock and how to lock.
o It makes it easy to decide whether to lock or unlock a data item. This type of
hierarchy can be represented graphically as a tree.
For example: Consider a tree which has four levels of nodes.
o The first level or higher level shows the entire database.
o The second level represents nodes of type area. The database at the higher level
consists of exactly these areas.
o Each area consists of child nodes known as files. No file can be present in
more than one area.
o Finally, each file contains child nodes known as records. A file contains exactly
those records that are its child nodes, and no record appears in more than one file.
o Hence, the levels of the tree starting from the top level are as follows:
1. Database
2. Area
3. File
4. Record
In this example, the highest level shows the entire database. The levels below it are
area, file, and record.
There are three additional lock modes with multiple granularity:
Intention Mode Lock
Intention-shared (IS): It indicates explicit locking at a lower level of the tree, but
only with shared locks.
Intention-exclusive (IX): It indicates explicit locking at a lower level with exclusive
or shared locks.
Shared & intention-exclusive (SIX): The node is locked in shared mode, and some nodes
below it are locked in exclusive mode by the same transaction.
Compatibility Matrix with Intention Lock Modes: The table below gives the standard
compatibility matrix for these lock modes (Yes means two different transactions may
hold the two locks on the same node at the same time):

        IS    IX    S     SIX   X
IS      Yes   Yes   Yes   Yes   No
IX      Yes   Yes   No    No    No
S       Yes   No    Yes   No    No
SIX     Yes   No    No    No    No
X       No    No    No    No    No
ARIES Algorithm
ARIES (Algorithm for Recovery and Isolation Exploiting Semantics) is a log-based
recovery algorithm that restarts after a crash in three phases: analysis, REDO, and
UNDO. Consider the recovery example shown in Figure 23.5. There are three
transactions: T1, T2, and T3. T1 updates page C, T2 updates pages B and C,
and T3 updates page A.
Timestamp based Protocol
o The timestamp-based protocol is used to order transactions based on their
timestamps. The order of the transactions is simply the ascending order of their
creation times.
o An older transaction has higher priority, which is why it executes first. To
determine the timestamp of a transaction, this protocol uses the system clock or a
logical counter.
o The lock-based protocol is used to manage the order between conflicting pairs among
transactions at the execution time. But Timestamp based protocols start working as soon
as a transaction is created.
o Let's assume there are two transactions, T1 and T2. Suppose transaction T1
entered the system at time 007 and transaction T2 entered the system at time 009.
T1 has the higher priority, so it executes first, as it entered the system first.
o The timestamp-ordering protocol also maintains the timestamps of the last 'read' and
'write' operations on each data item.
Basic Timestamp ordering protocol works as follows:
1. Whenever a transaction Ti issues a Read(X) operation, check the following condition:
o If W_TS(X) > TS(Ti), then the operation is rejected and Ti is rolled back.
o If W_TS(X) <= TS(Ti), then the operation is executed and R_TS(X) is set to
max(R_TS(X), TS(Ti)).
2. Whenever a transaction Ti issues a Write(X) operation, check the following condition:
o If TS(Ti) < R_TS(X), then the operation is rejected and Ti is rolled back.
o If TS(Ti) < W_TS(X), then the operation is rejected and Ti is rolled back; otherwise
the operation is executed and W_TS(X) is set to TS(Ti).
Where,
TS(TI) denotes the timestamp of the transaction Ti.
R_TS(X) denotes the Read time-stamp of data-item X.
W_TS(X) denotes the Write time-stamp of data-item X.
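The read and write rules above can be sketched as a small class (an illustration only; real systems also restart rolled-back transactions with a new timestamp, which is omitted here):

```python
class TimestampOrdering:
    """Sketch of the basic timestamp-ordering checks described above.
    Raises an exception (forcing a rollback) when an operation is rejected."""
    def __init__(self):
        self.read_ts = {}    # R_TS(X) per data item
        self.write_ts = {}   # W_TS(X) per data item

    def read(self, ts, x):
        # Reject the read if a younger transaction already wrote X.
        if self.write_ts.get(x, 0) > ts:
            raise RuntimeError(f"T{ts} rolled back: X written by a younger txn")
        self.read_ts[x] = max(self.read_ts.get(x, 0), ts)

    def write(self, ts, x):
        # Reject the write if a younger transaction already read or wrote X.
        if self.read_ts.get(x, 0) > ts or self.write_ts.get(x, 0) > ts:
            raise RuntimeError(f"T{ts} rolled back: younger access to X exists")
        self.write_ts[x] = ts

to = TimestampOrdering()
to.read(7, "X")      # T7 reads X: allowed, R_TS(X) = 7
to.write(9, "X")     # T9 writes X: allowed, W_TS(X) = 9
try:
    to.read(8, "X")  # T8 reads X, but W_TS(X) = 9 > 8: rejected
except RuntimeError as e:
    print(e)
```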
Validation Based Protocol
The validation-based protocol is also known as the optimistic concurrency control
technique. In this protocol, a transaction is executed in the following three phases:
1. Read phase: In this phase, transaction T reads the values of the various data items
and stores them in temporary local variables. It performs all its write operations on
these temporary variables, without updating the actual database.
2. Validation phase: In this phase, the temporary values are validated against the
actual data to check whether serializability would be violated.
3. Write phase: If the transaction passes validation, the temporary results are
written to the database; otherwise the transaction is rolled back.
Here each phase has the following different timestamps:
Start(Ti): It contains the time when Ti started its execution.
Validation (Ti): It contains the time when Ti finishes its read phase and starts its validation
phase.
Finish(Ti): It contains the time when Ti finishes its write phase.
o This protocol determines the timestamp used to serialize a transaction from the
timestamp of its validation phase, since that is the phase which actually decides
whether the transaction will commit or roll back.
o Hence TS(T) = Validation(T).
o Serializability is determined during the validation process; it cannot be decided
in advance.
o While executing transactions, this approach allows a greater degree of concurrency
and produces fewer conflicts.
o Thus it results in fewer transaction rollbacks.
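The three phases can be sketched with a read set and a local write set (a simplified validation check; the `OptimisticTxn` class and its validation test are invented for this illustration):

```python
class OptimisticTxn:
    """Sketch of the three phases of validation-based (optimistic) control.
    Writes go to temporary local variables; validation compares the values
    read with the current database before the write phase is allowed."""
    def __init__(self, db):
        self.db = db
        self.read_set = {}    # item -> value seen during the read phase
        self.write_set = {}   # item -> new value, held locally

    def read(self, x):                       # read phase
        self.read_set[x] = self.db[x]
        return self.read_set[x]

    def write(self, x, value):               # read phase: local write only
        self.write_set[x] = value

    def commit(self):
        # Validation phase: everything we read must be unchanged.
        if any(self.db[x] != v for x, v in self.read_set.items()):
            return False                     # validation fails: roll back
        self.db.update(self.write_set)       # write phase
        return True

db = {"A": 300}
t = OptimisticTxn(db)
t.write("A", t.read("A") + 100)  # reads 300, plans to write 400
db["A"] = 250                    # another transaction changes A first
print(t.commit())                # False: validation fails, T rolls back
print(db["A"])                   # 250
```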
Buffer Management
A buffer is a memory location used by a database management system (DBMS) to temporarily hold
data that has recently been accessed or updated in the database. This buffer, often referred to as a
database buffer, acts as a link between the programs accessing the data and the physical storage
devices
Purpose of the Database Buffer
A crucial part of a DBMS is the database buffer. It serves several purposes, such as:
1. Reducing Disk I/O Operations
2. Improving Data Access Time
3. Managing Data Consistency
4. Supporting Concurrency Control
Database Buffer Types
A database buffer can be of several types. The most common varieties include:
Buffer Pool
The most fundamental kind of database buffer is the buffer pool. It improves data
access speed by caching frequently used data pages in RAM, minimizing disk I/O
operations.
Write Buffer
Before data changes are written to disk, a write buffer, often referred to as a log
buffer, temporarily stores the changes. The write buffer is a vital part of the
recovery mechanism that ensures the database can recover from errors.
Multi-Buffering
Multi-buffering is a method of caching various kinds of data using multiple buffer
pools. For instance, one buffer pool may store frequently requested data while another
pool stores index data.
NUMA Buffer for Non-Uniform Memory Access
A particular kind of database buffer called a NUMA buffer is made for systems with non-uniform
memory access. These systems have several processors, each of which has its own memory.