
Introduction to Database Transaction


 Database Transaction is an atomic unit that contains one or more SQL statements.
 It is a series of operations that performs as a single unit of work against a database.
 It is a logical unit of work.
 It has a beginning and an end to specify its boundary.

Let's take a simple example of a bank transaction. Suppose a bank clerk transfers Rs. 1000 from X's
account to Y's account.

X's Account

open-account (X)
prev-balance = X.balance
curr-balance = prev-balance – 1000
X.balance = curr-balance
close-account (X)

Rs. 1000 is deducted from X's account, the new (current) balance is saved, and after completion of the
transaction the last step is closing the account.

Y's Account

open-account (Y)
prev - balance = Y.balance
curr - balance = prev-balance + 1000
Y.balance = curr-balance
close-account (Y)

Rs. 1000 is added to Y's account, the new (current) balance is saved, and after completion of the
transaction the last step is closing the account.

 The above example is a very simple and small transaction that shows how transaction management
actually works.
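To make this concrete, here is a minimal sketch of the same transfer as a single database transaction, using Python's sqlite3 module; the accounts table and its column names are assumptions made for the illustration.

# Minimal sketch (assumed schema: accounts(name TEXT PRIMARY KEY, balance INTEGER)).
# The whole transfer runs as one transaction: either both updates commit or neither does.
import sqlite3

def transfer(conn, source, target, amount):
    try:
        cur = conn.cursor()
        cur.execute("UPDATE accounts SET balance = balance - ? WHERE name = ?", (amount, source))
        cur.execute("UPDATE accounts SET balance = balance + ? WHERE name = ?", (amount, target))
        conn.commit()          # both updates become permanent together
    except Exception:
        conn.rollback()        # on any failure, neither update is applied
        raise

conn = sqlite3.connect("bank.db")
transfer(conn, "X", "Y", 1000)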

Transaction Properties

Following are the Transaction Properties, referred to by an acronym ACID properties:

1. Atomicity
2. Consistency
3. Isolation
4. Durability

 ACID properties are the most important concepts of database theory.


 A transaction is a small unit of program which contains several low level tasks.
 These properties guarantee that the database transactions are processed reliably.
1. Atomicity
 Atomicity means that either all operations of a transaction are executed or none are.
 Atomicity is also known as 'all or nothing': the operations are either performed in full or not performed at all.
 It is maintained in the presence of deadlocks, CPU failures, disk failures, database and application software
failures.
 It can be turned off at system level and session level.
2. Consistency
 Consistency means that after the transaction is finished, the database must remain in a consistent state.
 It preserves the consistency of the database.
 If execution of the transaction is successful, then the database remains in a consistent state. If the transaction fails,
then the transaction will be rolled back and the database will be restored to a consistent state.
3. Isolation
 Isolation means that concurrent transactions are processed independently and securely, without
interference.
 The isolation property does not ensure the order of transactions.
 Other operations cannot access or see the data in an intermediate state during a transaction.
 Isolation is needed whenever multiple transactions run concurrently.
4. Durability
 Durability states that after a transaction completes successfully, its changes are permanently recorded in the database.
 The database retains the latest committed updates even if the system fails or restarts.
 It has the ability to recover committed transaction updates even if the storage media fails.

Transaction States

 A transaction is a small unit of program which contains several low level tasks.
 It is an event which occurs on the database.

It has the following states,

1. Active
2. Partially Committed
3. Failed
4. Aborted
5. Committed

1. Active : Active is the initial state of every transaction. The transaction stays in Active state during
execution.
2. Partially Committed : Partially committed state defines that the transaction has executed the final
statement.
3. Failed : Failed state defines that the execution of the transaction can no longer proceed further.
4. Aborted : Aborted state defines that the transaction has rolled back and the database is being restored to the
consistent state.
5. Committed : If the transaction has completed its execution successfully, then it is said to be committed.
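The five states and the transitions between them can be captured in a small sketch; the transition table below is simply a reading of the list above, not any particular DBMS's implementation.

# Allowed transaction state transitions (a sketch based on the five states above).
TRANSITIONS = {
    "ACTIVE":              {"PARTIALLY_COMMITTED", "FAILED"},
    "PARTIALLY_COMMITTED": {"COMMITTED", "FAILED"},
    "FAILED":              {"ABORTED"},
    "ABORTED":             set(),   # terminal: transaction rolled back
    "COMMITTED":           set(),   # terminal: effects are permanent
}

def move(current, nxt):
    if nxt not in TRANSITIONS[current]:
        raise ValueError(f"illegal transition {current} -> {nxt}")
    return nxt

state = "ACTIVE"
state = move(state, "PARTIALLY_COMMITTED")  # final statement executed
state = move(state, "COMMITTED")            # changes made permanent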

Concurrency Control
Concurrency control is the procedure in a DBMS for managing simultaneous operations
without them conflicting with each other. Concurrent access is quite easy if all users are just
reading data; there is no way they can interfere with one another. However, any practical
database has a mix of read and write operations, and hence concurrency is
a challenge.

Concurrency control is used to address such conflicts which mostly occur with a multi-user
system. It helps you to make sure that database transactions are performed concurrently
without violating the data integrity of respective databases.

Therefore, concurrency control is a most important element for the proper functioning of a
system where two or more database transactions that require access to the same data are
executed simultaneously.

Why use Concurrency method?


Reasons for using concurrency control methods in a DBMS:

 To apply Isolation through mutual exclusion between conflicting transactions


 To resolve read-write and write-write conflict issues
 To preserve database consistency by preventing interleavings of operations that would leave
the database in an inconsistent state
 The system needs to control the interaction among the concurrent transactions. This
control is achieved using concurrency-control schemes.
 Concurrency control helps to ensure serializability

Example
Assume that two people go to electronic kiosks at the same time to buy a movie ticket
for the same movie and the same show time.

However, there is only one seat left for the movie show in that particular theatre. Without
concurrency control, it is possible that both moviegoers will end up purchasing a ticket.
However, a concurrency control method does not allow this to happen. Both moviegoers can
still access information written in the movie seating database. But concurrency control only
provides a ticket to the buyer who completes the transaction process first.

Concurrency Control
o With concurrency control, multiple transactions can be executed simultaneously.
o Simultaneous execution may affect the transaction results, so it is highly important to maintain the order of
execution of those transactions.

Problems of concurrency control


Several problems can occur when concurrent transactions are executed in an uncontrolled
manner. Following are the three problems in concurrency control.

1. Lost updates
2. Dirty read
3. Unrepeatable read

1. Lost update problem


o When two transactions that access the same database items contain their operations in a
way that makes the value of some database item incorrect, then the lost update problem
occurs.
o If two transactions T1 and T2 read a record and then update it, the effect of the first update
will be overwritten by the second update.

Example: Consider two transactions, X and Y, that both read and then write the same data item A. Here,

o At time t2, transaction-X reads A's value.


o At time t3, Transaction-Y reads A's value.
o At time t4, Transaction-X writes A's value on the basis of the value seen at time t2.
o At time t5, Transaction-Y writes A's value on the basis of the value seen at time t3.
o So at time t5, the update of Transaction-X is lost because Transaction-Y overwrites it
without looking at its current value.
o This type of problem is known as the Lost Update Problem, as the update made by one
transaction is lost here.
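A minimal sketch of the interleaving at t2-t5, assuming A starts at 100 and the two transactions subtract 50 and add 30 respectively (values invented for the illustration):

# Interleaving t2..t5 from the example: both transactions read the same
# initial value of A, so the second write silently overwrites the first.
A = 100

x_local = A          # t2: Transaction-X reads A
y_local = A          # t3: Transaction-Y reads A
A = x_local - 50     # t4: Transaction-X writes A based on its stale read
A = y_local + 30     # t5: Transaction-Y writes A, losing X's update
print(A)             # 130, not the expected 80 (100 - 50 + 30)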

2. Dirty Read
o The dirty read occurs in the case when one transaction updates an item of the database,
and then the transaction fails for some reason. The updated database item is accessed by
another transaction before it is changed back to the original value.
o A transaction T1 updates a record which is read by T2. If T1 aborts then T2 now has
values which have never formed part of the stable database.

Example:

o At time t2, Transaction-Y writes A's value.

o At time t3, Transaction-X reads A's value.


o At time t4, Transaction-Y rolls back, so A's value is changed back to what it was before the update.
o So, Transaction-X now contains a value which has never become part of the stable
database.
o This type of problem is known as the Dirty Read Problem, as one transaction reads a dirty
value which has not been committed.

3. Inconsistent Retrievals Problem


o Inconsistent Retrievals Problem is also known as unrepeatable read. When a transaction
calculates some summary function over a set of data while the other transactions are
updating the data, then the Inconsistent Retrievals Problem occurs.
o A transaction T1 reads a record and then does some other processing during which
transaction T2 updates the record. Now when transaction T1 reads the record again, the
new value will be inconsistent with the previous value.

Example: Suppose two transactions operate on three accounts. Transaction-X computes the sum of all
balances while Transaction-Y transfers an amount of 50 from Account-1 to Account-3.

o Here, Transaction-X produces a result of 550, which is incorrect. If we write this
result to the database, the database will be left in an inconsistent state because
the actual sum is 600.
o Here, transaction-X has seen an inconsistent state of the database.
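A minimal sketch of this scenario, with assumed balances of 300, 200 and 100 (so the true total is 600); Transaction-X reads Account-1 after the withdrawal but Account-3 before the deposit, and therefore sees 550:

acc1, acc2, acc3 = 300, 200, 100   # assumed balances; the true total is 600

acc1 -= 50            # Y: withdraw 50 from Account-1
total  = acc1         # X: read Account-1 (250, after the withdrawal)
total += acc2         # X: read Account-2 (200)
total += acc3         # X: read Account-3 (100, before the deposit)
acc3 += 50            # Y: deposit 50 into Account-3
print(total)          # 550, although the consistent total is 600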

Methods for Concurrency control


There are three main methods for concurrency control. They are as follows:
1. Locking Methods
2. Time-stamp Methods
3. Optimistic Methods

1. Locking Methods of Concurrency Control:


"A lock is a variable, associated with the data item, which controls the access of that data item."
Locking is the most widely used form of concurrency control. Locking is discussed here under
three headings:

1. Lock Granularity
2. Lock Types
3. Deadlocks

1. Lock Granularity :
A database is basically represented as a collection of named data items. The size of the data item
chosen as the unit of protection by a concurrency control program is
called GRANULARITY. Locking can take place at the following levels:

 Database level.
 Table level.
 Page level.
 Row (Tuple) level.
 Attributes (fields) level.

i. Database level Locking :


At database level locking, the entire database is locked. Thus, it prevents the use of any tables in
the database by transaction T2 while transaction T1 is being executed. Database level of locking
is suitable for batch processes. Being very slow, it is unsuitable for on-line multi-user DBMSs.

ii. Table level Locking :


At table level locking, the entire table is locked. Thus, it prevents the access to any row (tuple)
by transaction T2 while transaction T1 is using the table. If a transaction requires access to
several tables, each table may be locked. However, two transactions can access the same
database as long as they access different tables. Table level locking is less restrictive than
database level locking. Table level locks are not suitable for multi-user DBMSs.

iii. Page level Locking :


At page level locking, the entire disk-page (or disk-block) is locked. A page has a fixed size
such as 4 K, 8 K, 16 K, 32 K and so on. A table can span several pages, and a page can contain
several rows (tuples) of one or more tables. Page level of locking is most suitable for multi-user
DBMSs.

iv. Row (Tuple) level Locking :


At row level locking, a particular row (or tuple) is locked. A lock exists for each row in each
table of the database. The DBMS allows concurrent transactions to access different rows of the
same table, even if the rows are located on the same page. The row level lock is much less
restrictive than database level, table level, or page level locks. The row level locking improves
the availability of data. However, the management of row level locking requires high overhead
cost.

v. Attributes (fields) level Locking :


At attribute level locking, a particular attribute (or field) is locked. Attribute level locking allows
concurrent transactions to access the same row, as long as they require the use of different
attributes within the row. The attribute level lock yields the most flexible multi-user data access.
It requires a high level of computer overhead.
2. Lock Types :
The DBMS mainly uses the following types of locking techniques.

a. Binary Locking
b. Shared / Exclusive Locking
c. Two - Phase Locking (2PL)

a. Binary Locking :

A binary lock can have two states or values: locked and unlocked (or 1 and 0, for simplicity). A
distinct lock is associated with each database item X.
If the value of the lock on X is 1, item X cannot be accessed by a database operation that
requests the item. If the value of the lock on X is 0, the item can be accessed when requested.
We refer to the current value (or state) of the lock associated with item X as LOCK(X).
Two operations, lock_item and unlock_item, are used with binary locking.
Lock_item(X):
A transaction requests access to an item X by first issuing a lock_item(X) operation. If LOCK(X)
= 1, the transaction is forced to wait. If LOCK(X) = 0, it is set to 1 (the transaction locks the
item) and the transaction is allowed to access item X.

Unlock_item (X):
When the transaction is through using the item, it issues an unlock_item(X) operation, which
sets LOCK(X) to 0 (unlocks the item) so that X may be accessed by other transactions. Hence, a
binary lock enforces mutual exclusion on the data item; i.e., at a time only one transaction can
hold the lock on the item.
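A minimal sketch of lock_item and unlock_item, modelling LOCK(X) with one mutex per item so that a requester simply waits while the item is locked; this is an illustration, not a real DBMS lock manager.

import threading
from collections import defaultdict

# LOCK(X) is modeled by one threading.Lock per item: "locked" = 1, "unlocked" = 0.
_locks = defaultdict(threading.Lock)

def lock_item(x):
    _locks[x].acquire()      # if LOCK(x) = 1 the caller waits; otherwise set LOCK(x) = 1

def unlock_item(x):
    _locks[x].release()      # set LOCK(x) = 0 so another transaction may proceed

lock_item("A")
# ... read/write item A under mutual exclusion ...
unlock_item("A")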

b. Shared / Exclusive Locking :


Shared lock :
These locks are referred to as read locks, and denoted by 'S'.
If a transaction T has obtained a shared lock on data item X, then T can read X, but cannot write
X. Multiple shared locks can be placed simultaneously on a data item.

Exclusive lock :
These locks are referred to as write locks, and denoted by 'X'.
If a transaction T has obtained an exclusive lock on data item X, then T can read as well as write
X. Only one exclusive lock can be placed on a data item at a time. This means multiple
transactions cannot modify the same data simultaneously.

c. Two-Phase Locking (2PL) :

Two-phase locking (also called 2PL) is a method or a protocol of controlling concurrent
processing in which all locking operations precede the first unlocking operation. Thus, a
transaction is said to follow the two-phase locking protocol if all locking operations (such as
read_Lock, write_Lock) precede the first unlock operation in the transaction. Two-phase
locking is the standard protocol used to maintain level 3 consistency. 2PL defines how
transactions acquire and relinquish locks. The essential discipline is that after a transaction has
released a lock it may not obtain any further locks. 2PL has the following two phases:
A growing phase, in which a transaction acquires all the required locks without unlocking any
data. Once all locks have been acquired, the transaction is in its locked
point.
A shrinking phase, in which a transaction releases all locks and cannot obtain any new lock.

The following transaction shows the two-phase locking technique.

Time   Transaction     Remarks
t0     Lock-X (A)      acquire exclusive lock on A
t1     Read A          read original value of A
t2     A = A - 100     subtract 100 from A
t3     Write A         write new value of A
t4     Lock-X (B)      acquire exclusive lock on B
t5     Read B          read original value of B
t6     B = B + 100     add 100 to B
t7     Write B         write new value of B
t8     Unlock (A)      release lock on A
t9     Unlock (B)      release lock on B
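A minimal sketch of the two-phase discipline itself: once a transaction has released any lock (the shrinking phase), further lock requests are rejected. The class and method names are invented for the illustration.

class TwoPhaseTransaction:
    """Enforces 2PL: all lock acquisitions must precede the first unlock."""
    def __init__(self):
        self.held = set()
        self.shrinking = False   # becomes True after the first unlock

    def lock(self, item):
        if self.shrinking:
            raise RuntimeError("2PL violation: cannot lock after the first unlock")
        self.held.add(item)

    def unlock(self, item):
        self.shrinking = True    # transaction enters its shrinking phase
        self.held.discard(item)

t = TwoPhaseTransaction()
t.lock("A"); t.lock("B")         # growing phase: acquire all needed locks
t.unlock("A")                    # shrinking phase begins
# t.lock("C")                    # would raise: locking after an unlock violates 2PL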

3. Deadlocks :
A deadlock is a condition in which two (or more) transactions in a set are waiting
simultaneously for locks held by some other transaction in the set.
Neither transaction can continue because each transaction in the set is on a waiting queue,
waiting for one of the other transactions in the set to release the lock on an item. Thus, a
deadlock is an impasse that may result when two or more transactions are each waiting for locks
to be released that are held by the other. Transactions whose lock requests have been refused are
queued until the lock can be granted.
A deadlock is also called a circular waiting condition where two transactions are waiting
(directly or indirectly) for each other. Thus in a deadlock, two transactions are mutually
excluded from accessing the next record required to complete their transactions, also called a
deadly embrace.

Example:
A deadlock exists between two transactions A and B in the following example:
Transaction A = access data items X and Y
Transaction B = access data items Y and X
Here, Transaction-A has acquired a lock on X and is waiting to acquire a lock on Y, while
Transaction-B has acquired a lock on Y and is waiting to acquire a lock on X. Neither of them can
execute further.

Transaction-A                     Time   Transaction-B
---                               t0     ---
Lock (X)  (acquired lock on X)    t1     ---
---                               t2     Lock (Y)  (acquired lock on Y)
Lock (Y)  (request lock on Y)     t3     ---
Wait                              t4     Lock (X)  (request lock on X)
Wait                              t5     Wait
Wait                              t6     Wait
Wait                              t7     Wait

Deadlock Detection and Prevention:


Deadlock detection:
This technique allows a deadlock to occur, but then detects and resolves it. Here, the database is
periodically checked for deadlocks. If a deadlock is detected, one of the transactions involved in the
deadlock cycle is aborted; the other transactions continue their execution. An aborted transaction is
rolled back and restarted.

Deadlock Prevention:
Deadlock prevention technique avoids the conditions that lead to deadlocking. It requires that
every transaction lock all data items it needs in advance. If any of the items cannot be obtained,
none of the items are locked. In other words, a transaction requesting a new lock is aborted if
there is the possibility that a deadlock can occur. Thus, a timeout may be used to abort
transactions that have been idle for too long. This is a simple but indiscriminate approach. If the
transaction is aborted, all the changes made by this transaction are rolled back and all locks
obtained by the transaction are released. The transaction is then rescheduled for execution.
Deadlock prevention technique is used in two-phase locking.
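Deadlock detection is commonly implemented by maintaining a wait-for graph (an edge T1 -> T2 means T1 waits for a lock held by T2) and periodically checking it for cycles. A minimal sketch, using the two transactions from the example above:

# Wait-for graph for the example: A waits for B (B holds Y), B waits for A (A holds X).
wait_for = {"A": {"B"}, "B": {"A"}}

def has_cycle(graph):
    """Detect a cycle with a depth-first search over the wait-for graph."""
    visiting, done = set(), set()

    def dfs(node):
        if node in visiting:
            return True                     # back edge -> cycle -> deadlock
        if node in done:
            return False
        visiting.add(node)
        if any(dfs(n) for n in graph.get(node, ())):
            return True
        visiting.discard(node)
        done.add(node)
        return False

    return any(dfs(n) for n in graph)

print(has_cycle(wait_for))   # True: transactions A and B are deadlocked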

2. Time-Stamp Methods for Concurrency control :


Timestamp is a unique identifier created by the DBMS to identify the relative starting time of a
transaction.
Typically, timestamp values are assigned in the order in which the transactions are submitted to
the system. So, a timestamp can be thought of as the transaction start time. Therefore, time
stamping is a method of concurrency control in which each transaction is assigned a transaction
timestamp. Timestamps must have two properties namely

1. Uniqueness : The uniqueness property assures that no equal timestamp values can exist.
2. Monotonicity : Monotonicity assures that timestamp values always increase.

Timestamp methods are further divided into:

1. Granule Timestamps
2. Timestamp Ordering
3. Conflict Resolution in Timestamps

1. Granule Timestamps :
Granule timestamp is a record of the timestamp of the last transaction to access it. Each granule
accessed by an active transaction must have a granule timestamp.
A separate record of the last Read and Write accesses may be kept. Granule timestamps may cause
additional Write operations for Read accesses if they are stored with the granules. The problem
can be avoided by maintaining granule timestamps as an in-memory table. The table may be of
limited size, since conflicts may only occur between current transactions. An entry in a granule
timestamp table consists of the granule identifier and the transaction timestamp. The record
containing the largest (latest) granule timestamp removed from the table is also maintained. A
search for a granule timestamp, using the granule identifier, will either be successful or will use
the largest removed timestamp.

2. Timestamp Ordering :
Following are the three basic variants of timestamp-based methods of concurrency control:

 Total timestamp ordering


 Partial timestamp ordering
 Multiversion timestamp ordering

(a) Total timestamp ordering :


The total timestamp ordering algorithm depends on maintaining access to granules in timestamp
order by aborting one of the transactions involved in any conflicting access. No distinction is
made between Read and Write access, so only a single value is required for each granule
timestamp.

(b)Partial timestamp ordering :


In a partial timestamp ordering, only non-permutable actions are ordered to improve upon the
total timestamp ordering. In this case, both Read and Write granule timestamps are stored.
The algorithm allows the granule to be read by any transaction younger than the last transaction
that updated the granule. A transaction is aborted if it tries to update a granule that has previously
been accessed by a younger transaction. The partial timestamp ordering algorithm aborts fewer
transactions than the total timestamp ordering algorithm, at the cost of extra storage for granule
timestamps.

(c) Multiversion Timestamp ordering :


The multiversion timestamp ordering algorithm stores several versions of an updated granule,
allowing each transaction to see a consistent set of versions for all the granules it accesses. So, it reduces
the conflicts that result in transaction restarts to those where there is a Write-Write conflict.
Each update of a granule creates a new version, with an associated granule timestamp.
A transaction that requires read access to the granule sees the youngest version that is older than
the transaction. That is, the version having a timestamp equal to or immediately below the
transaction's timestamp.

3. Conflict Resolution in Timestamps :


To deal with conflicts in timestamp algorithms, some transactions involved in conflicts are made
to wait while others are aborted.
Following are the main strategies of conflict resolution in timestamps:
WAIT-DIE:

 The older transaction waits for the younger if the younger has accessed the granule first.
 The younger transaction is aborted (dies) and restarted if it tries to access a granule after
an older concurrent transaction.

WOUND-WAIT:

 The older transaction pre-empts the younger by aborting (wounding) it if the younger
transaction has already accessed a granule that the older one needs.
 A younger transaction waits for an older one to finish if the older has accessed a
granule that both want.
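A minimal sketch of the two policies as decision rules; ts is the transaction timestamp (a smaller value means an older transaction), and the requester asks for a granule currently held by the holder. This follows the standard wait-die / wound-wait rules rather than any particular scheduler.

def wait_die(requester_ts, holder_ts):
    # Non-preemptive: an older requester waits, a younger requester dies (restarts).
    return "WAIT" if requester_ts < holder_ts else "DIE (abort and restart requester)"

def wound_wait(requester_ts, holder_ts):
    # Preemptive: an older requester wounds (aborts) the younger holder,
    # while a younger requester simply waits.
    return "WOUND (abort holder)" if requester_ts < holder_ts else "WAIT"

print(wait_die(5, 9))     # older transaction asks: WAIT
print(wound_wait(9, 5))   # younger transaction asks: WAIT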

The handling of aborted transactions is an important aspect of a conflict resolution algorithm. In
the case that the aborted transaction is the one requesting access, the transaction must
be restarted with a new (younger) timestamp. It is possible that the transaction can be repeatedly
aborted if it keeps conflicting with other transactions.
An aborted transaction that had prior access to the granule where the conflict occurred can be restarted
with the same timestamp; retaining the older timestamp gives it priority and eliminates the possibility of the transaction
being continuously locked out.

Drawbacks of Time-stamp

 Each value stored in the database requires two additional timestamp fields, one for the
last time the field (attribute) was read and one for the last update.
 This increases the memory requirements and the processing overhead of the database.

3. Optimistic Methods of Concurrency Control :


The optimistic method of concurrency control is based on the assumption that conflicts of
database operations are rare and that it is better to let transactions run to completion and only
check for conflicts before they commit.
An optimistic concurrency control method is also known as a validation or certification method.
No checking is done while the transaction is executing. The optimistic method does not require
locking or timestamping techniques. Instead, a transaction is executed without restrictions until it
is committed. In optimistic methods, each transaction moves through the following phases:

a. Read phase.
b. Validation or certification phase.
c. Write phase.

a. Read phase :
In a Read phase, the updates are prepared using private (or local) copies (or versions) of the
granule. In this phase, the transaction reads values of committed data from the database, executes
the needed computations, and makes the updates to a private copy of the database values. All
update operations of the transaction are recorded in a temporary update file, which is not
accessed by the remaining transactions.
It is conventional to allocate a timestamp to each transaction at the end of its Read phase to determine
the set of transactions that must be examined by the validation procedure. This set consists of the
transactions that have finished their Read phases since the start of the transaction being
validated.

b. Validation or certification phase :


In a validation (or certification) phase, the transaction is validated to assure that the changes
made will not affect the integrity and consistency of the database.
If the validation test is positive, the transaction goes to the write phase. If the validation test is
negative, the transaction is restarted, and the changes are discarded. Thus, in this phase the list of
granules is checked for conflicts. If conflicts are detected in this phase, the transaction is aborted
and restarted. The validation algorithm must check that the transaction has :

 Seen all modifications of transactions committed after it starts.


 Not read granules updated by a transaction committed after its start.

c. Write phase :
In a Write phase, the changes are permanently applied to the database and the updated granules
are made public. Otherwise, the updates are discarded and the transaction is restarted. This phase
is only for the Read-Write transactions and not for Read-only transactions.
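A minimal sketch of the validation step: the transaction's read set is compared against the write sets of transactions that committed after it started, and any overlap forces a restart. The names read_set, write_set and committed_log are invented for the illustration.

def validate(txn_start_ts, read_set, committed_log):
    """committed_log: list of (commit_ts, write_set) for already-committed transactions.
    The transaction passes validation only if it read nothing that was overwritten
    by a transaction committing after this transaction started."""
    for commit_ts, write_set in committed_log:
        if commit_ts > txn_start_ts and read_set & write_set:
            return False        # conflict: abort and restart the transaction
    return True                 # proceed to the write phase

log = [(12, {"A"}), (15, {"B", "C"})]
print(validate(10, {"A", "D"}, log))   # False: A was updated after we started
print(validate(10, {"D"}, log))        # True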

Advantages of Optimistic Methods for Concurrency Control :

i. This technique is very efficient when conflicts are rare. The occasional conflict results in
a transaction rollback.
ii. The rollback involves only the local copy of the data; the database itself is not involved, and thus
there will not be any cascading rollbacks.

Problems of Optimistic Methods for Concurrency Control :

i. Conflicts are expensive to deal with, since the conflicting transaction must be rolled back.
ii. Longer transactions are more likely to have conflicts and may be repeatedly rolled
back because of conflicts with short transactions.

Applications of Optimistic Methods for Concurrency Control :

i. Only suitable for environments where there are few conflicts and no long transactions.
ii. Acceptable for mostly Read or Query database systems that require very few update
transactions.

What is serializability?

 Serializability is a concurrency scheme where a concurrent schedule is equivalent to one that executes the
transactions serially.
 A schedule is a list of operations from one or more transactions.
 In a serial schedule, each transaction is executed to completion, one after another, without any interference from other
transactions.
 In a non-serial schedule, the operations from a group of concurrent transactions are interleaved.
 In a non-serial schedule, if the interleaving is not proper, then problems can arise such as lost update,
uncommitted dependency and inconsistent analysis.
 The main objective of serializability is to find non-serial schedules that allow transactions to execute
concurrently without interference and produce a database state that could be produced by a serial execution.

1. Conflict Serializability

 Conflict serializability considers two instructions of two different transactions that access the same data item and
perform a read/write operation.
 It deals with detecting the instructions that are conflicting in any way and specifying the order in which the
instructions should execute in case there is any conflict.
 A conflict arises when at least one of the instructions is a write operation.
The following rules are important in conflict serializability:

1. If two transactions are both read operation, then they are not in conflict.

2. If one transaction wants to perform a read operation and the other transaction wants to perform a write
operation, then they are in conflict and cannot be swapped.

3. If both the transactions are for write operation, then they are in conflict, but can be allowed to take place in
any order, because the transactions do not read the value updated by each other.
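In practice, conflict serializability of a schedule is tested by building a precedence graph from these rules (an edge Ti -> Tj for every conflicting pair in which Ti's operation comes first) and checking it for cycles; the schedule is conflict serializable only if there is no cycle. A minimal sketch, with a schedule given as (transaction, operation, item) triples:

from itertools import combinations

def conflict_serializable(schedule):
    """schedule: list of (txn, op, item) with op in {'R', 'W'}, in execution order."""
    edges = set()
    for (t1, op1, x1), (t2, op2, x2) in combinations(schedule, 2):
        if t1 != t2 and x1 == x2 and "W" in (op1, op2):
            edges.add((t1, t2))      # earlier conflicting op precedes the later one

    def reachable(src, dst, seen=()):
        return any(a == src and a not in seen and
                   (b == dst or reachable(b, dst, seen + (a,)))
                   for a, b in edges)

    nodes = {t for t, _, _ in schedule}
    return not any(reachable(t, t) for t in nodes)   # serializable iff no cycle

s = [("T1", "R", "A"), ("T2", "W", "A"), ("T1", "W", "A")]
print(conflict_serializable(s))   # False: T1 -> T2 -> T1 forms a cycle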

2. View Serializability

 View serializability is another type of serializability.

 It can be checked by creating another schedule out of an existing schedule that involves the same set of
transactions.
Example : Let us assume two transactions T1 and T2 that are being serialized to create two different schedules
SH1 and SH2, where T1 and T2 want to access the same data item. Now there can be three scenarios

1. If in SH1, T1 reads the initial value of data item, then in SH2 , T1 should read the initial value of that same
data item.

2. If in SH1, T1 writes a value in the data item which is read by T2, then in SH2, T1 should write the value in
the data item before T2 reads it.

3. If in SH1, T1 performs the final write operation on that data item, then in SH2, T1 should perform the final
write operation on that data item.

If a concurrent schedule is view equivalent to a serial schedule of the same transactions, then it is said to be view
serializable.

What is a Schedule?
A schedule is a series of operations from one or more transactions. A schedule can be of two types:
 Serial Schedule: When one transaction completely executes before starting another transaction, the
schedule is called a serial schedule. A serial schedule is always consistent. E.g., if a schedule S has
a debit transaction T1 and a credit transaction T2, the possible serial schedules are T1 followed by T2
(T1->T2) or T2 followed by T1 (T2->T1). A serial schedule has low throughput and less resource
utilization.

 Concurrent Schedule: When operations of a transaction are interleaved with operations of other
transactions of a schedule, the schedule is called a concurrent schedule. E.g., a schedule in which the operations
of a debit transaction and a credit transaction are interleaved is concurrent in nature. But concurrency can lead to
inconsistency in the database; such a concurrent schedule can also be inconsistent.

Recoverability of Schedules


As discussed, a transaction may not execute completely due to hardware failure, system crash or software
issues. In that case, we have to rollback the failed transaction. But some other transaction may also have
used values produced by failed transaction. So we have to rollback those transactions as well.

Consider a schedule with two transactions: T1 reads and writes A, and that value is read and
written by T2. T2 commits, but later on T1 fails, so we have to roll back T1. Since T2 has read the value
written by T1, it should also be rolled back. But T2 has already committed. So this schedule is an
irrecoverable schedule.
Irrecoverable Schedule: When Tj is reading the value updated by Ti and Tj is committed before the commit
of Ti, the schedule will be irrecoverable.

Now consider a schedule with two transactions where T1 reads and writes A and that value is read and written
by T2, but later on T1 fails. So we have to roll back T1. Since T2 has read the value written by T1, it
should also be rolled back. As it has not committed, we can roll back T2 as well. So it is recoverable with
cascading rollback.
Recoverable with cascading rollback: If Tj is reading the value updated by Ti and the commit of Tj is delayed till
the commit of Ti, the schedule is called recoverable with cascading rollback.

Finally, consider a schedule with two transactions where T1 reads and writes A and commits, and only then is that value read
by T2. If T1 fails before its commit, no other transaction has read its value, so there is no need to
roll back any other transaction. So this is a cascadeless recoverable schedule.

Data Recovery

It is the method of restoring the database to its correct state in the event of a failure at the time of the
transaction or after the end of a process. Earlier you have been given the concept of database recovery as a
service which should be provided by all the DBMS for ensuring that the database is dependable and remains in
a consistent state in the presence of failures. In this context, dependability refers to both the resilience of the
DBMS to various kinds of failure and its ability to recover from those failures. In this chapter, you will gather
a brief knowledge of how this service can be provided. To gain a better understanding of the possible problems
you may encounter in providing a consistent system, you will first learn about the need for recovery and the
types of failure which usually occur in a database environment.

What is the Need for Recovery of data?

The storage of data usually includes four types of media with an increasing amount of reliability: the main
memory, the magnetic disk, the magnetic tape, and the optical disk. There are many different forms of failure
that can have an effect on database processing and/or transactions, and each of them has to be dealt with
differently. Some data failures can affect main memory only, while others involve non-volatile or secondary
storage also. Among the sources of failure are:

 Due to hardware or software errors, the system crashes, ultimately resulting in the loss of main memory.
 Failures of media, such as head crashes or unreadable media, that result in the loss of portions of secondary
storage.
 There can be application software errors, such as logical errors in the program accessing the database, that can
cause one or more transactions to abort or fail.
 Natural physical disasters can also occur such as fires, floods, earthquakes, or power failures.
 Carelessness or unintentional destruction of data or directories by operators or users.
 Damage, intentional corruption, or hampering of data, hardware, or software facilities (for example, using
malicious software or files).

Whatever the grounds of the failure are, there are two principal things that you have to consider:

 Failure of main memory, including the database buffers.

 Failure of the disk copy of the database.

Database Backup and Recovery


Database Backup

 Database backup is the storage of data, that is, a copy of the data.
 It is a safeguard against unexpected data loss and application errors.
 It protects the database against data loss.
 If the original data is lost, then it can be reconstructed using the backup.
The backups are divided into two types,
1. Physical Backup
2. Logical Backup

1. Physical backups

 Physical Backups are the backups of the physical files used in storing and recovering your database, such as
datafiles, control files and archived redo logs, log files.
 It is a copy of files storing database information to some other location, such as disk, some offline storage like
magnetic tape.
 Physical backups are the foundation of the recovery mechanism in the database.
 Physical backup provides the minute details about the transaction and modification to the database.
2. Logical backup

 Logical Backup contains logical data which is extracted from a database.


 It includes backup of logical data like views, procedures, functions, tables, etc.
 It is a useful supplement to physical backups in many circumstances but not a sufficient protection against
data loss without physical backups, because logical backup provides only structural information.

Importance of Backups

 Planning and testing backup helps against failure of media, operating system, software and any other kind of
failures that cause a serious data crash.
 It determines the speed and success of the recovery.
 Physical backup extracts data from physical storage (usually from disk to tape). An operating system level
backup is an example of a physical backup.
 Logical backup extracts data from the database using SQL and stores it in a binary file.
 Logical backup is used to restore the database objects into the database. So the logical backup utilities allow
DBA (Database Administrator) to back up and recover selected objects within the database.

Storage of Data

Data storage is the memory structure in the system.

The storage of data is divided into three categories:


1. Volatile Memory
2. Non – Volatile Memory
3. Stable Memory

1. Volatile Memory

 Volatile memory can store only a small amount of data. For eg. Main memory, cache memory etc.
 Volatile memory is the primary memory device in the system and placed along with the CPU.
 In volatile memory, if the system crashes, then the data will be lost.
 RAM is a primary storage device which stores a disk buffer, active logs and other related data of a database.
 Primary memory is always faster than secondary memory.
 When we fire a query, the database fetches the data from primary memory and then moves to secondary
memory to fetch the record.
 If the primary memory crashes, then the whole data in the primary memory is lost and cannot be recovered.
 To avoid data loss, a copy of primary memory, with all the logs and buffers, is kept in the database, and
checkpoints are created at several places so that the data is copied to the database.

2. Non – Volatile Memory

 Non – volatile memory is the secondary memory.


 These memories are huge in size, but slow in processing. For eg. Flash memory, hard disk, magnetic tapes
etc.
 If the secondary memory crashes, the data stored in it is lost and cannot be recovered.
To avoid data loss in the secondary memory, there are three methods used to back it up:

1. Remote backup creates a database copy and stores it in a remote network. The copy is kept updated and in
sync with the current database's data and other details.

The remote backup is also called an offline backup because it can be updated manually. If the current
database fails, then the system automatically switches to the remote database and starts functioning. The user
will not know that there was a failure.

2. The database is copied to secondary memory devices like Flash memory, hard disk, magnetic tapes, etc. and
kept in a secured place. If the system crashes or any failure occurs, the data would be copied from these tapes
to bring the database up.

3. Backing up the whole database is a huge overhead because of the amount of data. To overcome this problem, the log
files are backed up at regular intervals.

The log file includes all the information about the transaction being made. These files are backed up at regular
intervals and the database is backed up once a week.

3. Stable Memory

 Stable memory is the third form of memory structure and is similar to non-volatile memory.
 In stable memory, copies of the same non-volatile data are stored in different places, so that if the
system crashes and data loss occurs, the data can be recovered from the other copies.

Causes of Database Failures

 A database includes a huge amount of data and transactions.


 If the system crashes or failure occurs, then it is very difficult to recover the database.

There are some common causes of failures such as,



1. System Crash
2. Transaction Failure
3. Network Failure
4. Disk Failure
5. Media Failure

 Each transaction has the ACID properties. If we fail to maintain the ACID properties, it is a failure of the
database system.
1. System Crash

 System crash occurs when there is a hardware or software failure or external factors like a power failure.
 The data in secondary memory is not affected when the system crashes, because the database maintains its
integrity there; checkpoints prevent the loss of data from secondary memory.
2. Transaction Failure

 A transaction failure affects only a few tables or processes and is caused by logical errors in the code.
 This failure occurs when there are system errors like deadlock or unavailability of system resources to execute
the transaction.
3. Network Failure

 A network failure occurs when the communication network connecting a client-server configuration or a
distributed database system fails.
4. Disk Failure

 Disk Failure occurs when there are issues with hard disks like formation of bad sectors, disk head crash,
unavailability of disk etc.
5. Media Failure

 Media failure is the most dangerous failure because it takes more time to recover from than any other kind of
failure.
 A disk controller or disk head crash is a typical example of media failure.
 Natural disasters like floods, earthquakes, power failures, etc. damage the data.

Nested Transactions
A nested transaction is used to provide a transactional guarantee for a subset of operations performed within the
scope of a larger transaction. Doing this allows you to commit and abort the subset of operations independently of
the larger transaction.

The rules to the usage of a nested transaction are as follows:

 While the nested (child) transaction is active, the parent transaction may not perform any operations other
than to commit or abort, or to create more child transactions.
 Committing a nested transaction has no effect on the state of the parent transaction. The parent transaction
is still uncommitted. However, the parent transaction can now see any modifications made by the child
transaction. Those modifications, of course, are still hidden to all other transactions until the parent also
commits.
 Likewise, aborting the nested transaction has no effect on the state of the parent transaction. The only result
of the abort is that neither the parent nor any other transactions will see any of the container modifications
performed under the protection of the nested transaction.
 If the parent transaction commits or aborts while it has active children, the child transactions are resolved in
the same way as the parent. That is, if the parent aborts, then the child transactions abort as well. If the
parent commits, then whatever modifications have been performed by the child transactions are also
committed.
 The locks held by a nested transaction are not released when that transaction commits. Rather, they are now
held by the parent transaction until such a time as that parent commits.
 Any container modifications performed by the nested transaction are not visible outside of the larger
encompassing transaction until such a time as that parent transaction is committed.
 The depth of the nesting that you can achieve with nested transactions is limited only by memory.

To create a nested transaction, use the XmlManager::createTransaction method, but pass it the internal Berkeley
DB Transaction object as an argument. For example:

// parent transaction
XmlTransaction parentTxn = myManager.createTransaction();
// child transaction
XmlTransaction childTxn =
    myManager.createTransaction(parentTxn.getTransaction(), null);

Types of Threats to Database Security


Database attacks are an increasing trend these days. What is the reason behind
database attacks? One reason is the increase in access to data stored in
databases. When the data is been accessed by many people, the chances of data
theft increases. In the past, database attacks were prevalent, but were less in
number as hackers hacked the network more to show it was possible to hack and
not to sell proprietary information. Another reason for database attacks is to
gain money by selling sensitive information, which includes credit card numbers,
Social Security Numbers, etc. We previously defined database security and
talked about common database security concepts. Now let’s look at the various
types of threats that affect database security.
Types of threats to database security

1. Privilege abuse: When database users are provided with privileges that
exceed their day-to-day job requirements, these privileges may be abused
intentionally or unintentionally.
Take, for instance, a database administrator in a financial institution. What will
happen if he turns off audit trails or creates bogus accounts? He will be able to
transfer money from one account to another thereby abusing the excessive
privilege intentionally.

Having seen how privilege can be abused intentionally, let us see how privilege
can be abused unintentionally. A company is providing a "work from home"
option to its employees and an employee takes a backup of sensitive data to
work on from his home. This not only violates the security policies of the
organization, but also may result in a data security breach if the system at home is
compromised.

2. Operating System vulnerabilities: Vulnerabilities in underlying operating


systems like Windows, UNIX, Linux, etc., and the services that are related to
the databases could lead to unauthorized access. This may lead to a Denial of
Service (DoS) attack. This could be prevented by updating the operating system
related security patches as and when they become available.
3. Database rootkits: A database rootkit is a program or a procedure that is
hidden inside the database and that provides administrator-level privileges to
gain access to the data in the database. These rootkits may even turn off alerts
triggered by Intrusion Prevention Systems (IPS). It is possible to install a rootkit
only after compromising the underlying operating system. This can be avoided
by periodic audit trails; otherwise the presence of the database rootkit may go
undetected.
4. Weak authentication: Weak authentication models allow attackers to
employ strategies such as social engineering and brute force to obtain database
login credentials and assume the identity of legitimate database users.
5. Weak audit trails: A weak audit logging mechanism in a database server
represents a critical risk to an organization especially in retail, financial,
healthcare, and other industries with stringent regulatory compliance.
Regulations such as PCI, SOX, and HIPAA demand extensive logging of
actions to reproduce an event at a later point of time in case of an incident.
Logging of sensitive or unusual transactions happening in a database must be
done in an automated manner for resolving incidents. Audit trails act as the last
line of database defense. Audit trails can detect the existence of a violation that
could help trace back the violation to a particular point of time and a particular
user.

Authorization
Definition - What does Authorization mean?
Authorization is a security mechanism used to determine user/client privileges or access levels related to
system resources, including computer programs, files, services, data and application features.
Authorization is normally preceded by authentication for user identity verification. System administrators
(SA) are typically assigned permission levels covering all system and user resources.

During authorization, a system verifies an authenticated user's access rules and either grants or refuses
resource access.

Authorization
Modern and multiuser operating systems depend on effectively designed authorization processes to
facilitate application deployment and management. Key factors include user type, number, credentials
requiring verification and related actions and roles. For example, role-based authorization may be
designated by user groups requiring specific user resource tracking privileges. Additionally, authorization
may be based on an enterprise authentication mechanism, like Active Directory (AD), for seamless
security policy integration.

For example, ASP.NET works with Internet Information Server (IIS) and Microsoft Windows to provide
authentication and authorization services for Web-based .NET applications. Windows uses New
Technology File System (NTFS) to maintain Access Control Lists (ACL) for all resources. The ACL
serves as the ultimate authority on resource access.

The .NET Framework provides an alternate role-based security approach for authorization support. Role-
based security is a flexible method that suits server applications and is similar to code access security
checks, where authorized application users are determined according to roles.

Database Authentication
Database authentication is the process or act of confirming that a user who is attempting to log in to a
database is authorized to do so, and is only accorded the rights to perform activities that he or she has
been authorized to do.

Database Authentication
The concept of authentication is familiar to almost everyone. For example, a mobile phone performs
authentication by asking for a PIN. Similarly, a computer authenticates a username by asking for the
corresponding password.

In the context of databases, however, authentication acquires one more dimension because it may happen
at different levels. It may be performed by the database itself, or the setup may be changed to allow either
the operating system, or some other external method, to authenticate users.
For example, while creating a database in Microsoft’s SQL Server, a user is required to define whether
to use database authentication, operating system authentication, or both (the so-called mixed-mode
authentication). Other databases in which security is paramount employ near-foolproof authentication
modes like fingerprint recognition and retinal scans.

Access control
Access control is a security technique that regulates who or what can view or use resources
in a computing environment. It is a fundamental concept in security that minimizes risk to
the business or organization.

There are two types of access control: physical and logical. Physical access control limits
access to campuses, buildings, rooms and physical IT assets. Logical access control limits
connections to computer networks, system files and data.

To secure a facility, organizations use electronic access control systems that rely on user
credentials, access card readers, auditing and reports to track employee access to restricted
business locations and proprietary areas, such as data centers. Some of these systems
incorporate access control panels to restrict entry to rooms and buildings as well as alarms
and lockdown capabilities to prevent unauthorized access or operations.

Access control systems perform identification, authentication and authorization of users and
entities by evaluating required login credentials that can include passwords, personal
identification numbers (PINs), biometric scans, security tokens or other authentication
factors. Multifactor authentication, which requires two or more authentication factors, is
often an important part of layered defense to protect access control systems.

These security controls work by identifying an individual or entity, verifying that the person
or application is who or what it claims to be, and authorizing the access level and set of
actions associated with the username or IP address. Directory services and protocols,
including the Lightweight Directory Access Protocol (LDAP) and the Security Assertion Markup
Language (SAML), provide access controls for authenticating and authorizing users and
entities and enabling them to connect to computer resources, such as distributed applications
and web servers.

Organizations use different access control models depending on their compliance


requirements and the security levels of information technology they are trying to protect.

Types of access control

The main types of access control are:

 Mandatory access control (MAC): A security model in which access rights are
regulated by a central authority based on multiple levels of security. Often used in
government and military environments, classifications are assigned to system resources,
and the operating system or security kernel grants or denies access to those resource
objects based on the information security clearance of the user or device. For
example, Security Enhanced Linux is an implementation of MAC on the Linux operating
system.

 Discretionary access control (DAC): An access control method in which owners or


administrators of the protected system, data or resource set the policies defining who or
what is authorized to access the resource. Many of these systems enable administrators to
limit the propagation of access rights. A common criticism of DAC systems is a lack of
centralized control.

 Role-based access control (RBAC): A widely used access control mechanism that
restricts access to computer resources based on individuals or groups with defined
business functions -- executive level, engineer level 1 -- rather than the identities of
individual users. The role-based security model relies on a complex structure of role
assignments, role authorizations and role permissions developed using role engineering to
regulate employee access to systems. RBAC systems can be used to enforce MAC and
DAC frameworks.

 Rule-based access control: A security model in which the system administrator defines
the rules that govern access to resource objects. Often these rules are based on
conditions, such as time of day or location. It is not uncommon to use some form of both
rule-based access control and role-based access control to enforce access policies and
procedures.

 Attribute-based access control (ABAC): A methodology that manages access rights by


evaluating a set of rules, policies and relationships using the attributes of users, systems
and environmental conditions.
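As a concrete illustration of the role-based model described above, here is a minimal RBAC permission check; the roles, permissions and users are invented for the example.

# Hypothetical role/permission tables for a small RBAC check.
role_permissions = {
    "engineer_level_1": {"read_schematics"},
    "executive":        {"read_schematics", "read_financials", "approve_budget"},
}
user_roles = {"alice": {"executive"}, "bob": {"engineer_level_1"}}

def is_authorized(user, permission):
    # Access is granted through roles, never directly to individual users.
    return any(permission in role_permissions.get(role, set())
               for role in user_roles.get(user, set()))

print(is_authorized("bob", "approve_budget"))    # False
print(is_authorized("alice", "approve_budget"))  # True
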
Use of access control

The goal of access control is to minimize the risk of unauthorized access to physical and
logical systems. Access control is a fundamental component of security compliance
programs that ensures security technology and access control policies are in place to protect
confidential information, such as customer data. Most organizations have infrastructure and
procedures that limit access to networks, computer systems, applications, files and sensitive
data, such as personally identifiable information and intellectual property.

Access control systems are complex and can be challenging to manage in dynamic IT
environments that involve on-premises systems and cloud services. After some high-profile
breaches, technology vendors have shifted away from single sign-on systems to unified
access management, which offers access controls for on-premises and cloud environments.

Implementing access control

Access control is a process that is integrated into an organization's IT environment. It can


involve identity and access management systems. These systems provide access control
software, a user database, and management tools for access control policies, auditing and
enforcement.

Data Encryption in DBMS


A DBMS can use encryption to protect information in certain situations where the normal
security mechanisms of the DBMS are not adequate. For example, an intruder may steal tapes
containing some data or tap a communication line. By storing and transmitting data in an
encrypted form, the DBMS ensures that such stolen data is not intelligible to the intruder. Thus,
encryption is a technique to provide privacy of data.

In encryption, the message to be encrypted is known as plaintext. The plaintext is transformed by


a function that is parameterized by a key. The output of the encryption process is known as the
ciphertext. Ciphertext is then transmitted over the network. The process of converting the
plaintext to ciphertext is called encryption, and the process of converting the ciphertext to
plaintext is called decryption. Encryption is performed at the transmitting end and decryption
is performed at the receiving end. For the encryption process we need the encryption key, and for
the decryption process we need the decryption key. Without knowledge of the
decryption key, an intruder cannot convert the ciphertext back to plaintext. This process is also called
cryptography.
The basic idea behind encryption is to apply an encryption algorithm, which may be accessible
to the intruder, to the original data and a user-specified or DBA-specified encryption key, which
is kept secret. The output of the algorithm is the encrypted version of the data. There is also a
decryption algorithm, which takes the encrypted data and the decryption key as input and then
returns the original data. Without the correct decryption key, the decryption algorithm produces
gibberish. Encryption and decryption keys may be the same or different, but there must be a relation
between the two, and that relation must be kept secret.

Techniques used for Encryption


There are following techniques used for encryption process:

• Substitution Ciphers
• Transposition Ciphers

Substitution Ciphers: In a substitution cipher, each letter or group of letters is replaced by
another letter or group of letters to mask them. For example: a is replaced with D, b with E, c
with F and z with C. In this way 'attack' becomes 'DWWDFN'. Substitution ciphers are not
very secure because an intruder can easily guess the substitution characters.

Transposition Ciphers: Substitution ciphers preserve the order of the plaintext symbols but
mask them. The transposition cipher, in contrast, reorders the letters but does not mask them. For
this process a key is used. For example: iliveinqadian may be coded as divienaniqnli.
Transposition ciphers are more secure than substitution ciphers.
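A simple columnar transposition can be sketched as follows; the four-column key and the sample message here are illustrative assumptions and do not reproduce the exact key used in the example above:

def transposition_encrypt(plaintext: str, key: int = 4) -> str:
    """Columnar transposition: write the text row by row into 'key' columns,
    then read it out column by column. Letters are reordered, not replaced."""
    columns = ["" for _ in range(key)]
    for index, ch in enumerate(plaintext):
        columns[index % key] += ch
    return "".join(columns)

print(transposition_encrypt("iliveincanada"))  # prints ieaalininavcd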

Algorithms for Encryption Process



Two algorithms are commonly used for the encryption process. These are:

• Data Encryption Standard (DES)
• Public Key Encryption
Data Encryption Standard (DES)

It uses both a substitution of characters and a rearrangement of their order on the basis of an
encryption key. The main weakness of this approach is that authorized users must be told the
encryption key, and the mechanism for communicating this information is vulnerable to clever
intruders.

Public Key Encryption

Another approach to encryption, called public-key encryption, has become increasingly popular
in recent years. The encryption scheme proposed by Rivest, Shamir, and Adleman, called RSA,
is a well-known example of public-key encryption. Each authorized user has a public encryption
key, known to everyone, and a private decryption key (used by the decryption algorithm), chosen
by the user and known only to him or her. The encryption and decryption algorithms themselves
are assumed to be publicly known.
Consider a user called Suneet. Anyone can send Suneet a secret message by encrypting the
message using Suneet's publicly known encryption key. Only Suneet can decrypt this secret
message because the decryption algorithm requires Suneet's decryption key, known only to
Suneet. Since users choose their own decryption keys, the weakness of DES is avoided.
The main issue for public-key encryption is how encryption and decryption keys are chosen.
Technically, public-key encryption algorithms rely on the existence of one-way functions, which
are functions whose inverse is computationally very hard to determine.
The RSA algorithm, for example, is based on the observation that although checking whether a
given number is prime is easy, determining the prime factors of a non-prime number is extremely
hard. (Determining the prime factors of a number with over 100 digits can take years of CPU
time on the fastest computers available today.)
We now sketch the intuition behind the RSA algorithm, assuming that the data to be encrypted is
an integer I. To choose an encryption key and a decryption key, our friend Suneet creates a
public key by computing the product of two large prime numbers, P1 and P2. The private key
consists of the pair (P1, P2), and the decryption algorithm cannot be used without knowing P1
and P2. So we publish the product P1*P2, but an unauthorized user would need to be able to
factor P1*P2 to steal data. By choosing P1 and P2 to be sufficiently large (over 100 digits), we can
make it very difficult (or nearly impossible) for an intruder to factor it.
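A toy sketch of these RSA steps is shown below; the primes are deliberately tiny (real keys use primes of hundreds of digits, and real RSA also adds padding), so this only illustrates the arithmetic:

# Toy RSA sketch with tiny primes P1 and P2; real keys use primes of
# hundreds of digits so that factoring n = P1 * P2 is infeasible.
P1, P2 = 61, 53                    # the secret prime pair (private knowledge)
n = P1 * P2                        # 3233, published as part of the public key
phi = (P1 - 1) * (P2 - 1)          # 3120, computable only if P1 and P2 are known
e = 17                             # public exponent, chosen coprime to phi
d = pow(e, -1, phi)                # private exponent: modular inverse of e mod phi

message = 1234                     # the data, treated as an integer I smaller than n
ciphertext = pow(message, e, n)    # encryption with the public key (e, n)
recovered = pow(ciphertext, d, n)  # decryption with the private key (d, n)

print(ciphertext, recovered)       # recovered equals 1234 again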

Although this technique is secure, it is also computationally expensive. A hybrid scheme
used for secure communication is to exchange a DES key via a public-key encryption
scheme, and then use DES encryption on the data transmitted subsequently.
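The structure of such a hybrid exchange can be sketched roughly as below; a repeating-key XOR stands in for DES and the tiny RSA values from the sketch above stand in for a real public-key scheme, so this shows only the shape of the protocol, not a secure implementation:

import os

def xor_cipher(data: bytes, key: bytes) -> bytes:
    """Stand-in for a symmetric cipher such as DES: XOR with a repeating key."""
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))

# 1. The sender picks a random symmetric (session) key.
session_key = os.urandom(8)

# 2. The session key is sent encrypted under the receiver's public key
#    (toy RSA values n = 3233, e = 17, d = 2753 from the sketch above).
n, e, d = 3233, 17, 2753
encrypted_key = [pow(byte, e, n) for byte in session_key]

# 3. The bulk data itself is encrypted with the fast symmetric cipher.
ciphertext = xor_cipher(b"transfer Rs. 1000 from X to Y", session_key)

# 4. The receiver recovers the session key with the private key, then the data.
recovered_key = bytes(pow(c, d, n) for c in encrypted_key)
print(xor_cipher(ciphertext, recovered_key))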

Disadvantages of encryption
Encryption has the following problems:

 Key management (i.e., keeping keys secret) is a problem. Even in public-key encryption, the decryption
key must be kept secret.
 Even in a system that supports encryption, data must often be processed in plaintext form. Thus
sensitive data may still be accessible to transaction programs.
 Encrypting data gives rise to serious technical problems at the level of physical storage organization.
For example, indexing over data that is stored in encrypted form can be very difficult.

ADVANTAGES OF DATA ENCRYPTION:

1. Encryption Provides Security for Data at All Times


Generally, data is most vulnerable when it is being moved from one location to another.
Encryption works during data transport or at rest, making it an ideal solution no matter where
data is stored or how it is used. Encryption should be standard for all data stored at all times,
regardless of whether or not it is deemed “important”.

2. Encrypted Data Maintains Integrity


Hackers don't just steal information; they can also benefit from altering data to commit
fraud. While it is possible for skilled individuals to alter encrypted data, recipients of the data
will be able to detect the corruption, which allows for a quick response to the cyber-attack.

3. Encryption Protects Privacy


Encryption is used to protect sensitive data, including personal information for individuals.
This helps to ensure anonymity and privacy, reducing opportunities for surveillance by both
criminals and government agencies. Encryption technology is so powerful that some
governments are attempting to put limits on its effectiveness, a step that would weaken
privacy for companies and individuals.

4. Encryption is Part of Compliance


Many industries have strict compliance requirements to help protect those whose personal
information is stored by organizations. HIPAA, FIPS, and other regulations rely on security
methods such as encryption to protect data, and businesses can use encryption to
achieve comprehensive security.

5. Encryption Protects Data across Devices


Multiple (and mobile) devices are a big part of our lives, and transferring data from device to
device is a risky proposition. Encryption technology can help protect stored data across all
devices, even during transfer. Additional security measures like advanced authentication help
deter unauthorized users.

The Future of Encryption


As hackers continue to become more savvy and sophisticated, encryption technology must
evolve as well. Security professionals are working on a few different exciting technological
advances in the encryption field, including Elliptic Curve Cryptography (ECC),
homomorphic encryption, and quantum computation.
ECC is a method of cryptography that isn’t so much an improvement of the encryption
method itself, but a method that allows encryption and decryption to take place much faster,
without any loss of data security.
Homomorphic encryption would be a system allowing calculations on encrypted data
without decrypting it. This method would allow encryption across cloud systems, and ensure
greater privacy for users. As an example, a financial institution could make assessments for
individuals without revealing personal information.

DBMS - Storage System

Databases are stored in file formats, which contain records. At physical level, the actual data is
stored in electromagnetic format on some device. These storage devices can be broadly
categorized into three types −

 Primary Storage − The memory storage that is directly accessible to the CPU comes
under this category. CPU's internal memory (registers), fast memory (cache), and main
memory (RAM) are directly accessible to the CPU, as they are all placed on the
motherboard or CPU chipset. This storage is typically very small, ultra-fast, and volatile.
Primary storage requires continuous power supply in order to maintain its state. In case
of a power failure, all its data is lost.
 Secondary Storage − Secondary storage devices are used to store data for future use or
as backup. Secondary storage includes memory devices that are not a part of the CPU
chipset or motherboard, for example, magnetic disks, optical disks (DVD, CD, etc.),
hard disks, flash drives, and magnetic tapes.
 Tertiary Storage − Tertiary storage is used to store huge volumes of data. Since such
storage devices are external to the computer system, they are the slowest in speed. These
storage devices are mostly used to take the back up of an entire system. Optical disks
and magnetic tapes are widely used as tertiary storage.

Memory Hierarchy
A computer system has a well-defined hierarchy of memory. A CPU has direct access to its main
memory as well as its inbuilt registers. Main memory is noticeably slower than the CPU. To
minimize this speed mismatch, cache memory is introduced. Cache memory provides the fastest
access time and contains the data that is most frequently accessed by the CPU.
The memory with the fastest access is the costliest one. Larger storage devices offer slower speeds
and are less expensive; however, they can store huge volumes of data compared to CPU
registers or cache memory.

Magnetic Disks
Hard disk drives are the most common secondary storage devices in present computer systems.
These are called magnetic disks because they use the concept of magnetization to store
information. Hard disks consist of metal disks coated with magnetizable material. These disks
are placed vertically on a spindle. A read/write head moves in between the disks and is used to
magnetize or de-magnetize the spot under it. A magnetized spot can be recognized as 0 (zero) or
1 (one).
Hard disks are formatted in a well-defined order to store data efficiently. A hard disk plate has
many concentric circles on it, called tracks. Every track is further divided into sectors. A sector
on a hard disk typically stores 512 bytes of data.
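Given such a geometry, the raw capacity of a disk is simply the product of surfaces, tracks, sectors, and sector size; the counts below are made-up values for illustration only:

# Illustrative geometry; real drives vary widely.
surfaces = 8                 # recording surfaces (platters * 2)
tracks_per_surface = 50_000
sectors_per_track = 500
bytes_per_sector = 512       # the typical sector size mentioned above

capacity_bytes = surfaces * tracks_per_surface * sectors_per_track * bytes_per_sector
print(capacity_bytes / 10**9, "GB")  # 102.4 GB for this made-up geometry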

Redundant Array of Independent Disks


RAID, or Redundant Array of Independent Disks, is a technology to connect multiple secondary
storage devices and use them as a single storage medium.
RAID consists of an array of disks in which multiple disks are connected together to achieve
different goals. RAID levels define the use of disk arrays.

RAID 0
In this level, a striped array of disks is implemented. The data is broken down into blocks and
the blocks are distributed among disks. Each disk receives a block of data to write/read in
parallel. It enhances the speed and performance of the storage device. There is no parity and
backup in Level 0.
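A rough sketch of this block-level striping, with Python lists standing in for disks and a simple round-robin placement:

def raid0_write(data_blocks, disk_count=3):
    """Distribute consecutive blocks round-robin over 'disk_count' disks."""
    disks = [[] for _ in range(disk_count)]
    for i, block in enumerate(data_blocks):
        disks[i % disk_count].append(block)
    return disks

print(raid0_write(["B0", "B1", "B2", "B3", "B4", "B5"]))
# [['B0', 'B3'], ['B1', 'B4'], ['B2', 'B5']] - blocks can be read or written in parallel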

RAID 1
RAID 1 uses mirroring techniques. When data is sent to a RAID controller, it sends a copy of
data to all the disks in the array. RAID level 1 is also called mirroring and provides 100%
redundancy in case of a failure.
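Mirroring can be sketched the same way: every block is written to every disk in the array, so any single surviving disk holds a complete copy:

def raid1_write(data_blocks, disk_count=2):
    """Write every block to every disk (full mirroring)."""
    return [list(data_blocks) for _ in range(disk_count)]

print(raid1_write(["B0", "B1", "B2"]))
# [['B0', 'B1', 'B2'], ['B0', 'B1', 'B2']] - either disk alone can serve reads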

RAID 2
RAID 2 records Error Correction Code using Hamming distance for its data, striped on different
disks. Like level 0, each data bit in a word is recorded on a separate disk and ECC codes of the
data words are stored on a different set of disks. Due to its complex structure and high cost, RAID
2 is not commercially available.

RAID 3
RAID 3 stripes the data onto multiple disks. The parity bit generated for each data word is stored on
a different disk. This technique makes it possible to overcome single disk failures.

RAID 4
In this level, an entire block of data is written onto data disks and then the parity is generated
and stored on a different disk. Note that level 3 uses byte-level striping, whereas level 4 uses
block-level striping. Both level 3 and level 4 require at least three disks to implement RAID.

RAID 5
RAID 5 writes whole data blocks onto different disks, but the parity bits generated for data
block stripe are distributed among all the data disks rather than storing them on a different
dedicated disk.
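The parity used in RAID 3, 4, and 5 is typically the bitwise XOR of the data blocks in a stripe, which is what allows any one missing block to be rebuilt from the rest; a minimal sketch (the two-byte blocks are arbitrary):

def parity(blocks):
    """XOR all blocks of a stripe together to get the parity block."""
    result = bytearray(len(blocks[0]))
    for block in blocks:
        for i, byte in enumerate(block):
            result[i] ^= byte
    return bytes(result)

stripe = [b"\x0f\x0f", b"\xf0\xf0", b"\x55\x55"]
p = parity(stripe)

# If one data block is lost, XOR-ing the parity with the surviving blocks
# reconstructs the missing block.
reconstructed = parity([p, stripe[0], stripe[1]])
print(reconstructed == stripe[2])  # True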

RAID 6
RAID 6 is an extension of level 5. In this level, two independent parities are generated and
stored in distributed fashion among multiple disks. Two parities provide additional fault
tolerance. This level requires at least four disk drives to implement RAID.

Top Advantages of Common RAID Systems


Most people know that RAID is widely used in the contemporary era. Yet, only a few of them realize its actual
advantages. Thus, in this post, we will show you 5 primary merits of RAID.

Nowadays, the redundant array of independent disks (RAID) is accepted and used by more and more businesses and
individuals. As we all know, RAID is a kind of data storage technology that combines multiple physical disks into a
single logical unit. Therefore, without any doubt, in comparison to a single hard disk drive, RAID offers many more benefits to
users, including data redundancy, fast speed, etc. In the following, we'll show 5 of them in detail.

1. Large Storage
First of all, undoubtedly, one of the most obvious advantages is that RAID has much more storage space than a
single drive. It's known that RAID arrays usually consist of two or more disks. Also, if you want extra storage, you
can simply insert an additional hard drive into the array. Apparently, it is pretty convenient.

2. Fault Tolerance
In most RAID levels, a data backup in the array will be created automatically. This is achieved through the parity data of RAID,
generally called data redundancy. In this way, the RAID system becomes fault tolerant. Because of this feature,
many users treat RAID as a backup. However, as a matter of fact, data loss can still happen on RAID systems now and
then. Therefore, you should still make backups of the data stored in the RAID arrays so as to avoid having to resort to data recovery,
like PST recovery.

3. Continuous System Running


On a computer which contains only one hard drive, if the drive fails, the operating system will stop at once. However, in
a RAID array, if a hard disk fails, the system will be able to keep running normally for a certain time. In this process,
users can use the time to replace the failed drive with an appropriate new one.

4. Parity Check
Furthermore, modern RAID comes with an extremely important and excellent function: parity checking. This
feature can check for potential system crashes and warn you. At that point, you should figure out the reasons behind
the issues and fix them as soon as possible.

5. Fast Speed
Last but not least, RAID systems are able to work much faster than a single drive. This is because, in the array, reading
and writing of data can be done on several disks at the same time. Therefore, the transmission rate is improved and users can achieve better
disk performance.
