Unit - 2 (1) DBMS
Unit - 2 (1) DBMS
spread over different sites, i.e, on multiple computers or over a network of computers.
A distributed database system is located on various sites that don’t share physical
It needs to be managed such that for the users it looks like one single database.
Types:
1. Homogeneous Database:
operating system, database management system, and the data structures used – all are the
In a heterogeneous distributed database, different sites can use different schema and
Also, a particular site might be completely unaware of the other sites. Different
They may even use different data models for the database. Hence, translations are
There are 2 ways in which data can be stored on different sites. These are:
1. Replication –
In this approach, the entire relationship is stored redundantly at 2 or more sites. If the entire
This is advantageous as it increases the availability of data at different sites. Also, now
However, it has certain disadvantages as well. Data needs to be constantly updated. Any
change made at one site needs to be recorded at every site that relation is stored or else it
may lead to inconsistency. This is a lot of overhead. Also, concurrency control becomes
way more complex as concurrent access now needs to be checked over a number of sites.
2. Fragmentation –
In this approach, the relations are fragmented (i.e., they’re divided into smaller parts) and
each of the fragments is stored in different sites where they’re required. It must be made
sure that the fragments are such that they can be used to reconstruct the original relation
problem.
The relation is fragmented into groups of tuples so that each tuple is assigned to
The schema of the relation is divided into smaller schemas. Each fragment must
A distributed database system is a type of database management system that stores data
database system, each site has its own database, and the databases are connected to each
e main advantage of a distributed database system is that it can provide higher availability
and reliability than a centralized database system. Because the data is stored across multiple
sites, the system can continue to function even if one or more sites fail. In addition, a
distributed database system can provide better performance by distributing the data and
Distribution − It states the physical distribution of data across the different sites.
Autonomy − It indicates the distribution of control of the database system and the
Architectural Models
This is a two-level architecture where the functionality is divided into servers and clients. The
server functions primarily encompass data management, query processing, optimization and
transaction management. Client functions include mainly user interface. However, they have
In these systems, each peer acts both as a client and a server for imparting database services.
The peers share their resource with other peers and co-ordinate their activities.
database systems.
Multi-database Internal Level − Depicts the data distribution across different sites
Local database Conceptual Level − Depicts local data organization at each site.
Local database Internal Level − Depicts physical data organization at each site.
The distribution design alternatives for the tables in a DDBMS are as follows −
Fully replicated
Partially replicated
Fragmented
Mixed
In this design alternative, different tables are placed at different sites. Data is placed so that it
is at a close proximity to the site where it is used most. It is most suitable for database
systems where the percentage of queries needed to join information in tables placed at
different sites is low. If an appropriate distribution strategy is adopted, then this design
Fully Replicated
In this design alternative, at each site, one copy of all the database tables is stored. Since,
each site has its own copy of the entire database, queries are very fast requiring negligible
communication cost. On the contrary, the massive redundancy in data requires huge cost
during update operations. Hence, this is suitable for systems where a large number of queries
Partially Replicated
Copies of tables or portions of tables are stored at different sites. The distribution of the
tables is done in accordance to the frequency of access. This takes into consideration the fact
that the frequency of accessing the tables vary considerably from site to site. The number of
copies of the tables (or portions) depends on how frequently the access queries execute and
Fragmented
In this design, a table is divided into two or more pieces referred to as fragments or partitions,
and each fragment can be stored at different sites. This considers the fact that it seldom
happens that all data stored in a table is required at a given site. Moreover, fragmentation
increases parallelism and provides better disaster recovery. Here, there is only one copy of
Horizontal fragmentation
Hybrid fragmentation
Mixed Distribution
This is a combination of fragmentation and partial replications. Here, the tables are initially
fragmented in any form (horizontal or vertical), and then these fragments are partially
replicated across the different sites according to the frequency of accessing the fragments.
Transactions
unit of data processing. The operations performed in a transaction include one or more of
database operations like insert, delete, update or retrieve data. It is an atomic process that is
either performed into completion entirely or is not performed at all. A transaction involving
only data retrieval without any data update is called read-only transaction.
Each high level operation can be divided into a number of low level tasks or operations. For
Database access is restricted to read_item() and write_item() operations. Likewise, for all
commit − A signal to specify that the transaction has been successfully completed in
rollback − A signal to specify that the transaction has been unsuccessful and so all
rolled back.
Transaction States
A transaction may go through a subset of five states, active, partially committed, committed,
Active − The initial state where the transaction enters is the active state. The
transaction remains in this state while it is executing read, write or other operations.
Partially Committed − The transaction enters this state after the last statement of the
Committed − The transaction enters this state after successful completion of the
Failed − The transaction goes from partially committed state or active state to failed
state when it is discovered that normal execution can no longer proceed or system
checks fail.
Aborted − This is the state after the transaction has been rolled back after failure and
the database has been restored to its state that was before the transaction began.
The following state transition diagram depicts the states in the transaction and the low level
Any transaction must maintain the ACID properties, viz. Atomicity, Consistency, Isolation,
and Durability.
that is, either it is performed in its entirety or not performed at all. No partial update
should exist.
Consistency − A transaction should take the database from one consistent state to
another consistent state. It should not adversely affect any data item in the database.
There should not be any interference from the other concurrent transactions that are
simultaneously running.
Durability − If a committed transaction brings about a change, that change should be
T3………..Tn; for any transaction Ti, the operations in Ti must execute as laid down in the
schedule S.
Types of Schedules
Serial Schedules − In a serial schedule, at any point of time, only one transaction is
graph −
Parallel Schedules − In parallel schedules, more than one transactions are active
simultaneously, i.e. the transactions contain operations that overlap at time. This is
At least one of the operations is a write_item() operation, i.e. it tries to modify the
data item.
Serializability
serial schedule comprising of the same ‘n’ transactions. A serializable schedule contains the
correctness of serial schedule while ascertaining better CPU utilization of parallel schedule.
Equivalence of Schedules
Result equivalence − Two schedules producing identical results are said to be result
equivalent.
View equivalence − Two schedules that perform similar action in a similar manner
contain the same set of transactions and has the same order of conflicting pairs of
operations.
In a local database system, for committing a transaction, the transaction manager has to only
convey the decision to commit to the recovery manager. However, in a distributed system,
the transaction manager should convey the decision to commit to all the servers in the various
sites where the transaction is being executed and uniformly enforce the decision. When
processing is complete at each site, it reaches the partially committed transaction state and
waits for all other transactions to reach their partially committed states. When it receives the
message that all the sites are ready to commit, it starts to commit. In a distributed system,
One-phase commit
Two-phase commit
Three-phase commit
Distributed one-phase commit is the simplest commit protocol. Let us consider that there is a
controlling site and a number of slave sites where the transaction is being executed. The steps
After each slave has locally completed its transaction, it sends a “DONE” message to
When the controlling site receives “DONE” message from each slave, it makes a
decision to commit or abort. This is called the commit point. Then, it sends this
On receiving this message, a slave either commits or aborts and then sends an
Distributed two-phase commit reduces the vulnerability of one-phase commit protocols. The
After each slave has locally completed its transaction, it sends a “DONE” message to
the controlling site. When the controlling site has received “DONE” message from all
The slaves vote on whether they still want to commit or not. If a slave wants to
A slave that does not want to commit sends a “Not Ready” message. This may happen
After the controlling site has received “Ready” message from all the slaves −
o The slaves apply the transaction and send a “Commit ACK” message to the
controlling site.
o When the controlling site receives “Commit ACK” message from all the
After the controlling site has received the first “Not Ready” message from any slave −
o The slaves abort the transaction and send a “Abort ACK” message to the
controlling site.
o When the controlling site receives “Abort ACK” message from all the slaves,
The steps are same as two-phase commit except that “Commit ACK”/”Abort ACK” message
is not required.
Query Processing is the activity performed in extracting data from the database. In query
processing, it takes various steps for fetching the data from the database. The steps involved
are:
1. Parsing and translation
2. Optimization
3. Evaluation
As query processing includes certain activities for data retrieval. Initially, the given user
queries get translated in high-level database languages such as SQL. It gets translated into
expressions that can be further used at the physical level of the file system. After this, the
actual evaluation of the queries and a variety of query -optimizing transformations and takes
place. Thus before processing a query, a computer system needs to translate the query into a
Language is the best suitable choice for humans. But, it is not perfectly suitable for the
internal representation of the query to the system. Relational algebra is well suited for the
internal representation of a query. The translation process in query processing is similar to the
parser of a query. When a user executes any query, for generating the internal form of the
query, the parser in the system checks the syntax of the query, verifies the name of the
relation in the database, the tuple, and finally the required attribute value. The parser creates a
tree of the query, known as 'parse-tree.' Further, translate it into the form of relational algebra.
With this, it evenly replaces all the use of the views when used in the query.
Thus, we can understand the working of a query processing in the below-described diagram:
Suppose a user executes a query. As we have learned that there are various methods of
extracting the data from the database. In SQL, a user wants to fetch the records of the
employees whose salary is greater than or equal to 10000. For doing this, the following query
is undertaken:
Thus, to make the system understand the user query, it needs to be translated in the form of
relational algebra. We can bring this query in the relational algebra form as:
After translating the given query, we can execute each relational algebra operation by using
different algorithms. So, in this way, a query processing begins its working.
Evaluation
For this, with addition to the relational algebra translation, it is required to annotate the
translated relational algebra expression with the instructions used for specifying and
evaluating each operation. Thus, after translating the user query, the system executes a query
evaluation plan.
o In order to fully evaluate a query, the system needs to construct a query evaluation
plan.
o The annotations in the evaluation plan may refer to the algorithms to be used for the
evaluation primitives carry the instructions needed for the evaluation of the operation.
o Thus, a query evaluation plan defines a sequence of primitive operations used for
evaluating a query. The query evaluation plan is also referred to as the query
execution plan.
o A query execution engine is responsible for generating the output of the given query.
It takes the query execution plan, executes it, and finally makes the output for the user
query.
Optimization
o The cost of the query evaluation can vary for different types of queries. Although the
system is responsible for constructing the evaluation plan, the user does need not to
minimizes its cost. This type of task performed by the database system and is known
as Query Optimization.
o For optimizing a query, the query optimizer should have an estimated cost analysis of
each operation. It is because the overall operation cost depends on the memory
Finally, after selecting an evaluation plan, the system evaluates the query and produces the
Complex data types such as nested record structures, multi-valued attributes, and inheritance
handle huge amounts of data spread across different sites and ensure high availability,
be based on an existing class, inheriting all its attributes and methods. In the context of
databases, inheritance allows for hierarchies in which each child has characteristics of its
parent12.
There are several ways to implement inheritance in a database. One way is to use a one table
per inheritance hierarchy approach, where each table represents a level in the hierarchy.
Another way is to use a one table per entity approach, where each table represents an entity
in the hierarchy. A third way is to use a one table per entity with all attributes approach,
where each table contains all the attributes of all the entities in the hierarchy
Arrays and multisets are two collection types supported by SQL 23. An array is an ordered
Arrays and multisets provide powerful capabilities for storing and querying structured data in
SQL DBMS. By using these types, you can organize and manipulate data more efficiently,
object identity and reference types are related to how data is represented and managed within
1. Object Identity:
In the realm of databases, object identity refers to the unique identification of entities
or objects within the system. Each object in a distributed database has a distinct
identity that distinguishes it from other objects. This identity is crucial for various
the distribution of data across multiple nodes or sites. DDBMS must ensure that each
identify it across the distributed system. This is essential for maintaining consistency
and integrity in the database, especially when transactions involve multiple nodes.
2. Reference Types:
Local References: These references point to objects within the same node or
To support global references, DDBMS may use techniques such as distributed naming
resolve global references and ensure that transactions involving objects on different
crucial to maintaining the integrity of the data and supporting complex queries and
transactions.
In summary, object identity and reference types in DDBMS are fundamental concepts that
address the unique identification of objects and the establishment of relationships between
databases.
1. Data Model:
Model: OODBMS follows an object-oriented data model.
Objects: Data is stored in the form of objects, which encapsulate both data
and behavior.
Supports complex data types such as arrays, lists, and nested objects.
3. Query Language:
Typically uses a language specifically designed for working with objects, such
4. Scalability:
relationships.
5. Examples:
ObjectStore, db4o.
1. Data Model:
features.
3. Query Language:
4. Scalability:
traditional OODBMS.
5. Examples:
Summary:
objects with attributes and behaviors, and it's suitable for applications with complex,
interconnected data.