[go: up one dir, main page]

0% found this document useful (0 votes)
117 views25 pages

Unit - 2 (1) DBMS

A distributed database system stores data across multiple computers or sites connected by a network. This allows for higher availability and reliability compared to a centralized system, as the failure of one site will not cause the whole system to fail. Data and processing loads are also distributed across sites for improved performance. There are different architectures for distributed database systems depending on how data is distributed, the degree of autonomy at each site, and whether the systems are homogeneous or heterogeneous.

Uploaded by

admin
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as DOCX, PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
117 views25 pages

Unit - 2 (1) DBMS

A distributed database system stores data across multiple computers or sites connected by a network. This allows for higher availability and reliability compared to a centralized system, as the failure of one site will not cause the whole system to fail. Data and processing loads are also distributed across sites for improved performance. There are different architectures for distributed database systems depending on how data is distributed, the degree of autonomy at each site, and whether the systems are homogeneous or heterogeneous.

Uploaded by

admin
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as DOCX, PDF, TXT or read online on Scribd
You are on page 1/ 25

Distributed Database System

A distributed database is basically a database that is not limited to one system, it is

spread over different sites, i.e, on multiple computers or over a network of computers.

A distributed database system is located on various sites that don’t share physical

components. This may be required when a particular database needs to be accessed by

various users globally.

It needs to be managed such that for the users it looks like one single database.

Types:

1. Homogeneous Database:

In a homogeneous database, all different sites store database identically. The

operating system, database management system, and the data structures used – all are the

same at all sites.

Hence, they’re easy to manage.


2. Heterogeneous Database:

In a heterogeneous distributed database, different sites can use different schema and

software that can lead to problems in query processing and transactions.

Also, a particular site might be completely unaware of the other sites. Different

computers may use a different operating system, different database application. T

They may even use different data models for the database. Hence, translations are

required for different sites to communicate.

Distributed Data Storage :

There are 2 ways in which data can be stored on different sites. These are:

1. Replication –

In this approach, the entire relationship is stored redundantly at 2 or more sites. If the entire

database is available at all sites, it is a fully redundant database. Hence, in replication,

systems maintain copies of data.

This is advantageous as it increases the availability of data at different sites. Also, now

query requests can be processed in parallel.

However, it has certain disadvantages as well. Data needs to be constantly updated. Any

change made at one site needs to be recorded at every site that relation is stored or else it

may lead to inconsistency. This is a lot of overhead. Also, concurrency control becomes

way more complex as concurrent access now needs to be checked over a number of sites.

2. Fragmentation –

In this approach, the relations are fragmented (i.e., they’re divided into smaller parts) and

each of the fragments is stored in different sites where they’re required. It must be made

sure that the fragments are such that they can be used to reconstruct the original relation

(i.e, there isn’t any loss of data).


Fragmentation is advantageous as it doesn’t create copies of data, consistency is not a

problem.

Fragmentation of relations can be done in two ways:

 Horizontal fragmentation – Splitting by rows –

The relation is fragmented into groups of tuples so that each tuple is assigned to

at least one fragment.

 Vertical fragmentation – Splitting by columns –

The schema of the relation is divided into smaller schemas. Each fragment must

contain a common candidate key so as to ensure a lossless join.

In certain cases, an approach that is hybrid of fragmentation and replication is used.

Applications of Distributed Database:

 It is used in Corporate Management Information System.

 It is used in multimedia applications.

 Used in Military’s control system, Hotel chains etc.

 It is also used in manufacturing control system.

A distributed database system is a type of database management system that stores data

across multiple computers or sites that are connected by a network. In a distributed

database system, each site has its own database, and the databases are connected to each

other to form a single, integrated system.

e main advantage of a distributed database system is that it can provide higher availability

and reliability than a centralized database system. Because the data is stored across multiple
sites, the system can continue to function even if one or more sites fail. In addition, a

distributed database system can provide better performance by distributing the data and

processing load across multiple sites.

Distributed DBMS Architectures

DDBMS architectures are generally developed depending on three parameters −

 Distribution − It states the physical distribution of data across the different sites.

 Autonomy − It indicates the distribution of control of the database system and the

degree to which each constituent DBMS can operate independently.

 Heterogeneity − It refers to the uniformity or dissimilarity of the data models, system

components and databases.

Architectural Models

Some of the common architectural models are −

 Client - Server Architecture for DDBMS

 Peer - to - Peer Architecture for DDBMS

 Multi - DBMS Architecture

Client - Server Architecture for DDBMS

This is a two-level architecture where the functionality is divided into servers and clients. The

server functions primarily encompass data management, query processing, optimization and

transaction management. Client functions include mainly user interface. However, they have

some functions like consistency checking and transaction management.

The two different client - server architecture are −


 Single Server Multiple Client

 Multiple Server Multiple Client (shown in the following diagram)

Peer- to-Peer Architecture for DDBMS

In these systems, each peer acts both as a client and a server for imparting database services.

The peers share their resource with other peers and co-ordinate their activities.

This architecture generally has four levels of schemas −

 Global Conceptual Schema − Depicts the global logical view of data.

 Local Conceptual Schema − Depicts logical data organization at each site.

 Local Internal Schema − Depicts physical data organization at each site.

 External Schema − Depicts user view of data.


Multi - DBMS Architectures

This is an integrated database system formed by a collection of two or more autonomous

database systems.

Multi-DBMS can be expressed through six levels of schemas −

 Multi-database View Level − Depicts multiple user views comprising of subsets of

the integrated distributed database.

 Multi-database Conceptual Level − Depicts integrated multi-database that

comprises of global logical multi-database structure definitions.

 Multi-database Internal Level − Depicts the data distribution across different sites

and multi-database to local data mapping.

 Local database View Level − Depicts public view of local data.

 Local database Conceptual Level − Depicts local data organization at each site.
 Local database Internal Level − Depicts physical data organization at each site.

There are two design alternatives for multi-DBMS −

 Model with multi-database conceptual level.

 Model without multi-database conceptual level.


Design Alternatives

The distribution design alternatives for the tables in a DDBMS are as follows −

 Non-replicated and non-fragmented

 Fully replicated

 Partially replicated

 Fragmented

 Mixed

Non-replicated & Non-fragmented

In this design alternative, different tables are placed at different sites. Data is placed so that it

is at a close proximity to the site where it is used most. It is most suitable for database

systems where the percentage of queries needed to join information in tables placed at
different sites is low. If an appropriate distribution strategy is adopted, then this design

alternative helps to reduce the communication cost during data processing.

Fully Replicated

In this design alternative, at each site, one copy of all the database tables is stored. Since,

each site has its own copy of the entire database, queries are very fast requiring negligible

communication cost. On the contrary, the massive redundancy in data requires huge cost

during update operations. Hence, this is suitable for systems where a large number of queries

is required to be handled whereas the number of database updates is low.

Partially Replicated

Copies of tables or portions of tables are stored at different sites. The distribution of the

tables is done in accordance to the frequency of access. This takes into consideration the fact

that the frequency of accessing the tables vary considerably from site to site. The number of

copies of the tables (or portions) depends on how frequently the access queries execute and

the site which generate the access queries.

Fragmented

In this design, a table is divided into two or more pieces referred to as fragments or partitions,

and each fragment can be stored at different sites. This considers the fact that it seldom

happens that all data stored in a table is required at a given site. Moreover, fragmentation

increases parallelism and provides better disaster recovery. Here, there is only one copy of

each fragment in the system, i.e. no redundant data.

The three fragmentation techniques are −


 Vertical fragmentation

 Horizontal fragmentation

 Hybrid fragmentation

Mixed Distribution

This is a combination of fragmentation and partial replications. Here, the tables are initially

fragmented in any form (horizontal or vertical), and then these fragments are partially

replicated across the different sites according to the frequency of accessing the fragments.

DDBMS - Transaction Processing Systems

Transactions

A transaction is a program including a collection of database operations, executed as a logical

unit of data processing. The operations performed in a transaction include one or more of

database operations like insert, delete, update or retrieve data. It is an atomic process that is

either performed into completion entirely or is not performed at all. A transaction involving

only data retrieval without any data update is called read-only transaction.

Each high level operation can be divided into a number of low level tasks or operations. For

example, a data update operation can be divided into three tasks −

 read_item() − reads data item from storage to main memory.

 modify_item() − change value of item in the main memory.

 write_item() − write the modified value from main memory to storage.

Database access is restricted to read_item() and write_item() operations. Likewise, for all

transactions, read and write forms the basic database operations.


Transaction Operations

The low level operations performed in a transaction are −

 begin_transaction − A marker that specifies start of transaction execution.

 read_item or write_item − Database operations that may be interleaved with main

memory operations as a part of transaction.

 end_transaction − A marker that specifies end of transaction.

 commit − A signal to specify that the transaction has been successfully completed in

its entirety and will not be undone.

 rollback − A signal to specify that the transaction has been unsuccessful and so all

temporary changes in the database are undone. A committed transaction cannot be

rolled back.

Transaction States

A transaction may go through a subset of five states, active, partially committed, committed,

failed and aborted.

 Active − The initial state where the transaction enters is the active state. The

transaction remains in this state while it is executing read, write or other operations.

 Partially Committed − The transaction enters this state after the last statement of the

transaction has been executed.

 Committed − The transaction enters this state after successful completion of the

transaction and system checks have issued commit signal.

 Failed − The transaction goes from partially committed state or active state to failed

state when it is discovered that normal execution can no longer proceed or system

checks fail.
 Aborted − This is the state after the transaction has been rolled back after failure and

the database has been restored to its state that was before the transaction began.

The following state transition diagram depicts the states in the transaction and the low level

transaction operations that causes change in states.

Desirable Properties of Transactions

Any transaction must maintain the ACID properties, viz. Atomicity, Consistency, Isolation,

and Durability.

 Atomicity − This property states that a transaction is an atomic unit of processing,

that is, either it is performed in its entirety or not performed at all. No partial update

should exist.

 Consistency − A transaction should take the database from one consistent state to

another consistent state. It should not adversely affect any data item in the database.

 Isolation − A transaction should be executed as if it is the only one in the system.

There should not be any interference from the other concurrent transactions that are

simultaneously running.
 Durability − If a committed transaction brings about a change, that change should be

durable in the database and not lost in case of any failure.

Schedules and Conflicts

In a system with a number of simultaneous transactions, a schedule is the total order of

execution of operations. Given a schedule S comprising of n transactions, say T1, T2,

T3………..Tn; for any transaction Ti, the operations in Ti must execute as laid down in the

schedule S.

Types of Schedules

There are two types of schedules −

 Serial Schedules − In a serial schedule, at any point of time, only one transaction is

active, i.e. there is no overlapping of transactions. This is depicted in the following

graph −

 Parallel Schedules − In parallel schedules, more than one transactions are active

simultaneously, i.e. the transactions contain operations that overlap at time. This is

depicted in the following graph −


Conflicts in Schedules

In a schedule comprising of multiple transactions, a conflict occurs when two active

transactions perform non-compatible operations. Two operations are said to be in conflict,

when all of the following three conditions exists simultaneously −

 The two operations are parts of different transactions.

 Both the operations access the same data item.

 At least one of the operations is a write_item() operation, i.e. it tries to modify the

data item.

Serializability

A serializable schedule of ‘n’ transactions is a parallel schedule which is equivalent to a

serial schedule comprising of the same ‘n’ transactions. A serializable schedule contains the

correctness of serial schedule while ascertaining better CPU utilization of parallel schedule.

Equivalence of Schedules

Equivalence of two schedules can be of the following types −

 Result equivalence − Two schedules producing identical results are said to be result

equivalent.

 View equivalence − Two schedules that perform similar action in a similar manner

are said to be view equivalent.


 Conflict equivalence − Two schedules are said to be conflict equivalent if both

contain the same set of transactions and has the same order of conflicting pairs of

operations.

Distributed DBMS - Commit Protocols

In a local database system, for committing a transaction, the transaction manager has to only

convey the decision to commit to the recovery manager. However, in a distributed system,

the transaction manager should convey the decision to commit to all the servers in the various

sites where the transaction is being executed and uniformly enforce the decision. When

processing is complete at each site, it reaches the partially committed transaction state and

waits for all other transactions to reach their partially committed states. When it receives the

message that all the sites are ready to commit, it starts to commit. In a distributed system,

either all sites commit or none of them does.

The different distributed commit protocols are −

 One-phase commit

 Two-phase commit

 Three-phase commit

Distributed One-phase Commit

Distributed one-phase commit is the simplest commit protocol. Let us consider that there is a

controlling site and a number of slave sites where the transaction is being executed. The steps

in distributed commit are −

 After each slave has locally completed its transaction, it sends a “DONE” message to

the controlling site.


 The slaves wait for “Commit” or “Abort” message from the controlling site. This

waiting time is called window of vulnerability.

 When the controlling site receives “DONE” message from each slave, it makes a

decision to commit or abort. This is called the commit point. Then, it sends this

message to all the slaves.

 On receiving this message, a slave either commits or aborts and then sends an

acknowledgement message to the controlling site.

Distributed Two-phase Commit

Distributed two-phase commit reduces the vulnerability of one-phase commit protocols. The

steps performed in the two phases are as follows −

Phase 1: Prepare Phase

 After each slave has locally completed its transaction, it sends a “DONE” message to

the controlling site. When the controlling site has received “DONE” message from all

slaves, it sends a “Prepare” message to the slaves.

 The slaves vote on whether they still want to commit or not. If a slave wants to

commit, it sends a “Ready” message.

 A slave that does not want to commit sends a “Not Ready” message. This may happen

when the slave has conflicting concurrent transactions or there is a timeout.

Phase 2: Commit/Abort Phase

 After the controlling site has received “Ready” message from all the slaves −

o The controlling site sends a “Global Commit” message to the slaves.

o The slaves apply the transaction and send a “Commit ACK” message to the

controlling site.
o When the controlling site receives “Commit ACK” message from all the

slaves, it considers the transaction as committed.

 After the controlling site has received the first “Not Ready” message from any slave −

o The controlling site sends a “Global Abort” message to the slaves.

o The slaves abort the transaction and send a “Abort ACK” message to the

controlling site.

o When the controlling site receives “Abort ACK” message from all the slaves,

it considers the transaction as aborted.

Distributed Three-phase Commit

The steps in distributed three-phase commit are as follows −

Phase 1: Prepare Phase

The steps are same as in distributed two-phase commit.

Phase 2: Prepare to Commit Phase

 The controlling site issues an “Enter Prepared State” broadcast message.

 The slave sites vote “OK” in response.

Phase 3: Commit / Abort Phase

The steps are same as two-phase commit except that “Commit ACK”/”Abort ACK” message

is not required.

Query Processing in DBMS

Query Processing is the activity performed in extracting data from the database. In query

processing, it takes various steps for fetching the data from the database. The steps involved

are:
1. Parsing and translation

2. Optimization

3. Evaluation

The query processing works in the following way:

Parsing and Translation

As query processing includes certain activities for data retrieval. Initially, the given user

queries get translated in high-level database languages such as SQL. It gets translated into

expressions that can be further used at the physical level of the file system. After this, the

actual evaluation of the queries and a variety of query -optimizing transformations and takes

place. Thus before processing a query, a computer system needs to translate the query into a

human-readable and understandable language. Consequently, SQL or Structured Query

Language is the best suitable choice for humans. But, it is not perfectly suitable for the

internal representation of the query to the system. Relational algebra is well suited for the

internal representation of a query. The translation process in query processing is similar to the

parser of a query. When a user executes any query, for generating the internal form of the

query, the parser in the system checks the syntax of the query, verifies the name of the

relation in the database, the tuple, and finally the required attribute value. The parser creates a

tree of the query, known as 'parse-tree.' Further, translate it into the form of relational algebra.

With this, it evenly replaces all the use of the views when used in the query.

Thus, we can understand the working of a query processing in the below-described diagram:
Suppose a user executes a query. As we have learned that there are various methods of

extracting the data from the database. In SQL, a user wants to fetch the records of the

employees whose salary is greater than or equal to 10000. For doing this, the following query

is undertaken:

select emp_name from Employee where salary>10000;

Thus, to make the system understand the user query, it needs to be translated in the form of

relational algebra. We can bring this query in the relational algebra form as:

o σsalary>10000 (πsalary (Employee))

o πsalary (σsalary>10000 (Employee))

After translating the given query, we can execute each relational algebra operation by using

different algorithms. So, in this way, a query processing begins its working.
Evaluation

For this, with addition to the relational algebra translation, it is required to annotate the

translated relational algebra expression with the instructions used for specifying and

evaluating each operation. Thus, after translating the user query, the system executes a query

evaluation plan.

Query Evaluation Plan

o In order to fully evaluate a query, the system needs to construct a query evaluation

plan.

o The annotations in the evaluation plan may refer to the algorithms to be used for the

particular index or the specific operations.

o Such relational algebra with annotations is referred to as Evaluation Primitives. The

evaluation primitives carry the instructions needed for the evaluation of the operation.

o Thus, a query evaluation plan defines a sequence of primitive operations used for

evaluating a query. The query evaluation plan is also referred to as the query

execution plan.

o A query execution engine is responsible for generating the output of the given query.

It takes the query execution plan, executes it, and finally makes the output for the user

query.

Optimization

o The cost of the query evaluation can vary for different types of queries. Although the

system is responsible for constructing the evaluation plan, the user does need not to

write their query efficiently.


o Usually, a database system generates an efficient query evaluation plan, which

minimizes its cost. This type of task performed by the database system and is known

as Query Optimization.

o For optimizing a query, the query optimizer should have an estimated cost analysis of

each operation. It is because the overall operation cost depends on the memory

allocations to several operations, execution costs, and so on.

Finally, after selecting an evaluation plan, the system evaluates the query and produces the

output of the query.

Complex data types such as nested record structures, multi-valued attributes, and inheritance

are supported by traditional programming languages. However, these features had to be

translated into simpler SQL data type.

n a Distributed Database Management System (DDBMS), data is stored across multiple

interconnected databases situated in different geographical areas. DDBMSs are designed to

handle huge amounts of data spread across different sites and ensure high availability,

scalability, and fault tolerance mechanisms1.

Inheritance is a feature of object-oriented programming languages that allows a new class to

be based on an existing class, inheriting all its attributes and methods. In the context of

databases, inheritance allows for hierarchies in which each child has characteristics of its

parent12.

There are several ways to implement inheritance in a database. One way is to use a one table

per inheritance hierarchy approach, where each table represents a level in the hierarchy.

Another way is to use a one table per entity approach, where each table represents an entity

in the hierarchy. A third way is to use a one table per entity with all attributes approach,

where each table contains all the attributes of all the entities in the hierarchy
Arrays and multisets are two collection types supported by SQL 23. An array is an ordered

collection of elements of the same type, while a multiset is an unordered collection of

elements that may occur multiple times23.

Arrays and multisets provide powerful capabilities for storing and querying structured data in

SQL DBMS. By using these types, you can organize and manipulate data more efficiently,

allowing for more flexible and expressive database designs2.

In the context of Distributed Database Management Systems (DDBMS), the concepts of

object identity and reference types are related to how data is represented and managed within

a distributed environment. Let's explore these concepts:

1. Object Identity:

In the realm of databases, object identity refers to the unique identification of entities

or objects within the system. Each object in a distributed database has a distinct

identity that distinguishes it from other objects. This identity is crucial for various

database operations, such as retrieval, update, and deletion.

In a distributed environment, maintaining object identity becomes challenging due to

the distribution of data across multiple nodes or sites. DDBMS must ensure that each

object has a globally unique identifier (GUID) or some mechanism to uniquely

identify it across the distributed system. This is essential for maintaining consistency

and integrity in the database, especially when transactions involve multiple nodes.

2. Reference Types:

Reference types, also known as references or pointers, are mechanisms used to

establish relationships between different objects in a database. In a distributed


database, these relationships can exist between objects residing on different nodes.

There are generally two types of reference types:

 Local References: These references point to objects within the same node or

site. They are relatively straightforward to manage, as they involve addressing

objects within the local context.

 Global References: These references point to objects located on different

nodes or sites in the distributed environment. Managing global references is

more complex, as it requires mechanisms to locate and access the referenced

objects across the distributed system.

To support global references, DDBMS may use techniques such as distributed naming

services, distributed directories, or distributed hash tables. These mechanisms help

resolve global references and ensure that transactions involving objects on different

nodes can be executed seamlessly.

Ensuring consistency and coherence of reference types in a distributed database is

crucial to maintaining the integrity of the data and supporting complex queries and

transactions.

In summary, object identity and reference types in DDBMS are fundamental concepts that

address the unique identification of objects and the establishment of relationships between

them in a distributed environment. The management of these concepts is essential for

ensuring data consistency, integrity, and efficient transaction processing in distributed

databases.

Object-Oriented Databases (OODBMS):

1. Data Model:
 Model: OODBMS follows an object-oriented data model.

 Objects: Data is stored in the form of objects, which encapsulate both data

and behavior.

 Inheritance: Supports inheritance, allowing objects to inherit attributes and

behaviors from other objects.

2. Complex Data Types:

 Supports complex data types such as arrays, lists, and nested objects.

 Relationships between objects are a fundamental part of the model.

3. Query Language:

 Typically uses a language specifically designed for working with objects, such

as OQL (Object Query Language).

4. Scalability:

 Well-suited for applications with complex, interconnected data and

relationships.

 May be less suitable for large-scale, data-intensive applications.

5. Examples:

 ObjectStore, db4o.

Object-Relational Databases (ORDBMS):

1. Data Model:

 Model: ORDBMS extends the relational data model to include object-oriented

features.

 Tables: Data is organized in tables (relations), similar to relational databases.

 Objects: Allows the storage of user-defined data types and methods.

2. Complex Data Types:


 Supports complex data types, such as arrays and user-defined types.

 Incorporates object-oriented features like encapsulation and inheritance into

the relational model.

3. Query Language:

 Uses SQL (Structured Query Language) for querying, which is a standard

language for relational databases.

 SQL is extended to support object-oriented features.

4. Scalability:

 Well-suited for applications that require the structure and simplicity of

relational databases while benefiting from some object-oriented features.

 More scalable for large datasets and data-intensive applications compared to

traditional OODBMS.

5. Examples:

 PostgreSQL, Oracle, Microsoft SQL Server (with object-relational features).

Summary:

 OODBMS emphasizes an object-oriented data model, where data is represented as

objects with attributes and behaviors, and it's suitable for applications with complex,

interconnected data.

 ORDBMS extends the relational data model to include object-oriented features,

maintaining the tabular structure of relational databases while supporting user-defined

data types and methods.

You might also like