DDB Lectures

Uploaded by

hassan313.g4l
Copyright
© © All Rights Reserved

DISTRIBUTED DATABASE

Review

In a computer, a file system (sometimes written filesystem) is the way in
which files are named and where they are placed logically for storage and
retrieval. The DOS, Windows, OS/2, Macintosh, and UNIX-based operating
systems all have file systems in which files are placed somewhere in a
hierarchical (tree) structure. A file is placed in a directory (folder in
Windows) or subdirectory at the desired place in the tree structure. File
systems specify conventions for naming files. These conventions include the
maximum number of characters in a name, which characters can be used, and,
in some systems, how long the file name suffix can be. A file system also
includes a format for specifying the path to a file through the structure
of directories. Sometimes the term refers to the part of an operating
system or an add-on program that supports a file system. Examples of such
add-on file systems include the Network File System (NFS) and the Andrew
File System (AFS).

Features and Limitations of Traditional Files:

1- When one application program handles many files, access becomes
complicated and various kinds of trouble often occur.

2- Since data descriptions correspond one-to-one with programs, it is
complicated to update every program when the data format is updated.

3- Since the same data is often duplicated in two or more files, it becomes
inconsistent when updated unless all the files concerned are maintained
simultaneously.

4- A file is usually created in a format suited to one application, so
sometimes the file format cannot be used by another application.

5- A file may be restricted to only one access key.

6- The programmer must consider file integrity to guard against hardware
and software troubles.

File System Disadvantages:

1- Data redundancy and inconsistency.

2- Difficulty in accessing data.

3- Data isolation.

4- Concurrent access anomalies.

5- Security problems.

6- Integrity problems.

Database: A collection of interrelated data stored together, without
harmful or unnecessary redundancy, to serve multiple applications. The data
are stored so as to be independent of the programs that use them.


Database Advantages

1- Reduction in data redundancy.

2- The ability to operate on different data structures.

3- Independence of data from programs.

4- High-speed retrieval and fast online access.

5- High degree of flexibility in handling data formats.

6- Minimum cost.

7- Inconsistency can be avoided.

8- Integrity can be maintained.

9- Standards can be enforced.

10- Security restrictions can be applied.

DBMS: A software package designed to store and manage databases. It
provides:

- Data independence and efficient access.

- Reduced application development time.

- Data integrity and security.

- Uniform data administration.

- Concurrent access and recovery.


Distributed Database

1.1 A distributed database (DDB) is a collection of multiple, logically
interrelated databases distributed over a computer network. A distributed
database management system (distributed DBMS) is the software system that
permits the management of the distributed database and makes the distribution
transparent to the users.

The term distributed database system (DDBS) is typically used to refer to
the combination of the DDB and the distributed DBMS. Distributed DBMSs are
similar to distributed file systems in that both facilitate access to distributed data.
However, there are important differences in structure and functionality, and these
characterize a distributed database system:

1. Distributed file systems simply allow users to access files that are located on
machines other than their own. These files have no explicit structure (i.e., they
are flat) and the relationships among data in different files (if there are any) are
not managed by the system and are the user’s responsibility. A DDB, on the other
hand, is organized according to a schema that defines both the structure of the
distributed data, and the relationships among the data. The schema is defined
according to some data model, which is usually relational or object-oriented.

2. A distributed file system provides a simple interface to users which allows
them to open, read/write (records or bytes), and close files. A distributed DBMS
has the full functionality of a DBMS. It provides high-level, declarative
query capability, transaction management (both concurrency control and
recovery), and integrity enforcement. In this regard, distributed DBMSs are
different from transaction processing systems as well, since the latter provide
only some of these functions.


3. A distributed DBMS provides transparent access to data, while in a distributed
file system the user has to know (to some extent) the location of the data. A DDB
may be partitioned (called fragmentation) and replicated in addition to being
distributed across multiple sites. None of this is visible to the users. In this sense,
distributed database technology extends the concept of data independence,
which is a central notion of database management, to environments where data
are distributed and replicated over a number of machines connected by a network.
Thus, from a user's perspective, a DDB is logically a single database even if
physically it is distributed.
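
The idea of transparent access can be illustrated with a small sketch. It is
not taken from the lecture: the class names Site and GlobalCatalog, and the
sample records, are hypothetical and only show how a catalog can hide the
location of data from the caller.

```python
from dataclasses import dataclass, field


@dataclass
class Site:
    """One machine on the network holding part of the data (hypothetical)."""
    name: str
    rows: dict = field(default_factory=dict)  # key -> record


class GlobalCatalog:
    """Maps each key to the site that stores it, so callers never name a site."""

    def __init__(self):
        self._location = {}  # key -> Site

    def put(self, site, key, record):
        # Placement is decided here, once, and recorded in the catalog.
        site.rows[key] = record
        self._location[key] = site

    def get(self, key):
        # The caller names only the data item; the catalog finds the site.
        return self._location[key].rows[key]


baghdad, mosul = Site("Baghdad"), Site("Mosul")
catalog = GlobalCatalog()
catalog.put(baghdad, 305, {"customer": "Salem", "balance": 500})
catalog.put(mosul, 402, {"customer": "Hassan", "balance": 1000})

# The call has the same shape wherever the record physically lives.
print(catalog.get(305)["balance"])
print(catalog.get(402)["customer"])
```

In a distributed file system, by contrast, the caller would have to choose
between baghdad.rows and mosul.rows itself.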

1.2 Data Delivery Alternatives

In distributed databases, data are “delivered” from the sites where they are
stored to where the query is posed. We characterize the data delivery
alternatives along three orthogonal dimensions: delivery modes, frequency and
communication methods. The combinations of alternatives along each of these
dimensions provide a rich design space.

The alternative delivery modes are:

1- Pull-Only: The transfer of data from servers to clients is initiated by a
client pull. When a client request is received at a server, the server
responds by locating the requested information. The main characteristic
of pull-based delivery is that the arrival of new data items or updates to
existing data items are carried out at a server without notification to
clients unless clients explicitly poll the server. Also, in pull-based mode,
servers must be interrupted continuously to deal with requests from
clients. Furthermore, the information that clients can obtain from a server
is limited to when and what clients know to ask for. Conventional
DBMSs offer primarily pull-based data delivery.


2- Push-Only: In the push-only mode of data delivery, the transfer of data
from servers to clients is initiated by a server push in the absence of
any specific request from clients. The main difficulty of the push-based
approach is in deciding which data would be of common interest, and when
to send them to clients (the alternatives are periodic, irregular, or
conditional). Thus, the usefulness of server push depends heavily upon the
accuracy with which a server can predict the needs of clients. In
push-based mode, servers disseminate information either to an unbounded
set of clients (random broadcast) who can listen to a medium, or to a
selective set of clients (multicast) who belong to some categories of
recipients that may receive the data.

3- Hybrid: The hybrid mode of data delivery combines the client-pull and
server-push mechanisms: the transfer of information from servers to
clients is first initiated by a client pull (by posing the query), and the
subsequent transfer of updated information to clients is initiated by a
server push.
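
The three delivery modes can be sketched in a few lines. This is only an
illustration under assumptions: the Server class, its method names, and the
item key "balance:305" are hypothetical, and callbacks stand in for network
messages.

```python
class Server:
    """Holds data items and supports pull, push, and hybrid delivery."""

    def __init__(self):
        self.data = {}
        self.subscribers = {}  # item -> list of client callbacks

    def pull(self, item):
        # Pull-only: the server answers only when a client asks.
        return self.data.get(item)

    def subscribe(self, item, callback):
        # Register interest so that later updates are pushed to the client.
        self.subscribers.setdefault(item, []).append(callback)

    def update(self, item, value):
        self.data[item] = value
        # Push: the server initiates the transfer, with no client request.
        for callback in self.subscribers.get(item, []):
            callback(item, value)

    def hybrid_query(self, item, callback):
        # Hybrid: the first transfer is a client pull; updates are pushed.
        self.subscribe(item, callback)
        return self.pull(item)


server = Server()
server.update("balance:305", 500)

received = []
first = server.hybrid_query("balance:305", lambda k, v: received.append((k, v)))
server.update("balance:305", 450)  # delivered to the client by server push

print(first)     # value obtained by the initial pull
print(received)  # updates delivered afterwards by push
```

A pull-only client would see the new value 450 only if it called pull again,
which is the polling behaviour described for mode 1.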

1.3 Promises of DDBSs

There are many advantages of DDBSs:

1- Transparent Management of Distributed and Replicated Data:
Transparency refers to separation of the higher-level semantics of a
system from lower-level implementation issues. In other words, a
transparent system “hides” the implementation details from users. The
advantage of a fully transparent DBMS is the high level of support
that it provides for the development of complex applications.
2- Data Independence:
A- Logical data independence: refers to the immunity of user applications to
changes in the logical structure (i.e., schema) of the database.

B- Physical data independence, on the other hand, deals with hiding the
details of the storage structure from user applications. When a user
application is written, it should not be concerned with the details of
physical data organization. Therefore, the user application should not
need to be modified when data organization changes occur due to
performance considerations.
3- Network Transparency: the user should be protected from the
operational details of the network; possibly even hiding the existence
of the network. Then there would be no difference between database
applications that would run on a centralized database and those that
would run on a distributed database. This type of transparency is
referred to as network transparency or distribution transparency.
4- Replication Transparency: For performance, reliability, and
availability reasons, it is usually desirable to be able to distribute
data in a replicated fashion
across the machines on a network. Such replication helps performance
since diverse and conflicting user requirements can be more easily
accommodated. For example, data that are commonly accessed by one
user can be placed on that user’s local machine as well as on the
machine of another user with the same access requirements. This
increases the locality of reference. Furthermore, if one of the machines
fails, a copy of the data is still available on another machine on the
network.
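
A minimal sketch of the replication idea follows. The policy shown (write to
every copy, read from the first available copy) is one simple choice assumed
here for illustration, not something the lecture prescribes; the class names
and machine names are hypothetical.

```python
class Replica:
    """One copy of the data on one machine."""

    def __init__(self, name):
        self.name = name
        self.store = {}
        self.up = True  # set to False to simulate a machine failure


class ReplicatedItemStore:
    """Write-all / read-any-available, hidden behind one interface."""

    def __init__(self, replicas):
        self.replicas = replicas

    def write(self, key, value):
        # Apply the update to every copy so that the replicas stay consistent.
        # A real system must also handle replicas that are down at write time.
        for replica in self.replicas:
            replica.store[key] = value

    def read(self, key):
        # Any reachable copy can serve the read; the caller does not choose.
        for replica in self.replicas:
            if replica.up:
                return replica.store[key], replica.name
        raise RuntimeError("no replica available")


local, remote = Replica("user-A-machine"), Replica("user-B-machine")
store = ReplicatedItemStore([local, remote])
store.write("account:305", 500)

print(store.read("account:305"))  # served by the first available copy
local.up = False                  # one machine fails
print(store.read("account:305"))  # still answered, now by the other copy
```

Listing the user's own machine first gives the locality of reference
mentioned above, and the second read shows the availability benefit.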
5- Fragmentation Transparency: It is commonly desirable to divide
each database relation into smaller fragments and treat each fragment
as a separate database object (i.e., another relation). This is commonly
done for reasons of performance, availability, and reliability.
Furthermore, fragmentation can reduce the negative effects of
replication. Each replica is not the full relation but only a subset of it;
thus, less space is required and fewer data items need be managed.

1.4 Reliability through Distributed Transactions

Distributed DBMSs are intended to improve reliability since they have
replicated components and thereby eliminate single points of failure. The
failure of a single site, or the failure of a communication link which makes one
or more sites unreachable, is not sufficient to bring down the entire system. In
the case of a distributed database, this means that some of the data may be
unreachable, but with proper care, users may still be permitted to access
other parts of the distributed database.

1.5 Improved Performance:

The case for the improved performance of distributed DBMSs is typically
made based on two points. First, a distributed DBMS fragments the conceptual
database, enabling data to be stored in close proximity to its points of use (also
called data localization). This has two potential advantages:

1. Since each site handles only a portion of the database, contention for CPU
and I/O services is not as severe as for centralized databases.

2. Localization reduces remote access delays that are usually involved in wide
area networks (for example, the minimum round-trip message propagation delay
in satellite-based systems is about 1 second).
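
The effect of localization on delay can be estimated with simple arithmetic.
The sketch below uses the 1-second round trip quoted above; the local access
time, the number of accesses, and the 90% local share are assumptions chosen
only for illustration.

```python
def total_delay(n_local, n_remote, local_delay_s, remote_delay_s):
    """Total time spent waiting on data accesses, in seconds."""
    return n_local * local_delay_s + n_remote * remote_delay_s


REMOTE_RTT_S = 1.0     # satellite round trip quoted in the text
LOCAL_DELAY_S = 0.001  # assumed local access time, for illustration only

# 1000 accesses, all remote, versus the same 1000 with 900 served locally.
all_remote = total_delay(0, 1000, LOCAL_DELAY_S, REMOTE_RTT_S)
localized = total_delay(900, 100, LOCAL_DELAY_S, REMOTE_RTT_S)

print(all_remote, localized)
```

Under these assumed numbers the remote accesses dominate the total, so moving
most accesses to local data removes most of the waiting time.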

1.6 Easier System Expansion:

In a distributed environment, it is much easier to accommodate increasing
database sizes. Major system overhauls are seldom necessary; expansion can
usually be handled by adding processing and storage power to the network.
Obviously, it may not be possible to obtain a linear increase in “power,”
since this also depends on the overhead of distribution. However, significant
improvements are still possible.

One aspect of easier system expansion is economics. It normally costs much
less to put together a system of “smaller” computers with the equivalent
power of a single big machine. In earlier times, it was commonly believed
that it would be possible to purchase a computer four times as powerful if
one spent twice as much. This was known as Grosch’s law. With the advent of
microcomputers and workstations, and their price/performance
characteristics, this law is considered invalid. This should not be interpreted
to mean that mainframes are dead; this is not the point that we are making
here. Indeed, in recent years, we have observed a resurgence in the
worldwide sale of mainframes. The point is that for many applications, it is
more economical to put together a distributed computer system (whether
composed of mainframes or workstations) with sufficient power than it is to
establish a single, centralized system to run these tasks. In fact, the latter
may not even be feasible these days.

1.7 Distributed Database Design

The design of a distributed computer system involves making decisions
on the placement of data and programs across the sites of a computer
network, as well as possibly designing the network itself. In the case of
distributed DBMSs, the distribution of applications involves two things: the
distribution of the distributed DBMS software and the distribution of the
application programs that run on it. We discussed the promises of distributed
DBMS technology, highlighting the challenges that need to be overcome in
order to realize them. In this section we build on this discussion by
presenting the design issues that arise in building a distributed DBMS.


It has been suggested that the organization of distributed systems can be
investigated along three orthogonal dimensions:

1. Level of sharing: In terms of the level of sharing, there are three possibilities.
First, there is no sharing: each application and its data execute at one site, and
there is no communication with any other program or access to any data file at
other sites. This characterizes the very early days of networking and is probably
not very common today.
2. Behavior of access patterns: The access patterns of user requests may be
static, so that they do not change over time, or dynamic. It is obviously
considerably easier to plan for and manage the static environments than would
be the case for dynamic distributed systems. Unfortunately, it is difficult to find
many real-life distributed applications that would be classified as static. The
significant question, then, is not whether a system is static or dynamic, but how
dynamic it is. Incidentally, it is along this dimension that the relationship
between the distributed database design and query processing is established.

3. Level of knowledge on access pattern behavior: The third dimension of
classification is the level of knowledge about the access pattern behavior.
One possibility, of course, is that the designers do not have any information
about how users will access the database. This is a theoretical possibility, but
it is very difficult, if not impossible, to design a distributed DBMS that can
effectively cope with this situation. The more practical alternatives are that
the designers have complete information, where the access patterns can
reasonably be predicted and do not deviate significantly from these
predictions, or partial information, where there are deviations from the
predictions.

Two major strategies that have been identified for designing distributed
databases are the top-down approach and the bottom-up approach.


1- Top-Down Design Process

A framework for the top-down design process is shown in Figure (1). The
activity begins with a requirements analysis that defines the environment of
the system and “elicits both the data and processing needs of all potential
database users.”

The requirements study also specifies where the final system is expected to
stand with respect to the objectives of a distributed DBMS. These objectives
are defined with respect to performance, reliability and availability,
economics, and expandability (flexibility).

The requirements document is input to two parallel activities: view design
and conceptual design. The view design activity deals with defining the
interfaces for end users. The conceptual design, on the other hand, is the
process by which the enterprise is examined to determine entity types and
relationships among these entities. One can possibly divide this process into
two related activity groups: entity analysis and functional analysis. Entity
analysis is concerned with determining the entities, their attributes, and the
relationships among them. Functional analysis, on the other hand, is
concerned with determining the fundamental functions with which the
modeled enterprise is involved. The results of these two steps need to be
cross-referenced to get a better understanding of which functions deal with
which entities.


There is a relationship between the conceptual design and the view design.
In one sense, the conceptual design can be interpreted as being an integration
of user views. Even though this view integration activity is very important,
the conceptual model should support not only the existing applications, but
also future applications.

View integration should be used to ensure that entity and relationship
requirements for all the views are covered in the conceptual schema
(Figure 1).

Figure (1): Top-down design

2- Bottom-Up Design Process

- The databases already exist at a number of sites.

- The databases should be connected to solve common tasks.

Figure (2): Bottom-up design

What is a reasonable unit of distribution? Relation or fragment of relation?

• Relations as unit of distribution:

– If the relation is not replicated, we get a high volume of remote data accesses.

– If the relation is replicated, we get unnecessary replications, which cause
problems in executing updates and waste disk space.

– Might be an acceptable solution if queries need all the data in the relation
and the data stay only at the sites that use them.

• Fragments of relations as unit of distribution:

– Application views are usually subsets of relations.

– Thus, locality of accesses of applications is defined on subsets of relations.

– Permits a number of transactions to execute concurrently, since they will
access different portions of a relation.

– Permits parallel execution of a single query (intra-query concurrency).

– However, semantic data control (especially integrity enforcement) is more
difficult.

Fragments of relations are (usually) the appropriate unit of distribution.

Fragmentation aims to improve:

– Reliability.

– Performance.

– Balanced storage capacity and costs.

– Communication costs.

– Security.

• The following information is used to decide fragmentation:

– Quantitative information: frequency of queries, the site where each query
is run, selectivity of the queries, etc.

– Qualitative information: types of access to data (read/write), etc.
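
As an illustration of how quantitative information might be used, the sketch
below counts how often each site accesses each candidate fragment and picks
the site with the most accesses. The query log is invented, and the rule
"place a fragment where it is used most" is a simple heuristic assumed here,
not a method given in the lecture.

```python
from collections import Counter

# Hypothetical query log: (site where the query ran, branch it selected).
query_log = [
    ("Baghdad", "Baghdad"), ("Baghdad", "Baghdad"), ("Baghdad", "Mosul"),
    ("Mosul", "Mosul"), ("Mosul", "Mosul"), ("Mosul", "Mosul"),
]


def access_frequencies(log):
    """Count how often each site touches each candidate fragment."""
    return Counter(log)


def best_site(log, fragment):
    """Pick the site that accesses the given fragment most often."""
    freq = access_frequencies(log)
    sites = sorted({site for site, _ in log})  # sorted for a stable result
    return max(sites, key=lambda site: freq[(site, fragment)])


print(access_frequencies(query_log))
print(best_site(query_log, "Baghdad"), best_site(query_log, "Mosul"))
```

Here each branch's tuples are used mostly at that branch's own site, which
suggests fragmenting by branch and storing each fragment locally.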

Types of Fragmentation

– Horizontal: partitions a relation along its tuples

– Vertical: partitions a relation along its attributes

– Mixed/hybrid: a combination of horizontal and vertical fragmentation


Example:

Branch-name   account-number   customer-name   balance
Baghdad       305              Salem           500
Baghdad       226              Ahmad           336
Mosul         177              Ahmad           205
Mosul         402              Hassan          1000
Baghdad       155              Hassan          62
Mosul         408              Hassan          1123
Mosul         639              Ali             750
Table (1)

Horizontal Fragmentation
Horizontal fragmentation consists of partitioning the tuples of a global
relation r into subsets r1, r2, …, rn, where each subset can contain data
with common properties. The relation r can be reconstructed by taking the
union of all fragments, that is: r = r1 ∪ r2 ∪ … ∪ rn. For example, suppose
that the relation r is the deposit relation of Table (1). This relation has
only two branches, Baghdad and Mosul. If we choose the attribute branch-name
for horizontal fragmentation of the relation, the result is the two
fragments shown in Table (2).

Branch-name   account-number   customer-name   balance
Baghdad       305              Salem           500
Baghdad       226              Ahmad           336
Baghdad       155              Hassan          62
deposit1

Branch-name   account-number   customer-name   balance
Mosul         177              Ahmad           205
Mosul         402              Hassan          1000
Mosul         408              Hassan          1123
Mosul         639              Ali             750
deposit2
Table (2)
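
The horizontal fragmentation of the deposit relation can be sketched as
follows. The function name horizontal_fragment and the tuple layout are
choices made for this illustration; the rows follow the deposit example.

```python
# Deposit relation: (branch-name, account-number, customer-name, balance)
DEPOSIT = [
    ("Baghdad", 305, "Salem", 500),
    ("Baghdad", 226, "Ahmad", 336),
    ("Mosul", 177, "Ahmad", 205),
    ("Mosul", 402, "Hassan", 1000),
    ("Baghdad", 155, "Hassan", 62),
    ("Mosul", 408, "Hassan", 1123),
    ("Mosul", 639, "Ali", 750),
]


def horizontal_fragment(relation, predicate):
    """Keep the whole tuples that satisfy the selection predicate."""
    return [row for row in relation if predicate(row)]


# Fragment on the attribute branch-name, one fragment per branch.
deposit1 = horizontal_fragment(DEPOSIT, lambda row: row[0] == "Baghdad")
deposit2 = horizontal_fragment(DEPOSIT, lambda row: row[0] == "Mosul")

# Reconstruction: r = r1 ∪ r2
reconstructed = set(deposit1) | set(deposit2)

print(len(deposit1), len(deposit2))
print(reconstructed == set(DEPOSIT))
```

Each fragment keeps every attribute of the tuples it holds, which is why a
plain union is enough to rebuild the relation.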
Vertical Fragmentation
Vertical fragmentation of a global relation is the subdivision of its
attributes into groups. The subdivision is accomplished by adding a special
attribute called a tuple-id to the scheme R. A tuple-id is a physical or
logical address for a tuple. Since each tuple in r must have a unique
address, the tuple-id attribute is a key for the scheme.

Deposit-scheme3 = (branch-name, customer-name, tuple-id)
Deposit-scheme4 = (account-number, balance, tuple-id)

Branch-name   account-number   customer-name   balance   tuple-id
Baghdad       305              Salem           500       1
Baghdad       226              Ahmad           336       2
Mosul         177              Ahmad           205       3
Mosul         402              Hassan          1000      4
Baghdad       155              Hassan          62        5
Mosul         408              Hassan          1123      6
Mosul         639              Ali             750       7
Table (3)
Branch-name   customer-name   tuple-id
Baghdad       Salem           1
Baghdad       Ahmad           2
Mosul         Ahmad           3
Mosul         Hassan          4
Baghdad       Hassan          5
Mosul         Hassan          6
Mosul         Ali             7
Deposit3

Account-number   balance   tuple-id
305              500       1
226              336       2
177              205       3
402              1000      4
155              62        5
408              1123      6
639              750       7
Deposit4
Table (4)
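
The vertical fragmentation above can be sketched in code. Because both
fragments carry the tuple-id, the original relation can be rebuilt by joining
them on that attribute; the helper name join_on_tuple_id is hypothetical.

```python
# Deposit relation with tuple-ids:
# (branch-name, account-number, customer-name, balance, tuple-id)
DEPOSIT_T = [
    ("Baghdad", 305, "Salem", 500, 1),
    ("Baghdad", 226, "Ahmad", 336, 2),
    ("Mosul", 177, "Ahmad", 205, 3),
    ("Mosul", 402, "Hassan", 1000, 4),
    ("Baghdad", 155, "Hassan", 62, 5),
    ("Mosul", 408, "Hassan", 1123, 6),
    ("Mosul", 639, "Ali", 750, 7),
]

# Project each tuple onto the two attribute groups, keeping the tuple-id.
deposit3 = [(branch, customer, tid) for branch, _, customer, _, tid in DEPOSIT_T]
deposit4 = [(account, balance, tid) for _, account, _, balance, tid in DEPOSIT_T]


def join_on_tuple_id(frag3, frag4):
    """Rebuild full tuples by matching the tuple-id of the two fragments."""
    by_tid = {tid: (account, balance) for account, balance, tid in frag4}
    return [
        (branch, by_tid[tid][0], customer, by_tid[tid][1], tid)
        for branch, customer, tid in frag3
    ]


rebuilt = join_on_tuple_id(deposit3, deposit4)
print(rebuilt == DEPOSIT_T)
```

Without the tuple-id there would be no reliable way to pair a row of Deposit3
with its row of Deposit4, since neither fragment contains a key of the other.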
3- Distribution Design Issues

In the preceding section we indicated that the relations in a database
schema are usually decomposed into smaller fragments, but we did not offer
any justification or details for this process. The objective of this section is to
fill in these details. The following set of interrelated questions covers the
entire issue. We will therefore seek to answer them in the remainder of this
section.

1. Why fragment at all?

2. How should we fragment?

3. How much should we fragment?



4. Is there any way to test the correctness of decomposition?

5. How should we allocate?

6. What is the necessary information for fragmentation and allocation?
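
Question 4 can be made concrete for horizontal fragments. The three checks
below (completeness, reconstruction, disjointness) are the rules commonly
given in database textbooks; they are not stated in these notes, so treat the
sketch and its function names as an assumption about what "correctness" means
here.

```python
def is_complete(relation, fragments):
    """Every tuple of the relation appears in at least one fragment."""
    covered = set().union(*map(set, fragments))
    return set(relation) <= covered


def is_reconstructible(relation, fragments):
    """The union of the fragments gives back exactly the relation."""
    return set().union(*map(set, fragments)) == set(relation)


def is_disjoint(fragments):
    """No tuple appears in more than one fragment."""
    seen = set()
    for fragment in fragments:
        rows = set(fragment)
        if seen & rows:
            return False
        seen |= rows
    return True


# Small sample: (branch-name, account-number)
relation = [("Baghdad", 305), ("Baghdad", 226), ("Mosul", 177)]
good = [[("Baghdad", 305), ("Baghdad", 226)], [("Mosul", 177)]]
lossy = [[("Baghdad", 305)], [("Mosul", 177)]]  # one tuple was dropped

print(is_complete(relation, good), is_reconstructible(relation, good), is_disjoint(good))
print(is_complete(relation, lossy))
```

A decomposition that fails the completeness check, like the second one, loses
data and cannot be reconstructed by union.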

1.8 Architectural Alternatives


Architecturally, a distributed database system consists of a (possibly
empty) set of query sites and a non-empty set of data sites. The data sites have
data storage capability while the query sites do not. The latter only run the user
interface (in addition to applications) in order to facilitate data access at data sites.

Client/Server Systems

Client/server DBMSs entered the computing scene at the beginning of the
1990s and have made a significant impact on both DBMS technology and
the way we do computing. The general idea is very simple and elegant:
distinguish the functionality that needs to be provided and divide these
functions into two classes: server functions and client functions. This provides a
two-level architecture which makes it easier to manage the complexity of
modern DBMSs and the complexity of distribution. As with any highly popular
term, client/server has been much abused and has come to mean different things.

Client/Server Reference Architecture


Database server approach


Distributed Database Servers
