03 Final Database2
03 Final Database2
03 Final Database2
The EER goal is achieved by incorporation of a semantic data modeling concepts into the
conceptual ER Model.
Remark:
The cooperating of the previews concepts has the following advantages:
a. Storage Saving
b. Performance Enhancement
2. Specialization Concept:
It is the process of defining a set of subclasses based on a distinguish Characteristics ( P.O.V “Point-of-
View”).
1
Figure (1)
Remarks:
1. In the previous example in figure (1).
We can determine three Specifications based on the following characteristics:
Job Type
Rank
9
Method of Pay
2. The subclasses that define a specialization are attached by lines to a circle which is
connected to superclass.
3. The subset symbol on each line connecting a subclass to the circle indicates the direction
of the superclass/subclass relationships.
4. Attributes that apply only to entities of a particular subclass are attached to the rectangle
representing that subclass & these attributes are called Specific Attributes (or Local
Attribute) ex: Typing_Speed of SECRETARY
5. Subclass can participate in a specific relationship types ex: The relationship named
Belongs_to in previous example.
2
Constraint and Characteristics of specialization:
1. Definition Constraints:
a. Predicate Defined Specialization
It is process of defining a condition to determine exactly the entities that will become
members of each subclass by placing a condition on the value of some attribute of
the superclass, which is called Defining Attribute of the related subclass.
2. Disjoints Constraints:
a. Disjoint Specialization
This means that an entity can be a member of at most one subclass of a specialization
(see Fig: 2)
Figure (2)
3
b. Overlapped specialization:
This means that an entity can be a member in any number of subclasses of specialization (see
Fig: 3)
Figure (3)
3. Participation Constraints:
a. Total Participation Specialization
It specifies that every entity in a superclass must be a member of at least one subclass in
the specialization.
b. Partial Participation Specialization
It allows an entity in superclass not to belong to any of its subclasses in the specialization.
4
Figure (4)
Remarks:
In specialization with lattice or hierarchy inheritance, a subclass inherits the attributes not
only of its direct superclass but also of all its predecessor superclasses all the way to the
root of the hierarchy or lattice.
Leaf Node Class: it is a class that has no subclasses of its own.
Shared Subclass: it is a subclass with more than one superclass and its entities represent
a subset of the intersection of the entities of its superclasses. This means that an entity of
the shared subclass must exist as an entity in all its superclasses.
For an Example see Fig 4 the shared subclass ENGINEERING_MANAGEER means
that an engineering manager must be an engineer,manager, and salaried_employee.
Example:-
Figure (5): shows another specialization lattice of more than one level. This part of a conceptual
schema for a UNIVERSITY database. Notice that this arrangement would have been a hierarchy
except for the STUDENT ASSISTANT subclass, which is a subclass in two distinct class/ subclass
relationships.
5
Figure (5)
The requirements for the part of the UNIVERSITY database shown in Figure (5)
are the following:
1. The database keeps track of three types of people: employees, alumni, and students. A person
can belong to one, two, or all three of these types. Each person has a name, SSN, sex, address,
and birth data.
2. Every employee has a salary, and there are three types of employees; faculty, staff, and student
assistant. Each employee belongs to exactly one of these types. For each alumnus, a record of
the degree or degrees that he or she earned at the university is kept, including the name of the
degree, the year granted, and the major department. Each student has a major department.
3. Each faculty has a rank, whereas each staff member has a staff position. Student assistants are
classified further as either research assistants or teaching assistants, and the percent of time that
they work is recorded in the database. Research assistants have their research project stored,
whereas teaching assistants have the current course they work on.
4. Students are further classified as either graduate or undergraduate with the specific attribute
degree program (M.S, Ph.D., M.B.A, and so on) and class (freshman, sophomore, and so on),
respectively.
6
In figure (5), all person entities represented in the database are members of the PERSON
entity type, which is specialized into the subclasses {EMPLOYEE, ALUMNUS, STUDENT}.
This specialization is over lapping; for example, an alumnus may also be an employee and may
also be a student pursuing an advanced degree. The subclass STUDENT is the superclass
for the specialization {GRADUTE_STUDENT, UNDERGRADUTE_STUDENT}, while
EMPLOYEE is the superclass for the specialization {STUDENT_ASSISTANT, FACULTY,
STAFF}. Notice that STUDENT_ASSISTANT is also a subclass of STUDENT. Finally,
STUDENT_ASSISTANT is the superclass for the specialization into
{RESEARCH_ASSISTANT, TEACHING_ASSISTANT}.
In such a specialization lattice or hierarchy, a subclass inherits the attributes not only of its
direct superclass, but also of all its predecessor superclasses all the way to the root of the
hierarchy or lattice.
For example, an entity in GRADUATE_STUDENT inherits all the attributes of that entity
as a STUDENT and as a PERSON. Notice that an entity may exist in several leaf nodes of the
hierarchy, where a leaf node is a class that has no subclasses of its own. For example, a
member of GRADUATE_STUDENT may also be a member of RESEARCH_ASSISTANT.
A subclass with more than one superclass is called Shared Subclass, such as
ENGINEERING_MANAGER in figure (4). This leads to the concept known as Multiple
Inheritance, where the shared subclass ENGINEERING_MANAGER directly inherits
attributes and relationships from multiple classes. Notice that the existence of at least one
shared class leads to a lattice (and hence to multiple inheritance); if no shared subclass existed,
we would have a hierarchy rather than a lattice.
An important rule related to multiple inheritances can be illustrated by the example of the
shared subclass STUDENT_ASSISTANT in figure (5), which inherits attributes from both
EMPLOYEE and STUDENT. Here, both EMPLOYEE and STUDENT inherit the same
attributes from PERSON. The rule states that if an attribute (or relationship) originating in the
same superclass (PERSON) is inherited more than once via different paths (EMPLOYEE and
STUDENT) in the lattice, than it should be included only once in the shared subclass
(STUDENT_ASSISTANT). Hence, the attributes of PERSON are inherited only once in the
STUDENT_ASSISTANT subclass of figure (5).
If we do not allow for overlapping to occur in specialization (by considering all possible
combinations of classes that may have some entity belong to all these classes) there will be no
shared subclasses.
For Example in the following solved example the person may be {E, A, S} and to prevent
overlapping It would be necessary to create seven subclasses of a person n order to cover all
possible types of entities: E,A, S, E-A, E-S, A-S and E-A-S which will lead to more
complexity.
7
Example of categories (UNION TYPES)
Remarks:
Attribute inheritance works more selectively in the case of category.For example in figure (5)
each owner entity inherits the attributes of a company, a person or a bank depending on the
superclass to which the entity belongs.
On the other hand, a shared subclass entity such as ENGINEERING_MANAGER figure (4)
inherits all the attributes of its superclasses: (SALARIED_EMPLOYEE, ENGINEER, and
MANAGER).
8
An Example UNIVERSITY EER Schema, Design Choices, and Formal
Definitions:
In this section, we first give an example of a database schema in the EER Model to illustrate the use
of the various concepts discussed here and in chapter (3).Then, we discuss design choice for
conceptual schema, and finally we summarize the EER Model concepts and define them formally in
the same manner in which we formally defined the concepts of the basic ER Model in chapter (3).
Figure (6):
completed [TRANSCRIPT]. Each TRANSCRIPT instance includes the grade the student received
[Grade] in the course section.
GRAD_STUDENT is a subclass of STUDENT ,with the defining predicate Class = 5. For each
graduate student, we keep a list of previous degrees in a composite, multi-valued attribute [Degrees].
We also relate the graduate student to a faculty advisor [ADVISOR] and to a thesis committee
[COMMITTEE], if one exists.
An academic department has the attributes name[Dname], telephone [Dphone], and office number
[Office] and is related to the faculty member who is its chairperson [CHAIRS] and to the college to
which it belongs [CD]. Each college has attributes collage name [Cname], office number [Coffice],
and the name of its dean [Dean].
A course has attributes course number [C#], course name [Cname], and course description [Cdesc].
Several sections of each course are offered, with each section having the attributes section number
[Sec#] and the year and quarter in which the section was offered ([Year] and [Qtr]). Section numbers
uniquely identify each section.
The sections being offered during the current quarter are in a subclassCURRENT_SECTION of
SECTION, with the defining predicate Qtr = Current_qtr and Year = Current_year. Each section is
related to the instructor who taught or is teaching it ([TEACH]), if that instructor is in the database.
EER-to-Relational Mapping:-
Here we are going to add further step to the ER-to-Relational mapping algorithm (Seven Steps) to
handle the mapping of specialization. This step will have 4-main options and conditions under which
we can determine the suitable option. We use Attrs( R ) to denote the attributes of relation R and PK (
R ) to denote the primary key of R.
First we describe the mapping formally, then we illustrate it by examples.
Create a relation L for C with attributes (L)= {k,a1,…,an} and PK(L)=k. Create a relation Li
for each subclass Si, 1 ≤ i ≤ m, with the attributes(Li)= {k} U {attributes of Si} and PK (Li)=k.
This option works for any specialization (total or partial, disjoint or overlapping).
Secretary Engineer
Technician 11
2. Mapping the following EER Schema using option 8B
Car
PK
Truck
PK
12
3. Mapping the following EER Schema using option 8C:-
Employee
PK
13
4. Mapping the following EER Schema using option 8D:-
Part
PK
Remarks:
2. Option 8B works for only when both the disjoint and total constraints hold. Why?
(use this figure in your Analysis)
14
Remarks:
3. Option 8c & 8D (Single Relations) are Not recommended if many specific attributes
(local att.) are defined for the subclasses. Why?
4. Option 8C & 8D (Single Relations) are recommended if few specific attributes (local
att.) are defined for the subclasses. Why?
5. Option 8C is used to handle disjoint subclasses by including a single type attribute
(discriminating) to indicate the subclass to which each tuple belongs. If specialization
is partial the type Attribute can have a Null values in tuples that do not belong to any
subclass.
6. Option 8D is designed to handle overlapping subclasses by including m boolean type
field where m is Number of subclasses of the specialization.
7. The following Figure
15
Remarks:
Here
16
All faculty members are related to the academic department(s) with which they are affiliated
[BELONGS] (a faculty member can be associated with several departments, so the relationship
is M: N).
For mapping a category whose superclasses have different keys, it is customary to specify a new key
attribute called surrogate key.
Surrogate key: it is a new key attribute created for a relation which corresponds to a category
defining superclasses that have different keys.
Remark:
The surrogate key is primary key of the category relation and it is also included as a foreign key
in the relations which corresponding to superclasses of the category.
17
Distributed DB
Beside centralized and database, that resides on a single hardware machine, with associated
secondary storage devices such as disks for on-line database storage and tapes for backup. In recent
years, there has been a rapid trend toward the distribution of computer systems over multiple sites
that are connected together via communication network.
18
To maintain the consistency of copies of a replicated data item.
To recover from individual site crashes and from new types of failures such as failure of a
communication link.
Architecture of a DDES:
At the physical hardware level, the main factors the distinguish a DDBS from a centralized system
are the following:
• There are multiple computers, called sites or nodded.
• These sites must be connected by some type of communication network to transmit data and
commands among sites, as shown in the following figure:
Communication
Network
19
The sites may all be located in the same building or typically within a l mile radius and connected via
a local area network, or they may be geographically distributed over large distances and connected via
a long-level network. Local area networks typically use cable, whereas long-level networks u?c
telephone lines or satellites.
In a DDBS it is possible that some sites contain both AP and DP software, where as other sites
contain only one or the other as illustrated in the previous figure.
A site is used mainly for DB function is called a back-machine, and a site that is used primarily for
the AP function is called a front-end machine.
Distribution Transparency
An important function of the AP is to hide the details of data distribution from the user; that
is; the user should write global quires and transactions as through the database were centralized,
without having to specify the sites at which the data referenced in global quires and transactions
reside. This property is called Distribution Transparency.
Data Fragmentation:
Before distributing the data, the logical units of the database must be determined. The simplest
logical units are the relations themselves; that is, each whole relation will be stored at a particular
site. However in many cases the relation can be divided into smaller logical units for distribution.
20
We have three different types of fragmentation:
1. Horizontal fragmentation:-
A horizontal fragmentation of a rotation is a subset of the tuples in that rotation. The tuples
that belong to the horizontal fragment are specified by a condition on one or more attributes of
the relation.
2. Vertical fragmentation
Another type of fragmentation is called vertical fragmentation. A vertical fragment of a
relation keeps only certain attributes in the relation that are related together in some way.
3. Mixed fragmentation
Both vertical and horizontal fragments can be intermixed yielding mixed fragments.
Remark:
The original relation can be reconstructed by applying union and outer join.
Data Replication and Allocation
Replication is useful in improving the availability of data. The most extreme case is
replication of the whole database at every site in the distributed system, thus creating a fully
replicated distributed database. This can improve availability remarkably because the system
can continue to operate as long as at least one site is up, it also improve performance of
retrieval for global quires because the result of such a query can be obtained locally from one
site. The disadvantage of full replication is that it can slow down update operations drastically
because a single logical update must be performed on every copy of the database to keep the
copies consistent. Also full replication makes recovery techniques more expensive than if there
was no replication. The other extreme from full replication is to have no replication; that each
fragment is stored at exactly one site. In this case all fragments must be disjoint, except for the
repetition of primary keys among vertical (or mixed) fragments. This technique is called non-
redundant allocation. Between these two extremes, we have a wide spectrum of partial
replication of the data; that is some fragments of the database may be replicated whereas others
are not.
The number of copies of each fragment can range from one to the number of sites in the
distributed system.
A description of the replication of fragments is some times called a replication schema. Each
copy of a fragment and the fragment itself must be assigned to a particular site in the
distributed system.
21
This process is called data distribution. The degree of replication depends on some factors:
• Performance and availability goals of the system.
• Types and frequencies of transactions submitted at each site.
Example of Fragmentation, Allocation and Replication :
We now consider an example of fragmenting and distributing the company database
previously mentioned. Suppose that the company has three computer sites -one for each
current department. Sites 2 & 3 for departments 5& 4 respectively. At each of these sites, we
expect frequent access to the EMPLOYEE and PROJECT information for the employees who
work in that department and the projects controlled by that department.
Further, we assume that these sites mainly access the NAME, SSN, SALARY, and
SUPERSSN attributes of EMPLOYEE. Site 1 is used by company headquarter and accesses
all employee and project information regularly, in addition to keeping track of Dependent
information for insurance purposes. The following figure explains this example:
22
23
Types of Distributed Data Base System
The term distributed data base management system can be applied to describe a variety of systems
that differ from one another in many respects
The main factors that differentiate distributed systems are:
24
Concurrency Control and Recovery in Distributed Database:-
For concurrency control and recovery purposes, numerous problems arise in distributed DBMS
environment. Some of these problems are the following:
Distributed Recovery
The recovery process in distributed database is quite involved. We will give only a very brief
idea of some of the issues here. In some cases, it is quite difficult even to determine when a site is
down without exchanging numerous messages with other sites.
For example, suppose that site X sends a message to site Y and expects a response from Xbut
does not receive it; there are several possible explanations
• The message was not delivered to Y because of failure.
• Site Y is down and could not respond.
• Site Y is running and pent a response but the response was not delivered.
Without additional information or sending of additional messages, it is difficult to determine what
actually happened.
25
Introduction to Transaction Processing
Concepts and Theory
Transaction Processing Systems:
They are systems with large databases and hundreds of concurrent users that are executing
database transactions.
Transaction:
It is an executing program that forms a logical unit of database processing. A Transaction
includes one or more database access operations (e.g. insertion, deletion, and modification or
retrieval operations). Transaction can also be defined as an atomic unit of work that either
completed entirely or not at all, if it fails for any reason.
In these systems a Concurrency Control Problem arises which occurs when multiple transactions
submitted by various users interfere with one another in a way that produces incorrect results.
Figure (1)
26
Basic Database Access Operations
The basic database access operations that a transaction can include are as follows:
1) Read-Item(X): Reads a database item named X into a program variable (also named X for
simplicity).
2) Write-Item (X): Writes value of program variable X into the database item named X.
27
Figure (2)
28
Why Recovery Is Needed?
Recovery is required to ensure the following when transaction is submitted to DBMS for
execution:
1. All the operations in the transaction are completed successfully and their effect is
recorded permanently in the database.
2. It is not allowed to some of transaction operations to update database, if transaction
fails for any reasons.
Types of Failures:
Failures are generally classified as Transaction, System and Media failures. These are several
possible reasons for a transaction to fail in the middle of execution:
1) Computer failure (system crash):
A hardware, software, or network error occurs in the computer system during
transaction execution. Disk failure is one of the main media failures since it contains the
system log file.
2) A transaction or system error:
Some operations in a transaction may cause it fails (e.g. divide by zero or logical
programming errors.
3) Local errors or exception conditions detected by the transaction:
During transaction execution, certain conditions may occur that necessitate transaction
cancellation (e.g. data for transaction not found or insufficient account balance).
4) Concurrency control enforcement:
The concurrency control method may decide to abort the transaction to be restarted
later.
5) Physical problems and catastrophes: see page 558.
Remark:
Why Concurrency is needed?
Why Concurrency control is needed?
1) Begin Transaction
This marks the begging of transaction execution.
29
2) Read or Write Operations of Database Items in an Transaction
Some operations in a transaction may cause it fails (e.g. divide by zero or logical
programming errors.
3) End-transaction
This marks the end of transaction execution. At this point it is necessary to check whether
the changes introduced by the transaction can be permanently applied to the database
(committed) or whether the transaction has to be aborted.
4) Commit-transaction
This marks a successful end of a transaction, so that any changes executed by the
transaction can be safely committed.
5) Rollback (Abort)
This marks an unsuccessful end of a transaction so that any changes performed by
transaction to the database must be undone. Figure (3) shows the sate transition diagram
that describes how the transaction moves through its execution states.
Figure (3)
Remark:
When transaction ends, it moves to the Partially Committed State.
At this point, some recovery protocols need to ensure that a system
failure will not result, so an inability to record the changes of
transaction permanently (this done by recording changes in the System
Log). Also, at this point the DBMS concurrency control system can
force the transaction to a failed state or it may fail and aborted during its
active state. Failed or aborted transaction may be restarted later either
automatically or after being resubmitted by the user. Aborted or failed
transaction changes must be rollback.
30
The Log entries have the following forms:
1) [Start-Transaction, T]
2) [Write-Item, T, X, Old-value,[new-value]
3) [Read- Item, T, X]
4) [Commit, T]
Indicates that transaction T has completed successfully and its effect can be committed
(recorded permanently to the database).
5) [Abort, T]
1) Atomicity:
A transaction is an atomic unit of processing to be performed entirely or not performed
at all. It is the responsibility of the transaction recovery subsystem.
2) Consistency preservation:
A transaction is consistency preserving, execution takes the database from a consistent
state to another consistent state. It is the responsibility of the programmers who write
the database program to enforce integrity constraints. A consistent state of the database
satisfies the constraints specified in the schema.
3) Isolation:
A transaction execution should not be interfered with any other transactions executing
concurrently. Isolation may be enforced by hiding (invisible) its updates to other
transactions until it is committed (this will avoid dirty read problem). It is responsibility
of the concurrency control subsystem.
4) Durability:
The changes applied to the database by a committed transaction must persist in the
database. These changes must not31be lost because of any failure. It is responsibility
of the recovery subsystem of the DBMS.