Unit 2
Unit 2
Unit 2
Database Design
Part I: Entity Relationship Model
Introduction to Entity Relationship Model
Entity Relational model is a model for identifying entities to be represented in the database and representation of
how those entities are related.
Let us first understand the design process of database design.
Design Phases
Following are the six steps of database design process. The ER model is most relevant to first three steps
• Entity set: The entity set is a set of entities of the same types. For example - All students studying in class X of the
School. The entity set need not be disjoint. Each entity in entity set have the same set of attributes and the set of
attributes will distinguish it from other entity sets. No other entity set will have exactly the same set of attributes.
2) Relationship Sets
Relationship is an association among two or more entities.
The relationship set is a collection of similar relationships. For example - Following Fig. 2.1.2 shows the relationship
works for for the two entities Employee and Departments.
The association between entity sets is called as participation. That is, the entity sets E1, E2,..., En participate in
relationship set R.
The function that an entity plays in a relationship is called that entity's role.
3) Attributes
Attributes define the properties of a data object of entity. For example if student is an entity, his ID, name, address,
date of birth, class are its attributes. The attributes help in determining the unique entity. Refer Fig. 2.1.3 for Student
entity set with attributes - ID, name, address. Note that entity is shown by rectangular box and attributes are shown
in oval. The primary key is underlined.
Types of Attributes
1) Simple and Composite Attributes:
1) Simple attributes are attributes that are drawn from the atomic value domains
For example - Name = {Parth}; Age = {23}
1) Composite attributes: Attributes that consist of a hierarchy of attributes For example - Address may consists of
"Number", "Street" and "Suburb"→ Address = {59+ 'JM Road' + 'Shivaji Nagar'}
2) Single valued and multivalued:
• There are some attributes that can be represented using a single value. For example
- StudentID attribute for a Student is specific only one studentID.
• Multivalued attributes: Attributes that have a set of values for each entity. It is represented by concentric ovals
For example - Degrees of a person: BSc', 'MTech', 'PhD'
3) Derived attribute:
Derived attributes are the attributes that contain values that are calculated from other attributes. To represent derived
attribute there is dotted ellipse inside the solid ellipse. For example Age can be derived from attribute DateOfBirth.
In this situation, DateOfBirth might be called Stored Attribute.
Mapping Cardinality
Mapping Cardinality represents the number of entities to which another entity can be associated via a relationship
set.
The mapping cardinalities are used in representing the binary relationship sets. Various types of mapping
cardinalities are -
1) One to One: An entity A is associated with at least one entity on B and an entity B is associated with at one entity
on A. This can be represented as,
2) One to Many: An entity in A is associated with any number of entities in B. An entity in B, however, can be
associated with at most one entity in A.
3) Many to One: An entity in A is associated with at most one entity in B. An entity in B, however, can be
associated with any number of entities in A.
4) Many to many:An entity in A is associated with any number (zero or more) of entities in B, and an entity in B is
associated with any number (zero or more) of entities in A.
ER Diagrams
An E-R diagram can express the overall logical structure of a database graphically.E-R diagrams are used to model
real-world objects like a person, a car, a company and the relation between these real-world objects.
Features of ER model
i) E-R diagrams are used to represent E-R model in a database, which makes them easy to be converted into relations
(tables).
ii) E-R diagrams provide the purpose of real-world modeling of objects which makes them intently useful.
iii) E-R diagrams require no technical knowledge and no hardware support.
iv) These diagrams are very easy to understand and easy to create even by a naive user.
v) It gives a standard solution of visualizing the data logically.
Various Components used in ER Model are-
Mapping Cardinality Representation using ER Diagram
There are four types of relationships that are considered for key constraints.
i) One to one relation: When entity A is associated with at the most one entity B then it shares one to one relation.
For example - There is one project manager who manages only one project.
ii) One to many :When entity A is associated with more than one entities at a time then there is one to many
relation. For example - One customer places order at a time.
iii) Many to one : When more than one entities are associated with only one entity then there is is many to one
relation. For example – Many student take a ComputerSciCourse
iv) Many to many: When more than one entities are associated with more than one entities. For example -Many
teachers can teach many students.
Alternate representation can be
Ternary Relationship
The relationship in which three entities are involved is called ternary relationship. For example -
• Now consider the CUSTOMER entity, and that the customer buys products. If all customers pay the same price for
a product, regardless of supplier, then you have a simple binary relationship between CUSTOMER and PRODUCT.
For the CUSTOMER/PRODUCT relationship, the intersection attribute is retail_price.
• Single ternary relation: Now consider a different scenario. Suppose the customer buys products but the price
depends not only on the product, but also on the supplier. Suppose you needed a customerID, a productID, and a
supplierID to identify a price. Now you have an attribute that depends on three things and hence you have a
relationship between three entities (a ternary relationship) that will have the intersection attribute, price.
Enhanced ER Model
Specialization and Generalization AU: Dec- 19, Marks 7
• Some entities have relationships that form hierarchies. For instance, Employee can be an hourly employee or
contracted employee.
• In this relationship hierarchies, some entities can act as superclass and some other entities can act as subclass.
• Superclass: An entity type that represents a general concept at a high level, is called superclass.
• Subclass: An entity type that represents a specific concept at lower levels, is called subclass.
• The subclass is said to inherit from superclass. When a subclass inherits from one or more superclasses, it inherits
all their attributes. In addition to the inherited attributes, a subclass can also define its own specific attributes.
• The process of making subclasses from a general concept is called specialization. This is top-down process. In this
process, the sub-groups are identified within an entity set which have attributes that are not shared by all entities.
• The process of making superclass from subclasses is called generalization. This is a bottom up process. In this
process multiple sets are synthesized into high level entities.
• The symbol used for specialization/ Generalization is,
• For example - There can be two subclass entities namely Hourly_Emps and Contract_Emps which are subclasses
of Empoyee class. We might have attributes hours_worked and hourly wage defined for Hourly_Emps and an
attribute contractid defined for ContractEmps.
Therefore, the attributes defined for an Hourly_Emps entity are the attributes for Employees plus Hourly_Emps. We
say that the attributes for the entity set Employees are inherited by the entity set Hourly_Emps and that Hourly-Emps
ISA (read is a) Employees. It can be represented by following Fig. 2.4.1.
Constraints on Specialization/Generalization
There are four types of constraints on specialization/generalization relationship. These are -
1) Membership constraints: This is a kind of constraints that involves determining which entities can be members
of a given lower-level entity. There are two types of TO S membership constraints –
i) Condition defined: In condition-defined lower-level entity sets,membership is evaluated on the basis of
whether or not an entity satisfies an explicit condition or predicate. For example - Consider the high-level entity Set
Employee that has attribute Employee_type. All Employee entities are evaluated on defining Employee_type
attribute. All entities that satisfy the condition student type = "ContractEmployee" are included in Contracted
Employee. Since all the lower-level entities are evaluated on the basis of the same attribute this type of
generalization is said to be attribute-defined.
ii) User defined: This is kind of entity set that in which the membership is manually defined.
2) Disjoint constraints: The disjoint constraint only applies when a superclass has more than one subclass. If
the subclasses are disjoint, then an entity occurrence can be a member of only one of the subclasses. For entity
Student has either Postgraduate Student entity or Undergraduate Student
3) Overlapping: When some entity can be a member of more than one subclasses. For example - Person can be both
a Student or a Staff. The And can be used to représent this constraint.
4) Completeness: It specifies whether or not an entity in the higher-level entity set must belong to at least one of the
lower-level entity sets within the generalization/specialization. This constraint may be one of the following –
i) Total generalization or specialization: Each higher-level entity must belong to a lower-level entity set. For
example - Account in the bank must either Savings account or Current Account. The mandatory can be used to
represent this constraint.
ii) Partial generalization or specialization: Some higher-level entities may not belong to any lower-level entity
set.
Aggregation
A feature of the entity relationship model that allows a relationship set to participate in another relationship set. This
is indicated on an ER diagram by drawing a dashed box around the aggregation.
For example - We treat the relationship set work and the entity sets employee and project as a higher-level entity set
called work.
ER to Relational Mapping
AU: May-17, Dec.-19, Marks 13
In this section we will discuss how to map various ER model constructs to Relational Model construct.
Mapping of Entity Set to Relationship
• An entity set is mapped to a relation in a straightforward way.
• Each attribute of entity set becomes an attribute of the table.
• The primary key attribute of entity set becomes an entity of the table.
• For example - Consider following ER diagram.
The converted employee table is as follows –
The SQL statement captures the information for above ER diageam as follows -
CREATE TABLE Employee( EmpID CHAR(11),
EName CHAR(30),
Salary INTEGER,
PRIMARY KEY(EmpID))
Mapping Relationship Sets(Without Constraints) to Tables
• Create a table for the relationship set.
• Add all primary keys of the participating entity sets as fields of the table.
• Add a field for each attribute of the relationship.
• Declare a primary key using all key fields from the entity sets.
• Declare foreign key constraints for all these fields from the entity sets.
For example - Consider following ER model
The SQL statement captures the information for relationship present in above ER diagram as follows -
CREATE TABLE Works In (EmpID CHAR(11),
DeptID CHAR(11),
EName CHAR(30),
Salary INTEGER,
DeptName CHAR(20),
Building CHAR(10),
PRIMARY KEY(EmpID,DeptID),
FOREIGN KEY (EmpID) REFERENCES Employee,
FOREIGN KEY (DeptID) REFERENCES Department
)
Mapping Relationship Sets( With Constraints) to Tables
• If a relationship set involves n entity sets and some m of them are linked via arrows in the ER diagram, the key for
anyone of these m entity sets constitutes a key for the relation to which the relationship set is mapped.
• Hence we have m candidate keys, and one of these should be designated as the primary key.
• There are two approaches used to convert a relationship sets with key constraints into table.
• Approach 1:
• By this approach the relationship associated with more than one entities is separately represented using a table.
For example - Consider following ER diagram. Each Dept has at most one manager, according to the key constraint
on Manages.
Here the constraint is each department has at the most one manager to manage it. Hence no two tuples can have same
DeptID. Hence there can be a separate table named Manages with DeptID as Primary Key. The table can be defined
using following SQL statement
CREATE TABLE Manages (EmpID CHAR(11),
DeptID INTEGER,
Since DATE,
PRIMARY KEY (DeptID),
FOREIGN KEY (EmpID) REFERENCES Employees,
FOREIGN KEY (DeptID) REFERENCES Departments)
Approach 2:
• In this approach, it is preferred to translate a relationship set with key constraints.
• It is a superior approach because, it avoids creating a distinct table for the relationship set.
• The idea is to include the information about the relationship set in the table corresponding to the entity set with
the key, taking advantage of the key constraint.
• This approach eliminates the need for a separate Manages relation, and queries asking for a department's manager
can be answered without combining information from two relations.
• The only drawback to this approach is that space could be wasted if several departments have no managers.
• The following SQL statement, defining a Dep_Mgr relation that captures the information in both Departments
and Manages, illustrates the second approach to translating relationship sets with key constraints:
CREATE TABLE Dep_Mgr (DeptID INTEGER,
DName CHAR(20),
Budget REAL,
EmpID CHAR (11),
since DATE,
PRIMARY KEY (DeptID),
FOREIGN KEY (EmpID) REFERENCES Employees)
Mapping Weak Entity Sets to Relational Mapping
• A weak entity can be identified uniquely only by considering the primary key of another (owner) entity. Following
steps are used for mapping Weak Entity Set to Relational Mapping
• Create a table for the weak entity set.
• Make each attribute of the weak entity set a field of the table. AI baris M
• Add fields for the primary key attributes of the identifying owner.
• Declare a foreign key constraint on these identifying owner fields.
• Instruct the system to automatically delete any tuples in the table for which there are no owners
For example - Consider following ER model,
Following SQL Statement illustrates this mapping
CREATE TABLE Department (DeptID CHAR(11),
DeptName CHAR(20),
Bldg No CHAR(5),
PRIMARY KEY (DeptID,Bldg_No),
FOREIGN KEY(Bldg_No) References Buildings on delete cascade
)
Mapping of Specialization / Generalization (EER Construct)to Relational Mapping
The specialialization/Generalization relationship (Enhanced ER Construct) can be mapped to database
tables(relations) using three methods. To demonstrate the methods, we will take the - InventoryItem, Book, DVD
Method 1: All the entities in the relationship are mapped to individual tables
InventoryItem(ID, name)
Book(ID,Publisher)
DVD(ID, Manufacturer)
Method 2: Only subclasses are mapped to tables. The attributes in the superclass are duplicated in all subclasses.
For example -
Book(ID,name, Publisher)
DVD(ID, name, Manufacturer)
Method 3: Only the superclass is mapped to a table. The attributes in the subclasses are taken to the superclass. For
example -
InventoryItem(ID, name, Publisher, Manufacturer)
This method will introduce null values. When we insert a Book record in the table, the Manufacturer column value
will be null. In the same way, when we insert a DVD record in the table, the Publisher value will be null.
Part II: Relational Database Design
Concept of Relational Database Design
AU: Dec.-19, Marks 7
• There are two primary goals of relational database design -
i) To generate a set of relation schemas that allows us to store information without unnecessary redundancy, and
ii) To allows us to retrieve information easily.
• For achieving these goals, the database design need to be normalized. That means we have to check whether the
schema is it normal form or not.
• For checking the normal form of the schema, it is necessary to check the functional dependencies and other data
dependencies that exists within the schema. Hence before letting us know what the normalization means, it is
necessary to understand the concept of functional dependencies.
Functional Dependencies
Definition: Let P and Q be sets of columns, then: P functionally determines Q, written P→Q if and only if any two
rows that are equal on (all the attributes in) P must be equal on (all the attributes in) Q.
In other words, the functional dependency holds if
T1.P T2.P, then T1.Q=T2.Q
Where notation T1.P projects the tuple T1 onto the attribute in P.
For example: Consider a relation in which the roll of the student and his/her name is stored as follows:
Here, R->N is true. That means the functional dependency holds true here. Because for every assigned RollNuumber
of student there will be unique name. For instance: The name of the Student whose RollNo is 1 is AAA. But if we
get two different names for the same roll number then that means the table does not hold the functional dependency.
Following is such table –
In above table for RollNumber 1 we are getting two different names - "AAA" and "XXX". Hence here it does not
hold the functional dependency.
Computing Closure Set of Functional Dependency
The closure set is a set of all functional dependencies implied by a given set F. It is denoted by F +
The closure set of functional dependency can be computed using basic three rules which are also called as
Armstrong's Axioms.
These are as follows -
i) Reflexivity: If XY, then X→ Y
ii) Augmentation: If X→Y, then XZ → YZ for any Z
iii) Transitivity: If X→Y and Y-Z, then X-Z
In addition to above axioms some additional rules for computing closure set of functional dependency are as follows
-
• Union: If XY and X-Z then XYZ
• Decomposition: If X-YZ, then XY and X→ Z
Example 2.8.1 Compute the closure of the following set of functional dependencies for a relation scheme
R(A,B,C,D,E), F={A->BC, CD->E, B->D, E->A)
Solution: Consider F as follows
A->BC
CD->E
B->D
E->A
The closure can be written for each attribute of relation as follows:
• (A)*= Step 1: {A}-> the attribute itself
Step 2: {ABC} as A->BC
Step 3: {ABCD} as B->D
Step 4: {ABCDE} as CD->E
Step 5: {ABCDE} as E->A and A is already present
Hence (A)+ ={ABCDE}
• (B)*=Step 1:{B}
Step 2: {BD} as B->D
Step 3: {BD} as there is no BD pair on LHS of F
Hence (B) ={BD}
• (C)*=Step 1:{C}
Step 2: {C} as there is no single C on LHS of F
Hence (C)* ={C}
• (D)+ = Step 1: {D}
Step 3: {D} as there is no BD pair on LHS of F
Hence (D)* ={D} .
• (E)+ =Step 1: {E}
Step 2: {EA} as E->A
Step 3: {EABC) as A->BC
Step 4: {EABCD} as B->D
Step 5: {EABCD} as CD->E and E is already present
By rearranging we get {ABCDE}
Hence (E)+ ={ABCDE}
• (CD)*=Step 1:{CD}
Step 2:{CDE}
Step 3: {CDEA}
Step 4:{CDEAB}
By rearranging we get {ABCDE}
Hence (CD)* ={ABCDE}
Example 2.8.2 Compute the closure of the following set of functional dependencies for a relation scheme
R(A,B,C,D,E), F={A->BC, CD->E, B->D, E->A) and Find the candidate key.
Solution: For finding the closure of functional dependencies - Refer example 2.8.1.
We can identify candidate from the given relation schema with the help of functional dependency. For that purpose
we need to compute the closure set of attribute. Now we will find out the closure set which can completely identify
the relation R(A,B,C,D).
Let, (A)+ = {ABCDE}
(B)+ = {BD}
(C)+ = {C}
(D)+ ={ABCDE}
(E)+ = {ABCDE}
(CD)+ = {ABCD}
Clearly, only (A), (E) and (CD)* gives us (ABCDE) i.e. complete relation R. Hence these are the candidate keys.
Canonical Cover or Minimal Cover
Formal Definition: A minimal cover for a set F of FDs is a set G of FDs such that:
1) Every dependency in G is of the form X->A, where A is a single attribute.
2) The closure F* is equal to the closure G*.
3) If we obtain a set H of dependencies from G by deleting one or more dependencies or by deleting attributes from
a dependency in G, then F* H*.
Concept of Extraneous Attributes
Definition: An attribute of a functional dependency is said to be extraneous if we can remove it without changing
the closure of the set of functional dependencies. The formal definition of extraneous attributes is as follows:
Consider a set F of functional dependencies and the functional dependency α→β in F
• Attribute A is extraneous in α if A€ α and F logically implies (F - {α→ B}) U {(α- A) →β}
• Attribute A is extraneous in β if A€ βand the set of functional dependencies (F-{α→β}) U {(α→ (β - A) } logically
implies F.
Algorithm for computing Canonical Cover for set of functional DependenciesF
Fc = F
repeat
Use the union rule to replace any dependencies in Fc of the form
α1 →β1 and αl→β2 and αl→β1β2
Find a functional dependency α→β in Fcwith an extraneous attribute either in α or in β
/* The test for extraneous attributes is done using Fc, not F */
If an extraneous attribute is found, delete it from α→βin Fc.
until (Fc does not change)
Example 2.8.3 Consider the following functional dependencies over the attribute set R(ABCDE) for finding minimal
cover FD = {A->C, AC->D, B->ADE}
Solution: Step 1: Split the FD such that R.H.S contain single attribute. Hence we get
A->C
AC->D
B->A
B->D
B->E
Step 2: Find the redundant entries and delete them. This can be done as follows –
For A->C: We find (A)* by assuming that we delete A->C temporarily. We get esimab (A)*={A}. Thus from A it is
not possible to obtain C by deleting A->C. This means we can not delete A->C
• For AC->D: We find (AC)* by assuming that we delete AC->D temporarily. We get (AC)=(AC). Thus by such
deletion it is not possible to obtain D. This means we can not delete AC->D
• For B->A: We find (B)* by assuming that we delete B->A temporarily. We get (B)*={BDE). Thus by such
deletion it is not possible to obtain A. This means we can not delete B->A
• For B->D: We find (B)* by assuming that we delete B->D temporarily. We get (B)=(BEACD). This shows clearly
that even if we delete B->D we can obtain D. This means we can delete B->A. Thus it is redundant.
• For B->E: We find (B) by assuming that we delete B->E temporarily. We get (B)*={BDAC). Thus by such
deletion it is not possible to obtain E. This means we can not delete B->E
To summarize we get now
A->C
AC->D
B->A
B->E
Thus R.H.S gets simplified.
Step 3: Now we will simplify L.H.S.
Consider AC->D. Here we can split A and C. For that we find closure set of A and C.
(A)+ = {AC}
(C)+ = {C}
Thus C can be obtained from both A as well as C. That also means we need not have to have AC on L.H.S. Instead,
only. A can be allowed and C can be eliminated. Thus after simplification we get
A->D
To summarize we get now
A->C
A->D
B->A
B->E
Thus L.H.S gets simplified.
Step 3: The simplified L.H.S. and R.H.S can be combined together to form
A->CD
B->AE
This is a minimal cover or Canonical cover of functional dependencies.
Concept of Redundancy and Anomalies
Definition: Redundancy is a condition created in database in which same piece of data is held at two different
places.
Redundancy is at the root of several problems associated with relational schemas.
Problems caused by redundancy: Following problems can be caused by redundancy-
i) Redundant storage: Some information is stored repeatedly.
ii) Update anomalies: If one copy of such repeated data is updated then inconsistency is created unless all other
copies are similarly updated.
iii) Insertion anomalies: Due to insertion of new record repeated information get added to the relation schema.
iv) Deletion anomalies: Due to deletion of particular record some other important information associated with the
deleted record get deleted and thus we may lose some other important information from the schema.
Example: Following example illustrates the above discussed anomalies or redundancy problems
Consider following Schema in which all possible information about Employee is stored.
1) Redundant storage: Note that the information about DeptID, DeptName and DeptLoc is repeated.
2) Update anomalies: In above table if we change DeptLoc of Pune to Chennai, then it will result inconsistency as
for DeptID 101 the DeptLoc is Pune. Or otherwise, we need to update multiple copies of DeptLoc from Pune to
Chennai. Hence this is an update anomaly.
3) Insertion anomalies: For above table if we want to add new tuple say (5, EEE,50000) for DeptID 101 then it will
cause repeated information of (101, XYZ,Pune) will occur.
4) Deletion anomalies: For above table, if we delete a record for EmpID 4, then automatically information about the
DeptID 102,DeptName PQR and DeptLoc Mumbai will get deleted and one may not be aware about DeptID 102.
This causes deletion anomaly.
Decomposition
• Decomposition is the process of breaking down one table into multiple tables.
• Formal definition of decomposition is -
• A decomposition of relation Schema R consists of replacing the relation Schema by two relation schema that each
contain a subset of attributes of R and together include all attributes of R by storing projections of the instance.
• For example - Consider the following table
Employee_Department table as follows -
We can decompose the above relation Schema into two relation schemas as Employee (Eid, Ename, Age, City,
Salary) and Department (Deptid, Eid, DeptName) as follows –
Employee Table
Department Table
• Hence, the above table can be decomposed into two Schema S and T as follows:
Problems Related to Decomposition:
Following are the potential problems to consider:
1) Some queries become more expensive.
2) Given instances of the decomposed relations, we may not be able to reconstruct the corresponding instance of the
original relation!
3) Checking some dependencies may require joining the instances of the decomposed relations.
4) There may be loss of information during decomposition.
Properties Associated With Decomposition
There are two properties associated with decomposition and those are -
1) Loss-less Join or non Loss Decomposition: When all information found in the original database is preserved
after decomposition, we call it as loss less or non loss decomposition.
2) Dependency Preservation:This is a property in which the constraints on the original table can be maintained by
simply enforcing some constraints on each of the smaller relations.
Non-loss Decomposition or Loss-less Join
The lossless join can be defined using following three conditions:
i) Union of attributes of R1 and R2 must be equal to attribute of R. Each attribute of R must be either in R1 or in R2.
Att(R1) U Att(R2) = Att(R)
ii) Intersection of attributes of R1 and R2 must not be NULL.
Att(R1) Att(R2) ≠ Φ
iii)Common attribute must be a key for at least one relation (R1 or R2)
Att(R1) Att(R2) -> Att(R1)
Att (R1) Att (R2) -> Att (R2)
Example 2.10.1 Consider the following relation R(A,B,C,D)and FDs A->BC, is the decomposition of R into
R1(A,B,C), R2(A,D). Check if the decomposition is lossless join or not.
Solution:
Step 1: Here Att(R1) U Att(R2) = Att(R) i.e R1(A,B,C) u R2(A,D)=(A,B,C,D) i.e R.
Step 2: Here R1 R2={A}. Thus Att(R1) Att(R2)≠Φ. Here the second condition gets satisfied.
Step 3: Att(R1) n Att(R2) -> {A}. Now (A)*={A,B,C) ϵ attributes of R1. Thus the third condition gets satisfied.
This shows that the given decomposition is a lossless join.
Example 2.10.2 Consider the following relation R(A,B,C,D,E,F) and FDs A->BC, C->A, D- >E, F->A, E->D is the
decomposition of R into R1(A,C,D), R2(B,C,D), and R3(E,F,D). Check for lossless.
Solution:
Step 1: R1UR2UR3 = R. Here the first condition for checking lossless join is satisfied as (A,C,D)U (B,C,D) U
(E,F,D) = {A,B,C,D,E,F) which is nothing but R.
Step 2: Consider R1 ∩ R2 = {CD) and R2 R3 = {D}. Hence second condition of intersection not being gets
satisfied.
Step 3: Now, consider R1(A,C,D) and R2(B,C,D). We find R1/R2 = {CD}
(CD)* = {ABCDE} = attributes of R1 i.e.{A,C,D). Hence condition 3 for checking lossless join for R1 and R2 gets
satisfied.
Step 4: Now, consider R2(B,C,D) and R3(E,F,D). We find R2 R3 = {D}. (D)*={D,E} which is neither complete set
of attributes of R2 or R3.[Note that F is missing for being attribute of R3].
Hence it is not lossless join decomposition. Or in other words we can say it is a lossy decomposition.
Example 2.10.3 Suppose that we decompose schema R=(A,B,C,D,E) into (A,B,C) (C,D,E) Show that it is not a
lossless decomposition.
Solution:
Step 1: Here we need to assume some data for the attributes A, B, C, D, and E. Using this data we can represent the
relation as follows -
Relation R
Relation R1 = (A,B,C)
Relation R2 = (C,D,E)
Step 2: Now we will join these tables using natural join, i.e. the join based on common attribute C. We get R1 R2
as
Step 3: We will eliminate all the trivial relations and useless relations. Hence we can obtain R1 U R2 U R3 as,
Step 5:This proves that F+= (F1UF2UF3)+. Hence given decomposition is dependency preserving.
Example 2.10.6 Differentiate lossy decomposition and lossless decomposition.
Solution:
Normal Forms
• Normalization is the process of reorganizing data in a database so that it meets two basic requirements:
1) There is no redundancy of data (all data is stored in only one place), and
2) data dependencies are logical (all related data items are stored together)
Need for normalization
1) It eliminates redundant data.
2) It reduces chances of data error.
3) The normalization is important because it allows database to take up less disk space.
4) It also help in increasing the performance.
5) It improves the data integrity and consistency.
First Normal Form
The table is said to be in 1NF if it follows following rules –
i) It should only have single (atomic) valued attributes/columns.
ii) Values stored in a column should be of the same domain
iii) All the columns in a table should have unique names.
iv) And the order in which data is stored, does not matter.
Consider following Student table
Student
As there are multiple values of phone number for sid 1 and 3, the above table is not in 1NF. We can make it in 1NF.
The conversion is as follows -
Second Normal Form
Before understanding the second normal form let us first discuss the concept of j functional dependency and prime
and non prime attributes.
Concept of Partial Functional Dependency
Partial dependency means that a nonprime attribute is functionally dependent on part of a candidate key.
For example: Consider a relation R(A,B,C,D) with functional dependency {AB->CD,A->C}
Here (AB) is a candidate key because
(AB)+ = {ABCD} = {R}
Hence {A,B) are prime attributes and {C,D) are non prime attribute. In A->C, the non prime attribute C is dependent
upon A which is actually a part of candidate key AB. Hence due to A->C we get partial functional dependency.
Prime and Non Prime Attributes
• Prime attribute: An attribute, which is a part of the candidate-key, is known as a prime attribute.
• Non-prime attribute: An attribute, which is not a part of the prime-key, is said to be a non-prime attribute.
• Example : Consider a Relation R ={A,B,C,D) and candidate key as AB, the Prime attributes: A, B
Non Prime attributes: C, D
The Second Normal Form
For a table to be in the Second Normal Form, following conditions must be followed
i) It should be in the First Normal form.
ii) It should not have partial functional dependency.
For example: Consider following table in which every information about a the Student is maintained in a table such
as student id(sid), student name(sname), course id(cid) and course name(cname).
Student_Course
This table is not in 2NF. For converting above table to 2NF we must follow the following steps -
Step 1: The above table is in 1NF.
Step 2: Here sname and sid are associated similarly cid and cname are associated with each other. Now if we delete
a record with sid=2, then automatically the course C++ will also get deleted. Thus,
sid->sname or cid->cname is a partial functional dependency, because {sid,cid} should be essentially a candidate
key for above table. Hence to bring the above table to 2NF we must decompose it as follows:
Student:
Course:
Superkeys
• {RegID}
• {RegID, RollNo}
• {RegID,Sname}
• {RollNo,Sname}
• {RegID, RollNo,Sname}
Candidate Keys
• {RegID}
• {RollNo}
Third Normal Form
A table is said to be in the Third Normal Form when,
i) It is in the Second Normal form.(i.e. it does not have partial functional dependency)
ii) It doesn't have transitive dependency.
Or in other words
In other words 3NF can be defined as: A table is in 3NF if it is in 2NF and for each functional dependency
X-> Y
at least one of the following conditions hold :
i) X is a super key of table
ii) Y is a prime attribute of table
For example: Consider following table Student_details as follows -
Here
Super keys: {sid}, {sid,sname}, {sid,sname,zipcode}, {sid,zipcode,cityname}... and so on.
Candidate keys:{sid}
Non-Prime attributes: {sname,zipcode,cityname,state}
The dependencies can be denoted as,
sid->sname
sid->zipcode
zipcode->cityname
cityname->state
The above denotes the transitive dependency. Hence above table is not in 3NF. We can convert it into 3NF as
follows:
Student
Zip
Example 2.11.1 Consider the relation R = {A, B, C, D, E, F, G, H, I, J} and the set of functional dependencies F=
{{A, B} →C, A →{D, E}, B→F, F→ {G, H}, D →{I, J} }
1. What is the key for R? Demonstrate it using the inference rules.
2. Decompose R into 2NF, then 3NF relations.
Solution: Let,
A→ DE (given)
A→ D, A→ E
AsD→IJ, A→IJ
Using union rule we get
A → DEIJ
As A→ A
we get A → ADEIJ
Using augmentation rule we compute AB
AB → ABDEIJ
But
AB→C (given)
AB →ABCDEIJ
B→ F (given) F→ GH .. B→ GH (transitivity)
AB →AGH is also true
Similarly AB→ AF B→F (given)
Thus now using union rule
AB → ABCDEFGHIJ
AB is a key
The table can be converted to 2NF as,
R1= (A, B, C)
R2= (A, D, E, I, J) smoot
R3= (B, F, G, H)
The above 2NF relations can be converted to 3NF as follows:
R1= (A, B, C)
R2= (A, D, E)
R1= (D, I, J)
R1= (B, E)
R3= (E, G, H).
Boyce / Codd Normal Form (BCNF)
Boyce and Codd Normal Form is a higher version of the Third Normal form. This form deals with certain type of
anomaly that is not handled by 3NF.
A 3NF table which does not have multiple overlapping candidate keys is said to be in BCNF.
Or in other words,
For a table to be in BCNF, following conditions must be satisfied:
i) R must be in 3rd Normal Form
ii) For each functional dependency (X → Y), X should be a super Key. In simple words if Y is a prime attribute then
X can not be non prime attribute.
For example - Consider following table that represents that a Student enrollment for the course -
Enrollment Table
Course
Here sid =1 leads to multiple values for courses and skill. Following table shows this
Here sid and course are dependent but the Course and Skill are independent. The multivalued dependency is denoted
as :
sid→ Course
sid→ Skill
Fourth Normal Form
Definition: For a table to satisfy the Fourth Normal Form, it should satisfy the following two conditions:
1) It should be in the Boyce-Codd Normal Form(BCNF).
2) And, the table should not have any multi-valued dependency.
For example: Consider following student relation which is not in 4NF as it contains multivalued dependency.
Student Table
Now to convert the above table to 4NF we must decompose the table into following two tables.
Student_Course Table
Key: (sid,Course)
Student Skill Table
Key: (sid, Skill)
To avoid the above problem we can decompose the tables into three tables as Seller_Company, Seller_Product, and
Company Product table